
Fabric Data Ingestion: A personal opinion

Over the past 18 months, I’ve had the opportunity to work with Microsoft Fabric across multiple client projects. During this time, I’ve come to appreciate Fabric as a powerful and versatile platform for data analytics. Yet, one recurring theme keeps surfacing: questions and discussions around how and why I choose different methods to integrate data into the platform.

Fabric offers a unified environment with a variety of options to ingest data into its Lakehouse — one of its core components. But with so many choices available, the question naturally arises:

Which ingestion method is the right one for your scenario?

In this post, I’ll share my perspective, based on hands-on experience, on the three most-used options: Notebooks, the Copy Data activity, and Dataflow Gen2. I’ll skip Eventstreams for now. Please note that, in general, the best method always “depends” on the situation at the client, with respect to team growth, experience, and future strategy and vision. Feel free to reach out if you have questions or want to dive deeper into any of the approaches!

Notebooks

 


The first method for ingesting data into Microsoft Fabric is the notebook, which supports custom code using PySpark or T-SQL.

Personally, I love working with PySpark notebooks: they give me full control over how I ingest and transform data, with the flexibility to structure it exactly as I envision. Leveraging Spark’s parallel processing capabilities, I can efficiently handle multiple datasets, whether from APIs or other sources, all within the notebook environment.
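
As an illustration, a minimal version of that pattern in a PySpark notebook could look like the sketch below; the file paths and table names are placeholders, not from an actual project, and spark is the session that Fabric notebooks provide out of the box:

  from pyspark.sql import functions as F

  # Hypothetical datasets landed in the Lakehouse Files area
  sources = {
      "customers": "Files/landing/customers.parquet",
      "orders": "Files/landing/orders.parquet",
  }

  for table_name, path in sources.items():
      df = spark.read.parquet(path)                              # read the raw dataset
      df = df.withColumn("ingested_at", F.current_timestamp())   # stamp the load moment
      # Persist as a Delta table in the Lakehouse, e.g. a bronze layer table
      df.write.mode("overwrite").format("delta").saveAsTable(f"bronze_{table_name}")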

Why it works for me

As an engineer, this approach gives me the freedom to do things my way. I can embed data quality checks and validations directly into the workflow before storing anything, ensuring reliability from the start.
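
For example, a couple of simple checks before a write could look like this; the dataset, key column, and table name are again just assumptions for illustration:

  from pyspark.sql import functions as F

  # Hypothetical example: validate an orders dataset before persisting it
  df = spark.read.parquet("Files/landing/orders.parquet")

  row_count = df.count()
  null_keys = df.filter(F.col("order_id").isNull()).count()

  if row_count == 0:
      raise ValueError("No rows received from the source; aborting the load.")
  if null_keys > 0:
      raise ValueError(f"{null_keys} rows have a NULL order_id; quality check failed.")

  df.write.mode("append").format("delta").saveAsTable("bronze_orders")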

Yes, it does require coding skills, but tools like Copilot and ChatGPT drastically reduce development time; 80% of the solution is often ready in minutes. It’s that final 20%, the fine-tuning and client-specific adjustments, that demands the most effort. And let’s be honest: generated code almost never works perfectly on the first try.

When to use?

For me, notebooks are the go-to tool for (experienced) data engineers, especially when dealing with complex or unconventional data sources. They shine when your platform needs custom logic or manual tweaks before storing the data.

However, if you're working within a medallion architecture, it's generally best practice to land raw data in the bronze layer without transformations. In that case, a Copy Data activity might be a better first step: simple, efficient, and purpose-built for raw ingestion. So let’s explore that one next.

Copy Data (Pipelines)

 


The Copy Data activity in Microsoft Fabric is a straightforward yet powerful way to move data from source to destination within a pipeline. With support for 50+ connectors and various formats, it enables fast, scalable, and automated ingestion.

I find it especially useful when the source system is accessible via one of these connectors and you want to load the data 1:1 into Fabric. No transformations, just raw data in your bronze layer as designed.

Why it works for me

Compared to notebooks, Copy Data is much more plug-and-play. It’s easy to configure, schedule, and monitor, making it ideal not only for engineers but also for skilled business analysts.

Personally, I always use a metadata table to scope the objects I want to load. This gives me control over which datasets run in parallel and which in series, optimizing performance and resource usage. It also gives me one central place that contains all the information for the data platform.
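
To give an idea, a very small version of such a metadata table could be set up from a notebook as sketched below; the column names and values are my own assumptions, not a fixed standard:

  # Hypothetical control table that a pipeline can read to drive its Copy Data iterations
  metadata = [
      # source_schema, source_table, target_table,      load_group, is_active
      ("sales",        "customers",  "bronze_customers", 1,          True),
      ("sales",        "orders",     "bronze_orders",    1,          True),
      ("finance",      "invoices",   "bronze_invoices",  2,          True),
  ]
  columns = ["source_schema", "source_table", "target_table", "load_group", "is_active"]

  spark.createDataFrame(metadata, columns) \
      .write.mode("overwrite").format("delta").saveAsTable("meta_ingestion_objects")

A Lookup activity in the pipeline can then read this table and feed a ForEach loop that runs one Copy Data activity per row, using the load_group column to decide what runs in parallel and what in series.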

One important note: if you're working with on-premises sources, you'll need to set up a data gateway to connect through the Copy Data activity. But once that’s in place, it’s a smooth and efficient way to bring data into Fabric.

When to use?

The Copy Data activity is the ideal solution for 1:1 data ingestion, especially when working in a small data team. It’s low-code, easy to manage, and perfect for handling straightforward tasks without needing deep engineering skills. When pulling data from various database types, this is my preferred approach.

However, when it comes to retrieving data from APIs, I would generally go for a notebook. Notebooks offer the flexibility and control needed to handle more complex logic and dynamic data structures, which makes them better suited for API integrations. Moreover, with notebooks I can (see the sketch after this list):

  • Import additional Python packages
  • Retrieve API keys securely from Key Vault
  • Build custom logic tailored to the API’s structure
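
To make that concrete, here is a rough sketch of the pattern; the Key Vault URL, secret name, and API endpoint are placeholders, and I’m assuming the built-in notebookutils credentials helper that Fabric notebooks expose for the secret lookup:

  import requests

  # notebookutils is available by default in Fabric notebooks
  api_key = notebookutils.credentials.getSecret(
      "https://my-keyvault.vault.azure.net/", "my-api-key"
  )

  # Call the (hypothetical) API with logic tailored to its structure
  response = requests.get(
      "https://api.example.com/v1/orders",
      headers={"Authorization": f"Bearer {api_key}"},
      timeout=30,
  )
  response.raise_for_status()
  rows = response.json()["value"]   # assumes the payload wraps records in a 'value' list

  # Land the result as a Delta table in the Lakehouse bronze layer
  df = spark.createDataFrame(rows)
  df.write.mode("overwrite").format("delta").saveAsTable("bronze_api_orders")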

Dataflow Gen2

 


Dataflow Gen2 in Microsoft Fabric is a powerful, low-code transformation tool built on Power Query, designed to ingest, clean, and reshape data from various sources into Lakehouses or Warehouses.

With 140+ connectors, it offers even broader connectivity than Copy Data, which is impressive. It delivers this functionality in a familiar, Excel-like interface, making it especially accessible for business users and analysts who don’t have deep coding expertise.

Why it works for me

I see real value in building repeatable, business-friendly logic with Dataflow Gen2. It’s intuitive, visual, and great for collaboration across technical and non-technical teams.

But here’s the catch: Dataflow Gen2 consumes 4 to 8 times more Capacity Units (CU) than Notebooks or Copy Data activities. That’s a serious consideration, especially when working with large volumes or frequent executions. Given my technical background and focus on efficiency, I almost always opt for notebooks or the Copy Data activity unless there's a compelling reason to use Dataflow Gen2.

When to use?

For me, Dataflow Gen2 is only relevant when the other two options, Notebooks or the Copy Data activity, aren’t viable. I choose it in explicit, well-justified scenarios, after exploring and ruling out more efficient alternatives.

Even then, the final decision depends heavily on the client’s team setup, both functionally and technically. If the team includes business users who need to build and maintain ingestion logic without writing code, Dataflow Gen2 can be a great fit. But for engineer-led platforms or performance-critical workloads, I recommend sticking with Notebooks or Copy Data.

To summarize the above:

Notebooks:

  • For API integration
  • When you need full control
  • To add validations and transformations
  • When you have a core engineering team for your data platform

Copy Data:

  • When you want a 1:1 copy of source data
  • When the source is not an API
  • When the connector is available in the Copy Data activity

Dataflow Gen2:

  • When the connector is not available in Copy Data
  • When the team lacks engineering skills
  • For a business-friendly development experience

Want help choosing the best ingestion strategy for your specific use case? Connect with me or reach out via Wortell; happy to help!

And if you're joining the European Microsoft Fabric Community Conference, I hope to see you there!

Our author

Benito van Breugel

Benito is an experienced and enthusiastic Data & Analytics Consultant and Team Lead at Wortell, with a strong passion for data-driven working and Microsoft Fabric technology. With his extensive background as a Data Engineer, Architect, and Team Lead, he serves as a trusted analytical bridge between technology and business. Benito actively translates new developments in Data and AI into tangible value for clients. Energy, sharp insights, and impact are always at the core of his approach.