Dataflows Gen2 in Microsoft Fabric

Upgrade to Dataflows Gen2 in Microsoft Fabric for improved performance, OneLake integration, and simplified data preparation with Power Query Online.

By Administrator

Dataflows Gen2 in Microsoft Fabric fundamentally reimagine how organizations build and maintain data preparation pipelines. While they share the familiar Power Query interface from Gen1 and Excel, Gen2 dataflows run on an entirely different compute engine, output to OneLake in Delta format, and integrate natively with every Fabric workload. For organizations already using Power BI dataflows, Gen2 represents a significant upgrade in performance, flexibility, and governance. For organizations new to Fabric, Gen2 dataflows are the recommended starting point for no-code/low-code ETL.

Gen1 vs Gen2: What Changed

Understanding the architectural differences helps you appreciate why Gen2 is a meaningful upgrade, not just a version number:

| Capability | Dataflows Gen1 | Dataflows Gen2 |
|---|---|---|
| Compute engine | Power Query Online (mashup) | Fabric compute (optimized mashup + Spark) |
| Output destination | Power BI dataset (internal) | Lakehouse, Warehouse, KQL Database, or dataset |
| Storage format | Proprietary CDM format | Delta Parquet in OneLake |
| Query by other tools | Only through Power BI | SQL, Spark, KQL, and Power BI |
| Scheduling | Separate scheduling per dataflow | Orchestrated via Data Factory pipelines |
| Cost model | Dedicated capacity per dataflow refresh | Shared Fabric capacity units |
| Query folding visibility | Limited | Enhanced diagnostics with step-level indicators |
| Staging | Automatic (opaque) | Configurable with explicit staging Lakehouses |

The most significant change is the output destination. Gen1 dataflows load data into an internal storage layer accessible only through Power BI. Gen2 dataflows write Delta tables to OneLake Lakehouses, making the prepared data accessible to Spark notebooks, SQL queries, KQL analytics, and Direct Lake semantic models—all without data duplication.

When to Use Dataflows Gen2

Dataflows Gen2 excel in specific scenarios within the broader Fabric data pipeline landscape:

Ideal Use Cases

  • Data preparation for analysts: Business analysts who know Power Query but not Python/SQL can build reusable data transformations
  • Source connectivity: with 300+ Power Query connectors, dataflows are often the easiest way to pull data from SaaS applications, APIs, and legacy databases
  • Reusable transformations: Multiple Lakehouses or semantic models need the same cleaned and standardized data—build once in a dataflow, output to a shared Lakehouse
  • Incremental refresh patterns: Gen2 supports incremental refresh for large source tables, loading only new or changed rows on each run

When to Use Notebooks Instead

  • Complex transformations: Multi-step joins across many tables, window functions, or custom algorithms are easier in PySpark
  • Large data volumes: Datasets exceeding 10GB per table refresh faster in Spark notebooks with optimized partitioning
  • Version control: Notebook code integrates with Git; Power Query M code in dataflows has limited version control support
  • Testing and validation: Notebooks enable data quality assertions, unit testing, and parameterized execution

When to Use Data Pipelines Instead

  • Orchestration: If you need conditional logic (if dataset A refreshes successfully, then refresh dataset B), use Data Factory pipelines to orchestrate dataflows
  • Copy activity: For simple data movement without transformation (e.g., copy files between storage accounts), Copy Activity is faster than a dataflow

Building a Dataflow Gen2

Step 1: Create and Configure Destination

In your Fabric workspace, select New > Dataflow Gen2. The critical first decision is your output destination—typically a Lakehouse in the same workspace. You can configure different destinations per query within the same dataflow, allowing a single dataflow to populate multiple Lakehouse tables.

Step 2: Connect to Data Sources

Use the Power Query "Get Data" experience to connect to your sources. Common enterprise sources include:

  • Databases: SQL Server, Azure SQL, PostgreSQL, Oracle, MySQL
  • Cloud services: Salesforce, Dynamics 365, SharePoint, Dataverse
  • Files: Excel, CSV, JSON, Parquet from ADLS Gen2, S3, or local gateways
  • APIs: REST and OData endpoints with pagination support
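For REST sources, the connection is a few lines of M. The sketch below assumes a hypothetical endpoint that returns a JSON envelope with a `value` array (the URL, query parameters, and field names are all placeholders):

```m
let
    // Hypothetical endpoint; replace with your API's base URL
    Source = Json.Document(
        Web.Contents(
            "https://api.example.com/v1/orders",
            [Query = [page = "1", pageSize = "500"]]
        )
    ),
    // Assumes the response wraps rows in a "value" list (OData-style)
    Orders = Table.FromRecords(Source[value])
in
    Orders
```

Real APIs with multiple pages typically need a `List.Generate` loop over page numbers or continuation tokens, which Power Query supports but which the visual editor does not generate for you.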

For on-premises sources, configure an on-premises data gateway. The gateway runs as a Windows service in your network and securely tunnels queries to on-premises databases without opening inbound firewall ports.
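Once the gateway is registered, an on-premises database is referenced in M just like a cloud source. A minimal sketch, with placeholder server, database, and table names:

```m
let
    // Server and database names are placeholders; the gateway
    // resolves this connection inside your network
    Source = Sql.Database("sql01.contoso.local", "SalesDB"),
    // Navigate to a specific schema and table
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data]
in
    Orders
```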

Step 3: Transform with Power Query

Apply transformations using the visual Power Query editor:

  • Column operations: Rename, reorder, change type, split, merge, remove
  • Row operations: Filter, sort, remove duplicates, remove errors, keep/remove top/bottom rows
  • Combine operations: Merge (join) queries by key columns, append (union) queries with matching schemas
  • Pivot/Unpivot: Reshape data between wide and narrow formats
  • Custom columns: Add calculated columns using M expressions
  • Group by: Aggregate rows by key columns with SUM, COUNT, AVG, MIN, MAX
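Behind the visual editor, each of these operations becomes one step in an M `let` expression. A typical sequence might look like this sketch (server, table, and column names are illustrative):

```m
let
    Source = Sql.Database("sql01.contoso.local", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Row operation: keep only shipped orders
    Shipped = Table.SelectRows(Orders, each [Status] = "Shipped"),
    // Column operation: enforce types early
    Typed = Table.TransformColumnTypes(
        Shipped, {{"OrderDate", type date}, {"Amount", type number}}),
    // Custom column: derived value via an M expression
    WithNet = Table.AddColumn(
        Typed, "NetAmount", each [Amount] * 0.9, type number),
    // Group by: aggregate to one row per customer
    ByCustomer = Table.Group(
        WithNet, {"CustomerID"},
        {{"TotalNet", each List.Sum([NetAmount]), type number}})
in
    ByCustomer
```

Each step references the previous one by name, which is why reordering steps in the editor can change both results and folding behavior.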

Step 4: Optimize with Query Folding

Query folding is the process where Power Query translates your transformation steps into native source queries (SQL for databases, API parameters for services). When folding occurs, the source database performs the work—dramatically reducing data transfer and processing time.

Monitor folding status using the step-level indicators in Gen2:

| Indicator | Meaning | Action |
|---|---|---|
| Green checkmark | Step is folded to source | Optimal; no action needed |
| Yellow warning | Step partially folds | Review; consider reordering steps |
| Red X | Step breaks folding | All subsequent steps run in-memory; move this step later if possible |

Common folding breakers: custom M functions, certain text transformations, pivot/unpivot on some sources, and merge with non-database queries. When folding breaks, all data from that point forward is processed in the Fabric compute engine rather than at the source.
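The practical consequence is step ordering: place foldable steps before folding breakers. In this sketch (illustrative names), the date filter folds into a SQL `WHERE` clause at the source, so the custom text transformation that follows, which likely breaks folding, only runs over the already-filtered rows:

```m
let
    Source = Sql.Database("sql01.contoso.local", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Folds: becomes a WHERE clause executed by the source database
    Recent = Table.SelectRows(
        Orders, each [OrderDate] >= #date(2024, 1, 1)),
    // Likely breaks folding: text cleanup runs in the Fabric engine,
    // but only over the filtered rows
    Cleaned = Table.TransformColumns(
        Recent, {{"CustomerName", each Text.Proper(Text.Trim(_))}})
in
    Cleaned
```

Reversing these two steps would force the source to ship every row to the Fabric engine before the filter is applied.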

Step 5: Configure Load Settings

For each query, configure the destination table settings:

  • Update method: Replace (truncate and reload) or Append (add new rows to existing data)
  • Table name: Maps to the Delta table name in your Lakehouse
  • Column mapping: Verify source columns map to correct destination columns and types

For incremental patterns, configure the incremental refresh settings specifying the date column, refresh period, and detection method.
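Under the hood, incremental refresh in Power Query conventionally relies on the reserved `RangeStart` and `RangeEnd` datetime parameters, which the service substitutes per refresh window. A filter step like the following sketch (the `ModifiedDate` column name is illustrative) should fold to the source so only the current window is transferred:

```m
let
    Source = Sql.Database("sql01.contoso.local", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // RangeStart/RangeEnd are reserved parameters substituted by
    // the service for each refresh window; keep this filter foldable
    Incremental = Table.SelectRows(
        Orders,
        each [ModifiedDate] >= RangeStart and [ModifiedDate] < RangeEnd)
in
    Incremental
```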

Migration from Gen1 to Gen2

Migration Planning

  1. Inventory Gen1 dataflows: List all dataflows, their refresh schedules, and downstream dependencies
  2. Identify destination changes: Gen1 outputs go to internal storage; Gen2 outputs go to Lakehouses. Downstream datasets and reports must be reconfigured.
  3. Test query folding: Gen2 has a different folding engine. Some queries that folded in Gen1 may not fold in Gen2 (and vice versa). Test each query.
  4. Plan gateway changes: Gen2 uses Fabric gateways (VNet data gateways) in addition to on-premises gateways. Evaluate which gateway type suits each source.

Migration Execution

  • Recreate dataflows in Gen2 by copying M queries from Gen1
  • Configure Lakehouse destinations for each query
  • Update downstream semantic models to point to Lakehouse tables instead of Gen1 dataflow entities
  • Run parallel refreshes (Gen1 and Gen2) for a validation period
  • Decommission Gen1 dataflows after confirming data accuracy

Performance Best Practices

  • Maximize query folding: Place foldable steps (filters, column selection, type changes) before non-foldable steps
  • Use staging Lakehouses: Configure an explicit staging Lakehouse for intermediate results to avoid memory pressure
  • Limit query count per dataflow: Keep to 10-15 queries per dataflow; split large dataflows into multiple smaller ones for parallelism
  • Schedule during off-peak hours: Gen2 dataflows consume Fabric capacity; schedule large refreshes when other workloads are quiet
  • Monitor refresh duration: Use the Fabric Monitoring Hub to track refresh times and identify queries that degrade over time

Frequently Asked Questions

Should I migrate from Dataflows Gen1 to Gen2?

Yes, Gen2 offers significant performance improvements, OneLake integration, and better Fabric compatibility. Plan your migration to take advantage of these benefits.

Can Dataflows Gen2 replace traditional ETL tools?

For many scenarios, yes. Gen2 dataflows handle common ETL patterns well. For complex enterprise requirements, you might combine with Data Factory pipelines for orchestration.

Tags: Microsoft Fabric, Dataflows, ETL, Data Preparation
