Dataflows Gen2 in Microsoft Fabric

Upgrade to Dataflows Gen2 in Microsoft Fabric for improved performance, OneLake integration, and simplified data preparation with Power Query Online.

By Administrator

Dataflows Gen2 in Microsoft Fabric fundamentally reimagine how organizations build and maintain data preparation pipelines. While they share the familiar Power Query interface from Gen1 and Excel, Gen2 dataflows run on an entirely different compute engine, output to OneLake in Delta format, and integrate natively with every Fabric workload. For organizations already using Power BI dataflows, Gen2 represents a significant upgrade in performance, flexibility, and governance. For organizations new to Fabric, Gen2 dataflows are the recommended starting point for no-code/low-code ETL.

Gen1 vs Gen2: What Changed

Understanding the architectural differences helps you appreciate why Gen2 is a meaningful upgrade, not just a version number:

| Capability | Dataflows Gen1 | Dataflows Gen2 |
|---|---|---|
| Compute engine | Power Query Online (mashup) | Fabric compute (optimized mashup + Spark) |
| Output destination | Power BI dataset (internal) | Lakehouse, Warehouse, KQL Database, or dataset |
| Storage format | Proprietary CDM format | Delta Parquet in OneLake |
| Query by other tools | Only through Power BI | SQL, Spark, KQL, and Power BI |
| Scheduling | Separate scheduling per dataflow | Orchestrated via Data Factory pipelines |
| Cost model | Dedicated capacity per dataflow refresh | Shared Fabric capacity units |
| Query folding visibility | Limited | Enhanced diagnostics with step-level indicators |
| Staging | Automatic (opaque) | Configurable with explicit staging Lakehouses |

The most significant change is the output destination. Gen1 dataflows load data into an internal storage layer accessible only through Power BI. Gen2 dataflows write Delta tables to OneLake Lakehouses, making the prepared data accessible to Spark notebooks, SQL queries, KQL analytics, and Direct Lake semantic models—all without data duplication.

When to Use Dataflows Gen2

Dataflows Gen2 excel in specific scenarios within the broader Fabric data pipeline landscape:

Ideal Use Cases

  • Data preparation for analysts: Business analysts who know Power Query but not Python/SQL can build reusable data transformations
  • Source connectivity: with 300+ Power Query connectors, dataflows are often the easiest way to pull data from SaaS applications, APIs, and legacy databases
  • Reusable transformations: Multiple Lakehouses or semantic models need the same cleaned and standardized data—build once in a dataflow, output to a shared Lakehouse
  • Incremental refresh patterns: Gen2 supports incremental refresh for large source tables, loading only new or changed rows on each run

When to Use Notebooks Instead

  • Complex transformations: Multi-step joins across many tables, window functions, or custom algorithms are easier in PySpark
  • Large data volumes: Datasets exceeding 10GB per table refresh faster in Spark notebooks with optimized partitioning
  • Version control: Notebook code integrates with Git; Power Query M code in dataflows has limited version control support
  • Testing and validation: Notebooks enable data quality assertions, unit testing, and parameterized execution

When to Use Data Pipelines Instead

  • Orchestration: If you need conditional logic (if dataset A refreshes successfully, then refresh dataset B), use Data Factory pipelines to orchestrate dataflows
  • Copy activity: For simple data movement without transformation (e.g., copy files between storage accounts), Copy Activity is faster than a dataflow

Building a Dataflow Gen2

Step 1: Create and Configure Destination

In your Fabric workspace, select New > Dataflow Gen2. The critical first decision is your output destination—typically a Lakehouse in the same workspace. You can configure different destinations per query within the same dataflow, allowing a single dataflow to populate multiple Lakehouse tables.

Step 2: Connect to Data Sources

Use the Power Query "Get Data" experience to connect to your sources. Common enterprise sources include:

  • Databases: SQL Server, Azure SQL, PostgreSQL, Oracle, MySQL
  • Cloud services: Salesforce, Dynamics 365, SharePoint, Dataverse
  • Files: Excel, CSV, JSON, Parquet from ADLS Gen2, S3, or local gateways
  • APIs: REST and OData endpoints with pagination support
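For REST sources, the connection is a few lines of M. The sketch below assumes a hypothetical endpoint that returns a JSON envelope with a `value` array (the URL, query parameters, and field names are all placeholders):

```m
let
    // Hypothetical endpoint; replace with your API's base URL
    Source = Json.Document(
        Web.Contents(
            "https://api.example.com/v1/orders",
            [Query = [page = "1", pageSize = "500"]]
        )
    ),
    // Assumes the response wraps rows in a "value" list (OData-style)
    Orders = Table.FromRecords(Source[value])
in
    Orders
```

Real APIs with multiple pages typically need a `List.Generate` loop over page numbers or continuation tokens, which Power Query supports but which the visual editor does not generate for you.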

For on-premises sources, configure an on-premises data gateway. The gateway runs as a Windows service in your network and securely tunnels queries to on-premises databases without opening inbound firewall ports.
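Once the gateway is registered, an on-premises database is referenced in M just like a cloud source. A minimal sketch, with placeholder server, database, and table names:

```m
let
    // Server and database names are placeholders; the gateway
    // resolves this connection inside your network
    Source = Sql.Database("sql01.contoso.local", "SalesDB"),
    // Navigate to a specific schema and table
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data]
in
    Orders
```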

Step 3: Transform with Power Query

Apply transformations using the visual Power Query editor:

  • Column operations: Rename, reorder, change type, split, merge, remove
  • Row operations: Filter, sort, remove duplicates, remove errors, keep/remove top/bottom rows
  • Combine operations: Merge (join) queries by key columns, append (union) queries with matching schemas
  • Pivot/Unpivot: Reshape data between wide and narrow formats
  • Custom columns: Add calculated columns using M expressions
  • Group by: Aggregate rows by key columns with SUM, COUNT, AVG, MIN, MAX
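Behind the visual editor, each of these operations becomes one step in an M `let` expression. A typical sequence might look like this sketch (server, table, and column names are illustrative):

```m
let
    Source = Sql.Database("sql01.contoso.local", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Row operation: keep only shipped orders
    Shipped = Table.SelectRows(Orders, each [Status] = "Shipped"),
    // Column operation: enforce types early
    Typed = Table.TransformColumnTypes(
        Shipped, {{"OrderDate", type date}, {"Amount", type number}}),
    // Custom column: derived value via an M expression
    WithNet = Table.AddColumn(
        Typed, "NetAmount", each [Amount] * 0.9, type number),
    // Group by: aggregate to one row per customer
    ByCustomer = Table.Group(
        WithNet, {"CustomerID"},
        {{"TotalNet", each List.Sum([NetAmount]), type number}})
in
    ByCustomer
```

Each step references the previous one by name, which is why reordering steps in the editor can change both results and folding behavior.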

Step 4: Optimize with Query Folding

Query folding is the process where Power Query translates your transformation steps into native source queries (SQL for databases, API parameters for services). When folding occurs, the source database performs the work—dramatically reducing data transfer and processing time.

Monitor folding status using the step-level indicators in Gen2:

| Indicator | Meaning | Action |
|---|---|---|
| Green checkmark | Step is folded to source | Optimal; no action needed |
| Yellow warning | Step partially folds | Review; consider reordering steps |
| Red X | Step breaks folding | All subsequent steps run in-memory; move this step later if possible |

Common folding breakers: custom M functions, certain text transformations, pivot/unpivot on some sources, and merge with non-database queries. When folding breaks, all data from that point forward is processed in the Fabric compute engine rather than at the source.
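The practical consequence is step ordering: place foldable steps before folding breakers. In this sketch (illustrative names), the date filter folds into a SQL `WHERE` clause at the source, so the custom text transformation that follows, which likely breaks folding, only runs over the already-filtered rows:

```m
let
    Source = Sql.Database("sql01.contoso.local", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Folds: becomes a WHERE clause executed by the source database
    Recent = Table.SelectRows(
        Orders, each [OrderDate] >= #date(2024, 1, 1)),
    // Likely breaks folding: text cleanup runs in the Fabric engine,
    // but only over the filtered rows
    Cleaned = Table.TransformColumns(
        Recent, {{"CustomerName", each Text.Proper(Text.Trim(_))}})
in
    Cleaned
```

Reversing these two steps would force the source to ship every row to the Fabric engine before the filter is applied.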

Step 5: Configure Load Settings

For each query, configure the destination table settings:

  • Update method: Replace (truncate and reload) or Append (add new rows to existing data)
  • Table name: Maps to the Delta table name in your Lakehouse
  • Column mapping: Verify source columns map to correct destination columns and types

For incremental patterns, configure the incremental refresh settings specifying the date column, refresh period, and detection method.
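Under the hood, incremental refresh in Power Query conventionally relies on the reserved `RangeStart` and `RangeEnd` datetime parameters, which the service substitutes per refresh window. A filter step like the following sketch (the `ModifiedDate` column name is illustrative) should fold to the source so only the current window is transferred:

```m
let
    Source = Sql.Database("sql01.contoso.local", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // RangeStart/RangeEnd are reserved parameters substituted by
    // the service for each refresh window; keep this filter foldable
    Incremental = Table.SelectRows(
        Orders,
        each [ModifiedDate] >= RangeStart and [ModifiedDate] < RangeEnd)
in
    Incremental
```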

Migration from Gen1 to Gen2

Migration Planning

  1. Inventory Gen1 dataflows: List all dataflows, their refresh schedules, and downstream dependencies
  2. Identify destination changes: Gen1 outputs go to internal storage; Gen2 outputs go to Lakehouses. Downstream datasets and reports must be reconfigured.
  3. Test query folding: Gen2 has a different folding engine. Some queries that folded in Gen1 may not fold in Gen2 (and vice versa). Test each query.
  4. Plan gateway changes: Gen2 uses Fabric gateways (VNet data gateways) in addition to on-premises gateways. Evaluate which gateway type suits each source.

Migration Execution

  • Recreate dataflows in Gen2 by copying M queries from Gen1
  • Configure Lakehouse destinations for each query
  • Update downstream semantic models to point to Lakehouse tables instead of Gen1 dataflow entities
  • Run parallel refreshes (Gen1 and Gen2) for a validation period
  • Decommission Gen1 dataflows after confirming data accuracy

Performance Best Practices

  • Maximize query folding: Place foldable steps (filters, column selection, type changes) before non-foldable steps
  • Use staging Lakehouses: Configure an explicit staging Lakehouse for intermediate results to avoid memory pressure
  • Limit query count per dataflow: Keep to 10-15 queries per dataflow; split large dataflows into multiple smaller ones for parallelism
  • Schedule during off-peak hours: Gen2 dataflows consume Fabric capacity; schedule large refreshes when other workloads are quiet
  • Monitor refresh duration: Use the Fabric Monitoring Hub to track refresh times and identify queries that degrade over time

Frequently Asked Questions

Should I migrate from Dataflows Gen1 to Gen2?

Yes, Gen2 offers significant performance improvements, OneLake integration, and better Fabric compatibility. Plan your migration to take advantage of these benefits.

Can Dataflows Gen2 replace traditional ETL tools?

For many scenarios, yes. Gen2 dataflows handle common ETL patterns well. For complex enterprise requirements, you might combine with Data Factory pipelines for orchestration.

Tags: Microsoft Fabric, Dataflows, ETL, Data Preparation
