Power BI Dataflows Gen2: Self-Service ETL at Enterprise Scale
Build self-service ETL pipelines with Dataflows Gen2 in Microsoft Fabric. Gen1 vs Gen2 comparison, incremental refresh, and enterprise patterns.
<h2>Dataflows Gen2: The Evolution of Self-Service ETL</h2> <p>Dataflows Gen2 in Microsoft Fabric enable business analysts to build self-service ETL pipelines that output directly to lakehouse Delta tables without writing code. Built on the same Power Query engine that millions of users already know from Excel and Power BI Desktop, Gen2 adds Fabric lakehouse output, enhanced Spark-based compute, automatic OneLake staging, and tighter integration with the broader data platform.</p> <p>In my 25+ years of enterprise consulting, the biggest challenge in data platforms has always been the gap between what IT can deliver and what business teams need. IT builds centralized pipelines that take weeks to modify; business teams need new data integrations in days. Dataflows Gen2 bridges this gap by giving business analysts a governed, scalable, no-code ETL tool that outputs to the same enterprise lakehouse that data engineers use — eliminating the shadow IT spreadsheet pipelines that proliferate when central IT cannot keep pace with demand.</p> <p><a href="/services/data-analytics">Data analytics consulting</a> helps enterprises design dataflow architectures that balance self-service flexibility with governance and performance requirements.</p>
<h2>Gen1 vs Gen2: Key Differences</h2> <p>Critical differences that affect enterprise architecture:</p> <ul> <li><strong>Output destination</strong> — Gen1 writes to CDM folders in Azure Data Lake Storage Gen2; Gen2 writes directly to Fabric lakehouse tables (Delta Lake format)</li> <li><strong>Compute engine</strong> — Gen2 uses Fabric Spark compute for enhanced mashup operations</li> <li><strong>Staging</strong> — Gen2 automatically stages data in OneLake for better performance</li> <li><strong>Integration</strong> — Gen2 is a first-class Fabric item, integrable with Data Pipelines, notebooks, and other Fabric items</li> <li><strong>Refresh</strong> — Gen2 supports orchestration via Data Pipelines with dependencies and error handling</li> <li><strong>Licensing</strong> — Gen1 requires Power BI Premium or PPU; Gen2 requires Fabric capacity</li> </ul>
<p>The single most important difference for enterprise architects is the output destination. Gen1 dataflows write to CDM (Common Data Model) folders in Azure Data Lake Storage — a format that requires additional processing to be useful. Gen2 writes directly to Delta Lake tables in a Fabric lakehouse, making the output immediately queryable by the SQL analytics endpoint, consumable by Direct Lake semantic models, and accessible to Spark notebooks. This eliminates an entire layer of data movement and transformation that Gen1 required.</p>
<h2>Creating Dataflows in Fabric</h2> <p>Navigate to your Fabric workspace, select New > Dataflow Gen2. The Power Query Online editor opens with familiar transformation capabilities: connect to 200+ data sources, apply transforms, and configure output to lakehouse tables. The authoring experience is virtually identical to Power BI Desktop's Power Query editor — any analyst who knows Power Query in Excel or Power BI Desktop can author Gen2 dataflows with zero additional training. This is the key advantage over Spark notebooks, which require Python or Scala programming skills that most business analysts do not have.</p>
<h2>Power Query Online Capabilities</h2> <p>The full Power Query M language is available with additional capabilities in Gen2:</p> <ul> <li><strong>Query folding</strong> — Pushes transformations to source databases for optimal performance. <a href="/blog/query-folding-power-query-troubleshooting-guide-2026">Query folding troubleshooting</a> is critical for dataflow performance.</li> <li><strong>Diagram view</strong> — Visual representation of query dependencies and data lineage</li> <li><strong>AI Insights</strong> — Text analytics, vision, and Azure ML model scoring within Power Query</li> <li><strong>Schema detection</strong> — Automatic schema inference with explicit type mapping</li> </ul>
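<p>To make the folding behavior concrete, here is an illustrative M query (server, database, and column names are placeholders): the filter and column selection fold to the source as a WHERE clause and SELECT list, while the index column does not, so non-foldable steps belong at the end.</p>

```m
// Hypothetical query illustrating step ordering for query folding.
let
    Source = Sql.Database("sales-srv.database.windows.net", "SalesDb"),
    Customers = Source{[Schema = "dbo", Item = "Customers"]}[Data],
    // Folds to the source as WHERE IsActive = 1
    Active = Table.SelectRows(Customers, each [IsActive] = true),
    // Folds to the source as a narrow SELECT list
    Slim = Table.SelectColumns(Active, {"CustomerID", "Name", "Region"}),
    // Does NOT fold — everything after this step runs in the mashup engine
    Indexed = Table.AddIndexColumn(Slim, "RowId", 1, 1)
in
    Indexed
```

<p>In Power Query Online, the step folding indicators in the Applied Steps pane show exactly where folding breaks, which is the first thing to check when a dataflow refresh is slower than expected.</p>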
<h2>Incremental Refresh in Dataflows</h2> <p>Configure incremental refresh to process only new or changed data:</p> <ol> <li>Add RangeStart and RangeEnd parameters (DateTime type)</li> <li>Filter your source query using these parameters</li> <li>Enable incremental refresh in dataflow settings</li> <li>Configure the refresh and archive windows</li> </ol> <p>This is especially important for large datasets where full refresh would exceed timeout or capacity limits. See <a href="/blog/power-bi-incremental-refresh-data-partitioning-guide-2026">incremental refresh patterns</a> for detailed implementation guidance.</p>
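<p>The filter in step 2 looks like the following sketch, where <code>RangeStart</code> and <code>RangeEnd</code> are the required DateTime parameters and the table and column names are illustrative. Using <code>&gt;=</code> on the start and <code>&lt;</code> on the end prevents boundary rows from being loaded into two partitions.</p>

```m
// Hypothetical incremental refresh filter. RangeStart/RangeEnd are the
// DateTime parameters the engine substitutes per refresh window.
let
    Source = Sql.Database("sales-srv.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Folds to a WHERE clause, so only the current window is read
    Filtered = Table.SelectRows(
        Orders,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    Filtered
```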
<h2>Computed and Linked Entities</h2> <p>In Gen1, computed entities reference other entities within the same dataflow (avoiding re-querying the source). Linked entities reference entities from other dataflows. In Gen2, the staging lakehouse achieves similar benefits — all intermediate data is persisted in Delta Lake format, enabling downstream queries to read from the lakehouse rather than re-querying sources. This is a significant architectural improvement because it means every dataflow refresh creates a durable, queryable dataset in OneLake that can be consumed by any Fabric workload, not just the downstream dataflow or semantic model it was originally built for.</p>
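<p>A downstream query can read a previously staged lakehouse table through the Fabric lakehouse connector instead of re-querying the source. This sketch assumes the <code>Lakehouse.Contents</code> connector; the workspace, lakehouse, and table names are placeholders, and the navigation record fields may differ slightly from what the connector generates in your tenant.</p>

```m
// Sketch: consume a staged Delta table from OneLake rather than
// hitting the source system again. All names are illustrative.
let
    Source = Lakehouse.Contents(null),
    Workspace = Source{[workspaceName = "Analytics"]}[Data],
    Staging = Workspace{[lakehouseName = "StagingLakehouse"]}[Data],
    StagedOrders = Staging{[Id = "Orders", ItemKind = "Table"]}[Data]
in
    StagedOrders
```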
<h2>Enterprise Dataflow Patterns</h2> <h3>Staging Pattern</h3> <p>Create a staging dataflow that ingests raw data from sources with minimal transformation — essentially a direct copy from source to lakehouse. Downstream dataflows or notebooks then transform the staged data. This pattern separates ingestion from transformation, provides a reusable raw data layer, and isolates downstream transformations from source system changes. When a source API changes, you only fix the staging dataflow — all downstream logic remains untouched.</p>
<h3>Medallion Architecture</h3> <p>Use dataflows for Bronze (raw ingestion) and Silver (cleansed, conformed) layers. Gold-layer aggregations can be handled by dataflows for simpler transformations or <a href="/blog/microsoft-fabric-notebooks-pyspark-data-engineering-2026">Fabric notebooks</a> for complex logic requiring Python or Spark SQL. The medallion architecture provides progressive data refinement where each layer adds quality, consistency, and business meaning to the raw data. Business analysts can confidently build reports on Silver or Gold layer tables knowing the data has been validated and conformed.</p>
<h3>Shared Certified Dataflows</h3> <p>Publish certified dataflows that multiple teams can reference. This creates a governed self-service data layer where IT manages the core data pipelines and business users consume certified entities in their own dataflows or semantic models. Certification is a signal to consumers that the data has been validated, is refreshed on a reliable schedule, and has a designated owner responsible for quality. Without certification, you end up with 15 different versions of the "customer" table across the organization, each with different definitions, filters, and quality levels.</p>
<h3>When Dataflows Are Not the Right Choice</h3> <p>Dataflows Gen2 are not the right tool for every scenario. Use Fabric notebooks instead when you need complex transformations requiring Python libraries (pandas, scikit-learn), when data volumes exceed 100 million rows per table (notebooks parallelize across Spark executors), when you need to write custom data quality validation logic, or when the transformation requires joining data from both Fabric lakehouses and external REST APIs in a single pipeline. The best enterprise architectures use both tools: dataflows for the 80% of data integrations that are straightforward source-to-lakehouse pipelines, and notebooks for the 20% that require programming flexibility.</p>
<h2>Monitoring and Scheduling</h2> <p>Schedule dataflows individually or orchestrate them via Fabric Data Pipelines for dependency management and conditional execution. Monitoring options include refresh history in the workspace, Fabric Monitoring Hub for capacity-level visibility, and custom monitoring via the <a href="/blog/power-bi-rest-api-automating-enterprise-operations-2026">REST API</a>. Configure alerts for refresh failures using <a href="/blog/microsoft-fabric-data-activator-reflex-alerting-2026">Data Activator</a> or Power Automate — in production environments, every dataflow should have failure notification configured. A silent dataflow failure that goes undetected for days produces stale dashboards that erode user trust in the entire analytics platform.</p> <p>For enterprise deployments with 20+ dataflows, build a monitoring dashboard that tracks refresh duration trends (increasing duration signals growing data volumes or degrading source performance), success/failure rates, and capacity consumption per dataflow. This operational visibility enables proactive optimization before performance issues impact end users.</p>
<h2>Migration from Gen1 to Gen2</h2> <p>Migration involves recreating dataflows in the Gen2 experience. While direct migration tools are limited, the query logic (M code) can be copied. The main effort is redirecting output from CDM folders to Fabric lakehouse tables and updating downstream dependencies. Plan for 1-2 weeks of testing per critical dataflow.</p>
<h2>Error Handling and Resilience Patterns</h2> <p>Production dataflows need robust error handling. Across enterprise deployments, the number one cause of dataflow failures is source system change — a renamed column or an API endpoint updated without notification. Build resilience with schema validation steps, explicit null handling, row count validation against expected ranges, and Data Pipelines orchestration with retry policies and failure notifications. Configure failure alerts through <a href="/blog/power-bi-data-alerts-subscriptions-notification-2026">data alerts</a> or Power Automate so the right people know immediately when a dataflow fails.</p>
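<p>The schema and row count validations can be expressed directly in M so the refresh fails fast with a descriptive error instead of silently loading bad data. A sketch with illustrative table, column names, and thresholds:</p>

```m
// Defensive validation sketch: fail the refresh loudly on schema
// drift or an implausible row count. All names are placeholders.
let
    Source = Sql.Database("sales-srv.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Schema check: error out if expected columns went missing
    Expected = {"OrderID", "OrderDate", "Amount"},
    Missing = List.Difference(Expected, Table.ColumnNames(Orders)),
    SchemaChecked =
        if List.IsEmpty(Missing) then Orders
        else error Error.Record(
            "SchemaDrift",
            "Missing columns: " & Text.Combine(Missing, ", ")
        ),
    // Row count check: guard against empty or runaway loads
    RowCount = Table.RowCount(SchemaChecked),
    Validated =
        if RowCount > 0 and RowCount < 50000000 then SchemaChecked
        else error Error.Record(
            "RowCountOutOfRange",
            "Got " & Text.From(RowCount) & " rows"
        )
in
    Validated
```

<p>A failed refresh with a named error reason is far easier to triage and alert on than a successful refresh that loaded the wrong data.</p>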
<h2>Performance Optimization Strategies</h2> <p>Maximize query folding using <a href="/blog/query-folding-power-query-troubleshooting-guide-2026">diagnostics</a> to verify which steps fold to the source. Apply filters as early as possible — loading 100M rows then filtering to 1M is dramatically slower than filtering at the source. Remove unnecessary columns immediately after the source step. Gen2 automatic staging in OneLake means intermediate results persist, improving reliability and debuggability for complex multi-step transformations. One enterprise client with 45 active dataflows reduced their average refresh failure rate from 12% to under 2% by implementing these optimization patterns alongside proper error handling and monitoring.</p>
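<p>Step ordering is the cheapest of these optimizations. In this sketch (server, table, and column names are illustrative), the filter and column trim sit before the join, so both fold to the source and only the current year's narrow rows cross the wire; the same filter placed after the join would pull the full fact table first.</p>

```m
// Ordering sketch: filter and trim before joining so the heavy
// steps fold to the source. All names are placeholders.
let
    Source = Sql.Database("sales-srv.database.windows.net", "SalesDb"),
    Sales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Both steps fold: WHERE + narrow SELECT at the source
    Recent = Table.SelectRows(Sales, each [SaleDate] >= #datetime(2025, 1, 1, 0, 0, 0)),
    Slim = Table.SelectColumns(Recent, {"SaleID", "SaleDate", "CustomerKey", "Amount"}),
    Customers = Source{[Schema = "dbo", Item = "DimCustomer"]}[Data],
    Joined = Table.NestedJoin(
        Slim, {"CustomerKey"},
        Customers, {"CustomerKey"},
        "Customer", JoinKind.LeftOuter
    )
in
    Joined
```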
<p>Whether you are migrating from Gen1 dataflows, replacing custom Python ETL scripts, or building a self-service data platform from scratch, Dataflows Gen2 provides the right balance of accessibility for business analysts and governance for IT administrators. Ready to modernize your ETL with Dataflows Gen2? <a href="/contact">Contact EPC Group</a> for a free consultation on Fabric data engineering.</p>
Frequently Asked Questions
Should I use Dataflows Gen2 or Fabric notebooks for ETL?
Use Dataflows Gen2 when: the transformations are achievable in Power Query (joins, filters, pivots, type conversions), the users are business analysts familiar with Power Query, and the data volumes are moderate. Use notebooks when: you need complex logic (ML, advanced statistics), data volumes are very large (billions of rows), or you need programming language flexibility (Python, Scala, SQL).
Can I migrate my Gen1 dataflows to Gen2?
There is no automated migration tool. You need to recreate dataflows in the Gen2 experience. The M code can be copied, but output configuration changes from CDM folders to Fabric lakehouse tables. Plan for testing all downstream dependencies (semantic models, reports) after migration.
How does query folding work in Dataflows Gen2?
Query folding in Gen2 works the same as in Power BI Desktop — the Power Query engine translates M operations into native source queries (SQL, OData, etc.) when possible. Gen2 adds automatic staging in OneLake, which means even non-foldable operations benefit from intermediate persistence rather than re-querying the source.
What is the maximum data volume for Dataflows Gen2?
Dataflows Gen2 can handle substantial data volumes, limited by your Fabric capacity size and refresh timeout settings. For very large datasets (hundreds of millions of rows), use incremental refresh to process only changed data. If you hit capacity limits, consider Fabric notebooks with Spark for parallel processing.
How do Dataflows Gen2 affect Fabric capacity consumption?
Dataflows Gen2 consume Fabric Capacity Units (CUs) during refresh operations. The compute cost depends on data volume, transformation complexity, and whether query folding is achieved. Staging operations also consume storage in OneLake. Monitor consumption via Fabric Capacity Metrics to optimize costs.