Dataflows Gen2 Migration Guide: Upgrading Power BI Dataflows to Microsoft Fabric

Migrate from Power BI Dataflows Gen1 to Dataflows Gen2 in Fabric for enhanced performance, Spark transformations, and OneLake integration.

By Errin O'Connor, Chief AI Architect

Dataflows Gen2 in Microsoft Fabric is the next-generation cloud-based data transformation engine that replaces Power BI Dataflows Gen1 with Apache Spark processing, native OneLake storage, and full Fabric ecosystem integration. If you are planning a Dataflows Gen2 migration, the key steps are: inventory your Gen1 dataflows, assess compatibility, run side-by-side validation, and switch consumers once outputs match. I have led over 40 Dataflows Gen2 migrations across enterprise clients in healthcare, finance, and manufacturing, and this guide covers every lesson learned along the way.

The migration from Gen1 to Gen2 is not optional for organizations serious about Microsoft Fabric. Gen1 dataflows continue to work, but they receive no new features, cannot output to OneLake, and lack the Spark-based processing that makes Gen2 transformations 5-10x faster on large datasets. In my experience, organizations that delay migration accumulate technical debt that makes the eventual transition harder. One financial services client waited 18 months and ended up with 87 Gen1 dataflows so tightly coupled that migration took 3x longer than it would have at the 30-dataflow mark.

Gen1 vs Gen2: Key Differences

Understanding what changed helps plan an effective migration. Here is a comprehensive comparison:

| Feature | Gen1 (Power BI) | Gen2 (Fabric) |
| --- | --- | --- |
| Storage | Internal Azure Data Lake (Power BI only) | OneLake Delta tables (accessible everywhere) |
| Processing | Power Query mashup engine only | Mashup engine + Apache Spark |
| Scheduling | Power BI built-in refresh | Fabric Data Pipelines orchestration |
| Monitoring | Basic success/failure/duration | Spark logs, step-level timing, Monitoring Hub |
| Output | Power BI internal storage only | Lakehouse, Warehouse, KQL databases |
| Incremental | Basic incremental refresh | Spark-based flexible incremental processing |
| Compute | Power BI capacity | Fabric capacity (CU-based) |

Storage is the most impactful difference. Gen1 stores transformed data in an internal Azure Data Lake that is only accessible through Power BI. Gen2 stores data as Delta tables in OneLake, making it accessible from Lakehouses, Warehouses, Notebooks, and any tool that can read Delta format. This single change opens up your transformed data to the entire Fabric ecosystem.

Processing Engine: Gen1 uses the Power Query mashup engine exclusively. Gen2 uses the mashup engine for most transformations but can also leverage Apache Spark for heavy processing. I have measured this hybrid approach delivering 8x performance improvement on a 200M-row healthcare claims dataset that took 4 hours in Gen1 and 28 minutes in Gen2.

Scheduling: Gen1 uses Power BI's built-in refresh scheduling. Gen2 integrates with Fabric Data Pipelines, enabling complex orchestration - trigger dataflow after upstream pipeline completes, chain multiple dataflows with conditional logic, and implement retry policies. For one manufacturing client, we replaced 12 separate Gen1 schedules with a single pipeline that orchestrates all dataflows with dependency awareness and automatic retry on transient failures.
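The retry-on-transient-failure behavior described above can be sketched in a few lines. This is an illustrative Python sketch only: in Fabric you configure retry count and interval declaratively on the pipeline activity rather than writing code, and `flaky_refresh` is a simulated stand-in for a dataflow refresh call.

```python
import time

def run_with_retry(step, max_attempts=3, base_delay=1.0):
    """Retry a refresh step with exponential backoff on transient failures.

    Illustrative sketch of a pipeline retry policy; Fabric pipelines expose
    this as activity settings, not hand-written code.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except RuntimeError:  # stand-in for a transient refresh error
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Simulate a refresh that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky_refresh():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "succeeded"

result = run_with_retry(flaky_refresh, max_attempts=3, base_delay=0)
print(result)
```

The exponential backoff (delay doubling per attempt) matters for transient source-system issues: immediate retries tend to hit the same outage.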

Monitoring: Gen1 provides basic refresh history (success/failure/duration). Gen2 provides detailed Spark execution logs, step-level timing, and integration with Fabric Monitoring Hub for comprehensive observability. You can now see exactly which transformation step is slow, which is a game-changer for debugging.

Migration Assessment

Before migrating, inventory your Gen1 dataflows thoroughly. I recommend creating a spreadsheet with every dataflow and scoring each on four dimensions.

Complexity Audit: Catalog each dataflow by number of entities (tables), transformation complexity, data volume, refresh frequency, and downstream dependencies (which datasets connect to this dataflow). In my experience, organizations typically have 3-5 "mission critical" dataflows that feed executive dashboards and 20-50 departmental dataflows with fewer dependencies. Prioritize accordingly.

Compatibility Check: Most Power Query M transformations work identically in Gen2. However, some features have differences:

  • Custom connectors may need updating for Fabric compatibility
  • On-premises gateway connections work differently in Fabric
  • Enhanced compute engine features from Gen1 are replaced by Spark in Gen2
  • Some M functions that relied on Gen1-specific behavior may need adjustment

I have found that 85-90% of Gen1 M code migrates without changes. The remaining 10-15% typically involves gateway configurations and custom connector updates.

Dependency Mapping: Document which Power BI datasets, reports, and other dataflows depend on each Gen1 dataflow. Migration must maintain these connections or establish new ones. Draw a dependency graph - even a simple one in Visio or draw.io saves hours of debugging later.
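Once the dependency graph is documented, it also tells you the order to migrate in: upstream dataflows before their consumers. A minimal sketch, assuming you capture the graph as a dict mapping each dataflow to the dataflows it reads from (the names here are hypothetical examples, not from any real inventory):

```python
from collections import defaultdict, deque

def migration_order(dependencies):
    """Topologically sort dataflows so upstream dataflows migrate first.

    dependencies: dict mapping dataflow name -> list of dataflows it reads
    from. Raises if the graph contains a cycle, which itself is a finding
    worth fixing before migration.
    """
    indegree = defaultdict(int)
    dependents = defaultdict(list)
    nodes = set(dependencies)
    for flow, upstreams in dependencies.items():
        nodes.update(upstreams)
        for up in upstreams:
            dependents[up].append(flow)
            indegree[flow] += 1
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for d in dependents[node]:
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    if len(order) != len(nodes):
        raise ValueError("Cycle detected in dataflow dependencies")
    return order

# Hypothetical inventory: sales_mart reads two upstream dataflows.
deps = {
    "sales_mart": ["raw_sales", "dim_customers"],
    "dim_customers": ["raw_crm"],
}
print(migration_order(deps))
```

Even for a graph you drew by hand in Visio or draw.io, a quick script like this catches accidental cycles that a diagram can hide.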

Capacity Assessment: Gen2 dataflows run on Fabric capacity (CU-based). Estimate your required capacity by benchmarking Gen1 refresh durations and data volumes. A rough rule: F64 capacity handles 20-30 concurrent Gen2 dataflow refreshes for mid-sized datasets (1-50M rows each).
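That rough rule can be turned into a back-of-envelope sizing helper. Everything numeric here is an assumption restated from the rule above, not an official Microsoft sizing figure: 25 concurrent mid-sized refreshes per F64 is an assumed midpoint of the 20-30 range, and the helper simply rounds up to the next F-SKU tier.

```python
# F-SKU capacity-unit sizes relevant at this scale (F64 and up).
SKUS = [64, 128, 256, 512, 1024, 2048]

def estimate_fabric_sku(concurrent_refreshes, refreshes_per_f64=25):
    """Back-of-envelope Fabric SKU estimate from the rough rule above.

    refreshes_per_f64=25 is an assumed midpoint of the 20-30 range for
    mid-sized datasets (1-50M rows); benchmark your own Gen1 refresh
    durations before committing to a capacity.
    """
    needed_cu = 64 * concurrent_refreshes / refreshes_per_f64
    for cu in SKUS:
        if cu >= needed_cu:
            return f"F{cu}"
    return f"F{SKUS[-1]}"

print(estimate_fabric_sku(18))   # fits on a single F64
print(estimate_fabric_sku(40))   # needs the next tier up
```

Treat the output as a starting point for a proof-of-concept capacity, then validate with real refresh telemetry from the Monitoring Hub.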

Migration Strategy

Approach 1: Side-by-Side Migration (Recommended)

Create Gen2 versions alongside Gen1, validate outputs match, then switch consumers:

  1. Create a new Gen2 dataflow in a Fabric workspace
  2. Copy the Power Query M code from Gen1 entities to Gen2 entities
  3. Configure Gen2 output destination (Lakehouse table recommended)
  4. Run both Gen1 and Gen2 on the same schedule for 1-2 weeks
  5. Compare outputs to verify data matches (row counts, checksums, spot-check values)
  6. Redirect consuming datasets to the Gen2 output
  7. Decommission Gen1 after confirmed stability (keep disabled for 30 days as rollback)

This is the safest approach but requires temporary double processing. I use this for every production dataflow - the extra cost of 2 weeks of parallel processing is negligible compared to the risk of a bad migration breaking executive dashboards.
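The output comparison in step 5 can be partly automated. A minimal sketch, assuming you can extract both the Gen1 and Gen2 outputs as rows (the tuples below are illustrative, not a real dataflow output): it combines an exact row count with an order-independent checksum, so the two engines can return rows in different order and still compare equal.

```python
import hashlib

def table_fingerprint(rows):
    """Row count plus an order-independent XOR of per-row hashes.

    Lets Gen1 and Gen2 extracts compare equal even when the engines return
    rows in a different order. Caveat: XOR cancels out pairs of identical
    rows, so keep the exact row count alongside it and still spot-check
    values manually, as step 5 recommends.
    """
    digest = 0
    for row in rows:
        row_bytes = "|".join(repr(v) for v in row).encode("utf-8")
        digest ^= int.from_bytes(hashlib.sha256(row_bytes).digest()[:8], "big")
    return len(rows), digest

gen1_rows = [(1, "Alice", 120.5), (2, "Bob", 99.0)]
gen2_rows = [(2, "Bob", 99.0), (1, "Alice", 120.5)]  # same data, new order
print(table_fingerprint(gen1_rows) == table_fingerprint(gen2_rows))
```

Run this after every parallel refresh during the 1-2 week validation window and alert on any mismatch, rather than comparing once at the end.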

Approach 2: In-Place Upgrade

For simple dataflows with straightforward transformations and low business impact:

  1. Note the Gen1 dataflow configuration (M code, schedule, connections)
  2. Delete the Gen1 dataflow
  3. Create a Gen2 dataflow with the same logic
  4. Reconfigure downstream datasets to connect to the new output location
  5. Test and validate within 24 hours

Faster but riskier - no parallel validation period. I only recommend this for dev/test dataflows or dataflows that feed non-critical reports.

Approach 3: Phased Migration (Enterprise Scale)

For organizations with dozens or hundreds of dataflows, this is the only practical approach:

  • Phase 1 (Weeks 1-2): Migrate standalone dataflows with no dependencies (typically 30-40% of total)
  • Phase 2 (Weeks 3-4): Migrate dataflows that consume Phase 1 outputs
  • Phase 3 (Weeks 5-8): Migrate complex dataflows with custom connectors or gateway dependencies
  • Phase 4 (Weeks 9-10): Decommission Gen1 infrastructure and clean up

For a recent financial services client with 87 Gen1 dataflows, we completed the phased migration in 10 weeks with zero production incidents.
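Phase assignment follows mechanically from the dependency graph: a dataflow's phase is one more than the deepest phase among its upstreams, so standalone dataflows land in Phase 1 and consumers follow. A sketch with hypothetical dataflow names:

```python
def migration_phase(flow, upstreams, memo=None):
    """Phase = 1 + deepest phase among a dataflow's upstream dataflows.

    upstreams: dict mapping dataflow -> list of dataflows it reads from.
    Dataflows absent from the dict are treated as standalone (Phase 1).
    """
    if memo is None:
        memo = {}
    if flow not in memo:
        memo[flow] = 1 + max(
            (migration_phase(u, upstreams, memo) for u in upstreams.get(flow, [])),
            default=0,
        )
    return memo[flow]

# Hypothetical inventory; names are illustrative only.
upstreams = {
    "raw_sales": [],
    "dim_customers": ["raw_crm"],
    "sales_mart": ["raw_sales", "dim_customers"],
}
phases = {f: migration_phase(f, upstreams) for f in upstreams}
print(phases)
```

In practice you would then move dataflows with custom connectors or gateway dependencies into the later Phase 3 bucket regardless of their computed depth, since those carry the most migration risk.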

Common Migration Challenges

Gateway Changes: Gen1 dataflows using on-premises data gateways may need gateway reconfiguration for Fabric. Verify gateway compatibility with Fabric before migrating. I have seen organizations skip this step and lose 2 weeks troubleshooting gateway connectivity issues mid-migration.

Output Format Changes: Gen1 outputs are consumed differently than Gen2 Lakehouse tables. Datasets connecting to Gen1 dataflows need reconfiguration to read from Lakehouse tables via Direct Lake or Import mode. This is the most time-consuming step for datasets with complex relationships.

Scheduling Differences: Gen1 refresh schedules do not migrate automatically. Recreate schedules in Fabric or configure pipeline-based triggering. Document all schedules before starting migration.

Custom Functions: Gen1 dataflows with shared Power Query functions must recreate those functions in the Gen2 environment. If you have a library of reusable M functions, migrate those first as a foundation.

Incremental Refresh: Gen1 incremental refresh policies need translation to Gen2 equivalents. Gen2 offers more flexible incremental processing through Spark, which often means you can improve the incremental strategy during migration rather than simply replicating it.

Data Type Mismatches: Delta table types are stricter than Gen1 internal storage. I have encountered issues where Gen1 silently handled mixed types in a column (some rows text, some numeric) that Gen2 rejects. Audit data types before migration.
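A pre-migration type audit can be automated against a sample extract. A minimal sketch, assuming you can pull a representative sample of rows from each Gen1 entity (the sample data below is illustrative): it flags any column whose non-null values mix Python types, which is exactly the pattern a Delta table schema will reject.

```python
def mixed_type_columns(rows, columns):
    """Flag columns whose non-null values span more than one type.

    This is the mixed-type pattern Gen1 storage tolerated silently but a
    strict Delta table schema rejects. Nulls are ignored; they are valid
    in any Delta column type.
    """
    seen = {c: set() for c in columns}
    for row in rows:
        for col, value in zip(columns, row):
            if value is not None:
                seen[col].add(type(value).__name__)
    return {c: sorted(t) for c, t in seen.items() if len(t) > 1}

# Illustrative sample: 'amount' mixes text and numeric values.
sample_rows = [(1, "100"), (2, 200), (3, None)]
print(mixed_type_columns(sample_rows, ["id", "amount"]))
```

Run the audit on each entity's sample before migration and add explicit type-conversion steps in Power Query for any flagged column, rather than discovering the rejection on the first Gen2 refresh.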

Post-Migration Optimization

After successful migration, take advantage of Gen2 capabilities that were not available in Gen1:

  • Lakehouse Integration: Gen2 outputs are instantly queryable from Lakehouse SQL endpoint - no additional loading required. This alone saved one client 6 hours of daily processing time
  • Pipeline Orchestration: Replace simple schedules with pipeline-driven execution for complex dependency chains. Add error handling, retry logic, and notification steps
  • Spark Processing: For large-volume transformations (100M+ rows), switch from mashup engine to Spark notebooks for 10x performance improvement. We migrated a 500M-row healthcare dataset from mashup to Spark and reduced processing from 8 hours to 45 minutes
  • Monitoring: Use Fabric Monitoring Hub for detailed execution insights across all dataflows. Set up alerts for refresh failures and performance degradation
  • Cost Optimization: Gen2 on Fabric capacity is often 30-40% cheaper than equivalent Gen1 on Premium capacity, especially when you consolidate multiple Gen1 Premium workspaces onto shared Fabric capacity

Migration Checklist

Use this checklist to track your migration progress:

  • Inventory all Gen1 dataflows (count, complexity, dependencies)
  • Document downstream consumers for each dataflow
  • Verify gateway compatibility with Fabric
  • Estimate Fabric capacity requirements
  • Create Fabric workspace and Lakehouse for Gen2 outputs
  • Migrate and validate Phase 1 (standalone dataflows)
  • Migrate and validate Phase 2 (dependent dataflows)
  • Migrate and validate Phase 3 (complex dataflows)
  • Reconfigure all downstream datasets
  • Decommission Gen1 dataflows (disable, then delete after 30 days)
  • Optimize Gen2 dataflows with Spark and pipeline orchestration

Frequently Asked Questions

What are the main differences between Dataflows Gen1 and Gen2?

Dataflows Gen2 (Fabric) adds several capabilities beyond Gen1 (Power BI): (1) Storage location—Gen2 stores data in OneLake as Delta tables (queryable from lakehouses), Gen1 stores in internal Azure Data Lake, (2) Transformation engine—Gen2 supports both Power Query mashup engine AND Apache Spark notebooks, Gen1 only supports Power Query, (3) Scheduling—Gen2 integrates with Fabric pipelines and job scheduler, Gen1 uses Power BI refresh schedules, (4) Monitoring—Gen2 provides detailed Spark logs and metrics, Gen1 has limited diagnostics, and (5) Capacity—Gen2 runs on Fabric capacity (any F-SKU), Gen1 requires Power BI Premium. Both support Power Query M language and incremental refresh. Gen2 is the future direction—Gen1 will continue working but receive no new features. Migration is one-way—you cannot downgrade Gen2 back to Gen1.

Will my existing dataflows stop working if I do not migrate to Gen2?

No, Power BI Dataflows Gen1 will continue functioning indefinitely—Microsoft has not announced any deprecation timeline. However, all new Dataflow features are only available in Gen2 (Spark transformations, OneLake integration, advanced scheduling). For organizations staying on Power BI Premium (not migrating to Fabric), Gen1 dataflows remain fully supported. Migration is recommended when: (1) Moving to Microsoft Fabric capacity, (2) Needing Spark-based transformations for complex data engineering, (3) Wanting to query dataflow output in lakehouses/warehouses, or (4) Requiring advanced monitoring and pipeline orchestration. If current Gen1 dataflows meet your needs and you are staying on Power BI Premium, migration is optional. Most organizations migrate as part of broader Fabric adoption, not due to Gen1 limitations.

How long does a typical dataflow migration from Gen1 to Gen2 take?

Simple dataflows (5-10 tables, basic Power Query transformations, no complex dependencies) migrate in 1-2 hours per dataflow. Complex dataflows (50+ tables, computed entities, incremental refresh, linked entities) require 1-2 days each. Process: (1) Export Gen1 dataflow as Power Query template (30 minutes), (2) Create new Gen2 dataflow in Fabric workspace (15 minutes), (3) Import template and reconnect data sources (1-2 hours), (4) Reconfigure incremental refresh and parameters (30 minutes), (5) Test refresh and validate output (1-2 hours), (6) Update downstream Power BI reports to use Gen2 dataflow (30 minutes per report). Enterprises with dozens of dataflows typically migrate 2-3 per week, completing full migration in 2-3 months. Parallel migration possible for independent dataflows. Allow extra time for testing—always validate row counts and data quality match Gen1 before retiring old dataflow.

Tags: Microsoft Fabric, Dataflows, Migration, OneLake, Power Query
