Data Factory in Microsoft Fabric: Complete Pipeline Guide
Data Engineering · 13 min read

Master Data Factory in Fabric — data pipelines, Dataflows Gen2, connectors, scheduling, monitoring, and migration from Azure Data Factory.

By Errin O'Connor, Chief AI Architect

Data Factory in Microsoft Fabric provides visual data integration and orchestration capabilities for building ETL/ELT pipelines, and it is central to enterprise data engineering in Fabric.

What Is Data Factory in Fabric?

Data Factory in Fabric brings the pipeline and dataflow capabilities of Azure Data Factory into the unified Fabric experience. It provides:

  • Data Pipelines: Orchestrate data movement and transformation with a visual designer
  • Dataflows Gen2: Self-service data preparation with Power Query (no-code)
  • 200+ Connectors: Connect to cloud services, databases, files, and APIs
  • Scheduling: Automate data loads on time-based or event-based triggers
  • Monitoring: Track pipeline runs, errors, and performance

Data Pipelines vs Dataflows Gen2

| Feature | Data Pipelines | Dataflows Gen2 |
| --- | --- | --- |
| Interface | Visual pipeline designer | Power Query editor |
| Skill level | Data engineer | Business analyst |
| Scale | Enterprise ETL/ELT | Self-service data prep |
| Coding | No-code + expressions | No-code + M language |
| Output | Any Fabric destination | Lakehouse or Warehouse |
| Scheduling | Time + event triggers | Time triggers |
| Error handling | Advanced (retry, branching) | Basic |

Building Your First Pipeline

Step 1: Create a Pipeline

  1. Open a Fabric workspace
  2. Click New → Data Pipeline
  3. Name your pipeline (e.g., "Daily Sales Load")

Step 2: Add Activities

Drag activities from the toolbox onto the canvas:

  • Copy Data: Move data between sources and destinations
  • Dataflow: Run a Dataflows Gen2 transformation
  • Notebook: Execute a Spark notebook
  • Stored Procedure: Run SQL in a warehouse
  • ForEach: Loop over a set of items
  • If Condition: Branch based on expressions
  • Web: Call REST APIs
  • Wait: Pause execution
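Behind the visual designer, a pipeline is stored as a JSON definition similar to Azure Data Factory's pipeline JSON. The sketch below builds a minimal single-activity definition as a Python dict; the connection and table names are illustrative placeholders, and the exact schema Fabric generates may differ from this ADF-style shape.

```python
# Sketch of an ADF-style pipeline definition with one Copy activity.
# The exact JSON schema Fabric emits may differ; names are illustrative.

def build_copy_pipeline(name, source_conn, dest_table):
    """Return a minimal pipeline definition dict with a single Copy activity."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopySales",
                    "type": "Copy",
                    "typeProperties": {
                        "source": {"type": "SqlSource", "connection": source_conn},
                        "sink": {"type": "LakehouseTableSink", "table": dest_table},
                    },
                    # Per-activity retry/timeout, set under Settings in the designer
                    "policy": {"retry": 2, "timeout": "0.01:00:00"},
                }
            ]
        },
    }

pipeline = build_copy_pipeline("Daily Sales Load", "sql-sales", "sales_raw")
print(pipeline["properties"]["activities"][0]["type"])  # Copy
```

Seeing the definition this way also explains why ADF orchestration patterns migrate easily: the activity model is the same.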

Step 3: Configure Copy Data

  1. Set Source: connection, table/query, authentication
  2. Set Destination: Lakehouse table, warehouse table, or files
  3. Configure mapping: column mapping, data types
  4. Set performance: parallel copies, staging

Step 4: Add Scheduling

  1. Click Schedule on the pipeline toolbar
  2. Set frequency: hourly, daily, weekly
  3. Set time zone and start time
  4. Enable/disable as needed
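The time-zone setting matters more than it looks: a "daily at 02:00" schedule fires at local wall-clock time, which shifts relative to UTC across daylight saving changes. As a minimal sketch (using only the Python standard library, not any Fabric API), this is how the next local run time can be computed:

```python
from datetime import datetime, timedelta, time
from zoneinfo import ZoneInfo

def next_daily_run(now: datetime, run_at: time, tz: str) -> datetime:
    """Return the next occurrence of run_at (wall clock) in an IANA time zone."""
    local_now = now.astimezone(ZoneInfo(tz))
    candidate = local_now.replace(hour=run_at.hour, minute=run_at.minute,
                                  second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)  # today's slot has already passed
    return candidate

now = datetime(2024, 6, 1, 5, 30, tzinfo=ZoneInfo("UTC"))
print(next_daily_run(now, time(2, 0), "America/New_York"))
```

Note that adding a `timedelta` to a zone-aware datetime performs wall-clock arithmetic, which is exactly the behavior a "02:00 local, every day" schedule needs.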

Step 5: Monitor

View pipeline runs in the Monitoring Hub:

  • Run status (succeeded, failed, in progress)
  • Duration and data volumes
  • Error messages and retry counts
  • Activity-level details
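When monitoring is automated rather than done in the Monitoring Hub UI, the usual pattern is to poll run status until a terminal state is reached. The sketch below shows that polling loop in isolation; `get_status` stands in for whatever run-status lookup you use (for example, a wrapper around a REST call), and the stubbed status sequence is purely illustrative.

```python
import time as _time

# Terminal states a pipeline run can settle into
TERMINAL = {"Succeeded", "Failed", "Cancelled"}

def wait_for_run(get_status, poll_seconds=0, max_polls=100):
    """Poll a zero-arg status callable until the run reaches a terminal state.

    get_status is a placeholder for a real status lookup (e.g. a REST call);
    here it is injected so the loop can be tested without any network access.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL:
            return status
        _time.sleep(poll_seconds)
    raise TimeoutError("run did not finish within max_polls")

# Stubbed status sequence standing in for real API responses
statuses = iter(["Queued", "InProgress", "Succeeded"])
print(wait_for_run(lambda: next(statuses)))  # Succeeded
```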

Dataflows Gen2: Self-Service ETL

Dataflows Gen2 use the Power Query interface familiar to Excel and Power BI users:

  1. Create New → Dataflow Gen2
  2. Connect to data source
  3. Transform data (filter, merge, pivot, calculate)
  4. Set destination (Lakehouse or Warehouse table)
  5. Schedule refresh

Key advantage: Business analysts can build data pipelines without learning Spark or SQL.

See our Power Query guide for transformation techniques.

Migration from Azure Data Factory

If you're currently using Azure Data Factory (ADF):

What Migrates Easily

  • Copy Data activities (same interface)
  • Pipeline orchestration patterns
  • Most connectors are available
  • Scheduling and triggers

What Changes

  • Linked Services → Connections (simplified)
  • Integration Runtimes → managed within Fabric
  • Mapping Data Flows → Dataflows Gen2 (different engine)
  • Storage → OneLake replaces ADLS

Migration Approach

  1. Assess current ADF pipelines and prioritize by business value
  2. Recreate high-priority pipelines in Fabric Data Factory
  3. Test data quality and performance
  4. Transition scheduling and decommission ADF pipelines
  5. Retain ADF for unsupported scenarios (some connectors)
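The assessment step can be partially automated by triaging each pipeline's activity types. The helper below is a hypothetical sketch: the type names are ADF activity type identifiers as I understand them, and the buckets mirror the "migrates easily vs. changes" guidance above (Copy and orchestration activities move over; Mapping Data Flows, typed `ExecuteDataFlow`, need rework as Dataflows Gen2).

```python
# Hypothetical triage helper for an ADF migration assessment.
# Activity type names follow ADF's pipeline JSON; verify against your export.
EASY = {"Copy", "ForEach", "IfCondition", "Wait", "WebActivity"}
REWORK = {"ExecuteDataFlow"}  # Mapping Data Flows -> Dataflows Gen2

def triage_activities(activity_types):
    """Bucket ADF activity type names by expected migration effort."""
    buckets = {"easy": [], "rework": [], "review": []}
    for t in activity_types:
        if t in EASY:
            buckets["easy"].append(t)
        elif t in REWORK:
            buckets["rework"].append(t)
        else:
            buckets["review"].append(t)  # check connector/feature support
    return buckets

print(triage_activities(["Copy", "ExecuteDataFlow", "DatabricksNotebook"]))
```

Running this over an exported ADF pipeline inventory gives a first-pass effort estimate before any manual review.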

Best Practices

  1. Use Dataflows Gen2 for simple transformations — Don't over-engineer with pipelines when Power Query suffices
  2. Use pipelines for orchestration — Coordinate notebooks, stored procedures, and dataflows
  3. Implement medallion architecture — Bronze (raw) → Silver (cleaned) → Gold (business-ready)
  4. Monitor actively — Set up alerts for pipeline failures
  5. Version control — Use Fabric Git integration for pipeline version history

Our Microsoft Fabric consulting team specializes in data pipeline design and migration from Azure Data Factory. Contact us for a migration assessment.

Architecture Considerations

Selecting the right architecture pattern for your implementation determines long-term scalability, performance, and total cost of ownership. These architectural decisions should be made early and revisited quarterly as your environment evolves.

**Data Model Design**: Star schema is the foundation of every performant Power BI implementation. Separate your fact tables (transactions, events, measurements) from dimension tables (customers, products, dates, geography) and connect them through single-direction one-to-many relationships. Organizations that skip proper modeling and use flat, denormalized tables consistently report 3-5x slower query performance and significantly higher capacity costs.

**Storage Mode Selection**: Choose between Import, DirectQuery, Direct Lake, and Composite models based on your data freshness requirements and volume. Import mode delivers the fastest query performance but requires scheduled refreshes. DirectQuery provides real-time data but shifts compute to the source system. Direct Lake, available with Microsoft Fabric, combines the performance of Import with the freshness of DirectQuery by reading Delta tables directly from OneLake.

**Workspace Strategy**: Organize workspaces by business function (Sales Analytics, Finance Reporting, Operations Dashboard) rather than by technical role. Assign each workspace to the appropriate capacity tier based on usage patterns. Implement deployment pipelines for workspaces that support Dev/Test/Prod promotion to prevent untested changes from reaching business users.

**Gateway Architecture**: For hybrid environments connecting to on-premises data sources, deploy gateways in a clustered configuration across at least two servers for high availability. Size gateway servers based on concurrent refresh and DirectQuery load. Monitor gateway performance through the Power BI management tools and scale proactively when CPU utilization consistently exceeds 60%.

Security and Compliance Framework

Enterprise Power BI deployments in regulated industries must satisfy stringent security and compliance requirements. This framework, refined through implementations in healthcare (HIPAA), financial services (SOC 2, SEC), and government (FedRAMP), provides the controls necessary to pass audits and protect sensitive data.

**Authentication and Authorization**: Enforce Azure AD Conditional Access policies for Power BI access. Require multi-factor authentication for all users, restrict access from unmanaged devices, and block access from untrusted locations. Layer workspace-level access controls with item-level sharing permissions to implement least-privilege access across your entire Power BI environment.

**Data Protection**: Implement Microsoft Purview sensitivity labels on Power BI semantic models and reports containing confidential data. Labels enforce encryption, restrict export capabilities, and add visual markings that persist when content is exported or shared. Configure Data Loss Prevention policies to detect and prevent sharing of reports containing sensitive data patterns such as Social Security numbers, credit card numbers, or protected health information.
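To make "sensitive data patterns" concrete, here is a minimal sketch of pattern detection with regular expressions. These patterns are illustrative only; in production you would rely on Purview's built-in sensitive information types rather than hand-rolled regexes, which miss many formats and produce false positives.

```python
import re

# Illustrative patterns only; production DLP should use Purview's
# built-in sensitive information types, not hand-rolled regexes.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # e.g. 123-45-6789
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # 13-16 digit runs
}

def find_sensitive(text):
    """Return the sorted names of sensitive patterns found in text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))

print(find_sensitive("Customer SSN 123-45-6789 on file."))  # ['ssn']
```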

**Audit and Monitoring**: Enable unified audit logging in the Microsoft 365 compliance center to capture every Power BI action including report views, data exports, sharing events, and administrative changes. Export audit logs to your SIEM solution for correlation with other security events. Configure alerts for high-risk activities such as bulk data exports, sharing with external users, or privilege escalation. Our managed analytics services include continuous security monitoring as a standard capability.

**Data Residency**: For organizations with data sovereignty requirements, configure Power BI tenant settings to restrict data storage to specific geographic regions. Verify that your Premium or Fabric capacity is provisioned in the correct region and that cross-region data flows comply with your regulatory obligations.

Enterprise Best Practices

Every enterprise analytics deployment we have managed over the past 25 years reinforces the same truth: technology without governance and adoption strategy delivers a fraction of its potential value. These practices, refined across implementations in healthcare and government, are the ones that separate successful analytics programs from expensive shelf-ware.

  • Standardize Naming Conventions Across All Models: Every table, column, measure, and calculated column should follow a consistent naming convention documented in your style guide. Use business-friendly names (Total Revenue, not SUM_REV_AMT). Standardized naming improves Copilot accuracy by 40% and makes reports self-documenting for new team members joining the organization.
  • Implement Incremental Refresh for Large Datasets: For datasets exceeding 10 million rows, incremental refresh reduces processing time by 80-95% by only refreshing new and changed data. Configure partition boundaries based on your data update patterns and test thoroughly before deploying to production. This optimization alone can reduce your capacity consumption by half.
  • Design Mobile-First Dashboards: Over 35% of enterprise Power BI consumption now occurs on mobile devices. Design dedicated mobile layouts for every critical dashboard, prioritize the top 3-5 KPIs for small screens, and test on actual devices before publishing. Our dashboard development team creates responsive layouts optimized for every screen size used in your organization.
  • Establish Data Quality Gates at Every Pipeline Stage: Implement automated data quality checks that validate row counts, check for null values in key fields, verify referential integrity, and flag statistical outliers. Data quality gates catch issues before they reach executive dashboards and erode trust in the entire analytics platform.
  • Document Everything in a Living Data Dictionary: Maintain a data dictionary that defines every measure, its business context, its calculation logic, and its data source. Update the dictionary with every model change. Teams with comprehensive documentation onboard new analysts 60% faster and reduce measure duplication by 75% because developers can find existing calculations instead of rebuilding them.
  • Schedule Regular Architecture Reviews: Conduct quarterly reviews of your Power BI architecture with stakeholders from IT, business units, and leadership. Assess whether the current setup meets evolving requirements, identify performance bottlenecks, and plan capacity upgrades before they become urgent.
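The data quality gates described above can start very small. This is a minimal sketch in plain Python over rows as dicts, implementing two of the listed checks (row counts and null key fields); referential integrity and outlier checks would layer on top in the same style.

```python
# Minimal data quality gate: row-count and null-key checks over rows
# represented as dicts. Returns issues rather than raising, so callers
# can decide whether to fail the pipeline or just alert.

def quality_gate(rows, key_fields, min_rows=1):
    """Return a list of human-readable issues; an empty list means 'pass'."""
    issues = []
    if len(rows) < min_rows:
        issues.append(f"row count {len(rows)} below minimum {min_rows}")
    for i, row in enumerate(rows):
        for field in key_fields:
            if row.get(field) in (None, ""):
                issues.append(f"row {i}: null/empty key field '{field}'")
    return issues

rows = [{"order_id": "A1", "amount": 10.0},
        {"order_id": None, "amount": 5.0}]
print(quality_gate(rows, key_fields=["order_id"]))
```

Wired into a pipeline, a non-empty result from a gate like this would route execution down a failure branch before the load reaches Gold tables.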

ROI and Success Metrics

Tracking the right metrics ensures your Power BI investment delivers sustained business value rather than becoming another underutilized technology platform. Enterprises working with our analytics team measure success across these dimensions:

  • Time-to-insight reduction of 65-80% compared to legacy reporting workflows. Decisions that previously required 2-week report development cycles now happen in hours with interactive dashboards and natural language queries through Copilot.
  • Report proliferation reduction of 55% by consolidating redundant reports into governed, parameterized dashboards that serve multiple audiences. Fewer reports mean lower maintenance overhead and consistent data across the organization.
  • User satisfaction scores above 4.3 out of 5 in quarterly surveys when organizations follow structured onboarding, provide ongoing training, and maintain a responsive support model through their Center of Excellence.
  • Compliance audit preparation time cut by 50% through automated lineage documentation, row-level security enforcement, and centralized access logging in regulated industries. Auditors receive consistent, verifiable evidence without manual data gathering.
  • Capacity utilization optimization saving 20-35% on Premium or Fabric licensing by right-sizing workspaces, implementing query reduction techniques, and scheduling refreshes during off-peak hours based on actual usage telemetry.

Ready to build a Power BI environment that delivers measurable, sustained business value? Our consultants bring 25 years of enterprise analytics expertise to every engagement. Contact our team for a complimentary assessment and a roadmap designed for your organization.

Frequently Asked Questions

What is the difference between Data Factory in Fabric and Azure Data Factory?

Data Factory in Fabric is a simplified, SaaS version of Azure Data Factory integrated into the Fabric platform. It shares the same pipeline design interface but uses Fabric connections instead of linked services, stores output in OneLake instead of ADLS, and benefits from unified Fabric governance and billing. Azure Data Factory remains available as a standalone Azure service for organizations not yet on Fabric or needing specific ADF features not yet in Fabric.

When should I use Dataflows Gen2 vs Data Pipelines?

Use Dataflows Gen2 when: business analysts need to prepare data without coding, transformations are straightforward (filter, merge, calculate), and the output goes to a single Lakehouse or Warehouse table. Use Data Pipelines when: you need orchestration (coordinate multiple steps), require error handling with retries and branching, need to call notebooks or stored procedures, or are building complex multi-step ETL processes.

Can Data Factory in Fabric connect to on-premises data sources?

Yes, through the on-premises data gateway. Install the gateway on a Windows server with access to your on-premises databases (SQL Server, Oracle, SAP, file shares), then configure the connection in Fabric. The gateway acts as a secure bridge between your on-premises network and Fabric cloud. For enterprise deployments, configure gateway clustering for high availability with 2-3 gateway servers.

Tags: Data Factory, Microsoft Fabric, ETL, data pipelines, Dataflows Gen2, data engineering
