
Data Factory in Microsoft Fabric: Complete Pipeline Guide
Master Data Factory in Fabric — data pipelines, Dataflows Gen2, connectors, scheduling, monitoring, and migration from Azure Data Factory.
Data Factory in Microsoft Fabric provides visual data integration and orchestration capabilities for building ETL/ELT pipelines, making it a central workload for enterprise data engineering in Fabric.
What Is Data Factory in Fabric?
Data Factory in Fabric brings the pipeline and dataflow capabilities of Azure Data Factory into the unified Fabric experience. It provides:
- Data Pipelines: Orchestrate data movement and transformation with a visual designer
- Dataflows Gen2: Self-service data preparation with Power Query (no-code)
- 200+ Connectors: Connect to cloud services, databases, files, and APIs
- Scheduling: Automate data loads on time-based or event-based triggers
- Monitoring: Track pipeline runs, errors, and performance
Data Pipelines vs Dataflows Gen2
| Feature | Data Pipelines | Dataflows Gen2 |
|---|---|---|
| Interface | Visual pipeline designer | Power Query editor |
| Skill level | Data engineer | Business analyst |
| Scale | Enterprise ETL/ELT | Self-service data prep |
| Coding | No-code + expressions | No-code + M language |
| Output | Any Fabric destination | Lakehouse or Warehouse |
| Scheduling | Time + event triggers | Time triggers |
| Error handling | Advanced (retry, branching) | Basic |
Building Your First Pipeline
Step 1: Create a Pipeline
1. Open a Fabric workspace
2. Click New → Data Pipeline
3. Name your pipeline (e.g., "Daily Sales Load")
Step 2: Add Activities
Drag activities from the toolbox onto the canvas:
- Copy Data: Move data between sources and destinations
- Dataflow: Run a Dataflows Gen2 transformation
- Notebook: Execute a Spark notebook
- Stored Procedure: Run SQL in a warehouse
- ForEach: Loop over a set of items
- If Condition: Branch based on expressions
- Web: Call REST APIs
- Wait: Pause execution
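Conceptually, the control-flow activities compose the way this plain-Python sketch does. This is purely illustrative (real pipelines configure ForEach, If Condition, and per-activity retries visually; every name below is invented):

```python
# Illustrative model of pipeline control flow -- NOT a real Fabric API.
# ForEach loops over items, If Condition branches, activities retry on failure.

def run_with_retry(activity, retries=2):
    """Run an activity, retrying on failure (mirrors an activity's retry setting)."""
    for attempt in range(retries + 1):
        try:
            return activity()
        except RuntimeError:
            if attempt == retries:
                raise

def for_each(items, body):
    """ForEach: apply the body activity to every item, collecting results."""
    return [body(item) for item in items]

def if_condition(expression, if_true, if_false):
    """If Condition: run one of two branches based on an expression."""
    return if_true() if expression else if_false()

# Example: "copy" three source tables, then branch on the outcome.
copied = for_each(["sales", "customers", "orders"], lambda t: f"copied:{t}")
status = if_condition(len(copied) == 3,
                      lambda: "load-complete",
                      lambda: "load-partial")
print(status)  # load-complete
```

The same shape scales up in the designer: a ForEach wrapping a Copy Data activity with retries, followed by an If Condition that routes to a success notification or an error handler.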
Step 3: Configure Copy Data
1. Set Source: connection, table/query, authentication
2. Set Destination: Lakehouse table, warehouse table, or files
3. Configure mapping: column mapping, data types
4. Set performance: parallel copies, staging
Step 4: Add Scheduling
1. Click Schedule on the pipeline toolbar
2. Set frequency: hourly, daily, weekly
3. Set time zone and start time
4. Enable/disable as needed
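The schedule settings above resolve to concrete run times. The real scheduler handles this for you; as a minimal, stdlib-only sketch of the "daily at a given local time" calculation (the time zone and schedule are invented examples):

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

def next_daily_run(now, run_at, tz):
    """Return the next occurrence of a daily schedule after `now`.

    now: timezone-aware datetime; run_at: a datetime.time in the
    schedule's zone; tz: IANA zone name chosen in the schedule settings.
    """
    zone = ZoneInfo(tz)
    local_now = now.astimezone(zone)
    candidate = datetime.combine(local_now.date(), run_at, tzinfo=zone)
    if candidate <= local_now:
        candidate += timedelta(days=1)  # today's slot already passed
    return candidate

now = datetime(2024, 6, 1, 8, 30, tzinfo=ZoneInfo("UTC"))
print(next_daily_run(now, time(6, 0), "America/Chicago"))
# 2024-06-01 06:00:00-05:00
```

This is why the time-zone setting matters: the same 6:00 AM schedule fires at different UTC instants as daylight saving time shifts.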
Step 5: Monitor
View pipeline runs in the Monitoring Hub:
- Run status (succeeded, failed, in progress)
- Duration and data volumes
- Error messages and retry counts
- Activity-level details
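The Monitoring Hub surfaces this interactively, but if you export run history for reporting, the aggregation is straightforward. A hedged sketch (the record fields below are hypothetical, not the Hub's actual schema):

```python
from collections import Counter

def summarize_runs(runs):
    """Aggregate pipeline run records into status counts, failure rate,
    and average duration. `runs` is a list of dicts with hypothetical
    fields: status ("Succeeded"/"Failed"/...) and duration_s.
    """
    statuses = Counter(r["status"] for r in runs)
    finished = statuses["Succeeded"] + statuses["Failed"]
    failure_rate = statuses["Failed"] / finished if finished else 0.0
    avg_duration = sum(r["duration_s"] for r in runs) / len(runs) if runs else 0.0
    return {"statuses": dict(statuses),
            "failure_rate": round(failure_rate, 3),
            "avg_duration_s": round(avg_duration, 1)}

runs = [
    {"status": "Succeeded", "duration_s": 120},
    {"status": "Succeeded", "duration_s": 95},
    {"status": "Failed", "duration_s": 30},
]
print(summarize_runs(runs))
```

Tracking failure rate and average duration over time is what turns monitoring from reactive firefighting into trend analysis.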
Dataflows Gen2: Self-Service ETL
Dataflows Gen2 uses the Power Query interface familiar to Excel and Power BI users:
1. Click New → Dataflow Gen2
2. Connect to a data source
3. Transform data (filter, merge, pivot, calculate)
4. Set the destination (Lakehouse or Warehouse table)
5. Schedule refresh
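The filter/merge/calculate steps above are conceptually like this plain-Python sketch. It only illustrates the transformation logic, not how Dataflows Gen2 executes it (the data and column names are invented):

```python
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 250.0},
    {"order_id": 2, "customer_id": "C2", "amount": 40.0},
    {"order_id": 3, "customer_id": "C1", "amount": 90.0},
]
customers = {"C1": "Contoso", "C2": "Fabrikam"}

# Filter rows: keep orders of at least 50 (like a Power Query row filter).
filtered = [o for o in orders if o["amount"] >= 50]

# Merge: join the customer name onto each order (like Merge Queries).
merged = [{**o, "customer_name": customers[o["customer_id"]]} for o in filtered]

# Calculate: add a derived column (like Add Custom Column).
result = [{**o, "amount_with_tax": round(o["amount"] * 1.08, 2)} for o in merged]

print([r["order_id"] for r in result])  # [1, 3]
```

In the dataflow itself, each of these steps is a clickable Power Query step recorded in M behind the scenes.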
Key advantage: Business analysts can build data pipelines without learning Spark or SQL.
See our Power Query guide for transformation techniques.
Migration from Azure Data Factory
If you're currently using Azure Data Factory (ADF):
What Migrates Easily
- Copy Data activities (same interface)
- Pipeline orchestration patterns
- Most connectors
- Scheduling and triggers
What Changes
- Linked Services → Connections (simplified)
- Integration Runtimes → Managed within Fabric
- Mapping Data Flows → Dataflows Gen2 (different engine)
- Storage → OneLake replaces ADLS
Migration Approach
1. Assess current ADF pipelines and prioritize by business value
2. Recreate high-priority pipelines in Fabric Data Factory
3. Test data quality and performance
4. Transition scheduling and decommission ADF pipelines
5. Retain ADF for unsupported scenarios (some connectors)
Best Practices
- Use Dataflows Gen2 for simple transformations — Don't over-engineer with pipelines when Power Query suffices
- Use pipelines for orchestration — Coordinate notebooks, stored procedures, and dataflows
- Implement medallion architecture — Bronze (raw) → Silver (cleaned) → Gold (business-ready)
- Monitor actively — Set up alerts for pipeline failures
- Version control — Use Fabric Git integration for pipeline version history
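The medallion flow in the list above, sketched in plain Python on toy records. This is illustrative only; in Fabric these layers would typically be Lakehouse tables populated by notebooks or dataflows:

```python
# Bronze: raw records as ingested, duplicates and bad rows included.
bronze = [
    {"id": 1, "region": "west", "revenue": "100"},
    {"id": 1, "region": "west", "revenue": "100"},   # duplicate
    {"id": 2, "region": "east", "revenue": None},    # bad record
    {"id": 3, "region": "east", "revenue": "250"},
]

# Silver: deduplicate on id, drop bad rows, cast types.
seen, silver = set(), []
for row in bronze:
    if row["id"] in seen or row["revenue"] is None:
        continue
    seen.add(row["id"])
    silver.append({**row, "revenue": float(row["revenue"])})

# Gold: business-ready aggregate -- revenue per region.
gold = {}
for row in silver:
    gold[row["region"]] = gold.get(row["region"], 0.0) + row["revenue"]

print(gold)  # {'west': 100.0, 'east': 250.0}
```

The point of the layering is that each stage is re-runnable: if a cleaning rule changes, you rebuild silver and gold from bronze without re-ingesting source systems.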
Our Microsoft Fabric consulting team specializes in data pipeline design and migration from Azure Data Factory. Contact us for a migration assessment.
Enterprise Implementation Best Practices
Deploying Microsoft Fabric at enterprise scale requires a structured approach that addresses governance, security, and organizational readiness from day one. Organizations that skip the planning phase typically face costly rework within the first 90 days.
Establish a Fabric Center of Excellence (CoE) before provisioning production capacities. The CoE should include a Fabric admin, at least one data engineer, a Power BI developer, and a business stakeholder who understands the reporting requirements. This cross-functional team defines workspace naming conventions, capacity allocation policies, and data classification standards that prevent sprawl as adoption grows.
Implement environment separation from the start. Use dedicated workspaces for development, testing, and production with deployment pipelines automating the promotion process. Every Lakehouse, warehouse, and semantic model should follow a consistent naming convention that includes the business domain, data layer (bronze, silver, gold), and environment identifier. This structure makes governance auditable and reduces the risk of accidental production changes.
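A convention like the one just described can be enforced automatically during workspace reviews or CI checks. A hypothetical sketch; the pattern itself (`<domain>_<layer>_<env>`) is an assumed convention you would adapt to your own standard:

```python
import re

# Hypothetical naming convention: <domain>_<layer>_<env>, e.g. "sales_silver_prod".
NAME_PATTERN = re.compile(
    r"^(?P<domain>[a-z][a-z0-9]+)_(?P<layer>bronze|silver|gold)_(?P<env>dev|test|prod)$"
)

def check_item_name(name):
    """Return the parsed parts of a compliant name, or None if non-compliant."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None

print(check_item_name("sales_silver_prod"))
# {'domain': 'sales', 'layer': 'silver', 'env': 'prod'}
print(check_item_name("SalesProd"))  # None
```

Even a simple check like this, run against an inventory of workspace items, makes naming drift visible before it becomes sprawl.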
Right-size your Fabric capacity based on actual workload profiles, not vendor sizing guides. Run a two-week proof of concept on an F64 capacity with representative data volumes and query patterns. Monitor CU consumption using the Fabric Capacity Metrics app, then adjust the SKU based on measured peak and sustained usage. Over-provisioning wastes budget; under-provisioning creates throttling that frustrates users during critical reporting windows.
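Fabric F SKUs scale in powers of two, and an F SKU's number equals its capacity units (F64 = 64 CUs). The right-sizing decision from measured peak usage can be sketched like this; the 20% headroom buffer is an assumption, not official guidance:

```python
# F SKUs available for Fabric capacities; the number equals the CUs provided.
F_SKUS = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]

def recommend_sku(peak_cu, headroom=0.2):
    """Pick the smallest F SKU whose CUs cover measured peak usage plus headroom.

    peak_cu: peak CU consumption observed in the Fabric Capacity Metrics app.
    headroom: fractional buffer for growth and bursts (assumption: 20%).
    """
    required = peak_cu * (1 + headroom)
    for cu in F_SKUS:
        if cu >= required:
            return f"F{cu}"
    return "F2048 (consider splitting workloads across capacities)"

print(recommend_sku(48))  # F64  (48 * 1.2 = 57.6 -> smallest SKU >= 57.6)
```

Re-running this against each month's measured peak keeps the sizing decision tied to data rather than to a one-time guess.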
Data security must be layered. Configure workspace-level RBAC for broad access control, OneLake data access roles for table-level permissions, and row-level security in semantic models for per-user data filtering. Sensitivity labels from Microsoft Purview should be applied to all datasets containing PII, financial data, or protected health information to ensure compliance with HIPAA, SOC 2, and GDPR requirements.
Measuring Success and ROI
Quantifying Microsoft Fabric impact requires tracking metrics across infrastructure cost reduction, operational efficiency, and business value creation.
Infrastructure savings are the most immediately measurable. Compare monthly Azure spend before and after Fabric migration, including compute, storage, and data movement costs across all replaced services. Organizations typically see 30-60% reduction in total analytics infrastructure costs within the first six months, primarily from eliminating redundant storage copies and consolidating multiple service SKUs into a single Fabric capacity.
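The before/after comparison above is simple arithmetic. A sketch with invented cost figures (the component names are examples, not a prescribed chart of accounts):

```python
def analytics_cost_reduction(before, after):
    """Percent reduction in monthly analytics spend.

    before/after: dicts of monthly cost components in dollars.
    """
    total_before, total_after = sum(before.values()), sum(after.values())
    return round(100 * (total_before - total_after) / total_before, 1)

# Hypothetical pre-migration spend across replaced services...
before = {"compute": 18000, "storage": 6000, "data_movement": 4000}
# ...versus consolidated post-migration spend.
after = {"fabric_capacity": 14000, "onelake_storage": 2500}

print(analytics_cost_reduction(before, after))  # 41.1
```

The discipline that matters here is capturing *all* replaced components on the "before" side; leaving out data-movement or duplicate-storage costs understates the savings.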
Operational efficiency gains show up in reduced time-to-insight. Measure the average time from data availability to published report before and after Fabric adoption. Track pipeline failure rates, data freshness SLAs, and the number of manual data preparation steps eliminated by OneLake unified storage. Target a 40-50% reduction in data engineering effort within the first year.
Business value metrics connect Fabric capabilities to revenue and decision-making speed. Track the number of business decisions supported by Fabric-powered analytics per quarter, the time to answer ad-hoc business questions, and user adoption rates across departments. Establish quarterly business reviews where stakeholders quantify decisions that were enabled or accelerated by the platform.
Ready to move from strategy to execution? Our team of certified consultants has delivered 500+ enterprise analytics projects across healthcare, financial services, manufacturing, and government. Whether you need architecture design, hands-on implementation, or ongoing optimization, our Microsoft Fabric implementation services are designed for organizations that demand production-grade results. Contact us today for a free assessment and learn how we can accelerate your analytics transformation.
Frequently Asked Questions
What is the difference between Data Factory in Fabric and Azure Data Factory?
Data Factory in Fabric is a simplified, SaaS version of Azure Data Factory integrated into the Fabric platform. It shares the same pipeline design interface but uses Fabric connections instead of linked services, stores output in OneLake instead of ADLS, and benefits from unified Fabric governance and billing. Azure Data Factory remains available as a standalone Azure service for organizations not yet on Fabric, or for those that depend on specific ADF features Fabric does not support yet.
When should I use Dataflows Gen2 vs Data Pipelines?
Use Dataflows Gen2 when: business analysts need to prepare data without coding, transformations are straightforward (filter, merge, calculate), and the output goes to a single Lakehouse or Warehouse table. Use Data Pipelines when: you need orchestration (coordinate multiple steps), require error handling with retries and branching, need to call notebooks or stored procedures, or are building complex multi-step ETL processes.
Can Data Factory in Fabric connect to on-premises data sources?
Yes, through the on-premises data gateway. Install the gateway on a Windows server with access to your on-premises databases (SQL Server, Oracle, SAP, file shares), then configure the connection in Fabric. The gateway acts as a secure bridge between your on-premises network and Fabric cloud. For enterprise deployments, configure gateway clustering for high availability with 2-3 gateway servers.