Data Factory Pipelines in Fabric

Orchestrate complex data workflows with Microsoft Fabric Data Factory pipelines, covering ETL patterns, scheduling, monitoring, and error-handling best practices.

By Administrator

Data Factory in Microsoft Fabric provides enterprise-grade data orchestration for building, scheduling, and monitoring complex data workflows. It replaces the need for standalone Azure Data Factory in many scenarios by integrating pipeline orchestration directly into the Fabric platform with native OneLake connectivity, capacity-based billing, and unified monitoring. For organizations building data platforms on Fabric, Data Factory pipelines are the glue that coordinates data movement, transformation, and loading across lakehouses, warehouses, and external sources.

Pipeline Architecture

A Data Factory pipeline is a container of activities executed in a defined sequence with control flow logic. Pipelines support three types of flow:

Sequential Execution: Activities run one after another. Activity B starts only after Activity A succeeds. This is the default pattern for dependent operations like: extract data → transform data → load to warehouse.

Parallel Execution: Independent activities run simultaneously. Loading data from five different source systems can happen in parallel, reducing total pipeline duration from the sum of individual load times to the duration of the slowest single load.

Conditional Branching: Activities execute based on the success, failure, or completion status of previous activities. If the data extraction succeeds, proceed to transformation. If it fails, send an alert notification and skip downstream activities.
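
In the pipeline's JSON definition, these branches are expressed as dependency conditions on each activity. A trimmed, illustrative fragment (the activity names are hypothetical) might look like:

```json
[
  {
    "name": "TransformData",
    "type": "TridentNotebook",
    "dependsOn": [
      { "activity": "ExtractData", "dependencyConditions": [ "Succeeded" ] }
    ]
  },
  {
    "name": "SendFailureAlert",
    "type": "Web",
    "dependsOn": [
      { "activity": "ExtractData", "dependencyConditions": [ "Failed" ] }
    ]
  }
]
```

The four dependency conditions are Succeeded, Failed, Completed (runs regardless of outcome), and Skipped.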

Core Activities

Copy Activity

The Copy activity moves data between 100+ supported sources and sinks. It handles schema mapping, data type conversion, and can process terabytes of data efficiently.

Common Patterns: SQL Server to Lakehouse (initial load), REST API to Lakehouse (API ingestion), file storage to OneLake (file-based ingestion), Lakehouse to Warehouse (promotion from raw to curated).

Performance Tuning: Configure parallel copy degree (number of concurrent data movement threads), set appropriate batch sizes for database sources, enable staging through OneLake for cross-region copies, and use column mapping to select only required columns.
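
As a rough sketch, these tuning knobs sit in the Copy activity's typeProperties in the pipeline definition. Property names follow the public Copy activity JSON schema; the source and sink type names here are illustrative:

```json
{
  "name": "CopySqlToLakehouse",
  "type": "Copy",
  "typeProperties": {
    "source": { "type": "SqlServerSource" },
    "sink": { "type": "LakehouseTableSink" },
    "parallelCopies": 8,
    "enableStaging": true
  }
}
```

Column mapping, when needed, is added under typeProperties as a translator section that lists only the required source-to-sink column pairs.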

Notebook Activity

The Notebook activity executes a Fabric Spark notebook as a pipeline step. This is the primary mechanism for running PySpark transformations within an orchestrated workflow.

Parameterization: Pass pipeline parameters to notebooks as cell parameters. The notebook receives values for date ranges, file paths, processing modes, and configuration settings. This makes notebooks reusable across different pipeline contexts.

Output Capture: Notebooks can return values to the pipeline using the mssparkutils.notebook.exit(value) function. Subsequent pipeline activities can reference these output values for conditional logic or parameter passing.
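
A minimal notebook sketch showing both patterns. The parameter defaults and the row count are illustrative; the mssparkutils call only works inside a Fabric Spark notebook, so it is left commented here:

```python
import json

# Parameters cell (tag the cell as "parameters" in the Fabric notebook UI);
# the pipeline's Notebook activity overwrites these defaults at run time.
process_date = "2024-01-01"
table_name = "sales"

# ... transformation logic would run here ...
rows_written = 1250  # illustrative result of the transformation

# Return a structured value to the pipeline. A downstream activity can read it
# with an expression such as @activity('RunNotebook').output.result.exitValue
exit_value = json.dumps(
    {"status": "succeeded", "rows": rows_written, "table": table_name}
)
# mssparkutils.notebook.exit(exit_value)
```

Returning a JSON string rather than a bare value keeps the exit payload extensible: new fields can be added without breaking downstream expressions.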

Dataflow Gen2 Activity

Execute Power Query-based transformations within a pipeline. Dataflow Gen2 provides a visual, low-code transformation experience that complements code-heavy notebook transformations.

Use Cases: Data cleansing that business analysts can maintain (column mapping, type conversion, filtering), simple aggregations, and transformations that do not require Spark's distributed computing power.

Stored Procedure Activity

Call stored procedures in Fabric Warehouse or external SQL databases. This enables leveraging existing SQL-based transformation logic without rewriting it in PySpark.

Web Activity

Call external REST APIs, trigger Azure Functions, or interact with third-party services. Useful for notifying external systems, triggering downstream processes, or fetching metadata.

Advanced Pipeline Patterns

ForEach Loop

Iterate over a collection of items and execute a set of activities for each item. Common applications include processing all files in a folder, loading data from a list of source tables, or running transformations for each business unit.

Configuration: Define the items collection (an array expression), set the batch count for parallel execution within the loop (default 20, max 50), and place activities inside the loop body.

Lookup Activity

Query a source and return a result set for use in subsequent activities. Paired with ForEach, Lookup enables metadata-driven pipelines: query a configuration table listing all source tables, then loop through each table, executing a Copy activity with parameters from the Lookup result.
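
Sketched as a pipeline-definition fragment (activity names are hypothetical, and the inner Copy activity is trimmed for brevity), the Lookup-to-ForEach wiring looks like:

```json
{
  "name": "ForEachSourceTable",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "LookupSourceTables", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('LookupSourceTables').output.value",
      "type": "Expression"
    },
    "isSequential": false,
    "batchCount": 10,
    "activities": [
      { "name": "CopyOneTable", "type": "Copy" }
    ]
  }
}
```

Inside the loop body, each row from the configuration table is available as @item(), so the Copy activity can reference fields like @item().sourceTable.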

Until Loop

Repeat activities until a condition is met. Useful for polling scenarios: check whether a source file has arrived; if not, wait 5 minutes and check again, repeating until the file appears or a timeout is reached.
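
The polling pattern above can be sketched as an Until activity wrapping a Get Metadata check and a Wait. This fragment is trimmed (inner dependencies and dataset settings omitted), and the activity names are hypothetical:

```json
{
  "name": "WaitForSourceFile",
  "type": "Until",
  "typeProperties": {
    "expression": {
      "value": "@activity('CheckFile').output.exists",
      "type": "Expression"
    },
    "timeout": "0.02:00:00",
    "activities": [
      { "name": "CheckFile", "type": "GetMetadata" },
      {
        "name": "WaitFiveMinutes",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 300 }
      }
    ]
  }
}
```

The timeout (here two hours) is the safety net: without it, a file that never arrives would keep the loop, and the capacity it consumes, running indefinitely.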

Error Handling and Monitoring

Robust error handling separates production-grade pipelines from fragile prototypes:

Retry Policies: Configure retry count and interval on individual activities. Transient failures (network timeouts, temporary API unavailability) resolve automatically with retries. Set 3 retries with 30-second intervals as a starting default.

Failure Paths: Connect activities with "On Failure" dependencies to execute error-handling activities when upstream activities fail. Common failure handlers include sending email/Teams notifications, logging errors to a monitoring table, and cleaning up partial outputs.

Timeout Configuration: Set activity-level timeouts to prevent runaway processes. A notebook that normally completes in 10 minutes should have a 30-minute timeout—long enough for occasional slow runs, short enough to catch infinite loops.
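
Retries and timeouts are both set in an activity's policy block. A sketch for the notebook example above (the activity name is illustrative; the policy property names follow the public activity schema):

```json
{
  "name": "RunTransformNotebook",
  "type": "TridentNotebook",
  "policy": {
    "timeout": "0.00:30:00",
    "retry": 3,
    "retryIntervalInSeconds": 30
  }
}
```

Note that retries apply per attempt: with 3 retries and a 30-minute timeout, a consistently hanging activity can occupy capacity for up to two hours before the pipeline finally fails.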

Monitoring: All pipeline runs appear in the Fabric Monitoring Hub with status, duration, and error details. Configure Data Activator alerts for pipeline failure events in production. Build a monitoring dashboard that tracks pipeline success rates, average durations, and error patterns over time.

Scheduling and Triggers

Scheduled Triggers: Run pipelines on a recurring schedule (hourly, daily, weekly). Define the start time, recurrence pattern, and end time. Multiple schedules can trigger the same pipeline.

Event-Based Triggers: Trigger pipelines when events occur—a new file arrives in OneLake, a table is updated, or an external webhook is received. Event triggers enable near-real-time data processing without polling.

Manual Triggers: Execute pipelines on demand through the Fabric portal, REST API, or from other pipelines (pipeline chaining).
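
Triggering a run over REST goes through the Fabric job scheduler endpoint. A minimal sketch, assuming you have a pipeline's workspace and item IDs plus a Microsoft Entra ID bearer token with Fabric API permissions (the IDs below are placeholders, and the network call is left commented so the sketch stays self-contained):

```python
workspace_id = "11111111-2222-3333-4444-555555555555"  # hypothetical workspace ID
pipeline_id = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"   # hypothetical pipeline item ID

# On-demand job instance endpoint for a Data Factory pipeline item
url = (
    "https://api.fabric.microsoft.com/v1/"
    f"workspaces/{workspace_id}/items/{pipeline_id}/jobs/instances"
    "?jobType=Pipeline"
)

# With a valid token, the run is queued like this:
# import requests
# resp = requests.post(url, headers={"Authorization": f"Bearer {token}"})
# resp.raise_for_status()  # 202 Accepted means the run was queued
```

The response's Location header points at the job instance, which can be polled to track the run to completion.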

Frequently Asked Questions

Is Data Factory in Fabric the same as Azure Data Factory?

They share similar concepts and a similar interface, but they are distinct products. Fabric Data Factory is integrated with the Fabric platform, uses capacity-based billing, and has native OneLake integration. ADF remains a separate Azure service.

Can I migrate ADF pipelines to Fabric?

Some migration is possible by exporting and importing pipeline definitions. However, connections and Fabric-specific features may require reconfiguration. Microsoft provides migration guidance and tools.

Tags: Microsoft Fabric, Data Factory, ETL, Pipelines
