Data Factory in Microsoft Fabric: Complete Pipeline Guide
Data Engineering · 13 min read

Master Data Factory in Fabric — data pipelines, Dataflows Gen2, connectors, scheduling, monitoring, and migration from Azure Data Factory.

By Errin O'Connor, Chief AI Architect

Data Factory in Microsoft Fabric provides visual data integration and orchestration capabilities for building ETL/ELT pipelines, and it is central to enterprise data engineering in Fabric.

What Is Data Factory in Fabric?

Data Factory in Fabric brings the pipeline and dataflow capabilities of Azure Data Factory into the unified Fabric experience. It provides:

  • Data Pipelines: Orchestrate data movement and transformation with a visual designer
  • Dataflows Gen2: Self-service data preparation with Power Query (no-code)
  • 200+ Connectors: Connect to cloud services, databases, files, and APIs
  • Scheduling: Automate data loads on time-based or event-based triggers
  • Monitoring: Track pipeline runs, errors, and performance

Data Pipelines vs Dataflows Gen2

| Feature | Data Pipelines | Dataflows Gen2 |
| --- | --- | --- |
| Interface | Visual pipeline designer | Power Query editor |
| Skill level | Data engineer | Business analyst |
| Scale | Enterprise ETL/ELT | Self-service data prep |
| Coding | No-code + expressions | No-code + M language |
| Output | Any Fabric destination | Lakehouse or Warehouse |
| Scheduling | Time + event triggers | Time triggers |
| Error handling | Advanced (retry, branching) | Basic |

Building Your First Pipeline

Step 1: Create a Pipeline

  1. Open a Fabric workspace
  2. Click New → Data Pipeline
  3. Name your pipeline (e.g., "Daily Sales Load")

Step 2: Add Activities

Drag activities from the toolbox onto the canvas:

  • Copy Data: Move data between sources and destinations
  • Dataflow: Run a Dataflows Gen2 transformation
  • Notebook: Execute a Spark notebook
  • Stored Procedure: Run SQL in a warehouse
  • ForEach: Loop over a set of items
  • If Condition: Branch based on expressions
  • Web: Call REST APIs
  • Wait: Pause execution
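Behind the visual designer, a pipeline is stored as a JSON definition similar to Azure Data Factory's pipeline JSON. The sketch below builds a minimal single-activity definition as a Python dict; the connection and table names are illustrative placeholders, and the exact schema Fabric generates may differ from this ADF-style shape.

```python
# Sketch of an ADF-style pipeline definition with one Copy activity.
# The exact JSON schema Fabric emits may differ; names are illustrative.

def build_copy_pipeline(name, source_conn, dest_table):
    """Return a minimal pipeline definition dict with a single Copy activity."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopySales",
                    "type": "Copy",
                    "typeProperties": {
                        "source": {"type": "SqlSource", "connection": source_conn},
                        "sink": {"type": "LakehouseTableSink", "table": dest_table},
                    },
                    # Per-activity retry/timeout, set under Settings in the designer
                    "policy": {"retry": 2, "timeout": "0.01:00:00"},
                }
            ]
        },
    }

pipeline = build_copy_pipeline("Daily Sales Load", "sql-sales", "sales_raw")
print(pipeline["properties"]["activities"][0]["type"])  # Copy
```

Seeing the definition this way also explains why ADF orchestration patterns migrate easily: the activity model is the same.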

Step 3: Configure Copy Data

  1. Set Source: connection, table/query, authentication
  2. Set Destination: Lakehouse table, warehouse table, or files
  3. Configure mapping: column mapping, data types
  4. Set performance: parallel copies, staging

Step 4: Add Scheduling

  1. Click Schedule on the pipeline toolbar
  2. Set frequency: hourly, daily, weekly
  3. Set time zone and start time
  4. Enable/disable as needed
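The time-zone setting matters more than it looks: a "daily at 02:00" schedule fires at local wall-clock time, which shifts relative to UTC across daylight saving changes. As a minimal sketch (using only the Python standard library, not any Fabric API), this is how the next local run time can be computed:

```python
from datetime import datetime, timedelta, time
from zoneinfo import ZoneInfo

def next_daily_run(now: datetime, run_at: time, tz: str) -> datetime:
    """Return the next occurrence of run_at (wall clock) in an IANA time zone."""
    local_now = now.astimezone(ZoneInfo(tz))
    candidate = local_now.replace(hour=run_at.hour, minute=run_at.minute,
                                  second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)  # today's slot has already passed
    return candidate

now = datetime(2024, 6, 1, 5, 30, tzinfo=ZoneInfo("UTC"))
print(next_daily_run(now, time(2, 0), "America/New_York"))
```

Note that adding a `timedelta` to a zone-aware datetime performs wall-clock arithmetic, which is exactly the behavior a "02:00 local, every day" schedule needs.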

Step 5: Monitor

View pipeline runs in the Monitoring Hub:

  • Run status (succeeded, failed, in progress)
  • Duration and data volumes
  • Error messages and retry counts
  • Activity-level details
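When monitoring is automated rather than done in the Monitoring Hub UI, the usual pattern is to poll run status until a terminal state is reached. The sketch below shows that polling loop in isolation; `get_status` stands in for whatever run-status lookup you use (for example, a wrapper around a REST call), and the stubbed status sequence is purely illustrative.

```python
import time as _time

# Terminal states a pipeline run can settle into
TERMINAL = {"Succeeded", "Failed", "Cancelled"}

def wait_for_run(get_status, poll_seconds=0, max_polls=100):
    """Poll a zero-arg status callable until the run reaches a terminal state.

    get_status is a placeholder for a real status lookup (e.g. a REST call);
    here it is injected so the loop can be tested without any network access.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL:
            return status
        _time.sleep(poll_seconds)
    raise TimeoutError("run did not finish within max_polls")

# Stubbed status sequence standing in for real API responses
statuses = iter(["Queued", "InProgress", "Succeeded"])
print(wait_for_run(lambda: next(statuses)))  # Succeeded
```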

Dataflows Gen2: Self-Service ETL

Dataflows Gen2 use the Power Query interface familiar to Excel and Power BI users:

  1. Create New → Dataflow Gen2
  2. Connect to data source
  3. Transform data (filter, merge, pivot, calculate)
  4. Set destination (Lakehouse or Warehouse table)
  5. Schedule refresh

Key advantage: Business analysts can build data pipelines without learning Spark or SQL.

See our Power Query guide for transformation techniques.

Migration from Azure Data Factory

If you're currently using Azure Data Factory (ADF):

What Migrates Easily

  • Copy Data activities (same interface)
  • Pipeline orchestration patterns
  • Most connectors are available
  • Scheduling and triggers

What Changes

  • Linked Services → Connections (simplified)
  • Integration Runtimes → managed within Fabric
  • Mapping Data Flows → Dataflows Gen2 (different engine)
  • Storage → OneLake replaces ADLS

Migration Approach

  1. Assess current ADF pipelines and prioritize by business value
  2. Recreate high-priority pipelines in Fabric Data Factory
  3. Test data quality and performance
  4. Transition scheduling and decommission ADF pipelines
  5. Retain ADF for unsupported scenarios (some connectors)
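The assessment step can be partially automated by triaging each pipeline's activity types. The helper below is a hypothetical sketch: the type names are ADF activity type identifiers as I understand them, and the buckets mirror the "migrates easily vs. changes" guidance above (Copy and orchestration activities move over; Mapping Data Flows, typed `ExecuteDataFlow`, need rework as Dataflows Gen2).

```python
# Hypothetical triage helper for an ADF migration assessment.
# Activity type names follow ADF's pipeline JSON; verify against your export.
EASY = {"Copy", "ForEach", "IfCondition", "Wait", "WebActivity"}
REWORK = {"ExecuteDataFlow"}  # Mapping Data Flows -> Dataflows Gen2

def triage_activities(activity_types):
    """Bucket ADF activity type names by expected migration effort."""
    buckets = {"easy": [], "rework": [], "review": []}
    for t in activity_types:
        if t in EASY:
            buckets["easy"].append(t)
        elif t in REWORK:
            buckets["rework"].append(t)
        else:
            buckets["review"].append(t)  # check connector/feature support
    return buckets

print(triage_activities(["Copy", "ExecuteDataFlow", "DatabricksNotebook"]))
```

Running this over an exported ADF pipeline inventory gives a first-pass effort estimate before any manual review.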

Best Practices

  1. Use Dataflows Gen2 for simple transformations — Don't over-engineer with pipelines when Power Query suffices
  2. Use pipelines for orchestration — Coordinate notebooks, stored procedures, and dataflows
  3. Implement medallion architecture — Bronze (raw) → Silver (cleaned) → Gold (business-ready)
  4. Monitor actively — Set up alerts for pipeline failures
  5. Version control — Use Fabric Git integration for pipeline version history

Our Microsoft Fabric consulting team specializes in data pipeline design and migration from Azure Data Factory. Contact us for a migration assessment.

Architecture Considerations

Selecting the right architecture pattern for your implementation determines long-term scalability, performance, and total cost of ownership. These architectural decisions should be made early and revisited quarterly as your environment evolves.

**Data Model Design**: Star schema is the foundation of every performant Power BI implementation. Separate your fact tables (transactions, events, measurements) from dimension tables (customers, products, dates, geography) and connect them through single-direction one-to-many relationships. Organizations that skip proper modeling and use flat, denormalized tables consistently report 3-5x slower query performance and significantly higher capacity costs.

**Storage Mode Selection**: Choose between Import, DirectQuery, Direct Lake, and Composite models based on your data freshness requirements and volume. Import mode delivers the fastest query performance but requires scheduled refreshes. DirectQuery provides real-time data but shifts compute to the source system. Direct Lake, available with Microsoft Fabric, combines the performance of Import with the freshness of DirectQuery by reading Delta tables directly from OneLake.

**Workspace Strategy**: Organize workspaces by business function (Sales Analytics, Finance Reporting, Operations Dashboard) rather than by technical role. Assign each workspace to the appropriate capacity tier based on usage patterns. Implement deployment pipelines for workspaces that support Dev/Test/Prod promotion to prevent untested changes from reaching business users.

**Gateway Architecture**: For hybrid environments connecting to on-premises data sources, deploy gateways in a clustered configuration across at least two servers for high availability. Size gateway servers based on concurrent refresh and DirectQuery load. Monitor gateway performance through the Power BI management tools and scale proactively when CPU utilization consistently exceeds 60%.

Security and Compliance Framework

Enterprise Power BI deployments in regulated industries must satisfy stringent security and compliance requirements. This framework, refined through implementations in healthcare (HIPAA), financial services (SOC 2, SEC), and government (FedRAMP), provides the controls necessary to pass audits and protect sensitive data.

**Authentication and Authorization**: Enforce Azure AD Conditional Access policies for Power BI access. Require multi-factor authentication for all users, restrict access from unmanaged devices, and block access from untrusted locations. Layer workspace-level access controls with item-level sharing permissions to implement least-privilege access across your entire Power BI environment.

**Data Protection**: Implement Microsoft Purview sensitivity labels on Power BI semantic models and reports containing confidential data. Labels enforce encryption, restrict export capabilities, and add visual markings that persist when content is exported or shared. Configure Data Loss Prevention policies to detect and prevent sharing of reports containing sensitive data patterns such as Social Security numbers, credit card numbers, or protected health information.
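To make "sensitive data patterns" concrete, here is a minimal sketch of pattern detection with regular expressions. These patterns are illustrative only; in production you would rely on Purview's built-in sensitive information types rather than hand-rolled regexes, which miss many formats and produce false positives.

```python
import re

# Illustrative patterns only; production DLP should use Purview's
# built-in sensitive information types, not hand-rolled regexes.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # e.g. 123-45-6789
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # 13-16 digit runs
}

def find_sensitive(text):
    """Return the sorted names of sensitive patterns found in text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))

print(find_sensitive("Customer SSN 123-45-6789 on file."))  # ['ssn']
```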

**Audit and Monitoring**: Enable unified audit logging in the Microsoft 365 compliance center to capture every Power BI action including report views, data exports, sharing events, and administrative changes. Export audit logs to your SIEM solution for correlation with other security events. Configure alerts for high-risk activities such as bulk data exports, sharing with external users, or privilege escalation. Our managed analytics services include continuous security monitoring as a standard capability.

**Data Residency**: For organizations with data sovereignty requirements, configure Power BI tenant settings to restrict data storage to specific geographic regions. Verify that your Premium or Fabric capacity is provisioned in the correct region and that cross-region data flows comply with your regulatory obligations.

Enterprise Best Practices

Every enterprise analytics deployment we have managed over the past 25 years reinforces the same truth: technology without governance and adoption strategy delivers a fraction of its potential value. These practices, refined across implementations in healthcare and government, are the ones that separate successful analytics programs from expensive shelf-ware.

  • Standardize Naming Conventions Across All Models: Every table, column, measure, and calculated column should follow a consistent naming convention documented in your style guide. Use business-friendly names (Total Revenue, not SUM_REV_AMT). Standardized naming improves Copilot accuracy by 40% and makes reports self-documenting for new team members joining the organization.
  • Implement Incremental Refresh for Large Datasets: For datasets exceeding 10 million rows, incremental refresh reduces processing time by 80-95% by only refreshing new and changed data. Configure partition boundaries based on your data update patterns and test thoroughly before deploying to production. This optimization alone can reduce your capacity consumption by half.
  • Design Mobile-First Dashboards: Over 35% of enterprise Power BI consumption now occurs on mobile devices. Design dedicated mobile layouts for every critical dashboard, prioritize the top 3-5 KPIs for small screens, and test on actual devices before publishing. Our dashboard development team creates responsive layouts optimized for every screen size used in your organization.
  • Establish Data Quality Gates at Every Pipeline Stage: Implement automated data quality checks that validate row counts, check for null values in key fields, verify referential integrity, and flag statistical outliers. Data quality gates catch issues before they reach executive dashboards and erode trust in the entire analytics platform.
  • Document Everything in a Living Data Dictionary: Maintain a data dictionary that defines every measure, its business context, its calculation logic, and its data source. Update the dictionary with every model change. Teams with comprehensive documentation onboard new analysts 60% faster and reduce measure duplication by 75% because developers can find existing calculations instead of rebuilding them.
  • Schedule Regular Architecture Reviews: Conduct quarterly reviews of your Power BI architecture with stakeholders from IT, business units, and leadership. Assess whether the current setup meets evolving requirements, identify performance bottlenecks, and plan capacity upgrades before they become urgent.
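The data quality gates described above can start very small. This is a minimal sketch in plain Python over rows as dicts, implementing two of the listed checks (row counts and null key fields); referential integrity and outlier checks would layer on top in the same style.

```python
# Minimal data quality gate: row-count and null-key checks over rows
# represented as dicts. Returns issues rather than raising, so callers
# can decide whether to fail the pipeline or just alert.

def quality_gate(rows, key_fields, min_rows=1):
    """Return a list of human-readable issues; an empty list means 'pass'."""
    issues = []
    if len(rows) < min_rows:
        issues.append(f"row count {len(rows)} below minimum {min_rows}")
    for i, row in enumerate(rows):
        for field in key_fields:
            if row.get(field) in (None, ""):
                issues.append(f"row {i}: null/empty key field '{field}'")
    return issues

rows = [{"order_id": "A1", "amount": 10.0},
        {"order_id": None, "amount": 5.0}]
print(quality_gate(rows, key_fields=["order_id"]))
```

Wired into a pipeline, a non-empty result from a gate like this would route execution down a failure branch before the load reaches Gold tables.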

ROI and Success Metrics

Tracking the right metrics ensures your Power BI investment delivers sustained business value rather than becoming another underutilized technology platform. Enterprises working with our analytics team measure success across these dimensions:

  • Time-to-insight reduction of 65-80% compared to legacy reporting workflows. Decisions that previously required 2-week report development cycles now happen in hours with interactive dashboards and natural language queries through Copilot.
  • Report proliferation reduction of 55% by consolidating redundant reports into governed, parameterized dashboards that serve multiple audiences. Fewer reports mean lower maintenance overhead and consistent data across the organization.
  • User satisfaction scores above 4.3 out of 5 in quarterly surveys when organizations follow structured onboarding, provide ongoing training, and maintain a responsive support model through their Center of Excellence.
  • Compliance audit preparation time cut by 50% through automated lineage documentation, row-level security enforcement, and centralized access logging in regulated industries. Auditors receive consistent, verifiable evidence without manual data gathering.
  • Capacity utilization optimization saving 20-35% on Premium or Fabric licensing by right-sizing workspaces, implementing query reduction techniques, and scheduling refreshes during off-peak hours based on actual usage telemetry.

Ready to build a Power BI environment that delivers measurable, sustained business value? Our consultants bring 25 years of enterprise analytics expertise to every engagement. Contact our team for a complimentary assessment and a roadmap designed for your organization.

Frequently Asked Questions

What is the difference between Data Factory in Fabric and Azure Data Factory?

Data Factory in Fabric is a simplified, SaaS version of Azure Data Factory integrated into the Fabric platform. It shares the same pipeline design interface but uses Fabric connections instead of linked services, stores output in OneLake instead of ADLS, and benefits from unified Fabric governance and billing. Azure Data Factory remains available as a standalone Azure service for organizations not yet on Fabric or needing specific ADF features not yet in Fabric.

When should I use Dataflows Gen2 vs Data Pipelines?

Use Dataflows Gen2 when: business analysts need to prepare data without coding, transformations are straightforward (filter, merge, calculate), and the output goes to a single Lakehouse or Warehouse table. Use Data Pipelines when: you need orchestration (coordinate multiple steps), require error handling with retries and branching, need to call notebooks or stored procedures, or are building complex multi-step ETL processes.

Can Data Factory in Fabric connect to on-premises data sources?

Yes, through the on-premises data gateway. Install the gateway on a Windows server with access to your on-premises databases (SQL Server, Oracle, SAP, file shares), then configure the connection in Fabric. The gateway acts as a secure bridge between your on-premises network and Fabric cloud. For enterprise deployments, configure gateway clustering for high availability with 2-3 gateway servers.

Tags: Data Factory, Microsoft Fabric, ETL, data pipelines, Dataflows Gen2, data engineering
