Data Quality in Microsoft Fabric

Ensure data reliability in Microsoft Fabric with quality rules, validation checks, profiling, and automated monitoring across your lakehouse and warehouse.

By Administrator

Data quality is the silent killer of analytics programs. When executives discover that the revenue number in their dashboard does not match the number in the finance system, trust in the entire analytics platform evaporates—and rebuilding that trust takes months. In Microsoft Fabric, data quality must be engineered into every layer of the data pipeline, from ingestion through transformation to reporting. This guide covers practical strategies for implementing data quality in Fabric using notebooks, dataflows, pipelines, and monitoring tools.

The Six Dimensions of Data Quality

Every data quality program should measure and track these six dimensions:

| Dimension | Definition | Example Failure | Business Impact |
|---|---|---|---|
| Accuracy | Data correctly represents real-world values | Customer revenue shows $1M but actual is $100K (decimal error) | Incorrect business decisions |
| Completeness | Required fields are populated | 30% of customer records missing email addresses | Incomplete analysis, failed campaigns |
| Consistency | Same data appears the same across systems | "New York" vs "NY" vs "new york" in different tables | Incorrect aggregations |
| Timeliness | Data is current and refreshed as expected | Sales dashboard shows data from 3 days ago, not yesterday | Decisions based on stale information |
| Uniqueness | No unexpected duplicates exist | Same order appears twice, doubling reported revenue | Inflated metrics |
| Validity | Data conforms to expected formats and ranges | Date field contains "13/32/2024" or age field shows -5 | Processing errors, report failures |

Implementing Quality Checks in Notebooks

Fabric notebooks with PySpark provide the most flexible data quality validation framework:

Schema Validation: After loading data, verify that expected columns exist with correct data types. Check for unexpected new columns (schema drift) and missing expected columns (breaking changes). Fail the pipeline if critical schema changes are detected.
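In a PySpark notebook the actual schema typically comes from `dict(df.dtypes)`; the comparison logic itself is plain Python. A minimal sketch, with an illustrative expected-schema mapping (column names and types are assumptions, not from the source):

```python
# Sketch of schema-drift detection. In a Fabric notebook, build `actual`
# from dict(df.dtypes); the comparison below is plain Python.

EXPECTED_SCHEMA = {            # illustrative expected columns and types
    "customer_id": "bigint",
    "order_date": "date",
    "amount": "decimal(18,2)",
}

def check_schema(expected, actual):
    """Compare expected vs. actual column/type mappings."""
    missing = [c for c in expected if c not in actual]          # breaking changes
    unexpected = [c for c in actual if c not in expected]       # schema drift
    type_mismatches = {
        c: (expected[c], actual[c])
        for c in expected
        if c in actual and expected[c] != actual[c]
    }
    return {
        "missing": missing,
        "unexpected": unexpected,
        "type_mismatches": type_mismatches,
        # Missing columns or changed types are critical; drift alone is a warning.
        "passed": not missing and not type_mismatches,
    }

actual = {"customer_id": "bigint", "order_date": "date",
          "amount": "string", "loyalty_tier": "string"}   # sample df.dtypes
result = check_schema(EXPECTED_SCHEMA, actual)
if not result["passed"]:
    # In a pipeline notebook you would raise here to fail the activity.
    print(f"Schema check failed: {result}")
```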

Null Checks: For each column with a business requirement for completeness, calculate the null percentage. Set thresholds: critical columns (customer ID, order date, amount) must be 100% complete. Important columns (email, phone) should be above 90%. Log null percentages to a quality metrics table for trending.
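The tiered-threshold evaluation can be sketched as follows. In PySpark the null counts would come from an aggregation such as `df.select([F.avg(F.col(c).isNull().cast("int")).alias(c) for c in cols])`; the rows and threshold values below are illustrative:

```python
# Sketch: per-column null percentages checked against tiered thresholds.
# Threshold values are illustrative, matching the critical/important tiers above.

THRESHOLDS = {                 # max allowed null fraction per column
    "customer_id": 0.0,        # critical: must be 100% complete
    "order_date": 0.0,
    "email": 0.10,             # important: at least 90% complete
}

def null_check(rows, thresholds):
    total = len(rows)
    results = {}
    for col, max_null in thresholds.items():
        nulls = sum(1 for r in rows if r.get(col) is None)
        pct = nulls / total if total else 0.0
        results[col] = {"null_pct": pct, "passed": pct <= max_null}
    return results

sample = [
    {"customer_id": 1, "order_date": "2024-05-01", "email": "a@example.com"},
    {"customer_id": 2, "order_date": "2024-05-01", "email": None},
]
# Each result row is also a candidate record for the quality metrics table.
print(null_check(sample, THRESHOLDS))
```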

Range Validation: Verify that numeric values fall within expected ranges. Order amounts should be positive and below a reasonable maximum. Dates should fall within expected windows. Percentages should be between 0 and 100. Flag outliers for review rather than silently deleting them.
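A sketch of the flag-don't-delete approach, with assumed bounds (the amount ceiling and date window are illustrative, not prescribed by Fabric):

```python
# Sketch: flag out-of-range rows instead of silently deleting them.
# In PySpark this would typically be a withColumn over the frame.

from datetime import date

MAX_ORDER_AMOUNT = 1_000_000   # assumed reasonable ceiling
MIN_ORDER_DATE = date(2000, 1, 1)

def flag_outliers(rows):
    flagged = []
    for r in rows:
        reasons = []
        if not (0 < r["amount"] <= MAX_ORDER_AMOUNT):
            reasons.append("amount_out_of_range")
        if not (MIN_ORDER_DATE <= r["order_date"] <= date.today()):
            reasons.append("date_out_of_window")
        # Keep the row; downstream logic can filter or route flagged rows for review.
        flagged.append({**r, "quality_flags": reasons})
    return flagged

rows = [{"amount": -5, "order_date": date(2024, 5, 1)}]
print(flag_outliers(rows))
```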

Referential Integrity: Verify that foreign keys in fact tables have matching records in dimension tables. Orphaned foreign keys indicate data loading order issues or source system problems. Calculate the orphan rate and alert if it exceeds threshold.
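In PySpark an orphan check is usually a `left_anti` join from the fact table to the dimension; the rate calculation is the same either way. A minimal sketch with an assumed threshold:

```python
# Sketch: orphan-rate check between fact-table foreign keys and a dimension.

def orphan_rate(fact_keys, dim_keys):
    """Fraction of fact foreign keys with no matching dimension record."""
    dims = set(dim_keys)
    orphans = [k for k in fact_keys if k not in dims]
    return len(orphans) / len(fact_keys) if fact_keys else 0.0

ORPHAN_THRESHOLD = 0.01        # illustrative: alert above 1%

rate = orphan_rate([101, 102, 999], [101, 102, 103])
if rate > ORPHAN_THRESHOLD:
    # In a pipeline this would write to the metrics table and trigger an alert.
    print(f"Orphan rate {rate:.1%} exceeds threshold")
```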

Duplicate Detection: Check for duplicate primary keys or business keys. In slowly changing dimension scenarios, verify that only one current record exists per business key. Deduplicate or flag duplicates based on business rules (keep latest, keep first, merge).
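The detect-and-dedupe logic can be sketched as below. In PySpark, keep-latest is typically a window function over the business key ordered by a timestamp; the key and timestamp column names here are illustrative:

```python
# Sketch: find duplicate business keys, then apply a keep-latest rule.

from collections import Counter

def find_duplicates(rows, key="order_id"):
    counts = Counter(r[key] for r in rows)
    return [k for k, n in counts.items() if n > 1]

def keep_latest(rows, key="order_id", ts="updated_at"):
    latest = {}
    for r in rows:
        if r[key] not in latest or r[ts] > latest[r[key]][ts]:
            latest[r[key]] = r
    return list(latest.values())

rows = [
    {"order_id": 1, "updated_at": "2024-05-01"},
    {"order_id": 1, "updated_at": "2024-05-02"},
    {"order_id": 2, "updated_at": "2024-05-01"},
]
print(find_duplicates(rows))   # order 1 appears twice
print(keep_latest(rows))       # keeps the most recent record per key
```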

Quality Checks in Dataflows Gen2

Dataflow Gen2 provides a visual, low-code approach to data quality suitable for business-analyst-maintained transformations:

Remove Errors Step: Apply error removal on columns where Power Query detects type conversion failures. Review removed errors periodically to ensure they are truly invalid rather than legitimate data that fails type inference.

Conditional Column Validation: Create conditional columns that flag rows failing business rules. For example, add a "Quality Flag" column that marks rows where OrderDate is in the future, Amount is negative, or CustomerID is null.

Reference Table Lookups: Merge with reference tables to validate categorical values. Join customer records against a valid country code table—unmatched rows indicate invalid country values.

Automated Quality Pipelines

Build quality validation directly into your Data Factory pipelines:

Quality Gate Pattern: After each transformation step, run a quality validation notebook. If critical quality metrics fail, the pipeline stops before loading bad data into the gold layer. If non-critical metrics fail, the pipeline continues but logs warnings for review.
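The gate itself is a small piece of logic: raise on critical failures so the pipeline activity fails, log warnings otherwise. A hedged sketch (check names and severities are illustrative; in a Fabric notebook you would raise, or return a status via the notebook exit value, to signal the pipeline):

```python
# Sketch of a quality-gate step: stop on critical failures, warn otherwise.

class QualityGateError(Exception):
    """Raised to fail the notebook activity and halt the pipeline."""

def run_quality_gate(check_results):
    """check_results: list of dicts with keys name, passed, critical."""
    critical_failures = [c for c in check_results
                         if not c["passed"] and c["critical"]]
    warnings = [c for c in check_results
                if not c["passed"] and not c["critical"]]
    for w in warnings:
        # Non-critical: log and continue; surfaces in the daily summary.
        print(f"WARNING: {w['name']} failed (non-critical)")
    if critical_failures:
        names = ", ".join(c["name"] for c in critical_failures)
        raise QualityGateError(f"Critical quality checks failed: {names}")

run_quality_gate([
    {"name": "null_check_email", "passed": False, "critical": False},
    {"name": "schema_check", "passed": True, "critical": True},
])
```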

Quality Metrics Table: Create a dedicated quality metrics table in your Lakehouse that logs the results of every quality check: timestamp, table name, check name, result (pass/fail), metric value, threshold, and details. This table becomes the data source for your quality monitoring dashboard.
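Each check produces one row with a consistent shape. A sketch of the row builder (column and table names are illustrative; in a Lakehouse you would append these rows as a Delta table, e.g. via `spark.createDataFrame(...).write.mode("append").saveAsTable(...)`):

```python
# Sketch: building rows for a quality metrics table. This version assumes
# lower-is-better metrics (e.g. null percentage vs. a max threshold).

from datetime import datetime, timezone

def metrics_row(table, check, value, threshold, details=""):
    return {
        "check_timestamp": datetime.now(timezone.utc).isoformat(),
        "table_name": table,
        "check_name": check,
        "metric_value": value,
        "threshold": threshold,
        "result": "pass" if value <= threshold else "fail",
        "details": details,
    }

row = metrics_row("sales_orders", "null_pct_email", 0.12, 0.10,
                  "email completeness below 90%")
print(row["result"])
```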

Alerting on Failures: Configure Data Activator or Power Automate to send alerts when quality checks fail. Critical failures (broken referential integrity, missing tables) should page the on-call data engineer. Non-critical failures (completeness below 95%) should send daily summary emails.

Data Freshness Monitoring

Stale data is a quality failure that users notice immediately:

Last Refresh Tracking: For each dataset, track the timestamp of the last successful refresh. Display refresh timestamps prominently in report headers so users know how current their data is.

Freshness SLAs: Define freshness Service Level Agreements for each dataset. Executive dashboards might require data from the previous business day by 7 AM. Operational dashboards might require hourly freshness. Monitor against SLAs and alert on violations.
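The SLA comparison can be sketched as below. The dataset names and SLA windows are illustrative; the last-refresh timestamps would come from your pipeline logs or refresh tracking table:

```python
# Sketch: checking last refresh times against per-dataset freshness SLAs.

from datetime import datetime, timedelta, timezone

SLAS = {                                           # illustrative SLA windows
    "executive_dashboard": timedelta(hours=24),    # previous business day
    "operational_dashboard": timedelta(hours=1),   # hourly freshness
}

def freshness_violations(last_refresh, now=None):
    """Return dataset names whose data is older than their SLA allows."""
    now = now or datetime.now(timezone.utc)
    return [
        name for name, sla in SLAS.items()
        if name in last_refresh and now - last_refresh[name] > sla
    ]

now = datetime(2024, 5, 2, 9, 0, tzinfo=timezone.utc)
refreshes = {
    "executive_dashboard": now - timedelta(hours=30),     # stale
    "operational_dashboard": now - timedelta(minutes=20), # fresh
}
print(freshness_violations(refreshes, now))   # ["executive_dashboard"]
```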

Dependency Chain Monitoring: Track the full dependency chain. If a source system feed is 4 hours late, every downstream dataset is also late. Monitor source system delivery times alongside Fabric processing times to identify the root cause of freshness issues.

Quality Dashboards

Build a dedicated Power BI report that tracks quality metrics:

Executive Summary Page: Overall quality score (weighted average across dimensions), trend over time, count of active quality issues, and top 5 tables with quality problems.

Detailed Quality Page: Per-table quality metrics, drill-through to specific quality check results, historical trend for each metric, and comparison against SLA thresholds.

Freshness Page: Last refresh time for every dataset, freshness SLA compliance, refresh duration trends, and failed refresh history.

Best Practices

  • Define quality rules collaboratively with business users—they know what "correct" data looks like
  • Fix quality issues at the source whenever possible rather than patching in the analytics layer
  • Document known data limitations prominently (in report subtitles or a dedicated data quality page)
  • Review quality metrics weekly in a standing meeting with data stewards from each business domain
  • Treat data quality as a continuous process, not a one-time project

Frequently Asked Questions

Does Fabric have built-in data quality tools?

Fabric provides data quality capabilities through notebooks, dataflows, and Data Activator for monitoring. Additional governance features are available through Purview integration for enterprise data catalog and quality tracking.

How do I monitor data freshness in Fabric?

Use Data Activator to create triggers on data timestamps, set up monitoring dashboards tracking last refresh times, and configure alerts when data exceeds expected staleness thresholds.
