
Data Quality in Microsoft Fabric
Ensure data reliability in Microsoft Fabric with quality rules, validation checks, profiling, and automated monitoring across your lakehouse and warehouse.
Data quality is the silent killer of analytics programs. When executives discover that the revenue number in their dashboard does not match the number in the finance system, trust in the entire analytics platform evaporates—and rebuilding that trust takes months. In Microsoft Fabric, data quality must be engineered into every layer of the data pipeline, from ingestion through transformation to reporting. This guide covers the frameworks, tools, and implementation patterns for building data quality into your Fabric deployment from day one. Our Microsoft Fabric consulting team implements data quality frameworks for enterprises where inaccurate data has regulatory, financial, or operational consequences.
I have been building enterprise data platforms for over 25 years, and I have seen more analytics programs fail due to data quality issues than any other single cause—including budget cuts, technology problems, and organizational resistance. The pattern is always the same: the platform launches with excitement, users discover discrepancies between the dashboard and their trusted spreadsheets, word spreads that "the data is wrong," and adoption collapses. Preventing this requires a systematic approach to data quality that goes far beyond basic null checks.
The Six Dimensions of Data Quality
Data quality is not a binary state. It is measured across six dimensions, each requiring specific validation strategies:
| Dimension | Definition | Example Failure | Fabric Validation Tool |
|---|---|---|---|
| Accuracy | Data correctly reflects real-world values | Customer revenue shows $1M when actual is $1.2M | Row-level validation rules in notebooks |
| Completeness | No missing values where required | 15% of orders missing region data | NOT NULL constraints, completeness checks |
| Consistency | Same data matches across systems | Revenue in Fabric differs from ERP by $50K | Cross-system reconciliation queries |
| Timeliness | Data is available when needed | Dashboard shows yesterday's data at 2 PM instead of 7 AM | Pipeline SLA monitoring |
| Uniqueness | No unintended duplicate records | Same customer appears 3 times with different IDs | Deduplication logic in Silver layer |
| Validity | Data conforms to defined rules and formats | Email field contains "N/A" instead of valid email | Schema validation, regex checks |
Building a Data Quality Framework in Fabric
Layer 1: Ingestion Validation (Bronze Layer)
The medallion architecture provides a natural framework for progressive data quality improvement. At the Bronze layer, focus on raw data integrity:
| Validation Check | Implementation | Action on Failure |
|---|---|---|
| Schema validation | Compare incoming schema against expected schema | Reject load, alert pipeline owner |
| Row count verification | Compare source row count with ingested count | Alert if delta exceeds threshold |
| File format validation | Verify file type, encoding, delimiter | Reject malformed files |
| Freshness check | Compare file timestamp against expected schedule | Alert if data is stale |
| Duplicate file detection | Hash-based detection of re-delivered files | Skip or flag for review |
| Completeness threshold | Verify critical columns are not >5% null | Flag batch for investigation |
Implement these checks in Data Factory pipelines or Spark notebooks that run as the first step after data lands in OneLake. The key principle at Bronze is: do not block ingestion, but flag quality issues for investigation. Store raw data as-is and attach quality metadata.
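The checks in the table above can be sketched as a single notebook function. This is a minimal plain-Python sketch, not the actual pipeline code: the expected schema, column names, and thresholds are illustrative assumptions, and in a real deployment the same logic would run against Spark DataFrames in the first notebook step after data lands in OneLake.

```python
from datetime import datetime, timezone

# Hypothetical schema contract for an incoming orders feed (an assumption
# for illustration, not from the article): column name -> type name.
EXPECTED_SCHEMA = {"order_id": "string", "amount": "double", "region": "string"}

def bronze_checks(incoming_schema, source_rows, ingested_rows,
                  file_timestamp, expected_by, null_pct_by_column,
                  row_delta_threshold=0.01, null_threshold=0.05):
    """Run non-blocking Bronze-layer checks and return quality metadata."""
    issues = []

    # Schema validation: incoming columns/types must match the contract.
    if incoming_schema != EXPECTED_SCHEMA:
        issues.append("schema_mismatch")

    # Row count verification: alert if the source/ingested delta exceeds threshold.
    if source_rows and abs(source_rows - ingested_rows) / source_rows > row_delta_threshold:
        issues.append("row_count_delta")

    # Freshness check: the file must arrive by the expected schedule.
    if file_timestamp > expected_by:
        issues.append("stale_delivery")

    # Completeness threshold: critical columns must not exceed 5% null.
    for col, pct in null_pct_by_column.items():
        if pct > null_threshold:
            issues.append(f"completeness:{col}")

    # Bronze principle: never block ingestion -- attach quality metadata instead.
    return {"checked_at": datetime.now(timezone.utc).isoformat(),
            "status": "WARN" if issues else "PASS",
            "issues": issues}
```

Note that the function never raises on failure: in line with the Bronze principle, it returns a metadata record that travels with the raw data rather than rejecting the load.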
Layer 2: Transformation Quality (Silver Layer)
The Silver layer is where data quality rules are actively enforced:
| Quality Rule Category | Examples | Implementation |
|---|---|---|
| Null handling | Replace nulls with business defaults, flag nulls in critical fields | COALESCE() in SQL, fillna() in PySpark |
| Type casting | Enforce correct data types (dates, numbers, strings) | CAST/CONVERT in SQL, schema enforcement in Spark |
| Referential integrity | Every fact record has matching dimension records | LEFT ANTI JOIN to identify orphan records |
| Business rule validation | Revenue >= 0, dates within valid range, status in allowed values | CHECK-style validation queries |
| Deduplication | Remove exact duplicates, merge near-duplicates | ROW_NUMBER() OVER (PARTITION BY key ORDER BY updated_date DESC) |
| Standardization | Consistent date formats, trimmed strings, uppercase codes | UPPER(), TRIM(), FORMAT_DATE() |
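Two of the rules above, deduplication and standardization, can be sketched in a few lines. This is the same keep-the-latest-record logic as `ROW_NUMBER() OVER (PARTITION BY key ORDER BY updated_date DESC)` from the table, expressed in plain Python so the sketch stays self-contained; the record fields (`customer_id`, `updated_date`, `region_code`) are illustrative assumptions.

```python
def dedupe_latest(records, key="customer_id", order_by="updated_date"):
    """Keep one record per business key: the one with the latest order_by value
    (the Python analogue of ROW_NUMBER() ... ORDER BY updated_date DESC = 1)."""
    latest = {}
    for r in records:
        k = r[key]
        if k not in latest or r[order_by] > latest[k][order_by]:
            latest[k] = r
    return list(latest.values())

def standardize(record):
    """Standardization rules from the table: trimmed strings, uppercase codes."""
    out = dict(record)
    out["region_code"] = out["region_code"].strip().upper()
    return out
```

In a Fabric notebook the equivalent would typically be a Spark window function or a `dropDuplicates` call; the point here is the rule itself, not the engine.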
Best practice: Create a dedicated quality validation notebook or stored procedure that runs after every Silver layer transformation and writes its results to a quality_audit table.
The quality_audit table should capture: rule_name, layer, table_name, check_timestamp, records_checked, records_failed, failure_rate, status (PASS/WARN/FAIL). This creates a historical record of data quality that you can trend over time and use as compliance evidence.
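A minimal sketch of building one quality_audit row with exactly those columns. The WARN/FAIL thresholds here (1% and 5% failure rates) are illustrative assumptions to tune per domain, not values from the article.

```python
from datetime import datetime, timezone

def audit_row(rule_name, layer, table_name, records_checked, records_failed,
              warn_at=0.01, fail_at=0.05):
    """Build one quality_audit record; thresholds are illustrative assumptions."""
    rate = records_failed / records_checked if records_checked else 0.0
    if rate <= warn_at:
        status = "PASS"
    elif rate <= fail_at:
        status = "WARN"
    else:
        status = "FAIL"
    return {"rule_name": rule_name, "layer": layer, "table_name": table_name,
            "check_timestamp": datetime.now(timezone.utc).isoformat(),
            "records_checked": records_checked, "records_failed": records_failed,
            "failure_rate": round(rate, 4), "status": status}
```

Appending these rows to a Delta table after every run is what makes the trend analysis and compliance evidence described above possible.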
Layer 3: Business Quality (Gold Layer)
Gold layer quality focuses on business accuracy—does the data tell the correct story?
| Business Quality Check | Validation Approach | Frequency |
|---|---|---|
| Cross-system reconciliation | Compare Fabric totals against source system reports | Daily |
| KPI threshold validation | Verify KPIs fall within expected ranges | Every refresh |
| Historical consistency | Compare current period against same period last year (flag anomalies) | Weekly |
| Aggregation integrity | Detail records sum to summary totals | Every refresh |
| Business rule compliance | Industry-specific rules (HIPAA, SOX, etc.) | Daily |
| Trend break detection | Statistical anomaly detection on key metrics | Daily |
Cross-system reconciliation is the most important Gold layer quality check. Build automated reconciliation queries that compare Fabric output against the source of record. For financial data, this means comparing Fabric revenue totals against the ERP general ledger. For healthcare data, this means comparing patient counts against the EMR system. Discrepancies must be investigated and resolved before the data reaches dashboards.
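A reconciliation check reduces to comparing per-key aggregates between the two systems within a tolerance. This sketch assumes totals keyed by GL account and a 0.1% relative tolerance, both illustrative; in practice the two dictionaries would come from a Fabric SQL query and a source-system extract.

```python
def reconcile(fabric_totals, source_totals, tolerance=0.001):
    """Compare Fabric aggregates against the source of record per key
    (e.g. revenue per GL account). Returns the out-of-tolerance deltas."""
    variances = {}
    for key, src in source_totals.items():
        fab = fabric_totals.get(key, 0.0)
        delta = fab - src
        # Flag any key whose relative variance exceeds the tolerance.
        if src and abs(delta) / abs(src) > tolerance:
            variances[key] = delta
    return variances  # an empty dict means the systems reconcile
```

An empty result lets the pipeline proceed; a non-empty one should block promotion to Gold and page the data steward, per the rule that discrepancies are resolved before data reaches dashboards.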
Data Quality Monitoring Dashboard
Build a dedicated data quality monitoring dashboard in Power BI that tracks quality metrics across all layers:
| Dashboard Component | Metric | Visual Type |
|---|---|---|
| Overall quality score | Weighted average across all dimensions | Card with trend sparkline |
| Quality by table | Pass/Warn/Fail status per table | Heat map matrix |
| Quality trend | Daily quality score over 30 days | Line chart |
| Failure details | Specific failed checks with record counts | Detail table with drill-through |
| SLA compliance | Percentage of pipelines completing on time | Gauge visual |
| Reconciliation variance | Source vs. Fabric delta per system | Bar chart |
Pin this dashboard to your Fabric monitoring hub and set up data alerts for critical quality threshold breaches. Configure Power Automate flows to notify data stewards when quality scores drop below acceptable thresholds.
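The "overall quality score" card above is a weighted average across the six dimensions. A sketch of that calculation, with illustrative weights (an assumption to tune per organization; in practice this would usually be a DAX measure rather than Python):

```python
# Illustrative weights per quality dimension -- an assumption, not a standard.
WEIGHTS = {"accuracy": 0.30, "completeness": 0.20, "consistency": 0.20,
           "timeliness": 0.10, "uniqueness": 0.10, "validity": 0.10}

def overall_score(dimension_scores, weights=WEIGHTS):
    """Weighted average of per-dimension pass rates, each in [0.0, 1.0]."""
    return sum(weights[d] * dimension_scores[d] for d in weights)
```

Weighting accuracy and consistency most heavily reflects the article's emphasis: those are the dimensions whose failures destroy executive trust fastest.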
Implementing Data Quality with Fabric-Specific Tools
Great Expectations Integration
Great Expectations is a leading open-source data quality framework that integrates well with Fabric Spark notebooks:
| Great Expectations Feature | Fabric Implementation | Value |
|---|---|---|
| Expectation suites | Define quality rules as code | Version-controlled quality definitions |
| Checkpoints | Automated validation runs | Pipeline-integrated quality gates |
| Data docs | Auto-generated quality documentation | Self-service quality transparency |
| Custom expectations | Business-specific validation rules | Industry-specific quality checks |
Install Great Expectations in your Fabric Spark environment and create expectation suites for each Bronze, Silver, and Gold layer table. Checkpoints run as pipeline steps and produce pass/fail results that gate downstream processing.
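To show the expectations-as-code idea without depending on a particular Great Expectations version, here is a plain-Python sketch of a suite plus a checkpoint-style gate. This is deliberately *not* the real Great Expectations API (which provides these concepts as `ExpectationSuite` and `Checkpoint` objects); the column names and rules are illustrative.

```python
import re

# A "suite" of named rules, analogous to an expectation suite: each entry is
# (rule_name, predicate over one row). Columns are illustrative assumptions.
suite = [
    ("order_id_not_null", lambda row: row["order_id"] is not None),
    ("amount_non_negative", lambda row: row["amount"] >= 0),
    ("email_valid",
     lambda row: re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+", row["email"]) is not None),
]

def run_checkpoint(rows, suite):
    """Validate every row against every rule; the boolean gates downstream steps."""
    failures = {name: 0 for name, _ in suite}
    for row in rows:
        for name, rule in suite:
            if not rule(row):
                failures[name] += 1
    passed = all(count == 0 for count in failures.values())
    return passed, failures
```

Because the rules live in code, they can be version-controlled and reviewed like any other pipeline change, which is the core value the table above attributes to expectation suites.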
SQL-Based Quality Checks in Fabric Warehouse
For organizations with strong SQL skills, implement quality checks directly in the Fabric SQL warehouse:
Create stored procedures that validate data quality rules and insert results into a centralized quality_results table. Schedule these procedures to run after each ETL cycle using Data Factory pipelines.
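The pattern can be sketched with SQLite standing in for the Fabric SQL warehouse (there, the same logic would live in a T-SQL stored procedure scheduled by Data Factory). The table names, columns, and rules are illustrative assumptions; the predicate string is assumed to come from a trusted rule catalog, not user input.

```python
import sqlite3

# In-memory SQLite as a stand-in for the warehouse, with sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_orders (order_id TEXT, region TEXT, amount REAL);
    CREATE TABLE quality_results (rule_name TEXT, records_checked INTEGER,
                                  records_failed INTEGER, status TEXT);
    INSERT INTO fact_orders VALUES ('A1', 'EMEA', 120.0),
                                   ('A2', NULL, 80.0),
                                   ('A3', 'AMER', -5.0);
""")

def run_sql_check(rule_name, fail_predicate):
    """Count failing rows set-based and log the outcome to quality_results."""
    checked = conn.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
    failed = conn.execute(
        f"SELECT COUNT(*) FROM fact_orders WHERE {fail_predicate}").fetchone()[0]
    status = "PASS" if failed == 0 else "FAIL"
    conn.execute("INSERT INTO quality_results VALUES (?, ?, ?, ?)",
                 (rule_name, checked, failed, status))
    return status

run_sql_check("region_not_null", "region IS NULL")
run_sql_check("amount_non_negative", "amount < 0")
```

The key design point is that each check is a single set-based query, so it scales with the warehouse engine rather than row-by-row application code.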
Fabric Data Activator for Real-Time Quality Alerts
Data Activator (Reflex) provides real-time quality monitoring:
- Monitor quality metric streams in real time
- Trigger alerts when quality scores drop below thresholds
- Automatically pause downstream pipelines when critical quality issues are detected
- Notify data stewards via Teams or email
This is particularly valuable for streaming data scenarios where traditional batch quality checks are insufficient.
Data Quality Governance Structure
Technical tools alone do not solve data quality. You need an organizational structure:
| Role | Responsibility | Accountability |
|---|---|---|
| Data Steward | Define quality rules, investigate failures, approve fixes | Per-domain quality ownership |
| Data Engineer | Implement quality checks, build monitoring pipelines | Technical implementation |
| Data Owner | Approve data definitions, resolve business disputes | Final authority on "correct" values |
| Platform Admin | Monitor infrastructure quality (SLAs, freshness) | Operational quality |
| Quality Champion | Drive quality culture, train users, report metrics | Cross-domain coordination |
Assign data stewards to each business domain (Finance, Sales, HR, Operations). Each steward is responsible for defining quality rules, reviewing quality dashboards, and investigating failures within their domain. This distributed ownership model scales better than a centralized data quality team.
Common Data Quality Mistakes in Fabric
Mistake 1: Checking quality only at ingestion
Data quality can degrade at any layer. Transformations introduce bugs, business rule changes are not reflected in validation, and source systems change without notice. Check quality at every layer.
Mistake 2: No reconciliation against source systems
If your Fabric output does not reconcile to the source of record, you do not know whether your data is correct. Build reconciliation checks for every critical data domain.
Mistake 3: Ignoring quality for "non-critical" tables
Dimension tables that are "just lookups" can have quality issues that cascade into fact table joins, producing wrong numbers. Validate all tables, not just the most visible ones.
Mistake 4: Quality checks that never fail
If your quality checks have a 100% pass rate for months, your thresholds are too lenient. Tighten thresholds gradually until checks catch real issues.
Mistake 5: No quality trend tracking
A single quality snapshot is useful. Quality trends over time are transformative: they show whether your data platform is improving or degrading. Store every quality check result and build trend dashboards.
Data Quality for Regulated Industries
Organizations in regulated industries have additional quality requirements:
| Industry | Quality Requirement | Fabric Implementation |
|---|---|---|
| Healthcare | HIPAA data integrity | Audit trails on every data modification, reconciliation against EMR |
| Financial Services | SOX compliance | Automated reconciliation, segregation of duties in data access |
| Government | FedRAMP data protection | Encryption validation, access logging, quality documentation |
| Life Sciences | FDA 21 CFR Part 11 | Validation protocols, electronic signatures, audit trails |
For these industries, data quality documentation is not optional—it is a compliance deliverable. Automate quality documentation generation using Power BI Helper or custom scripts that export quality audit results to PDF format for regulatory evidence.
Measuring Data Quality ROI
| Metric | Before Quality Framework | After Quality Framework | Business Impact |
|---|---|---|---|
| Dashboard trust score (user survey) | 3.2/5 | 4.5/5 | Higher adoption and usage |
| Ad-hoc data investigation requests | 25/month | 5/month | 80% reduction in analyst time spent on data issues |
| Data discrepancy incidents | 12/month | 2/month | Fewer executive escalations |
| Report revision cycles | 3 per report | 1 per report | Faster time to insight |
| Regulatory audit findings | 3-5 per audit | 0-1 per audit | Reduced compliance risk |
Getting Started with Data Quality in Fabric
- Week 1-2: Inventory your data sources and identify the 10 most critical tables
- Week 3-4: Define quality rules for each dimension (accuracy, completeness, etc.)
- Week 5-6: Implement Bronze and Silver layer quality checks
- Week 7-8: Build the quality monitoring dashboard and configure alerts
- Month 3: Add Gold layer reconciliation checks and trend tracking
- Ongoing: Review quality metrics monthly, tighten thresholds quarterly
For organizations that need expert data quality implementation, our Fabric consulting team provides data quality assessment, framework implementation, and ongoing monitoring. We also offer data analytics consulting that includes data quality as a core component of every engagement. Contact us to discuss your data quality strategy.
Frequently Asked Questions
Does Fabric have built-in data quality tools?
Fabric provides data quality capabilities through notebooks, dataflows, and Data Activator for monitoring. Additional governance features are available through Purview integration for enterprise data catalog and quality tracking.
How do I monitor data freshness in Fabric?
Use Data Activator to create triggers on data timestamps, set up monitoring dashboards tracking last refresh times, and configure alerts when data exceeds expected staleness thresholds.