Data Quality in Microsoft Fabric

Ensure data reliability in Microsoft Fabric with quality rules, validation checks, profiling, and automated monitoring across your lakehouse and warehouse.

By Errin O'Connor, Chief AI Architect

Data quality is the silent killer of analytics programs. When executives discover that the revenue number in their dashboard does not match the number in the finance system, trust in the entire analytics platform evaporates—and rebuilding that trust takes months. In Microsoft Fabric, data quality must be engineered into every layer of the data pipeline, from ingestion through transformation to reporting. This guide covers the frameworks, tools, and implementation patterns for building data quality into your Fabric deployment from day one. Our Microsoft Fabric consulting team implements data quality frameworks for enterprises where inaccurate data has regulatory, financial, or operational consequences.

I have been building enterprise data platforms for over 25 years, and I have seen more analytics programs fail due to data quality issues than any other single cause—including budget cuts, technology problems, and organizational resistance. The pattern is always the same: the platform launches with excitement, users discover discrepancies between the dashboard and their trusted spreadsheets, word spreads that "the data is wrong," and adoption collapses. Preventing this requires a systematic approach to data quality that goes far beyond basic null checks.

The Six Dimensions of Data Quality

Data quality is not a binary state. It is measured across six dimensions, each requiring specific validation strategies:

| Dimension | Definition | Example Failure | Fabric Validation Tool |
| --- | --- | --- | --- |
| Accuracy | Data correctly reflects real-world values | Customer revenue shows $1M when actual is $1.2M | Row-level validation rules in notebooks |
| Completeness | No missing values where required | 15% of orders missing region data | NOT NULL constraints, completeness checks |
| Consistency | Same data matches across systems | Revenue in Fabric differs from ERP by $50K | Cross-system reconciliation queries |
| Timeliness | Data is available when needed | Dashboard shows yesterday's data at 2 PM instead of 7 AM | Pipeline SLA monitoring |
| Uniqueness | No unintended duplicate records | Same customer appears 3 times with different IDs | Deduplication logic in Silver layer |
| Validity | Data conforms to defined rules and formats | Email field contains "N/A" instead of valid email | Schema validation, regex checks |

Building a Data Quality Framework in Fabric

Layer 1: Ingestion Validation (Bronze Layer)

The medallion architecture provides a natural framework for progressive data quality improvement. At the Bronze layer, focus on raw data integrity:

| Validation Check | Implementation | Action on Failure |
| --- | --- | --- |
| Schema validation | Compare incoming schema against expected schema | Reject load, alert pipeline owner |
| Row count verification | Compare source row count with ingested count | Alert if delta exceeds threshold |
| File format validation | Verify file type, encoding, delimiter | Reject malformed files |
| Freshness check | Compare file timestamp against expected schedule | Alert if data is stale |
| Duplicate file detection | Hash-based detection of re-delivered files | Skip or flag for review |
| Completeness threshold | Verify critical columns are not >5% null | Flag batch for investigation |

Implement these checks in Data Factory pipelines or Spark notebooks that run as the first step after data lands in OneLake. The key principle at Bronze is: do not block ingestion, but flag quality issues for investigation. Store raw data as-is and attach quality metadata.
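A minimal sketch of this "flag, don't block" pattern in plain Python (in Fabric this logic would typically live in a Spark notebook as the first step after data lands in OneLake; the column names, thresholds, and check names here are illustrative assumptions):

```python
# Sketch of Bronze-layer ingestion checks. Each check returns quality
# metadata rather than raising, so ingestion is never blocked and the
# results can be attached to the batch for investigation.

def check_schema(actual_columns, expected_columns):
    """Schema validation: flag missing or unexpected columns."""
    missing = set(expected_columns) - set(actual_columns)
    unexpected = set(actual_columns) - set(expected_columns)
    return {"check": "schema_validation",
            "status": "FAIL" if missing or unexpected else "PASS",
            "missing": sorted(missing), "unexpected": sorted(unexpected)}

def check_row_count(source_count, ingested_count, threshold=0.01):
    """Row count verification: warn if the delta exceeds a threshold."""
    delta = abs(source_count - ingested_count) / max(source_count, 1)
    return {"check": "row_count",
            "status": "WARN" if delta > threshold else "PASS",
            "delta_pct": round(delta * 100, 2)}

def check_completeness(rows, column, max_null_pct=5.0):
    """Completeness threshold: critical column must not be >5% null."""
    nulls = sum(1 for r in rows if r.get(column) is None)
    null_pct = 100.0 * nulls / max(len(rows), 1)
    return {"check": f"completeness_{column}",
            "status": "WARN" if null_pct > max_null_pct else "PASS",
            "null_pct": round(null_pct, 1)}
```

Note that only the schema check can produce FAIL; the others produce WARN, matching the Bronze principle of flagging rather than rejecting.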

Layer 2: Transformation Quality (Silver Layer)

The Silver layer is where data quality rules are actively enforced:

| Quality Rule Category | Examples | Implementation |
| --- | --- | --- |
| Null handling | Replace nulls with business defaults, flag nulls in critical fields | COALESCE() in SQL, fillna() in PySpark |
| Type casting | Enforce correct data types (dates, numbers, strings) | CAST/CONVERT in SQL, schema enforcement in Spark |
| Referential integrity | Every fact record has matching dimension records | LEFT ANTI JOIN to identify orphan records |
| Business rule validation | Revenue >= 0, dates within valid range, status in allowed values | CHECK-style validation queries |
| Deduplication | Remove exact duplicates, merge near-duplicates | ROW_NUMBER() OVER (PARTITION BY key ORDER BY updated_date DESC) |
| Standardization | Consistent date formats, trimmed strings, uppercase codes | UPPER(), TRIM(), FORMAT_DATE() |
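The ROW_NUMBER() deduplication pattern has a direct plain-Python equivalent, sketched below for illustration (the key and timestamp field names are hypothetical): partition by business key, order by update timestamp descending, keep only the most recent row per key.

```python
# Keep the latest row per business key -- the same logic as
# ROW_NUMBER() OVER (PARTITION BY key ORDER BY updated_date DESC)
# followed by filtering to row number 1.

def deduplicate_latest(rows, key="customer_id", order_by="updated_date"):
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())
```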

Best practice: Create a dedicated quality validation notebook or stored procedure that runs after every Silver layer transformation. Output results to a quality_audit table:

The quality_audit table should capture: rule_name, layer, table_name, check_timestamp, records_checked, records_failed, failure_rate, status (PASS/WARN/FAIL). This creates a historical record of data quality that you can trend over time and use for compliance evidence.
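As a sketch, a helper like the following could build one such audit row (the WARN/FAIL thresholds here are illustrative assumptions; in Fabric the record would be appended to a Delta table rather than returned as a dict):

```python
from datetime import datetime, timezone

# Build one quality_audit record with the columns described above.
# Thresholds are example values -- tune them per rule and table.

def audit_record(rule_name, layer, table_name, records_checked,
                 records_failed, warn_threshold=0.01, fail_threshold=0.05):
    failure_rate = records_failed / max(records_checked, 1)
    if failure_rate > fail_threshold:
        status = "FAIL"
    elif failure_rate > warn_threshold:
        status = "WARN"
    else:
        status = "PASS"
    return {"rule_name": rule_name, "layer": layer, "table_name": table_name,
            "check_timestamp": datetime.now(timezone.utc).isoformat(),
            "records_checked": records_checked,
            "records_failed": records_failed,
            "failure_rate": round(failure_rate, 4), "status": status}
```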

Layer 3: Business Quality (Gold Layer)

Gold layer quality focuses on business accuracy—does the data tell the correct story?

| Business Quality Check | Validation Approach | Frequency |
| --- | --- | --- |
| Cross-system reconciliation | Compare Fabric totals against source system reports | Daily |
| KPI threshold validation | Verify KPIs fall within expected ranges | Every refresh |
| Historical consistency | Compare current period against same period last year (flag anomalies) | Weekly |
| Aggregation integrity | Detail records sum to summary totals | Every refresh |
| Business rule compliance | Industry-specific rules (HIPAA, SOX, etc.) | Daily |
| Trend break detection | Statistical anomaly detection on key metrics | Daily |

Cross-system reconciliation is the most important Gold layer quality check. Build automated reconciliation queries that compare Fabric output against the source of record. For financial data, this means comparing Fabric revenue totals against the ERP general ledger. For healthcare data, this means comparing patient counts against the EMR system. Discrepancies must be investigated and resolved before the data reaches dashboards.
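The core of such a reconciliation check is simple to sketch (plain Python; the totals here are hypothetical inputs, and in practice each would come from a query against Fabric and the source system respectively, with the tolerance set per domain):

```python
# Compare a Fabric Gold-layer total against the source of record
# (e.g. the ERP general ledger) and fail if the variance exceeds
# a tolerance expressed as a percentage of the source total.

def reconcile(fabric_total, source_total, tolerance_pct=0.1):
    variance = fabric_total - source_total
    variance_pct = abs(variance) / max(abs(source_total), 1e-9) * 100
    return {"fabric_total": fabric_total, "source_total": source_total,
            "variance": round(variance, 2),
            "variance_pct": round(variance_pct, 4),
            "status": "PASS" if variance_pct <= tolerance_pct else "FAIL"}
```

A FAIL result should gate the Gold refresh so the discrepancy is investigated before dashboards pick up the data.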

Data Quality Monitoring Dashboard

Build a dedicated data quality monitoring dashboard in Power BI that tracks quality metrics across all layers:

| Dashboard Component | Metric | Visual Type |
| --- | --- | --- |
| Overall quality score | Weighted average across all dimensions | Card with trend sparkline |
| Quality by table | Pass/Warn/Fail status per table | Heat map matrix |
| Quality trend | Daily quality score over 30 days | Line chart |
| Failure details | Specific failed checks with record counts | Detail table with drill-through |
| SLA compliance | Percentage of pipelines completing on time | Gauge visual |
| Reconciliation variance | Source vs. Fabric delta per system | Bar chart |

Pin this dashboard to your Fabric monitoring hub and set up data alerts for critical quality threshold breaches. Configure Power Automate flows to notify data stewards when quality scores drop below acceptable thresholds.
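The "overall quality score" card is just a weighted average of per-dimension pass rates, as in this sketch (dimension weights are illustrative assumptions and should reflect which dimensions matter most to your business):

```python
# Weighted average of per-dimension pass rates (each in [0, 1]).
# With no weights supplied, all dimensions count equally.

def overall_quality_score(dimension_scores, weights=None):
    weights = weights or {d: 1.0 for d in dimension_scores}
    total_weight = sum(weights[d] for d in dimension_scores)
    weighted_sum = sum(dimension_scores[d] * weights[d]
                       for d in dimension_scores)
    return round(weighted_sum / total_weight, 3)
```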

Implementing Data Quality with Fabric-Specific Tools

Great Expectations Integration

Great Expectations is a leading open-source data quality framework that integrates well with Fabric Spark notebooks:

| Great Expectations Feature | Fabric Implementation | Value |
| --- | --- | --- |
| Expectation suites | Define quality rules as code | Version-controlled quality definitions |
| Checkpoints | Automated validation runs | Pipeline-integrated quality gates |
| Data docs | Auto-generated quality documentation | Self-service quality transparency |
| Custom expectations | Business-specific validation rules | Industry-specific quality checks |

Install Great Expectations in your Fabric Spark environment and create expectation suites for each Bronze, Silver, and Gold layer table. Checkpoints run as pipeline steps and produce pass/fail results that gate downstream processing.
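The pattern Great Expectations formalizes can be sketched in a few lines of plain Python (this is not the Great Expectations API itself, just the underlying idea: rules defined as code, run as a checkpoint, producing a pass/fail result that gates downstream steps; the column names and rules are hypothetical):

```python
# Rules-as-code: a named list of expectations, each a predicate over
# the batch, evaluated together as a "checkpoint" whose overall
# success gates the next pipeline activity.

expectation_suite = [
    ("order_id is never null",
     lambda rows: all(r.get("order_id") is not None for r in rows)),
    ("revenue is non-negative",
     lambda rows: all(r.get("revenue", 0) >= 0 for r in rows)),
]

def run_checkpoint(rows, suite):
    results = [{"expectation": name, "success": check(rows)}
               for name, check in suite]
    return {"success": all(r["success"] for r in results),
            "results": results}
```

The real library adds what this sketch lacks: serialized suite definitions, rich validation result objects, and auto-generated data docs.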

SQL-Based Quality Checks in Fabric Warehouse

For organizations with strong SQL skills, implement quality checks directly in the Fabric SQL warehouse:

Create stored procedures that validate data quality rules and insert results into a centralized quality_results table. Schedule these procedures to run after each ETL cycle using Data Factory pipelines.
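The shape of that pattern, sketched here with Python's sqlite3 as a local stand-in for the Fabric warehouse (table and column names are illustrative; in the real warehouse the same logic would live in a T-SQL stored procedure scheduled by a Data Factory pipeline):

```python
import sqlite3

# A validation query counts rows that violate a business rule, and the
# result is inserted into a centralized quality_results table.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (order_id INTEGER, revenue REAL);
    CREATE TABLE quality_results (
        rule_name TEXT,
        records_failed INTEGER,
        checked_at TEXT DEFAULT CURRENT_TIMESTAMP);
    INSERT INTO fact_sales VALUES (1, 100.0), (2, -50.0), (3, 75.0);
""")

# Business rule: revenue must be >= 0.
failed = conn.execute(
    "SELECT COUNT(*) FROM fact_sales WHERE revenue < 0").fetchone()[0]
conn.execute(
    "INSERT INTO quality_results (rule_name, records_failed) VALUES (?, ?)",
    ("revenue_non_negative", failed))
```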

Fabric Data Activator for Real-Time Quality Alerts

Data Activator (Reflex) provides real-time quality monitoring:

  • Monitor quality metric streams in real time
  • Trigger alerts when quality scores drop below thresholds
  • Automatically pause downstream pipelines when critical quality issues are detected
  • Notify data stewards via Teams or email

This is particularly valuable for streaming data scenarios where traditional batch quality checks are insufficient.

Data Quality Governance Structure

Technical tools alone do not solve data quality. You need an organizational structure:

| Role | Responsibility | Accountability |
| --- | --- | --- |
| Data Steward | Define quality rules, investigate failures, approve fixes | Per-domain quality ownership |
| Data Engineer | Implement quality checks, build monitoring pipelines | Technical implementation |
| Data Owner | Approve data definitions, resolve business disputes | Final authority on "correct" values |
| Platform Admin | Monitor infrastructure quality (SLAs, freshness) | Operational quality |
| Quality Champion | Drive quality culture, train users, report metrics | Cross-domain coordination |

Assign data stewards to each business domain (Finance, Sales, HR, Operations). Each steward is responsible for defining quality rules, reviewing quality dashboards, and investigating failures within their domain. This distributed ownership model scales better than a centralized data quality team.

Common Data Quality Mistakes in Fabric

Mistake 1: Checking quality only at ingestion
Data quality can degrade at any layer. Transformations introduce bugs, business rule changes are not reflected in validation, and source systems change without notice. Check quality at every layer.

Mistake 2: No reconciliation against source systems
If your Fabric output does not reconcile to the source of record, you do not know whether your data is correct. Build reconciliation checks for every critical data domain.

Mistake 3: Ignoring quality for "non-critical" tables
Dimension tables that are "just lookups" can have quality issues that cascade into fact table joins, producing wrong numbers. Validate all tables, not just the most visible ones.

Mistake 4: Quality checks that never fail
If your quality checks have a 100% pass rate for months, your thresholds are too lenient. Tighten thresholds gradually until checks catch real issues.

Mistake 5: No quality trend tracking
A single quality snapshot is useful. Quality trends over time are transformative: they show whether your data platform is improving or degrading. Store every quality check result and build trend dashboards.

Data Quality for Regulated Industries

Organizations in regulated industries have additional quality requirements:

| Industry | Quality Requirement | Fabric Implementation |
| --- | --- | --- |
| Healthcare | HIPAA data integrity | Audit trails on every data modification, reconciliation against EMR |
| Financial Services | SOX compliance | Automated reconciliation, segregation of duties in data access |
| Government | FedRAMP data protection | Encryption validation, access logging, quality documentation |
| Life Sciences | FDA 21 CFR Part 11 | Validation protocols, electronic signatures, audit trails |

For these industries, data quality documentation is not optional—it is a compliance deliverable. Automate quality documentation generation using Power BI Helper or custom scripts that export quality audit results to PDF format for regulatory evidence.

Measuring Data Quality ROI

| Metric | Before Quality Framework | After Quality Framework | Business Impact |
| --- | --- | --- | --- |
| Dashboard trust score (user survey) | 3.2/5 | 4.5/5 | Higher adoption and usage |
| Ad-hoc data investigation requests | 25/month | 5/month | 80% reduction in analyst time spent on data issues |
| Data discrepancy incidents | 12/month | 2/month | Fewer executive escalations |
| Report revision cycles | 3 per report | 1 per report | Faster time to insight |
| Regulatory audit findings | 3-5 per audit | 0-1 per audit | Reduced compliance risk |

Getting Started with Data Quality in Fabric

  1. Week 1-2: Inventory your data sources and identify the 10 most critical tables
  2. Week 3-4: Define quality rules for each dimension (accuracy, completeness, etc.)
  3. Week 5-6: Implement Bronze and Silver layer quality checks
  4. Week 7-8: Build the quality monitoring dashboard and configure alerts
  5. Month 3: Add Gold layer reconciliation checks and trend tracking
  6. Ongoing: Review quality metrics monthly, tighten thresholds quarterly

For organizations that need expert data quality implementation, our Fabric consulting team provides data quality assessment, framework implementation, and ongoing monitoring. We also offer data analytics consulting that includes data quality as a core component of every engagement. Contact us to discuss your data quality strategy.

Frequently Asked Questions

Does Fabric have built-in data quality tools?

Fabric provides data quality capabilities through notebooks, dataflows, and Data Activator for monitoring. Additional governance features are available through Purview integration for enterprise data catalog and quality tracking.

How do I monitor data freshness in Fabric?

Use Data Activator to create triggers on data timestamps, set up monitoring dashboards tracking last refresh times, and configure alerts when data exceeds expected staleness thresholds.
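A minimal sketch of such a staleness check (plain Python; the 6-hour threshold is an illustrative assumption, and in Fabric the equivalent condition would be expressed as a Data Activator trigger or a dashboard alert):

```python
from datetime import datetime, timedelta, timezone

# Compare the last refresh timestamp against an expected maximum age.

def is_stale(last_refresh, max_age_hours=6, now=None):
    now = now or datetime.now(timezone.utc)
    return (now - last_refresh) > timedelta(hours=max_age_hours)
```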

Tags: Microsoft Fabric, Data Quality, Governance, Monitoring
