
Deep Dive into Medallion Architecture
Advanced implementation patterns for Bronze, Silver, and Gold medallion architecture layers in Microsoft Fabric. Data quality, SCD handling, and optimization.
The medallion architecture (Bronze → Silver → Gold) provides a structured framework for progressively refining raw data into business-ready analytics assets. While the concept is simple—ingest raw data, cleanse and standardize it, then curate it for consumption—the implementation details determine whether your data platform is maintainable, performant, and trustworthy at scale. This guide covers advanced implementation patterns for each layer in Microsoft Fabric, including specific techniques for handling slowly changing dimensions, data quality enforcement, and optimization for Direct Lake semantic models. Our data analytics consulting team implements medallion architectures with automated quality gates and monitoring.
Architecture Overview
Each medallion layer serves a distinct purpose with different quality guarantees, schemas, and access patterns:
| Layer | Purpose | Data Quality | Schema | Primary Users | Storage Pattern |
|---|---|---|---|---|---|
| Bronze | Raw ingestion landing zone | As-is from source (no quality guarantees) | Schema-on-read, preserves source format | Data engineers (debugging, reprocessing) | Append-only, partitioned by ingestion date |
| Silver | Cleansed, conformed, validated | Quality gates enforced, nulls handled, deduped | Schema-on-write, standardized types | Data engineers, analysts | Upsert with merge, SCD Type 2 for dimensions |
| Gold | Business-ready, optimized | Fully validated, aggregated, documented | Star schema for BI, wide tables for ML | Analysts, BI tools, data scientists | Optimized for query (V-Order, partitioned by business key) |
In Fabric, each layer typically maps to a separate Lakehouse (Bronze Lakehouse, Silver Lakehouse, Gold Lakehouse) within the same workspace or across workspaces based on your governance model. Separating layers into distinct Lakehouses enables independent access control, capacity management, and lifecycle policies.
Bronze Layer: Raw Ingestion
Design Principles
The Bronze layer is your safety net. Its job is to capture an exact copy of source data with minimal transformation so that you can always reprocess from raw data if downstream logic changes.
Rules for Bronze:
- Never modify source data structure or content
- Add metadata columns: ingestion timestamp, source system identifier, batch ID
- Preserve source data types (do not convert or parse)
- Store in Delta format for ACID transactions and time travel
- Partition by ingestion date for efficient reprocessing
Ingestion Patterns
Full Snapshot Pattern: Copy the entire source table on each run. Simple but storage-intensive. Best for small reference tables (< 1 million rows) or sources without reliable change tracking.
Incremental Append Pattern: Capture only new rows since the last ingestion using a watermark column (last modified timestamp, auto-increment ID). Efficient for large transaction tables. Implement using Fabric Data Pipelines with a lookup activity to get the last watermark and a copy activity filtered to rows after that watermark.
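The watermark logic above can be sketched in plain Python. This is a minimal illustration of the pattern, not Fabric pipeline code; in practice the lookup and filter run as pipeline activities or PySpark, and the `last_modified` column name and `incremental_rows` function are illustrative assumptions.

```python
from datetime import datetime

def incremental_rows(source_rows, last_watermark):
    """Return rows modified after the stored watermark, plus the new watermark.

    source_rows: iterable of dicts with a 'last_modified' datetime column.
    last_watermark: datetime of the most recent previously ingested row.
    """
    new_rows = [r for r in source_rows if r["last_modified"] > last_watermark]
    # Advance the watermark only if new rows arrived
    new_watermark = max((r["last_modified"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark

rows = [
    {"id": 1, "last_modified": datetime(2024, 1, 1)},
    {"id": 2, "last_modified": datetime(2024, 1, 3)},
]
batch, wm = incremental_rows(rows, datetime(2024, 1, 2))
# batch contains only id=2; wm advances to 2024-01-03
```

Persist the returned watermark (e.g., in a control table) so the next run resumes exactly where this one stopped.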
CDC (Change Data Capture) Pattern: Capture inserts, updates, and deletes as individual change events. Use Fabric Mirroring for supported databases (Azure SQL, Cosmos DB, Snowflake) or configure CDC on the source and stream changes through Eventstreams. This is the most storage-efficient and lowest-latency pattern for large operational databases.
File-Based Ingestion: For file drops (CSV, JSON, Parquet from SFTP, S3, or Azure Blob), use a Data Pipeline with a ForEach activity to process each file. Move processed files to an archive folder. Track file-level metadata (filename, size, row count) in a Bronze control table.
Bronze Table Structure
Every Bronze table should include standard metadata columns alongside source columns:
- `_ingestion_timestamp`: When the row was ingested (TIMESTAMP)
- `_source_system`: Source identifier (STRING, e.g., "SAP_ERP", "Salesforce")
- `_batch_id`: Unique identifier for the ingestion batch (STRING/GUID)
- `_source_file`: For file-based ingestion, the source filename (STRING)
These columns enable lineage tracking, debugging, and selective reprocessing without modifying source data.
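As a sketch of how these metadata columns get attached without altering source fields, here is a small pure-Python helper. In a real Fabric notebook this would be a set of `withColumn` calls on a Spark DataFrame; the function name is an assumption for illustration.

```python
import uuid
from datetime import datetime, timezone

def with_bronze_metadata(record, source_system, batch_id, source_file=None):
    """Attach standard Bronze metadata columns, leaving source fields untouched."""
    return {
        **record,
        "_ingestion_timestamp": datetime.now(timezone.utc),
        "_source_system": source_system,
        "_batch_id": batch_id,
        "_source_file": source_file,
    }

batch_id = str(uuid.uuid4())  # one batch ID shared by every row in the run
row = with_bronze_metadata({"order_id": 42}, "SAP_ERP", batch_id, "orders_2024.csv")
```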
Silver Layer: Cleansing and Conforming
Design Principles
The Silver layer transforms raw Bronze data into a clean, standardized, trusted dataset. This is where you enforce data quality, handle slowly changing attributes, deduplicate records, and conform data types and codes across source systems.
Data Quality Gates
Implement quality checks as explicit steps in your Silver layer notebooks or dataflows. When records fail quality checks, route them to a quarantine table rather than dropping them silently:
| Quality Check | Implementation | Failure Action |
|---|---|---|
| Null checks on required fields | `df.filter(col("customer_id").isNull())` → quarantine | Route to quarantine, alert data steward |
| Range validations | Filter amounts outside expected bounds (e.g., negative revenue) | Route to quarantine with violation reason |
| Referential integrity | Left anti-join between fact FK and dimension PK | Route orphan records to quarantine |
| Duplicate detection | Window function ranking by business key, ordered by timestamp | Keep latest record, archive duplicates |
| Data type conformity | Try-parse with fallback to null | Route unparseable records to quarantine |
Track quarantine metrics over time. A sudden spike in quarantine volume indicates a source system issue that needs investigation.
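The split-and-tag pattern behind the table can be sketched as follows. This is a simplified pure-Python version (real Silver notebooks would express the same gates as Spark filters); the two checks and the `_violation` column name are illustrative assumptions.

```python
def apply_quality_gates(rows):
    """Split rows into valid and quarantined, tagging each failure with a reason."""
    valid, quarantine = [], []
    for r in rows:
        if r.get("customer_id") is None:
            # Null check on a required field
            quarantine.append({**r, "_violation": "null_customer_id"})
        elif r.get("amount", 0) < 0:
            # Range validation: negative revenue is out of bounds
            quarantine.append({**r, "_violation": "negative_amount"})
        else:
            valid.append(r)
    return valid, quarantine

rows = [
    {"customer_id": "C1", "amount": 100.0},
    {"customer_id": None, "amount": 50.0},
    {"customer_id": "C2", "amount": -10.0},
]
valid, quarantined = apply_quality_gates(rows)
```

Writing the quarantined rows to their own Delta table, with the violation reason, is what makes the later trend monitoring possible.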
Slowly Changing Dimension (SCD) Type 2
Silver layer dimension tables should implement SCD Type 2 for attributes where historical accuracy matters (customer region, product category, employee department):
- New record arrives: If no existing record with this business key exists, insert with EffectiveStartDate = current date, EffectiveEndDate = null, IsCurrent = true
- Changed record arrives: Close the existing record (set EffectiveEndDate = current date - 1, IsCurrent = false), then insert the new version with EffectiveStartDate = current date
- No change: Skip—do not create duplicate records for unchanged data
Implement in Spark notebooks using a merge (upsert) operation against the Silver dimension table. The merge condition checks the business key plus attribute equality to detect changes.
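The three SCD Type 2 cases can be sketched in plain Python to make the merge condition concrete. This models the logic only; a production implementation would use a Delta `MERGE` as described above, and the column names (`effective_start`, `is_current`, etc.) are illustrative assumptions.

```python
from datetime import date, timedelta

def scd2_upsert(dim, incoming, business_key, tracked_attrs, today):
    """Apply SCD Type 2 rules: insert new keys, close-and-insert on change, skip no-ops."""
    current = {r[business_key]: r for r in dim if r["is_current"]}
    for row in incoming:
        existing = current.get(row[business_key])
        if existing and all(existing[a] == row[a] for a in tracked_attrs):
            continue  # no change: do not create a duplicate version
        if existing:
            # Changed record: close the existing version
            existing["effective_end"] = today - timedelta(days=1)
            existing["is_current"] = False
        # New key or new version: insert as the current record
        dim.append({**row, "effective_start": today,
                    "effective_end": None, "is_current": True})
    return dim

dim = [{"customer_id": "C1", "region": "EMEA",
        "effective_start": date(2023, 1, 1), "effective_end": None, "is_current": True}]
dim = scd2_upsert(dim, [{"customer_id": "C1", "region": "APAC"}],
                  "customer_id", ["region"], date(2024, 6, 1))
```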
Cross-Source Conforming
When multiple source systems contribute data to the same Silver table, resolve conflicts:
- Code standardization: Map source-specific codes to a common code table (e.g., "US", "USA", "United States" → "US")
- Currency conversion: Convert all monetary amounts to a standard currency using exchange rates from a reference table
- Date normalization: Parse all date formats to a standard YYYY-MM-DD format in UTC timezone
- Entity resolution: Match customer records across systems using fuzzy matching on name + address or deterministic matching on email/phone
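Two of the conforming steps above can be sketched briefly. The mapping table and rates here are hypothetical stand-ins for the reference tables the text describes; in Fabric these would be broadcast-joined lookup tables, not hardcoded dicts.

```python
# Hypothetical snapshot of a code-standardization reference table
COUNTRY_CODE_MAP = {"US": "US", "USA": "US", "United States": "US", "U.S.": "US"}

# Hypothetical snapshot of an exchange-rate reference table
RATES_TO_USD = {"USD": 1.0, "EUR": 1.08}

def conform_country(raw):
    """Map a source-specific country value to the conformed code, or None if unknown."""
    return COUNTRY_CODE_MAP.get(raw.strip())

def conform_amount(amount, currency):
    """Convert a monetary amount to the standard currency (USD here)."""
    return round(amount * RATES_TO_USD[currency], 2)
```

Returning `None` for unknown codes (rather than passing the raw value through) lets the quality gates quarantine unmapped values instead of letting them pollute Silver.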
Gold Layer: Business-Ready Consumption
Design Principles
The Gold layer produces the final datasets consumed by Power BI semantic models, machine learning pipelines, and operational applications. Gold tables should be optimized for their primary consumer—typically star schema for BI and wide denormalized tables for ML.
Star Schema for BI
Build Gold layer star schema tables specifically for Power BI Direct Lake consumption:
- Fact tables: Aggregate to the grain needed for reporting. If transaction-level detail is rarely needed, aggregate to daily grain and store transaction detail in Silver for drill-through.
- Dimension tables: Fully denormalized with all hierarchy levels. Include the SCD Type 2 current-state view (IsCurrent = true) for standard reporting and the full history view for point-in-time analysis.
- V-Order optimization: Ensure Gold tables are written with V-Order enabled for optimal Direct Lake compression. Run OPTIMIZE after bulk writes.
- Integer surrogate keys: Replace business keys with sequential integer surrogate keys in Gold. This maximizes VertiPaq compression and relationship performance.
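The surrogate-key step can be illustrated with a minimal sketch: assign sequential integers to the dimension, then use the resulting lookup to stamp the keys onto fact rows. Column names like `customer_sk` are assumptions; a Spark implementation would typically use a join against the dimension rather than an in-memory dict.

```python
def assign_surrogate_keys(dim_rows, business_key, sk_name="sk", start=1):
    """Assign sequential integer surrogate keys; return rows plus a business->SK lookup."""
    lookup = {}
    for i, row in enumerate(dim_rows, start=start):
        row[sk_name] = i
        lookup[row[business_key]] = i
    return dim_rows, lookup

dims, lookup = assign_surrogate_keys(
    [{"customer_id": "C9"}, {"customer_id": "C3"}], "customer_id", "customer_sk")

# Replace the business key on facts with the compact integer surrogate key
facts = [{"customer_id": "C3", "amount": 10.0}]
facts = [{**f, "customer_sk": lookup[f["customer_id"]]} for f in facts]
```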
Aggregation Tables
For datasets exceeding 100 million rows, create pre-aggregated Gold tables alongside the detail tables:
- Daily summary: Aggregate hourly or transaction data to daily grain with SUM, COUNT, AVG measures
- Monthly summary: Further aggregate for dashboard-level KPIs
- Dimensional aggregation: Pre-compute totals by common groupings (region, product category, customer segment)
Power BI's aggregation feature automatically routes queries to the appropriate aggregation table based on the columns and measures requested, delivering sub-second performance on billion-row datasets.
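A daily-summary rollup like the one described can be sketched as a simple group-by. In Gold this would be a Spark `groupBy(...).agg(...)` writing a Delta table; the flat-dict shape and key names here are illustrative.

```python
from collections import defaultdict

def daily_summary(transactions):
    """Aggregate transaction-grain rows to (date, region) grain with SUM and COUNT."""
    agg = defaultdict(lambda: {"total_amount": 0.0, "txn_count": 0})
    for t in transactions:
        day = agg[(t["date"], t["region"])]
        day["total_amount"] += t["amount"]
        day["txn_count"] += 1
    return {k: dict(v) for k, v in agg.items()}

txns = [
    {"date": "2024-06-01", "region": "EMEA", "amount": 10.0},
    {"date": "2024-06-01", "region": "EMEA", "amount": 5.0},
    {"date": "2024-06-02", "region": "APAC", "amount": 7.0},
]
summary = daily_summary(txns)
```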
Refresh Orchestration
Orchestrate the Bronze → Silver → Gold pipeline using Fabric Data Factory pipelines:
- Bronze notebooks/copy activities run first, ingesting raw data
- Silver notebooks run after Bronze completes, applying quality gates and conforming
- Gold notebooks run after Silver completes, building star schema and aggregations
- Semantic model framing triggers automatically when Gold tables update (Direct Lake detects Delta log changes)
Use pipeline dependencies to ensure correct execution order. Add error handling at each stage: if Bronze fails for one source, Silver and Gold can still process other sources that succeeded.
Frequently Asked Questions
Is medallion architecture required in Fabric?
No. Medallion is a recommended pattern, not a requirement. It provides clear data lineage and progressive quality improvement. Smaller projects might simplify to fewer layers, but enterprise implementations benefit from the structure.
How do I handle data quality failures in medallion architecture?
Implement quarantine tables in the Bronze or Silver layer. Records failing quality checks go to quarantine for review while valid data progresses. Track and resolve quarantined records through a defined process.