Data Engineering · 14 min read

Deep Dive into Medallion Architecture

Advanced implementation patterns for Bronze, Silver, and Gold medallion architecture layers in Microsoft Fabric. Data quality, SCD handling, and optimization.

By Errin O'Connor, Chief AI Architect

<h2>Deep Dive into Medallion Architecture in Microsoft Fabric</h2>

<p>The medallion architecture (Bronze, Silver, Gold) is a data engineering pattern that progressively refines raw data into business-ready analytics assets through three distinct layers, each serving a specific purpose in data quality, transformation, and consumption. This architecture has become the de facto standard for organizing enterprise lakehouses in Microsoft Fabric, providing clear boundaries between raw ingestion, validated transformation, and curated business models.</p>

<p>Having implemented medallion architectures for organizations processing everything from healthcare claims to retail point-of-sale transactions, I can tell you that the concept is deceptively simple but the execution demands careful planning. The organizations that succeed treat each layer as a contract — with defined schemas, quality gates, and ownership — while those that fail treat it as a loose suggestion and end up with a disorganized data swamp wearing a "lakehouse" label.</p>

<h2>Why Medallion Architecture Matters in 2026</h2>

<p>Before medallion architecture became standard, most data lakes followed a "dump everything in and figure it out later" approach. The result was predictable: data scientists spent 80% of their time finding, understanding, and cleaning data rather than analyzing it. Business users had no confidence in the numbers because different teams applied different transformation logic to the same raw data, producing conflicting reports.</p>

<p>The medallion pattern solves this by creating explicit layers with clear responsibilities:</p>

<ul> <li><strong>Bronze (Raw):</strong> Exact copies of source data, preserving the original format and all records including errors. This is your audit trail and reprocessing safety net.</li> <li><strong>Silver (Validated):</strong> Cleaned, deduplicated, conformed data with standardized schemas. This is where data engineering applies business rules and quality checks.</li> <li><strong>Gold (Business):</strong> Aggregated, modeled, business-ready datasets optimized for specific consumption patterns — dashboards, ML models, or operational reports.</li> </ul>

<p>In Microsoft Fabric, this architecture maps naturally to the Lakehouse and Warehouse workloads, with Delta tables providing ACID transactions, time travel, and schema enforcement at every layer.</p>

<h2>Bronze Layer: Raw Ingestion Done Right</h2>

<p>The bronze layer captures source data in its original form with minimal transformation. The cardinal rule is: never lose data at this layer. Every record from every source system lands here, including malformed records, duplicates, and late-arriving data.</p>

<p><strong>Bronze layer design principles:</strong></p>

<ul> <li><strong>Append-only ingestion:</strong> Never overwrite bronze data. Use append mode so you maintain a complete history of everything received from source systems. This enables reprocessing when business rules change.</li> <li><strong>Include metadata columns:</strong> Add ingestion_timestamp, source_system, source_file_name, and batch_id to every bronze table. These columns are invaluable for debugging data issues, tracking lineage, and identifying when a source system sent bad data.</li> <li><strong>Preserve original data types:</strong> If the source sends dates as strings, store them as strings in bronze. Type conversion happens in silver. This prevents ingestion failures when source systems send unexpected formats.</li> <li><strong>Partition by ingestion date:</strong> This enables efficient incremental processing in the silver layer — the silver pipeline reads only new bronze partitions rather than rescanning the entire table.</li> </ul>
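<p>The metadata-column principle above can be sketched in plain Python. In a Fabric notebook this would typically be a PySpark <code>withColumn()</code> chain; dicts are used here to keep the example self-contained, and the field values (<code>erp</code>, the file name, the batch ID) are illustrative assumptions.</p>

```python
from datetime import datetime, timezone

def add_bronze_metadata(records, source_system, source_file_name, batch_id):
    """Attach the standard bronze metadata columns to every ingested record.

    Sketch only: a real Fabric pipeline would do this with PySpark's
    withColumn() while writing to a Delta table in append mode.
    """
    ingestion_timestamp = datetime.now(timezone.utc).isoformat()
    return [
        {
            **record,  # original source fields, types preserved as-is
            "ingestion_timestamp": ingestion_timestamp,
            "source_system": source_system,
            "source_file_name": source_file_name,
            "batch_id": batch_id,
        }
        for record in records
    ]

# Note the date stays a string in bronze; casting happens in silver.
rows = add_bronze_metadata(
    [{"cust_nm": "Acme", "order_dt": "2026-01-15"}],
    source_system="erp",
    source_file_name="orders_20260115.csv",
    batch_id="batch-0042",
)
```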

<p><strong>Fabric-specific bronze patterns:</strong></p>

<p>Use Fabric Data Factory pipelines or Spark notebooks for bronze ingestion. For real-time sources, Fabric Eventstream captures streaming data into bronze Delta tables. For file-based sources, use Lakehouse file shortcuts to reference files in OneLake or external storage (ADLS Gen2, S3) without copying data.</p>

<p>For high-volume ingestion scenarios, optimize your Spark jobs with the techniques in our <a href="/blog/fabric-spark-optimization">Spark optimization guide</a> to ensure bronze ingestion completes within your processing windows.</p>

<h2>Silver Layer: Where Data Engineering Happens</h2>

<p>The silver layer is where the heavy lifting occurs. This layer transforms raw bronze data into clean, validated, conformed datasets that multiple gold-layer consumers can trust. Getting silver right is the most important investment in your medallion architecture.</p>

<p><strong>Silver layer transformations:</strong></p>

<table>
<thead>
<tr><th>Transformation</th><th>Purpose</th><th>Example</th></tr>
</thead>
<tbody>
<tr><td>Schema enforcement</td><td>Standardize column names and types</td><td>Convert "cust_nm" to "customer_name" (STRING)</td></tr>
<tr><td>Deduplication</td><td>Remove duplicate records</td><td>Deduplicate by business key + timestamp</td></tr>
<tr><td>Data type casting</td><td>Convert to correct types</td><td>Parse date strings to DATE, amounts to DECIMAL</td></tr>
<tr><td>Null handling</td><td>Apply business rules for missing data</td><td>Default unknown regions to "Unassigned"</td></tr>
<tr><td>Referential integrity</td><td>Validate foreign key relationships</td><td>Flag orders with invalid customer IDs</td></tr>
<tr><td>Standardization</td><td>Normalize values across sources</td><td>Map "US," "USA," "United States" to "US"</td></tr>
<tr><td>Slowly changing dimensions</td><td>Track historical changes</td><td>Implement SCD Type 2 for customer addresses</td></tr>
</tbody>
</table>
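<p>The standardization row above can be sketched in a few lines of Python. The mapping itself is a hypothetical example; in practice the canonical values would live in a reference table in the silver lakehouse rather than in code.</p>

```python
# Hypothetical country-code mapping used for standardization.
COUNTRY_MAP = {"US": "US", "USA": "US", "United States": "US"}

def standardize_country(value):
    """Normalize free-form country values to a canonical code.

    Unmapped values pass through unchanged so a downstream quality gate
    can flag them, rather than silently dropping or guessing.
    """
    return COUNTRY_MAP.get(value.strip(), value.strip())

canonical = standardize_country("United States")
```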

<p><strong>Delta merge for incremental processing:</strong></p>

<p>The silver layer should use Delta MERGE operations to process only new or changed records from bronze. A typical pattern reads bronze records with ingestion_timestamp greater than the last silver processing watermark, applies transformations, and merges results into the silver table. This incremental approach processes only changed data rather than recomputing the entire silver table on every run.</p>
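<p>A minimal plain-Python simulation of this watermark pattern is sketched below. In a real Fabric notebook the upsert would be a Delta <code>MERGE INTO</code> on the business key; here a dict keyed by <code>order_id</code> (an assumed key name) stands in for the silver table so the control flow is easy to follow.</p>

```python
def incremental_merge(silver, bronze_records, watermark, key="order_id"):
    """Simulate the bronze-to-silver incremental pattern: read only records
    with ingestion_timestamp past the last watermark, then upsert by
    business key (the dict stands in for a Delta MERGE)."""
    new_watermark = watermark
    for rec in bronze_records:
        ts = rec["ingestion_timestamp"]
        if ts <= watermark:
            continue  # already processed in a previous run
        silver[rec[key]] = rec  # matched -> update, not matched -> insert
        new_watermark = max(new_watermark, ts)
    return new_watermark

silver = {}
bronze = [
    {"order_id": 1, "amount": 10, "ingestion_timestamp": "2026-01-01T00:00:00"},
    {"order_id": 1, "amount": 12, "ingestion_timestamp": "2026-01-02T00:00:00"},
]
wm = incremental_merge(silver, bronze, watermark="")
```

<p>Running the same batch again with the advanced watermark is a no-op, which is what makes the pattern safe to rerun.</p>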

<p><strong>Data quality gates:</strong></p>

<p>Implement explicit quality checks between bronze and silver. I use a quarantine pattern: records that fail validation rules land in a separate quarantine table with the failure reason, while clean records proceed to silver. This prevents bad data from propagating to gold while preserving the problematic records for investigation. Common quality checks include:</p>

<ul> <li>Required fields are not null</li> <li>Dates fall within reasonable ranges (not year 1900 or year 2099)</li> <li>Numeric values are within expected bounds</li> <li>Categorical values match allowed value lists</li> <li>Cross-field consistency (end_date >= start_date)</li> </ul>
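<p>The quarantine pattern described above can be expressed as a small routing function. The rule names and record fields below are illustrative assumptions; the shape of the logic — failed records carry a <code>failure_reason</code>, clean records continue to silver — is the point.</p>

```python
def apply_quality_gate(records, rules):
    """Route records through a quarantine pattern: records failing any
    rule land in quarantine with the failure reasons attached, while
    clean records proceed to silver."""
    clean, quarantine = [], []
    for rec in records:
        failures = [name for name, check in rules if not check(rec)]
        if failures:
            quarantine.append({**rec, "failure_reason": "; ".join(failures)})
        else:
            clean.append(rec)
    return clean, quarantine

# Hypothetical rules mirroring the checks listed above.
rules = [
    ("customer_id_not_null", lambda r: r.get("customer_id") is not None),
    ("end_after_start", lambda r: r["end_date"] >= r["start_date"]),
]
clean, quarantined = apply_quality_gate(
    [
        {"customer_id": 7, "start_date": "2026-01-01", "end_date": "2026-02-01"},
        {"customer_id": None, "start_date": "2026-01-01", "end_date": "2025-12-01"},
    ],
    rules,
)
```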

<h2>Gold Layer: Business-Ready Analytics</h2>

<p>The gold layer contains datasets optimized for specific business consumption patterns. Unlike silver — which aims to be a general-purpose clean data store — gold tables are purpose-built for their consumers. A gold table for a Power BI executive dashboard looks very different from a gold table feeding a machine learning model.</p>

<p><strong>Gold layer design patterns:</strong></p>

<p><strong>Star Schema for BI:</strong> For Power BI consumption, gold tables should follow star schema design — fact tables containing measures and foreign keys, surrounded by dimension tables containing descriptive attributes. This structure optimizes both Power BI import performance and DAX query patterns. See our guide on <a href="/blog/semantic-model-practices">semantic model best practices</a> for how gold layer design flows into Power BI.</p>

<p><strong>Aggregation Tables:</strong> Pre-aggregate common query patterns. If your executive dashboard always shows monthly revenue by region, create a gold table at that grain rather than forcing Power BI to aggregate millions of daily transaction rows at query time. This reduces dashboard load times from 30 seconds to under 2 seconds.</p>
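<p>The monthly-revenue-by-region example can be sketched as a simple pre-aggregation. In Fabric this would be a Spark groupBy writing a gold Delta table; the plain-Python version below shows the grain change from daily transactions to the (month, region) level the dashboard actually queries.</p>

```python
from collections import defaultdict

def build_monthly_revenue_gold(transactions):
    """Pre-aggregate daily transactions to the (month, region) grain
    consumed by the executive dashboard."""
    totals = defaultdict(float)
    for t in transactions:
        month = t["date"][:7]  # "YYYY-MM" from "YYYY-MM-DD"
        totals[(month, t["region"])] += t["revenue"]
    return [
        {"month": m, "region": r, "revenue": v}
        for (m, r), v in sorted(totals.items())
    ]

gold = build_monthly_revenue_gold([
    {"date": "2026-01-03", "region": "West", "revenue": 100.0},
    {"date": "2026-01-17", "region": "West", "revenue": 50.0},
    {"date": "2026-02-01", "region": "East", "revenue": 75.0},
])
```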

<p><strong>Feature Tables for ML:</strong> Machine learning models need wide, denormalized tables with engineered features. Gold tables for ML consumption should pre-compute features like "customer_lifetime_value," "days_since_last_purchase," and "rolling_30day_average" so data scientists can focus on modeling rather than feature engineering.</p>
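<p>As a small illustration of pre-computing such a feature, here is a sketch of <code>days_since_last_purchase</code>. A production feature pipeline would compute this per customer over the silver transaction table; the function signature is an assumption for the example.</p>

```python
from datetime import date

def days_since_last_purchase(purchase_dates, as_of):
    """Compute the days_since_last_purchase feature for one customer;
    None signals a customer with no purchase history."""
    if not purchase_dates:
        return None
    return (as_of - max(purchase_dates)).days

feature = days_since_last_purchase(
    [date(2026, 1, 1), date(2026, 1, 20)], as_of=date(2026, 2, 1)
)
```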

<p><strong>Denormalized Reporting Tables:</strong> Some reporting scenarios benefit from fully denormalized tables that combine dimensions and facts into a single wide table. This simplifies queries for Fabric SQL endpoints and non-Power-BI consumers like Excel direct query users.</p>

<h2>Implementing Medallion Architecture in Fabric Lakehouse</h2>

<p>Microsoft Fabric provides several options for organizing medallion layers:</p>

<p><strong>Option 1: Single Lakehouse, Multiple Schemas</strong></p> <p>Create bronze, silver, and gold schemas within a single Lakehouse. This is the simplest approach and works well for small-to-medium implementations. All data stays in one Lakehouse, simplifying security and management.</p>

<p><strong>Option 2: Separate Lakehouses per Layer</strong></p> <p>Create bronze_lakehouse, silver_lakehouse, and gold_lakehouse as separate Fabric items. This provides stronger isolation — you can assign different capacity, security policies, and access controls to each layer. The bronze lakehouse might allow data engineers full access, while the gold lakehouse is read-only for analysts.</p>

<p><strong>Option 3: Hybrid with Shortcuts</strong></p> <p>Use separate Lakehouses with OneLake shortcuts connecting them. The silver lakehouse references bronze tables via shortcuts (no data copy), and the gold lakehouse references silver tables similarly. This combines isolation benefits with the convenience of cross-lakehouse querying.</p>

<p>For most enterprise deployments, I recommend Option 2 (separate Lakehouses) because it provides the clearest security boundaries and enables independent scaling. The small additional management overhead is worth the governance clarity.</p>

<h2>Handling Cross-Cutting Concerns</h2>

<p><strong>Data Lineage:</strong> Track how data flows from bronze through silver to gold. Fabric's built-in lineage tracking helps, but I also recommend maintaining a metadata table that records processing timestamps, row counts, and transformation versions at each layer boundary. When a business user questions a number, you can trace it back through every transformation to the original source record.</p>

<p><strong>Schema Evolution:</strong> Source systems change their schemas regularly — new columns appear, data types change, columns get renamed. Bronze handles this naturally (append everything). Silver must handle schema evolution gracefully using Delta's schema evolution capabilities (mergeSchema option). Gold schemas should change through governed release processes because downstream reports depend on them.</p>
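<p>A simplified model of what <code>mergeSchema</code> does is sketched below: new columns are appended, while an incompatible type change on an existing column is rejected. This is deliberately reduced — real Delta schema evolution also handles nested types and certain type widenings — but it captures the behavior silver pipelines rely on.</p>

```python
def merge_schema(current, incoming):
    """Illustrate the mergeSchema behavior: append new columns, reject
    incompatible type changes on existing columns (simplified model)."""
    merged = dict(current)
    for col, dtype in incoming.items():
        if col in merged and merged[col] != dtype:
            raise TypeError(
                f"Incompatible type change for {col}: {merged[col]} -> {dtype}"
            )
        merged[col] = dtype
    return merged

silver_schema = {"customer_id": "bigint", "customer_name": "string"}
# Source system added a column; the merged schema simply gains it.
evolved = merge_schema(
    silver_schema, {"customer_name": "string", "loyalty_tier": "string"}
)
```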

<p><strong>Reprocessing:</strong> When you discover a bug in your silver transformation logic, you need to reprocess historical data. The bronze layer's complete history enables this — fix your silver pipeline, clear the affected silver partitions, and rerun from bronze. Without a complete bronze layer, this reprocessing is impossible, and you are stuck with incorrect historical data.</p>

<p><strong>Testing:</strong> Treat your medallion pipelines like software. Unit test individual transformations, integration test the bronze-to-silver-to-gold flow, and run data quality assertions after every pipeline execution. Fabric notebooks support assertion-based testing that can fail a pipeline run when quality thresholds are not met.</p>
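<p>An assertion-based quality check of the kind described above might look like the following sketch. The threshold and metric names are illustrative; the key idea is that a failed assertion fails the notebook cell, and therefore the pipeline run.</p>

```python
def assert_quality_thresholds(row_count, null_key_count, max_null_ratio=0.0):
    """Fail the pipeline run when a quality threshold is violated —
    the assertion-based notebook testing style described above."""
    assert row_count > 0, "silver table is empty"
    ratio = null_key_count / row_count
    assert ratio <= max_null_ratio, (
        f"null business keys: {ratio:.1%} exceeds {max_null_ratio:.1%}"
    )
    return True

# Counts would come from the just-written silver table in a real run.
ok = assert_quality_thresholds(row_count=1_000, null_key_count=0)
```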

<h2>Performance Optimization Across Layers</h2>

<p>Each layer has different performance characteristics and optimization strategies:</p>

<ul> <li><strong>Bronze:</strong> Optimize for write throughput. Use large batch sizes, minimize partition count, and disable unnecessary indexes. Write speed matters more than read speed at this layer.</li> <li><strong>Silver:</strong> Optimize for both read and write. Use Z-ordering on frequently filtered columns, compact small files regularly with OPTIMIZE, and implement efficient incremental processing.</li> <li><strong>Gold:</strong> Optimize aggressively for read performance. Use V-Order (Fabric's read-optimized format), Z-order on query-specific columns, pre-aggregate wherever possible, and tune partition sizes for your consumption patterns.</li> </ul>

<p>For connecting your gold layer to Power BI with optimal performance, explore <a href="/blog/real-time-analytics-fabric">real-time analytics in Fabric</a> and our guide on <a href="/blog/power-bi-direct-lake-mode-guide-2026">Direct Lake mode</a>, which reads gold Delta tables directly without import.</p>

<h2>Common Mistakes to Avoid</h2>

<ul> <li><strong>Skipping bronze:</strong> Teams that load directly into silver lose their reprocessing safety net. Always land raw data first.</li> <li><strong>Too many gold tables:</strong> Creating a separate gold table for every report leads to duplication and inconsistency. Build reusable gold datasets that serve multiple reports.</li> <li><strong>Silver = dump zone:</strong> Without clear transformation standards, silver becomes another unstructured mess. Define and enforce transformation patterns.</li> <li><strong>Ignoring data quality:</strong> Without quality gates between layers, errors in source data propagate all the way to executive dashboards. Implement validation at every layer boundary.</li> <li><strong>Overcomplicating the architecture:</strong> Some organizations add Platinum, Copper, and Diamond layers. Stick to three layers unless you have a very specific, documented reason to add more.</li> </ul>

<p>The medallion architecture is not just a technical pattern — it is an organizational framework that assigns clear responsibilities, quality standards, and ownership at each data transformation stage. Get the architecture right, and your entire analytics platform becomes more trustworthy, maintainable, and scalable.</p>

<h2>Frequently Asked Questions</h2>

<p><strong>Is medallion architecture required in Fabric?</strong></p>

<p>No, medallion is a recommended pattern but not required. It provides clear data lineage and progressive quality improvement. Smaller projects might simplify the layering, but enterprise implementations benefit from the structure.</p>

<p><strong>How do I handle data quality failures in medallion architecture?</strong></p>

<p>Implement quarantine tables in the bronze or silver layer. Records failing quality checks go to quarantine for review while valid data progresses. Track and resolve quarantined records through a defined process.</p>

Microsoft Fabric · Medallion Architecture · Data Engineering · Best Practices
