
Power BI Composite Models and Aggregations: Scale to Billions of Rows
Build high-performance composite models with aggregation tables for sub-second queries on billion-row datasets in Power BI.
Power BI has a practical limit on how much data can be loaded into Import mode - typically 1-10 billion rows depending on cardinality and available Premium capacity memory. But enterprise fact tables routinely contain tens of billions of rows spanning years of transactional data. Composite models with aggregations bridge this gap by combining the speed of Import mode with the scale of DirectQuery.
The Scale Challenge
Import mode stores data in compressed columnar format (VertiPaq), delivering sub-second queries on most datasets. But loading 50 billion rows of transactional data into memory is impractical - it would require enormous Premium capacity and refresh times measured in hours.
DirectQuery solves the scale problem by querying the source database directly, but query performance depends on the source system. Complex DAX calculations over DirectQuery tables can take 10-30 seconds per visual, creating unacceptable user experiences.
Composite models with aggregations give you both: Import-speed performance for the majority of queries and DirectQuery access to detail rows when needed.
Aggregation Architecture
The architecture follows a tiered approach:
Tier 1 - Aggregation Tables (Import): Pre-summarized tables stored in Power BI memory. Monthly or daily aggregates that answer 80% of queries in milliseconds. Typical size: 0.1-1% of the detail table.
Tier 2 - Detail Tables (DirectQuery): Full granularity data queried from the source when users drill to transaction-level detail. Only accessed when aggregation tables cannot answer the query.
Tier 3 - Dimension Tables (Dual): Shared dimension tables that can join efficiently with both Import aggregation tables and DirectQuery detail tables.
Power BI automatically routes each query to the most efficient tier without any DAX modifications. Users never see the routing logic - they just experience fast results.
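The routing decision described above can be sketched as a simple rule: answer each query from the first tier whose columns cover everything the query needs. This is a hypothetical illustration of the concept, not the real VertiPaq engine logic, and the table and column names are invented:

```python
# Hypothetical sketch of tiered query routing (not Power BI's actual engine):
# a query is answered by the first tier whose available columns cover it.

AGG_TIERS = [
    # (table name, storage mode, columns available at that grain)
    ("AggSales_Monthly", "Import", {"Year", "Month", "RegionKey", "ProductKey"}),
    ("AggSales_Daily", "Import", {"Date", "RegionKey", "ProductKey"}),
    ("FactSales", "DirectQuery", {"Date", "RegionKey", "ProductKey", "TransactionID"}),
]

def route_query(required_columns):
    """Return the first tier that contains every column the query needs."""
    for name, storage, columns in AGG_TIERS:
        if required_columns <= columns:
            return name, storage
    raise ValueError("no tier can satisfy the query")

# A monthly summary hits the Import aggregation; a transaction-level
# lookup falls through to DirectQuery.
print(route_query({"Year", "Month", "RegionKey"}))
print(route_query({"Date", "TransactionID"}))
```

The key property this models is that the user never chooses a tier: the same report visual can land on Import or DirectQuery depending only on which columns it groups and filters by.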
Designing Aggregation Tables
Effective aggregation design requires understanding query patterns:
Step 1: Analyze Query Patterns using Premium Capacity Metrics or Performance Analyzer. Identify the most common grouping levels: are users mostly viewing monthly revenue by region? Daily sales by product? Quarterly trends by customer segment?
Step 2: Create Summary Tables at the most common granularity levels. If 70% of queries are monthly summaries, create a monthly aggregation table with columns for Year, Month, and each dimension key that users filter by.
Step 3: Define Aggregation Mappings in Power BI. For each column in the aggregation table, specify:
- GroupBy: dimension key columns that map to the detail table's foreign keys
- Sum/Count/Min/Max: measure columns with the appropriate aggregation function
- Detail Table: reference to the DirectQuery source table
Step 4: Set Storage Modes. Aggregation tables use Import mode. Detail tables use DirectQuery mode. Shared dimension tables use Dual mode.
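Step 2's summary table is normally built in the source database or Power Query, but the roll-up itself is easy to prototype. A minimal pure-Python sketch (with hypothetical column names) that rolls transaction rows up to monthly grain with SUM and COUNT:

```python
from collections import defaultdict

# Hypothetical detail rows: (date "YYYY-MM-DD", region_key, amount)
detail_rows = [
    ("2024-01-05", 1, 100.0),
    ("2024-01-20", 1, 50.0),
    ("2024-01-11", 2, 75.0),
    ("2024-02-02", 1, 40.0),
]

def build_monthly_agg(rows):
    """Roll detail rows up to (year, month, region) grain with SUM and COUNT,
    mirroring what a source-side GROUP BY would produce."""
    agg = defaultdict(lambda: [0.0, 0])
    for date, region, amount in rows:
        year, month = int(date[:4]), int(date[5:7])
        bucket = agg[(year, month, region)]
        bucket[0] += amount   # SUM(Amount)  -> TotalAmount
        bucket[1] += 1        # COUNT(*)     -> TransactionCount
    return {key: tuple(v) for key, v in agg.items()}

monthly = build_monthly_agg(detail_rows)
print(monthly)
```

In the real model, the keys of this dictionary become the GroupBy columns of the aggregation table and the summed values become the Sum/Count measure columns.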
Multi-Level Aggregations
For maximum performance, create multiple aggregation layers:
- Monthly aggregation (Import): Answers monthly/quarterly/yearly queries instantly
- Daily aggregation (Import): Answers daily-level queries without hitting the source
- Transaction detail (DirectQuery): Only used for individual record lookups
Power BI evaluates candidate aggregations in precedence order, using the smallest aggregation that can satisfy the query. If a user asks for monthly sales, the monthly aggregation responds. If they ask for daily sales, the daily aggregation responds. Only transaction-level queries reach DirectQuery.
Aggregation Table Sizing
| Detail Table Size | Monthly Agg Size | Daily Agg Size | Memory Needed |
|-------------------|------------------|----------------|---------------|
| 1 billion rows    | ~10K rows        | ~3M rows       | < 1 GB        |
| 10 billion rows   | ~100K rows       | ~30M rows      | 2-5 GB        |
| 50 billion rows   | ~500K rows       | ~150M rows     | 10-20 GB      |
| 100 billion rows  | ~1M rows         | ~300M rows     | 20-40 GB      |
The compression ratio is dramatic: a 50 billion row table requires perhaps 15 GB as aggregations versus hundreds of GB if imported fully.
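A quick back-of-envelope check shows why aggregation tables stay so small: their row count is bounded by periods multiplied by active dimension-key combinations, not by transaction volume. The numbers below are illustrative assumptions, not measurements:

```python
# Back-of-envelope sizing for aggregation tables. The row count of an
# aggregation is bounded by (periods) x (active dimension-key combos),
# independent of how many detail transactions exist.

def agg_rows(years, key_combinations, periods_per_year):
    """Upper bound on aggregation rows: one row per period per key combo."""
    return years * periods_per_year * key_combinations

# Assumed scenario: 10 years of history, ~2,500 active region/product combos.
monthly = agg_rows(10, 2500, 12)
daily = agg_rows(10, 2500, 365)

# Versus a 50-billion-row detail table, even the daily grain is a tiny fraction.
ratio = daily / 50_000_000_000
print(f"monthly={monthly:,} daily={daily:,} daily/detail={ratio:.4%}")
```

Real row counts depend on how sparse the key combinations are per period; the point of the formula is that aggregation size scales with dimensional cardinality rather than transaction count.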
Query Routing Verification
After configuring aggregations, verify they are working:
- Open Performance Analyzer in Power BI Desktop
- Refresh visuals and examine the DAX query generated
- Use DAX Studio to run the query and check the execution plan
- Verify the plan shows "VertiPaq" (aggregation hit) vs "DirectQuery" (source hit)
If aggregations are not being used, check: are all GroupBy columns present in the aggregation? Is the aggregation function correct (Sum vs Count)? Are relationships configured properly?
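That checklist boils down to a coverage test: an aggregation can answer a query only if every grouped or filtered column is present and every measure has a matching aggregation mapping. A small diagnostic sketch (hypothetical column and measure names) of that check:

```python
# Sketch of the "why isn't my aggregation used?" checklist: an aggregation
# can answer a query only if every grouped/filtered column exists in it and
# every measure has a matching aggregation-function mapping.

AGG_COLUMNS = {"Year", "Month", "RegionKey"}          # GroupBy columns
AGG_MEASURES = {"Amount": "Sum", "Quantity": "Sum"}   # measure -> function

def diagnose(query_columns, query_measures):
    """Return a list of reasons the aggregation cannot satisfy the query."""
    problems = []
    missing = query_columns - AGG_COLUMNS
    if missing:
        problems.append(f"GroupBy columns missing from aggregation: {sorted(missing)}")
    for measure, func in query_measures.items():
        if AGG_MEASURES.get(measure) != func:
            problems.append(f"no {func} mapping for measure {measure!r}")
    return problems

# A daily query misses because Date is not in the monthly aggregation.
print(diagnose({"Date", "RegionKey"}, {"Amount": "Sum"}))
```

An empty result means the aggregation covers the query; each entry in the list corresponds to one of the failure modes above (missing GroupBy column, wrong aggregation function).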
Performance Benchmarks
A well-designed composite model with aggregations typically achieves:
- Aggregate queries: 100-500 milliseconds (Import speed)
- Detail queries: 2-10 seconds (DirectQuery, depends on source)
- Mixed queries: Sub-second for aggregated portion, seconds for detail drill-through
- Refresh time: Minutes for aggregation tables vs hours for full Import
Best Practices
- Start with one aggregation level, add more based on query patterns
- Include all commonly filtered dimension keys in the aggregation
- Refresh aggregation tables more frequently than full datasets (aggregations are small)
- Monitor aggregation hit rates monthly - add new aggregations for uncovered patterns
- Keep DirectQuery source optimized with proper indexes on join and filter columns
- Test with realistic data volumes - aggregation behavior changes with scale
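The hit-rate monitoring recommended above can be computed from any query log that records which table answered each query. A hedged sketch with an invented log shape (real data would come from the Premium metrics app or Log Analytics, whose schemas differ):

```python
# Hypothetical query-log records: which table answered each query and
# via which storage mode. The log shape is invented for illustration.
query_log = [
    {"table": "AggSales_Monthly", "storage": "Import"},
    {"table": "AggSales_Monthly", "storage": "Import"},
    {"table": "AggSales_Daily", "storage": "Import"},
    {"table": "FactSales", "storage": "DirectQuery"},
]

def hit_rate(log):
    """Fraction of queries answered by Import aggregations rather than
    falling through to the DirectQuery detail table."""
    hits = sum(1 for q in log if q["storage"] == "Import")
    return hits / len(log)

print(f"aggregation hit rate: {hit_rate(query_log):.0%}")
```

A falling hit rate over time is the signal to add a new aggregation for whatever query pattern is reaching DirectQuery.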
Frequently Asked Questions
What are aggregations in Power BI and when should I use them?
Aggregations are pre-summarized Import mode tables that Power BI automatically queries instead of hitting large DirectQuery detail tables. Example: a 10 billion row FactSales table (DirectQuery) paired with a 100 million row AggSales_Daily table (Import, daily granularity). When a user queries sales by month, Power BI routes to AggSales_Daily (a fast Import query) instead of FactSales; when the user drills to transaction detail, Power BI automatically switches to FactSales over DirectQuery. Use aggregations when: (1) fact tables exceed 1 billion rows, making full Import impractical; (2) most queries are summary level (roughly 80% monthly/daily aggregates, 20% detail); (3) DirectQuery performance is unacceptable (multi-second visual rendering). Setup: create an aggregation table matching the fact table schema but summarized (SUM, COUNT, MIN, MAX), set its storage mode to Import, and define aggregation mappings (GroupBy columns map to the dimension keys; the Detail Table points to the source). Query routing is automatic - no DAX changes are needed. Performance improvement is typically 10-100x for aggregate queries; the cost is aggregation storage (typically 1-5% of the detail table size) plus refresh time for the aggregation tables. Best practice: start with date-granularity aggregations (daily, monthly), add product/customer aggregations based on query patterns, and monitor with Performance Analyzer to verify queries hit the aggregations rather than the detail table.
How do I create aggregation tables in Power BI composite models?
Aggregation creation workflow: (1) identify slow DirectQuery tables and common query patterns via the Premium metrics app; (2) create the aggregation table in the source database or Power Query - for example, SELECT Date, ProductKey, CustomerKey, SUM(Amount) AS TotalAmount FROM FactSales GROUP BY Date, ProductKey, CustomerKey; (3) load the aggregation as an Import mode table in the Power BI model; (4) configure aggregation mappings via Model view → Manage aggregations: select the aggregation table and map its columns to the detail table (GroupBy for dimension keys, Sum/Count/Min/Max for measures, with FactSales as the detail table); (5) set the detail table to DirectQuery storage mode; (6) test - queries should automatically use the aggregation when applicable. Multiple aggregation layers are possible - AggSales_Daily (Import), AggSales_Monthly (Import), FactSales (DirectQuery) - and Power BI chooses the smallest aggregation that can satisfy each query. Automatic vs manual: Power BI can create aggregations automatically in some scenarios (for example, automatic aggregations in Premium and Direct Lake in Fabric); manual creation gives more control. Validation: use DAX Studio or Performance Analyzer to verify the query plan shows aggregation table usage. Common mistakes: (1) aggregation grain too detailed (daily when monthly suffices), wasting Import storage; (2) missing columns - a query cannot use the aggregation if it filters on columns not present in it; (3) incorrect mappings - GroupBy vs Sum confusion causes wrong results. Testing: compare query results with aggregations enabled vs disabled - they should be identical, just faster. Document the aggregation strategy for maintenance, so future developers know which tables are aggregations.
What is the difference between composite models and dual storage mode?
Composite models allow mixing storage modes (Import, DirectQuery, Dual) within a single Power BI dataset; previously, all tables had to use the same mode. Dual is a per-table storage setting within a composite model. A Dual table acts as Import when queried alone or with other Import/Dual tables, and acts as DirectQuery when queried alongside DirectQuery tables. The main use case is dimension tables: set them to Dual so they join to Import fact tables as Import and to DirectQuery fact tables as DirectQuery. Example: DimProduct (Dual) + FactSales_Summary (Import) + FactSales_Detail (DirectQuery). For a sales summary by product, DimProduct acts as Import (joining to FactSales_Summary); drilling to detail, DimProduct acts as DirectQuery (joining to FactSales_Detail). Benefits: (1) dimension tables are loaded once (they are small compared to facts), (2) flexible join behavior, (3) reduced model complexity. Gotcha: Dual tables count toward Import storage and refresh time - they are not free. Use Dual for shared dimension tables in composite models with mixed storage modes; use DirectQuery for dimensions only when they are larger than capacity memory or require real-time updates. Best practice: dimension tables in Dual mode, fact tables in Import (aggregations) or DirectQuery (detail), with relationships between the facts and Dual dimensions working seamlessly. The alternative - separate Import and DirectQuery models - is a cleaner architecture but requires two datasets and limits cross-querying. Composite models with Dual storage mode provide the best of both worlds: a unified model with flexible performance optimization.