
Power BI Performance Tuning: VertiPaq Analyzer Deep Dive
Master Power BI performance tuning with VertiPaq Analyzer. Learn how the columnar storage engine works, analyze dictionary size, data size, hierarchy size, identify bloated columns, understand cardinality analysis, encoding types (value vs hash vs RLE), and apply optimization techniques for model size reduction.
<h2>What Is VertiPaq and Why It Matters</h2>
<p>VertiPaq is the in-memory columnar storage engine that powers every Import mode semantic model in Power BI, Analysis Services, and Microsoft Fabric. When you import data into a Power BI model, VertiPaq does not store it in rows like a traditional database. Instead, it stores each column independently, applies sophisticated compression algorithms, and builds internal data structures (dictionaries, hierarchies, and relationship indexes) that enable the DAX formula engine to scan billions of values in milliseconds.</p>
<p>Understanding how VertiPaq works is the difference between building a 200 MB model that refreshes in 2 minutes and responds to queries in under a second, versus building a 4 GB model from the same data that takes 30 minutes to refresh and generates 10-second query times. The storage engine determines everything: model size, refresh duration, query performance, and memory consumption on shared capacity. Our <a href="/services/dax-optimization">DAX optimization</a> services always begin with VertiPaq analysis because model structure problems cannot be solved by DAX tuning alone.</p>
<p>VertiPaq Analyzer is the diagnostic tool that makes the internal storage structures visible. It reports the actual memory consumption of every column, table, and relationship in your model, broken down by component type (dictionary, data, hierarchy). This visibility enables targeted optimization: instead of guessing which columns are causing bloat, you can identify the exact columns consuming the most memory and apply the appropriate compression technique.</p>
<h2>How the Columnar Storage Engine Works</h2>
<p>To optimize VertiPaq effectively, you need to understand the three fundamental components of columnar storage: dictionaries, data segments, and hierarchies.</p>
<h3>Dictionaries</h3>
<p>Every column in a VertiPaq model has a dictionary: a sorted, deduplicated list of all unique values in that column. If a column contains 10 million rows but only 500 unique values (like a Status column with values "Open", "Closed", "Pending", etc.), the dictionary stores only 500 entries. The actual column data then stores integer indexes into this dictionary rather than the full values, achieving massive compression.</p>
<p>Dictionary size is determined by two factors: the number of unique values (cardinality) and the size of each unique value (data type and string length). A column with 1,000 unique short strings ("USA", "UK", "DE") produces a small dictionary. A column with 5 million unique GUIDs or long URLs produces a massive dictionary that may dominate the model's memory footprint.</p>
<h3>Data Segments</h3>
<p>VertiPaq divides each column into segments of approximately 1 million rows (the exact segment size varies). Within each segment, the engine stores dictionary index values using one of several encoding schemes chosen based on the data distribution:</p>
<ul> <li><strong>Value Encoding</strong>: For numeric columns where the actual values can be stored more efficiently than dictionary references. VertiPaq applies a mathematical transformation (subtracting the minimum value and dividing by the GCD of all values) to produce small integers that compress well. Value encoding bypasses the dictionary entirely and is the most memory-efficient encoding.</li> <li><strong>Hash Encoding</strong>: For columns where value encoding is not applicable or not efficient (strings, high-cardinality numerics). The dictionary stores the unique values, and the data segment stores integer indexes into the dictionary. Hash encoding is less efficient than value encoding because it requires both dictionary and data segment storage.</li> <li><strong>Run-Length Encoding (RLE)</strong>: Applied on top of value or hash encoding when consecutive rows contain the same value. Instead of storing the value N times, RLE stores the value once plus a count. Sort order dramatically impacts RLE effectiveness: a column sorted so that identical values are adjacent compresses much better than a randomly ordered column.</li> </ul>
<h3>Hierarchies and Relationship Indexes</h3>
<p>VertiPaq builds two further structures beyond dictionaries and data segments. For each column it creates an attribute hierarchy, a sorted index of the column's unique values that accelerates filtering, grouping, and sorting; its size grows with the column's cardinality. For every relationship in the model, VertiPaq builds a separate relationship index that maps key values on the one side to the matching rows on the many side. When a slicer on the Product table filters the Sales fact table, VertiPaq uses this index to identify the matching Sales rows without scanning every row. Relationship index size is proportional to the cardinality of the key columns involved and the number of distinct key combinations in the data.</p>
<h2>Running VertiPaq Analyzer</h2>
<p>VertiPaq Analyzer is available through several tools:</p>
<ul> <li><strong>DAX Studio</strong> (free, open-source): The most popular tool for running VertiPaq Analyzer. Connect to your model (local Power BI Desktop instance or XMLA endpoint), then select Advanced > View Metrics to generate the VertiPaq analysis. DAX Studio provides tabbed views for tables, columns, relationships, and partitions.</li> <li><strong>Tabular Editor 3</strong> (commercial): Includes an integrated VertiPaq Analyzer accessible through the model explorer. Provides the same metrics as DAX Studio with the additional convenience of being within the model development environment. See our <a href="/blog/power-bi-tabular-editor-model-development-guide-2026">Tabular Editor guide</a> for setup details.</li> <li><strong>DMV Queries</strong>: Query the VertiPaq metadata directly using DAX Studio or SSMS through Dynamic Management Views (DMVs). The key DMVs are DISCOVER_STORAGE_TABLE_COLUMNS, DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS, and DISCOVER_STORAGE_TABLES.</li> </ul>
<h3>Key VertiPaq Analyzer Metrics</h3>
<p>The VertiPaq Analyzer output contains several critical metrics for each column:</p>
<ul> <li><strong>Dictionary Size</strong>: Memory consumed by the column's dictionary (unique value list). Large dictionaries indicate high cardinality or large individual values (long strings).</li> <li><strong>Data Size</strong>: Memory consumed by the encoded column data segments. Large data size relative to row count indicates poor compression, typically caused by high cardinality preventing effective RLE.</li> <li><strong>Hierarchy Size</strong>: Memory consumed by the attribute hierarchy VertiPaq builds for the column to support filtering and sorting. Hierarchy size grows with column cardinality; relationship sizes are reported separately on the Relationships tab.</li> <li><strong>Total Size</strong>: Dictionary Size + Data Size + Hierarchy Size. The total memory footprint of the column.</li> <li><strong>Cardinality</strong>: The number of unique values in the column. The single most important metric for predicting compression efficiency.</li> <li><strong>Encoding</strong>: Whether the column uses Value encoding or Hash encoding. Value-encoded columns are more memory efficient.</li> <li><strong>Rows</strong>: Total row count for the table (same for all columns in a table).</li> <li><strong>Table Size</strong>: Aggregate of all column sizes plus table-level overhead.</li> </ul>
<h2>Identifying Bloated Columns</h2>
<p>The primary goal of VertiPaq analysis is identifying columns that consume disproportionate memory relative to their analytical value. A "bloated" column is one where the memory cost exceeds the business benefit of including it in the model.</p>
<h3>Common Bloat Patterns</h3>
<ul> <li><strong>High-Cardinality Text Columns</strong>: Columns like description fields, comments, email addresses, full names, or free-text notes that have unique or near-unique values for every row. These columns produce massive dictionaries (one entry per row) and minimal RLE compression. Unless the column is actively used in filters, grouping, or display, it should be removed from the model.</li> <li><strong>DateTime Columns with Time Precision</strong>: A DateTime column with time-of-day precision (e.g., "2026-02-24 14:32:17.483") has extremely high cardinality because every timestamp is effectively unique. This prevents both dictionary compression and RLE. The solution is to split the DateTime into a Date column (low cardinality: ~365 values per year) and a Time column (or hour bucket column) if time analysis is needed.</li> <li><strong>GUID/UUID Columns</strong>: Technical identifier columns (transaction IDs, correlation IDs, system-generated keys) with 100% unique values. These are the worst case for VertiPaq compression: every value is unique, every dictionary entry is a long string, and RLE provides zero benefit. Remove GUIDs from the model and use integer surrogate keys instead.</li> <li><strong>Concatenated Key Columns</strong>: Columns created by concatenating multiple fields (e.g., "USA-2026-A1234-XYZ") for relationship purposes. The concatenation produces high cardinality and long string values. Use integer keys or create <a href="/blog/power-bi-star-schema">star schema</a> bridge tables instead.</li> <li><strong>Unused Columns</strong>: Columns imported from the source but never referenced in any measure, visual, filter, or relationship. These consume memory for zero analytical benefit. VertiPaq Analyzer combined with <a href="/blog/power-bi-tabular-editor-model-development-guide-2026">Tabular Editor's</a> Best Practice Analyzer identifies unused columns automatically.</li> </ul>
<h3>Bloat Analysis DAX Query</h3>
<p>Use this DAX query in DAX Studio to identify the largest columns in your model:</p>
<pre><code>// Query the DISCOVER_STORAGE_TABLE_COLUMNS DMV
// Run in DAX Studio connected to your model
SELECT
    [TABLE_ID],
    [COLUMN_ID],
    [DICTIONARY_SIZE],
    [COLUMN_ENCODING],
    [COLUMN_CARDINALITY]
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
ORDER BY [DICTIONARY_SIZE] DESC</code></pre>
<h2>Cardinality Analysis</h2>
<p>Cardinality (the number of unique values in a column) is the single most important factor in VertiPaq compression efficiency. Low-cardinality columns compress extremely well; high-cardinality columns compress poorly.</p>
<h3>Cardinality Tiers and Impact</h3>
<table> <tr><th>Cardinality Level</th><th>Unique Values</th><th>Compression</th><th>Example</th></tr> <tr><td>Very Low</td><td>&lt; 100</td><td>Excellent (95%+ compression)</td><td>Status, Category, Country</td></tr> <tr><td>Low</td><td>100 - 10,000</td><td>Very Good (85-95%)</td><td>Product, City, Account</td></tr> <tr><td>Medium</td><td>10,000 - 500,000</td><td>Good (60-85%)</td><td>Customer ID, Date</td></tr> <tr><td>High</td><td>500,000 - 5,000,000</td><td>Poor (20-60%)</td><td>Transaction ID, Timestamp</td></tr> <tr><td>Very High</td><td>&gt; 5,000,000</td><td>Minimal (&lt;20%)</td><td>GUID, Full DateTime, URL</td></tr> </table>
<p>The relationship between cardinality and model size is superlinear. Doubling the cardinality of a column does not merely double its memory consumption; it can increase it by 3-5x, because higher cardinality simultaneously enlarges the dictionary, widens the per-row index, and degrades RLE compression.</p>
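<p>One driver of this effect is the bits-per-index cost: storing a dictionary index takes roughly ceil(log2(cardinality)) bits per row before RLE, so cardinality growth raises per-row cost on top of the larger dictionary. A back-of-envelope sketch (illustrative only; real segment storage is more sophisticated):</p>

```python
import math

def index_bits(cardinality):
    """Minimum bits needed to address a dictionary of this size (illustrative)."""
    return max(1, math.ceil(math.log2(cardinality)))

rows = 10_000_000
for cardinality in (100, 10_000, 1_000_000):
    bits = index_bits(cardinality)
    mb = rows * bits / 8 / 1024 / 1024   # raw index storage, before RLE
    print(f"{cardinality:>9,} unique values -> {bits:>2} bits/row -> {mb:.1f} MB")
```

Going from 100 to 1,000,000 unique values roughly triples the per-row index cost, before accounting for the far larger dictionary and the loss of RLE runs.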
<h3>Cardinality Reduction Techniques</h3>
<ul> <li><strong>Remove unnecessary precision</strong>: Round decimal columns to required precision (2 decimal places instead of 15). Replace DateTime with Date. Replace timestamps with hour or minute buckets.</li> <li><strong>Replace strings with integers</strong>: Create integer surrogate keys in the data source and use them for relationships instead of natural string keys.</li> <li><strong>Bin continuous values</strong>: Replace exact numeric values with ranges ("0-100", "101-500", "501+") when exact values are not needed for analysis.</li> <li><strong>Split compound columns</strong>: Separate "City, State, ZIP" into individual columns. Each has lower cardinality than the combined string.</li> <li><strong>Filter before import</strong>: Remove historical data beyond the analysis window. If you only analyze 3 years of data, do not import 10 years.</li> </ul>
<h2>Encoding Types Deep Dive</h2>
<h3>Value Encoding</h3>
<p>Value encoding is VertiPaq's most efficient encoding. It applies to integer and decimal columns where the engine determines it can store the actual values (after mathematical transformation) more efficiently than dictionary references. The transformation works as follows:</p>
<ol> <li>Find the minimum value in the column</li> <li>Subtract the minimum from all values (rebase to zero)</li> <li>Find the Greatest Common Divisor (GCD) of all rebased values</li> <li>Divide all rebased values by the GCD</li> <li>Store the resulting small integers using minimal bits per value</li> </ol>
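<p>The five steps can be sketched in a few lines of Python (an illustration of the arithmetic, not the engine's actual heuristics):</p>

```python
import math
from functools import reduce

def value_encode(values):
    """Rebase to zero, divide by the GCD, report bits per value (sketch only)."""
    minimum = min(values)
    rebased = [v - minimum for v in values]          # step 1-2: rebase to zero
    gcd = reduce(math.gcd, rebased)                  # step 3: GCD of rebased values
    scaled = [v // gcd for v in rebased] if gcd else rebased   # step 4: divide by GCD
    bits = max(scaled).bit_length() or 1             # step 5: bits needed per value
    return minimum, gcd, scaled, bits

# Prices in whole cents, all multiples of 25, starting at 1000
prices = [1000, 1025, 1100, 1250, 1075]
minimum, gcd, scaled, bits = value_encode(prices)
print(minimum, gcd, scaled, bits)  # 1000 25 [0, 1, 4, 10, 3] 4
```

Five 4-digit prices collapse to integers that fit in 4 bits each, plus the stored minimum and GCD needed to reverse the transformation.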
<p>Value encoding works best for integer columns with limited range and regular spacing (like fiscal year, month number, quantity in whole units). It works poorly for decimal columns with many significant digits or columns with large value ranges.</p>
<h3>Hash Encoding</h3>
<p>Hash encoding stores a dictionary of unique values and replaces each row value with an index into the dictionary. This is the default encoding for string columns and numeric columns where value encoding is not efficient. Hash encoding performance depends on dictionary size (determined by cardinality) and the effectiveness of RLE compression on the index values (determined by sort order).</p>
<h3>Run-Length Encoding (RLE)</h3>
<p>RLE is applied as a secondary compression on top of value or hash encoding. When consecutive rows contain the same encoded value, RLE stores the value once with a repetition count instead of storing it N times. The effectiveness of RLE depends entirely on the physical sort order of rows within each segment.</p>
<p>Consider a Sales table with 10 million rows and a Country column with 50 unique values. If the table is sorted by Country, consecutive rows have the same Country value, and RLE compresses the column to essentially 50 entries. If the table is sorted by TransactionDate (or unsorted), Country values alternate randomly across rows, and RLE provides minimal compression. The same column with the same data can vary by 100x in memory consumption depending on sort order.</p>
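<p>The sort-order effect is easy to demonstrate with a toy run counter (Python for illustration, not the engine's actual RLE):</p>

```python
import random

def run_count(column):
    """Number of RLE runs: each run stores one value plus a repeat count."""
    if not column:
        return 0
    runs = 1
    for prev, cur in zip(column, column[1:]):
        if cur != prev:
            runs += 1
    return runs

random.seed(42)
countries = [random.choice(["USA", "UK", "DE", "FR", "JP"]) for _ in range(100_000)]

print(run_count(sorted(countries)))  # 5 runs: one per distinct country
print(run_count(countries))          # tens of thousands of runs: little compression
```

Sorted, the column stores five value-count pairs; randomly ordered, it needs roughly 80,000 runs for the same data, which is the 100x swing described above in miniature.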
<h3>Sort Order Optimization</h3>
<p>VertiPaq chooses a sort order for each segment automatically during processing, using a time-boxed heuristic that generally favors sorting by low-cardinality columns. You can influence sort order by:</p>
<ul> <li>Ensuring the source query sorts data by low-cardinality columns first</li> <li>Using a Power Query Sort Rows step before loading</li> <li>In Fabric/Analysis Services, adding an <code>ORDER BY</code> clause to partition source queries</li> </ul>
<p>The optimal sort order places low-cardinality columns first (maximizing RLE for the most compressible columns) and high-cardinality columns last (where RLE provides the least benefit regardless of sort order).</p>
<h2>Optimization Techniques</h2>
<h3>1. Remove High-Cardinality Columns</h3>
<p>The highest-impact optimization is removing columns that should not be in the model. Use VertiPaq Analyzer to identify the top 10 columns by total size, then evaluate each:</p>
<ul> <li>Is this column used in any visual, filter, slicer, or measure? If no, remove it.</li> <li>Is this a technical column (GUID, system ID, audit timestamp) needed only for ETL? Remove it from the model; keep it only in the source.</li> <li>Is this a free-text column (notes, descriptions, comments)? Move it to a detail table with row-level drill-through rather than importing into the main fact table.</li> </ul>
<h3>2. Split Date and Time</h3>
<p>DateTime columns are one of the most common sources of model bloat. A DateTime with second precision across 5 years of data has approximately 157 million unique values. Splitting this into a Date column (1,826 unique values) and a Time or Hour column (24 values for hour buckets, up to 86,400 for second-precision time) reduces cardinality by orders of magnitude.</p>
<pre><code>// Power Query: split DateTime into Date and Hour columns
let
    Source = Sql.Database("server", "database"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    AddDateColumn = Table.AddColumn(Sales, "OrderDate", each DateTime.Date([OrderDateTime]), type date),
    AddTimeColumn = Table.AddColumn(AddDateColumn, "OrderHour", each Time.Hour([OrderDateTime]), Int64.Type),
    RemoveOriginal = Table.RemoveColumns(AddTimeColumn, {"OrderDateTime"})
in
    RemoveOriginal</code></pre>
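<p>The cardinality arithmetic behind this split is easy to verify (a quick estimate that ignores leap days):</p>

```python
# Approximate unique-value counts for 5 years of data at different grains
years = 5
seconds_per_day = 86_400

datetime_seconds = years * 365 * seconds_per_day   # unique second-level timestamps
date_only = years * 365                            # unique dates
hour_buckets = 24                                  # unique hour values

print(f"{datetime_seconds:,} timestamps vs {date_only:,} dates + {hour_buckets} hours")
```

Roughly 157.7 million possible timestamps collapse into two columns with under 2,000 unique values combined.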
<h3>3. Implement Star Schema</h3>
<p>A proper <a href="/blog/power-bi-star-schema">star schema</a> design with narrow fact tables (integer keys and numeric measures only) and dimension tables (descriptive attributes with text and hierarchies) is the most fundamental optimization for VertiPaq. Fact table columns should have low cardinality relative to row count (integer foreign keys that reference dimension tables), enabling excellent compression. High-cardinality descriptive columns belong in dimension tables where row counts are smaller and the memory impact is proportionally reduced.</p>
<h3>4. Use Integer Keys for Relationships</h3>
<p>Replace string-based relationship keys with integer surrogate keys. An integer key column uses 4-8 bytes per dictionary entry versus 20-200+ bytes for a string key. For a fact table with 50 million rows and a relationship key with 100,000 unique values, switching from string to integer keys can reduce the relationship column size by 80-90%.</p>
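<p>A back-of-envelope estimate of the dictionary saving (the bytes-per-value figures are illustrative assumptions, not measured VertiPaq sizes):</p>

```python
def dictionary_bytes(unique_values, bytes_per_value):
    """Rough dictionary cost: one entry per unique value (illustrative)."""
    return unique_values * bytes_per_value

unique_keys = 100_000
string_key = dictionary_bytes(unique_keys, 50)   # assume a ~50-byte natural key
integer_key = dictionary_bytes(unique_keys, 8)   # 8-byte integer surrogate key

saving = 1 - integer_key / string_key
print(f"{string_key / 2**20:.1f} MB -> {integer_key / 2**20:.1f} MB "
      f"({saving:.0%} smaller)")
```

Under these assumptions the dictionary shrinks by about 84%, consistent with the 80-90% range above; the data segments shrink as well because shorter dictionaries compress indexes more tightly.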
<h3>5. Reduce Decimal Precision</h3>
<p>Decimal columns with unnecessary precision prevent value encoding and increase dictionary size. Round monetary values to 2 decimal places, percentages to 4 decimal places, and quantities to whole numbers where appropriate. In Power Query:</p>
<pre><code>// Round Amount to 2 decimal places
Table.TransformColumns(Source, {{"Amount", each Number.Round(_, 2), type number}})</code></pre>
<h3>6. Disable Auto Date/Time Tables</h3>
<p>Power BI Desktop creates hidden auto date/time tables for every date column in the model. Each auto date/time table contains a calendar spanning the entire date range of the column, with columns for Year, Quarter, Month, Day, and other date attributes. For models with many date columns, these hidden tables can consume significant memory. Disable auto date/time in Power BI Desktop options (File > Options > Current File > Data Load > uncheck "Auto date/time") and create a single explicit Date dimension table instead.</p>
<h3>7. Optimize Relationship Columns</h3>
<p>Relationship indexes consume memory proportional to the cardinality of the key columns on both sides of the relationship. To minimize relationship storage:</p>
<ul> <li>Use integer keys for relationships (smaller dictionary entries)</li> <li>Avoid relationships on high-cardinality columns when lower-cardinality alternatives exist</li> <li>Remove unused relationships (every relationship creates index structures even if never used in a query)</li> <li>Use single-direction relationships unless bidirectional filtering is explicitly required (bidirectional relationships create larger internal structures)</li> </ul>
<h2>Memory Compression Ratios and Benchmarks</h2>
<p>Understanding typical compression ratios helps you estimate model sizes before building and identify models that are compressing poorly.</p>
<h3>Typical Compression Benchmarks</h3>
<table> <tr><th>Source Data Size</th><th>Well-Optimized Model</th><th>Compression Ratio</th></tr> <tr><td>1 GB CSV</td><td>50-150 MB</td><td>7-20x</td></tr> <tr><td>10 GB SQL Table</td><td>500 MB - 2 GB</td><td>5-20x</td></tr> <tr><td>100 GB Data Warehouse</td><td>5-15 GB</td><td>7-20x</td></tr> </table>
<p>Models compressing at less than 5x their source data size likely have optimization opportunities. Models compressing at 10-20x are well-optimized. Compression above 20x is achievable for data with very low cardinality (aggregated data, summary tables).</p>
<h3>Model Size Reduction Strategies Summary</h3>
<table> <tr><th>Technique</th><th>Typical Size Reduction</th><th>Effort</th></tr> <tr><td>Remove unused columns</td><td>10-40%</td><td>Low</td></tr> <tr><td>Split DateTime columns</td><td>5-25%</td><td>Low</td></tr> <tr><td>Replace string keys with integers</td><td>10-30%</td><td>Medium</td></tr> <tr><td>Implement star schema</td><td>20-50%</td><td>High</td></tr> <tr><td>Remove high-cardinality text columns</td><td>15-40%</td><td>Low</td></tr> <tr><td>Reduce decimal precision</td><td>5-15%</td><td>Low</td></tr> <tr><td>Disable auto date/time</td><td>5-20%</td><td>Low</td></tr> <tr><td>Optimize sort order for RLE</td><td>5-20%</td><td>Medium</td></tr> </table>
<h2>Advanced VertiPaq Analysis Patterns</h2>
<h3>Segment-Level Analysis</h3>
<p>For very large models, segment-level analysis reveals compression efficiency within individual column segments. A column might have good average compression but one or two segments with poor compression due to data distribution anomalies. Use the DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS DMV to examine per-segment statistics.</p>
<h3>Relationship Cost Analysis</h3>
<p>Relationship sizes are often overlooked because relationships do not appear as visible columns in the model. VertiPaq Analyzer reports them separately on the Relationships tab. For models with many relationships (20+ tables in a star/snowflake schema), relationship structures can consume 10-30% of total model memory. Optimize by reducing the cardinality of relationship key columns and removing unused relationships.</p>
<h3>Partition Analysis</h3>
<p>For models using <a href="/blog/power-bi-incremental-refresh-advanced-patterns-2025">incremental refresh</a>, VertiPaq Analyzer shows per-partition statistics. This reveals whether historical partitions are well-compressed (they should be, since data does not change) and whether the active partition is appropriately sized. Partitions that are too large (more than 10 million rows) may benefit from finer granularity.</p>
<h2>Implementation Workflow</h2>
<h3>Step 1: Baseline Analysis (1-2 hours)</h3>
<ul> <li>Connect DAX Studio to the target model</li> <li>Run VertiPaq Analyzer (Advanced > View Metrics)</li> <li>Export results to Excel for analysis</li> <li>Sort columns by Total Size descending to identify the top memory consumers</li> <li>Document baseline model size, top 20 columns by size, and overall compression ratio</li> </ul>
<h3>Step 2: Identify Optimization Targets (2-4 hours)</h3>
<ul> <li>Flag columns with cardinality greater than 500,000 for review</li> <li>Identify columns with dictionary size greater than 10 MB</li> <li>Check for DateTime columns that should be split</li> <li>Identify string relationship keys that should be integer keys</li> <li>Cross-reference with <a href="/blog/power-bi-tabular-editor-model-development-guide-2026">Tabular Editor BPA</a> for unused columns</li> </ul>
<h3>Step 3: Implement Optimizations (1-2 weeks)</h3>
<ul> <li>Remove unused columns from the model</li> <li>Modify Power Query transformations to split DateTimes, round decimals, and replace string keys</li> <li>Restructure source queries to implement star schema where applicable</li> <li>Disable auto date/time and create explicit date dimension</li> <li>Configure <a href="/services/power-bi-architecture">Power BI architecture</a> for optimal data model design</li> </ul>
<h3>Step 4: Validate Results (2-4 hours)</h3>
<ul> <li>Run VertiPaq Analyzer on the optimized model</li> <li>Compare Total Size, Dictionary Size, and Data Size against baseline</li> <li>Verify query performance improvement using <a href="/blog/power-bi-report-performance-analyzer-diagnostics-2026">Performance Analyzer</a></li> <li>Document the optimizations applied and resulting size/performance improvements</li> </ul>
<h2>ROI and Business Impact</h2>
<p>VertiPaq optimization delivers measurable business value:</p>
<ul> <li><strong>Reduced capacity costs</strong>: Smaller models consume less Premium/Fabric capacity. A 50% model size reduction can enable more models on existing capacity or allow downsizing to a lower SKU, saving $1,000-$10,000+ per month.</li> <li><strong>Faster refresh times</strong>: Smaller, well-compressed models refresh 2-5x faster. A model that refreshed in 30 minutes may refresh in 10 minutes after optimization, expanding refresh scheduling flexibility.</li> <li><strong>Better query performance</strong>: Compressed data scans faster. Users experience 2-10x faster report rendering, improving adoption and satisfaction.</li> <li><strong>Larger data volumes</strong>: Optimization enables importing more data within the same capacity constraints. A 1 GB model limit becomes effectively 3-5 GB of source data with proper compression.</li> <li><strong>Reduced gateway load</strong>: Faster refreshes and smaller data transfers reduce on-premises data gateway load, improving reliability for all connected models.</li> </ul>
<p><a href="/contact">Contact EPC Group</a> to optimize your Power BI models with VertiPaq analysis. Our <a href="/services/dax-optimization">DAX optimization</a> and <a href="/services/power-bi-architecture">Power BI architecture</a> teams perform comprehensive model assessments including VertiPaq analysis, cardinality optimization, star schema design, and performance benchmarking. We help organizations reduce model sizes by 40-70%, accelerate refresh times, and improve query performance for enterprise-scale deployments on <a href="/services/microsoft-fabric">Microsoft Fabric</a> and Power BI Premium.</p>
Frequently Asked Questions
What is the difference between VertiPaq and DirectQuery in Power BI?
VertiPaq (Import mode) and DirectQuery are two fundamentally different storage modes in Power BI. VertiPaq loads data into memory during refresh, compresses it using columnar storage with dictionaries and run-length encoding, and serves all queries from the in-memory cache. This provides the fastest query performance (sub-second for most queries) but requires periodic data refresh and consumes memory proportional to the data volume. DirectQuery does not store data in the Power BI model at all. Instead, every visual interaction generates a SQL query that runs against the source database in real time. DirectQuery provides always-current data without refresh scheduling but query performance depends entirely on the source database performance, which is typically 5-50x slower than VertiPaq for analytical queries. Composite models combine both modes: high-volume fact tables use DirectQuery while dimension tables and aggregation tables use VertiPaq Import mode, providing a balance of performance and data freshness. Direct Lake mode in Microsoft Fabric represents a third approach where data is stored in OneLake Delta tables and loaded into VertiPaq on demand without explicit refresh. Understanding these modes is critical for choosing the right architecture for your data volume, freshness requirements, and performance targets.
How do I identify which columns are consuming the most memory in my Power BI model?
Connect DAX Studio to your Power BI model (either a local Power BI Desktop instance or a published dataset via XMLA endpoint), then navigate to Advanced and select View Metrics. This runs VertiPaq Analyzer and displays a comprehensive breakdown of memory consumption by table and column. The Columns tab shows every column with its Dictionary Size, Data Size, Hierarchy Size, Total Size, Cardinality, and Encoding type. Sort by Total Size descending to identify the largest columns. Typically, 80 percent of model memory is consumed by 20 percent of columns. The largest columns are almost always high-cardinality text columns (descriptions, GUIDs, URLs, free-text fields), DateTime columns with time precision, or columns used in relationships with high cardinality on both sides. Export the VertiPaq Analyzer results to Excel for deeper analysis. You can also run DMV queries directly: SELECT TABLE_ID, COLUMN_ID, DICTIONARY_SIZE, COLUMN_CARDINALITY FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS ORDER BY DICTIONARY_SIZE DESC. Tabular Editor 3 also includes an integrated VertiPaq Analyzer that shows the same metrics within the model development environment.
Why is my Power BI model so much larger than the source data?
A Power BI model can be larger than source data in several scenarios. First, auto date/time tables: Power BI Desktop creates hidden date tables for every date column, each containing calendar data spanning the full date range. A model with 10 date columns generates 10 hidden date tables, potentially adding hundreds of megabytes. Disable auto date/time in Options and create a single explicit date dimension. Second, relationship hierarchies: every relationship creates internal VertiPaq hierarchy indexes that consume memory beyond the visible column data. Models with many relationships (20+ tables) can have substantial hierarchy overhead. Third, data type expansion: source databases may store values as compressed formats (such as tinyint or smallint) while Power BI promotes them to Int64, increasing per-value storage. Fourth, string duplication: if the same text values appear in multiple columns across tables (like country names in both Customer and Supplier tables), each column maintains its own dictionary. Fifth, poor source query design: if Power Query loads more columns or rows than needed, or if transformations create intermediate high-cardinality columns, the model inflates unnecessarily. Run VertiPaq Analyzer to identify the specific columns and components consuming the most memory, then apply targeted optimizations.
What is a good compression ratio for a Power BI model?
A well-optimized Power BI model typically achieves 7-20x compression relative to the source data size (measured as the raw CSV or uncompressed database table size). This means 1 GB of source data should produce a 50-150 MB Power BI model. Compression ratios above 10x indicate excellent model design with low-cardinality columns, proper star schema structure, and efficient data types. Ratios of 5-7x are acceptable but suggest optimization opportunities. Ratios below 5x indicate significant problems: high-cardinality text columns, DateTime columns with time precision, GUID columns, or missing star schema design. Some specific benchmarks: fact tables with integer keys and numeric measures typically compress at 15-25x. Dimension tables with text descriptions compress at 5-10x. Date tables compress at 20-50x due to very low cardinality. Models with large text columns or GUIDs may compress at only 2-3x. The compression ratio depends heavily on data characteristics, so compare your model against similar models rather than absolute thresholds. EPC Group performs model assessments that benchmark your compression against industry standards for your data profile and recommend specific optimizations to achieve target ratios.
How often should I run VertiPaq Analyzer on my Power BI models?
Run VertiPaq Analyzer at four key points in the model lifecycle. First, during initial development: analyze the model after building the data model and before creating reports to identify structural issues early when they are cheapest to fix. Second, after significant data growth: when source data volumes increase substantially (doubling row counts, adding new tables, or expanding date ranges), re-analyze to verify that compression remains efficient and that new data patterns have not introduced bloat. Third, before capacity upgrades: if you are considering upgrading Premium capacity due to memory pressure, run VertiPaq Analyzer on your largest models first. Optimization may eliminate the need for a capacity upgrade, saving thousands of dollars monthly. Fourth, as part of regular governance: for enterprise deployments with dozens of published models, schedule quarterly VertiPaq analysis of the top 10 models by size as part of your Power BI Center of Excellence governance cadence. Track model sizes over time to catch gradual bloat before it impacts capacity. Automate the analysis by incorporating VertiPaq DMV queries into a monitoring Power BI report that tracks model sizes and compression ratios across your tenant.