Quick Answer
Incremental refresh lets Power BI scale to billion-row fact tables by partitioning on a date column and refreshing only recent partitions. Hybrid tables add a DirectQuery partition for real-time current data. Together they handle the largest analytical workloads on Power BI Premium or Fabric F SKUs. The critical requirement is query folding to the source.
1. Incremental Refresh Basics
Three Power Query steps enable incremental refresh:
- Create DateTime parameters called RangeStart and RangeEnd.
- Filter the source query: Date column >= RangeStart AND Date column < RangeEnd.
- In Power BI Desktop, right-click the table and select Incremental Refresh. Configure the policy: store N years of data, refresh last N days.
// Power Query M example
let
Source = Sql.Database(ServerName, DatabaseName),
Sales = Source{[Schema="dbo",Item="FactSales"]}[Data],
FilteredByDate = Table.SelectRows(
Sales,
each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
)
in
FilteredByDateAlways verify that the filter folds to the source by right-clicking the filter step and selecting View Native Query.
2. Policy Design Patterns
- Store 5 years, refresh last 30 days: typical for sales, finance, operational facts with slowly changing historical data.
- Store 10 years, refresh last 7 days, detect data changes: for compliance-driven retention where you keep a decade of history but need to detect backdated adjustments.
- Store 2 years, refresh last 24 hours, hybrid DirectQuery: for operational dashboards that need both history and real-time state.
- Store 90 days, refresh last 4 hours: for high-frequency operational data that does not need long retention.
3. Query Folding Requirements
Query folding is the translation of Power Query M steps into native SQL (or equivalent) that runs on the source. Incremental refresh requires folding, because otherwise Power BI pulls the entire table and filters in the mashup engine.
Common folding killers
- Adding custom columns with M functions not supported by the source.
- Merging with a table from a different source.
- Using Table.Buffer() which materializes the table in memory.
- Index-based row operations that require materialization.
Push transformation logic into views or stored procedures at the source when possible. A well-designed source view that exposes exactly the shape needed for Power BI is the most reliable way to guarantee folding.
4. Hybrid Tables: Real-Time Current Partition
Hybrid tables extend incremental refresh with a DirectQuery partition for the most recent data. In the incremental refresh dialog, check "Get the latest data in real time with DirectQuery (Premium only)." The most recent partition (typically today or this week) becomes a DirectQuery partition while older partitions remain imported.
At query time, Power BI combines imported partitions and the DirectQuery partition to produce a single result. Users see historical data blazingly fast (from memory) and current data up-to-the-second (from the live source). This pattern replaces the old technique of having two separate datasets (one history, one current) and manually union-ing them.
5. Handling the Initial Load
The first refresh populates all historical partitions. For multi-billion-row tables, this can take 6 to 24 hours. Strategies:
- Temporary capacity upgrade: scale to a larger SKU during initial load, then scale back down. F SKUs support fast scale operations.
- XMLA backfill: use SSMS or Tabular Editor to create partitions manually and issue Process commands partition-by-partition with controlled parallelism. This allows scheduling partition refreshes during off-peak hours over multiple days.
- Parameterized starter data: load a recent 90-day window initially, publish the model, then backfill historical partitions via XMLA over subsequent days.
6. Monitoring Incremental Refresh
- Enable incremental refresh diagnostics to capture per-partition refresh times.
- Track partition row counts over time. Sudden drops can indicate query folding regression or source data changes.
- Monitor detect-data-changes skip rate. A high skip rate confirms the feature is working; a low skip rate suggests ModifiedDate tracking is broken.
- For hybrid tables, monitor DirectQuery partition query count and latency. Unexpectedly high query counts can indicate the DirectQuery partition is too large and should be split.
Frequently Asked Questions
What is incremental refresh in Power BI?
Incremental refresh automatically partitions a table by date and refreshes only the partitions that have new or changed data. Older partitions stay untouched, saving refresh time and source database load. A common policy is to store 5 years of data and refresh only the last 30 days. The first refresh loads all 5 years; subsequent refreshes process only the last 30 days. Incremental refresh scales Power BI models to hundreds of millions or billions of rows with manageable refresh windows.
What is a hybrid table?
A hybrid table combines incremental refresh with a DirectQuery partition for the most recent data. Historical partitions are imported (fast queries), while the newest data lives in a DirectQuery partition that always reflects the source in real time. This gives you fast historical analytics and real-time current data in a single table. Hybrid tables are ideal for operational dashboards that need both history and live state.
What are RangeStart and RangeEnd parameters?
RangeStart and RangeEnd are special datetime parameters in Power Query that Power BI injects when evaluating incremental refresh partitions. Your source query filters on these parameters: WHERE [OrderDate] >= RangeStart AND [OrderDate] < RangeEnd. At design time, you set them to a small window to test. At refresh time, Power BI sets them to the boundaries of each partition being refreshed. The parameters must be called exactly RangeStart and RangeEnd and be of type DateTime.
Does incremental refresh require query folding?
Yes, strictly. Power BI generates a SQL query (or source-native query) that includes the RangeStart/RangeEnd filter. If the filter does not fold to the source, Power BI will pull the entire table and filter in the mashup engine, defeating the purpose of incremental refresh. Always verify folding by right-clicking the filter step in Power Query and selecting View Native Query. If View Native Query is grayed out, folding is broken and must be fixed before enabling incremental refresh.
How many partitions should I use?
The partition grain should match your refresh frequency and historical retention. For a 5-year dataset refreshed daily, 60 month partitions plus daily refresh of the last 30 days is typical. For a 3-year dataset refreshed hourly, hourly partitions for the last 24 hours and daily partitions for older data. Too few partitions miss the efficiency gain. Too many (more than 1,000) slow down refresh metadata operations. Aim for 50 to 500 partitions in most deployments.
Can I detect data changes with incremental refresh?
Yes. The Detect Data Changes option checks a column (usually ModifiedDate) before refreshing a partition. If no rows in the partition have changed since the last refresh, Power BI skips it entirely. This further reduces refresh time for slowly changing historical data. The column must be indexed in the source for detect-data-changes to be efficient. Without an index, the detect query scans the full partition and negates the benefit.
How do I handle backfill for existing data?
The first refresh after enabling incremental refresh loads all historical partitions. For very large tables (multi-billion rows), this initial load can take many hours. Two strategies: first, temporarily increase capacity SKU during initial load then scale down afterward; second, use an external process to pre-populate partitions via XMLA endpoint and TMSL commands, then enable automatic incremental refresh only for ongoing updates. The second approach is standard for 5-billion-row and larger tables.
How do hybrid tables compare to Direct Lake?
Both provide fresh data without scheduled refreshes. Direct Lake is preferred when your data lives in OneLake and you can adopt Fabric architecture. Hybrid tables are preferred when your data lives in an on-premises or non-Fabric source and you need fresh data without moving the source. Hybrid tables also work with Power BI Premium capacities that do not have Direct Lake. For greenfield Fabric deployments, Direct Lake is almost always the better choice.
Scaling Power BI to Billion-Row Tables?
Our consultants design partitioning strategies, hybrid tables, and backfill patterns for the largest analytic workloads. Contact us for a scale assessment.