Monitoring Fabric with Capacity Metrics

Track Microsoft Fabric capacity utilization and identify performance bottlenecks with dashboard monitoring, alerts, and optimization recommendations.


The Microsoft Fabric Capacity Metrics app is the single most important monitoring tool for Fabric administrators. Without it, you are operating blind—unable to determine whether your capacity is right-sized, which workloads consume the most resources, when throttling occurs, or whether individual users or jobs are monopolizing shared compute. The app provides a pre-built Power BI report that visualizes capacity utilization across all Fabric workloads (Power BI, Data Engineering, Data Factory, Real-Time Analytics, Data Science) with historical trends, item-level detail, and throttling analysis. Mastering this app is essential for cost optimization, performance troubleshooting, and capacity planning.

Understanding Fabric Capacity Units

All Fabric workloads consume a shared pool of Capacity Units (CUs). Understanding how CUs work is fundamental to interpreting metrics:

| Concept | Description | Key Detail |
|---|---|---|
| CU Seconds | Unit of compute measurement | 1 CU consumed for 1 second = 1 CU-second |
| CU Allocation | CUs available per SKU | F2 = 2 CUs, F8 = 8 CUs, F64 = 64 CUs, etc. |
| Interactive Operations | User-initiated queries, report renders | Evaluated in 30-second windows, throttled at 100% utilization |
| Background Operations | Scheduled refreshes, pipelines, Spark jobs | Evaluated in 24-hour windows, throttled at sustained overuse |
| Smoothing | CU consumption is smoothed over evaluation windows | A 10-second spike does not immediately trigger throttling |
| Burst | Short-term consumption can exceed SKU allocation | Bursting borrows from future capacity, repaid over the evaluation window |

The distinction between interactive and background operations is critical. Interactive operations (report queries, dashboard refreshes) are throttled quickly when capacity is overloaded—users see slow or failed reports. Background operations (scheduled refreshes, Spark jobs) are throttled more gradually, with jobs queued rather than rejected.
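The smoothing and burst behavior in the table can be made concrete with a little arithmetic. This is an illustrative helper, not a Fabric API; the SKU sizes and window lengths come from the table above.

```python
def window_utilization(cu_seconds_consumed: float,
                       sku_cus: int,
                       window_seconds: int) -> float:
    """Percent of an evaluation window's CU budget consumed.

    An F64 capacity provides 64 CUs, so a 30-second interactive
    window has a budget of 64 * 30 = 1920 CU-seconds.
    """
    budget = sku_cus * window_seconds
    return 100.0 * cu_seconds_consumed / budget

# A 10-second spike at 128 CUs on an F64 is 1280 CU-seconds, but
# smoothed over the 30-second interactive window it stays under 100%:
spike = window_utilization(1280, sku_cus=64, window_seconds=30)
print(f"{spike:.1f}% of the interactive window")  # 66.7% of the interactive window
```

This is why a short spike does not immediately trigger throttling: the window's full budget, not the instantaneous rate, is what gets evaluated.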

Installing and Configuring the App

Prerequisites

  • Fabric Capacity Administrator or Global Administrator role
  • At least one Fabric capacity (F2 or higher, or Power BI Premium P1 or higher)
  • A workspace to host the app

Installation Steps

  1. Navigate to Power BI Service > Apps > Get apps and search for "Microsoft Fabric Capacity Metrics"
  2. Click Install and select the workspace where the app will be stored
  3. After installation, open the app and click Connect to configure the data source
  4. Enter your Capacity ID (found in the Fabric Admin Portal under Capacity Settings) and select the date range
  5. The app begins loading historical data—initial load may take several minutes for large capacities

Data Retention

The app stores 14 days of detailed metrics and 30 days of summarized data. For longer retention, create a dataflow that exports capacity metrics to a Lakehouse on a scheduled basis.
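A minimal sketch of the export idea, assuming a daily job that appends summarized rows before they roll off the 14/30-day windows. The row shape and local CSV destination are illustrative placeholders; in Fabric you would land this in a Lakehouse table via a dataflow or notebook instead.

```python
import csv
import datetime
import pathlib

def append_daily_snapshot(rows, dest="capacity_metrics_history.csv"):
    """Append today's per-item CU summary rows to an archive file."""
    path = pathlib.Path(dest)
    write_header = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["date", "item", "workload", "cu_seconds"])
        if write_header:
            writer.writeheader()
        writer.writerows(rows)

today = datetime.date.today().isoformat()
append_daily_snapshot([
    {"date": today, "item": "SalesModel", "workload": "Power BI",
     "cu_seconds": 5400},
    {"date": today, "item": "IngestPipeline", "workload": "Data Factory",
     "cu_seconds": 2100},
])
```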

Key Dashboard Pages

Overview Page

The overview page shows the most critical health indicators at a glance:

  • CU Utilization Trend: Line chart showing CU consumption percentage over time. Sustained utilization above 80% indicates you are approaching capacity limits. Sustained utilization below 20% suggests over-provisioning.
  • Throttling Events: Count and duration of throttling events. Any throttling of interactive operations means users experienced degraded performance.
  • Top Items by CU: Ranked list of Fabric items (datasets, notebooks, pipelines) consuming the most CUs. This identifies optimization targets.

Workload Breakdown Page

Shows CU consumption segmented by workload type:

  • Power BI: Report queries, dataset refreshes, paginated report renders
  • Data Engineering: Spark notebook execution, Lakehouse operations
  • Data Factory: Pipeline runs, dataflow refreshes, copy activities
  • Real-Time Analytics: Eventstream processing, KQL queries
  • Data Science: ML experiment runs, model training

This breakdown reveals which workloads dominate your capacity. If 70% of CU consumption comes from Spark notebooks, optimizing those notebooks delivers the most savings.
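Turning raw per-workload CU totals into the share breakdown described above is a one-liner; the figures below are made up for illustration.

```python
def workload_shares(cu_by_workload: dict) -> dict:
    """Convert per-workload CU-second totals into percentage shares."""
    total = sum(cu_by_workload.values())
    return {w: round(100 * cu / total, 1) for w, cu in cu_by_workload.items()}

shares = workload_shares({
    "Power BI": 12_000,
    "Data Engineering": 42_000,  # Spark notebooks dominating
    "Data Factory": 4_000,
    "Real-Time Analytics": 1_500,
    "Data Science": 500,
})
# Data Engineering lands at 70%, so Spark optimization is the biggest lever
print(max(shares, key=shares.get), shares)
```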

Item-Level Detail Page

Drill into individual items to see:

  • CU consumption per refresh or execution
  • Duration trends over time (increasing duration signals degradation)
  • User who triggered the operation
  • Success/failure status
  • Queuing time (time waiting for available CUs before execution starts)

Interpreting Capacity Health

Healthy Capacity

  • CU utilization averages 40-60% during peak hours
  • Zero or minimal interactive throttling events
  • Background operations complete within scheduled windows
  • No single item consumes more than 20% of total CU budget

Capacity at Risk

  • CU utilization peaks above 80% regularly
  • Occasional interactive throttling during peak hours
  • Background operations starting to queue and delay
  • One or two items dominate CU consumption

Capacity in Crisis

  • CU utilization sustained above 100% (consuming burst capacity)
  • Frequent interactive throttling—users reporting slow or failed reports
  • Background operations significantly delayed or failing
  • Throttling policy may reject new operations
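The three tiers above can be expressed as a simple classifier. The utilization thresholds mirror the article; the throttling-event cutoff of 10 is an assumed value for illustration, and the inputs are assumed to come from the app's overview page.

```python
def capacity_health(avg_peak_util: float,
                    interactive_throttle_events: int,
                    background_delayed: bool) -> str:
    """Map overview-page metrics to the health tiers described above."""
    # Sustained over-100% utilization or heavy interactive throttling
    # (cutoff of 10 events is an assumption) means crisis.
    if avg_peak_util > 100 or interactive_throttle_events > 10:
        return "crisis"
    # Regular 80%+ peaks, any interactive throttling, or delayed
    # background jobs put the capacity at risk.
    if avg_peak_util > 80 or interactive_throttle_events > 0 or background_delayed:
        return "at risk"
    return "healthy"

print(capacity_health(55, 0, False))   # healthy
print(capacity_health(85, 2, False))   # at risk
print(capacity_health(105, 30, True))  # crisis
```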

Optimization Strategies by Finding

Finding: Single Refresh Consuming Excessive CUs

Root cause: A large dataset refresh (often with full refresh instead of incremental) monopolizes capacity during refresh windows.

Solution: Implement incremental refresh to reduce the data volume refreshed each cycle. Switch from full refresh to partition-level refresh where possible. Schedule the refresh during off-peak hours when capacity headroom is available.
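Partition-level refresh can be driven through the Power BI enhanced refresh REST API rather than a full dataset refresh. This sketch only builds the request; the workspace and dataset IDs are placeholders, and the exact body options should be verified against the current API documentation.

```python
import json

def partition_refresh_request(group_id: str, dataset_id: str,
                              table: str, partitions: list[str]):
    """Build the URL and body for an enhanced refresh of named partitions."""
    url = (f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
           f"/datasets/{dataset_id}/refreshes")
    body = {
        "type": "full",  # full reprocess of only the listed partitions
        "objects": [{"table": table, "partition": p} for p in partitions],
    }
    return url, json.dumps(body)

url, body = partition_refresh_request("GROUP_ID", "DATASET_ID",
                                      "Sales", ["2024-Q3", "2024-Q4"])
# POST url with an Azure AD bearer token; only two partitions reprocess,
# leaving the rest of the model (and the capacity) untouched.
```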

Finding: Spark Notebooks Consuming Majority of CUs

Root cause: Unoptimized Spark jobs with excessive shuffling, missing partitioning, or oversized cluster configurations.

Solution: Review Spark job optimization—apply proper partitioning, use broadcast joins for small tables, cache intermediate results, and reduce the cluster size to the minimum needed for the workload.

Finding: Many Users Running Reports Simultaneously at Peak

Root cause: Report rendering generates interactive CU demand that exceeds capacity during business hours.

Solution: Optimize DAX queries in the most-consumed reports (use Performance Analyzer to identify slow measures). Enable query caching on frequently accessed datasets. Consider scaling the capacity up during peak hours and down during off-hours (Fabric supports capacity pause/resume via API for scheduling).
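The scale-by-schedule idea rests on the fact that Fabric capacities can be suspended and resumed through Azure Resource Manager. The sketch below only constructs the endpoint URL; the `api-version` is an assumption that should be checked against current Azure REST documentation.

```python
BASE = "https://management.azure.com"

def capacity_action_url(sub_id: str, rg: str, capacity: str,
                        action: str, api_version: str = "2023-11-01") -> str:
    """Build the ARM URL for a capacity action ('suspend' or 'resume')."""
    return (f"{BASE}/subscriptions/{sub_id}/resourceGroups/{rg}"
            f"/providers/Microsoft.Fabric/capacities/{capacity}"
            f"/{action}?api-version={api_version}")

# POST these URLs with an Azure AD bearer token from a scheduled job,
# e.g. resume at 07:00 and suspend at 19:00 to avoid paying for idle hours.
print(capacity_action_url("SUB_ID", "rg-fabric", "fabriccap01", "resume"))
```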

Finding: Development Workloads Consuming Production Capacity

Root cause: Data engineers and analysts running ad-hoc Spark notebooks or dataflow tests on the production capacity.

Solution: Create a separate development capacity (smaller SKU, paused when not in use). Move development workspaces to the development capacity. Implement workspace governance that prevents development workloads from running on production.

Setting Up Alerts

Create proactive alerts to catch capacity issues before users report them:

  • Power BI data alert: Pin the CU Utilization card to a dashboard, then create an alert for when utilization exceeds 85%
  • Power Automate flow: Trigger a Teams notification or email when throttling events are detected
  • Azure Monitor integration: For Premium capacities, configure Azure Monitor alerts on capacity metrics with automatic scaling responses

Capacity Planning with Metrics Data

Use historical metrics data to plan capacity changes:

  • Growth trending: If average CU utilization increases 5% month-over-month, project when you will hit capacity limits
  • Workload forecasting: When onboarding a new department or workload, review similar existing workloads to estimate CU impact
  • SKU optimization: If utilization never exceeds 30%, consider downgrading to a smaller SKU. If throttling occurs regularly, upgrade to the next SKU tier.
  • Cost modeling: Calculate cost-per-CU-second to determine ROI of optimization efforts. A $100/hour optimization effort that reduces daily CU consumption by 20% may pay for itself within a week.
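The growth-trending bullet is a compound-growth calculation: with utilization growing a fixed percentage month over month, the months of headroom before a chosen ceiling follow from a logarithm. The 80% ceiling and the sample figures are illustrative.

```python
import math

def months_until_limit(current_util: float, monthly_growth: float,
                       ceiling: float = 80.0) -> float:
    """Months until current_util * (1 + g)**n reaches the ceiling."""
    if current_util >= ceiling:
        return 0.0
    return math.log(ceiling / current_util) / math.log(1 + monthly_growth)

# 55% average utilization today, growing 5%/month, planned against
# an 80% ceiling: roughly 7.7 months of headroom before action is needed.
print(f"{months_until_limit(55, 0.05):.1f} months of headroom")
```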


Frequently Asked Questions

How often does the Capacity Metrics app refresh?

The app refreshes every 30 minutes by default. You can see real-time utilization in the Azure portal, but the app provides more detailed historical analysis.

Can I set up alerts for capacity issues?

Yes, you can create Power BI data alerts on key metrics like utilization percentage or throttling events to receive notifications when thresholds are exceeded.
