
Power BI Gateway Architecture and High Availability Guide
Design enterprise-grade on-premises data gateway deployments with clustering, load balancing, network configuration, and monitoring for Power BI and Microsoft Fabric connectivity.
<h2>The Role of the On-Premises Data Gateway in Enterprise Power BI</h2>
<p>The on-premises data gateway is the bridge between cloud-hosted Power BI (and Microsoft Fabric) workloads and data sources that reside inside your corporate network—SQL Server databases, Oracle, SAP, file shares, SSAS Tabular models, and hundreds of other connectors. For enterprises that cannot move all data to the cloud due to regulatory requirements, latency constraints, or legacy system dependencies, the gateway is a critical infrastructure component. A gateway failure means scheduled refreshes stop, DirectQuery reports return errors, and <a href="/blog/power-bi-paginated-reports-enterprise-guide-2025">paginated reports</a> connected to on-premises sources fail.</p>
<p>At <a href="/services/power-bi-consulting">EPC Group</a>, we have designed gateway architectures for organizations running 500+ scheduled refreshes daily across dozens of on-premises data sources. This guide covers the architecture fundamentals, high availability patterns, network requirements, monitoring strategies, and troubleshooting techniques that ensure your gateway infrastructure meets enterprise uptime requirements.</p>
<h2>Gateway Types and When to Use Each</h2>
<h3>Standard (Enterprise) Gateway</h3>
<p>The standard gateway (also called the enterprise gateway or on-premises data gateway) is designed for shared, multi-user, production environments:</p>
<ul> <li><strong>Shared across the organization</strong>: Multiple users, datasets, dataflows, and reports can use the same gateway installation. Gateway administrators control which data source connections are available.</li> <li><strong>Supports clustering</strong>: Multiple gateway installations form a high-availability cluster with automatic failover and load distribution.</li> <li><strong>Centrally managed</strong>: Administered through the Power BI Admin portal (Manage gateways section) or the Power Platform admin center.</li> <li><strong>Runs as a Windows service</strong>: The gateway service runs under a dedicated service account with access to on-premises data sources.</li> <li><strong>Supports all data source types</strong>: Import mode refresh, DirectQuery, live connections to SSAS, paginated reports, dataflows, and Fabric pipelines.</li> </ul>
<h3>Personal Gateway</h3>
<p>The personal gateway (on-premises data gateway - personal mode) is designed for individual use during development and testing:</p>
<ul> <li><strong>Single user only</strong>: Only the person who installed it can use it. Cannot be shared or clustered.</li> <li><strong>Import mode only</strong>: Does not support DirectQuery, live connections, or paginated reports.</li> <li><strong>Runs as a user process</strong>: Requires the installing user to be logged in to Windows (or configured to run as a service under their account).</li> <li><strong>No clustering or HA</strong>: Single instance only.</li> </ul>
<p><strong>Enterprise recommendation</strong>: Never use personal gateways in production. They create single points of failure, cannot be managed centrally, and break when the user changes their password or leaves the organization. Always deploy standard gateways for any production workload.</p>
<h2>Gateway Architecture Deep Dive</h2>
<h3>How the Gateway Communicates</h3>
<p>Understanding the gateway communication architecture is essential for network configuration and troubleshooting:</p>
<ol> <li><strong>Power BI Service initiates</strong>: The cloud service queues a request (refresh, DirectQuery, live connection query) in Azure Service Bus / Azure Relay.</li> <li><strong>Gateway polls the relay</strong>: The gateway service establishes an outbound HTTPS connection to Azure Relay on TCP port 443 (and optionally 9350-9354 for legacy relay). This is an <strong>outbound-only</strong> connection—no inbound firewall ports need to be opened.</li> <li><strong>Gateway receives the request</strong>: The gateway picks up the queued request from Azure Relay.</li> <li><strong>Gateway queries the data source</strong>: The gateway connects to the on-premises data source using the configured credentials and connection string (SQL Server on port 1433, Oracle on 1521, etc.).</li> <li><strong>Gateway returns results</strong>: Query results are encrypted and sent back through the Azure Relay channel to the Power BI Service.</li> </ol>
<p>Key architectural implications:</p>
<ul> <li><strong>No inbound ports required</strong>: The gateway initiates all connections outbound. This simplifies firewall configuration significantly compared to traditional VPN-based approaches.</li> <li><strong>Azure Relay region matters</strong>: The gateway communicates through an Azure Relay instance in the same region as your Power BI tenant. Ensure the gateway machine has low-latency connectivity to this Azure region.</li> <li><strong>Data in transit is encrypted</strong>: All communication between the gateway and Azure Relay uses TLS 1.2+. Data source credentials are encrypted with an asymmetric key that only the gateway machine can decrypt.</li> <li><strong>The gateway machine processes queries</strong>: For Import mode refreshes, the gateway executes the data source query, receives the full result set, compresses it, encrypts it, and streams it to the cloud. The gateway machine needs sufficient CPU, memory, and network bandwidth to handle concurrent refresh workloads.</li> </ul>
<h3>Azure Relay Internals</h3>
<p>Azure Relay (formerly Azure Service Bus Relay) provides the secure communication channel between the Power BI cloud service and your on-premises gateway. The gateway creates a persistent outbound WebSocket connection to the relay endpoint. When the Power BI Service needs to communicate with the gateway, it sends a message through the relay rather than connecting directly to your network.</p>
<p>The relay endpoint URL follows the pattern: <code>[gateway-cluster-id].servicebus.windows.net</code>. If your organization uses domain-based firewall rules rather than IP-based rules, whitelist <code>*.servicebus.windows.net</code> and <code>*.frontend.clouddatahub.net</code> for gateway communication. For organizations that must use IP-based rules, download the Azure IP Ranges and Service Tags JSON file for the Azure region hosting your Power BI tenant.</p>
<h2>High Availability with Gateway Clustering</h2>
<p>A single gateway is a single point of failure. Enterprise deployments must implement gateway clustering for high availability.</p>
<h3>How Clustering Works</h3>
<p>A gateway cluster consists of two or more standard gateway installations registered to the same logical cluster. All cluster members share the same cluster name, recovery key, and data source definitions. The Power BI Service distributes requests across available cluster members.</p>
<p>To create a cluster:</p>
<ol> <li>Install the first gateway on Machine A. This becomes the primary member and defines the cluster name and recovery key.</li> <li>Install a second gateway on Machine B. During setup, select "Add to an existing gateway cluster" and enter the cluster name and recovery key.</li> <li>Repeat for additional cluster members (Machine C, D, etc.).</li> <li>Data source connections configured on one member automatically replicate to all members in the cluster.</li> </ol>
<h3>Load Balancing and Failover</h3>
<p>Gateway cluster load balancing operates in two modes, configurable in the Power Platform admin center:</p>
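<p><strong>Distribute requests across all active gateways</strong>: Requests are spread across all online cluster members rather than sent to a single machine. Microsoft documents the member selection as random, so the load evens out over many requests but is not strict round-robin. This maximizes throughput by spreading the workload and is best for environments with many concurrent refresh operations.</p>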
<p><strong>Route requests to the first available gateway (Failover)</strong>: All requests go to the primary member. If the primary is offline, requests fail over to the next member. This minimizes the number of active gateway machines (for licensing or resource conservation) while still providing availability.</p>
<p>Additional load distribution settings:</p>
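<ul> <li><strong>CPU threshold</strong>: Set a CPU utilization percentage above which a gateway member stops receiving new requests, which are then routed to other members. The limit is set on each member through the <code>CPUUtilizationPercentageThreshold</code> value in <code>Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config</code>. It is 0 (disabled) by default, and 80% is a reasonable starting point. This prevents individual gateway machines from becoming overloaded.</li> <li><strong>Memory threshold</strong>: The equivalent limit for memory utilization, set through <code>MemoryUtilizationPercentageThreshold</code> in the same file. Gateway refresh operations can consume significant memory, especially for large Import mode datasets.</li> </ul>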
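<p>The routing behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the service's actual implementation: the <code>GatewayMember</code> class, the function names, and the 80% default limits are assumptions made for the example.</p>

```python
import random
from dataclasses import dataclass


@dataclass
class GatewayMember:
    name: str
    online: bool
    cpu_percent: float      # recent average CPU utilization
    memory_percent: float   # recent average memory utilization


def eligible_members(members, cpu_threshold=80.0, memory_threshold=80.0):
    """Return online members that are under both utilization limits."""
    result = []
    for m in members:
        over_limit = (m.cpu_percent >= cpu_threshold
                      or m.memory_percent >= memory_threshold)
        if m.online and not over_limit:
            result.append(m)
    return result


def pick_member(members, distribute=True, rng=random):
    """Choose the member that should receive the next request.

    distribute=True  -> spread requests across all eligible members
    distribute=False -> always use the first eligible member (failover order)
    """
    candidates = eligible_members(members)
    if not candidates:
        # Assumption of this sketch: with no eligible member, the request fails.
        raise RuntimeError("no eligible gateway member in the cluster")
    return rng.choice(candidates) if distribute else candidates[0]
```

<p>With one member over its CPU limit and another offline, both modes route the request to the single remaining healthy member, which is the failover behavior the cluster is meant to provide.</p>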
<h3>Cluster Sizing Recommendations</h3>
<table> <thead> <tr><th>Deployment Scale</th><th>Concurrent Refreshes</th><th>Gateway Machines</th><th>CPU/RAM per Machine</th><th>Notes</th></tr> </thead> <tbody> <tr><td>Small (10-50 datasets)</td><td>5-10</td><td>2 (HA pair)</td><td>4 cores / 16 GB</td><td>Minimum HA configuration</td></tr> <tr><td>Medium (50-200 datasets)</td><td>10-30</td><td>3</td><td>8 cores / 32 GB</td><td>Round-robin distribution recommended</td></tr> <tr><td>Large (200-500 datasets)</td><td>30-75</td><td>4-6</td><td>8 cores / 64 GB</td><td>Separate clusters by workload type</td></tr> <tr><td>Enterprise (500+ datasets)</td><td>75+</td><td>6+</td><td>16 cores / 64-128 GB</td><td>Multiple clusters, dedicated DQ cluster</td></tr> </tbody> </table>
<h3>Multi-Cluster Architecture for Large Enterprises</h3>
<p>For organizations with diverse workload types, consider deploying multiple gateway clusters, each optimized for a specific workload:</p>
<ul> <li><strong>Refresh cluster</strong>: Handles scheduled Import mode refreshes. Size for peak memory consumption during large dataset refreshes. Schedule refreshes to stagger across the day to avoid all-at-once peaks.</li> <li><strong>DirectQuery cluster</strong>: Handles real-time DirectQuery and live connection queries. Size for low latency and high concurrent query throughput. Locate close to the data sources to minimize network round-trip time.</li> <li><strong>Dataflow/Pipeline cluster</strong>: Handles Dataflows Gen1/Gen2 and Fabric pipeline data movement. These operations are data-intensive and benefit from high network bandwidth.</li> </ul>
<p>Separating workloads prevents a large scheduled refresh from consuming all gateway resources and causing DirectQuery timeouts for real-time dashboards.</p>
<h2>Network Configuration and Firewall Requirements</h2>
<h3>Outbound Port Requirements</h3>
<table> <thead> <tr><th>Port</th><th>Protocol</th><th>Destination</th><th>Purpose</th></tr> </thead> <tbody> <tr><td>443</td><td>HTTPS</td><td>*.servicebus.windows.net</td><td>Azure Relay communication (primary)</td></tr> <tr><td>443</td><td>HTTPS</td><td>*.frontend.clouddatahub.net</td><td>Gateway cloud service endpoint</td></tr> <tr><td>443</td><td>HTTPS</td><td>*.core.windows.net</td><td>Azure Storage for telemetry and updates</td></tr> <tr><td>443</td><td>HTTPS</td><td>login.microsoftonline.com</td><td>Azure AD authentication</td></tr> <tr><td>80</td><td>HTTP</td><td>*.msftncsi.com</td><td>Network connectivity check</td></tr> <tr><td>9350-9354</td><td>TCP</td><td>*.servicebus.windows.net</td><td>Azure Relay (legacy, optional if 443 works)</td></tr> </tbody> </table>
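<p>A quick way to verify these firewall rules from the gateway machine is to attempt outbound TCP connections to the listed destinations. The Python sketch below uses only the standard library. The relay host name is a hypothetical placeholder (wildcard entries must be replaced with a concrete host from your own environment), and the function names are illustrative.</p>

```python
import socket

# Destinations taken from the table above. Wildcard entries need a concrete
# host name; the relay host below is a placeholder, not a real namespace.
ENDPOINTS = [
    ("login.microsoftonline.com", 443),
    ("your-relay-namespace.servicebus.windows.net", 443),  # placeholder
]


def can_connect(host, port, timeout=5.0, connect=socket.create_connection):
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with connect((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_endpoints(endpoints, connect=socket.create_connection):
    """Map each (host, port) pair to True (reachable) or False (blocked)."""
    return {(h, p): can_connect(h, p, connect=connect) for h, p in endpoints}
```

<p>Calling <code>check_endpoints(ENDPOINTS)</code> on the gateway machine returns a dictionary of reachability flags. A successful TCP connection only shows that the port is open. It does not prove that a proxy passes TLS or WebSocket traffic unmodified.</p>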
<h3>Proxy Configuration</h3>
<p>If your organization routes outbound traffic through a corporate proxy, configure the gateway to use it:</p>
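<ol> <li>Edit the gateway configuration files in the installation directory: <code>Microsoft.PowerBI.EnterpriseGateway.exe.config</code> for the gateway service and <code>enterprisegatewayconfigurator.exe.config</code> for the configuration app.</li> <li>Add the proxy settings in the <code>&lt;system.net&gt;</code> section: the proxy address and, if the proxy requires authentication, <code>useDefaultCredentials</code> so that the gateway service account authenticates to it.</li> <li>Restart the gateway service after configuration changes.</li> </ol>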
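<p>As a sketch of the edit in step 2, the Python function below inserts a <code>defaultProxy</code> element into a .NET-style configuration document. The attribute names follow the standard .NET <code>&lt;system.net&gt;</code> proxy schema. The function name and the proxy address are assumptions for illustration. ElementTree discards XML comments, so treat this as a way to see the target structure and back up the real file before changing it.</p>

```python
import xml.etree.ElementTree as ET


def add_default_proxy(config_xml, proxy_address):
    """Return config XML with a system.net/defaultProxy section for proxy_address."""
    root = ET.fromstring(config_xml)

    # Find or create the system.net section under the configuration root
    system_net = next((child for child in root if child.tag == "system.net"), None)
    if system_net is None:
        system_net = ET.SubElement(root, "system.net")

    # Remove any earlier defaultProxy so the function is safe to re-run
    for old in [c for c in system_net if c.tag == "defaultProxy"]:
        system_net.remove(old)

    default_proxy = ET.SubElement(
        system_net, "defaultProxy", {"useDefaultCredentials": "true"}
    )
    ET.SubElement(default_proxy, "proxy", {
        "autoDetect": "false",
        "proxyaddress": proxy_address,   # e.g. "http://192.168.1.10:3128"
        "bypassonlocal": "true",
        "usesystemdefault": "false",
    })
    return ET.tostring(root, encoding="unicode")
```

<p>With <code>useDefaultCredentials</code> set to true, the proxy sees the identity of the account running the gateway service, which is one reason to run the service under a domain account when the proxy requires authentication.</p>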
<p>Common proxy issues that affect gateway connectivity:</p>
<ul> <li><strong>SSL inspection</strong>: Proxies that perform SSL/TLS inspection break the gateway's certificate pinning. Add gateway destinations to the SSL inspection bypass list.</li> <li><strong>WebSocket support</strong>: The gateway uses WebSocket connections for Azure Relay. Proxies that do not support WebSocket upgrade headers cause intermittent connectivity failures.</li> <li><strong>Timeout settings</strong>: Large dataset refreshes can maintain connections for extended periods. Ensure proxy idle timeout is set to at least 300 seconds (5 minutes).</li> </ul>
<h3>DNS Requirements</h3>
<p>The gateway machine must resolve Azure endpoints through DNS. In environments with split DNS (internal vs. external resolution), ensure the gateway can resolve public Azure DNS names. Internal-only DNS configurations that cannot resolve <code>*.servicebus.windows.net</code> cause gateway registration failures. Consider configuring conditional DNS forwarding for Azure domains to public DNS resolvers.</p>
<h2>Gateway Installation and Configuration Best Practices</h2>
<h3>Service Account Configuration</h3>
<p>The gateway runs as a Windows service. By default, it runs under the <code>NT SERVICE\PBIEgwService</code> virtual service account. For enterprise deployments, consider using a dedicated Active Directory service account:</p>
<ul> <li>The service account needs <strong>Log on as a service</strong> right on the gateway machine.</li> <li>If using Windows authentication for data sources, the service account needs access to those data sources (or configure Kerberos constrained delegation).</li> <li>Use a <strong>group Managed Service Account (gMSA)</strong> for automatic password rotation without service interruption.</li> <li>Ensure the service account has read/write access to the gateway installation directory and the <code>%LOCALAPPDATA%\Microsoft\On-premises data gateway</code> directory.</li> </ul>
<h3>Kerberos Constrained Delegation for SSO</h3>
<p>For scenarios where Power BI report consumers need to pass their identity through the gateway to on-premises data sources (single sign-on), configure Kerberos constrained delegation:</p>
<ol> <li>Register a Service Principal Name (SPN) for the gateway service account.</li> <li>Configure the gateway service account in Active Directory for constrained delegation to the target data source SPNs (for example, MSSQLSvc/sqlserver.domain.com:1433).</li> <li>In the Power BI gateway data source settings, enable "Use SSO via Kerberos" or "Use SSO via Kerberos for DirectQuery and Import queries."</li> <li>Ensure the Azure AD user principal name (UPN) maps to the on-premises Active Directory UPN for identity pass-through.</li> </ol>
<p>Kerberos SSO enables <a href="/blog/power-bi-row-level-security-complete-guide-2025">row-level security</a> at the data source level, where the database enforces security based on the end user's identity rather than a shared service account.</p>
<h2>Monitoring and Troubleshooting</h2>
<h3>Gateway Monitoring Dashboard</h3>
<p>Implement a monitoring solution that tracks these metrics:</p>
<ul> <li><strong>Gateway status</strong>: Online/offline state of each cluster member. Alert immediately when any member goes offline.</li> <li><strong>CPU utilization</strong>: Per-machine CPU usage. Alert when sustained above 80% for more than 5 minutes.</li> <li><strong>Memory utilization</strong>: Per-machine RAM usage. Large Import mode refreshes can consume 2-4x the dataset size in memory during processing. Alert when available memory drops below 20%.</li> <li><strong>Active connections</strong>: Number of concurrent connections to data sources. Identify connection pool exhaustion.</li> <li><strong>Query duration</strong>: Track average and P95 query durations through the gateway. Increasing durations indicate capacity issues or data source degradation.</li> <li><strong>Refresh success rate</strong>: Track the percentage of scheduled refreshes that complete successfully. Target 99%+ success rate.</li> </ul>
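<p>The alert thresholds above translate directly into simple checks. The following Python sketch assumes one CPU reading per minute and a list of refresh status strings in which a successful run is reported as <code>"Completed"</code>. Both are assumptions about how your monitoring data is collected, and the function names are illustrative.</p>

```python
def sustained_above(samples, threshold, min_consecutive):
    """True if at least min_consecutive consecutive samples exceed threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


def refresh_success_rate(statuses):
    """Percentage of refreshes whose status is 'Completed' (0.0 if empty)."""
    if not statuses:
        return 0.0
    completed = sum(1 for s in statuses if s == "Completed")
    return 100.0 * completed / len(statuses)


def evaluate_member(cpu_samples, available_memory_percent, statuses):
    """Return alert messages for one gateway member.

    cpu_samples: CPU % readings, assumed to be taken once per minute.
    """
    alerts = []
    if sustained_above(cpu_samples, 80.0, 5):
        alerts.append("CPU above 80% for 5 consecutive samples")
    if 20.0 > available_memory_percent:
        alerts.append("available memory below 20%")
    if 99.0 > refresh_success_rate(statuses):
        alerts.append("refresh success rate below 99%")
    return alerts
```

<p>Requiring consecutive samples rather than a single reading avoids paging on short CPU spikes, which are normal at the start of a large refresh.</p>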
<h3>Gateway Log Analysis</h3>
<p>Gateway logs are stored in <code>%LOCALAPPDATA%\Microsoft\On-premises data gateway\</code>. The primary log files:</p>
<ul> <li><strong>GatewayInfo*.log</strong>: General operational events including service start/stop, cluster registration, and configuration changes.</li> <li><strong>Report*.log</strong>: Detailed query execution events including data source connections, query start/end times, bytes transferred, and error details.</li> <li><strong>GatewayErrors*.log</strong>: Error events and exceptions.</li> </ul>
<p>Use the gateway log collection tool (available in the gateway app under Diagnostics &gt; Export logs) to package all logs for analysis. For ongoing monitoring, forward gateway logs to your SIEM or Log Analytics workspace using Windows Event Forwarding or a log shipping agent.</p>
<h3>Common Gateway Issues and Resolutions</h3>
<p><strong>Issue: Gateway shows offline in Power BI Service</strong></p> <ul> <li>Verify the Windows service "On-premises data gateway service" is running.</li> <li>Check outbound connectivity to <code>*.servicebus.windows.net</code> on port 443.</li> <li>Verify the gateway machine can resolve Azure DNS names.</li> <li>Check if a recent Windows update changed TLS settings (gateway requires TLS 1.2).</li> <li>Review the GatewayErrors log for specific error messages.</li> </ul>
<p><strong>Issue: Scheduled refresh fails with timeout</strong></p> <ul> <li>Check gateway machine CPU and memory during the refresh window.</li> <li>Verify the data source query completes within the gateway timeout (default 30 minutes for refreshes).</li> <li>Review the data source server for resource contention during refresh times.</li> <li>Consider implementing <a href="/blog/power-bi-incremental-refresh-guide-2025">incremental refresh</a> to reduce the data volume per refresh cycle.</li> </ul>
<p><strong>Issue: DirectQuery reports are slow through the gateway</strong></p> <ul> <li>Check network latency between the gateway machine and the data source (should be under 10 ms for optimal performance).</li> <li>Verify the data source query performance directly (bypass the gateway and run the query on the source server).</li> <li>Check if the gateway cluster is overloaded with concurrent refresh operations competing for resources.</li> <li>Consider deploying a dedicated DirectQuery gateway cluster separated from refresh workloads.</li> </ul>
<h2>Microsoft Fabric On-Premises Connectivity</h2>
<p>Microsoft Fabric extends the gateway's role beyond Power BI datasets:</p>
<ul> <li><strong>Fabric Pipelines</strong>: Copy Activity in Fabric Data Factory pipelines can use the on-premises gateway to extract data from on-premises sources into OneLake.</li> <li><strong>Fabric Dataflows Gen2</strong>: Dataflows running in Fabric workspaces connect through the same gateway infrastructure as Power BI dataflows.</li> <li><strong>Fabric Shortcuts</strong>: On-premises data sources accessible through the gateway can be referenced via shortcuts in Fabric Lakehouses.</li> <li><strong>VNet Data Gateway</strong>: For organizations with Azure ExpressRoute or VPN connectivity to Azure, the VNet data gateway runs as a managed Azure resource inside your virtual network, eliminating the need for on-premises gateway machines. Requires Microsoft Fabric or Power BI Premium capacity.</li> </ul>
<p>The VNet data gateway is the recommended architecture for organizations that already have Azure networking established, as it eliminates the operational overhead of managing gateway Windows servers.</p>
<h2>Gateway Sizing and Capacity Planning</h2>
<h3>Memory Sizing Formula</h3>
<p>For Import mode refreshes, the gateway machine needs sufficient memory to hold the intermediate refresh data:</p>
<ul> <li><strong>Minimum memory per concurrent refresh</strong>: Roughly 2x the compressed dataset size. A 5 GB compressed Power BI dataset may require 10 GB of gateway memory during refresh because the gateway holds the uncompressed query results in memory before they are compressed for transfer to the cloud.</li> <li><strong>Total memory calculation</strong>: (Max concurrent refreshes) x (2 x average compressed dataset size) + 4 GB operating system overhead.</li> <li><strong>Example</strong>: 10 concurrent refreshes of datasets averaging 3 GB compressed = 10 x 6 GB + 4 GB = 64 GB RAM.</li> </ul>
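<p>The sizing rule can be captured in a small helper for trying different refresh schedules. This Python sketch follows the worked example, where the 2x multiplier is applied to the compressed dataset size. The function and parameter names are illustrative.</p>

```python
def gateway_memory_gb(max_concurrent_refreshes, avg_compressed_dataset_gb,
                      expansion_factor=2.0, os_overhead_gb=4.0):
    """Estimate the RAM (GB) a gateway machine needs for Import mode refreshes."""
    per_refresh_gb = expansion_factor * avg_compressed_dataset_gb
    return max_concurrent_refreshes * per_refresh_gb + os_overhead_gb


# Reproduces the example: 10 refreshes of 3 GB datasets
print(gateway_memory_gb(10, 3))  # 64.0
```

<p>Because memory scales linearly with concurrency, staggering refresh schedules to lower the peak number of concurrent refreshes is usually cheaper than adding RAM.</p>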
<h3>CPU Sizing</h3>
<p>CPU cores determine how many concurrent operations the gateway can process efficiently:</p>
<ul> <li>Each concurrent refresh or DirectQuery operation consumes approximately 0.5-1 CPU core during active processing.</li> <li>Data compression and encryption (for transit to cloud) are CPU-intensive operations.</li> <li>Minimum: 4 cores for small deployments. Recommended: 8-16 cores for enterprise workloads.</li> </ul>
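<p>As a rough planning aid, the per-operation CPU figure above can be turned into a core-count range and then into a machine count. The Python sketch below keeps one spare machine so that the cluster still covers peak load when a member is offline, following the N-1 guidance in the FAQ. The function names are illustrative.</p>

```python
import math


def cpu_core_range(concurrent_operations, low_per_op=0.5, high_per_op=1.0):
    """Return (low, high) core estimates for the given concurrency."""
    low = math.ceil(concurrent_operations * low_per_op)
    high = math.ceil(concurrent_operations * high_per_op)
    return low, high


def members_for_peak(peak_cores, cores_per_machine, spare_members=1):
    """Machines needed so the cluster covers peak_cores with spares offline."""
    return math.ceil(peak_cores / cores_per_machine) + spare_members


print(cpu_core_range(10))        # (5, 10)
print(members_for_peak(30, 16))  # 3
```

<p>These are starting estimates only. Measured CPU usage from the gateway machines during real refresh windows should drive the final sizing.</p>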
<h3>Network Bandwidth</h3>
<p>Network bandwidth between the gateway and data sources, and between the gateway and Azure, determines throughput:</p>
<ul> <li><strong>Gateway to data source</strong>: At least 1 Gbps for enterprise deployments. Multiple concurrent refreshes pulling large result sets can saturate slower links.</li> <li><strong>Gateway to Azure</strong>: At least 100 Mbps dedicated bandwidth. For large-scale deployments refreshing terabytes of data daily, consider 1 Gbps or dedicated ExpressRoute circuits.</li> </ul>
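<p>To judge whether a link is adequate for a refresh window, estimate the transfer time for the data volume involved. The Python sketch below uses decimal units (1 GB = 8,000 megabits) and assumes that only 70% of the nominal bandwidth is usable. That utilization figure is an assumption for the example and should be adjusted to measured values.</p>

```python
def transfer_minutes(data_gb, link_mbps, utilization=0.7):
    """Estimate minutes needed to move data_gb over a link of link_mbps."""
    megabits = data_gb * 8 * 1000            # decimal GB to megabits
    effective_mbps = link_mbps * utilization  # usable share of the link
    return megabits / effective_mbps / 60


# 100 GB over a 1 Gbps link at 70% usable bandwidth
print(round(transfer_minutes(100, 1000), 1))  # 19.0
```

<p>The same function applies to both hops: gateway to data source and gateway to Azure. The slower of the two sets the floor for refresh duration.</p>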
<p><a href="/contact">Contact EPC Group</a> for a gateway architecture assessment tailored to your environment. Our <a href="/services/power-bi-consulting">Power BI consulting team</a> designs, deploys, and manages enterprise gateway infrastructure for organizations with hundreds of data sources and thousands of scheduled refreshes, ensuring 99.9%+ gateway uptime and optimal refresh performance across clustered deployments.</p>
<h2>Frequently Asked Questions</h2>
<h3>What is the difference between a personal gateway and a standard (enterprise) gateway?</h3>
<p>The standard (enterprise) on-premises data gateway is designed for shared, production use across an organization. It supports multiple users and datasets, gateway clustering for high availability, DirectQuery, live connections to SSAS, paginated reports, and centralized administration through the Power BI Admin portal. It runs as a Windows service under a service account and does not require a user to be logged in. The personal gateway (on-premises data gateway - personal mode) is intended for individual use only. It can only be used by the person who installed it, supports only Import mode scheduled refreshes (no DirectQuery, no live connections, no paginated reports), cannot be clustered for high availability, and requires the installing user to be logged in or the service configured under their account. For any production workload, always use the standard gateway. Personal gateways create single points of failure and cannot be centrally managed or monitored.</p>
<h3>Does the on-premises data gateway require opening inbound firewall ports?</h3>
<p>No. The on-premises data gateway uses only outbound connections. The gateway service initiates an outbound HTTPS (port 443) WebSocket connection to Azure Relay (<code>*.servicebus.windows.net</code>), which acts as a secure communication bridge between the Power BI cloud service and your on-premises network. Because all connections are outbound-initiated, you do not need to open any inbound firewall ports or configure inbound NAT rules. You only need to allow outbound HTTPS (443) to Azure Relay endpoints, Azure AD (<code>login.microsoftonline.com</code>), and related Azure services. For organizations with strict outbound filtering, allow <code>*.servicebus.windows.net</code>, <code>*.frontend.clouddatahub.net</code>, <code>*.core.windows.net</code>, and <code>login.microsoftonline.com</code>. If your proxy performs SSL inspection, add these domains to the SSL bypass list, as certificate pinning issues can break gateway connectivity.</p>
<h3>How many gateway machines do I need for high availability?</h3>
<p>The minimum for high availability is two gateway machines in the same cluster. With two machines, if one goes offline for maintenance, updates, or failure, the other continues processing all requests. For enterprise deployments, we recommend three or more machines per cluster: three machines provide N+1 redundancy where one machine can fail during a peak period without overloading the remaining two. Size the cluster so that N-1 machines (total minus one) can handle your peak workload. For very large environments, deploy separate clusters for different workload types: one cluster for scheduled Import mode refreshes (sized for memory and throughput), another for DirectQuery and live connections (sized for low latency and high concurrency), and optionally a third for Dataflow and Fabric pipeline operations. This prevents large refresh operations from impacting real-time query performance.</p>
<h3>How do I monitor gateway health and performance?</h3>
<p>Monitor gateway health at multiple levels. First, the Power Platform admin center (admin.powerplatform.microsoft.com) shows gateway cluster status, member online/offline state, and version information. Second, on each gateway machine, monitor Windows performance counters for CPU utilization, available memory, network throughput, and disk I/O. Set alerts when CPU exceeds 80% sustained, available memory drops below 20%, or network utilization exceeds 70%. Third, analyze gateway logs stored in <code>%LOCALAPPDATA%\Microsoft\On-premises data gateway\</code>. The Report logs contain query execution times and bytes transferred per operation. Fourth, use the Power BI REST API to monitor scheduled refresh success/failure rates programmatically. Fifth, for comprehensive monitoring, forward gateway Windows Event Logs and performance counters to Azure Monitor or your SIEM platform for centralized dashboarding and alerting. Build a monitoring dashboard that shows gateway cluster health, refresh success rates, average query durations, and resource utilization trends over time.</p>
<h3>What is a VNet data gateway and when should I use it instead of an on-premises gateway?</h3>
<p>A VNet (Virtual Network) data gateway is a Microsoft-managed gateway that runs as an Azure resource inside your Azure virtual network, eliminating the need to install and maintain gateway software on Windows servers. It is ideal when your on-premises data sources are already accessible from Azure through ExpressRoute, site-to-site VPN, or Azure Private Link. The VNet gateway requires Power BI Premium or Microsoft Fabric capacity (it is not available with Pro or PPU licenses). Benefits include zero server maintenance (Microsoft manages the infrastructure, patching, and scaling), automatic high availability (no manual clustering required), native integration with Azure networking (NSGs, private endpoints, route tables), and reduced operational overhead. Use a VNet gateway when you have established Azure networking connectivity to your data centers and want to minimize infrastructure management. Continue using the traditional on-premises gateway when you do not have Azure networking, need gateway functionality with Pro licenses, or have data sources that are only accessible from specific on-premises network segments not connected to Azure.</p>