
Multi-Cloud BI Architecture: Power BI with AWS and GCP Data
Design patterns and implementation guide for connecting Power BI to AWS (Redshift, S3, Athena) and GCP (BigQuery, Cloud SQL) data sources, including hybrid architecture, gateway configuration, security, and cost optimization strategies.
<h2>Multi-Cloud BI: Using Power BI as Your Unified Analytics Layer Across AWS, Azure, and GCP</h2>
<p>Most enterprises in 2026 are not single-cloud organizations. Mergers and acquisitions bring in workloads from different cloud providers. Data engineering teams choose BigQuery for certain analytical workloads and Redshift for others. Business applications run on AWS while the productivity stack runs on Microsoft 365 and Azure. Marketing data flows through Google Analytics and Google Ads. The result is a multi-cloud data estate that needs a unified analytics layer. <strong>Microsoft Power BI</strong> is uniquely positioned to serve as that unifying platform—connecting to data across AWS, Azure, and Google Cloud from a single semantic model, governed by a single security framework, and consumed through a single reporting interface. This guide covers the architecture patterns, connectivity options, performance considerations, and cost optimization strategies for building a multi-cloud BI platform with Power BI. Our <a href="/services/power-bi-architecture">Power BI architecture consulting</a> team specializes in designing these cross-cloud analytics solutions.</p>
<h2>Connecting Power BI to AWS Data Sources</h2>
<h3>Amazon Redshift</h3>
<p>Amazon Redshift is AWS's cloud data warehouse, used by thousands of organizations for analytical workloads. Power BI provides a native Redshift connector supporting both Import and DirectQuery modes.</p>
<p><strong>Import Mode with Redshift:</strong></p> <ul> <li>Data is extracted from Redshift and loaded into Power BI's in-memory VertiPaq engine</li> <li>Best for datasets up to several hundred million rows (after compression) where sub-second query performance is required</li> <li>Data freshness depends on the refresh schedule (up to 8 scheduled refreshes per day with Pro; up to 48 per day with Premium/Fabric capacity)</li> <li>Redshift costs are incurred only during refresh (not during report interaction), making this the most cost-efficient mode for Redshift</li> <li>Use <a href="/blog/power-bi-incremental-refresh-data-partitioning-guide-2026">incremental refresh</a> to minimize data transfer—only new or changed data is extracted from Redshift during each refresh cycle</li> </ul>
<p><strong>DirectQuery Mode with Redshift:</strong></p> <ul> <li>Power BI generates SQL queries against Redshift in real-time for every user interaction</li> <li>Best when data freshness requirements are near-real-time or datasets are too large for Import</li> <li>Query performance depends entirely on Redshift cluster performance and query optimization</li> <li>Every report interaction generates Redshift queries, which incur compute costs—this can become expensive with high concurrency</li> <li>Requires careful DAX design to generate efficient SQL; avoid complex DAX patterns that produce expensive Redshift query plans</li> <li>Consider Redshift Serverless for variable workloads—it scales compute automatically based on query demand from Power BI</li> </ul>
<p><strong>Configuration:</strong> The Redshift connector requires the cluster endpoint (host, port), database name, and credentials (database user/password or IAM authentication). For Redshift clusters in a private VPC, you need either the <a href="/blog/power-bi-gateway-setup">On-premises Data Gateway</a> installed on an EC2 instance within the VPC (or with VPC peering) or Redshift's publicly accessible endpoint (with appropriate security group rules limiting access).</p>
<h3>Amazon S3 and Athena</h3>
<p>Many organizations store raw and processed data in Amazon S3 as Parquet, CSV, or JSON files, queried through Amazon Athena (serverless SQL over S3).</p>
<p><strong>Connecting Power BI to S3/Athena:</strong></p> <ul> <li><strong>Via Amazon Athena connector</strong> — Power BI has a native Athena connector (using the Athena ODBC driver) that lets you query S3 data through Athena's SQL interface. This is the recommended approach for structured S3 data organized into Athena tables.</li> <li><strong>Via direct S3 access</strong> — Power BI's web connector or custom Python/R scripts can read S3 objects directly, but this is not recommended for production workloads due to limited optimization and credential management complexity.</li> <li><strong>Via AWS Glue Data Catalog</strong> — Athena uses the Glue Data Catalog for table metadata. When you connect Power BI to Athena, you browse Glue catalog databases and tables, providing a managed metadata layer over raw S3 data.</li> </ul>
<p><strong>Cost consideration:</strong> Athena charges $5 per TB of data scanned. Power BI Import mode extracts data once per refresh (predictable cost). DirectQuery mode generates a query per interaction (variable cost). Partitioning S3 data and using columnar formats (Parquet) dramatically reduce Athena scan costs—Parquet typically reduces scanned data by 90% compared to CSV for analytical queries.</p>
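<p>The scan-cost trade-off above can be sketched as a quick back-of-the-envelope model. This is illustrative only: the $5/TB rate comes from the paragraph above, while the 90% Parquet reduction, dataset size, and user/interaction counts are assumptions chosen to show how Import and DirectQuery costs diverge.</p>

```python
# Rough Athena cost model (on-demand rate from the text: $5 per TB scanned).
ATHENA_PRICE_PER_TB = 5.00

def athena_scan_cost(scanned_tb: float) -> float:
    """USD cost for a single query scanning `scanned_tb` terabytes."""
    return scanned_tb * ATHENA_PRICE_PER_TB

# A 2 TB CSV dataset queried as-is, vs. the same data in partitioned
# Parquet where a typical analytical query scans ~10% of the bytes.
csv_cost_per_query = athena_scan_cost(2.0)
parquet_cost_per_query = athena_scan_cost(2.0 * 0.10)

# Import mode: one extraction per daily refresh.
# DirectQuery: assume 500 users x 20 interactions each per day.
import_daily_cost = parquet_cost_per_query * 1
directquery_daily_cost = parquet_cost_per_query * 500 * 20

print(f"CSV, per query:       ${csv_cost_per_query:.2f}")
print(f"Parquet, per query:   ${parquet_cost_per_query:.2f}")
print(f"Import, per day:      ${import_daily_cost:.2f}")
print(f"DirectQuery, per day: ${directquery_daily_cost:.2f}")
```

<p>Even at the reduced Parquet scan size, per-interaction querying multiplies cost by the size of the audience, which is why Import mode is usually the cheaper default for Athena-backed reports.</p>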
<h3>Amazon RDS and Aurora</h3>
<p>For operational data in Amazon RDS (PostgreSQL, MySQL, SQL Server) or Aurora, Power BI connects using native database connectors. The connection pattern is the same as connecting to any database: specify the endpoint, database, and credentials. For RDS instances in private VPCs, route through the On-premises Data Gateway installed on an EC2 instance with VPC access.</p>
<h2>Connecting Power BI to Google Cloud Data Sources</h2>
<h3>Google BigQuery</h3>
<p>BigQuery is Google Cloud's serverless data warehouse and one of the most common non-Microsoft data sources connected to Power BI. Power BI provides a native BigQuery connector with full Import and DirectQuery support.</p>
<p><strong>Import Mode with BigQuery:</strong></p> <ul> <li>Data is extracted from BigQuery and loaded into Power BI's VertiPaq engine</li> <li>BigQuery charges for data extraction (egress) during refresh; once imported, report interactions incur zero BigQuery costs</li> <li>Best for datasets where query performance must be consistently fast regardless of BigQuery load</li> <li>Use incremental refresh to minimize data extraction volume and reduce BigQuery egress costs</li> <li>BigQuery Storage Read API is used by the connector for high-performance data extraction—significantly faster than the legacy query-based extraction</li> </ul>
<p><strong>DirectQuery Mode with BigQuery:</strong></p> <ul> <li>Power BI sends SQL queries to BigQuery for every user interaction</li> <li>BigQuery's on-demand pricing charges per query based on data processed—a single Power BI page load can generate multiple queries</li> <li>For cost control, consider BigQuery flat-rate pricing (capacity-based) or BigQuery Editions with autoscaling, which provide predictable costs regardless of query volume from Power BI</li> <li>BigQuery BI Engine can accelerate Power BI DirectQuery performance by caching frequently accessed data in memory on the BigQuery side</li> <li>BigQuery materialized views reduce query costs by pre-computing common aggregations</li> </ul>
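<p>The query fan-out is what drives DirectQuery costs: each interaction re-queries roughly one query per visual on the page. A hedged sketch of the on-demand spend, where the per-TiB rate, visuals-per-page, and scan sizes are all assumptions (check current BigQuery pricing for your region):</p>

```python
# Assumed BigQuery on-demand rate; verify against current GCP pricing.
ON_DEMAND_PRICE_PER_TIB = 6.25

def daily_directquery_cost(users: int, interactions_per_user: int,
                           visuals_per_page: int,
                           gib_scanned_per_query: float) -> float:
    """Estimate daily spend: each interaction issues ~one query per visual."""
    queries = users * interactions_per_user * visuals_per_page
    tib_scanned = queries * gib_scanned_per_query / 1024
    return tib_scanned * ON_DEMAND_PRICE_PER_TIB

# 200 users, 15 interactions each, a 10-visual page, 0.5 GiB per query
cost = daily_directquery_cost(200, 15, 10, 0.5)
print(f"Estimated daily cost: ${cost:,.2f}")
```

<p>Materialized views and BI Engine shrink the per-query scan size; switching to Editions capacity pricing removes the per-query term from the model entirely.</p>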
<p><strong>Authentication:</strong> The BigQuery connector supports Google organizational accounts (OAuth) and service account authentication. For production deployments, use a service account with a JSON key file. The service account needs BigQuery Data Viewer and BigQuery Job User roles on the relevant datasets.</p>
<p><strong>Configuration for scheduled refresh:</strong> For Power BI Service scheduled refresh with BigQuery, store the service account credentials in the dataset settings. If using OAuth authentication, the credentials expire and require periodic re-authentication. Service account authentication is more reliable for unattended refresh scenarios.</p>
<h3>Google Cloud SQL</h3>
<p>For operational data in Google Cloud SQL (MySQL, PostgreSQL, SQL Server), Power BI connects using the corresponding native database connector. Cloud SQL instances can be configured with a public IP (with authorized networks controlling access) or private IP (requiring the On-premises Data Gateway installed on a GCE instance with VPC access).</p>
<h3>Google Sheets</h3>
<p>Many organizations use Google Sheets for collaborative data entry, planning inputs, or reference data. Power BI can connect to Google Sheets through the web connector or third-party connectors. For production use, consider extracting Google Sheets data into BigQuery (using scheduled queries or Dataform) and connecting Power BI to BigQuery—this provides better performance, data typing, and refresh reliability than direct Sheets connectivity.</p>
<h2>Hybrid Architecture Patterns</h2>
<p>The right architecture depends on your data distribution, latency requirements, query volumes, and cost constraints. Here are the most common patterns we implement at EPC Group.</p>
<h3>Pattern 1: Centralized Import (Recommended for Most Enterprises)</h3>
<p>All data from AWS, GCP, and Azure is imported into Power BI's in-memory engine (or Microsoft Fabric Lakehouse for larger datasets).</p>
<ul> <li><strong>Architecture:</strong> Scheduled extraction from Redshift, BigQuery, and Azure SQL → Power BI Dataflows or Fabric pipelines → Semantic model (Import mode)</li> <li><strong>Advantages:</strong> Fastest query performance, predictable cloud costs (extraction only at scheduled intervals), full DAX capability, single security model</li> <li><strong>Trade-offs:</strong> Data latency (scheduled refresh only; at most 8 refreshes per day with Pro, 48 with Premium/Fabric), increased Power BI storage consumption, data duplication</li> <li><strong>Best for:</strong> Most analytical workloads where data does not change minute-by-minute</li> </ul>
<h3>Pattern 2: Federated DirectQuery</h3>
<p>Power BI uses <a href="/blog/composite-models-guide">composite models</a> to query multiple cloud sources in real-time.</p>
<ul> <li><strong>Architecture:</strong> Power BI DirectQuery to Redshift + DirectQuery to BigQuery + Import for dimension tables = single composite model</li> <li><strong>Advantages:</strong> Near-real-time data, no data duplication, reduced Power BI storage</li> <li><strong>Trade-offs:</strong> Query performance depends on slowest source, every interaction generates cloud queries (cost implications), limited DAX patterns (some DAX functions require materialized data), complex troubleshooting when queries span multiple sources</li> <li><strong>Best for:</strong> Real-time operational dashboards where data freshness is critical and query volumes are moderate</li> </ul>
<h3>Pattern 3: Lakehouse Consolidation with Microsoft Fabric</h3>
<p>Use <a href="/blog/microsoft-fabric-onelake-architecture-guide-2026">Microsoft Fabric OneLake</a> as the consolidation layer, with <a href="/blog/onelake-shortcuts">shortcuts</a> to external cloud storage.</p>
<ul> <li><strong>Architecture:</strong> Fabric shortcuts to S3 buckets and GCS buckets → Fabric Lakehouse → <a href="/blog/power-bi-direct-lake-mode-fabric-guide-2026">Direct Lake mode</a> semantic model → Power BI reports</li> <li><strong>Advantages:</strong> Import-like performance with Direct Lake, no data movement for shortcutted data, unified governance in Fabric, scalable to petabytes</li> <li><strong>Trade-offs:</strong> Requires Microsoft Fabric capacity (F64+ for production workloads), shortcuts have some limitations (read-only, limited format support), additional Fabric learning curve</li> <li><strong>Best for:</strong> Organizations investing in Microsoft Fabric as their data platform with large volumes of data in AWS S3 or GCS</li> </ul>
<h3>Pattern 4: Hybrid Import + DirectQuery</h3>
<p>Combine Import mode for stable, high-performance dimensions and historical facts with DirectQuery for real-time transactional data.</p>
<ul> <li><strong>Architecture:</strong> Dimension tables and historical facts imported from all sources; current-day or live transactional data via DirectQuery to the primary transactional database (regardless of cloud)</li> <li><strong>Advantages:</strong> Balance of performance and freshness, cost-efficient (only real-time data incurs per-query cloud costs), full DAX on imported partitions</li> <li><strong>Trade-offs:</strong> More complex model design, requires careful partition management</li> <li><strong>Best for:</strong> Dashboards that need both historical trend analysis (imported) and current-day operational visibility (DirectQuery)</li> </ul>
<h2>Data Gateway Configuration for Multi-Cloud</h2>
<p>The <a href="/blog/power-bi-gateway-setup">On-premises Data Gateway</a> is not just for on-premises data—it is essential for connecting to cloud databases in private networks. In a multi-cloud architecture, you may need gateways in multiple cloud environments.</p>
<h3>Gateway Deployment per Cloud</h3>
<ul> <li><strong>AWS:</strong> Install the gateway on a Windows EC2 instance in the VPC containing your Redshift cluster, RDS instances, or other private AWS resources. Use an m5.large or larger instance. The gateway communicates outbound to Power BI Service over HTTPS (port 443)—no inbound ports need to be opened.</li> <li><strong>GCP:</strong> Install the gateway on a Windows Compute Engine instance in the VPC containing your Cloud SQL instances or private BigQuery endpoints. Same sizing and network requirements as AWS.</li> <li><strong>Azure:</strong> For Azure SQL, Synapse, and other Azure services with public endpoints, the gateway is typically not required (Power BI Service connects directly). For Azure resources in private VNets, install the gateway on a Windows VM in the VNet.</li> </ul>
<h3>Gateway Clustering for High Availability</h3>
<p>For production multi-cloud deployments, deploy gateway clusters (2+ gateway nodes) in each cloud to avoid a single point of failure. Power BI automatically distributes queries across cluster members and fails over if a node becomes unavailable. Configure gateway clusters with consistent data source definitions and credentials across all nodes.</p>
<h3>Gateway Security Considerations</h3>
<ul> <li>Gateway VMs should be hardened (remove unnecessary services, apply OS patches, enable Windows Defender)</li> <li>Use managed identities or service accounts for database authentication—not personal credentials</li> <li>Encrypt data source credentials in the gateway configuration (encrypted by default using the gateway recovery key)</li> <li>Place gateway VMs in dedicated security groups/network security groups with minimal port exposure</li> <li>Monitor gateway health through the Power BI admin portal and set up alerts for gateway offline events</li> </ul>
<h2>DirectQuery vs Import: Decision Framework for Cloud Sources</h2>
<p>The Import vs DirectQuery decision is the most impactful architectural choice in a multi-cloud Power BI deployment. Here is the decision framework we use at EPC Group.</p>
<table> <thead><tr><th>Factor</th><th>Choose Import</th><th>Choose DirectQuery</th></tr></thead> <tbody> <tr><td>Data freshness requirement</td><td>Hourly or less frequent is acceptable</td><td>Near-real-time (&lt; 15 minutes) is required</td></tr> <tr><td>Dataset size</td><td>Up to ~500M rows (after compression) in Premium/Fabric</td><td>Billions of rows where Import is not feasible</td></tr> <tr><td>Cloud query cost model</td><td>Cost-sensitive (pay only during refresh)</td><td>Flat-rate or capacity pricing on the cloud side</td></tr> <tr><td>Query concurrency</td><td>High (hundreds of simultaneous users)</td><td>Moderate (each user generates cloud queries)</td></tr> <tr><td>DAX complexity</td><td>Full DAX library available</td><td>Some DAX patterns are restricted or slow in DirectQuery</td></tr> <tr><td>Network latency</td><td>Not a factor (data is local)</td><td>Latency between Power BI and cloud source affects every query</td></tr> <tr><td>Source system impact</td><td>None during report use (only during refresh)</td><td>Every report interaction queries the source</td></tr> </tbody> </table>
<p><strong>General recommendation:</strong> Start with Import mode. Only move to DirectQuery when a specific requirement (real-time freshness, extreme data volume, security policy prohibiting data extraction) makes Import infeasible. In our experience, 80%+ of enterprise Power BI models perform best in Import mode, even in multi-cloud scenarios.</p>
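<p>The framework above can be sketched as a simple decision function. The thresholds (15-minute freshness, ~2B rows) are illustrative assumptions drawn from the table, not hard product limits:</p>

```python
# Minimal sketch of the Import-vs-DirectQuery decision table as code.
def recommend_storage_mode(freshness_minutes_required: int,
                           row_count: int) -> str:
    if freshness_minutes_required < 15:
        return "DirectQuery"  # near-real-time freshness rules out Import
    if row_count > 2_000_000_000:
        return "DirectQuery"  # beyond practical Import/compression size
    return "Import"           # the default, per the recommendation above

print(recommend_storage_mode(60, 300_000_000))  # Import
print(recommend_storage_mode(5, 300_000_000))   # DirectQuery
```

<p>A real assessment also weighs concurrency, the source's pricing model, and security policy, but in practice most sources fall through to the Import default.</p>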
<h2>Security Considerations for Multi-Cloud BI</h2>
<p>Multi-cloud analytics introduces security challenges that must be addressed architecturally.</p>
<h3>Authentication and Authorization</h3>
<ul> <li><strong>AWS sources:</strong> Use IAM roles for Redshift and Athena authentication where possible. For gateway-based connections, use database-specific service accounts with minimum required permissions (read-only access to specific schemas/tables). Rotate credentials on a defined schedule.</li> <li><strong>GCP sources:</strong> Use service account authentication with JSON key files for BigQuery. Grant BigQuery Data Viewer and BigQuery Job User roles—not BigQuery Admin. Store service account keys securely and rotate regularly.</li> <li><strong>Power BI layer:</strong> <a href="/blog/power-bi-row-level-security">Row-level security (RLS)</a> enforced in the Power BI semantic model provides a consistent authorization layer regardless of which cloud the underlying data came from. Even if a user could access the Redshift cluster directly, RLS in Power BI restricts them to their authorized data subset.</li> </ul>
<h3>Data in Transit and at Rest</h3>
<ul> <li>All Power BI connections to cloud sources use TLS encryption in transit</li> <li>Imported data is encrypted at rest in Power BI Service (AES-256)</li> <li>For additional protection, enable Bring Your Own Key (BYOK) encryption for Power BI Premium/Fabric capacity</li> <li>Gateway-to-source connections use the source's native encryption (SSL/TLS for database connections)</li> <li>Sensitivity labels from Microsoft Purview can classify data from any source, providing consistent data protection regardless of cloud origin</li> </ul>
<h3>Network Architecture</h3>
<ul> <li>Use private endpoints where available (Azure Private Link for Power BI Premium, interface VPC endpoints for Redshift, Private Service Connect for BigQuery)</li> <li>Implement network segmentation: gateway VMs should only have access to the specific database ports they need</li> <li>Enable VPC Flow Logs in both AWS and GCP for audit trails on gateway network traffic</li> <li>Consider a dedicated network path (Azure ExpressRoute, AWS Direct Connect, or Google Cloud Interconnect) when data volumes are large or latency-sensitive</li> </ul>
<h2>Cost Optimization Strategies</h2>
<p>Multi-cloud BI can become expensive without deliberate cost management. Here are the strategies that reduce costs while maintaining performance.</p>
<h3>Reduce Cloud Query Costs</h3>
<ul> <li><strong>Import mode over DirectQuery:</strong> Imported data incurs cloud costs only during scheduled refresh, not during user interactions. For a report accessed by 500 users per day, the cost difference is dramatic.</li> <li><strong>Incremental refresh:</strong> Extract only new/changed data from cloud sources rather than full refreshes. A table with 100M rows that grows by 100K rows daily should refresh only the 100K new rows, reducing extraction costs by 99%+.</li> <li><strong>BigQuery flat-rate/Editions:</strong> If DirectQuery to BigQuery is required, switch from on-demand pricing to capacity-based pricing for predictable costs.</li> <li><strong>Redshift reserved instances:</strong> For persistent DirectQuery workloads against Redshift, reserved instances reduce costs by 30-75% compared to on-demand.</li> <li><strong>Athena query optimization:</strong> Partition S3 data, use Parquet/ORC format, and configure workgroup query limits to prevent runaway costs from Power BI-generated queries.</li> </ul>
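<p>The incremental-refresh arithmetic from the list above is worth making explicit: a 100M-row table growing by 100K rows per day only needs to extract the new partition.</p>

```python
# Extraction volume with and without incremental refresh (figures from
# the bullet above).
total_rows = 100_000_000
new_rows_per_day = 100_000

extraction_savings = 1 - new_rows_per_day / total_rows
print(f"Rows per full refresh:        {total_rows:,}")
print(f"Rows per incremental refresh: {new_rows_per_day:,}")
print(f"Extraction volume reduced by: {extraction_savings:.1%}")  # 99.9%
```

<p>The same ratio applies to whatever the cloud side bills for extraction: scan bytes in Athena, Storage Read API throughput in BigQuery, or Redshift compute during unload.</p>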
<h3>Reduce Data Transfer Costs</h3>
<ul> <li><strong>Co-locate gateways:</strong> Install gateways in the same cloud region as the data source to avoid cross-region data transfer charges.</li> <li><strong>Compress data in transit:</strong> Parquet and columnar formats reduce data transfer volumes by 80-90% compared to CSV/JSON.</li> <li><strong>Fabric shortcuts:</strong> OneLake shortcuts to S3 and GCS read data in place without copying it to Azure, eliminating cross-cloud storage duplication costs (though egress charges still apply during reads).</li> <li><strong>Aggregate before extraction:</strong> Use cloud-side materialized views or pre-aggregation tables to reduce the volume of data extracted into Power BI during refresh.</li> </ul>
<h3>Optimize Power BI Capacity Costs</h3>
<ul> <li><strong>Right-size Fabric capacity:</strong> Use the <a href="/blog/power-bi-premium-fabric-capacity-planning-guide-2026">Fabric Capacity Metrics app</a> to monitor actual utilization and right-size your F SKU. Over-provisioning wastes budget; under-provisioning causes throttling.</li> <li><strong>Pause non-production capacity:</strong> Fabric capacity can be paused during nights and weekends for dev/test environments, reducing costs by 60-70%.</li> <li><strong>Use aggregation tables:</strong> <a href="/blog/power-bi-aggregations">User-defined aggregations</a> reduce the memory and CPU required to serve queries by pre-computing common aggregation patterns, allowing you to use a smaller capacity SKU.</li> </ul>
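<p>The pause-savings figure above falls out of simple schedule arithmetic. The schedule here (12 hours per day, weekdays only) is an assumption; substitute your own dev/test hours.</p>

```python
# Savings from pausing a dev/test Fabric capacity outside working hours.
hours_per_week = 24 * 7  # 168 h if the capacity runs non-stop
running_hours = 12 * 5   # e.g. 07:00-19:00, Monday through Friday
pause_savings = 1 - running_hours / hours_per_week
print(f"Capacity up {running_hours}h of {hours_per_week}h "
      f"-> about {pause_savings:.0%} saved")  # ~64%
```

<p>Automating the pause/resume (for example with a scheduled Azure Automation runbook) is what makes the savings reliable rather than dependent on someone remembering to click pause.</p>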
<h2>Monitoring and Troubleshooting Multi-Cloud Connections</h2>
<p>Multi-cloud BI adds complexity to troubleshooting. Implement these monitoring practices:</p>
<ul> <li><strong>Gateway monitoring:</strong> The Power BI admin portal shows gateway status, query duration, and error rates. Set up email alerts for gateway offline events. Deploy the <a href="/blog/power-bi-monitoring-alerting-admin-best-practices-2026">Power BI monitoring workspace</a> for detailed gateway performance metrics.</li> <li><strong>Refresh failure alerts:</strong> Configure Power BI dataset refresh failure alerts via Power Automate. When a refresh fails, the flow captures the error details and notifies the appropriate team. Common multi-cloud refresh failures include expired credentials, network connectivity issues (gateway VM restarted, security group changed), and source database maintenance windows.</li> <li><strong>Cloud-side query monitoring:</strong> Enable query logging on each cloud source to trace Power BI-generated queries. In Redshift, use STL_QUERY and STL_QUERYTEXT system tables. In BigQuery, use INFORMATION_SCHEMA.JOBS views. In Athena, use CloudTrail and workgroup query history. This is essential for identifying expensive queries generated by DirectQuery models.</li> <li><strong>End-to-end latency tracking:</strong> For DirectQuery models, measure the time from Power BI visual interaction to data rendering. If latency exceeds user expectations, identify whether the bottleneck is in Power BI query generation (DAX complexity), network transit (gateway routing), or source query execution (cloud warehouse performance).</li> </ul>
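<p>The latency-attribution step in the last bullet can be sketched as follows. The millisecond values are illustrative placeholders for numbers you would capture from, say, Performance Analyzer on the Power BI side and query logs on the source side:</p>

```python
# Attribute end-to-end DirectQuery latency to the three segments named
# above, then flag the dominant one.
segments_ms = {
    "DAX/SQL generation (Power BI)": 120,
    "network transit (gateway)": 80,
    "source execution (warehouse)": 1400,
}
total_ms = sum(segments_ms.values())
bottleneck = max(segments_ms, key=segments_ms.get)
print(f"Total: {total_ms} ms; bottleneck: {bottleneck}")
```

<p>In this illustrative case the warehouse dominates, so tuning DAX would be wasted effort; the fix belongs on the cloud warehouse side.</p>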
<h2>Real-World Multi-Cloud BI Architecture Example</h2>
<p>One of EPC Group's enterprise clients runs the following multi-cloud analytics architecture:</p>
<ul> <li><strong>Salesforce data</strong> → landed in AWS Redshift via Fivetran → connected to Power BI via Import mode (daily refresh)</li> <li><strong>Google Analytics + Google Ads</strong> → exported to BigQuery → connected to Power BI via Import mode (twice-daily refresh)</li> <li><strong>ERP data (SAP)</strong> → replicated to Azure SQL Database via Azure Data Factory → connected to Power BI via DirectQuery (real-time inventory and financial data)</li> <li><strong>IoT sensor data</strong> → streamed to AWS Kinesis → landed in S3 Parquet → shortcutted to Fabric OneLake → Direct Lake mode (near-real-time operational analytics)</li> <li><strong>Unified semantic model</strong> → single <a href="/blog/power-bi-data-modeling-best-practices-enterprise-2026">Power BI semantic model</a> combining all sources with consistent definitions, <a href="/blog/power-bi-row-level-security">RLS</a> enforcement, and <a href="/blog/advanced-dax-patterns-enterprise-analytics-2026">DAX measures</a></li> <li><strong>Governance</strong> → <a href="/blog/power-bi-devops-cicd-deployment-pipelines-2026">deployment pipelines</a> (dev/test/prod), <a href="/blog/power-bi-workspace-governance-tenant-settings-guide-2026">workspace governance</a>, sensitivity labels, and <a href="/blog/power-bi-monitoring-alerting-admin-best-practices-2026">automated monitoring</a></li> </ul>
<p>This architecture provides the client with a single pane of glass for cross-cloud analytics, consistent metric definitions regardless of data origin, governed access controls, and predictable costs. The entire platform serves 2,000+ users across 6 business units.</p>
<h2>Getting Started with Multi-Cloud Power BI</h2>
<p>Building a multi-cloud BI architecture requires careful planning across connectivity, security, performance, and cost dimensions. The most common mistake is treating each cloud connection as an isolated point solution rather than designing a coherent architecture that addresses governance, security, and cost holistically.</p>
<p>Start with these steps:</p> <ol> <li><strong>Inventory your data sources</strong> by cloud provider, database technology, data volume, freshness requirements, and current access patterns</li> <li><strong>Define the connectivity mode</strong> (Import, DirectQuery, Direct Lake, or composite) for each source based on the decision framework above</li> <li><strong>Design the gateway architecture</strong> — determine which sources need gateways, where gateway VMs should be deployed, and whether you need gateway clusters for HA</li> <li><strong>Build the unified semantic model</strong> that combines sources into a consistent, governed <a href="/blog/power-bi-star-schema">star schema</a> with RLS, endorsement, and shared definitions</li> <li><strong>Implement monitoring</strong> across all layers: gateway health, refresh success/failure, cloud query costs, and end-user performance</li> <li><strong>Optimize costs continuously</strong> — review cloud query costs, data transfer costs, and capacity utilization monthly</li> </ol>
<p><a href="/contact">Contact EPC Group</a> for a multi-cloud analytics architecture assessment. Our <a href="/services/power-bi-architecture">Power BI architecture</a> and <a href="/services/data-analytics">data analytics</a> consulting teams design and implement multi-cloud BI solutions for enterprises with complex data estates spanning AWS, Azure, GCP, and on-premises infrastructure.</p>
<h2>Frequently Asked Questions</h2>
<h3>Does Power BI natively support connections to AWS Redshift and Google BigQuery?</h3>
<p>Yes. Power BI provides certified native connectors for both Amazon Redshift and Google BigQuery, along with dozens of other non-Microsoft data sources. The Redshift connector supports both Import and DirectQuery modes, authenticating with a database username/password or IAM credentials. The BigQuery connector also supports Import and DirectQuery, authenticating with Google organizational accounts (OAuth) or service accounts (JSON key file). Both connectors are maintained by Microsoft and receive regular updates. For Redshift clusters in private VPCs, you will need the On-premises Data Gateway installed on a Windows EC2 instance with network access to the cluster. For BigQuery, a gateway is typically not needed since BigQuery exposes a public API endpoint, though some organizations route through a gateway for network policy compliance.</p>
<h3>How do I control costs when using Power BI DirectQuery against BigQuery?</h3>
<p>BigQuery on-demand pricing charges per query based on data scanned, and Power BI DirectQuery generates a query for every visual interaction (filter change, page navigation, slicer selection). A single Power BI page with 10 visuals can generate 10+ BigQuery queries on each interaction. To control costs: First, consider switching from on-demand to BigQuery Editions with autoscaling, which provides capacity-based pricing independent of query volume. Second, enable BigQuery BI Engine to cache frequently accessed data in memory, reducing query costs and improving performance. Third, create BigQuery materialized views for common aggregation patterns that Power BI queries frequently. Fourth, use Import mode with incremental refresh instead of DirectQuery where near-real-time freshness is not strictly required—this shifts from per-interaction costs to per-refresh costs, which is dramatically cheaper for reports with many users. Fifth, set BigQuery cost controls (custom quotas per project or user) to prevent unexpected cost spikes from Power BI query generation.</p>
<h3>Can I combine data from Redshift and BigQuery in a single Power BI report?</h3>
<p>Yes, through Power BI composite models. A composite model allows you to combine multiple data sources with different connectivity modes in a single semantic model. For example, you can import dimension tables from Azure SQL, use DirectQuery against Redshift for one set of fact tables, and DirectQuery against BigQuery for another set of fact tables—all in one model with relationships between them. This enables cross-source analysis in a single report. However, there are important considerations: cross-source queries (visuals that need data from both Redshift and BigQuery simultaneously) require Power BI to join data from different sources, which impacts performance. For optimal performance, design your model so that most visuals query a single source, with cross-source joins limited to high-value analytical scenarios. Alternatively, use the Centralized Import pattern where all data is imported into Power BI or consolidated in a Fabric Lakehouse, which eliminates cross-source query performance concerns entirely.</p>
<h3>What is the recommended gateway architecture for a multi-cloud deployment?</h3>
<p>For production multi-cloud deployments, deploy gateway clusters in each cloud environment that contains private data sources. In AWS, install two or more gateway nodes on Windows EC2 instances (m5.large minimum, m5.xlarge recommended) in the VPC containing your Redshift clusters and RDS instances. In GCP, install two or more gateway nodes on Windows Compute Engine instances in the VPC containing your Cloud SQL instances. In Azure, gateways are typically needed only for resources in private VNets. Each gateway cluster should have nodes in different availability zones for resilience. All gateway nodes communicate outbound to Power BI Service over HTTPS port 443—no inbound ports need to be opened, which simplifies firewall configuration. Use the same gateway recovery key across all nodes in a cluster. Monitor gateway health through the Power BI admin portal and configure Power Automate alerts for gateway offline events. The gateway VMs should be patched monthly and included in your infrastructure monitoring and backup processes.</p>
<h3>How does Microsoft Fabric OneLake change multi-cloud BI architecture?</h3>
<p>Microsoft Fabric OneLake introduces shortcuts that create virtual references to data in Amazon S3 and Google Cloud Storage without copying the data. This means Power BI reports using Direct Lake mode can read data directly from S3 buckets or GCS buckets through the OneLake shortcut layer, getting Import-like performance without the traditional Import refresh cycle. The architecture simplifies significantly: instead of building extraction pipelines from each cloud into Power BI, you create OneLake shortcuts to your S3 and GCS locations, the data stays in its original cloud storage, and Fabric handles the rest. Fabric also provides a unified governance layer across all shortcutted data, including access controls, lineage tracking, and sensitivity labels, regardless of which cloud the data physically resides in. The main limitations are that shortcuts are read-only (you cannot write back through a shortcut), Delta Lake format is required for Direct Lake mode (Fabric can convert other formats), and Fabric capacity costs apply (F64 or higher for production workloads). For organizations already investing in Fabric, OneLake shortcuts represent the simplest path to multi-cloud BI.</p>