How to Connect Databricks to Power BI: The Complete Integration Guide for 2026

Complete guide to connecting Databricks to Power BI using Partner Connect, Unity Catalog, DirectQuery, SQL warehouses, Delta Lake, and cost optimization.

By EPC Group

<h1>How to Connect Databricks to Power BI: The Complete Integration Guide for 2026</h1>

<p>Databricks and Power BI are two of the most widely deployed platforms in the modern enterprise data stack. Databricks provides the unified analytics engine for data engineering, data science, and machine learning workloads built on Apache Spark. Power BI delivers the visualization, semantic modeling, and self-service analytics layer that business users interact with daily. Connecting these two platforms correctly is the difference between a performant, governed, cost-efficient analytics environment and one plagued by slow queries, runaway compute costs, and ungoverned data access.</p>

<p>This guide covers every production-relevant integration pattern between Databricks and Power BI in 2026, including the Databricks connector in Power BI Desktop, Unity Catalog integration, Partner Connect, DirectQuery vs Import mode tradeoffs, SQL warehouse configuration, Photon acceleration, authentication methods, Delta Lake table optimization, cost governance, and lakehouse architecture patterns. EPC Group has implemented these integration patterns across Fortune 500 organizations in healthcare, financial services, and government—industries where data governance, performance, and cost predictability are non-negotiable. Our <a href="/services/data-analytics">data analytics consulting</a> team specializes in building production-grade Databricks-to-Power BI pipelines.</p>

<h2>The Databricks Connector in Power BI</h2>

<p>Power BI includes a native Databricks connector that has undergone significant improvements through 2025 and into 2026. The connector is available in Power BI Desktop under Get Data &gt; Azure &gt; Azure Databricks. It supports both Azure Databricks and Databricks on AWS (via the generic Databricks connector). The connector communicates with Databricks through the ODBC/JDBC interface exposed by SQL warehouses or all-purpose clusters.</p>

<p>When you configure the connector, you provide the Server Hostname and HTTP Path from your Databricks compute resource. The Server Hostname is the workspace URL (e.g., `adb-1234567890123456.7.azuredatabricks.net`), and the HTTP Path is the compute-specific path (e.g., `/sql/1.0/warehouses/abc123def456`). Both values are found in the JDBC/ODBC tab of your SQL warehouse or cluster configuration page in the Databricks workspace.</p>

<p>The connector surfaces the full catalog hierarchy: catalogs, schemas (databases), and tables/views. With Unity Catalog enabled, this hierarchy represents the three-level namespace (`catalog.schema.table`) that governs all data access. Power BI's Navigator pane displays this hierarchy and allows you to select specific tables or views to load into your semantic model.</p>

<h2>Unity Catalog and Power BI Integration</h2>

<p>Unity Catalog is the unified governance layer for the Databricks Lakehouse Platform. It provides centralized access control, audit logging, data lineage, and data discovery across all Databricks workspaces in an account. When Power BI connects to Databricks, Unity Catalog governs which catalogs, schemas, tables, and columns the connecting identity can access.</p>

<p>The integration between Unity Catalog and Power BI is critical for enterprise deployments because it ensures that the same access policies enforced on data engineers and data scientists also apply to Power BI report consumers. If a Unity Catalog policy restricts a service principal from accessing the `finance.sensitive.transactions` table, Power BI will not display that table in the Navigator and queries against it will fail with a permissions error.</p>

<p>Key Unity Catalog capabilities that affect Power BI integration include:</p>

<ul> <li><strong>Column-level security</strong>: Mask or redact sensitive columns so Power BI users see only what their role permits. This is essential for HIPAA and PCI-DSS compliance in healthcare and financial services.</li> <li><strong>Row-level filters</strong>: Apply row-level filters in Unity Catalog that restrict which rows are returned to Power BI based on the connecting identity. This can supplement or replace Power BI's native row-level security (RLS) for defense-in-depth.</li> <li><strong>Data lineage</strong>: Unity Catalog tracks lineage from source tables through transformations to the tables Power BI queries. This provides end-to-end visibility for audit and compliance.</li> <li><strong>Tags and classification</strong>: Tag tables and columns with sensitivity levels (PII, PHI, financial) that inform governance policies and help Power BI model designers understand data sensitivity.</li> </ul>
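<p>As an illustrative sketch of the first two capabilities (function names are hypothetical; the table is the <code>finance.sensitive.transactions</code> example above), a column mask and row filter are defined in Databricks SQL like this:</p>

<pre><code>-- Column mask: hide account numbers from everyone outside hr_admins
CREATE OR REPLACE FUNCTION finance.governance.mask_account(acct STRING)
RETURN CASE WHEN is_account_group_member('hr_admins') THEN acct ELSE '****' END;

ALTER TABLE finance.sensitive.transactions
  ALTER COLUMN account_number SET MASK finance.governance.mask_account;

-- Row filter: non-global analysts see only EMEA rows
CREATE OR REPLACE FUNCTION finance.governance.emea_only(region STRING)
RETURN is_account_group_member('global_analysts') OR region = 'EMEA';

ALTER TABLE finance.sensitive.transactions
  SET ROW FILTER finance.governance.emea_only ON (region);</code></pre>

<p>Once applied, these policies are enforced on every Power BI query automatically; no report-side configuration is required.</p>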

<p>EPC Group recommends implementing Unity Catalog as the primary governance layer before connecting Power BI. Retrofitting governance after hundreds of Power BI reports are in production is exponentially more expensive. Our <a href="/services/power-bi-architecture">Power BI architecture services</a> include Unity Catalog governance design as a standard phase of every Databricks integration engagement.</p>

<h2>Partner Connect for Power BI</h2>

<p>Databricks Partner Connect is the fastest way to establish a connection between a Databricks workspace and Power BI. From the Databricks workspace sidebar, navigate to Partner Connect, select Power BI, and Databricks will automatically generate a connection file (.pbids) pre-configured with the correct Server Hostname, HTTP Path, and catalog settings for a SQL warehouse in your workspace.</p>

<p>Opening the .pbids file in Power BI Desktop establishes the connection without manual entry of server details. Partner Connect also creates a dedicated SQL warehouse (or uses an existing one) optimized for BI query patterns. This is significant because it separates BI workloads from data engineering workloads at the compute layer, which is essential for both performance isolation and cost attribution.</p>

<p>For organizations with multiple Databricks workspaces (development, staging, production), Partner Connect should be configured against the production workspace only. Development and staging connections should be configured manually to prevent accidental report publishing against non-production data.</p>

<h2>DirectQuery vs Import Mode: When to Use Each</h2>

<p>The most consequential architectural decision when connecting Databricks to Power BI is choosing between DirectQuery and Import mode. This choice affects query performance, data freshness, compute costs, semantic model size, and user experience. There is no universally correct answer—the right choice depends on data volume, query complexity, freshness requirements, and budget.</p>

<h3>Import Mode</h3>

<p>Import mode loads data from Databricks into the Power BI VertiPaq engine (an in-memory columnar store). Queries execute against the local VertiPaq cache, which provides sub-second response times for most analytical queries. Data is refreshed on a schedule (up to 48 refreshes per day with Premium/Fabric capacity).</p>

<p><strong>Use Import when:</strong></p> <ul> <li>Data volumes per table are under 1 billion rows (VertiPaq handles this efficiently)</li> <li>Users expect sub-second query response regardless of complexity</li> <li>Data freshness of 30 minutes to 24 hours is acceptable</li> <li>You want to minimize Databricks compute costs (warehouse runs only during refresh windows)</li> <li>Complex DAX calculations require the full VertiPaq engine</li> </ul>

<h3>DirectQuery Mode</h3>

<p>DirectQuery sends every Power BI query back to the Databricks SQL warehouse in real time. No data is cached in Power BI. This provides near-real-time data freshness but makes query performance dependent on the SQL warehouse's size, configuration, and concurrent load.</p>

<p><strong>Use DirectQuery when:</strong></p> <ul> <li>Data volumes exceed Power BI model size limits (even with Large Dataset Storage Format)</li> <li>Near-real-time data freshness is a business requirement</li> <li>The Databricks SQL warehouse is adequately sized and Photon-accelerated</li> <li>Query patterns are well-defined and optimized (DirectQuery penalizes poorly written queries more than Import)</li> <li>You need to enforce Unity Catalog security at query time rather than at refresh time</li> </ul>

<h3>Composite Models (Hybrid)</h3>

<p>Power BI composite models allow mixing Import and DirectQuery storage modes within the same semantic model. Dimension tables (low cardinality, slowly changing) are imported for performance, while fact tables (high cardinality, frequently updated) remain in DirectQuery mode against Databricks. This hybrid approach balances performance with data freshness and is the most common pattern EPC Group deploys for enterprise Databricks-to-Power BI integrations.</p>

<h2>Databricks SQL Warehouses for BI Workloads</h2>

<p>SQL warehouses (formerly SQL endpoints) are the recommended compute resource for Power BI connections. They are optimized for BI-style queries: aggregations, filters, joins across star schemas, and concurrent user access. All-purpose clusters, while capable of serving Power BI queries, are designed for interactive notebook development and lack the BI-specific optimizations of SQL warehouses.</p>

<p>Key SQL warehouse configuration decisions for Power BI workloads:</p>

<ul> <li><strong>Warehouse type</strong>: Choose Serverless SQL warehouses for fastest startup time (seconds vs minutes) and automatic scaling. Classic SQL warehouses provide more control over cluster configuration but have slower cold-start times.</li> <li><strong>Size</strong>: Start with Small or Medium for development/testing and Large or X-Large for production workloads with concurrent users. Each size increment roughly doubles the compute resources (and DBU consumption) along with concurrent query capacity.</li> <li><strong>Auto-stop</strong>: Configure auto-stop to 10-15 minutes for development warehouses and 30-60 minutes for production. This prevents idle compute costs while avoiding cold starts during active usage.</li> <li><strong>Scaling</strong>: Enable multi-cluster scaling with a minimum of 1 and maximum of 3-5 clusters for production warehouses. Each additional cluster handles approximately the same concurrent query load as the base size, enabling linear scaling.</li> <li><strong>Channel</strong>: Use the Current channel for production stability. Preview channel provides early access to new features but may introduce regressions.</li> </ul>
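<p>These settings can also be applied programmatically. The payload below targets the Databricks SQL Warehouses REST API (<code>POST /api/2.0/sql/warehouses</code>); field names reflect the published API but should be verified against the current API reference before use:</p>

<pre><code>{
  "name": "powerbi-prod",
  "cluster_size": "Large",
  "auto_stop_mins": 30,
  "min_num_clusters": 1,
  "max_num_clusters": 3,
  "enable_serverless_compute": true,
  "channel": { "name": "CHANNEL_NAME_CURRENT" }
}</code></pre>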

<h2>Performance Optimization: Photon, Caching, and Materialized Views</h2>

<p>Databricks provides three primary performance acceleration mechanisms that directly impact Power BI query performance:</p>

<h3>Photon Engine</h3>

<p>Photon is Databricks' native vectorized query engine written in C++ that replaces the Spark SQL execution engine for supported operations. Photon accelerates the scan-filter-aggregate-join patterns that dominate BI queries by 3-8x compared to standard Spark SQL. SQL warehouses (both Serverless and Pro) run Photon by default. This is one of the primary reasons SQL warehouses outperform all-purpose clusters for Power BI workloads.</p>

<p>Photon's performance gains are most significant on:</p> <ul> <li>Large table scans with selective filters (common in Power BI date/dimension filters)</li> <li>Aggregations (SUM, COUNT, AVG) over billions of rows</li> <li>Hash joins between fact and dimension tables</li> <li>String operations and predicate evaluation</li> </ul>

<h3>Result Caching</h3>

<p>Databricks SQL warehouses cache query results in local SSD storage. When Power BI sends a query that is identical to a previously executed query and the underlying data has not changed (determined by Delta Lake's transaction log), the cached result is returned instantly without re-executing the query. This is transparent to Power BI and dramatically improves performance for DirectQuery models where multiple users run the same reports.</p>

<p>Result caching is enabled by default on SQL warehouses. Cache invalidation is automatic—any Delta Lake transaction (INSERT, UPDATE, DELETE, MERGE) that modifies the underlying table invalidates cached results for queries against that table. This ensures Power BI always receives fresh data without manual cache management.</p>

<h3>Materialized Views</h3>

<p>Materialized views in Databricks pre-compute and store the results of complex queries as Delta tables. When Power BI queries reference base tables, the SQL warehouse's query optimizer can automatically rewrite the query to use the materialized view if it satisfies the query's requirements. This is especially powerful for Power BI DirectQuery models that repeatedly execute expensive aggregations.</p>

<p>Create materialized views for the most expensive and frequently executed Power BI query patterns:</p>

<pre><code>CREATE MATERIALIZED VIEW sales_daily_summary AS
SELECT
  date,
  region,
  product_category,
  SUM(revenue) AS total_revenue,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM sales.transactions
GROUP BY date, region, product_category;</code></pre>

<p>Materialized views refresh automatically when the base tables are updated. Combined with Photon and result caching, they create a three-layer performance optimization stack that makes DirectQuery Power BI reports viable even on very large datasets.</p>

<h2>Authentication Methods: Azure AD, PAT Tokens, and OAuth</h2>

<p>Databricks supports multiple authentication methods for Power BI connections. Choosing the right method depends on your organization's security requirements, identity provider, and governance policies.</p>

<h3>Azure Active Directory (Entra ID) Authentication</h3>

<p>For Azure Databricks, Azure AD (now Microsoft Entra ID) authentication is the recommended method for production deployments. Power BI authenticates using the user's Azure AD identity or a service principal. This provides single sign-on (SSO), integrates with conditional access policies, and enables per-user audit logging in Unity Catalog.</p>

<p><strong>Benefits</strong>: Centralized identity management, MFA enforcement, conditional access, per-user Unity Catalog permissions, no credential rotation required.</p>

<p><strong>Configuration</strong>: In Power BI Desktop, select Azure Active Directory as the authentication method when configuring the Databricks connector. For scheduled refresh in the Power BI service, configure OAuth2 credentials in the gateway/data source settings.</p>

<h3>Personal Access Tokens (PAT)</h3>

<p>PAT tokens are static credentials generated in the Databricks workspace. They are the simplest authentication method but the least secure for production use.</p>

<p><strong>Limitations</strong>: Tokens do not expire by default (must be configured), cannot enforce MFA, do not integrate with conditional access, and Unity Catalog logs the token owner rather than the end user. PAT tokens should be used only for development and testing, never for production Power BI deployments.</p>

<h3>OAuth (Machine-to-Machine)</h3>

<p>Databricks supports OAuth 2.0 for machine-to-machine authentication using service principals. This is the recommended method for automated scenarios such as Power BI scheduled refresh where no interactive user is present. The service principal is registered in Azure AD and granted specific Unity Catalog permissions, enabling least-privilege access.</p>

<p>EPC Group's standard authentication architecture for Databricks-to-Power BI uses Azure AD for interactive users (Power BI Desktop development) and OAuth service principals for scheduled refresh in the Power BI service. PAT tokens are prohibited in production environments. Our <a href="/services/enterprise-deployment">enterprise deployment services</a> include authentication architecture design and implementation.</p>

<h2>Delta Lake Tables and Power BI</h2>

<p>All tables in the Databricks Lakehouse are stored in Delta Lake format—an open-source storage layer that adds ACID transactions, schema enforcement, time travel, and performance optimizations on top of Parquet files. Delta Lake's architecture directly impacts Power BI query performance and data management.</p>

<h3>Table Optimization for Power BI</h3>

<p>Delta tables should be optimized specifically for the query patterns Power BI generates:</p>

<ul> <li><strong>OPTIMIZE</strong>: Run `OPTIMIZE table_name` to compact small files into larger files (target size 1GB). Power BI queries scan fewer files, reducing I/O overhead.</li> <li><strong>ZORDER</strong>: Apply Z-ordering on columns that Power BI frequently filters on (typically date columns and high-cardinality dimension keys). `OPTIMIZE table_name ZORDER BY (order_date, customer_id)` co-locates related data for faster predicate pushdown.</li> <li><strong>Liquid Clustering</strong>: In Databricks 2025+, Liquid Clustering replaces Z-ordering with automatic, incremental clustering that does not require full table rewrites. Enable it with `ALTER TABLE table_name CLUSTER BY (order_date, region)`.</li> <li><strong>Column statistics</strong>: Ensure column statistics are collected (`ANALYZE TABLE table_name COMPUTE STATISTICS FOR ALL COLUMNS`) so the query optimizer can generate efficient execution plans for Power BI queries.</li> <li><strong>Partition pruning</strong>: If tables are partitioned (e.g., by year/month), ensure Power BI queries include partition columns in filters. The SQL warehouse will skip irrelevant partitions entirely.</li> </ul>
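<p>Taken together, a periodic optimization job for a Gold-layer fact table might run the following (table name hypothetical; note that Z-ordering and Liquid Clustering are alternatives, not companions — pick one per table):</p>

<pre><code>-- Compact small files into larger ones to reduce scan overhead
OPTIMIZE gold.sales.transactions;

-- Either: co-locate data on columns Power BI filters frequently
OPTIMIZE gold.sales.transactions ZORDER BY (order_date, customer_id);

-- Or: enable Liquid Clustering for automatic incremental clustering
ALTER TABLE gold.sales.transactions CLUSTER BY (order_date, region);

-- Refresh column statistics so the optimizer plans efficiently
ANALYZE TABLE gold.sales.transactions COMPUTE STATISTICS FOR ALL COLUMNS;</code></pre>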

<h3>Delta Sharing for Cross-Organization Power BI Access</h3>

<p>Delta Sharing is an open protocol for secure data sharing across organizations. A Databricks workspace can share specific tables or schemas with external Power BI consumers without copying data. The recipient connects Power BI to a Delta Sharing endpoint using the Databricks connector, and Unity Catalog governs what data is shared and audits all access. This is particularly valuable for healthcare data exchanges, financial data partnerships, and government inter-agency analytics.</p>
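<p>On the provider side, a share is defined in Databricks SQL. A minimal sketch, with hypothetical share, table, and recipient names:</p>

<pre><code>-- Create a share and add a Gold-layer table to it
CREATE SHARE partner_analytics;
ALTER SHARE partner_analytics ADD TABLE gold.sales.daily_summary;

-- Register the external recipient and grant access to the share
CREATE RECIPIENT partner_org;
GRANT SELECT ON SHARE partner_analytics TO RECIPIENT partner_org;</code></pre>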

<h2>Cost Optimization: Warehouse Sizing and Governance</h2>

<p>Databricks compute costs are the primary cost driver in a Databricks-to-Power BI architecture. SQL warehouse costs are measured in Databricks Units (DBUs) per hour, and costs scale linearly with warehouse size and runtime. Without governance, Power BI DirectQuery workloads can generate significant and unpredictable Databricks bills.</p>

<h3>Cost Optimization Strategies</h3>

<ul> <li><strong>Right-size warehouses</strong>: Start with the smallest warehouse that meets your latency requirements. Monitor query latency percentiles (p50, p95, p99) in the SQL warehouse monitoring dashboard and scale up only when p95 latency exceeds your SLA.</li> <li><strong>Separate development and production warehouses</strong>: Development warehouses should be Small with aggressive auto-stop (5 minutes). Production warehouses should be sized for concurrent user load with moderate auto-stop (30 minutes).</li> <li><strong>Use Import mode where appropriate</strong>: Import mode limits Databricks compute to refresh windows only. A 30-minute refresh running twice daily consumes 1 hour of warehouse time versus 8+ hours for a DirectQuery warehouse serving business hours.</li> <li><strong>Implement query tagging</strong>: Tag Power BI queries with report names and workspace identifiers so Databricks cost reports can attribute spend to specific Power BI reports and teams.</li> <li><strong>Budget alerts</strong>: Configure Databricks budget alerts to notify administrators when SQL warehouse spend exceeds thresholds. Integrate with Power BI usage metrics to correlate cost spikes with specific reports or user behavior.</li> <li><strong>Serverless warehouses</strong>: Serverless SQL warehouses eliminate idle compute costs entirely. You pay only for query execution time (per-query DBU billing). For sporadic BI workloads, serverless can reduce costs by 50-70% compared to always-on classic warehouses.</li> </ul>
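<p>Databricks system tables make warehouse spend queryable, which supports the tagging and alerting strategies above. The sketch below assumes access to <code>system.billing.usage</code> and <code>system.billing.list_prices</code>; column names and the pricing join should be verified against the system tables reference for your account:</p>

<pre><code>-- Approximate SQL warehouse spend by SKU over the last 30 days
SELECT
  u.sku_name,
  SUM(u.usage_quantity) AS dbus,
  SUM(u.usage_quantity * p.pricing.default) AS est_usd
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
WHERE u.usage_date &gt;= current_date() - INTERVAL 30 DAYS
  AND u.sku_name LIKE '%SQL%'
GROUP BY u.sku_name
ORDER BY est_usd DESC;</code></pre>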

<h2>Lakehouse Architecture with Power BI</h2>

<p>The Databricks Lakehouse architecture combines the scalability and cost-efficiency of data lakes with the data management and performance capabilities of data warehouses. Power BI sits at the consumption layer of this architecture, serving dashboards, reports, and self-service analytics to business users.</p>

<h3>The Medallion Architecture</h3>

<p>The standard lakehouse pattern is the medallion (bronze-silver-gold) architecture:</p>

<ul> <li><strong>Bronze layer</strong>: Raw data ingested from source systems (ERP, CRM, IoT, APIs) in its original format. Stored as Delta tables with append-only semantics for full auditability.</li> <li><strong>Silver layer</strong>: Cleansed, conformed, and deduplicated data. Business logic applied (currency conversion, entity resolution, data quality rules). Star schema dimensions begin to form here.</li> <li><strong>Gold layer</strong>: Business-ready aggregations and metrics. Optimized for Power BI consumption with star schema design, pre-computed KPIs, and query-optimized table structures (ZORDER, Liquid Clustering, materialized views).</li> </ul>

<p>Power BI should connect exclusively to the Gold layer. Allowing Power BI to query Bronze or Silver tables exposes raw, uncleansed data to business users and generates expensive full-table scans on unoptimized tables. EPC Group enforces this boundary using Unity Catalog permissions: Power BI service principals are granted SELECT only on Gold layer schemas, with no access to Bronze or Silver.</p>
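<p>In Unity Catalog terms, the boundary looks like this (principal and schema names are hypothetical; a principal with no grants on the Bronze or Silver catalogs simply cannot see them, since access is denied by default):</p>

<pre><code>-- Power BI service principal: read-only access to the Gold layer
GRANT USE CATALOG ON CATALOG gold TO `powerbi-sp`;
GRANT USE SCHEMA ON SCHEMA gold.sales TO `powerbi-sp`;
GRANT SELECT ON SCHEMA gold.sales TO `powerbi-sp`;
-- No grants on bronze or silver: those layers stay invisible</code></pre>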

<h3>Semantic Layer Considerations</h3>

<p>An important architectural decision is where the semantic layer resides. There are two patterns:</p>

<ul> <li><strong>Power BI semantic model</strong>: Business logic (measures, calculated columns, hierarchies, RLS) defined in the Power BI dataset. This is the traditional approach and provides the richest DAX-based calculation capabilities.</li> <li><strong>Databricks semantic layer</strong>: Business logic defined as SQL views or functions in Databricks, with Power BI connecting via DirectQuery to pre-computed views. This centralizes business logic in the lakehouse, making it available to Power BI and any other BI tool (Tableau, Looker, custom apps).</li> </ul>

<p>For organizations standardized on Power BI, the Power BI semantic model approach provides the best user experience and DAX flexibility. For organizations with multiple BI tools, centralizing the semantic layer in Databricks avoids duplicating business logic. Many enterprises use a hybrid approach: core metrics and dimensions defined in Databricks SQL views, with Power BI adding presentation-layer measures and visual-specific calculations.</p>

<h3>Real-Time Analytics with Databricks and Power BI</h3>

<p>For near-real-time dashboards, the architecture combines Databricks Structured Streaming (ingesting data into Delta tables with sub-minute latency) with Power BI DirectQuery against SQL warehouses. The streaming pipeline writes to Gold layer Delta tables, and Power BI DirectQuery queries return the latest data on each report refresh. Combined with automatic page refresh in Power BI (configurable down to 1-second intervals on Premium/Fabric capacity), this delivers near-real-time dashboards powered by the Databricks Lakehouse.</p>
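<p>A minimal sketch of the streaming leg, using Databricks' streaming table syntax (table and column names hypothetical; the exact DDL depends on whether you run it in a Delta Live Tables pipeline or Databricks SQL):</p>

<pre><code>-- Continuously materialize cleansed Silver events into a Gold-layer table
CREATE OR REFRESH STREAMING TABLE gold.sales.orders_live AS
SELECT order_id, order_date, region, amount
FROM STREAM(silver.sales.orders_clean);</code></pre>

<p>A Power BI DirectQuery report pointed at <code>gold.sales.orders_live</code> then returns the latest streamed rows on each page refresh.</p>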

<h2>Step-by-Step: Connecting Databricks to Power BI</h2>

<p>Follow these steps to establish a production-grade connection:</p>

<ol> <li><strong>Create a SQL warehouse</strong> in your Databricks workspace. Choose Serverless for cost efficiency or Pro for advanced features. Size based on expected concurrent Power BI users (Medium for &lt;10 users, Large for 10-50, X-Large for 50+).</li> <li><strong>Configure Unity Catalog permissions</strong>. Grant the Power BI service principal (or user group) SELECT on the Gold layer catalog/schema. Deny access to Bronze and Silver layers.</li> <li><strong>Retrieve connection details</strong>. In the SQL warehouse settings, navigate to the Connection Details tab. Copy the Server Hostname and HTTP Path.</li> <li><strong>Open Power BI Desktop</strong>. Click Get Data &gt; Azure &gt; Azure Databricks (or Databricks for non-Azure). Enter the Server Hostname and HTTP Path.</li> <li><strong>Authenticate</strong>. Select Azure Active Directory for interactive use. Enter credentials when prompted. For service principals, configure OAuth in the Power BI service data source settings.</li> <li><strong>Select tables</strong>. Navigate the Unity Catalog hierarchy (catalog &gt; schema &gt; tables). Select Gold layer tables for your semantic model.</li> <li><strong>Choose storage mode</strong>. Select Import for cached performance or DirectQuery for real-time data. Use composite mode to mix both within the same model.</li> <li><strong>Build the semantic model</strong>. Define relationships, measures, hierarchies, and RLS rules in Power BI Desktop.</li> <li><strong>Publish and configure refresh</strong>. Publish to the Power BI service. Configure scheduled refresh (Import) or gateway settings (DirectQuery) with OAuth service principal credentials.</li> <li><strong>Monitor and optimize</strong>. Use Databricks SQL warehouse monitoring to track query latency and cost. Use Power BI usage metrics to identify expensive reports. Optimize Delta tables and create materialized views for the most costly query patterns.</li> </ol>

<h2>Common Pitfalls and How to Avoid Them</h2>

<ul> <li><strong>Using all-purpose clusters instead of SQL warehouses</strong>: All-purpose clusters lack Photon, result caching, and BI-optimized query planning. Always use SQL warehouses for Power BI.</li> <li><strong>No auto-stop configuration</strong>: Forgetting to set auto-stop on SQL warehouses results in 24/7 compute costs even when no one is running reports. Always configure auto-stop.</li> <li><strong>Querying Bronze/Silver layers directly</strong>: Raw and intermediate data layers are not optimized for BI queries and expose uncleansed data. Restrict Power BI to Gold layer only.</li> <li><strong>PAT tokens in production</strong>: PAT tokens bypass Azure AD security policies and create audit blind spots. Use Azure AD or OAuth service principals for production.</li> <li><strong>Ignoring column statistics</strong>: Without column statistics, the SQL warehouse cannot optimize query plans, resulting in full table scans. Run ANALYZE TABLE after data loads.</li> <li><strong>Over-sized warehouses</strong>: Starting with X-Large warehouses "just in case" wastes budget. Start small, measure latency, and scale based on data.</li> </ul>

<h2>Why EPC Group for Databricks and Power BI Integration</h2>

<p>EPC Group brings 25+ years of Microsoft ecosystem expertise to Databricks-to-Power BI integration engagements. As specialists in enterprise data analytics, we have deployed these architectures across healthcare organizations requiring HIPAA-compliant data pipelines, financial institutions with SOC 2 audit requirements, and government agencies operating under FedRAMP. Every engagement begins with an architecture assessment that evaluates your current Databricks environment, Power BI deployment, data volumes, freshness requirements, and compliance obligations.</p>

<p>Our <a href="/services/data-analytics">data analytics consulting</a> team designs the end-to-end lakehouse architecture. Our <a href="/services/power-bi-architecture">Power BI architecture</a> specialists build the semantic models, DAX measures, and RLS policies. Our <a href="/services/enterprise-deployment">enterprise deployment</a> team handles production rollout, monitoring, and optimization. The result is a governed, performant, cost-efficient analytics platform that scales with your organization.</p>

<p><strong>Ready to connect Databricks to Power BI the right way?</strong> <a href="/contact">Contact EPC Group</a> for a free architecture assessment. We will evaluate your current environment and deliver a production-ready integration plan tailored to your data volumes, compliance requirements, and budget.</p>

Frequently Asked Questions

What is the best way to connect Databricks to Power BI for enterprise use?

The recommended approach is to connect Power BI to a Databricks SQL warehouse using the native Azure Databricks connector in Power BI Desktop. Use Azure Active Directory (Entra ID) authentication for interactive development and OAuth service principals for scheduled refresh in the Power BI service. Connect exclusively to Gold layer tables governed by Unity Catalog. For the fastest initial setup, use Databricks Partner Connect which generates a pre-configured .pbids connection file. EPC Group recommends SQL warehouses over all-purpose clusters because SQL warehouses include Photon acceleration, result caching, and BI-optimized query planning that significantly improve Power BI query performance. Contact EPC Group at /contact for a free architecture assessment.

Should I use DirectQuery or Import mode when connecting Power BI to Databricks?

The choice depends on data volume, freshness requirements, and budget. Use Import mode when data volumes are under 1 billion rows per table, sub-second query response is required, and data freshness of 30 minutes to 24 hours is acceptable. Import minimizes Databricks compute costs because the SQL warehouse runs only during refresh windows. Use DirectQuery when data volumes exceed Power BI model size limits, near-real-time freshness is required, or you need Unity Catalog security enforced at query time. For most enterprise scenarios, EPC Group recommends composite models that import dimension tables for performance while keeping large fact tables in DirectQuery mode against Databricks.

How does Unity Catalog affect Power BI connections to Databricks?

Unity Catalog governs all data access when Power BI connects to Databricks. It controls which catalogs, schemas, tables, and columns the Power BI connecting identity can access. Column-level security masks sensitive fields, row-level filters restrict which rows are returned, and all access is audited in the Unity Catalog audit log. This means the same governance policies applied to data engineers and data scientists automatically extend to Power BI consumers. EPC Group recommends implementing Unity Catalog before connecting Power BI to ensure governance is in place from day one rather than retrofitted after reports are in production.

How do I optimize Databricks SQL warehouse costs for Power BI workloads?

Six strategies reduce Databricks compute costs for Power BI: (1) Right-size warehouses by starting small and scaling based on measured p95 query latency. (2) Separate development (Small with 5-minute auto-stop) and production warehouses (sized for concurrent users with 30-minute auto-stop). (3) Use Import mode where possible to limit warehouse runtime to refresh windows only. (4) Use Serverless SQL warehouses for sporadic workloads where per-query billing eliminates idle costs. (5) Create materialized views for expensive aggregation queries to reduce compute per query. (6) Configure Databricks budget alerts to catch cost anomalies early. Organizations following these practices typically reduce Databricks BI compute costs by 40-60% compared to unoptimized deployments. Contact EPC Group via /contact for a cost optimization assessment.

What is the medallion architecture and how does Power BI fit into it?

The medallion architecture organizes lakehouse data into three layers: Bronze (raw ingested data), Silver (cleansed and conformed data), and Gold (business-ready aggregations optimized for consumption). Power BI connects exclusively to the Gold layer, which contains star schema tables, pre-computed KPIs, and query-optimized structures with ZORDER or Liquid Clustering. Connecting Power BI to Bronze or Silver layers exposes uncleansed data to business users and generates expensive full-table scans on unoptimized tables. EPC Group enforces this boundary using Unity Catalog permissions where Power BI service principals receive SELECT access only on Gold layer schemas.

How do I authenticate Power BI to Databricks securely in a production environment?

For production deployments on Azure Databricks, use Azure Active Directory (Entra ID) for interactive users developing in Power BI Desktop and OAuth 2.0 service principals for scheduled refresh in the Power BI service. Azure AD provides single sign-on, MFA enforcement, conditional access integration, and per-user audit logging in Unity Catalog. Personal Access Tokens (PAT) should never be used in production because they bypass Azure AD security policies, cannot enforce MFA, and log only the token owner rather than the end user. EPC Group standard practice prohibits PAT tokens in production and configures service principals with least-privilege Unity Catalog permissions scoped to Gold layer schemas only.

What Delta Lake optimizations improve Power BI query performance on Databricks?

Five Delta Lake optimizations directly improve Power BI query performance: (1) Run OPTIMIZE to compact small files into ~1GB target files, reducing I/O overhead during scans. (2) Apply ZORDER BY on columns Power BI frequently filters (date columns and dimension keys) to co-locate related data. (3) Use Liquid Clustering (available in Databricks 2025+) for automatic incremental clustering without full table rewrites. (4) Collect column statistics with ANALYZE TABLE to enable the query optimizer to generate efficient execution plans. (5) Create materialized views for the most expensive aggregation queries so the SQL warehouse can serve pre-computed results. Combined with Photon acceleration and result caching on SQL warehouses, these optimizations enable DirectQuery Power BI reports to achieve interactive performance on tables with billions of rows.

Databricks, Power BI, Integration, Delta Lake, Unity Catalog, DirectQuery, Lakehouse, Azure, Data Analytics, SQL Warehouse
