Connecting Power BI to Databricks: Setup, Performance, and Architecture Patterns
Power BI can query Databricks data directly through a native connector, using either DirectQuery (live queries against SQL warehouses) or Import mode (data pulled into Power BI's in-memory engine). The setup takes about 15 minutes using Partner Connect.
The real challenge is not the connection itself — it is designing the data architecture so dashboards perform well at scale. A poorly optimized DirectQuery report against a 500-million-row fact table will time out. A well-structured report with aggregation tables, Z-ordered Delta storage, and a right-sized SQL warehouse will load in under two seconds. This guide covers the connection setup, the DirectQuery vs. Import trade-off, and the performance patterns that make the integration production-ready.
Connect Power BI to Databricks using Partner Connect (one-click setup) or the native connector in Power BI Desktop. Use DirectQuery mode for large, frequently updated datasets — queries run live against Databricks SQL warehouses. Use Import mode for smaller datasets where dashboard speed matters more than freshness. Optimize performance with aggregation tables, Z-ordered Delta tables, and right-sized SQL warehouses.
Architecture: How Power BI Queries Databricks
Power BI sends SQL queries to a Databricks SQL warehouse. The warehouse executes those queries against Delta Lake tables stored in cloud object storage (ADLS, S3, or GCS). Results flow back through the Databricks connector into Power BI's rendering engine.
In DirectQuery mode, Power BI sends a new SQL query on every user interaction — clicking a filter, hovering over a chart, drilling into a hierarchy. The dashboard is always fresh, but every click costs a round trip to Databricks. In Import mode, Power BI pulls the full dataset during a scheduled refresh, compresses it into an in-memory columnar store (VertiPaq), and serves all interactions locally. The dashboard is fast but only as fresh as the last refresh.
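To make the push-down concrete, here is a sketch of the kind of SQL a single "revenue by region by month" visual might generate in DirectQuery mode. The schema and column names are illustrative, not from any real report:

```sql
-- Illustrative shape of a DirectQuery push-down for one visual.
-- Table and column names (analytics.sales_fact, etc.) are assumptions.
SELECT
  region,
  DATE_TRUNC('MONTH', order_date) AS order_month,
  SUM(amount)                     AS revenue
FROM analytics.sales_fact
GROUP BY region, DATE_TRUNC('MONTH', order_date);
```

Every slicer click re-issues a variant of this query, which is why warehouse latency and table layout dominate DirectQuery performance.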
The choice between DirectQuery and Import is the most consequential architecture decision in this integration. It affects dashboard speed, data freshness, compute costs, and which Power BI features are available. Getting it wrong means either stale dashboards or slow, expensive ones.
Connection Setup
Two methods, both straightforward.
Partner Connect (recommended)
Partner Connect automates the setup from the Databricks side. In the Databricks workspace, navigate to Partner Connect, select Power BI, and Databricks provisions a SQL warehouse, generates a personal access token (PAT), and produces a .pbids file. Open that file in Power BI Desktop, enter the PAT when prompted, and the connection is live. The entire process takes about five clicks and one token paste.
What happens behind the scenes: Databricks creates a dedicated SQL warehouse sized for BI queries, configures the connection endpoint, and embeds the hostname and HTTP path into the .pbids file. You can resize the warehouse later as your query patterns evolve.
Native connector (manual)
In Power BI Desktop, click Get Data, search for "Databricks," and select the Azure Databricks connector (it works for AWS and GCP deployments too, despite the name). Enter the server hostname and HTTP path from your SQL warehouse's connection details tab. Choose DirectQuery or Import mode. For authentication, select Personal Access Token and paste a token generated from your Databricks user settings.
Authentication options: PAT works across all clouds. Azure AD SSO is available for Azure Databricks and eliminates token management — users authenticate with their corporate credentials. For automated Power BI Service refresh, use service principals instead of personal tokens.
Databricks Partner Connect supports one-click setup for Power BI, Tableau, and other BI tools. The SQL warehouse created through Partner Connect is pre-configured for BI query patterns with Photon acceleration enabled by default.
— Databricks, Power BI integration documentation
DirectQuery vs. Import: Choosing the Right Mode
This is not a simple pros/cons decision. The right mode depends on dataset size, freshness requirements, and how users interact with the dashboard.
Use DirectQuery when the dataset exceeds 1 GB, when data changes hourly or more frequently, or when you need real-time visibility. DirectQuery keeps the data in Databricks and pushes queries down to the SQL warehouse, so there's no dataset size limit on the Power BI side.
Use Import when the dataset is under 1 GB, data refreshes daily or less frequently, and you need maximum dashboard interactivity. Import mode loads data into Power BI's VertiPaq engine, which is extremely fast for slicing, filtering, and aggregating — noticeably faster than DirectQuery for interactive exploration.
The hybrid approach combines both: Import dimension tables (products, customers, dates — relatively small and static) and DirectQuery for fact tables (transactions, events — large and frequently updated). This gives you fast filtering on dimensions with live data from the fact tables. Configure this using Power BI's composite model feature.
Performance Optimization
Performance problems in this integration almost always trace to one of three layers: the Databricks storage, the SQL warehouse, or the Power BI report design.
Databricks side
Run OPTIMIZE on every table that Power BI queries. This compacts small files into larger ones, reducing the number of file reads per query. Apply Z-ORDER on columns that Power BI filters on most — typically date columns, region, product category, and any field that appears as a slicer in your dashboards.
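As a minimal sketch, assuming a fact table named `analytics.sales_fact` that dashboards filter mostly by date and region, the maintenance command looks like:

```sql
-- Compact small files and co-locate rows by the columns Power BI filters on.
-- Table and column names are illustrative. Run on a schedule after data loads.
OPTIMIZE analytics.sales_fact
ZORDER BY (order_date, region);
```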
Create aggregation views for the metrics that appear on your most-viewed dashboards. Instead of having Power BI query a 200-million-row fact table to show monthly revenue by region, pre-aggregate that into a summary table with a few thousand rows. Power BI's aggregation feature automatically routes queries to the summary when possible and falls back to the detail table for drill-through.
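A hedged sketch of such a summary table, again using illustrative names — a few thousand pre-aggregated rows standing in for the 200-million-row detail table:

```sql
-- Pre-aggregated summary for the most-viewed dashboard metrics.
-- Names (analytics.sales_monthly_agg, etc.) are assumptions for illustration.
CREATE OR REPLACE TABLE analytics.sales_monthly_agg AS
SELECT
  region,
  DATE_TRUNC('MONTH', order_date) AS order_month,
  SUM(amount) AS revenue,
  COUNT(*)    AS order_count
FROM analytics.sales_fact
GROUP BY region, DATE_TRUNC('MONTH', order_date);
```

Register the summary in Power BI's Manage aggregations dialog so the engine routes matching queries to it and falls back to the detail table only for drill-through.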
SQL warehouse
Start with a Medium SQL warehouse for most workloads. Monitor query execution times and concurrency in the warehouse's query history. If queries consistently take more than 5 seconds or you see queuing, scale up. If the warehouse is idle most of the time, enable auto-stop (5-10 minute timeout) to avoid paying for unused compute.
Enable Photon. It's a C++ execution engine that runs alongside Spark and accelerates SQL queries by 30-50% for typical BI workloads. There's a small DBU surcharge, but it usually pays for itself through faster queries and smaller warehouse sizes.
Power BI side
Limit visuals per report page to 8-12. Every visual generates at least one SQL query in DirectQuery mode. A page with 25 visuals fires 25+ queries simultaneously when a user changes a filter — overwhelming even a large SQL warehouse. Minimize cross-filtering between visuals. Use Performance Analyzer in Power BI Desktop to identify which visuals generate the slowest queries.
For data model design, use a star schema with narrow dimension tables. Avoid wide fact tables with unused columns — every column Power BI imports consumes memory, and every column in DirectQuery adds to the query payload. Create views in Databricks that expose only the columns Power BI needs.
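A minimal example of such a view, with illustrative names — it exposes only the handful of columns the report actually uses:

```sql
-- Narrow BI-facing view: keeps Import memory and DirectQuery payloads small.
-- Table and column names are illustrative.
CREATE OR REPLACE VIEW analytics.sales_for_bi AS
SELECT order_id, order_date, region, product_id, amount
FROM analytics.sales_fact;
```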
Security and Access Control
Unity Catalog permissions apply to Power BI connections. If a user's Databricks identity only has access to certain tables or columns, those restrictions are enforced server-side — Power BI receives only the data the user is authorized to see. This includes row-level security implemented through Unity Catalog row filters.
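A sketch of a Unity Catalog row filter, assuming an illustrative group name and region column — admins see all rows, everyone else sees only one region:

```sql
-- Row filter enforced server-side for every Power BI query.
-- Group name, table, and column are assumptions for illustration.
CREATE OR REPLACE FUNCTION analytics.region_filter(region STRING)
RETURN IF(is_account_group_member('sales_admins'), TRUE, region = 'US');

ALTER TABLE analytics.sales_fact
SET ROW FILTER analytics.region_filter ON (region);
```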
PAT rotation matters for security hygiene. Set PATs to expire after 90 days and establish a rotation process. For production Power BI Service connections, use service principals — they don't expire like personal tokens and aren't tied to individual user accounts that might leave the organization.
Azure AD SSO eliminates PAT management entirely for Azure Databricks. Users authenticate with their corporate credentials, and permissions flow through from Active Directory to Unity Catalog. This is the most secure option and the simplest to maintain, but it's only available on Azure Databricks.
For network security, configure private endpoints to keep traffic between Power BI Service and Databricks on the cloud provider's backbone network. Add IP whitelisting to restrict which networks can reach the SQL warehouse. All connections use TLS encryption in transit.
Publishing, Sharing, and Refresh
Publish reports from Power BI Desktop to Power BI Service using the Publish button. The report and its dataset upload to the cloud, where other users can access them. For Import mode datasets, configure scheduled refresh in the dataset settings — typically daily or every few hours, depending on freshness requirements.
DirectQuery datasets don't need scheduled refresh (queries run live), but they do need the SQL warehouse to be running when users access the dashboard. Configure the warehouse's auto-start settings so it spins up on the first query and stops after a period of inactivity.
Share reports through Power BI apps (curated collections for business users), direct workspace access (for analysts), or embedded in Microsoft Teams channels. Row-level security configured in Databricks carries through to Power BI Service — each user sees only their authorized data, regardless of how they access the report.
Power BI supports single sign-on (SSO) with Azure Databricks, passing the user's Azure AD identity through to Unity Catalog. This ensures row-level and column-level security policies defined in Databricks are automatically enforced in every Power BI report.
— Microsoft, Power BI and Azure Databricks Integration
Troubleshooting Common Issues
Connection fails immediately. Check three things in order: (1) Is the SQL warehouse running? Stopped warehouses reject connections. (2) Has the PAT expired? Generate a new one and update the connection. (3) Is the server hostname correct? Copy it from the SQL warehouse's connection details tab — don't type it manually.
Dashboard loads slowly or times out. Open Performance Analyzer in Power BI Desktop and identify the slowest visuals. Check whether the underlying queries are scanning full tables (fix with Z-ORDER and aggregation tables). Check the SQL warehouse size — if queries are queuing, the warehouse is undersized. Reduce visual count on the page if it exceeds 12.
Data appears stale in Import mode. Check the refresh history in Power BI Service dataset settings. If refreshes are failing, the most common cause is an expired PAT or a network change blocking the connection. For large datasets, switch to incremental refresh — it refreshes only recent partitions, reducing refresh time and the chance of timeout failures.
Incorrect or missing data. Verify Unity Catalog permissions for the service principal or user account Power BI uses. If a user reports missing rows, the cause is usually row-level security filtering data they're not authorized to see — which is working correctly. If columns are missing, check the view definition in Databricks.
How Dawiso Adds Context to Power BI Reports
Power BI reports show data but rarely explain what the data means. A dashboard displays "Revenue: $4.2M" — but which products are included? Are returns deducted? Is it recognized or booked revenue? Does it include a subsidiary acquired last quarter?
Dawiso's business glossary defines these metrics in one place. "Revenue" gets a canonical definition — gross vs. net, inclusion criteria, calculation method — that applies across Power BI, Databricks SQL, and executive presentations. When an analyst hovers over a metric, the definition is the same one the CFO approved.
Data lineage traces each Power BI report back through the Databricks SQL warehouse to the original source system. When a source changes — a CRM migration, a schema update, a new data feed — Dawiso's lineage identifies which Power BI reports are affected before users see broken dashboards.
Through the Model Context Protocol (MCP), AI copilots can query Dawiso for metric definitions while users interact with Power BI dashboards. An analyst asking "why did revenue drop this month?" gets context that includes not just the numbers but the business definition of revenue, the data sources behind it, and whether any upstream changes might explain the variance.
Conclusion
Connecting Power BI to Databricks is a 15-minute setup. Making it perform well is an architecture problem. The decisions that matter are DirectQuery vs. Import (choose based on data size and freshness), SQL warehouse sizing (start Medium, scale based on query history), and Delta table optimization (OPTIMIZE, Z-ORDER, aggregation tables). The governance gap — reports showing numbers without business context — closes when a catalog like Dawiso maps every metric to its definition, lineage, and ownership.