Power BI Data Modeling: Star Schemas, Relationships, and Performance Patterns
A Power BI data model determines how fast your reports run, how intuitive they are to build, and how accurately they calculate metrics. The core pattern is the star schema: fact tables (transactions, events) surrounded by dimension tables (products, customers, dates). Relationships between tables control how filters propagate and how DAX measures aggregate.
Getting the model right matters more than any visual design choice. A bad model creates slow, inaccurate reports that no amount of DAX optimization can fix. A well-structured star schema with proper relationships, efficient data types, and clear measures makes report building almost trivial — filters work as expected, calculations are correct, and performance stays fast as data grows.
Power BI data models use star schema design: narrow fact tables with numeric measures linked to descriptive dimension tables through one-to-many relationships. Use measures (not calculated columns) for dynamic aggregations. Optimize performance by choosing efficient data types, managing column cardinality, and building aggregation tables for large datasets. Implement row-level security with DAX filter expressions to control data access per user role.
Star Schema: The Foundation
A star schema separates data into two types of tables. Fact tables contain the measurable events — sales transactions, website visits, support tickets. They are tall (many rows) and narrow (few columns): mostly numeric values and foreign keys. Dimension tables contain the descriptive attributes — product names, customer segments, date labels. They are wide (many columns) and short (relatively few rows).
A concrete example: a Sales Fact table with columns DateKey, ProductKey, CustomerKey, Quantity, UnitPrice, and TotalAmount. Each key links to a dimension: the Date Dimension provides Year, Quarter, Month, DayOfWeek. The Product Dimension provides ProductName, Category, Subcategory, Brand. The Customer Dimension provides CustomerName, Segment, Region, City.
Star schemas outperform flat tables in Power BI for three reasons. First, Power BI's VertiPaq engine compresses dimension columns with low cardinality extremely well — a "Category" column with 20 unique values compresses to almost nothing. Second, filter direction is clear — filters flow from dimensions to facts, never creating circular paths. Third, DAX measures perform best when they aggregate over a narrow fact table with integer keys, not when they scan wide tables with repeated text values.
The anti-pattern to avoid is the "one big table" — a single denormalized table with every column. It works fine for 10,000 rows. At 10 million rows, it consumes 5-10x more memory than an equivalent star schema, filters become unpredictable, and DAX calculations slow to a crawl because VertiPaq can't compress repeated text values efficiently.
Relationships and Filter Propagation
Relationships in Power BI define how filters move between tables. When a user selects "2025" in a date slicer, Power BI filters the Date Dimension to that year, then propagates the filter through the relationship to the Sales Fact table, showing only 2025 transactions.
The standard pattern is one-to-many: one row in the dimension matches many rows in the fact table. The filter always flows from the "one" side (dimension) to the "many" side (fact). This direction is automatic and unambiguous.
Bi-directional filtering reverses the flow, letting fact tables filter dimensions. Example: selecting a product in a visual that's connected to Sales Fact, and having the Customer Dimension filter to show only customers who bought that product. It's powerful but risky — bi-directional filters slow queries and can create ambiguous filter paths in complex models. Avoid them unless the use case specifically requires it.
Role-playing dimensions occur when a fact table has multiple relationships to the same dimension. A sales fact might have both OrderDateKey and ShipDateKey, both pointing to the Date Dimension. Power BI only allows one active relationship per table pair. The second becomes inactive and must be activated explicitly in DAX using USERELATIONSHIP():
ShippedRevenue = CALCULATE(SUM(Sales[TotalAmount]), USERELATIONSHIP(Sales[ShipDateKey], DateDim[DateKey]))
Bi-directional cross-filtering should be used sparingly. It impacts query performance and can produce unexpected results in complex models. Default to single-direction filtering and only enable bi-directional when there's a clear analytical requirement.
Measures vs. Calculated Columns
This distinction trips up most Power BI beginners, and getting it wrong leads to bloated models and wrong calculations.
Calculated columns are computed once during data refresh and stored in the model. They operate in row context — each row gets a value based on that row's data. Use them for fixed categorizations that don't change with user filters: PriceBand = IF([ListPrice] > 100, "Premium", "Standard"). The value is the same regardless of what the user selects on the dashboard.
Measures are computed at query time in response to user interactions. They operate in filter context — the result changes based on which filters are active. TotalRevenue = SUM(Sales[TotalAmount]) returns a different number when the user selects "EMEA" versus "North America." Measures consume zero storage because they're formulas, not data.
The decision framework: if the value changes based on what the user selects (total revenue for selected region, year-over-year growth for selected product), use a measure. If the value is fixed per row regardless of filter context (categorizing products into price bands, extracting the domain from an email address), use a calculated column. When in doubt, use a measure — they're more memory-efficient and more flexible.
Hierarchies and Drill-Down
Hierarchies enable drill-down navigation in visuals. A Date hierarchy (Year, Quarter, Month, Day) lets users start at the yearly view and drill into months to find trends. A Product hierarchy (Category, Subcategory, Product) lets users explore from broad categories to specific items. A Geography hierarchy (Country, State, City) supports regional analysis.
Build hierarchies in the model view by dragging columns into a hierarchy group. Always create a dedicated Date dimension table rather than relying on Power BI's auto-generated date hierarchy — the built-in version lacks fiscal year support, custom week definitions, and holiday flags. Mark the table with Mark as Date Table so time intelligence functions like TOTALYTD and SAMEPERIODLASTYEAR work correctly.
Performance Optimization
Performance problems in Power BI models are almost always caused by one of five things, ranked by impact:
1. Text keys instead of integer keys. If your fact table joins to dimensions on text columns like product codes or customer IDs, switch to integer surrogate keys. VertiPaq compresses integers far more efficiently, and hash joins on integers are 3-5x faster. This single change can cut model size by 40%.
2. Unused columns. Every column in the model consumes memory and adds to refresh time — even columns that appear in no visual. Remove columns you don't need in Power Query before they reach the model. A common offender: importing all 80 columns from a source table when the report uses 12.
3. High-cardinality text columns in fact tables. A "Description" column with 5 million unique values in a fact table inflates model size dramatically. Move descriptive text to dimension tables, or remove it entirely if it's not used in analysis.
4. Calculated columns where measures would work. A calculated column that computes running totals for 50 million rows stores 50 million values. The equivalent measure computes the same result on the fly, using zero storage.
5. Missing aggregation tables. For datasets over 100 million rows, create pre-aggregated summary tables in the source system (or in Databricks). Power BI's aggregation feature routes queries to the summary table when possible and falls back to the detail table for drill-through. This can reduce query times from minutes to milliseconds.
Sizing guidance: models under 500 MB work well on shared capacity. Models between 500 MB and 1 GB need careful optimization or Premium capacity. Models over 1 GB require Premium/Fabric and should use incremental refresh with partitioning.
Row-Level Security
Row-level security (RLS) restricts which data rows each user can see. A regional manager should see only their region's data. An account executive should see only their accounts. RLS makes this possible within a single shared report.
Implementation: create a security role in Power BI Desktop, define a DAX filter expression, and assign users to roles in Power BI Service. For a regional manager who should only see EMEA data:
[Region] = LOOKUPVALUE(Users[Region], Users[Email], USERPRINCIPALNAME())
This expression filters the Sales Fact table to rows matching the current user's assigned region. The filter propagates through relationships — if Sales connects to Product and Customer, those dimensions are also filtered to show only EMEA-relevant data.
Test RLS in Power BI Desktop using View as Role before publishing. After publishing to Power BI Service, assign Azure AD users or groups to each security role in the dataset settings. A common mistake: publishing the report but forgetting to assign users to roles — the result is an empty report for everyone except the publisher.
Row-level security in Power BI works with both Import and DirectQuery modes. When used with DirectQuery connected to Databricks, RLS defined in Power BI runs on top of Unity Catalog's native row-level security — providing defense in depth at both the BI and platform layers.
— Microsoft, Row-level security in Power BI
Advanced Patterns
Composite models combine Import and DirectQuery tables in a single model. Import your dimension tables for fast filtering and keep large fact tables in DirectQuery mode against Databricks. This gives you the best of both worlds: interactive dimension filtering with live fact data.
Calculation groups eliminate repetitive time intelligence measures. Instead of creating separate YTD, QTD, and MTD versions of every measure (Revenue YTD, Cost YTD, Margin YTD...), define a calculation group with time intelligence items that apply to any measure. This can reduce a model with 60 measures to 20 base measures plus one calculation group. Requires Premium or Fabric capacity.
Many-to-many relationships through bridge tables handle cases where the standard one-to-many pattern doesn't fit. A customer who belongs to multiple segments, or a product that appears in multiple promotions, requires a bridge table to avoid duplicating rows. Design these carefully — incorrect bridge tables create inflated totals.
Role-playing dimensions are the same dimension table used multiple times with different relationships. The Date Dimension connected to OrderDate (active) and ShipDate (inactive) is the classic example. Use USERELATIONSHIP() in measures that need the alternate relationship path. Some modelers prefer creating separate physical copies of the dimension (OrderDate table, ShipDate table) to avoid the inactive relationship complexity.
How Dawiso Documents Your Data Model
Power BI data models are often tribal knowledge. The person who built the model knows that "Revenue" excludes returns, that "Active Customer" means a purchase in the last 90 days, and that the Customer Dimension gets refreshed from two different CRM systems. When that person leaves, the knowledge leaves too.
Dawiso's data catalog indexes Power BI datasets and maps each table and column to business glossary terms. The definition of "Revenue" — gross vs. net, inclusion criteria, source of truth — lives in one place and applies across Power BI, Databricks SQL, and executive presentations.
Data lineage traces each dimension and fact table back to its origin: source database, ETL pipeline, staging table. When a source system changes — a CRM migration, a schema update, a new data feed — Dawiso's lineage identifies which Power BI datasets, measures, and reports are affected before they break.
Through the Model Context Protocol (MCP), AI copilots can query Dawiso for metric definitions and model documentation. An analyst asking "how is customer lifetime value calculated?" gets the canonical formula, the source tables, and the Power BI measure name — without digging through a shared folder of outdated data dictionaries.
Conclusion
Power BI data modeling is an architecture discipline, not a configuration task. The star schema is not optional — it's how VertiPaq was designed to work. Relationships should flow one direction (dimension to fact) unless you have a specific reason for bidirectional filtering. Measures beat calculated columns for anything that responds to user context. Performance optimization starts with data types and unused columns, not with DAX tuning. And the model's business logic — what "Revenue" means, where the data comes from, who owns it — belongs in a catalog, not in someone's head.