
Power BI Power Query

Power Query is the ETL engine inside Power BI Desktop and Service. It connects to data sources, applies transformations, and loads clean datasets into the data model. Every time you import a CSV, join two tables, or filter out rows with null values, Power Query is doing the work — generating M language code behind the scenes while you click through a graphical interface.

Power Query matters beyond data cleaning because it is the point where data governance meets BI in practice. Every step in Power Query changes the meaning of the data. A filter step removes rows. A column rename changes what the field is called downstream. A merge step combines two sources with specific join logic. Without documentation, nobody downstream knows what happened between source and dashboard. This makes Power Query both the most powerful and the most governance-risky component in the Power BI stack.

TL;DR

Power Query is Power BI's built-in ETL engine. It connects to 200+ data sources, applies transformations (filter, merge, pivot, type conversion), and loads clean data into the model. Transformations are written in M language and execute in sequence — each step building on the previous one. Query folding pushes operations back to the source for performance. The governance risk: undocumented Power Query steps silently change data meaning, creating lineage gaps that affect every downstream report.

What Power Query Does

Power Query follows a Connect, Transform, Load pipeline. You select a data source (a SQL database, an Excel file, a REST API). Power Query connects, pulls a preview of the data, and opens the Query Editor. In the editor, you apply transformations — each one recorded as an "Applied Step" in a sequential list. When you are done, you close the editor and the transformed data loads into the Power BI data model.

The graphical UI generates M code for every action. When you click "Remove Columns," Power Query writes Table.RemoveColumns(#"Previous Step", {"Column1", "Column2"}). When you filter rows, it writes Table.SelectRows(#"Previous Step", each [Status] = "Active"). The Applied Steps pane is a complete record of every transformation, in order. You can click any step to see the data at that point in the pipeline.

Power Query runs on refresh, not in real time. When you click Refresh in Power BI Desktop, or when a scheduled refresh triggers in the Service, Power Query re-executes every step from source to final table. This means your transformations are repeatable and deterministic — the same source data always produces the same output.

Where it runs: Power BI Desktop executes Power Query locally on your machine. Power BI Service executes it in the cloud (for cloud sources) or through an on-premises data gateway (for sources behind a firewall). The same M code runs in both environments, but performance characteristics differ based on network latency and available compute.

How it compares: Power Query occupies the same space as SSIS, Azure Data Factory, and dbt — ETL tools that move and transform data. The difference is that Power Query is embedded directly in the BI tool, making it the default choice for Power BI users. For small to medium datasets (up to a few million rows), Power Query handles the job without a separate ETL infrastructure. For larger or more complex pipelines, organizations typically use dedicated ETL tools and consume their output in Power BI.

[Diagram: Power Query ETL pipeline — 200+ connectors (SQL, Excel/CSV, REST APIs) feed the Power Query engine (Connect → Transform → Load; M code recorded as Applied Steps, executed on refresh), which loads the Power BI data model (tables, relationships, measures, DAX) behind reports and dashboards.]

Connecting to Data Sources

Power Query supports over 200 connectors, covering the most common data source categories.

Files: Excel, CSV, JSON, Parquet, XML, text files, PDF. Power Query can read files from local disk, network shares, SharePoint, and Azure Blob Storage. For folders containing multiple files with the same structure (e.g., monthly CSV exports), the "Folder" connector loads all files and combines them automatically.

Databases: SQL Server, PostgreSQL, Oracle, MySQL, Azure SQL, Azure Synapse, Amazon Redshift, Google BigQuery, Snowflake. Database connectors support query folding — pushing transformations back to the database engine — which is critical for performance with large tables.

Cloud services: SharePoint, Dataverse, Dynamics 365, Salesforce, Google Analytics, Azure Data Lake, Azure Cosmos DB. These connectors handle authentication (OAuth, service principal, API keys) and pagination automatically.

APIs: REST endpoints via Web.Contents() and OData feeds via OData.Feed(). For REST APIs, you can pass headers, query parameters, and authentication tokens. Pagination requires custom M code for APIs that return results in pages.
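
The pagination case can be sketched in M. This is a minimal illustration, assuming a hypothetical API at api.example.com that accepts a page query parameter and returns a JSON object with an items array — all of those names are assumptions, not part of any real service:

```m
// Sketch: fetch pages 1..MaxPages from a hypothetical paginated REST API
// and stack the results into one table. URL, "page" parameter, and "items"
// field are illustrative assumptions.
let
    BaseUrl = "https://api.example.com/data",
    MaxPages = 5,
    GetPage = (page as number) as table =>
        let
            Response = Json.Document(
                Web.Contents(BaseUrl, [Query = [page = Number.ToText(page)]])
            ),
            AsTable = Table.FromRecords(Response[items])
        in
            AsTable,
    Pages = List.Transform({1..MaxPages}, each GetPage(_)),
    Combined = Table.Combine(Pages)
in
    Combined
```

Real APIs vary — some return a next-page URL or a total count instead of a fixed page range — so the loop condition usually needs adapting per API.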

Authentication modes depend on the source: anonymous access for public APIs, Windows authentication for on-premises databases, OAuth for cloud services, and service accounts for scheduled refresh. The on-premises data gateway bridges Power BI Service to sources behind a corporate firewall, running Power Query on a local machine and sending only the results to the cloud.

A practical example: connecting to SQL Server with a parameterized server name. You define a Power BI parameter called DatabaseServer with a default value of "dev-sql-server.database.windows.net". The Power Query source step references this parameter: Sql.Database(DatabaseServer, "SalesDB"). When deploying to production, you change the parameter value to the production server. This pattern enables environment switching without modifying the query logic.

Power Query supports over 200 data connectors across files, databases, cloud services, and APIs — making it the most broadly connected ETL engine embedded in a BI tool.

— Microsoft, Power Query Documentation

Essential Transformations

Ten transformations cover 90% of what Power Query users need.

Filter Rows removes records based on conditions. Keep only active customers, exclude test transactions, filter to the last 12 months. This is typically the first step because it reduces data volume early.

Remove Columns drops columns you do not need. Source tables often have 50+ columns; your report might use 10. Removing unnecessary columns early improves performance and simplifies the data model.

Change Type sets the data type for each column — text, whole number, decimal, date, boolean. Getting types right is essential because the Power BI data model depends on correct types for relationships, aggregations, and DAX calculations.

Split Column separates a single column into multiple columns by delimiter or position. A "Full Name" column becomes "First Name" and "Last Name." A "Date Time" column becomes separate "Date" and "Time" columns.

Merge Columns does the reverse — combining multiple columns into one with a specified separator.

Replace Values substitutes one value for another across a column. Replace "N/A" with null. Replace "USA" with "United States" for consistency. Replace empty strings with null for proper null handling.

Pivot and Unpivot reshape data between wide and narrow formats. Unpivot turns columns like "Jan," "Feb," "Mar" into rows with a "Month" column and a "Value" column — essential for time-series analysis. Pivot does the reverse.
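
The unpivot case can be sketched in M. This assumes an illustrative table with a "Product" key column and month columns:

```m
// Sketch: unpivot month columns into Month/Value rows for time-series analysis.
// Table.UnpivotOtherColumns keeps the listed columns and unpivots the rest.
let
    Source = Table.FromRecords({
        [Product = "A", Jan = 100, Feb = 120, Mar = 90],
        [Product = "B", Jan = 80,  Feb = 95,  Mar = 110]
    }),
    Unpivoted = Table.UnpivotOtherColumns(Source, {"Product"}, "Month", "Value")
in
    Unpivoted
```

The result has one row per product per month, which is the shape that date tables and time-intelligence DAX expect.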

Group By aggregates rows by one or more columns, producing sums, counts, averages, or other aggregations. Group sales by customer and month to get monthly totals.

Add Conditional Column creates a new column based on if-then-else logic. If Amount > 1000 then "High" else if Amount > 500 then "Medium" else "Low."
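
That if-then-else logic translates to a single M step. The column names here ("Amount", "Tier") are illustrative:

```m
// Sketch: add a "Tier" column from tiered if-then-else logic on an
// assumed "Amount" column. #"Previous Step" is the preceding step name.
Table.AddColumn(
    #"Previous Step",
    "Tier",
    each if [Amount] > 1000 then "High"
         else if [Amount] > 500 then "Medium"
         else "Low",
    type text
)
```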

Merge Queries (Joins) combines two tables based on matching keys — the Power Query equivalent of a SQL JOIN. Supports Inner, Left Outer, Right Outer, Full Outer, Left Anti, and Right Anti joins. You select the key columns in each table, choose the join type, and expand the columns you need from the joined table.
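
In M, a merge is two steps: a nested join, then a column expansion. The table and column names below are illustrative:

```m
// Sketch: left outer join Orders to Customers on CustomerID, then expand
// two columns from the joined table. Names are assumptions for illustration.
let
    Joined = Table.NestedJoin(
        Orders, {"CustomerID"},
        Customers, {"CustomerID"},
        "CustomerData", JoinKind.LeftOuter
    ),
    Expanded = Table.ExpandTableColumn(
        Joined, "CustomerData", {"Name", "Region"}
    )
in
    Expanded
```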

Query Folding and Performance

Query folding is the single most important performance concept in Power Query. When folding works, Power Query translates your transformation steps into a SQL query and sends it to the database. The database filters and transforms the data, and only the results are sent to Power BI. When folding breaks, Power Query downloads all the raw data and processes it locally.

The difference is dramatic. A table with 10 million rows, filtered to 50,000: with query folding, the database filters to 50,000 rows and sends those. Without folding, Power Query downloads all 10 million rows, then filters locally. On a slow network or with limited memory, the unfolded query may time out or crash.

Steps that fold: Filter Rows, Remove Columns, Sort, Group By, Rename Columns, Change Type (for database-supported types), Merge Queries (when both sources are in the same database).

Steps that break folding: Add Custom Column with M functions (like Text.Upper or Date.Year), Table.Buffer, complex conditional columns, merges between different data sources, and most M-specific operations that have no SQL equivalent.

How to check: right-click any step in the Applied Steps pane and look for "View Native Query." If the option is available and not grayed out, the step folds. If it is grayed out, folding broke at or before this step. Every step after the break also runs locally.

The optimization strategy is straightforward: put foldable steps first. Filter rows and remove columns before adding custom columns or applying M functions. This ensures the maximum reduction in data volume happens at the source, with only the remaining transformations running locally.
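
The ordering principle looks like this in practice. A sketch with illustrative server, table, and column names, following the foldable/non-foldable lists above:

```m
// Sketch: foldable steps first, local M work last. The filter and column
// removal fold into the SQL sent to the source; the custom column with an
// M function (Text.Upper) runs locally, but only on the reduced data.
let
    Source = Sql.Database("server", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    Filtered = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1)), // folds
    Trimmed = Table.RemoveColumns(Filtered, {"InternalID"}),                    // folds
    Labeled = Table.AddColumn(Trimmed, "RegionUpper",
        each Text.Upper([Region]))                                              // breaks folding
in
    Labeled
```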

[Diagram: Query folding decision — if a step can be translated to SQL (Filter Rows, Remove Columns, Sort, Group By, Rename, Change Type, same-source Merge), it folds and the database handles the processing; if not (custom columns with M functions, Table.Buffer, cross-source merges, complex conditionals), folding breaks and all data is downloaded and processed locally.]

Queries that fold to the data source can execute 10-100x faster than those that download all data locally, depending on data volume and transformation complexity.

— SQLBI, Optimizing Power Query Performance

M Language Fundamentals

M is a functional language. Every Power Query query is a let ... in expression that binds a sequence of named steps, each producing a table that the next step consumes. The graphical UI writes M for you, but understanding the language unlocks manual operations that the UI cannot express.

A basic query looks like this:

let
    Source = Sql.Database("server", "database"),
    Filtered = Table.SelectRows(Source, each [Status] = "Active"),
    Removed = Table.RemoveColumns(Filtered, {"InternalID", "CreatedBy"})
in
    Removed

Each line after let is a step. Source connects to the database. Filtered keeps rows where Status equals "Active." Removed drops two columns. The in keyword specifies which step to output — by convention the last one, though any named step can be returned.

You need to write M manually when the graphical UI cannot express what you need. Three common cases:

Custom functions: Reusable transformations that you can apply across multiple queries. A function that standardizes phone numbers — stripping parentheses, dashes, and spaces, then reformatting to (XXX) XXX-XXXX — can be defined once and invoked on any table that has a phone column.
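
A sketch of that phone-number function, assuming 10-digit US numbers; inputs of any other length pass through unchanged:

```m
// Sketch: reusable function that strips non-digits and reformats 10-digit
// phone numbers to (XXX) XXX-XXXX. Anything else is returned as-is.
let
    CleanPhone = (raw as text) as text =>
        let
            Digits = Text.Select(raw, {"0".."9"}),
            Formatted =
                if Text.Length(Digits) = 10 then
                    "(" & Text.Start(Digits, 3) & ") "
                        & Text.Middle(Digits, 3, 3) & "-"
                        & Text.End(Digits, 4)
                else
                    raw
        in
            Formatted
in
    CleanPhone
```

Saved as its own query (say, CleanPhone), it can be invoked from any other query, for example Table.TransformColumns(Source, {{"Phone", CleanPhone}}).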

Dynamic data sources: When the data source path, URL, or query string needs to be constructed at runtime from parameters or other data. For example, building a REST API URL that includes a date range from Power BI parameters: Web.Contents("https://api.example.com/data?from=" & StartDate & "&to=" & EndDate).

Error handling: The try ... otherwise pattern catches errors at the row level. try Number.FromText([TextNumber]) otherwise 0 attempts to convert text to a number and returns 0 if the conversion fails. This prevents a single bad row from failing the entire query.
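
Applied row by row, the pattern typically lives inside a column step. "TextNumber" is an assumed source column:

```m
// Sketch: row-level error handling while adding a numeric column.
// Rows where the conversion fails get 0 instead of an error value.
Table.AddColumn(
    #"Previous Step",
    "AsNumber",
    each try Number.FromText([TextNumber]) otherwise 0,
    type number
)
```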

Common Patterns and Recipes

Five patterns solve the most common real-world Power Query problems.

Combine files from a folder. You have a folder of monthly CSV exports — Sales_Jan.csv, Sales_Feb.csv, etc. The Folder connector loads all files, and Table.Combine merges them into a single table. Power Query generates a helper function that applies the same transformations to each file before combining. When a new month's file appears in the folder, the next refresh picks it up automatically.
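
The UI generates helper queries for this, but the core logic is compact enough to write by hand. A sketch with an illustrative folder path, assuming all CSVs share the same columns:

```m
// Sketch: load every CSV in a folder and stack the rows into one table.
// Folder path is illustrative; new files are picked up on the next refresh.
let
    Files = Folder.Files("C:\Exports\Sales"),
    CsvOnly = Table.SelectRows(Files, each Text.EndsWith([Name], ".csv")),
    Parsed = Table.AddColumn(CsvOnly, "Data",
        each Table.PromoteHeaders(Csv.Document([Content]))),
    Combined = Table.Combine(Parsed[Data])
in
    Combined
```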

Incremental refresh. For large datasets, you do not want to reload everything on every refresh. Incremental refresh uses Power BI parameters (RangeStart and RangeEnd) to filter the source query to only the changed date range. Power BI manages the partitions — keeping historical partitions intact and refreshing only the current window. This requires query folding on the date filter to work efficiently.
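
The filter that incremental refresh requires is a single step. RangeStart and RangeEnd are the Power BI-managed datetime parameters; "OrderDate" is an assumed column. The standard convention is an inclusive lower bound and exclusive upper bound so that partitions never overlap:

```m
// Sketch: the RangeStart/RangeEnd filter for incremental refresh.
// This step must fold to the source for the feature to work efficiently.
Table.SelectRows(
    Source,
    each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
)
```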

Parameter-driven data source switching. A Power BI parameter called Environment with values "Dev" and "Prod" controls which database Power Query connects to. An if statement in the source step reads the parameter: if Environment = "Dev" then "dev-server" else "prod-server". This enables the same .pbix to work against different environments without modifying query logic.

Handling slowly changing dimensions. Source tables with effective dates (EffectiveStart, EffectiveEnd) need filtering to get the current version of each record. Table.SelectRows(Source, each [EffectiveEnd] = null) keeps only the current records. For Type 2 slowly changing dimensions, you may need to join the dimension with a fact table on both the key and date range to get the correct version for each transaction.

Custom function for data standardization. A function that cleans and standardizes a text column — trimming whitespace, converting to title case, replacing known abbreviations — can be applied to any query. Define it once as a query that returns a function, then invoke it in the Applied Steps of other queries.

The Governance Gap in Power Query

Power Query is powerful but creates a governance blind spot. Transformations are embedded inside .pbix files — invisible to anyone who does not open Power BI Desktop and inspect the Applied Steps.

Consider a typical scenario. A dashboard shows a metric called "Revenue." The number looks reasonable. Stakeholders make decisions based on it. But inside the .pbix, the Power Query steps tell a different story: step 5 filters out rows where OrderType equals "Internal." Step 9 joins with a currency table and converts all amounts to USD. Step 14 renames "Total_Amount_After_Tax" to "Revenue." Step 22 filters out returns by removing rows where Quantity is negative.

None of this is visible outside the .pbix. A data catalog entry for this report might say "Revenue by Region" — but it cannot tell you that "Revenue" excludes internal orders, includes tax, applies a currency conversion, and excludes returns. That logic lives only in the Applied Steps, buried inside a binary .pbix file.

This creates lineage gaps. A downstream user asks "what does Revenue mean?" and nobody outside the Power BI development team can answer confidently. Another team builds their own report using a different definition of Revenue — maybe including internal orders, maybe excluding tax — and now two reports show different numbers for "Revenue" without any external documentation explaining why.

The solution is documenting transformation lineage externally. A data catalog should record not just that the report exists, but what transformations the Power Query steps apply. When "amt_usd" becomes "Revenue" in step 14, that mapping should be tracked in the catalog so that anyone — a governance team, a data steward, an AI agent — can trace the relationship between source column and report metric.

[Diagram: The governance blind spot — transformations hidden inside the .pbix (step 5: filter out OrderType = "Internal"; step 9: join currency table, convert to USD; step 14: rename "Total_Amount_After_Tax" to "Revenue"; step 22: remove returns where Quantity < 0) are visible only in Power BI Desktop, while a data catalog entry (source, definition, transformations, owner, quality status) is searchable, traceable, and governed.]

How Dawiso Supports Power Query Workflows

Dawiso's data catalog documents what happens between source and Power BI dataset — the transformation lineage that Power Query hides inside .pbix files.

When a Power Query step renames "amt_usd" to "Revenue," the catalog tracks that mapping. The entry for "Revenue" in the Power BI dataset links back to the source column "amt_usd" in the Sales database, with documentation of every transformation applied in between — filtering, joining, converting, renaming. This makes the transformation chain visible to governance teams, data stewards, and any stakeholder who needs to understand what the number means.

The business glossary ensures that column names emerging from Power Query match approved definitions. If the glossary defines "Revenue" as "net revenue after returns and discounts, in USD, from confirmed external orders," then a Power Query step that creates a "Revenue" column must apply the matching business logic. If the definitions diverge — say, the Power Query "Revenue" includes returns — the mismatch is documented and flagged.

Through the Model Context Protocol (MCP), AI agents can trace data from source system through Power Query transformations to dashboard visualization. An agent can answer "where does the Revenue number on this dashboard come from?" by querying the catalog: source table → Power Query steps → Power BI dataset column → report visual. This end-to-end lineage closes the gap that Power Query's embedded transformations create.

Dawiso also tracks data freshness — when each source was last loaded, whether the Power Query refresh succeeded, and how current the data in the Power BI dataset is. When a scheduled refresh fails silently, the catalog alerts data stewards instead of letting stale data persist unnoticed in production reports.

Conclusion

Power Query is where raw data becomes analysis-ready datasets. Its graphical interface, 200+ connectors, and M language provide everything needed for ETL within Power BI. Query folding ensures performance by pushing work to the source database. But Power Query also creates the biggest governance blind spot in the Power BI stack: transformations embedded in .pbix files that nobody outside the development team can see. Organizations that treat Power Query as a "just click and clean" tool accumulate lineage debt — undocumented transformations that make it impossible to trace what "Revenue" actually means. A data catalog, business glossary, and lineage tracking turn Power Query from a black box into a governed, auditable transformation layer.
