SQL PARTITION BY — Guide to Window Functions and Data Partitioning
PARTITION BY divides a result set into groups for window function calculations without collapsing rows the way GROUP BY does. Every row keeps its identity while gaining access to aggregate values across its partition — department totals, category rankings, or running sums within a customer account.
This guide covers the syntax, the most useful window functions paired with PARTITION BY, and the performance implications of partitioning.
PARTITION BY splits rows into groups inside a window function's OVER clause. Unlike GROUP BY, it preserves every row in the output. Pair it with ROW_NUMBER for ranking, SUM for running totals, LAG/LEAD for row comparisons, and FIRST_VALUE/LAST_VALUE for boundary values. Add ORDER BY inside OVER to control calculation order. Index the partition columns and ORDER BY columns together to avoid sort spills.
PARTITION BY vs GROUP BY
This is the most common confusion. GROUP BY collapses rows — a table of 1,000 employees grouped by department produces one row per department. PARTITION BY keeps all rows and appends a computed column — every employee row stays, but each one now carries the department-level aggregate.
The side-by-side query:
-- GROUP BY: 3 rows SELECT department, SUM(salary) AS dept_total FROM employees GROUP BY department;
-- PARTITION BY: all rows preserved SELECT name, department, salary, SUM(salary) OVER(PARTITION BY department) AS dept_total FROM employees;
Syntax and Components
The OVER clause has three parts. All are optional, but the combination determines behavior:
FUNCTION() OVER ( [PARTITION BY col1, col2, ...] [ORDER BY col3 [ASC|DESC]] [ROWS|RANGE|GROUPS frame_spec] )
PARTITION BY defines the groups. Omit it and the entire result set is one partition.
ORDER BY defines the row order within each partition. Required for ranking functions and running totals. Omit it and there is no defined order — aggregate window functions return the same value for every row in the partition (the partition total).
Frame specification controls which rows within the partition participate in the calculation. Only relevant when ORDER BY is present. See the Frame Specifications section below.
Ranking Functions
The three ranking functions — ROW_NUMBER, RANK, and DENSE_RANK — all require ORDER BY and benefit from PARTITION BY. The difference shows up when rows have tied values:
-- Rank employees by salary within each department SELECT name, department, salary, ROW_NUMBER() OVER(PARTITION BY department ORDER BY salary DESC) AS row_num, RANK() OVER(PARTITION BY department ORDER BY salary DESC) AS rnk, DENSE_RANK() OVER(PARTITION BY department ORDER BY salary DESC) AS dense_rnk FROM employees;
The most practical pattern is the top-N per group query. Get the highest-paid employee in each department:
SELECT name, department, salary FROM ( SELECT name, department, salary, ROW_NUMBER() OVER( PARTITION BY department ORDER BY salary DESC ) AS rn FROM employees ) ranked WHERE rn = 1;
Use ROW_NUMBER when you need exactly one row per group. Use RANK or DENSE_RANK when ties should share the same position.
Aggregate Window Functions
The standard aggregate functions — SUM, AVG, COUNT, MIN, MAX — all work inside OVER(PARTITION BY ...). The most common pattern compares an individual row to its group:
SELECT name, department, salary, AVG(salary) OVER(PARTITION BY department) AS dept_avg, salary - AVG(salary) OVER(PARTITION BY department) AS diff_from_avg FROM employees;
Percentage of department total:
SELECT name, department, salary, ROUND( 100.0 * salary / SUM(salary) OVER(PARTITION BY department), 1 ) AS pct_of_dept FROM employees;
For running totals and frame specifications with SUM() OVER, see the dedicated SUM() OVER guide.
Window function queries reference column names like "department" or "salary" whose business meaning must be agreed upon across teams. A business glossary in a data catalog provides those canonical definitions, so a PARTITION BY department query partitions by the same department definition company-wide.
LAG, LEAD, and Row Comparison
LAG accesses the previous row's value within the partition. LEAD accesses the next row's value. Both require ORDER BY to define "previous" and "next."
Month-over-month sales change per product category:
SELECT category, sale_month, monthly_sales, LAG(monthly_sales) OVER( PARTITION BY category ORDER BY sale_month ) AS prev_month, monthly_sales - LAG(monthly_sales) OVER( PARTITION BY category ORDER BY sale_month ) AS month_over_month FROM monthly_sales;
The first row in each partition has no previous row, so LAG returns NULL by default. You can supply a default value as the third argument:
LAG(monthly_sales, 1, 0) OVER(PARTITION BY category ORDER BY sale_month)
Day-over-day stock price movement:
SELECT trading_date, closing_price, closing_price - LAG(closing_price) OVER( ORDER BY trading_date ) AS daily_change, ROUND( 100.0 * (closing_price - LAG(closing_price) OVER(ORDER BY trading_date)) / LAG(closing_price) OVER(ORDER BY trading_date), 2 ) AS pct_change FROM stock_prices WHERE symbol = 'AAPL';
Window functions are executed after WHERE, GROUP BY, and HAVING — they see the filtered, grouped result set. This means you cannot reference a window function result in a WHERE clause; use a subquery or CTE instead.
— Markus Winand, Use The Index, Luke — Window Functions
FIRST_VALUE, LAST_VALUE, and NTH_VALUE
FIRST_VALUE returns the value from the first row in the partition. Useful for comparing each row to the starting point — for example, comparing each month's revenue to January:
SELECT sale_month, revenue, revenue - FIRST_VALUE(revenue) OVER( ORDER BY sale_month ) AS diff_from_jan FROM monthly_revenue WHERE sale_year = 2024;
LAST_VALUE has a well-known gotcha. The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means LAST_VALUE returns the current row's value — not the last row in the partition. To get the actual last row, override the frame:
SELECT sale_month, revenue, LAST_VALUE(revenue) OVER( ORDER BY sale_month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS dec_revenue FROM monthly_revenue WHERE sale_year = 2024;
NTH_VALUE accesses the Nth row in the partition. Less common, but useful for accessing specific positions — for example, the second-highest salary:
SELECT name, department, salary, NTH_VALUE(salary, 2) OVER( PARTITION BY department ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS second_highest FROM employees;
Frame Specifications
Frame specifications control which rows within a partition participate in the window function calculation. Three frame types exist:
ROWS counts physical row positions. ROWS BETWEEN 2 PRECEDING AND CURRENT ROW always includes exactly 3 rows (or fewer at partition boundaries).
RANGE groups by ORDER BY value. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW includes all rows whose ORDER BY value is less than or equal to the current row's value. If two rows share the same date, both are included simultaneously.
GROUPS (SQL:2011) counts distinct ORDER BY value groups. Available in PostgreSQL 11+ and SQLite 3.28+.
Common frame specifications:
ROWS UNBOUNDED PRECEDING— running total (shorthand for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)ROWS BETWEEN N PRECEDING AND CURRENT ROW— moving window (e.g., 7-day sum)ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING— partition total (needed for LAST_VALUE and NTH_VALUE)
When ORDER BY is specified in a window function, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — not ROWS. This distinction matters when duplicate ORDER BY values exist: RANGE includes all peers, ROWS does not.
— Microsoft, SELECT — OVER Clause (Transact-SQL)
Performance and Indexing
Window functions sort data. Without an index, the database performs an in-memory or on-disk sort for every distinct PARTITION BY + ORDER BY combination. The fix is a composite index that matches the window specification:
-- For: OVER(PARTITION BY department ORDER BY salary DESC) CREATE INDEX idx_emp_dept_salary ON employees (department, salary DESC);
Partition columns come first, then ORDER BY columns, in the same direction (ASC/DESC). If the query also accesses other columns, adding them to the index makes it a covering index — the database never touches the table.
Memory consumption. The database must buffer all rows in a partition to compute the window function. If one partition has 50 million rows (e.g., all transactions for a single large customer), memory usage spikes. Solutions: filter with WHERE before the window function, or materialize intermediate results in a CTE or temp table.
EXPLAIN output. Look for Sort or WindowAgg nodes in the execution plan. If you see Sort Method: external merge, the sort spilled to disk — add an index or reduce the partition size.
Multiple window functions. If multiple window functions share the same PARTITION BY and ORDER BY, the database sorts once and reuses the result. Different sort orders require separate sorts. Group your window functions by sort specification to minimize sorting overhead:
-- Same sort for all three — one sort operation SELECT name, department, salary, SUM(salary) OVER w AS dept_total, AVG(salary) OVER w AS dept_avg, ROW_NUMBER() OVER w AS rn FROM employees WINDOW w AS (PARTITION BY department ORDER BY salary DESC);
The WINDOW clause (supported by PostgreSQL and MySQL 8+) lets you define a named window and reuse it, reducing duplication and making the optimizer's job easier.