SQL WHERE Clause -- Complete Guide to Data Filtering
The WHERE clause filters which rows a SQL statement affects. It works with SELECT, UPDATE, and DELETE -- any statement that needs to target specific records instead of the entire table. WHERE evaluates each row against a condition and includes only rows where the condition is true.
SQL WHERE filters rows based on conditions. Use comparison operators (=, <, >) for simple checks, logical operators (AND, OR, NOT) to combine conditions, and special operators like IN, BETWEEN, LIKE, and IS NULL for pattern matching and range queries. WHERE runs before GROUP BY in the execution order -- use HAVING to filter after aggregation.
WHERE Syntax and Execution Order
WHERE appears after FROM and runs as step 2 in the SQL execution order -- after the database identifies source tables (FROM/JOIN) but before GROUP BY, HAVING, SELECT, or ORDER BY.
This ordering has a practical consequence: you cannot use column aliases from SELECT in a WHERE clause, because SELECT runs after WHERE. This query fails:
-- ERROR: "total_price" is not yet defined when WHERE runs
SELECT price * quantity AS total_price
FROM order_items
WHERE total_price > 1000;The fix is to repeat the expression:
-- Correct: repeat the expression
SELECT price * quantity AS total_price
FROM order_items
WHERE price * quantity > 1000;If you need to filter on an aggregated value (e.g., "only groups with more than 5 orders"), use HAVING instead of WHERE. HAVING runs after GROUP BY.
Comparison and Logical Operators
Comparison operators
The six comparison operators work on numbers, strings, and dates:
-- Equal
SELECT * FROM employees WHERE department = 'Engineering';
-- Not equal (both forms are valid)
SELECT * FROM products WHERE status != 'discontinued';
SELECT * FROM products WHERE status <> 'discontinued';
-- Greater/less than
SELECT * FROM orders WHERE total > 500;
SELECT * FROM employees WHERE hire_date <= '2024-06-30';Logical operators: AND, OR, NOT
Combine conditions with AND (all must be true), OR (at least one must be true), and NOT (negate):
SELECT product_name, price, stock
FROM products
WHERE category = 'Electronics'
AND price < 500
AND stock > 0;Operator precedence matters: NOT evaluates first, then AND, then OR. Missing parentheses is a common source of bugs. Consider this query:
-- Bug: AND binds tighter than OR
-- This finds: (category = Electronics AND price < 100) OR (category = Clothing)
-- ...which includes ALL clothing regardless of price
SELECT * FROM products
WHERE category = 'Electronics' OR category = 'Clothing'
AND price < 100;
-- Fixed: parentheses clarify intent
SELECT * FROM products
WHERE (category = 'Electronics' OR category = 'Clothing')
AND price < 100;Pattern Matching and Ranges
IN operator
IN checks whether a value matches any item in a list. It is cleaner than chaining OR conditions:
-- List form
SELECT * FROM employees
WHERE department IN ('Sales', 'Marketing', 'Support');
-- Subquery form
SELECT * FROM products
WHERE category_id IN (
SELECT category_id FROM categories WHERE active = true
);BETWEEN operator
BETWEEN checks a range and is inclusive on both ends. This is a common source of off-by-one errors with dates:
-- Includes both Jan 1 and Dec 31
SELECT * FROM orders
WHERE order_date BETWEEN '2025-01-01' AND '2025-12-31';
-- Same as:
SELECT * FROM orders
WHERE order_date >= '2025-01-01' AND order_date <= '2025-12-31';LIKE operator
LIKE matches string patterns using two wildcards: % (any sequence of characters) and _ (any single character).
-- Starts with "Tech"
SELECT * FROM customers WHERE company_name LIKE 'Tech%';
-- Contains "data" anywhere
SELECT * FROM products WHERE description LIKE '%data%';
-- Second character is "a"
SELECT * FROM customers WHERE last_name LIKE '_a%';Performance note: LIKE 'prefix%' can use a B-tree index because the fixed prefix narrows the search. LIKE '%suffix' cannot use an index and forces a full table scan. If you need suffix or substring search on large tables, consider a full-text index.
NULL Handling
NULL in SQL means "unknown." It is not zero, not an empty string, not false. NULL has its own logic, and ignoring it is the most common source of wrong results in WHERE clauses.
Rule 1: You cannot compare NULL with = or !=. Both expressions evaluate to UNKNOWN (not true, not false), so the row is excluded:
-- This returns ZERO rows, even if phone IS null
SELECT * FROM customers WHERE phone = NULL;
-- Correct
SELECT * FROM customers WHERE phone IS NULL;
SELECT * FROM customers WHERE phone IS NOT NULL;Rule 2: != excludes NULLs silently. This is the most dangerous trap. If you want "all customers who are not in London," you might write:
-- This EXCLUDES customers where city is NULL
SELECT * FROM customers WHERE city != 'London';Customers with city = NULL are silently dropped because NULL != 'London' evaluates to UNKNOWN. To include them:
SELECT * FROM customers
WHERE city != 'London' OR city IS NULL;COALESCE provides a default for NULLs, which is useful for calculations and display:
SELECT customer_name, COALESCE(phone, 'No phone on file') AS phone
FROM customers;SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. Any comparison involving NULL produces UNKNOWN, and WHERE includes only rows where the condition evaluates to TRUE. This is defined in the ISO SQL standard and is the root cause of most NULL-related bugs.
— C.J. Date, SQL and Relational Theory
WHERE with Subqueries
Scalar subquery
A scalar subquery returns a single value. Use it to compare against an aggregate:
-- Employees earning above average
SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);IN subquery
Returns rows where the column matches any value in the subquery result:
-- Customers who placed at least one order in 2025
SELECT customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders
WHERE order_date >= '2025-01-01'
);EXISTS and NOT EXISTS
EXISTS tests whether a correlated subquery returns at least one row. It is the standard pattern for semi-joins (find matching rows) and anti-joins (find non-matching rows):
-- Semi-join: customers WITH orders
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
-- Anti-join: customers WITHOUT orders
SELECT c.customer_name
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);EXISTS is often faster than IN for large datasets because the database stops scanning after finding the first match. NOT EXISTS is safer than NOT IN when the subquery column might contain NULLs.
Performance
The biggest performance lever in WHERE is whether the predicate can use an index. Database engineers call this sargability -- whether the predicate is "Search ARGument ABLE."
The core rule: never wrap an indexed column in a function. WHERE YEAR(order_date) = 2025 forces the database to evaluate the YEAR function on every row. Rewrite it as a range predicate:
-- Non-sargable (full scan)
SELECT * FROM orders WHERE YEAR(order_date) = 2025;
-- Sargable (index seek)
SELECT * FROM orders
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01';Other performance considerations:
- Selective predicates first: when combining AND conditions, put the most selective one first. An index on
statusthat returns 100 rows out of 10 million is more useful than one onregionthat returns 2 million. - OR and indexes:
WHERE col1 = 'a' OR col2 = 'b'often cannot use a single index. Consider rewriting as two queries joined with UNION. - Implicit type conversion:
WHERE order_id = '123'whenorder_idis an integer forces the database to convert every row, preventing index usage.
A predicate is sargable if the DBMS engine can take advantage of an index to speed up the execution of the query. The term is derived from "Search ARGument ABLE." Making predicates sargable is often the single most impactful query optimization technique.
— Markus Winand, Use The Index, Luke -- Obfuscated Conditions
Common Mistakes
NULL comparison with = or !=
Already covered above, but it bears repeating because it causes so many bugs. Always use IS NULL / IS NOT NULL. And remember that WHERE column != 'value' silently excludes NULLs.
Implicit type conversion
Comparing a numeric column to a string forces implicit conversion. Some databases convert the column (preventing index use); others convert the literal (safe but fragile):
-- Risky: string-to-integer conversion on every row
SELECT * FROM orders WHERE order_id = '12345';
-- Safe: match the data type
SELECT * FROM orders WHERE order_id = 12345;Using OR instead of IN
Chaining many OR conditions on the same column is verbose and harder to optimize. Use IN instead:
-- Verbose
SELECT * FROM products
WHERE category = 'A' OR category = 'B' OR category = 'C';
-- Clean
SELECT * FROM products
WHERE category IN ('A', 'B', 'C');Forgetting parentheses with AND/OR
Because AND binds tighter than OR, A OR B AND C is evaluated as A OR (B AND C). Always add parentheses when mixing AND and OR to make intent explicit.
Test before UPDATE/DELETE
Before running a destructive statement, test the WHERE condition with SELECT first:
-- Step 1: verify which rows will be affected
SELECT * FROM orders WHERE status = 'cancelled' AND order_date < '2024-01-01';
-- Step 2: delete only after verifying the result set
DELETE FROM orders WHERE status = 'cancelled' AND order_date < '2024-01-01';WHERE clauses reference column values whose meaning depends on business context -- is status = 'A' "active" or "archived"? Dawiso's business glossary documents what each value means, reducing filtering errors caused by ambiguous column semantics. When analysts can look up canonical definitions in a data catalog, they write more accurate WHERE conditions.