Skip to main content
sql wheredata filteringconditional queriessql basics

SQL WHERE Clause -- Complete Guide to Data Filtering

The WHERE clause filters which rows a SQL statement affects. It works with SELECT, UPDATE, and DELETE -- any statement that needs to target specific records instead of the entire table. WHERE evaluates each row against a condition and includes only rows where the condition is true.

TL;DR

SQL WHERE filters rows based on conditions. Use comparison operators (=, <, >) for simple checks, logical operators (AND, OR, NOT) to combine conditions, and special operators like IN, BETWEEN, LIKE, and IS NULL for pattern matching and range queries. WHERE runs before GROUP BY in the execution order -- use HAVING to filter after aggregation.

WHERE Syntax and Execution Order

WHERE appears after FROM and runs as step 2 in the SQL execution order -- after the database identifies source tables (FROM/JOIN) but before GROUP BY, HAVING, SELECT, or ORDER BY.

WHERE IN THE EXECUTION ORDERFROMWHEREFilters rowsGROUP BYHAVINGSELECTORDER BYLIMITWHERE filters individual rows before any grouping happens
Click to enlarge

This ordering has a practical consequence: you cannot use column aliases from SELECT in a WHERE clause, because SELECT runs after WHERE. This query fails:

-- ERROR: "total_price" is not yet defined when WHERE runs SELECT price * quantity AS total_price FROM order_items WHERE total_price > 1000;

The fix is to repeat the expression:

-- Correct: repeat the expression SELECT price * quantity AS total_price FROM order_items WHERE price * quantity > 1000;

If you need to filter on an aggregated value (e.g., "only groups with more than 5 orders"), use HAVING instead of WHERE. HAVING runs after GROUP BY.

Comparison and Logical Operators

Comparison operators

The six comparison operators work on numbers, strings, and dates:

-- Equal SELECT * FROM employees WHERE department = 'Engineering'; -- Not equal (both forms are valid) SELECT * FROM products WHERE status != 'discontinued'; SELECT * FROM products WHERE status <> 'discontinued'; -- Greater/less than SELECT * FROM orders WHERE total > 500; SELECT * FROM employees WHERE hire_date <= '2024-06-30';

Logical operators: AND, OR, NOT

Combine conditions with AND (all must be true), OR (at least one must be true), and NOT (negate):

SELECT product_name, price, stock FROM products WHERE category = 'Electronics' AND price < 500 AND stock > 0;

Operator precedence matters: NOT evaluates first, then AND, then OR. Missing parentheses is a common source of bugs. Consider this query:

-- Bug: AND binds tighter than OR -- This finds: (category = Electronics AND price < 100) OR (category = Clothing) -- ...which includes ALL clothing regardless of price SELECT * FROM products WHERE category = 'Electronics' OR category = 'Clothing' AND price < 100; -- Fixed: parentheses clarify intent SELECT * FROM products WHERE (category = 'Electronics' OR category = 'Clothing') AND price < 100;

Pattern Matching and Ranges

IN operator

IN checks whether a value matches any item in a list. It is cleaner than chaining OR conditions:

-- List form SELECT * FROM employees WHERE department IN ('Sales', 'Marketing', 'Support'); -- Subquery form SELECT * FROM products WHERE category_id IN ( SELECT category_id FROM categories WHERE active = true );

BETWEEN operator

BETWEEN checks a range and is inclusive on both ends. This is a common source of off-by-one errors with dates:

-- Includes both Jan 1 and Dec 31 SELECT * FROM orders WHERE order_date BETWEEN '2025-01-01' AND '2025-12-31'; -- Same as: SELECT * FROM orders WHERE order_date >= '2025-01-01' AND order_date <= '2025-12-31';

LIKE operator

LIKE matches string patterns using two wildcards: % (any sequence of characters) and _ (any single character).

-- Starts with "Tech" SELECT * FROM customers WHERE company_name LIKE 'Tech%'; -- Contains "data" anywhere SELECT * FROM products WHERE description LIKE '%data%'; -- Second character is "a" SELECT * FROM customers WHERE last_name LIKE '_a%';

Performance note: LIKE 'prefix%' can use a B-tree index because the fixed prefix narrows the search. LIKE '%suffix' cannot use an index and forces a full table scan. If you need suffix or substring search on large tables, consider a full-text index.

NULL Handling

NULL in SQL means "unknown." It is not zero, not an empty string, not false. NULL has its own logic, and ignoring it is the most common source of wrong results in WHERE clauses.

Rule 1: You cannot compare NULL with = or !=. Both expressions evaluate to UNKNOWN (not true, not false), so the row is excluded:

-- This returns ZERO rows, even if phone IS null SELECT * FROM customers WHERE phone = NULL; -- Correct SELECT * FROM customers WHERE phone IS NULL; SELECT * FROM customers WHERE phone IS NOT NULL;

Rule 2: != excludes NULLs silently. This is the most dangerous trap. If you want "all customers who are not in London," you might write:

-- This EXCLUDES customers where city is NULL SELECT * FROM customers WHERE city != 'London';

Customers with city = NULL are silently dropped because NULL != 'London' evaluates to UNKNOWN. To include them:

SELECT * FROM customers WHERE city != 'London' OR city IS NULL;

COALESCE provides a default for NULLs, which is useful for calculations and display:

SELECT customer_name, COALESCE(phone, 'No phone on file') AS phone FROM customers;

SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. Any comparison involving NULL produces UNKNOWN, and WHERE includes only rows where the condition evaluates to TRUE. This is defined in the ISO SQL standard and is the root cause of most NULL-related bugs.

— C.J. Date, SQL and Relational Theory

WHERE with Subqueries

Scalar subquery

A scalar subquery returns a single value. Use it to compare against an aggregate:

-- Employees earning above average SELECT employee_name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);

IN subquery

Returns rows where the column matches any value in the subquery result:

-- Customers who placed at least one order in 2025 SELECT customer_name FROM customers WHERE customer_id IN ( SELECT customer_id FROM orders WHERE order_date >= '2025-01-01' );

EXISTS and NOT EXISTS

EXISTS tests whether a correlated subquery returns at least one row. It is the standard pattern for semi-joins (find matching rows) and anti-joins (find non-matching rows):

-- Semi-join: customers WITH orders SELECT c.customer_name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id ); -- Anti-join: customers WITHOUT orders SELECT c.customer_name FROM customers c WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id );

EXISTS is often faster than IN for large datasets because the database stops scanning after finding the first match. NOT EXISTS is safer than NOT IN when the subquery column might contain NULLs.

Performance

The biggest performance lever in WHERE is whether the predicate can use an index. Database engineers call this sargability -- whether the predicate is "Search ARGument ABLE."

SARGABLE VS NON-SARGABLE PREDICATESNon-Sargable (full table scan)XWHERE YEAR(order_date) = 2025Function wraps the columnXWHERE UPPER(name) = 'JOHN'Function wraps the columnXWHERE price + tax > 100Arithmetic on the columnXWHERE name LIKE '%smith'Sargable (index seek)WHERE order_date >= '2025-01-01'Range on bare columnWHERE name = 'John'Use CI collation insteadWHERE price > 100 - taxMove arithmetic to the constantWHERE name LIKE 'smith%'
Click to enlarge

The core rule: never wrap an indexed column in a function. WHERE YEAR(order_date) = 2025 forces the database to evaluate the YEAR function on every row. Rewrite it as a range predicate:

-- Non-sargable (full scan) SELECT * FROM orders WHERE YEAR(order_date) = 2025; -- Sargable (index seek) SELECT * FROM orders WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01';

Other performance considerations:

  • Selective predicates first: when combining AND conditions, put the most selective one first. An index on status that returns 100 rows out of 10 million is more useful than one on region that returns 2 million.
  • OR and indexes: WHERE col1 = 'a' OR col2 = 'b' often cannot use a single index. Consider rewriting as two queries joined with UNION.
  • Implicit type conversion: WHERE order_id = '123' when order_id is an integer forces the database to convert every row, preventing index usage.

A predicate is sargable if the DBMS engine can take advantage of an index to speed up the execution of the query. The term is derived from "Search ARGument ABLE." Making predicates sargable is often the single most impactful query optimization technique.

— Markus Winand, Use The Index, Luke -- Obfuscated Conditions

Common Mistakes

NULL comparison with = or !=

Already covered above, but it bears repeating because it causes so many bugs. Always use IS NULL / IS NOT NULL. And remember that WHERE column != 'value' silently excludes NULLs.

Implicit type conversion

Comparing a numeric column to a string forces implicit conversion. Some databases convert the column (preventing index use); others convert the literal (safe but fragile):

-- Risky: string-to-integer conversion on every row SELECT * FROM orders WHERE order_id = '12345'; -- Safe: match the data type SELECT * FROM orders WHERE order_id = 12345;

Using OR instead of IN

Chaining many OR conditions on the same column is verbose and harder to optimize. Use IN instead:

-- Verbose SELECT * FROM products WHERE category = 'A' OR category = 'B' OR category = 'C'; -- Clean SELECT * FROM products WHERE category IN ('A', 'B', 'C');

Forgetting parentheses with AND/OR

Because AND binds tighter than OR, A OR B AND C is evaluated as A OR (B AND C). Always add parentheses when mixing AND and OR to make intent explicit.

Test before UPDATE/DELETE

Before running a destructive statement, test the WHERE condition with SELECT first:

-- Step 1: verify which rows will be affected SELECT * FROM orders WHERE status = 'cancelled' AND order_date < '2024-01-01'; -- Step 2: delete only after verifying the result set DELETE FROM orders WHERE status = 'cancelled' AND order_date < '2024-01-01';

WHERE clauses reference column values whose meaning depends on business context -- is status = 'A' "active" or "archived"? Dawiso's business glossary documents what each value means, reducing filtering errors caused by ambiguous column semantics. When analysts can look up canonical definitions in a data catalog, they write more accurate WHERE conditions.

Dawiso
Built with love for our users
Make Data Simple for Everyone.
Try Dawiso for free today and discover its ease of use firsthand.
© Dawiso s.r.o. All rights reserved