Skip to main content
sql selectdatabase queriesdata retrievalsql basics

SQL SELECT -- Complete Guide to Database Queries

SELECT is the most-used SQL statement. It retrieves data from one or more tables using a declarative syntax -- you describe the shape of the result you want, and the database engine figures out how to get it. This guide covers SELECT's core clauses, the execution order that governs how they interact, and practical patterns for real-world queries.

TL;DR

SQL SELECT retrieves data from database tables. Every query has a FROM clause (source), optional WHERE (filter), GROUP BY (aggregate), HAVING (filter groups), SELECT (columns), and ORDER BY (sort). The database processes these clauses in a fixed execution order that differs from how you write them. Master the execution order and you will debug 90% of SELECT problems before they happen.

How SELECT Works

SQL is a declarative language. When you write a SELECT statement, you are not telling the database step-by-step how to retrieve data. You are describing what the result should look like -- which columns, which rows, in what order. The database's query optimizer decides the actual execution path: which indexes to use, whether to scan a table or seek into it, and in what sequence to process joins.

This is the opposite of imperative code. In Python or Java, you write loops and conditionals to filter a list. In SQL, you state the condition, and the engine handles the iteration. A query like SELECT product_name, price FROM products WHERE price > 100 doesn't specify how to find matching rows -- the optimizer might use a B-tree index on the price column, a full table scan, or a bitmap index, depending on table statistics and available indexes.

This separation between intent and execution is why the same SELECT statement can perform very differently on two databases with different indexes, or even on the same database as data volume changes.

SELECT Syntax and Clause Order

A complete SELECT statement can include seven clauses. They must be written in this order:

SELECT columns FROM table WHERE row_filter GROUP BY grouping_columns HAVING group_filter ORDER BY sort_columns LIMIT row_count;

But the database executes them in a different order. This is the single most important concept in understanding SELECT behavior:

SQL QUERY EXECUTION ORDERStep 1FROMIdentify source tables and apply JOINsStep 2WHEREFilter individual rowsStep 3GROUP BYCollapse rows into groupsStep 4HAVINGFilter groups by aggregate conditionStep 5SELECTEvaluate expressions, compute aliasesStep 6-7ORDER BYSort result, then LIMIT restricts rows
Click to enlarge

Notice that SELECT is step 5, not step 1. This explains several common frustrations:

  • You cannot use a column alias defined in SELECT inside a WHERE clause -- because WHERE runs before SELECT evaluates aliases.
  • You can use a column alias in ORDER BY -- because ORDER BY runs after SELECT.
  • You cannot reference aggregate results in WHERE. Use HAVING instead, which runs after GROUP BY.

The SQL standard defines a logical processing order for query clauses: FROM and JOINs are resolved first, followed by WHERE, GROUP BY, HAVING, SELECT, ORDER BY, and finally OFFSET/FETCH. Understanding this order is essential for writing correct queries.

— ISO/IEC 9075, SQL Standard

Filtering and Sorting

The WHERE clause filters rows before any grouping or aggregation happens. It supports comparison operators, logical operators, and several special forms:

-- Comparison and logical operators SELECT order_id, customer_id, total FROM orders WHERE total > 500 AND status = 'shipped'; -- IN for multiple values SELECT product_name, category FROM products WHERE category IN ('Electronics', 'Furniture', 'Clothing'); -- BETWEEN for ranges (inclusive on both ends) SELECT employee_name, hire_date FROM employees WHERE hire_date BETWEEN '2024-01-01' AND '2024-12-31'; -- LIKE for pattern matching SELECT company_name FROM customers WHERE company_name LIKE 'Tech%'; -- IS NULL (not = NULL -- that never works) SELECT customer_id, email FROM customers WHERE phone IS NULL;

ORDER BY sorts the final result. You can sort by multiple columns and mix ascending/descending:

SELECT product_name, category, price FROM products WHERE price > 50 ORDER BY category ASC, price DESC;

Row limiting varies by database. MySQL and PostgreSQL use LIMIT, SQL Server uses TOP, and the SQL standard defines FETCH FIRST:

-- MySQL / PostgreSQL SELECT * FROM orders ORDER BY order_date DESC LIMIT 10; -- SQL Server SELECT TOP 10 * FROM orders ORDER BY order_date DESC; -- ANSI SQL SELECT * FROM orders ORDER BY order_date DESC FETCH FIRST 10 ROWS ONLY;

Aggregation and Grouping

GROUP BY collapses rows that share a value into summary rows. Combined with aggregate functions, it turns raw data into reports:

SELECT category, COUNT(*) AS product_count, AVG(price) AS avg_price, SUM(quantity_sold) AS total_sold FROM products GROUP BY category HAVING COUNT(*) > 5 ORDER BY total_sold DESC;

Key rules: every column in SELECT must either appear in GROUP BY or be inside an aggregate function. HAVING filters groups (after aggregation), while WHERE filters rows (before aggregation). See the GROUP BY guide for advanced patterns like ROLLUP, CUBE, and GROUPING SETS.

Joins and Subqueries

Real queries rarely read from a single table. JOINs combine rows from multiple tables based on a related column:

-- INNER JOIN: only matching rows SELECT c.customer_name, o.order_date, o.total FROM customers c INNER JOIN orders o ON c.customer_id = o.customer_id; -- LEFT JOIN: all customers, even without orders SELECT c.customer_name, o.order_date FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id; -- Multi-table chain SELECT c.customer_name, p.product_name, oi.quantity FROM customers c JOIN orders o ON c.customer_id = o.customer_id JOIN order_items oi ON o.order_id = oi.order_id JOIN products p ON oi.product_id = p.product_id;

Subqueries embed one SELECT inside another. They are useful for comparisons against aggregated values or for filtering with EXISTS:

-- Scalar subquery: compare against an aggregate SELECT product_name, price FROM products WHERE price > (SELECT AVG(price) FROM products); -- EXISTS: find customers who placed orders in 2025 SELECT c.customer_name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.order_date >= '2025-01-01' );

CTEs (Common Table Expressions) make complex queries more readable by breaking them into named steps:

WITH monthly_revenue AS ( SELECT DATE_TRUNC('month', order_date) AS month, SUM(total) AS revenue FROM orders GROUP BY DATE_TRUNC('month', order_date) ) SELECT month, revenue, revenue - LAG(revenue) OVER (ORDER BY month) AS change FROM monthly_revenue ORDER BY month;
ANATOMY OF A SELECT STATEMENTSELECTc.customer_name,Column list (step 5)SUM(oi.quantity) AS total_qtyAggregate + aliasFROMcustomers cSource table + alias (step 1)JOINorders o ON c.customer_id = o.customer_idCombine tables (step 1)JOINorder_items oi ON o.order_id = oi.order_idWHEREo.order_date >= '2025-01-01'Row filter (step 2)GROUP BYc.customer_nameCollapse into groups (step 3)HAVINGSUM(oi.quantity) > 10Group filter (step 4)ORDER BYtotal_qty DESCSort result (step 6)LIMIT20;Restrict output rows (step 7)
Click to enlarge

Performance and Common Mistakes

Most SELECT performance issues come down to three things: missing indexes, reading more data than needed, and writing predicates that prevent index usage.

The SELECT * anti-pattern

Using SELECT * in production code causes problems that go beyond wasted bandwidth. It reads every column from disk even when you only need two. It breaks applications when someone adds or reorders columns. And it prevents index-only scans -- a fast path where the database answers the query entirely from an index without touching the table data.

-- Avoid in production SELECT * FROM orders WHERE customer_id = 42; -- Better: name the columns you need SELECT order_id, order_date, total FROM orders WHERE customer_id = 42;

Use EXPLAIN to understand execution

Every database has an EXPLAIN command that shows how it plans to execute a query. This is the first tool to reach for when a query is slow:

-- PostgreSQL EXPLAIN ANALYZE SELECT c.customer_name, COUNT(o.order_id) AS order_count FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.order_date >= '2025-01-01' GROUP BY c.customer_name ORDER BY order_count DESC;

Look for Seq Scan (full table scan) on large tables -- this usually means a missing index. Look for Nested Loop with high row counts -- this can indicate a missing join index or a better join strategy.

Common mistakes

  • Comparing with NULL using = -- WHERE status = NULL returns zero rows. Use IS NULL instead.
  • Ambiguous column references -- in multi-table queries, always prefix columns with table aliases. WHERE customer_id = 5 fails if both tables have a customer_id column.
  • Implicit type conversion -- WHERE order_id = '123' forces the database to convert every row's order_id to a string, which can prevent index usage.
  • Using column aliases in WHERE -- WHERE total_qty > 10 fails if total_qty is a SELECT alias. Repeat the expression or use a CTE.

The most important thing for speeding up queries on a single table is indexing. The most common mistake is to write queries that prevent the database from using the available indexes.

— Markus Winand, Use The Index, Luke

Practical Patterns

CASE expressions for conditional logic

CASE lets you create computed columns with conditional logic, which is useful for bucketing, labeling, and conditional aggregation:

SELECT product_name, price, CASE WHEN price < 50 THEN 'Budget' WHEN price < 200 THEN 'Mid-range' ELSE 'Premium' END AS price_tier FROM products ORDER BY price;

Window functions for ranking

Window functions let you compute values across related rows without collapsing them into groups. Unlike GROUP BY, they keep every row in the result:

SELECT product_name, category, price, RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank, ROW_NUMBER() OVER (ORDER BY price DESC) AS overall_rank FROM products;

E-commerce monthly revenue report

SELECT DATE_TRUNC('month', o.order_date) AS order_month, COUNT(DISTINCT o.customer_id) AS unique_customers, COUNT(*) AS total_orders, SUM(o.total) AS monthly_revenue, AVG(o.total) AS avg_order_value FROM orders o WHERE o.order_date >= '2025-01-01' GROUP BY DATE_TRUNC('month', o.order_date) ORDER BY order_month;

Customer segmentation with tiering

SELECT c.customer_id, c.customer_name, COUNT(o.order_id) AS total_orders, SUM(o.total) AS total_spent, CASE WHEN SUM(o.total) > 10000 THEN 'VIP' WHEN SUM(o.total) > 2000 THEN 'Premium' ELSE 'Standard' END AS customer_tier FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.customer_name ORDER BY total_spent DESC;
SELECT * VS NAMED COLUMNSSELECT *XExtra disk I/O for unused columnsXBreaks when schema changesXCannot use index-only scansXHides column intent from readersXSends unnecessary data over networkSELECT col1, col2Explicit intent -- self-documentingEnables index-only scansSmaller result set, less memorySurvives schema changes safelyEasier to review in code diffs
Click to enlarge

When SELECT queries reference column names like "revenue" or "active_customer," those definitions should live in a shared data catalog and business glossary. Dawiso provides the canonical definitions that make query results interpretable across teams. Through data lineage, analysts can trace which source tables feed their SELECT results and verify that the data they are querying is current and trustworthy.

Dawiso
Built with love for our users
Make Data Simple for Everyone.
Try Dawiso for free today and discover its ease of use firsthand.
© Dawiso s.r.o. All rights reserved