Skip to main content
sql jointable relationshipsdatabase queriesrelational database

SQL JOIN -- Complete Guide to Database Table Relationships

SQL JOIN combines rows from two or more tables based on a related column. Relational databases split data across tables to avoid duplication -- customers in one table, orders in another, products in a third. JOINs are how you reassemble that data for queries. Understanding which JOIN type to use determines whether your results include all records, only matches, or both.

TL;DR

SQL JOIN combines rows from multiple tables using a shared column (usually a foreign key). INNER JOIN returns only matching rows. LEFT JOIN returns all rows from the left table plus matches from the right. RIGHT JOIN does the reverse. FULL OUTER JOIN returns everything. CROSS JOIN produces every possible combination. Choose the type based on whether missing matches should appear as NULLs or be excluded.

How JOINs Work

A JOIN operates on two tables and a condition that defines how rows match. The condition appears in the ON clause and almost always compares a foreign key in one table to a primary key in another.

Consider two tables: customers with a customer_id primary key, and orders with a customer_id foreign key. When you write JOIN orders ON customers.customer_id = orders.customer_id, the database finds every pair of rows where the customer IDs match. Customer "Acme Corp" with customer_id = 7 gets matched to every order row where customer_id = 7.

This is why normalization creates the need for JOINs. Instead of duplicating the customer's name, address, and email on every order row, a normalized database stores customer data once and uses the foreign key to link tables. JOINs reassemble the full picture at query time.

The ON clause is not limited to simple equality. You can join on multiple columns, use inequality conditions, or combine equality with range checks. But equality joins on indexed foreign keys are by far the most common -- and the most performant.

JOIN Types

SQL JOIN TYPESABINNER JOINOnly matchesABLEFT JOINAll A + matchesABRIGHT JOINAll B + matchesABFULL OUTEREverything
Click to enlarge

INNER JOIN

Returns only rows where the ON condition matches in both tables. If a customer has no orders, they do not appear. If an order has no matching customer, it does not appear.

SELECT c.customer_name, o.order_date, o.total FROM customers c INNER JOIN orders o ON c.customer_id = o.customer_id;

LEFT JOIN

Returns all rows from the left table and matching rows from the right. Where there is no match, right-side columns are NULL. This is the join you use when you want "all customers, even those without orders."

SELECT c.customer_name, o.order_date, o.total FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id;

RIGHT JOIN

The mirror of LEFT JOIN. Returns all rows from the right table plus matches from the left. In practice, most developers rewrite RIGHT JOINs as LEFT JOINs by swapping the table order, since it reads more naturally.

SELECT c.customer_name, o.order_date, o.total FROM customers c RIGHT JOIN orders o ON c.customer_id = o.customer_id;

FULL OUTER JOIN

Returns all rows from both tables. Where there is no match on either side, the missing columns are NULL. Useful for data reconciliation -- finding records that exist in one system but not another.

SELECT c.customer_name, o.order_id, o.total FROM customers c FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;

CROSS JOIN

Produces a Cartesian product -- every row from the first table paired with every row from the second. If table A has 100 rows and table B has 50, the result has 5,000 rows. Useful for generating combinations (e.g., all products crossed with all store locations), but dangerous on large tables.

SELECT p.product_name, s.store_name FROM products p CROSS JOIN stores s;

SELF JOIN

A table joined to itself using different aliases. The classic use case is an employee-manager hierarchy where manager_id references another row in the same employees table:

SELECT e.employee_name AS employee, m.employee_name AS manager FROM employees e INNER JOIN employees m ON e.manager_id = m.employee_id;

Multi-Table JOINs

Real queries often chain three, four, or more tables together. An e-commerce order detail query is a textbook example -- you need data from customers, orders, order items, and products:

MULTI-TABLE JOIN CHAINcustomersPK: customer_idcustomer_idordersPK: order_idorder_idorder_itemsFK: order_id, product_idproduct_idproductsPK: product_idEach arrow represents a JOIN ON condition matching a foreign key to a primary key
Click to enlarge
SELECT c.customer_name, o.order_date, p.product_name, oi.quantity, oi.unit_price, (oi.quantity * oi.unit_price) AS line_total FROM customers c JOIN orders o ON c.customer_id = o.customer_id JOIN order_items oi ON o.order_id = oi.order_id JOIN products p ON oi.product_id = p.product_id WHERE o.order_date >= '2025-01-01' ORDER BY o.order_date DESC;

Table aliases (c, o, oi, p) are not optional in multi-table queries -- they make the code readable and prevent ambiguous column errors. Choose short but meaningful abbreviations.

When queries join five or more tables, understanding which columns match across tables requires documentation. Dawiso's data catalog maps foreign key relationships and column lineage, so analysts can verify JOIN conditions against the canonical data model instead of guessing from column names.

JOIN vs Subqueries

JOINs and subqueries can often solve the same problem. The choice depends on what you need in the result and how the optimizer handles each pattern.

Use JOIN when you need columns from both tables in the output, or when the optimizer can leverage indexes on both sides efficiently:

-- JOIN: get customer names alongside their order totals SELECT c.customer_name, SUM(o.total) AS lifetime_value FROM customers c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_name;

Use a subquery when the logic is a filter (yes/no), not a data source. EXISTS is particularly efficient for semi-joins because the database can stop searching after finding the first match:

-- EXISTS: find customers who ordered in 2025 (no order columns needed) SELECT c.customer_name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.order_date >= '2025-01-01' );

PostgreSQL's query planner often converts IN subqueries to semi-joins internally, making them equivalent to EXISTS in performance. However, for NOT IN with nullable columns, the semantics differ -- NOT EXISTS is almost always the safer and faster choice.

— PostgreSQL Documentation, Query Planning

Conditional and Advanced JOINs

Additional ON conditions

You can add conditions to the ON clause beyond simple key equality. This filters rows during the join, which behaves differently from filtering in WHERE for outer joins:

-- Only join to orders from 2025 -- unmatched customers still appear SELECT c.customer_name, o.order_date FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.order_date >= '2025-01-01';

If you moved o.order_date >= '2025-01-01' to the WHERE clause instead, it would filter out customers without 2025 orders entirely -- turning your LEFT JOIN into an effective INNER JOIN.

Self-join patterns

Beyond employee-manager hierarchies, self-joins find rows that relate within the same table. For example, finding products in the same category that are priced within 10% of each other:

SELECT a.product_name AS product_a, b.product_name AS product_b, a.price AS price_a, b.price AS price_b FROM products a JOIN products b ON a.category = b.category AND a.product_id < b.product_id AND b.price BETWEEN a.price * 0.9 AND a.price * 1.1;

JOIN with a derived table

You can join to the result of a subquery, treating it as a temporary table:

SELECT c.customer_name, recent.order_count FROM customers c JOIN ( SELECT customer_id, COUNT(*) AS order_count FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '30 days' GROUP BY customer_id ) recent ON c.customer_id = recent.customer_id;

Performance

JOIN performance depends heavily on indexes, table sizes, and the optimizer's strategy. These are the high-impact optimizations:

Index the JOIN columns. Every foreign key used in an ON clause should have an index. Without it, the database resorts to nested loop scans -- acceptable for small tables, disastrous for large ones.

Read the EXPLAIN output. Look for Hash Join vs Nested Loop vs Merge Join. Hash joins work well for large unsorted datasets. Merge joins work well when both sides are already sorted (or indexed). Nested loops are efficient for small inner tables with an index:

EXPLAIN ANALYZE SELECT c.customer_name, o.total FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.order_date >= '2025-01-01';

Filter before joining. WHERE clauses that reduce the row count early give the optimizer less work. A query that filters orders to the last 30 days before joining to customers is cheaper than joining all orders and filtering afterward -- though a good optimizer may reorder these automatically.

Avoid accidental Cartesian products. A missing or incorrect ON condition turns a JOIN into a CROSS JOIN. If customers has 10,000 rows and orders has 500,000, the accidental cross product is 5 billion rows. Always verify row counts with SELECT COUNT(*) after writing a new multi-table query.

Adding an index on a foreign key column used in a JOIN condition can reduce query execution time by orders of magnitude. On a 10-million-row table, an indexed nested loop join can complete in milliseconds, while the same join without an index may take minutes.

— Markus Winand, Use The Index, Luke -- Join Operations

Troubleshooting

Duplicate rows

If your query returns more rows than expected, the most likely cause is a one-to-many relationship you did not account for. Joining customers to orders gives one result row per order, not per customer. If you only want customer-level data, aggregate with GROUP BY or use DISTINCT -- but understand why duplicates appear before masking them.

Missing data

If rows are unexpectedly absent, check three things: (1) Is the JOIN type correct? INNER JOIN excludes non-matching rows. (2) Do NULL values in the join columns prevent matching? NULLs never equal anything, including other NULLs. (3) Are data types mismatched? Joining a VARCHAR column to an INTEGER column may produce zero matches or force implicit conversion.

NULL handling in LEFT JOIN results

LEFT JOIN fills right-side columns with NULL when there is no match. Use COALESCE to provide defaults:

SELECT c.customer_name, COALESCE(SUM(o.total), 0) AS lifetime_value FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_name;

Database-Specific Syntax

MySQL: USING clause

When both tables have a column with the same name, MySQL's USING shorthand is cleaner than ON:

SELECT c.customer_name, o.order_date FROM customers c JOIN orders o USING (customer_id);

PostgreSQL: LATERAL JOIN

LATERAL lets a subquery in the FROM clause reference columns from preceding tables -- enabling "for each row, run this subquery" patterns. It is useful for finding the N most recent orders per customer:

SELECT c.customer_name, recent.order_date, recent.total FROM customers c LEFT JOIN LATERAL ( SELECT order_date, total FROM orders o WHERE o.customer_id = c.customer_id ORDER BY order_date DESC LIMIT 3 ) recent ON true;

SQL Server: CROSS APPLY / OUTER APPLY

SQL Server's equivalent of LATERAL JOIN. CROSS APPLY works like INNER JOIN LATERAL and OUTER APPLY like LEFT JOIN LATERAL:

SELECT c.customer_name, recent.order_date, recent.total FROM customers c OUTER APPLY ( SELECT TOP 3 order_date, total FROM orders o WHERE o.customer_id = c.customer_id ORDER BY order_date DESC ) recent;
JOIN EXECUTION FLOWRead Table Acustomers (1,000 rows)Read Table Borders (50,000 rows)Match on ONcustomer_id = customer_idApply WHEREorder_date >= 2025ResultFor LEFT JOIN: unmatched rows from Table A are added with NULLs for Table B columnsWHERE filters happen AFTER the JOIN -- moving conditions to ON changes LEFT JOIN behavior
Click to enlarge
Dawiso
Built with love for our users
Make Data Simple for Everyone.
Try Dawiso for free today and discover its ease of use firsthand.
© Dawiso s.r.o. All rights reserved