
SQL COUNT — Complete Guide to Counting Records

COUNT is the most-used aggregate function in SQL. It counts rows matching a condition, counts non-NULL values in a column, or counts distinct values. Nearly every report, dashboard, and data quality check starts with some form of COUNT — alongside other aggregates like SUM, AVG, MIN, and MAX.

This guide covers the three COUNT variants, their NULL behavior, GROUP BY patterns, CASE-based counting, and the performance considerations that matter on tables with millions of rows.

TL;DR

SQL COUNT has three forms: COUNT(*) counts all rows (including rows with NULLs), COUNT(column) counts non-NULL values only, and COUNT(DISTINCT column) counts unique non-NULL values. Pair COUNT with GROUP BY for category totals and HAVING to filter groups by count. On large tables, COUNT(*) without a WHERE clause forces a full table scan; use covering indexes or approximate counts when exact numbers are not required.

Three Forms of COUNT

The difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column) comes down to how each handles NULLs. Most counting errors trace back to picking the wrong variant.

Three forms of COUNT: same data, different results. Sample table users (5 rows, 2 NULLs in the email column, 1 duplicate email):

name  | email
Alice | alice@co.com
Bob   | NULL
Carol | carol@co.com
Dan   | NULL
Eve   | alice@co.com

COUNT(*) counts all rows: 5. COUNT(email) skips the two NULLs: 3. COUNT(DISTINCT email) skips NULLs and the duplicate alice@co.com: 2.

COUNT(*) counts all rows, including those with NULLs. Use it when you want a total row count.

COUNT(column) counts only rows where the specified column is not NULL. Use it to measure data completeness — "how many customers have an email address?"

COUNT(DISTINCT column) counts unique non-NULL values. Use it to measure cardinality — "how many different email addresses exist?"

SELECT
  COUNT(*) AS total_rows,
  COUNT(email) AS rows_with_email,
  COUNT(DISTINCT email) AS unique_emails
FROM users;

COUNT with WHERE

The WHERE clause filters rows before COUNT aggregates. This is straightforward but worth stating because the execution order matters — WHERE runs first, then COUNT operates on the surviving rows.

-- Count employees in a specific department
SELECT COUNT(*) AS engineering_count
FROM employees
WHERE department = 'Engineering';

-- Count orders placed in 2025
SELECT COUNT(*) AS orders_2025
FROM orders
WHERE order_date >= '2025-01-01'
  AND order_date < '2026-01-01';

-- Count products that are in stock
SELECT COUNT(*) AS in_stock
FROM products
WHERE stock_quantity > 0;

COUNT with GROUP BY and HAVING

GROUP BY partitions rows into groups; COUNT aggregates within each group. HAVING filters groups based on the aggregate result.

COUNT execution order: FROM (10,000 rows) → WHERE (3,200 rows) → GROUP BY (12 groups) → COUNT (12 counts) → HAVING (5 groups) → SELECT (output).

-- Count employees per department, show only departments with 5+ employees
SELECT department, COUNT(*) AS employee_count
FROM employees
WHERE status = 'active'
GROUP BY department
HAVING COUNT(*) >= 5
ORDER BY employee_count DESC;

The key distinction: WHERE filters rows before grouping; HAVING filters groups after counting. You cannot use COUNT in a WHERE clause — the count has not been computed at that stage of execution.
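To make the distinction concrete, here is a minimal sketch against the employees table used above. The first query is invalid because the aggregate has not been computed when WHERE runs; the second is the correct HAVING form:

```sql
-- Invalid: aggregates are not available in WHERE (the database raises an error)
-- SELECT department FROM employees WHERE COUNT(*) >= 5 GROUP BY department;

-- Valid: HAVING runs after COUNT has been computed for each group
SELECT department
FROM employees
GROUP BY department
HAVING COUNT(*) >= 5;
```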

Conditional COUNT with CASE

CASE inside COUNT lets you count multiple categories in a single query, avoiding multiple table scans:

SELECT
  COUNT(CASE WHEN salary < 50000 THEN 1 END) AS under_50k,
  COUNT(CASE WHEN salary BETWEEN 50000 AND 100000 THEN 1 END) AS range_50k_100k,
  COUNT(CASE WHEN salary > 100000 THEN 1 END) AS over_100k
FROM employees;

This scans the table once and produces three counts. The alternative — three separate SELECT COUNT(*) ... WHERE salary < 50000 queries — scans the table three times. On a table with millions of rows, the difference is measurable.

Another practical pattern — counting order priorities in one pass:

SELECT
  COUNT(CASE WHEN priority = 'high' THEN 1 END) AS high,
  COUNT(CASE WHEN priority = 'medium' THEN 1 END) AS medium,
  COUNT(CASE WHEN priority = 'low' THEN 1 END) AS low
FROM support_tickets
WHERE created_at >= '2025-01-01';

PostgreSQL offers a cleaner syntax with the FILTER clause:

SELECT
  COUNT(*) FILTER (WHERE priority = 'high') AS high,
  COUNT(*) FILTER (WHERE priority = 'medium') AS medium,
  COUNT(*) FILTER (WHERE priority = 'low') AS low
FROM support_tickets
WHERE created_at >= '2025-01-01';

COUNT in Subqueries

COUNT in subqueries enables "count of counts" patterns — finding groups that meet a threshold:

-- How many departments have more than 10 employees?
SELECT COUNT(*) AS large_departments
FROM (
  SELECT department_id
  FROM employees
  GROUP BY department_id
  HAVING COUNT(*) > 10
) AS dept_counts;

-- How many customers have ordered more than once?
SELECT COUNT(*) AS repeat_customers
FROM (
  SELECT customer_id
  FROM orders
  GROUP BY customer_id
  HAVING COUNT(*) > 1
) AS multi_order_customers;

When checking whether rows exist, use EXISTS instead of COUNT(*) > 0. EXISTS stops scanning after the first matching row, while COUNT(*) must scan all matching rows to compute the total. On large tables, the performance difference can be orders of magnitude.

— PostgreSQL, EXISTS documentation

-- Slow: counts all matching rows, then checks if count > 0
SELECT *
FROM customers c
WHERE (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) > 0;

-- Fast: stops at the first match
SELECT *
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);

Performance at Scale

COUNT(*) without a WHERE clause forces the database to scan every row in the table. On a table with 100 million rows, this takes seconds — or minutes if the table does not fit in memory.

COUNT(*) performance by engine:

Scenario                 | MySQL InnoDB    | MySQL MyISAM     | PostgreSQL      | SQL Server
COUNT(*) without WHERE   | Full table scan | Instant (stored) | Full table scan | Full table scan
COUNT(*) + indexed WHERE | Index scan      | Index scan       | Index scan      | Index scan
APPROX_COUNT_DISTINCT    | Not available   | Not available    | Not available   | HyperLogLog

MyISAM stores the exact row count and returns it instantly. InnoDB and PostgreSQL must scan, because MVCC means different transactions see different row counts.

The reason InnoDB and PostgreSQL cannot store a pre-computed row count: MVCC (Multi-Version Concurrency Control) means different transactions may see different versions of the same row. There is no single "correct" count without scanning.
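When an approximate total is acceptable, PostgreSQL's planner statistics provide a near-instant estimate without scanning. A minimal sketch (the estimate's accuracy depends on how recently VACUUM or ANALYZE ran):

```sql
-- Approximate row count from planner statistics (PostgreSQL)
-- reltuples is an estimate maintained by VACUUM/ANALYZE, not an exact count
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'users';
```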

APPROX_COUNT_DISTINCT uses the HyperLogLog algorithm to estimate distinct counts with approximately 2% error rate, while using a fraction of the memory and CPU of an exact COUNT(DISTINCT). On billion-row tables, the approximate version can return in seconds instead of minutes.

— Microsoft, APPROX_COUNT_DISTINCT documentation
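A hedged comparison on a hypothetical orders table, assuming SQL Server 2019 or later where APPROX_COUNT_DISTINCT is available:

```sql
-- Exact distinct count: must scan and deduplicate every value
SELECT COUNT(DISTINCT customer_id) AS exact_customers
FROM orders;

-- Approximate distinct count: HyperLogLog estimate, far less memory and CPU
SELECT APPROX_COUNT_DISTINCT(customer_id) AS approx_customers
FROM orders;
```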

Practical performance tips:

  • Covering indexes let the database answer COUNT queries from the index alone, without touching the table. An index on (department, status) can serve SELECT department, COUNT(*) FROM employees WHERE status = 'active' GROUP BY department entirely from the index.
  • COUNT(*) with WHERE is usually fast if the WHERE columns are indexed. It becomes slow only when the WHERE matches most of the table.
  • When exact counts are not needed — dashboards showing "~12.3M rows" — approximate counts or cached counts are the right tool.
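The covering-index tip above can be sketched as follows; the index name is illustrative, and the (department, status) column order matches the example in the list:

```sql
-- Covering index: both columns the query touches live in the index,
-- so the database can answer via an index-only scan
CREATE INDEX idx_employees_dept_status
    ON employees (department, status);

SELECT department, COUNT(*) AS employee_count
FROM employees
WHERE status = 'active'
GROUP BY department;
```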

Common Mistakes

Using COUNT(column) when you meant COUNT(*). If the column has NULLs, COUNT(column) returns a smaller number than expected. If you want a total row count, always use COUNT(*).

COUNT(DISTINCT) on columns with NULLs. COUNT(DISTINCT email) excludes NULLs from the count. If 30% of your rows have NULL emails, the distinct count will be significantly lower than the total row count, which may or may not be what you intended.

Missing GROUP BY with non-aggregated columns. Writing SELECT department, COUNT(*) FROM employees without GROUP BY is an error in PostgreSQL and SQL Server. MySQL in permissive mode (with ONLY_FULL_GROUP_BY disabled) returns an arbitrary value for department, which is arguably worse because it silently gives you wrong results.
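A short sketch of the mistake and its fix, using the same employees table:

```sql
-- Error in PostgreSQL and SQL Server; arbitrary department in permissive MySQL
-- SELECT department, COUNT(*) FROM employees;

-- Correct: every non-aggregated column appears in GROUP BY
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
```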

NULL verification pattern — when debugging unexpected COUNT results, always check for NULLs first:

SELECT
  COUNT(*) AS total_rows,
  COUNT(email) AS non_null_emails,
  COUNT(*) - COUNT(email) AS null_emails
FROM customers;

Data Quality Checks with COUNT

COUNT is the foundation of data quality measurement. These queries answer "can I trust this table?"

Completeness check — what percentage of rows have values for each critical column:

SELECT
  'email' AS field,
  COUNT(*) AS total_records,
  COUNT(email) AS populated,
  ROUND(COUNT(email) * 100.0 / COUNT(*), 2) AS completeness_pct
FROM customers
UNION ALL
SELECT
  'phone',
  COUNT(*),
  COUNT(phone),
  ROUND(COUNT(phone) * 100.0 / COUNT(*), 2)
FROM customers;

Duplicate detection — find values that appear more than once:

SELECT email, COUNT(*) AS occurrences
FROM customers
WHERE email IS NOT NULL
GROUP BY email
HAVING COUNT(*) > 1;

Orphan record identification — find child rows with no matching parent:

SELECT COUNT(*) AS orphaned_items
FROM order_items oi
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.order_id = oi.order_id
);

Data quality checks like these tell you whether a table is trustworthy. In a data catalog, that completeness percentage becomes metadata attached to the column — visible to anyone deciding whether to use the data. Dawiso's data catalog tracks these quality metrics alongside column definitions and lineage, connecting the output of COUNT queries directly to data governance workflows.
