SQL INSERT — Guide to Adding Data to Database Tables
INSERT adds rows to a table. You specify the target table, the columns, and the values. Every other SQL operation — SELECT, UPDATE, DELETE — assumes the data already exists. INSERT is how it gets there.
This guide covers single-row and multi-row inserts, INSERT ... SELECT for data migration, upsert patterns for handling duplicates, and bulk insert performance across PostgreSQL, MySQL, and SQL Server.
SQL INSERT adds rows to a table. Always list column names explicitly — omitting them breaks when the schema changes. Use multi-row VALUES for batch inserts (5–10x faster than individual statements). INSERT ... SELECT copies data between tables in a single operation. For duplicate handling, PostgreSQL uses ON CONFLICT, MySQL uses ON DUPLICATE KEY UPDATE, and SQL Server uses MERGE. Wrap related inserts in transactions.
INSERT Syntax
The basic INSERT statement names the target table, lists columns, and provides values:
INSERT INTO employees (employee_id, first_name, last_name, email, department)
VALUES (1001, 'Sarah', 'Chen', 'sarah.chen@company.com', 'Engineering');
Always list column names explicitly. The shorthand form — INSERT INTO employees VALUES (...) — relies on column order matching the table definition. If someone adds a column, reorders columns, or drops one, the statement breaks or silently inserts values into the wrong columns. Listing columns costs you ten seconds of typing and saves hours of debugging.
When you skip a column in the column list, the database fills it with the column's DEFAULT value. If no default is defined and the column is nullable, it gets NULL. If the column is NOT NULL with no default, the statement fails. This behavior is predictable and worth relying on for auto-increment IDs, timestamps, and boolean flags that default to false.
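The default-and-NULL rules above can be demonstrated in a few lines. This sketch uses Python's stdlib sqlite3 module as a stand-in engine (the table and column names are hypothetical); the same behavior applies in PostgreSQL, MySQL, and SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tasks (
        task_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        status  TEXT DEFAULT 'open',
        notes   TEXT
    )
""")

# Omitted columns fall back to their defaults: status -> 'open', notes -> NULL.
conn.execute("INSERT INTO tasks (title) VALUES ('write report')")
row = conn.execute("SELECT status, notes FROM tasks").fetchone()

# A NOT NULL column with no default must be supplied, or the INSERT fails.
try:
    conn.execute("INSERT INTO tasks (status) VALUES ('open')")
    not_null_failed = False
except sqlite3.IntegrityError:
    not_null_failed = True
```

Here `row` comes back as `('open', None)` and the second INSERT is rejected, exactly as the paragraph describes.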
Data types in the VALUES clause must match the column definitions. Inserting a string into an integer column raises an error in strict mode (PostgreSQL, SQL Server) or silently truncates in permissive mode (MySQL with relaxed sql_mode). Strict is better.
Multi-Row INSERT
Instead of running one INSERT per row, batch them into a single statement:
INSERT INTO products (product_name, category, price, stock_quantity)
VALUES
('Wireless Mouse', 'Peripherals', 29.99, 100),
('Mechanical Keyboard', 'Peripherals', 89.99, 50),
('USB-C Hub', 'Accessories', 49.99, 75),
('Portable Monitor', 'Displays', 199.99, 30);
The performance difference is significant. Each individual INSERT requires a round trip to the database, a parse-plan-execute cycle, and a transaction log write. A multi-row INSERT does all of that once.
Practical limits vary by engine. MySQL caps the total statement size at max_allowed_packet (default 64 MB). PostgreSQL's wire protocol limits a single parameterized statement to 65,535 bind parameters, so a parameterized insert into a 10-column table caps out around 6,500 rows per statement. In practice, 100–1,000 rows per INSERT is the sweet spot — large enough to amortize overhead, small enough to avoid memory pressure.
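From application code, a multi-row INSERT is usually built by repeating a parameter-placeholder group per row. A minimal sketch, again using stdlib sqlite3 for illustration (table and data are from the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        product_name   TEXT,
        category       TEXT,
        price          REAL,
        stock_quantity INTEGER
    )
""")

rows = [
    ("Wireless Mouse", "Peripherals", 29.99, 100),
    ("Mechanical Keyboard", "Peripherals", 89.99, 50),
    ("USB-C Hub", "Accessories", 49.99, 75),
    ("Portable Monitor", "Displays", 199.99, 30),
]

# Build one multi-row INSERT: one (?, ?, ?, ?) group per row, all in a single
# statement, so the round trip and parse/plan/log overhead is paid once.
placeholders = ", ".join(["(?, ?, ?, ?)"] * len(rows))
flat_params = [value for row in rows for value in row]
conn.execute(
    "INSERT INTO products (product_name, category, price, stock_quantity) "
    f"VALUES {placeholders}",
    flat_params,
)

inserted = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

For batches beyond the sweet spot, chunk `rows` into groups of a few hundred and run one such statement per chunk.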
INSERT ... SELECT
INSERT ... SELECT copies data from one table (or query result) into another without leaving the database engine. No application round trip, no client-side memory allocation.
INSERT INTO archived_orders (order_id, customer_id, order_date, total_amount)
SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE order_date < '2024-01-01' AND status = 'completed';
This is the standard pattern for data archiving: move old records to an archive table, then DELETE the originals.
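The archive-then-delete pattern looks like this end to end. A sketch with stdlib sqlite3 (hypothetical data), wrapping the copy and the delete in one transaction so a failure leaves both tables untouched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_date TEXT, status TEXT);
    CREATE TABLE archived_orders (order_id INTEGER, order_date TEXT, status TEXT);
    INSERT INTO orders VALUES
        (1, '2023-06-10', 'completed'),
        (2, '2023-11-02', 'completed'),
        (3, '2024-03-15', 'completed'),
        (4, '2023-08-20', 'pending');
""")

with conn:  # one transaction: copy and delete together, or neither
    conn.execute("""
        INSERT INTO archived_orders (order_id, order_date, status)
        SELECT order_id, order_date, status
        FROM orders
        WHERE order_date < '2024-01-01' AND status = 'completed'
    """)
    conn.execute(
        "DELETE FROM orders "
        "WHERE order_date < '2024-01-01' AND status = 'completed'"
    )

archived = conn.execute("SELECT COUNT(*) FROM archived_orders").fetchone()[0]
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Orders 1 and 2 move to the archive; the 2024 order and the pending order stay behind.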
INSERT ... SELECT also works for building summary tables. Combine it with GROUP BY, JOIN, and WHERE to aggregate data from multiple sources into a reporting table:
INSERT INTO monthly_revenue (year, month, total_revenue, order_count)
SELECT
EXTRACT(YEAR FROM order_date),
EXTRACT(MONTH FROM order_date),
SUM(total_amount),
COUNT(*)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date);
Unlike CREATE TABLE AS SELECT (CTAS), INSERT ... SELECT targets an existing table with pre-defined constraints, indexes, and permissions. Use CTAS for one-off exploration; use INSERT ... SELECT for repeatable data pipelines.
Handling Duplicates (Upsert)
The upsert problem: you want to insert a row if it does not exist, or update it if it does. Every major database solves this differently.
SQL Server's MERGE statement is not atomic by default. Without a HOLDLOCK hint, concurrent sessions can cause primary key violations even with WHEN NOT MATCHED. Always use MERGE with HOLDLOCK or serializable isolation in concurrent workloads.
— Microsoft, MERGE (Transact-SQL) documentation
PostgreSQL uses ON CONFLICT. You specify which unique constraint to check and what to do when a conflict occurs. EXCLUDED refers to the row that would have been inserted:
INSERT INTO products (product_code, product_name, price)
VALUES ('PRD001', 'Premium Widget', 299.99)
ON CONFLICT (product_code)
DO UPDATE SET
product_name = EXCLUDED.product_name,
price = EXCLUDED.price,
updated_at = CURRENT_TIMESTAMP;
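SQLite adopted the same ON CONFLICT ... DO UPDATE syntax (in version 3.24+), which makes the PostgreSQL pattern easy to try from Python's stdlib. A sketch with a hypothetical products table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        product_code TEXT PRIMARY KEY,
        product_name TEXT,
        price        REAL
    )
""")

upsert = """
    INSERT INTO products (product_code, product_name, price)
    VALUES (?, ?, ?)
    ON CONFLICT (product_code)
    DO UPDATE SET
        product_name = excluded.product_name,
        price        = excluded.price
"""

conn.execute(upsert, ("PRD001", "Premium Widget", 299.99))  # no conflict: inserts
conn.execute(upsert, ("PRD001", "Premium Widget", 249.99))  # conflict: updates

row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
price = conn.execute(
    "SELECT price FROM products WHERE product_code = 'PRD001'"
).fetchone()[0]
```

Running the same statement twice leaves exactly one row, carrying the second price — the defining property of an upsert.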
MySQL uses ON DUPLICATE KEY UPDATE. It triggers on any unique key conflict, not just a specific one:
INSERT INTO products (product_code, product_name, price)
VALUES ('PRD001', 'Premium Widget', 299.99)
ON DUPLICATE KEY UPDATE
product_name = VALUES(product_name),
price = VALUES(price);
Note that the VALUES() function in this position is deprecated as of MySQL 8.0.20; newer code should alias the inserted row instead — VALUES (...) AS new ... SET product_name = new.product_name.
MySQL also has REPLACE INTO, but avoid it. REPLACE deletes the existing row and inserts a new one, which resets auto-increment values, fires DELETE triggers, and breaks foreign key relationships. Use ON DUPLICATE KEY UPDATE instead.
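The destructive behavior of REPLACE is easy to observe, because SQLite implements the same delete-then-insert semantics. In this sketch (hypothetical table), watch the surrogate key change when REPLACE resolves the conflict:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        product_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        product_code TEXT UNIQUE,
        price        REAL
    )
""")

conn.execute(
    "INSERT INTO products (product_code, price) VALUES ('PRD001', 299.99)"
)
old_id = conn.execute(
    "SELECT product_id FROM products WHERE product_code = 'PRD001'"
).fetchone()[0]

# REPLACE resolves the unique conflict by DELETING the old row and inserting
# a fresh one -- the auto-generated key changes, breaking anything pointing at it.
conn.execute(
    "REPLACE INTO products (product_code, price) VALUES ('PRD001', 249.99)"
)
new_id = conn.execute(
    "SELECT product_id FROM products WHERE product_code = 'PRD001'"
).fetchone()[0]
```

The row that logically "stayed" now has a different product_id — any foreign key that referenced the old ID is orphaned, which is why ON DUPLICATE KEY UPDATE is the safer tool.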
SQL Server uses MERGE, which handles insert, update, and delete in a single statement. It is powerful but verbose, and requires care with concurrency.
Default Values and NULL Handling
Columns that you omit from the INSERT column list receive their DEFAULT value. This is the mechanism behind auto-increment IDs, automatic timestamps, and boolean flags that start as false.
The SQL standard specifies that if a column has no explicit DEFAULT and is nullable, the implicit default is NULL. If it is NOT NULL with no default, the INSERT must provide a value or the statement is rejected.
— ISO/IEC, SQL:2023 Foundation (ISO/IEC 9075-2)
You can also use the DEFAULT keyword explicitly in the VALUES clause:
INSERT INTO orders (customer_id, order_date, status, total_amount)
VALUES (1001, DEFAULT, 'pending', 299.99);
Here order_date gets whatever default the column defines — typically CURRENT_DATE or CURRENT_TIMESTAMP.
Auto-increment columns (PostgreSQL SERIAL/IDENTITY, MySQL AUTO_INCREMENT, SQL Server IDENTITY) should be omitted from the column list entirely. The database generates the value. If you need the generated ID, use PostgreSQL's RETURNING clause, MySQL's LAST_INSERT_ID(), or SQL Server's OUTPUT clause (OUTPUT INSERTED.*).
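Most database drivers also surface the generated ID directly. In Python's stdlib sqlite3, for example, the cursor's lastrowid attribute returns it without a second query (hypothetical table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,  -- auto-generated rowid alias
        customer_id INTEGER
    )
""")

cur = conn.execute("INSERT INTO orders (customer_id) VALUES (1001)")
first_id = cur.lastrowid   # the ID the database just generated

cur = conn.execute("INSERT INTO orders (customer_id) VALUES (1002)")
second_id = cur.lastrowid
```

Capturing the ID from the driver (or via RETURNING) is race-free, unlike re-querying MAX(order_id), which can pick up a concurrent session's insert.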
Inserting NULL into a NOT NULL column raises an error. This is the intended behavior — it protects data integrity. If you see these errors frequently, the issue is usually an application bug (missing required field), not a database problem.
Transaction Safety
When an INSERT depends on another INSERT — an order header and its line items, for example — wrap them in a transaction. If the line items fail, the header should not persist alone.
BEGIN;
INSERT INTO orders (customer_id, order_date, total_amount, status)
VALUES (1001, CURRENT_DATE, 299.97, 'processing');
-- This example is PostgreSQL: currval() reuses the ID just generated for orders.
-- MySQL would capture it instead with SET @order_id = LAST_INSERT_ID();
INSERT INTO order_items (order_id, product_id, quantity, unit_price)
VALUES
(currval('orders_order_id_seq'), 2001, 2, 99.99),
(currval('orders_order_id_seq'), 2002, 1, 99.99);
COMMIT;
If any statement inside the transaction fails, issue a ROLLBACK to discard all changes. The database returns to the state before the BEGIN.
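The commit-or-rollback flow maps directly onto application code. A sketch with stdlib sqlite3 (hypothetical tables): the line-item insert fails a NOT NULL constraint, so the rollback also discards the order header that preceded it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
    );
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL
    );
""")

try:
    cur = conn.execute("INSERT INTO orders (customer_id) VALUES (1001)")
    order_id = cur.lastrowid
    # This line item violates NOT NULL, failing the whole unit of work:
    conn.execute(
        "INSERT INTO order_items (order_id, product_id) VALUES (?, ?)",
        (order_id, None),
    )
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # discard the order header too

orphan_headers = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the rollback the orders table is empty: no header persists without its items, which is exactly the guarantee the transaction exists to provide.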
When inserting data into warehouse tables, knowing the column definitions, data types, and constraints prevents runtime errors. A data catalog makes these definitions searchable. Dawiso's business glossary provides canonical column definitions so teams writing INSERT statements know exactly what format each column expects.
Bulk Loading Alternatives
When INSERT is too slow — loading millions of rows from CSV files, migrating between systems, or populating a data warehouse — databases offer dedicated bulk loading tools that bypass the SQL parser and write directly to storage.
PostgreSQL COPY is the fastest way to load data from a file. COPY ... FROM reads on the database server and requires appropriate file-access privileges; the psql \copy command streams the same format from the client:
COPY products (product_name, category, price, stock_quantity)
FROM '/data/products.csv'
WITH (FORMAT csv, HEADER true);
MySQL LOAD DATA INFILE serves the same purpose (server-side file access is restricted by the secure_file_priv setting; LOAD DATA LOCAL INFILE reads from the client instead):
LOAD DATA INFILE '/data/products.csv'
INTO TABLE products
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(product_name, category, price, stock_quantity);
SQL Server BULK INSERT and the bcp utility handle high-volume loading with options for batch size, error tolerance, and minimal logging.
For all bulk loading methods, disabling indexes during the load and rebuilding them afterward can cut load time by 50–80%. The database otherwise updates every index for every row inserted. Rebuild once at the end instead.
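The drop-load-rebuild sequence can be sketched end to end. SQLite has no "disable index" switch, so this illustration (stdlib sqlite3, hypothetical table and index names) drops and recreates the index around an executemany bulk load; the same three-step shape applies to the server engines:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_events_payload ON events (payload)")

# 1. Drop the index so the load only writes table pages, not index pages.
conn.execute("DROP INDEX idx_events_payload")

# 2. Bulk load. executemany reuses one prepared statement for every row.
rows = [(i, f"payload-{i}") for i in range(10_000)]
conn.executemany("INSERT INTO events (event_id, payload) VALUES (?, ?)", rows)

# 3. Rebuild the index once, over the fully loaded table.
conn.execute("CREATE INDEX idx_events_payload ON events (payload)")

loaded = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
index_names = [
    r[1] for r in conn.execute("PRAGMA index_list('events')").fetchall()
]
```

The rebuild sorts the key column once instead of maintaining the B-tree incrementally for every inserted row, which is where the 50–80% saving comes from.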
When bulk load volumes and transformation logic grow complex enough that raw SQL becomes hard to maintain, ETL tools — or the metadata management layer that tracks the pipelines — become the better path.