Understand SQL Better and Write Faster Queries
If you can predict how the database will execute your query, you can usually make it fast. This guide builds that intuition and gives you concrete, repeatable steps to improve performance in Postgres and MySQL (most ideas apply elsewhere too).
1) Think in sets, not loops
- SQL describes what you want from sets of rows; the planner decides how.
- Replace iterative app logic with set-based operations: one statement beats N queries.
Example (bad N+1):
-- For each user in app code, you fetch their posts separately
SELECT * FROM users LIMIT 100; -- then 100 more queries for posts
Better:
SELECT u.id, u.name, p.id AS post_id, p.title
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.active = true
ORDER BY u.id
LIMIT 100;
2) Use the right schema and types
- Normalize to avoid duplication; denormalize only for read-heavy hotspots with evidence.
- Choose correct types:
- Use integer/bigint for ids, numeric for money, timestamp with time zone for moments-in-time, boolean for flags.
- Avoid storing numbers in text.
- Understand NULL:
COUNT(*) counts rows; COUNT(col) ignores NULLs. WHERE col <> 5 won't match rows where col is NULL.
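A quick illustration of these NULL rules, using a hypothetical one-column table t:

```sql
-- Hypothetical table t(col int) holding the values 1, 5, NULL
SELECT count(*)   FROM t;                 -- 3: counts every row
SELECT count(col) FROM t;                 -- 2: the NULL is skipped
SELECT count(*)   FROM t WHERE col <> 5;  -- 1: NULL <> 5 is unknown, so that row is filtered out
SELECT count(*)   FROM t WHERE col IS DISTINCT FROM 5;  -- 2: NULL-safe comparison (Postgres)
```

MySQL's NULL-safe operator is the spaceship-like `<=>`; negate it (`NOT (col <=> 5)`) for the same effect.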
3) Read execution plans
- Postgres: EXPLAIN (ANALYZE, BUFFERS) <query>
- MySQL: EXPLAIN ANALYZE <query> (8.0+), or EXPLAIN <query>
What to check:
- Estimated vs actual rows (cardinality). Large mismatches point to stale statistics or hard-to-estimate predicates.
- Access: Index Scan/Seek vs Seq/Full Scan.
- Join method: Nested Loop, Hash Join, Merge Join.
- Sorts and materializations: Can you avoid them with indexes?
- Expensive nodes: High time, high rows, or repeated loops.
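To make the checklist concrete, here is a sketch of what Postgres output looks like for a simple indexed lookup; the table, index name, and all numbers are illustrative, not from a real run:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 42;

-- Illustrative output shape (your costs, times, and buffer counts will differ):
-- Index Scan using idx_orders_user_id on orders
--   (cost=0.43..8.45 rows=3 width=64) (actual time=0.021..0.034 rows=2 loops=1)
--   Index Cond: (user_id = 42)
--   Buffers: shared hit=4
```

Compare "rows=3" (the planner's estimate) with "rows=2" (actual): small gaps are fine, order-of-magnitude gaps signal stale stats or a predicate the planner cannot estimate.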
4) Indexing fundamentals
- Indexes speed lookups by ordered access; they cost writes and space.
- Typical index types:
- B-tree: default, good for equality and range.
- Postgres: GIN/GiST for full-text, arrays, trigrams; Hash for equality (rarely needed now).
- Composite indexes: order matters. Put equality-filtered columns first; range and sort columns belong toward the end.
- Covering indexes: include all needed columns to avoid lookups.
  - Postgres: add non-key columns with INCLUDE.
  - MySQL InnoDB: secondary indexes already include the PK; column order still matters for covering reads that avoid back lookups.
- Filtered/partial indexes:
  - Postgres: CREATE INDEX ... WHERE deleted_at IS NULL;
  - MySQL: no partial indexes; approximate with indexed generated columns, prefix indexes, or separate tables/partitions.
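Since MySQL lacks partial indexes, one common approximation is an indexed virtual generated column that is NULL for the rows you want to exclude. A sketch, assuming an orders table with deleted_at and user_id columns:

```sql
-- MySQL 5.7+: the generated column carries user_id only for live rows
ALTER TABLE orders
  ADD COLUMN live_user_id BIGINT
    GENERATED ALWAYS AS (IF(deleted_at IS NULL, user_id, NULL)) VIRTUAL,
  ADD INDEX idx_orders_live_user (live_user_id);

-- Query through the generated column so the index applies
SELECT * FROM orders WHERE live_user_id = 42;
```

The index still stores entries for the NULL rows, so it is not as compact as a true partial index, but equality lookups skip them.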
5) Write sargable predicates
“SARGable” = Search ARGument-able, meaning the engine can use an index.
Do this:
-- Good: function on constant, not column
WHERE created_at >= now() - interval '30 days'
-- Good: left-anchored LIKE with index or trigram/FTS where supported
WHERE email LIKE 'john%'
Avoid this:
-- Bad: function on column blocks index
WHERE date(created_at) = '2025-03-10'
-- Bad: non-anchored pattern needs full scan unless using trigram/FTS
WHERE email LIKE '%john%'
Fixes:
-- Make it a range
WHERE created_at >= '2025-03-10'::date
AND created_at < '2025-03-11'::date
-- Postgres trigram
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON users USING gin (email gin_trgm_ops);
Tips:
- Use BETWEEN a AND b carefully (inclusive on both sides). For timestamps, prefer [start, next_day) half-open ranges.
- Replace OR with UNION ALL if it enables index usage on both branches and avoids broad scans.
- Large IN (...) lists can be fine if indexed; for very large lists, load the values into a temp table and join.
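For example, splitting an OR across two differently indexed columns; this sketch assumes users has separate indexes on email and username:

```sql
-- An OR across two columns often forces a broad scan:
SELECT id FROM users WHERE email = 'jo@example.com' OR username = 'jo';

-- Rewritten so each branch can use its own index:
SELECT id FROM users WHERE email = 'jo@example.com'
UNION
SELECT id FROM users WHERE username = 'jo';
```

UNION deduplicates rows that match both branches at a small sort/hash cost; use UNION ALL when the branches cannot overlap.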
6) Join strategies that scale
- Always join on indexed keys when possible.
- Filter early: push selective WHERE clauses into subqueries/CTEs so fewer rows join.
- Choose the right semi-join: EXISTS often beats IN for correlated checks.
- Prefer JOIN ... ON with proper conditions; avoid JOIN then DISTINCT to fake semi-joins.
Examples:
-- Existence check
SELECT u.*
FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = u.id AND o.state = 'paid'
);
-- Pre-filter before join
WITH active_orders AS (
SELECT id, user_id FROM orders WHERE state = 'paid'
)
SELECT u.id, u.name, ao.id AS order_id
FROM users u
JOIN active_orders ao ON ao.user_id = u.id;
7) Aggregations and windows
- Group by fewer rows: filter first, then aggregate.
- Index on group-by columns can help with grouping and ordered aggregates.
- Window functions save many self-joins and are often faster.
Examples:
-- Running totals per user
SELECT
user_id,
created_at,
amount,
sum(amount) OVER (PARTITION BY user_id ORDER BY created_at) AS running_total
FROM payments
WHERE created_at >= now() - interval '90 days';
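Windows also replace the classic "latest row per group" self-join; a sketch against the same assumed payments table:

```sql
-- Most recent payment per user, without a self-join
SELECT user_id, created_at, amount
FROM (
  SELECT user_id, created_at, amount,
         row_number() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
  FROM payments
) ranked
WHERE rn = 1;
```

Works in Postgres and MySQL 8.0+; an index on (user_id, created_at DESC) lets the engine feed each partition already ordered.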
8) Pagination that doesn’t slow down
- Offset pagination gets slower as offset grows: the engine still scans skipped rows.
- Keyset pagination (a.k.a. seek method) is fast and stable.
-- Slow for large offsets
SELECT * FROM posts ORDER BY id ASC LIMIT 50 OFFSET 100000;
-- Fast: use the last seen id
SELECT * FROM posts WHERE id > $last_id ORDER BY id ASC LIMIT 50;
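When the sort key is not unique, seek on a composite key with the id as tie-breaker. A sketch using row-value comparison, which both Postgres and MySQL accept; $last_created_at and $last_id come from the previous page:

```sql
-- Page through posts newest-first, resuming exactly where the last page ended
SELECT id, created_at, title
FROM posts
WHERE (created_at, id) < ($last_created_at, $last_id)
ORDER BY created_at DESC, id DESC
LIMIT 50;
```

Back it with an index on (created_at, id); if your optimizer handles row constructors poorly, expand the predicate into the equivalent OR/AND form.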
9) Upserts, batching, and the N+1 cure
- Prefer one statement:
  - Postgres: INSERT ... ON CONFLICT (key) DO UPDATE ...
  - MySQL: INSERT ... ON DUPLICATE KEY UPDATE ...
- Batch writes to reduce round trips; use bulk loaders for very large imports:
  - Postgres: COPY table FROM STDIN.
  - MySQL: LOAD DATA INFILE.
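Concretely, a sketch assuming users has a unique index on email:

```sql
-- Postgres
INSERT INTO users (email, name)
VALUES ('ada@example.com', 'Ada')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;

-- MySQL (8.0.19+ row-alias form; older versions write VALUES(name))
INSERT INTO users (email, name)
VALUES ('ada@example.com', 'Ada') AS new
ON DUPLICATE KEY UPDATE name = new.name;
```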
- Replace per-row updates with a single set-based statement.
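The set-based replacement for a per-row update loop looks like this; staging_users is an assumed table holding the incoming changes:

```sql
-- Postgres: one UPDATE joined to the staged rows
UPDATE users u
SET name = s.name
FROM staging_users s
WHERE s.id = u.id;

-- MySQL equivalent with a join
UPDATE users u
JOIN staging_users s ON s.id = u.id
SET u.name = s.name;
```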
10) Transactions, locks, and contention
- Keep transactions short. Avoid holding open transactions in web requests longer than necessary.
- Choose isolation level intentionally.
- Use lock options to avoid waiting forever:
  - Postgres: SELECT ... FOR UPDATE NOWAIT, or SKIP LOCKED for queues.
  - MySQL 8.0: the same NOWAIT and SKIP LOCKED modifiers are available.
- Watch for hot rows (counters, global settings). Shard counters or use batch increments.
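The queue pattern in full, sketched against an assumed jobs table:

```sql
-- Each worker claims up to 10 pending jobs without blocking on other workers
BEGIN;
SELECT id
FROM jobs
WHERE state = 'pending'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;
-- mark the claimed ids as running, do the work, then
COMMIT;
```

Locked rows are simply skipped rather than waited on, so concurrent workers never serialize on the head of the queue.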
11) Stats, maintenance, and partitioning
- Fresh statistics = good plans.
  - Postgres: ANALYZE runs automatically via autovacuum; raise ALTER TABLE ... ALTER COLUMN ... SET STATISTICS for skewed columns.
  - MySQL: persistent statistics in 8.0; run ANALYZE TABLE when needed.
- Manage bloat and fragmentation:
  - Postgres: autovacuum handles routine cleanup; consider VACUUM (FULL) or REINDEX during maintenance windows if needed.
  - MySQL: OPTIMIZE TABLE for some engines.
- Partition large, time-based tables and ensure pruning:
  - Postgres: declarative partitioning with range/list; index per partition.
  - MySQL: range/list/hash partitions; verify pruning in EXPLAIN.
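A minimal Postgres declarative-partitioning sketch; the events table and its columns are assumed:

```sql
CREATE TABLE events (
  id         bigint GENERATED ALWAYS AS IDENTITY,
  org_id     bigint NOT NULL,
  created_at timestamptz NOT NULL,
  payload    jsonb
) PARTITION BY RANGE (created_at);

-- One partition per month; queries with a created_at range prune to it
CREATE TABLE events_2025_03 PARTITION OF events
  FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

-- Index the partition (PG11+ can also create this on the parent to propagate)
CREATE INDEX ON events_2025_03 (org_id, created_at);
```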
12) Practical indexing patterns
- Equality then range:
-- Query
WHERE org_id = $1 AND created_at >= $from AND created_at < $to
-- Index (Postgres/MySQL)
CREATE INDEX idx_events_org_created_at ON events (org_id, created_at);
- Covering read-mostly table:
-- Query
SELECT email, name FROM users WHERE lower(email) = lower($1);
-- Postgres functional + include
CREATE INDEX idx_users_email_lower ON users ((lower(email))) INCLUDE (name);
- Partial/filtered index for active rows:
-- Postgres
CREATE INDEX idx_orders_active ON orders (user_id, created_at) WHERE state = 'paid';
13) Measuring and iterating
- Use the right tools:
  - Postgres: EXPLAIN (ANALYZE, BUFFERS), pg_stat_statements, auto_explain, pg_stat_io (PG16+), pg_stat_activity.
  - MySQL: EXPLAIN ANALYZE, the Performance Schema, the Slow Query Log.
- Compare plans before/after index or rewrite. Keep fixtures representative of production data skew.
- Track p95/p99 latencies and rows examined vs returned.
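For example, finding the biggest time sinks with pg_stat_statements; the column names below are the PG13+ ones (older versions call them total_time and mean_time):

```sql
-- Requires pg_stat_statements in shared_preload_libraries, then:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query,
       calls,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```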
14) A 10-minute optimization checklist
Before writing:
- What is the exact question? Which rows and which columns?
- Can you filter or aggregate earlier? Can you avoid sorting?
- Which columns are in predicates, joins, group by, order by?
While writing:
- Use sargable predicates; avoid functions on columns.
- Return only needed columns; avoid SELECT *.
- Prefer keyset pagination over large offsets.
After writing:
- EXPLAIN the query. Is it using an index? Are estimates close to actuals?
- Add or adjust a composite or covering index if justified.
- Re-check plan; confirm rows examined dropped and latency improved.
15) Mini lab: from 2.3s to 120ms
Original:
SELECT *
FROM events
WHERE date(created_at) = '2025-03-10'
AND org_id = 42
ORDER BY created_at DESC
LIMIT 100;
Problems: function on column, SELECT *, missing composite index.
Fix:
-- Half-open time window + limited columns
SELECT id, created_at, type, payload
FROM events
WHERE org_id = 42
AND created_at >= '2025-03-10'::date
AND created_at < '2025-03-11'::date
ORDER BY created_at DESC
LIMIT 100;
-- Index (Postgres; INCLUDE every selected column for an index-only scan)
CREATE INDEX idx_events_org_created_at ON events (org_id, created_at DESC) INCLUDE (id, type, payload);
Result: index scan on (org_id, created_at), no sort, covering read.
Key takeaways:
- Make the plan predictable: sargable predicates + right composite indexes.
- Filter early, aggregate smartly, paginate with seeks.
- Measure with EXPLAIN ANALYZE, iterate, and keep stats fresh.