Understand SQL Better and Write Faster Queries
If you can predict how the database will execute your query, you can usually make it fast. This guide builds that intuition and gives you concrete, repeatable steps to improve performance in Postgres and MySQL (most ideas apply elsewhere too).
1) Think in sets, not loops
- SQL describes what you want from sets of rows; the planner decides how.
- Replace iterative app logic with set-based operations: one statement beats N queries.
Example (bad N+1):
-- For each user in app code, you fetch their posts separately
SELECT * FROM users LIMIT 100; -- then 100 more queries for posts
Better:
SELECT u.id, u.name, p.id AS post_id, p.title
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.active = true
ORDER BY u.id
LIMIT 100;
2) Use the right schema and types
- Normalize to avoid duplication; denormalize only for read-heavy hotspots with evidence.
- Choose correct types:
- Use integer/bigint for ids, numeric for money, timestamp with time zone for moments-in-time, boolean for flags.
- Avoid storing numbers in text.
- Understand NULL:
COUNT(*) counts rows; COUNT(col) ignores NULLs. WHERE col <> 5 won't match rows where col is NULL.
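A quick illustration of these NULL rules, using a hypothetical one-column table t:

```sql
-- Hypothetical table t(col int) holding the values 1, 5, NULL
SELECT count(*)   FROM t;                 -- 3: counts every row
SELECT count(col) FROM t;                 -- 2: the NULL is skipped
SELECT count(*)   FROM t WHERE col <> 5;  -- 1: NULL <> 5 is unknown, so that row is filtered out
SELECT count(*)   FROM t WHERE col IS DISTINCT FROM 5;  -- 2: NULL-safe comparison (Postgres)
```

MySQL's NULL-safe operator is the spaceship-like `<=>`; negate it (`NOT (col <=> 5)`) for the same effect.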
3) Read execution plans
- Postgres: EXPLAIN (ANALYZE, BUFFERS) <query>
- MySQL: EXPLAIN ANALYZE <query> (8.0+), or EXPLAIN <query>
What to check:
- Estimated vs actual rows (cardinality). Large mismatches point to stale statistics or hard-to-estimate predicates.
- Access: Index Scan/Seek vs Seq/Full Scan.
- Join method: Nested Loop, Hash Join, Merge Join.
- Sorts and materializations: Can you avoid them with indexes?
- Expensive nodes: High time, high rows, or repeated loops.
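To make the checklist concrete, here is a sketch of what Postgres output looks like for a simple indexed lookup; the table, index name, and all numbers are illustrative, not from a real run:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 42;

-- Illustrative output shape (your costs, times, and buffer counts will differ):
-- Index Scan using idx_orders_user_id on orders
--   (cost=0.43..8.45 rows=3 width=64) (actual time=0.021..0.034 rows=2 loops=1)
--   Index Cond: (user_id = 42)
--   Buffers: shared hit=4
```

Compare "rows=3" (the planner's estimate) with "rows=2" (actual): small gaps are fine, order-of-magnitude gaps signal stale stats or a predicate the planner cannot estimate.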
4) Indexing fundamentals
- Indexes speed lookups by ordered access; they cost writes and space.
- Typical index types:
- B-tree: default, good for equality and range.
- Postgres: GIN/GiST for full-text, arrays, trigrams; Hash for equality (rarely needed now).
- Composite indexes: order matters. Put equality-filtered columns first; range and sort columns belong toward the end.
- Covering indexes: include all needed columns to avoid lookups.
  - Postgres: add non-key columns with INCLUDE.
  - MySQL InnoDB: secondary indexes already include the PK; column order still matters for covering reads that avoid back lookups.
- Filtered/partial indexes:
  - Postgres: CREATE INDEX ... WHERE deleted_at IS NULL;
  - MySQL: no partial indexes; approximate with indexed generated columns, prefix indexes, or separate tables/partitions.
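Since MySQL lacks partial indexes, one common approximation is an indexed virtual generated column that is NULL for the rows you want to exclude. A sketch, assuming an orders table with deleted_at and user_id columns:

```sql
-- MySQL 5.7+: the generated column carries user_id only for live rows
ALTER TABLE orders
  ADD COLUMN live_user_id BIGINT
    GENERATED ALWAYS AS (IF(deleted_at IS NULL, user_id, NULL)) VIRTUAL,
  ADD INDEX idx_orders_live_user (live_user_id);

-- Query through the generated column so the index applies
SELECT * FROM orders WHERE live_user_id = 42;
```

The index still stores entries for the NULL rows, so it is not as compact as a true partial index, but equality lookups skip them.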
5) Write sargable predicates
“SARGable” = Search ARGument-able, meaning the engine can use an index.
Do this:
-- Good: function on constant, not column
WHERE created_at >= now() - interval '30 days'
-- Good: left-anchored LIKE with index or trigram/FTS where supported
WHERE email LIKE 'john%'
Avoid this:
-- Bad: function on column blocks index
WHERE date(created_at) = '2025-03-10'
-- Bad: non-anchored pattern needs full scan unless using trigram/FTS
WHERE email LIKE '%john%'
Fixes:
-- Make it a range
WHERE created_at >= '2025-03-10'::date
AND created_at < '2025-03-11'::date
-- Postgres trigram
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON users USING gin (email gin_trgm_ops);
Tips:
- Use BETWEEN a AND b carefully (inclusive on both sides). For timestamps, prefer [start, next_day) half-open ranges.
- Replace OR with UNION ALL if it enables index usage on both branches and avoids broad scans.
- Large IN (...) lists can be fine if indexed; for very large lists, load the values into a temp table and join.
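For example, splitting an OR across two differently indexed columns; this sketch assumes users has separate indexes on email and username:

```sql
-- An OR across two columns often forces a broad scan:
SELECT id FROM users WHERE email = 'jo@example.com' OR username = 'jo';

-- Rewritten so each branch can use its own index:
SELECT id FROM users WHERE email = 'jo@example.com'
UNION
SELECT id FROM users WHERE username = 'jo';
```

UNION deduplicates rows that match both branches at a small sort/hash cost; use UNION ALL when the branches cannot overlap.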
6) Join strategies that scale
- Always join on indexed keys when possible.
- Filter early: push selective WHERE clauses into subqueries/CTEs so fewer rows join.
- Choose the right semi-join: EXISTS often beats IN for correlated checks.
- Prefer JOIN ... ON with proper conditions; avoid JOIN then DISTINCT to fake semi-joins.
Examples:
-- Existence check
SELECT u.*
FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = u.id AND o.state = 'paid'
);
-- Pre-filter before join
WITH active_orders AS (
SELECT id, user_id FROM orders WHERE state = 'paid'
)
SELECT u.id, u.name, ao.id AS order_id
FROM users u
JOIN active_orders ao ON ao.user_id = u.id;
7) Aggregations and windows
- Group by fewer rows: filter first, then aggregate.
- Index on group-by columns can help with grouping and ordered aggregates.
- Window functions save many self-joins and are often faster.
Examples:
-- Running totals per user
SELECT
user_id,
created_at,
amount,
sum(amount) OVER (PARTITION BY user_id ORDER BY created_at) AS running_total
FROM payments
WHERE created_at >= now() - interval '90 days';
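Windows also replace the classic "latest row per group" self-join; a sketch against the same assumed payments table:

```sql
-- Most recent payment per user, without a self-join
SELECT user_id, created_at, amount
FROM (
  SELECT user_id, created_at, amount,
         row_number() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
  FROM payments
) ranked
WHERE rn = 1;
```

Works in Postgres and MySQL 8.0+; an index on (user_id, created_at DESC) lets the engine feed each partition already ordered.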
8) Pagination that doesn’t slow down
- Offset pagination gets slower as offset grows: the engine still scans skipped rows.
- Keyset pagination (a.k.a. seek method) is fast and stable.
-- Slow for large offsets
SELECT * FROM posts ORDER BY id ASC LIMIT 50 OFFSET 100000;
-- Fast: use the last seen id
SELECT * FROM posts WHERE id > $last_id ORDER BY id ASC LIMIT 50;
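When the sort key is not unique, seek on a composite key with the id as tie-breaker. A sketch using row-value comparison, which both Postgres and MySQL accept; $last_created_at and $last_id come from the previous page:

```sql
-- Page through posts newest-first, resuming exactly where the last page ended
SELECT id, created_at, title
FROM posts
WHERE (created_at, id) < ($last_created_at, $last_id)
ORDER BY created_at DESC, id DESC
LIMIT 50;
```

Back it with an index on (created_at, id); if your optimizer handles row constructors poorly, expand the predicate into the equivalent OR/AND form.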
9) Upserts, batching, and the N+1 cure
- Prefer one statement:
  - Postgres: INSERT ... ON CONFLICT (key) DO UPDATE ...
  - MySQL: INSERT ... ON DUPLICATE KEY UPDATE ...
- Batch writes to reduce round trips; use bulk loaders for very large imports:
  - Postgres: COPY table FROM STDIN.
  - MySQL: LOAD DATA INFILE.
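Concretely, a sketch assuming users has a unique index on email:

```sql
-- Postgres
INSERT INTO users (email, name)
VALUES ('ada@example.com', 'Ada')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;

-- MySQL (8.0.19+ row-alias form; older versions write VALUES(name))
INSERT INTO users (email, name)
VALUES ('ada@example.com', 'Ada') AS new
ON DUPLICATE KEY UPDATE name = new.name;
```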
- Replace per-row updates with a single set-based statement.
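The set-based replacement for a per-row update loop looks like this; staging_users is an assumed table holding the incoming changes:

```sql
-- Postgres: one UPDATE joined to the staged rows
UPDATE users u
SET name = s.name
FROM staging_users s
WHERE s.id = u.id;

-- MySQL equivalent with a join
UPDATE users u
JOIN staging_users s ON s.id = u.id
SET u.name = s.name;
```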
10) Transactions, locks, and contention
- Keep transactions short. Avoid holding open transactions in web requests longer than necessary.
- Choose isolation level intentionally.
- Use lock options to avoid waiting forever:
  - Postgres: SELECT ... FOR UPDATE NOWAIT, or SKIP LOCKED for queues.
  - MySQL 8.0: the same NOWAIT and SKIP LOCKED modifiers are available.
- Watch for hot rows (counters, global settings). Shard counters or use batch increments.
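The queue pattern in full, sketched against an assumed jobs table:

```sql
-- Each worker claims up to 10 pending jobs without blocking on other workers
BEGIN;
SELECT id
FROM jobs
WHERE state = 'pending'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;
-- mark the claimed ids as running, do the work, then
COMMIT;
```

Locked rows are simply skipped rather than waited on, so concurrent workers never serialize on the head of the queue.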
11) Stats, maintenance, and partitioning
- Fresh statistics = good plans.
  - Postgres: ANALYZE runs automatically via autovacuum; raise ALTER TABLE ... ALTER COLUMN ... SET STATISTICS for skewed columns.
  - MySQL: persistent statistics in 8.0; run ANALYZE TABLE when needed.
- Manage bloat and fragmentation:
  - Postgres: autovacuum handles routine cleanup; consider VACUUM (FULL) or REINDEX during maintenance windows if needed.
  - MySQL: OPTIMIZE TABLE for some engines.
- Partition large, time-based tables and ensure pruning:
  - Postgres: declarative partitioning with range/list; index per partition.
  - MySQL: range/list/hash partitions; verify pruning in EXPLAIN.
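A minimal Postgres declarative-partitioning sketch; the events table and its columns are assumed:

```sql
CREATE TABLE events (
  id         bigint GENERATED ALWAYS AS IDENTITY,
  org_id     bigint NOT NULL,
  created_at timestamptz NOT NULL,
  payload    jsonb
) PARTITION BY RANGE (created_at);

-- One partition per month; queries with a created_at range prune to it
CREATE TABLE events_2025_03 PARTITION OF events
  FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

-- Index the partition (PG11+ can also create this on the parent to propagate)
CREATE INDEX ON events_2025_03 (org_id, created_at);
```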
12) Practical indexing patterns
- Equality then range:
-- Query
WHERE org_id = $1 AND created_at >= $from AND created_at < $to
-- Index (Postgres/MySQL)
CREATE INDEX idx_events_org_created_at ON events (org_id, created_at);
- Covering read-mostly table:
-- Query
SELECT email, name FROM users WHERE lower(email) = lower($1);
-- Postgres functional + include
CREATE INDEX idx_users_email_lower ON users ((lower(email))) INCLUDE (name);
- Partial/filtered index for active rows:
-- Postgres
CREATE INDEX idx_orders_active ON orders (user_id, created_at) WHERE state = 'paid';
13) Measuring and iterating
- Use the right tools:
  - Postgres: EXPLAIN (ANALYZE, BUFFERS), pg_stat_statements, auto_explain, pg_stat_io (PG16+), pg_stat_activity.
  - MySQL: EXPLAIN ANALYZE, the Performance Schema, the Slow Query Log.
- Compare plans before/after index or rewrite. Keep fixtures representative of production data skew.
- Track p95/p99 latencies and rows examined vs returned.
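For example, finding the biggest time sinks with pg_stat_statements; the column names below are the PG13+ ones (older versions call them total_time and mean_time):

```sql
-- Requires pg_stat_statements in shared_preload_libraries, then:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query,
       calls,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```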
14) A 10-minute optimization checklist
Before writing:
- What is the exact question? Which rows and which columns?
- Can you filter or aggregate earlier? Can you avoid sorting?
- Which columns are in predicates, joins, group by, order by?
While writing:
- Use sargable predicates; avoid functions on columns.
- Return only needed columns; avoid SELECT *.
- Prefer keyset pagination over large offsets.
After writing:
- EXPLAIN the query. Is it using an index? Are estimates close to actuals?
- Add or adjust a composite or covering index if justified.
- Re-check plan; confirm rows examined dropped and latency improved.
15) Mini lab: from 2.3s to 120ms
Original:
SELECT *
FROM events
WHERE date(created_at) = '2025-03-10'
AND org_id = 42
ORDER BY created_at DESC
LIMIT 100;
Problems: function on column, SELECT *, missing composite index.
Fix:
-- Half-open time window + limited columns
SELECT id, created_at, type, payload
FROM events
WHERE org_id = 42
AND created_at >= '2025-03-10'::date
AND created_at < '2025-03-11'::date
ORDER BY created_at DESC
LIMIT 100;
-- Index (Postgres; INCLUDE every selected column for an index-only scan)
CREATE INDEX idx_events_org_created_at ON events (org_id, created_at DESC) INCLUDE (id, type, payload);
Result: index scan on (org_id, created_at), no sort, covering read.
Key takeaways:
- Make the plan predictable: sargable predicates + right composite indexes.
- Filter early, aggregate smartly, paginate with seeks.
- Measure with EXPLAIN ANALYZE, iterate, and keep stats fresh.