Database Indexing — How It Actually Works Under the Hood
Most developers know indexes make queries fast. Few understand why. This post breaks down B-Tree indexing, query planning, and the exact mechanics that turn a 2-second query into a 2ms one.
The Question That Should Bother You
You run CREATE INDEX idx_email ON users(email) and your query goes from 2 seconds to 2 milliseconds. But what actually changed? What data structure was built? How does the database "know" to use it?
If you can't answer those questions, you're cargo-culting your way through performance optimization. Let's fix that.
What Happens Without an Index
When you run:
SELECT * FROM users WHERE email = 'yousuf@example.com';Without an index, the database does a sequential scan -- it reads every single row in the table and checks if email matches. For a table with 10 million rows, that's 10 million comparisons.
Seq Scan on users (cost=0.00..184320.00 rows=1 width=72) Filter: (email = 'yousuf@example.com'::text) Rows Removed by Filter: 9999999 Planning Time: 0.08 ms Execution Time: 2134.52 ms
That Seq Scan means "I checked every row." The Rows Removed by Filter: 9999999 tells you exactly how wasteful this is.
What a B-Tree Index Actually Is
When you create an index, the database builds a B-Tree (Balanced Tree) -- a sorted, self-balancing tree data structure designed for disk-based lookups.
A B-Tree stores keys in sorted order across nodes. Each node can have multiple keys and children. The tree stays balanced, meaning every leaf is at the same depth. This guarantees that finding any value takes O(log n) disk reads instead of O(n).
Interactive B-Tree Lookup
Click "Trace Lookup" to watch the database navigate the tree to find an email starting with "Y":
Three hops. Not 10 million row comparisons -- just 3 node lookups to find the exact range. The leaf node contains a pointer to the actual row on disk.
The Lookup Process
Start at root
The query planner sees WHERE email = 'yousuf@...'. It starts at the root node. 'Y' > 'M', so go right.
Navigate internal nodes
At the next level: 'Y' > 'R' and 'Y' > 'W', so go to the rightmost child.
Reach leaf node
The leaf node contains sorted email values starting with W-Z. Binary search within the leaf finds the exact entry. The entry contains a pointer (tuple ID) to the actual row on disk.
Fetch the row
The database follows the pointer to the heap (actual table data) and reads just that one row. Total disk reads: 3-4 instead of scanning millions.
Index Scan using idx_email on users (cost=0.43..8.45 rows=1 width=72) Index Cond: (email = 'yousuf@example.com'::text) Planning Time: 0.12 ms Execution Time: 0.034 ms
From 2134ms to 0.034ms. That's a ~63,000x speedup, and now you know exactly why.
When the Database Doesn't Use Your Index
Creating an index doesn't guarantee it will be used. The query planner makes a cost-based decision every time.
⚠️ Indexes aren't magic -- the planner can ignore them
The database estimates whether using the index is cheaper than a sequential scan. If your query returns a large percentage of the table, a sequential scan can actually be faster (because sequential disk reads are cheaper than random seeks).
Common reasons your index gets ignored:
-
Low selectivity --
WHERE status = 'active'when 95% of rows are active. The index adds overhead without filtering much. -
Functions on indexed columns --
WHERE LOWER(email) = '...'won't use an index onemail. You need a functional index:CREATE INDEX idx_lower_email ON users(LOWER(email)). -
Type mismatch --
WHERE id = '42'(string) whenidis an integer. The implicit cast prevents index usage. -
OR conditions across columns --
WHERE email = '...' OR name = '...'can't efficiently use a single-column index. -
Small tables -- For tables with < ~1000 rows, sequential scan is faster. The planner knows this.
Types of Indexes You Should Know
| Index Type | Best For | How It Works |
|---|---|---|
| B-Tree (default) | Equality and range queries (=, <, >, BETWEEN) | Sorted balanced tree. O(log n) lookup. |
| Hash | Exact equality only (=) | Hash table. O(1) lookup but no range support. |
| GIN | Full-text search, JSONB, arrays | Inverted index. Maps values to rows that contain them. |
| GiST | Geometric/spatial data, range types | Generalized search tree. Supports overlap, containment. |
| BRIN | Very large tables with naturally ordered data | Block Range Index. Stores min/max per block. Tiny size. |
Composite Indexes: Column Order Matters
CREATE INDEX idx_user_lookup ON users(status, created_at);This index works like a phone book: sorted first by status, then by created_at within each status group.
🔴 The leftmost prefix rule
A composite index on (A, B, C) can be used for queries on A, A+B, or A+B+C. It cannot be used for queries on just B or just C alone. The columns must be used from left to right. Always put the most selective (most filtered) column first.
-- Uses the index
SELECT * FROM users WHERE status = 'active' AND created_at > '2026-01-01';
-- Uses the index (leftmost prefix)
SELECT * FROM users WHERE status = 'active';
-- Cannot use this index efficiently
SELECT * FROM users WHERE created_at > '2026-01-01';The Cost of Indexes
Indexes aren't free. Every INSERT, UPDATE, and DELETE must also update all relevant indexes.
If a table has 5 indexes and you insert a row, the database performs 6 writes: 1 to the table + 1 to each index. For write-heavy workloads (logging, event streams), excessive indexing can cut your write throughput in half. Profile before you add indexes.
Rules of Thumb
- Index columns that appear in WHERE, JOIN, and ORDER BY clauses
- Don't index columns with low cardinality (boolean, status with 2-3 values) unless combined in a composite index
- Monitor unused indexes --
pg_stat_user_indexesin PostgreSQL shows index usage stats - Partial indexes save space --
CREATE INDEX ... WHERE status = 'active'only indexes active rows
Key Takeaway
✅ Understanding beats memorizing
An index is a B-Tree that maps column values to row locations on disk. The query planner uses cost estimation to decide whether it's cheaper to use the index or scan the table. Your job is to create indexes that the planner will want to use: high selectivity, correct column order, matching data types.
Always run EXPLAIN ANALYZE before and after adding indexes. Let the numbers tell you what works -- not assumptions.
References
- PostgreSQL: Index Types -- Official documentation on all supported index types
- Use The Index, Luke -- The best free resource on SQL indexing and query performance
- B-Tree Visualization -- Interactive B-Tree visualizer from USF