blog/backend-engineering/database-indexing-how-it-works

Database Indexing — How It Actually Works Under the Hood

Most developers know indexes make queries fast. Few understand why. This post breaks down B-Tree indexing, query planning, and the exact mechanics that turn a 2-second query into a 2ms one.

February 5, 2026·7 min read·Updated Feb 12, 2026

The Question That Should Bother You

You run CREATE INDEX idx_email ON users(email) and your query goes from 2 seconds to 2 milliseconds. But what actually changed? What data structure was built? How does the database "know" to use it?

If you can't answer those questions, you're cargo-culting your way through performance optimization. Let's fix that.

What Happens Without an Index

When you run:

SELECT * FROM users WHERE email = 'yousuf@example.com';

Without an index, the database does a sequential scan -- it reads every single row in the table and checks if email matches. For a table with 10 million rows, that's 10 million comparisons.

PostgreSQL EXPLAIN

$ EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'yousuf@example.com';

Seq Scan on users  (cost=0.00..184320.00 rows=1 width=72)
Filter: (email = 'yousuf@example.com'::text)
Rows Removed by Filter: 9999999
Planning Time: 0.08 ms
Execution Time: 2134.52 ms

That Seq Scan means "I checked every row." The Rows Removed by Filter: 9999999 tells you exactly how wasteful this is.

What a B-Tree Index Actually Is

When you create an index, the database builds a B-Tree (Balanced Tree) -- a sorted, self-balancing tree data structure designed for disk-based lookups.

📌 B-Tree: The Core Idea

A B-Tree stores keys in sorted order across nodes. Each node can have multiple keys and children. The tree stays balanced, meaning every leaf is at the same depth. This guarantees that finding any value takes O(log n) disk reads instead of O(n).

Interactive B-Tree Lookup

Click "Trace Lookup" to watch the database navigate the tree to find an email starting with "Y":

B-Tree Index Traversal — Finding 'yousuf@...'

D-H

A-C

D-G

H-L

R-W

M-Q

R-V

W-Z

Path:M→R-W→W-Z(3 lookups = O(log n))

Three hops. Not 10 million row comparisons -- just 3 node lookups to find the exact range. The leaf node contains a pointer to the actual row on disk.

The Lookup Process

Start at root

The query planner sees WHERE email = 'yousuf@...'. It starts at the root node. 'Y' > 'M', so go right.

Navigate internal nodes

At the next level: 'Y' > 'R' and 'Y' > 'W', so go to the rightmost child.

Reach leaf node

The leaf node contains sorted email values starting with W-Z. Binary search within the leaf finds the exact entry. The entry contains a pointer (tuple ID) to the actual row on disk.

Fetch the row

The database follows the pointer to the heap (actual table data) and reads just that one row. Total disk reads: 3-4 instead of scanning millions.

PostgreSQL EXPLAIN — With Index

$ EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'yousuf@example.com';

Index Scan using idx_email on users  (cost=0.43..8.45 rows=1 width=72)
Index Cond: (email = 'yousuf@example.com'::text)
Planning Time: 0.12 ms
Execution Time: 0.034 ms

Sequential Scan vs Index Scan — 10M Rows

Sequential Scan2134ms

Index Scan0.034ms

From 2134ms to 0.034ms. That's a ~63,000x speedup, and now you know exactly why.

When the Database Doesn't Use Your Index

Creating an index doesn't guarantee it will be used. The query planner makes a cost-based decision every time.

⚠️ Indexes aren't magic -- the planner can ignore them

The database estimates whether using the index is cheaper than a sequential scan. If your query returns a large percentage of the table, a sequential scan can actually be faster (because sequential disk reads are cheaper than random seeks).

Common reasons your index gets ignored:

Low selectivity -- WHERE status = 'active' when 95% of rows are active. The index adds overhead without filtering much.
Functions on indexed columns -- WHERE LOWER(email) = '...' won't use an index on email. You need a functional index: CREATE INDEX idx_lower_email ON users(LOWER(email)).
Type mismatch -- WHERE id = '42' (string) when id is an integer. The implicit cast prevents index usage.
OR conditions across columns -- WHERE email = '...' OR name = '...' can't efficiently use a single-column index.
Small tables -- For tables with < ~1000 rows, sequential scan is faster. The planner knows this.

Types of Indexes You Should Know

Index Type	Best For	How It Works
B-Tree (default)	Equality and range queries (=, <, >, BETWEEN)	Sorted balanced tree. O(log n) lookup.
Hash	Exact equality only (=)	Hash table. O(1) lookup but no range support.
GIN	Full-text search, JSONB, arrays	Inverted index. Maps values to rows that contain them.
GiST	Geometric/spatial data, range types	Generalized search tree. Supports overlap, containment.
BRIN	Very large tables with naturally ordered data	Block Range Index. Stores min/max per block. Tiny size.

Composite Indexes: Column Order Matters

CREATE INDEX idx_user_lookup ON users(status, created_at);

This index works like a phone book: sorted first by status, then by created_at within each status group.

🔴 The leftmost prefix rule

A composite index on (A, B, C) can be used for queries on A, A+B, or A+B+C. It cannot be used for queries on just B or just C alone. The columns must be used from left to right. Always put the most selective (most filtered) column first.

-- Uses the index
SELECT * FROM users WHERE status = 'active' AND created_at > '2026-01-01';
 
-- Uses the index (leftmost prefix)
SELECT * FROM users WHERE status = 'active';
 
-- Cannot use this index efficiently
SELECT * FROM users WHERE created_at > '2026-01-01';

The Cost of Indexes

Indexes aren't free. Every INSERT, UPDATE, and DELETE must also update all relevant indexes.

📌 Write Amplification

If a table has 5 indexes and you insert a row, the database performs 6 writes: 1 to the table + 1 to each index. For write-heavy workloads (logging, event streams), excessive indexing can cut your write throughput in half. Profile before you add indexes.

Write Cost vs Number of Indexes

0 indexes (table only)1x

2 indexes3x

5 indexes6x

10 indexes11x

Rules of Thumb

Index columns that appear in WHERE, JOIN, and ORDER BY clauses
Don't index columns with low cardinality (boolean, status with 2-3 values) unless combined in a composite index
Monitor unused indexes -- pg_stat_user_indexes in PostgreSQL shows index usage stats
Partial indexes save space -- CREATE INDEX ... WHERE status = 'active' only indexes active rows

Key Takeaway

✅ Understanding beats memorizing

An index is a B-Tree that maps column values to row locations on disk. The query planner uses cost estimation to decide whether it's cheaper to use the index or scan the table. Your job is to create indexes that the planner will want to use: high selectivity, correct column order, matching data types.

Always run EXPLAIN ANALYZE before and after adding indexes. Let the numbers tell you what works -- not assumptions.

References

PostgreSQL: Index Types -- Official documentation on all supported index types
Use The Index, Luke -- The best free resource on SQL indexing and query performance
B-Tree Visualization -- Interactive B-Tree visualizer from USF