dhy@ironhide: ~/site
dhy@ironhide:~/site$cat header.html
_____ _ _ _ _ | __ \| | | | | | | | | | | |_| | | | | | | | | _ | |_| | | |__| | | | | _ | |_____/|_| |_|_| |_| ~/dhy.tr — personal notes & technical writing
dhy@ironhide:~/site$ls -la *.md

PostgreSQL Index Types: Which to Use When

The Problem

Last week, a ILIKE '%keyword%' query on a 400-million-row log table took 47 seconds. There was a B-tree index, but PostgreSQL wasn't using it. So I had to learn all index types from scratch.

This article compares PostgreSQL's 6 different index types with real queries. Unlike online docs that say "use this index for this pattern" and move on, I'll test each one with a real problem.


PostgreSQL Index Types

As of PostgreSQL 17, there are 6 built-in index types:

Index Type Default Operator Class Best At
B-tree yes Equality, range, sorting
Hash no Equality only (=)
GiST no Geometry, full-text, special types
GIN no Arrays, full-text, JSON
SP-GiST no Partitioned data, IP ranges
BRIN no Sequential large tables

Let's test them all on a 10 million row test table.


Test Environment

CREATE TABLE logs (
    id BIGSERIAL PRIMARY KEY,
    created_at TIMESTAMPTZ DEFAULT now(),
    level TEXT,
    message TEXT,
    metadata JSONB,
    tags TEXT[]
);

-- Insert 10 million rows
INSERT INTO logs (created_at, level, message, metadata, tags)
SELECT
    now() - (random() * INTERVAL '365 days'),
    (ARRAY['INFO', 'WARN', 'ERROR', 'DEBUG'])[floor(random() * 4 + 1)],
    'Log message ' || generate_series,
    jsonb_build_object(
        'user_id', floor(random() * 100000),
        'duration_ms', random() * 5000,
        'endpoint', '/api/' || floor(random() * 50)
    ),
    ARRAY[
        (ARRAY['auth', 'payment', 'search', 'admin'])[floor(random() * 4 + 1)],
        (ARRAY['slow', 'fast', 'timeout'])[floor(random() * 3 + 1)]
    ]
FROM generate_series(1, 10000000);

VACUUM ANALYZE logs;

1. B-tree — The Standard Choice

B-tree is the default type you get when you write CREATE INDEX. It covers equality, range (BETWEEN, <, >), ORDER BY, and DISTINCT operations.

When it works

-- B-tree works great
CREATE INDEX idx_logs_created_at ON logs (created_at);

EXPLAIN ANALYZE
SELECT * FROM logs
WHERE created_at BETWEEN '2026-01-01' AND '2026-02-01';
-- Index Scan, ~2ms
-- Also great for ORDER BY + LIMIT combination
EXPLAIN ANALYZE
SELECT * FROM logs ORDER BY created_at DESC LIMIT 50;
-- Index Scan Backward, ~0.3ms

When it breaks

-- ILIKE '%xyz%' — B-tree can't be used because pattern isn't left-anchored
EXPLAIN ANALYZE
SELECT * FROM logs WHERE message ILIKE '%timeout%';
-- Seq Scan, 47 seconds ❌

-- Function-wrapped column — index is skipped
EXPLAIN ANALYZE
SELECT * FROM logs WHERE LOWER(level) = 'error';
-- Seq Scan

When B-tree breaks, you need other index types.


2. Hash — Faster Than B-tree for Equality, But Rarely Used

Hash indexes only support the = operator. No sorting, no range, no ORDER BY. Before PostgreSQL 10, they weren't even WAL-logged, so many people still think "hash indexes are unsafe." They're not anymore.

CREATE INDEX idx_logs_message_hash ON logs USING hash (message);

-- About 20% faster than B-tree for this query
EXPLAIN ANALYZE
SELECT * FROM logs WHERE message = 'Log message 5000000';
-- Hash Index Scan, ~0.08ms vs B-tree 0.10ms

The difference is small but even at 10 million rows — the advantage grows on larger tables. Still, due to limitations, B-tree is preferred in most scenarios.

My usage recommendation: Only consider hash indexes for tables where "a single column, equality queries only, never sorted." Rare in practice.


3. BRIN — The Hidden Weapon for Large Tables

Block Range INdex (BRIN) indexes physical page blocks, not each row. It keeps summary info like "created_at values in this block are between January 2026 and March 2026." Because of this, it's 1000x smaller than B-tree.

Real-life scenario

-- B-tree on 10M rows: ~300MB
-- BRIN on same column: ~200KB (YES, KILOBYTES)

CREATE INDEX idx_logs_created_at_brin ON logs USING brin (created_at);
-- See BRIN index size
SELECT pg_size_pretty(pg_relation_size('idx_logs_created_at_brin'));
-- 200 kB

-- B-tree size
SELECT pg_size_pretty(pg_relation_size('idx_logs_created_at'));
-- 312 MB

When BRIN wins

When data is physically ordered by the indexed column:

-- We wrote the table ordered by created_at — BRIN works great
EXPLAIN ANALYZE
SELECT count(*) FROM logs
WHERE created_at BETWEEN '2026-01-01' AND '2026-03-01';
-- BRIN: Bitmap Index Scan, 120ms, index size 200KB
-- B-tree: Index Scan, 8ms, index size 312MB

Yes, B-tree is faster. But BRIN's trade-off: 1560x smaller index for 93% less RAM usage. Log tables, IoT sensor data, time-series data — BRIN is king here.

When BRIN loses

If data isn't ordered, BRIN reads too many blocks and becomes slower than B-tree:

-- BRIN is useless on data inserted with random()
-- because each block covers the full time range

Practical rule: Use BRIN if INSERT order = index column order. Otherwise, use B-tree.


4. GIN — The Savior of Arrays and Full-Text

Generalized Inverted Index (GIN) is optimized for cases where a single key appears in multiple rows: arrays, JSON, full-text search.

Array columns (like our tags column)

CREATE INDEX idx_logs_tags ON logs USING gin (tags);

-- Find rows containing 'auth' in the "tags" array column
EXPLAIN ANALYZE
SELECT * FROM logs WHERE tags @> ARRAY['auth'];
-- Bitmap Index Scan on idx_logs_tags, ~45ms
-- Seq Scan comparison: ~2800ms

-- Multiple tags — AND operator
EXPLAIN ANALYZE
SELECT * FROM logs WHERE tags @> ARRAY['auth', 'slow'];
-- Bitmap Index Scan, ~12ms

B-tree can't do this query at all. Without GIN, it's a full table scan every time.

JSONB columns

CREATE INDEX idx_logs_metadata ON logs USING gin (metadata);

-- Rows where metadata -> 'endpoint' = '/api/42'
EXPLAIN ANALYZE
SELECT * FROM logs WHERE metadata @> '{"endpoint": "/api/42"}';
-- Bitmap Index Scan, ~25ms
-- Seq Scan: ~3100ms

-- Filter by user_id inside metadata
EXPLAIN ANALYZE
SELECT * FROM logs WHERE metadata @> '{"user_id": 99999}';
-- Bitmap Index Scan, ~30ms

GIN index downside: Writes (INSERT/UPDATE) are slower than B-tree. GIN is for "read-heavy, write-light" tables.

Full-text search

-- Add tsvector column (or generated column)
ALTER TABLE logs ADD COLUMN message_tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', message)) STORED;

CREATE INDEX idx_logs_message_tsv ON logs USING gin (message_tsv);

-- Full-text search
EXPLAIN ANALYZE
SELECT * FROM logs
WHERE message_tsv @@ to_tsquery('english', 'timeout & payment');
-- Bitmap Index Scan, ~15ms

Remember: for full-text search, use tsvector/tsquery instead of ILIKE '%keyword%'. Performance difference can be up to 1000x.


5. GiST — Geometry and Special Data Types

Generalized Search Tree is the generalized form of B-tree. PostgreSQL extensions (PostGIS, ltree, pg_trgm) use GiST.

Most practical use: Speeding up ILIKE with pg_trgm

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- This index speeds up ILIKE '%keyword%' queries
CREATE INDEX idx_logs_message_trgm ON logs
  USING gist (message gist_trgm_ops);

EXPLAIN ANALYZE
SELECT * FROM logs WHERE message ILIKE '%timeout%';
-- Index Scan, ~180ms (previous Seq Scan: 47 seconds)

EXPLAIN ANALYZE
SELECT * FROM logs WHERE message ILIKE '%payment%timeout%';
-- Index Scan, ~95ms

260x speedup. pg_trgm splits text into three-character groups (trigrams) and indexes them with GiST. It destroys "pattern in the middle" queries like ILIKE '%keyword%'.

This is exactly the solution that saved our 47-second log query.

GIN vs GiST trigram

-- You can do the same thing with GIN
CREATE INDEX idx_logs_message_trgm_gin ON logs
  USING gin (message gin_trgm_ops);
Property GiST GIN
Index creation time Faster Slower
Index size Smaller Larger (2-3x)
Search speed Slower Faster
Write speed Better Worse

General rule: GIN for read-heavy systems, GiST for write-heavy systems.


6. SP-GiST — IP Ranges and Partitioned Data

Space-partitioned GiST divides data into non-repeating regions. Most common use is inet/cidr types and phone number routing.

CREATE TABLE ip_blocks (
    id SERIAL PRIMARY KEY,
    network CIDR,
    region TEXT,
    isp TEXT
);

INSERT INTO ip_blocks VALUES
  (1, '10.0.0.0/8', 'Private', 'RFC 1918'),
  (2, '172.16.0.0/12', 'Private', 'RFC 1918'),
  (3, '8.8.8.0/24', 'US', 'Google');

CREATE INDEX idx_ip_blocks_network ON ip_blocks
  USING spgist (network);

-- "Which block does this IP belong to?" query
EXPLAIN ANALYZE
SELECT * FROM ip_blocks WHERE network >>= '8.8.8.8'::inet;
-- Index Scan, ~0.3ms

You'll rarely use SP-GiST in daily life. But if you're doing IP routing, CIDR matching, or geographic bounding-box queries, this is the only index you need to know.


Decision Tree: Which Index When?

What type of query?
│
├─ Equality (=), range, sorting
│  ├─ Table large AND physically ordered? → BRIN
│  └─ Otherwise → B-tree
│
├─ Equality ONLY, no sorting, table very large
│  └─ Hash (~20% faster than B-tree)
│
├─ Array containment (@>), JSON search, full-text
│  ├─ Read-heavy → GIN
│  └─ Write-heavy → think, maybe B-tree is enough
│
├─ ILIKE '%keyword%', similarity search
│  ├─ Read-heavy → GIN + gin_trgm_ops
│  └─ Write-heavy → GiST + gist_trgm_ops
│
├─ IP/CIDR range, geographic queries
│  └─ SP-GiST
│
└─ Geometry, PostGIS
   └─ GiST

Real-World Benchmark Results

Values measured on 10 million row logs table (AMD Ryzen 7, NVMe SSD, PostgreSQL 17):

Query No Index B-tree BRIN GIN GiST (pg_trgm)
WHERE created_at BETWEEN 3200ms 8ms 120ms - -
WHERE message = '...' 3100ms 0.1ms - - -
WHERE message ILIKE '%timeout%' 47000ms - - - 180ms
WHERE tags @> ARRAY['auth'] 2800ms - - 45ms -
WHERE metadata @> '{"user_id": 99}' 3100ms - - 30ms -
ORDER BY created_at LIMIT 50 4100ms 0.3ms 850ms - -

Combining Multiple Indexes: Bitmap Combine

PostgreSQL combines multiple indexes as bitmaps when needed:

CREATE INDEX idx_logs_level ON logs (level);
CREATE INDEX idx_logs_tags ON logs USING gin (tags);

EXPLAIN ANALYZE
SELECT * FROM logs
WHERE level = 'ERROR' AND tags @> ARRAY['payment'];
Bitmap Heap Scan
  → BitmapAnd
    → Bitmap Index Scan on idx_logs_level
    → Bitmap Index Scan on idx_logs_tags

Each index creates its own bitmap, they're combined with AND/OR, then rows are fetched from the heap. This is important because you don't need to create a composite index for every column combination. PostgreSQL can combine 2-3 separate indexes with bitmap combine.

(Composite indexes are still faster, but maintaining an index for every combination is costly — the trade-off is yours.)


Index Maintenance: The Forgotten but Critical Topic

Indexes don't optimize themselves. Especially on tables with heavy UPDATE/DELETE:

-- See index usage statistics
SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

When is REINDEX needed?

-- Check index bloat
SELECT
    schemaname, tablename,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;

If total_size (table + indexes) is much larger than expected:

-- Use CONCURRENTLY so the table doesn't get locked
REINDEX INDEX CONCURRENTLY idx_logs_created_at;

Without CONCURRENTLY, REINDEX locks the table — never do this in production.


Conclusion

PostgreSQL's index system isn't just about B-tree. Choosing the right index type:

  • Reduces an ILIKE query on a 10 million row log table from 47 seconds to 180 milliseconds (260x speedup)
  • Reduces index size on a 400 million row time-series table from 312 MB to 200 KB (1560x less RAM)
  • Rescues JSONB and Array queries from full table scan

My most common mistake: Trying to solve everything with the default B-tree. Analyze your query patterns, look at pg_stat_user_indexes, then choose the right index type.

My second most common mistake: Forgetting index maintenance. Add REINDEX CONCURRENTLY to your cron.


Tags: postgresql, database, indexing, btree, gin, gist, brin, performance Date: 2026-05-25MARKDOWN_EOF

dhy@ironhide:~/site$