Database Indexing for People Who Are Not DBAs

Most application developers encounter database performance only when a query that was fast in development becomes slow in production. The usual remedy is an index, yet indexes are often added by intuition rather than understanding, which produces both missing indexes that would have helped and superfluous ones that quietly impose cost. This article explains indexing for engineers who are not database administrators: what an index is, the queries it accelerates, the price it exacts on writes and storage, and a workable method for deciding which indexes to create. The thesis is that indexing is a deliberate trade between read speed and write cost, and that the trade is best made from evidence rather than instinct.

What an Index Actually Is

Without an index, answering a query that filters rows requires the database to scan the entire table, examining each row to see whether it matches. This is a sequential scan, and its cost grows with the size of the table. An index is a separate data structure that lets the database locate matching rows quickly without examining every one, much as the index of a book lets a reader find a topic without reading every page.

The PostgreSQL documentation frames the canonical case directly. Given a query that filters on a column, creating an index on that column allows the system to find the matching rows in a small number of steps rather than scanning the whole table. The improvement is most dramatic on large tables and on queries that select a small fraction of the rows.

When an Index Helps

An index repays its cost when queries repeatedly filter, join, or sort on the indexed columns. The clearest beneficiaries are equality and range conditions in a WHERE clause, the columns used to join tables, and the columns used to order results. If an application routinely retrieves records by a particular column, that column is a candidate for an index.

The benefit is conditional, however. An index helps when a query is selective, meaning it returns a small portion of the table. When a query matches most of the rows, scanning the whole table can be cheaper than consulting an index and then fetching the rows it points to, and the planner may sensibly ignore the index. An index on a column with very few distinct values, such as a boolean, therefore often provides little benefit.

For queries that filter on more than one column together, a multicolumn index covering those columns can outperform separate single-column indexes. The order of columns in such an index matters, because it is most useful for conditions that constrain the leading columns.

What an Index Costs

An index is not free, and treating it as such is the most common indexing mistake. Every index must be kept consistent with the table it serves. When a row is inserted, updated, or deleted, the database must update the affected indexes as well, which makes write operations slower in proportion to the number of indexes on the table. Indexes also consume storage, and a table burdened with many indexes can occupy substantially more space than its data alone.

The consequence is that indexes should be created to satisfy real query patterns, not added speculatively to every column. An index that no query uses imposes write and storage cost while delivering no read benefit, and such unused indexes accumulate easily over a project’s life.

A Method for Choosing Indexes

A disciplined approach begins with measurement rather than guessing. Identify the queries that are slow or frequent, then ask the database to explain how it executes them. PostgreSQL provides the EXPLAIN command, which reports the plan the system chose, including whether it performed a sequential scan or used an index. A sequential scan over a large table in a frequently run query is a strong signal that an index on the filtered columns is worth testing.

After adding a candidate index, re-examine the plan to confirm the planner now uses it and that the query is faster. Periodically, review existing indexes for ones that no query exercises and remove them to recover their write and storage cost. Indexing is iterative: measure, add, verify, and prune.

Conclusion

An index is a structure that trades additional storage and slower writes for faster reads, and it is justified when real, selective queries filter, join, or sort on its columns. It is wasted when added speculatively, applied to columns with few distinct values, or left in place after the query that needed it is gone. The reliable method is evidential rather than intuitive: find the slow queries, inspect their execution plans, add indexes where a scan is the bottleneck, verify the improvement, and remove indexes that earn nothing. Approached this way, indexing becomes a tractable engineering decision rather than a specialist’s mystery.