Managing Indexes in PostgreSQL
An index makes your Postgres queries faster. The index is like a "table of contents" for your data - a reference list which allows queries to quickly locate a row in a given table without needing to scan the entire table (which in large tables can take a long time).
Indexes can be structured in a few different ways. The type of index chosen depends on the values you are indexing. By far the most common index type, and the default in Postgres, is the B-Tree. A B-Tree is the generalized form of a binary search tree, where nodes can have more than two children.
Even though indexes improve query performance, the Postgres query planner may not always make use of a given index when choosing which optimizations to make. Additionally indexes come with some overhead - additional writes and increased storage - so it's useful to understand how and when to use indexes, if at all.
Create an index#
Let's take an example table:
create table persons ( id bigint generated by default as identity primary key, age int, height int, weight int, name text, deceased boolean );
note
All the queries in this guide can be run using the SQL Editor in the Supabase Dashboard, or via psql
if you're connecting directly to the database.
We might want to frequently query users based on their age:
select name from persons where age = 32;
Without an index, Postgres will scan every row in the table to find equality matches on age.
You can verify this by doing an explain on the query:
explain select name from persons where age = 32;
Outputs:
Seq Scan on persons (cost=0.00..22.75 rows=x width=y)
Filter: (age = 32)
To add a simple B-Tree index you can run:
create index idx_persons_age on persons (age);
caution
It can take a long time to build indexes on large datasets and the default behaviour of create index
is to lock the table from writes.
Luckily Postgres provides us with create index concurrently
which prevents blocking writes on the table, but does take a bit longer to build.
Here is a simplified diagram of the index we just created (note that in practice, nodes actually have more than two children).
You can see that in any large data set, traversing the index to locate a given value can be done in much less operations (O(log n)) than compared to scanning the table one value at a time from top to bottom (O(n)).
Partial indexes#
If you are frequently querying a subset of rows then it may be more efficient to build a partial index. In our example, perhaps we only want to match on age
where deceased is false
. We could build a partial index:
create index idx_living_persons_age on persons (age) where deceased is false;
Ordering indexes#
By default B-Tree indexes are sorted in ascending order, but sometimes you may want to provide a different ordering. Perhaps our application has a page featuring the top 10 oldest people. Here we would want to sort in descending order, and include NULL
values last. For this we can use:
create index idx_persons_age_desc on persons (age desc nulls last);
Reindexing#
After a while indexes can become stale and may need rebuilding. Postgres provides a reindex
command for this, but due to Postgres locks being placed on the index during this process, you may want to make use of the concurrent
keyword.
reindex index concurrently idx_persons_age;
Alternatively you can reindex all indexes on a particular table:
reindex table concurrently persons;
Take note that reindex
can be used inside a transaction, but reindex [index/table] concurrently
cannot.