Why it wasn’t using this index? Wait, why it WAS using it?…

Postgres does not have any optimizer hints. Which is a curse and a blessing. A blessing – because thanks to that the PG contributors developed one of the best optimizers ever existed. A curse… well, what you do if you know for sure you know the right way, and this damn database just does not get it????

A couple of weeks ago I’ve received an email from one of my co-workers: can you tell my why this index is not used in this query?

The table was something like this:

line_id INTEGER,
class_id INTEGER,
line_type_id INTEGER,
line TEXT,
line_status_id INTEGER,
created_at TIMESTAMPTZ);

lines_pkey PRIMARY KEY (id)
index_lines_on_class_id_and_line_id” (class_id, line_id) WHERE status_id = 1

The query he was running was the following:

SELECT * FROM lines WHERE status_id = 1 ORDER BY line_id ASC LIMIT 1;

First I said: yea, you’re right! There is no way this index can be used for this query. Look – the first field in this index is class_id, so it is first ordered by class_id, and only then – by line_id, so… and actually, before I even said so, my other co-worker suggested to create a reverse order index, which was used by the optimizer.
– So, does it answer your question?- I’ve continued.
– Not exactly, – my co-worker said: – see, if I truncate this table (which happens periodically) and run ANALYZE on an empty table, then for a while the index will be used! But after next scheduled analyze it would stop, and it would use the PK index instead.

Did you guess, what’s the problem with using a PK index in this case? You’re right, the status_id=1 happens extremely infrequently! I’ve asked my co-worker, whether he actually needed to have a class_id in this index, and turned out, that there were other queries, which would be supported by this index, so he was hoping he could use it for both cases, and was wondering, whether he needs two different indexes with different order of columns.

I suggested, that since the class_id was very low-selective (less than ten distinct values), he might be better off completely omitting it, and turned out it was the best solution. The partial index

index_lines_on_line_id" (line_id) WHERE status_id = 1

worked for both query types.

So – problem solved? Yes, but I am still wondering, why 1)this “wrong” index actually worked fast, if the optimizer was forced to use it 2)why it was not using in “on it’s own will”??? My guess is, that with a generally small number of records with status_id=1 the whole index was read into main memory, and then the search would become instantaneous…

… but why the optimizer would not figure it out???


Leave a comment

Filed under SQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s