One more optimization problem I came across last week. I’ve spotted a slow-running query, and could not figure out, why it could be slow.
The query was rather simple, and if we take all the “extras” out of the equation, it would come to something like this:
SELECT l.loan_id FROM loans l
AND l.loan_id NOT IN
(SELECT loan_id FROM payment_runs WHERE run_date=’2016-01-13′)
Now, the table payment_runs has a primary_key (loan_id, run_date) – and that’s basically all it has. So you would imagine, this index will be used for searching the loans – but it was not!
Actually, I’ve realized more or less right away, why it was not used: it starts from loan_id, thereby the b-tree search should already “know” a loan_id or a set of loan_id’s to start using the index. However, we need exactly the opposite – we need to find loan_ids, which are not present in the table, and there is no use to try to utilize the index, since the search criteria is based on the second value.
We ended up creating one more, “reverse” index on that table (run_date, loan_id). Although it looks completely redundant, we really need it – the table payment_runs is growing very fast, and there was no way to avoid a full table scan without this index!