Monthly Archives: February 2016

Using inline functions

Did you know anything about inline functions, which return the sets of records? I didn’t know anything about them until recently, when one of my co-workers discovered them trying to solve a problem, which has being a problem for a while…

Here is what the problem was. We have a table in our Data Warehouse, which is updated incrementally, since it stores historical and current data for all installments for all loans. All good so far, right?  The problem is, that for reporting purposes we need to calculate lots of additional data, based on the current date – and this data needs to be calculated for all installments, for all loans. Obviously, it’s impossible to recreate all data daily, especially because the actual data is being refreshed every 15 minutes.

The option which seems obvious would be to create a view, which will use the “permanent table and calculate the rest dynamically based on the current date. Sounds like a perfect solution? Well, if you try to query this data for specific loan or installment, then – yes, you receive response instantaneously. But if you want to join this view with some other table – performance is terrible! And no wonder – not only Postgres, but virtually none database can put condition or parameter deep into grouping.  Materializing the view would not help either, because such materialized view would soon become so big, that the calculations would run out of temp tablespace.

And then one of our co-workers asked: do you guys know about the inline functions?

You can read about inline functions here. The biggest advantage of using the inline functions returning record set is, that they are not the black boxes. For the optimizer the inline function body is directly substituted by the raw SQL, and the optimization is no different than of any “regular” query. In our case are now were spared of “calculate everything for all installments first”, and the query would be optimized for a specific use case.

Leave a comment

Filed under SQL

A need for reverse index

One more optimization problem I came across last week. I’ve spotted a slow-running query, and could not figure out, why it could be slow.

The query was rather simple, and if we take all the “extras” out of the equation, it would come to something like this:

 

SELECT l.loan_id FROM loans l
WHERE
AND l.loan_id NOT IN
(SELECT loan_id FROM payment_runs WHERE run_date=’2016-01-13′)

 

Now, the table payment_runs has a primary_key (loan_id, run_date) – and that’s basically all it has. So you would imagine, this index will be used for searching the loans – but it was not!

Actually, I’ve realized more or less right away, why it was not used: it starts from loan_id, thereby the b-tree search should already “know” a loan_id or a set of loan_id’s to start using the index. However, we need exactly the opposite – we need to find loan_ids, which are not present in the table, and there is no use to try to utilize the index, since the search criteria is based on the second value.

We ended up creating one more, “reverse” index on that table (run_date, loan_id). Although it looks completely redundant, we really need it – the table payment_runs is growing very fast, and there was no way to avoid a full table scan without this index!

 

 

Leave a comment

Filed under SQL