It happened so many times in the past… once again about “the art of full scan”

Let me tell you: there are so little tricks you can utilize when optimizing the OLTP queries, that each time I am writing something about this, I feel like I am repeating myself for the hundredth time… yet… people make the same mistakes again and again, and so here I am preaching the same thing again and again…

The original query which was brought to my attention last Thursday looked like this:

SELECT
cs.id as c_id
...
,(select
count(1)
from
approvals apr inner join customer_sources cs_s on apr.customer_id = cs_s.customer_id
where
apr.processed_on >= app.processed_on - interval '30seconds'
and apr.processed_on = app.processed_on - interval '30seconds'
and apr.processed_on '2014-05-01'
AND app.processed_on <= '2014-05-01 00:01:00'
AND app.teletrack_report_id is not NULL

No, actually it did not look like this, since it was about 200 lines long, but that was the idea: there was a dozen columns (positions, select-list components – whatever you want to call them), and each of them was selected by means of embedding yet another select, which would query the same pair of tables.

And note, that the inner copy of customer_sources was totally unnecessary, since the “approvals” do not correspond to specific leads, but to the customers.

The original timings were like this: when a coworker, who wrote this query, was trying to run it for 1 min time interval, it would not return results in 10 minutes, and then he would kill this query suspecting, that something is going wrong.

There is no magic in how to deal with these types of queries. Just imagine yourself in the place of a database. Would you like to read the same table again and again, each time selecting only the records of one kind?! Never! What you would do, you would read this table record my record, and sort the records, while reading: this one goes to this pile, this one goes to the other one, and this one can be thrown away, it’s not good for anything.

So, please! advice your database to do exactly the same! Like this:

SELECT
cs.id as c_id
...
sum (case when cs.source_type_cd like '%leadpile%'and apr.customer_id is not null then 1 else 0 end) as leadpile,
sum (case when cs.source_type_cd like '% 9g%' then 1 else 0 end) as nineg
...
FROM
customer_sources cs
inner join approvals app on cs.customer_id = app.customer_id
and cs.created_on > '2014-05-01' and cs.created_on '2014-05-01'
AND app.processed_on = app.processed_on - interval '30seconds'
and apr.processed_on < app.processed_on

In course of our discussion with the report developer we also came to the conclusion, that we only need the “recent” customer sources, not the ones, which are 4 years old. Also, we ere able to put the correct interval right away, because – guess what? – this SELECT is executed in 15 seconds!

Leave a comment

Filed under SQL

Leave a comment