Sometimes you just need to understand, what a query does

Look at this query, which was topping the list of our “worst offenders”:

SELECT CASE WHEN v_code = '2' OR v_code = '300' OR v_code = '303'
OR v_code = '401' OR v_code = '402'
THEN 1 ELSE 0 END AS result
FROM customers c
( SELECT DISTINCT customer_id, first_value ( t.v_code )
OVER ( PARTITION BY customer_id ORDER BY CASE WHEN v_code IS NOT NULL THEN TRUE END, updated_at DESC ) AS v_code
FROM ( SELECT r.customer_id,
CASE WHEN report_type = 'v_report'
AND report_field = 'code' THEN trim ( COALESCE ( VALUE, '0' ) ) ELSE NULL END AS v_code
FROM customers c JOIN reports r USING ( customer_id )
JOIN report_types rt USING ( report_type_id )
JOIN report_values rv USING ( report_id )
JOIN report_fields rf USING ( report_field_id ) WHERE report_type IN ( 'v_report' )
AND report_field IN ( 'code' ) ) t ) subq USING ( customer_id )
WHERE p_customer_id = customer_id;

This report was running for over half a minute!

First question: can you actually understand, what it does? Not so obvious, right? That’s why I hate do not like the window functions: people use them without thinking, and actually any functionality which any windows function provide, can be achieved by means of regular SQL.

If you would actually have enough time to read through this SQL, you would realize, that the only thing this report does, it selects the most resent report of type ‘v_report’ for a given customer, and then it looks at what are the values in the report field “code”, and in case of certain values it returns 1 and otherwise 0. That’s it.

But what it actually does? It selects all customers, for each customer it selects all reports of type ‘v_report’, selects the most recent for each customer and then selects the customer in question.

Now, do we need to build any index to optimize this report, or rather do we need to optimize it at all? No. We just need to rewrite it, making sure it does exactly what it is supposed to do, no extras:

select case when value in ( '2','300','303','401','402' ) then 1 else 0 end
from reports r
join report_values rv on rv.report_id=r.report_id and customer_id=p_customer_id
and report_type_id=(select report_type_id
from report_types where report_type='v_report')
and report_field_id in (select report_field_id from report_fields where report_field ='code' )
order by r.updated_at desc limit 1

That’s it! Execution time? About 100 ms!


Leave a comment

Filed under SQL

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s