Monthly Archives: November 2014

I am publicized in Tech Cocktail

Check it out right here!

Leave a comment

Filed under news, People

Sometimes you just need to understand, what a query does

Look at this query, which was topping the list of our “worst offenders”:

SELECT CASE WHEN v_code = '2' OR v_code = '300' OR v_code = '303'
OR v_code = '401' OR v_code = '402'
THEN 1 ELSE 0 END AS result
FROM customers c
( SELECT DISTINCT customer_id, first_value ( t.v_code )
OVER ( PARTITION BY customer_id ORDER BY CASE WHEN v_code IS NOT NULL THEN TRUE END, updated_at DESC ) AS v_code
FROM ( SELECT r.customer_id,
CASE WHEN report_type = 'v_report'
AND report_field = 'code' THEN trim ( COALESCE ( VALUE, '0' ) ) ELSE NULL END AS v_code
FROM customers c JOIN reports r USING ( customer_id )
JOIN report_types rt USING ( report_type_id )
JOIN report_values rv USING ( report_id )
JOIN report_fields rf USING ( report_field_id ) WHERE report_type IN ( 'v_report' )
AND report_field IN ( 'code' ) ) t ) subq USING ( customer_id )
WHERE p_customer_id = customer_id;

This report was running for over half a minute!

First question: can you actually understand, what it does? Not so obvious, right? That’s why I hate do not like the window functions: people use them without thinking, and actually any functionality which any windows function provide, can be achieved by means of regular SQL.

If you would actually have enough time to read through this SQL, you would realize, that the only thing this report does, it selects the most resent report of type ‘v_report’ for a given customer, and then it looks at what are the values in the report field “code”, and in case of certain values it returns 1 and otherwise 0. That’s it.

But what it actually does? It selects all customers, for each customer it selects all reports of type ‘v_report’, selects the most recent for each customer and then selects the customer in question.

Now, do we need to build any index to optimize this report, or rather do we need to optimize it at all? No. We just need to rewrite it, making sure it does exactly what it is supposed to do, no extras:

select case when value in ( '2','300','303','401','402' ) then 1 else 0 end
from reports r
join report_values rv on rv.report_id=r.report_id and customer_id=p_customer_id
and report_type_id=(select report_type_id
from report_types where report_type='v_report')
and report_field_id in (select report_field_id from report_fields where report_field ='code' )
order by r.updated_at desc limit 1

That’s it! Execution time? About 100 ms!

Leave a comment

Filed under SQL

What is the actual cause?

A couple of weeks ago a co-worker emailed me the following.

Hettie, last week we asked you whether you think moving our database to remote from local would decrease performance. And you told us, that you do not think we will experience any visible performance degradation. Well, we did move our database yesterday, and performance decreased about six times. And looking at our monitoring tool we can see that all the slowness is on the database side. Is this something you would expect?

I told him: no, definitely not! Let me stop by tomorrow, and we shall see, what’s going on.

When I stopped by with the intension to run parallel tests on both databases, I found out, that they removed all the data volume from the local db, andthereby it became impossible to compare performance. Meanwhile I’ve asked, whether I can see parameter file.

Turned out, they didn’t have access to it, and then I did my little investigation to find out, who has, and who actually installed this DB, and whether he or she changed any parameters. And guess what I found out? I found that the work_mem parameter in their new remote database was set at default – 1 MB, when in should have being at least 128 MB! Then they contacted the person, who could change this parameter and the problem was solved.

Why I am writing about this? Because it is a typical example of “we see what we want to see”. My coworker was so expecting the problems when switching to remote database, and he attributed the problem to this fact without any hesitations… To be honest, I often have a similar problem – looking for the issues, which I expect to see, not the real ones. And we all do – and very often.

Leave a comment

Filed under Data management, Systems