The Mysteries of NULL values, Part 1

On Tuesday I’ve got a question from one of our developers, which sounded like this: Hettie, apparently I am doing something wrong. Is it true, that I can’t include NULL into the list of values? I need to write something like

WHERE customer_id in (12345, NULL)

meaning,  need to select all items, which are assigned to this customer or not assigned at all.

I said: No, you can’t so this. You can’t do this, because NULL is not a value. NULL is  a condition. You can’t compare anything to NULL,  you can just say whether some values IS NULL or IS NOT NULL.

– Now I understand, – said my co-worker.

– Do you want me to help you to right this query?

– You see, the problem is – I need to generate it in my program…

… NULLs are as tricky to understand for developers, probably as much as articles are tricky for foreigners: both seem so easy, if not trivial,  and the rules seem pretty obvious, but the moment you start to actually use them, you have no clue which rule to apply!

Brief summary:

– NULLS are not values, they are conditions

– You can’t compare any value to NULL, unless this particular SQL implementation has some special tools to process it, but in any case it’s not a part of standard SQL

– Each implementation may have it’s a own way of sorting by the column, which have NULL values; Postgres allows explicitly define the sorting order

But where NULLS become especially tricky is when we need to generate a SQL statement!

Generating queries is a problem of it’s own (although, I have to admit, it can be an immense fun!). Usually we choose to generate queries rather then to write a procedure or function with parameter, when we expect the queries to be really different, execution different queries depending on parameters passed. On the other hand, carefully writing a query generating code is such a pain and requires such a tedious debugging, that developers are trying (almost subconsciously) to reduce the number of different cases, which is the opposite of our goal – to write the best query for each case!

I’ve stared this post with one of such examples, where the query should look like

WHERE (customer_id=12345 OR customer_id IS NULL)

or, if there are other conditions we  need to apply, the better SQL will look like this:

SELECT ….

FROM…..

WHERE (…. AND customer_id=12345)

UNION ALL

SELECT ….

FROM…..

WHERE (…. AND customer_id IS NULL)

Substituting OR with UNION ALL almost always gives a huge performance improvement, and I will write a separate post about this and similar tricks!

When we need to generate a SQL statement, we ofter face a way bigger problem. A very common situation involves several drop-down lists, and a user can select from any of them, and any number of lists can be omitted. The very common mistake developers make is to generate a WHERE condition as something like

WHERE attr1 IN ||v_string1||   OR v_string1  IS NULL

AND attr2 IN ||v_string2||  OR v_string2  IS NULL

This will work, but it will work awfully slow! And that’s exactly the case where we need to apply the intelligent SQL generation.

Does anybody have any idea how this condition should be rewritten?  Please reply with your suggestions, and I will show my solution in the next post!

Advertisements

Leave a comment

October 26, 2013 · 1:51 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s