Why did it stop working?…

Last week I had a feeling that I am very popular – everybody were asking: do you have a minute? Can you look into something? Can I come over? Can you come over to our floor? Everybody wanted my opinion on how to optimize their programs. Soon enough I’ve realized, that my growing popularity was due to the end of the 3rd quoter! Everybody were running their “end of the quoter” reports, and sure enough, many of them would suddenly stop working.

So I’ve spent quite a bit of time sitting with people, asking them, what this query is supposed to do, whether this or that condition means what I think it means, is this redundant or not? Everybody were very patient answering my questions, but sooner or later they would pop up a question of their own, and this question would be always one of the following:

– This report/batch/program worked fine just yesterday, what happened today, that it stopped working? It would finish in one hour, and now it runs for twelve!

–  We are not the only ones who is experiencing problems, may be we should tune some parameters globally?

And since I’ve realized one more time, that I am answering these questions over and over again (not only last week, but at least for the past 20 years!), I though it would be a good idea to put my answer in writing – and here they are.

Every query would run fine on a small data set. Every query would run fine on a big machine with fast processor and lots of main memory. If a query is written inefficiently, the execution time will increase dramatically, when the data volume grows. This is the difference between a good and a bad query – a good query is scalable, meaning that when the data volumes increase significantly, the execution time will increase only  slightly.  The real problem is, that the notion of what exactly is large or small is evolving rapidly. Ten years ago it could become evident that something is wrong with a query when a source table would grow over 100,000 records. Nowadays you can easily operate with millions of records and not notice any problem.

The hardware prices go down. The price of developer’s time goes up. Paradoxically, these two trends put together open the way to really bad queries. Why? Because nobody wants a developer to spend a lot of time perfecting the query, when it’s “good enough”.

Is there any problem with this approach? May be that’s how life is going to be from now on? Actually there is a problem, because a degree os “badness” a bad query can reach will always beat any hardware improvements at some point. And the better hardware we have, the later this some point will come. And the later it comes, the more it will cost to fix it! More time (and money) will be spent!

Back to the burning questions. This particular program had this  particular problem all along, however, for a long time this problem was masked, because we have huge main memory, because we have powerful processors, in short – because the database could work around this problem. And today we reach the point, where it could not deal with this problem any more. You can’t dump all loans into main memory. You can execute in parallel one single poorly written join… That’s why.

Should we try to tune the database parameters? First, most of the time they are already tuned. For the 30 years of my professional career a can remember just a few situations when some key parameter settings were overlooked by DBAs.  Second, when we change the database parameters, it affects each and single query which runs against this database. And the same parameter change may have different effect on different queries (for example, different effect on “short” and “long” queries). So the rule should be to try to optimize a query first – in the majority of cases this will be enough!

So – am I against hardware improvement:)? Definitely not! We need faster processors, we need better disks, we need more main memory – because we want to be able to serve more customers, because we want to deliver a flawless service. But we should pay close attention to the quality of our queries before they signal a problem. We should write “the right queries” from the very beginning, so that we (hopefully!) won’t have to fix them later.

How to tell, which query is good, and which is not  – that’s another question…


Leave a comment

Filed under Data management, Systems

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s