One of my usual/perennial rants is that many managers will tell you something like “business needs this functionality, we will optimize later”. And we all know what happens “later” – nothing. What I love most about working at Braviant is that the tech team shares the same values. Since I know quite well how often this is not the case, I appreciate it immensely. However, business might still think differently, and what’s good for business… well, that’s what we should do, since we all need to make money.
… After one of our recent releases, our DBA noticed that one of the application functions had slowed down significantly: instead of executing in about 3 seconds, it started to take 7-8 seconds. And immediately we were all alarmed. You might wonder – why? A 7-second execution time is a good time, perfectly acceptable for the end users, especially because this function is not executed very often. Well… Didn’t I just say our tech team agrees on priorities? We believe that a good user experience includes fast response times, and therefore our applications time out at 10 seconds. And if a function’s average execution time is over 7 seconds, the peak time can easily reach 10 seconds!
I had to make a miracle… the sooner the better. Because, as you can imagine, I usually do not write bad queries. Well, most of the time:). Which meant I had to find some unusual optimization.
To tell the truth, I knew right away why this function started to perform slower. We had added one new field to the distributed query (which was required by business stakeholders, of course!), and to select this additional field I needed to join one more remote table. And all of a sudden, although all the required indexes were in place, the optimizer would choose a full table scan. Of a huge table!
There is not much I can do to explain to the optimizer that they are wrong (are optimizers male or female, what do you think? 🙂 – they are male in my mother tongue, which explains a lot – they are always sure they know better how to make things work!). So I had to find a way to put this optimizer in a situation where it would have no choice other than to use the indexes which were out there. At first I thought it would be relatively easy – in all previous cases when a similar issue occurred, I would create a view on the remote node – but this time it didn’t work. I conducted several experiments and came to the conclusion that the only way to make it work was to implement a new technology, which I had played with a couple of months earlier but had never implemented in production. Here is what that took:
– making sure it does what I want
– mapping through the foreign data wrapper and making sure it works
– creating a new version of the function
– testing in lower environments
– implementing on staging and QA on both our products
– implementing in production on both products
Total execution time of the process described above: around 3 hours.
Result: everybody happy, we’ve got one more groundbreaking technology, which I can guarantee nobody in the world is using (because documentation clearly says it’s impossible:)), and which we will be able to use in many other cases to reduce execution time. And all because we have our priorities right!
P.S. Are you anxious to find out what this technology is? Come to 2Q PgConf in Chicago this December!