Tag Archives: optimization


I remember when I learned about JOIN_COLLAPSE_LIMIT parameter in Postgres. And I remember the unfortunate circumstances of that encounter. Specifically, it was set to eight, which caused a relatively simple SELECT to be executed in a completely wrong way. 

Since I didn’t know anything about that parameter at the moment, it looked to me like the order of joins in the SELECT statements in Postgres matters, and it sounded very wrong and the opposite of what documentation said.

But then I learned about JOIN_COLLAPSE_LIMIT, and suddenly all the execution plans magically fixed themselves after I changed it for 12. I performed a lot of experiments back then trying to find an optimal value for that parameter, and at the end of the day I decided, that no matter how many tables I am joining, changing it to anything above 12 does not help. Either the execution plan does not improve (you have 16 tables in the join, and you think you change the limit to 16, and the query would fix itself – and it does not). Or, if you change it to more than 20, the optimization takes too much time.

However, a couple of weeks ago, somebody requested my help in query optimization. The problem was that it was not a single query, but rather a query framework, and a person would run several queries in sequence, each time adding or removing search criteria that could belong to any of the tables involved in this SELECT.

Yes, “if it would be me,” I would write the SELECT statement differently each time, but that was not something I could suggest to a person who asked for my help. That would be too complicated. I looked at the query, estimated the number of joins and suggested, that at the beginning of each session they would execute 


And it worked!!!! I do not know what’s so funny in the fact that it worked as expected… but I was really amused 🙂


Filed under SQL

Simple Rules for Better SQL

I suppose I am not the only person facing this problem. From time to time, you come across some report which is running very slow. Or just a stand-alone SQL statement, which somebody in your company needs to execute once a month, and they might not even remember who wrote it. And you look at 200 lines of code, and think – how I can even start to optimize it? 

It usually helps if a person who wrote the code is still around. Even better, if this report was written just recently and the requirements are still fresh in people’s minds. Still – what would you do, if you have very limited time, very convoluted SQL, and you need to figure out how to make it better so that it won’t block ten other processes? 

For a while, I’ve been trying to come up with simple guidelines for report writers, which would help to ensure that their reports perform decently without anybody spending ten times more time. It turned out that a very short list of rules can cover pretty much all the situations. We’ve been using these guidelines for a while now, and I am pretty happy with the results. Here is the list:

  • For each report, figure out whether it is an “exception” report or a “summary” report. Exception means that we are looking for some errors, for abnormal execution. Such reports should return nothing if everything goes normally. Summary means that we want to process a large number of records and calculate some results, something like quarterly earnings. 
  • If a report in question is an “exception report,” define the criteria for the exception (what makes this report an exception). Make sure there is an index that can be used to select rows based on this criterion and make sure that this criterion is applied first in the execution plan. 
  • If a report is a summary, look for a way to update it incrementally. Make sure that there is an index to facilitate this search (most often, the increment will be by the time last updated, and this field should be indexed).

If you can find an example of the query, which does not fit into either of the categories and/or can’t be optimized using these rules – let me know 🙂 

Leave a comment

Filed under SQL

I knew I won’t like this feature! About the parallel execution

Many years ago, when Postgres 9.6 was still in making, my coworker said to me with excitement: Hettie, Postgres will now have the ability to run queries in parallel! There can be only four parallel processes, but still, isn’t in nice?!

And I remember exactly what I’ve replied: no, I do not like this feature a bit! You know why? Because executing queries in parallel would rarely solve performance problems. In fact, if four parallel workers would solve your performance problems, they were not really problems! It can do more harm than good, because it can mask some real performance problems for a while, and then they will turn just to be more severe.

What happened then – I’ve started a new chapter of my life at Braviant, and we had Postgres 9.5 then, and then for almost 3 years it was like I never had time to stop and upgrade :). But now I have a team, so we’ve finally planned the upgrade, and since we were already four versions behind, we planned upgrade to 9.6 and immediately to PG 10.

We’ve started from our Data Warehouse. First – in the staging environment, we we’ve tested, and observed the execution for some time. And then on the mirroring instance, and only then – on production.

And then it started! Seriously, out of all data refreshes this one is only one, which is important for the OLTP instance, because it sends data to one of our client interfaces. It started to behave inconsistently, Sometimes it would be just fine. Other times, instead of about 50 seconds it has been running for an hour, and probably won’t finish if we won’t kill it. Yes, it was obvious, that something did change in the execution plan after the upgrade. But what?! From the first glance the execution plan looked the same, all HASH JOINS, which you would expect, when you join tables with no restrictive conditions.

But it was still painfully slow. What was more puzzling – I could take out of the equation JOIN to any table, and performance would be unpredictable. After several dozen of attempts to make things run decided to take a closer look at the execution plan. And you know what I saw?! Yes, parallel execution/! Four joins were parallelized, which resulted in the execution time been really horrible. After the issue was found, the only thing left was to figure out, how to turn this feature off 🙂

Leave a comment

Filed under Data management, Development and testing, SQL

One more time on the state of optimization

I just have to show some of the execution time graphs, and how the have changed after the optimized versions of the respective functions were deployed:

I know that many people are wondering looking at the second image, why I am striving to optimize things which are already running super-fast?

It’s not because I am trying to demonstrate my superpowers, it’s because i know for the fact, that with the database size we currently have, that is the right execution time. What does it mean? it means, that if the execution time is more than that, it indicates the wrong execution plan.

All these optimizations have been performed on our OLTP database, which means that all of these queries are “small queries”, retrieving a relatively small number of records. Which implies, that the appropriate indexes should be used, and that the execution plans should show the NESTED LOOP join algorithm. When I see the execution time of 500 ms, it tells me that there is at least one full table scan inside. Which in turn, means, that the execution time will be increasing, when the data volumes will be growing. Which is not good, if we are building a scalable system.

Another important thing to consider is that all these small queries cannot be “parallelized” to speed up the execution. We are in the OLTP environment, not OLAP. I know that I can’t rely on switching to the larger AWS instance, because 1) this process gets out of control very fast 2) does not help. Seen the execution times like on both of these pictures, like “I can’t see it” just proves, that the functions are performing as expected.

Leave a comment

Filed under Data management, Development and testing, SQL

New features are available in the bitemporal repo – and I am so happy about it!

I Really hope that most of my follows know something about the pg_bitemporal project, because if you didn’t hear about it, you won’t be able to share my excitement!

We started to build our bitemporal library for PostgreSQL about four years ago, it was merely a “proof of concept”, and Chad Slaughter, who initiated all this work, knowing my work habits way too well, was re-iterating again and again – do not optimize it yet!

Well, I didn’t, but then I’ve joined Braviant Holdings, and a year later I was granted a permission to use our bitemporal framework in production. Some of the performance flaws became apparent even during the test run, and I was able to fix them. Later, while we were using it in production more and more, I’ve come up with new functions, UPDATE_SELECT and CORRECT_SELECT, since we actually needed them, and since the bitemporal operations were supposed to behave the same way as regular database operations.

About three weeks ago we had a very important release, which along with addressing multiple business needs, included some significant changes on the technical side. One of the consequences was, that it significantly increased the traffic on our new planform, and as a result we started to see some timeouts.

Although these timeouts were pretty rare, we saw them as a problem. I personally pledged the system will remain scalable, and now I couldn’t just go with “bitemporal updates are slow”. Yes, the execution time was at 2 to 3 seconds most of the time, but sometimes it would spike, and our microservices have a hard timeout at 10 seconds.

Some time ago I’ve already mentioned in this blog, how thankful I am for those timeouts! Nothing else foster innovation more than a necessity to address performance problems immediately, because they have a direct impact on production.

This time around I was 99.9% sure that the periodic slowness happens during the remote query, which is a part of the problematic function. Turned out, though, that this 0.01% was the case, and together with our DB team we were able to determine, that the problematic statement was the last UPDATE in the bitemporal update function. If you’d ask me a week before that, I would say, that I am not going to address the bitemporal performance for the next several months, but I had no choice.

Thanks to Boris Novikov, who helped me immensely in testing and verifying several different approaches, and eventually identified the best one, and to Chad Slaughter, who was merging my commits from 7-30 AM to 9-30 PM, so that the master branch of the bitemporal library would have the latest updates by the time of the release, and thanks to our amazing QA team, who had to run and rerun tests that day multiple times, the new bitemporal functions are now on place. Not only for Braviant Holdings, but for the whole community.

I would also like to mention, that since I was already changing the functions, I’ve fixed one long-overdue issues: all functions have versions, which are PG 10 compliant. We’ve left the old versions there, because some of the are used in the existing production systems but if you are just starting, you can use the new ones.

Check it out at https://github.com/scalegenius/pg_bitemporal

Leave a comment

Filed under news, research, SQL, Team and teamwork

About one very controversial optimization

I meant to write about this optimization story for a while, not because somebody can use it as an example, but rather as a counter-example, a case of “what I’ve tough you so far is all completely wrong and not working”.

It all have started with the story which I’ve described in this blog post: the problem was, that indexes on the remote site were not being used, and in order to make things work I had to create a remote function, which would “inject” the constants into a remotely executed query and then too map the results. A couple of weeks later I had to face a similar problem which had also manifested itself only in production environment. But that time around  the performance degradation was so severe, that we had to rollback the changes.

First I thought that I will be able to deal with a similar problem in a similar way, but turned out that was not the case. The SQL involving remote tables was a little bit more complex, and the function didn’t help. I’ve realized that I needed to explicitly pass some constants to the dynamic query, which meant that I had to perform a crime against relational theory, braking one perfectly fine SQL into two, and passing the result of the first selection to the second statement. This performed better, but still not ideal. I continued to follow the same path. It became better. I’ve repeated it one more time. And one more time.

At the end it was a textbook example of how not to write SQL. My own lectures were always filled with this “how it will not perform” examples: first you select a person, then using the person id you select position, then… But what I did to optimize  performance my function was the same  thing almost to the letter. I needed to select all loans for a specific person and some of the loan-related details. And instead of writing one SQL I ended up doing the following:

  • having a person_id, select all accounts
  • generate a SQL with all these user accounts embedded as constants to retrieve loans
  • generate a set of SQL statements, which would access remote information for each of the loans individually
  • and actually there was more in the middle

A part of the problem is, that in PostgreSQL functions are not really stored in a truly “compiled” way, they should be  rather viewed as “stored sequences of actions”.  And parametrized statements, unlike those in different systems,  are not optimized properly. That’s why in most of the cases I  build dynamic SQL inside of the functions. But in the case I am describing here,  things were even worse. Even when I was executing a SQL statement for a specific person id, the indexes on user account  and loan where not really utilized (at least not always).  So in order for me to push through the usage of indexed, I had to explicitly pass the values to the generated SQL…

It all worked at the end, and performance was amazing, but I can’t even convince myself to post code snippets – it looks ugly 🙂

Oh, and another part of the equation is, that in all those SELECT statement were selecting a really small portion of the data from each of the tables, so there was not that many total executions.

The moral of the story:  trust, but verify!

Leave a comment

Filed under Data management, Development and testing, SQL

Can you teach somebody to optimize?

I’ve got a lot of feedback on my last blogpost; one question was posted on another paltform where I’ve reblogged the same text, and this question was so interesting that I’ve decided to write a separate post in reply.

So tell me, Hettie, are these kind of discoveries being reported at the conferences, and would they later become a part of a common knowledgebase? Will this sort of technique be taught in colleges? Overall, in your opinion, are the nowadays CS graduates more knowledgeable in this area. And, by the way, is there any special knowledge which is necessary to be able to resolve problems like this, or it’s just a combination of basic knowledge plus experience?

Great question! I have being teaching optimization for almost 15 years, and in general my optimism on this subject is very modest. You can teach technique, you can show tons of examples, and still there is no guarantee that a student who has attended your class will be able to correctly identify a similar pattern in the real life and to recall what specific technique which was advised for a similar problem. But may be I am just not good in teaching this stuff.

It’s tempting to say, that all that matters are years of practice, but this is also not always the case, since you, and me, and all of us can recall the situations in which years of experience did not help. And to be honest, I do not want to wait for “years of experience”! I want to be able to hire a new grad, who can optimize at a reasonable level. And I am not saying I never met this kind of CS grads, but what I am saying is that whenever it happens, it is due to the person’s individual abilities and/or a desire to excel in this particular skill.

Let’s be clear: the kind of breakthrough as I’ve described in the previous post does not happen often. In fact, you might never get anything like this in your live. But there are still tons of optimizations which can be done almost every day.

I would still argue that knowing the basics is a key. For the thousandth time over – you need to know your math, your calc and your algebra in order to understand how databases work. You might not be aware of some sophisticated indexes, but you should be able to identify, looking at the query, what’s it about< whether it is "short" or "long". And if you try and try, and it does not become faster, you need all your convincing powers to convince yourself, that this query can be optimized. There should be a way.

Another big thing I am trying to teach – to write queries declaratively. This is an extremely challenging task, because most of the requirements are formulated in an imperative manner. So what I am trying to teach, is that even if you find something like “previous three occurrences”, or “return to the previous status”, or “compare the top two” in the requirements, you still can write a declarative statement, carefully using CASE, GROUP BY and sometimes window functions. And it’s amazing, how fast everything starts running right away. Most of the time being able to reduce the number of table scans to one does the trick, except of… well, except of the situation, when you should do exactly the opposite.

I didn’t figure out yet, how to teach to distinguish one from another :). But the more I think about it, there more it seems like that’s what signifies that somebody can optimize, and that skill is the most difficult to teach. Most optimization classes teach you how and when to use some indexes, and how to choose the join order, but they do not teach how to rewrite a query itself.

… For the original question: no, I do not think they teach it in school. But I am trying to promote this idea!

Leave a comment

Filed under Data management, SQL