Monthly Archives: September 2013

From the PG Open 2013. Postgres Optimizer.

As many of you know, the was a PG Open 2013 conference in Chicago on September 16-18. This conference was one of the most positive experiences I had recently. The quality of the majority of talks and tutorials was excellent, the speakers were extremely competent, the presentation skills of the majority of speakers were great. There were close to none talks which you would describe like “this is one SQL I wrote” story – and I heard a lot of them on the previous PG Open conferences.

I may be imagining this, but I had an impression, that even a crowd looked differently – more professional 🙂

One of the talks I’ve enjoyed the most was “Query Planning Gone Wrong” by Robert Haas. Unfortunately, I still do not see the presentation on the conference website, but I hope it will appear there eventually. I was looking forward to this presentation, since optimization is my favorite subject in all the world of data management, and it was exciting to hear from a person who actually develops the Postgres optimizer, and to have an opportunity to ask him any questions I wanted!

So I sat on the first row, and each time Robert Haas asked the audience, if anybody had any questions, I was the first to jump in. I have an impression, that most people at the conference get to know me not because of my own presentation, but because of the questions I’ve asked Robert.  You may not see me on this picture, but I am on the right side of the first row!

And here is Robert Haas himself:

#1. Now, let me tell you, that the Postgres optimizer is one of the best optimizers I’ve ever worked with, and it can do quite amazing things, you would not even expect. For example, if you have a SELECT statement like this:


FROM <….>

WHERE cond1 



The majority of optimizers would honestly calculate a result set for cond1, then order this result set by X and then take the first N records. Because how can you take the first N one before you know the whole set?!

Postgres optimizer is often smart enough to take a different approach. If proper indexes are provided, and if N is relatively small, it would just read data in the order of X, check conditions cond1 and leave a record in the result set, if this condition if true – and do it until the total number of selected records reaches N. It’s even faster, when ORDER BY is not required.

The problem, Robert Haas was talking about is, that in some cases the optimizer take the wrong path, when, for example, all the records, which satisfy this condition, are located “at the end” of result set, and  we can’t tell it from statistics. If the total number of records, which satisfy condition, is low, choosing this path may significantly increase the execution time.

SInce Postgres does not have optimizer hints, it’s not much we can do about it, except of being aware.

#2. There is a well-known trick how to block the usage of an index. For example, it an attribute ID is indexed, you can write a condition

WHERE ID+0=<…>,

and then index by ID won’t be used (there are rare situations, when we need it). SO Robert was sayig, that this is an optimizer bug, and then I yelled: no-no, please do not fix it! SO he laughed and explained to the audience why I was screaming:)

#3.He talked about using NOT IN () versus NOT EXIST versus OUTER JOIN WHERE ID IS NULL. He strongly did not recommend using NOT IN, since there are cases when NULL is processed improperly, and strongly encouraged to use NOT EXIST.

Since I am not a big fan of the latter, I asked him what does optimizer do in terms of converting these conditions to the ANTI JOIN, and he said, that from the optimizer perspective they are identical, and both are treated as an ANTI-JOIN. Which was really good to know:)

#4. He talked about fixing the optimizer to be able to drop “excessive selection criteria”, and I jumped in again and asked him could he please not do this, because again, I have some stuff build on excessive criteria.

#5. Then he asked, what topic we want him to talk about,  and I told him I am interested in OR processing, and I’ve asked whether there are plans to make an optimizer to understand converting OR into UNION ALL, and he said – not yet, but this is on the radar….

There were more discussion topics, but those were the ones I was interested most, We also talked for a little bit after his presentation, and this was all so unbelievable – as if you are talking to the Postgres Optimizer itself! (I mean – the program:))


Filed under Data management