A Word About JOIN_COLLAPSE_LIMIT

I remember when I first learned about the JOIN_COLLAPSE_LIMIT parameter in Postgres, and I remember the unfortunate circumstances of that encounter. Specifically, it was set to eight, which caused a relatively simple SELECT to be executed in a completely wrong way.

Since I didn’t know anything about that parameter at the time, it looked to me as if the order of joins in a SELECT statement in Postgres mattered, which sounded very wrong and the opposite of what the documentation said.

But then I learned about JOIN_COLLAPSE_LIMIT, and suddenly all the execution plans magically fixed themselves after I changed it to 12. I ran a lot of experiments back then trying to find an optimal value for that parameter, and at the end of the day I concluded that no matter how many tables I am joining, raising it above 12 does not help. Either the execution plan does not improve (you have 16 tables in the join, you think that changing the limit to 16 will fix the query, and it does not), or, if you raise it above 20, planning takes too much time.
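For readers who want to experiment themselves, here is a minimal sketch of the kind of session I mean. The table names are made up for illustration; the only real pieces are the parameter (whose default is 8) and the EXPLAIN workflow:

```sql
-- join_collapse_limit caps how many FROM/JOIN items the planner will
-- reorder while searching for the cheapest join order; the default is 8.
SHOW join_collapse_limit;

-- Raise it for the current session only, so a query joining ~12 tables
-- can still have its join order fully optimized.
SET join_collapse_limit = 12;

-- Compare the plan before and after changing the setting
-- (orders, order_items, items, customers are hypothetical tables).
EXPLAIN
SELECT o.order_id, c.customer_name, i.item_name
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
JOIN items i        ON i.item_id = oi.item_id
JOIN customers c    ON c.customer_id = o.customer_id
WHERE c.customer_name = 'ACME';
```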

However, a couple of weeks ago, somebody asked for my help with query optimization. The problem was that it was not a single query but rather a query framework: a person would run several queries in sequence, each time adding or removing search criteria that could belong to any of the tables involved in the SELECT.

Yes, if it were me, I would write the SELECT statement differently each time, but that was not something I could suggest to the person who asked for my help; it would be too complicated. I looked at the query, estimated the number of joins, and suggested that at the beginning of each session they execute

SET JOIN_COLLAPSE_LIMIT = 16;

And it worked!!!! I do not know what’s so funny about the fact that it worked as expected… but I was really amused 🙂


Filed under SQL

2 responses to “A Word About JOIN_COLLAPSE_LIMIT”

  1. Lukass Franklin

I have had to increase JOIN_COLLAPSE_LIMIT to 30 on my main reporting database to deal with the really ugly queries the users write, sometimes with up to 50 joins! I have not noticed any meaningful slowdown in planning, but then that server has relatively low query concurrency, with most queries being very complex.

  2. Hettie D.

I think it largely depends on the hardware characteristics :). In my case (approx 5 years ago) it was set to 30, and the result was that the explain plan was hanging and not producing any output. When users are doing 50+ joins, nothing can be done, if you ask me :). If we are talking about “me,” not an abstract user, I would set the limit to 1 (the lowest valid value, which makes the planner keep the written join order) and write a query with the correct order of joins. That’s what I did in the above case.

I did notice a visible difference in planning time, which is why I stopped experimenting. The reason I wanted to write about this case is that it’s probably the first time in my life that it worked like magic!
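As a sketch of the “write the joins in the correct order yourself” approach mentioned above: in Postgres the lowest valid value of join_collapse_limit is 1, and at that setting the planner stops reordering explicit JOIN clauses. The table names below are hypothetical:

```sql
-- With join_collapse_limit = 1, explicit JOINs are not reordered:
-- tables are joined exactly in the order they are written.
SET join_collapse_limit = 1;

-- Hypothetical query: the author-chosen join order is now preserved
-- by the planner instead of being searched for.
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c   ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id = o.order_id;
```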
