About a week before PG Open I had one performance disaster on our new platform. I’ve developed a function a while ago, which is generating a pretty sophisticated query and then executes it. And this function worked very fast, both query generation and execution took about 100 ms. Now imagine how I felt, when two days after this thing went in production, I saw this function execution at the very top of the offenders list! And the average execution time was almost 2 sec! I’ve started investigating, and could not figure out, what’s wrong.
I had a full copy of production database locally on my computer, and is was performing perfectly fine. More interestingly, I found that even EXPLAIN (without ANALYZE) was executing close to a second.
So after I exhausted all of my ideas, e-mailed to my favorite DBA (no-no, do not take my wrong, I love all of you guys, but…:)). So I’ve emailed to my favorite DBA asking for help. And you know what he found?
He found, that the default_statistics_target parameter on this Postgres cluster was set to 10,000 instead of our usual 1,000, and that’s why it took so long to analyze. And consequently, since the query in the function was generated each time, it had to be analyzed each time, and it took so long for the whole function.
I was so happy that the solution turned to be so simple; and now this is fixed on all clusters.
And, speaking about parameters… do you remember this post? At the conference I talked to people about it, and found out, that it was enough to increase the join_collapse_limit parameter! It was set to 8, and I’ve experimented to see, how much I can increase it not to kill the optimizer performance.
The problem with this particular cluster is, that there are lots of small “vertically partitioned” tables, so the average join would easily have 10 tables involved, and the moment the number of tables in the join will exceed the join_collapse_limit, the optimizer would just join them in the order they are listed….
Thus one more problem was solved, and this was a perfect timing, because this very day one of my coworkers asked for my advice on his troubled query, and turned out that increasing this parameter locally to 12 would bring the query execution time from 2 min down to 13 ms 🙂