If you didn’t have a chance to listed to the live broadcast lat Friday – enojoy the recording:
This Friday Jan 4 I will be on the Tech Scene Chicago Radio Broadcast talking about Chicago Postgres User Group. It will be aired live at 105.5 FM at 1PM, and will be replayed next twp Saturdays at 9AM and next Friday Jan 11 at 1PM. It will also streamed on lumpenradio.com. I have no idea yet what the questions are going to be, but my goal is to promote PostgreSQL (Chicago-wide and world-wide:))
I meant to write about this optimization story for a while, not because somebody can use it as an example, but rather as a counter-example, a case of “what I’ve tough you so far is all completely wrong and not working”.
It all have started with the story which I’ve described in this blog post: the problem was, that indexes on the remote site were not being used, and in order to make things work I had to create a remote function, which would “inject” the constants into a remotely executed query and then too map the results. A couple of weeks later I had to face a similar problem which had also manifested itself only in production environment. But that time around the performance degradation was so severe, that we had to rollback the changes.
First I thought that I will be able to deal with a similar problem in a similar way, but turned out that was not the case. The SQL involving remote tables was a little bit more complex, and the function didn’t help. I’ve realized that I needed to explicitly pass some constants to the dynamic query, which meant that I had to perform a crime against relational theory, braking one perfectly fine SQL into two, and passing the result of the first selection to the second statement. This performed better, but still not ideal. I continued to follow the same path. It became better. I’ve repeated it one more time. And one more time.
At the end it was a textbook example of how not to write SQL. My own lectures were always filled with this “how it will not perform” examples: first you select a person, then using the person id you select position, then… But what I did to optimize performance my function was the same thing almost to the letter. I needed to select all loans for a specific person and some of the loan-related details. And instead of writing one SQL I ended up doing the following:
- having a person_id, select all accounts
- generate a SQL with all these user accounts embedded as constants to retrieve loans
- generate a set of SQL statements, which would access remote information for each of the loans individually
- and actually there was more in the middle
A part of the problem is, that in PostgreSQL functions are not really stored in a truly “compiled” way, they should be rather viewed as “stored sequences of actions”. And parametrized statements, unlike those in different systems, are not optimized properly. That’s why in most of the cases I build dynamic SQL inside of the functions. But in the case I am describing here, things were even worse. Even when I was executing a SQL statement for a specific person id, the indexes on user account and loan where not really utilized (at least not always). So in order for me to push through the usage of indexed, I had to explicitly pass the values to the generated SQL…
It all worked at the end, and performance was amazing, but I can’t even convince myself to post code snippets – it looks ugly 🙂
Oh, and another part of the equation is, that in all those SELECT statement were selecting a really small portion of the data from each of the tables, so there was not that many total executions.
The moral of the story: trust, but verify!
Better later, than never: by popular demand here are the videos of both talks from Braviant Holdings, presented at 2Q PG Conf. Enjoy 🙂
It’s the second time that 2QPGConf was coming to Chicago, and this time around it was even better, than last year! This year it was a two-days conference, with a day of training preceding the actual conference day.
I’ve been asked by many people, what exactly did I like about the conference, so I thought I will put it in writing.
First, I really like an idea of regional conferences, I think they play a very important role in giving the users direct access to the leading specialists, an opportunity to know what other people are doing, and all of it at a reasonable cost. Not all the companies are as good as mine to have a budget for professional development for each employee, and many are reluctant to let their employees being away from work for a week. Or, as I was told, sometimes companies won’t even allow their employees to attend the conference unless they have an accepted talk. And it is so much easier to convince your manager to let you go for a day :).
Secondly, I really liked the talks, I think it was an excellent selection, both on technical level and on variety of topics and areas of use. These talks sparked a lot of conversations, and lots of new contacts have been made. Actually I think a couple of extra hours of networking would be great.
For me the most important thing was that I was able to promote Chicago PostgreSQL User group. I am really happy that our group is now considered to be one of the most successful in the country, and that people turn to me for advice on how to start (or re-energize) the groups in other cities. I hope that my work helps to promote Postgres as a database of choice. Actually in his opening keynote Tom Kincaid was talking about Postgres becoming more and more popular among developers, and I think that people are just starting to realise all it’s potential!
I really do. I have no idea, why they were even introduced into the SQL Language – for the record, there are no views in the relational theory. There is no justification for views to exist.
I suspect, that a person who’ve introduced the views for the first time just thought about themselves, and was lazy to repeat the same SQL over and over again, And then they never gave a thought, that somebody else may start using the same view.
The first reason I hate views it, actually comes out of the only legitimate reason to use the at all (yea, I am contradicting myself). I believe, the only legitimate reason to have a view is to use it as a canned report. So you come up with a perfect SQL for some requirements, and then you give it to the end user in form of a view, and say – use this! Just SELECT * FROM this-view.
But the next thing happening is (and that is my reason number one to hate views) that the end users take the words “as good as a table” literally and start actually using your perfect view as a table, like adding some conditions, or worse – joining it with other tables. Actually, even adding a condition may be damaging enough if your views contains a GROUP BY clause.
Why this is so bad? Because in general Postgres (and most of other databases) does not guarantee anything regarding how the execution plan with the view will go. It might or might not execute the view first. It may or may not push down additional selection criteria. In general, the performance becomes unpredictable.
The second reason is the continuation of the first. The end user does not know what’s inside the view (as they should not!), and often would make decision on which view to use literally by it’s name!
You might have guessed by now, that there is some immediate reason I am writing this post. Here is it: a couple of weeks ago I finally found some time to address the super-poor performance of one of the views used by our marketing team. Since I’ve written the first version of this view about 2.5 years ago, I thought I should be the one to fix it now.
I’ve started to read the code, looking at what have changed in the past 2+ years. Some of the conditions looked questionable to me, and I’ve scheduled a meeting with the primary business stakeholder to go over those conditions.
You won’t believe what I found out! None, NONE of the complicated conditions from the original report were needed! None. Instead, only a certain subset of the old output was needed, with a couple of extra fields. So all the complex calculations from the original view – they were performed, but the results were not needed. Instead, there was an additional filtering, since less than half of the result set was used. Well, the name of the view was still reflecting its purpose, so the marketing department never had a second thought 🙂
I can’t even call this an optimization! I’ve just crossed out 805 of the old code and added a new condition. A second instead of 15 minutes! So now we are good for now… till somebody will try to use this view for something else 🙂
A couple of months ago, when I was super-stressed about not being able to do any work at work, my daughter has recommended me to read a book Deep Work: Rules for Focused Success in a Distracted World
There were several interesting ideas which I liked, and several helpful techniques which came just in time for me to complete a couple of tasks, which required to be 100% focused on them. The idea that you need to isolate yourself from distractions to be able to accomplish a serious task is trivial, but hard to follow :). When starting those “deep sessions” for the first time, I’ve realized that I’ve been doing a similar thing long time ago, when I was a single working mother with two small children and another in grade school, and in order to be competitive I had to squeeze the eight hours worth of work into four hours. I forgot since then :), but now, when reading this book, I am completely agreeing with the author that people rarely work productively for 8 hours straight, and that the bulk of productivity can be compressed into 4 hours. So – yes, I am very happy to be reminded how exactly this works.
What I did not like was a general assumption, that everything you are doing in life should be evaluated from the standpoint of whether it helps you to achieve your main goal or not. If this sounds too abstract, I can give a very specific examples. The author states that basically all social networks are evil :), and there is no justification of using them, if you have some important goals in your life. Except of if this goal is to become popular on the internet :))
My take on this topic is that yes, sometimes you need to limit your participation in the social networks, because if you react at each and single “like”, you will accomplish nothing. I usually do NOT reply to Livejournal comments when I am in the office or working on something important from home (not because somebody is watching me, but because it is indeed distracting :)), but I will definitely allocate at least an hour a day to these activities, because there are many people with whom I interact exclusively on the social media (mostly for geographical reasons). And it is not about being or not being popular, it’s about not giving away some parts of me, which are important. It’s like no matter how busy I am, I will volunteer for The Night Ministry at least twice a month. My Big Goals are definitely important, but it is equally important to be a person I am 🙂