Tag Archives: databases

Can you teach somebody to optimize?

I’ve got a lot of feedback on my last blogpost; one question was posted on another paltform where I’ve reblogged the same text, and this question was so interesting that I’ve decided to write a separate post in reply.

So tell me, Hettie, are these kind of discoveries being reported at the conferences, and would they later become a part of a common knowledgebase? Will this sort of technique be taught in colleges? Overall, in your opinion, are the nowadays CS graduates more knowledgeable in this area. And, by the way, is there any special knowledge which is necessary to be able to resolve problems like this, or it’s just a combination of basic knowledge plus experience?

Great question! I have being teaching optimization for almost 15 years, and in general my optimism on this subject is very modest. You can teach technique, you can show tons of examples, and still there is no guarantee that a student who has attended your class will be able to correctly identify a similar pattern in the real life and to recall what specific technique which was advised for a similar problem. But may be I am just not good in teaching this stuff.

It’s tempting to say, that all that matters are years of practice, but this is also not always the case, since you, and me, and all of us can recall the situations in which years of experience did not help. And to be honest, I do not want to wait for “years of experience”! I want to be able to hire a new grad, who can optimize at a reasonable level. And I am not saying I never met this kind of CS grads, but what I am saying is that whenever it happens, it is due to the person’s individual abilities and/or a desire to excel in this particular skill.

Let’s be clear: the kind of breakthrough as I’ve described in the previous post does not happen often. In fact, you might never get anything like this in your live. But there are still tons of optimizations which can be done almost every day.

I would still argue that knowing the basics is a key. For the thousandth time over – you need to know your math, your calc and your algebra in order to understand how databases work. You might not be aware of some sophisticated indexes, but you should be able to identify, looking at the query, what’s it about< whether it is "short" or "long". And if you try and try, and it does not become faster, you need all your convincing powers to convince yourself, that this query can be optimized. There should be a way.

Another big thing I am trying to teach – to write queries declaratively. This is an extremely challenging task, because most of the requirements are formulated in an imperative manner. So what I am trying to teach, is that even if you find something like “previous three occurrences”, or “return to the previous status”, or “compare the top two” in the requirements, you still can write a declarative statement, carefully using CASE, GROUP BY and sometimes window functions. And it’s amazing, how fast everything starts running right away. Most of the time being able to reduce the number of table scans to one does the trick, except of… well, except of the situation, when you should do exactly the opposite.

I didn’t figure out yet, how to teach to distinguish one from another :). But the more I think about it, there more it seems like that’s what signifies that somebody can optimize, and that skill is the most difficult to teach. Most optimization classes teach you how and when to use some indexes, and how to choose the join order, but they do not teach how to rewrite a query itself.

… For the original question: no, I do not think they teach it in school. But I am trying to promote this idea!

Advertisements

Leave a comment

Filed under Data management, SQL

Can you teach somebody to optimize?

I’ve got a lot of feedback on my last blogpost; one question was posted on another paltform where I’ve reblogged the same text, and this question was so interesting that I’ve decided to write a separate post in reply.

So tell me, Hettie, are these kind of discoveries being reported at the conferences, and would they later become a part of a common knowledgebase? Will this sort of technique be taught in colleges? Overall, in your opinion, are the nowadays CS graduates more knowledgeable in this area. And, by the way, is there any special knowledge which is necessary to be able to resolve problems like this, or it’s just a combination of basic knowledge plus experience?

Great question! I have being teaching optimization for almost 15 years, and in general my optimism on this subject is very modest. You can teach technique, you can show tons of examples, and still there is no guarantee that a student who has attended your class will be able to correctly identify a similar pattern in the real life and to recall what specific technique which was advised for a similar problem. But may be I am just not good in teaching this stuff.

It’s tempting to say, that all that matters are years of practice, but this is also not always the case, since you, and me, and all of us can recall the situations in which years of experience did not help. And to be honest, I do not want to wait for “years of experience”! I want to be able to hire a new grad, who can optimize at a reasonable level. And I am not saying I never met this kind of CS grads, but what I am saying is that whenever it happens, it is due to the person’s individual abilities and/or a desire to excel in this particular skill.

Let’s be clear: the kind of breakthrough as I’ve described in the previous post does not happen often. In fact, you might never get anything like this in your live. But there are still tons of optimizations which can be done almost every day.

I would still argue that knowing the basics is a key. For the thousandth time over – you need to know your math, your calc and your algebra in order to understand how databases work. You might not be aware of some sophisticated indexes, but you should be able to identify, looking at the query, what’s it about< whether it is "short" or "long". And if you try and try, and it does not become faster, you need all your convincing powers to convince yourself, that this query can be optimized. There should be a way.

Another big thing I am trying to teach – to write queries declaratively. This is an extremely challenging task, because most of the requirements are formulated in an imperative manner. So what I am trying to teach, is that even if you find something like “previous three occurrences”, or “return to the previous status”, or “compare the top two” in the requirements, you still can write a declarative statement, carefully using CASE, GROUP BY and sometimes window functions. And it’s amazing, how fast everything starts running right away. Most of the time being able to reduce the number of table scans to one does the trick, except of… well, except of the situation, when you should do exactly the opposite.

I didn’t figure out yet, how to teach to distinguish one from another :). But the more I think about it, there more it seems like that’s what signifies that somebody can optimize, and that skill is the most difficult to teach. Most optimization classes teach you how and when to use some indexes, and how to choose the join order, but they do not teach how to rewrite a query itself.

… For the original question: no, I do not think they teach it in school. But I am trying to promote this idea!

Leave a comment

Filed under Data management, SQL

How to make optimization a priority

One of my usua/perennial rants is that many managers would tell you something like “business needs this functionality, we will optimize later”. And we all know what happens “later” – nothing. The most important thing I love about working at Braviant is that the tech team shares the same values, since I know quite well how often this is not the case, I appreciate it immensely. However, business might still think differently, and what’s good for business… well, that’s what we should do, since we all need to make money.

… After one of our recent releases our DBA noticed that one of the application functions slowed down significantly, specifically instead of executing for about 3 sec, it started to execute for 7-8 seconds. And immediately we all were alarmed. You might wonder – why? The 7 sec execution time is a good time, perfectly acceptable for the end users, especially because this function is executed not so often. Well… Didn’t I just say our tech team agree on priorities? We ll, we believe that the good user experience includes fast response time, and thereby our applications time out on 10 sec. And if a function’s average execution time is over 7 sec, the peak time can easily reach 10 sec!

I had to make a miracle… the sooner the better. Because, as you can imagine, I usually do not write bad queries. Well, most of the time:). Which meant I had to find some unusual optimization.

To tell the truth, I knew right away, why this function starter to perform slower. We added one new field to the distributed query (which was required by business stakeholders, or cause!), and to select this additional field I needed to join one more remote table. And all of a sudden, although all required indexes were on place, the optimizer would choose the full table scan. Of a huge table!

Not much I can do to explain the optimizer that they are wrong (are optimizers male or female, what do you think? 🙂 – they are male in my mother tongue, which explains a lot – they are always sure they know better how to make things work!). So I had to find a way to put this optimizer in such a situation, that there won’t be any way other than to utilize the indexes which were out there. First I thought it will be relatively easy – in all previous cases when a similar issue would occur, I would create a view on the remote node – but this time it didn’t work. I’ve conducted several experiments, and came to the conclusion that the only way to make it work is to implement one new technology, which I’ve played with a couple of months ago, but never implemented in production.

So…
– testing
– making sure it does what I want
– mapping through the foreign data wrapper and making sure it wors
– creating a new version of the function
– testing in lower environments
– QA
– implementing on staging and QA on both our products
– in production on both products

Total execution time of the process described above: around 3 hours.

Result: everybody happy, we’ve got one more groundbreaking technology, which I can guarantee nobody in the world is using (because documentation clearly says it’s impossible:)), and which we will be able to use in many other cases to reduce execution time. And all because we have our priorities right!

P.S. Are you anxious to find out what is this technology? Come to 2Q PgConf in Chicago this December!

2 Comments

Filed under Companies, Data management, Development and testing, SQL

The 2Q PGCONF program just announced!

Hi Chicago! I am extremely excited to share with you the program of the upcoming 2Q PGCONF! After the last year’s success the organizers decided to add a whole day f training to the conference.

So, here is what you need to know: the training schedule is here. The talks schedule is here. Exciting news for my company – two talks from Braviant Holdings were accepted! I think it tells a lot about how our company foster innovation and invest in employees professional development.

And now the important dates to remember: the training is on Tue December 4, and the talks are on Wed December 5, and October 14 is the last day of the early bird registration!!!

And finally the last thing. The Chicago PostgreSQL User Group meetup will be next week, Wed Oct 17, featuring Shaun Thomas. Please come and have a sneak peek of the conference!

Leave a comment

Filed under Companies, events, news, talks

What’s happening at PG Open 2018

Lot’s of interesting talks! The fact that this year I am not alone at the conference, that I have a team – helps a lot! We coordinate between the three of us, which talks we want to attend. Otherwise I don’t know how I would manage! Yes, you can read about the new features of Postgres 11 online, but it is way more interesting to listen to the people who made it happen. Also, I am always looking for people who are solving similar problems as we do, like dealing with incorporating time into data, improving the speed of the data load, and finding efficient ways to connect to external systems.

I was able to catch up with many people whom I knew from the previous conferences, and met lots of new interesting people. And there is still a whole day of the conference ahead, I will try again to listen to as many of the presentations as I can.

Tha opening of the conference

Leave a comment

Filed under events, talks

How to deal with “First you write, then you optimize”

Recently I’ve made yet another attempt to optimize several quite inefficient report, which had been out there for so long, that nobody remembers, who have written them, and who have speced them out. I did not finish this task, even with the help of my team we just didn’t have enough time to spare on improving something, which “works”. And when I was thinking about it over the weekend, I thought that different people may have different definition of “works”, when it comes to SQL writing.

I always say – it’s easier to write the code correctly on the first place, that to optimize reports which are written inefficiently, but way too often I am getting a response from the stakeholders – we need it now, you can optimize it later. And then later never comes.

This is so obvious, that I would not spend time writing about this yet another 101th time. However this time my thoughts have taken a different path. I thought about why I was unable to finish this optimization, why I didn’t have enough time. The reason was, that something else which I was writing have taken way more time than I originally planned. And the reason why it took so long is that I find myself unable to write, even as a first draft, the code which I know for sure is suboptimal. No matter how many people will tell me that “it does not have to be perfect”. And then – yes, it takes longer.

Now I think that when I teach an optimization class, and when I am showing to my audience some cool technique, my thought process is that everybody should realize how much better this technique is, and than use “that and only that” approach, because how-else-this-is-so-obviously-better. And as for my audience, people think – ok. It’s cool. Good to know. And do not change there code -writing habits…

Once again, I might have been just ignorant, and may be everybody knew it all along, but it was a revelation for me: if I want to teach people good coding habits, if I really want to avoid this situation “first write, then optimize”, not only that I need to show people how to write a coode code. And not only I need to show how not to write. But I also need to teach zero-tolerance to the bad coding. So that people’s minds won’t even wander in this direction:). If it is not written the right way, it is not done.

I am pretty sure nobody would like this idea, but can I at least have a dream :)?

3 Comments

Filed under SQL

2Q PGConf 2018 in Chicago – important update!

Dear friends, colleagues and followers! Do you remember, how awesome the first 2QPGConf in Chicago was? I really hope that the second 2QPGConf will be even better!

So what is my important update? Here: the talks submission deadline has been extended to September 12.

I think that this fact (the deadline have being moved to after the Labor Day and after the PG Open) will allow way more people to submit their talk proposals. I truly believe that you should not be “somebody” to submit a talk proposal. Let’s be honest: there are only “so many” new features in each of the Postgres releases, no matter how new and exciting they are. But each and single practical implementation is unique. Each and single faces its own challenges. And yes, the whole purpose of the knowledge exchange on the conferences is not to repeat other mistakes, to learn from the best practices, which might work… or not 🙂

In any case, here is my message: if you are doing something interesting with Postgres, please consider a talk submittion! To all the benefits of the local conferences I’ve listed in my post a year ago, I will just add one thing: if your talk is accepted, you attend the conference for free 🙂

So… see you there?!

Leave a comment

Filed under events, SQL, talks