Category Archives: Development and testing

Funny thing happened…

I have to share this :). I had a ticket to develop one operational report. Not only were the report requirements complex, but the report was also very difficult to debug. We just started to collect the data required for this report, and we do not have enough of it to cover all potential issues. Moreover, since the data volumes are so small, the issues are resolved fast, so in two days I never had a chance to catch a single case of exception. Until this morning, when a thought suddenly came to my mind, and I asked myself: Hettie, why in the world are you waiting for an exception to happen in real time?! All your tables are bitemporal, so you can time-travel to any moment of the past, including the moments when exceptions occurred!
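To illustrate the idea, here is a minimal sketch of such a time-travel query, assuming a bitemporal table with two range columns; the table name, column names, and the timestamp are made up for the example, not the actual schema:

-- Minimal sketch: "effective" is when the fact was true, "asserted" is when the system knew it.
-- All names and the timestamp are hypothetical.
SELECT *
FROM loan_status
WHERE effective @> timestamptz '2018-10-15 09:30:00'  -- what was true at that moment
  AND asserted  @> timestamptz '2018-10-15 09:30:00'; -- as the system knew it at that moment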

It’s funny and not funny that it took me two days to figure this out! Especially because I was the one who introduced bitemporality!

Worked as expected 🙂



Filed under Data management, Development and testing

I knew I wouldn’t like this feature! About parallel execution

Many years ago, when Postgres 9.6 was still in the making, my coworker said to me with excitement: Hettie, Postgres will now have the ability to run queries in parallel! There can be only four parallel processes, but still, isn’t it nice?!

And I remember exactly what I replied: no, I do not like this feature one bit! You know why? Because executing queries in parallel rarely solves performance problems. In fact, if four parallel workers can solve your performance problems, they were not really problems! It can do more harm than good, because it can mask some real performance problems for a while, and then they will turn out to be even more severe.

What happened then: I started a new chapter of my life at Braviant, where we had Postgres 9.5, and then for almost 3 years it was like I never had time to stop and upgrade :). But now I have a team, so we finally planned the upgrade, and since we were already four versions behind, we planned to upgrade to 9.6 and immediately after to PG 10.

We started with our Data Warehouse. First, in the staging environment, we tested and observed the execution for some time. Then on the mirroring instance, and only then on production.

And then it started! Seriously, out of all data refreshes this one is the only one which is important for the OLTP instance, because it sends data to one of our client interfaces. It started to behave inconsistently. Sometimes it would be just fine. Other times, instead of about 50 seconds it would run for an hour, and probably would not finish if we didn’t kill it. Yes, it was obvious that something had changed in the execution plan after the upgrade. But what?! At first glance the execution plan looked the same: all HASH JOINs, which is what you would expect when you join tables with no restrictive conditions.

But it was still painfully slow. What was more puzzling: I could take the JOIN to any table out of the equation, and performance would still be unpredictable. After several dozen attempts to make things run, I decided to take a closer look at the execution plan. And you know what I saw?! Yes, parallel execution! Four joins were parallelized, which resulted in a really horrible execution time. After the issue was found, the only thing left was to figure out how to turn this feature off 🙂
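For the record, here is a minimal sketch of both the diagnosis and the fix, with hypothetical object names. In 9.6 parallel query was off by default (max_parallel_workers_per_gather = 0), while PG 10 enables two workers per Gather by default, which is why the feature can show up unannounced after an upgrade.

-- 1. The giveaway in the plan: the offending joins sit under Gather nodes
--    as "Parallel Hash Join" / "Parallel Seq Scan". (Hypothetical tables.)
EXPLAIN (ANALYZE, BUFFERS)
SELECT l.loan_id, p.person_id, s.status
FROM loans l
JOIN persons p       ON p.person_id = l.person_id
JOIN loan_statuses s ON s.loan_id = l.loan_id;

-- 2. Turn parallel query off for the session that runs the refresh...
SET max_parallel_workers_per_gather = 0;

-- 3. ...or pin the setting to the specific function, so nothing else is affected
--    (the function name is hypothetical).
ALTER FUNCTION refresh_client_feed() SET max_parallel_workers_per_gather = 0;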


Filed under Data management, Development and testing, SQL

One more time on the state of optimization

I just have to show some of the execution time graphs, and how they have changed after the optimized versions of the respective functions were deployed:

I know that many people, looking at the second image, are wondering why I am striving to optimize things which are already running super-fast.

It’s not because I am trying to demonstrate my superpowers; it’s because I know for a fact that, with the database size we currently have, that is the right execution time. What does it mean? It means that if the execution time is more than that, it indicates a wrong execution plan.

All these optimizations have been performed on our OLTP database, which means that all of these queries are “small queries”, retrieving a relatively small number of records. This implies that the appropriate indexes should be used, and that the execution plans should show the NESTED LOOP join algorithm. When I see an execution time of 500 ms, it tells me that there is at least one full table scan inside. Which, in turn, means that the execution time will keep increasing as the data volumes grow. Which is not good if we are building a scalable system.
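A sketch of the kind of check this implies (table and column names are hypothetical): for an OLTP query like this, the plan should be a Nested Loop driven by index scans; a Seq Scan on either table means the plan has gone wrong and the runtime will grow together with the data.

-- Hypothetical example: the expected plan is a Nested Loop with Index Scans,
-- not a Seq Scan / Hash Join.
EXPLAIN (ANALYZE, BUFFERS)
SELECT l.loan_id, l.status, l.balance
FROM persons p
JOIN loans l ON l.person_id = p.person_id
WHERE p.person_id = 123456;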

Another important thing to consider is that all these small queries cannot be “parallelized” to speed up the execution. We are in an OLTP environment, not OLAP. I also know that I can’t rely on switching to a larger AWS instance, because 1) this process gets out of control very fast and 2) it does not help. Seeing execution times like the ones on both of these pictures, so short that you “can’t see them”, just proves that the functions are performing as expected.


Filed under Data management, Development and testing, SQL

About one very controversial optimization

I meant to write about this optimization story for a while, not because somebody can use it as an example, but rather as a counter-example, a case of “what I’ve taught you so far is all completely wrong and not working”.

It all started with the story which I described in this blog post: the problem was that indexes on the remote site were not being used, and in order to make things work I had to create a remote function which would “inject” the constants into a remotely executed query and then map the results. A couple of weeks later I had to face a similar problem, which had also manifested itself only in the production environment. But this time around the performance degradation was so severe that we had to roll back the changes.

First I thought that I would be able to deal with the similar problem in a similar way, but it turned out that was not the case. The SQL involving remote tables was a little bit more complex, and the function didn’t help. I realized that I needed to explicitly pass some constants to the dynamic query, which meant that I had to commit a crime against relational theory: breaking one perfectly fine SQL statement into two, and passing the result of the first selection to the second statement. This performed better, but still not ideally. I continued to follow the same path. It became better. I repeated it one more time. And one more time.

In the end it was a textbook example of how not to write SQL. My own lectures were always filled with these “how it will not perform” examples: first you select a person, then using the person id you select a position, then… But what I did to optimize the performance of my function was the same thing almost to the letter. I needed to select all loans for a specific person and some of the loan-related details. And instead of writing one SQL statement I ended up doing the following:

  • having a person_id, select all accounts
  • generate an SQL statement with all these user accounts embedded as constants to retrieve the loans
  • generate a set of SQL statements which would access remote information for each of the loans individually
  • and actually there was more in the middle

A part of the problem is that in PostgreSQL functions are not really stored in a truly “compiled” way; they should rather be viewed as “stored sequences of actions”. And parametrized statements, unlike those in some other systems, are not always optimized properly. That’s why in most cases I build dynamic SQL inside the functions. But in the case I am describing here, things were even worse. Even when I was executing an SQL statement for a specific person id, the indexes on user account and loan were not really utilized (at least not always). So in order for me to push through the usage of indexes, I had to explicitly pass the values to the generated SQL…
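Not the actual code (which, as I admit below, is too ugly to post), but a highly simplified sketch of the pattern, with hypothetical names; the real function broke the work into even more steps:

CREATE OR REPLACE FUNCTION person_loan_details(p_person_id int)
RETURNS TABLE (loan_id int, detail text)
LANGUAGE plpgsql AS
$$
DECLARE
    v_account_ids int[];
BEGIN
    -- Step 1: select all accounts for the person.
    SELECT array_agg(account_id)
      INTO v_account_ids
      FROM user_accounts
     WHERE person_id = p_person_id;

    -- Step 2: build the next statement with the account ids embedded as
    -- constants, so the planner (local and remote) sees literal values
    -- and can use the indexes instead of a generic parametrized plan.
    RETURN QUERY EXECUTE format(
        'SELECT l.loan_id, d.detail
           FROM loans l
           JOIN remote_loan_details d ON d.loan_id = l.loan_id
          WHERE l.account_id = ANY (%L::int[])',
        v_account_ids);
END;
$$;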

It all worked in the end, and the performance was amazing, but I can’t even convince myself to post the actual code – it looks ugly 🙂

Oh, and another part of the equation is that all those SELECT statements were selecting a really small portion of the data from each of the tables, so there were not that many total executions.

The moral of the story:  trust, but verify!


Filed under Data management, Development and testing, SQL

How to make optimization a priority

One of my usual/perennial rants is that many managers will tell you something like “business needs this functionality, we will optimize later”. And we all know what happens “later” – nothing. The most important thing I love about working at Braviant is that the tech team shares the same values; since I know quite well how often this is not the case, I appreciate it immensely. However, the business might still think differently, and what’s good for the business… well, that’s what we should do, since we all need to make money.

… After one of our recent releases our DBA noticed that one of the application functions had slowed down significantly: instead of executing for about 3 seconds, it started to execute for 7-8 seconds. And immediately we all were alarmed. You might wonder why. A 7-second execution time is a good time, perfectly acceptable for the end users, especially because this function is not executed that often. Well… Didn’t I just say our tech team agrees on priorities? Well, we believe that a good user experience includes fast response times, and therefore our applications time out at 10 seconds. And if a function’s average execution time is over 7 seconds, the peak time can easily reach 10 seconds!

I had to work a miracle… the sooner the better. Because, as you can imagine, I usually do not write bad queries. Well, most of the time :). Which meant I had to find some unusual optimization.

To tell the truth, I knew right away why this function had started to perform slower. We added one new field to the distributed query (which was required by business stakeholders, of course!), and to select this additional field I needed to join one more remote table. And all of a sudden, although all required indexes were in place, the optimizer would choose a full table scan. Of a huge table!

There is not much I can do to explain to the optimizer that it is wrong (are optimizers male or female, what do you think? 🙂 – they are male in my mother tongue, which explains a lot – they are always sure they know better how to make things work!). So I had to find a way to put this optimizer in such a situation that there wouldn’t be any way other than to utilize the indexes which were out there. First I thought it would be relatively easy – in all previous cases when a similar issue occurred, I would create a view on the remote node – but this time it didn’t work. I conducted several experiments, and came to the conclusion that the only way to make it work was to implement one new technology, which I had played with a couple of months ago, but never implemented in production.
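For reference, the workaround that had saved me in all previous cases looks roughly like this (hypothetical names, assuming postgres_fdw); this time even that was not enough:

-- On the remote node: hide the extra join behind a view, so it is planned
-- and indexed where the data lives.
CREATE VIEW loan_with_new_field AS
SELECT l.loan_id, l.person_id, x.new_field
FROM loans l
JOIN loan_extras x ON x.loan_id = l.loan_id;

-- On the local node: map the view through the foreign data wrapper
-- just like a regular table.
CREATE FOREIGN TABLE loan_with_new_field (
    loan_id   int,
    person_id int,
    new_field text
)
SERVER remote_oltp
OPTIONS (schema_name 'public', table_name 'loan_with_new_field');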

So…
– testing
– making sure it does what I want
– mapping through the foreign data wrapper and making sure it works
– creating a new version of the function
– testing in lower environments
– QA
– implementing on staging and QA on both our products
– in production on both products

Total execution time of the process described above: around 3 hours.

Result: everybody is happy, we’ve got one more groundbreaking technology, which I can guarantee nobody else in the world is using (because the documentation clearly says it’s impossible :)), and which we will be able to use in many other cases to reduce execution time. And all because we have our priorities right!

P.S. Are you anxious to find out what this technology is? Come to 2Q PgConf in Chicago this December!


Filed under Companies, Data management, Development and testing, SQL

The magic of the right answer

When you are in school, you think that the most difficult questions are those on the final exam. I remember how I anticipated this happy future when I would never ever have to take any more tests! Sure…

Besides the fact that later I went on to get my PhD, and voluntarily subjected myself to a new battery of tests and exams, not to mention the actual dissertation… besides all this, the most difficult questions are not on the test. They just appear. Show up. In the course of your daily work, when you take on a new project, or when you firmly believe that your project is almost done… and then comes a problem. This problem. And the cost of not coming up with a correct answer is way higher than getting a B instead of an A on your final.

Several days ago, after a couple of back-to-back meetings, I got this feeling that everything was falling apart. That my original design was only good in my head, and that it did not survive the meeting with reality. And it felt like I had promised more than I could deliver, and that more than one project would need to be pushed back.

I left work a little bit earlier and biked from the train station home without any audiobook in my earbuds. Then I called my mother, so that this would be a checked box for that day. And then I sat down and started thinking. It might sound silly, but I realized that the only thing I needed was to focus and to analyze all the inputs I already had. I was not staring at the code. I was not even staring at the database schema. Not writing. Not drawing. Just imagining all of the things I needed to fit into one picture in my head. “Imagine you are a database” :).

And then that was it! Before I could even analyze all the dependencies, I knew it was right! I started going through all the different scenarios in my head: yes, this will work! And in this case as well! And in the future, when we won’t need this particular functionality, it will work as well! And it won’t require any additional data transformation!

The next thing I did was set up a meeting with my co-workers so that they could criticize my ideas and find the cases when “this” won’t work. (Fast-forward to the next day – we actually found more cases when “this” will work!)

And thinking about my 35 years in this profession – those are the best moments, the peaks of satisfaction. This very moment, when you can see the solution, and you know it’s the right one right away. It may be a super-cool optimization, when you improve a query execution time 500 times. It may be a new design, it may be a new way of writing functions. But this moment when you know that what you’ve just come up with is right – that’s the real magic.

The next best thing is to call my husband and tell him what a cool thing I’ve just invented, and then run around the office with my laptop open, showing this cool thing to everybody 🙂


Filed under Development and testing

There is nothing more permanent than temporary solutions!

I like this saying, because it always reminds me of the danger of doing something “just for now”, and I often cite it to my coworkers when I argue for a better, more stable solution. However, sometimes I revert to this kind of tactic myself, and then…

Yesterday one of our external service providers was moving our data to another server, and as a result I had to recreate the foreign servers associated with it and cascade-recreate all the foreign tables. I’d say I am reasonably organized: all my data definitions are stored in a GitHub repository, so that in an event like yesterday’s I just need to rerun a DDL script.
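Roughly what such a rerunnable DDL script looks like (a sketch assuming postgres_fdw; the server name, host, and table list are made up):

DROP SERVER IF EXISTS provider_srv CASCADE;  -- cascade-drops the dependent foreign tables

CREATE SERVER provider_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'new-host.example.com', port '5432', dbname 'provider');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER provider_srv
    OPTIONS (user 'app_user', password '***');

IMPORT FOREIGN SCHEMA public
    LIMIT TO (loans, payments)
    FROM SERVER provider_srv
    INTO provider_schema;  -- local schema assumed to already exist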

Which I did. And half an hour later I started one of my daily jobs, which sometimes crashes, which is why I always keep a close eye on it. It crashed this time as well, but for an unexpected reason: one of the foreign tables used could not find its source… The name didn’t look familiar to me, and it was definitely not created in the morning. I searched GitHub with no luck, then searched the “suspicious” directories on my computer. Finally, I started a global search on my computer and iCloud. And I found the missing definition! Guess what the name of the file it was stored in was? – temp_fix.sql

🙂


Filed under Data management, Development and testing, SQL