Tag Archives: Braviant

My Team Anniversary

Exactly one year ago, my team became three times bigger. So today is not only the first anniversary for my co-workers Sudheer and Tejas but also the first anniversary of the Braviant database team.

Thank you guys for doing an outstanding job!

img_3088

Advertisements

Leave a comment

Filed under People, Companies, Team and teamwork

My talk: What is a database?

I’ve presented this talk a week ago as a part of “Braviant Talks”, where people from different departments of our company talk about what their department is doing. It intended to be as non-technical as possible, which I think was achieved, at least to some extent, and … I just like how it turned out:). Enjoy 🙂

Leave a comment

Filed under Companies, events, talks

Three years with Braviant Holdings

LinkedIn has announced it a little bit earlier, but my actual 3-year anniversary was yesterday, April 18.  Three years ago I wrote in my journal:

Real people. The calm state of mind I haven’t had for so long, I can’t even remember. Meaningful conversations. I can talk about important things, and people listen My opinion matters. I am happy.

Three years later I can repeat all of the above.

 

 

 

Leave a comment

Filed under Companies, events, People

I knew I won’t like this feature! About the parallel execution

Many years ago, when Postgres 9.6 was still in making, my coworker said to me with excitement: Hettie, Postgres will now have the ability to run queries in parallel! There can be only four parallel processes, but still, isn’t in nice?!

And I remember exactly what I’ve replied: no, I do not like this feature a bit! You know why? Because executing queries in parallel would rarely solve performance problems. In fact, if four parallel workers would solve your performance problems, they were not really problems! It can do more harm than good, because it can mask some real performance problems for a while, and then they will turn just to be more severe.

What happened then – I’ve started a new chapter of my life at Braviant, and we had Postgres 9.5 then, and then for almost 3 years it was like I never had time to stop and upgrade :). But now I have a team, so we’ve finally planned the upgrade, and since we were already four versions behind, we planned upgrade to 9.6 and immediately to PG 10.

We’ve started from our Data Warehouse. First – in the staging environment, we we’ve tested, and observed the execution for some time. And then on the mirroring instance, and only then – on production.

And then it started! Seriously, out of all data refreshes this one is only one, which is important for the OLTP instance, because it sends data to one of our client interfaces. It started to behave inconsistently, Sometimes it would be just fine. Other times, instead of about 50 seconds it has been running for an hour, and probably won’t finish if we won’t kill it. Yes, it was obvious, that something did change in the execution plan after the upgrade. But what?! From the first glance the execution plan looked the same, all HASH JOINS, which you would expect, when you join tables with no restrictive conditions.

But it was still painfully slow. What was more puzzling – I could take out of the equation JOIN to any table, and performance would be unpredictable. After several dozen of attempts to make things run decided to take a closer look at the execution plan. And you know what I saw?! Yes, parallel execution/! Four joins were parallelized, which resulted in the execution time been really horrible. After the issue was found, the only thing left was to figure out, how to turn this feature off 🙂

Leave a comment

Filed under Data management, Development and testing, SQL

One more time on the state of optimization

I just have to show some of the execution time graphs, and how the have changed after the optimized versions of the respective functions were deployed:

I know that many people are wondering looking at the second image, why I am striving to optimize things which are already running super-fast?

It’s not because I am trying to demonstrate my superpowers, it’s because i know for the fact, that with the database size we currently have, that is the right execution time. What does it mean? it means, that if the execution time is more than that, it indicates the wrong execution plan.

All these optimizations have been performed on our OLTP database, which means that all of these queries are “small queries”, retrieving a relatively small number of records. Which implies, that the appropriate indexes should be used, and that the execution plans should show the NESTED LOOP join algorithm. When I see the execution time of 500 ms, it tells me that there is at least one full table scan inside. Which in turn, means, that the execution time will be increasing, when the data volumes will be growing. Which is not good, if we are building a scalable system.

Another important thing to consider is that all these small queries cannot be “parallelized” to speed up the execution. We are in the OLTP environment, not OLAP. I know that I can’t rely on switching to the larger AWS instance, because 1) this process gets out of control very fast 2) does not help. Seen the execution times like on both of these pictures, like “I can’t see it” just proves, that the functions are performing as expected.

Leave a comment

Filed under Data management, Development and testing, SQL

New features are available in the bitemporal repo – and I am so happy about it!

I Really hope that most of my follows know something about the pg_bitemporal project, because if you didn’t hear about it, you won’t be able to share my excitement!

We started to build our bitemporal library for PostgreSQL about four years ago, it was merely a “proof of concept”, and Chad Slaughter, who initiated all this work, knowing my work habits way too well, was re-iterating again and again – do not optimize it yet!

Well, I didn’t, but then I’ve joined Braviant Holdings, and a year later I was granted a permission to use our bitemporal framework in production. Some of the performance flaws became apparent even during the test run, and I was able to fix them. Later, while we were using it in production more and more, I’ve come up with new functions, UPDATE_SELECT and CORRECT_SELECT, since we actually needed them, and since the bitemporal operations were supposed to behave the same way as regular database operations.

About three weeks ago we had a very important release, which along with addressing multiple business needs, included some significant changes on the technical side. One of the consequences was, that it significantly increased the traffic on our new planform, and as a result we started to see some timeouts.

Although these timeouts were pretty rare, we saw them as a problem. I personally pledged the system will remain scalable, and now I couldn’t just go with “bitemporal updates are slow”. Yes, the execution time was at 2 to 3 seconds most of the time, but sometimes it would spike, and our microservices have a hard timeout at 10 seconds.

Some time ago I’ve already mentioned in this blog, how thankful I am for those timeouts! Nothing else foster innovation more than a necessity to address performance problems immediately, because they have a direct impact on production.

This time around I was 99.9% sure that the periodic slowness happens during the remote query, which is a part of the problematic function. Turned out, though, that this 0.01% was the case, and together with our DB team we were able to determine, that the problematic statement was the last UPDATE in the bitemporal update function. If you’d ask me a week before that, I would say, that I am not going to address the bitemporal performance for the next several months, but I had no choice.

Thanks to Boris Novikov, who helped me immensely in testing and verifying several different approaches, and eventually identified the best one, and to Chad Slaughter, who was merging my commits from 7-30 AM to 9-30 PM, so that the master branch of the bitemporal library would have the latest updates by the time of the release, and thanks to our amazing QA team, who had to run and rerun tests that day multiple times, the new bitemporal functions are now on place. Not only for Braviant Holdings, but for the whole community.

I would also like to mention, that since I was already changing the functions, I’ve fixed one long-overdue issues: all functions have versions, which are PG 10 compliant. We’ve left the old versions there, because some of the are used in the existing production systems but if you are just starting, you can use the new ones.

Check it out at https://github.com/scalegenius/pg_bitemporal

Leave a comment

Filed under news, research, SQL, Team and teamwork

Braviant Holdings talks at 2Q PG Conf

Better later, than never: by popular demand here are the videos of both talks from Braviant Holdings, presented at 2Q  PG Conf. Enjoy 🙂


 

Leave a comment

Filed under Companies, events, talks