Category Archives: SQL

How to Drop a User

Do you have some “dead” users in your production database? Users that belonged to former employees who are long gone, but whose Postgres users still maintain their ghostly presence in your environment? Just be honest :). I know that all of us should have some process in place to drop a user when somebody leaves the company. Most of the time, we do it promptly. But sometimes…

Sometimes, when we try to drop a user, we get an error: that user owns some objects. And there is no “DROP CASCADE” option for a user. Moreover, you most likely do not want a cascading drop, because that user might own some critical objects you do not want to lose!

Yes, we should not have such situations in the first place, and you might even have a policy stating that objects should never belong to individual users, only to group roles. But things happen. Maybe somebody was in a hurry fixing production issues. Maybe something was created “temporarily.” In any case, you can’t drop this user. You need to reassign the objects the user owns to the correct user/role. You need to revoke permissions. That is, if you can understand what in the world these permissions mean!

Several weeks ago, I faced one of these situations. I successfully reassigned objects and revoked permissions, except for a couple. These last ones were weird, and I could not revoke them no matter what! What I mean is: I would issue a REVOKE command, it would complete successfully, but the permissions stayed! Apparently, I did not know what I was doing :).
I had to rely on the last resort – ask Grandpa Google. And I quickly discovered a command I hadn’t used before: DROP OWNED BY. Looks like exactly what I wanted, almost a DROP CASCADE, right? Wrong! Because in contrast to DROP CASCADE, it won’t tell you what exactly it dropped! When you issue DROP CASCADE in a transaction, it tells you which objects it is going to drop, and you can make sure you are not accidentally dropping something you didn’t mean to drop. But this command executes in dead silence!
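For the record, the overall sequence looks roughly like this (the role names are made up, and in a cluster with several databases you have to repeat the REASSIGN/DROP OWNED part in every database where the role owns something):

BEGIN;
REASSIGN OWNED BY ghost_user TO app_owner;  -- hand everything worth keeping to the right role
DROP OWNED BY ghost_user;                   -- drop the leftovers and, just as importantly,
                                            -- revoke every privilege granted to the role
DROP ROLE ghost_user;
COMMIT;                                     -- or ROLLBACK if you lose your nerve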

I can’t even describe how scary it was to actually press ENTER:)

… In case you think I have no fear!

Filed under Data management, SQL

Introducing NORM repo

Over the course of the past two and a half years, I have given an endless number of talks about “our JSON thing,” which we now call NORM. And since my very first presentation, people have asked me whether I could show an example of the code developed using this approach. And I never had a good answer. I do not know what took me so long, but now I finally have it.

For those who love the idea of a world without ORM, and who want to start “doing it the right way,” a working example of the NORM technology can be found here.

Please take a look! We are planning to add a small Java program to this repo that will use the PostgreSQL functions.

And I would love to hear from people who would:

  • clone this repo and let us know what does not work or how the documentation can be improved
  • develop a simple app in Ruby on Rails that will use the same Postgres functions
  • utilize the same approach with any other database/application combination

Looking forward to the feedback!

Filed under Development and testing, research, SQL

There was an index – but not the right one!

Several months ago, I came up with what I thought was a great idea: an easy algorithm to quickly verify whether a SELECT will execute in a reasonable time. Writing down these simple rules helped me a lot, especially in the all-too-frequent situations when a report has to be deployed to production immediately because it is a business emergency.

One of those happened a couple of weeks ago, on a Friday. Yes, I know – who in the world deploys anything to production on a Friday afternoon?! Apparently, I do, especially when the word “emergency” is in the air.

The person who developed this report told me that in the staging environment it executed “as before,” in about 30 minutes. That was unacceptably slow, but I didn’t have time to optimize it, and I could not block the deployment. It was an incremental refresh; I checked the execution plan – it was using indexes – and I said OK. And deployed it.

The next thing I knew, it was running and running and running. And I had to kill it because it was blocking other processes. Then I decided to leave it to run overnight and see whether it would ever complete. It ran for four hours! I took it out of the schedule. Now I had to figure out how to run it :). I looked at the delta criterion and tried to run just that part – and saw a full scan. Then I ran the execution plan for the whole query again and looked at it closer. Well… it was using an index in the inner join, but… the wrong index!

You know what I mean: there are situations when using an index is slower than a table scan – this was one of those! And it happened because the field that drove the delta was not indexed at all! In the end, the solution was easy: I created the two missing indexes, then, together with the developer, we removed a couple of unnecessary checks, and the whole report took just 30 seconds to run.
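Schematically, the check and the fix looked something like this (the table and column names are invented, not the real report):

-- a toy stand-in for the report source; the "delta driver" column is not indexed
CREATE TABLE report_source (
    id           bigint PRIMARY KEY,
    status       text,
    last_updated timestamptz
);

-- verify what the incremental refresh really does
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM report_source
WHERE last_updated > now() - interval '1 day';

-- if the plan shows a sequential scan, or an index on some other column,
-- index the delta driver and check the plan again
CREATE INDEX report_source_last_updated_idx ON report_source (last_updated);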

A little bit better than four hours, right?!

Filed under Data management, Development and testing, SQL

More Updates on the State of Bitemporality

A month ago, I proudly announced that I was able to modify bitemporal correction to allow correction of any effective interval, past or present, not just the current one. I was planning to embed this capability into the regular bitemporal correction, but after a couple of weeks of hesitation I decided against it.

The reason is very egotistical :). At Braviant Holdings, we use bitemporal correction in our production application, and the response time is critical. I already know that the generalized case of bitemporal correction is slightly slower due to the additional checks, and I am not going to risk the performance of what’s already in place. That being said, it shouldn’t make a significant difference for those who are using pg_bitemporal at their own risk, since the difference in execution time is just “times two”: if your original correction is optimized to 0.8 sec, the new one may be 1.7 sec. It makes a difference for us, though.

However, the function is out there and available, so if you want to try it out, check out ll_bitemporal_correction_hist.

Another update is related to my attempts to create a barrier against direct INSERT/UPDATE on bitemporal tables (without compromising performance).

My original idea – that it would be possible to revoke plain INSERT/UPDATE/DELETE on bitemporal tables and make the bitemporal operations SECURITY DEFINER – won’t work.


The reason is that if we create the bitemporal functions as SECURITY DEFINER, with the database owner as the definer, then any user we grant permission to execute such a function can run it against ANY bitemporal table. That means we won’t be able to restrict access to specific tables, because we can’t grant execution based on the function parameters (again, theoretically possible, but it requires creating additional objects).

So now I am back to the trigger idea. What could theoretically be done is to have triggers that simply block direct insert/update/delete, and then to disable these triggers inside the bitemporal functions. But this again requires a higher level of permissions.
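To make the trigger idea concrete, here is a minimal sketch (all names are invented, and the last step is exactly where the permissions problem shows up):

CREATE TABLE bt_example (id int, effective tstzrange, asserted tstzrange);

CREATE FUNCTION block_direct_dml() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    RAISE EXCEPTION 'use the bitemporal functions instead of direct %', TG_OP;
END;
$$;

CREATE TRIGGER no_direct_dml
    BEFORE INSERT OR UPDATE OR DELETE ON bt_example
    FOR EACH STATEMENT EXECUTE PROCEDURE block_direct_dml();

-- a bitemporal function would then have to do something like
--   ALTER TABLE bt_example DISABLE TRIGGER no_direct_dml;
--   ... perform the operation ...
--   ALTER TABLE bt_example ENABLE TRIGGER no_direct_dml;
-- and ALTER TABLE requires table ownership – the higher level of permissions mentioned above.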

Obviously, anybody who wants this level of data safeguarding can create their own aliases for the bitemporal functions, tied to specific tables or schemas, but that would be a custom build, not anything I can provide generically.

At the moment I am out of new ideas, just wanted to state it officially. And if anybody is interested in how to create these custom-built functions – please feel free to reach out!

The pg_bitemporal public repo can be found here.

Filed under Data management, SQL

A Word About JOIN_COLLAPSE_LIMIT

I remember when I learned about the JOIN_COLLAPSE_LIMIT parameter in Postgres. And I remember the unfortunate circumstances of that encounter. Specifically, it was set to eight, which caused a relatively simple SELECT to be executed in a completely wrong way.

Since I didn’t know anything about that parameter at the time, it looked to me like the order of joins in a SELECT statement in Postgres matters, which seemed very wrong and the opposite of what the documentation said.

But then I learned about JOIN_COLLAPSE_LIMIT, and suddenly all the execution plans magically fixed themselves after I changed it to 12. I performed a lot of experiments back then trying to find an optimal value for that parameter, and at the end of the day I decided that, no matter how many tables I am joining, changing it to anything above 12 does not help. Either the execution plan does not improve (you have 16 tables in the join and think that changing the limit to 16 will fix the query – and it does not), or, if you change it to more than 20, planning itself takes too much time.
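The whole experiment fits into one session, something like this (the value is just whatever you are currently testing):

SHOW join_collapse_limit;        -- 8 by default
SET join_collapse_limit = 12;    -- let the planner reorder up to 12 explicit joins
-- run EXPLAIN (ANALYZE) on the query in question and compare the plans
RESET join_collapse_limit;       -- back to the server default for this session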

However, a couple of weeks ago, somebody asked for my help with query optimization. The problem was that it was not a single query but rather a query framework: the person would run several queries in sequence, each time adding or removing search criteria that could belong to any of the tables involved in this SELECT.

Yes, “if it were me,” I would write the SELECT statement differently each time, but that was not something I could suggest to the person who asked for my help. That would be too complicated. I looked at the query, estimated the number of joins, and suggested that at the beginning of each session they execute

SET JOIN_COLLAPSE_LIMIT = 16;

And it worked!!!! I do not know what’s so funny about the fact that it worked as expected… but I was really amused 🙂

Filed under SQL

New Bitemporal Correction

It might take several more days until the new ll_bitemporal_correction function appears in the pg_bitemporal GitHub repo, but it is ready: I have finished development and basic testing, and it works.

It means nothing to those who do not care about “the state of bitemporality,” but I know that some people care. For those people, I am happy to report that I have finally delivered what you have been asking for for a long time. Now you can make a bitemporal correction not just to the last effective interval, but to any stretch of time: it may cover several historical intervals, and it may start in the middle of one. It can’t end in the middle, because I didn’t come up with any business case for such a situation, but if there is demand, I am sure I can provide it :). Overall, I am very happy with this result :). Stay tuned – I will post when it is officially available.

On a different note – we are almost PG12-compatible; only bitemporal referential integrity still does not work, but it will be fixed shortly.

A huge thank you to everybody who has shown interest in this work :)

Filed under news, SQL

Chicago PUG January Meetup – Great Start for 2020

For three years now, Braviant has been hosting the Chicago PUG meetups. Today we rang in 2020 with two excellent presentations and very productive follow-up discussions.


Once again, I want to thank all the user group members who joined us today.

Looking forward to another great year ahead!

[Photos: the engaged audience and our speakers]

Filed under events, SQL, talks

Simple Rules for Better SQL

I suppose I am not the only person facing this problem. From time to time, you come across a report that runs very slowly. Or just a stand-alone SQL statement that somebody in your company needs to execute once a month, and they might not even remember who wrote it. And you look at 200 lines of code and think: how can I even start to optimize this?

It usually helps if the person who wrote the code is still around. Even better if the report was written just recently and the requirements are still fresh in people’s minds. Still – what do you do if you have very limited time, very convoluted SQL, and you need to figure out how to make it better so that it won’t block ten other processes?

For a while, I have been trying to come up with simple guidelines for report writers that would help ensure their reports perform decently without anybody spending ten times more time on them. It turned out that a very short list of rules covers pretty much all the situations. We have been using these guidelines for a while now, and I am pretty happy with the results. Here is the list:

  • For each report, figure out whether it is an “exception” report or a “summary” report. Exception means that we are looking for some errors, for abnormal execution. Such reports should return nothing if everything goes normally. Summary means that we want to process a large number of records and calculate some results, something like quarterly earnings. 
  • If a report in question is an “exception report,” define the criteria for the exception (what makes this report an exception). Make sure there is an index that can be used to select rows based on this criterion and make sure that this criterion is applied first in the execution plan. 
  • If a report is a summary, look for a way to update it incrementally. Make sure that there is an index to facilitate this search (most often, the increment will be by the time last updated, and this field should be indexed); see the sketch below.
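As a sketch of the two indexing rules (the table and column names are invented):

CREATE TABLE payments (id bigint, status text, created_at timestamptz);
CREATE TABLE orders   (id bigint, total numeric, updated_at timestamptz);

-- exception report: index exactly the exception criterion (here, as a partial index),
-- so a report that should normally return nothing touches almost no rows
CREATE INDEX payments_failed_idx ON payments (created_at) WHERE status = 'failed';

-- summary report refreshed incrementally: index the "time last updated" field
-- that drives the increment
CREATE INDEX orders_updated_at_idx ON orders (updated_at);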

If you can find an example of a query that does not fit into either of these categories and/or can’t be optimized using these rules – let me know 🙂

Filed under SQL

PostgreSQL And Academia

Recently I’ve been thinking a lot about relationships between the PostgreSQL community and the DB research community. To put it bluntly – these two communities do not talk to each other!

There are many reasons why I am concerned about this situation. First, I consider myself a member of both of these communities. Even if right now I am 90% in industry, I can’t write off my academic past, and writing a scientific paper in the hope of it being accepted to a real database conference is something that still appeals to me.

Second, I want to have quality candidates for database positions when I have openings. The problem is more than the fact that scientists do not speak at Postgres conferences and Postgres developers do not speak at academic conferences. The bigger problem is that for many CS students, their academic research and practical experience do not intersect at all! They study some cool algorithms, and then they practice their SQL on MySQL, which, as I have already mentioned multiple times, lacks so many basic database features that it can hardly be considered a database!

If these students practiced on PostgreSQL, they would have a real, full-scale object-relational database – not a “light” version but the real thing, which supports tons of index types, data types, and constraints, has procedural languages, and the list goes on and on.

It is especially upsetting to see this disconnect since so much database research was completed on Postgres, for Postgres, or with the help of Postgres: R-trees and GiST indexes, to name a couple of examples. Also, the 2018 SIGMOD Test of Time Award was given to the paper “Serializable Isolation for Snapshot Databases,” whose approach was implemented in Postgres.

I know the answer to the question “why don’t they talk?” Researchers do not want to talk at Postgres conferences because those are not scientific conferences, and participation in them will not result in any publication. Postgres developers do not want to talk at CS conferences because they do not like to write long papers :), and even when they do submit something, their papers are often rejected as “not having any scientific value.”

I know the answer. But I do not like it :). So maybe – we can talk about it?!

Filed under research, SQL, Systems

Let’s Go Bitemporal!

Dear friends and followers from the Postgres community! Today, let’s talk more about the bitemporal library (as if I had not spoken about it enough already!).

We have been developing Postgres functions that support bitemporal operations for almost four years now. We found our initial inspiration in the Asserted Versioning Framework (AVF), first introduced by Johnston and Weis nearly twenty years ago. There is nothing new in the concept of incorporating a time dimension into data, and even the concept of two-dimensional time is not new. However, we believe that AVF approaches the task in the best possible way and that it makes time a true and integral part of the data.

We believe that Postgres is best suited to support two-dimensional time due to two factors: the presence of interval (range) types and GiST with exclusion constraints. Having these two available made the implementation of the concept more or less trivial.
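Just to illustrate what these two features buy us, here is a simplified sketch of a bitemporal table (not the actual pg_bitemporal layout):

CREATE EXTENSION IF NOT EXISTS btree_gist;   -- needed to mix = and && in one constraint

CREATE TABLE customers_bt (
    customer_id int,
    name        text,
    effective   tstzrange,   -- when the fact is true in the real world
    asserted    tstzrange,   -- when the database asserts it
    EXCLUDE USING gist (customer_id WITH =, effective WITH &&, asserted WITH &&)
);

The exclusion constraint does the heavy lifting: no two versions of the same customer can overlap in both time dimensions at once.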

Implementation of the bitemporal operations took some time, though, and we are still in the process of improving some of the functions. However, we are happy to share with the world that Braviant Holdings runs both its OLTP and OLAP databases on the bitemporal framework with no performance degradation. Since we had the opportunity to develop as we went, we could address lots of issues in this implementation that we initially did not even expect.
Recently, we uploaded several files to the docs section of the pg_bitemporal GitHub repo, including several presentations and short papers, so that those who are interested can read more about the theory of bitemporality. We hope that people will give it a try – it works! Also, we are always looking for volunteers interested in collaborating.

Please check us out at https://github.com/scalegenius/pg_bitemporal

Filed under research, SQL, Systems