Monthly Archives: September 2016

Our bitemporal presentation at PG Open 2016

Our talk was the first talk of the third day of the conference, and it went really well. By now I’ve presented bitemporal concepts at least ten times to different audiences, and I would say that this presentation was one of the best – in terms of the interest expressed by the listeners, the questions we got, and the overall enthusiasm this talk generated :). Seriously, I didn’t expect that so many people had actually read Snodgrass’s works; I didn’t quite expect such thoughtful questions about implementation details and performance. And it felt so good when people approached me later in the day not just saying “great presentation” but really wanting to talk in detail about our work.

The best word to describe my feelings would be – empowered!


The ppt of our presentation is available here:

bi_temporal_pg_open_2016


Filed under events, People, research, talks

PG Open – day two

Technically, it was day one, since tutorials “don’t count”, but since there were only three days total, I consider it day two :).

After Frost’s opening remarks there was the Diamond Keynote by Simon Riggs (2ndQuadrant was a conference Diamond sponsor).



Filed under events, People, talks

PG Open 2016 – day one

Just got back from Dallas. These three days were really great! This is the sixth PG Open I am attending, and each year the conference gets better and better. This year was probably the first time I didn’t come across any “so-so” presentations. Literally every single one was worth attending.

Same goes for the overall technical level of the participants – it feels great to receive meaningful questions and to feel connected to the audience.

The first day was a day of tutorials; I only attended the Advanced SQL and Functions tutorial. It’s always fascinating to be able to “feel and touch” the internals of Postgres, to be able to ask first-hand: so, that’s what I see, can you explain to me why it behaves that way? I asked about indexes on jsonb, because I am regularly asked why not to use the jsonb type if the object structure is likely to change. I always reply that it’s OK to use a jsonb object as long as you do not really care about what’s inside and treat it as a whole, but if you want to search by something…

And then people immediately tell me: but now you can index jsonb, what’s the problem? Well, it is a problem, because those indexes are not even remotely as efficient as standard B-tree indexes. I’ve been telling this to everybody since these indexes were introduced; but now I have it first-hand :). Unfortunately Stephen Frost didn’t have any comparison statistics, but at least I have his word :).
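
Since I was just talking about it, here is a minimal sketch of the two options (the docs table and payload column are hypothetical names of mine, not anything from the tutorial). A GIN index supports searches over the whole jsonb value; what actually competes with a regular column is a B-tree index on one extracted key:

-- Hypothetical table: the whole object is stored in a jsonb column
CREATE TABLE docs (id serial PRIMARY KEY, payload jsonb);

-- Option 1: a GIN index, supporting containment queries over the whole object
CREATE INDEX docs_payload_gin ON docs USING gin (payload);
SELECT * FROM docs WHERE payload @> '{"status": "active"}';

-- Option 2: a B-tree expression index on the one key you actually search by
CREATE INDEX docs_status_btree ON docs ((payload->>'status'));
SELECT * FROM docs WHERE payload->>'status' = 'active';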

There were two other things in the category of “I’ve always thought so, but didn’t have proof”. One – that most of the time multi-column indexes perform worse than several single-column indexes combined via bitmap scans. I would say the only situation where they are beneficial is when you can use an index-only scan. And the second thing – that if you want your anti-join to be executed efficiently, you should perform an outer join and then filter by the condition “the id in this other table is null”. Once again, I never had any doubts about that, but I was always criticized for it!
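
To illustrate both points, a small sketch with hypothetical customer and orders tables (the names are mine, not from the tutorial):

-- Two single-column indexes that the planner can combine with a bitmap AND,
-- instead of one multi-column index:
CREATE INDEX orders_customer_idx ON orders (customer_id);
CREATE INDEX orders_status_idx ON orders (status);
SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';

-- The anti-join pattern: an outer join plus an IS NULL filter
-- returns the customers that have no orders at all
SELECT c.id, c.name
FROM customer c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.customer_id IS NULL;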

I hope that those tutorials were recorded and will be available next week 🙂


Filed under SQL, talks

PG Open 2016 starts tomorrow!

And I will be there! I am just thinking – it’s amazing how many of the people who are going to attend I know, and even more amazing that all these people belong to soooo different parts of my life that I would never imagine them being in the same place at the same time. And yet – this will start tomorrow!

This being said… if for any reason you didn’t know until now that the PG Open starts tomorrow, and that I will be there, and if by any chance you are somewhere around Dallas… or not around but willing to travel… Just saying… I love surprises :))

PG Open 2016

Filed under events, research, talks

Duplicates removal

I do not think I’ve ever published on this blog the simple way of removing duplicates from a table. I’ve faced this problem multiple times in different environments, first coming up with a solution during my consulting engagement with the City of Chicago and later virtually everywhere I worked. I believe now is the time to finally document this solution.

First, let’s define the problem and specify more precisely what “duplicates” means in this context. Let’s assume that the table with duplicates has a primary key. Although the SQL standard does not require a PK constraint on each table, it’s considered good practice to have one, even if it is just a sequential number assigned to the record on insertion. If all other attributes except the PK are the same in two records, we consider them duplicates.
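
Under this definition, a quick way to check whether a table contains duplicates at all is to group by everything except the PK (here and below, table_name and attr1 through attr3 are placeholder names):

-- Each returned row is a set of attribute values that occurs more than once
SELECT attr1, attr2, attr3, count(*) AS copies
FROM table_name
GROUP BY attr1, attr2, attr3
HAVING count(*) > 1;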

Most of the time duplicates are not intended to happen; they occur due to some programmatic error in the insertion process. After the error is discovered and corrected, we usually want to remove the duplicates, leaving just one copy of each set of attributes. Yes, we could select distinct values into a temporary table, truncate the original, and re-insert, but this process would hold a long exclusive lock (possibly prohibitively long) on the table, which is not always possible in a production environment. Besides, there may be memory limitations if the table is big enough.
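
For reference, the naive approach just mentioned looks roughly like this (a sketch, using the same placeholder names; the TRUNCATE takes an exclusive lock on the table for the rest of the transaction):

BEGIN;
-- keep one copy of each distinct set of attributes, with the smallest pk
CREATE TEMP TABLE dedup AS
SELECT DISTINCT ON (attr1, attr2, attr3) pk, attr1, attr2, attr3
FROM table_name
ORDER BY attr1, attr2, attr3, pk;

TRUNCATE table_name;

INSERT INTO table_name (pk, attr1, attr2, attr3)
SELECT pk, attr1, attr2, attr3 FROM dedup;
COMMIT;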

To remove duplicates fast, I’ve developed a very efficient algorithm, which can be represented in PL/pgSQL as follows:


DO $$
DECLARE
  v_prev_record table_name%ROWTYPE;  -- previous row; all fields start as NULL
  rec           record;
BEGIN
  FOR rec IN SELECT t.pk, t.attr1, t.attr2, t.attr3
             FROM table_name t
             ORDER BY 2, 3, 4, 1
  LOOP
    -- on the first iteration the comparisons are NULL, i.e. not true,
    -- so the first row of every group is always kept
    IF v_prev_record.attr1 = rec.attr1 AND
       v_prev_record.attr2 = rec.attr2 AND
       v_prev_record.attr3 = rec.attr3
    THEN
      DELETE FROM table_name WHERE pk = rec.pk;
    END IF;
    v_prev_record.attr1 := rec.attr1;
    v_prev_record.attr2 := rec.attr2;
    v_prev_record.attr3 := rec.attr3;
  END LOOP;
END $$;

This way, only “the first” of each set of identical records remains in the table.

When we are just removing dups, this might look like a somewhat excessive amount of work, but often the table with the dups also serves as the referenced table in a foreign key constraint, and then we need to modify the dependent table as well – and that might take a REALLY long time! Now check out how a very small modification to the code above makes this work easy!

DO $$
DECLARE
  v_prev_record table_name%ROWTYPE;  -- previous row; all fields start as NULL
  v_pk_to_keep  table_name.pk%TYPE;  -- pk of the copy we decided to keep
  rec           record;
BEGIN
  FOR rec IN SELECT t.pk, t.attr1, t.attr2, t.attr3
             FROM table_name t
             ORDER BY 2, 3, 4, 1
  LOOP
    IF v_prev_record.attr1 = rec.attr1 AND
       v_prev_record.attr2 = rec.attr2 AND
       v_prev_record.attr3 = rec.attr3
    THEN
      -- repoint the dependent rows to the copy we keep...
      UPDATE dependent_table SET fk_attr = v_pk_to_keep
      WHERE fk_attr = rec.pk;
      -- ...and only then delete the dup
      DELETE FROM table_name WHERE pk = rec.pk;
    ELSE
      v_pk_to_keep := rec.pk;  -- this is a "new" value we keep
    END IF;
    v_prev_record.attr1 := rec.attr1;
    v_prev_record.attr2 := rec.attr2;
    v_prev_record.attr3 := rec.attr3;
  END LOOP;
END $$;

Isn’t it just beautiful?!


Filed under SQL