Category Archives: research

PostgreSQL And Academia

Recently I've been thinking a lot about the relationship between the PostgreSQL community and the DB research community. To put it bluntly – these two communities do not talk to each other!

There are many reasons why I am concerned about this situation. First, I consider myself a member of both of these communities. Even if right now I am 90% in industry, I can't write off my academic past, and writing a scientific paper in the hope of having it accepted to a real database conference is something that still appeals to me.

Second, I want to have quality candidates for database positions when I have openings. The problem is not just that scientists do not speak at Postgres conferences and Postgres developers do not speak at academic conferences. The bigger problem is that for many CS students, their academic research and practical experience do not intersect at all! They study some cool algorithms, and then they practice their SQL on MySQL databases, which, as I have already mentioned multiple times, lacks so many basic database features that it can hardly be considered a database!

If these students practiced on PostgreSQL, they would have a real full-scale object-relational database: not a "light" version, but the real thing, which supports tons of index types and data types, enforces constraints, has procedural languages, and the list goes on and on.
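For illustration only (the table and function names here are made up, not from any curriculum): even a few lines of SQL exercise several of these features at once.

create table papers (
paper_id serial primary key,
title text not null,
metadata jsonb, -- one of many rich data types
page_count integer check (page_count > 0) -- a constraint the engine enforces
);

-- GIN, one of several index types (alongside B-tree, GiST, BRIN, SP-GiST, hash)
create index papers_metadata_idx on papers using gin (metadata);

-- PL/pgSQL, one of the built-in procedural languages
create or replace function paper_summary(p_id integer) returns text
language plpgsql as
$$
declare t text;
begin
select title || ' (' || page_count || ' pages)' into t
from papers where paper_id = p_id;
return t;
end;
$$;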

It is especially upsetting to see this disconnect, since so much database research was done on Postgres, for Postgres, and with the help of Postgres; R-trees and GiST indexes, to name a couple of results. Also, the 2018 SIGMOD Test of Time Award was given to the paper "Serializable Isolation for Snapshot Databases," which was implemented in Postgres.

I know the answer to the question "why don't they talk?" Researchers do not want to talk at Postgres conferences, because those are not scientific conferences, and participation in them will not result in any publication. Postgres developers do not want to talk at CS conferences, because they do not like to write long papers :), and even when they do submit something, their papers are often rejected as "not having any scientific value."

I know the answer. But I do not like it :). So maybe – we can talk about it?!


2 Comments

Filed under research, SQL, Systems

Let’s Go Bitemporal!

Dear friends and followers from the Postgres community! Today, let's talk more about the bitemporal library (as if I have not spoken about it enough already!).

We have been developing Postgres functions that support bitemporal operations for almost four years now. We found our initial inspiration in the Asserted Versioning Framework (AVF), first introduced by Johnston and Weis nearly twenty years ago. There is nothing new in the concept of incorporating a time dimension into data, and even the concept of two-dimensional time is not new. However, we believe that AVF approaches the task in the best possible way, and that it makes time a true and integral part of the data.

We believe that Postgres is best suited to support two-dimensional time due to two factors: the presence of interval (range) types and GiST indexes with exclusion constraints. Having these two available made the implementation of the concept more or less trivial.
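To give a flavor of why these two features matter, here is a minimal sketch. It is not the actual pg_bitemporal schema (the table and column names are invented): two range columns carry the two time dimensions, and a single GiST exclusion constraint prevents overlapping assertions about the same key.

create extension if not exists btree_gist; -- lets a GiST constraint mix = with &&

create table customers_bt (
id_key integer not null,
name text,
effective tstzrange not null, -- when the fact is true in the modeled world
asserted tstzrange not null, -- when the database asserts this version
exclude using gist (
id_key with =,
effective with &&,
asserted with &&
)
);

-- A second row asserting an overlapping fact for the same id_key is rejected:
insert into customers_bt
values (1, 'Anna', tstzrange('2020-01-01', null), tstzrange(now(), null));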

Implementation of the bitemporal operations took some time, though, and we are still improving some of the functions. However, we are happy to share with the world that Braviant Holdings runs both OLTP and OLAP databases on the bitemporal framework with no performance degradation. Since we had the opportunity to develop as we went, we could address lots of issues in this implementation which we initially did not even expect.

Recently we uploaded several files into the docs section of the pg_bitemporal GitHub repo, including several presentations and short papers, so that those who are interested can read more on the theory of bitemporality. We hope that people will give it a try – it works! Also, we are always looking for volunteers who are interested in collaboration.

Please check us out at https://github.com/scalegenius/pg_bitemporal

Leave a comment

Filed under research, SQL, Systems

Bitemporal documentation is available!

Everybody who was curious enough to start using our pg_bitemporal GitHub repo would complain about the lack of documentation, so we have tried really hard to provide our followers with some guidance.

What we have now is very far from perfect, but if you go to the docs directory, there is a lot of documentation, including our old presentations, explanations of the basic bitemporal concepts, and, most importantly, the first-ever bitemporal functions manual, which we promise to make more readable in the near future. Meanwhile – please share your feedback! Thank you!

Leave a comment

Filed under Data management, research

How to build this nested JSON

This post is in response to multiple requests from the PG Conf attendees to provide a "schematic" example of how to build a record set and convert it to a JSON object (basically, to illustrate our "Connecting Galaxies" talk).

Actually, if you click on the link above and download the presentation, you will see an extra slide at the end which provides the link to the code below… but we know that people do not like "clicking the links" 🙂

So – here it is one more time!

-- Building complex JSON objects from multiple tables:
-- deriving aggregates with json_agg and n_array_agg,
-- wrapping an array (of rows) into JSON converted to text for JDBC transfer

-- Test aggregation
create table aa (a_id text, a_name text);
insert into aa values ('1', 'Anna'), ('2', 'Bob');

-- Aggregation on the second level
create table bb (b_id text, ab_id text, b_num text, b_ph text);
insert into bb values ('101', '1', '101-101', '(800)-123');
insert into bb values ('1012', '1', '1012102', '(800)-1234'),
('201', '2', '201-201', '(800)-1345');

-- Add one more embedded array on the second level
create table cc (c_id text, ac_id text, c_st text, c_more text);
insert into cc values
('11', '1', 'stst', 'more-more'),
('12', '1', 'swsw', 'more-less');
insert into cc values
('21', '2', 'stst-2', 'more-mo2222re'),
('22', '2', 'sws222w', 'more-less22');

-- Add the third level; dd will be nested under bb
create table dd (d_id text, bd_id text, d_value text);
insert into dd values
('1101', '101', 'dv11'),
('1102', '101', 'dv12'),
('2101', '201', 'dv22');

-- A CTE might be helpful to avoid multiple processing of big top-level tables

-- State transition function for the custom aggregate n_array_agg:
-- appends val to the accumulated array only when the boolean flag is true
create or replace
function array_agg_next (agg_sta anyarray, val anyelement, b boolean)
returns anyarray as
$$
declare
out_array alias for $0;
begin
if b then
if agg_sta is null then
out_array := array[val];
else
out_array := agg_sta || val;
end if;
else
out_array := agg_sta;
end if;
return out_array;
END;
$$ LANGUAGE plpgsql;

-- Optional final function (the aggregate below works without it)
drop function if exists array_agg_final (agg_sta anyarray);
create or replace
function array_agg_final (agg_sta anyarray)
returns anyarray as
$$
declare final_array alias for $0;
begin
final_array := agg_sta;
return final_array ;
END;
$$ LANGUAGE plpgsql;

-- ======================================================
-- A simpler, equivalent version of the transition function
create or replace
function array_agg_next (agg_sta anyarray, val anyelement, b boolean)
returns anyarray as
$$
begin
if b then
if agg_sta is null
then agg_sta := array [val ];
else agg_sta := agg_sta || val;
end if;
end if;
return agg_sta;
END;
$$ LANGUAGE plpgsql;

drop aggregate if exists n_array_agg (anyelement, boolean);
CREATE AGGREGATE n_array_agg (anyelement, boolean) (
sfunc = array_agg_next,
-- FINALFUNC = array_agg_final,
STYPE = anyarray
);

drop type if exists jaa cascade;
drop type if exists jbb cascade;
drop type if exists jcc cascade;
drop type if exists jdd cascade;
-- Output types mirror the desired JSON structure: jaa is the top level,
-- jbb (which nests jdd) and jcc are the nested arrays
create type jdd as (dd_key text, dd_value text);
create type jcc as (cc_key text, cc_st text, more text);
create type jbb as (bb_key text, num text, ph text, "DD" jdd[]);
create type jaa as (p_key text, p_name text, "BB" jbb[], "CC" jcc[]);

-- Collect rows of a specific type into an array; depends on the output type
create or replace
function collect_items (s text)
returns jaa[]
LANGUAGE plpgsql as
$body$
declare
result jaa[];
begin
select array_agg(single_item) into result
from
(select
row (
a_id, a_name,
n_array_agg( row(b_id, b_num, b_ph, d_agg)::jbb, b_id is not null),
n_array_agg( row(c_id, c_st, c_more)::jcc, c_id is not null)
)::jaa as single_item
from
(
-- second pass: collapse the dd rows under each bb row
select a_id, a_name, b_id, b_num, b_ph, c_id, c_st, c_more,
n_array_agg(row(d_id, d_value)::jdd, d_id is not null) as d_agg
from
(
-- first pass: one UNION ALL branch per nested collection,
-- padded with NULLs so the column lists line up
select a_id, a_name, b_id, b_num, b_ph, null as c_id, null as c_st, null as c_more, d_id, d_value
from aa join bb on a_id = ab_id
left join dd on b_id = bd_id
UNION ALL
select a_id, a_name, null as b_id, null as b_num, null as b_ph, c_id, c_st, c_more, null as d_id, null as d_value
from aa join cc on a_id = ac_id
) temp_table_3
group by
a_id, a_name, b_id, b_num, b_ph, c_id, c_st, c_more
) temp_table_2
group by a_id, a_name) items;
return result;
end;
$body$;

-- Just another output type
create or replace
function collect_subitems (s text)
returns jbb[]
LANGUAGE plpgsql as
$body$
declare
result jbb[];
begin
select array_agg(single_item) into result
from
(
select row(
b_id, b_num, b_ph, n_array_agg (row(d_id, d_value)::jdd, d_id is not null)
)::jbb as single_item
from bb left join dd on b_id = bd_id
group by b_id, b_num, b_ph
) items;
return result;
end;
$body$;

-- Wrap the array into a set of text strings containing the JSON representation of array elements.
-- This function does not depend on any particular type.
create or replace
function array_transport (all_items anyarray) returns setof text
RETURNS NULL ON NULL INPUT
LANGUAGE plpgsql as
$body$
declare
item record;
begin
foreach item in array all_items
loop
return next( to_json(item))::text;
end loop;
end;
$body$;

select * from array_transport(collect_items('a'));

select * from array_transport(collect_subitems('a'));
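
With the sample data above, each row returned by the first call is the JSON text of one jaa element, shaped roughly like this (exact key order and spacing come from to_json; the elided parts follow the same pattern):

{"p_key":"1","p_name":"Anna","BB":[{"bb_key":"101","num":"101-101","ph":"(800)-123","DD":[{"dd_key":"1101","dd_value":"dv11"},{"dd_key":"1102","dd_value":"dv12"}]},…],"CC":[{"cc_key":"11","cc_st":"stst","more":"more-more"},…]}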


Leave a comment

Filed under research, SQL, talks

New features are available in the bitemporal repo – and I am so happy about it!

I really hope that most of my followers know something about the pg_bitemporal project, because if you haven't heard about it, you won't be able to share my excitement!

We started to build our bitemporal library for PostgreSQL about four years ago. It was merely a "proof of concept," and Chad Slaughter, who initiated all this work, knowing my work habits way too well, kept repeating again and again: do not optimize it yet!

Well, I didn't, but then I joined Braviant Holdings, and a year later I was granted permission to use our bitemporal framework in production. Some of the performance flaws became apparent even during the test run, and I was able to fix them. Later, as we used it in production more and more, I came up with new functions, UPDATE_SELECT and CORRECT_SELECT, since we actually needed them, and since the bitemporal operations were supposed to behave the same way as regular database operations.

About three weeks ago we had a very important release, which, along with addressing multiple business needs, included some significant changes on the technical side. One of the consequences was that it significantly increased the traffic on our new platform, and as a result we started to see some timeouts.

Although these timeouts were pretty rare, we saw them as a problem. I had personally pledged that the system would remain scalable, and now I couldn't just go with "bitemporal updates are slow." Yes, the execution time was 2 to 3 seconds most of the time, but sometimes it would spike, and our microservices have a hard timeout of 10 seconds.

Some time ago I already mentioned in this blog how thankful I am for those timeouts! Nothing fosters innovation more than the necessity to address performance problems immediately, because they have a direct impact on production.

This time around I was 99.9% sure that the periodic slowness happened during the remote query, which is a part of the problematic function. It turned out, though, that the remaining 0.1% was the case, and together with our DB team we were able to determine that the problematic statement was the last UPDATE in the bitemporal update function. If you had asked me a week earlier, I would have said that I was not going to address bitemporal performance for the next several months, but I had no choice.

Thanks to Boris Novikov, who helped me immensely in testing and verifying several different approaches and eventually identified the best one; to Chad Slaughter, who was merging my commits from 7:30 AM to 9:30 PM so that the master branch of the bitemporal library would have the latest updates by the time of the release; and to our amazing QA team, who had to run and rerun tests multiple times that day, the new bitemporal functions are now in place. Not only for Braviant Holdings, but for the whole community.

I would also like to mention that, since I was already changing the functions, I fixed one long-overdue issue: all functions now have versions which are PG 10 compliant. We've left the old versions in place, because some of them are used in existing production systems, but if you are just starting, you can use the new ones.

Check it out at https://github.com/scalegenius/pg_bitemporal

Leave a comment

Filed under news, research, SQL, Team and teamwork


The second rejected paper: the ORIM again

Object-relational impedance mismatch is by far my favorite research topic, mostly because it has a very practical application. I would make an even stronger statement: the most rewarding optimization is the one where you can reduce the number of SQL statements executed when a web page is rendered, and all of a sudden it loads 50 times faster (and I mean actually 50 times, not figuratively speaking!). It always looks like magic – and I haven't done anything!

That being said, ORMs are my worst enemies, and I am always looking for opportunities to promote better ways of communication between a database and an application. Most of the time the human factor turns out to be more important than the technological challenges, so I always think about these projects as battles.

At Braviant, however, for the first time in my professional career I had nobody to fight over this issue – the app developers were completely on board with my approach from day one. That allowed us to develop something really cool and to optimize the interaction between the databases and the application to the point of absolute perfection. So when my husband suggested we write a short paper about this project, I had no doubt it would be accepted – because two of my previous papers on the same subject had been accepted to very serious conferences.

Life proved me wrong :). I am not going to name the conference and the workshop, but I have to make some comments about the reviews, so that the level of my frustration can be understood.

One of the reviewers asked why we think that the number of round trips defines the response time of a web application. Another reviewer asked whether we had tried to use MongoDB :))). And why we think that (de)serialization of JSON takes negligible time. And why we think Hibernate is worse.

I think the only valid objection was that the topic of the paper was not relevant to the workshop topic. And the latter might explain the whole story.

Several years ago, when I started to attend database conferences again after fifteen years of absence, I observed that a significant number of the attendees had never seen real applications and had never dealt with performance problems. Fortunately, I've also met and gotten to know some really outstanding researchers, whom I admire and feel honored to be acquainted with, so… I am sure I will find the right place to showcase our work.

And maybe it's time to get back to my old "HDAT" workshop idea…

And for my fellow Chicagoans: I will be presenting this work this Tuesday, Feb 13 at the Chicago PUG meetup!

8 Comments

Filed under research