Category Archives: research

Introducing the NORM repo

Over the past two and a half years, I have given an endless number of talks about “our JSON thing,” which we now call NORM. Ever since my very first presentation, people have asked me whether I could show an example of code developed using this approach, and I never had a good answer. I do not know what took me so long, but now I finally have one.

For those who love the idea of a world without ORMs, and who want to start “doing it the right way,” a working example of the NORM technology can be found here.

Please take a look! We are planning to add to this repo a small Java program that uses these PostgreSQL functions.
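To give a flavor of the approach before you open the repo: instead of mapping tables to objects, the application calls a database function that assembles the whole object and returns it as JSON. Below is a minimal sketch of what such a function might look like; the table and function names are hypothetical, and the repo remains the authoritative example.

-- A hypothetical NORM-style function: a single call returns the whole object as JSON text
create table customer (customer_id int primary key, customer_name text);
insert into customer values (1, 'Anna');

create or replace function customer_select (p_id int)
returns text
language sql as
$$
select to_json(c)::text from customer c where c.customer_id = p_id;
$$;

-- select customer_select(1);  returns {"customer_id":1,"customer_name":"Anna"}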

And I would love to hear from people who would

  • clone this repo and let us know what does not work or how the documentation can be improved
  • develop a simple Ruby on Rails app that uses the same Postgres functions
  • apply the same approach to any other database/application combination

Looking forward to the feedback!

2 Comments

Filed under Development and testing, research, SQL

Looking for New Ways to Bridge the Old Gap: New Ideas After the Conference

Before I went to this conference, I was resentful, convinced that the gap between applications and databases would never be closed. Even at a conference focused on both data engineering and software development, there was barely a place for me, and our talk barely got accepted.

I have to admit, I didn’t explore the program much before coming, because I was in the middle of a never-ending work crisis, and we had to rework our presentation several times.

But when I took a closer look, I realized that I was way more interested in the SE sessions than in the database sessions. On day four, I did not miss a single moment, and I had several interesting conversations with the speakers.


It turned out that most of them had not even been present on day one, when I gave my talk. And they said they would have loved to come, had it not been on the first day.

Now I am wondering whether I did the right thing by never trying to present my work at SE conferences. On the one hand, I always say that my success would not be possible without such an incredible backend team. On the other hand, I routinely say that inefficient programming is all application developers’ fault. That is not true.

One of the talks was about refactoring techniques, and after the presentation, I asked the speaker whether he had ever considered using database access as a refactoring criterion.
He replied that one of his colleagues had tried to explore this option but found it challenging: the queries appeared so entangled, so difficult to extract, that it led to nothing. I told him about my work and suggested that we would love to collaborate if he found it interesting. He said that he would take a look, and then also mentioned that usually, the database people do not collaborate. He mentioned the lack of constraints and the unwillingness to use views. I said that views are horrible because most of the time they decrease performance. But they provide a level of abstraction, he countered. I replied: there are better ways! He said: well, then give them to us!

This exchange made me think that I do not explain myself enough when talking about NORM. And if I know that a successful implementation depends on cooperation with app developers, I should be advocating it to them.

Leave a comment

Filed under Development and testing, research, Systems

Finally – Accepted Paper!

Last week, for the first time in a very long time (since 2015!), I had a paper accepted to a real Computer Science conference. Many things make this acceptance very meaningful for me.

First, as I said, it’s the first one after four and a half years. 

Second, this is my first acceptance at a scientific conference since I joined Braviant Holdings. When I spoke at ICDE 2016, I was already with Braviant Holdings, but the work had been done, and the paper submitted, during my Enova tenure.

Third, the topic of the paper is the object-relational impedance mismatch (ORIM), which is very difficult to sell to the academic community and industry alike, and we had already had two rejections of an earlier version of this paper.

And the last: this is the first paper Boris and I have submitted together in I can’t remember how long (I want to say since 1995 :)). OK, the first accepted paper together :)

We will be presenting at the SOFSEM 2020 conference in January. So yes, this is also my birthday present 🙂

Leave a comment

Filed under events, news, research, talks

PostgreSQL And Academia

Recently I’ve been thinking a lot about relationships between the PostgreSQL community and the DB research community. To put it bluntly – these two communities do not talk to each other!

There are many reasons why I am concerned about this situation. First, I consider myself a member of both of these communities. Even if right now I am 90% in industry, I can’t write off my academic past, and writing a scientific paper with the hope of it being accepted to a real database conference is something that appeals to me.

Second, I want to have quality candidates for database positions when I have openings. The problem is bigger than the fact that scientists do not speak at Postgres conferences and Postgres developers do not speak at academic conferences. The bigger problem is that for many CS students, their academic research and practical experience do not intersect at all! They study some cool algorithms, and then they practice their SQL on MySQL, which, as I have already mentioned multiple times, lacks so many basic database features that it can hardly be considered a database!

If these students practiced on PostgreSQL, they would have a real full-scale object-relational database: not a “light” version, but the real thing, which supports tons of index types, data types, and constraints, has a procedural language, and the list goes on and on.
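To make the contrast concrete, here is a small sketch (my own illustration, not from any curriculum) of a few features a student meets immediately in Postgres: a built-in range type, a real CHECK constraint, a GiST index, and a partial index.

create table booking (
    room int not null,
    during tsrange not null,       -- a built-in range type
    check (not isempty(during))    -- a real CHECK constraint
);
create index booking_during_idx on booking using gist (during);         -- a GiST index over ranges
create index booking_big_rooms_idx on booking (room) where room > 100;  -- a partial index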

It is especially upsetting to see this disconnect, since so much database research was completed on Postgres, for Postgres, and with the help of Postgres: R-trees and GiST indexes, to name a couple of results. Also, the SIGMOD Test of Time Award in 2018 was given to the paper “Serializable Isolation for Snapshot Databases,” which was implemented in Postgres.

I know the answer to the question “why don’t they talk?” Researchers do not want to talk at Postgres conferences because those are not scientific conferences, and participation will not result in any publication. Postgres developers do not want to talk at CS conferences because they do not like to write long papers :), and also, even when they do submit something, their papers are often rejected as “not having any scientific value.”

I know the answer. But I do not like it :). So maybe – we can talk about it?!

2 Comments

Filed under research, SQL, Systems

Let’s Go Bitemporal!

Dear friends and followers from the Postgres community! Today, let’s talk more about the bitemporal library (as if I had not spoken about it enough already!).

We have been developing Postgres functions that support bitemporal operations for almost four years now. We found our initial inspiration in the Asserted Versioning Framework (AVF), first introduced by Johnston and Weis nearly twenty years ago. There is nothing new in the concept of incorporating a time dimension into data, and even the concept of two-dimensional time is not new. However, we believe that AVF approaches the task in the best possible way and that it allows making time a true and integral part of the data.

We believe that Postgres is best suited to support two-dimensional time due to two factors: the presence of range (interval) types and GiST indexes with exclusion constraints. Having these two available made implementing the concept more or less trivial.
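For readers who have not seen these two features working together, here is a minimal toy sketch of the mechanics (my own illustration, not the pg_bitemporal API): two range columns carry the two time dimensions, and a GiST exclusion constraint prevents overlapping periods for the same key.

-- btree_gist lets the exclusion constraint combine a scalar key with range columns
create extension if not exists btree_gist;

create table customer_bt (
    customer_id int,
    customer_name text,
    effective tstzrange,   -- when the fact holds in the modeled world
    asserted tstzrange,    -- when the fact is asserted in the database
    exclude using gist (
        customer_id with =,
        effective with &&,
        asserted with &&
    )
);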

Implementation of the bitemporal operations took some time, though, and we are still in the process of improving some of the functions. However, we are happy to share with the world that Braviant Holdings runs both OLTP and OLAP databases on the bitemporal framework with no performance degradation. Since we had an opportunity to develop it as we went, we could address lots of issues in this implementation that we initially did not even expect.

Recently, we have uploaded several files into the docs section of the pg_bitemporal GitHub repo, including several presentations and short papers, so that those who are interested can read more about the theory of bitemporality. We hope that people will give it a try – it works! Also, we are always looking for volunteers who would be interested in collaboration.

Please check us out at https://github.com/scalegenius/pg_bitemporal

Leave a comment

Filed under research, SQL, Systems

Bitemporal documentation is available!

Everybody who was curious enough to start using our pg_bitemporal GitHub repo would complain about the lack of documentation, so we’ve tried really hard to provide our followers with some guidance.

What we have now is very far from perfect, but if you go to the docs directory, there is a lot of documentation, including our old presentations, explanations of the basic bitemporal concepts, and, most importantly, the first-ever bitemporal functions manual, which we promise to make more readable in the near future. Meanwhile, please share your feedback! Thank you!

Leave a comment

Filed under Data management, research

How to build this nested JSON

This post is in response to multiple requests from the PG Conf attendees to provide a “schematic” example of how to build a record set and convert it to a JSON object (basically, to illustrate our “Connecting Galaxies” talk).

Actually, if you click on the link above and download the presentation, you will see an extra slide at the end, which provides the link to the code below… but we know that people do not like “clicking the links” 🙂

So – here it is one more time!

--- Building complex JSON objects from multiple tables ---
--- Deriving aggregates json_agg and n_array_agg ---
--- Wrapping an array (of rows) into JSON converted to text for JDBC transfer ---

-- Test aggregation
create table aa (a_id text, a_name text);
insert into aa values ('1', 'Anna'), ('2', 'Bob');

---- Aggregation on the second level ----
create table bb (b_id text, ab_id text, b_num text, b_ph text);
insert into bb values ('101', '1', '101-101', '(800)-123');
insert into bb values ('1012', '1', '1012102', '(800)-1234'),
('201', '2', '201-201', '(800)-1345');

---- Add one more embedded array on the second level ----
create table cc (c_id text, ac_id text, c_st text, c_more text);
insert into cc values
('11', '1', 'stst', 'more-more'),
('12', '1', 'swsw', 'more-less');
insert into cc values
('21', '2', 'stst-2', 'more-mo2222re'),
('22', '2', 'sws222w', 'more-less22');

---- Add the third level; dd will be nested under bb ----

create table dd (d_id text, bd_id text, d_value text);
insert into dd values
('1101', '101', 'dv11'),
('1102', '101', 'dv12'),
('2101', '201', 'dv22');

---- A CTE might be helpful to avoid processing big top-level tables multiple times ----
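-- For example (a sketch added for illustration, not used below):
-- materialize the filtered top-level table once and reuse it.
with top_level as (
    select a_id, a_name from aa where a_name like 'A%'
)
select a_id, a_name, b_num
from top_level join bb on a_id = ab_id;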

-- State-transition function for the custom aggregate: appends val to the
-- accumulated array only when the boolean flag b is true
create or replace
function array_agg_next (agg_sta anyarray, val anyelement, b boolean)
returns anyarray as
$$
declare
out_array alias for $0;  -- $0 aliases the polymorphic return value
begin
if b then
    if agg_sta is null then
        out_array := array[val];
    else
        out_array := agg_sta || val;
    end if;
else
    out_array := agg_sta;
end if;
return out_array;
end;
$$ LANGUAGE plpgsql;

drop function if exists array_agg_final (anyarray);
-- Final function: simply returns the accumulated array as-is
create or replace
function array_agg_final (agg_sta anyarray)
returns anyarray as
$$
declare
final_array alias for $0;
begin
final_array := agg_sta;
return final_array;
end;
$$ LANGUAGE plpgsql;

-- ======================================================
-- A simplified version of the transition function: it modifies the state
-- argument directly, so no separate final function is needed
create or replace
function array_agg_next (agg_sta anyarray, val anyelement, b boolean)
returns anyarray as
$$
begin
if b then
    if agg_sta is null
        then agg_sta := array[val];
        else agg_sta := agg_sta || val;
    end if;
end if;
return agg_sta;
end;
$$ LANGUAGE plpgsql;

drop aggregate if exists n_array_agg (anyelement, boolean);
CREATE AGGREGATE n_array_agg (anyelement, boolean) (
sfunc = array_agg_next,
-- FINALFUNC = array_agg_final,
STYPE = anyarray
);
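-- A quick illustration of n_array_agg (a sketch added for clarity):
-- the boolean argument controls which values get collected into the array
select a_id, n_array_agg(b_num, b_num is not null) as b_nums
from aa left join bb on a_id = ab_id
group by a_id;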

drop type if exists jaa cascade;
drop type if exists jbb cascade;
drop type if exists jcc cascade;
drop type if exists jdd cascade;
create type jdd as (dd_key text, dd_value text);
create type jcc as (cc_key text, cc_st text, more text);
create type jbb as (bb_key text, num text, ph text, "DD" jdd[]);
create type jaa as (p_key text, p_name text, "BB" jbb[], "CC" jcc[]);

---- Collect rows of a specific type into an array; depends on the output type ----
create or replace
function collect_items (s text)  -- the input parameter is not used in this schematic example
returns jaa[]
LANGUAGE plpgsql as
$body$
declare
result jaa[];
begin
select array_agg(single_item) into result
from
(select
row (
a_id, a_name,
n_array_agg( row (b_id, b_num, b_ph, d_agg)::jbb, b_id is not null),
n_array_agg( row (c_id, c_st, c_more)::jcc, c_id is not null)
)::jaa as single_item
from
(
-- one pass computes the third-level aggregate (dd nested under bb)
select a_id, a_name, b_id, b_num, b_ph, c_id, c_st, c_more,
n_array_agg (row(d_id, d_value)::jdd, d_id is not null) as d_agg
from
(
-- the two second-level branches (bb and cc) are combined with UNION ALL
select a_id, a_name, b_id, b_num, b_ph, null as c_id, null as c_st, null as c_more, d_id, d_value
from aa join bb on a_id = ab_id
left join dd on b_id = bd_id
UNION ALL
select a_id, a_name, null as b_id, null as b_num, null as b_ph, c_id, c_st, c_more, null as d_id, null as d_value
from aa join cc on a_id = ac_id
) temp_table_3
group by
a_id, a_name, b_id, b_num, b_ph, c_id, c_st, c_more
) temp_table_2
group by a_id, a_name) items;
return result;
end;
$body$;

---- Just another output type: the same pattern, starting the aggregation from the second level ----
create or replace
function collect_subitems (s text)
returns jbb[]
LANGUAGE plpgsql as
$body$
declare
result jbb[];
begin
select array_agg(single_item) into result
from
(
select row(
b_id, b_num, b_ph, n_array_agg (row(d_id, d_value)::jdd, d_id is not null)
)::jbb as single_item
from bb left join dd on b_id = bd_id
group by b_id, b_num, b_ph
) items;
return result;
end;
$body$;

-- Wrap an array into a set of text strings containing the JSON representation of the array elements
-- This function does not depend on any type
create or replace
function array_transport (all_items anyarray) returns setof text
RETURNS NULL ON NULL INPUT
LANGUAGE plpgsql as
$body$
declare
item record;
begin
foreach item in array all_items
loop
return next( to_json(item))::text;
end loop;
end;
$body$;

select * from array_transport (collect_items('a'));

select * from array_transport (collect_subitems('a'));
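-- With the sample data above, the first call should return one JSON text per
-- top-level row, shaped roughly like this (element order may vary):
-- {"p_key":"1","p_name":"Anna",
--  "BB":[{"bb_key":"101","num":"101-101","ph":"(800)-123",
--         "DD":[{"dd_key":"1101","dd_value":"dv11"},{"dd_key":"1102","dd_value":"dv12"}]}, ...],
--  "CC":[{"cc_key":"11","cc_st":"stst","more":"more-more"}, ...]}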

===============================


Leave a comment

Filed under research, SQL, talks