Tag Archives: databases

Somebody has their marking analytics VERY WRONG :)

Leave a comment

January 5, 2020 · 7:51 pm

Funny thing happened…

I have to share that:). I had a ticket to develop one operational report. Not only the report requirements were complex, but also it was very difficult to debug. We just started to collect the data required for this report, and we do not have enough of it to cover all potential issues. Moreover, since the data volumes are so small, the issues are resolved fast, so in two days, I never had a chance to catch a single case of exception. Until this morning. When a thought suddenly came to my mind, and I asked myself: Hettie, why in the world you are waiting for an exception to happen in real-time?! All your tables are bitemporal, so you can time-travel to any moment of the past, including the time, when exceptions occurred!

It’s funny and not funny that it took me two days to figure this out! Especially because I was the one who introduced bitemporality!

Worked as expected 🙂

Leave a comment

Filed under Data management, Development and testing

PostgreSQL And Academia

Recently I’ve been thinking a lot about relationships between the PostgreSQL community and the DB research community. To put it bluntly – these two communities do not talk to each other!

There are many reasons why I am concerned about this situation. First, I consider myself belonging to both of these communities. Even if right now I am 90% in industry, I can’t write off my academic past and writing a scientific paper with the hope of being accepted to the real database conference is something which appeals to me.

Second, I want to have quality candidates for the database positions when I have them. The problem is more than scientists do not speak at the Postgres conferences, and Postgres developers do not speak at the academic conferences. The bigger problem is that for many CS students, their academic research and practical experience to not intersect at all! They study some cool algorithms, and then they practice their SQL on MySQL databases, which as I have already mentioned multiple times, lacks so many basic database features, that it hardly can be considered a database!

If these students practiced using PostgreSQL, they would have a real full-scale object-relational database, not a “light” version, but a real thing, which supports tons of index types, data types, constraints, has procedural language, and the list can go on and on.

It is especially upsetting to see this disconnect since so many database researches were completed on Postgres, for Postgres, with the help of Postgres; R-trees and GIST indexes, to name a couple. Also, the SIGMOD Test of Time Award in 2018 was given to the paper “Serializable isolation for snapshot databases”, which was implemented in Postgres.

I know the answer to the question “why they do not talk?” Researches do not want to talk at the Postgres conferences, because those are not scientific conferences, and the participation in these conferences will not result in any publication. Postgres developers do not want to talk at the CS conferences, because they do not like to write long papers :), and also, even if they do submit something, their papers often are rejected as “not having any scientific value.”

I know the answer. But I do not like it :). So maybe – we can talk about it?!

2 Comments

Filed under research, SQL, Systems

PG Open submission deadline is extended!

To all my Postgres-minded friends and colleagues – the submission deadline for PG Open talks and tutorials has been extended. You have till July 7 to submit your proposal! See updated info.
Please consider, if you didn’t submit yet!

Leave a comment

Filed under talks

Databases are not sexy?

I’ve heard this line from a former colleague of mine. He was explaining why there are so little database people around, why IT students are not specializing in databases. That was his answer – that it not cool. “WEB-designer” sounds cool, “database developer” does not.

Several months passed since I heard that, and I was thinking: I should write a blog post about it! However, honestly – what else I can say except the things I’ve already said in this blog multiple times? That there is nothing more exciting, than exploring the endless possibilities of SQL, that nothing can be more rewarding than applying your magic to the jobs, which runs for an hour, and all of a sudden it runs in less than a minute and produces the same result:)

I suspect that the general public does not think that there is something behind a web page, and when somebody experiences a website slowness, they refer to it as “the internet is slow.” Also, the buzz words like “Big Data” often send a message, that “the databases are outdated,” and that there is something bigger, and cooler, and more fashionable, than just”databases,” which does not help a bit.

As I always like to be practical, and not only state a problem but come up with a way to solve it, I am now thinking about how to bridge this gap. I want to find ways to reach out to college and high school students and to give them some exposure to the Wonderful World Of Data. A couple of years ago when I was attending one of the Big Conferences, I’ve heard a discussion regarding “what our students want to be taught.” That was a time of Big Data just becoming a Big Deal :). Honestly, my opinion is that the student’s interest should not drive a curriculum entirely 🙂 and that that’s the right place to interfere.

Is anybody interested in joining me in this effort?

Leave a comment

Filed under Data management, SQL

How to build this nested JSON

This post is in response to multiple requests from the PG Conf attendees to provide a “schematic” example of how to build a record set and to convert it to a JSON object (basically to illustrate our “Connecting Galaxies” talk.

Actually, if you click on the link above and download the presentation, you will see an extra slide at the end, which provides the link to the code below… but we know that people do not like “clicking the links” 🙂

So – here is it one more time!

--- Building complex JSON objects from multiple tables --
--- Deriving aggregates json_agg and n_array_agg
--- wrapping an array (of rows) into JSON converted to text for JDBC ----transfer 

— Test aggregation
create table aa (a_id text, a_name text);
insert into aa values (‘1’, ‘Anna’),(‘2′,’Bob’);

—- Aggregation on second level —–
create table bb (b_id text, ab_id text, b_num text, b_ph text);
insert into bb values (‘101’, ‘1’, ‘101-101’, ‘(800)-123’);
insert into bb values (‘1012’, ‘1’, ‘1012102’, ‘(800)-1234’),
( ‘201’, ‘2’, ‘201-201’, ‘(800)-1345′);

— add one more embedded array on the second level —–
create table cc (c_id text, ac_id text, c_st text, c_more text);
insert into cc values
( ’11’, ‘1’ , ‘ stst’, ‘more-more’),
( ’12’, ‘1’ , ‘ swsw’, ‘more-less’);
insert into cc values
( ’21’, ‘2’ , ‘ stst-2’, ‘more-mo2222re’),
( ’22’, ‘2’ , ‘ sws222w’, ‘more-less22’);

—- add third level , dd will be under bb —-

create table dd (d_id text, bd_id text, d_value text);
insert into dd values
( ‘1101’ , ‘101’, ‘dv11’),
( ‘1102’ , ‘101’, ‘dv12’),
( ‘2101’ , ‘201’, ‘dv22’);

—- CTE might be helpful to avoid multiple processing of big top-level tables —-

create or replace
function array_agg_next (agg_sta anyarray, val anyelement, b boolean)
returns anyarray as
$$
declare
out_array alias for $0;
begin
if b then
if agg_sta is null then
out_array := array [val ];
else
out_array := agg_sta || val ;
end if;
else
out_array := agg_sta ;
end if;
return out_array;
END;
$$ LANGUAGE plpgsql;

drop
function array_agg_final (agg_sta anyarray);
create or replace
function array_agg_final (agg_sta anyarray)
returns anyarray as
$$
declare final_array alias for $0;
begin
final_array := agg_sta;
return final_array ;
END;
$$ LANGUAGE plpgsql;

—–======================================================
create or replace
function array_agg_next (agg_sta anyarray, val anyelement, b boolean)
returns anyarray as
$$
begin
if b then
if agg_sta is null
then agg_sta := array [val ];
else agg_sta := agg_sta || val;
end if;
end if;
return agg_sta;
END;
$$ LANGUAGE plpgsql;

drop aggregate n_array_agg (anyelement, boolean);
CREATE AGGREGATE n_array_agg (anyelement, boolean) (
sfunc= array_agg_next,
— FINALFUNC = array_agg_final,
STYPE = anyarray
);

drop type if exists jaa cascade;
create type jdd as (ddi_key text, dd_value text);
create type jcc as (cc_key text, cc_st text, more text);
Create type jbb as (bb_key text, num text, ph text, “DD” jdd[]);
create type jaa as (p_key text, p_name text, “BB” jbb[], “CC” jcc[]);

—- collect rows of a specific type into an array, depends on the output type —–
create or replace
function collect_items (s text)
returns jaa[]
LANGUAGE plpgsql as
$body$
declare
result jaa[];
begin
select array_agg(single_item) into result
from
(select
row (
a_id, a_name,
n_array_agg( row (b_id, b_num, b_ph, d_agg )::jbb , b_id is not null),
n_array_agg( row (c_id, c_st, c_more )::jcc, c_id is not null)
) ::jaa as single_item
from
(
select a_id, a_name, b_id, b_num, b_ph, c_id, c_st, c_more,
n_array_agg (row(d_id, d_value)::jdd, d_id is not null) as d_agg
from
(
select a_id, a_name, b_id, b_num, b_ph, null as c_id, null as c_st, null as c_more, d_id, d_value
from aa join bb on a_id = ab_id
left join dd on b_id = bd_id
UNION ALL
select a_id, a_name, null as b_id, null as b_num, null as b_ph, c_id, c_st, c_more, null as d_id, null as d_value
from aa join cc on a_id = ac_id
) temp_table_3
group by
a_id, a_name, b_id, b_num, b_ph, c_id, c_st, c_more
) temp_table_2
group by a_id, a_name) items
where 0=1
;
return result;
end;
$body$;

—– just another output type —-
create or replace
function collect_subitems (s text)
returns jbb[]
LANGUAGE plpgsql as
$body$
declare
result jbb[];
begin
select array_agg(single_item) into result
from
(
select row(
b_id, b_num, b_ph, n_array_agg (row(d_id, d_value)::jdd, d_id is not null)
)::jbb as single_item
from bb left join dd on b_id = bd_id
group by b_id, b_num, b_ph
) items;
return result;
end;
$body$;

— wrap array into a set of text strings containing the json representation of array elements
— This function does not depend on any type
create or replace
function array_transport (all_items anyarray) returns setof text
RETURNS NULL ON NULL INPUT
LANGUAGE plpgsql as
$body$
declare
item record;
begin
foreach item in array all_items
loop
return next( to_json(item))::text;
end loop;
end;
$body$;

select * from array_transport (collect_items(‘a’));

select * from array_transport (collect_subitems(‘a’));

===============================

 

Leave a comment

Filed under research, SQL, talks

I knew I won’t like this feature! About the parallel execution

Many years ago, when Postgres 9.6 was still in making, my coworker said to me with excitement: Hettie, Postgres will now have the ability to run queries in parallel! There can be only four parallel processes, but still, isn’t in nice?!

And I remember exactly what I’ve replied: no, I do not like this feature a bit! You know why? Because executing queries in parallel would rarely solve performance problems. In fact, if four parallel workers would solve your performance problems, they were not really problems! It can do more harm than good, because it can mask some real performance problems for a while, and then they will turn just to be more severe.

What happened then – I’ve started a new chapter of my life at Braviant, and we had Postgres 9.5 then, and then for almost 3 years it was like I never had time to stop and upgrade :). But now I have a team, so we’ve finally planned the upgrade, and since we were already four versions behind, we planned upgrade to 9.6 and immediately to PG 10.

We’ve started from our Data Warehouse. First – in the staging environment, we we’ve tested, and observed the execution for some time. And then on the mirroring instance, and only then – on production.

And then it started! Seriously, out of all data refreshes this one is only one, which is important for the OLTP instance, because it sends data to one of our client interfaces. It started to behave inconsistently, Sometimes it would be just fine. Other times, instead of about 50 seconds it has been running for an hour, and probably won’t finish if we won’t kill it. Yes, it was obvious, that something did change in the execution plan after the upgrade. But what?! From the first glance the execution plan looked the same, all HASH JOINS, which you would expect, when you join tables with no restrictive conditions.

But it was still painfully slow. What was more puzzling – I could take out of the equation JOIN to any table, and performance would be unpredictable. After several dozen of attempts to make things run decided to take a closer look at the execution plan. And you know what I saw?! Yes, parallel execution/! Four joins were parallelized, which resulted in the execution time been really horrible. After the issue was found, the only thing left was to figure out, how to turn this feature off 🙂

Leave a comment

Filed under Data management, Development and testing, SQL