My talk: What is a database?

I presented this talk a week ago as part of “Braviant Talks”, where people from different departments of our company talk about what their department is doing. It was intended to be as non-technical as possible, which I think was achieved, at least to some extent, and… I just like how it turned out :). Enjoy 🙂


Filed under Companies, events, talks

Three years with Braviant Holdings

LinkedIn announced it a little earlier, but my actual three-year anniversary was yesterday, April 18. Three years ago I wrote in my journal:

Real people. The calm state of mind I haven’t had for so long that I can’t even remember when I last had it. Meaningful conversations. I can talk about important things, and people listen. My opinion matters. I am happy.

Three years later I can repeat all of the above.

Filed under Companies, events, People

April Chicago PUG meetup is tomorrow!

Just one last reminder about tomorrow’s Chicago PUG meetup: tomorrow we are hosting Cockroach Labs, and I encourage everybody, including our members who were hibernating during this cold winter, to attend!

Here is a short summary of the talk:

CockroachDB is a cloud-native distributed SQL database built to scale. It was inspired by Spanner, but created as an open-source, Postgres-compatible database with strong ACID guarantees. It offers an alternative to a standalone Postgres server for answering the most important questions we have to handle once we take our databases into production: How can I keep the database from going down? How can I migrate it between clouds? And, since it is a cluster, how can I keep the database instances consistent with each other?

CockroachDB takes on these challenges by allowing you to write your Postgres apps as if you were talking to a single instance: your SQL code works the same way, and you use standard Postgres drivers. But instead of you managing replication, the data is automatically written to a Raft-consensus cluster that guarantees serializable isolation and stays consistent automatically.

CockroachDB is easy to play with in a lab, and this talk will show you how. Your database can then seamlessly migrate to the cloud, and even scale across datacenters, giving your Postgres database flexibility as you move from development through production. Come to this session to see how this technology works and learn how to download the open-source version and work with it in your own environment.
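To give a flavor of what that Postgres compatibility means in practice, here is a minimal sketch (the table and data are made up for illustration): the statements below are plain Postgres SQL and run unchanged through a standard Postgres driver against a CockroachDB cluster.

-- Plain Postgres SQL; runs unchanged on CockroachDB through a standard Postgres driver.
CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL NOT NULL);
INSERT INTO accounts VALUES (1, 100.00), (2, 250.00);
SELECT id, balance FROM accounts ORDER BY id;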

If you are planning to attend but haven’t registered yet, please take a moment to RSVP on the meetup page here (to ensure we have enough pizza!).


Filed under events, SQL

DB developers and App developers

Last week, when I was at PG Conf in New York, I met a former co-worker of mine from the New York Department of Education project. She was a Java developer, and I was an Oracle developer. Back then I was implementing a solution very close to the idea I am implementing now: building Oracle packages for each application endpoint, so that the application could only call functions, never separate SQL statements. Our collaboration was not always easy and peaceful in those days, and when I described to her the specifics of the work we were presenting, I told her: see, if I had returned the same type of objects to you back then, you wouldn’t have had any trouble with me!

She replied: oh yes, if that had been the case, I wouldn’t have had any trouble with you! And the way she said it reminded me that she indeed had some trouble with me back then… And I thought to myself that perhaps it’s not only that the team I work with now is better than any other team I’ve worked with before (though that is still the case :)), but also that I myself have changed. I am way more flexible and willing to walk the extra mile than I was fifteen years ago… and our success is a product of collaboration, of everybody being able and willing to work together to find the right solution.
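For readers curious what this “functions, not separate SQL statements” pattern looks like in Postgres terms, here is a minimal sketch (the table and function names are hypothetical): the application gets access only to the endpoint function, never to the underlying table.

-- A hypothetical table standing in for real application data.
create table accounts (account_id text primary key, account_name text);

-- One function per application endpoint; the app calls this instead of issuing its own SQL.
create or replace function api_get_account (p_account_id text)
returns table (account_id text, account_name text)
language sql stable as
$$
select a.account_id, a.account_name
from accounts a
where a.account_id = p_account_id;
$$;

-- The application issues only: select * from api_get_account('1');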


Filed under People, SQL, Systems

How to build this nested JSON

This post is in response to multiple requests from PG Conf attendees to provide a “schematic” example of how to build a record set and convert it to a JSON object (basically to illustrate our “Connecting Galaxies” talk).

Actually, if you click on the link above and download the presentation, you will see an extra slide at the end, which provides the link to the code below… but we know that people do not like “clicking the links” 🙂

So, here it is one more time!

-- Building complex JSON objects from multiple tables
-- Defining the aggregates json_agg and n_array_agg
-- Wrapping an array (of rows) into JSON converted to text for JDBC transfer

-- Test aggregation
create table aa (a_id text, a_name text);
insert into aa values ('1', 'Anna'), ('2', 'Bob');

-- Aggregation on second level
create table bb (b_id text, ab_id text, b_num text, b_ph text);
insert into bb values ('101', '1', '101-101', '(800)-123');
insert into bb values ('1012', '1', '1012102', '(800)-1234'),
('201', '2', '201-201', '(800)-1345');

-- Add one more embedded array on the second level
create table cc (c_id text, ac_id text, c_st text, c_more text);
insert into cc values
('11', '1', 'stst', 'more-more'),
('12', '1', 'swsw', 'more-less');
insert into cc values
('21', '2', 'stst-2', 'more-mo2222re'),
('22', '2', 'sws222w', 'more-less22');

-- Add third level; dd will be under bb

create table dd (d_id text, bd_id text, d_value text);
insert into dd values
('1101', '101', 'dv11'),
('1102', '101', 'dv12'),
('2101', '201', 'dv22');

-- A CTE might be helpful to avoid multiple processing of big top-level tables
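-- A minimal sketch of that idea (added for illustration, not part of the original script):
-- the UNION ALL inside collect_items below scans aa twice; with a big top-level table,
-- a CTE lets both branches reuse a single scan.
with top_level as (select a_id, a_name from aa)
select t.a_id, t.a_name, b_id, b_num, b_ph, null as c_id, null as c_st, null as c_more
from top_level t join bb on t.a_id = ab_id
UNION ALL
select t.a_id, t.a_name, null, null, null, c_id, c_st, c_more
from top_level t join cc on t.a_id = ac_id;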

-- State transition function for the n_array_agg aggregate defined below:
-- appends val to the accumulated array only when the flag b is true.
create or replace
function array_agg_next (agg_sta anyarray, val anyelement, b boolean)
returns anyarray as
$$
declare
out_array alias for $0;
begin
if b then
if agg_sta is null then
out_array := array[val];
else
out_array := agg_sta || val;
end if;
else
out_array := agg_sta;
end if;
return out_array;
end;
$$ LANGUAGE plpgsql;

drop function if exists array_agg_final (agg_sta anyarray);
-- Final function; optional (it just passes the accumulated state through),
-- which is why it is commented out in the aggregate definition below.
create or replace
function array_agg_final (agg_sta anyarray)
returns anyarray as
$$
declare final_array alias for $0;
begin
final_array := agg_sta;
return final_array;
end;
$$ LANGUAGE plpgsql;

-- ======================================================
-- A simpler equivalent of the transition function, without the $0 alias:
create or replace
function array_agg_next (agg_sta anyarray, val anyelement, b boolean)
returns anyarray as
$$
begin
if b then
if agg_sta is null
then agg_sta := array[val];
else agg_sta := agg_sta || val;
end if;
end if;
return agg_sta;
end;
$$ LANGUAGE plpgsql;

drop aggregate if exists n_array_agg (anyelement, boolean);
CREATE AGGREGATE n_array_agg (anyelement, boolean) (
sfunc = array_agg_next,
-- FINALFUNC = array_agg_final,
STYPE = anyarray
);

-- Output types mirroring the target nesting: jaa holds jbb[] and jcc[], jbb holds jdd[].
drop type if exists jaa cascade;
drop type if exists jbb cascade;
drop type if exists jcc cascade;
drop type if exists jdd cascade;
create type jdd as (dd_key text, dd_value text);
create type jcc as (cc_key text, cc_st text, more text);
create type jbb as (bb_key text, num text, ph text, "DD" jdd[]);
create type jaa as (p_key text, p_name text, "BB" jbb[], "CC" jcc[]);
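-- Quick sanity check (a sketch added for illustration): a hand-built jaa value
-- already renders as the nested JSON shape we are after.
select to_json(
row('1', 'Anna',
array[row('101', '101-101', '(800)-123', null)::jbb],
null)::jaa
);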

-- Collect rows of a specific type into an array; depends on the output type
create or replace
function collect_items (s text)
returns jaa[]
LANGUAGE plpgsql as
$body$
declare
result jaa[];
begin
select array_agg(single_item) into result
from
(select
row (
a_id, a_name,
n_array_agg( row (b_id, b_num, b_ph, d_agg)::jbb, b_id is not null),
n_array_agg( row (c_id, c_st, c_more)::jcc, c_id is not null)
)::jaa as single_item
from
(
select a_id, a_name, b_id, b_num, b_ph, c_id, c_st, c_more,
n_array_agg (row(d_id, d_value)::jdd, d_id is not null) as d_agg
from
(
select a_id, a_name, b_id, b_num, b_ph, null as c_id, null as c_st, null as c_more, d_id, d_value
from aa join bb on a_id = ab_id
left join dd on b_id = bd_id
UNION ALL
select a_id, a_name, null as b_id, null as b_num, null as b_ph, c_id, c_st, c_more, null as d_id, null as d_value
from aa join cc on a_id = ac_id
) temp_table_3
group by
a_id, a_name, b_id, b_num, b_ph, c_id, c_st, c_more
) temp_table_2
group by a_id, a_name) items
;
return result;
end;
$body$;

-- Just another output type
create or replace
function collect_subitems (s text)
returns jbb[]
LANGUAGE plpgsql as
$body$
declare
result jbb[];
begin
select array_agg(single_item) into result
from
(
select row(
b_id, b_num, b_ph, n_array_agg (row(d_id, d_value)::jdd, d_id is not null)
)::jbb as single_item
from bb left join dd on b_id = bd_id
group by b_id, b_num, b_ph
) items;
return result;
end;
$body$;

-- Wrap an array into a set of text strings containing the JSON representation of the array elements.
-- This function does not depend on any specific type.
create or replace
function array_transport (all_items anyarray) returns setof text
RETURNS NULL ON NULL INPUT
LANGUAGE plpgsql as
$body$
declare
item record;
begin
foreach item in array all_items
loop
return next to_json(item)::text;
end loop;
end;
$body$;

select * from array_transport (collect_items(‘a’));

select * from array_transport (collect_subitems(‘a’));
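-- Alternatively (a sketch, not from the talk): skip the row-by-row transport
-- and emit the whole collected array as a single JSON document.
select to_json(collect_items('a'));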


Filed under research, SQL, talks

PG Conf New York

It was the first time I attended the New York PG Conf, and we had two talks on the very first day of the Development track! I was a little bit nervous, but it went really well!

The presentation which Alyssa Ritchie and I did together was almost the same one we gave at PG Open and the 2Q PG Conf (except that I decided to change it significantly the night before!), and my short presentation was an extended version of the talk about remote function calls which I gave at the 2Q PG Conf. And here is its recording:


Filed under events, talks

I knew I wouldn’t like this feature! About parallel execution

Many years ago, when Postgres 9.6 was still in the making, my coworker said to me with excitement: Hettie, Postgres will now have the ability to run queries in parallel! There can be only four parallel processes, but still, isn’t it nice?!

And I remember exactly what I replied: no, I do not like this feature one bit! You know why? Because executing queries in parallel rarely solves performance problems. In fact, if four parallel workers can solve your performance problems, they were not really problems! It can do more harm than good, because it can mask real performance problems for a while, and then they turn out to be even more severe.

What happened then: I started a new chapter of my life at Braviant, where we had Postgres 9.5, and for almost three years it was as if I never had time to stop and upgrade :). But now I have a team, so we finally planned the upgrade, and since we were already four versions behind, we planned to upgrade to 9.6 and immediately after to PG 10.

We started with our Data Warehouse. First in the staging environment, where we tested and observed the execution for some time. Then on the mirroring instance, and only then on production.

And then it started! Seriously, out of all our data refreshes this one is the only one that matters for the OLTP instance, because it sends data to one of our client interfaces. It started to behave inconsistently. Sometimes it would be just fine. Other times, instead of about 50 seconds, it would run for an hour and probably would never finish unless we killed it. Yes, it was obvious that something had changed in the execution plan after the upgrade. But what?! At first glance the execution plan looked the same: all HASH JOINs, which is what you would expect when you join tables with no restrictive conditions.

But it was still painfully slow. What was more puzzling: I could take the JOIN to any single table out of the equation, and performance would still be unpredictable. After several dozen attempts to make things run, I decided to take a closer look at the execution plan. And you know what I saw?! Yes, parallel execution! Four joins were parallelized, which made the execution time really horrible. After the issue was found, the only thing left was to figure out how to turn this feature off 🙂
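For anyone who hits the same surprise: the post does not say which exact knob we turned, but a common way to switch parallel query off looks like this (the database name below is made up):

-- Disable parallel query for the current session (PostgreSQL 9.6 and later):
SET max_parallel_workers_per_gather = 0;

-- Or persist the setting for a single database, so only its workload is affected:
ALTER DATABASE warehouse SET max_parallel_workers_per_gather = 0;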


Filed under Data management, Development and testing, SQL