Monthly Archives: February 2015

A grouping performance problem

By now I’ve come across this problem at least thee times, each time I needed to build a relatively big rollup table, which would use a join of about ten tables, some of which may have several million rows, and others – several hundred thousand rows. Each time I needed to run something like this:


select
t1.a
,t1.b
,t2.a2
,t2.b2
,t3.a3
,t4.b4
,sum(t1.c) as c_sum
,max(t1.d) as d_max
,sum(case t1.e='a' then 1 else 0 end) as sum_a
from t1
join t2 (using join_id2)
join t3 (using join_id3)
join t4 (using join_id4)
group by t1.a
,t1.b
,t2.a2
,t3.a3
,t2.b2

In all cases the table t1 will be the only table with the “real” aggregation, and for the rest of the tables there would be exactly one line for each of the aggregated line of t1.

The problem: this query would run… forever.

Now, if I just separate the “aggregation” part:


select
t1.a
,t1.b
,sum(t1.c) as c_sum
,max(t1.d) as d_max
,sum(case t1.e='a' then 1 else 0 end) as sum_a
from t1 group by t1.a, t1.b

This one will run really fast, even though the table t1 would be the biggest table. Let’s add the rest of the tables to the join:


select
t1.a
,t1.b
,sum(t1.c) as c_sum
,max(t1.d) as d_max
,sum(case t1.e='a' then 1 else 0 end) as sum_a
from t1
join t2 (using join_id2)
join t3 (using join_id3)
join t4 (using join_id4)
group by t1.a, t1.b

This will run just a little bit slower, than the “aggregate only”. But then…

I would imagine, after all joins and groupings the rest simply does not matter, because “everything is there already”, and all data is already read from the database. But the moment I start to add these “singular” columns to the select list and group by clause, it starts to break:
Continue reading

Advertisements

2 Comments

Filed under SQL

Finally many -many months later – our presentation at PG Open 2014

Leave a comment

Filed under Systems

ICDE 2016

Last April I was very happy and proud to announce that ICDE 2016 will be held in Saint Petersburg, Russia. Now it is going to be relocated to Helsinki.

I understand, that overall it is better for the conference, but can’t stop being sad – it’a very unlikely that my Alma Mater will have any opportunity like this in an observable future. But, to be honest, it will be way more pleasant to be in Helsinki, and to show to the conference participants all the places I love.

Update: since most of the search engines are referring to this post instead of the actual conference site, I’ve decided it will be a good idea to put a proper reference here:

ICDE 2016 Helsinki – please go there for more information!

Leave a comment

Filed under news

Women in IT: how we behave and how we are perceived

Recently I had several occasions when I would start to think: is it a gender thing? Should the way I am perceived be attributed to the fact, that I am a female? Is it “a Sheryl Sandberg thing”?

Because, see, on one hand we are being told: be bold about your achievements, talk to your manager, say that you need a raise. And at the same time, nobody can argue Sandberg’s quote about “when a man is seen as a strong leader, a woman is seen as a bitch”. So… yea, quite often it happens, that women are encouraged to talk about themselves and their achievements… until they actually do start talking!

On the other hand… It’s very easy to start feeling you’ve being discriminated. Very easy to dismiss all the constructive criticism you receive from your coworkers as “this is just a gender thing, this is just because I am a female”. Because sometimes we actually can be rude and inconsiderate.

When I talk to people about different anti-discrimination acts, like when I talk about the Pride parades or affirmative actions, I always say: this is a pendulum effect. You need to twist things the opposite way to make sure they eventually come to balance.

But – not always. What I am saying, it’s not always a pendulum effect. Sometimes a person can actually be too loud, too judgmental, too much “being sure she’s always right”. There is never a one and true recipe of how to balance it all correctly. I often get upset, when I receive negative feedback, but I tough myself to suppress this immediate reaction and try to think objectively. Even if there is a “Sheryl Sandberg thing”, is there anything I can actually correct in my behavior?

One important thing I’ve recently realized is this: a couple of years ago I could joke about “my way is the right way” or “if it’s not for me, it’s not important”, and people would take it as a joke. But when you are in a leadership position, you can’t joke like this anymore. People will take it literally, and it will be too late to say: I didn’t really mean this! Ana I still have a lot of learning to do…

2 Comments

Filed under People, Team and teamwork, Workplace

The vision of data for 2015

On Friday I’ve delivered a talk about the Data Vision for 2015, about the objectives of the database developers for the year. To be honest, I was freaking out, and almost nobody realized I was. When I asked some people after the talk “how was it”, the people would respond “great as usual”, and practically nobody realized it was actually unusual, because I never ever spoke on such a topic before.

It seems pretty easy – I know exactly what I consider the most important goals for the database development. Whenever I talk to the other database developers, or to my director, or to my squad members – I am saying something to the effect “I really wish we could always to “this”! I want “that” to be done each time we deploy new code!” But when you have 45 minutes and a hundred people listening, somehow you become mute…

Overall I think it all went well. One of the most important thing which I hope had reached the audience was, that the database development should be tightly integrated with the application development. Where exactly is the borderline, depends solely on the efficiency of the solution provided, not on any presumptions of “how it should be done”. And in order to achieve it, we should be able to speak each other’s language.

A database developer should be able to read the application code, which invokes the function (s)he wrote, and to understand, how it will be used. Sometimes we, the database developers, really do not realize, what exactly causes the problems, and we think that the application developers are just difficult and do not understand there benefits. On the other hand, the application developers tend to react nervously, when “the parts of business logic” are disappearing in “this long SQL code”, while they are absolutely capable to read this code and understand what’s going on.

So, one of my biggest goals for this year is to make our development process really integrated. And really flexible. In all possible meanings 🙂

Leave a comment

Filed under Companies, Data management, Systems, talks