Monthly Archives: January 2016

Getting ready for ICDE 2016

This Monday is a deadline for the camera-ready paper versions, and I am so glad we were done more than a week ago! That was the most complex submission in my life (granted, I didn’t have a lot of them :)), and the instructions were somewhat confusing.

A couple of days ago a list of accepted Industrial papers had finally appeared on the conference website, so now we can proudly show everybody at work, that “we are there”!

And you know what? It feels really good to see ourselves in the company of Google, Oracle and Teradata!

We still have lots of work to do; I never had to prepare these huge posters, so we need to decide what we are going to put on them; and our idea of all three of us presenting will require a lot of rehearsing… but it is so-so-so exciting! I still can’t believe we did it 🙂

Leave a comment

Filed under events, talks, Uncategorized

Why it wasn’t using this index? Wait, why it WAS using it?…

Postgres does not have any optimizer hints. Which is a curse and a blessing. A blessing – because thanks to that the PG contributors developed one of the best optimizers ever existed. A curse… well, what you do if you know for sure you know the right way, and this damn database just does not get it????

A couple of weeks ago I’ve received an email from one of my co-workers: can you tell my why this index is not used in this query?

The table was something like this:


CREATE TABLE lines(
line_id INTEGER,
class_id INTEGER,
line_type_id INTEGER,
line TEXT,
line_status_id INTEGER,
created_at TIMESTAMPTZ);

Indexes:
lines_pkey PRIMARY KEY (id)
index_lines_on_class_id_and_line_id” (class_id, line_id) WHERE status_id = 1

The query he was running was the following:

SELECT * FROM lines WHERE status_id = 1 ORDER BY line_id ASC LIMIT 1;

First I said: yea, you’re right! There is no way this index can be used for this query. Look – the first field in this index is class_id, so it is first ordered by class_id, and only then – by line_id, so… and actually, before I even said so, my other co-worker suggested to create a reverse order index, which was used by the optimizer.
– So, does it answer your question?- I’ve continued.
– Not exactly, – my co-worker said: – see, if I truncate this table (which happens periodically) and run ANALYZE on an empty table, then for a while the index will be used! But after next scheduled analyze it would stop, and it would use the PK index instead.

Did you guess, what’s the problem with using a PK index in this case? You’re right, the status_id=1 happens extremely infrequently! I’ve asked my co-worker, whether he actually needed to have a class_id in this index, and turned out, that there were other queries, which would be supported by this index, so he was hoping he could use it for both cases, and was wondering, whether he needs two different indexes with different order of columns.

I suggested, that since the class_id was very low-selective (less than ten distinct values), he might be better off completely omitting it, and turned out it was the best solution. The partial index


index_lines_on_line_id" (line_id) WHERE status_id = 1

worked for both query types.

So – problem solved? Yes, but I am still wondering, why 1)this “wrong” index actually worked fast, if the optimizer was forced to use it 2)why it was not using in “on it’s own will”??? My guess is, that with a generally small number of records with status_id=1 the whole index was read into main memory, and then the search would become instantaneous…

… but why the optimizer would not figure it out???

Leave a comment

Filed under SQL

The database developers: who are they and what they have to do with “development”

Interesting question to ask in this blog, right? But recently I’ve being involved in lot’s of conversations regarding whether this profession even exists.

This is actually quite interesting, because the database development started to be a separate profession quite early in my career. Perhaps in the mid 1990s there was still no clear distinction between the database developer and the DBA, but 2000 job search clearly indicated that these two are distinct (although I’ve applied for both). In a couple of years the “Data Architect” became a new thing, and I’ve started to apply for all three (or rather by that time it was more like jobs chasing me, not the other way around).

What was never a question since 1980s, that “programmers” and “database people whatever you call them” are two separate qualifications (although some people can do both).

There is a common belief, that “small companies” like to hire people who can do both, but I remember how I was hired first time to work in the US. The very small startup initially hired a friend of my friend, who could do it all, but after just several months they realized, that they need to hire somebody who specifically knows what are the databases and what they are for.  And I think that this idea goes along with the broader tendency of professionals becoming more and more specialized, and with the general trend (yes, I know, there are always exceptions) that people who specialize in a certain area become more skillful in this specific area (yea, I know it’s not always the case, I can think about many counter – examples right now, when I type this:)).

All this might seem  contrary to what I am doing these days and what I am advocating for. I am a part of the Software Engineering Department, and I believe firmly (more than ever before:)) that my place is there. But I distinguish clearly between actually doing it all and  understanding what other parties are doing (and yes, sometimes you can do without understanding :)).

The main reason why I wanted to write on this subject one more time, however, is not work-related. I was in process of writing a blog post for my company blog about why the fact that our paper was accepted for ICDE 2016 is so-so-so cool. And it’s not only about the fact, that this conference is so selective, and that that’s a great honor and just amazing; but also because it is clear from the reviews, that it is an important step in recognizing that what we are doing is indeed optimization.

To learn, how non-obvious is it, just ask Karthik Ramachandra, how many times his and his team’s work was not recognized because “it’s not about the databases”. Our work is even more on the application side, than Dbridge project, but that’s where the coolest and the most powerful optimization happens. That’s were we can really improve the user’s experience.

When I’ve received a notification that our paper is accepted, first I was just happy and thought about it mostly as my personal achievement, but now I think that recognizing the direction we are working in is even more important – we can write more papers on that subject :). So although I never submitted a workshop proposal which I though about two years ago, I still think I did my share.

Back to where I’ve started. Just look at our paper: people have been asking me who is the main author, but we do not really have one! (well, I did most of nagging and planning, but not of the actual writing). It’s definitely app-plus-db paper, but all three of us have our own piece of work. Actually we are contemplating to make our presentation this way – so that all three of us could present their parts.

 

 

Leave a comment

Filed under events, Systems