Monthly Archives: April 2014

Please contribute to my proposal

In case I missed somebody who is interested, I am copying here an email I sent earlier today:

Dear All,

Over the course of the past 7-8 months I have spoken to many researchers and practitioners whose work is related to restructuring application code for performance improvement rather than to optimizing specific queries. Most of you agreed that these activities are hard to place, because they sit somewhere between the database and the application.

I am going to try to organize a workshop with whatever international database conference will accept this proposal (EDBT 2015 is one of the candidates). Attached is the first draft of the proposal, without the conference-specific details (deadlines and the like).

Please let me know:

– would you, your research group, or your organization be interested in participating?
– would you be interested in submitting a paper or demo?
– do you have any additions, suggestions, or improvements to the proposal draft?
– do you know any other individuals, groups, or organizations who may be interested?
– do you want to be on the mailing list for this prospective workshop?
– do you have any ideas or candidates for invited talk(s)?

Thank you! Henrietta

First International Workshop on Holistic Database Application Tuning (HDAT)

Filed under Data management, talks

More on ICDE 2014

A couple of things made this conference so important for me.

As I’ve already mentioned, I was not registered to participate. In the past, in situations like this, I was always shy, kept a low profile, and got embarrassed when people asked me what my name was and where I was from. I would not even think about participating in any discussions.

This time it was very different. I had no problem telling people that a $900 conference fee is outrageous, that my company had just paid all my EDBT-related expenses, and that since I was only a couple of blocks away from the conference venue, I decided I could just walk in…

I went to almost all the industrial sessions, and I actually had things to say about many of the presentations; later, people would come to talk to me during the coffee breaks. I realized that all my problems were in my head, not in my badge or the fact that I was missing one, and that I have grown to a whole new professional level.

I liked the panel discussion about in-memory databases quite a bit, and a number of people agreed with me that hitting the cache many, many times is in no way better than hitting the disk :).

But the most exciting thing that happened to me was meeting Karthik Ramachandra from IIT Bombay in person. I found his group’s work while I was collecting information on “related work” for my own paper. To be honest, I didn’t just randomly find their work; I had some insider information, but I am not supposed to disclose it :). In any case, these guys are doing lots of cool things in a manner very similar to what I am trying to do, but staying within SQL user-defined functions. They have some automation in place, and that’s what I am desperately trying to achieve. We shared our experiences of trying to get our respective work recognized, and realized that the CS community at large finds it difficult to place this kind of work, and everybody tends to say “this is not ours”.

That being said, I am now even more determined to submit a workshop proposal on “this thing”. And since I finally have to name “this thing”, I am going to adopt a term used by my new friends in Bombay and call “it” Holistic Database Application Tuning (or HDAT). So, we know who is HD, but we still have to find out who is “AT” :).

Stay tuned! 🙂

Filed under People, talks

A strange bug (or not?) in Postgres optimizer

Actually, I do believe it’s a bug, but some people commented to me that this is not a bug but a way to optimize queries. So, see for yourself.

Here is what I need to do: select all personal information (name, e-mails, all phones) for the person (or people) whose phone number starts with a certain sequence of digits. Unlike in the previous case, the phone numbers are stored in a separate table, “phones”, where there may be multiple active phones for one customer (the phone type and the validity period are indicated). So, once again, once we have found a customer who has at least one phone satisfying the search criterion, we select all phone numbers for him or her, matching and not matching.

The SQL in question is generated by a Postgres function, and originally I was planning to generate something like this:

SELECT
FROM people p
INNER JOIN accounts a on p.person_id=a.person_id
...
INNER JOIN phones ph ON a.account_id=ph.account_id AND ph.end_date is null
INNER JOIN emails eh ON a.account_id=eh.account_id AND eh.end_date is null
INNER JOIN phones ph_s ON a.account_id=ph_s.account_id AND ph_s.end_date is null
AND ph_s.phone_number like '84732%'

In this case the last joined table will be different depending on the search condition. I was absolutely sure that the order of joins does not make any difference to the optimizer (that’s what I was always told!), and that the Postgres optimizer would be smart enough to figure out that the last join should be applied first. But – NO! It was executing for several seconds, and when I looked at the execution plan, I saw that the joins were performed in the listed order!

And when I changed the generated SQL to be like this:

SELECT
FROM people p
INNER JOIN accounts a on p.person_id=a.person_id
INNER JOIN phones ph_s ON a.account_id=ph_s.account_id AND ph_s.end_date is null
AND ph_s.phone_number like '84732%'
....

INNER JOIN phones ph ON a.account_id=ph.account_id AND ph.end_date is null
INNER JOIN emails eh ON a.account_id=eh.account_id AND eh.end_date is null

it worked just fine, with a total execution time of 30-40 ms!
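
One setting worth checking before calling this a bug is join_collapse_limit. Postgres only reorders explicit JOINs freely when the resulting list of FROM items is no longer than this limit (8 by default); above it, the planner keeps at least part of the written join order. The sketch below assumes that the elided part of the query pushes the join count past that limit, which the post does not confirm. The select list (p.person_id) and the value 16 are placeholders chosen for illustration.

```sql
-- Show the current limit; the Postgres default is 8.
SHOW join_collapse_limit;

-- Raise the limit for this session only (16 is an arbitrary illustrative value).
SET join_collapse_limit = 16;

-- Re-check the plan for the originally generated join order.
-- p.person_id is a placeholder select list; the joins elided in the post
-- would sit between accounts and phones.
EXPLAIN
SELECT p.person_id
FROM people p
INNER JOIN accounts a ON p.person_id = a.person_id
INNER JOIN phones ph ON a.account_id = ph.account_id AND ph.end_date IS NULL
INNER JOIN emails eh ON a.account_id = eh.account_id AND eh.end_date IS NULL
INNER JOIN phones ph_s ON a.account_id = ph_s.account_id AND ph_s.end_date IS NULL
  AND ph_s.phone_number LIKE '84732%';

-- Restore the session default afterwards.
RESET join_collapse_limit;
```

If the plan starts with the filtered phones join once the limit is raised, the listed-order behavior is the planner's documented search-space cutoff rather than a defect. The trade-off is longer planning time for queries with many joins.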

Do you think it’s a bug? I think so! And we should ask Mr. Haas about it!

Filed under SQL

ICDE Conference 2014 in Chicago

I didn’t even know this conference was going to happen in Chicago, let alone dream of attending it. Especially because my company had just paid for my participation in EDBT, I wouldn’t dare ask them to pay another $900.

But there was good news. My husband was informed that he had a perfect opportunity to present Saint Petersburg as a place to hold ICDE 2016. We learned about this the very day I was leaving on my previous trip, Feb 14, and this meant that I had to change my trip plans for March. It was crazy and stressful, but I realized what a perfect opportunity this presented. I mean, I was happy and proud for my city, and I was hoping our proposal would be accepted, and I also realized I could just walk in and listen to all the talks I wanted to.

That being said, I put my EDBT badge on in place of the ICDE badge I didn’t have, and… started to explore. The first wonderful thing that happened was that I met a couple of people who had known me since the first ADBIS conferences in Moscow in 1994-1996. I was so happy that people actually remembered me from my “previous life”, and I felt at home right away.

I also renewed some acquaintances I had made over the last couple of years, since I started going to conferences again. Also, in contrast to EDBT, people here were way more social: they knew when to put their phones away and engage in real-life conversations. I think this just shows the maturity of the crowd 🙂

Both keynotes at ICDE were really awesome, and I especially liked the first one, by Anastasia Ailamaki. It always makes a huge difference when the person delivering a talk works in real industry or in real scientific research (not CS research, I mean). I know that some people were wondering “what is so special about what she’s doing” and why she is doing all these “non-pure-db-things”. But she actually shows how to be in touch with reality and how to deliver something valuable to the end users… which is not always the case.

Filed under People, talks

I can make some stupid mistakes, too!

I have been looking at this SQL for 40 minutes, not understanding why Postgres is saying “schema eh_s does not exist”…

inner join email_address_history eh_s
on a.account_id=eh_s.account_id and eh_s.end_date is null
and eh_s.lower(email_address) like 'kriso%'
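
For reference, Postgres reads eh_s.lower(...) as a call to a function named lower in a schema named eh_s, which is where the error message comes from. A corrected version qualifies the column instead of the function. It is sketched here under the assumption that the surrounding query selects from an accounts table aliased a; the select list and FROM clause are placeholders, and only the join itself is from the snippet above.

```sql
-- Placeholder select list and FROM clause; only the join comes from the post.
SELECT a.account_id
FROM accounts a
INNER JOIN email_address_history eh_s
   ON a.account_id = eh_s.account_id
  AND eh_s.end_date IS NULL
  -- The alias belongs on the column, inside lower().
  AND lower(eh_s.email_address) LIKE 'kriso%';
```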

Filed under SQL

I am going to be in Austin in May

I will be in Austin on May 12-13 and will be speaking at the Austin PUG on May 13, in case this is closer for anybody than Chicago 🙂

Filed under Uncategorized

Why this SELECT is bad, and how you can tell without actually running it

So, about this SELECT, which I have been discussing with many of my friends and coworkers over the past week and promised to write a blog post about. Yeah, about this ugly SELECT.

First, what is the purpose of this SELECT? When a call from a customer comes through, the system should be able to find this customer (if this is a customer who’s already in the system, of course) and display all the customer information right away. This is a common task that has been successfully solved in many call centers across the world.

However, in this particular case there is one problem: the customer data is not sanitized. Although in most cases the phone is stored in a format without any dashes or spaces, there is a limited number of cases (less than 1% of all phones) where the stored number is not in the expected form, namely, there is no country code in front.

The SELECT statement that was put in place to search for a phone was the following:

SELECT c.id, c.brand_id, p.first_name, p.last_name, p.country_cd
FROM cnu.customers c INNER JOIN cnu.people p
ON c.person_id = p.id
AND ( (POSITION('5027677730' IN p.home_phone) > 0)
OR (POSITION('5027677730' IN p.mobile_phone) > 0)
OR (POSITION('5027677730' IN p.work_phone) > 0) )

So, what’s the problem with this statement?

Continue reading

Filed under Data management, SQL