PG Open 2017

It won’t happen until September, but I wanted to give all my friends advance notice that I will be there! My talk was accepted, and I’ve confirmed that I will be able to present, so it’s all official.

Please visit the PG Open website, and if you are in the area and want to meet – I am definitely staying for Saturday after the conference!

Filed under events, talks

On the importance of automation: I am migrating my data again

Moving my Data Warehouse to a separate cluster was a big and exhausting project. However, it looks like it did not teach me anything: now that I’ve started to build a staging environment, I’ve realized that almost nothing was automated. By “automated” I mean that you should be able to run a set of scripts on a clean database and have all objects created.

I always had the best intentions to build my data warehouse that way, but life would always get in my way in the form of urgent business requests and things that should have been done yesterday, combined with “I will clean it up tomorrow”. Now that I am building “the same” environment for the third time in a row, I’ve decided that I will spend extra time cleaning up all the creation scripts and making them re-runnable, no matter how much time it takes.

Well, it takes tons of time! But now nobody but myself forces me to do things that way, and now I fully and genuinely understand how important it is! So it may take me another 2 weeks to finish building the staging environment, but at the end I will get not only an environment, but a process in place as well. Which will make me very proud, even if nobody but me will know 🙂
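
To make “re-runnable” concrete, here is a minimal sketch, not my actual scripts. It uses Python’s built-in sqlite3 module as a stand-in for a clean database so that it runs anywhere, and the object names (customer, customer_name_idx, customer_names) are made up for illustration. The point is only that every statement can be executed a second time without failing and without changing the result; PostgreSQL accepts similar IF NOT EXISTS and IF EXISTS clauses.

```python
import sqlite3

# Hypothetical creation script: every statement is safe to run repeatedly.
CREATION_SCRIPT = """
CREATE TABLE IF NOT EXISTS customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS customer_name_idx ON customer (name);
DROP VIEW IF EXISTS customer_names;
CREATE VIEW customer_names AS SELECT name FROM customer;
"""


def build(conn):
    """Run the whole creation script against the given connection."""
    conn.executescript(CREATION_SCRIPT)


def list_objects(conn):
    """Return (type, name) for every object the script created."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type, name"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # a clean, empty database
build(conn)
first = list_objects(conn)
build(conn)                         # second run must not fail
second = list_objects(conn)

print(first == second)
```

If the second run raises an error or leaves a different set of objects behind, the script is not re-runnable yet.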

Filed under Development and testing

From theory to practice

For the past several months I have been implementing the bitemporal framework on real-life objects, not on lab mice :). And this process was quite a revelation!

I wrote the functions for the basic bitemporal operations almost two years ago, and talked about them at several conferences and workshops. I could not imagine that anything could go wrong with them – and yet it did. And that’s exactly what happens when all your test cases are cloned lab mice!

One of the first errors I got was an empty assertion interval, and that’s when I realized that we never discussed the relationship between transactions and bitemporal operations. Well, a transaction is a transaction, isn’t it? Nobody is supposed to see what’s inside until the transaction is finished – committed or rolled back. So… if there are several modifications (say INSERT, UPDATE and CORRECT for the same logical record) within one transaction… what are we supposed to see when the transaction is committed? Just an INSERT, if the first operation was INSERT? But this “won’t be true”!

Yes, but on the other hand, imagine what would happen if we recorded the “interim” state, and then later wanted to run a query “as asserted” at some time in the past, and at that exact moment some transactions were still uncommitted. Then we would get results in an inconsistent state. As of now I haven’t figured out how I want these situations to be handled. I am almost convinced that I want to give the user an option: if you want to be “anti-transactional”, you can :)). But then you’ll need to accept the consequences.
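
Here is a small sketch of one plausible way an empty assertion interval can arise inside a single transaction. It rests on two assumptions that are not spelled out above: that assertion intervals are half-open, [start, end), and that every operation in a transaction is stamped with the same timestamp, the way PostgreSQL’s now() returns the transaction start time. The helper names are hypothetical and are not the framework’s real functions.

```python
from datetime import datetime, timedelta, timezone


def is_empty(start, end):
    """A half-open interval [start, end) contains nothing when start >= end."""
    return start >= end


def superseded_assertion(insert_ts, update_ts):
    """Assertion interval of the row written by INSERT once an UPDATE closes it.

    Hypothetical helper: the inserted row is asserted from insert_ts and
    stops being asserted at update_ts.
    """
    return (insert_ts, update_ts)


tx_start = datetime(2017, 6, 1, 12, 0, tzinfo=timezone.utc)

# INSERT and UPDATE in the same transaction share one timestamp.
same_tx = superseded_assertion(tx_start, tx_start)

# INSERT and UPDATE in two transactions five seconds apart.
separate_tx = superseded_assertion(tx_start, tx_start + timedelta(seconds=5))

print(is_empty(*same_tx))
print(is_empty(*separate_tx))
```

Under these assumptions the row written by the INSERT is never visible “as asserted” at any moment, which is exactly the question of what should remain after the commit.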

Another set of problems is rather philosophical: do we believe in reincarnation? 🙂 More precisely, if an object is “bitemporally deleted”, and then a new object with the same business key value is created, is this “the same object” or a “new object”? Both ways can be supported, but I think that by default we should take a “formal approach” and say that this is “the same” object. And if the real world (i.e. business rules) is such that the new object is a different object… well, that means that something else should be included in the business key. For example, if the SSN is reused, then we need an extra piece of information, like the person’s date of birth.

Related questions: can we update a deleted (inactive) record? What are the differences between UPDATE and CORRECTION if the date ranges are “equal”? I can only imagine how many issues like this are just waiting to be discovered!

Filed under Data management, Development and testing, research, SQL

Not exactly my Alma Mater but still!

The champions are from my hometown, and my actual Alma Mater is among the winners!

ACM Bulletin Archives
May 25, 2017
Russian Team Takes World Champion Title in ACM ICPC Programming Contest

Three students from St. Petersburg University of IT, Mechanics and Optics (ITMO) earned the title of 2017 World Champions in the ACM International Collegiate Programming Contest (ICPC). Teams from University of Warsaw, Seoul National University and St. Petersburg State University finished the competition in second, third and fourth places and were recognized with gold medals in the prestigious competition, which ended today in Rapid City, South Dakota.

ACM-ICPC is the premier global programming competition conducted by and for the world’s universities. It is conceived, operated and shepherded by ACM, sponsored by IBM, and headquartered at Baylor University. For more than four decades, the competition has raised the aspirations and performance of generations of the world’s problem solvers in computing sciences and engineering.

At ICPC, teams of three students tackle eight or more complex, real-world problems. The students are given a problem statement, and must create a solution within a looming five-hour time limit. The team that solves the most problems in the fewest attempts in the least cumulative time is declared the winner, with the top 12 teams receiving medals.

ICPC Regional participation included 46,381 students and faculty in computing disciplines from 2,948 universities in 103 countries on six continents. A record 50,145 students and 5,073 coaches competed in ICPC and ICPC-assisted competitions this year.

“As computing increasingly becomes part of the daily routines of a growing percentage of the global population, the solution to many of tomorrow’s challenges will be written with computing code,” said ACM President Vicki L. Hanson. “The ICPC serves as a unique forum for tomorrow’s computing professionals to showcase their skills, learn new proficiencies and to work together to solve many real-world problems. This international event fosters the innovative spirit that continues to transform our world.”

Full results of the competition are available here.

Filed under events, news

How women’s networking should NOT be organized

There was one small episode during ICDE 2017, and although it has been a month already, I still feel like I want to write about it. Here is what happened.

Among the booths of different vendors there was (as usual) Amazon AWS. One of their reps told me that on Thursday they were going to have a “women event”, and asked whether I wanted to sign up and could just leave my email with them. I told her: well, there is a conference banquet on Thursday, at what time precisely is your event going to be? And she said reassuringly: after the banquet!

Now, the banquet was to start at 6PM, and on Wednesday evening I received the following email:

Hi Hettie,
I wanted to reach out on behalf of AWS and invite you to attend the AWS Women in Engineering Networking Event tomorrow on Thursday, April 20. Our recruitment and engineering teams are coming down from Seattle for the ICDE Conference and we’d love to meet you in-person at our happy hour at Blue Door Winery in San Diego (around 3 miles from the conference venue).
There will be wine tasting, artisanal bites, and a raffle on-site. Please feel free to bring guests, the more the merrier!

I clicked on the invite, and guess what start time it showed? Yes, you are right – 6PM.

Let me tell you this. The banquet is the most important social event at any conference, and I always make a point of telling the younger generation how important it is to attend the conference banquet. There you can be introduced, or just introduce yourself, to anybody; you can talk at length with the authors of the papers which were most interesting to you. People are just more relaxed and are not running off to the next session. And if somebody organizes a “women networking event” at the same time – how should this be perceived? Like a “kids’ table”?! How much would this kind of networking be worth? And if the event organizers didn’t bother to look at the conference program when scheduling this event, it’s even worse…

Fortunately, at least at first glance, there were not that many women who would trade the banquet for this networking event 🙂

Filed under events

Dos and Don’ts of the Data Warehouse

In the past couple of months the number of employees in our company has grown significantly. And guess what: almost all of the new employees need access to the Data Warehouse!

While we were very small, I used to be able (to have time) to explain to each new person how our Data Warehouse is organized, how it is populated, how data is refreshed, and what you should and should not do. But recently I could barely memorize the names of new employees! And when I overheard one of my experienced co-workers asking one of the new co-workers: do you know how to join tables?… I realized I owe them some education.

So, last Thursday I gave a presentation about our data warehouse, and it was a big success – for many folks it was the first time they realized “how this thing works”. But undoubtedly the most popular slide was the last one: what not to do with your database.

Since I think those statements are largely universal, I am going to paste here the contents of the last slide.

  • Although you can’t write anything to the Data Warehouse, there are plenty of ways to crash the system, so use caution.
  • Please use the copies of the core tables for exploration purposes only; do not run big queries on them.
  • Please kill any query which runs over 1 min and ask somebody from the IT database group for assistance.
  • Do not use temporary tables.
  • Do not create objects in the public schema.
  • Before creating a new report or requesting one, please check what’s already available. The views and materialized views in our Data Warehouse are well documented.

A couple of comments:
1. “Over 1 minute” is a surprisingly good estimate. Granted, our Data Warehouse is relatively small now, but most of the time when something is running over 1 min, it indicates that either the join criteria are not specified correctly, or one or more conditions have very low selectivity, or there is an index missing. In all of those cases an IT person should take a closer look.

2. Why avoid using temporary tables? Because they occupy the same space on disk which is used to allocate the intermediate result sets, and at the end of the day they slow things down due to extra I/O.

3. Why not create objects in the public schema? Well, because it’s public! Because anybody can create tables in the public schema! And everybody creates tables owned by them, which other people can’t access. The public schema should only hold the publicly used functions and such.

I think the rest is self-explanatory!

Filed under Data management, SQL, talks

Chicago PUG meetup with Joe Conway

Yesterday was a Day – a day when Joe Conway presented at Chicago PUG. He was talking about the PL/R extension for Postgres, which is really important for our data analysts.

We had a full house:

And everybody was listening to the great presentation:

Filed under events, People, SQL, talks