Monthly Archives: July 2016

How I learned to love tests: both using and writing ones – part 2

Even when I would reluctantly admit that I needed to have tests in place, I never understood why one might want to put a check for the number of tests you want to run in pg_tap. What’s the point? You know how many tests you want to run, so the only thing you need is to count the executions :). And when the number does not match, it means that you didn’t count them correctly ;)

That’s what I was absolutely sure about… until last week. A week before that, I discovered that I had mapped one foreign table incorrectly. Or maybe it got changed and I didn’t notice – I didn’t have proper tests!

Nevertheless, after I fixed the table structure… yes, you are right, a number of tests failed! And since I had added a whole bunch of newly mapped columns (21 of them, to be precise), I had to write 84 more tests… four for each column… and after I did it… pg_tap reported that I ran fewer tests than I had planned. My first inclination was to change the “number of tests I want to run”. And I almost did it… but then I thought: I remember I counted! If there are fewer than 84 new tests, then there are two options: either I counted them incorrectly, or I missed several tests.

It was not fun at all, going through this huge file with all the tests… but I found the missing ones! And I was so happy again that somebody forced me to run the tests each time I commit changes 🙂
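To make the point concrete, here is a minimal sketch in Python (not pg_tap itself) of why a declared plan catches a forgotten test. Everything in it is an assumption for illustration: the `PlannedSuite` class, the three column names, and the four checks per column are hypothetical stand-ins, not my real test file.

```python
class PlannedSuite:
    """Tiny TAP-style runner: declare the plan first, then count executed checks."""

    def __init__(self, planned):
        self.planned = planned
        self.executed = 0
        self.failed = 0

    def ok(self, condition, description):
        # Every call counts as one executed test, whether it passes or not.
        self.executed += 1
        if not condition:
            self.failed += 1
        return condition

    def finish(self):
        # Report a plan mismatch separately from ordinary failures.
        problems = []
        if self.executed != self.planned:
            problems.append(f"planned {self.planned} tests but ran {self.executed}")
        if self.failed:
            problems.append(f"{self.failed} test(s) failed")
        return problems


# Hypothetical mapped columns and the properties checked for each of them.
EXPECTED = {
    "loan_id": {"type": "integer", "nullable": False, "has_default": False},
    "amount": {"type": "numeric", "nullable": False, "has_default": True},
    "opened_on": {"type": "date", "nullable": True, "has_default": False},
}
CHECKS = ("exists", "type", "nullable", "has_default")


def run_column_checks(planned, actual, forgotten=()):
    suite = PlannedSuite(planned)
    for column, spec in EXPECTED.items():
        found = actual.get(column)
        for check in CHECKS:
            if (column, check) in forgotten:
                continue  # simulates a test that nobody wrote
            if check == "exists":
                passed = found is not None
            else:
                passed = found is not None and found[check] == spec[check]
            suite.ok(passed, f"{column}: {check}")
    return suite.finish()


planned = len(EXPECTED) * len(CHECKS)  # 3 columns x 4 checks = 12
complete = run_column_checks(planned, EXPECTED)
incomplete = run_column_checks(planned, EXPECTED, forgotten={("amount", "nullable")})
print(complete)
print(incomplete)
```

In the second run every test that exists passes, so without the plan nothing would look wrong. Only the count reveals that one test is missing, which is exactly the situation I was in.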




Filed under Development and testing, Team and teamwork

How I learned to love tests: both using and writing ones – part 1

My fellow database developers, let’s be honest: we do not like writing tests. We are not application developers. We do not understand test-driven development: at the end of the day, how can we figure out what the outcome of our functions and stored procedures should be when we do not know what data they will be working with?! When our managers tell us we should generate the test data, we think it’s the most ridiculous thing in the world, because if we create the data ourselves, obviously the results of the testing will be favorable!

I understand that not necessarily each and every database developer goes to that extreme, but… pretty close. And I will be the first to admit being guilty of a similar attitude. I only believed in testing on a “copy of the real data”, sort of A/B testing, which is important, but not the only thing to be tested.

Especially these days, when the data structure is not “almost always static”, when changes to the application DB are not a rare catastrophe but a part of the normal life of the application. At a minimum you want to have tests which show you that if you change “something” in the database structure, some other “something” won’t break. We need those tests. But… it is so boring to write them! It slows our development process sooooo much! Especially, you know, when you have this big project to complete, and every half an hour matters!

At least that’s what I was thinking for the past two months, working hard to bring our new Data Warehouse live. And promising myself I would write the tests… later :).

But my former co-worker, my forever-mentor, and current consultant for my company, Chad, had written tests for “his” part, the part of our system that is responsible for “talking” to the third-party databases. So… on the night of massive changes to the said third-party database, which were not properly communicated in advance, when some parts of my processes started to fail… and when I fixed the data structures to match the new ones… the tests started to fail!

I was not happy :). Not happy at all. I was thinking – why? Why do I have to sit and fix these tests at 11:30 PM?! But guess what. It took me only 45 minutes to fix every single test that was touched by the change and to validate the new data structures. And I was done before midnight. And guess how long it took other people to have their parts of the system updated and running with no issues? Almost the whole night, and almost the whole next day! You might laugh at the next statement, but here it is anyway: at that moment I felt very much protected by these tests.

And that’s what the tests are for, aren’t they?!


Filed under Development and testing, Systems, Team and teamwork

I just hate vertical partitioning!

I get that sometimes vertical partitioning makes sense. If you almost always need just a small subset of the table columns to be selected, then why make the database read the whole wide row? Or even worse – thousands and hundreds of thousands of wide rows!

But when/if you choose to vertically partition a table, it’s very important to remember that the new table is not just “another table”: it is closely related (in a 1-to-1 relationship) to the rest of the vertical partitions. Otherwise… some interesting things might happen…

Let’s say you have three tables:

  • loan_header
  • loan_contract_details
  • loan_dates

Each of them represents a part of the information related to a loan as an entity. This is a classic example of vertical partitioning. However, instead of all these tables having the same primary key, they were designed in such a way that each of them has its own PK, and the PK of loan_header is listed as a foreign key. Which means that the 1-to-1 constraint got completely lost!
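For comparison, here is a minimal sketch of the shared-key design, written in Python with the built-in sqlite3 module so that it runs anywhere. The column names `owner` and `rate` and the sample rows are assumptions made up for the example; the real tables are much wider and live in a different database.

```python
import sqlite3

# Autocommit mode, so the PRAGMA below takes effect immediately.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.execute("""
    CREATE TABLE loan_header (
        loan_id INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL
    )
""")
# The partition reuses loan_id as both its primary key and its foreign key,
# so it can hold at most one row per loan.
conn.execute("""
    CREATE TABLE loan_contract_details (
        loan_id INTEGER PRIMARY KEY REFERENCES loan_header (loan_id),
        rate    REAL NOT NULL
    )
""")

conn.execute("INSERT INTO loan_header VALUES (1, 'Alice')")
conn.execute("INSERT INTO loan_contract_details VALUES (1, 4.5)")


def try_insert(sql):
    """Return 'accepted' or 'rejected' depending on the constraints."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# A second details row for the same loan violates the primary key.
duplicate = try_insert("INSERT INTO loan_contract_details VALUES (1, 9.9)")
# A details row for a loan that does not exist violates the foreign key.
orphan = try_insert("INSERT INTO loan_contract_details VALUES (2, 3.0)")
print(duplicate, orphan)
```

Strictly speaking this guarantees at most one details row per loan, not exactly one, but it leaves only one possible join column, so there is no second id to join on by mistake.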

Now imagine what happens if, for a very long time, the key for loan_header is generated from one sequence and the key for loan_contract_details from another, yet the values stay identical, because in reality there is only one loan_contract_details record per loan. And for tons of reports you need to join these tables – by what? By loan_id? By their own id? It turned out that even when there was a clear indication of what the join criteria should be, many people who were writing reports were joining by the wrong id! And nobody ever noticed this wrongness! Moreover, since the ids were “very close”, it never happened that somebody would receive a report without results – they would receive reports with wrong results!

… and you can only imagine how long it would take to figure out this sequence of events in the middle of the night, when the only information you have is something like “some loans have wrong owners”!
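The failure is easy to reproduce on a toy scale. The sketch below uses Python's built-in sqlite3 module; the three loans, the owners, and the `amount` column are invented for the illustration, and the only thing taken from the real case is the shape of the schema (own PK plus loan_id as a foreign key).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan_header (
        id    INTEGER PRIMARY KEY,
        owner TEXT
    );
    CREATE TABLE loan_contract_details (
        id      INTEGER PRIMARY KEY,                   -- its own key
        loan_id INTEGER REFERENCES loan_header (id),   -- the real link
        amount  INTEGER
    );
    INSERT INTO loan_header VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    -- The two sequences drift: Carol's details were saved before Bob's.
    INSERT INTO loan_contract_details VALUES
        (1, 1, 1000), (2, 3, 3000), (3, 2, 2000);
""")

# Correct join: details are matched to the loan through loan_id.
right = conn.execute("""
    SELECT h.owner, d.amount
    FROM loan_header h
    JOIN loan_contract_details d ON d.loan_id = h.id
    ORDER BY h.id
""").fetchall()

# Wrong join: the two unrelated primary keys are compared with each other.
wrong = conn.execute("""
    SELECT h.owner, d.amount
    FROM loan_header h
    JOIN loan_contract_details d ON d.id = h.id
    ORDER BY h.id
""").fetchall()

print(right)
print(wrong)
```

Both queries return three rows, and the first row is even identical, so nothing looks broken. Only Bob's and Carol's amounts are swapped, which is the kind of report nobody questions until somebody notices that “some loans have wrong owners”.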


Filed under Data management, SQL

And the most interesting part – we are hiring!

For those of you who are following me (and my company) on LinkedIn, this is probably no news – we are hiring, and we are hiring for many different positions.

But this one is the one I am most interested in. So what’s between the lines: I need to have a peer on the application side. A senior person with whom I can discuss our development strategy and specific technical solutions, with whom I will be able to have productive discussions.

And yes, I know I want too much, but maybe… 🙂


Filed under Companies

PG Open 2016

While I was super busy at my new work, I forgot to publish one more very important update – our (mine and Chad’s) presentation about bitemporal data was accepted to PG Open!

The conference will be in Dallas, TX; here is the official website, and here is the program. And I have to mention that, for some reason unknown to me, the order of authors was changed from how I submitted it. And no, not to alphabetical order.

And yes, when Grant and I were presenting two years ago, it was the same story. So it’s definitely something in Texas 🙂


Filed under events, talks

What I was doing for the past month, and why I was so busy

I’ve just looked at my stats and realized that I haven’t been posting anything for an entire month (till yesterday :))! In my defense – I didn’t have a single day off for the past 3 weeks, so this weekend was the first time since the beginning of June when I started to get back to normal, back to life. More or less :)

I spent the whole month of June building a new Data Warehouse for my company. I had a pretty good idea of what I wanted to build, but the timing was really, really tight. I know – not only theoretically, but from my experience – that no matter how carefully you plan, there will be unplanned things which will slow you down, and that basically you need to plan for the unplanned 🙂 – but it’s always hard. Especially when you know exactly what you want to do, and you feel really frustrated when you are limited by the speed with which you can type, and by the time it takes to recompile a function or to execute a query.

That was the first time in my professional career that I had to build a whole system from scratch, and to build it exactly the way I wanted. Ironically (or maybe not!), it was also the first time I had to sacrifice “purity” to “what the business needs”. No, do not get me wrong: in the end the business needs things to be done right. But I always hated it when I was told: we need to build this thing fast now, and later we will optimize, make it right, etc. I used to always scream that “later” will never happen, and now I made a conscious decision myself to do exactly that! And I do not regret it, because now it’s me who has to make sure that “later” will actually happen.


Filed under Companies, People, Systems, Workplace

Once again about women in science

A friend of mine sent me this link almost a month ago, but it’s just now that I got to writing about this article in The Guardian.

I liked it a lot; the most important thing it stresses is that women are already doing science, so there may be less need at this point “to encourage girls to do science”. The statistics show that in many areas of science there are actually more women than men!

Then the question comes – why, in that case, are fewer women being published?

The New Scientist blames the “choice” to have a family. It points to a study in this month’s American Economic Review that shows women incurring earnings penalties in science if they have children. A recent House of Commons science and technology committee report goes into more detail, saying that scientific research careers are dominated by short-term contracts with poor job security – at the very time of life that women need to have children (if they want them). The female postdoctoral scientist faces difficult decisions while stuck on fixed-term contracts before tenure, with very little in the way of institutional support. Women should not have to choose between career and family, says the science magazine. But surely male scientists face similar choices?

Turns out – not. And what follows is something we have all known for a very long time. I remember how, many years ago, when I was a consultant at the City of Chicago, my single-mom-consultant co-worker used to say: I need a stay-at-home wife!

Not a husband mind you :). So, here is how the article goes:

Apparently not. European social science research shows that male and female scientists often have different types of partners: male scientists more frequently have a stay-at-home partner looking after the children, while female scientists are more likely to have another scientist as a spouse. So male scientists might not need family-friendly working practices to have a successful career but female scientists do. Hence the loss of women in the “leaky pipeline” of scientific careers. And that is to say nothing of the research that found scientists perceived job applicants to be less competent when they had female names.

Sad, but true.

You know what it made me think about? At ICDE and other conferences of a similar caliber, the organizers usually report the stats on submissions and accepted papers by country and region. Why not report by gender? Some of my friends have already asked me, looking at the pictures from the conference – why were there so few women?!

I understand that it’s not always easy to derive gender from a name, and I also understand that you can’t mandate that people submit their gender. But I was thinking that at least when you register for the conference, you might be asked to specify your gender (and you might “prefer not to answer”; there should always be this option).

Ideally, though, I would love to see stats on something like: how many women are among the authors, how many are the main authors, how many are registered for the conference, how many actually come, and who is presenting :)


Filed under publications and discussions

ICDE 2016: Around NoSQL

ICDE 2016 finished more than a month ago, and unfortunately I didn’t post even half of the things I wanted to post about the conference. In the next couple of days I will try to explain why it happened (although some of my friends already know). But there was one topic I really wanted to cover, so I decided to finish this blog post even six weeks later.

The NoSQL.

OK, I understand that it’s a buzzword these days. Moreover, I am even aware of a limited number of situations when using NoSQL makes sense. But it bugs me immensely when I hear something to the effect of “yeah, there’s not much you can do with a traditional RDBMS; in order to achieve high performance you need to switch to NoSQL”. These statements are related to “oh, you need to build a datamart – then you need to use Redshift! It’s specifically designed for datamarts!” Or – when I was trying out one third-party software product about a month ago to see whether it would help me consolidate data from different sources, I realized while observing the error messages that they were using Hadoop in between!

So – when I was listening to the presentation NoSE: Schema Design for NoSQL Applications from the University of Waterloo, I could not stop wondering… The presentation was about creating virtual schemas for the NoSQL framework, and at the end I asked a question. Or rather I made a statement :): I remember the times when relational databases were not the industry standard. I remember the times when you would try to predict which queries would be the most frequent and model your sets or your hierarchy accordingly. And I also vividly remember how horrible it was when it didn’t work right, and therefore I remember exactly why everybody embraced the relational model.

So my question is – why are we going back in time?

You know what was interesting? A number of participants of different ages and backgrounds approached me after this session and expressed their full support :))


Filed under Data management, talks