Tag Archives: databases

Why I like so much what I am doing

Many years ago, when I was about to graduate from the University, my not-then-husband asked me what I wanted to do with my professional life: to write “smart” papers about how-everything-should-be, or to do something real? Because it was quite obvious which answer he expected at that time, I answered: of course, the latter!

But speaking seriously, that has been my goal throughout my professional life. Yes, I do write the “smart” papers about how things should work, but all these discoveries are of little interest to me until I can put them to practical use, and until I can prove that what I think is right actually changes things for the better.

I like to say that “a database is a service”. There is nothing in the world of information technology more remote from the end user than the database internals. Our work manifests itself in a not-so-straightforward way. And when the absolutely theoretical approaches which I’ve developed actually work in the best possible way – there is nothing more exciting.

The system I am building right now is more than just an app: it is a whole system, which includes interaction between different online services and the data warehouse(s). In it I am implementing all the ideas which have been important to me for most of my professional career.

I am using the bitemporal model I have talked so much about over the past two or three years, and it is fascinating to see that the things I was hoping would work and have some value to the business actually produce value!

I work with application developers to bypass the ORM, and to use the output of database functions for the most efficient communication with the data storage. I have done this many times before, but never before have I experienced this level of cooperation.

I am using foreign data wrappers extensively, and literally eliminating the gap between the application databases and the data mart.

Everything I wanted to accomplish in different periods of my professional life is coming together, and I can see that the results are coming out really … how I wanted them to be :). And I can’t allow it to be different.

Filed under Data management

Postgres conference in Chicago Nov 9 2017

It is my pleasure to advertise an event which will be happening in Chicago in November: 2Q PGCONF 2017.

This conference is organized by 2ndQuadrant, and it will be held in two locations: New York on Nov 7-8 and Chicago on Nov 9. Participants can register and/or submit their talks for each location separately.

If you ask me, I think that a one-day conference is a great thing. It’s much more doable than a several-day conference, and your manager is way more likely to agree to you being away from work for one day than for several :). This being said…

– Please consider participation (please register on the web site)

– Please consider submitting a talk (the deadline is Aug 22!)

– Please help us to find sponsors!

Filed under events, news

Please join us this Thursday for a very special meetup!

Attention Chicago Postgres users, developers, DBAs and everybody who knows what the word “Postgres” means! In the unlikely event you have not heard about it already – this day is coming! Bruce Momjian will be our guest at the July meeting of the Chicago PostgreSQL User Group, and I do not think I need to say anything else! We are just excited that this is finally happening!

Please RSVP if you are planning to come and haven’t RSVP’d yet – we have a new person at the building reception, and I need to give her a guest list! Also, just for this meetup we will extend the time till 9 PM, so that everybody can enjoy the conversation.

Hope to see you there!

Filed under events, People, talks

Revisiting the old code (or not)

This blog is becoming more a collection of mistakes I’ve made than anything else, but I believe learning from other people’s mistakes is important. So each time I do something not-so-smart, I share it here.

I was not sure what to call this post, and I am still not sure the name reflects the contents, so let me proceed to the story :). It has been over a year since I started to rewrite the pieces of an old system, one by one. Granted, in the very beginning I didn’t know the data so well, so after a year in production I could rewrite most of them much better.

But what is more important is that the data itself has changed as well. One of the important changes: a year ago we were using two external service providers for loan processing, and for several months now we have not been using one of them (except for servicing the old loans). But it turned out that I had a step in my code (which, BTW, had to be executed every two hours!) which would try to fill in the ID from this old system, which we are not using anymore, for all records which do not have this ID assigned! Which means (since we do not use this system) that every two hours I was scanning all records – for nothing!

After I commented out this loop, the execution time for the whole process became pretty much invisible.

… now – how should I title this post?!

Filed under Data management, Development and testing

On importance of automation: I am migrating my data again

Moving my Data Warehouse to a separate cluster was a big and exhausting project. However, it looks like it did not teach me anything: now that I’ve started to build a staging environment, I’ve realized that almost nothing was automated. By “automated” I mean that you should be able to run a set of scripts on a clean database and all objects should be created.

I always had the best intentions to build my data warehouse that way, but life would always get in my way in the form of urgent business requests and things which should have been done yesterday, combined with “I will clean it up tomorrow”. Now that I am building “the same” environment for the third time in a row, I’ve decided that I will spend extra time on cleaning up all the creation scripts and making them re-runnable, no matter how much time it takes.
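
To make “re-runnable” concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for a clean database, and the table, index and view names are made up for illustration (they are not from my warehouse). The idea is simply that every statement tolerates the object already being there; PostgreSQL has similar IF NOT EXISTS and IF EXISTS forms.

```python
import sqlite3

# Hypothetical creation script: every statement is written so that it can be
# run again on a database which already contains the objects.
CREATION_SCRIPT = """
CREATE TABLE IF NOT EXISTS loans (
    loan_id INTEGER PRIMARY KEY,
    amount  NUMERIC NOT NULL
);
CREATE INDEX IF NOT EXISTS loans_amount_idx ON loans (amount);
DROP VIEW IF EXISTS big_loans;
CREATE VIEW big_loans AS
    SELECT loan_id, amount FROM loans WHERE amount > 10000;
"""


def build(conn):
    """Run the creation script and return the names of all existing objects."""
    conn.executescript(CREATION_SCRIPT)
    rows = conn.execute("SELECT name FROM sqlite_master ORDER BY name").fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")  # stands in for a clean database
first = build(conn)
second = build(conn)  # the second run must neither fail nor duplicate anything
```

If the second run raises an error or produces a different list of objects, the script is not re-runnable yet.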

Well, it takes tons of time! But now nobody but myself forces me to do things that way, and now I fully and genuinely understand how important it is! So it may take me another 2 weeks to finish building the staging environment, but at the end I will get not only an environment, but a process in place as well. Which will make me very proud, even if nobody but me will know 🙂

Filed under Development and testing

From theory to practice

For the past several months I have been implementing the bitemporal framework on real-life objects, not on lab mice :). And this process has been quite a revelation!

I wrote the functions for the basic bitemporal operations almost two years ago, and talked about them at several conferences and workshops. I could not imagine that something could go wrong with them – and yet it did. And that’s exactly what happens when all your test cases are cloned lab mice!

One of the first errors I got was an empty assertion interval, and that’s when I realized that we never discussed the relationship between transactions and bitemporal operations. Well, a transaction is a transaction, isn’t it? Nobody is supposed to see what’s inside until the transaction is finished – committed or rolled back. So… if there are several modifications (say INSERT, UPDATE and CORRECT for the same logical record) within one transaction… what are we supposed to see when the transaction is committed? Just an INSERT, if the first operation was INSERT? But this “won’t be true”!
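
Here is a toy sketch in Python of one plausible way an empty assertion interval can appear. It is not the code of the bitemporal functions themselves; the class and function names are invented for illustration. The only fact it relies on is that in PostgreSQL now() returns the start time of the current transaction, so every operation inside one transaction sees the same timestamp.

```python
from datetime import datetime, timezone

INFINITY = datetime.max.replace(tzinfo=timezone.utc)


class Transaction:
    """Toy transaction: like now() in PostgreSQL, its clock is frozen at start."""

    def __init__(self):
        self.started_at = datetime.now(timezone.utc)

    def now(self):
        return self.started_at


def is_empty(interval):
    """A half-open interval [start, end) is empty when start >= end."""
    start, end = interval
    return start >= end


def insert(tx):
    # A new row is asserted from the transaction time onwards.
    return (tx.now(), INFINITY)


def update(tx, asserted):
    # Close the assertion interval of the old row and open one for the new row.
    closed = (asserted[0], tx.now())
    opened = (tx.now(), INFINITY)
    return closed, opened


tx = Transaction()
row = insert(tx)
closed, opened = update(tx, row)  # same transaction, same timestamp
```

Because INSERT and UPDATE share the timestamp, the interval which the UPDATE closes starts and ends at the same instant, which is exactly an empty interval.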

Yes, but on the other hand, imagine what would happen if we recorded the “interim” state, and then later wanted to run a query “as asserted” at some time in the past, and at that exact moment some transactions were still uncommitted? Then we would get results in an inconsistent state. As of now I haven’t decided how I want these situations to be handled. I am almost convinced that I want to give the user an option: if you want to be “anti-transactional”, you can :)). But then you’ll need to accept the consequences.

Another set of problems is rather philosophical: do we believe in reincarnation? 🙂 More precisely, if an object is “bitemporally deleted”, and then a new object with the same business key value is created, is this “the same object” or a “new object”? Both ways can be supported, but I think that by default we should assume a “formal approach”, and say that this is “the same” object. And if the real world (i.e. the business rules) is such that the new object is a different object… well, that means that something else should be included in the business key. For example, if the SSN is reused, then we need an extra piece of information, like the person’s date of birth.
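
The SSN example can be sketched in a few lines of Python. The PersonKey class and the sample values are hypothetical, made up only to show the idea of widening the business key.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PersonKey:
    """Hypothetical business key: the SSN alone is not enough if SSNs are reused."""

    ssn: str
    date_of_birth: date


def same_object(a, b):
    # Under the "formal approach", equal business keys mean the same object,
    # even if the earlier record was bitemporally deleted in between.
    return a == b


old = PersonKey("000-00-0000", date(1950, 1, 1))
reused = PersonKey("000-00-0000", date(1990, 6, 15))  # same SSN, another person
reborn = PersonKey("000-00-0000", date(1950, 1, 1))   # same key created again
```

With the date of birth in the key, the reused SSN gives a new object, while re-creating exactly the same key is still treated as “the same” object.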

Related questions: can we update a deleted (inactive) record? What are the differences between UPDATE and CORRECTION if the date ranges are “equal”? I can only imagine how many issues like these are just waiting to be discovered!

Filed under Data management, Development and testing, research, SQL

Dos and Don’ts of the Data Warehouse

In the past couple of months the number of employees in our company has grown significantly. And guess what: almost all of the new employees need access to the Data Warehouse!

While we were very small, I used to have time to explain to each new person how our Data Warehouse is organized, how it is populated, how the data is refreshed, and what you should and should not do. But recently I could barely memorize the names of the new employees! And when I overheard one of my experienced co-workers asking one of the new co-workers: do you know how to join tables?… I realized I owed them some education.

So, last Thursday I gave a presentation about our data warehouse, and it was a big success – for many folks it was the first time they realized “how this thing works”. But undoubtedly the most popular slide was the last one: what not to do with your database.

Since I think those statements are largely universal, I am going to paste the contents of the last slide here.

  • Although you can’t write anything to the Data Warehouse, there are plenty of ways to crash the system, so use caution.
  • Please use the copies of the core tables for exploration purposes only; do not run big queries on them.
  • Please kill any query which runs over 1 min and ask somebody from the IT database group for assistance.
  • Do not use temporary tables.
  • Do not create objects in the public schema.
  • Before creating a new report or requesting one, please check what’s already available. The views and materialized views in our Data Warehouse are well documented.

A couple of comments:
1. “Over 1 minute” is a surprisingly good estimate. Granted, our Data Warehouse is relatively small now, but most of the time when something is running over 1 min, it indicates that either the join criteria are not specified correctly, or one or more conditions have very low selectivity, or there is an index missing. In all of those cases an IT person should take a closer look.

2. Why avoid using temporary tables? Because they occupy the same space on disk which is used to allocate the intermediate result sets, and at the end of the day they slow things down due to extra IO.

3. Why not create objects in the public schema? Well, because it’s public! Because anybody can create tables in the public schema! And everybody creates tables owned by them, which other people can’t access. The public schema should only hold the publicly used functions and such.
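
The one-minute rule from the first comment is easy to express as a check. The sketch below is plain Python with made-up sample data; in PostgreSQL the same information could be read from the pg_stat_activity view (its pid and query_start columns), and a query can be cancelled with pg_cancel_backend(pid). The function name and the sample rows are assumptions for illustration only.

```python
from datetime import datetime, timedelta

LIMIT = timedelta(minutes=1)


def over_limit(queries, now, limit=LIMIT):
    """Return the ids of the queries which have been running longer than `limit`.

    `queries` is a list of (query_id, started_at) pairs.
    """
    return [qid for qid, started_at in queries if now - started_at > limit]


now = datetime(2017, 7, 1, 12, 0, 0)
running = [
    (101, now - timedelta(seconds=5)),   # just started
    (102, now - timedelta(seconds=61)),  # slightly over the limit
    (103, now - timedelta(minutes=10)),  # most likely a bad join
]
suspects = over_limit(running, now)
```

Everything which ends up in suspects is a candidate for a closer look by somebody from the IT database group.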

I think the rest is self-explanatory!

Filed under Data management, SQL, talks