Category Archives: Development and testing

The best things from 2017 and what I am looking forward to in 2018

When I talk to people about the year 2017 and what was important, I repeat over and over: everything I was striving for during my 30+ years of professional career has happened. All my wishes have come true. Which inevitably leads to the question: well, Hettie, what are you going to do next, then?

And there are plenty of things I want to do, but first I wanted to reflect on 2017 one more time. I’ve already listed multiple times all our technical accomplishments, all the wonderful things I was able to implement in a real production environment. But when I look back at what was the best, it is definitely working with our new tech team. As I reiterate over and over again, “the database is a service,” and whatever we do inside the database can only make an impact if our results can be utilized by “somebody” – by our end users.

And most of the time our end users are application developers. I can’t imagine any serious database development without continuous interaction with the rest of the development team. For years my “gold standard” for teamwork has been my work for the New York Department of Education, where I had the full support of the tech leadership, which helped me defend my approaches and produce results I am still happy about.

But what has been going on in the past several months is even better. The way we discuss user stories. The way we decide what should go into the application and what into the database. The way we debug. The way we discuss what is critical and what is not. The willingness of each team member to go the extra mile, to adjust in order to make the others’ jobs easier. Basically, everything I could imagine for perfect teamwork is there. 🙂

Having been in this industry for 34 years, I know that nothing lasts forever :). I know that every team and every organization evolves, and that nothing is granted forever. But I have also learned to be thankful for what’s going on right now, and to enjoy the moment.

Looking forward to 2018 – I hope that our team will continue to be the most amazing team ever. I am looking forward to the stress test of our new applications with high data volumes. After all, I designed the data storage and data access to be super-scalable, and I hope that it will work this way.

I am looking forward to using the performance data we’ll obtain to improve our bitemporal library, and I already have some ideas of what exactly I want to do. Actually, I have new technical ideas on almost everything I’ve developed over the last year. I still haven’t received notifications on whether any of the papers I submitted for next year’s conferences were accepted, but I know that both are very good papers :), so even if neither gets accepted… I will try for another conference!



Filed under Development and testing, SQL, Systems, Team and teamwork

Revisiting the old code (or not)

This blog is becoming more a collection of mistakes I’ve made than anything else, but I believe learning from other people’s mistakes is important. So each time I do something not-so-smart, I share it here.

I was not sure what to call this post, and I am still not sure the name reflects the contents, so let me proceed to the story :). It has been over a year since I started to rewrite the pieces of an old system, one by one. And granted, in the very beginning I didn’t know the data so well, so after a year in production I could rewrite most of them much better.

But what is more important, the data itself has changed as well. One of the important changes was that a year ago we were using two external service providers for loan processing, and for several months now we have not been using one of them (except for servicing the old loans). But it turned out that I had a step in my code (which, BTW, had to be executed every two hours!) that would try to fill in the ID from this old system we are not using anymore – for all records that do not have this ID assigned! Which means (since we do not use this system) that every two hours I was scanning all records – for nothing!
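To illustrate, here is a hypothetical sketch of the offending step (not my actual code – the table, column, and helper names are all invented): loop over every record that has no ID from the old provider and try to fill it in. Once the provider was retired, nothing could ever fill these IDs in, so the same records got scanned every two hours, forever.

```sql
-- Hypothetical sketch of the wasteful step; all names are invented.
DO $$
DECLARE
    v_loan record;
BEGIN
    -- Scan every record that still lacks an ID from the retired provider...
    FOR v_loan IN SELECT loan_id FROM loans WHERE old_provider_id IS NULL
    LOOP
        -- ...and try to fill it in (lookup_old_provider_id is an invented
        -- helper; with the provider retired, it never finds anything).
        UPDATE loans
           SET old_provider_id = lookup_old_provider_id(v_loan.loan_id)
         WHERE loan_id = v_loan.loan_id;
    END LOOP;
END;
$$;
```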

After I commented out this loop, the execution time for the whole process became pretty much negligible.

… now – how should I title this post?!


Filed under Data management, Development and testing

That’s what happens when you have two copies of the same git repo on your laptop!

One of the most embarrassing things ever happened to me: for two weeks I was trying to figure out why one script did not work as expected… only to find out (yes, after two weeks!) that I had been modifying one copy and running another!!!


Filed under Development and testing

On importance of automation: I am migrating my data again

Moving my Data Warehouse to a separate cluster was a big and exhausting project. However, it looks like it did not teach me anything – now, when I’ve started to build a staging environment, I’ve realized that almost nothing was automated. By “automated” I mean that you should be able to run a set of scripts on a clean database and have all objects created.

I always had the best intentions to build my data warehouse that way, but life would always get in my way in the form of urgent business requests, things that should have been done yesterday, combined with “I will clean it up tomorrow.” Now, when I am building “the same” environment for the third time in a row, I’ve decided that I will spend extra time cleaning up all the creation scripts and making them re-runnable, no matter how much time it takes.

Well, it takes tons of time! But now nobody but myself forces me to do things that way, and now I fully and genuinely understand how important it is! So it may take me another two weeks to finish building the staging environment, but in the end I will not only get an environment, but a process in place as well. Which will make me very proud, even if nobody but me will know 🙂
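For the curious, here is a minimal sketch of what “re-runnable” means to me (the schema and object names are invented for illustration): guarded DDL that works both on a clean database and on one where the objects already exist.

```sql
-- Guarded DDL: safe on a clean database and safe to re-run; names invented.
CREATE SCHEMA IF NOT EXISTS staging;

CREATE TABLE IF NOT EXISTS staging.load_log (
    load_id   bigserial PRIMARY KEY,
    loaded_at timestamptz NOT NULL DEFAULT now(),
    source    text NOT NULL
);

-- Functions are naturally re-runnable thanks to CREATE OR REPLACE.
CREATE OR REPLACE FUNCTION staging.log_load(p_source text)
RETURNS void LANGUAGE sql AS
$$ INSERT INTO staging.load_log(source) VALUES (p_source); $$;
```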


Filed under Development and testing

From theory to practice

For the past several months I have been implementing the bitemporal framework on real-life objects, not on lab mice :). And this process has been quite a revelation!

I wrote the functions for the bitemporal basic operations almost two years ago, and talked about them at several conferences and workshops. I could not imagine that anything could go wrong with them – and yet it did. And that’s exactly what happens when all your test cases are cloned lab mice!

One of the first errors I got was an empty assertion interval, and that’s when I realized that we had never discussed the relationship between transactions and bitemporal operations. Well, a transaction is a transaction, isn’t it? Nobody is supposed to see what’s inside until the transaction is finished – committed or rolled back. So… if there are several modifications (say, an INSERT, an UPDATE, and a CORRECT for the same logical record) within one transaction… what are we supposed to see when the transaction is committed? Just an INSERT, if the first operation was an INSERT? But this “won’t be true”!
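For illustration, here is one plausible way (an assumption on my part) to end up with an empty assertion interval inside a single transaction, if the intervals are built from now(): in Postgres, now() returns the transaction start time, so every call within one transaction yields the same value, and a range between two such calls is empty.

```sql
-- now() is frozen at transaction start, so within one transaction it never
-- changes; a range built from two now() calls is therefore empty.
BEGIN;
SELECT now(), clock_timestamp();          -- now() = transaction start
SELECT pg_sleep(1);
SELECT now(), clock_timestamp();          -- now() unchanged, clock moved on
SELECT isempty(tstzrange(now(), now()));  -- true: an empty assertion interval
COMMIT;
```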

Yes, but on the other hand, imagine what would happen if we recorded the “interim” states, and then later wanted to run a query “as asserted” at some time in the past, and at that exact moment some transactions were still uncommitted? Then we would get results in an inconsistent state. As of now, I haven’t decided how I want these situations to be handled. I am almost convinced that I want to give the user an option: if you want to be “anti-transactional,” you can :)). But then you’ll need to accept the consequences.

Another set of problems is rather philosophical: do we believe in reincarnation? 🙂 More precisely, if an object is “bitemporally deleted,” and then a new object with the same business key value is created, is this “the same object” or a “new object”? Both ways can be supported, but I think that by default we should take a “formal approach” and say that this is “the same” object. And if the real world (i.e., the business rules) is such that the new object is a different object… well, that means that something else should be included in the business key. For example, if an SSN is reused, then we need an extra piece of information, like the person’s date of birth.

Related questions: can we update a deleted (inactive) record? What is the difference between an UPDATE and a CORRECTION if the date ranges are “equal”? I can only imagine how many issues like this are just waiting to be discovered!


Filed under Data management, Development and testing, research, SQL

OK, we know what to do with very large files, but what if there are too many?

About two weeks ago I wrote a post about dealing with very large csv files. The proposed solution was to break this very large file into smaller files and then merge together the results of mapping. This was definitely a doable solution; however, since we had to break this one very large file into 37 smaller files, it involved having 37 foreign tables on our proxy server, another 37 foreign tables in the staging area of the data warehouse, and also 37 matching materialized views…

And if this were not enough, I soon found out that for each marketing campaign the number of files may vary. And later I learned that past campaigns should be stored permanently… although this requirement was later revoked. But some other results had to be stored forever…

In any case, the problem I was facing was that there is no way you can stay sane doing 37 copy/pastes… or making changes in 37 places… So I realized that I needed to automate the creation of the said foreign tables and materialized views.

Once again, there is nothing better than the ability to generate SQL inside a function and then execute it. You can’t make a table name a function “parameter,” but… you can, if you generate a DDL statement inside the function. I was very happy when I realized that in a situation where requirements change so often, it really does not make much of a difference whether you refresh a materialized view or drop it and create it again. That’s what my refresh functions now do, and this makes changing the logic really easy. Well… relatively easy. As easy as you can make it :).
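Here is a minimal sketch of the technique (the function and object names are invented, and my real functions are more involved): the table names come in as text arguments, and the DDL is generated with format() and run with EXECUTE.

```sql
-- Minimal sketch of generated DDL inside a function; names are invented.
CREATE OR REPLACE FUNCTION refresh_campaign_view(p_view text, p_source text)
RETURNS void
LANGUAGE plpgsql AS
$$
BEGIN
    -- Drop and recreate instead of REFRESH, so changes in the underlying
    -- logic are picked up as well; %I quotes the identifiers safely.
    EXECUTE format('DROP MATERIALIZED VIEW IF EXISTS %I', p_view);
    EXECUTE format('CREATE MATERIALIZED VIEW %I AS SELECT * FROM %I',
                   p_view, p_source);
END;
$$;

-- Usage: SELECT refresh_campaign_view('campaign_03_mv', 'campaign_03_ft');
```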

Honestly, if I had not come up with my first automated generation about two weeks ago, I would be way more upset each time I need to make changes. But by now I’ve parametrized literally everything :). Tomorrow we’ll test whether somebody other than me can execute this super-flexible set of functions – because I am taking a vacation next week – for real!


Filed under Development and testing

I honestly hope it’s not a bug!

When I shared this discovery with my co-workers, they told me I’d better not write any blog post about it, because god forbid it is actually a bug, and somebody will fix it, and we’ll lose this feature. But I hope it’s not!

… For years I’ve been complaining about the fact that Postgres functions are atomic, meaning there is no way to have transactions inside a function, so it’s impossible to commit intermediate results: it’s always all or nothing. Not that I really wanted to have checkpoints and such, but processing huge data volumes without the option of committing intermediate results is challenging, to say the least. You are bound to have long-running transactions, extensive locks, and such. I really missed the option I had with Oracle functions to commit every 100,000 records…

For a while I kept asking the lead Postgres contributors “how much longer,” and for a while they kept replying “in the next release” – until they just stopped replying…

So, the other day I was testing my new function, which builds a table out of multiple materialized views; for each INSERT I generate a statement, which is executed by a single EXECUTE operator. When the execution crashed, because one of the materialized views that was meant to be in place was not, I thought: well… now I need to start all over again… and to my surprise I saw that all the inserts that had happened before the crash persisted!

So, let’s reiterate. It might be the same function, but if the SQL statements are executed as generated statements using the EXECUTE operator, each of the executions will be treated as a separate transaction! Which is pretty awesome, keeping in mind that we need to insert over 18 million records! And no, I do not mean I am going to insert 18 million times 🙂
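For context, here is a minimal sketch of the kind of function I am talking about (the names are invented, and my real function is more complex): the INSERT statements are not static SQL but strings generated and run one by one through EXECUTE.

```sql
-- Sketch of a loader that runs one generated INSERT per source; names invented.
CREATE OR REPLACE FUNCTION load_combined_table(p_target text, p_sources text[])
RETURNS void
LANGUAGE plpgsql AS
$$
DECLARE
    v_source text;
BEGIN
    FOREACH v_source IN ARRAY p_sources
    LOOP
        -- One generated statement per materialized view; if a source is
        -- missing, the EXECUTE for that source raises an error.
        EXECUTE format('INSERT INTO %I SELECT * FROM %I', p_target, v_source);
    END LOOP;
END;
$$;
```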


Filed under Data management, Development and testing, SQL