Tag Archives: databases

Is There a “Best Tool for a Job?”

We often hear the phrase: Choose the right tool for the job. On the surface, there is nothing to argue about. We have tools. Hammers and saws. Mixers and slow cookers. Each tool is designed to accomplish a specific task. Hammers for nails. Screwdrivers for screws. Microscopes – for neither of them.

That’s true; however, I remember my grandaunt refusing to use a mixer for meringue. “I can do better with a fork!” she used to say – and she could, indeed! Like nobody else in the family! Or watch me when I come to the youth shelter to make dinner with the residents and refuse to use a potato peeler. “I can do better with a knife!” – I tell them, and they roll their eyes and do not try to compete.

Back from the kitchen to the data center. There are operating systems, and there are database engines. There are programming languages and QA frameworks. And we do not use one in place of another.
What about “the best programming language for the job”? Do we have a way to decide objectively? What about databases? Can you say: “This DBMS will be the best choice for you to accomplish your task”?
If your answer is “yes,” I will challenge it.

With a few exceptions and a handful of corner cases, most DBMSs can support virtually any set of requirements. I can do anything possible and impossible using PostgreSQL, and my coworkers who are more proficient with Oracle would accomplish the same task using Oracle – and faster than it would take them to master the equivalent PostgreSQL features.

I think “the best tool for a job” is the one you know best. The reason is that the developer’s time is always the most expensive resource, and if you can achieve a similar quality of the resulting product, the best solution is the one that takes the least time. That said, I do not think any company should support an extensive collection of different DBMSs for “having the best tools for a job.” The only justification for such a situation would be, “We already have developers who use this technology.”

Does it mean that we are stuck with our current technology stack forever? We all know that that’s not true, and changes happen. But I don’t know why 🙂

Postgres community, any thoughts?


Filed under Development and testing

Why I do not want to work in a startup anymore

For those who have known me long enough and heard me say, quite often, “I want to do one more startup before I retire,” the title of this post will be unexpected, to say the least. And I won’t swear by it. My life has taken so many unpredictable twists that the only thing I have learned very well is “never say never.”

However, recently, I was saying something different: “I hope to stay with this company until I retire.” And once again, I won’t swear by it because life is unpredictable, but I started thinking about what changed my mind so drastically.
It is not only that I enjoy working with everybody in this company (I was fortunate to have wonderful co-workers everywhere I worked), but also that I enjoy most of the problems I have to solve here.

A startup’s appeal is that you come to uncharted territory and build everything from scratch. There is nobody to blame for suboptimal decisions made in the past. You take responsibility for making these core decisions because you believe your approach is the right one. And if it does not work as expected, you are the first person to notice and the first person to take responsibility for a wrong decision. When you see revenue growth, you can link it directly with what you’ve done, and if you don’t, you know who to blame!
However, there is a twist. As I’ve said before, I do not enjoy consulting because you never know “what happens next.” You present a customer with your suggestions, and you never learn what happened afterward – whether your suggestions helped in the long run. Moreover, sometimes you do not even know whether they were implemented!
Surprisingly, working in a startup, you can experience something similar. Except for some rare cases, you start with small data volumes, a small client base, and a not-so-busy website. You can implement technology solutions that are quite elegant and work perfectly for a while. They may still work perfectly when you have a hundred times larger database and a hundred times more customers. And you are unlikely to reach the volumes of, say, the Bank of America by the time your startup cashes out, or you become bored, or something less optimistic happens.

I found immense satisfaction in solving problems of a different magnitude than before. For example, I have always tried to productionize and automate database-related tasks. But the need for such automation becomes truly apparent when you have hundreds of databases to supervise. You need to enforce policies programmatically rather than just tell people, “don’t do that!”
Or take my recent bitemporal partitioning project. The bitemporal model performs so well, even on very large data volumes, that I never had a reason even to start thinking about what partitioning should look like. And just three months into working at my new company, I have this new addition to pg_bitemporal.

The changes might not be as rapid as in the startup environment, but I can still see the impact of my work. And that’s what I like most.


Filed under Companies, Uncategorized

PGSQL Phriday #005: Relational and Non-relational Data

The topic for the February edition of PGSQL Phriday is Relational and Non-Relational Data.

I was a little puzzled that the question “How do you define non-relational data?” is the last one. It only makes sense to answer the first three questions once you clearly define what you are talking about, so that is the first question I will address. 

Do we have a formal definition of non-relational data (or relational, for that matter)? If you can’t think of one, there is a good reason for that: “relational” is a characteristic of the model, not the data. Therefore, the only definition we can give would be “the data you can’t represent using the relational model.” But is there anything in the definition of the relational model that limits it to certain types of data? The answer is no, so the best way to interpret the term “non-relational data” would be “the data which can’t benefit from being stored in a relational database.” Most often, that means documents, images, and other blobs.

If we never need to search inside the documents – in other words, if we never expect to use any full-text search – then, in my opinion, there is no reason to store these documents in a database. Like many others, I can recall several cases like the one mentioned by Pat Wright in his blog post. The only thing we need to store in the database is a pointer to the place where the actual document is kept; there is no benefit in storing the document itself.

However, it’s a different story when we need to perform a full-text search. Knowing that PostgreSQL’s full-text search capabilities are not the best tools available on the market, I would always question whether that feature is truly necessary. In many cases, after talking to the end users, I would find out that, in reality, the required search touches a limited number of fields/tables and can be transformed into a dynamically constructed query supported by b-tree indexes. (And we know that nothing can perform better than b-tree indexes!)
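To make that concrete, here is a minimal sketch of what such a transformation often looks like. The invoices table and its column names are invented for this example; the point is simply that the predicates users actually ask for are covered by ordinary b-tree indexes:

```sql
-- Hypothetical schema: the "document search" users asked for really only
-- ever filtered on a couple of structured columns.
CREATE INDEX idx_invoices_customer ON invoices (customer_name);
CREATE INDEX idx_invoices_date     ON invoices (invoice_date);

-- The dynamically constructed query the application actually needs;
-- both predicates are served by plain b-tree indexes.
SELECT invoice_id, customer_name, invoice_date, total_amount
  FROM invoices
 WHERE customer_name = 'ACME Corp'
   AND invoice_date >= DATE '2023-01-01'
 ORDER BY invoice_date DESC;
```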

Finally – what if we truly need to implement full-text search? Would I use an external tool, or what PostgreSQL has to offer? My experience with Elasticsearch was quite negative, mainly because the search database falls behind the actual database, and this delay is often critical. That was a major argument in favor of using PostgreSQL. However, I never had a chance to perform precise measurements, so my opinion is more emotional than scientific.
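For completeness, keeping full-text search inside PostgreSQL can look roughly like this. It is a minimal sketch: the tsvector column, GIN index, and websearch_to_tsquery are standard built-in features, but the documents table itself is made up for the example:

```sql
-- Hypothetical documents table with a generated tsvector column.
CREATE TABLE documents (
    doc_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title    text,
    body     text,
    body_tsv tsvector GENERATED ALWAYS AS
             (to_tsvector('english', coalesce(body, ''))) STORED
);

-- GIN index to make the full-text predicate fast.
CREATE INDEX idx_documents_body_tsv ON documents USING gin (body_tsv);

-- The search runs in the same database as the data,
-- so it can never fall behind the way an external search cluster can.
SELECT doc_id, title,
       ts_rank(body_tsv, websearch_to_tsquery('english', 'bitemporal partitioning')) AS rank
  FROM documents
 WHERE body_tsv @@ websearch_to_tsquery('english', 'bitemporal partitioning')
 ORDER BY rank DESC
 LIMIT 20;
```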

Since I want to play by the rules and actually publish this blog on Friday, I will leave the topic of using JSON/JSONB for later coverage in a separate post! 

Thank you, Ryan Lambert, for the topic, and thanks to everybody who has already contributed to this discussion!


Filed under publications and discussions, SQL

Live Events: Nordic PG Day and PG Day Paris

It was not my first live conference after the pandemic – the first one was in New York in December – but it felt like the first one again! Possibly because the Omicron wave came in between, and nobody was sure whether in-person events would continue.

Another “first” for me was presenting at a PostgreSQL Europe conference. Previously, I presented at European academic conferences and at US PostgreSQL conferences, but never at European PostgreSQL conferences. Interestingly, I was going to submit a proposal for a European conference at the beginning of 2020… and then nothing happened for obvious reasons.

I liked everything about both Nordic PG Day and PG Day Paris. All talks were very interesting and educational; the room was full all the time, there were lots of questions, and there were a lot of conversations during the coffee breaks, during lunch, and after the events.

My presentation, “Working with Software Engineers,” was very well received. It is one of those presentations that have to be delivered live, and I was very happy with the questions and the feedback. I made a lot of new connections, and I hope that at least some of them will result in future collaboration.

The presentation can be found here.

[Photo: My presentation]
[Photo: EDB & Friends outing]


Filed under events, talks

About My New Position

Today opens a new chapter in my career: I started as a Director of Data Analytics at BrokerX. I am excited to work with Chad Slaughter again and to start with a new team, in a new industry, and in a new environment.

Both Chad and I always wanted to change the world; I hope that we will be moving it in the same direction 🙂

What will not change: I am passionate about PostgreSQL and about improving the ways databases and applications interact. I continue to lead the Chicago PostgreSQL User Group. Just a reminder – we have a virtual meetup on Tuesday, July 13, and in September, I hope to welcome everybody at our first hybrid meetup. Stay tuned for more updates!


Filed under Companies, news

Somebody has their marketing analytics VERY WRONG :)


January 5, 2020 · 7:51 pm

Funny thing happened…

I have to share this :). I had a ticket to develop an operational report. Not only were the report requirements complex, but the report was also very difficult to debug. We had just started to collect the data required for this report, and we did not have enough of it to cover all potential issues. Moreover, since the data volumes are so small, the issues get resolved fast, so in two days I never had a chance to catch a single exception. Until this morning, when a thought suddenly came to my mind, and I asked myself: Hettie, why in the world are you waiting for an exception to happen in real time?! All your tables are bitemporal, so you can time-travel to any moment in the past, including the time when the exceptions occurred!

It’s funny and not funny that it took me two days to figure this out! Especially because I was the one who introduced bitemporality!
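For readers unfamiliar with the pattern: in a bitemporal table, every row carries both an effective period (when the fact was true in the real world) and an assertion period (when the database believed it), so “time travel” is just one more range predicate. Here is a minimal sketch with an invented accounts table and tstzrange columns named effective and asserted – illustrative only, not the exact pg_bitemporal API:

```sql
-- Reconstruct the data exactly as the database asserted it at the moment
-- the exception occurred (table, columns, and timestamp are illustrative).
SELECT account_id, balance, effective, asserted
  FROM accounts
 WHERE asserted  @> TIMESTAMPTZ '2020-01-03 14:27:00'   -- what we believed at that moment
   AND effective @> TIMESTAMPTZ '2020-01-03 14:27:00';  -- what was in effect at that moment
```

Since nothing is ever physically overwritten, the input that triggered the exception is still there, waiting to be inspected.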

Worked as expected 🙂


Filed under Data management, Development and testing

PostgreSQL And Academia

Recently, I’ve been thinking a lot about the relationship between the PostgreSQL community and the DB research community. To put it bluntly – these two communities do not talk to each other!

There are many reasons why I am concerned about this situation. First, I consider myself a member of both of these communities. Even though right now I am 90% in industry, I can’t write off my academic past, and writing a scientific paper with the hope of being accepted to a real database conference is something that still appeals to me.

Second, I want to have quality candidates for database positions when I have openings. The problem is bigger than the fact that scientists do not speak at Postgres conferences and Postgres developers do not speak at academic conferences. The bigger problem is that, for many CS students, their academic research and practical experience do not intersect at all! They study some cool algorithms, and then they practice their SQL on MySQL, which, as I have already mentioned multiple times, lacks so many basic database features that it can hardly be considered a database!

If these students practiced using PostgreSQL, they would have a real full-scale object-relational database – not a “light” version, but the real thing, which supports tons of index types, data types, and constraints, has procedural languages, and the list can go on and on.
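Just to illustrate how much of “the real thing” is available out of the box, here is a small sketch (the schema is made up purely for the example; btree_gist is a contrib extension shipped with PostgreSQL):

```sql
-- A few features a "light" engine typically lacks, all in stock PostgreSQL.
CREATE EXTENSION IF NOT EXISTS btree_gist;        -- lets GiST handle plain equality too

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');  -- user-defined data type

CREATE TABLE checkins (
    checkin_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    student    text      NOT NULL,
    feeling    mood      NOT NULL,
    period     tstzrange NOT NULL,
    -- a constraint backed by a GiST index: no overlapping check-ins per student
    EXCLUDE USING gist (student WITH =, period WITH &&)
);

-- A procedural-language function on top of the same schema.
CREATE FUNCTION happy_ratio(p_student text) RETURNS numeric
LANGUAGE plpgsql AS $$
BEGIN
    RETURN (SELECT avg((feeling = 'happy')::int)
              FROM checkins
             WHERE student = p_student);
END;
$$;
```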

It is especially upsetting to see this disconnect since so much database research was completed on Postgres, for Postgres, and with the help of Postgres: R-trees and GiST indexes, to name a couple of examples. Also, the 2018 SIGMOD Test of Time Award was given to the paper “Serializable Isolation for Snapshot Databases,” whose technique was implemented in Postgres.

I know the answer to the question “why do they not talk?” Researchers do not want to speak at Postgres conferences because those are not scientific conferences, and participation in them will not result in any publication. Postgres developers do not want to speak at CS conferences because they do not like to write long papers :), and also, even if they do submit something, their papers are often rejected as “not having any scientific value.”

I know the answer. But I do not like it :). So maybe – we can talk about it?!


Filed under research, SQL, Systems

PG Open submission deadline is extended!

To all my Postgres-minded friends and colleagues – the submission deadline for PG Open talks and tutorials has been extended. You have until July 7 to submit your proposal! See the updated info.
Please consider submitting if you haven’t already!


Filed under talks

Databases are not sexy?

I heard this line from a former colleague of mine. He was explaining why there are so few database people around and why IT students are not specializing in databases. That was his answer – it’s not cool. “Web designer” sounds cool; “database developer” does not.

Several months have passed since I heard that, and I kept thinking: I should write a blog post about it! However, honestly – what else can I say except the things I’ve already said in this blog multiple times? That there is nothing more exciting than exploring the endless possibilities of SQL, and nothing more rewarding than applying your magic to a job that runs for an hour so that, all of a sudden, it runs in less than a minute and produces the same result :)

I suspect that the general public does not think that there is something behind a web page, and when somebody experiences website slowness, they refer to it as “the internet is slow.” Also, buzzwords like “Big Data” often send a message that “databases are outdated” and that there is something bigger, cooler, and more fashionable than just “databases,” which does not help a bit.

As I always like to be practical, and not only state a problem but come up with a way to solve it, I am now thinking about how to bridge this gap. I want to find ways to reach out to college and high school students and give them some exposure to the Wonderful World of Data. A couple of years ago, when I was attending one of the Big Conferences, I heard a discussion regarding “what our students want to be taught.” That was the time when Big Data was just becoming a Big Deal :). Honestly, my opinion is that students’ interests should not entirely drive a curriculum 🙂 and that this is exactly the right place to intervene.

Is anybody interested in joining me in this effort?


Filed under Data management, SQL