The topic for the February edition of PGSQLPhriday is Relational and Non-Relational Data.
I was a little puzzled that the question “How do you define non-relational data?” is the last one. It only makes sense to answer the first three questions once you clearly define what you are talking about, so that is the first question I will address.
Do we have a formal definition of non-relational data (or relational, for that matter)? If you can’t think of one, there is a good reason for that: “relational” is a characteristic of the model, not the data. Thereby, the only definition we can give would be “The data you can’t represent using the relational model.” But is there anything in the definition of the relational model that limits it to certain types of data? The answer is no, so the best way to interpret the term “non-relational data” would be “the data which can’t benefit from being stored in a relational database.” Most often, it would be documents, images, and other blobs.
If we never need to search inside the document, in other words, we never expect to use any full-text search; in my opinion, there is no reason to store these documents in a database. Like many others, I can recall several cases like the one mentioned by Pat Wright in his blog post. The only thing we need to store in the database is the pointer to the place where the actual document is stored. There is no benefit in storing it in the database.
However, it’s a different story when we need to perform a full-text search. Knowing that PostgreSQL’s full-text search capabilities are not the best tools available on the market, I would always try to justify the necessity of that feature. In many cases, after talking to the end users, I would find out that, in reality, the required search touches a limited number of fields/tables and can be transformed into a dynamically constructed query supported by b-tree indexes. (And we know that nothing can perform better than b-tree indexes!)
Finally – what if we truly need to implement the full-text search? Would I use an external tool, or what PostgreSQL has to offer? My experience with Elastic search was quite negative, mainly because the search database gets behind the actual database, and this delay is often critical. That was a major argument in favor of using PostgreSQL. However, I never had a chance to perform precise measurements, so my opinion is more emotional than scientific.
Since I want to play by the rules and actually publish this blog on Friday, I will leave the topic of using JSON/JSONB for later coverage in a separate post!
Thank you, Ryan Lambert, for the topic, and thanks to everybody who has already contributed to this discussion!