Never in my life I would imagine myself writing a blog post about such a scholastic subject, as normalization.
First, we all know what normalization is, we did study the normal forms in our first Database theory class (or SQL Primer, or whatever). Second, after we learned about them, we happily forget all the definitions – because why in the world we may ever need them?!
Well… that’s what I see often happening these days. Fist, I see that the word “normalization” is widely used to describe the things, which are not. And not like it is something to care about, but… but here comes “second”. And second – people often say “this schema is bad, because it’s not normalized, thereby we should normalize it”
This being said – let’s talk about normalization.
We do not really need to know the strict definitions of the first, second or third NF; what we need to know – why they were invented, what they do, and what problems they help to solve (so, if somebody tells you otherwise, ask them, whether they can tell the differences between 3NF, BCNF and 4NF:)). And the problems they help to solve, are the 3 anomalies – INSERT, UPDATE and DELETE anomalies. That’s how redundancy manifests itself. We do not want to deal with these anomalies – we normalize the schema. We do not normalized for optimization purposes, neither we normalized for the sake of saving space – actually,normalization can either increase or decrease the total disk space usage.
In process of normalization we remove some types of functional dependencies between relation attributes. At the end we want all attributes to depend on the relation PK only, but at the same time we do not want to continue to decompose a relation into smaller relations, if there are no “bad” functional dependencies left already. If we do so, we will end up with the “vertically partitioned” schema, rather than the “normalized” schema. It’s important to remember, that when we create a lookup table, for example, a lookup for loan types, and instead of having the words “payday”, “installment”, “line of credit” in the loan_type attribute, we have loan_type_id as 1, 2 or 3, which refers to the table loan_types (1 – payday, 2 – installment 3- line of credit) we actually do not normalize anything.
We do “vertical partitioning”, or, if we want to operate in the realm of relational model rather than an SQL model, we define a domain of values. Which can be done either by creating the “lookup table”, or by introducing a check constraint. If we want to attach some other information to the loan type, it will be a different story, because then we’ll have deal with INSERT-DELETE-UPDATE anomalies.