Let’s talk about normalization…

Never in my life I would imagine myself writing a blog post about such a scholastic subject, as normalization.

First, we all know what normalization is, we did study the normal forms in our first Database theory class (or SQL Primer, or whatever). Second, after we learned about them, we happily forget all the definitions – because why in the world we may ever need them?!

Well… that’s what I see often happening these days. Fist, I see that the word “normalization” is widely used to describe the things, which are not. And not like it is something to care about, but… but here comes “second”. And second – people often say “this schema is bad, because it’s not normalized, thereby we should normalize it”

This being said – let’s talk about normalization.

We do not really need to know the strict definitions of the first, second or third NF; what we need to know – why they were invented, what they do, and what problems they help to solve (so, if somebody tells you otherwise, ask them, whether they can tell the differences between 3NF, BCNF and 4NF:)). And the problems they help to solve, are the 3 anomalies – INSERT, UPDATE and DELETE anomalies. That’s how redundancy manifests itself. We do not want to deal with these anomalies – we normalize the schema. We do not normalized for optimization purposes, neither we normalized for the sake of saving space – actually,normalization can either increase or decrease the total disk space usage.

In process of normalization we remove some types of functional dependencies between relation attributes. At the end we want all attributes to depend on the relation PK only, but at the same time we do not want to continue to decompose a relation into smaller relations, if there are no “bad” functional dependencies left already. If we do so, we will end up with the “vertically partitioned” schema, rather than the “normalized” schema. It’s important to remember, that when we create a lookup table, for example, a lookup for loan types, and instead of having the words “payday”, “installment”, “line of credit” in the loan_type attribute, we have loan_type_id as 1, 2 or 3, which refers to the table loan_types (1 – payday, 2 – installment 3- line of credit) we actually do not normalize anything.

We do “vertical partitioning”, or, if we want to operate in the realm of relational model rather than an SQL model, we define a domain of values. Which can be done either by creating the “lookup table”, or by introducing a check constraint. If we want to attach some other information to the loan type, it will be a different story, because then we’ll have deal with INSERT-DELETE-UPDATE anomalies.

To conclude, I wanted to discuss one example, which I’ve discussed with a number of people in course of the past week, and since the number of these people exceeded two, I decided it’s worth writing about :).

How you would normalize the table cars(license_plate, make, model)?

Let’s look at the functional dependencies. We assume, that the PK for this table is the license_plate (although it should have been VIN). The other two fields are functionally dependent on the PK.

Id there any dependencies between the other two attributes? Strictly speaking – no, because although models are often unique for the make, it might not always be a case (at least, there is no business rule for this). Otherwise it would be always enough to list model, and the make could be derived from it (which again, is often true, but not always). If it is true, then, indeed it can be decomposed into two tables (license_plate, model) with license_plate being a PK and (model, make) with model being a PK. If not, then the original relation in already normalized. If we want to create a separate table with all valid combinations of (make, model), to make sure we only use valid combinations, then you first need to introduce this table (which will serve as a domain definition), and only after that you schema will become non-normalized šŸ™‚ Which is not to say you shouldn’t do it, it may be the best way to check the validity of the make/model combination, but you need to create this list first

Advertisements

Leave a comment

Filed under SQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s