Resist the urge to be Normal…

Nifesimi Ademoye
4 min readNov 21, 2023

This article deals with one of the most crucial data modeling principles: ‘Normalisation’. I was reading about this concept over the weekend and kept coming across the phrase ‘resisting normalization urges‘ in the book ‘The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling’. Then I had an epiphany and came up with what I think is the perfect metaphor for the principle, so I am going to try to explain my thought process by relating it to a human experience in a way that makes it easy to understand and remember.

First, let’s have a quick overview of the data modeling principle in question; then we can talk about how, in my opinion, this principle relates to one of our shared human experiences.

Normalisation

Image from hackr.io

What exactly does it mean to normalise a database? According to Wikipedia, database normalisation is the process of structuring a relational database according to a series of so-called normal forms to reduce data redundancy and improve data integrity. In my experience, it is a technique very common among data modelers from operational backgrounds, who prefer normalised fact and dimension tables, i.e. snowflake tables. The argument is that normalisation reduces data redundancy, saves disk space, and is easier to maintain because there is a single source of change, so it seems like the intuitive choice.

Note: Normalisation and snowflaking (the snowflake schema) are used interchangeably in this article.
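
To make the distinction concrete, here is a minimal sketch, in Python with sqlite3, of what a hypothetical product dimension might look like snowflaked versus flattened into a single star-schema table. The table and column names are my own inventions, purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Snowflaked (normalised) product dimension: attributes are split into
-- their own tables and linked with foreign keys.
CREATE TABLE category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE brand (
    brand_id    INTEGER PRIMARY KEY,
    brand_name  TEXT,
    category_id INTEGER REFERENCES category(category_id)
);
CREATE TABLE dim_product_snowflaked (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    brand_id     INTEGER REFERENCES brand(brand_id)
);

-- Flattened (denormalised) star-schema dimension: the same attributes are
-- repeated on every row, so no extra joins are needed at query time.
CREATE TABLE dim_product_flat (
    product_id    INTEGER PRIMARY KEY,
    product_name  TEXT,
    brand_name    TEXT,
    category_name TEXT
);
""")
```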

While this might be true in some sense, a normalised schema is not necessarily the best choice in terms of ease of use and long-term performance, for the following reasons:

  • Snowflake schemas can get pretty complex, and normalised dimension models make for a fairly convoluted presentation, especially when consumed by BI applications.
  • The extra tables and joins inevitably lead to poor query performance (a small sketch of this follows the list).
  • Snowflaking makes it harder for the user to browse within a dimension, affecting their ability to understand the relationships between the various dimensions and their attributes.
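
To see why the extra joins matter, here is a hedged sketch, reusing the hypothetical product dimension from the earlier snippet, of the same question asked against both versions. It is only meant to show the shape of the queries, not to benchmark them.

```python
# Same question asked two ways (the schemas from the sketch above are assumed).

# Snowflake version: walking from product to brand to category needs two joins.
snowflake_query = """
SELECT p.product_name, b.brand_name, c.category_name
FROM dim_product_snowflaked AS p
JOIN brand    AS b ON b.brand_id    = p.brand_id
JOIN category AS c ON c.category_id = b.category_id;
"""

# Star version: the flattened dimension answers it from a single table,
# which is also far easier to browse in a BI tool.
star_query = """
SELECT product_name, brand_name, category_name
FROM dim_product_flat;
"""

print(snowflake_query, star_query, sep="\n")
```
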
Photo by Clem Onojeghuo on Unsplash

Take note of that last bullet point. The ability to browse, ladies and gentlemen, is our segue into how this concept relates to us personally.

Photo by Michael Heise on Unsplash

Have you noticed lately how we seem to be expected to be connected to every social network to avoid missing out on information that might be pertinent to our daily lives or even our careers? How exhausting it feels scrolling through Twitter, Instagram, and newsletters just to stay abreast of current happenings? Some of us do this consistently out of fear of missing a piece of new information that might otherwise have been very useful, and I feel a similar argument can be made for normalised schemas.

Modelers will normalise their data models out of fear of data redundancy, or for ease of change, at the expense of query performance and a more convoluted presentation. Research has shown that, especially for analytical and reporting purposes, this is not the most efficient approach; the fear that something as simple as deleting a single data cell will set off a chain reaction of errors in other cells across the database unless it is normalised is largely irrational.

Photo by ANDREI ASKIRKA on vecteezy

To sum up, it is important to note that while normalisation techniques are not generally recommended in analytical or OLAP systems, they can still be a valuable tool in some instances. For example, during a staging phase, a database administrator or data engineer who needs to ensure the data being ingested does not violate a many-to-one relationship might load it into a normalised, snowflaked version of the dimension table in the staging database before loading the final database. This technique is especially valuable when the data comes from an informal source where verification or correction of the data cannot be guaranteed.
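
As a rough illustration of that staging check, here is a small sketch in Python with pandas, on made-up staging data, of how one might verify that a dimension attribute really obeys a many-to-one relationship, e.g. that every product maps to exactly one brand, before flattening it into the final dimension table. The column names and the sample DataFrame are assumptions for the example.

```python
import pandas as pd

# Hypothetical staging extract from an informal, unverified source.
staging = pd.DataFrame({
    "product_id":   [1, 2, 3, 3],
    "product_name": ["Kettle", "Toaster", "Blender", "Blender"],
    "brand_name":   ["Acme", "Acme", "Zest", "Zesst"],  # typo on the last row
})

# Many-to-one check: each product_id should map to exactly one brand_name.
brands_per_product = staging.groupby("product_id")["brand_name"].nunique()
violations = brands_per_product[brands_per_product > 1]

if violations.empty:
    print("Many-to-one relationship holds; safe to flatten into the final dimension.")
else:
    print("Products violating the many-to-one rule:")
    print(staging[staging["product_id"].isin(violations.index)])
```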

As mentioned earlier, it is not a one-size-fits-all solution. In some cases normalisation techniques can result in more efficient, adaptable, and performant database architectures, while in others they are not the most suitable choice; as with most situations in life, you will need to strike a balance. Attaining the best outcome requires choosing the data model and technique that best fit the specific requirements of your application.

Photo by Crawford Jolly on Unsplash
