What is the Purpose of Database Normalisation?
Database normalisation is the process of transforming a database design so that it adheres to a common standard for databases.
Once this standard design process has been followed, the database is said to be “normalised”.
Why would we follow this process?
Well, a normalised database has many advantages, which I’ll cover in this article.
Prevent the Same Data from Being Stored in Many Places
One of the advantages of normalisation is that it prevents the same data from being stored in many places.
Let’s take this table as an example:
We have 5 employees shown here in this “employee” table. Each employee can be in a department, such as Finance, Sales, or Customer Support. Each employee is in a location as well.
We can see that the name of the department is repeated across several rows, because it is stored for each employee. But what if we insert a new row? To keep our data consistent, we would need to make sure the new value matches one of the existing values. This could be enforced with application code, but it would not be very simple.
The same issue occurs for Location. If we want to find out all of the locations, we need to look up all unique values in this column.
This data looks OK, but the values are determined by what was entered. Is Chicago the same as North Chicago or are they different? Are both Chicago locations the same, or are they different offices in the same city?
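To make the problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The rows are made up for illustration and are not the article's table; the point is only that a free-text location column treats every spelling as a separate value.

```python
import sqlite3

# Illustrative data only: these rows are assumptions, not the article's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, department TEXT, location TEXT)"
)
conn.executemany(
    "INSERT INTO employee (id, name, department, location) VALUES (?, ?, ?, ?)",
    [
        (1, "John", "Finance", "Chicago"),
        (2, "Mary Taylor", "Customer Support", "North Chicago"),
        (3, "Tony", "Finance", "Chicago"),
        (4, "Sam", "Sales", "chicago"),  # hypothetical row with a different spelling
    ],
)

# The only way to list the locations is to scan the column for unique values.
locations = [
    row[0]
    for row in conn.execute("SELECT DISTINCT location FROM employee ORDER BY location")
]
print(locations)
```

The query returns three distinct values here, and nothing in the table can tell us whether they describe one office, two, or three.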
In a normalised design, the departments would be stored in their own table:
The Location would also be in a separate table:
This also means the Employee table would just store references to this data:
The application can link the department number in this Employee table to the Department table to find the name to be displayed or selected by the user.
It means each department name and location name is stored only once, avoiding duplicate and inconsistent data.
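As a rough sketch of what these three tables could look like, here is one possible schema in SQLite, run from Python. The column names (department_id, location_id) and the ID values are my own assumptions for illustration. With foreign keys switched on, the database itself rejects an employee row that points at a department that does not exist, so no application code is needed for that check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE location (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- The employee table stores references, not names.
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER REFERENCES department(id),
    location_id   INTEGER REFERENCES location(id)
);
""")

# ID values below are assumed for the sketch.
conn.executemany(
    "INSERT INTO department (id, name) VALUES (?, ?)",
    [(1, "Finance"), (2, "Customer Support"), (3, "Sales")],
)
conn.executemany(
    "INSERT INTO location (id, name) VALUES (?, ?)",
    [(1, "Chicago"), (2, "North Chicago")],
)
conn.execute(
    "INSERT INTO employee (id, name, department_id, location_id) VALUES (1, 'John', 1, 1)"
)

# Department 99 does not exist, so this insert should be refused.
try:
    conn.execute(
        "INSERT INTO employee (id, name, department_id, location_id) "
        "VALUES (2, 'Mary Taylor', 99, 1)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, employee_count)
```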
Prevent Updates Made to Some Data and Not Others
Another reason we should normalise our database is to avoid something called an “update anomaly”, or an issue with updating data.
Let’s say we had our sample data again:
Let’s then say that we want to update the name of one of the departments. We want to update “Finance” to say “Accounting”.
We would update the data so it looks like this:
This means we need to update every row in the table that contains the old department name.
What if we miss one? What if someone adds a new value while we are performing this update? What if someone overwrites the value, or does something else to one of the rows, which means our update didn’t work?
We could end up with something like this:
John is in Finance and Tony is in Accounting.
This is a problem.
But if the data is normalised, as in our earlier example, there would be one record in a “department” table that contains the value “Finance”. We would change this single value to “Accounting”. All employees related to that department ID would then show the updated name in any system that uses this data.
The employee table is unaffected.
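Here is a small sketch of that single-row update, again using SQLite from Python. The table layout and the employee IDs are assumptions for illustration; only the names John and Tony and the two department names come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER REFERENCES department(id)
);
INSERT INTO department (id, name) VALUES (1, 'Finance'), (2, 'Sales');
-- Employee IDs are made up for this sketch.
INSERT INTO employee (id, name, department_id) VALUES (1, 'John', 1), (3, 'Tony', 1);
""")

# One row changes, no matter how many employees are in the department.
cur = conn.execute("UPDATE department SET name = 'Accounting' WHERE name = 'Finance'")
rows_changed = cur.rowcount

# Every employee picks up the new name through the join.
names = conn.execute("""
    SELECT e.name, d.name
    FROM employee e
    JOIN department d ON d.id = e.department_id
    ORDER BY e.id
""").fetchall()
print(rows_changed, names)
```

Because there is only one place to change, there is no way for John and Tony to end up with different department names.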
Prevent Deleting Unrelated Data
Another reason for normalising our database is to prevent “delete anomalies”. This is where we delete a record and other information is also deleted — a kind of unintended side effect of what we did.
Let’s return to our original example:
Let’s say we delete Mary Taylor from the system, who has an employee ID of 2.
Assuming this is all of the data in the table, we now have no information about Mary. This may be OK.
But we have also lost any record of the Customer Support department, because Mary was the only employee recorded in it.
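For comparison, here is a minimal sketch of the same delete against normalised tables. The schema and the row for John are assumptions for illustration. Deleting the employee row leaves the department row alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER REFERENCES department(id)
);
INSERT INTO department (id, name) VALUES (1, 'Finance'), (2, 'Customer Support');
-- Mary Taylor has employee ID 2, as in the example; John's row is assumed.
INSERT INTO employee (id, name, department_id) VALUES (1, 'John', 1), (2, 'Mary Taylor', 2);
""")

# Remove Mary from the system.
conn.execute("DELETE FROM employee WHERE id = 2")

remaining_employees = [r[0] for r in conn.execute("SELECT name FROM employee ORDER BY id")]
departments = [r[0] for r in conn.execute("SELECT name FROM department ORDER BY id")]
print(remaining_employees, departments)
```

Customer Support is still listed as a department, even though it currently has no employees.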
Ensure Queries are More Efficient
The final reason I’d like to mention is that normalisation helps make your queries more efficient.
Before your data is normalised, it’s all in one table, or a small number of tables. To find the data you need, you’ll have to:
- Select the columns you want from this table
- Use quite a few WHERE clauses
- Probably perform some string manipulation
- Perhaps use a subquery or two to get what you want.
This is because you can’t see how records are related to each other.
However, with your normalised database, you can:
- Select only the columns you need
- Use joins to relate data in several tables, only getting the data you need
- Easily use WHERE clauses to filter data to what you need, likely avoiding subqueries and string manipulation
It’s much easier to query a normalised database than one that is not normalised.
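As an illustration of that kind of query, here is a sketch that joins three normalised tables and filters on the department name. The schema, IDs, and the location assigned to each employee are assumptions I have made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER REFERENCES department(id),
    location_id   INTEGER REFERENCES location(id)
);
INSERT INTO department (id, name) VALUES (1, 'Finance'), (2, 'Customer Support'), (3, 'Sales');
INSERT INTO location (id, name) VALUES (1, 'Chicago'), (2, 'North Chicago');
-- Row values are illustrative only.
INSERT INTO employee (id, name, department_id, location_id) VALUES
    (1, 'John', 1, 1),
    (2, 'Mary Taylor', 2, 2),
    (3, 'Tony', 1, 2);
""")

# Select only the columns we need, join on the IDs, filter with one WHERE clause.
finance_staff = conn.execute("""
    SELECT e.name, l.name
    FROM employee e
    JOIN department d ON d.id = e.department_id
    JOIN location l ON l.id = e.location_id
    WHERE d.name = ?
    ORDER BY e.id
""", ("Finance",)).fetchall()
print(finance_staff)
```

There is no string manipulation and no subquery: the relationships are carried by the ID columns.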
It’s also easier to insert or update data, because we only need to insert or update one set of related data in one table, and do not have to specify data for all the other related columns.
The only exception would be data warehouses, which use a “de-normalised” design. This design follows another set of design practices and is optimised for querying and not updating. However, that design method is still better than not normalising at all.
One argument against normalising is that performing too many joins will make queries run slowly. I’ve read this in a few places.
In response, I would argue that a badly designed database and poor indexing slow a query down more than joins do. Joins can actually help a design overall, for the reasons mentioned above.
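To show what I mean by indexing, here is a small sketch in SQLite. It adds an index on the column used to join employees to departments and then asks the database how it plans to run a lookup on that column. The index name is hypothetical, and other databases have their own tools for inspecting query plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER REFERENCES department(id)
);
-- Hypothetical index name. SQLite does not index foreign key columns automatically.
CREATE INDEX idx_employee_department ON employee(department_id);
""")

# Ask SQLite how it would find the employees of one department.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM employee WHERE department_id = ?", (1,)
).fetchall()

# The human-readable description is the last column of each plan row.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The plan should mention the index rather than a full scan of the employee table, which is the kind of lookup a join on department_id relies on.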
So, I hope this article has given you a better understanding of what the purpose of database normalisation is.
For more information on normalisation, check out my step-by-step guide to database normalisation.