Make delicious recipes!

Database Normalization

Normalization in database means keeping data in separate tables to reduce duplication of information. If data is not normalized, then there can arise anomalies while insertion, update or deletion as described below:

Consider a table storing faculty information and the courses they teach.

Suppose the table has following rows:

  1. Faculty-Id

  2. Faculty-Name

  3. Faculty-City

  4. Course-Name

Insertion-Anomaly: If a faculty has not yet been assigned course, then insertion of that faculty into the above table has to be done with null value for course-name. This is called insertion anomaly.

Deletion Anomaly: Similarly, if a faculty temporarily ceases to teach any course, then removing the courses taught by that faculty will lead to the removal of that faculty itself.

Care must be taken to remove all but one courses of that faculty and set course-name for last of his courses to null. This is called deletion anomaly.

Update Anomaly: Suppose a city in the above table was spelled wrongly and its desired to change its name from A to B. There is nothing in the table which prevents us from renaming only some of them to B. Thus its possible to update some cities to B while an insert operation is going on which is putting name of city as A. Had the table been normalized, there would have been a single ID for the city which can be updated and all the records would reflect the same.

Note: However, normalization is not always desired as too much normalization usually means an increase in the number of joins while reading which can affect performance.

There are several forms of normalization.

1) 1 NF

2) 2 NF

3) 3 NF

4) BC NF (also known as 3.5 NF)

5) 4 NF

6) 5 NF

Each successive form includes all the previous normalization forms and adds some more constraints.

1 NF: 1NF form demands that duplicative columns should not be present in same row of a table. The non-duplicity of a row is also called as the atomicity of a table i.e. a table is said to be atomic if it complies to the above rule.

Example: Suppose we need to store employees and their manager in a table. Each employee can have only one manager but a manager can have many subordinates.

If for every manager row, we add as many columns as subordinates, then that design breaks the 1NF form since those columns store the same kind of data.

To conform to the 1NF form we need to either store manager per employee or have a separate table which stores manager and the employee IDs.

2 NF: 2NF form demands the reduction of duplicacy across multiple rows. (1NF removed duplicacy across multiple columns in a single row). For example, if we are storing city per employee, then 2NF compliant schema would move city to a new table and place its foreign key in employee table.

3 NF: 3NF form demands the removal of columns which are not fully dependent on the primary key i.e. if there are some columns which can be extracted using some operation on other columns of the same table, then a 3NF schema would not have those derived columns.

For example: storing total price in a table which also stores quantity and per-unit price.

Advantage of removing such columns is that the chances of error are minimized.

Disadvantage of removing them is that one has to calculate those columns every time a query is run.

Like us on Facebook to remain in touch
with the latest in technology and tutorials!

Got a thought to share or found a
bug in the code?
We'd love to hear from you:

Email: (Your email is not shared with anybody)

Facebook comments:

Site Owner: Sachin Goyal