Database Normalization
Normalization in database
means keeping data in separate tables to reduce duplication of
information. If data is not normalized, then there can arise
anomalies while insertion, update or deletion as described below:
Consider a table storing
faculty information and the courses they teach.
Suppose the table has
following rows:
Faculty-Id
Faculty-Name
Faculty-City
Course-Name
Insertion-Anomaly:
If a faculty has not yet been assigned course, then insertion of that
faculty into the above table has to be done with null value for
course-name. This is called insertion anomaly.
Deletion Anomaly:
Similarly, if a faculty temporarily ceases to teach any course, then
removing the courses taught by that faculty will lead to the removal
of that faculty itself.
Care must be taken to remove
all but one courses of that faculty and set course-name for last of
his courses to null. This is called deletion anomaly.
Update Anomaly:
Suppose a city in the above table was spelled wrongly and its desired
to change its name from A to B. There is nothing in the table which
prevents us from renaming only some of them to B. Thus its possible
to update some cities to B while an insert operation is going on
which is putting name of city as A. Had the table been normalized,
there would have been a single ID for the city which can be updated
and all the records would reflect the same.
Note:
However, normalization is not always desired as too much
normalization usually means an increase in the number of joins while
reading which can affect performance.
There are several forms of
normalization.
1) 1 NF
2) 2 NF
3) 3 NF
4) BC NF (also known as 3.5
NF)
5) 4 NF
6) 5 NF
Each successive form
includes all the previous normalization forms and adds some more
constraints.
1 NF:
1NF form demands that duplicative columns should not be present in
same row of a table. The non-duplicity of a row is also called as the
atomicity of a table i.e. a table is said to be atomic if it complies
to the above rule.
Example:
Suppose we need to store employees and their manager in a table. Each
employee can have only one manager but a manager can have many
subordinates.
If for every manager row, we
add as many columns as subordinates, then that design breaks the 1NF
form since those columns store the same kind of data.
To conform to the 1NF form
we need to either store manager per employee or have a separate table
which stores manager and the employee IDs.
2 NF:
2NF form demands the reduction of duplicacy across multiple rows.
(1NF removed duplicacy across multiple columns in a single row). For
example, if we are storing city per employee, then 2NF compliant
schema would move city to a new table and place its foreign key in
employee table.
3 NF:
3NF form demands the removal of columns which are not fully dependent
on the primary key i.e. if there are some columns which can be
extracted using some operation on other columns of the same table,
then a 3NF schema would not have those derived columns.
For example: storing total
price in a table which also stores quantity and per-unit price.
Advantage of removing such
columns is that the chances of error are minimized.
Disadvantage of removing
them is that one has to calculate those columns every time a query is
run.
|