Normalization in DBMS

Normalization is the process of taking data from a problem and reducing it to a set of relations while ensuring data integrity and eliminating data redundancy
  • Data integrity - all of the data in the database are consistent, and satisfy all integrity constraints.
  • Data redundancy – if data in the database can be found in two different locations (direct redundancy) or if data can be calculated from other data items (indirect redundancy) then the data is said to contain redundancy.
Normalization Avoids
  • Duplication of Data  – The same data is listed in multiple lines of the database
  • Insert Anomaly  – A record about an entity cannot be inserted into the table without first inserting information about another entity – Cannot enter a customer without a sales order
  • Delete Anomaly – A record cannot be deleted without deleting a record about a related entity.  Cannot delete a sales order without deleting all of the customer’s information.
  • Update Anomaly – Cannot update information without changing information in many places.  To update customer information, it must be updated for each sales order the customer has placed
For the Normalization process E.F. Codd (1972) developed one technique.
It often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form.

Functional Dependencies
Ø  Full Dependency
o   In a relation, the attribute(s) B is fully functional dependent on A if B is functionally dependent on A,
o   but not on any proper subset of A.

Ø  Partial Dependency
o   If there is some attribute that can be removed from A and the dependency still holds.
o   Eg. Staff_No, Sname -> Branch_No

Ø  Transitive Dependency
o   In a relation, if attribute(s) A->B and B->C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C)
o   Eg. Staff_No->Branch_No and Branch_No->BAddress



The Process of Normalization


Ø  Normalization: Formal technique for analyzing a relation based on its primary key and the functional dependencies between the attributes of that relation.Ø  Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.

Ø  As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.

Here in below fig. there all normal form & relationship between them is shown.
Unnormalized Normal Form(UNF)

·       A table that contains one or more repeating groups.
To create an unnormalized table Transform the data from the information source (e.g. form) into table format with columns 
First Normal Form
1NF - A relation in which the intersection of each row and column contains one and only one value.

Converting from UNF to 1NF:

Ø  Select attribute(s) to act as the key.
Ø  Identify the repeating group(s) in the unnormalized table which repeats for the key attribute(s).
Ø  Remove the repeating group by
       Entering data into empty columns of rows which contain the repeating data.

Ø  Or by
       Placing the repeating data along with a copy of the original key attribute(s) into a separate relation.
 Second Normal Form

Based on the concept of full functional dependency.
A 2NF relation is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key.

Converting from 1NF to 2NF:
Ø  Identify the primary key for the 1NF relation.
Ø  Identify the functional dependencies in the relation.
Ø  If partial dependencies exist on the primary key remove them by placing then in a new relation along with a copy of their determinant.
 Third Normal Form (3NF)

Based on the concept of transitive dependency.
A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.

Converting from 2NF to 3NF:
Ø  Identify the primary key in the 2NF relation.
Ø  Identify functional dependencies in the relation.
Ø  If transitive dependencies exist on the primary key remove them by placing them in a new relation along with a copy of their dominant.
Fourth Normal Form

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form(BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A Table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X Y, X is a super-key—that is, X is either a candidate keyor a superset thereof.

Multivalued dependencies

If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z, then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively. A multivalued dependency X Y signifies that if we choose any x actually occurring in the table (call this choice xc), and compile a list of all the xcyz combinations that occur in the table, we will find that xc is associated with the same y entries regardless of z.
A trivial multivalued dependencyX Y is one where either Y is a subset of X, or X and Y together form the whole set of attributes of the relation.
Fifth normal form (5NF)

Fifth normal form (5NF), also known as Project-join normal form (PJ/NF) is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys.
A join dependency *{A, B, … Z} on R is implied by thecandidate key(s) of Rif and only if each of A, B, …, Z is a super-key for R.