Normalization
is the process of taking data from a problem and reducing it to a set of
relations while ensuring data integrity and eliminating data redundancy
- Data integrity - all of the
data in the database are consistent, and satisfy all integrity
constraints.
- Data redundancy – if data in
the database can be found in two different locations (direct redundancy)
or if data can be calculated from other data items (indirect redundancy)
then the data is said to contain redundancy.
Normalization Avoids
- Duplication
of Data – The same data is listed
in multiple lines of the database
- Insert
Anomaly – A record about an entity
cannot be inserted into the table without first inserting information
about another entity – Cannot enter a customer without a sales order
- Delete
Anomaly – A record cannot be deleted without deleting a record about a
related entity. Cannot delete a
sales order without deleting all of the customer’s information.
- Update
Anomaly – Cannot update information without changing information in many
places. To update customer
information, it must be updated for each sales order the customer has
placed
For
the Normalization process E.F. Codd (1972) developed one technique.
It often performed as a series of tests on a relation to determine
whether it satisfies or violates the requirements of a given normal form.
Functional
Dependencies
Ø Full Dependency
o
In a relation, the attribute(s) B is fully
functional dependent on A if B is functionally dependent on A,
o
but not on any proper subset of A.
Ø Partial Dependency
o
If there is some attribute that can be removed from
A and the dependency still holds.
o Eg. Staff_No, Sname -> Branch_No
Ø Transitive Dependency
o
In a relation, if attribute(s) A->B and B->C,
then C is transitively dependent on A via B (provided that A is not functionally
dependent on B or C)
o
Eg. Staff_No->Branch_No and
Branch_No->BAddress
The Process of
Normalization
Ø Normalization: Formal technique for analyzing a relation based on its
primary key and the functional dependencies between the attributes of that
relation. Ø Often executed as a series of steps. Each step corresponds to a specific
normal form, which has known properties.
Ø As normalization proceeds, the relations become progressively more
restricted (stronger) in format and also less vulnerable to update anomalies.
Here in below fig. there all normal form & relationship
between them is shown.
Unnormalized
Normal Form(UNF)
·
A table that contains one or more repeating groups.
To create an unnormalized table Transform the data from the information
source (e.g. form) into table format with columns
First Normal
Form
1NF - A
relation in which the intersection of each row and column contains one and only
one value.
Converting
from UNF to 1NF:
Ø Select attribute(s) to act as the key.
Ø Identify the repeating group(s) in the unnormalized table which repeats
for the key attribute(s).
Ø Remove the repeating group by
– Entering data into empty columns of rows which contain the repeating
data.
Ø Or by
–
Placing the repeating data along with a copy of the
original key attribute(s) into a separate relation.
Second Normal
Form
Based on the concept of full functional dependency.
A 2NF relation is in 1NF and every non-primary-key attribute is fully functionally
dependent on the primary key.
Converting from 1NF to 2NF:
Ø Identify the primary key for the 1NF relation.
Ø Identify the functional dependencies in the relation.
Ø If partial dependencies exist on the primary key remove them by placing
then in a new relation along with a copy of their determinant.
Third Normal
Form (3NF)
Based
on the concept of transitive dependency.
A
relation that is in 1NF and 2NF and in which no non-primary-key attribute is
transitively dependent on the primary key.
Converting
from 2NF to 3NF:
Ø Identify the primary key in the 2NF relation.
Ø Identify functional dependencies in the relation.
Ø If transitive dependencies exist on the primary key remove them by
placing them in a new relation along with a copy of their dominant.
Fourth Normal
Form
Fourth normal form
(4NF) is a normal form used in database normalization. Introduced by Ronald
Fagin in 1977, 4NF is the next level of
normalization after Boyce–Codd normal form(BCNF). Whereas the second,
third,
and Boyce–Codd
normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency
known as a multivalued dependency. A Table
is in 4NF if and only if,
for every one of its non-trivial multivalued dependencies X
Y,
X is a super-key—that
is, X is either a candidate
keyor a superset thereof.
Multivalued
dependencies
If the column headings in a relational database
table are divided into three disjoint groupings X, Y, and Z,
then, in the context of a particular row, we can refer to the data beneath each
group of headings as x, y, and z respectively. A multivalued
dependency X
Y signifies that if we choose any x actually
occurring in the table (call this choice xc), and compile a
list of all the xcyz combinations that occur in the table, we
will find that xc is associated with the same y
entries regardless of z.
A trivial
multivalued dependencyX
Y is one where either Y is a subset of X,
or X and Y together form the whole set of attributes of the
relation.
Fifth normal
form
(5NF)
Fifth normal form
(5NF), also known as Project-join normal form (PJ/NF) is a level of database normalization designed to reduce redundancy in relational databases
recording multi-valued facts by isolating semantically related multiple relationships. A table
is said to be in the 5NF if
and only if every join
dependency in it is implied by the candidate
keys.
A
join dependency
*{A, B, … Z} on R is implied by thecandidate
key(s) of Rif
and only if each of A, B, …, Z is a super-key for R.