Tuesday, April 19, 2011

Normalization

Normalization

Normalization is the process of efficiently organizing data in a database.
"Normalization" refers to the process of creating an efficient, reliable, flexible, and appropriate "relational" structure for storing information.

Goals:

  1. Eliminating redundant data.
  2. Ensuring data dependencies make sense. (only storing related data in a table).

Advantages:

  1. Cures the ‘SpreadSheet Syndrome’.
  2. Store only the minimal amount of information.
  3. Remove redundancies.
  4. Restructure Data.
  5. Decreased storage requirements.
  6. Faster search performance.
  7. Improved data integrity.

Types:

A series of logical steps to take to normalize data tables.
  1. First Normal Form (1NF)
  2. Second Normal Form (2NF)
  3. Third Normal Form (3NF)
  4. Boyce-Codd Normal Form (BCNF)
  5. Fourth Normal Form (4NF)
  6. Fifth Normal Form (5NF)
& many more(not in use).

First Normal Form

A table is considered to be in 1NF if all the fields contain only scalar values . i.e; All columns (fields) must be atomic : no repeating items in columns.
Here, Ord no & date are non-atomic attributes,(Not in 1 NF).
Non-atomic attributes are eliminited by filling with atomic values where there are non-atomic values are present.

First Normal Form Decomposition

  1. Place all items that appear in the repeating group in a new table.
  2. Each row must be unique - Designate a primary key for each new table produced.
  3. Duplicate in the new table the primary key of the table from which the repeating group was extracted or vice versa.
Before learning about ,You must be aware of Functional dependencies

Functional Dependencies:

If one set of attributes in a table determines another set of attributes in the table, then the second set of attributes is said to be functionally dependent on the first set of attributes.
Ex: Table Scheme: {ISBN, Title, Price}
Functional Dependencies: {ISBN}  {Title}
{ISBN}  {Price}

Second Normal Form

For a table to be in 2NF, there are two requirements
  1. If a data item is fully functionally dependent on only a part of the primary key, move that data item and that part of the primary key to a new table.
  2. If a data item is fully functionally dependent on only a part of the primary key, move that data item and that part of the primary key to a new table.
  3. If other data items are functionally dependent on the same part of the key, place them in the new table also.
Let us consider the above table
Let x={ ord no.,prod no.} -> y={date}, {prod name}, {Qty}, {U. price}.

Therexists, z ={ord no.} c x -> {date}

Therexists, z ={prod no.} c x -> {prod name}

This suffers from functional dependencies.
  1. To convert this relation into 2 NF, we must eliminate the partial functional dependencies by decomposing the relation into several no. of relations.
  2. We must create a separate relations for the attributes which causes the partial functional dependencies.

Third Normal Form

A relation R is in third normal form (3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key. An attribute C is transitively dependent on attribute A if there exists an attribute B such that: A ® B and B ® C.
Steps to convert 2 NF into 3 NF : we have unitCode as our primary key, we also have a courseName that is dependent on courseCode and courseCode, dependent on unitCode. Though couseName could be dependent on unitCode it more dependent on courseCode, therefore it is transitively dependent on unitCode.

BCNF (Boyce - Codd Normal Form)

A relation schema R is in BCNF if for every set of attributes
X in R and attribute A in R, X is a superkey whenever X A.
--- no attribute is transitively dependent on a key

R1=(Prof, S#) K1=(Prof S#)

R2=(Prof, C#) K2=(Prof)

--- R1 and R2 are both in BCNF

Fourth Normal Form

To be in Fourth Normal Form, a relation must first be in Boyce-Codd Normal Form. Additionally, a given relation may not contain more than one multivalued dependency.

Fifth Normal Form

A table is in fifth normal form (5NF) or Project-Join Normal Form (PJNF) if it is in 4NF and it cannot have a lossless decomposition into any number of smaller tables.

DKNF

The relation is in DKNF when there can be no insertion or deletion anomalies in the database.

2 comments:

  1. Nice blog. Very need full information regarding normalization. Regarding goals, advantages and types.

    Information about First Normalization, Second Normalization and third Normalization is very informative.

    Thanks for spending your time.

    ReplyDelete

What is Windows Presentation Foundation

The Windows Presentation Foundation is Microsoft’s next generation UI framework to create applications with a rich user experience. It is ...