Sunday, 24 July 2011

Database Normalization


Normalization is the process of efficiently organizing data in a database with eliminate redundant data (for example, storing the same data in more than one table) and  ensure data dependencies make sense (for example, only storing related data in a table).

Benefits of Normalization
·         Less storage space
·         Quicker updates
·         Less data inconsistency
·         Clearer data relationships
·         Easier to add data
·         Flexible Structure

Types of Normalization:

1NF, 2NF, 3NF, BCNF, 4NF, 5NF are examples of Normal forms.


Normalization

Normalization is the process of efficiently organizing data in a database with eliminate redundant data (for example, storing the same data in more than one table) and  ensure data dependencies make sense (for example, only storing related data in a table).

Benefits of Normalization
·         Less storage space
·         Quicker updates
·         Less data inconsistency
·         Clearer data relationships
·         Easier to add data
·         Flexible Structure

Types of Normalization:

1NF, 2NF, 3NF, BCNF, 4NF, 5NF are examples of Normal forms.

1NF (First Normal Form):

If all attribute values are atomic, ie, no repeating group, no composite attributes.

Table is not in 1NF

DPT_No
RG_NO
EMP_ID
EMP_NAME
D101
12345
20000
Peter


20001
Denny


20002
Parks
D102
13456
30000
Karlie


30001
James

Table in 1NF

DPT_No
 RG_NO
EMP_ID
EMP_NAME
D101
12345
20000
Peter
D101
12345
20001
Denny
D101
12345
20002
Parks
D102
13456
30000
Karlie
D102
13456
30001
James

All attribute values are atomic because there are no repeating group and no composite attributes.

2NF (Second Normal Form):

Second normal form (2NF) further addresses the concept of removing duplicative data:

A relation R is in 2NF if;

(i) R is 1NF , and
(ii) All non-prime attributes are fully dependent on the candidate keys. Which is creating relationships between these new tables and their predecessors through the use of foreign keys.

  • A prime attribute appears in a candidate key.
  • There is no partial dependency in 2NF.
Example: The following relation is in First Normal Form, but not Second Normal Form:
Order #
Customer
Contact Person
Total
1
Acme Widgets
John Doe
$134.23
2
ABC Corporation
Fred Flintstone
$521.24
3
Acme Widgets
John Doe
$1042.42
4
Acme Widgets
John Doe
$928.53
In the table above, the order number serves as the primary key.  Notice that  the customer and total amount are dependent upon the order number -- this data is specific to each order.  However, the contact person is dependent upon the customer.  An alternative way to accomplish this would be to create two tables:
Customer
Contact Person
Acme Widgets
John Doe
ABC Corporation
Fred Flintstone

Order #
Customer
Total
1
Acme Widgets
$134.23
2
ABC Corporation
$521.24
3
Acme Widgets
$1042.42
4
Acme Widgets
$928.53

The creation of two separate tables eliminates the dependency problem experienced in the previous case.  In the first table, contact person is dependent upon the primary key -- customer name.  The second table only includes the information unique to each order.  Someone interested in the contact person for each order could obtain this information by performing a JOIN operation.


3NF (Third Normal Form):
An entity is in Third Normal Form (3NF) when it meets the requirement of being in Second Normal Form (2NF) and additionally:
  • Functional dependencies on non-key fields are eliminated by putting them in a separate table. At this level, all non-key fields are dependent on the primary key.
A row is in third normal form if and only if it is in second normal form and if attributes that do not contribute to a description of the primary key are move into a separate table. An example is creating look-up tables.
Examples: The following table is NOT in Third Normal Form:
Company
City
State
ZIP
Acme Widgets
New York
NY
10169
ABC Corporation
Miami
FL
33196
XYZ, Inc.
Columbia
MD
21046
In this example, the city and state are dependent upon the ZIP code.  To place this table in 3NF, two separate tables would be created -- one containing the company name and ZIP code and the other containing city, state, ZIP code pairings.
Table 1:
Company
ZIP
Acme Widgets
10169
ABC Corporation
33196
XYZ, Inc.
21046
Table 2:
City
State
ZIP
New York
NY
10169
Miami
FL
33196
Columbia
MD
21046
This may seem overly complex for daily applications and indeed it may be.  Database designers should always keep in mind the tradeoffs between higher level normal forms and the resource issues that complexity creates.
More Details: 
BCNF  (Boyce-Codd Normal Form)
A table is in Boyce-Codd normal form (BCNF) if and only if it is in 3NF and every determinant is a candidate key. 
Example BCNF:-
Take the following table:

Campus
Course
Class
Time
Room/Bldg
East

English 101

1

8:00-9:00


212 AYE


East


English 101


2


10:00-11:00


305 RFK


West


English 101


3


8:00-9:00


102 PPR


Note : no two buildings on any of the university campuses have the same name, thus ROOM/BLDG----->CAMPUS. As the determinant is not a candidate key this table is NOT in Boyce-Codd normal form.
This table should be decomposed into the following relations:
R1(course, class, room/bldg, time)
Course
Class
Time
Room/Bldg
English 101

1


8:00-9:00


212 AYE

English 101

2


10:00-11:00


305 RFK

English 101

3


8:00-9:00


102 PPR


R2(room/bldg, campus)
campus
room/bldg

East


212 AYE


East


305 RFK


West


102 PPR


 

4NF (Fourth Normal Form):

Fourth normal form requires that a table be BCNF and contain no multi-valued dependencies. 4NF solves the problem of multivalued dependencies, which we would encounter with our above Sales table when a customer purchased multiple products with a single order - we would have numerous rows that were dependent on one another without any clear definition of that dependency in the table.

5NF (Fifth Normal Form):

Fifth normal form, also known as join-projection normal form (JPNF), states that no non-trivial join dependencies exist. 5NF states that any fact should be able to be reconstructed without any anomalous results in any case, regardless of the number of tables being joined. A 5NF table should have only candidate keys and it's primary key should consist of only a single column.




Thanks

No comments:

Post a Comment