IMPROVING RELATIONAL SCHEMAS AND NORMALIZATION

 Relation

A functional dependency is a constraint between two attributes or two sets of attributes.  For any relation R, attribute B is functionally dependent on attribute A if, for every valid instance of A, that value of A uniquely determines the value of B.  The functional dependency of B on A is represented by an arrow, A -> B.  An attribute may be functionally dependent on two or more attributes, rather than on a single attribute.  The attribute on the left-hand side of the arrow in a functional dependency is called the determinant.  A candidate key is an attribute, or combination of attributes, that uniquely identifies a row in a relation.  The properties are:  Unique identification (which implies that each nonkey attribute is functionally dependent on that key) and nonredundacy (No attribute in the key can be deleted without destroying the property of unique identification).

THE BASIC NORMAL FORMS

A relation is in first normal form (1NF) if it contains no multivalued attributes, that is the value at the intersection of each row and column must be atomic.

A relation is in second normal form (2NF) if it is in the first normal form and every nonkey attribute is functionally dependent on the primary key.  Thus no nonkey attribute is functionally dependent on part (but not all) of the primary key.  A relation that is in the first normal form will be in the second normal form if:
1.    The primary key consists of one attribute.
2.    No nonkey attributes exist in the relation (thus all of the attributes in the relation are components of the primary key).
3.    Every nonkey attribute is functionally dependent on the full set of primary key attributes.

A partial functional dependency is a functional dependency in which one or more nonkey attributes are functionally dependent on part (but not all) of the primary key.

A relation is in third normal form (3NF) if it is in the second normal form and no transitive dependencies exist.  A transitive dependency in a relation is a functional dependency between two (or more) nonkey attributes.  Example:

SALES
     
Cust_ID
Name
Salesperson
Region
8023
Anderson
Smith
South
9167
Bancroft
Hicks
West
7924
Hobbs
Smith
South
6837
Tucker
Hernandez
East
8597
Eckersley
Hicks
West
7018
Arnold
Faulb
North
Insertion Anomaly:    A new salesperdson (Robertson) assigned to the North region cannot be entered until a customer has been assigned to that salesperson (Cust_ID is a primary key).
Deletion Anomaly:    If customer number 6837 is deleted from the table, we lose the information that salesperson Hernandez is assigned to the East region.
Modification Anomaly:    If salesperson Smith is reassigned to the East region, several rows must be changed to reflect that fact.
These anomalies arise as a result of transitive dependency, which can be removed by decomposing SALES into two relations.  Salesperson, which is the determinant in the transitive dependency in SALES, becomes the primary key in SPERSON.  Salesperson becomes a foreign key in SALES1.
SALES1
     
SPERSON
 
Cust_ID
Name
Salesperson
 
Salesperson
Region
8023
Anderson
Smith
 
Smith
South
9167
Bancroft
Hicks
 
Hicks
West
7924
Hobbs
Smith
 
Hernandez
East
6837
Tucker
Hernandez
 
 Faulb
North
8596
Eckersley
Hicks
     
7018
Arnold
Faulb
     
Another example of transitive Dependency.
SHIPMENT
     
Snum
Origin
Destination
Distance
409
Seattle
Denver
1,537
618
Chicago
Dallas
1,058
723
Boston
Atlanta
1,214
824
Denver
Los Angeles
1,150
629
Minneapolis
St. Louis
587
The primary key is Snum, so it is second normal form.  But there is a transitive dependency:  the Distance attribute is functionally dependent on the pair of nonkey attributes Origin and Destination.  We cab remove the transitive dependency in SHIPMENT by decomposing into two relations both in 3NF SHIPTO and DISTANCES.
SHIPTO
     
DISTANCES
   
Snum
Origin
Destination
 
Origin
Destination
Distance
409
Seattle
Denver
 
Seattle
Denver
1,537
618
Chicago
Dallas
 
Chicago
Dallas
1,058
723
Boston
Atlanta
 
Boston
Atlanta
1,214
824
Denver
Los Angeles
 
Denver
Los Angeles
1,150
629
Minneapolis
St. Louis
 
Minneapolis
St. Louis
587

For more information on Normalization, click here:   Normalization    or  Normalization   or   Normalization

Steps in Normalization

1.    First Normal form.  Any multivalued attributes have been removed, so there is a single value (possibly null) at the intersection of each row and column.
2.    Second normal form.    Any partial dependencies have been removed
3.    Third normal form.    Any transitive dependencies have been removed.
4.    Boyce/Codd normal form.    Any remaining anomalies that result from functional dependencies have been removed.
5.    Fourth normal form.    Any multivalued dependencies have been removed.
6.    Fifth normal form.    Any remaining anomalies have been removed.
Table with multiplied attributes
                      |--------------- Remove multivalued attributes
First normal form
                      |--------------- Remove partial dependencies
Second normal form
                      |--------------- Remove transitive dependencies
Third normal form
                      |--------------- Remove remaining anomalies resulting from functional dependencies
Boyce-Codd normal form
                      |--------------- Remove multivalued dependencies
Fourth normal form
                      |--------------- Remove remaining anomalies
Fifth normal form
 .1st Normal Form (1NF)

Definition: A table (relation) is in 1NF if

     1. There are no duplicated rows in the table.

     2. Each cell is single-valued (i.e., there are no repeating groups or arrays).

     3. Entries in a column (attribute, field) are of the same kind.

Note: The order of the rows is immaterial; the order of the columns is immaterial.

Note: The requirement that there be no duplicated rows in the table means that the table has a key (although the key might be
made up of more than one column—even, possibly, of all the columns).

2nd Normal Form (2NF)

Definition: A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all of the key.

Note: Since a partial dependency occurs when a non-key attribute is dependent on only a part of the (composite) key, the
definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and if it has no partial dependencies."

3rd Normal Form (3NF)

Definition: A table is in 3NF if it is in 2NF and if it has no transitive dependencies.

Boyce-Codd Normal Form (BCNF)

Definition: A table is in BCNF if it is in 3NF and if every determinant is a candidate key.

4th Normal Form (4NF)

Definition: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.

5th Normal Form (5NF)

Definition: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if every join
dependency in the table is a consequence of the candidate keys of the table.



A relation is a named, two-dimensional table of data.  Each relation (or table) consists of a set of columns and an arbitrary number of unnamed rows.  An attribute is a named column of a relation.  Each row of a relation corresponds to a record that contains data (attribute) values for a single entity.
A well-structured relation contains minimal redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies.  EMPLOYEE1 is such a relation.  Each row of the table contains data describing one employee, and any modification to an employee's data (such as a change in salary) is confined to one row of the table.:
EMPLOYEE1
     
Emp_ID
Name
Dept_Name
Salary
100
Margaret Simpson
Marketing
48,000
140
Allen Beeton
Accounting
52,000
110
Chris Lucero
Info Systems
43,00
190
Lorenzo Davis
Finance
55,000
150
Susan Martin
Marketing
42,000
Removing Multivalued Attributes from Tables.  Table below shows employee data from EMPLOYEE1 relation to include courses that may have been taken by these employees.
Emp_ID
Name
Dept_Name
Salary
Course_Title
Date_Completed
100
Margaret Simpson
Marketing
48,000
SPSS
6/19/199X
       
Surveys
10/7/199X
140
Allen Beeton
Accounting
52,000
Tax Acc
12/8/199X
110
Chris Lucero
Info Systems
43,000
SPSS
1/12/199X
 
     
C++
4/22/199X
190
Lorenzo Davis
Finance
55,000
   
150
Susan Martin
Marketing
42,000
SPSS
6/16/199X
       
Java
8/12/199X
Since a given employee may have taken more than oe course, the attributes Course_Title and Date_Completed are multivalued attributes.  If an employee has not taken any course, these attribute values are null.  One way to eliminate multivalued attributes is by filling the relevant data values into the vacant cells to have only single-valued attributes and is atomic, However, EMPLOYEE2 is not a well structured relation, as there is redundancy.  Values of Emp_ID, Name, Dept_Name, and Salary appear in two separate rows for employees 100, 110, and 150.  So, if the salary for employee 100 changes, we must record this fact in two or more rows.  Redundancies in a table may result in errors or inconsistencies (called anomalies) when a user attempts to update the data in the table.  Three types of anomalies are possible: insertion, deletion, and modification
EMPLOYEE2
         
Emp_ID
Name
Dept_Name
Salary
Course_Title
Date_Completed
100
Margaret Simpson
Marketing
48,000
SPSS
6/19/199X
100
Margaret Simpson
Marketing
48,000
Surveys
10/7/199X
140
Allen Beeton
Accounting
52,000
Tax Acc
12/8/199X
110
Chris Lucero
Info Systems
43,000
SPSS
1/12/199X
110
Chris Lucero
Info Systems
43,000
C++
4/22/199X
190
Lorenzo Davis
Finance
55,000
   
150
Susan Martin
Marketing
42,000
SPSS
6/19/199X
150
Susan Martin
Marketing
42,000
Java
8/12/199X
Insertion Anomaly:    We need to add a new employee to EWMPLOYEE2.  The two primary keys are Emp_ID and Course_Title, and this is an anomaly since the user should be able to enter employee data without supplying course data.
Deletion Anomaly:    If employee 140 is deleted, the fact that he took a course in Tax accounting is lost.
Modification Anomaly:    If there is a salary increase for employee 100 or 110 or 150, the fact must be recorded in two rows.
Normalization theory is used to divide EMPLOYEE2 into two relations, one is EMPLOYEE! and the other is EMP_COURSE shown below:
EMP_COURSE
   
Emp_ID
Course_Title
Date_Completed
100
SPSS
6/19/199X
100
Surveys
10/7/199X
140
Tax Acc
12/8/199X
110
SPSS
1/12/199X
110
C++
4/22/199X
150
SPSS
6/19/199X
150
Java
8/12/199X
The primary key of EMP_COURSE is the combination of Emp_ID and Course_Title.  These tables are free of anomalies and thus are well-structured.
PROPERTY_INSPECTION
           
Property_No.
P_Address
IDate
ITime
Comments
Staff_No
SName
Car_Reg
PG4
6 Lawrence St., Glasgow
18 Oct.96
22 Apr 97
1 Oct 98
10:00
9:00
12:00
Need crockery
In good order
Bathroom
SG37
SG17
SG17
Beech
Ford
Ford
NZ31JGR
NS33HDR
HTZ1HFR
PG16
5 Haizer Dr., Glasgow
22 Apr 96
24 Oct 97
13:00
14:00
Replace living
Good Condition
SG17
SG37
Ford
Beech
NS33HDR
HTZ1HFR
There are multivalued Attributes, hence not in compliance with 1NF.  Remove multivalued attributes.
PROPERTY_INSPECTION
           
Propety_No
IDate
ITime
P_Address
Comments
Staff_No
SName
Car_Reg
PG4
18 Oct 96
10:00
6 Lawrence St. Glasgow
Need Crockery
SG37
Beech
NZ31JGR
PG4
22 Apr 97
9:00
6 Lawrence St Glasgow
In good order
SG17
Ford
NS33HDR
PG4
1 Oct 98
12:00
6 Lawrence St Glasgow
Bathroom
SG17
Ford
HTZ1HFR
PG16
22 Apr 96
13:00
5 Haizier Dr Glasgow
Replace living
SG17
Ford
NS33HDR
PG16
24 Oct 97
14:00
5 Haizier Dr. Glasgow
Good Condition
SG37
Beech
HTZ1HFR
There are no multivalued attributes, compliance with 1NF.   Property_No and IDate are both primary keys since the same property cannot be visited twice on the same date. But there is redundancy which cause insertion, deletion, and modification anomalies.  This can be decomposed into two tables, PROP and PROP_INSPECTION.
PROP
     
PROP_INSPECTION  
                 
Property_No
P_Address
 
Property_No
IDate
ITime
Comments
Staff_No SName Car_Reg
PG4
6 Lawrence ST Glasgow  
PG4
18 Oct 96
10:00
Need Crockery
SG37
Beech
NZ31JGR
PG16
5 Haizier Dr Glasgow
 
PG4
22 Apr 97
9:00
In good order
SG17
Ford
NS33HDR
 
   
PG4
1 Oct 98
12:00
Bathroom
SG17
Ford
HTZ1HFR
     
PG16
22 Apr 96
13:00
Replace Living
SG17
Ford
NS33HDR
     
PG6
24 Oct 97
14:00
Good Condition
SG37
Beech
HTZ1HFR