LESSON: 2B
Objectives
In this section, you will learn to:
• Describe the top-down and bottom-up approaches
• Describe data redundancy
• Describe the first, second, and third normal forms
• Describe the Boyce-Codd Normal Form (BCNF)
• Appreciate the need for denormalization
Normalizing and Denormalizing Data Lesson 2B / Slide 1 of 18
©NIIT
INSTRUCTOR NOTES
Lesson Overview
The lesson introduces the top-down and bottom-up approaches of logical database design. This lesson also explains normalization as a technique to avoid data redundancy and covers the various normal forms. In addition, this lesson explains denormalization as a technique for improving query performance.
Pre-assessment Questions
1. In a scenario where a student can do only one project and no other student can do the same project, the relationship between student and project is a ________.
2. Which of the following options is true?
a. The primary key of the supertype is the primary key of the subtype.
b. The foreign key of the supertype is the primary key of the subtype.
c. The primary key of the supertype is the foreign key of the subtype.
d. The foreign key of the supertype is the foreign key of the subtype.
Pre-assessment Questions (Contd.)
3. A candidate key that does not become a primary key is called a(n) ________ key.
a. Candidate key
b. Foreign key
c. Alternate key
d. Composite key
4. Which of the following problems arises when a primary key is allowed NULL values?
a. It becomes difficult to identify the rows uniquely.
b. It becomes difficult to identify the columns uniquely.
c. It becomes difficult to join tables.
d. It becomes difficult to identify the foreign key.
5. In ________, every higher-level entity must also be a lower-level entity.
a. Generalization
b. E/R diagram
c. Specialization
d. Many-to-Many relationship
Solutions
Ans1 One-to-One
Ans2 The primary key of the supertype is the foreign key of the subtype.
Ans3 Alternate key
Ans4 It becomes difficult to identify the rows uniquely.
Ans5 Generalization
NORMALIZATION
Top-Down and Bottom-Up Approach
• There are two approaches to logical database design:
• The top-down approach
• The bottom-up approach
• The E/R modeling technique is the top-down approach. It involves identifying entities, relationships, and attributes, drawing the E/R diagram, and mapping the diagram to tables.
• Normalization is the bottom-up approach. It is a step-by-step decomposition of complex records into simple records.
• Normalization reduces redundancy using the principle of non-loss decomposition.
• Non-loss decomposition is the reduction of a table to smaller tables without any loss of information.
• The bottom-up approach is best for validation of existing designs.
In the previous sessions, we described logical database design using the entity-relationship diagramming technique. There are two approaches to logical database design:
• The top-down approach
• The bottom-up approach
The E/R modeling technique is the top-down approach. It involves identifying entities, relationships, and attributes, drawing the E/R diagram, and mapping the diagram to tables.
In this session, we will explain normalization, which is the bottom-up approach.
Normalization is a step-by-step decomposition of complex records into simple records. Normalization reduces redundancy using the principle of non-loss decomposition. Non-loss decomposition is the reduction of a table to smaller tables without any loss of information.
Very often, the process of normalization follows the process of drawing E/R diagrams. However, depending on how detailed and precise the E/R diagram is, the process of normalization may not be necessary at all. The tables derived from the E/R diagram may already be normalized. In fact, they will always be at least in the first normal form.
Persons strictly following the bottom-up approach do not go through the E/R modeling process at all. After the collection of data to be stored in the database is complete, the data is normalized. The bottom-up approach is best for validation of existing designs.
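The idea of non-loss decomposition can be sketched in a few lines of Python. This is only an illustration, not part of the courseware: the rows, the column order (StudentId, StudentName, Semester, Test1), and all variable names are assumptions made up for the example. The first split shares StudentId and rejoins to exactly the original rows; the second split shares only Semester and rejoins to a superset that contains spurious rows.

```python
# Hypothetical rows: (StudentId, StudentName, Semester, Test1)
student = {
    ("S1", "Ann", 1, 70),
    ("S1", "Ann", 2, 85),
    ("S2", "Raj", 1, 64),
}

# Split 1: two smaller tables that share the StudentId column.
details = {(sid, name) for sid, name, _, _ in student}
marks = {(sid, sem, t1) for sid, _, sem, t1 in student}

# Joining on StudentId rebuilds the rows.
rejoined = {
    (sid, name, sem, t1)
    for sid, name in details
    for marks_sid, sem, t1 in marks
    if sid == marks_sid
}
is_non_loss = rejoined == student

# Split 2: two tables that share only the Semester column.
bad_a = {(sid, name, sem) for sid, name, sem, _ in student}
bad_b = {(sem, t1) for _, _, sem, t1 in student}

# Joining on Semester pairs every student with every score of that semester.
bad_rejoined = {
    (sid, name, sem, t1)
    for sid, name, sem in bad_a
    for b_sem, t1 in bad_b
    if sem == b_sem
}
has_spurious_rows = student < bad_rejoined  # strict superset of the original

print(is_non_loss, has_spurious_rows, len(bad_rejoined))
```

In the first split each student name is stored once instead of once per semester, which is where the reduction in redundancy comes from.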
Data Redundancy
• Redundancy means repetition of data.
• Redundancy increases the time involved in updating, adding, and deleting data.
• Redundancy also increases the utilization of disk space, and hence, disk I/O increases.
• Redundancy can lead to:
• Update anomalies: Inserting, modifying, and deleting data may cause inconsistencies.
• Inconsistencies: Errors are more likely to occur when facts are repeated.
• Unnecessary utilization of extra disk space.
Redundancy means repetition of data. Redundancy increases the time involved in updating, adding, and deleting data. It also increases the utilization of disk space and hence, disk I/O increases.
For example, consider the structure of the Student table:
Student
StudentId StudentName StudentBirthdate StudentAddress StudentCity StudentZip StudentClass StudentSemester StudentTest1 StudentTest2
The sample data for the Student table would be:
The details of the students along with the marks are present in one table called Student. The details of the students like StudentId, StudentName, and StudentAddress are repeated while recording marks of different semesters. The repeated data is redundant. In addition, if you need to modify the address of a student, it has to be modified in multiple rows for that student. If not done, it could lead to data inconsistency across rows.
If there are one thousand students and the details for each student occupy two hundred bytes, then two hundred thousand bytes (1,000 × 200) are repeated. Hence, a lot of disk space is used up unnecessarily.
Redundancy can, therefore, lead to:
• Update anomalies: Inserting, modifying, and deleting data may cause inconsistencies.
• Inconsistencies: Errors are more likely to occur when facts are repeated.
• Unnecessary utilization of extra disk space.
You can use your experience and common sense to design a database. However, you can use systematic approaches like normalization to reduce redundancy and duplication.
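The address problem described for the Student table can be reproduced with a small sketch that uses Python's built-in sqlite3 module. Only a subset of the Student columns is used, and every row value is an assumption invented for the illustration, since the sample data is not reproduced here. One of the two rows for student 001 is updated, and the table then holds two different addresses for the same student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student ("
    "StudentId TEXT, StudentName TEXT, StudentAddress TEXT, "
    "StudentSemester TEXT, StudentTest1 INTEGER, StudentTest2 INTEGER)"
)

# Hypothetical sample rows: student 001 appears once per semester.
rows = [
    ("001", "Mary", "2 Oak Street", "SEM-1", 65, 70),
    ("001", "Mary", "2 Oak Street", "SEM-2", 76, 80),
    ("002", "Jake", "9 Elm Road", "SEM-1", 85, 60),
]
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?, ?, ?)", rows)

# Partial update: only the SEM-1 row receives the new address.
conn.execute(
    "UPDATE Student SET StudentAddress = ? "
    "WHERE StudentId = ? AND StudentSemester = ?",
    ("7 Pine Avenue", "001", "SEM-1"),
)

# The same student now has two conflicting addresses.
addresses = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT StudentAddress FROM Student "
        "WHERE StudentId = ? ORDER BY StudentAddress",
        ("001",),
    )
]
print(addresses)
```

If the address were stored once in a separate student details table, a single-row update would be enough and this inconsistency could not arise.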
Need for Normalization
• Normalization is a scientific method of breaking down complex table structures into simple table structures by using certain rules.
• Using normalization, you can reduce redundancy in a table and eliminate the problems of inconsistency and disk space usage.
• You can also ensure that there is no loss of information.
• Normalization has several benefits as follows:
• It enables faster sorting and index creation.
• It helps to create more clustered indexes.
• It requires fewer indexes per table.
• It reduces the number of NULL values in a table.
• It makes the database compact.
Need for Normalization (Contd.)
• The performance of an application is directly linked to the database design.
• Some rules that should be followed to achieve a good database design are:
• Each table should have an identifier.
• Each table should store data for a single type of entity.
• Columns that accept NULLs should be avoided.
• The repetition of values or columns should be avoided.
Normalization is a scientific method of breaking down complex table structures into simple table structures by using certain rules. Using this method, you can reduce redundancy in a table and eliminate the problems of inconsistency and disk space usage. You can also ensure that there is no loss of information.
Normalization has several benefits. It enables faster sorting and index creation, more clustered indexes, fewer indexes per table, and fewer NULLs, and it makes the database compact. Normalization helps to simplify the structure of tables. The performance of an application is directly linked to the database design. A poor design hinders the performance of the system. The logical design of the database lays the foundation for an optimal database.
Some rules that should be followed to achieve a good database design are:
Each table should have an identifier
Each table should store data for a single type of entity
Columns that accept NULLs should be avoided
The repetition of values or columns should be avoided
INSTRUCTOR NOTES
Normalization
First, ask the students the following question to elicit their understanding of the approaches to logical database design:
What are the various database design approaches that you can think of?
You can give the following additional information about non-loss decomposition:
Non-loss decomposition: Non-loss decomposition ensures that a join produces an exact copy of the original table. A decomposition is not non-loss if a join produces a superset of the original table and the rows in the table cannot be uniquely identified.
You can give the following additional information about data redundancy:
Data redundancy results in wastage of space and loss of data integrity in the database. Data redundancy leads to three types of anomalies. These are:
1. Update anomaly: This is data inconsistency resulting from data redundancy and partial update.
2. Deletion anomaly: This is the unintended loss of data due to deletion of other data.
3. Insertion anomaly: This is the inability to add data to the database due to the absence of other data.
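The insertion and deletion anomalies listed above can be demonstrated to students with a short sketch that uses Python's built-in sqlite3 module. The table layout is an assumption chosen for the demonstration: a single Student table whose key is the combination of StudentId and StudentSemester, with made-up rows. A new student cannot be added before any marks exist, and deleting a student's only marks row also erases the student's name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table design: marks and student details stored together.
conn.execute(
    "CREATE TABLE Student ("
    "StudentId TEXT NOT NULL, StudentName TEXT, "
    "StudentSemester TEXT NOT NULL, StudentTest1 INTEGER, "
    "PRIMARY KEY (StudentId, StudentSemester))"
)
conn.execute("INSERT INTO Student VALUES ('002', 'Jake', 'SEM-1', 85)")

# Insertion anomaly: a student with no semester data yet cannot be stored,
# because part of the key would be missing.
try:
    conn.execute(
        "INSERT INTO Student (StudentId, StudentName) VALUES ('003', 'Lena')"
    )
    insert_blocked = False
except sqlite3.IntegrityError:
    insert_blocked = True

# Deletion anomaly: removing the only marks row for student 002
# also removes every trace of the student's name.
conn.execute(
    "DELETE FROM Student WHERE StudentId = '002' AND StudentSemester = 'SEM-1'"
)
remaining = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE StudentId = '002'"
).fetchone()[0]

print(insert_blocked, remaining)
```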
DIFFERENT NORMAL FORMS AND
• A table structure is always in a certain normal form.
• The most important and widely used normal forms are:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
Normalization results in the formation of tables that satisfy certain specified rules and represent certain normal forms. The normal forms are used to ensure that various types of anomalies and inconsistencies are not introduced in the database. A table structure is always in a certain normal form. Several normal forms have been identified. The most important and widely used normal forms are:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
be in second normal form or third normal form
• In a relation R, attribute A is functionally dependent on attribute B if each value of B in R is associated with precisely one value of A.
• Attribute B is called the determinant.
• All attributes of a table must be functionally dependent on the key. However, functional dependency does not require an attribute to be the key in order to functionally determine other attributes.
• Functional dependency can also be defined as follows: Given a relation R, attribute A is functionally dependent on B if and only if, whenever two tuples of R agree on their B value, they also agree on their A value.
• Functional dependencies represent many-to-one relationships.
The normalization theory is based on the fundamental notion of functional dependency. First, let’s examine the concept of functional dependency.
Given a relation R (you may recall that a table is also called a relation), attribute A is functionally dependent on attribute B if each value of B in R is associated with precisely one value of A.
In other words, attribute A is functionally dependent on B if and only if, for each value of B, there is exactly one value of A.
Attribute B is called the determinant.
Consider the following table Employee:
Employee
Given a particular value of Code, there is precisely one corresponding value for Name. For example, for Code E1 there is exactly one value of Name, Mac. Hence, Name is functionally dependent on Code. Similarly, there is exactly one value of City for each value of Code. Hence, the attribute City is functionally dependent on the attribute Code. The attribute Code is the determinant. You can also say that Code determines City and Name.
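The dependency of Name and City on Code can be checked mechanically. The following Python sketch is an illustration only: the helper name `holds` is hypothetical, and apart from the pair E1 and Mac, all row values are assumptions. The function returns False as soon as one determinant value is found with two different dependent values.

```python
def holds(rows, determinant, dependent):
    """Return True if `determinant` functionally determines `dependent`
    in the given rows (each row is a dict of attribute -> value)."""
    seen = {}
    for row in rows:
        key = row[determinant]
        value = row[dependent]
        # setdefault stores the first value seen for this key and returns it;
        # a later, different value for the same key violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical Employee rows (only E1 / Mac comes from the text).
employee = [
    {"Code": "E1", "Name": "Mac", "City": "Delhi"},
    {"Code": "E2", "Name": "Sandra", "City": "Mumbai"},
    {"Code": "E3", "Name": "Henry", "City": "Delhi"},
]

code_determines_name = holds(employee, "Code", "Name")
code_determines_city = holds(employee, "Code", "City")
city_determines_code = holds(employee, "City", "Code")

print(code_determines_name, code_determines_city, city_determines_code)
```

A check like this can only refute a dependency on the rows at hand. Whether a dependency really holds is a rule about the data that has to come from the requirements, not from one sample.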
Now that we know something about functional dependencies, let us redefine the concept of keys in terms of functional dependencies. In the above example of the entity EMPLOYEE, the attribute Code will be unique in every tuple. Hence, it is a candidate key. All attributes must be functionally dependent on the key.
However, functional dependency does not require an attribute to be the key in order to functionally determine other attributes. The following example explains this.
Suppose you need to store information about scores of students for distance education programs The attributes that you need to store are:
ID : The identity codes of the students to whom the scorecard is sent
CITY : The city to which the scorecard is sent
C_CODE : The course code for which the scorecard is sent
SCORE : The total score of the student
Note that the city to which the scorecard is sent is also the city where the student is located. Therefore, ID functionally determines CITY. But ID is not a candidate key. The candidate key, in this case, will be a combination of ID and C_CODE. Attributes ID and C_CODE are the foreign keys that reference the tables that store the student and course information respectively. Therefore, even though ID is not a candidate key, it still functionally determines another attribute (CITY).
Assume that the information about scores was stored in a table named SCORES_INFO as shown in the following figure. Notice that for a particular value of ID, the value of CITY is the same in every tuple.
This constraint must be observed in the database.
Also, note that functional dependencies represent many-to-one relationships. This functional dependency (ID determines CITY) also means that there are many students located in a city, but one student is located in only one city.
We will soon see that the concepts of normalization lead to very simple means of declaring such functional dependencies.
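The SCORES_INFO situation can be sketched in Python to show that an attribute may determine another attribute without being a key. The rows below are assumptions invented for the illustration, and the helper names `determines` and `is_unique` are hypothetical. ID determines CITY, ID alone repeats across rows, and the combination of ID and C_CODE is unique in the sample.

```python
# Hypothetical SCORES_INFO rows.
scores_info = [
    {"ID": 1, "CITY": "Delhi", "C_CODE": "C1", "SCORE": 80},
    {"ID": 1, "CITY": "Delhi", "C_CODE": "C2", "SCORE": 72},
    {"ID": 2, "CITY": "Pune", "C_CODE": "C1", "SCORE": 91},
    {"ID": 3, "CITY": "Delhi", "C_CODE": "C3", "SCORE": 65},
]


def determines(rows, lhs, rhs):
    """True if rows that agree on the `lhs` attributes also agree on `rhs`."""
    mapping = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if mapping.setdefault(key, value) != value:
            return False
    return True


def is_unique(rows, attrs):
    """True if no two rows share the same values for `attrs`."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(keys) == len(set(keys))


id_determines_city = determines(scores_info, ["ID"], ["CITY"])
city_determines_id = determines(scores_info, ["CITY"], ["ID"])
id_is_unique = is_unique(scores_info, ["ID"])
id_course_is_unique = is_unique(scores_info, ["ID", "C_CODE"])

print(id_determines_city, city_determines_id, id_is_unique, id_course_is_unique)
```

The result for CITY and ID reflects the many-to-one nature of the dependency: several students share Delhi, so CITY does not determine ID.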
First Normal Form (1NF)
• A table is said to be in 1NF when each cell of the table contains precisely one value.
A table is said to be in 1NF when each cell of the table contains precisely one value.
Consider the following table Project.
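Project
[Table garbled in the source. Surviving cell fragments: ProjCode values P51, P20, P22, and P27; Hours values 90, 101, 60, 109, 98, NULL, and 72.]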
The data in the table is not normalized because the cells in ProjCode and Hours have more than one value.
By applying the 1NF definition to the Project table, you arrive at the following table:
Project
The relational model does not permit tables that are unnormalized. Therefore, the tables obtained from the E/R diagram should at least be in 1NF.
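Applying the 1NF definition amounts to giving every value in a multi-valued cell a row of its own. The Python sketch below illustrates this under stated assumptions: the ECode column, the row values, and the function name `to_1nf` are hypothetical, and each ProjCode entry is assumed to pair with the Hours entry at the same position.

```python
# Hypothetical unnormalized rows: ProjCode and Hours hold several values per cell.
unnormalized = [
    {"ECode": "E1", "ProjCode": ["P27", "P51"], "Hours": [90, 101]},
    {"ECode": "E2", "ProjCode": ["P22"], "Hours": [98]},
]


def to_1nf(rows):
    """Emit one row per (ProjCode, Hours) pair so that every cell is atomic."""
    flat = []
    for row in rows:
        for proj, hours in zip(row["ProjCode"], row["Hours"]):
            flat.append({"ECode": row["ECode"], "ProjCode": proj, "Hours": hours})
    return flat


project_1nf = to_1nf(unnormalized)
for row in project_1nf:
    print(row)
```

The employee code is repeated on every resulting row. This is the kind of redundancy that the higher normal forms then address.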