NORMALIZING AND DENORMALIZING DATA


LESSON: 2B

Objectives

In this section, you will learn to:

Describe the Top-down and Bottom-up approach

Describe data redundancy

Describe the first, second, and third normal forms

Describe the Boyce-Codd Normal Form (BCNF)

Appreciate the need for denormalization


Normalizing and Denormalizing Data Lesson 2B / Slide 1 of 18

©NIIT


INSTRUCTOR NOTES

Lesson Overview

The lesson introduces the top-down and bottom-up approaches of logical database design. This lesson also explains normalization as a technique to avoid data redundancy and covers the various normal forms. In addition, this lesson explains denormalization as a technique for improving query performance.


Pre-assessment Questions

1. In a scenario where a student can do only one project and no other student can do the same project, the relationship between student and project is a ________.

2. Which of the following options is true?

a. The primary key of the supertype is the primary key of the subtype.

b. The foreign key of the supertype is the primary key of the subtype.

c. The primary key of the supertype is the foreign key of the subtype.

d. The foreign key of the supertype is the foreign key of the subtype.


Pre-assessment Questions (Contd.)

3. A candidate key that does not become a primary key is called a(n) ________ key.

a. Candidate key

b. Foreign key

c. Alternate key

d. Composite key

4. Which of the following problems arise when a primary key is allowed NULL values?

a. It becomes difficult to identify the rows uniquely.

b. It becomes difficult to identify the columns uniquely.

c. It becomes difficult to join tables.

d. It becomes difficult to identify foreign key.

5. In ________, every higher-level entity must also be a lower-level entity.

a. Generalization

b. E/R diagram

c. Specialization

d. Many-to-Many relationship


Solutions

Ans 1. One-to-One

Ans 2. The primary key of the supertype is the foreign key of the subtype.

Ans 3. Alternate key

Ans 4. It becomes difficult to identify the rows uniquely.

Ans 5. Generalization


NORMALIZATION

Top-Down and Bottom-Up Approach


• There are two approaches to logical database design:

• The top-down approach

• The bottom-up approach

• The E/R modeling technique is the top-down approach. It involves identifying entities, relationships and attributes, drawing the E/R diagram, and mapping the diagram to tables.

• Normalization is the bottom-up approach It is a step-by-step decomposition of complex records into simple records.

• Normalization reduces redundancy using the principle of non-loss decomposition.

• Non-loss decomposition is the reduction of a table to smaller tables without any loss of information.

• The bottom-up approach is best for validation of existing designs

In the previous sessions, we described logical database design using the entity-relationship diagramming technique. There are two approaches to logical database design:

The top-down approach

The bottom-up approach


The E/R modeling technique is the top-down approach. It involves identifying entities, relationships and attributes, drawing the E/R diagram, and mapping the diagram to tables.

In this session, we will explain Normalization, which is the bottom-up approach.

Normalization is a step-by-step decomposition of complex records into simple records. Normalization reduces redundancy using the principle of non-loss decomposition. Non-loss decomposition is the reduction of a table to smaller tables without any loss of information.
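The idea of non-loss decomposition can be sketched in a few lines of Python. This is only an illustration: the rows are hypothetical, and tables are modeled as plain sets of tuples rather than as database tables. The first decomposition splits on StudentId and can be joined back to the original. The second splits on the semester column and produces extra rows when joined, so it is not non-loss.

```python
# Hypothetical rows: (StudentId, StudentName, StudentSemester, StudentTest1)
student = {
    ("S1", "Ann", 1, 78),
    ("S1", "Ann", 2, 85),
    ("S2", "Bob", 1, 64),
}

# Decomposition 1: two smaller tables that share the StudentId column.
details = {(sid, name) for sid, name, _, _ in student}
marks = {(sid, sem, test1) for sid, _, sem, test1 in student}

# Joining on StudentId rebuilds exactly the original rows.
rejoined = {
    (sid, name, sem, test1)
    for sid, name in details
    for marks_sid, sem, test1 in marks
    if sid == marks_sid
}
is_non_loss = rejoined == student

# Decomposition 2: the shared column is StudentSemester instead.
enrolled = {(sid, name, sem) for sid, name, sem, _ in student}
results = {(sem, test1) for _, _, sem, test1 in student}

# Joining on StudentSemester yields a superset with spurious rows.
lossy_join = {
    (sid, name, sem, test1)
    for sid, name, sem in enrolled
    for results_sem, test1 in results
    if sem == results_sem
}
has_spurious_rows = lossy_join > student

print(is_non_loss, has_spurious_rows)
```

The second join pairs every semester 1 student with every semester 1 score, which is why the original table can no longer be recovered from the two smaller ones.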

Very often, the process of normalization follows the process of drawing E/R diagrams. However, depending on how detailed and precise the E/R diagram is, the process of normalization may not be necessary at all. The tables derived from the E/R diagram may already be normalized. In fact, they will always be at least in the first normal form.

Persons strictly following the bottom-up approach do not go through the E/R modeling process at all. After the collection of data to be stored in the database is complete, the data is normalized. The bottom-up approach is best for validation of existing designs.

Data Redundancy


• Redundancy means repetition of data.

• Redundancy increases the time involved in updating, adding, and deleting data.

• Redundancy also increases the utilization of disk space, and hence, disk I/O increases.

• Redundancy can lead to:

• Update anomalies—Inserting, modifying, and deleting data may cause inconsistencies

• Inconsistencies—Errors are more likely to occur when facts are repeated

• Unnecessary utilization of extra disk space


Redundancy means repetition of data. Redundancy increases the time involved in updating, adding, and deleting data. It also increases the utilization of disk space and hence, disk I/O increases.

For example, consider the structure of the Student table:

Student

StudentId StudentName StudentBirthdate StudentAddress StudentCity StudentZip StudentClass StudentSemester StudentTest1 StudentTest2

The sample data for the Student table would be:

The details of the students along with the marks are present in one table called Student. The details of the students, like StudentId, StudentName, and StudentAddress, are repeated while recording marks of different semesters. The repeated data is redundant. In addition, if you need to modify the address of a student, it has to be modified in every row for that student. If this is not done, it could lead to data inconsistency across rows.
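The address problem described above can be sketched as follows. The rows and values are hypothetical, and the table is modeled as a list of dictionaries; the point is only that a partial update leaves two different addresses on record for the same student.

```python
# Hypothetical rows of the single Student table: one row per semester.
rows = [
    {"StudentId": "S1", "StudentAddress": "12 Oak St", "StudentSemester": 1},
    {"StudentId": "S1", "StudentAddress": "12 Oak St", "StudentSemester": 2},
]

# Partial update: the address is changed in only one of the two rows.
rows[0]["StudentAddress"] = "9 Elm Rd"


def addresses_for(student_id, table):
    """Return the distinct addresses stored for one student."""
    return {r["StudentAddress"] for r in table if r["StudentId"] == student_id}


conflicting = addresses_for("S1", rows)
print(len(conflicting) > 1)  # True when the rows disagree about the address
```

If the address were stored once, in a separate table keyed by StudentId, a single update would be enough and this disagreement could not arise.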


If there are one thousand students and the details for each student occupy two hundred bytes, then two hundred thousand bytes are repeated. Hence, a lot of disk space is used up unnecessarily.

Redundancy can, therefore, lead to:

Update anomalies: Inserting, modifying, and deleting data may cause inconsistencies

Inconsistencies: Errors are more likely to occur when facts are repeated

Unnecessary utilization of extra disk space

You can use your experience and common sense to design a database. However, you can also use systematic approaches like normalization to reduce redundancy and duplication.

Need for Normalization


• Normalization is a scientific method of breaking down complex table structures into simple table structures by using certain rules.

• Using normalization, you can reduce redundancy in a table and eliminate the problems of inconsistency and disk space usage.

• You can also ensure that there is no loss of information

• Normalization has several benefits as follows:

• It enables faster sorting and index creation.

• It helps to create more clustered indexes.

• It requires few indexes per table.

• It reduces the number of NULL values in a table.

• It makes the database compact.


Need for Normalization (Contd.)

• The performance of an application is directly linked to the database design

• Some rules that should be followed to achieve a good database design are:

• Each table should have an identifier.

• Each table should store data for a single type of entity.

• Columns that accept NULLs should be avoided.

• The repetition of values or columns should be avoided.

Normalization is a scientific method of breaking down complex table structures into simple table structures by using certain rules. Using this method, you can reduce redundancy in a table and eliminate the problems of inconsistency and disk space usage. You can also ensure that there is no loss of information.

Normalization has several benefits. It enables faster sorting and index creation, more clustered indexes, fewer indexes per table, and fewer NULLs, and it makes the database compact. Normalization helps to simplify the structure of tables. The performance of an application is directly linked to the database design. A poor design hinders the performance of the system. The logical design of the database lays the foundation for an optimal database.

Some rules that should be followed to achieve a good database design are:

Each table should have an identifier

Each table should store data for a single type of entity

Columns that accept NULLs should be avoided

The repetition of values or columns should be avoided


INSTRUCTOR NOTES

Normalization

First, ask the students the following question to elicit their understanding about the approaches to logical database designing:

What are the various database design approaches that you can think of?

You can give the following additional information about non-loss decomposition:

Non-loss decomposition: Non-loss decomposition ensures that a join produces an exact copy of the original table. A decomposition is not non-loss if a join produces a superset of the original table and the rows in the table cannot be uniquely identified.

You can give the following additional information about data redundancy:

Data redundancy results in wastage of space and loss of data integrity in the database. Data redundancy leads to three types of anomalies. These are:

1. Update anomaly: This is data inconsistency resulting from data redundancy and partial update.

2. Deletion anomaly: This is the unintended loss of data due to deletion of other data.

3. Insertion anomaly: This is the inability to add data to the database due to absence of other data.
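The deletion and insertion anomalies can be shown to students with a small sketch. Everything here is an assumption made for illustration: the rows are hypothetical, and the key of the single Student table is assumed to be the combination of StudentId and StudentSemester, so neither column may be NULL.

```python
# Hypothetical single-table design: student details live only in marks rows.
student = [
    {"StudentId": "S1", "StudentCity": "Delhi", "StudentSemester": 1, "StudentTest1": 78},
    {"StudentId": "S2", "StudentCity": "Pune", "StudentSemester": 1, "StudentTest1": 64},
]

# Deletion anomaly: removing S2's only marks row also removes S2's city.
student = [r for r in student if r["StudentId"] != "S2"]
known_students = {r["StudentId"] for r in student}
s2_details_lost = "S2" not in known_students


def insert_row(table, student_id, city, semester, test1):
    """Insert a row; the assumed key (StudentId, StudentSemester) must be complete."""
    if student_id is None or semester is None:
        raise ValueError("key columns cannot be NULL")
    table.append({
        "StudentId": student_id,
        "StudentCity": city,
        "StudentSemester": semester,
        "StudentTest1": test1,
    })


# Insertion anomaly: a new student with no marks yet cannot be recorded.
try:
    insert_row(student, "S3", "Agra", None, None)
    insertion_blocked = False
except ValueError:
    insertion_blocked = True

print(s2_details_lost, insertion_blocked)
```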


DIFFERENT NORMAL FORMS AND

• A table structure is always in a certain normal form.

• The most important and widely used normal forms are:

• First Normal Form (1NF)

• Second Normal Form (2NF)

• Third Normal Form (3NF)

• Boyce-Codd Normal Form (BCNF)

Normalization results in the formation of tables that satisfy certain specified rules and represent certain normal forms. The normal forms are used to ensure that various types of anomalies and inconsistencies are not introduced in the database. A table structure is always in a certain normal form. Several normal forms have been identified. The most important and widely used normal forms are:

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

be in second normal form or third normal form


• In a relation R, attribute A is functionally dependent on attribute B if each value of B in R is associated with precisely one value of A.

• Attribute B is called the determinant.

• All attributes of a table must be functionally dependent on the key. However, functional dependency does not require an attribute to be the key in order to functionally determine other attributes.

• Functional dependency can also be defined as follows: Given a relation R, attribute A is functionally dependent on B if and only if whenever two tuples of R agree on their B value, they must agree on their A value.

• Functional dependencies represent many-to-one relationships.

The normalization theory is based on the fundamental notion of functional dependency. First, let’s examine the concept of functional dependency.

Given a relation R (you may recall that a table is also called a relation), attribute A is functionally dependent on attribute B if each value of B in R is associated with precisely one value of A.

In other words, attribute A is functionally dependent on B if and only if, for each value of B, there is exactly one value of A.

Attribute B is called the determinant.


Consider the following table Employee:

Employee

Given a particular value of Code, there is precisely one corresponding value for Name. For example, for Code E1 there is exactly one value of Name, Mac. Hence, Name is functionally dependent on Code. Similarly, there is exactly one value of City for each value of Code. Hence, the attribute City is functionally dependent on the attribute Code. The attribute Code is the determinant. You can also say that Code determines City and Name.
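The dependency of Name and City on Code can be checked mechanically. The sketch below is a minimal illustration, not part of the lesson material: the function name `functionally_determines` is hypothetical, and apart from the pair E1 and Mac, the rows are invented. Note that a check over sample rows can only disprove a dependency; whether a dependency really holds is a rule about the data, not a property of one sample.

```python
def functionally_determines(rows, determinant, dependent):
    """Return True if every value of `determinant` maps to one value of `dependent`."""
    seen = {}
    for row in rows:
        # Remember the first dependent value seen for each determinant value.
        if seen.setdefault(row[determinant], row[dependent]) != row[dependent]:
            return False
    return True


# Hypothetical Employee rows (only E1 / Mac comes from the lesson text).
employee = [
    {"Code": "E1", "Name": "Mac", "City": "Delhi"},
    {"Code": "E2", "Name": "Sandra", "City": "Delhi"},
    {"Code": "E3", "Name": "Henry", "City": "Pune"},
]

code_determines_name = functionally_determines(employee, "Code", "Name")
code_determines_city = functionally_determines(employee, "Code", "City")
city_determines_code = functionally_determines(employee, "City", "Code")

print(code_determines_name, code_determines_city, city_determines_code)
```

In this sample, two employees share a city, so City does not determine Code, while Code determines both Name and City.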

Now that we know something about functional dependencies, let us redefine the concept of keys in terms of functional dependencies. In the above example of the entity EMPLOYEE, the attribute Code will be unique in every tuple. Hence, it is a candidate key. All attributes must be functionally dependent on the key.

However, functional dependency does not require an attribute to be the key in order to functionally determine other attributes. The following example explains this.

Suppose you need to store information about scores of students for distance education programs. The attributes that you need to store are:

ID : The identity codes of the students to whom the scorecard is sent

CITY : The city to which the scorecard is sent

C_CODE : The course code for which the scorecard is sent

SCORE : The total score of the student

Note that the city to which the scorecard is sent is also the city where the student is located. Therefore, ID functionally determines CITY. But ID is not a candidate key. The candidate key in this case will be a combination of ID and C_CODE. Attributes ID and C_CODE are the foreign keys that reference the tables that store the student and course information respectively. Therefore, even though ID is not a candidate key, it still functionally determines another attribute (CITY).


Assume that the information about scores was stored in a table named SCORES_INFO, as shown in the following figure. Notice that for a particular value of ID, the value of CITY is the same in every tuple.

This constraint must be observed in the database.

Also, note that functional dependencies represent many-to-one relationships. This functional dependency (ID determines CITY) also means that there are many students located in a city, but one student is located in only one city.
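The SCORES_INFO situation can be sketched with hypothetical rows. The helper names `is_unique` and `determines` are illustrative, not from the lesson. The sketch shows ID determining CITY even though ID alone does not identify a row, while the combination of ID and C_CODE does identify each row in this sample.

```python
# Hypothetical SCORES_INFO rows; one row per student per course.
scores_info = [
    {"ID": "S01", "CITY": "Delhi", "C_CODE": "C1", "SCORE": 71},
    {"ID": "S01", "CITY": "Delhi", "C_CODE": "C2", "SCORE": 88},
    {"ID": "S02", "CITY": "Delhi", "C_CODE": "C1", "SCORE": 65},
    {"ID": "S03", "CITY": "Pune", "C_CODE": "C2", "SCORE": 90},
]


def is_unique(rows, attrs):
    """Return True if no two rows share the same values for `attrs`."""
    keys = [tuple(r[a] for a in attrs) for r in rows]
    return len(keys) == len(set(keys))


def determines(rows, lhs, rhs):
    """Return True if each value of `lhs` maps to exactly one value of `rhs`."""
    mapping = {}
    for r in rows:
        if mapping.setdefault(r[lhs], r[rhs]) != r[rhs]:
            return False
    return True


id_alone_is_unique = is_unique(scores_info, ["ID"])
id_and_course_unique = is_unique(scores_info, ["ID", "C_CODE"])
id_determines_city = determines(scores_info, "ID", "CITY")
city_determines_id = determines(scores_info, "CITY", "ID")

print(id_alone_is_unique, id_and_course_unique, id_determines_city, city_determines_id)
```

The last two results show the many-to-one direction: each ID has one CITY, but a CITY such as Delhi has several IDs.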

We will soon see that the concepts of normalization lead to very simple means of declaring such functional dependencies.


First Normal Form (1NF)


• A table is said to be in the 1NF when each cell of the table contains precisely one value.

A table is said to be in the 1NF when each cell of the table contains precisely one value.


Consider the following table Project.

Project

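[Table: Project. Only fragments of the ProjCode and Hours columns survive: ProjCode values P51, P20, P22, P27; Hours values 90, 101, 60, 109, 98, NULL, 72. Several cells hold more than one value.]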

The data in the table is not normalized because the cells in ProjCode and Hours have more than one value.

By applying the 1NF definition to the Project table, you arrive at the following table:

Project

The relational model does not permit tables that are unnormalized. Therefore, the tables obtained from the E/R diagram should at least be in 1NF.
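The conversion of a table with multi-valued cells to 1NF can be sketched as follows. The rows are hypothetical and only loosely based on the Project example: the ECode column and the pairing of project codes with hours are assumptions made for illustration. Each multi-valued cell is modeled as a list, and the 1NF table gets one row per project code and hours pair.

```python
# Hypothetical unnormalized Project rows: ProjCode and Hours hold several values.
unnormalized = [
    {"ECode": "E101", "ProjCode": ["P27", "P51", "P20"], "Hours": [90, 101, 60]},
    {"ECode": "E305", "ProjCode": ["P27", "P22"], "Hours": [109, 98]},
    {"ECode": "E508", "ProjCode": ["P51", "P27"], "Hours": [None, 72]},
]

# 1NF: repeat the single-valued columns so every cell holds exactly one value.
first_nf = [
    {"ECode": row["ECode"], "ProjCode": proj, "Hours": hours}
    for row in unnormalized
    for proj, hours in zip(row["ProjCode"], row["Hours"])
]

for row in first_nf:
    print(row["ECode"], row["ProjCode"], row["Hours"])
```

The 1NF table has more rows and repeats ECode, which is the redundancy that the higher normal forms then address.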
