Module 5: Normalizing the Logical Data Design



Implementing Entity Relationships

Activity 5.1: Identifying Keys in the Logical Model

Normalization Basics

Activity 5.2: Deriving the Third Normal Form for a Logical Data Model


After completing this module, you will be able to:

• Use primary and foreign keys to implement relationships between entities
• Explain the benefits of normalizing entities
• Normalize a table to third normal form


Implementing Entity Relationships

• Using Keys
• Primary Keys
• Foreign Keys
• Examples of Primary and Foreign Keys
• Activity 5.1: Identifying Keys in the Logical Model

In this section

In this section, you will learn how keys uniquely identify entities within the logical data model. You will also learn how keys are used to implement relationships between entities.

Slide Objective

To introduce the topics in this section.

Lead-in

In this section, you will learn about primary and foreign keys and how to use them to implement relationships.


Using Keys

• Benefits

Keys are identifying values assigned to each instance of an entity within a data model. As you learned in Module 4, “Deriving a Logical Data Design,” entities within a data model represent a grouping of information about people, places, objects, or ideas. When you begin to move a logical design to a physical design, you use keys to uniquely identify each instance of an entity within the data model.

Keys also provide the mechanism for tying entities together. In your physical database design, you represent relationships between entities by adding the keys from parent entity tables to child entity tables so that the entities are bound together by the common key value.

Slide Objective

To introduce the concept of keys within the physical data model.

Lead-in

Keys are the mechanism for implementing relationships


Primary Keys

• Identifying attributes (existing or new) for entities
• Unique values are necessary
• Types of primary keys

A primary key is a field, or a combination of fields, whose value uniquely identifies each instance of an entity. The primary key value must be unique across all instances of an entity; otherwise, the key value cannot identify any one instance of the entity. The uniqueness of the values in a primary key allows each instance of an entity to exist independently.

An entity’s primary key can be an existing attribute within the entity. Any identifying attribute that is guaranteed to be unique across all instances of an entity is a candidate for a primary key. If the uniqueness of the existing attributes cannot be guaranteed, you can create an identifying key, usually a numeric value, and assign it primary key responsibilities.
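To make the idea of a created identifying key concrete, the following is a minimal sketch that uses Python's built-in sqlite3 module. The Employee table, its FirstName and LastName columns, and the sample names are assumptions made for illustration; they are not taken from the course's case study. Because two employees can share a name, the name cannot be guaranteed unique, so a numeric key is created instead.

```python
import sqlite3

# Hypothetical Employee table: names are not guaranteed to be unique,
# so a numeric identifying key is created and used as the primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,  -- numeric key assigned by SQLite
        FirstName  TEXT NOT NULL,
        LastName   TEXT NOT NULL
    )
    """
)

# Two different people who share a name still receive distinct key values.
first_id = conn.execute(
    "INSERT INTO Employee (FirstName, LastName) VALUES (?, ?)", ("Pat", "Lee")
).lastrowid
second_id = conn.execute(
    "INSERT INTO Employee (FirstName, LastName) VALUES (?, ?)", ("Pat", "Lee")
).lastrowid

# Reusing an existing key value is rejected; this is what keeps the key unique.
try:
    conn.execute(
        "INSERT INTO Employee (EmployeeID, FirstName, LastName) VALUES (?, ?, ?)",
        (first_id, "Sam", "Ng"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

In this sketch the database, rather than the application, guarantees uniqueness, which is one common way to give a created key its primary key responsibilities.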

The following types of primary keys exist:

• Intelligent keys. These keys are created from existing attributes and are related to entities as both attributes and keys.

• Surrogate keys. These keys are identifier values with no specific relation to entities other than to uniquely identify them.

• Composite keys. These keys include more than one value per entity. When one attribute within an entity is not enough to uniquely identify each instance of the entity, you can select more than one attribute to be a composite key. For example, each project within a project-tracking database might have a project number, and each project might contain multiple jobs, each with its own identifying job number. You might use recurring job numbers within each project to denote a specific phase of the project. In this case, the primary key of the project table would be a composite key consisting of both the project number and the job number, because both numbers are necessary to uniquely identify any one job.

Slide Objective

To introduce primary keys and how they are defined within the data model.

Lead-in

Primary keys define uniqueness in the relational

It might help to draw a picture of a project table with a composite key.
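As an alternative to a drawing, the project-tracking example with a composite key can be sketched in code. The following uses Python's built-in sqlite3 module. The table name ProjectJob, the Phase column, and the sample numbers are hypothetical; only the idea of combining a project number with a recurring job number comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ProjectJob (
        ProjectNumber INTEGER NOT NULL,
        JobNumber     INTEGER NOT NULL,
        Phase         TEXT,
        PRIMARY KEY (ProjectNumber, JobNumber)  -- composite key
    )
    """
)

# Job number 1 recurs in both projects; the (project, job) pair is still unique.
conn.executemany(
    "INSERT INTO ProjectJob VALUES (?, ?, ?)",
    [(100, 1, "Design"), (100, 2, "Build"), (200, 1, "Design")],
)


def try_insert(project_number, job_number, phase):
    """Return True if the row is accepted, False if the key pair already exists."""
    try:
        conn.execute(
            "INSERT INTO ProjectJob VALUES (?, ?, ?)",
            (project_number, job_number, phase),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `try_insert(200, 2, "Build")` succeeds because that pair is new, whereas `try_insert(100, 1, "Rework")` is refused because project 100 already has a job 1.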


Foreign Keys

• Values that establish relationships between parent and child entities
• Typically primary keys of parent entities
• Keys that contain the same values as the parent entities’ instances
• Possibly members of child entities’ primary keys

Foreign keys are values that exist in child entities and that establish relationships with the corresponding parent entities. The parent entity can find all its related child entities by searching the child entities for those instances with the appropriate foreign key.

A foreign key for a child entity is usually the primary key of the parent entity. Because the primary key uniquely identifies the parent, each child is in turn associated with a unique instance of the parent.

In a one-to-many relationship, the foreign key for a child entity cannot serve as the child’s entire primary key, because many instances of the child entity may be related to the same parent instance and, as a result, have the same foreign key value. In many cases, however, the foreign key can serve as a member of a composite key for the child.
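The parent and child relationship described above can be sketched with Python's built-in sqlite3 module. The Employee and Timesheet names follow the case study, but the column lists (LastName, HoursWorked) and the sample rows are assumptions for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,
        LastName   TEXT NOT NULL
    );
    CREATE TABLE Timesheet (
        TimesheetID INTEGER PRIMARY KEY,
        EmployeeID  INTEGER NOT NULL REFERENCES Employee (EmployeeID),
        HoursWorked REAL NOT NULL
    );
    INSERT INTO Employee VALUES (1, 'Lee'), (2, 'Ng');
    INSERT INTO Timesheet VALUES (10, 1, 8.0), (11, 1, 6.5), (12, 2, 7.0);
    """
)


def timesheets_for(employee_id):
    """Find a parent's children by searching the child table for its key value."""
    rows = conn.execute(
        "SELECT TimesheetID FROM Timesheet WHERE EmployeeID = ? ORDER BY TimesheetID",
        (employee_id,),
    )
    return [row[0] for row in rows]


def orphan_rejected():
    """Return True if a timesheet for a nonexistent employee is refused."""
    try:
        conn.execute("INSERT INTO Timesheet VALUES (13, 99, 4.0)")
        return False
    except sqlite3.IntegrityError:
        return True
```

Employee 1 has two timesheets that share the same EmployeeID value, which is why EmployeeID alone could not be the Timesheet table's primary key in this sketch.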

Slide Objective

To introduce foreign key concepts.

Lead-in

Foreign keys are used as the links between entities that make relationships

Also remind students about parent and child entities, as well as the different types of relationships.


Examples of Primary and Foreign Keys

The slide shows a logical data model with primary and foreign keys, as well as entity relationships.

The primary keys identified for each entity are surrogate keys. As shown, each employee, client, timesheet, and invoice instance has its own unique identifier. The EmployeeID primary key in the Employee entity is used as a foreign key in both the Client and Timesheet entities. Although the Timesheet entity has its own primary key, the EmployeeID foreign key is responsible for linking the Timesheet and Employee entities together.

Notice the existence dependency between the Timesheet and Employee entities: a timesheet cannot exist unless an employee fills it out. However, the dotted line between the Client and Employee entities indicates that although an employee and a client are related, neither depends on the other for its existence.
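One possible way to carry these two kinds of relationship into a physical design is to make the foreign key mandatory where the child depends on the parent for its existence, and optional where it does not. The following sqlite3 sketch is an interpretation, not the schema shown on the slide: the choice of NOT NULL versus a nullable column, the ClientName and LastName columns, and the sample client name are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,
        LastName   TEXT NOT NULL
    );
    CREATE TABLE Client (
        ClientID   INTEGER PRIMARY KEY,
        ClientName TEXT NOT NULL,
        EmployeeID INTEGER REFERENCES Employee (EmployeeID)  -- optional link
    );
    CREATE TABLE Timesheet (
        TimesheetID INTEGER PRIMARY KEY,
        EmployeeID  INTEGER NOT NULL REFERENCES Employee (EmployeeID)  -- required link
    );
    """
)


def accepted(sql, params=()):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# A client can be stored without a related employee...
client_without_employee = accepted(
    "INSERT INTO Client (ClientName, EmployeeID) VALUES (?, NULL)",
    ("Example Client",),
)
# ...but a timesheet cannot exist without the employee who fills it out.
timesheet_without_employee = accepted(
    "INSERT INTO Timesheet (EmployeeID) VALUES (NULL)"
)
```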

Slide Objective

To show how primary and foreign keys might be used in a logical data model.

Lead-in

The following slide shows how primary and foreign keys might be used in a logical data model.


Activity 5.1: Identifying Keys in the Logical Model

In this activity, you will identify primary, foreign, and (if necessary) composite keys in the logical data model for Ferguson and Bardell, Inc.

After completing this activity, you will be able to:

• Identify primary, foreign, and composite keys in a logical data model
• Select a primary and foreign key type that is appropriate for a given entity

Slide Objective

To introduce this activity.

Lead-in

In this activity, you will identify keys in the Ferguson and Bardell, Inc. logical data model.

Delivery Tip

Make the students aware that they are focusing on keys in this activity. They are identifying keys to reinforce their knowledge so that they will be able to build a normalized data model.


Normalization Basics

• Normalizing Logical Models
• Creating a First Normal Form Data Model
• First Normal Form Example
• Moving to a Second Normal Form Data Model
• Creating a Third Normal Form Data Model
• Third Normal Form Example

Slide Objective

To explain the purpose of this section and what students will learn.

Lead-in

In this section, you will learn about normalization and its benefits.


Normalizing Logical Models

• Process of eliminating duplicate data and, usually, defining relationships among tables
• Normal forms
• Normalized databases typically include more tables with fewer columns

Normalization is the process of progressively refining a logical model to eliminate duplicate data from a database. Normalization usually involves dividing a database into two or more tables and defining relationships among these tables.

Database theorists have evolved a de facto standard of increasingly restrictive constraints, or normal forms, on the layout of databases. Applying these normal forms results in a normalized database. These de facto standards have generated at least five commonly accepted normal form levels, each progressively more restrictive on data duplication than the preceding one. As discussed later in this module, third normal form is the most common target level of normalization because it is a compromise between too little normalization and too much.

Normalized databases typically include more tables with fewer columns.

Normalizing a database provides the following benefits:

• Minimized duplication of information. A normalized database contains less duplicate information. For example, the Ferguson and Bardell, Inc. database needs to track timesheet and invoice information. Storing timesheet information in the Invoice table would eventually cause this table to include a large amount of redundant data, such as employee, job, task, and client information. Normalizing the Ferguson and Bardell, Inc. database creates separate, related tables for timesheets and invoices, thus avoiding duplication.

• Reduced data inconsistencies. Normalization reduces data inconsistencies by storing each piece of information once and maintaining table relationships to it. For example, if a client’s phone number is stored in multiple tables, or even in multiple records within the same table, and the number changes, the phone number might not be changed in all locations. A client’s phone number stored only once in one table is easier to maintain.


• Faster data modification (insertions, updates, and deletions). Normalization speeds up the data modification process in a database. For example, removing client names and address information from the Invoice table results in less data to track and manipulate when working with an invoice. The removed data is not lost because it still exists in the Client table. Additionally, reducing duplicated information improves performance during updates because fewer values must be modified in the tables.

Data that is overnormalized can cause performance issues in certain types of applications and can adversely affect read/write operations, because a single logical record is spread across more tables.
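The phone number example from the list of benefits can be traced in a few lines of plain Python. This is a sketch under stated assumptions: the row layout, the ClientID value 7, and the phone numbers are invented for illustration, and dictionaries stand in for database tables.

```python
# Denormalized: the client's phone number is repeated on every invoice row.
invoices_flat = [
    {"InvoiceID": 1, "ClientName": "Client A", "ClientPhone": "555-0100"},
    {"InvoiceID": 2, "ClientName": "Client A", "ClientPhone": "555-0100"},
    {"InvoiceID": 3, "ClientName": "Client A", "ClientPhone": "555-0100"},
]

# An update that misses one row leaves two different numbers for one client.
invoices_flat[0]["ClientPhone"] = "555-0199"
invoices_flat[1]["ClientPhone"] = "555-0199"
phones_flat = {row["ClientPhone"] for row in invoices_flat}

# Normalized: the phone number is stored once in the client table, and each
# invoice carries only the ClientID foreign key.
clients = {7: {"ClientName": "Client A", "ClientPhone": "555-0100"}}
invoices = [
    {"InvoiceID": 1, "ClientID": 7},
    {"InvoiceID": 2, "ClientID": 7},
    {"InvoiceID": 3, "ClientID": 7},
]

clients[7]["ClientPhone"] = "555-0199"  # one change, seen by every invoice
phones_normalized = {clients[inv["ClientID"]]["ClientPhone"] for inv in invoices}
```

After the partial update, `phones_flat` holds two conflicting numbers, while `phones_normalized` holds only the new one. The normalized version also modifies one value instead of three, which is the update benefit described above.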


Creating a First Normal Form Data Model

• Create two-dimensional tables
• Assign only one value to each cell
• Assign a single meaning to each column

The first step in normalizing a database is to ensure that the tables are in first normal form. To accomplish this step, the tables must adhere to the following criteria:

• Tables must be two-dimensional (with columns and rows). Entities specified in the logical data model are transformed into database tables, each represented as a two-dimensional table, similar to a spreadsheet.

• Each cell must contain one value.

• Each column must have a single meaning. For example, a dual-purpose column, such as Order Date/Delivery Date, is not allowed.

Slide Objective

To introduce the process of creating a first normal form data design.

Lead-in

Creating first normal form is the first step in producing a normalized data design.

Delivery Tip

Remind the students that they are not completely out of logical design at this point. They are still moving from logical data design to physical data design. This is a step in that process.


First Normal Form Example

As shown in the slide, the Timesheet entity for the Ferguson and Bardell, Inc. case study was not originally in first normal form. The original Timesheet entity had no unique identifier, and not all of its attributes were singularly tied to one piece of information. Also, each column in the Timesheet table could have multiple meanings.

For example, the Employee attribute can be divided into two distinct attributes: Employee FirstName and Employee LastName. This logic also applies to the Client and Job attributes, as shown in the slide.
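The two fixes described above, adding a unique identifier and dividing the Employee attribute, can be sketched as a small Python function. The function name, the TimesheetID numbering, the Hours field, and the sample names are hypothetical, and the split assumes every name is written as a first name and a last name separated by one space, which real data will not always satisfy.

```python
def to_first_normal_form(rows):
    """Give each row a unique identifier and split the combined Employee value.

    Assumes each Employee value is "First Last" separated by one space;
    real names would need more careful handling.
    """
    normalized = []
    for timesheet_id, row in enumerate(rows, start=1):
        first_name, last_name = row["Employee"].split(" ", 1)
        normalized.append(
            {
                "TimesheetID": timesheet_id,  # hypothetical unique identifier
                "EmployeeFirstName": first_name,
                "EmployeeLastName": last_name,
                "Hours": row["Hours"],
            }
        )
    return normalized


original_rows = [
    {"Employee": "Pat Lee", "Hours": 8},
    {"Employee": "Sam Ng", "Hours": 6},
]
first_normal_form_rows = to_first_normal_form(original_rows)
```

Each resulting row has one value per attribute and one meaning per attribute, and the added TimesheetID distinguishes rows that would otherwise look identical.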

Slide Objective

To illustrate the concept of first normal form.

Lead-in

The Ferguson and Bardell, Inc. Timesheet entity was not originally in first normal form.


Moving to a Second Normal Form Data Model

• Eliminate redundant data within an entity
• Move attributes that depend on only part of a multivalue key to a separate table
• Consolidate information when possible

To move a data design to second normal form, you must look at several instances of an entity and move any redundant data within an entity’s attributes to a separate table.

The slide shows the Timesheet entity within the Ferguson and Bardell, Inc. case study. If a client moves to a different city, the client database would have to be updated, as well as every timesheet that referenced that client. The solution is to remove the client information from the Timesheet table and replace it with a ClientID foreign key that corresponds to the ClientID primary key of the Clients table. The Timesheet table can then find the client’s name and address via the foreign key relationship. When a client’s address changes, the change is recorded only in the Clients table. Similarly, the second normal form would replace the employee information with an EmployeeID foreign key that would link the timesheet to the individual employee who enters information into the timesheet.

The second normal form also eliminates any other duplicate information. The JobName and JobDesc attributes in the slide have been replaced with a single attribute, JobDesc, because the job description would likely contain the job name. This logic also applies to the TaskName and TaskDesc attributes.

1. Eliminate redundant employee and client information by making foreign key references to existing Employee and Client entities.

2. Move additional employee and client information out of the Timesheet table because this information is not needed to describe a timesheet.

3. Consolidate job and task information. This information remains in the Timesheet table because it contains attributes that are needed to describe a timesheet.
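The first two steps above, applied to client information, can be sketched as a plain Python transformation. This is an illustration under assumptions rather than the course's solution: the function name, the ClientName, ClientCity, and Hours fields, and the rule that a name and city pair identifies a client are all hypothetical.

```python
def move_clients_out(timesheets):
    """Split client attributes out of timesheet rows.

    Returns (clients, slim_timesheets). Each distinct (name, city) pair is
    given a ClientID key; the timesheets keep only that key as a foreign key.
    """
    client_ids = {}  # (ClientName, ClientCity) -> ClientID
    slim_timesheets = []
    for row in timesheets:
        identity = (row["ClientName"], row["ClientCity"])
        client_id = client_ids.setdefault(identity, len(client_ids) + 1)
        slim_timesheets.append(
            {
                "TimesheetID": row["TimesheetID"],
                "ClientID": client_id,
                "Hours": row["Hours"],
            }
        )
    clients = {
        client_id: {"ClientName": name, "ClientCity": city}
        for (name, city), client_id in client_ids.items()
    }
    return clients, slim_timesheets


timesheets = [
    {"TimesheetID": 1, "ClientName": "Client A", "ClientCity": "Seattle", "Hours": 8},
    {"TimesheetID": 2, "ClientName": "Client A", "ClientCity": "Seattle", "Hours": 5},
    {"TimesheetID": 3, "ClientName": "Client B", "ClientCity": "Portland", "Hours": 7},
]
clients, slim_timesheets = move_clients_out(timesheets)

# The client moves: the change is recorded once, in the client data only.
clients[1]["ClientCity"] = "Tacoma"
```

After the split, no timesheet row stores a city, so the move touches a single client record, and both of that client's timesheets reach the new city through their ClientID value.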

Slide Objective

To illustrate the concept of second normal form.

Lead-in

Second normal form is a bridge process that eventually leads to third normal form.


Creating a Third Normal Form Data Model

• Eliminate any columns that do not depend on a key value for their existence
• Generally, move any data not directly related to the entity to another table
• Reduce or eliminate update and deletion anomalies
• Verify that no redundant data remains

Third normal form eliminates any columns that do not depend on a key value for their existence. Any data not directly related to the entity is generally moved to another table. Third normal form is generally the final form that you should implement.

Third normal form helps to avoid update and deletion anomalies because all data can be reached via foreign key values and redundant data within each table no longer exists. This level of normalization greatly increases database robustness and generates a more optimized design.

1. Normalize to first and second normal form, as appropriate.

2. Identify any attributes that do not depend on a key value for their existence.

3. Move these independent attributes into separate tables and identify each with a primary key. Relate that primary key back to the parent entity as a foreign key.
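The steps above, and the deletion anomaly they avoid, can be sketched with Python's built-in sqlite3 module. The department example is hypothetical and does not come from the case study: it assumes an employee table in which DepartmentPhone depends on the department rather than on the EmployeeID key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Before: DepartmentPhone depends on DepartmentName, not on EmployeeID.
    CREATE TABLE EmployeeFlat (
        EmployeeID      INTEGER PRIMARY KEY,
        LastName        TEXT NOT NULL,
        DepartmentName  TEXT NOT NULL,
        DepartmentPhone TEXT NOT NULL
    );
    INSERT INTO EmployeeFlat VALUES (1, 'Lee', 'Audit', '555-0100');

    -- After: the independent attributes get their own table and primary key,
    -- and that key is related back to the parent entity as a foreign key.
    CREATE TABLE Department (
        DepartmentID    INTEGER PRIMARY KEY,
        DepartmentName  TEXT NOT NULL,
        DepartmentPhone TEXT NOT NULL
    );
    CREATE TABLE Employee (
        EmployeeID   INTEGER PRIMARY KEY,
        LastName     TEXT NOT NULL,
        DepartmentID INTEGER NOT NULL REFERENCES Department (DepartmentID)
    );
    INSERT INTO Department VALUES (1, 'Audit', '555-0100');
    INSERT INTO Employee VALUES (1, 'Lee', 1);
    """
)

# Delete the only Audit employee from each design.
conn.execute("DELETE FROM EmployeeFlat WHERE EmployeeID = 1")
conn.execute("DELETE FROM Employee WHERE EmployeeID = 1")

# In the flat design the department's phone number is lost with the employee;
# in the third normal form design it survives in its own table.
phone_in_flat = conn.execute(
    "SELECT DepartmentPhone FROM EmployeeFlat WHERE DepartmentName = 'Audit'"
).fetchone()
phone_in_3nf = conn.execute(
    "SELECT DepartmentPhone FROM Department WHERE DepartmentName = 'Audit'"
).fetchone()
```

In this sketch `phone_in_flat` is `None` after the deletion, while `phone_in_3nf` still returns the stored number.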

Slide Objective

To illustrate the concepts of third normal form.

Lead-in

Third normal form is the level to which most design teams strive to normalize their data designs.
