Beginning Database Design - P10


It makes perfect sense to begin by demonstrating denormalization from the highest Normal Form downward.

Denormalizing Beyond 3NF

Figure 6-1 shows reversal of the normalization processing applied in Figure 4-28, Figure 4-29, and Figure 4-30. Removing nullable fields to separate tables is a common method of saving space, particularly in databases with fixed record lengths. Many modern databases allow variable record lengths. If variable-length records are allowed, removal of NULL-valued fields is pointless because the space saved is either none or completely negligible.

Figure 6-1: Denormalizing NULL-valued table fields

[Figure 6-1 diagram, tables as extracted. Transform: denormalize beyond 3NF.]
Edition: ISBN, publisher_id (FK), publication_id (FK), print_date, pages
Rank: ISBN (FK), rank, ingram_units
Edition: ISBN, publisher_id, publication_id, print_date, pages, list_price, format
Rank: ISBN (FK), rank
Ingram: ISBN (FK), ingram_units
Annotations: "Potentially NULL values were separated out"; "Multiple NULL values can be separated into multiple tables"
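As a rough illustration of the reversal in Figure 6-1, the sketch below merges separate RANK and INGRAM tables back into a single EDITION table with nullable fields. It uses Python's built-in sqlite3 module; the sample rows, the reduced field list, and the table name edition_denorm are assumptions made for the example, not part of the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized form: potentially NULL fields live in their own tables.
    CREATE TABLE edition (isbn TEXT PRIMARY KEY, pages INTEGER);
    CREATE TABLE rank (isbn TEXT PRIMARY KEY REFERENCES edition(isbn),
                       rank INTEGER);
    CREATE TABLE ingram (isbn TEXT PRIMARY KEY REFERENCES edition(isbn),
                         ingram_units INTEGER);

    -- Hypothetical sample data: edition 222 has no rank and no Ingram units.
    INSERT INTO edition VALUES ('111', 320), ('222', 150);
    INSERT INTO rank VALUES ('111', 5);
    INSERT INTO ingram VALUES ('111', 40);

    -- Denormalized form: one table, rank and ingram_units may be NULL.
    CREATE TABLE edition_denorm AS
        SELECT e.isbn, e.pages, r.rank, i.ingram_units
        FROM edition e
        LEFT JOIN rank r ON r.isbn = e.isbn
        LEFT JOIN ingram i ON i.isbn = e.isbn;
""")

# Reading an edition with its rank no longer needs any join.
rows = conn.execute(
    "SELECT isbn, pages, rank, ingram_units FROM edition_denorm ORDER BY isbn"
).fetchall()
print(rows)
```

The LEFT JOINs keep editions that have no rank or Ingram data; those fields simply come back as NULL (None in Python) in the merged table.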


A fixed-length record is where each record always occupies the same number of characters. In other words, all string and number values are a fixed length. For example, a string value of 30 characters can never be NULL, but will contain 30 space characters. It follows that a 30-character string with a 10-character name contains the name followed by 20 trailing space characters.
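The padding arithmetic above can be checked with a few lines of Python. This is only a sketch of the idea: the helper name to_fixed_width and the sample name are made up for illustration, and real databases implement fixed-length storage internally rather than through code like this.

```python
FIELD_WIDTH = 30  # fixed field length used in the example above


def to_fixed_width(value: str, width: int = FIELD_WIDTH) -> str:
    """Pad a string with trailing spaces so it always occupies `width` characters."""
    if len(value) > width:
        raise ValueError("value does not fit in the fixed-length field")
    return value.ljust(width)


stored = to_fixed_width("John Smith")   # a 10-character name
empty = to_fixed_width("")              # an "empty" field is still 30 spaces

trailing = len(stored) - len(stored.rstrip())
print(len(stored), trailing, len(empty))
```

Every stored value occupies the full field width, which is why moving rarely populated fields into separate tables can save space only when records are fixed length.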

Figure 6-2 shows reversal of the normalization processing applied in Figure 4-31. It shows a particularly upsetting application of BCNF where all candidate keys are separated into separate tables. A candidate key is any field that potentially can be used as a primary key (unique identifier) for the original entity, in this case a customer (the CUSTOMER table). Applying this type of normalization in a commercial environment would result in incredibly poor performance; it is more of a mathematical nicety than a commercial necessity.

Figure 6-2: Denormalizing separation of candidate keys into separate tables

Figure 6-3 shows another reversal of BCNF. Application of BCNF in Figure 4-32 and Figure 4-33 created three tables, each with unique combinations of values from the PROJECTS table on the left. Access to unique records in the PROJECTS table can be handled more effectively with application coding, without the downside of creating too much granularity in table structures.

[Figure 6-2 diagram, tables as extracted. Transform: denormalize BCNF.]
Customer: customer_id, customer_name, address, phone, fax, email, exchange, ticker, balance_outstanding, last_date_activity, days_credit
Customer: customer_id, balance_outstanding, last_activity, days_credit
Customer_Stock_Ticker: customer_id (FK), exchange, ticker
Customer_Phone: customer_id (FK), phone
Customer_Address: customer_id (FK), address
Customer_Email: customer_id (FK), email
Customer_Fax: customer_id (FK), fax
Customer_Name: customer_id (FK), customer_name


Figure 6-3: Denormalization of candidate keys, created for the sake of uniqueness in tables.

Figure 4-34, Figure 4-35, Figure 4-36, and Figure 4-38 show a typical 4NF transformation where multiple valued lists in individual fields are separated out into separate tables. The denormalization of this structure is shown in Figure 6-4.

Figure 6-4: Denormalization of multiple valued lists

The problem with denormalizing the structure shown in Figure 6-4 is that the relationships between the EMPLOYEE and SKILL tables, and between the EMPLOYEE and CERTIFICATIONS tables, are many-to-many, not one-to-many. Even in a denormalized state, each EMPLOYEE record must have some kind of collection of SKILLS and CERTIFICATIONS values. A better solution might be a combination of collection arrays in the EMPLOYEE table and 2NF static tables for skills and certifications, as shown in Figure 6-5.

[Figure 6-4 diagram, tables as extracted. Transform: denormalize 4NF.]
Employee: manager
Employee: employee, skills, certification
Employee_Certification: employee (FK), certification
Employee_Skill: employee (FK), skill
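The collection-array alternative described for Figure 6-5 can be approximated in a small sketch. SQLite has no object-relational collection type, so the example stores the collection as a JSON array of skill identifiers in a text field, which is an assumption standing in for a true collection array. The sample data and the helper skills_for are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 2NF static table for skills.
    CREATE TABLE skill (skill_id INTEGER PRIMARY KEY, skill TEXT);
    -- EMPLOYEE keeps its skills as a collection (here: JSON text, an assumption).
    CREATE TABLE employee (employee TEXT PRIMARY KEY, skills TEXT);
    INSERT INTO skill VALUES (1, 'SQL'), (2, 'Python'), (3, 'Modeling');
""")
conn.execute("INSERT INTO employee VALUES (?, ?)", ("Ann", json.dumps([1, 3])))


def skills_for(employee: str) -> list:
    """Resolve an employee's collection of skill ids against the static table."""
    row = conn.execute(
        "SELECT skills FROM employee WHERE employee = ?", (employee,)
    ).fetchone()
    if row is None:
        return []
    names = dict(conn.execute("SELECT skill_id, skill FROM skill"))
    return [names[skill_id] for skill_id in json.loads(row[0])]


ann_skills = skills_for("Ann")
print(ann_skills)
```

The employee record carries its whole collection in one field, while the skill names stay in a single static table; the resolution from identifier to name happens in application code.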

It is interesting to note that the relational database modeling tool, ERWin, would not allow the MANAGER table to have more than the MANAGER field in its primary key. For 5NF, the MANAGER table could contain either the PROJECT or EMPLOYEE field as a subset part of the primary key. ERWin perhaps "thinks" that 5NF in this case is excessive, useless, or invalid.

[Figure 6-3 diagram, tables as extracted. Transform: denormalize BCNF.]
Projects: project, manager, employee
Manager: manager
Employee: employee, manager (FK)
Project: project, manager (FK)


Figure 6-5: Denormalization of multiple valued lists using collections and 2NF.

Figure 4-39 to Figure 4-42 show a 5NF transformation. As already noted, ERWin does not appear to allow construction of 5NF table structures of this nature. The reason is suspect! Once again, as shown in Figure 6-6, application of this type of normalization is overkill. It is better to place this type of layering into application coding, leaving the EMPLOYEE table as it is, shown in the upper left of the diagram in Figure 6-6.

Figure 6-6: Denormalization of 5NF cyclic dependencies

[Figure 6-6 diagram, tables as extracted. Transform: denormalize 5NF.]
Employee: project, employee, manager
Project_Manager: project, manager
Manager_Employee: manager, employee
Project_Employee: project, employee

[Figure 6-5 diagram, tables as extracted. Transform: 2NF plus collection array.]
Employee: employee, skills, certification
Skill: skill_id, skill
Certification: certification_id, certification
Employee: employee, skills, certifications
Annotation: "Object-relational database collection arrays of SKILLS and CERTIFICATION"


Denormalizing 3NF

The role of 3NF is to eliminate what are called transitive dependencies. A transitive dependency is where a field is not directly determined by the primary key, but indirectly determined by the primary key, through another field.

Most of the Normal Form layers beyond 3NF are often impractical in a commercial environment because applications can often do better at that level. What happens in reality is that 3NF occupies a gray area, fitting in between what should not be done in the database model (beyond 3NF) and what should be done in the database model (1NF and 2NF).

There are a number of different ways of interpreting 3NF, as shown in Figure 4-21, Figure 4-23, Figure 4-24, and Figure 4-25. All of these example interpretations of 3NF are completely different. Figure 6-7 shows the denormalization of a many-to-many join resolution table. As a general rule, a many-to-many join resolution table is usually required by applications when it can be specifically named, as for the ASSIGNMENT table shown in Figure 6-7. If it was nonsensical to call the new table ASSIGNMENT, and it was called something such as EMPLOYEE_TASK, chances are that the extra table is unnecessary. Quite often these types of tables are created without forethought as to application requirements. If a table like this is not essential to application requirements, it is probably unnecessary. The result of too many new tables is more tables in joins and slower queries.

Figure 6-7: Denormalize unnecessary 3NF many-to-many join resolution tables

[Figure 6-7 diagram, tables as extracted. Transform: denormalize 3NF.]
Employee: employee
Task: task
Assignment: employee (FK), task (FK)
Other table labels in the diagram: Employee, Task


Figure 6-8 shows another version of 3NF where common fields are extracted into a new table. Once again, this type of normalization is quite often more for mathematical precision and clarity, and quite contrary to commercial performance requirements. Of course, there is still a transitive dependency in the new FOREIGN_EXCHANGE link table itself, because EXCHANGE_RATE depends on CURRENCY, which in turn depends on CURRENCY_CODE. Normalizing further would complicate the structure even more than it already is.

Figure 6-8: Denormalization of 3NF amalgamated fields into an extra table

Figure 6-9 shows a classic 3NF transitive dependency resolution, or the creation of a new table. The 3NF transformation provides mathematical precision; however, its practical commercial value is dubious because a new table is created, containing potentially a very small number of fields and records. The benefit will very likely be severely outweighed by the loss in performance, as a result of bigger joins in queries.

[Figure 6-8 diagram, tables as extracted.]
Customer: customer, currency_code, currency, exchange_rate, address
Supplier: supplier, currency_code, currency, exchange_rate, address
Customer: customer_id, currency_code (FK), address
Foreign Exchange: currency_code, currency, exchange_rate
Supplier: customer_id, currency_code (FK), address
Annotation: "Currency data common to both"


Figure 6-9: Denormalization of 3NF transitive dependence resolution table.
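The trade-off in Figure 6-9 can be made concrete with a small sketch using Python's sqlite3 module. The normalized form needs a join to find an employee's city, while the denormalized form reads it from one table. The sample rows and the table name employee_denorm are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF: city depends on department, department depends on employee.
    CREATE TABLE department (department TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE employee (
        employee TEXT PRIMARY KEY,
        department TEXT REFERENCES department(department)
    );
    INSERT INTO department VALUES ('Sales', 'Boston'), ('IT', 'Austin');
    INSERT INTO employee VALUES ('Ann', 'Sales'), ('Bob', 'IT'), ('Cy', 'IT');

    -- Denormalized: city is carried on every employee record.
    CREATE TABLE employee_denorm AS
        SELECT e.employee, e.department, d.city
        FROM employee e
        JOIN department d ON d.department = e.department;
""")

# Normalized lookup: requires a join.
with_join = conn.execute("""
    SELECT e.employee, d.city
    FROM employee e JOIN department d ON d.department = e.department
    ORDER BY e.employee
""").fetchall()

# Denormalized lookup: single-table read, at the cost of repeating the city.
without_join = conn.execute(
    "SELECT employee, city FROM employee_denorm ORDER BY employee"
).fetchall()

print(with_join == without_join, without_join)
```

Both queries return the same answer. The denormalized table repeats 'Austin' for every IT employee, which is the redundancy that 3NF removes and that denormalization accepts in exchange for a smaller join.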

Figure 6-10 shows a 3NF transformation removing a total value field that is calculated from other fields on the same table. The value of including the total amount on each record, alongside the elements of the expression that produces it, is determined by how much the total value is used at the application level. If the constituents of the totaling expression are not required, perhaps only the total value should be stored. Again, this is a matter to be decided only from the perspective of application requirements.

Figure 6-10: Denormalization of 3NF calculated fields

[Figure 6-10 diagram, tables as extracted.]
Stock: stock, description, min, max, qtyonhand, price, total value
Stock: stock, description, min, max, qtyonhand, price
Annotation: "TOTALVALUE dependent on QTYONHAND and PRICE"

[Figure 6-9 diagram, tables as extracted.]
Employee: employee, department, city
Employee: employee, department (FK)
Department: department, city
Annotations:
1. City depends on department
2. Department depends on employee
3. Thus city is indirectly, or transitively, dependent on employee
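For the calculated field in Figure 6-10, the choice is between computing TOTALVALUE on every read and storing it on the record. The following sketch, using Python's sqlite3 module, shows both options; the sample rows, the reduced field list, the table name stock_denorm, and the helper set_quantity are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF form: the total is not stored, only its constituents.
    CREATE TABLE stock (stock TEXT PRIMARY KEY, qtyonhand INTEGER, price REAL);
    INSERT INTO stock VALUES ('widget', 10, 2.5), ('gadget', 4, 10.0);

    -- Denormalized form: totalvalue is stored on each record.
    CREATE TABLE stock_denorm (
        stock TEXT PRIMARY KEY, qtyonhand INTEGER, price REAL, totalvalue REAL
    );
    INSERT INTO stock_denorm
        SELECT stock, qtyonhand, price, qtyonhand * price FROM stock;
""")

# Option 1: calculate the total each time it is read.
computed = conn.execute(
    "SELECT stock, qtyonhand * price FROM stock ORDER BY stock"
).fetchall()

# Option 2: read the stored total directly.
stored = conn.execute(
    "SELECT stock, totalvalue FROM stock_denorm ORDER BY stock"
).fetchall()


def set_quantity(stock: str, qty: int) -> None:
    """Every change to a constituent must also refresh the stored total."""
    conn.execute(
        "UPDATE stock_denorm SET qtyonhand = ?, totalvalue = ? * price "
        "WHERE stock = ?",
        (qty, qty, stock),
    )


set_quantity("widget", 20)
widget_total = conn.execute(
    "SELECT totalvalue FROM stock_denorm WHERE stock = 'widget'"
).fetchone()[0]
print(computed, stored, widget_total)
```

Storing the total makes reads cheaper but moves the cost to every update of QTYONHAND or PRICE, which is why the decision rests on how the application actually uses the value.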


Denormalizing 2NF

The role of 2NF is to separate static data into separate tables, removing repeated static values from transactional tables. Figure 6-11 shows an example of over-application of 2NF. The lower right of the diagram shows an extreme of four tables, created from what is essentially a more-than-adequately normalized COMPANY table at the upper left of the diagram.

Figure 6-11: Denormalization of 2NF into a single static table

[Figure 6-11 diagram, tables as extracted.]
Listing: listing, exchange (FK), ticker
Classification: classification
Exchange: exchange, classification (FK)
Company: company, listing (FK), address, phone, fax, email
Listing: listing, classification, exchange, ticker
Company: company, address, phone, fax, email, classification, exchange, ticker
Annotations: "Company is static data, too much normalization"; "Insanely over normalized"; "Over normalization"


Denormalizing 1NF

Just don't do it! Data warehouse fact tables can be interpreted as being in 0th Normal Form, but the connections to dimensions are 2NF. So, denormalization of 1NF is not advisable.

Try It Out: Denormalize to 2NF

Figure 6-12 shows a highly normalized table structure representing bands, their released CDs, tracks on the CDs, ranks of tracks, charts the tracks are listed on, plus the genres and regions of the country those charts are located in.

1. The RANK and TRACK tables are one-to-one related (TRACK to RANK: one to zero or one). This implies a BCNF or 4NF transformation, zero or one meaning a track does not have to be ranked. Thus, a track's rank can be NULL valued. Push the RANK column back into the TRACK table and remove the RANK table.

2. The three tables BAND_ADDRESS, BAND_PHONE, and BAND_EMAIL were created because each prospective band attribute is a candidate primary key in itself. Reverse the BCNF transformation, pushing address, phone, and email details back into the BAND table.

3. The CHART, GENRE, and REGION tables are an absurd application of multiple layers of 2NF transformation, separating static information from what is effectively parent static information. Chart, genre, and region details can all be pushed back into the TRACK table.

Figure 6-12: Normalized chart toppers

[Figure 6-12 diagram, tables as extracted.]
Chart: chart, genre (FK)
Region: region
Genre: genre, region (FK)
Track: track_id, chart (FK), cd_id (FK), track, length
Band: band_id, name
CD: listing, classification, exchange
Band_Address: band_id (FK), address
Band_Phone: band_id (FK), phone
Band_Email: band_id (FK), email
Rank: track_id (FK), rank


How It Works

Figure 6-13 shows what the tables should look like in 2NF.

Figure 6-13: Denormalized chart toppers
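Steps 1 and 2 of the exercise can be sketched with Python's sqlite3 module as follows; step 3 follows the same pattern for chart, genre, and region. The sample rows and the target table names band_2nf and track_2nf are assumptions for illustration, and the field lists are reduced to what the two steps need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Over-normalized starting point (subset of Figure 6-12).
    CREATE TABLE band (band_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE band_address (band_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE band_phone (band_id INTEGER PRIMARY KEY, phone TEXT);
    CREATE TABLE band_email (band_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, track TEXT, length INTEGER);
    CREATE TABLE rank (track_id INTEGER PRIMARY KEY, rank INTEGER);

    -- Hypothetical sample data: track 2 is unranked, the band has no email.
    INSERT INTO band VALUES (1, 'The Examples');
    INSERT INTO band_address VALUES (1, '1 Main St');
    INSERT INTO band_phone VALUES (1, '555-0100');
    INSERT INTO track VALUES (1, 'Opening', 210), (2, 'Closing', 185);
    INSERT INTO rank VALUES (1, 7);

    -- Step 1: push RANK back into TRACK (rank may be NULL).
    CREATE TABLE track_2nf AS
        SELECT t.track_id, t.track, t.length, r.rank
        FROM track t LEFT JOIN rank r ON r.track_id = t.track_id;

    -- Step 2: push address, phone, and email back into BAND.
    CREATE TABLE band_2nf AS
        SELECT b.band_id, b.name, a.address, p.phone, e.email
        FROM band b
        LEFT JOIN band_address a ON a.band_id = b.band_id
        LEFT JOIN band_phone p ON p.band_id = b.band_id
        LEFT JOIN band_email e ON e.band_id = b.band_id;
""")

tracks = conn.execute(
    "SELECT track_id, track, length, rank FROM track_2nf ORDER BY track_id"
).fetchall()
bands = conn.execute(
    "SELECT band_id, name, address, phone, email FROM band_2nf"
).fetchall()
print(tracks)
print(bands)
```

After the merge, the original RANK, BAND_ADDRESS, BAND_PHONE, and BAND_EMAIL tables could be dropped; they are left in place here only so the script stays short.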

Denormalization Using Specialized Database Objects

Many databases have specialized database objects for certain types of tasks. Some specialized objects allow for physical copies of data, copying data into a denormalized form.

❑ Materialized views: Materialized views are allowed in many larger relational databases. These objects are commonly used in data warehouses for pre-calculated aggregation queries. Queries can be automatically switched to direct access of materialized views. The result is less I/O activity by direct access to aggregated data stored in materialized views. Typically, aggregated materialized views contain far fewer records than underlying tables, reducing I/O activity and thus increasing performance.

Views are not the same thing as materialized views. Views are overlays, not duplications of data, and they interfere with underlying source tables. Views often cause far more performance problems than the application design issues they might ease.

❑ Clusters: These objects allow physical copies of heavily accessed fields and tables in join queries, allowing for faster access to data with more precise I/O.

❑ Index-organized tables: A table can be constructed including both index and data fields in the same physical space. The table itself becomes both the index and the data because the table is constructed as a sorted index (usually as a BTree index), rather than just a heap or "pile" of unorganized "bits and pieces."

[Figure 6-13 diagram, tables as extracted.]
CD: cd_id, band_id (FK), title, length, tracks
Track: track_id, cd_id (FK), track, length, rank, region, genre, chart
Band: band_id, name, address, phone, email

❑ Temporary tables: Temporary tables can be used on a temporary basis, either for a connected session or for a period of time. Typically, temporary tables perform intermediary functions, helping to eliminate duplication of processing and reducing repetitive I/O activities.
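The materialized view idea from the list above can be imitated in a small sketch. SQLite has no materialized views, so the example below emulates one with a summary table that is rebuilt on demand; this is a stand-in for the concept, not how a database with native materialized views implements it. The table names sale and sales_summary, the sample rows, and the helper refresh_sales_summary are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (region TEXT, amount REAL);
    INSERT INTO sale VALUES ('north', 10.0), ('north', 5.0), ('south', 7.5);
""")


def refresh_sales_summary(connection: sqlite3.Connection) -> None:
    """Rebuild the physical copy of the aggregated data."""
    connection.executescript("""
        DROP TABLE IF EXISTS sales_summary;
        CREATE TABLE sales_summary AS
            SELECT region, SUM(amount) AS total, COUNT(*) AS sales
            FROM sale
            GROUP BY region;
    """)


def read_summary(connection: sqlite3.Connection) -> list:
    """Read the pre-calculated aggregates without touching the detail rows."""
    return connection.execute(
        "SELECT region, total, sales FROM sales_summary ORDER BY region"
    ).fetchall()


refresh_sales_summary(conn)
first = read_summary(conn)

# The summary is a copy: new detail rows are invisible until the next refresh.
conn.execute("INSERT INTO sale VALUES ('south', 2.5)")
stale = read_summary(conn)
refresh_sales_summary(conn)
fresh = read_summary(conn)
print(first, stale, fresh)
```

The summary holds one record per region regardless of how many sales exist, which is where the I/O saving comes from. The price is staleness between refreshes, something databases with real materialized views manage through their own refresh options.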

Denormalization Tricks

There are many tricks to denormalizing data that are not reversals of the steps of normalization. These are some ideas to consider:

❑ Separate active and inactive data: Data can be separated into separate physical tables, namely active and inactive tables. This factor is often missed, even though inactive (historical) data can sometimes occupy as much as thousands of times more space than active data. Leaving the two together can drastically decrease performance of access to the most frequently needed data, the active data.

Separation of active and inactive data is the purpose of a data warehouse, the data warehouse being the inactive data.

❑ Copy fields between tables: Make copies of fields between tables not directly related to each other. This can help to avoid multiple table joins between two tables where other tables must be "passed through" to join the two desired tables. An example is shown in Figure 6-14, where the SUBJECT_ID field is duplicated into the EDITION table. The objective is to minimize the size of subsequent SQL code joins.

Figure 6-14: Denormalization by copying fields between tables

[Figure 6-14 diagram, tables as extracted.]
Publisher: publisher_id, name
Author: author_id, name
Review: review_id, publication_id (FK), review_date, text
Subject: subject_id, parent_id, name
Publication: publication_id, subject_id (FK), author_id (FK), title
Edition: ISBN, publisher_id (FK), publication_id (FK), subject_id, print_date, pages, list_price, format, rank, ingram_units
CoAuthor: coauthor_id (FK), publication_id (FK)
Annotation: "Duplication"
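The field copy in Figure 6-14 can be sketched with Python's sqlite3 module. The example copies SUBJECT_ID from PUBLICATION into EDITION so that editions can be filtered by subject without passing through PUBLICATION. The sample rows and the reduced field lists are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE publication (
        publication_id INTEGER PRIMARY KEY,
        subject_id INTEGER REFERENCES subject(subject_id),
        title TEXT
    );
    CREATE TABLE edition (
        isbn TEXT PRIMARY KEY,
        publication_id INTEGER REFERENCES publication(publication_id),
        subject_id INTEGER            -- duplicated field, filled in below
    );
    INSERT INTO subject VALUES (1, 'Databases'), (2, 'Networks');
    INSERT INTO publication VALUES (10, 1, 'Modeling Basics'),
                                   (11, 2, 'Routing Basics');
    INSERT INTO edition (isbn, publication_id)
        VALUES ('111', 10), ('222', 10), ('333', 11);

    -- Copy the field across: each edition inherits its publication's subject.
    UPDATE edition
    SET subject_id = (SELECT p.subject_id
                      FROM publication p
                      WHERE p.publication_id = edition.publication_id);
""")

# Without the copy, PUBLICATION must be passed through to reach the subject.
via_join = conn.execute("""
    SELECT e.isbn
    FROM edition e JOIN publication p ON p.publication_id = e.publication_id
    WHERE p.subject_id = 1
    ORDER BY e.isbn
""").fetchall()

# With the copy, the same question is answered from EDITION alone.
direct = conn.execute(
    "SELECT isbn FROM edition WHERE subject_id = 1 ORDER BY isbn"
).fetchall()
print(via_join, direct)
```

The duplicated field has to be kept in step with PUBLICATION whenever a publication changes subject, so this trick suits values that rarely change.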
