It makes perfect sense to begin by demonstrating denormalization from the highest Normal Form downward.
Denormalizing Beyond 3NF
Figure 6-1 shows reversal of the normalization processing applied in Figure 4-28, Figure 4-29, and Figure 4-30. Removing nullable fields to separate tables is a common method of saving space, particularly in databases with fixed record lengths. Many modern databases allow variable record lengths. If variable-length records are allowed, removal of NULL-valued fields is pointless because the space saved is either none or completely negligible.
Figure 6-1: Denormalizing NULL-valued table fields
[Figure 6-1 diagram. Tables shown: Customer (customer_id, customer_name, address, phone, fax, email, exchange, ticker, balance_outstanding, last_date_activity, days_credit); Edition (ISBN, publisher_id (FK), publication_id (FK), print_date, pages, list_price, format); Rank (ISBN (FK), rank, ingram_units); Rank (ISBN (FK), rank); Ingram (ISBN (FK), ingram_units). Transform label: "Denormalize beyond 3rd NF". Annotations: "Potentially NULL values were separated out"; "Multiple NULL values can be separated into multiple tables".]
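The reversal described for Figure 6-1 can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table name edition_rank stands in for the RANK table, the sample rows are invented, and SQLite is used purely because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edition (isbn TEXT PRIMARY KEY, pages INTEGER);
    -- Normalized form: optional fields live in their own table.
    CREATE TABLE edition_rank (
        isbn TEXT PRIMARY KEY REFERENCES edition(isbn),
        rank INTEGER,
        ingram_units INTEGER
    );
    INSERT INTO edition VALUES ('111', 320), ('222', 150);
    -- Only one edition has ranking data (invented sample values).
    INSERT INTO edition_rank VALUES ('111', 5, 40);

    -- Denormalized form: the optional fields become nullable columns.
    CREATE TABLE edition_denorm AS
        SELECT e.isbn, e.pages, r.rank, r.ingram_units
        FROM edition e
        LEFT JOIN edition_rank r ON r.isbn = e.isbn;
""")

rows = conn.execute(
    "SELECT isbn, pages, rank, ingram_units FROM edition_denorm ORDER BY isbn"
).fetchall()
print(rows)
```

The edition without ranking data simply carries NULL in the two optional columns, and queries no longer need a join to reach them.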
A fixed-length record is one where each record always occupies the same number of characters. In other words, all string and number values are a fixed length. For example, a string value of 30 characters can never be NULL, but will instead contain 30 space characters. It follows that a 30-character string holding a 10-character name contains the name followed by 20 trailing space characters.
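The padding behavior of a fixed-length field can be imitated in a few lines of Python. The helper name to_fixed and the sample name are hypothetical; real database engines handle this internally.

```python
FIELD_WIDTH = 30  # width of the fixed-length string field in the example


def to_fixed(value: str, width: int = FIELD_WIDTH) -> str:
    """Pad a value with trailing spaces up to the fixed field width."""
    if len(value) > width:
        raise ValueError("value does not fit in the fixed-length field")
    return value.ljust(width)


name_field = to_fixed("John Smith")  # a 10-character name
empty_field = to_fixed("")           # an "empty" value is still 30 spaces

print(len(name_field), repr(name_field[10:]))
```

Both values occupy the full 30 characters, which is why moving an empty field to a separate table saves space only when records are fixed length.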
Figure 6-2 shows reversal of the normalization processing applied in Figure 4-31. It shows a particularly upsetting application of BCNF where all candidate keys are separated into separate tables. A candidate key is any field that potentially can be used as a primary key (unique identifier) for the original entity, in this case a customer (the CUSTOMER table). Applying this type of normalization in a commercial environment would result in incredibly poor performance and is more of a mathematical nicety than a commercial necessity.
Figure 6-2: Denormalizing separation of candidate keys into separate tables
Once again, Figure 6-3 shows another application of the reversal of BCNF. Application of BCNF in Figure 4-32 and Figure 4-33 created three tables, each with unique combinations of unique values from the PROJECTS table on the left. Accessing unique records in the PROJECTS table can be handled more effectively with application coding, without the downside of creating too much granularity in table structures.
[Figure 6-2 diagram. Denormalized table: Customer (customer_id, customer_name, address, phone, fax, exchange, ticker, balance_outstanding, last_date_activity, days_credit). Normalized tables: Customer (customer_id, balance_outstanding, last_activity, days_credit); Customer_Stock_Ticker (customer_id (FK), exchange, ticker); Customer_Phone (customer_id (FK), phone); Customer_Address (customer_id (FK), address); Customer_Email (customer_id (FK), email); Customer_Fax (customer_id (FK), fax); Customer_Name (customer_id (FK), customer_name). Transform label: "Denormalize BCNF".]
Figure 6-3: Denormalization of candidate keys, created for the sake of uniqueness in tables.
Figure 4-34, Figure 4-35, Figure 4-36, and Figure 4-38 show a typical 4NF transformation where multiple valued lists in individual fields are separated out into separate tables. The reversal of this type of normalization is shown in Figure 6-4.
Figure 6-4: Denormalization of multiple valued lists
The problem with denormalizing the structure shown in Figure 6-4 is that the relationships between the EMPLOYEE and SKILL tables, plus the EMPLOYEE and CERTIFICATIONS tables, are many-to-many, and not one-to-many. Even in a denormalized state, each EMPLOYEE record must have some kind of collection of SKILLS and CERTIFICATIONS values. A better solution might be a combination of collection arrays in the EMPLOYEE table, and 2NF static tables for skills and certifications, as shown in Figure 6-5.
[Figure 6-4 diagram. Denormalized table: Employee (employee, skills, certification). Normalized tables: Employee_Skill (employee (FK), skill); Employee_Certification (employee (FK), certification). Transform label: "Denormalize 4th NF".]
It is interesting to note that the relational database modeling tool, ERWin, would not allow the MANAGER table to have more than the MANAGER field in its primary key. For 5NF, the MANAGER table could contain either the PROJECT or EMPLOYEE field as a subset part of the primary key. ERWin perhaps “thinks” that 5NF in this case is excessive, useless, or invalid.
[Figure 6-3 diagram. Denormalized table: Projects (project, manager, employee). Normalized tables: Manager (manager); Employee (employee, manager (FK)); Project (project, manager (FK)). Transform label: "Denormalize BCNF".]
Figure 6-5: Denormalization of multiple valued lists using collections and 2NF.
Figure 4-39 to Figure 4-42 show a 5NF transformation. As already noted, ERWin does not appear to allow construction of 5NF table structures of this nature. The reason is suspect! Once again, as shown in Figure 6-6, application of this type of normalization is overkill. It is better to place this type of layering into application coding, leaving the EMPLOYEE table as it is, shown in the upper left of the diagram in Figure 6-6.
Figure 6-6: Denormalization of 5NF cyclic dependencies
[Figure 6-6 diagram. Denormalized table: Employee (project, employee, manager). Normalized tables: Project_Manager (project, manager); Manager_Employee (manager, employee); Project_Employee (project, employee). Transform label: "Denormalize 5th NF".]
[Figure 6-5 diagram. Original table: Employee (employee, skills, certification). Resulting tables: Employee (employee, skills, certifications); Skill (skill_id, skill); Certification (certification_id, certification). Transform label: "2nd NF plus transform + collection array". Annotation: "Object-relational database collection arrays of SKILLS and CERTIFICATION".]
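The idea behind Figure 6-5 can be mimicked in plain Python as a rough analogy: static lookup tables for skills and certifications, plus collections of identifiers held directly on each employee record. The sample values and the use of in-memory dictionaries are assumptions made for illustration, and an object-relational database would declare the collections in its own type system.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Static 2NF lookup tables keyed by identifier (invented sample data).
SKILL: Dict[int, str] = {1: "SQL", 2: "Data modeling"}
CERTIFICATION: Dict[int, str] = {1: "Database administration"}


@dataclass
class Employee:
    employee: str
    # Collection arrays: each holds identifiers from a static table.
    skills: List[int] = field(default_factory=list)
    certifications: List[int] = field(default_factory=list)


emp = Employee("Joe", skills=[1, 2], certifications=[1])

# Resolve the collections against the static tables, with no join tables.
skill_names = [SKILL[skill_id] for skill_id in emp.skills]
certification_names = [CERTIFICATION[c] for c in emp.certifications]
print(skill_names, certification_names)
```

Each employee carries its own lists, while the descriptive text for a skill or certification is still stored only once.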
Denormalizing 3NF
The role of 3NF is to eliminate what are called transitive dependencies. A transitive dependency is where a field is not directly determined by the primary key, but indirectly determined by the primary key, through another field.
Most of the Normal Form layers beyond 3NF are often impractical in a commercial environment because applications can often do better at that level. What happens in reality is that 3NF occupies a gray area, fitting in between what should not be done in the database model (beyond 3NF), and what should be done in the database model (1NF and 2NF).
There are a number of different ways of interpreting 3NF, as shown in Figure 4-21, Figure 4-23, Figure 4-24, and Figure 4-25. All of these example interpretations of 3NF are completely different. Figure 6-7 shows the denormalization of a many-to-many join resolution table. As a general rule, a many-to-many join resolution table is usually required by applications when it can be specifically named, as for the ASSIGNMENT table shown in Figure 6-7. If it were nonsensical to call the new table ASSIGNMENT, and it were called something such as EMPLOYEE_TASK instead, chances are that the extra table is unnecessary. Quite often these types of tables are created without forethought as to application requirements. If a table like this is not essential to application requirements, it is probably unnecessary. The result of too many new tables is more tables in joins and slower queries.
Figure 6-7: Denormalize unnecessary 3NF many-to-many join resolution tables
[Figure 6-7 diagram. Denormalized tables: Employee (employee); Task (task). Normalized tables: Employee (employee); Task (task); Assignment (employee (FK), task (FK)). Transform label: "Denormalize 3rd NF".]
Figure 6-8 shows another version of 3NF where common fields are extracted into a new table. Once again, this type of normalization is quite often more for mathematical precision and clarity, and quite contrary to commercial performance requirements. Of course, there is still a transitive dependency in the new FOREIGN_EXCHANGE link table itself, because EXCHANGE_RATE depends on CURRENCY, which in turn depends on CURRENCY_CODE. Normalizing further would complicate the structure even more than it is already.
Figure 6-8: Denormalization of 3NF amalgamated fields into an extra table
Figure 6-9 shows a classic 3NF transitive dependency resolution, or the creation of a new table. The 3NF transformation provides mathematical precision; however, its practical commercial value is dubious because a new table is created, potentially containing a very small number of fields and records. The benefit will very likely be severely outweighed by the loss in performance, as a result of bigger joins in queries.
[Figure 6-8 diagram. Denormalized tables: Customer (customer, currency_code, currency, exchange_rate, address); Supplier (supplier, currency_code, currency, exchange_rate, address). Normalized tables: Customer (customer_id, currency_code (FK), address); Supplier (customer_id, currency_code (FK), address); Foreign Exchange (currency_code, currency, exchange_rate). Transform label: "Denormalize 3rd NF". Annotation: "Currency data common to both".]
Figure 6-9: Denormalization of 3NF transitive dependence resolution table.
Figure 6-10 shows a 3NF transformation removing a total value that is calculated from other fields on the same table. The value of including the total amount on each record, alongside the elements of the expression, is determined by how much the total value is used at the application level. If the constituents of the totaling expression are not required, perhaps only the total value should be stored. Again, this is a matter to be decided only from the perspective of application requirements.
Figure 6-10: Denormalization of 3NF calculated fields
[Figure 6-10 diagram. Denormalized table: Stock (stock, description, min, max, qtyonhand, price, totalvalue). Normalized table: Stock (stock, description, min, max, qtyonhand, price). Transform label: "Denormalize 3rd NF". Annotation: "TOTALVALUE dependent on QTYONHAND and PRICE".]
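The trade-off for the calculated TOTALVALUE field can be sketched with Python's sqlite3 module. The sample stock rows are invented, and the single UPDATE stands in for whatever mechanism an application would use to keep the stored total current.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (stock TEXT PRIMARY KEY, qtyonhand INTEGER, price REAL);
    INSERT INTO stock VALUES ('widget', 10, 2.5), ('gadget', 4, 20.0);
""")

# 3NF form: the total is derived from QTYONHAND and PRICE at query time.
computed = conn.execute(
    "SELECT stock, qtyonhand * price FROM stock ORDER BY stock"
).fetchall()

# Denormalized form: the total is stored on each record and read directly.
conn.executescript("""
    ALTER TABLE stock ADD COLUMN totalvalue REAL;
    UPDATE stock SET totalvalue = qtyonhand * price;
""")
stored = conn.execute(
    "SELECT stock, totalvalue FROM stock ORDER BY stock"
).fetchall()

print(computed == stored)
```

Storing the total avoids recalculating it on every read, at the cost of updating it whenever QTYONHAND or PRICE changes.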
[Figure 6-9 diagram. Denormalized table: Employee (employee, department, city). Normalized tables: Employee (employee, department (FK)); Department (department, city). Transform label: "Denormalize 3rd NF". Annotations: (1) City depends on department. (2) Department depends on employee. (3) Thus city is indirectly, or transitively, dependent on employee.]
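The transitive dependency in Figure 6-9 (employee determines department, department determines city) can be illustrated with Python's sqlite3 module. The sample rows and the table name employee_denorm are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF form: CITY is stored once per DEPARTMENT.
    CREATE TABLE department (department TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE employee (
        employee TEXT PRIMARY KEY,
        department TEXT REFERENCES department(department)
    );
    INSERT INTO department VALUES ('Sales', 'Boston'), ('IT', 'Denver');
    INSERT INTO employee VALUES ('Ann', 'Sales'), ('Bob', 'IT'), ('Cy', 'Sales');

    -- Denormalized form: CITY is repeated on every EMPLOYEE record.
    CREATE TABLE employee_denorm AS
        SELECT e.employee, e.department, d.city
        FROM employee e
        JOIN department d ON d.department = e.department;
""")

# The 3NF structure needs a join to find an employee's city.
joined = conn.execute("""
    SELECT e.employee, d.city
    FROM employee e
    JOIN department d ON d.department = e.department
    ORDER BY e.employee
""").fetchall()

# The denormalized structure answers the same question from one table.
flat = conn.execute(
    "SELECT employee, city FROM employee_denorm ORDER BY employee"
).fetchall()

print(joined == flat)
```

The join disappears from the query, but the city value for a department is now duplicated across all of its employees.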
Denormalizing 2NF
The role of 2NF is to separate static data into separate tables, removing repeated static values from transactional tables. Figure 6-11 shows an example of over-application of 2NF. The lower right of the diagram shows an extreme of four tables, created from what is essentially a more-than-adequately normalized COMPANY table at the upper left of the diagram.
Figure 6-11: Denormalization of 2NF into a single static table
[Figure 6-11 diagram. Single static table: Company (company, address, phone, fax, email, classification, exchange, ticker). Two-table version: Company (company, listing (FK), address, phone, fax, email); Listing (listing, classification, exchange, ticker). Four-table version: Company (company, listing (FK), address, phone, fax, email); Listing (listing, exchange (FK), ticker); Exchange (exchange, classification (FK)); Classification (classification). Annotations: "Company is static data, too much normalization"; "Over normalization"; "Insanely over normalized".]
Denormalizing 1NF
Just don’t do it! Data warehouse fact tables can be interpreted as being in 0th Normal Form, but the connections to dimensions are 2NF. So, denormalization of 1NF is not advisable.
Try It Out Denormalize to 2NF
Figure 6-12 shows a highly normalized table structure representing bands, their released CDs, tracks on the CDs, ranks of tracks, charts the tracks are listed on, plus the genres and regions of the country those charts are located in.
1. The RANK and TRACK tables are one-to-one related (TRACK to RANK: one-to-zero or one). This implies a BCNF or 4NF transformation, zero or one meaning a track does not have to be ranked. Thus, a track’s rank can be NULL-valued. Push the RANK column back into the TRACK table and remove the RANK table.
2. The three tables BAND_ADDRESS, BAND_PHONE, and BAND_EMAIL were created because each prospective band attribute is a candidate primary key in itself. Reverse the BCNF transformation, pushing address, phone, and email details back into the BAND table.
3. The CHART, GENRE, and REGION tables are an absurd application of multiple layers of 2NF transformation, separating static information from what is effectively parent static information. Chart, genre, and region details can all be pushed back into the TRACK table.
Figure 6-12: Normalized chart toppers
[Figure 6-12 diagram. Tables as extracted: Band (band_id, name); Band_Address (band_id (FK), address); Band_Phone (band_id (FK), phone); Band_Email (band_id (FK), email); CD (listing, classification, exchange); Track (track_id, chart (FK), cd_id (FK), track, length); Rank (track_id (FK), rank); Chart (chart, genre (FK)); Genre (genre, region (FK)); Region (region).]
How It Works
Figure 6-13 shows what the tables should look like in 2NF.
Figure 6-13: Denormalized chart toppers
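Step 2 of the exercise, pushing address, phone, and email details back into the BAND table, might look like the following sketch using Python's sqlite3 module. The sample bands and the table name band_denorm are invented for illustration. A real migration would also drop the three detail tables and rename the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE band (band_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE band_address (
        band_id INTEGER PRIMARY KEY REFERENCES band(band_id), address TEXT);
    CREATE TABLE band_phone (
        band_id INTEGER PRIMARY KEY REFERENCES band(band_id), phone TEXT);
    CREATE TABLE band_email (
        band_id INTEGER PRIMARY KEY REFERENCES band(band_id), email TEXT);

    INSERT INTO band VALUES (1, 'The Examples'), (2, 'Sample Band');
    INSERT INTO band_address VALUES (1, '1 Main St');
    INSERT INTO band_phone VALUES (1, '555-0100');
    INSERT INTO band_email VALUES (1, 'a@example.com'), (2, 'b@example.com');

    -- Reverse the BCNF split: one BAND record carries all contact fields.
    CREATE TABLE band_denorm AS
        SELECT b.band_id, b.name, a.address, p.phone, e.email
        FROM band b
        LEFT JOIN band_address a ON a.band_id = b.band_id
        LEFT JOIN band_phone p ON p.band_id = b.band_id
        LEFT JOIN band_email e ON e.band_id = b.band_id;
""")

bands = conn.execute(
    "SELECT band_id, name, address, phone, email FROM band_denorm ORDER BY band_id"
).fetchall()
print(bands)
```

The outer joins keep bands that lack an address or phone number, leaving those fields NULL in the merged table.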
Denormalization Using Specialized Database Objects
Many databases have specialized database objects for certain types of tasks. Some specialized objects allow for physical copies of data, copying data into a denormalized form.
❑ Materialized views: Materialized views are allowed in many larger relational databases. These objects are commonly used in data warehouses for pre-calculated aggregation queries. Queries can be automatically switched to direct access of materialized views. The result is less I/O activity, because aggregated data stored in materialized views is accessed directly. Typically, aggregated materialized views contain far fewer records than the underlying tables, reducing I/O activity and thus increasing performance.
Views are not the same thing as materialized views. Views are overlays, not duplications of data, and they interfere with underlying source tables. Views often cause far more in the way of performance problems than the application design issues they might ease.
❑ Clusters: These objects allow physical copies of heavily accessed fields and tables in join queries, allowing for faster access to data with more precise I/O.
❑ Index-organized tables: A table can be constructed including both index and data fields in the same physical space. The table itself becomes both the index and the data because the table is constructed as a sorted index (usually as a BTree index), rather than just a heap or “pile” of unorganized “bits and pieces.”
[Figure 6-13 diagram. Tables: Band (band_id, name, address, phone, email); CD (cd_id, band_id (FK), title, length, tracks); Track (track_id, cd_id (FK), track, length, rank, region, genre, chart).]
❑ Temporary tables: Temporary tables can be used on a temporary basis, either for a connected session or for a period of time. Typically, temporary tables perform intermediary functions, helping to eliminate duplication of processing and reducing repetitive I/O activities.
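A pre-calculated aggregate held in a temporary table can be sketched with Python's sqlite3 module. SQLite has no materialized views, so a temporary table built with CREATE TEMP TABLE ... AS SELECT stands in for one here. The sale table and its rows are invented, and unlike a true materialized view the copy is not refreshed when the source data changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (region TEXT, amount REAL);
    INSERT INTO sale VALUES ('East', 10.0), ('East', 5.0), ('West', 7.5);

    -- Physical copy of the aggregated data, visible to this session only.
    CREATE TEMP TABLE sale_summary AS
        SELECT region, SUM(amount) AS total, COUNT(*) AS sales
        FROM sale
        GROUP BY region;
""")

# Later queries read the small summary instead of re-aggregating SALE.
summary = conn.execute(
    "SELECT region, total, sales FROM sale_summary ORDER BY region"
).fetchall()
print(summary)
```

The summary holds one record per region rather than one per sale, which is the source of the reduced I/O described above.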
Denormalization Tricks
There are many tricks to denormalizing data that are not reversals of the steps of normalization. These are some ideas to consider:
❑ Separate active and inactive data: Data can be separated into separate physical tables, namely active and inactive tables. This is a factor often missed: inactive (historical) data can sometimes occupy as much as thousands of times more space than active data. This can drastically decrease the performance of access to the most frequently needed data, the active data.
Separation of active and inactive data is the purpose of a data warehouse, with the data warehouse holding the inactive data.
❑ Copy fields between tables: Make copies of fields between tables not directly related to each other. This can help to avoid multiple table joins between two tables where other tables must be “passed through” to join the two desired tables. An example is shown in Figure 6-14, where the SUBJECT_ID field is duplicated into the EDITION table. The objective is to minimize the size of subsequent SQL code joins.
Figure 6-14: Denormalization by copying fields between tables
[Figure 6-14 diagram. Tables: Publisher (publisher_id, name); Author (author_id, name); Review (review_id, publication_id (FK), review_date, text); Subject (subject_id, parent_id, name); Publication (publication_id, subject_id (FK), author_id (FK), title); Edition (ISBN, publisher_id (FK), publication_id (FK), subject_id, print_date, pages, list_price, format, rank, ingram_units); CoAuthor (coauthor_id (FK), publication_id (FK)). Annotation: "Duplication" (the subject_id field in Edition).]
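The effect of duplicating SUBJECT_ID into the EDITION table can be sketched with Python's sqlite3 module. Only the fields needed for the joins are included, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE publication (
        publication_id INTEGER PRIMARY KEY,
        subject_id INTEGER REFERENCES subject(subject_id),
        title TEXT
    );
    -- EDITION carries a duplicated copy of SUBJECT_ID.
    CREATE TABLE edition (
        isbn TEXT PRIMARY KEY,
        publication_id INTEGER REFERENCES publication(publication_id),
        subject_id INTEGER
    );
    INSERT INTO subject VALUES (1, 'Databases');
    INSERT INTO publication VALUES (10, 1, 'Modeling Basics');
    INSERT INTO edition VALUES ('111', 10, 1);
""")

# Without the copy, PUBLICATION must be passed through to reach SUBJECT.
via_publication = conn.execute("""
    SELECT e.isbn, s.name
    FROM edition e
    JOIN publication p ON p.publication_id = e.publication_id
    JOIN subject s ON s.subject_id = p.subject_id
""").fetchall()

# With the copy, EDITION joins straight to SUBJECT.
direct = conn.execute("""
    SELECT e.isbn, s.name
    FROM edition e
    JOIN subject s ON s.subject_id = e.subject_id
""").fetchall()

print(via_publication == direct)
```

The second query touches one table fewer. The price is that the copied SUBJECT_ID must be kept consistent with the value held in PUBLICATION.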