❑ Western Union
So, the PAYMENT_METHODS field for a specific listing could be something like this:
Cashier’s Check, Western Union, Visa, MasterCard
This string is a comma-delimited list. A comma-delimited list is by definition a multi-valued set. A multi-valued set is thus a set, or a single item containing more than one possible value. 4NF demands that comma-delimited strings should be split up. In the case of an online auction house, it is likely that the PAYMENT_METHODS field would only be used for online display. Then again, the list could be split in applications. For example, the string value Visa determines that a specific type of credit card is acceptable, perhaps processing payment through an online credit card payment service for Visa credit cards. 4NF would change the OLTP database model in Figure 10-18 to that shown in Figure 10-20.
Tables shown in Figure 10-20:
Seller: seller_id, seller, popularity_rating, join_date, address, return_policy, international
Seller_Payment_Methods: seller_id (FK), payment_method
Seller_History: seller_history_id, buyer_id (FK), seller_id (FK), comment_date, comments
Buyer: buyer_id, buyer, popularity_rating, join_date, address
Buyer_History: buyer_history_id, seller_id (FK), buyer_id (FK), comment_date, comments
Category_Primary: primary_id, primary
Category_Secondary: secondary_id, primary_id (FK), secondary
Category_Tertiary: tertiary_id, secondary_id (FK), tertiary
Listing: listing#, seller_id (FK), tertiary_id (FK), secondary_id (FK), buyer_id (FK), description, image, start_date, listing_days, currency, starting_price, reserve_price, buy_now_price, number_of_bids
Bid: bidder_id (FK), listing# (FK), bid_price, bid_date
Figure 10-20: Applying 4NF to the OLTP database model.
The sensibility of the application of 4NF, as shown in Figure 10-20, depends on applications. Once again, increasing the number of tables in a database model leads to more tables in query joins. The more tables there are in query joins, the more performance is adversely affected. Using the 4NF application shown in Figure 10-20, a seller could allow four payment methods as follows:
Cashier’s Check, Western Union, Visa, MasterCard
That seller would have four records as shown in Figure 10-21.
Figure 10-21: Dividing a comma-delimited list into separate records using 4NF.
Reading SELLER records using the database model shown in Figure 10-20 would require a two-table join of the SELLER and SELLER_PAYMENT_METHODS tables. By contrast, without the 4NF application, as for the database model shown in Figure 10-18, only a single table would be read. Querying a single table is better and easier than a two-table join; however, two-table joins perform perfectly adequately, with no significant effect on performance, unless one of the tables has a huge number of records. The only problem with the database model structure in Figure 10-20 is that the SELLER_PAYMENT_METHODS table potentially has very few records for each SELLER record. Is there any point in dividing up multi-valued strings in this case? Splitting comma-delimited strings in programming languages for applications is one of the easiest things in the world, and is extremely unlikely to cause performance problems in applications. Doing this type of normalization at the database model level using 4NF, on this scale, is a little overzealous, to say the least!
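The two alternatives just compared can be sketched side by side. This is only an illustration, assuming Python's built-in sqlite3 module and a cut-down SELLER table; the helper name split_methods and the reduced field list are assumptions made for the example, not part of the chapter's model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Figure 10-18 style: one field holding a comma-delimited list.
    CREATE TABLE seller (
        seller_id INTEGER PRIMARY KEY,
        seller TEXT,
        payment_methods TEXT
    );
    INSERT INTO seller VALUES
        (1, 'Joe Soap', 'Cashier''s Check, Western Union, Visa, MasterCard');

    -- Figure 10-20 style: 4NF, one record per payment method.
    CREATE TABLE seller_payment_methods (
        seller_id INTEGER REFERENCES seller (seller_id),
        payment_method TEXT
    );
""")


def split_methods(value):
    """Split the comma-delimited string in the application (hypothetical helper)."""
    return [item.strip() for item in value.split(",")]


# Single-table read; the list is split in application code.
(raw,) = conn.execute(
    "SELECT payment_methods FROM seller WHERE seller_id = 1").fetchone()
from_string = split_methods(raw)

# Load the 4NF table from the same list, then read it with a two-table join.
conn.executemany(
    "INSERT INTO seller_payment_methods VALUES (1, ?)",
    [(method,) for method in from_string])
from_join = [row[0] for row in conn.execute("""
    SELECT spm.payment_method
    FROM seller s JOIN seller_payment_methods spm USING (seller_id)
    WHERE s.seller_id = 1
    ORDER BY spm.rowid
""")]

print(from_string)
print(from_join)
```

Both routes produce the same four payment methods; the 4NF route simply needs an extra table and a join to get there.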
Denormalizing 5NF
5NF can be used, and not necessarily should be used, to eliminate cyclic dependencies. A cyclic dependency is something that depends on one thing, such that the one thing is either directly or indirectly dependent upon itself. Thus, a cyclic dependency is a form of circular dependency, where three pairs result from splitting up a single table with a three-field composite primary key. For example, the three pairs could be field 1 with field 2, field 2 with field 3, and field 1 with field 3. In other words, the cyclic dependency means that everything is related to everything else, including itself. There is a combination or a permutation, which excludes repetitions. If the three tables are joined, again using a three-table join, the resulting records will be the same as those present in the original table. It is a stated requirement of the validity of 5NF that the post-transformation join must match the number of records for a query on the pre-transformation table. Effectively, 5NF is similar to 4NF, in that both attempt to minimize the number of fields in composite keys.
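The requirement that the three-table join must give back exactly the original records can be checked mechanically. The sketch below assumes Python's built-in sqlite3 module; the table names, field names, and sample categories are invented for illustration, and the sample data was chosen so that the decomposition happens to be valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table with a three-field composite primary key.
    CREATE TABLE category (
        primary_cat TEXT, secondary_cat TEXT, tertiary_cat TEXT,
        PRIMARY KEY (primary_cat, secondary_cat, tertiary_cat)
    );
    INSERT INTO category VALUES
        ('Music', 'Guitars', 'Electric'),
        ('Music', 'Guitars', 'Acoustic'),
        ('Music', 'Drums',   'Electric');

    -- The three pairs: field 1 with 2, field 1 with 3, field 2 with 3.
    CREATE TABLE primary_secondary AS
        SELECT DISTINCT primary_cat, secondary_cat FROM category;
    CREATE TABLE primary_tertiary AS
        SELECT DISTINCT primary_cat, tertiary_cat FROM category;
    CREATE TABLE secondary_tertiary AS
        SELECT DISTINCT secondary_cat, tertiary_cat FROM category;
""")

original = conn.execute(
    "SELECT * FROM category ORDER BY 1, 2, 3").fetchall()

# Three-table join of the pairs; every pair must agree on the shared fields.
rejoined = conn.execute("""
    SELECT ps.primary_cat, ps.secondary_cat, pt.tertiary_cat
    FROM primary_secondary ps
    JOIN primary_tertiary pt ON pt.primary_cat = ps.primary_cat
    JOIN secondary_tertiary st ON st.secondary_cat = ps.secondary_cat
                              AND st.tertiary_cat = pt.tertiary_cat
    ORDER BY 1, 2, 3
""").fetchall()

print(rejoined == original)
```

If the join returned extra records that were never in the original table, the 5NF split would not be valid for that data.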
Figure 10-18 has no composite primary keys, because surrogate keys are used. At this stage, using 5NF is thus a little pointless; however, take a quick look at Figure 10-5 (earlier in this chapter), where surrogate keys were not yet implemented into the online auction house OLTP database model. The structure of the category tables in that model, before and after applying 5NF, is shown in Figure 10-22.
Records shown in Figure 10-21 (SELLER_ID, PAYMENT_METHOD):
1, Cashier’s Check
1, Western Union
1, Visa
1, Mastercard
Tables shown in Figure 10-22, hierarchical structure (left):
Category_Primary: primary
Category_Secondary: primary (FK), secondary
Category_Tertiary: primary (FK), secondary (FK), tertiary
Tables shown in Figure 10-22, 5NF (right):
Primary_Secondary: primary, secondary
Primary_Tertiary: primary, tertiary
Secondary_Tertiary: secondary, tertiary
Figure 10-22: 5NF can help to break down composite primary keys.
Does the end justify the means? Commercially, probably not! As you can see in Figure 10-22, the 5NF implementation starts to look a little like the hierarchical structure shown on the left of Figure 10-22.
Case Study: Backtracking and Refining an OLTP Database Model
This is the part where you get to ignore the deep-layer normalization applied in the previous section, and go back to the OLTP database model shown in Figure 10-18. And, yes, the database model in Figure 10-18 can be denormalized.
Essentially, there are no rules or any kind of process with respect to performing denormalization. Denormalization is mostly common sense. In this case, common sense is the equivalent of experience. Figure 10-18 is repeated here again, in Figure 10-23, for convenience.
Tables shown in Figure 10-23:
Seller: seller_id, seller, popularity_rating, join_date, address, return_policy, international, payment_methods
Seller_History: seller_history_id, buyer_id (FK), seller_id (FK), comment_date, comments
Buyer: buyer_id, buyer, popularity_rating, join_date, address
Buyer_History: buyer_history_id, seller_id (FK), buyer_id (FK), comment_date, comments
Category_Primary: primary_id, primary
Category_Secondary: secondary_id, primary_id (FK), secondary
Category_Tertiary: tertiary_id, secondary_id (FK), tertiary
Listing: listing#, tertiary_id (FK), secondary_id (FK), buyer_id (FK), seller_id (FK), description, image, start_date, listing_days, currency, starting_price, reserve_price, buy_now_price, number_of_bids
Bid: buyer_id (FK), listing# (FK), bid_price, bid_date
Figure 10-23: The online auction house OLTP database model normalized to 3NF.
What can and should be denormalized in the database model shown in Figure 10-23?
❑ The three category tables should be merged into a single self-joining table. Not only does this make management of categories easier, it also allows any number of layers in the category hierarchy, rather than restricting it to the three layers of primary, secondary, and tertiary categories.
❑ Seller and buyer histories could benefit from being a single table, not only because the fields are the same but also because a seller can also be a buyer and vice versa. Merging the two tables could make group searches of historical information a little slower; however, proper indexing might even improve performance in general (for all applications). Also, because buyers can be sellers, and sellers can be buyers, it makes no logical sense to store historical records in two separate tables. If sellers and buyers are merged, it might be expedient to move fields exclusive to the SELLER table into a 4NF, one-to-one subset table, to remove NULL values from the merged table. These fields are the RETURN_POLICY, INTERNATIONAL, and PAYMENT_METHODS fields.
❑ Depending on the relative numbers of buyers, sellers, and buyer-sellers (those who do both buying and selling), it might even be expedient to merge the sellers and buyers into a single table, as well as merging histories. Once again, fields are largely the same. The number of buyer-sellers in operation might preempt the merge as well.
The resulting OLTP database model could look similar to that shown in Figure 10-24.
Tables shown in Figure 10-24:
User: user_id, name, popularity_rating, join_date, address
Seller: user_id (FK), return_policy, international, payment_methods
History: user_history_id, user_id (FK), comment_date, comments
Category: category_id, parent_id, category
Listing: listing#, category_id (FK), user_id (FK), description, image, start_date, listing_days, currency, starting_price, reserve_price, buy_now_price, number_of_bids, winning_price
Bid: listing# (FK), user_id (FK), bid_price, bid_date
Figure 10-24: Denormalizing the online auction house OLTP database model.
Denormalization is, in general, far more significant for data warehouse database models than it is for OLTP database models. One of the problems with predicting what and how to denormalize is that in the analysis and design phases of database modeling and design, denormalization is a little like a Shakespearian undiscovered country. If you don’t denormalize beyond 3NF, your system design could meet its maker. And then if you do denormalize an OLTP database model, you could kill the simplicity of the very structure you have just created.
In general, denormalization is not quantifiable because no one has really thought up a formal approach for it, like many have devised for normalization. Denormalization, therefore, might be somewhat akin to guesswork. Guesswork is always dangerous, but if analysis is all about expert subconscious knowledge through experience, don’t let the lack of formal methods in denormalization scare you away from it. The biggest problem with denormalization is that it requires extensive application knowledge. Typically, this kind of foresight is available only when a system has been analyzed, designed, implemented, and placed into production. Generally, when in production, any further database modeling changes are not possible. So, when hoping to denormalize a database model for efficiency and ease of use by developers, try to learn as much as possible about how applications use tables, in terms of record quantities, how many records are accessed at once on GUI screens, how large reports will be, and so on. And do that learning process as part of analysis and design. It might be impossible to rectify in production and even in development.
Denormalization requires as much applications knowledge as possible.
Example Application Queries
The following state the obvious:
❑ The database model is the backbone of any application that uses data of any kind. That data is most likely stored in some kind of database. That database is likely to be a relational database of one form or another.
❑ Better designed database models tend to lend themselves to clearer and easier construction of SQL code queries. The ease of construction of queries, and their ultimate performance, depends largely on the soundness of the underlying database model. The database model is the backbone of applications.
The better the database model design, the better the queries that are produced, the better applications will ultimately be, and the happier your end-users will be. An application that is easily built by programmers is often not easily usable by end-users. Similar to database modelers, programmers often write code for themselves, in an elegant fashion. Elegant solutions are not always going to produce the most end-user happy-smiley face result. Applications must run fast enough. Applications must not encourage end-users to become frustrated. Do not let elegant modeling and coding ultimately drive away your customers. No customer, no business. No business, no company. No company, no job! And, if your end-user happens to be your boss, well, you know the rest.
So, you must be able to build good queries. The soundness of those queries, and ultimately applications, is dependent upon the soundness of the underlying database model. A highly normalized database model is likely to be unsound because there are too many tables, too much complexity, and too many tables in joins. Lots of tables and lots of complex inter-table relationships confuse people, especially the query programmers. Denormalize for successful applications. And preferably perform denormalization of database models in the analysis and design phases, not after the fact in production. Changing database model structure for production systems is generally problematic, extremely expensive, and disruptive to end-users (applications go down for maintenance). After all, the objective is to turn a profit. This means keeping your end-users interested. If the database is an in-house thing, you need to keep your job. Denormalize, denormalize, denormalize!
Once again, the efficiency of queries comes down to how many tables are joined in a single query. Figure 10-23 shows the original normalized OLTP database model for the online auction house. In Figure 10-24, the following denormalization has occurred:
❑ Categories were denormalized from three tables down to one. A query against the three category tables would look similar to this:
SELECT *
FROM CATEGORY_PRIMARY CP JOIN CATEGORY_SECONDARY CS USING (PRIMARY_ID)
JOIN CATEGORY_TERTIARY CT USING (SECONDARY_ID);
A query against the single category table could be constructed as follows:
SELECT * FROM CATEGORY;
If the single category table was required to display a hierarchy, a self join could be used, matching each parent record to its child records (some database engines have special syntax for single-table hierarchical queries):
SELECT P.CATEGORY, C.CATEGORY
FROM CATEGORY P JOIN CATEGORY C ON (P.CATEGORY_ID = C.PARENT_ID)
ORDER BY P.CATEGORY, C.CATEGORY;
Denormalizing categories in this way is probably a very sensible idea for the OLTP database model of the online auction house.
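A plain self join only reaches one level down. As a sketch of walking a hierarchy of any depth, the example below uses a recursive common table expression, assuming Python's built-in sqlite3 module (SQLite supports WITH RECURSIVE; other engines have their own hierarchical syntax). The sample categories and the path and depth column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single self-joining table, as in Figure 10-24.
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES category (category_id),
        category TEXT
    );
    -- Four layers, one more than primary, secondary, and tertiary.
    INSERT INTO category VALUES
        (1, NULL, 'Music'),
        (2, 1, 'Guitars'),
        (3, 2, 'Electric'),
        (4, 3, 'Left-Handed');
""")

rows = conn.execute("""
    WITH RECURSIVE tree (category_id, path, depth) AS (
        -- Start at the top of the hierarchy (no parent).
        SELECT category_id, category, 1
        FROM category WHERE parent_id IS NULL
        UNION ALL
        -- Add each child below a category already in the tree.
        SELECT c.category_id, t.path || ' > ' || c.category, t.depth + 1
        FROM category c JOIN tree t ON c.parent_id = t.category_id
    )
    SELECT path, depth FROM tree ORDER BY depth
""").fetchall()

for path, depth in rows:
    print(depth, path)
```

Adding a fifth or sixth layer needs only more records, not more tables, which is the advantage claimed for the single category table.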
❑ The SELLER and BUYER tables were merged into the USER table, and a 4NF one-to-one subset SELLER table was used to separate seller details from buyers. Using the normalized database model in Figure 10-23 to find all listings for a specific seller, the following query applies (joining two tables and applying a WHERE clause to the SELLER table):
SELECT * FROM SELLER S JOIN LISTING L USING (SELLER_ID) WHERE S.SELLER = 'Joe Soap';
Once again, using the normalized database model in Figure 10-23, the following query finds all existing bids, on all listings, for a particular buyer (joining three tables and applying a WHERE clause to the BUYER table). The join to BUYER uses an ON clause because BUYER_ID appears in both the LISTING and BID tables:
SELECT * FROM LISTING L JOIN BID BID USING (LISTING#) JOIN BUYER B ON (B.BUYER_ID = BID.BUYER_ID)
WHERE B.BUYER = 'Jim Smith';
Using the denormalized database model in Figure 10-24, this query finds all listings for a specific seller (the SELLER and USER tables are actually normalized):
SELECT * FROM USER U JOIN SELLER S USING (USER_ID) JOIN LISTING L USING (USER_ID)
WHERE U.NAME = 'Joe Soap';
This query is actually worse for the denormalized database model because it joins three tables instead of two. And again, using the denormalized database model in Figure 10-24, the following query finds all existing bids on all listings for a particular buyer:
SELECT * FROM LISTING L JOIN BID BID USING (LISTING#) JOIN USER U ON (U.USER_ID = BID.USER_ID)
WHERE U.NAME = 'Jim Smith'
AND U.USER_ID NOT IN (SELECT USER_ID FROM SELLER);
This query is also worse for the denormalized version because not only does it join three tables, but it additionally performs a semi-join (and an anti semi-join at that). An anti semi-join is a negative search. A negative search tries to find what is not in a table, and therefore may have to read all records in that table. Indexes are often of little help for this kind of search and, thus, a full table scan can result. Full table scans can be I/O heavy for larger tables.
It should be clear that denormalizing the BUYER and SELLER tables into the USER and normalized SELLER tables (as shown in Figure 10-24) is probably quite a bad idea! At least it appears that way from the perspective of query use; however, an extra field could be added to the USER table to differentiate between sellers and buyers, in relation to bids and listings (a person performing both buying and selling will appear in both buyer and seller data sets). The extra field could be used as a base for very efficient indexing or even something as advanced as partitioning. Partitioning physically breaks tables into separate physical chunks. If the USER table were partitioned between buyers and sellers, reading only sellers from the USER table would only perform I/O against a partition containing sellers (not buyers). It is still not really very sensible to denormalize the BUYER and SELLER tables into the USER table.
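The extra-field idea can be sketched as follows, assuming Python's built-in sqlite3 module. The field name is_seller and the index name are hypothetical, the table is called users here only to avoid clashing with USER as a reserved word in some engines, and the sample return policy is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name TEXT,
        is_seller INTEGER NOT NULL DEFAULT 0  -- hypothetical extra field
    );
    CREATE TABLE seller (
        user_id INTEGER PRIMARY KEY REFERENCES users (user_id),
        return_policy TEXT
    );
    -- Hypothetical index based on the extra field.
    CREATE INDEX idx_users_is_seller ON users (is_seller, name);

    INSERT INTO users VALUES (1, 'Joe Soap', 1), (2, 'Jim Smith', 0);
    INSERT INTO seller VALUES (1, '30 days');
""")

# Anti semi-join, as in the text: users with no record in SELLER.
anti_join = conn.execute("""
    SELECT name FROM users
    WHERE user_id NOT IN (SELECT user_id FROM seller)
    ORDER BY name
""").fetchall()

# The same question answered from the extra field alone.
flagged = conn.execute(
    "SELECT name FROM users WHERE is_seller = 0 ORDER BY name").fetchall()

print(anti_join, flagged)
```

The trade-off is that the extra field duplicates what the SELLER table already says, so applications must keep the two in step whenever a user starts or stops selling.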
❑ The SELLER_HISTORY and BUYER_HISTORY tables were merged into the HISTORY table, as shown in Figure 10-24. Executing a query using the normalized database model in Figure 10-23 to find the history for a specific seller could be performed using a query like the following:
SELECT *
FROM SELLER S JOIN SELLER_HISTORY SH USING (SELLER_ID)
WHERE S.SELLER = 'Joe Soap';
Finding a history for a specific seller using the denormalized database model shown in Figure 10-24 could use a query like this:
SELECT *
FROM USER U JOIN HISTORY H USING (USER_ID)
WHERE U.NAME = 'Joe Soap'
AND U.USER_ID IN (SELECT USER_ID FROM SELLER);
Once again, as with denormalization of the SELLER and BUYER tables into the USER table, denormalizing the SELLER_HISTORY and BUYER_HISTORY tables into the HISTORY table might actually be a bad idea. The first query above joins two tables. The second query also joins two tables, but also executes a semi-join. This semi-join is not as bad as for denormalization of users, which used an anti semi-join; however, this is still effectively a three-way join.
So, you have discovered that perhaps the most effective, descriptive, and potentially efficient database model for the OLTP online auction house is as shown in Figure 10-25. The only denormalization making sense at this stage is to merge the three separate category hierarchy tables into the single self-joining CATEGORY table. Buyer, seller, and history information is probably best left in separate tables.
Denormalization is rarely effective for OLTP database models for anything between 1NF and 3NF; however (and this is very important), remember that previously in this chapter you read about layers of normalization beyond 3NF (BCNF, 4NF, 5NF, and DKNF). None of these intensive Normal Forms have so far been applied to the OLTP database model for the online auction house. As of Figure 10-23, you began to attempt to backtrack on previously performed normalization, by denormalizing. You began with the 3NF database model as shown in Figure 10-23. In other words, any normalization beyond 3NF was simply ignored, having already been proved to be completely superfluous and over the top for this particular database model.
Tables shown in Figure 10-25:
Seller: seller_id, seller, popularity_rating, join_date, address, return_policy, international, payment_methods
Seller_History: seller_history_id, buyer_id (FK), seller_id (FK), comment_date, comments
Buyer: buyer_id, buyer, popularity_rating, join_date, address
Buyer_History: buyer_history_id, seller_id (FK), buyer_id (FK), comment_date, comments
Category: category_id, parent_id, category
Listing: listing#, category_id (FK), buyer_id (FK), seller_id (FK), description, image, start_date, listing_days, currency, starting_price, reserve_price, buy_now_price, number_of_bids, winning_price
Bid: bidder_id (FK), listing# (FK), bid_price, bid_date
Figure 10-25: The online auction house OLTP database model, 3NF, partially denormalized.
The only obvious issue still with the database model as shown in Figure 10-25 is that the BUYER_HISTORY and SELLER_HISTORY tables have both BUYER_ID and SELLER_ID fields. In other words, both history tables are linked (related) to both of the BUYER and SELLER tables. It therefore could make perfect sense to denormalize not only the category tables, but the history tables as well, leaving the BUYER and SELLER tables normalized and separate, as shown in Figure 10-26.
Figure 10-26: The online auction house OLTP database model, 3NF, slightly further denormalized.
The newly denormalized HISTORY table can be accessed efficiently by splitting the history records based on buyers and sellers, using indexing or something airy-fairy and sophisticated like physical partitioning.
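The indexing option might look something like the sketch below, which assumes Python's built-in sqlite3 module. The index names and sample records are invented for illustration, and foreign key constraints are left out to keep the example short. Whether a given engine actually uses each index depends on its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Merged HISTORY table, as in Figure 10-26.
    CREATE TABLE history (
        history_id INTEGER PRIMARY KEY,
        seller_id INTEGER,
        buyer_id INTEGER,
        comment_date TEXT,
        comments TEXT
    );
    -- Hypothetical indexes: one per access path into the merged table.
    CREATE INDEX idx_history_seller ON history (seller_id);
    CREATE INDEX idx_history_buyer ON history (buyer_id);

    -- Invented sample records.
    INSERT INTO history VALUES
        (1, 10, 20, '2005-01-15', 'Fast shipping'),
        (2, 10, 21, '2005-02-01', 'Item as described'),
        (3, 11, 20, '2005-02-10', 'Prompt payment');
""")

# History as seen from the seller side.
seller_history = conn.execute(
    "SELECT history_id FROM history WHERE seller_id = 10 ORDER BY history_id"
).fetchall()

# History as seen from the buyer side, read from the same table.
buyer_history = conn.execute(
    "SELECT history_id FROM history WHERE buyer_id = 20 ORDER BY history_id"
).fetchall()

print(seller_history, buyer_history)
```

One table serves both views of the history, and each index gives the database the option of reading only the records for one seller or one buyer.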
Try It Out: Designing an OLTP Database Model
Create a simple design-level OLTP database model for a Web site. This Web site allows creation of free classified ads for musicians and bands. Use the simple OLTP database model presented in Figure 10-27 (copied from Figure 9-19, in Chapter 9). Here’s a basic approach:
1. Create surrogate primary keys for all tables.
2. Enforce referential integrity using appropriate primary keys, foreign keys, and inter-table relationships.
3. Refine inter-table relationships properly, according to requirements, as identifying or non-identifying relationships, and also be precise about whether each crow’s foot allows zero
Tables shown in Figure 10-26:
Seller: seller_id, seller, popularity_rating, join_date, address, return_policy, international, payment_methods
Buyer: buyer_id, buyer, popularity_rating, join_date, address
History: history_id, seller_id (FK), buyer_id (FK), comment_date, comments
Category: category_id, parent_id, category
Listing: listing#, category_id (FK), buyer_id (FK), seller_id (FK), description, image, start_date, listing_days, currency, starting_price, reserve_price, buy_now_price, number_of_bids, winning_price
Bid: bidder_id (FK), listing# (FK), bid_price, bid_date