Beginning Database Design - P17


❑ Western Union

So, the PAYMENT_METHODS field for a specific listing could be something like this:

Cashier’s Check, Western Union, Visa, MasterCard

This string is a comma-delimited list. A comma-delimited list is by definition a multi-valued set. A multi-valued set is thus a set, or a single item containing more than one possible value. 4NF demands that comma-delimited strings be split up. In the case of an online auction house, it is likely that the PAYMENT_METHODS field would only be used for online display. Then again, the list could be split in applications. For example, the string value Visa determines that a specific type of credit card is acceptable, perhaps processing payment through an online credit card payment service for Visa credit cards. 4NF would change the OLTP database model in Figure 10-18 to that shown in Figure 10-20.

Figure 10-20: Applying 4NF to the OLTP database model

[Figure 10-20 diagram; tables and fields as recoverable from the original figure]
Seller: seller_id, seller, popularity_rating, join_date, address, return_policy, international
Seller_Payment_Methods: seller_id (FK), payment_method
Seller_History: seller_history_id, buyer_id (FK), seller_id (FK), comment_date, comments
Buyer: buyer_id, buyer, popularity_rating, join_date, address
Buyer_History: buyer_history_id, seller_id (FK), buyer_id (FK), comment_date, comments
Category_Primary: primary_id, primary
Category_Secondary: secondary_id, primary_id (FK), secondary
Category_Tertiary: tertiary_id, secondary_id (FK), tertiary
Listing: listing#, seller_id (FK), tertiary_id (FK), secondary_id (FK), buyer_id (FK), description, image, start_date, listing_days, currency, starting_price, reserve_price, buy_now_price, number_of_bids
Bid: bidder_id (FK), listing# (FK), bid_price, bid_date

Whether applying 4NF as shown in Figure 10-20 is sensible depends on applications. Once again, increasing the number of tables in a database model leads to more tables in query joins. The more tables there are in query joins, the more performance is adversely affected. Using the 4NF application shown in Figure 10-20, a seller could allow four payment methods as follows:

Cashier’s Check, Western Union, Visa, MasterCard

That seller would have four records, as shown in Figure 10-21.

Figure 10-21: Dividing a comma delimited list into separate records using 4NF

Reading SELLER records using the database model shown in Figure 10-20 would require a two-table join of the SELLER and SELLER_PAYMENT_METHODS tables. By contrast, without the 4NF application, as for the database model shown in Figure 10-18, only a single table would be read. Querying a single table is better and easier than a two-table join; however, joins between a few tables perform perfectly adequately, with no significant effect on performance, unless one of the tables has a huge number of records. The only problem with the database model structure in Figure 10-20 is that the SELLER_PAYMENT_METHODS table potentially has very few records for each SELLER record. Is there any point in dividing up multi-valued strings in this case? Splitting comma-delimited strings in programming languages for applications is one of the easiest things in the world, and is extremely unlikely to cause performance problems in applications. Doing this type of normalization at the database model level using 4NF, on this scale, is a little overzealous, to say the least!
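The two alternatives just described can be compared side by side. The following is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for whatever database engine is actually in use; the function names and the cut-down table definitions are hypothetical and keep only the fields needed for the comparison.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database holding both designs (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE seller (
            seller_id INTEGER PRIMARY KEY,
            seller TEXT,
            payment_methods TEXT   -- Figure 10-18 style: one comma-delimited string
        );
        CREATE TABLE seller_payment_methods (   -- Figure 10-20 style: one record per value
            seller_id INTEGER REFERENCES seller (seller_id),
            payment_method TEXT
        );
    """)
    methods = "Cashier's Check, Western Union, Visa, MasterCard"
    conn.execute("INSERT INTO seller VALUES (1, 'Joe Soap', ?)", (methods,))
    conn.executemany(
        "INSERT INTO seller_payment_methods VALUES (1, ?)",
        [(m.strip(),) for m in methods.split(",")],
    )
    return conn


def methods_via_join(conn, seller_id):
    """4NF design: a two-table join returns one record per payment method."""
    rows = conn.execute(
        "SELECT pm.payment_method "
        "FROM seller s JOIN seller_payment_methods pm USING (seller_id) "
        "WHERE s.seller_id = ? ORDER BY pm.rowid",
        (seller_id,),
    )
    return [r[0] for r in rows]


def methods_via_split(conn, seller_id):
    """Single-table design: read one field and split the string in the application."""
    row = conn.execute(
        "SELECT payment_methods FROM seller WHERE seller_id = ?", (seller_id,)
    ).fetchone()
    return [m.strip() for m in row[0].split(",")]
```

Both functions return the same four payment methods; the difference is only where the splitting work happens, in the database model or in application code.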

Denormalizing 5NF

5NF can be used (not necessarily should be used) to eliminate cyclic dependencies. A cyclic dependency is something that depends on one thing, such that the one thing is either directly or indirectly dependent upon itself. Thus, a cyclic dependency is a form of circular dependency, where three pairs result from a single table with a three-field composite primary key. For example, the three pairs could be field 1 with field 2, field 2 with field 3, and field 1 with field 3. In other words, the cyclic dependency means that everything is related to everything else, including itself. There is a combination or a permutation, which excludes repetitions. If the three resulting tables are joined again, using a three-table join, the resulting records will be the same as those present in the original table. It is a stated requirement of the validity of 5NF that the post-transformation join must match the number of records for a query on the pre-transformation table. Effectively, 5NF is similar to 4NF, in that both attempt to minimize the number of fields in composite keys.
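The requirement that the three-table join must reproduce the original records can be checked mechanically. The sketch below is an illustration only, assuming Python's sqlite3 module; the table and field names (category, prim, sec, tert) and the sample category values are invented for the example.

```python
import sqlite3


def decompose_and_rejoin(rows):
    """Split a three-field table into its three pairs, then join the pairs back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE category (prim TEXT, sec TEXT, tert TEXT)")
    conn.executemany("INSERT INTO category VALUES (?, ?, ?)", rows)
    conn.executescript("""
        CREATE TABLE primary_secondary AS SELECT DISTINCT prim, sec FROM category;
        CREATE TABLE primary_tertiary AS SELECT DISTINCT prim, tert FROM category;
        CREATE TABLE secondary_tertiary AS SELECT DISTINCT sec, tert FROM category;
    """)
    original = conn.execute(
        "SELECT DISTINCT prim, sec, tert FROM category ORDER BY 1, 2, 3"
    ).fetchall()
    rejoined = conn.execute("""
        SELECT ps.prim, ps.sec, pt.tert
        FROM primary_secondary ps
        JOIN primary_tertiary pt ON pt.prim = ps.prim
        JOIN secondary_tertiary st ON st.sec = ps.sec AND st.tert = pt.tert
        ORDER BY 1, 2, 3
    """).fetchall()
    return original, rejoined


# Hypothetical sample data: a small category hierarchy.
SAMPLE = [
    ("Music", "Instruments", "Guitars"),
    ("Music", "Instruments", "Drums"),
    ("Music", "Recordings", "CDs"),
]
```

For SAMPLE the rejoined records match the original three records exactly, so the decomposition is valid. Data without the cyclic dependency fails the check: the records (A1, B1, C2), (A1, B2, C1), and (A2, B1, C1) rejoin to four records, one of which never existed in the original table.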

Figure 10-18 has no composite primary keys, because surrogate keys are used. At this stage, using 5NF is thus a little pointless; however, take a quick look at Figure 10-5 (earlier in this chapter) where surrogate keys were not yet implemented into the online auction house OLTP database model. The structure of the

[Figure 10-21 data]
SELLER_ID  PAYMENT_METHOD
1          Cashier’s Check
1          Western Union
1          Visa
1          Mastercard

Figure 10-22: 5NF can help to break down composite primary keys.

Does the end justify the means? Commercially, probably not! As you can see in Figure 10-22, the 5NF implementation starts to look a little like the hierarchical structure shown on the left of Figure 10-22.

Case Study: Backtracking and Refining an OLTP Database Model

This is the part where you get to ignore the deep-layer normalization applied in the previous section, and go back to the OLTP database model shown in Figure 10-18. And, yes, the database model in Figure 10-18 can be denormalized.

Essentially, there are no rules or any kind of process with respect to performing denormalization. Denormalization is mostly common sense. In this case, common sense is the equivalent of experience. Figure 10-18 is repeated here again, in Figure 10-23, for convenience.

[Figure 10-22 diagram; tables and fields as recoverable from the original figure]
Before 5NF (hierarchy):
Category_Primary: primary
Category_Secondary: primary (FK), secondary
Category_Tertiary: primary (FK), secondary (FK), tertiary
After 5NF:
Primary_Secondary: primary, secondary
Primary_Tertiary: primary, tertiary
Secondary_Tertiary: secondary, tertiary

Figure 10-23: The online auction house OLTP database model normalized to 3NF.

What can and should be denormalized in the database model shown in Figure 10-23?

❑ The three category tables should be merged into a single self-joining table. Not only does this make management of categories easier, it also allows any number of layers in the category hierarchy, rather than restricting it to the three layers of primary, secondary, and tertiary categories.

❑ Seller and buyer histories could benefit from being a single table, not only because fields are the same but also because a seller can also be a buyer and vice versa. Merging the two tables could make group searches of historical information a little slower; however, proper indexing might even improve performance in general (for all applications). Also, because buyers can be sellers, and sellers can be buyers, it makes no logical sense to store historical records in two separate tables. If sellers and buyers are merged, it might be expedient to move fields exclusive to the SELLER table into a 4NF, one-to-one subset table, to remove NULL values from the merged table. These fields are the RETURN_POLICY, INTERNATIONAL, and PAYMENT_METHODS fields.

❑ Depending on the relative numbers of buyers, sellers, and buyer-sellers (those who do both buying and selling), it might be expedient to even merge the sellers and buyers into a single table, as well as merging histories. Once again, fields are largely the same. The number of buyer-sellers in operation might preempt the merge as well.

The resulting OLTP database model could look similar to that shown in Figure 10-24.

[Figure 10-23 diagram; only part of the figure survives, tables and fields as recoverable]
Seller: seller_id, seller, popularity_rating, join_date, address, return_policy, international, payment_methods
Seller_History: (fields not recoverable)
Category_Primary: primary_id, primary
Category_Secondary: secondary_id, primary_id (FK), secondary
Category_Tertiary: tertiary_id, secondary_id (FK), tertiary
Listing: listing#, tertiary_id (FK), secondary_id (FK), buyer_id (FK), seller_id (FK), description, image, start_date, listing_days, currency, starting_price, reserve_price, buy_now_price, number_of_bids
Bid: buyer_id (FK), listing# (FK), bid_price, bid_date

Figure 10-24: Denormalizing the online auction house OLTP database model.

Denormalization is, in general, far more significant for data warehouse database models than it is for OLTP database models. One of the problems with predicting what and how to denormalize is that, in the analysis and design phases of database modeling and design, denormalization is a little like a Shakespearian undiscovered country. If you don’t denormalize beyond 3NF, your system design could meet its maker. And then if you do denormalize an OLTP database model, you could kill the simplicity of the very structure you have just created.

[Figure 10-24 diagram; tables and fields as recoverable from the original figure]
User: user_id, name, popularity_rating, join_date, address
Seller: user_id (FK), return_policy, international, payment_methods
History: user_history_id, user_id (FK), comment_date, comments
Category: category_id, parent_id, category
Listing: listing#, category_id (FK), user_id (FK), description, image, start_date, listing_days, currency, starting_price, reserve_price, buy_now_price, number_of_bids, winning_price
Bid: listing# (FK), user_id (FK), bid_price, bid_date

In general, denormalization is not quantifiable because no one has really thought up a formal approach for it, as many have devised for normalization. Denormalization, therefore, might be somewhat akin to guesswork. Guesswork is always dangerous, but if analysis is all about expert subconscious knowledge through experience, don’t let the lack of formal methods in denormalization scare you away from it. The biggest problem with denormalization is that it requires extensive application knowledge. Typically, this kind of foresight is available only when a system has been analyzed, designed, implemented, and placed into production. Generally, when in production, any further database modeling changes are not possible. So, when hoping to denormalize a database model for efficiency and ease of use by developers, try to learn as much as possible about how applications use tables, in terms of record quantities, how many records are accessed at once on GUI screens, how large reports will be, and so on. And do that learning process as part of analysis and design. It might be impossible to rectify in production and even in development.

Denormalization requires as much applications knowledge as possible.

Example Application Queries

The following statements state the obvious:

❑ The database model is the backbone of any application that uses data of any kind. That data is most likely stored in some kind of database. That database is likely to be a relational database of one form or another.

❑ Better designed database models tend to lend themselves to clearer and easier construction of SQL code queries. The ease of construction of queries, and their ultimate performance, depends largely on the soundness of the underlying database model. The database model is the backbone of applications.

The better the database model design, the better the queries produced, the better applications will ultimately be, and the happier your end-users will be. An application that is easily built by programmers is often not also easily usable by end-users. Similar to database modelers, programmers often write code for themselves, in an elegant fashion. Elegant solutions are not always going to produce the most end-user happy-smiley face result. Applications must run fast enough. Applications must not encourage end-users to become frustrated. Do not let elegant modeling and coding ultimately drive away your customers. No customer, no business. No business, no company. No company, no job! And, if your end-user happens to be your boss, well, you know the rest.

So, you must be able to build good queries. The soundness of those queries, and ultimately of applications, is dependent upon the soundness of the underlying database model. A highly normalized database model is likely to be unsound because there are too many tables, too much complexity, and too many tables in joins. Lots of tables and lots of complex inter-table relationships confuse people, especially the query programmers. Denormalize for successful applications. And preferably perform denormalization of database models in the analysis and design phases, not after the fact in production. Changing database model structure for production systems is generally problematic, extremely expensive, and disruptive to end-users (applications go down for maintenance). After all, the objective is to turn a profit. This means keeping your end-users interested. If the database is an in-house thing, you need to keep your job. Denormalize, denormalize, denormalize!

Once again, the efficiency of queries comes down to how many tables are joined in a single query. Figure 10-23 shows the original normalized OLTP database model for the online auction house. In Figure 10-24, the following denormalization has occurred:

❑ The three category tables were merged into the single CATEGORY table. A query against the three category tables would look similar to this:

SELECT *
FROM CATEGORY_PRIMARY CP JOIN CATEGORY_SECONDARY CS USING (PRIMARY_ID)
JOIN CATEGORY_TERTIARY CT USING (SECONDARY_ID);

A query against the single category table could be constructed as follows:

SELECT * FROM CATEGORY;

If the single category table was required to display a hierarchy, a self join could be used (some database engines have special syntax for single-table hierarchical queries). The join matches each parent's CATEGORY_ID against the PARENT_ID of its children:

SELECT P.CATEGORY, C.CATEGORY FROM CATEGORY P JOIN CATEGORY C ON (P.CATEGORY_ID = C.PARENT_ID) ORDER BY P.CATEGORY, C.CATEGORY;

Denormalizing categories in this way is probably a very sensible idea for the OLTP database model of the online auction house.
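To see how the single self-joining table copes with any number of layers, here is a minimal sketch assuming Python's sqlite3 module and SQLite's WITH RECURSIVE syntax (one example of the special hierarchical syntax mentioned above; other engines differ). The sample categories and function names are invented for illustration.

```python
import sqlite3


def build_categories():
    """Single self-joining CATEGORY table with four layers of sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE category (
            category_id INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES category (category_id),
            category TEXT
        )""")
    conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
        (1, None, "Music"),       # top layer has no parent
        (2, 1, "Instruments"),
        (3, 2, "Guitars"),
        (4, 3, "Acoustic"),       # a fourth layer, impossible with three fixed tables
    ])
    return conn


def parent_child_pairs(conn):
    """Plain self join: one record per parent/child pair."""
    return conn.execute("""
        SELECT p.category, c.category
        FROM category p JOIN category c ON p.category_id = c.parent_id
        ORDER BY p.category_id""").fetchall()


def full_paths(conn):
    """Recursive query: walk from each top-level category down through every layer."""
    rows = conn.execute("""
        WITH RECURSIVE tree (category_id, path) AS (
            SELECT category_id, category FROM category WHERE parent_id IS NULL
            UNION ALL
            SELECT c.category_id, tree.path || ' > ' || c.category
            FROM category c JOIN tree ON c.parent_id = tree.category_id
        )
        SELECT path FROM tree ORDER BY category_id""")
    return [r[0] for r in rows]
```

The plain self join only ever shows two adjacent layers at a time, whereas the recursive form builds complete paths such as Music > Instruments > Guitars > Acoustic without the query needing to know how deep the hierarchy goes.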

❑ Buyers and sellers were merged into the USER table, and a one-to-one SELLER subset table was used to separate seller details from buyers. Using the normalized database model in Figure 10-23 to find all listings for a specific seller, the following query applies (joining two tables and applying a WHERE clause to the SELLER table):

SELECT * FROM SELLER S JOIN LISTING L USING (SELLER_ID) WHERE S.SELLER = 'Joe Soap';

Once again, using the normalized database model in Figure 10-23, the following query finds all existing bids, on all listings, for a particular buyer (joining three tables and applying a WHERE clause to the BUYER table):

SELECT * FROM LISTING L JOIN BID BID USING (LISTING#) JOIN BUYER B USING (BUYER_ID)
WHERE B.BUYER = 'Jim Smith';

Using the denormalized database model in Figure 10-24, this query finds all listings for a specific seller (the SELLER and USER tables are actually normalized):

SELECT * FROM USER U JOIN SELLER S USING (USER_ID) JOIN LISTING L USING (USER_ID)
WHERE U.NAME = 'Joe Soap';

This query is actually worse for the denormalized database model because it joins three tables instead of two. And again, using the denormalized database model in Figure 10-24, the following query finds all existing bids on all listings for a particular buyer:

SELECT * FROM LISTING L JOIN BID BID USING (LISTING#) JOIN USER U USING (USER_ID)
WHERE U.NAME = 'Jim Smith'
AND U.USER_ID NOT IN (SELECT USER_ID FROM SELLER);

This query is also worse for the denormalized version because not only does it join three tables, but it additionally performs a semi-join (and an anti semi-join at that). An anti semi-join is a negative search. A negative search tries to find what is not in a table, and therefore must read all records in that table. Indexes generally can’t be used and, thus, a full table scan results. Full table scans can be I/O heavy for larger tables.

It should be clear that denormalizing the BUYER and SELLER tables into the USER and normalized SELLER tables (as shown in Figure 10-24) is probably quite a bad idea! At least it appears that way from the perspective of query use; however, an extra field could be added to the USER table to differentiate between sellers and buyers, in relation to bids and listings (a person performing both buying and selling will appear in both buyer and seller data sets). The extra field could be used as a base for very efficient indexing or even something as advanced as partitioning. Partitioning physically breaks tables into separate physical chunks. If the USER table were partitioned between buyers and sellers, reading only sellers from the USER table would only perform I/O against a partition containing sellers (not buyers). It is still not really very sensible to denormalize the BUYER and SELLER tables into the USER table.
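The extra-field idea can be sketched as follows. This is an illustration only, assuming Python's sqlite3 module; the IS_SELLER field, the index name, and the sample users are hypothetical and do not appear in any of the figures.

```python
import sqlite3


def build_users():
    """USER table with a hypothetical IS_SELLER flag, plus the one-to-one SELLER table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (
            user_id INTEGER PRIMARY KEY,
            name TEXT,
            is_seller INTEGER NOT NULL DEFAULT 0   -- assumed extra field, not in Figure 10-24
        );
        CREATE INDEX idx_user_is_seller ON user (is_seller);
        CREATE TABLE seller (
            user_id INTEGER REFERENCES user (user_id),
            return_policy TEXT
        );
        INSERT INTO user VALUES (1, 'Joe Soap', 1), (2, 'Jim Smith', 0), (3, 'Ann Both', 1);
        INSERT INTO seller VALUES (1, '30 days'), (3, '14 days');
    """)
    return conn


def non_sellers_anti_join(conn):
    """Negative search: users that do NOT appear in the SELLER table."""
    rows = conn.execute(
        "SELECT name FROM user "
        "WHERE user_id NOT IN (SELECT user_id FROM seller) ORDER BY user_id"
    )
    return [r[0] for r in rows]


def non_sellers_flag(conn):
    """Same answer from a positive, indexable test on the extra field."""
    rows = conn.execute(
        "SELECT name FROM user WHERE is_seller = 0 ORDER BY user_id"
    )
    return [r[0] for r in rows]
```

Both functions return the same users, but the second replaces the anti semi-join with an equality test on a single field. The trade-off is that the flag must be kept in step with the SELLER table whenever a user starts or stops selling.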

❑ The SELLER_HISTORY and BUYER_HISTORY tables were merged into the HISTORY table, as shown in Figure 10-24. Executing a query using the normalized database model in Figure 10-23 to find the history for a specific seller could be performed using a query like the following:

SELECT *
FROM SELLER S JOIN SELLER_HISTORY SH USING (SELLER_ID)
WHERE S.SELLER = 'Joe Soap';

Finding a history for a specific seller using the denormalized database model shown in Figure 10-24 could use a query like this:

SELECT *
FROM USER U JOIN HISTORY H USING (USER_ID)
WHERE U.NAME = 'Joe Soap'
AND U.USER_ID IN (SELECT USER_ID FROM SELLER);

Once again, as with denormalization of the SELLER and BUYER tables into the USER table, denormalizing the SELLER_HISTORY and BUYER_HISTORY tables into the HISTORY table might actually be a bad idea. The first query above joins two tables. The second query also joins two tables, but also executes a semi-join. This semi-join is not as bad as for denormalization of users, which used an anti semi-join; however, this is still effectively a three-way join.

So, you have discovered that perhaps the most effective, descriptive, and potentially efficient database model for the OLTP online auction house is as shown in Figure 10-25. The only denormalization making sense at this stage is to merge the three separate category hierarchy tables into the single self-joining CATEGORY table. Buyer, seller, and history information is probably best left in separate tables.

Denormalization is rarely effective for OLTP database models for anything between 1NF and 3NF; however (and this is very important), remember that previously in this chapter you read about layers of normalization beyond 3NF (BCNF, 4NF, 5NF, and DKNF). None of these intensive Normal Forms has so far been applied to the OLTP database model for the online auction house. As of Figure 10-23, you began to attempt to backtrack on previously performed normalization, by denormalizing. You began with the 3NF database model as shown in Figure 10-23. In other words, any normalization beyond 3NF was simply ignored, having already been proved to be completely superfluous and over the top for this particular database model.

Figure 10-25: The online auction house OLTP database model, 3NF, partially denormalized

The only obvious issue still with the database model as shown in Figure 10-25 is that the BUYER_HISTORY and SELLER_HISTORY tables have both BUYER_ID and SELLER_ID fields. In other words, both history tables are linked (related) to both of the BUYER and SELLER tables. It therefore could make perfect sense to denormalize not only the category tables, but the history tables as well, and leave the BUYER and SELLER tables normalized and separate, as shown in Figure 10-26.

[Figure 10-25 diagram; only part of the figure survives, tables and fields as recoverable]
Seller_History: (fields not recoverable)
Buyer: buyer_id, buyer, popularity_rating, join_date, address
Category: category_id, parent_id, category
Listing: listing#, category_id (FK), buyer_id (FK), seller_id (FK), description, image, start_date, listing_days, currency, starting_price, reserve_price, buy_now_price, number_of_bids, winning_price
Bid: bidder_id (FK), listing# (FK), bid_price, bid_date

Figure 10-26: The online auction house OLTP database model, 3NF, slightly further denormalized.

The newly denormalized HISTORY table can be accessed efficiently by splitting the history records based on buyers and sellers, using indexing or something airy-fairy and sophisticated like physical partitioning.
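The indexing option might look something like the sketch below, assuming Python's sqlite3 module. The index names, the function names, and the sample records are all invented for illustration, and the BUYER and SELLER tables are left out to keep the example short.

```python
import sqlite3


def build_history():
    """Merged HISTORY table (fields as in Figure 10-26) with one index per access path."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE history (
            history_id INTEGER PRIMARY KEY,
            seller_id INTEGER,
            buyer_id INTEGER,
            comment_date TEXT,
            comments TEXT
        );
        -- Hypothetical indexes: one for seller lookups, one for buyer lookups.
        CREATE INDEX idx_history_seller ON history (seller_id);
        CREATE INDEX idx_history_buyer ON history (buyer_id);
        -- Invented sample records.
        INSERT INTO history VALUES
            (1, 10, 20, '2005-01-05', 'Fast shipping'),
            (2, 10, 21, '2005-02-11', 'Item as described'),
            (3, 11, 20, '2005-03-02', 'Prompt payment');
    """)
    return conn


def comments_for_seller(conn, seller_id):
    """History records from the seller's side of the table."""
    rows = conn.execute(
        "SELECT comments FROM history WHERE seller_id = ? ORDER BY history_id",
        (seller_id,),
    )
    return [r[0] for r in rows]


def comments_for_buyer(conn, buyer_id):
    """History records from the buyer's side of the same table."""
    rows = conn.execute(
        "SELECT comments FROM history WHERE buyer_id = ? ORDER BY history_id",
        (buyer_id,),
    )
    return [r[0] for r in rows]
```

One table serves both directions of lookup, with each direction backed by its own index. Whether the optimizer actually uses those indexes depends on the database engine and the data volumes, so a real system would confirm this with the engine's query plan tool.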

Try It Out: Designing an OLTP Database Model

Create a simple design level OLTP database model for a Web site. This Web site allows creation of free classified ads for musicians and bands. Use the simple OLTP database model presented in Figure 10-27 (copied from Figure 9-19, in Chapter 9). Here’s a basic approach:

1. Create surrogate primary keys for all tables.

2. Enforce referential integrity using appropriate primary keys, foreign keys, and inter-table relationships.

3. Refine inter-table relationships properly, according to requirements, as identifying, non-identifying relationships, and also be precise about whether each crow’s foot allows zero

[Figure 10-26 diagram; only the merged table survives]
History: history_id, seller_id (FK), buyer_id (FK), comment_date, comments
