Load Frequency: Daily
Load Statistics:
• Last Load Date: N/A
• Number of Rows Loaded: N/A
Usage Statistics:
• Average Number of Queries/Day: N/A
• Average Rows Returned/Query: N/A
• Aver
duplicate is detected, a sequence number is appended to the name. This check is repeated until the name and sequence number combination is determined to be unique. Once uniqueness has been confirmed, the update/insert takes place.
• Selection Logic: Only new or changed rows are selected.
• Name: Customer Location Table
• Conversion Rules: Rows in each customer location table are copied on a daily basis. For existing customer locations, the ship-to address is updated. For new customer locations, the key is generated and a row inserted.
• Selection Logic: Only new or changed rows are selected.
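The duplicate-name check described in the conversion rules above can be sketched as follows. This is only an illustration: the function name, the space separator, the starting sequence number of 1, and the sample customer names are assumptions, not details taken from the CelDial design.

```python
def make_unique_name(name, existing_names):
    """Return name unchanged if unused, else append the first free sequence number."""
    if name not in existing_names:
        return name
    sequence = 1  # assumed starting value
    candidate = f"{name} {sequence}"  # assumed separator: a single space
    # Repeat the check until the name and sequence number combination is unique.
    while candidate in existing_names:
        sequence += 1
        candidate = f"{name} {sequence}"
    return candidate


# Hypothetical names already present in the customer dimension.
taken = {"Acme Corp", "Acme Corp 1"}
new_name = make_unique_name("Acme Corp", taken)
print(new_name)
```

Only after the loop ends would the update or insert take place, which matches the order of steps in the conversion rules.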
Attributes:
• Name: Customer Key
• Definition: This is an arbitrary value assigned to guarantee uniqueness for each customer and location.
• Source: System Generated
of the same organization, a number will be appended to names where duplicates exist.
• Source: Name in Customer Table
• Name: Ship-to Address
• Definition: This is an address where CelDial ships goods to a corporate customer. It is possible for a corporate customer to have multiple ship-to locations. For retail customers no ship-to address is kept. Therefore, there can only be one entry in the customer dimension for a retail customer.
• Alias: None
• Change Rules: When a ship-to address changes, it is updated in place in this dimension.
• Data Type: Character(60)
Appendix A The CelDial Case Study 175
• Domain: All valid addresses within CelDial's service area.
• Derivation Rules: The ship-to address is a direct copy
Definition: The manufacturing dimension represents the manufacturing plants owned and operated by CelDial. Plants are grouped into geographic regions.
Hierarchy: Data can be summarized at two levels for manufacturing. The lowest level of summarization is the manufacturing plant. Data from each plant can be further rolled up to summarize for an entire geographic region.
Change Rules: New plants are inserted as new rows into the dimension. Changes to existing plants are updated in place.
Load Frequency: Daily
Load Statistics:
• Last Load Date: N/A
• Number of Rows Loaded: N/A
Usage Statistics:
• Average Number of Queries/Day: N/A
• Average Rows Returned/Query: N/A
• Average Query Runtime: N/A
• Maximum Number of Queries/Day: N/A
• Maximum Rows Returned/Query: N/A
• Maximum Query Runtime: N/A
Archive Rules: Manufacturing plant data is not archived.
Archive Statistics:
• Last Archive Date: N/A
• Date Archived to: N/A
Purge Rules: Manufacturing plants that have been closed for at least 48 months will be purged on a monthly basis.
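As a rough illustration of the purge rule, the monthly job could test each plant as sketched here. The closure date is an assumption (no such attribute is listed for this dimension), as are the function names and the choice to count whole elapsed months.

```python
from datetime import date


def months_between(start, end):
    """Whole months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


def should_purge(closed_on, run_date, min_months=48):
    """True when a plant has been closed for at least min_months months."""
    if closed_on is None:  # plant is still open
        return False
    return months_between(closed_on, run_date) >= min_months


# Hypothetical closure date and purge run dates.
print(should_purge(date(1994, 1, 15), date(1998, 1, 15)))
print(should_purge(date(1994, 1, 15), date(1998, 1, 14)))
```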
Purge Statistics:
• Last Purge Date: N/A
• Date Purged to: N/A
Data Quality: There are no opportunities for error or misinterpretation of manufacturing plant data.
Data Accuracy: Manufacturing plant data is 100% accurate.
Key: The key to the manufacturing plant dimension consists of a system generated number.
Key Generation Method: When a manufacturing plant is copied from the operational system, the translation table is checked to determine if the plant already exists in the warehouse. If not, a new key is generated and the key along with the plant ID and region ID are added to the translation table. If the plant and region already exist, the key from the translation table is used to determine which plant in the warehouse to update.
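A minimal sketch of the key generation method follows, with the translation table held in memory. The class name, method name, sequential integer keys, and sample IDs are all assumptions made for illustration. The text only states that the key is a system generated number looked up by plant ID and region ID.

```python
import itertools


class KeyTranslationTable:
    """Maps (plant ID, region ID) pairs from the source system to warehouse keys."""

    def __init__(self):
        self._keys = {}
        self._next_key = itertools.count(1)  # assumed: keys are sequential integers

    def resolve(self, plant_id, region_id):
        """Return (key, is_new), reusing a stored key or generating a new one."""
        source_id = (plant_id, region_id)
        if source_id in self._keys:
            # Plant and region already exist: the key identifies the row to update.
            return self._keys[source_id], False
        key = next(self._next_key)
        self._keys[source_id] = key  # record the key with the plant ID and region ID
        return key, True


table = KeyTranslationTable()
first = table.resolve("P100", "R1")   # hypothetical IDs
again = table.resolve("P100", "R1")
other = table.resolve("P200", "R1")
print(first, again, other)
```

The `is_new` flag tells the load step whether to insert a new row or update an existing one in place, in line with the change rules for this dimension.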
Source:
• Name: Manufacturing Plant Table
• Conversion Rules: Rows in each plant table are copied on a daily basis. For existing plants, the plant name is updated. For new plants, once a region is determined, the key is generated and a row inserted.
• Selection Logic: Only new or changed rows are selected.
• Name: Manufacturing Region Table
• Conversion Rules: Rows in each region table are copied on a daily basis. For existing regions, the region name is updated for all plants in the region. For new regions, the key is generated and a row inserted.
• Selection Logic: Only new or changed rows are selected.
Attributes:
• Name: Manufacturing Key
• Definition: This is an arbitrary value assigned to guarantee uniqueness for each plant and region.
creating a new plant and region entry
• Source: System Generated
• Name: Region Name
• Definition: This is the name CelDial uses to identify a geographic region for the purpose of grouping manufacturing plants.
• Source: Name in Manufacturing Region Table
• Name: Plant Name
• Definition: This is the name CelDial uses to identify an individual manufacturing plant.
• Source: Name in Manufacturing Plant Table
Measures: Quantity on hand, Reorder level, Total cost, Total revenue,
Total quantity sold, and Discount amount
Subsidiary Dimensions: None
Contact Person: Plant Manager
Definition: The time dimension represents the time frames used by CelDial for reporting purposes.
Hierarchy: The lowest level of summarization is a day. Data for a given day can be rolled up into either weeks or months. Weeks cannot be rolled up into months.
Change Rules: Once a year the following year's dates are inserted as new rows into the dimension. There are no updates to this dimension.
Load Frequency: Annually
Load Statistics:
• Last Load Date: N/A
• Number of Rows Loaded: N/A
Usage Statistics:
• Average Number of Queries/Day: N/A
• Average Rows Returned/Query: N/A
• Average Query Runtime: N/A
• Maximum Number of Queries/Day: N/A
• Maximum Rows Returned/Query: N/A
• Maximum Query Runtime: N/A
Archive Rules: Time data is not archived.
Archive Statistics:
• Last Archive Date: N/A
• Date Archived to: N/A
Purge Rules: Time data more than 4 years old will be purged on a yearly basis.
Purge Statistics:
• Last Purge Date: N/A
• Date Purged to: N/A
Data Quality: There are no opportunities for error or misinterpretation of time data.
Data Accuracy: Time data is 100% accurate.
Key: The key to the time dimension is a date in YYYYMMDD
• Selection Logic: All rows are selected.
Attributes:
• Name: Time Key
• Alias: None
• Change Rules: Once assigned, the values of this attribute never change.
• Data Type: Numeric
• Domain: Valid dates
• Derivation Rules: This date is a direct copy from the source.
• Source: Numeric Date in Calendar spreadsheet
• Name: Date
• Definition: This is the descriptive date equivalent to the numeric date used as the key to this dimension. It is the date used on reports and to limit what data appears on a report. It is in the format MMM DD, YYYY.
• Alias: None
• Change Rules: Once assigned, the values of this attribute never change.
• Data Type: Character(12)
• Domain: Valid dates in descriptive format
• Derivation Rules: This date is a direct copy from the source.
• Source: Descriptive Date in Calendar spreadsheet
• Name: Week of Year
• Definition: Each day of the year is assigned to a week for reporting purposes. Because years don't divide evenly into weeks, it is possible for a given day near the beginning or end of a calendar year to fall into a different year for weekly reporting purposes. The format is WW-YYYY.
• Alias: None
• Change Rules: Once assigned, the values of this attribute never change.
• Data Type: Character(7)
• Domain: WW is 1-52. YYYY is any valid year.
• Derivation Rules: This date is a direct copy from the source.
• Source: Week of Year in Calendar spreadsheet
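The three attribute formats above can be illustrated with a short sketch. In the case study these values are copied directly from the Calendar spreadsheet, so the functions here are hypothetical stand-ins. The week assignment is the least certain part: it assumes ISO 8601 week numbering, which is only one possible scheme and can produce a week 53 that falls outside the documented 1-52 domain.

```python
from datetime import date

# Fixed English abbreviations, so the output does not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def time_key(d):
    """Numeric key in YYYYMMDD form."""
    return d.year * 10000 + d.month * 100 + d.day


def descriptive_date(d):
    """Descriptive date in MMM DD, YYYY form (12 characters)."""
    return f"{MONTHS[d.month - 1]} {d.day:02d}, {d.year}"


def week_of_year(d):
    """Week in WW-YYYY form, assuming ISO 8601 weeks (an assumption)."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_week:02d}-{iso_year}"


day = date(1997, 12, 29)
print(time_key(day), descriptive_date(day), week_of_year(day))
```

Under the ISO assumption, December 29, 1997 belongs to week 01 of 1998, which is an instance of a day near the end of a calendar year falling into a different year for weekly reporting.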
Measures: Quantity on hand, Reorder level, Total cost, Total revenue,
Total quantity sold, and Discount amount
Subsidiary Dimensions: None
Contact Person: Data Warehouse Administrator
Definition: This is the cost of all components used to create product models that have been sold.
Derivation Rules: The total cost is the product of the unit cost of a product model and quantity of the product model sold.
Usage Statistics:
• Average Number of Queries/Day: N/A
• Maximum Number of Queries/Day: N/A
Data Quality: This figure only represents the cost of components. No attempt is made to record labor or overhead costs. As well, cost is calculated using the current cost at the time a product model is sold. No attempt is made to determine when the model was produced and the cost at that time.
Data Accuracy: We estimate that the cost reported for a product model is accurate to within +/- 5%.
Dimensions: Customer, Manufacturing, Product, Seller, and Time
Definition: This is the amount billed to customers for product models that have been sold.
Derivation Rules: The total revenue is the product of the negotiated selling price of a product model and quantity of the product model sold.
Usage Statistics:
• Average Number of Queries/Day: N/A
• Maximum Number of Queries/Day: N/A
Data Quality: This figure only represents the amount billed for product models sold. Defaults on accounts receivable are not considered.
Data Accuracy: Defaults on accounts receivable are insignificant for the purpose of analyzing product sales trends and patterns.
Dimensions: Customer, Manufacturing, Product, Seller, and Time
Definition: This is the number of units of a product model that have
• Average Number of Queries/Day: N/A
• Maximum Number of Queries/Day: N/A
Data Quality: This figure only represents the quantity billed for product models sold. Defaults on accounts receivable are not considered.
Data Accuracy: Defaults on accounts receivable are insignificant for the purpose of analyzing product movement trends and patterns.
Dimensions: Customer, Manufacturing, Product, Seller, and Time
Name: Discount Amount
Definition: This is the difference between the list price for a product model and the actual amount billed to the customer.
Derivation Rules: The discount amount is the product of the quantity of the product model sold and the difference between the suggested wholesale or retail price of the product model and the negotiated selling price. The suggested wholesale price is used if the model is sold through a corporate sales office. The suggested retail price is used if the model is sold through a retail store.
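The derivation rule for the discount amount can be written out as a small worked sketch. The function name, the channel labels "corporate" and "retail", and the prices are illustrative assumptions. Only the formula itself comes from the rule above.

```python
from decimal import Decimal


def discount_amount(quantity, suggested_wholesale, suggested_retail,
                    negotiated_price, channel):
    """Quantity sold times (suggested price minus negotiated selling price)."""
    if channel == "corporate":    # sold through a corporate sales office
        list_price = suggested_wholesale
    elif channel == "retail":     # sold through a retail store
        list_price = suggested_retail
    else:
        raise ValueError(f"unknown sales channel: {channel!r}")
    return quantity * (list_price - negotiated_price)


# Hypothetical prices for one product model.
wholesale = Decimal("80.00")
retail = Decimal("100.00")
negotiated = Decimal("75.00")

print(discount_amount(10, wholesale, retail, negotiated, "corporate"))
print(discount_amount(10, wholesale, retail, negotiated, "retail"))
```

With these figures the same sale of 10 units yields a discount of 50.00 through a corporate sales office and 250.00 through a retail store, because the reference price differs by channel.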
Usage Statistics:
• Average Number of Queries/Day: N/A
• Maximum Number of Queries/Day: N/A
Data Quality: A study of the discount amounts recorded has concluded that the data is being recorded correctly. However, it is possible that discounts are being offered at inappropriate times.
Data Accuracy: Discount amounts are 100% accurate with respect to actual discounts given.
Dimensions: Customer, Manufacturing, Product, Seller, and Time
Definition: This is the number of complete units of a product model available for distribution from a manufacturing plant at a specific point in time (the end of a business day).
Derivation Rules: The quantity on hand for each product model for each manufacturing plant is recorded directly from the operational inventory records at the end of each business day.
Usage Statistics:
• Average Number of Queries/Day: N/A
• Maximum Number of Queries/Day: N/A
Data Quality: The quantity of a product produced and/or shipped on a given business day varies greatly. Therefore, no conclusions can be drawn about inventory levels at points in time other than those actually recorded.
Data Accuracy: The quantity on hand is 100% accurate as of the point in time recorded and only at that point in time.
Dimensions: Manufacturing, Product, and Time
Definition: The reorder level is used to determine when more of a product model should be produced. More of a model will be produced when the quantity on hand for a model falls to or below the reorder level.
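The reorder test in this definition reduces to a single comparison, sketched here against an end-of-day snapshot. The row layout, plant names, and model numbers are hypothetical.

```python
def needs_production(quantity_on_hand, reorder_level):
    """True when quantity on hand has fallen to or below the reorder level."""
    return quantity_on_hand <= reorder_level


# Hypothetical end-of-business-day inventory snapshot.
snapshot = [
    {"plant": "Plant A", "model": "M-100", "quantity_on_hand": 40, "reorder_level": 50},
    {"plant": "Plant A", "model": "M-200", "quantity_on_hand": 50, "reorder_level": 50},
    {"plant": "Plant B", "model": "M-100", "quantity_on_hand": 75, "reorder_level": 50},
]

to_produce = [
    (row["plant"], row["model"])
    for row in snapshot
    if needs_production(row["quantity_on_hand"], row["reorder_level"])
]
print(to_produce)
```

The comparison is inclusive, so a model whose quantity on hand equals the reorder level is also flagged.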
Data Type: Numeric (7,0)
Derivation Rules: The reorder level for each product model for each manufacturing plant is recorded directly from the operational inventory records at the end of each business day.
Usage Statistics:
• Average Number of Queries/Day: N/A
• Maximum Number of Queries/Day: N/A
Data Quality: Users in the manufacturing plants report that reorder levels are reviewed infrequently. Because of this, workers responsible for initiating new production of a product model will often disregard relevant warnings and plan production by "gut feel".
Data Accuracy: The reorder level is 100% accurate as of the point in time recorded and only at that point in time.
Dimensions: Manufacturing, Product, and Time
• SOURCE METADATA
Extract Method: The table is searched for orders recorded on the current transaction date. These orders are extracted.
Extract Schedule: The extract is run daily after the close of the business day.
Extract Statistics:
• Last Extract Date: N/A
• Number of Rows Extracted: N/A
• EXTRACT METADATA
Extract Schedule: The extract is run daily after the close of the business day and prior to the Order and Inventory Extract.
Extract Method: The transaction log is searched for changes to the Product, Product Model, Product Component, and Component tables. These changes are extracted.
Extract Steps: See 7.5.4.3, “Getting from Source to Target” on page 74.
Extract Statistics:
• Last Extract Date: N/A
• Number of Rows Extracted: N/A
Appendix B Special Notices
This publication is intended to guide data architects, database administrators, and developers in the design of data models for data warehouses and data marts. The information in this publication is not intended as the specifications of any programming interfaces that are provided by any IBM products. See the PUBLICATIONS section of the IBM Programming Announcement for more information about what publications are considered to be product documentation.
References in this publication to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent program that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program or service.
Information in this book was developed in conjunction with use of the equipment specified, and is limited in application to those specific hardware and software products and levels.
IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to the IBM Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY 10594 USA.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact IBM Corporation, Dept. 600A, Mail Drop 1329, Somers, NY 10589 USA.
Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.
The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The information about non-IBM ("vendor") products in this manual has been supplied by the vendor and IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
The following document contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples contain the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.
The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:
The following terms are trademarks of other companies:
C-bus is a trademark of Corollary, Inc.
Java and HotJava are trademarks of Sun Microsystems, Incorporated.
Microsoft, Windows, Windows NT, and the Windows 95 logo are trademarks or registered trademarks of Microsoft Corporation.
PC Direct is a trademark of Ziff Communications Company and is used by IBM Corporation under license.
Pentium, MMX, ProShare, LANDesk, and ActionMedia are trademarks or registered trademarks of Intel Corporation in the U.S. and other