
Data Modeling Techniques for Data Warehousing, part 10


duplicate is detected, a sequence number is appended to the name. This check is repeated until the name and sequence number combination is determined to be unique. Once uniqueness has been confirmed, the update/insert takes place.

• Selection Logic: Only new or changed rows are selected.

• Name: Customer Location Table

• Conversion Rules: Rows in each customer location table are copied on a daily basis. For existing customer locations, the ship-to address is updated. For new customer locations, the key is generated and a row inserted.

• Selection Logic: Only new or changed rows are selected.
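
The duplicate-name check described for the first source above (append a sequence number, then re-check until the combination is unique) can be sketched as follows. This is only an illustration: the function name `unique_name`, the in-memory set standing in for the names already in the dimension, and the "name, space, number" suffix format are assumptions, not details given in the case study.

```python
def unique_name(name, existing_names):
    """Return `name` unchanged, or `name` plus a sequence number if it is taken.

    `existing_names` is a hypothetical stand-in for the names already
    present in the customer dimension.
    """
    if name not in existing_names:
        return name
    seq = 1
    # Repeat the check until the name/sequence-number combination is unique.
    while f"{name} {seq}" in existing_names:
        seq += 1
    return f"{name} {seq}"


existing = {"Acme Corp", "Acme Corp 1"}
candidate = unique_name("Acme Corp", existing)
# The update/insert takes place only once uniqueness has been confirmed.
existing.add(candidate)
```

In a real load the membership test would be a lookup against the dimension table rather than a Python set.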

Attributes:

• Name: Customer Key

• Definition: This is an arbitrary value assigned to guarantee uniqueness for each customer and location.

• Source: System Generated

of the same organization, a number will be appended to names where duplicates exist.

• Source: Name in Customer Table

• Name: Ship-to Address

• Definition: This is an address where CelDial ships goods to a corporate customer. It is possible for a corporate customer to have multiple ship-to locations. For retail customers no ship-to address is kept. Therefore, there can only be one entry in the customer dimension for a retail customer.

• Alias: None

• Change Rules: When a ship-to address changes it is updated in place in this dimension.

• Data Type: Character(60)

Appendix A The CelDial Case Study 175


• Domain: All valid addresses within CelDial's service area.

• Derivation Rules: The ship-to address is a direct copy

Definition: The manufacturing dimension represents the manufacturing plants owned and operated by CelDial. Plants are grouped into geographic regions.

Hierarchy: Data can be summarized at two levels for manufacturing. The lowest level of summarization is the manufacturing plant. Data from each plant can be further rolled up to summarize for an entire geographic region.

Change Rules: New plants are inserted as new rows into the dimension. Changes to existing plants are updated in place.

Load Frequency: Daily

Load Statistics:

• Last Load Date: N/A

• Number of Rows Loaded: N/A

Usage Statistics:

• Average Number of Queries/Day: N/A

• Average Rows Returned/Query: N/A

• Average Query Runtime: N/A

• Maximum Number of Queries/Day: N/A

• Maximum Rows Returned/Query: N/A

• Maximum Query Runtime: N/A

Archive Rules: Manufacturing plant data is not archived

Archive Statistics:

• Last Archive Date: N/A

• Date Archived to: N/A

Purge Rules: Manufacturing plants that have been closed for at least 48 months will be purged on a monthly basis.
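
As a rough sketch of how the 48-month purge rule might be evaluated during the monthly run: the case study does not say how a plant closure is recorded, so the `closed_on` date, the helper names, and the "whole elapsed months" interpretation below are all assumptions.

```python
from datetime import date

PURGE_AFTER_MONTHS = 48  # threshold taken from the purge rule above


def full_months_between(start, end):
    """Number of whole calendar months elapsed from `start` to `end`."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


def is_purgeable(closed_on, as_of):
    """True if a plant closed on `closed_on` may be purged as of `as_of`.

    `closed_on` is a hypothetical attribute; None means the plant is open.
    """
    if closed_on is None:
        return False
    return full_months_between(closed_on, as_of) >= PURGE_AFTER_MONTHS
```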

Purge Statistics:

• Last Purge Date: N/A

• Date Purged to: N/A

Data Quality: There are no opportunities for error or misinterpretation of manufacturing plant data.

Data Accuracy: Manufacturing plant data is 100% accurate

Key: The key to the manufacturing plant dimension consists of a system generated number.

Key Generation Method: When a manufacturing plant is copied from the operational system, the translation table is checked to determine if the plant already exists in the warehouse. If not, a new key is generated and the key along with the plant ID and region ID are added to the translation table. If the plant and region already exist, the key from the translation table is used to determine which plant in the warehouse to update.
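
The key generation method above amounts to a lookup-or-create step against the translation table. A minimal sketch, assuming the table can be modeled as an in-memory mapping from (plant ID, region ID) to a sequentially generated key; the class name, method name, and counter-based key are illustrative choices, not part of the case study:

```python
import itertools


class TranslationTable:
    """Maps operational (plant_id, region_id) pairs to warehouse keys."""

    def __init__(self):
        self._keys = {}
        self._next_key = itertools.count(1)  # assumed key generator

    def key_for(self, plant_id, region_id):
        """Return (key, is_new) for the given plant and region."""
        source_id = (plant_id, region_id)
        if source_id in self._keys:
            # Plant and region already exist: reuse the key to find the
            # warehouse row to update.
            return self._keys[source_id], False
        # Otherwise generate a new key and record it with the source IDs.
        key = next(self._next_key)
        self._keys[source_id] = key
        return key, True


table = TranslationTable()
first = table.key_for("P01", "R1")
again = table.key_for("P01", "R1")
other = table.key_for("P02", "R1")
```

The `is_new` flag tells the load step whether to insert a new dimension row or update the existing one in place.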

Source:

• Name: Manufacturing Plant Table

• Conversion Rules: Rows in each plant table are copied on a daily basis. For existing plants, the plant name is updated. For new plants, once a region is determined, the key is generated and a row inserted.

• Selection Logic: Only new or changed rows are selected.

• Name: Manufacturing Region Table

• Conversion Rules: Rows in each region table are copied on a daily basis. For existing regions, the region name is updated for all plants in the region. For new regions, the key is generated and a row inserted.

• Selection Logic: Only new or changed rows are selected.

Attributes:

• Name: Manufacturing Key

• Definition: This is an arbitrary value assigned to guarantee uniqueness for each plant and region.

creating a new plant and region entry

• Source: System Generated

• Name: Region Name

• Definition: This is the name CelDial uses to identify a geographic region for the purpose of grouping manufacturing plants.

• Source: Name in Manufacturing Region Table

• Name: Plant Name

• Definition: This is the name CelDial uses to identify an individual manufacturing plant.

• Source: Name in Manufacturing Plant Table

Measures: Quantity on hand, Reorder level, Total cost, Total revenue, Total quantity sold, and Discount amount

Subsidiary Dimensions: None

Contact Person: Plant Manager

Definition: The time dimension represents the time frames used by CelDial for reporting purposes.

Hierarchy: The lowest level of summarization is a day. Data for a given day can be rolled up into either weeks or months. Weeks cannot be rolled up into months.

Change Rules: Once a year the following year's dates are inserted as new rows into the dimension. There are no updates to this dimension.

Load Frequency: Annually

Load Statistics:

• Last Load Date: N/A

• Number of Rows Loaded: N/A

Usage Statistics:

• Average Number of Queries/Day: N/A

• Average Rows Returned/Query: N/A

• Average Query Runtime: N/A

• Maximum Number of Queries/Day: N/A

• Maximum Rows Returned/Query: N/A

• Maximum Query Runtime: N/A

Archive Rules: Time data is not archived

Archive Statistics:

• Last Archive Date: N/A

• Date Archived to: N/A

Purge Rules: Time data more than 4 years old will be purged on a yearly basis.

Purge Statistics:

• Last Purge Date: N/A

• Date Purged to: N/A

Data Quality: There are no opportunities for error or misinterpretation of time data.

Data Accuracy: Time data is 100% accurate

Key: The key to the time dimension is a date in YYYYMMDD

• Selection Logic: All rows are selected

Attributes:

• Name: Time Key


• Alias: None

• Change Rules: Once assigned, the values of this attribute never change.

• Data Type: Numeric

• Domain: valid dates

• Derivation Rules: This date is a direct copy from the source.

• Source: Numeric Date in Calendar spreadsheet

• Name: Date

• Definition: This is the descriptive date equivalent to the numeric date used as the key to this dimension. It is the date used on reports and to limit what data appears on a report. It is in the format MMM DD, YYYY.

• Alias: None

• Change Rules: Once assigned, the values of this attribute never change.

• Data Type: Character(12)

• Domain: valid dates in descriptive format

• Derivation Rules: This date is a direct copy from the source.

• Source: Descriptive Date in Calendar spreadsheet

• Name: Week of Year

• Definition: Each day of the year is assigned to a week for reporting purposes. Because years don't divide evenly into weeks, it is possible for a given day near the beginning or end of a calendar year to fall into a different year for weekly reporting purposes. The format is WW-YYYY.

• Alias: None

• Change Rules: Once assigned, the values of this attribute never change.

• Data Type: Character(7)

• Domain: WW is 1-52; YYYY is any valid year

• Derivation Rules: This date is a direct copy from the source.

• Source: Week of Year in Calendar spreadsheet
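
To make the three time attributes concrete, the sketch below builds one year of rows with a YYYYMMDD numeric key, an MMM DD, YYYY descriptive date, and a WW-YYYY week label. In the case study all three values are copied from a Calendar spreadsheet; generating them in code is purely illustrative. The week assignment here uses ISO 8601 week numbering as a stand-in, which is an assumption: ISO weeks can reach 53, whereas the stated domain is 1-52, so CelDial's own calendar evidently follows a different rule. The function and field names are also hypothetical.

```python
from datetime import date, timedelta

# Fixed English abbreviations, so the output does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def time_row(d):
    """Build one time-dimension row for the date `d`."""
    iso_year, iso_week, _ = d.isocalendar()  # ISO weeks: an assumption
    return {
        # Numeric key in YYYYMMDD form.
        "time_key": d.year * 10000 + d.month * 100 + d.day,
        # Descriptive date in MMM DD, YYYY form (12 characters).
        "date": f"{MONTHS[d.month - 1]} {d.day:02d}, {d.year}",
        # Week label in WW-YYYY form; the year may differ from d.year.
        "week_of_year": f"{iso_week:02d}-{iso_year}",
    }


def rows_for_year(year):
    """Rows for every day of `year`, as inserted by the annual load."""
    d = date(year, 1, 1)
    rows = []
    while d.year == year:
        rows.append(time_row(d))
        d += timedelta(days=1)
    return rows
```

Under ISO numbering, January 1, 1999 is labeled `53-1998`, which shows the "different year for weekly reporting" effect mentioned in the Week of Year definition.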

Measures: Quantity on hand, Reorder level, Total cost, Total revenue, Total quantity sold, and Discount amount

Subsidiary Dimensions: None

Contact Person: Data Warehouse Administrator

Definition: This is the cost of all components used to create product models that have been sold.

Derivation Rules: The total cost is the product of the unit cost of a product model and quantity of the product model sold.


Usage Statistics:

• Average Number of Queries/Day: N/A

• Maximum Number of Queries/Day: N/A

Data Quality: This figure only represents the cost of components. No attempt is made to record labor or overhead costs. As well, cost is calculated using the current cost at the time a product model is sold. No attempt is made to determine when the model was produced and the cost at that time.

Data Accuracy: We estimate that the cost reported for a product model is accurate to within +/- 5%.

Dimensions: Customer, Manufacturing, Product, Seller, and Time

Definition: This is the amount billed to customers for product models

that have been sold

Derivation Rules: The total revenue is the product of the negotiated selling price of a product model and quantity of the product model sold.

Usage Statistics:

• Average Number of Queries/Day: N/A

• Maximum Number of Queries/Day: N/A

Data Quality: This figure only represents the amount billed for product models sold. Defaults on accounts receivable are not considered.

Data Accuracy: Defaults on accounts receivable are insignificant for the purpose of analyzing product sales trends and patterns.

Dimensions: Customer, Manufacturing, Product, Seller, and Time

Definition: This is the number of units of a product model that have

• Average Number of Queries/Day: N/A

• Maximum Number of Queries/Day: N/A

Data Quality: This figure only represents the quantity billed for product models sold. Defaults on accounts receivable are not considered.

Data Accuracy: Defaults on accounts receivable are insignificant for the purpose of analyzing product movement trends and patterns.

Dimensions: Customer, Manufacturing, Product, Seller, and Time


Name: Discount Amount

Definition: This is the difference between the list price for a product model and the actual amount billed to the customer.

Derivation Rules: The discount amount is the product of the quantity of the product model sold and the difference between the suggested wholesale or retail price of the product model and the negotiated selling price. The suggested wholesale price is used if the model is sold through a corporate sales office. The suggested retail price is used if the model is sold through a retail store.
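
The derivation rules for total cost, total revenue, and discount amount reduce to simple arithmetic per sale. The sketch below applies them to a single order line; the function name, parameter names, and the `"corporate"`/`"retail"` channel labels are assumptions made for illustration, and `Decimal` is used only to avoid floating-point rounding of currency values.

```python
from decimal import Decimal


def sale_measures(quantity, unit_cost, selling_price,
                  suggested_wholesale, suggested_retail, channel):
    """Compute the derived measures for one sale of a product model."""
    # Wholesale list price for corporate sales offices, retail list price
    # for retail stores (channel labels are hypothetical).
    if channel == "corporate":
        list_price = suggested_wholesale
    elif channel == "retail":
        list_price = suggested_retail
    else:
        raise ValueError(f"unknown sales channel: {channel!r}")
    return {
        "total_cost": unit_cost * quantity,
        "total_revenue": selling_price * quantity,
        "discount_amount": (list_price - selling_price) * quantity,
    }


m = sale_measures(
    quantity=10,
    unit_cost=Decimal("40.00"),
    selling_price=Decimal("55.00"),
    suggested_wholesale=Decimal("60.00"),
    suggested_retail=Decimal("75.00"),
    channel="corporate",
)
```

Because the list price depends on the sales channel, the same negotiated price yields a larger discount amount when the sale goes through a retail store.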

Usage Statistics:

• Average Number of Queries/Day: N/A

• Maximum Number of Queries/Day: N/A

Data Quality: A study of the discount amounts recorded has concluded that the data is being recorded correctly. However, it is possible that discounts are being offered at inappropriate times.

Data Accuracy: Discount amounts are 100% accurate with respect to actual discounts given.

Dimensions: Customer, Manufacturing, Product, Seller, and Time

Definition: This is the number of complete units of a product model available for distribution from a manufacturing plant at a specific point in time (the end of a business day).

Derivation Rules: The quantity on hand for each product model for each manufacturing plant is recorded directly from the operational inventory records at the end of each business day.

Usage Statistics:

• Average Number of Queries/Day: N/A

• Maximum Number of Queries/Day: N/A

Data Quality: The quantity of a product produced and/or shipped on a given business day varies greatly. Therefore, no conclusions can be drawn about inventory levels at points in time other than those actually recorded.

Data Accuracy: The quantity on hand is 100% accurate as of the point in time recorded and only at that point in time.

Dimensions: Manufacturing, Product, and Time

Definition: The reorder level is used to determine when more of a product model should be produced. More of a model will be produced when the quantity on hand for a model falls to or below the reorder level.
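
The reorder rule is a "less than or equal" comparison between the two inventory measures. A minimal sketch over end-of-day inventory rows, where the row layout and field names are hypothetical:

```python
def needs_production(quantity_on_hand, reorder_level):
    """True when quantity on hand has fallen to or below the reorder level."""
    return quantity_on_hand <= reorder_level


# Hypothetical end-of-day snapshot rows for one manufacturing plant.
snapshot = [
    {"model": "M-100", "quantity_on_hand": 120, "reorder_level": 100},
    {"model": "M-200", "quantity_on_hand": 100, "reorder_level": 100},
    {"model": "M-300", "quantity_on_hand": 35, "reorder_level": 50},
]

to_produce = [
    row["model"]
    for row in snapshot
    if needs_production(row["quantity_on_hand"], row["reorder_level"])
]
```

Note that a model sitting exactly at its reorder level (M-200 here) is included, matching the "falls to or below" wording.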


Data Type: Numeric (7,0)

Derivation Rules: The reorder level for each product model for each manufacturing plant is recorded directly from the operational inventory records at the end of each business day.

Usage Statistics:

• Average Number of Queries/Day: N/A

• Maximum Number of Queries/Day: N/A

Data Quality: Users in the manufacturing plants report that reorder levels are reviewed infrequently. Because of this, workers responsible for initiating new production of a product model will often disregard relevant warnings and plan production by "gut feel".

Data Accuracy: The reorder level is 100% accurate as of the point in time recorded and only at that point in time.

Dimensions: Manufacturing, Product, and Time

SOURCE METADATA

Extract Method: The table is searched for orders recorded on the current transaction date. These orders are extracted.

Extract Schedule: The extract is run daily after the close of the business day

Extract Statistics:

• Last Extract Date: N/A

• Number of Rows Extracted: N/A

EXTRACT METADATA

Extract Schedule: The extract is run daily after the close of the business day and prior to the Order and Inventory Extract.

Extract Method: The transaction log is searched for changes to the Product, Product Model, Product Component, and Component tables. These changes are extracted.

Extract Steps: See 7.5.4.3, “Getting from Source to Target” on page 74

Extract Statistics:

• Last Extract Date: N/A

• Number of Rows Extracted: N/A


Appendix B Special Notices

This publication is intended to guide data architects, database administrators, and developers in the design of data models for data warehouses and data marts. The information in this publication is not intended as the specifications of any programming interfaces that are provided by any IBM products. See the PUBLICATIONS section of the IBM Programming Announcement for more information about what publications are considered to be product documentation.

References in this publication to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent program that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program or service.

Information in this book was developed in conjunction with use of the equipment specified, and is limited in application to those specific hardware and software products and levels.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to the IBM Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY 10594 USA.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact IBM Corporation, Dept. 600A, Mail Drop 1329, Somers, NY 10589 USA.

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The information about non-IBM ("vendor") products in this manual has been supplied by the vendor and IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

The following document contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples contain the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:


The following terms are trademarks of other companies:

C-bus is a trademark of Corollary, Inc

Java and HotJava are trademarks of Sun Microsystems, Incorporated

Microsoft, Windows, Windows NT, and the Windows 95 logo are trademarks or registered trademarks of Microsoft Corporation.

PC Direct is a trademark of Ziff Communications Company and is used by IBM Corporation under license.

Pentium, MMX, ProShare, LANDesk, and ActionMedia are trademarks or registered trademarks of Intel Corporation in the U.S. and other

Appendix C Related Publications

The publications listed in this section are considered particularly suitable for more discussion on topics covered in this redbook.

C.1 International Technical Support Organization Publications

For information on ordering these ITSO publications see “How to Get ITSO Redbooks” on page 189.

• Information Warehouse in the Retail Industry, GG24-4342

• Information Warehouse in the Finance Industry, GG24-4340

• Information Warehouse in the Insurance Industry, GG24-4341

• Data Warehouse Solutions on the AS/400, SG24-4872

• Data Where You Need It, The DPROPR Way, GG24-4492

C.2 Redbooks on CD-ROMs

Redbooks are also available on CD-ROMs. Order a subscription and receive updates 2-4 times a year at significant savings.

Collection                                                 Number      Collection Kit Number
Networking and Systems Management Redbooks Collection      SBOF-7370   SK2T-6022
Transaction Processing and Data Management Redbook         SBOF-7240   SK2T-8038
RS/6000 Redbooks Collection (HTML, BkMgr)                  SBOF-7230   SK2T-8040
RS/6000 Redbooks Collection (PostScript)                   SBOF-7205   SK2T-8041
Application Development Redbooks Collection                SBOF-7290   SK2T-8037
Personal Systems Redbooks Collection                       SBOF-7250   SK2T-8042

C.3 Other Publications

The following publications are also relevant as information sources:

C.3.1 Books

Adriaans, P., and D. Zantinge. Data Mining. Addison-Wesley, 1996.

Aiken, P. Data Reverse Engineering: Slaying the Legacy Dragon. McGraw-Hill, 1995.

Barquin, R., and H. Edelstein (eds.). Building, Using, and Managing the Warehouse. Prentice Hall, 1997.

Bischoff, J., and T. Alexander (eds.). Data Warehouse: Practical Advice from the Experts. Prentice Hall, 1997.

Brackett, M. H. Data Sharing. John Wiley & Sons, 1994.

Brodie, M. L., and M. Stonebraker. Migrating Legacy Systems: Gateways, Interfaces, and the Incremental Approach. Morgan Kaufmann, 1995.

Corey, M., and M. Abbey. Oracle Data Warehousing. McGraw-Hill, 1996.

Devlin, B. Data Warehousing: From Architecture to Implementation.

_ Information Systems Architecture. Prentice Hall, 1992.

_ Using DB2 to Build Decision Support Systems. Wiley-Qed, 1990.

Inmon, W. H., and R. D. Hackathorn. Using the Data Warehouse. Wiley-Qed, 1994.

Inmon, W. H., Imhoff, C., and G. Battas. Building the Operational Data Store. John Wiley & Sons, 1996.

Inmon, W. H., Welch, J. D., and K. Glassey. Managing the Data Warehouse. John Wiley & Sons, 1996.

Kelly, B. W. AS/400 Data Warehousing: The Complete Implementation Guide. CBM Books, 1996.

Kelly, S. Data Warehousing: The Route to Mass Customization. John Wiley

Data Warehousing and Decision Support: The State of the Art. Spiral Books, 1995.

C.3.2 Journal Articles, Technical Reports, and Miscellaneous Sources

Agrawal, R., et al., “Modeling Multidimensional Databases,” IBM Research Report, Almaden Research Center.

Appleton, E. L., “Use Your Data Warehouse to Compete,” Datamation, May 1996.

Codd, E. F., Codd, S. B., and C. T. Salley, “Providing OLAP to User-Analysts,” E. F. Codd Associates, 1993.

Darling, C. B., “How to Integrate Your Data Warehouse,” Datamation, May

Erickson, C. G., “Multidimensionalism and the Data Warehouse,” The Data Warehouse Conference, February 1995.

Foley, J., and B. DePompa, “Data Marts: Low Cost, High Appeal,” InformationWeek, March 1996.

Gordon, K. I., “Data Warehouse Implementation,” 1996.

_, “Data Warehouse Implementation Plan,” 1996.

_, “The Why of Data Standards - Do You Really Know Your Data?,” 1996.

Graham, S., Coburn, D., and Carsten Olesen, “The Foundations of Wisdom: A Study of the Financial Impact of Data Warehousing,” IDC Special Edition White Paper, 1996.

Inmon, W. H., “Creating the Data Warehouse Data Model from the Corporate Data Model,” PRISM Tech Topics, Vol. 1, No. 2.

_, “Data Relationships in the Data Warehouse,” PRISM Tech Topics, Vol. 1, No. 5.

_, “Information Management: Charting the Course,” Data Management Review, May 1996.

_, “Loading Data into the Warehouse,” PRISM Tech Topics, Vol

Lambert, B., “Data Modeling for Data Warehouse Development,” Data Management Review, February 1996.

Raden, N., “Maximizing Your Data Warehouse,” parts 1 and 2, InformationWeek, March 1996.

Snodgrass, R., “Temporal Databases: Status and Research Directions,” SIGMOD Record, Vol. 19, No. 4, December 1990.

Teale, P., “Data Warehouse Environment: End-to-End Blueprint,” presentation material, IBM UK Ltd., 1996.

_, “Data Warehouse Environment: System Architecture,” presentation material, IBM UK Ltd., 1996.

Date posted: 14/08/2014, 06:22
