
Mastering Data Warehouse Design: Relational and Dimensional Techniques, part 1


DOCUMENT INFORMATION

Basic information

Title: Mastering Data Warehouse Design: Relational and Dimensional Techniques
Authors: Claudia Imhoff, Nicholas Galemmo, Jonathan G. Geiger
Publisher: Wiley Publishing, Inc.
Subject: Data Warehouse Design
Type: book
Year of publication: 2003
City: Indianapolis
Pages: 46
File size: 1.21 MB



Claudia Imhoff, Nicholas Galemmo, Jonathan G. Geiger

Mastering Data Warehouse Design: Relational and Dimensional Techniques


Vice President and Executive Publisher: Robert Ipsen

Publisher: Joe Wikert

Executive Editor: Robert M. Elliott

Developmental Editor: Emilie Herman

Editorial Manager: Kathryn Malm

Managing Editor: Pamela M. Hanley

Text Design & Composition: Wiley Composition Services

This book is printed on acid-free paper. ∞

Copyright © 2003 by Claudia Imhoff, Nicholas Galemmo, and Jonathan G. Geiger. All rights reserved.

Published by Wiley Publishing, Inc., Indianapolis, Indiana

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8700. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4447, E-mail: permcoordinator@wiley.com.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Trademarks: Wiley, the Wiley Publishing logo and related trade dress are trademarks or registered trademarks of Wiley Publishing, Inc., in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. Wiley Publishing, Inc., is not associated with any product or vendor mentioned in this book.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

ISBN: 0-471-32421-3

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1


Claudia: For all their patience and understanding throughout the years, this book is dedicated to David and Jessica Imhoff.

Nick: To my wife Sarah, and children Amanda and Nick Galemmo, for their understanding over the many weekends I spent working on this book. Also to my college professor, Julius Archibald at the State University of New York at Plattsburgh, for instilling in me the science and art of computing.

Jonathan: To my wife, Alma Joy, for her patience and understanding of the time spent writing this book, and to my children, Avi and Shana, who are embarking on their respective careers and of whom I am extremely proud.


Contents

Overview of Business Intelligence
Role and Purpose of the Data Warehouse
The Corporate Information Factory
The Data Warehouse Data Model
Nonredundant
Stable
Consistent
Flexible in Terms of the Ultimate Data Usage
The Codd and Date Premise
Impact on Data Mart Creation
Summary


Chapter 2 Fundamental Relational Concepts
Why Do You Need a Data Model?
Relational Data-Modeling Objects
Subject
Entity
Relationships
Subject Area Model Benefits
Business Data Model Benefits
Relational Data-Modeling Guidelines
Guidelines and Best Practices
Normalization
Normalization of the Relational Data Model
Other Normalization Levels
Summary
Considerations for Specific Industries
Retail Industry Considerations
Manufacturing Industry Considerations
Utility Industry Considerations
Property and Casualty Insurance Industry Considerations
Petroleum Industry Considerations
Health Industry Considerations
Subject Area Model Development Process
Closed Room Development
Development through Interviews
Development through Facilitated Sessions
Subject Area Model Benefits
Subject Area Model for Zenith Automobile Company


Business Data Model
Business Data Development Process
Identify Relevant Subject Areas
Identify Major Entities and Establish Identifiers
Define Relationships
Confirm Model Structure
Confirm Model Content
Summary
Methodology
Step 1: Select the Data of Interest
Inputs
Step 2: Add Time to the Key
Capturing Historical Data
Capturing Historical Relationships
Dimensional Model Considerations
Step 3: Add Derived Data
Step 4: Determine Granularity Level
Step 5: Summarize Data
Summaries for Period of Time Data
Summaries for Snapshot Data
Step 6: Merge Entities
Step 7: Create Arrays
Step 8: Segregate Data
Summary
Inconsistent Business Definition of Customer
Inconsistent System Definition of Customer
Inconsistent Customer Identifier among Systems
Inclusion of External Data
Data at a Customer Level
Data Grouped by Customer Characteristics
Customers Uniquely Identified Based on Role
Customer Hierarchy Not Depicted
Data Warehouse System Model
Inconsistent Business Definition of Customer
Inconsistent System Definition of Customer


Inconsistent Customer Identifier among Systems
Absorption of External Data
Customers Uniquely Identified Based on Role
Customer Hierarchy Not Depicted
Data Warehouse Technology Model
Key from the System of Record
Key from a Recognized Standard
Dimensional Data Mart Implications
Differences in a Dimensional Model
Maintaining Dimensional Conformance
Summary
The Fiscal Calendar
The 4-5-4 Fiscal Calendar
Thirteen-Month Fiscal Calendar
Other Fiscal Calendars
The Billing Cycle Calendar
The Factory Calendar
Time and the Data Warehouse


Case Study: A Multilingual Calendar
Analysis
Storing Multiple Languages
Handling Different Date Presentation Formats
Database Localization
Query Tool Localization
Delivery Localization
Delivering Multiple Languages
Monolingual Reporting
Creating a Multilingual Data Mart
Case Study: Multiple Fiscal Calendars
Analysis
Expanding the Calendar
Case Study: Seasonal Calendars
Analysis
Seasonal Calendar Structures
Delivering Seasonal Data
Summary
Updating the Bridge


The Customer Hierarchy
The Recursive Hierarchy Tree
Using Recursive Trees in the Data Mart
Maintaining History
Case Study: Retail Purchasing
Analysis
Implementing the Business Model
The Buyer Hierarchy
Implementing Buyer Responsibility
Delivering the Buyer Responsibility Relationship
Case Study: The Combination Pack
Analysis
Adding a Bill of Materials
Making a Recursive Tree
Flattening a Recursive Tree
Summary
Business Use of the Data Warehouse
Average Lines per Transaction
Business Rules Concerning Changes
Method 1: Using Foreign Keys
Method 2: Using Associative Entities
Technique 3: Change Snapshot with Delta Capture


Case Study: Transaction Interface
Modeling the Transactions
Processing the Transactions
Simultaneous Delivery
Summary
Optimizing the Development Process
Optimizing Design and Analysis
Optimizing Application Development
Selecting an ETL Tool
Reasons for Partitioning
Indexing Partitioned Tables
Enforcing Referential Integrity
Index-Organized Tables
Conclusion
Optimizing the System Model
Vertical Partitioning
Vertical Partitioning for Performance
Vertical Partitioning of Change History
Vertical Partitioning of Large Columns
Denormalization
Summary
The Changing Data Warehouse
Modeling for Business Change
Assuming the Worst Case
Imposing Relationship Generalization
Using Surrogate Keys


Implementing Business Change
Integrating Subject Areas
Standardizing Attributes
Inferring Roles and Integrating Entities
Adding Subject Areas
Summary
Governing Models and Their Evolution
Technology Data Model
Synchronization Implications
Subject Area and Business Data Models
Color-Coding
Including the Subject Area within the Entity Name
Business and System Data Models
System and Technology Data Models
Managing Multiple Modelers
Roles and Responsibilities
Business Data Model
System and Technology Data Model
Collision Management
Modifications
Comparison
Incorporation
Summary
Criteria for Being in-Architecture
Migrating from Data Mart Chaos
Conform the Dimensions
Create the Data Warehouse Data Model
Create the Data Warehouse
Convert by Subject Area
Convert One Data Mart at a Time


Build New Data Marts Only “In-Architecture”; Leave Old Marts Alone
Build the Architecture from One Data Mart
Choosing the Right Migration Path
Summary
Chapter 13 Comparison of Data Warehouse Methodologies
The Multidimensional Architecture
The Corporate Information Factory Architecture
Comparison of the CIF and MD Architectures
Scope
Perspective
Volatility
Flexibility
Complexity
Functionality


Acknowledgments

We gratefully acknowledge the following individuals who directly or indirectly contributed to this book:

Greg Backhus – Helzberg Diamonds
William Baker – Microsoft Corporation
John Crawford – Merrill Lynch
David Gleason – Intelligent Solutions, Inc.
William H. Inmon – Inmon Associates, Inc.
Dr. Ralph S. Kimball – Kimball Associates
Lisa Loftis – Intelligent Solutions, Inc.
Bob Lokken – ProClarity Corporation
Anthony Marino – L’Oreal Corporation
Joyce Norris-Montanari – Intelligent Solutions, Inc.
Laura Reeves – StarSoft, Inc.
Ron Powell – DM Review Magazine
Kim Stannick – Teradata Corporation
Barbara von Halle – Knowledge Partners, Inc.
John Zachman – Zachman International, Inc.

We would also like to thank our editors, Bob Elliott, Pamela Hanley, and Emilie Herman, whose tireless prodding and assistance kept us honest and on schedule.


Claudia Imhoff, Ph.D., is the president and founder of Intelligent Solutions (www.IntelSols.com), a leading consultancy on CRM (Customer Relationship Management) and business intelligence technologies and strategies. She is a popular speaker and internationally recognized expert and serves as an advisor to many corporations, universities, and leading technology companies on these topics. She has coauthored five books and over 50 articles on these topics. She can be reached at CImhoff@IntelSols.com.

Nicholas Galemmo was an information architect at Nestlé USA. Nicholas has 27 years’ experience as a practitioner and consultant involved in all aspects of application systems design and development within the manufacturing, distribution, education, military, health care, and financial industries. He has been actively involved in large-scale data warehousing and systems integration projects for the past 11 years. He has built numerous data warehouses, using both dimensional and relational architectures. He has published many articles and has presented at national conferences. This is his first book. Mr. Galemmo is now an independent consultant and can be reached at ngalemmo@yahoo.com.

Jonathan G. Geiger is executive vice president at Intelligent Solutions, Inc. Jonathan has been involved in many Corporate Information Factory and customer relationship management projects within the utility, telecommunications, manufacturing, education, chemical, financial, and retail industries. In his 30 years as a practitioner and consultant, Jonathan has managed or performed work in virtually every aspect of information management. He has authored or coauthored over 30 articles and two other books, presents frequently at national and international conferences, and teaches several public seminars. Mr. Geiger can be reached at JGeiger@IntelSols.com.


We have found that an understanding of why a particular approach is being promoted helps us recognize its value and apply it. Therefore, we start this section with an introduction to the Corporate Information Factory (CIF). This proven and stable architecture includes two formal data stores for business intelligence, each with a specific role in the BI environment.

The first data store is the data warehouse. The major role of the data warehouse is to serve as a data repository that stores data from disparate sources, making it accessible to another set of data stores: the data marts. As the collection point, the most effective design approach for the data warehouse is based on an entity-relationship data model and the normalization techniques developed by Codd and Date in their seminal work throughout the 1970s, 80s, and 90s for relational databases.

The major role of the data mart is to provide the business users with easy access to quality, integrated information. There are several types of data marts, and these are also described in Chapter 1. The most popular data mart is built to support online analytical processing, and the most effective design approach for it is the dimensional data model.

Continuing with the conceptual theme, we explain the importance of relational modeling techniques, introduce the different types of models that are needed, and provide a process for building a relational data model in Chapter 2. We also explain the relationship between the various data models used in constructing a solid foundation for any enterprise (the business, system, and technology data models) and how they share or inherit characteristics from each other.

ONE


Chapter 1 Introduction

Welcome to the first book that thoroughly describes the data modeling techniques used in constructing a multipurpose, stable, and sustainable data warehouse used to support business intelligence (BI). This chapter introduces the data warehouse by describing the objectives of BI and the data warehouse and by explaining how these fit into the overall Corporate Information Factory (CIF) architecture. It discusses the iterative nature of the data warehouse construction and demonstrates the importance of the data warehouse data model and the justification for the type of data model format suggested in this book. We discuss why the format of the model should be based on relational design techniques, illustrating the need to maximize nonredundancy, stability, and maintainability. Another section of the chapter outlines the characteristics of a maintainable data warehouse environment. The chapter ends with a discussion of the impact of this modeling approach on the ultimate delivery of the data marts. This chapter sets up the reader to understand the rationale behind the ensuing chapters, which describe in detail how to create the data warehouse data model.

Overview of Business Intelligence

BI, in the context of the data warehouse, is the ability of an enterprise to study past behaviors and actions in order to understand where the organization has


Date posted: 08/08/2014, 22:20
