Oracle® Database Data Warehousing Guide

■ Oracle OLAP Option Data Warehousing Features

The OLAP Option of the Oracle Database has been enhanced with several features designed to make OLAP cubes attractive alternatives to tables

Oracle Database Data Warehousing Guide, 11g Release 1 (11.1)

B28313-02

Primary Author: Paul Lane

Contributing Author: Viv Schupmann and Ingrid Stuart (Change Data Capture)

Contributor: Patrick Amor, Hermann Baer, Mark Bauer, Subhransu Basu, Srikanth Bellamkonda, Randy Bello, Paula Bingham, Tolga Bozkaya, Lucy Burgess, Donna Carver, Rushan Chen, Benoit Dageville, John Haydu, Lilian Hobbs, Hakan Jakobsson, George Lumpkin, Alex Melidis, Valarie Moore, Cetin Ozbutun, Ananth Raghavan, Jack Raitto, Ray Roccaforte, Sankar Subramanian, Gregory Smith, Margaret Taft, Murali Thiyagarajan, Ashish Thusoo, Thomas Tong, Mark Van de Wiel, Jean-Francois Verrier, Gary Vincent, Andreas Walter, Andy Witkowski, Min Xiao, Tsae-Feng Yu

The Programs (which include both the software and documentation) contain proprietary information; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engineering, disassembly, or decompilation of the Programs, except to the extent required to obtain interoperability with other independently created software or as specified by law, is prohibited.

The information contained in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. This document is not warranted to be error-free. Except as may be expressly permitted in your license agreement for these Programs, no part of these Programs may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose.

If the Programs are delivered to the United States Government or anyone licensing or using the Programs on behalf of the United States Government, the following notice is applicable:

U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the Programs, including documentation and technical data, shall be subject to the licensing restrictions set forth in the applicable Oracle license agreement, and, to the extent applicable, the additional rights set forth in FAR 52.227-19, Commercial Computer Software--Restricted Rights (June 1987). Oracle USA, Inc., 500 Oracle Parkway, Redwood City, CA 94065.

The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup, redundancy and other measures to ensure the safe use of such applications if the Programs are used for such purposes, and we disclaim liability for any damages caused by such use of the Programs.

Oracle, JD Edwards, PeopleSoft, and Siebel are registered trademarks of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

The Programs may provide links to Web sites and access to content, products, and services from third parties. Oracle is not responsible for the availability of, or any content provided on, third-party Web sites. You bear all risks associated with the use of such content. If you choose to purchase any products or services from a third party, the relationship is directly between you and the third party. Oracle is not responsible for: (a) the quality of third-party products or services; or (b) fulfilling any of the terms of the agreement with the third party, including delivery of products or services and warranty obligations related to purchased products or services. Oracle is not responsible for any loss or damage of any sort that you may incur from dealing with any third party.

Contents

Preface

Audience

Documentation Accessibility

Related Documents

Conventions

What's New in Oracle Database?

Oracle Database 11g Release 1 (11.1) New Features in Data Warehousing

Oracle Database 10g Release 2 (10.2) New Features in Data Warehousing

Part I Concepts

1 Data Warehousing Concepts

What is a Data Warehouse?
Subject Oriented
Integrated
Nonvolatile
Time Variant
Contrasting OLTP and Data Warehousing Environments

Data Warehouse Architectures

Data Warehouse Architecture: Basic
Data Warehouse Architecture: with a Staging Area
Data Warehouse Architecture: with a Staging Area and Data Marts

Extracting Information from a Data Warehouse

Data Mining
Oracle Data Mining Functionality
Oracle Data Mining Interfaces

Part II Logical Design

2 Logical Design in Data Warehouses

Logical Versus Physical Design in Data Warehouses
Creating a Logical Design
Data Warehousing Schemas

Star Schemas
Other Data Warehousing Schemas

Data Warehousing Objects
Data Warehousing Objects: Fact Tables
Requirements of Fact Tables
Data Warehousing Objects: Dimension Tables
Hierarchies
Typical Dimension Hierarchy
Data Warehousing Objects: Unique Identifiers
Data Warehousing Objects: Relationships
Example of Data Warehousing Objects and Their Relationships

Part III Physical Design

3 Physical Design in Data Warehouses

Moving from Logical to Physical Design

Physical Design
Physical Design Structures
Tablespaces
Tables and Partitioned Tables
Table Compression
Views
Integrity Constraints
Indexes and Partitioned Indexes
Materialized Views
Dimensions

4 Hardware and I/O Considerations in Data Warehouses

Overview of Hardware and I/O Considerations in Data Warehouses

Configure I/O for Bandwidth not Capacity
Stripe Far and Wide
Use Redundancy
Test the I/O System Before Building the Database
Plan for Growth

Typical Data Warehouse Integrity Constraints

UNIQUE Constraints in a Data Warehouse
FOREIGN KEY Constraints in a Data Warehouse
RELY Constraints
NOT NULL Constraints
Integrity Constraints and Parallelism
Integrity Constraints and Partitioning
View Constraints

8 Basic Materialized Views

Overview of Data Warehousing with Materialized Views
Materialized Views for Data Warehouses
Materialized Views for Distributed Computing
Materialized Views for Mobile Computing
The Need for Materialized Views
Components of Summary Management
Data Warehousing Terminology
Materialized View Schema Design
Schemas and Dimension Tables
Materialized View Schema Design Guidelines
Loading Data into Data Warehouses
Overview of Materialized View Management Tasks

Types of Materialized Views

Materialized Views with Aggregates
Requirements for Using Materialized Views with Aggregates
Materialized Views Containing Only Joins
Materialized Join Views FROM Clause Considerations
Nested Materialized Views
Why Use Nested Materialized Views?
Nesting Materialized Views with Joins and Aggregates
Nested Materialized View Usage Guidelines
Restrictions When Using Nested Materialized Views

Creating Materialized Views
Creating Materialized Views with Column Alias Lists
Naming Materialized Views
Storage And Table Compression
Build Methods

Enabling Query Rewrite
Query Rewrite Restrictions
Materialized View Restrictions
General Query Rewrite Restrictions
Refresh Options
General Restrictions on Fast Refresh
Restrictions on Fast Refresh on Materialized Views with Joins Only
Restrictions on Fast Refresh on Materialized Views with Aggregates
Restrictions on Fast Refresh on Materialized Views with UNION ALL
Achieving Refresh Goals
Refreshing Nested Materialized Views
ORDER BY Clause
Materialized View Logs
Using the FORCE Option with Materialized View Logs
Using Oracle Enterprise Manager
Using Materialized Views with NLS Parameters
Adding Comments to Materialized Views

Registering Existing Materialized Views

Choosing Indexes for Materialized Views

Dropping Materialized Views

Analyzing Materialized View Capabilities

Using the DBMS_MVIEW.EXPLAIN_MVIEW Procedure
DBMS_MVIEW.EXPLAIN_MVIEW Declarations
Using MV_CAPABILITIES_TABLE
MV_CAPABILITIES_TABLE.CAPABILITY_NAME Details
MV_CAPABILITIES_TABLE Column Details

9 Advanced Materialized Views

Partitioning and Materialized Views
Partition Change Tracking
Partition Key
Join Dependent Expression
Partition Marker
Partial Rewrite
Partitioning a Materialized View
Partitioning a Prebuilt Table
Benefits of Partitioning a Materialized View
Rolling Materialized Views

Materialized Views in Analytic Processing Environments

Cubes
Benefits of Partitioning Materialized Views
Compressing Materialized Views
Materialized Views with Set Operators
Examples of Materialized Views Using UNION ALL

Materialized Views and Models
Invalidating Materialized Views
Security Issues with Materialized Views

Querying Materialized Views with Virtual Private Database (VPD)
Using Query Rewrite with Virtual Private Database
Restrictions with Materialized Views and Virtual Private Database

Altering Materialized Views

10 Dimensions

What are Dimensions?
Creating Dimensions
Dropping and Creating Attributes with Columns
Multiple Hierarchies
Using Normalized Dimension Tables

Viewing Dimensions
Using Oracle Enterprise Manager
Using the DESCRIBE_DIMENSION Procedure

Using Dimensions with Constraints

Validating Dimensions
Altering Dimensions

Deleting Dimensions

Part IV Managing the Data Warehouse Environment

11 Overview of Extraction, Transformation, and Loading

Overview of ETL in Data Warehouses
ETL Basics in Data Warehousing
Extraction of Data
Transportation of Data

ETL Tools for Data Warehouses

Daily Operations in Data Warehouses
Evolution of the Data Warehouse

12 Extraction in Data Warehouses

Overview of Extraction in Data Warehouses

Introduction to Extraction Methods in Data Warehouses
Logical Extraction Methods
Full Extraction
Incremental Extraction
Physical Extraction Methods
Online Extraction
Offline Extraction
Change Data Capture
Timestamps
Partitioning
Triggers

Data Warehousing Extraction Examples
Extraction Using Data Files
Extracting into Flat Files Using SQL*Plus

Extracting into Flat Files Using OCI or Pro*C Programs
Exporting into Export Files Using the Export Utility
Extracting into Export Files Using External Tables
Extraction Through Distributed Operations

13 Transportation in Data Warehouses

Overview of Transportation in Data Warehouses
Introduction to Transportation Mechanisms in Data Warehouses

Transportation Using Flat Files
Transportation Through Distributed Operations
Transportation Using Transportable Tablespaces
Transportable Tablespaces Example
Other Uses of Transportable Tablespaces

14 Loading and Transformation

Overview of Loading and Transformation in Data Warehouses

Transformation Flow
Multistage Data Transformation
Pipelined Data Transformation

Loading Mechanisms

Loading a Data Warehouse with SQL*Loader
Loading a Data Warehouse with External Tables
Loading a Data Warehouse with OCI and Direct-Path APIs
Loading a Data Warehouse with Export/Import

Transformation Mechanisms

Transforming Data Using SQL
CREATE TABLE AS SELECT And INSERT /*+APPEND*/ AS SELECT
Transforming Data Using UPDATE
Transforming Data Using MERGE
Transforming Data Using Multitable INSERT
Transforming Data Using PL/SQL
Transforming Data Using Table Functions
What is a Table Function?

Error Logging and Handling Mechanisms

Business Rule Violations
Data Rule Violations (Data Errors)
Handling Data Errors in PL/SQL
Handling Data Errors with an Error Logging Table

Loading and Transformation Scenarios
Key Lookup Scenario
Business Rule Violation Scenario
Data Error Scenarios
Pivoting Scenarios

15 Maintaining the Data Warehouse

Using Partitioning to Improve Data Warehouse Refresh

Refresh Scenarios
Scenarios for Using Partitioning for Refreshing Data Warehouses
Refresh Scenario 1
Refresh Scenario 2

Optimizing DML Operations During Refresh

Implementing an Efficient MERGE Operation
Maintaining Referential Integrity
Purging Data

Refreshing Materialized Views
Complete Refresh
Fast Refresh
Partition Change Tracking (PCT) Refresh

ON COMMIT Refresh
Manual Refresh Using the DBMS_MVIEW Package
Refresh Specific Materialized Views with REFRESH
Refresh All Materialized Views with REFRESH_ALL_MVIEWS
Refresh Dependent Materialized Views with REFRESH_DEPENDENT
Using Job Queues for Refresh
When Fast Refresh is Possible
Recommended Initialization Parameters for Parallelism
Monitoring a Refresh
Checking the Status of a Materialized View
Viewing Partition Freshness
Scheduling Refresh
Tips for Refreshing Materialized Views with Aggregates
Tips for Refreshing Materialized Views Without Aggregates
Tips for Refreshing Nested Materialized Views
Tips for Fast Refresh with UNION ALL
Tips After Refreshing Materialized Views

Using Materialized Views with Partitioned Tables

Fast Refresh with Partition Change Tracking
PCT Fast Refresh Scenario 1
PCT Fast Refresh Scenario 2
PCT Fast Refresh Scenario 3
Fast Refresh with CONSIDER FRESH

16 Change Data Capture

Overview of Change Data Capture

Capturing Change Data Without Change Data Capture
Capturing Change Data with Change Data Capture
Publish and Subscribe Model
Publisher
Subscribers

Change Sources and Modes of Change Data Capture
Synchronous Change Data Capture
Asynchronous Change Data Capture
Asynchronous HotLog Mode

Asynchronous Distributed HotLog Mode
Asynchronous AutoLog Mode

Change Sets
Valid Combinations of Change Sources and Change Sets

Change Tables

Getting Information About the Change Data Capture Environment

Preparing to Publish Change Data
Creating a User to Serve As a Publisher
Granting Privileges and Roles to the Publisher
Creating a Default Tablespace for the Publisher
Password Files and Setting the REMOTE_LOGIN_PASSWORDFILE Parameter
Determining the Mode in Which to Capture Data
Setting Initialization Parameters for Change Data Capture Publishing
Initialization Parameters for Synchronous Publishing
Initialization Parameters for Asynchronous HotLog Publishing
Initialization Parameters for Asynchronous Distributed HotLog Publishing
Initialization Parameters for Asynchronous AutoLog Publishing
Adjusting Initialization Parameter Values When Oracle Streams Values Change
Tracking Changes to the CDC Environment

Publishing Change Data
Performing Synchronous Publishing
Performing Asynchronous HotLog Publishing
Performing Asynchronous Distributed HotLog Publishing
Performing Asynchronous AutoLog Publishing

Subscribing to Change Data

Managing Published Data

Managing Asynchronous Change Sources
Enabling And Disabling Asynchronous Distributed HotLog Change Sources
Managing Asynchronous Change Sets
Creating Asynchronous Change Sets with Starting and Ending Dates
Enabling and Disabling Asynchronous Change Sets
Stopping Capture on DDL for Asynchronous Change Sets
Recovering from Errors Returned on Asynchronous Change Sets
Managing Synchronous Change Sets
Enabling and Disabling Synchronous Change Sets
Managing Change Tables
Creating Change Tables
Understanding Change Table Control Columns
Understanding TARGET_COLMAP$ and SOURCE_COLMAP$ Values
Using Change Markers
Controlling Subscriber Access to Change Tables
Purging Change Tables of Unneeded Data
Dropping Change Tables
Exporting and Importing Change Data Capture Objects Using Oracle Data Pump
Restrictions on Using Oracle Data Pump with Change Data Capture
Examples of Oracle Data Pump Export and Import Commands
Publisher Considerations for Exporting and Importing Change Tables

Considerations for Asynchronous Change Data Capture
Asynchronous Change Data Capture and Redo Log Files
Asynchronous Change Data Capture and Supplemental Logging
Asynchronous Change Data Capture and Oracle Streams Components
Datatypes and Table Structures Supported for Asynchronous Change Data Capture
Restrictions for NOLOGGING and UNRECOVERABLE Operations

Implementation and System Configuration
Database Configuration Assistant Considerations
Summary of Supported Distributed HotLog Configurations and Restrictions
Oracle Database Releases for Source and Staging Databases
Upgrading a Distributed HotLog Change Source to Oracle Release 11.1
Hardware Platforms and Operating Systems
Requirements for Multiple Publishers on the Staging Database
Requirements for Database Links

Part V Data Warehouse Performance

17 Basic Query Rewrite

Overview of Query Rewrite
When Does Oracle Rewrite a Query?

Ensuring that Query Rewrite Takes Effect
Initialization Parameters for Query Rewrite
Controlling Query Rewrite
Accuracy of Query Rewrite
Privileges for Enabling Query Rewrite
Sample Schema and Materialized Views
How to Verify Query Rewrite Occurred

Example of Query Rewrite

18 Advanced Query Rewrite

How Oracle Rewrites Queries
Cost-Based Optimization
General Query Rewrite Methods
When are Constraints and Dimensions Needed?
Checks Made by Query Rewrite
Join Compatibility Check
Data Sufficiency Check
Grouping Compatibility Check
Aggregate Computability Check
Rewrite Using Dimensions

Other Query Rewrite Considerations
Query Rewrite Using Nested Materialized Views
Query Rewrite in the Presence of Inline Views
Query Rewrite Using Remote Tables
Query Rewrite in the Presence of Duplicate Tables
Query Rewrite Using Date Folding
Query Rewrite Using View Constraints
View Constraints Restrictions
Query Rewrite Using Set Operator Materialized Views
UNION ALL Marker
Query Rewrite in the Presence of Grouping Sets
Query Rewrite When Using GROUP BY Extensions
Hint for Queries with Extended GROUP BY
Query Rewrite in the Presence of Window Functions
Query Rewrite and Expression Matching
Query Rewrite Using Partially Stale Materialized Views
Cursor Sharing and Bind Variables
Handling Expressions in Query Rewrite

Advanced Query Rewrite Using Equivalences

Verifying that Query Rewrite has Occurred
Using EXPLAIN PLAN with Query Rewrite
Using the EXPLAIN_REWRITE Procedure with Query Rewrite
DBMS_MVIEW.EXPLAIN_REWRITE Syntax
Using REWRITE_TABLE
Using a Varray
EXPLAIN_REWRITE Benefit Statistics

Support for Query Text Larger than 32KB in EXPLAIN_REWRITE
EXPLAIN_REWRITE and Multiple Materialized Views
EXPLAIN_REWRITE Output

Design Considerations for Improving Query Rewrite Capabilities

Query Rewrite Considerations: Constraints
Query Rewrite Considerations: Dimensions
Query Rewrite Considerations: Outer Joins
Query Rewrite Considerations: Text Match
Query Rewrite Considerations: Aggregates
Query Rewrite Considerations: Grouping Conditions
Query Rewrite Considerations: Expression Matching
Query Rewrite Considerations: Date Folding
Query Rewrite Considerations: Statistics
Query Rewrite Considerations: Hints
REWRITE and NOREWRITE Hints
REWRITE_OR_ERROR Hint
Multiple Materialized View Rewrite Hints
EXPAND_GSET_TO_UNION Hint

19 Schema Modeling Techniques

Schemas in Data Warehouses
Third Normal Form

Optimizing Third Normal Form Queries

Star Schemas

Snowflake Schemas

Optimizing Star Queries

Tuning Star Queries
Using Star Transformation
Star Transformation with a Bitmap Index
Execution Plan for a Star Transformation with a Bitmap Index
Star Transformation with a Bitmap Join Index
Execution Plan for a Star Transformation with a Bitmap Join Index
How Oracle Chooses to Use Star Transformation
Star Transformation Restrictions

20 SQL for Aggregation in Data Warehouses

Overview of SQL for Aggregation in Data Warehouses

Analyzing Across Multiple Dimensions
Optimized Performance

An Aggregate Scenario
Interpreting NULLs in Examples

ROLLUP Extension to GROUP BY
When to Use ROLLUP
ROLLUP Syntax
Partial Rollup

CUBE Extension to GROUP BY

When to Use CUBE
CUBE Syntax
Partial CUBE
Calculating Subtotals Without CUBE

GROUPING Functions

GROUPING Function
When to Use GROUPING
GROUPING_ID Function
GROUP_ID Function

GROUPING SETS Expression

GROUPING SETS Syntax

Composite Columns

Concatenated Groupings
Concatenated Groupings and Hierarchical Data Cubes

Considerations when Using Aggregation
Hierarchy Handling in ROLLUP and CUBE
Column Capacity in ROLLUP and CUBE
HAVING Clause Used with GROUP BY Extensions
ORDER BY Clause Used with GROUP BY Extensions
Using Other Aggregate Functions with ROLLUP and CUBE

Computation Using the WITH Clause

Working with Hierarchical Cubes in SQL

Specifying Hierarchical Cubes in SQL
Querying Hierarchical Cubes in SQL
SQL for Creating Materialized Views to Store Hierarchical Cubes
Examples of Hierarchical Cube Materialized Views

21 SQL for Analysis and Reporting

Overview of SQL for Analysis and Reporting

Ranking Functions
RANK and DENSE_RANK Functions
Ranking Order
Ranking on Multiple Expressions
RANK and DENSE_RANK Difference
Per Group Ranking
Per Cube and Rollup Group Ranking
Treatment of NULLs
Bottom N Ranking
CUME_DIST Function
PERCENT_RANK Function
NTILE Function
ROW_NUMBER Function

Windowing Aggregate Functions

Treatment of NULLs as Input to Window Functions
Windowing Functions with Logical Offset
Centered Aggregate Function
Windowing Aggregate Functions in the Presence of Duplicates

Varying Window Size for Each Row
Windowing Aggregate Functions with Physical Offsets
FIRST_VALUE and LAST_VALUE Functions

Reporting Aggregate Functions

RATIO_TO_REPORT Function

LAG/LEAD Functions
LAG/LEAD Syntax

FIRST/LAST Functions
FIRST/LAST Syntax
FIRST/LAST As Regular Aggregates
FIRST/LAST As Reporting Aggregates

Inverse Percentile Functions
Normal Aggregate Syntax
Inverse Percentile Example Basis

As Reporting Aggregates
Inverse Percentile Restrictions

Hypothetical Rank and Distribution Functions

Hypothetical Rank and Distribution Syntax

Linear Regression Functions

REGR_COUNT Function
REGR_AVGY and REGR_AVGX Functions
REGR_SLOPE and REGR_INTERCEPT Functions
REGR_R2 Function
REGR_SXX, REGR_SYY, and REGR_SXY Functions
Linear Regression Statistics Examples
Sample Linear Regression Calculation

Pivoting Operations

Example: Pivoting
Pivoting on Multiple Columns
Pivoting: Multiple Aggregates
Distinguishing PIVOT-Generated Nulls from Nulls in Source Data
Unpivoting Operations
Wildcard and Subquery Pivoting with XML Operations

Other Analytic Functionality
Linear Algebra
Frequent Itemsets
Descriptive Statistics
Hypothesis Testing - Parametric Tests
Crosstab Statistics
Hypothesis Testing - Non-Parametric Tests
Non-Parametric Correlation

WIDTH_BUCKET Function
WIDTH_BUCKET Syntax

User-Defined Aggregate Functions

CASE Expressions
Creating Histograms With User-Defined Buckets

Data Densification for Reporting

Partition Join Syntax
Sample of Sparse Data
Filling Gaps in Data
Filling Gaps in Two Dimensions
Filling Gaps in an Inventory Table
Computing Data Values to Fill Gaps

Time Series Calculations on Densified Data
Period-to-Period Comparison for One Time Level: Example
Period-to-Period Comparison for Multiple Time Levels: Example
Creating a Custom Member in a Dimension: Example

22 SQL for Modeling

Overview of SQL Modeling
How Data is Processed in a SQL Model
Why Use SQL Modeling?
SQL Modeling Capabilities

Basic Topics in SQL Modeling

Base Schema
MODEL Clause Syntax
Keywords in SQL Modeling
Assigning Values and Null Handling
Calculation Definition
Cell Referencing
Symbolic Dimension References
Positional Dimension References
Rules
Single Cell References
Multi-Cell References on the Right Side
Multi-Cell References on the Left Side
Use of the CV Function
Use of the ANY Wildcard
Nested Cell References
Order of Evaluation of Rules
Global and Local Keywords for Rules
UPDATE, UPSERT, and UPSERT ALL Behavior
UPDATE Behavior
UPSERT Behavior
UPSERT ALL Behavior
Treatment of NULLs and Missing Cells
Distinguishing Missing Cells from NULLs
Use Defaults for Missing Cells and NULLs
Using NULLs in a Cell Reference
Reference Models

Advanced Topics in SQL Modeling

FOR Loops
Evaluation of Formulas with FOR Loops
Iterative Models

Rule Dependency in AUTOMATIC ORDER Models
Ordered Rules
Analytic Functions
Unique Dimensions Versus Unique Single References
Rules and Restrictions when Using SQL for Modeling

Performance Considerations with SQL Modeling
Parallel Execution
Aggregate Computation
Using EXPLAIN PLAN to Understand Model Queries
Using ORDERED FAST: Example
Using ORDERED: Example
Using ACYCLIC FAST: Example
Using ACYCLIC: Example
Using CYCLIC: Example

Examples of SQL Modeling

23 OLAP and Data Mining

OLAP and Data Mining Comparison

OLAP Overview

OLAP Technology in the Oracle Database
Full Integration of Multidimensional Technology
Ease of Application Development
Ease of Administration
Security
Unmatched Performance and Scalability
Reduced Costs
Querying Dimensional Objects
Tools for Creating and Managing Dimensional Objects

24 Advanced Business Intelligence Queries

Examples of Business Intelligence Queries

25 Using Parallel Execution

Introduction to Parallel Execution Tuning
When to Implement Parallel Execution
When Not to Implement Parallel Execution
Operations That Can Be Parallelized

How Parallel Execution Works
Degree of Parallelism
The Parallel Execution Server Pool
Variations in the Number of Parallel Execution Servers
Processing Without Enough Parallel Execution Servers
How Parallel Execution Servers Communicate
Parallelizing SQL Statements
Dividing Work Among Parallel Execution Servers
Parallelism Between Operations

Producer/Consumer Operations
Granules of Parallelism
Block Range Granules
Partition Granules

Types of Parallelism

Parallel Query
Parallel Queries on Index-Organized Tables
Nonpartitioned Index-Organized Tables
Partitioned Index-Organized Tables
Parallel Queries on Object Types
Parallel DDL
DDL Statements That Can Be Parallelized
CREATE TABLE AS SELECT in Parallel
Recoverability and Parallel DDL
Space Management for Parallel DDL
Storage Space When Using Dictionary-Managed Tablespaces
Free Space and Parallel DDL
Parallel DML
Advantages of Parallel DML over Manual Parallelism
When to Use Parallel DML
Enabling Parallel DML
Transaction Restrictions for Parallel DML
Rollback Segments
Recovery for Parallel DML
Space Considerations for Parallel DML
Locks for Parallel DML
Restrictions on Parallel DML
Data Integrity Restrictions
Trigger Restrictions
Distributed Transaction Restrictions
Examples of Distributed Transaction Parallelization
Parallel Execution of Functions
Functions in Parallel Queries
Functions in Parallel DML and DDL Statements
Other Types of Parallelism

Initializing and Tuning Parameters for Parallel Execution

Using Default Parameter Settings
Setting the Degree of Parallelism for Parallel Execution
How Oracle Database Determines the Degree of Parallelism for Operations
Hints and Degree of Parallelism
Table and Index Definitions
Default Degree of Parallelism
Adaptive Multiuser Algorithm
Minimum Number of Parallel Execution Servers
Limiting the Number of Available Instances
Balancing the Workload
Parallelization Rules for SQL Statements

Rules for Parallelizing Queries
Rules for UPDATE, MERGE, and DELETE
Rules for INSERT SELECT
Rules for DDL Statements
Rules for [CREATE | REBUILD] INDEX or [MOVE | SPLIT] PARTITION
Rules for CREATE TABLE AS SELECT
Summary of Parallelization Rules
Enabling Parallelism for Tables and Queries
Degree of Parallelism and Adaptive Multiuser: How They Interact
How the Adaptive Multiuser Algorithm Works
Forcing Parallel Execution for a Session
Controlling Performance with the Degree of Parallelism

Tuning General Parameters for Parallel Execution

Parameters Establishing Resource Limits for Parallel Operations
PARALLEL_MAX_SERVERS
Increasing the Number of Concurrent Users
Limiting the Number of Resources for a User
PARALLEL_MIN_SERVERS
SHARED_POOL_SIZE
Computing Additional Memory Requirements for Message Buffers
Adjusting Memory After Processing Begins
PARALLEL_MIN_PERCENT
Parameters Affecting Resource Consumption
PGA_AGGREGATE_TARGET
PARALLEL_EXECUTION_MESSAGE_SIZE
Parameters Affecting Resource Consumption for Parallel DML and Parallel DDL
Parameters Related to I/O
DB_CACHE_SIZE
DB_BLOCK_SIZE
DB_FILE_MULTIBLOCK_READ_COUNT
DISK_ASYNCH_IO and TAPE_ASYNCH_IO

Monitoring and Diagnosing Parallel Execution Performance

Is There Regression?

Is There a Plan Change?

Is There a Parallel Plan?

Is There a Serial Plan?

Is There Parallel Execution?

Is the Workload Evenly Distributed?
Monitoring Parallel Execution Performance with Dynamic Performance Views
V$PX_BUFFER_ADVICE
V$PX_SESSION
V$PX_SESSTAT
V$PX_PROCESS
V$PX_PROCESS_SYSSTAT
V$PQ_SESSTAT
V$FILESTAT
V$PARAMETER

V$PQ_TQSTAT
V$SESSTAT and V$SYSSTAT
Monitoring Session Statistics
Monitoring System Statistics
Monitoring Operating System Statistics

Affinity and Parallel Operations
Affinity and Parallel Queries
Affinity and Parallel DML

Miscellaneous Parallel Execution Tuning Tips
Setting Buffer Cache Size for Parallel Operations
Overriding the Default Degree of Parallelism
Rewriting SQL Statements
Creating and Populating Tables in Parallel
Creating Temporary Tablespaces for Parallel Sort and Hash Join
Size of Temporary Extents
Executing Parallel SQL Statements
Using EXPLAIN PLAN to Show Parallel Operations Plans
Additional Considerations for Parallel DML
PDML and Direct-Path Restrictions
Limitation on the Degree of Parallelism
Using Local and Global Striping
Increasing INITRANS
Limitation on Available Number of Transaction Free Lists for Segments
Using Multiple Archivers
Database Writer Process (DBWn) Workload
[NO]LOGGING Clause
Creating Indexes in Parallel
Parallel DML Tips
Parallel DML Tip 1: INSERT
Parallel DML Tip 2: Direct-Path INSERT
Parallel DML Tip 3: Parallelizing INSERT, MERGE, UPDATE, and DELETE
Incremental Data Loading in Parallel
Updating the Table in Parallel
Inserting the New Rows into the Table in Parallel
Merging in Parallel

Glossary

Index


Documentation Accessibility

Our goal is to make Oracle products, services, and supporting documentation accessible, with good usability, to the disabled community. To that end, our documentation includes features that make information available to users of assistive technology. This documentation is available in HTML format, and contains markup to facilitate access by the disabled community. Accessibility standards will continue to evolve over time, and Oracle is actively engaged with other market-leading technology vendors to address technical obstacles so that our documentation can be accessible to all of our customers. For more information, visit the Oracle Accessibility Program Web site at http://www.oracle.com/accessibility/.

Accessibility of Code Examples in Documentation

Screen readers may not always correctly read the code examples in this document. The conventions for writing code require that closing braces should appear on an otherwise empty line; however, some screen readers may not always read a line of text that consists solely of a bracket or brace.

Accessibility of Links to External Web Sites in Documentation

This documentation may contain links to Web sites of other companies or organizations that Oracle does not own or control. Oracle neither evaluates nor makes any representations regarding the accessibility of these Web sites.

TTY Access to Oracle Support Services

Oracle provides dedicated Text Telephone (TTY) access to Oracle Support Services within the United States of America 24 hours a day, 7 days a week. For TTY support, call 800.446.2398. Outside the United States, call +1.407.458.2479.

Related Documents

Many of the examples in this book use the sample schemas of the seed database, which is installed by default when you install Oracle. Refer to Oracle Database Sample Schemas for information on how these schemas were created and how you can use them yourself.

Note that this book is meant as a supplement to standard texts about data warehousing. This book focuses on Oracle-specific material and does not reproduce in detail material of a general nature. For additional information, see:

■ The Data Warehouse Toolkit by Ralph Kimball (John Wiley and Sons, 1996)

■ Building the Data Warehouse by William Inmon (John Wiley and Sons, 1996)

Conventions

The following text conventions are used in this document:

Convention    Meaning

boldface      Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.

italic        Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.

monospace     Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.


What's New in Oracle Database?

This section describes the new features of Oracle Database 11g Release 1 (11.1) and provides pointers to additional information. New features information from previous releases is also retained to help those users migrating to the current release.

The following sections describe new features in Oracle Database:

■ Oracle Database 11g Release 1 (11.1) New Features in Data Warehousing

■ Oracle Database 10g Release 2 (10.2) New Features in Data Warehousing

Oracle Database 11g Release 1 (11.1) New Features in Data Warehousing

■ Pivot and Unpivot Operators

The PIVOT operator makes it easy to create aggregated cross-tabular output that condenses many rows into a compact result set useful for reports. For instance, input data holding sales of one month in each row can be pivoted into output holding twelve months in each row, with each month in its own column. By combining multiple input rows into each output row, PIVOT also enables inter-row comparison without a table self-join. The UNPIVOT operator reshapes data into a format useful for further relational operations. For example, if a source data set presents twelve months of sales values in each row, UNPIVOT can reshape each source row into twelve output rows, each holding one month of sales data. The unpivoted results are in a more normalized relational form than the source data, and they can be manipulated with simpler and more efficient SQL.

See Also: Chapter 20, "SQL for Aggregation in Data Warehouses" for more information

■ Partition Advisor

The SQL Access Advisor has been enhanced to include partition advice. It recommends the right strategy to partition tables, indexes, and materialized views to get best performance from an application.

See Also: Chapter 5, "Partitioning in Data Warehouses" for more information

■ Change Data Capture (CDC) Enhancements


so the window can be moved forward and backward

See Also: Chapter 16, "Change Data Capture" for more information

■ Query Rewrite Enhancements

Query rewrite has been enhanced to support queries containing inline views. Prior to this release, queries containing inline views could rewrite only if there was an exact text match with the inline views in the materialized views. Because inline views no longer need to textually match between the query and the materialized view, a larger number of queries with inline views can be rewritten. Another significant query rewrite improvement is the ability to rewrite queries that reference remote tables.

See Also: Chapter 17, "Basic Query Rewrite" for more information

■ Refresh Enhancements

Refresh has been enhanced to support automatic index creation for UNION ALL materialized views, the use of query rewrite during a materialized view's atomic refresh, and materialized view refresh with set operators. Also, partition change tracking refresh of UNION ALL materialized views is now possible. Finally, catalog views have been enhanced to contain information on the staleness of partitioned materialized views. These improvements will lead to faster refresh performance.

See Also: Chapter 15, "Maintaining the Data Warehouse" for more information

■ Resource Consumption

Administrators can now specify with a single parameter (MEMORY_TARGET) the total amount of memory (shared memory and SQL execution memory) that can be used by the Oracle Database, leaving to the server the responsibility to determine the optimal distribution of memory across the various memory components of the database instance.

See Also: Chapter 25, "Using Parallel Execution" for more information

■ Oracle OLAP Option Data Warehousing Features

The OLAP Option of the Oracle Database has been enhanced with several features designed to make OLAP cubes attractive alternatives to tables for managing and querying aggregate data in the data warehouse. These include:

– Further integration of cubes into the SQL query engine. Advancements include integration of cubes with the Oracle query optimizer and a cube row source. These features dramatically increase the efficiency of SQL queries that select from OLAP cubes and dimensions by pushing joins directly into the Oracle Database's multidimensional engine, allowing efficient joins between tables and cubes and by improving overall row/second throughput when selecting from cubes.

– Automatic query rewrite to cube-organized materialized views. Cube-organized materialized views access data from OLAP cubes rather than tables. As with table-based materialized views, applications can write queries to detail tables or views and let the database automatically rewrite the query to pre-aggregated data in the cube.

– Database-managed automatic refresh of cubes. In this release, cubes can be refreshed using the DBMS_MVIEW.REFRESH program, just like table-based materialized views. Cubes provide excellent support for FAST (incremental) refresh.

– Cost-based aggregation. In many situations, cubes are much more efficient than tables at managing aggregate data. Cost-based aggregation improves upon these advantages by improving the efficiency of pre-aggregating and querying aggregate data, and by simplifying the process of managing aggregate data.

Database administrators who support dimensionally modeled data sets (for example, star/snowflake schema) for query by business intelligence tools and applications should consider using OLAP cubes as a summary management solution because they may offer significant performance advantages.
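The reshaping performed by the PIVOT and UNPIVOT operators listed among the 11g features can be illustrated with a small, hedged sketch. It does not use Oracle or the PIVOT syntax itself: it uses Python's standard sqlite3 module as a stand-in database and emulates the pivot with conditional aggregation and the unpivot with UNION ALL. The table and column names (monthly_sales, sales_wide, jan, feb) are assumptions made up for this illustration, and only two months are shown instead of twelve.

```python
import sqlite3


def load_monthly_sales():
    """Create an in-memory table holding one row per (product, month)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE monthly_sales (product TEXT, month TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO monthly_sales VALUES (?, ?, ?)",
        [
            ("widget", "JAN", 100.0),
            ("widget", "FEB", 120.0),
            ("gadget", "JAN", 80.0),
            ("gadget", "FEB", 60.0),
        ],
    )
    return conn


def pivot_months(conn):
    """Pivot emulation: many (product, month) rows become one row per product."""
    conn.execute("DROP TABLE IF EXISTS sales_wide")
    conn.execute(
        """
        CREATE TABLE sales_wide AS
        SELECT product,
               SUM(CASE WHEN month = 'JAN' THEN amount END) AS jan,
               SUM(CASE WHEN month = 'FEB' THEN amount END) AS feb
        FROM monthly_sales
        GROUP BY product
        """
    )
    return conn.execute(
        "SELECT product, jan, feb FROM sales_wide ORDER BY product"
    ).fetchall()


def unpivot_months(conn):
    """Unpivot emulation: each wide row becomes one row per month column.

    Expects pivot_months() to have created the sales_wide table first.
    """
    return conn.execute(
        """
        SELECT product, 'JAN' AS month, jan AS amount FROM sales_wide
        UNION ALL
        SELECT product, 'FEB' AS month, feb AS amount FROM sales_wide
        ORDER BY product, month
        """
    ).fetchall()


conn = load_monthly_sales()
print(pivot_months(conn))    # one row per product, one column per month
print(unpivot_months(conn))  # back to one row per (product, month)
```

With the pivoted form, the two months of a product sit in the same row and can be compared without a self-join, which is the benefit the feature description points out.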

Oracle Database 10g Release 2 (10.2) New Features in Data Warehousing

■ SQL Model Calculations

The MODEL clause enables you to specify complex formulas while avoiding multiple joins and UNION clauses. This clause supports analytical queries such as share of ancestor and prior period comparisons, as well as calculations typically done in large spreadsheets. The MODEL clause provides building blocks for budgeting, forecasting, and statistical applications.

See Also: Chapter 22, "SQL for Modeling"

■ Materialized View Refresh Enhancements

Materialized view fast refresh involving multiple tables, whether partitioned or non-partitioned, no longer requires that a materialized view log be present.

See Also: Chapter 15, "Maintaining the Data Warehouse"

■ Query Rewrite Enhancements

Query rewrite performance has been improved because query rewrite is now able to use multiple materialized views to rewrite a query.

See Also: Chapter 17, "Basic Query Rewrite"

■ Partitioning Enhancements

You can now use partitioning with index-organized tables. Also, materialized views in OLAP are able to use partitioning. You can now use hash-partitioned global indexes.

See Also: Chapter 5, "Partitioning in Data Warehouses"

■ Change Data Capture

Oracle now supports asynchronous change data capture as well as synchronous change data capture.

See Also: Chapter 16, "Change Data Capture"

■ ETL Enhancements

Oracle's extraction, transformation, and loading capabilities have been improved with several MERGE improvements and better external table capabilities.

See Also: Chapter 11, "Overview of Extraction, Transformation, and Loading"

Part I Concepts

This section introduces basic data warehousing concepts.

It contains the following chapter:

■ Chapter 1, "Data Warehousing Concepts"

1 Data Warehousing Concepts

This chapter provides an overview of the Oracle data warehousing implementation. It includes:

■ What is a Data Warehouse?

■ Data Warehouse Architectures

■ Extracting Information from a Data Warehouse

Note that this book is meant as a supplement to standard texts about data warehousing. This book focuses on Oracle-specific material and does not reproduce in detail material of a general nature. Two standard texts are:

■ The Data Warehouse Toolkit by Ralph Kimball (John Wiley and Sons, 1996)

■ Building the Data Warehouse by William Inmon (John Wiley and Sons, 1996)

What is a Data Warehouse?

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but can include data from other sources. Data warehouses separate analysis workload from transaction workload and enable an organization to consolidate data from several sources. This helps in:

■ Maintaining historical records

■ Analyzing the data to gain a better understanding of the business and to improve the business

In addition to a relational database, a data warehouse environment can include an extraction, transportation, transformation, and loading (ETL) solution, statistical analysis, reporting, data mining capabilities, client analysis tools, and other applications that manage the process of gathering data, transforming it into useful, actionable information, and delivering it to business users.

A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by William Inmon:

■ Subject Oriented

■ Integrated

■ Nonvolatile

■ Time Variant

Subject Oriented

Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a data warehouse that concentrates on sales. Using this data warehouse, you can answer questions such as "Who was our best customer for this item last year?" or "Who is likely to be our best customer next year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.

Integrated

Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
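As a minimal sketch of what resolving naming conflicts and unit inconsistencies can look like, the following Python function maps records from two source systems onto one target layout. The source names (orders_eu, orders_us), the field names, and the choice of kilograms as the target unit are all assumptions made up for this illustration; a real warehouse would normally do this work in its ETL layer.

```python
# Approximate conversion factor: pounds per kilogram.
LB_PER_KG = 2.20462


def integrate_record(record, source):
    """Map a source-specific record onto one consistent warehouse format.

    The two hypothetical sources name the customer key differently and
    report weight in different units; the output always uses
    'customer_id' and 'weight_kg'.
    """
    if source == "orders_eu":
        # This source already reports kilograms but calls the key 'cust_no'.
        return {
            "customer_id": record["cust_no"],
            "weight_kg": float(record["weight_kg"]),
        }
    if source == "orders_us":
        # This source reports pounds and calls the key 'CustomerID'.
        return {
            "customer_id": record["CustomerID"],
            "weight_kg": round(record["weight_lb"] / LB_PER_KG, 3),
        }
    raise ValueError(f"unknown source system: {source}")


print(integrate_record({"cust_no": 7, "weight_kg": 2}, "orders_eu"))
print(integrate_record({"CustomerID": 9, "weight_lb": 11.0231}, "orders_us"))
```

Once every source passes through a mapping like this, queries against the warehouse no longer need to know which system a row came from.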

Nonvolatile

Nonvolatile means that, once entered into the data warehouse, data should not change. This is logical because the purpose of a data warehouse is to enable you to analyze what has occurred.

Time Variant

A data warehouse's focus on change over time is what is meant by the term time variant. In order to discover trends and identify hidden patterns and relationships in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive.

Contrasting OLTP and Data Warehousing Environments

Figure 1–1 illustrates key differences between an OLTP system and a data warehouse.

Figure 1–1 Contrasting OLTP and Data Warehousing Environments

                              OLTP                                       Data Warehouse
Data structures               Complex data structures (3NF databases)    Multidimensional data structures
Indexes                       Few                                        Many
Joins                         Many                                       Some
Duplicated data               Normalized DBMS                            Denormalized DBMS
Derived data and aggregates   Rare                                       Common

One major difference between the types of system is that data warehouses are not usually in third normal form (3NF), a type of data normalization common in OLTP environments.

Data warehouses and OLTP systems have very different requirements. Here are some examples of differences between typical data warehouses and OLTP systems:

■ Workload

Data warehouses are designed to accommodate ad hoc queries and data analysis. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query and analytical operations.

OLTP systems support only predefined operations. Your applications might be specifically tuned or designed to support only these operations.

In OLTP systems, end users routinely issue individual data modification statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.

■ Schema design

Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query and analytical performance.

OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and to guarantee data consistency.

■ Typical operations

A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month."

A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."

■ Historical data

Data warehouses usually store many months or years of data. This is to support historical analysis and reporting.

OLTP systems usually store data from only a few weeks or months. The OLTP system stores only historical data as needed to successfully meet the requirements of the current transaction.
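The two typical operations quoted above can be put side by side in a short sketch. It uses Python's standard sqlite3 module as a stand-in for a real database, and the orders table, its columns, and the OPEN/CLOSED status values are assumptions invented for this illustration. The point is only the difference in shape: the warehouse-style query aggregates over every row in a period, while the OLTP-style query fetches one row through an index.

```python
import sqlite3


def build_orders():
    """Create a tiny orders table (hypothetical layout) with an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            order_date  TEXT NOT NULL,      -- ISO format: YYYY-MM-DD
            amount      REAL NOT NULL,
            status      TEXT NOT NULL       -- 'OPEN' or 'CLOSED'
        )
        """
    )
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
        [
            (1, 101, "2007-08-03", 10.0, "CLOSED"),
            (2, 102, "2007-08-15", 20.0, "CLOSED"),
            (3, 101, "2007-08-28", 5.0, "CLOSED"),
            (4, 101, "2007-09-02", 7.0, "OPEN"),
        ],
    )
    return conn


def total_sales_for_month(conn, year_month):
    """Warehouse-style query: aggregate over all customers for one month."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders "
        "WHERE substr(order_date, 1, 7) = ?",
        (year_month,),
    ).fetchone()
    return row[0]


def current_order_for_customer(conn, customer_id):
    """OLTP-style query: fetch a single row for a single customer."""
    return conn.execute(
        "SELECT order_id, order_date, amount FROM orders "
        "WHERE customer_id = ? AND status = 'OPEN' "
        "ORDER BY order_date DESC LIMIT 1",
        (customer_id,),
    ).fetchone()


conn = build_orders()
print(total_sales_for_month(conn, "2007-08"))
print(current_order_for_customer(conn, 101))
```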

Data Warehouse Architectures

Data warehouses and their architectures vary depending upon the specifics of an organization's situation Three common architectures are:

■ Data Warehouse Architecture: Basic

■ Data Warehouse Architecture: with a Staging Area


■ Data Warehouse Architecture: with a Staging Area and Data Marts

Data Warehouse Architecture: Basic

Figure 1–2 shows a simple architecture for a data warehouse. End users directly access data derived from several source systems through the data warehouse.

Figure 1–2 Architecture of a Data Warehouse

In Figure 1–2, the metadata and raw data of a traditional OLTP system are present, as is an additional type of data, summary data. Summaries are very valuable in data warehouses because they pre-compute long operations in advance. For example, a typical data warehouse query is to retrieve something such as August sales. A summary in an Oracle database is called a materialized view.
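The idea of a summary that pre-computes a long operation can be sketched as follows. This is not an Oracle materialized view: it is a plain summary table built with Python's standard sqlite3 module, with no automatic refresh and no query rewrite. The table names (sales, sales_by_month) and the sample rows are assumptions chosen for the illustration.

```python
import sqlite3


def build_warehouse():
    """Create a detail table and a pre-aggregated monthly summary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [
            ("2007-08-01", 100.0),
            ("2007-08-20", 250.0),
            ("2007-09-05", 75.0),
        ],
    )
    # The summary is computed once, ahead of time, from the detail rows.
    conn.execute(
        """
        CREATE TABLE sales_by_month AS
        SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total
        FROM sales
        GROUP BY substr(sale_date, 1, 7)
        """
    )
    return conn


def august_sales_from_summary(conn):
    """Answer 'August sales' by reading one pre-computed summary row."""
    row = conn.execute(
        "SELECT total FROM sales_by_month WHERE month = '2007-08'"
    ).fetchone()
    return row[0]


def august_sales_from_detail(conn):
    """Answer the same question by scanning and aggregating detail rows."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE substr(sale_date, 1, 7) = '2007-08'"
    ).fetchone()
    return row[0]


conn = build_warehouse()
print(august_sales_from_summary(conn), august_sales_from_detail(conn))
```

Both functions return the same figure; the summary version simply moves the aggregation cost from query time to load time, which is the trade a materialized view makes.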

Data Warehouse Architecture: with a Staging Area

You need to clean and process your operational data before putting it into the warehouse, as shown in Figure 1–2. You can do this programmatically, although most data warehouses use a staging area instead. A staging area simplifies building summaries and general warehouse management. Figure 1–3 illustrates this typical architecture.


Figure 1–3 Architecture of a Data Warehouse with a Staging Area
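The flow through a staging area can be sketched in a few lines: raw operational rows land unchanged in a staging table, are cleaned, and only then reach the warehouse table. The sketch below uses Python's standard sqlite3 module as a stand-in database; the table names (stg_sales, sales), the columns, and the cleaning rules (trim and upper-case the region, reject non-numeric values, skip duplicate order IDs) are assumptions made up for this illustration.

```python
import sqlite3


def clean_row(raw):
    """Return a cleaned (order_id, region, amount) tuple, or None to reject."""
    order_id, region, amount = raw
    try:
        cleaned = (int(order_id), region.strip().upper(), float(amount))
    except (AttributeError, TypeError, ValueError):
        return None
    # Reject rows whose region is empty after trimming.
    return cleaned if cleaned[1] else None


def load_via_staging(raw_rows):
    """Land raw rows in a staging table, then load cleaned rows."""
    conn = sqlite3.connect(":memory:")
    # Staging table: everything is text, nothing is validated yet.
    conn.execute(
        "CREATE TABLE stg_sales (order_id TEXT, region TEXT, amount TEXT)"
    )
    # Warehouse table: typed and constrained.
    conn.execute(
        "CREATE TABLE sales ("
        "order_id INTEGER PRIMARY KEY, "
        "region TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", raw_rows)

    rejected = 0
    staged = conn.execute(
        "SELECT order_id, region, amount FROM stg_sales"
    ).fetchall()
    for raw in staged:
        cleaned = clean_row(raw)
        if cleaned is None:
            rejected += 1
            continue
        # Duplicate order IDs are skipped rather than loaded twice.
        conn.execute("INSERT OR IGNORE INTO sales VALUES (?, ?, ?)", cleaned)

    loaded = conn.execute(
        "SELECT order_id, region, amount FROM sales ORDER BY order_id"
    ).fetchall()
    return loaded, rejected


rows = [("1", " east ", "10.5"), ("1", "EAST", "10.5"),
        ("2", "west", "n/a"), ("3", "West", "7")]
print(load_via_staging(rows))
```

Keeping the raw rows in the staging table means a rejected or suspicious row can be inspected and reprocessed without going back to the operational system.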

Data Warehouse Architecture: with a Staging Area and Data Marts

Although the architecture in Figure 1–3 is quite common, you may want to customize your warehouse's architecture for different groups within your organization. You can do this by adding data marts, which are systems designed for a particular line of business. Figure 1–4 illustrates an example where purchasing, sales, and inventories are separated. In this example, a financial analyst might want to analyze historical data for purchases and sales or mine historical data to make predictions about customer behavior.

Figure 1–4 Architecture of a Data Warehouse with a Staging Area and Data Marts

Note: Data marts are an important part of many data warehouses, but they are not the focus of this book.


Extracting Information from a Data Warehouse

You can extract information from the masses of data stored in a data warehouse by analyzing the data. The Oracle Database provides several ways to analyze data:

■ A wide array of statistical functions, including descriptive statistics, hypothesis testing, correlations analysis, test for distribution fit, cross tabs with Chi-square statistics, and analysis of variance (ANOVA); these functions are described in the Oracle Database SQL Language Reference.

■ Predict those customers likely to change service providers

■ Discover the factors involved with a disease

■ Identify fraudulent behavior

Data mining is not restricted to solving business problems. For example, data mining can be used in the life sciences to discover gene and protein targets and to identify leads for new drugs.

Oracle Data Mining performs data mining in the Oracle Database. Oracle Data Mining does not require data movement between the database and an external mining server, thereby eliminating redundancy, improving efficient data storage and processing, ensuring that up-to-date data is used, and maintaining data security.

For detailed information about Oracle Data Mining, see Oracle Data Mining Concepts.

Oracle Data Mining Functionality

Oracle Data Mining supports the major data mining functions. There is at least one algorithm for each data mining function.

Oracle Data Mining supports the following data mining functions:

■ Classification: Grouping items into discrete classes and predicting which class an item belongs to; classification algorithms are Decision Tree, Naive Bayes, Generalized Linear Models (Binary Logistic Regression), and Support Vector Machines.

■ Regression: Approximating and predicting continuous numerical values; the algorithms for regression are Support Vector Machines and Generalized Linear Models (Multivariate Linear Regression).

■ Anomaly Detection: Detecting anomalous cases, such as fraud and intrusions; the algorithm for anomaly detection is one-class Support Vector Machines.

■ Attribute Importance: Identifying the attributes that have the strongest relationships with the target attribute (for example, customers likely to churn); the algorithm for attribute importance is Minimum Description Length.

■ Clustering: Finding natural groupings in the data that are often used for identifying customer segments; the algorithms for clustering are k-Means and O-Cluster.

■ Associations: Analyzing "market baskets", items that are likely to be purchased together; the algorithm for associations is Apriori.

■ Feature Extraction: Creating new attributes (features) as a combination of the original attributes; the algorithm for feature extraction is Non-Negative Matrix Factorization.
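To make one of the functions above concrete, the following sketch shows the counting step that underlies market-basket (association) analysis: for every pair of items, it computes the fraction of baskets that contain both. This is plain Python for illustration only. It is not the Oracle Data Mining API, and it is not the full Apriori algorithm, which extends the same support counting to larger itemsets while pruning candidates level by level. The function name and the sample baskets are made up.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {(item_a, item_b): support} for pairs meeting min_support.

    Support is the fraction of baskets that contain both items.
    """
    n = len(baskets)
    if n == 0:
        return {}
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {
        pair: count / n
        for pair, count in counts.items()
        if count / n >= min_support
    }


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]
print(frequent_pairs(baskets, min_support=0.5))
```

In this sample only bread and milk appear together in at least half of the baskets, so they form the one pair that survives the threshold.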

In addition to mining structured data, ODM permits mining of text data (such as police reports, customer comments, or physician's notes) or spatial data.

Oracle Data Mining Interfaces

Oracle Data Mining APIs provide extensive support for building applications that automate the extraction and dissemination of data mining insights.

Data mining activities such as model building, testing, and scoring are accomplished through a PL/SQL API, a Java API, and SQL Data Mining functions. The Java API is compliant with the data mining standard JSR 73. The Java API and the PL/SQL API are fully interoperable.

Oracle Data Mining allows the creation of a supermodel, that is, a model that contains the instructions for its own data preparation. The embedded data preparation can be implemented automatically and/or manually. Embedded Data Preparation supports user-specified data transformations; Automatic Data Preparation supports algorithm-required data preparation, such as binning, normalization, and outlier treatment.
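Two of the preparation steps named above, normalization and binning, can be illustrated with a short sketch. This is plain Python written for this explanation, not Oracle's Automatic Data Preparation, which selects and applies treatments according to the algorithm in use. The function names and the choice of min-max scaling and equal-width bins are assumptions.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly onto the range [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them all to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1.

    Bins split the range [min, max] into n_bins intervals of equal width;
    the maximum value falls into the last bin.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 30, 42, 66]
print(min_max_normalize(ages))
print(equal_width_bins(ages, 4))
```

Embedding such steps in the model, as the supermodel idea describes, means the same transformation is applied when the model is built and again when it scores new data.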

SQL Data Mining functions support the scoring of classification, regression, clustering, and feature extraction models. Within the context of standard SQL statements, pre-created models can be applied to new data and the results returned for further processing, just like any other SQL query.

Predictive Analytics automates the process of data mining. Without user intervention, Predictive Analytics routines manage data preparation, algorithm selection, model building, and model scoring so that the user can benefit from data mining without having to be a data mining expert.

ODM programmatic interfaces include:

■ Data mining functions in Oracle SQL for high performance scoring of data

■ DBMS_DATA_MINING PL/SQL package for model creation, description, analysis, and deployment

■ DBMS_DATA_MINING_TRANSFORM PL/SQL package for transformations required for data mining

■ Java interface based on the Java Data Mining standard for model creation, description, analysis, and deployment

■ DBMS_PREDICTIVE_ANALYTICS PL/SQL package, which supports the following procedures:

– EXPLAIN: Ranks attributes in order of influence in explaining a target column

– PREDICT: Predicts the value of a target column

– PROFILE: Creates segments and rules that identify the records that have the same target value

Part II Logical Design

This section deals with the issues in logical design in a data warehouse.

It contains the following chapter:

■ Chapter 2, "Logical Design in Data Warehouses"

2 Logical Design in Data Warehouses

This chapter explains how to create a logical design for a data warehousing environment and includes the following topics:

■ Logical Versus Physical Design in Data Warehouses

■ Creating a Logical Design

■ Data Warehousing Schemas

■ Data Warehousing Objects

Logical Versus Physical Design in Data Warehouses

Your organization has decided to build a data warehouse. You have defined the business requirements, agreed upon the scope of your application, and created a conceptual design. Now you need to translate your requirements into a system deliverable. To do so, you create the logical and physical design for the data warehouse. You then define:

■ The specific data content

■ Relationships within and between groups of data

■ The system environment supporting your data warehouse

■ The data transformations required

■ The frequency with which data is refreshed

The logical design is more conceptual and abstract than the physical design. In the logical design, you look at the logical relationships among the objects. In the physical design, you look at the most effective way of storing and retrieving the objects as well as handling them from a transportation and backup/recovery perspective.

Orient your design toward the needs of the end users. End users typically want to perform analysis and look at aggregated data, rather than at individual transactions. However, end users might not know what they need until they see it. In addition, a well-planned design allows for growth and changes as the needs of users change and evolve.

By beginning with the logical design, you focus on the information requirements and save the implementation details for later.


Creating a Logical Design

A logical design is conceptual and abstract. You do not deal with the physical implementation details yet. You deal only with defining the types of information that you need.

One technique you can use to model your organization's logical information requirements is entity-relationship modeling. Entity-relationship modeling involves identifying the things of importance (entities), the properties of these things (attributes), and how they are related to one another (relationships).

The process of logical design involves arranging data into a series of logical relationships called entities and attributes. An entity represents a chunk of information. In relational databases, an entity often maps to a table. An attribute is a component of an entity that helps define the uniqueness of the entity. In relational databases, an attribute maps to a column.

To be sure that your data is consistent, you need to use unique identifiers. A unique identifier is something you add to tables so that you can differentiate between the same item when it appears in different places. In a physical design, this is usually a primary key.

While entity-relationship diagramming has traditionally been associated with highly normalized models such as OLTP applications, the technique is still useful for data warehouse design in the form of dimensional modeling. In dimensional modeling, instead of seeking to discover atomic units of information (such as entities and attributes) and all of the relationships between them, you identify which information belongs to a central fact table and which information belongs to its associated dimension tables. You identify business subjects or fields of data, define relationships between business subjects, and name the attributes for each subject.

Your logical design should result in (1) a set of entities and attributes corresponding to fact tables and dimension tables and (2) a model of operational data from your source into subject-oriented information in your target data warehouse schema.
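A small sketch can show how such a logical design might map onto tables: each dimension entity becomes a table with a primary key as its unique identifier, and the fact table references the dimensions. The example uses Python's standard sqlite3 module as a stand-in database. All table and column names (dim_product, dim_time, fact_sales, and so on) and the sample data are assumptions invented for this illustration, not part of any Oracle sample schema.

```python
import sqlite3


def build_star_schema():
    """Create one fact table and two dimension tables with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension entities: attributes become columns, and the unique
        -- identifier becomes the primary key.
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            category   TEXT NOT NULL
        );
        CREATE TABLE dim_time (
            time_id        INTEGER PRIMARY KEY,
            calendar_month TEXT NOT NULL
        );
        -- Central fact table: one measure plus a key into each dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER NOT NULL REFERENCES dim_product (product_id),
            time_id    INTEGER NOT NULL REFERENCES dim_time (time_id),
            amount     REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"),
         (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO dim_time VALUES (?, ?)",
        [(1, "2007-08"), (2, "2007-09")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 100.0), (2, 1, 50.0), (3, 1, 20.0), (1, 2, 30.0)],
    )
    return conn


def sales_by_category_and_month(conn):
    """Join the fact table to its dimensions and aggregate the measure."""
    return conn.execute(
        """
        SELECT p.category, t.calendar_month, SUM(f.amount) AS total
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_time t ON t.time_id = f.time_id
        GROUP BY p.category, t.calendar_month
        ORDER BY p.category, t.calendar_month
        """
    ).fetchall()


conn = build_star_schema()
for row in sales_by_category_and_month(conn):
    print(row)
```

The query reads the way an end user thinks about the data (sales by category and month), which is the main reason the dimensional form suits analysis.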

You can create the logical design using a pen and paper, or you can use a design tool such as Oracle Warehouse Builder (specifically designed to support modeling the ETL process).

Data Warehousing Schemas

A schema is a collection of database objects, including tables, views, indexes, and synonyms. You can arrange schema objects in the schema models designed for data warehousing in a variety of ways. Most data warehouses use a dimensional model.

The model of your source data and the requirements of your users help you design the data warehouse schema. You can sometimes get the source model from your company's enterprise data model and reverse-engineer the logical data model for the data warehouse from this. The physical implementation of the logical data warehouse model may require some changes to adapt it to your system parameters: size of computer, number of users, storage capacity, type of network, and software.

See Also: Chapter 10, "Dimensions" for further information regarding dimensions

See Also: Oracle Warehouse Builder documentation set

Title	Oracle® Database Data Warehousing Guide 11g Release 1
Author	Paul Lane
Institution	Oracle Corporation
Subject	Database and Data Warehousing
Type	Guide
Year of publication	2007
City	Redwood City
