Best Practices for a Data Warehouse on Oracle Database 11g
An Oracle White Paper September 2008
NOTE:
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Best Practices for a Data Warehouse on Oracle Database 11g
Note
Executive Summary
Introduction
Balanced Configuration
Interconnect
Disk Layout
Logical Model
Physical Model
Staging layer
Efficient Data Loading
Foundation layer - Third Normal Form
Optimizing 3NF
Access layer - Star Schema
Optimizing Star Queries
System Management
Workload Management
Workload Monitoring
Resource Manager
Optimizer Statistics Management
Initialization Parameter
Memory allocation
Controlling Parallel Execution
Enabling efficient IO throughput
Star Query
Conclusion
EXECUTIVE SUMMARY
Increasingly, companies are recognizing the value of an enterprise data warehouse (EDW). A true EDW provides a single 360-degree view of the business and a powerful platform for a wide spectrum of business intelligence tasks, ranging from predictive analysis to near real-time strategic and tactical decision support throughout the organization. To ensure that the EDW delivers optimal performance and scales as your data set grows, you need to get three fundamental things correct: the hardware configuration, the data model, and the data loading process. By designing these three cornerstones correctly, you can seamlessly scale out your EDW without having to constantly tune or tweak the system.
INTRODUCTION
Today’s information architecture is much more dynamic than it was just a few years ago. Businesses now demand more information sooner, and they are delivering analytics from their EDW to an ever-widening set of users and applications. To keep up with this increase in demand, the EDW must now be near real-time and highly available. How do you know if your data warehouse is getting the best possible performance? Or whether you've made the right decisions to keep your multi-TB system highly available?
Based on over a decade of successful customer data warehouse implementations, this white paper provides a set of best practices and “how-to” examples for deploying a data warehouse on Oracle Database 11g and leveraging its best-of-breed functionality. The paper is divided into four sections:
• The first section deals with the key aspects of configuring your hardware platform of choice to ensure optimal performance.
• The second briefly describes the two fundamental logical models used for data warehouses.
This paper is by no means a complete guide for Data Warehousing with Oracle. You should refer to the Oracle Database’s documentation, especially the Oracle Data Warehouse Guide and the VLDB and Partitioning Guide, for complete details on all of Oracle’s warehousing features.
BALANCED CONFIGURATION
Regardless of the design or implementation of a data warehouse, the initial key to good performance lies in the hardware configuration used. This has never been more evident than with the recent increase in the number of Data Warehouse appliances in the market. Many data warehouse operations are based upon large table scans and other IO-intensive operations, which perform vast quantities of random IOs. To achieve optimal performance, the hardware configuration must be sized end to end to sustain this level of throughput. This type of hardware configuration is called a balanced system. In a balanced system, all components, from the CPU to the disks, are orchestrated to work together to guarantee the maximum possible IO throughput.
But how do you go about sizing such a system? You must first understand how much throughput capacity is required for your system and how much throughput each individual CPU or core in your configuration can drive. Both pieces of information can be determined from an existing system. However, if no environment-specific values are available, a value of approximately 200MB/sec IO throughput per core is a good planning number for designing a balanced system. All subsequent critical components on the IO path (the Host Bus Adapters, fiber channel connections, the switch, the controller, and the disks) have to be sized appropriately.
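The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 200 MB/sec per-core planning number comes from the text, the node and core counts mirror the 4-node RAC example of Figure 1, and the per-HBA throughput figure is a hypothetical value chosen to show the calculation, not a vendor specification.

```python
import math

# Planning number from the text: roughly 200 MB/s of IO throughput per core.
MB_PER_SEC_PER_CORE = 200


def required_throughput_mb(nodes: int, cores_per_node: int,
                           mb_per_core: int = MB_PER_SEC_PER_CORE) -> int:
    """Total IO throughput (MB/s) that the whole IO path must sustain."""
    return nodes * cores_per_node * mb_per_core


def components_needed(required_mb: int, mb_per_component: int) -> int:
    """Smallest count of identical components whose combined throughput covers required_mb."""
    return math.ceil(required_mb / mb_per_component)


if __name__ == "__main__":
    # Four nodes with one dual-core CPU each, as in the 4-node RAC example.
    total = required_throughput_mb(nodes=4, cores_per_node=2)
    print(f"Required IO throughput: {total} MB/s")

    # ASSUMPTION: 400 MB/s per HBA is a made-up figure for illustration only.
    hba_mb = 400
    per_node = required_throughput_mb(nodes=1, cores_per_node=2)
    print(f"HBAs needed per node: {components_needed(per_node, hba_mb)}")
```

The same `components_needed` helper can be reused for every other element on the IO path (switch ports, controllers, disks) once a measured throughput per component is available for your environment.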
[Figure 1 diagram labels: Disk Array 1 through Disk Array 8, FC-Switch1, FC-Switch2]
Figure 1. A balanced system: 4-node RAC environment
Figure 1 shows a conceptual diagram of a 4-node RAC system. Four servers (each with one dual-core CPU) are equipped with two host bus adapters (HBAs). The
db_file_multiblock_read_count
SQL parallel execution is generally used for queries that access a lot of data, for example when doing a full table scan. Since parallel execution bypasses the buffer cache and accesses data directly from disk, you want each I/O to be as efficient as possible, and using large I/Os is a way to reduce latency.
Set db_file_multiblock_read_count to 1024/db_block_size, with db_block_size expressed in KB, so that each multiblock read is a 1 MB I/O. For example, for an 8K block size, use db_file_multiblock_read_count=128. This is the default value for the majority of platforms.
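As a quick check of the 1024/db_block_size rule, the following Python sketch computes the setting for a few block sizes. The function name and the list of block sizes are illustrative assumptions; only the formula and the 8K example come from the text.

```python
# 1024 KB = 1 MB, the I/O size targeted by the 1024/db_block_size rule.
ONE_MB_IN_KB = 1024


def multiblock_read_count(db_block_size_kb: int) -> int:
    """Return the db_file_multiblock_read_count that yields 1 MB reads.

    db_block_size_kb is the database block size in KB (for example 8 for 8K).
    """
    if db_block_size_kb <= 0 or ONE_MB_IN_KB % db_block_size_kb != 0:
        raise ValueError("block size in KB must be a positive divisor of 1024")
    return ONE_MB_IN_KB // db_block_size_kb


if __name__ == "__main__":
    # Illustrative block sizes; check which ones your platform supports.
    for size_kb in (2, 4, 8, 16, 32):
        print(f"db_block_size={size_kb}K -> "
              f"db_file_multiblock_read_count={multiblock_read_count(size_kb)}")
```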
Star Query
Star_transformation_enabled controls whether or not the optimizer will use a cost-based transformation on queries in a star schema. By default this parameter is set to false. If you have a star schema and you have created a bitmap index on each of the foreign key columns of the fact table, you should set this parameter to true.
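The two prerequisites described above (bitmap indexes on the fact table's foreign key columns, then enabling the parameter) might be scripted roughly as follows. This is a sketch, not a prescribed procedure: the helper name, the `sales` fact table, its column names, and the `_bix` index naming scheme are all hypothetical, and the parameter is set at session level here purely for illustration.

```python
from typing import List


def star_setup_statements(fact_table: str, fk_columns: List[str]) -> List[str]:
    """Build the SQL statements that prepare a fact table for star transformation.

    One bitmap index is created per foreign key column, then the
    star_transformation_enabled parameter is switched on for the session.
    """
    statements = [
        # Hypothetical naming scheme: <table>_<column>_bix.
        # Note: on a partitioned fact table the index would need the LOCAL keyword.
        f"CREATE BITMAP INDEX {fact_table}_{col}_bix ON {fact_table} ({col})"
        for col in fk_columns
    ]
    statements.append("ALTER SESSION SET star_transformation_enabled = TRUE")
    return statements


if __name__ == "__main__":
    # ASSUMPTION: 'sales' and its foreign key columns are made-up example names.
    for stmt in star_setup_statements("sales", ["cust_id", "prod_id", "time_id"]):
        print(stmt + ";")
```

Generating the statements rather than hard-coding them keeps the index list in step with the fact table's foreign keys; review the output before running it against a real database.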
CONCLUSION
In order to guarantee you will get the optimal performance from your data warehouse and to ensure it will scale as the data set increases you need to get three fundamental things correct:
• The hardware configuration. It must be balanced and must achieve the IO throughput required to meet the system's peak load.
• The data model. If it is 3NF, it should always achieve partition-wise joins; if it is a star schema, it should use star transformation.
• The data loading process. It should be as fast as possible and have zero impact on the business user.
By designing these three cornerstones correctly, you can seamlessly scale out your EDW without having to constantly tune or tweak the system.
• Use Parallel Execution where appropriate.
• Take hourly AWR or Statspack reports.
• Use EM to do real-time system monitoring.
• Use Resource Manager to ensure necessary users get high priority on the system.
• Always have accurate Optimizer statistics.
• Use INCREMENTAL statistics maintenance or copy_stats to keep large partitioned fact tables up to date in a timely manner.
• Set only the initialization parameters that you need to.
Data Warehouse Best Practices for Oracle Database 11g
September 2008
Author: Maria Colgan
Contributing Authors: Doug Cackett, George Spears, and Andrew Bond
Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.
Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com
Copyright © 2008, Oracle. All rights reserved.
This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.