Chapter 33
OLAP Transparencies
Chapter 33 - Objectives
◆ The key features of OLAP applications.
◆ The potential benefits associated with successful OLAP applications.
◆ How to represent multi-dimensional data.
◆ The rules for OLAP tools.
◆ The main categories of OLAP tools.
◆ OLAP extensions to the SQL standard.
◆ How Oracle supports OLAP.
Business Intelligence Technologies
◆ Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities.
◆ There are two main types of access tools available to meet this demand, namely Online Analytical Processing (OLAP) and data mining.
Business Intelligence Technologies
◆ OLAP and Data Mining differ in what they offer the user and because of this they are
Online Analytical Processing (OLAP)
◆ The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data, Codd (1993).
◆ Describes a technology that uses a multi-dimensional view of aggregate data to provide quick access to strategic information for the
Online Analytical Processing (OLAP)
◆ Enables users to gain a deeper understanding
and knowledge about various aspects of their corporate data through fast, consistent,
interactive access to a wide variety of possible views of the data
◆ Allows users to view corporate data in such a
way that it is a better model of the true dimensionality of the enterprise.
Online Analytical Processing (OLAP)
◆ Can easily answer ‘who?’ and ‘what?’ questions; however, it is the ability to answer ‘what if?’ and ‘why?’ questions that distinguishes OLAP from general-purpose query tools.
◆ Types of analysis range from basic navigation and browsing (slicing and dicing) to calculations,
OLAP Benchmarks
◆ OLAP Council published an analytical
processing benchmark referred to as the APB-1 (OLAP Council, 1998)
◆ Aim is to measure a server’s overall OLAP
performance rather than the performance of individual tasks
OLAP Benchmarks
◆ APB-1 assesses the most common business operations including:
– bulk loading of data from internal or external data sources;
– incremental loading of data from operational systems;
– aggregation of input level data along hierarchies;
OLAP Benchmarks
◆ APB-1 assesses the most common business
operations including (continued):
– calculation of new data based on business
models;
– time series analysis;
– queries with a high degree of complexity;
– drill-down through hierarchies;
– ad hoc queries;
– multiple online sessions
OLAP Benchmarks
◆ OLAP applications are judged on their ability to provide just-in-time (JIT) information, a core requirement of supporting effective decision-making.
◆ This requirement is more than measuring processing performance but includes its abilities
performance, and query performance into a single metric
OLAP Benchmarks
◆ Publication of APB-1 benchmark results must include both the database schema and all code required for executing the benchmark
◆ An essential requirement of all OLAP
applications is the ability to provide users with JIT information, which is necessary to make
Examples of OLAP applications in various functional areas
OLAP Applications
◆ Although OLAP applications are found in widely divergent functional areas, they all have the following key features:
– multi-dimensional views of data
– support for complex calculations
– time intelligence
OLAP Applications - multi-dimensional views of data
OLAP Applications - support for complex calculations
◆ Mechanisms for implementing computational
methods should be clear and non-procedural.
OLAP Applications - time intelligence
◆ Key feature of almost any analytical application
as performance is almost always judged over time.
◆ Time hierarchy is not always used in the same manner as other hierarchies.
OLAP Benefits
◆ Increased productivity of end-users.
◆ Reduced backlog of applications development for
IT staff.
◆ Retention of organizational control over the
integrity of corporate data.
◆ Reduced query drag and network traffic on
OLTP systems or on the data warehouse
◆ Improved potential revenue and profitability.
Representation of Multi-dimensional Data
◆ Example of two-dimensional query.
– ‘What is the total revenue generated by property sales in each city, in each quarter of 2004?’
◆ Choice of representation is based on types of
queries end-user may ask
Multi-dimensional Data as Three-field Table versus Two-dimensional Matrix
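The two representations compared on this slide can be sketched in Python. This is only an illustration: the revenue figures are invented, and the names `rows` and `to_matrix` are assumptions that do not come from the slides.

```python
# Three-field table: one row per (city, quarter, revenue) fact.
# The revenue values are made up for illustration only.
rows = [
    ("Glasgow", "Q1", 100), ("Glasgow", "Q2", 120),
    ("London", "Q1", 200), ("London", "Q2", 210),
]

def to_matrix(rows):
    """Pivot (city, quarter, revenue) rows into a city x quarter matrix."""
    cities = sorted({c for c, _, _ in rows})
    quarters = sorted({q for _, q, _ in rows})
    matrix = [[0] * len(quarters) for _ in cities]
    for city, quarter, revenue in rows:
        matrix[cities.index(city)][quarters.index(quarter)] += revenue
    return cities, quarters, matrix

cities, quarters, matrix = to_matrix(rows)
```

The table form grows by one row per fact, while the matrix form lets a query such as "London in Q2" be answered by indexing one cell.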
Representation of Multi-dimensional Data
◆ Example of three-dimensional query.
– ‘What is the total revenue generated by property
sales for each type of property (Flat or House) in each city, in each quarter of 2004?’
◆ Compare representation - four-field relational table versus three-dimensional cube.
Multi-dimensional Data as Four-field Table versus Three-dimensional Cube
Representation of Multi-dimensional Data
◆ Cube represents data as cells in an array.
◆ Relational table only represents
multi-dimensional data in two dimensions.
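As a rough sketch of a cube that holds data as cells in an array, the four-field table can be loaded into a nested list indexed by property type, city, and quarter. The values and the helper name `build_cube` are assumptions made for illustration.

```python
# Four-field table: (property_type, city, quarter, revenue).
# All values are illustrative, not taken from the slides.
facts = [
    ("Flat", "Glasgow", "Q1", 40), ("House", "Glasgow", "Q1", 60),
    ("Flat", "London", "Q1", 90), ("House", "London", "Q1", 110),
    ("Flat", "Glasgow", "Q2", 50), ("House", "Glasgow", "Q2", 70),
]

def build_cube(facts):
    """Return axis labels and a nested list cube[type][city][quarter]."""
    axes = [sorted({f[i] for f in facts}) for i in range(3)]
    types, cities, quarters = axes
    # None marks a cell for which the table holds no fact.
    cube = [[[None] * len(quarters) for _ in cities] for _ in types]
    for ptype, city, quarter, revenue in facts:
        cube[types.index(ptype)][cities.index(city)][quarters.index(quarter)] = revenue
    return axes, cube

axes, cube = build_cube(facts)
types, cities, quarters = axes
# A single cell is addressed by one coordinate per dimension.
flat_glasgow_q2 = cube[types.index("Flat")][cities.index("Glasgow")][quarters.index("Q2")]
```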
Representation of Multi-dimensional Data
◆ Use multi-dimensional structures to store data and relationships between data.
◆ Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data. Each side of a cube is a dimension.
◆ A cube can be expanded to include other
dimensions.
Representation of Multi-dimensional Data
◆ A cube supports matrix arithmetic.
◆ Multi-dimensional query response time depends
on how many cells have to be added ‘on the fly’
◆ As number of dimensions increases, number of the cube’s cells increases exponentially
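The growth in cell count can be checked with simple arithmetic: the number of cells is the product of the number of members on each dimension. The sketch below assumes a hypothetical cube with 10 members on every dimension.

```python
from math import prod

def cell_count(cardinalities):
    """Number of cells in a cube = product of the members per dimension."""
    return prod(cardinalities)

# Hypothetical cube with 10 members on every dimension, for 1 to 4 dimensions.
counts = [cell_count([10] * d) for d in range(1, 5)]
```

Each added dimension multiplies the cell count by its number of members, which is why summing cells on the fly becomes expensive so quickly.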
Representation of Multi-dimensional Data
◆ However, majority of multi-dimensional queries use summarized, high-level data.
◆ Solution is to pre-aggregate (consolidate) all
logical subtotals and totals along all dimensions
Representation of Multi-dimensional Data
◆ Pre-aggregation is valuable, as typical
dimensions are hierarchical in nature.
– (e.g. Time dimension hierarchy - years, quarters, months, weeks, and days)
◆ Predefined hierarchy allows logical
pre-aggregation and, conversely, allows for a logical
Representation of Multi-dimensional Data
◆ Supports common analytical operations
– Consolidation
– Drill-down
– Slicing and dicing
Representation of Multi-dimensional Data
◆ Consolidation - aggregation of data such as
simple ‘roll-ups’ or complex expressions involving inter-related data.
◆ Drill-down - the reverse of consolidation; it involves displaying the detailed data that comprises the consolidated data.
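Consolidation and drill-down can be sketched over a small time hierarchy. The monthly figures, the month-to-quarter mapping, and the function names are assumptions chosen for illustration only.

```python
from collections import defaultdict

# Monthly revenue for one city; the values are invented for illustration.
monthly = {"Jan": 10, "Feb": 20, "Mar": 30, "Apr": 40, "May": 50, "Jun": 60}
# Assumed time hierarchy: month -> quarter.
quarter_of = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
              "Apr": "Q2", "May": "Q2", "Jun": "Q2"}

def consolidate(monthly, quarter_of):
    """Roll monthly values up to quarterly totals."""
    totals = defaultdict(int)
    for month, value in monthly.items():
        totals[quarter_of[month]] += value
    return dict(totals)

def drill_down(quarter, monthly, quarter_of):
    """Return the monthly detail that makes up one quarterly total."""
    return {m: v for m, v in monthly.items() if quarter_of[m] == quarter}

quarterly = consolidate(monthly, quarter_of)
detail = drill_down("Q2", monthly, quarter_of)
```

Storing `quarterly` ahead of time is a small instance of pre-aggregation: a quarter-level query no longer has to add up months on the fly.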
Representation of Multi-dimensional Data
◆ Slicing and Dicing - (also called pivoting) refers
to the ability to look at the data from different viewpoints
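One possible reading of looking at the data from different viewpoints is sketched below: a slice fixes a single member on one dimension, and a pivot swaps the axes of the resulting view. The cube layout, the values, and the function names are assumptions for illustration.

```python
# Cube stored as {(property_type, city, quarter): revenue}; values are illustrative.
cube = {
    ("Flat", "Glasgow", "Q1"): 40, ("House", "Glasgow", "Q1"): 60,
    ("Flat", "London", "Q1"): 90, ("Flat", "Glasgow", "Q2"): 50,
}

def slice_cube(cube, axis, member):
    """Fix one dimension to a single member and drop that axis from the keys."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == member}

def pivot(table):
    """Swap the two axes of a two-dimensional view."""
    return {(b, a): v for (a, b), v in table.items()}

q1_view = slice_cube(cube, axis=2, member="Q1")  # keyed by (type, city)
by_city = pivot(q1_view)                         # keyed by (city, type)
```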
Representation of Multi-dimensional Data
◆ Can store data in a compressed form by
dynamically selecting physical storage organizations and compression techniques that maximize space utilization
◆ Dense data (that is, data that exists for a high
percentage of cells) can be stored separately from
Representation of Multi-dimensional Data
◆ Ability to omit empty or repetitive cells can
greatly reduce the size of the cube and the amount of processing
◆ Allows analysis of exceptionally large amounts
of data
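The saving from omitting empty cells can be sketched with a dictionary that stores only the populated coordinates. This is a simplified stand-in for the storage techniques mentioned above, not a description of any particular product; the grid values are invented.

```python
def to_sparse(dense):
    """Keep only non-empty cells of a 2-D grid as {(row, col): value}."""
    return {(r, c): v
            for r, row in enumerate(dense)
            for c, v in enumerate(row)
            if v is not None}

# Mostly empty grid; None marks a cell with no data (illustrative values).
dense = [
    [None, None, 7, None],
    [None, None, None, None],
    [3, None, None, None],
]
sparse = to_sparse(dense)
stored, total = len(sparse), sum(len(row) for row in dense)
```

Here only `stored` cells are kept out of `total` grid positions, and an aggregation over `sparse` touches only the cells that actually hold data.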
Representation of Multi-dimensional Data
◆ In summary, pre-aggregation, dimensional
hierarchy, and sparse data management can significantly reduce the size of the cube and the need to calculate values ‘on-the-fly’
◆ Removes need for multi-table joins and provides quick and direct access to arrays of data, thus
OLAP Tools
◆ There are many varieties of OLAP tools
available in the marketplace
◆ This choice has resulted in some confusion with much debate regarding what OLAP actually
means to a potential buyer and in particular what are the available architectures for OLAP tools
Codd’s Rules for OLAP Systems
◆ In 1993, E.F. Codd formulated twelve rules as the basis for selecting OLAP tools
Codd’s Rules for OLAP Systems
◆ Multi-dimensional conceptual view
Codd’s Rules for OLAP Systems
◆ Dynamic sparse matrix handling
◆ Multi-user support
◆ Unrestricted cross-dimensional operations
◆ Intuitive data manipulation
◆ Flexible reporting
◆ Unlimited dimensions and aggregation levels
Codd’s Rules for OLAP Systems
◆ There are proposals to redefine or extend the rules, for example to also include:
– Comprehensive database management tools
– Ability to drill down to detail (source record) level
– Incremental database refresh
– SQL interface to the existing enterprise environment
Categories of OLAP Tools
◆ OLAP tools are categorized according to the architecture used to store and process multi-dimensional data
◆ There are four main categories:
– Multi-dimensional OLAP (MOLAP)
– Relational OLAP (ROLAP)
– Hybrid OLAP (HOLAP)
– Desktop OLAP (DOLAP)
Multi-dimensional OLAP (MOLAP)
◆ Use specialized data structures and
multi-dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data
◆ Data is typically aggregated and stored
according to predicted usage to enhance query performance
Multi-dimensional OLAP (MOLAP)
◆ Use array technology and efficient storage
techniques that minimize the disk space requirements through sparse data management
◆ Provides excellent performance when data is
used as designed, and the focus is on data for a specific decision-support application
Multi-dimensional OLAP (MOLAP)
◆ Traditionally, require a tight coupling with the application layer and presentation layer
◆ Recent trends segregate the OLAP from the data structures through the use of published
application programming interfaces (APIs)
Typical Architecture for MOLAP Tools
MOLAP Tools - Development Issues
◆ Underlying data structures are limited in their ability to support multiple subject areas and to provide access to detailed data
◆ Navigation and analysis of data is limited
because the data is designed according to previously determined requirements
MOLAP Tools - Development Issues
◆ MOLAP products require a different set of skills and tools to build and maintain the database,
thus increasing the cost and complexity of support.
Relational OLAP (ROLAP)
◆ Fastest-growing style of OLAP technology due to requirements to analyze ever-increasing
amounts of data and the realization that users cannot store all the data they require in MOLAP databases
Relational OLAP (ROLAP)
◆ Supports RDBMS products using a metadata
layer - avoids need to create a static dimensional data structure - facilitates the creation of multiple multi-dimensional views of the two-dimensional relation
Relational OLAP (ROLAP)
◆ To improve performance, some products use SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.
Typical Architecture for ROLAP Tools
ROLAP Tools - Development Issues
◆ Performance problems associated with the
processing of complex queries that require multiple passes through the relational data.
◆ Middleware to facilitate the development of
multi-dimensional applications (Software that converts the two-dimensional relation into a multi-dimensional structure).
ROLAP Tools - Development Issues
◆ Development of an option to create persistent, multi-dimensional structures with facilities to assist in the administration of these structures.
Hybrid OLAP (HOLAP)
◆ Provide limited analysis capability, either
directly against RDBMS products, or by using
an intermediate MOLAP server
◆ Deliver selected data directly from the DBMS or via a MOLAP server to the desktop (or local
server) in the form of a datacube, where it is stored, analyzed, and maintained locally.
Hybrid OLAP (HOLAP)
◆ Promoted as being relatively simple to install and administer with reduced cost and maintenance
Typical Architecture for HOLAP Tools
HOLAP Tools - Development Issues
◆ Architecture results in significant data redundancy and may cause problems for networks that support many users
◆ Ability of each user to build a custom datacube may cause a lack of data consistency among users
Desktop OLAP (DOLAP)
◆ Store the OLAP data in client-based files and
support multi-dimensional processing using a client multi-dimensional engine
◆ Requires that relatively small extracts of data are held on client machines. They may be distributed in advance, or created on demand (possibly through the Web)
Desktop OLAP (DOLAP)
◆ As with multi-dimensional databases on the server, OLAP data may be held on disk or in RAM; however, some DOLAP products allow only read access
◆ Most vendors of DOLAP exploit the power of
desktop PC to perform some, if not most,
Desktop OLAP (DOLAP)
◆ The administration of a DOLAP database is
typically performed by a central server or processing routine that prepares data cubes or sets of data for each user
◆ Once the basic processing is done, each user can then access their portion of the data
Typical Architecture for DOLAP Tools
DOLAP Tools - Development Issues
◆ Provision of appropriate security controls to support all parts of the DOLAP environment. Since the data is physically extracted from the system, security is generally implemented by limiting the information compiled into each cube. Once each cube is uploaded to the user's desktop, all additional metadata becomes the property of the local user.
DOLAP Tools - Development Issues
◆ Reduction in the effort involved in deploying and maintaining the DOLAP tools. Some DOLAP vendors now provide a range of alternative ways of deploying OLAP data such as through e-mail, the Web or using traditional client/server architecture.
OLAP Extensions to SQL
◆ Advantages of SQL include that it is easy to learn, non-procedural, free-format, DBMS-independent, and that it is a recognized international standard
◆ However, major limitation of SQL is the inability
to answer routinely asked business queries such
as computing the percentage change in values between this month and a year ago or to compute moving averages, cumulative sums, and other
statistical functions
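The calculations named above are easy to state procedurally, which is part of why their absence from SQL was felt. The sketch below shows them in plain Python rather than SQL; the sales figures and function names are assumptions for illustration.

```python
from itertools import accumulate

def cumulative_sum(values):
    """Running total of the series."""
    return list(accumulate(values))

def moving_average(values, window):
    """Mean of each trailing window; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

def pct_change(current, previous):
    """Percentage change from previous to current."""
    return (current - previous) / previous * 100

sales = [100, 120, 90, 150]  # illustrative monthly figures
running = cumulative_sum(sales)
smoothed = moving_average(sales, window=3)
```

Each result depends on a row's neighbours in an ordered sequence, which a plain GROUP BY aggregate cannot express directly.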
OLAP Extensions to SQL
◆ In response, ANSI adopted a set of OLAP functions as an extension to SQL to enable these calculations, as well as many others that used to be impractical or even impossible within SQL
◆ IBM and Oracle jointly proposed these extensions early in 1999 and they form part of the SQL:2003 standard
OLAP Extensions to SQL - RISQL
◆ The extensions are collectively referred to as the
‘OLAP package’ and are described as follows:
– Feature T431, ‘Extended Grouping
capabilities’
– Feature T611, ‘Extended OLAP operators’
Extended Grouping Capabilities
◆ Aggregation is a fundamental part of OLAP To
improve aggregation capabilities the SQL standard provides extensions to the GROUP BY clause such
as the ROLLUP and CUBE functions.
Extended Grouping Capabilities
◆ ROLLUP supports calculations using aggregations
such as SUM, COUNT, MAX, MIN, and AVG at
increasing levels of aggregation, from the most
detailed up to a grand total
◆ CUBE is similar to ROLLUP, enabling a single
statement to calculate all possible combinations of
aggregations CUBE can generate the information
needed in cross-tabulation reports with a single query.
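The difference between ROLLUP and CUBE comes down to which grouping sets each one produces. The sketch below emulates that in Python instead of SQL: ROLLUP drops columns from the right, while CUBE takes every subset. The rows, the helper names, and the use of None to mark a rolled-up column (standing in for SQL NULL) are assumptions for illustration.

```python
from itertools import combinations
from collections import defaultdict

def rollup_sets(cols):
    """ROLLUP(a, b, c) -> (a, b, c), (a, b), (a,), ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    """CUBE(a, b) -> every subset of the columns, 2**n grouping sets."""
    return [s for r in range(len(cols), -1, -1) for s in combinations(cols, r)]

def aggregate(rows, grouping_sets, measure):
    """SUM the measure for each grouping set; None marks a rolled-up column."""
    all_cols = grouping_sets[0]  # the first set is the most detailed one
    result = defaultdict(int)
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in all_cols)
            result[key] += row[measure]
    return dict(result)

# Illustrative rows only.
rows = [
    {"city": "Glasgow", "quarter": "Q1", "revenue": 100},
    {"city": "Glasgow", "quarter": "Q2", "revenue": 120},
    {"city": "London", "quarter": "Q1", "revenue": 200},
]
rollup = aggregate(rows, rollup_sets(["city", "quarter"]), "revenue")
cube = aggregate(rows, cube_sets(["city", "quarter"]), "revenue")
```

With these rows, `rollup` holds the detail rows, a subtotal per city, and the grand total, while `cube` additionally holds a subtotal per quarter, which is what a cross-tabulation needs for its column totals.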