FIGURE 51.3 A star-schema data warehouse design with a central fact table and multiple
dimensions of these facts as the source for an OLAP cube in SSAS.
Every cube has a schema from which the cube draws its source data. The central table in a
schema is the fact table that yields the cube’s data measures. The other tables in the
schema are the dimension tables that are the source of the cube dimensions. A classic
star-schema data warehouse design has this central fact table along with multiple dimension
tables. This is a great starting point for OLAP cube creation, as you can see in Figure 51.3.
Here, we show you a high-tech company’s computer sales star-schema data warehouse
that can be used as the source for building an OLAP cube within SSAS.
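To make the fact/dimension split concrete, here is a minimal sketch of a star schema using Python’s built-in sqlite3 module rather than SQL Server. The table names and key names follow the CompSales tables named later in this chapter; the remaining column names and all sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per member, keyed by an artificial ID.
    CREATE TABLE Prod_Dimension (ProductID INTEGER PRIMARY KEY, ProductFamily TEXT, SKU TEXT);
    CREATE TABLE Geo_Dimension  (GeoID INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE Time_Dimension (TimeID INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER);

    -- Central fact table: a foreign key to every dimension plus the measures.
    CREATE TABLE CompSalesFactoid (
        ProductID    INTEGER REFERENCES Prod_Dimension(ProductID),
        GeoID        INTEGER REFERENCES Geo_Dimension(GeoID),
        TimeID       INTEGER REFERENCES Time_Dimension(TimeID),
        SalesUnits   INTEGER,
        SalesReturns INTEGER
    );

    -- Hypothetical sample rows.
    INSERT INTO Prod_Dimension VALUES (1, 'Laptops', 'LT-100'), (2, 'Laptops', 'LT-200');
    INSERT INTO Geo_Dimension  VALUES (1, 'USA'), (2, 'France');
    INSERT INTO Time_Dimension VALUES (1, 2006, 1), (2, 2006, 2);
    INSERT INTO CompSalesFactoid VALUES
        (1, 1, 1, 100, 5), (2, 1, 1, 40, 2), (1, 2, 2, 60, 1);
""")

def units_by_country(connection):
    """Join the fact table to one dimension and aggregate a measure."""
    rows = connection.execute("""
        SELECT g.Country, SUM(f.SalesUnits)
        FROM CompSalesFactoid AS f
        JOIN Geo_Dimension AS g ON g.GeoID = f.GeoID
        GROUP BY g.Country
        ORDER BY g.Country
    """).fetchall()
    return dict(rows)

print(units_by_country(conn))
```

Every analytical question in this layout has the same shape: join the fact table to the dimensions of interest, group by a dimension column, and aggregate a measure. An OLAP cube precomputes and organizes exactly these kinds of results.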
SSAS allows you to build dimensions and cubes from heterogeneous data sources. It can
access relational OLTP databases, multidimensional databases, text data, and any
other source that has an OLE DB provider available. You don’t have to move all your data
first; you just connect to its source. In SSAS, you can also design OLAP cubes from scratch.
Then you can have SSAS create the relational schema of tables in SQL Server that you
want to populate with the transactional data that will drive the OLAP cube.
Essentially, cubes are either regular or local. Regular cubes are based on real tables as
the data source, have aggregations, and occupy physical storage space of some kind. If a
data source that contributes to a regular cube changes, the cube must be reprocessed. Figure
51.4 shows this cube representation and that it consists of something called partitions.
Local cubes are entirely contained in portable SSAS files (that is, tables) and can be
browsed without a connection to an SSAS instance. This is really like being in
“disconnected” mode.
Write-enabled dimensions within a cube enable updates (that is, writes) of data that can
be shared back (that is, written back) with the data sources.
FIGURE 51.4 The SSAS cube representations: regular OLAP cubes and partitions.
Following is a quick summary of all the essential cube terms in SSAS:
Database: A database is a logical container of one or more cubes. Cubes are defined
within Analysis Server databases.
Cube: A cube is a multidimensional representation of the business facts. Types of
cubes are regular and local.
Data source: The data source is the origin of a cube’s data.
Measure group: This group is a collection (or grouping) of one or more measures
into some type of logical unit for business purposes. A measure group does not
occupy any physical space. It is metadata only.
Measure: A measure is a data fact representation. A measure is typically a data
value fact, such as price, unit, or quantity.
Cell: A cell is the part of a data measure that is at the intersection of the
dimensions. The cell contains the data value. If an intersection (that is, cell) has no value
yet, it does not physically exist until it is populated.
Dimension: A cube’s dimension is defined by the aggregation levels of the data
that are needed to support the data requirements. A dimension can be shared with
other cubes, or it can be private to a cube. The structure of a dimension is directly
related to the dimension table columns, member properties, or structure of OLAP
data mining models. This structure becomes the hierarchy and should be organized
accordingly. You can also have strict parent/child dimensions in which two columns
are identified as being parent and child and the dimension is organized according to
them. In a regular dimension, each column in the dimension contributes a hierarchy
level.
Level: A level includes the nodes of the hierarchy or data mining model. Each level
contains the members. Millions of members are possible for each level.
Partition: One or more partitions comprise a cube. Using a partition is a way to
physically separate parts of a cube. This separation essentially lets you deal with
individual slices of a data cube separately, querying only the relevant data sources. If
you partition by dimension, you can perform incremental updates to change that
dimension independently of the rest of the cube. Consequently, you have to
reprocess only the aggregations that are affected by those changes. This is an
excellent feature for scalability.
Hierarchy: A hierarchy is a set of members in a dimension and their position
relative to each other. Hierarchies can be either balanced or unbalanced. Being balanced
simply means that all branches of the hierarchy descend to the same level. An
unbalanced hierarchy allows for branches to descend to different levels. It is also
possible to define more than one hierarchy for a single dimension. A great example
of this is “fiscal calendar time” and “Gregorian calendar time” being defined in one
dimension: a Time dimension that contains both time.gregorian and time.fiscal.
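The balanced/unbalanced distinction in the Hierarchy entry can be checked mechanically. The following is a minimal sketch, not anything SSAS itself exposes: it models a hierarchy as nested Python dictionaries (an assumption made purely for illustration), and the member names are hypothetical.

```python
def leaf_depths(node, depth=0):
    """Yield the depth of every leaf under `node` (a dict of member name -> subtree)."""
    if not node:  # an empty dict marks a leaf member
        yield depth
        return
    for child in node.values():
        yield from leaf_depths(child, depth + 1)

def is_balanced(hierarchy):
    """A hierarchy is balanced when all branches descend to the same level."""
    return len(set(leaf_depths(hierarchy))) == 1

# Hypothetical members, for illustration only.
time_gregorian = {"2006": {"Q1": {"Jan": {}, "Feb": {}, "Mar": {}}, "Q2": {"Apr": {}}}}
org_chart = {"CEO": {"VP Sales": {"Rep A": {}}, "Assistant": {}}}

print(is_balanced(time_gregorian))  # every branch ends at the month level
print(is_balanced(org_chart))       # one branch stops a level early
```

A calendar hierarchy is the typical balanced case, because every branch ends at the same level. Organizational reporting structures are a common source of unbalanced hierarchies.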
As mentioned previously, SSAS has many wizards. Which wizards you use depends on
what you need to create. The “Creating an OLAP Database” section, later in this chapter,
outlines the order and path through this maze of wizards.
OLAP Versus OLTP
One of the primary goals of OLAP is to increase data retrieval speed for business-related
queries that are critical to decisions. Very often, there is a need to broaden the scope of a
business query or to drill down into more granular details of the query. OLAP was created
to facilitate this type of capability. A multidimensional schema is not a typical normalized
relational database; redundant data is stored to facilitate quick retrieval. The data in a
multidimensional database should be relatively static; in fact, data is not useful for
decision support if it changes constantly. The information in a data warehouse is built out of
carefully chosen snapshots of business data from OLTP systems. If you capture data at the
right times for transfer to the data warehouse, you can quickly make accurate
comparisons of important business activities over time.
In an OLTP system, transaction speed is paramount. Data modification operations must be
quick, deal with concurrency (locking/holding of resources), and provide transactional
consistency. An OLTP system is constantly changing; snapshots of the OLTP system, even
if taken only a few seconds apart, are all different. Although historical information is
certainly available in an OLTP system, using it for BI-type analysis might be impractical.
Storing old data in an OLTP system becomes expensive, and you might need to
reconstruct history dynamically from a series of transactions. In addition, OLTP designs and
indexes usually don’t support large-scale decision support querying.
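The difference between reconstructing history from transactions and reading a stored snapshot can be sketched in a few lines. This is an illustration only; the inventory ledger, its dates, and the function names are all hypothetical.

```python
from datetime import date

# Hypothetical OLTP ledger: (transaction date, change in on-hand inventory units).
transactions = [
    (date(2006, 1, 5), +100),
    (date(2006, 1, 20), -30),
    (date(2006, 2, 3), +50),
    (date(2006, 2, 25), -80),
]

def balance_as_of(ledger, as_of):
    """OLTP style: replay every transaction up to `as_of` to rebuild history."""
    return sum(delta for day, delta in ledger if day <= as_of)

def take_snapshots(ledger, snapshot_dates):
    """Warehouse style: capture the balance once at chosen times and store it."""
    return {day: balance_as_of(ledger, day) for day in snapshot_dates}

month_ends = [date(2006, 1, 31), date(2006, 2, 28)]
snapshots = take_snapshots(transactions, month_ends)

# Comparing periods is now a lookup instead of a replay of the ledger.
change = snapshots[date(2006, 2, 28)] - snapshots[date(2006, 1, 31)]
print(snapshots, change)
```

With four transactions the replay is trivial. With years of transactions it is not, which is why the warehouse stores snapshots taken at carefully chosen times.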
SSAS supports three OLAP storage methods (MOLAP, ROLAP, and HOLAP), providing
flexibility to the data warehousing solution and enabling powerful partitioning and
aggregation optimization capabilities.
FIGURE 51.5 MOLAP, HOLAP, and ROLAP storage continuum.
Figure 51.5 shows the MOLAP, HOLAP, and ROLAP storage continuum. MOLAP stores all
data locally (to SSAS), and ROLAP is the opposite (storing all data in the relational
database). MOLAP is by far the most often used storage approach. The following sections take
a closer look at each of them.
MOLAP
Multidimensional OLAP (MOLAP) is an approach in which cubes are built directly from
OLTP data sources or from dimensional databases and downloaded to a persistent store.
In SSAS, data is downloaded to the server, and the details and aggregations are stored in a
native Microsoft OLAP format. No zero-activity records are stored.
The dimension keys in the fact tables are compressed, and bitmap indexing is used. A
high-speed MOLAP query processor retrieves the data.
ROLAP
Relational OLAP (ROLAP) uses fact data in summary tables in the OLTP data source to
make data much more current (real-time). The summary tables are populated by
processes in the OLTP system and are not downloaded to SSAS. The summary tables are
known as materialized views and contain various levels of aggregation, depending on the
options you select when building data cubes with SSAS. SSAS builds the summary tables
with a column for each dimension and each measure. It indexes each dimension column
and creates an additional index on all the dimension columns.
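The summary-table layout just described (a column per dimension and per measure, an index on each dimension column, and one more index across all of them) can be sketched with Python’s sqlite3 module. This only illustrates the layout; it is not the DDL that SSAS actually generates, and the table name, index names, and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical detail-level fact rows.
    CREATE TABLE Facts (ProductID INTEGER, GeoID INTEGER, TimeID INTEGER, SalesUnits INTEGER);
    INSERT INTO Facts VALUES (1, 1, 1, 100), (1, 1, 1, 20), (2, 1, 1, 40), (1, 2, 2, 60);

    -- Summary table: one column per dimension and one per measure.
    CREATE TABLE Summary_Prod_Geo AS
        SELECT ProductID, GeoID, SUM(SalesUnits) AS SalesUnits
        FROM Facts
        GROUP BY ProductID, GeoID;

    -- One index per dimension column, plus one across all dimension columns.
    CREATE INDEX ix_summary_product ON Summary_Prod_Geo (ProductID);
    CREATE INDEX ix_summary_geo     ON Summary_Prod_Geo (GeoID);
    CREATE INDEX ix_summary_all     ON Summary_Prod_Geo (ProductID, GeoID);
""")

summary_rows = conn.execute(
    "SELECT ProductID, GeoID, SalesUnits FROM Summary_Prod_Geo ORDER BY ProductID, GeoID"
).fetchall()
index_names = sorted(
    row[1] for row in conn.execute("PRAGMA index_list('Summary_Prod_Geo')")
)
print(summary_rows, index_names)
```

Because the aggregation lives in an ordinary relational table, any tool that can query the relational database can read it, not only the OLAP server.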
HOLAP
SSAS implements a combination of MOLAP and ROLAP called hybrid OLAP (HOLAP).
Here, the facts are left in the OLTP data source, and aggregations are stored in the SSAS
server. You use SSAS to boost query performance. This approach helps avoid data
duplication, but performance suffers a bit when you query fact data in the OLTP summary tables.
The amount of performance degradation depends on the level of aggregation selected.
ROLAP and HOLAP are useful in situations in which an organization wants to leverage
its investment in relational database technology and existing infrastructure. The
summary tables of facts are also accessible in the OLTP system via normal data access
methods. However, when you are using SSAS, both ROLAP and HOLAP require more
storage space because they don’t use the storage optimizations of the pure
MOLAP-compressed implementation.
An Analytics Design Methodology
A data warehouse can be built from the top down or from the bottom up. To build a
top-down warehouse, you need to form a complete picture or logical data model for the entire
organization (or all the subsystems within the scope of the project, such as all financial
systems). In contrast, building a warehouse from the bottom up takes a much more
departmental or specific business-area focus (for example, a sales order system only). This
breaks the task of modeling the data into more manageable chunks. Such a departmental
approach produces data marts that are potentially subsets of the overall data warehouse.
The bottom-up approach can simplify implementation. It helps get departmental or
business-area information to the people who need it, makes it easier to protect sensitive data,
and results in better query response times because data marts deal with less data than a
voluminous transactional system. The potential risk in the data mart approach is that
disparity in data mart implementation can result in a logically disjointed enterprise data
warehouse if efforts aren’t carefully coordinated across the organization.
Before you embark on an OLAP database creation effort, the time you spend
understanding the underlying requirements is the best time you can give your effort. If scope is set
correctly, you will be able to achieve an industrial-strength OLAP design without much
difficulty. First, you need to take care of some groundwork:
1. Carefully assess the scope of what you want to represent in the BI environment.
   Start small, as the bottom-up approach suggests. For instance, just tackle the sales
   data facts.
2. Coordinate your efforts with other related BI efforts. Let people know that you are
   carving out a specific subject area or departmental data and, when you finish,
   publish your design to everyone.
3. Seek out any shared dimensions that might have already been created for other
   cubes. You want to leverage these as much as possible for the sake of data
   consistency and nonredundant processing.
4. Understand your data sources. The OLAP cube you create will be only as good as the
   data you put into it. It’s best to understand the dirty data issues of what you are
   about to touch long before you try to build an OLAP cube with it.
An Analytics Mini-Methodology
To successfully build OLAP solutions, you are advised to assess the requirements
of your end users in as much detail as possible. A mini-methodology that focuses on
the essential usages and characteristics of an analytics solution can prove invaluable. The
following sections outline a solid approach to nailing down your BI requirements and
yielding optimal OLAP designs that solve your end users’ needs.
Assumption: You are building a business-area-focused OLAP cube.
Requirements Phase
1. Identify the processing requirements for this decision support system (DSS). What
   analysis do you need to do? Are trend reporting, forecasting, and so on necessary?
   These can often be represented in use case form (via UML).
   a. Ask each user what business decision questions he or she needs to have
      answered.
   b. Ask each user how often he or she needs these questions answered and exactly
      when the questions must be answered.
   c. Ask each user how current the data must be to get accurate answers. (This
      speaks to data latency.)
2. Identify the data needed to fulfill these requirements. What data must be touched to
   provide answers? The best way to capture this type of information is a logical data
   model. Even a rough model is better than none at all. This is the point where you
   focus on the facts that need to be analyzed.
3. Identify all possible hierarchies and level representations (that is, aggregations). This
   is how the data is used. Most users are likely to tell you that they want to see
   product data in the product hierarchy structure that has already been set up (for
   example, product family, product groups).
4. Identify the time hierarchies that the users need. Because time is usually implicit, it
   just needs to be clarified in terms of levels of aggregation (for example, years,
   quarters, months, weeks, days) and whether it needs to be fiscal versus Gregorian
   calendar, both, or something else.
5. Understand the data that each user can view from a security point of view.
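The time levels raised in step 4 can be derived from a plain date. The following is a minimal sketch using Python’s standard datetime module; the function name, the returned keys, and the `fiscal_year_start_month` parameter are assumptions made for illustration, and the week is the ISO week number, which is only one of several possible week definitions.

```python
from datetime import date

def time_levels(day, fiscal_year_start_month=1):
    """Map a calendar date onto the aggregation levels of a Time dimension.

    `fiscal_year_start_month` is a hypothetical parameter: 1 means the fiscal
    calendar matches the Gregorian calendar.
    """
    # Months elapsed since the start of the fiscal year (0 through 11).
    fiscal_offset = (day.month - fiscal_year_start_month) % 12
    # Label the fiscal year by the calendar year in which it starts.
    fiscal_year = day.year if day.month >= fiscal_year_start_month else day.year - 1
    return {
        "year": day.year,
        "quarter": (day.month - 1) // 3 + 1,
        "month": day.month,
        "week": day.isocalendar()[1],  # ISO week number
        "day": day.day,
        "fiscal_year": fiscal_year,
        "fiscal_quarter": fiscal_offset // 3 + 1,
    }

print(time_levels(date(2006, 2, 15)))
print(time_levels(date(2006, 2, 15), fiscal_year_start_month=7))
```

When the fiscal year starts in January, the fiscal and Gregorian levels coincide, so a single hierarchy is enough. Any other start month is the case in which users typically ask for both.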
Design Phase
1. Analyze which data sources are needed to fulfill the requirements. See whether
   dimensions or OLAP cubes that already exist can be shared.
2. Understand what data transformations need to be done to the source data to provide
   it to the OLAP world. This might include pre-aggregation, reformatting, data
   integrity verifications, and so on.
3. Translate these requirements into an OLAP model design:
   a. Translate to MOLAP if your data sources are not going to be leveraged at all
      and you will be taking full advantage of OLAP storage.
   b. Translate to ROLAP if you are going to leverage an existing relational design
      and storage.
   c. Translate to HOLAP if you are going to partially utilize the source data storage
      and partially utilize OLAP storage. This is the most frequently used approach.
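The storage decision in step 3 reduces to two questions: where the fact data lives and where the aggregations live. The following sketch encodes that decision as a small function. The function and its two flag names are hypothetical, not an SSAS API.

```python
def choose_storage_mode(keep_facts_in_source, keep_aggregations_in_source):
    """Return the OLAP storage mode implied by where the data should live.

    Both flags are hypothetical names for the questions in the design step:
    whether detail (fact) data stays in the relational source, and whether
    the aggregations stay there as well.
    """
    if not keep_facts_in_source and not keep_aggregations_in_source:
        return "MOLAP"  # everything is copied into OLAP storage
    if keep_facts_in_source and keep_aggregations_in_source:
        return "ROLAP"  # everything stays in the relational design
    if keep_facts_in_source:
        return "HOLAP"  # facts stay relational, aggregations go to OLAP storage
    raise ValueError("aggregations in the source without the facts is not one of the three modes")

print(choose_storage_mode(keep_facts_in_source=True, keep_aggregations_in_source=False))
```

In practice the choice also depends on latency, storage, and query performance requirements, so treat this as a way to frame the question rather than a rule.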
Construction Phase
1. Implement data extraction, transformation, and loading (ETL) logic (via T-SQL, SSIS,
   or other methods).
2. Create the data sources to be used.
3. Create the dimensions.
4. Create the cube.
5. Select data measures (that is, the data facts) for the cube.
6. Design the storage and aggregations.
7. Process the cube. This brings the data into the OLAP environment.
8. Verify data integrity.
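One common form of the integrity verification in step 8 is checking that every fact row points at an existing dimension member. The following is a minimal sketch of that check using Python’s sqlite3 module in place of SQL Server. The helper function is hypothetical, the sample rows are made up, and the table and column names passed in are assumed to be trusted constants because they are interpolated directly into the SQL text.

```python
import sqlite3

def orphaned_fact_rows(connection, fact_table, key_column, dimension_table):
    """Count fact rows whose dimension key has no matching dimension member."""
    # Identifiers are interpolated directly, so pass only trusted, hard-coded names.
    query = f"""
        SELECT COUNT(*)
        FROM {fact_table} AS f
        LEFT JOIN {dimension_table} AS d ON d.{key_column} = f.{key_column}
        WHERE d.{key_column} IS NULL
    """
    return connection.execute(query).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Geo_Dimension (GeoID INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE CompSalesFactoid (GeoID INTEGER, SalesUnits INTEGER);
    INSERT INTO Geo_Dimension VALUES (1, 'USA'), (2, 'France');
    -- GeoID 9 has no dimension member, so this row is an orphan.
    INSERT INTO CompSalesFactoid VALUES (1, 100), (2, 60), (9, 15);
""")

orphans = orphaned_fact_rows(conn, "CompSalesFactoid", "GeoID", "Geo_Dimension")
print(orphans)
```

Running a check like this against each dimension key before processing the cube catches dirty data at the source, where it is cheapest to fix.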
Implementation Phase
1. Define the security roles in the cube.
2. Train the users to use the system.
3. Process the data into the OLAP environment (from production data sources).
4. Verify data integrity.
5. Allow users to use the OLAP cube.
Maintenance Phase
1. Evaluate access optimization in the OLAP cube via usage analysis.
2. Do data mining discovery, if desired.
3. Make schema changes/enhancements, as necessary.
An OLAP Requirements Example: CompSales International
Following is an abbreviated set of requirements that reflects an actual implementation
done for a large Silicon Valley company. We follow the mini-methodology as closely as
possible to implement these requirements in SSAS, pointing out which facilities of SSAS
should be used for which purpose along the way.
CompSales International Requirements
A large computer manufacturer named CompSales International needs to do basic
analytical processing of its product data in a new BI environment. The main business issues at
hand are related to minimizing channel inventory and better understanding market
demand for the company’s most popular products. The detailed data processing
requirements are as follows:
1. You want to view sales unit actuals and sales returns for system and nonsystem
   products for the past two years via the product hierarchy (All Products, Product
   Types, Product Lines, Product Families, SKUs), geography hierarchy (All Geos, Major
   Geos, Countries, Channels, Customers), and different time levels (All Time, Years,
   Quarters, Months).
2. You want to view data primarily at the yearly and monthly levels, although the
   finance department also uses it a little bit at quarterly levels.
3. You want to view net sales (sales minus returns) at all levels of the hierarchy.
4. The fiscal and Gregorian calendars are the same for CompSales International.
5. One day past month-end processing, all “actuals” data from the prior month is
   available (sales units and returns).
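Requirement 3 (net sales at all levels of the hierarchy) amounts to rolling a calculated measure up every ancestor of a SKU. The following sketch shows the idea in plain Python. The level order follows the product hierarchy listed in requirement 1; the member names, figures, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: the path through the product hierarchy
# (Product Type, Product Line, Product Family, SKU), then the two base measures.
facts = [
    (("Systems", "Laptops", "LT Family", "LT-100"), 100, 5),
    (("Systems", "Laptops", "LT Family", "LT-200"), 40, 2),
    (("Systems", "Desktops", "DT Family", "DT-300"), 60, 1),
    (("Nonsystems", "Cables", "USB Family", "USB-1"), 30, 0),
]

def net_sales_by_level(rows):
    """Roll net sales (units minus returns) up to every level of the hierarchy."""
    totals = defaultdict(int)
    for path, sales_units, sales_returns in rows:
        net = sales_units - sales_returns
        totals[("All Products",)] += net
        # Every prefix of the path is one ancestor level of the SKU.
        for depth in range(1, len(path) + 1):
            totals[("All Products",) + path[:depth]] += net
    return dict(totals)

net_sales = net_sales_by_level(facts)
print(net_sales[("All Products",)], net_sales[("All Products", "Systems")])
```

Because net sales is a simple difference of two additive measures, it can be summed at any level. In a cube, it would typically be defined once as a calculated measure rather than stored.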
You need to implement some general design decisions using SSAS, including the following:
Hierarchies (dimensions): This includes product, geography, and time.
Facts (measures): This includes sales units, sales returns, and net sales (units
minus returns), which is calculated.
OLAP storage: This will be MOLAP or HOLAP (if you want to use the star-schema
data mart that already contains most of what you are after).
Physical tables that exist: This includes Geo_Dimension, Prod_Dimension,
Time_Dimension, and CompSalesFactoid (the fact table that will become your
measures in the OLAP cube). This data is updated weekly. Each of these tables uses an
artificial key into the main facts table for performance reasons (GeoID, ProductID,
TimeID). In addition, several member/value description tables are associated with
each dimension table. Basically, there is one table for each level in a dimension.
These description tables can be leveraged to make the result rows from OLAP queries
much more user friendly. (Look back at Figure 51.3 to see all the tables
included in CompSales and how they are related via primary/foreign key references.)
Figure 51.6 illustrates the desired hierarchies and facts for CompSales International’s
requirements.
FIGURE 51.6 CompSales International’s multidimensional OLAP requirements.
OLAP Cube Creation
A star-schema data mart/warehouse named CompSales2008 is used as the basis of creating
the OLAP cube example in this chapter. You can download this data mart,
CompsSales2008.zip, from the Sams Publishing website for this book title at
www.samspublishing.com, and it is also on this book’s CD. You can easily unzip and attach this
database to any SQL Server 2008 database instance. This is not an SSAS database; it is a
SQL Server database of a star-schema data warehouse/mart. We use this SQL Server
database as the source for the exercises in this chapter. You will build the SSAS OLAP cube
yourself (by following the steps outlined here).
You’ll spend most of the construction phase using SQL Server Business Intelligence
Development Studio (BIDS; also known as Visual Studio) and Microsoft SQL Server
Management Studio (SSMS). All wizards and editors are invoked from either BIDS or
SSMS. As mentioned earlier, Microsoft has moved to a project orientation. For this
reason, you need to start out in BIDS (which actually invokes Visual Studio with the
BI plug-ins). You must have already installed SSAS. In general, here’s what you’ll be doing
in this example:
1. Create a BI project.
2. Identify data sources and data source views that you want to use for a new cube.
3. Define the basic dimensions for the cube (Time, Geography, Product).
4. Define the hierarchies.
5. Process the dimensions.
6. Create a cube structure.
7. Define the measure groups/measures.
8. Process the cube.
9. Deploy the solution.
10. Use the cube.
Using SQL Server BIDS
SQL Server BIDS (a.k.a. Visual Studio with the BI plug-ins) is launched from the SQL
Server 2008 program group on the Start menu or from the Visual Studio 2008 program
group on the Start menu. We assume you have installed Visual Studio and SQL Server
Analysis Services. When BIDS is open, you choose File, New Project, Business Intelligence
Projects. Figure 51.7 shows the New Project dialog, from which you should highlight the
Analysis Services Project template option and specify a project name, project location, and
solution name for this new BI project. In this case, the solution name is
CompSalesUnleashed.
NOTE
You can also start a new project by leveraging any other existing SSAS database
project. You can easily clone an existing project and tweak it a bit to fit your new needs. To
do this, you use the Import Analysis Services Database option.
FIGURE 51.7 The SQL Server BIDS New Project dialog.
After you create a new project, a set of objects is presented to you in the upper-right pane,
which is the Solution Explorer. Figure 51.8 shows the Solution Explorer for the new
project. All OLAP project objects reside here, including data sources, dimensions, cubes,
mining structures, and roles.
FIGURE 51.8 The Solution Explorer view for the new CompSalesUnleashed project