Microsoft SQL Server 2008 R2 Unleashed - P205


FIGURE 51.3 A star-schema data warehouse design with a central fact table and multiple dimensions of these facts as the source for an OLAP cube in SSAS

Every cube has a schema from which the cube draws its source data. The central table in a schema is the fact table that yields the cube’s data measures. The other tables in the schema are the dimension tables that are the source of the cube dimensions. A classic star-schema data warehouse design has this central fact table along with multiple dimension tables. This is a great starting point for OLAP cube creation, as you can see in Figure 51.3. Here, we show you a high-tech company’s computer sales star-schema data warehouse that can be used as the source for building an OLAP cube within SSAS.
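To make the fact/dimension split concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in relational store. The table and column names (sales_fact, product_dim, time_dim) are hypothetical and are not the CompSales schema in Figure 51.3. The query slices a measure by dimension attributes through key joins, which is the relational shape a cube draws its data from.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table keyed by dimension
# keys, plus two dimension tables. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_line TEXT);
CREATE TABLE time_dim    (time_id    INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES product_dim(product_id),
    time_id    INTEGER REFERENCES time_dim(time_id),
    units      INTEGER
);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(1, "Laptops"), (2, "Desktops")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?)",
                 [(10, 2007), (11, 2008)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 [(1, 10, 50), (1, 11, 70), (2, 10, 30), (2, 11, 20)])

# Slice the measure (units) by two dimensions: join the fact table to each
# dimension table on its key, then aggregate.
rows = conn.execute("""
SELECT p.product_line, t.year, SUM(f.units)
FROM sales_fact AS f
JOIN product_dim AS p ON p.product_id = f.product_id
JOIN time_dim    AS t ON t.time_id    = f.time_id
GROUP BY p.product_line, t.year
ORDER BY p.product_line, t.year
""").fetchall()
for row in rows:
    print(row)
```

Each output row is one combination of dimension members with its aggregated measure. An OLAP cube precomputes and organizes many such combinations so that they need not be recomputed per query.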

SSAS allows you to build dimensions and cubes from heterogeneous data sources. It can access relational OLTP databases, multidimensional databases, text data, and any other source that has an OLE DB provider available. You don’t have to move all your data first; you just connect to its source. In SSAS, you can also design OLAP cubes from scratch. Then you can have SSAS create the relational schema of tables in SQL Server that you want to populate with the transactional data that will drive the OLAP cube.

Essentially, cubes can be either regular or local. Regular cubes are based on real tables as the data source, have aggregations, and occupy physical storage space of some kind. If a data source that contributes to this cube changes, the cube must be reprocessed. Figure 51.4 shows this cube representation and that it consists of something called partitions. Local cubes are entirely contained in portable SSAS files (that is, tables) and can be browsed without a connection to an SSAS instance. This is really like being in “disconnected” mode.

Write-enabled dimensions within a cube enable updates (that is, writes) of data that can be shared back (that is, written back) to the data sources.


FIGURE 51.4 The SSAS cube representations: regular OLAP cubes and partitions

Following is a quick summary of all the essential cube terms in SSAS:

Database: A database is a logical container of one or more cubes. Cubes are defined within Analysis Services databases.

Cube: A cube is a multidimensional representation of the business facts. The two types of cubes are regular and local.

Data source: The data source is the origin of a cube’s data.

Measure group: This group is a collection (or grouping) of one or more measures into some type of logical unit for business purposes. A measure group does not occupy any physical space; it is metadata only.

Measure: A measure is a data fact representation. A measure is typically a data value fact, such as price, units, or quantity.

Cell: A cell is the part of a data measure that is at the intersection of the dimensions. The cell contains the data value. If an intersection (that is, a cell) has no value yet, it does not physically exist until it is populated.

Dimension: A cube’s dimension is defined by the aggregation levels of the data that are needed to support the data requirements. A dimension can be shared with other cubes, or it can be private to a cube. The structure of a dimension is directly related to the dimension table columns, member properties, or structure of OLAP data mining models. This structure becomes the hierarchy and should be organized accordingly. You can also have strict parent/child dimensions in which two columns are identified as being parent and child and the dimension is organized according to them. In a regular dimension, each column in the dimension contributes a hierarchy level.


Level: A level includes the nodes of the hierarchy or data mining model. Each level contains the members. Millions of members are possible for each level.

Partition: One or more partitions make up a cube. Using a partition is a way to physically separate parts of a cube. This separation essentially lets you deal with individual slices of a data cube separately, querying only the relevant data sources. If you partition by dimension, you can perform incremental updates to change that dimension independently of the rest of the cube. Consequently, you have to reprocess only the aggregations that are affected by those changes. This is an excellent feature for scalability.

Hierarchy: A hierarchy is a set of members in a dimension and their position relative to each other. Hierarchies can be either balanced or unbalanced. Being balanced simply means that all branches of the hierarchy descend to the same level. An unbalanced hierarchy allows branches to descend to different levels. It is also possible to define more than one hierarchy for a single dimension. A great example of this is “fiscal calendar time” and “Gregorian calendar time” being defined in one dimension: a Time dimension that contains both time.gregorian and time.fiscal.
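The terms above fit together roughly as shown in the following minimal Python sketch, which is not how SSAS stores data internally. It does three things:

- Cells are keyed by one member per dimension, and only populated cells are stored.
- Units roll up a balanced product hierarchy.
- The cube is partitioned by year, so new data triggers reprocessing of only one partition’s aggregations.

All member names and values are made up.

```python
from collections import defaultdict

# Hypothetical balanced product hierarchy: SKU -> Product Line -> All Products.
PRODUCT_PARENT = {
    "SKU-100": "Laptops", "SKU-101": "Laptops",
    "SKU-200": "Desktops",
    "Laptops": "All Products", "Desktops": "All Products",
}

# Cells keyed by (product member, month). Only populated intersections are
# stored; an empty cell simply has no entry. One partition per year.
partitions = {
    2007: {("SKU-100", "2007-01"): 40, ("SKU-200", "2007-01"): 15},
    2008: {("SKU-100", "2008-01"): 55, ("SKU-101", "2008-02"): 10},
}

def ancestors(member):
    """Yield the member and every ancestor up to the top of the hierarchy."""
    while member is not None:
        yield member
        member = PRODUCT_PARENT.get(member)

def aggregate(cells):
    """Roll one partition's cells up the product hierarchy, per month."""
    totals = defaultdict(int)
    for (sku, month), units in cells.items():
        for member in ancestors(sku):
            totals[(member, month)] += units
    return dict(totals)

# Process every partition once.
aggregations = {year: aggregate(cells) for year, cells in partitions.items()}

# New 2008 data arrives: only the 2008 partition is reprocessed; the 2007
# aggregations are left untouched.
partitions[2008][("SKU-200", "2008-02")] = 25
aggregations[2008] = aggregate(partitions[2008])

print(aggregations[2008][("All Products", "2008-02")])  # 35
print(("SKU-200", "2007-02") in partitions[2007])        # False: empty cell
```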

As mentioned previously, SSAS has many wizards. Which wizards you use depends on what you need to create. The “Creating an OLAP Database” section, later in this chapter, outlines the order and path through this maze of wizards.

OLAP Versus OLTP

One of the primary goals of OLAP is to increase data retrieval speed for business-related queries that are critical to decisions. Very often, there is a need to broaden the scope of a business query or to drill down into more granular details of the query. OLAP was created to facilitate this type of capability. A multidimensional schema is not a typical normalized relational database; redundant data is stored to facilitate quick retrieval. The data in a multidimensional database should be relatively static; in fact, data is not useful for decision support if it changes constantly. The information in a data warehouse is built out of carefully chosen snapshots of business data from OLTP systems. If you capture data at the right times for transfer to the data warehouse, you can quickly make accurate comparisons of important business activities over time.

In an OLTP system, transaction speed is paramount. Data modification operations must be quick, deal with concurrency (locking/holding of resources), and provide transactional consistency. An OLTP system is constantly changing; snapshots of the OLTP system, even if taken only a few seconds apart, are all different. Although historical information is certainly available in an OLTP system, using it for BI-type analysis might be impractical. Storing old data in an OLTP system becomes expensive, and you might need to reconstruct history dynamically from a series of transactions. In addition, OLTP designs and indexes usually don’t support large-scale decision support querying.

SSAS supports three OLAP storage methods (MOLAP, ROLAP, and HOLAP), providing flexibility to the data warehousing solution and enabling powerful partitioning and aggregation optimization capabilities.


FIGURE 51.5 MOLAP, HOLAP, and ROLAP storage continuum

Figure 51.5 shows the MOLAP, HOLAP, and ROLAP storage continuum. MOLAP stores all data locally (to SSAS), and ROLAP is the opposite (storing all data in the relational database). MOLAP is by far the most often used storage approach. The following sections take a closer look at each method.

MOLAP

Multidimensional OLAP (MOLAP) is an approach in which cubes are built directly from OLTP data sources or from dimensional databases and downloaded to a persistent store. In SSAS, data is downloaded to the server, and the details and aggregations are stored in a native Microsoft OLAP format. No zero-activity records are stored.

The dimension keys in the fact tables are compressed, and bitmap indexing is used. A high-speed MOLAP query processor retrieves the data.
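Bitmap indexing over dimension keys is a general technique, and the minimal Python sketch below illustrates it. It does not reproduce SSAS’s actual storage format, which this section does not describe. Each distinct key gets a bitmap (here a Python int) with one bit per fact row. A multidimensional filter then becomes a bitwise AND. The rows and keys are invented.

```python
# Hypothetical fact rows: (geo_key, product_key, units).
# Zero-activity rows are simply not stored, as in MOLAP.
fact_rows = [
    (1, 10, 5),
    (2, 10, 3),
    (1, 20, 7),
    (3, 20, 2),
    (1, 10, 4),
]

def build_bitmap_index(rows, column):
    """Map each distinct key in `column` to an int whose bit i is set
    when row i holds that key."""
    index = {}
    for i, row in enumerate(rows):
        key = row[column]
        index[key] = index.get(key, 0) | (1 << i)
    return index

geo_index = build_bitmap_index(fact_rows, 0)
product_index = build_bitmap_index(fact_rows, 1)

# "Units for geo 1 AND product 10": AND the two bitmaps, then visit only
# the rows whose bits survive.
match = geo_index[1] & product_index[10]
units = sum(fact_rows[i][2] for i in range(len(fact_rows)) if (match >> i) & 1)
print(bin(match), units)
```

Rows 0 and 4 survive the AND, so the sketch reads only those two rows to answer the query.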

ROLAP

Relational OLAP (ROLAP) uses fact data in summary tables in the OLTP data source to make data much more current (real-time). The summary tables are populated by processes in the OLTP system and are not downloaded to SSAS. The summary tables are known as materialized views and contain various levels of aggregation, depending on the options you select when building data cubes with SSAS. SSAS builds the summary tables with a column for each dimension and each measure. It indexes each dimension column and creates an additional index on all the dimension columns.
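The summary-table layout just described can be sketched in Python with sqlite3 standing in for the relational source. SQLite has no materialized views, so an ordinary table built with CREATE TABLE ... AS SELECT plays that role here. The table and index names are hypothetical, and this is not the DDL that SSAS itself generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (geo_id INTEGER, product_id INTEGER,
                         time_id INTEGER, units INTEGER);
INSERT INTO sales_fact VALUES (1, 10, 100, 5), (1, 10, 100, 4), (2, 20, 100, 3);

-- Summary (aggregation) table: one column per dimension, one per measure.
CREATE TABLE sales_summary AS
SELECT geo_id, product_id, time_id, SUM(units) AS units
FROM sales_fact
GROUP BY geo_id, product_id, time_id;

-- One index on each dimension column...
CREATE INDEX ix_summary_geo     ON sales_summary (geo_id);
CREATE INDEX ix_summary_product ON sales_summary (product_id);
CREATE INDEX ix_summary_time    ON sales_summary (time_id);
-- ...plus an additional index spanning all the dimension columns.
CREATE INDEX ix_summary_all ON sales_summary (geo_id, product_id, time_id);
""")

summary = conn.execute(
    "SELECT * FROM sales_summary ORDER BY geo_id").fetchall()
# PRAGMA index_list returns one row per index; column 1 is the index name.
indexes = sorted(row[1] for row in conn.execute(
    "PRAGMA index_list('sales_summary')"))
print(summary)
print(indexes)
```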

HOLAP

SSAS implements a combination of MOLAP and ROLAP called hybrid OLAP (HOLAP). Here, the facts are left in the OLTP data source, and aggregations are stored in the SSAS server. You use SSAS to boost query performance. This approach helps avoid data duplication, but performance suffers a bit when you query fact data in the OLTP summary tables. The amount of performance degradation depends on the level of aggregation selected.

ROLAP and HOLAP are useful in situations in which an organization wants to leverage its investment in relational database technology and existing infrastructure. The summary tables of facts are also accessible in the OLTP system via normal data access methods. However, when you are using SSAS, both ROLAP and HOLAP require more storage space because they don’t use the storage optimizations of the pure MOLAP-compressed implementation.


An Analytics Design Methodology

A data warehouse can be built from the top down or from the bottom up. To build a top-down warehouse, you need to form a complete picture or logical data model for the entire organization (or all the subsystems within the scope of the project, such as all financial systems). In contrast, building a warehouse from the bottom up takes a much more departmental or specific business-area focus (for example, a sales order system only). This breaks the task of modeling the data into more manageable chunks. Such a departmental approach produces data marts that are potentially subsets of the overall data warehouse.

The bottom-up approach can simplify implementation. It helps get departmental or business-area information to the people who need it, makes it easier to protect sensitive data, and results in better query response times because data marts deal with less data than a voluminous transactional system. The potential risk in the data mart approach is that disparity in data mart implementation can result in a logically disjointed enterprise data warehouse if efforts aren’t carefully coordinated across the organization.

Before you embark on an OLAP database creation effort, the time you spend understanding the underlying requirements is the best time you can give your effort. If scope is set correctly, you will be able to achieve an industrial-strength OLAP design without much difficulty. First, you need to take care of some groundwork:

1. Carefully assess the scope of what you want to represent in the BI environment. Start small, as the bottom-up approach suggests. For instance, just tackle the sales data facts.
2. Coordinate your efforts with other related BI efforts. Let people know that you are carving out a specific subject area or departmental data and, when you finish, publish your design to everyone.
3. Seek out any shared dimensions that might have already been created for other cubes. You want to leverage these as much as possible for the sake of data consistency and nonredundant processing.
4. Understand your data sources. The OLAP cube you create will be only as good as the data you put into it. It’s best to understand the dirty data issues of what you are about to touch long before you try to build an OLAP cube with it.

An Analytics Mini-Methodology

To successfully build OLAP solutions, you are advised to carefully assess the requirements of your end users in as detailed a fashion as possible. A mini-methodology that focuses on the essential usages and characteristics of an analytic solution can prove invaluable. The following sections outline a solid approach to nailing down your BI requirements and yielding optimal OLAP designs that solve your end users’ needs.

Assumption: You are building a business area–focused OLAP cube.


Requirements Phase

1. Identify the processing requirements for this DSS. What analysis do you need to do? Are trend reporting, forecasting, and so on necessary? These can often be represented in use case form (via UML).
   a. Ask each user what business decision questions he or she needs to have answered.
   b. Ask each user how often he or she needs these questions answered and exactly when the questions must be answered.
   c. Ask each user how current the data must be to get accurate answers. (This speaks to data latency.)
2. Identify the data needed to fulfill these requirements. What data must be touched to provide answers? The best way to capture this type of information is a logical data model. Even a rough model is better than none at all. This is the point where you focus on the facts that need to be analyzed.
3. Identify all possible hierarchies and level representations (that is, aggregations). This is how the data is used. Most users are likely to tell you that they want to see product data in the product hierarchy structure that has already been set up (for example, product family, product groups).
4. Identify the time hierarchies that the users need. Because time is usually implicit, it just needs to be clarified in terms of levels of aggregation (for example, years, quarters, months, weeks, days) and whether it needs to be fiscal versus Gregorian calendar, both, or something else.
5. Understand the data that each user can view from a security point of view.
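Step 4 asks whether time must be fiscal, Gregorian, or both. The minimal Python sketch below derives Year, Quarter, and Month members for both calendars from a date. The July fiscal start is an assumption for illustration. So is the convention of naming a fiscal year after the calendar year in which it ends. The member-name formats are hypothetical.

```python
from datetime import date

def gregorian_levels(d: date) -> dict:
    """Year, Quarter, and Month members of a Gregorian time hierarchy."""
    quarter = (d.month - 1) // 3 + 1
    return {"Year": str(d.year),
            "Quarter": f"Q{quarter} {d.year}",
            "Month": f"{d.year}-{d.month:02d}"}

def fiscal_levels(d: date, fiscal_start_month: int = 7) -> dict:
    """Same levels for a fiscal calendar that starts in fiscal_start_month.

    Assumption: a fiscal year is named after the calendar year in which it
    ends, so with a July start, August 2008 falls in FY2009.
    """
    months_into_year = (d.month - fiscal_start_month) % 12  # 0..11
    if fiscal_start_month > 1 and d.month >= fiscal_start_month:
        fiscal_year = d.year + 1
    else:
        fiscal_year = d.year
    quarter = months_into_year // 3 + 1
    return {"Year": f"FY{fiscal_year}",
            "Quarter": f"FQ{quarter} FY{fiscal_year}",
            "Month": f"{d.year}-{d.month:02d}"}

print(gregorian_levels(date(2008, 3, 10)))
print(fiscal_levels(date(2008, 3, 10)))
```

With fiscal_start_month=1 the two calendars coincide. A Time dimension can carry both sets of levels as separate hierarchies.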

Design Phase

1. Analyze which data sources are needed to fulfill the requirements. See whether dimensions or OLAP cubes that already exist can be shared.
2. Understand what data transformations need to be done to the source data to provide it to the OLAP world. This might include pre-aggregation, reformatting, data integrity verifications, and so on.
3. Translate these requirements into an OLAP model design:
   a. Translate to MOLAP if your data sources are not going to be leveraged at all and you will be taking full advantage of OLAP storage.
   b. Translate to ROLAP if you are going to leverage an existing relational design and storage.
   c. Translate to HOLAP if you are going to partially utilize the source data storage and partially utilize OLAP storage. This is the most frequently used approach.

Construction Phase

1. Implement data extraction, transformation, and loading (ETL) logic (via T-SQL, SSIS, or other methods).


2. Create the data sources to be used.
3. Create the dimensions.
4. Create the cube.
5. Select data measures (that is, the data facts) for the cube.
6. Design the storage and aggregations.
7. Process the cube. This brings the data into the OLAP environment.
8. Verify data integrity.

Implementation Phase

1. Define the security roles in the cube.
2. Train the users to use the system.
3. Process the data into the OLAP environment (from production data sources).
4. Verify data integrity.
5. Allow users to use the OLAP cube.

Maintenance Phase

1. Evaluate access optimization in the OLAP cube via usage analysis.
2. Do data mining discovery, if desired.
3. Make schema changes/enhancements, as necessary.

An OLAP Requirements Example: CompSales International

Following is an abbreviated requirement that reflects an actual implementation that was done for a large Silicon Valley company. We follow the mini-methodology as closely as possible to implement this requirement in SSAS, pointing out which facilities of SSAS should be used for which purpose along the way.

CompSales International Requirements

A large computer manufacturer named CompSales International needs to do basic analytical processing of its product data in a new BI environment. The main business issues at hand are related to minimizing channel inventory and better understanding market demand for the company’s most popular products. The detailed data processing requirements are as follows:

1. You want to view sales unit actuals and sales returns for system and nonsystem products for the past two years via the product hierarchy (All Products, Product Types, Product Lines, Product Families, SKUs), geography hierarchy (All Geos, Major Geos, Countries, Channels, Customers), and different time levels (All Time, Years, Quarters, Months).
2. You want to view data primarily at the yearly and monthly levels, although the finance department also uses it a little at quarterly levels.


3. You want to view net sales (sales minus returns) at all levels of the hierarchy.
4. The fiscal and Gregorian calendars are the same for CompSales International.
5. One day after month-end processing, all “actuals” data from the prior month (sales units and returns) is available.
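Requirement 3 defines net sales as a derived measure. The minimal Python sketch below shows what “at all levels of the hierarchy” implies. SKU-level sales and returns roll up to every ancestor member. Net sales is then derived from the rolled-up totals at whatever level is queried. The SKU, family, line, and type members and all figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical path from each SKU to the top of the product hierarchy:
# (SKU, Product Family, Product Line, Product Type, All Products)
PRODUCT_PATH = {
    "SKU-1": ("SKU-1", "Family-A", "Line-Notebook", "Systems", "All Products"),
    "SKU-2": ("SKU-2", "Family-A", "Line-Notebook", "Systems", "All Products"),
    "SKU-3": ("SKU-3", "Family-M", "Line-Memory", "Nonsystems", "All Products"),
}
actuals = [  # (sku, sales_units, return_units) for one month
    ("SKU-1", 120, 7),
    ("SKU-2", 80, 3),
    ("SKU-3", 500, 40),
]

def rollup(rows):
    """Sum sales and returns into every member on each SKU's path."""
    totals = defaultdict(lambda: [0, 0])
    for sku, sales, returns in rows:
        for member in PRODUCT_PATH[sku]:
            totals[member][0] += sales
            totals[member][1] += returns
    return totals

def net_sales(totals, member):
    """Derived measure: sales units minus return units at any level."""
    sales, returns = totals[member]
    return sales - returns

totals = rollup(actuals)
for member in ("SKU-1", "Family-A", "Systems", "All Products"):
    print(member, net_sales(totals, member))
```

Because subtraction distributes over the sums, deriving net sales after aggregation gives the same result as summing SKU-level net sales. Deriving it on demand simply avoids storing a third measure in this sketch.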

You need to implement some general design decisions using SSAS, including the following:

Hierarchies (dimensions): This includes product, geography, and time.

Facts (measures): This includes sales units, sales returns, and a calculated net sales measure (units minus returns).
OLAP storage: This will be MOLAP or HOLAP (if you want to use the star-schema data mart that already contains most of what you are after).
Physical tables that exist: This includes Geo_Dimension, Prod_Dimension, Time_Dimension, and CompSalesFactoid (the fact table that will become your measures in the OLAP cube). This data is updated weekly. Each of these tables uses an artificial key into the main facts table for performance reasons (GeoID, ProductID, TimeID). In addition, several member/value description tables are associated with each dimension table. Basically, there is one table for each level in a dimension. These description tables can be leveraged to make the result rows from OLAP queries much more user friendly. (Look back at Figure 51.3 and you can see all tables included in CompSales and how they are related via primary/foreign key references.)
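The artificial-key and description-table arrangement can be sketched in Python with sqlite3. For brevity only the product side is shown (Geo_Dimension, Time_Dimension, GeoID, and TimeID are omitted). Prod_Dimension, CompSalesFactoid, and ProductID come from the text. Every other column name, the ProdType_Desc table, and the data are hypothetical and are not the real CompSales2008 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table with an artificial key used by the fact table.
-- SKU, ProdTypeCode, and all data values are hypothetical.
CREATE TABLE Prod_Dimension (
    ProductID    INTEGER PRIMARY KEY,
    SKU          TEXT,
    ProdTypeCode TEXT
);
-- Hypothetical member/value description table for one level (Product Type).
CREATE TABLE ProdType_Desc (ProdTypeCode TEXT PRIMARY KEY, ProdTypeName TEXT);
CREATE TABLE CompSalesFactoid (
    ProductID  INTEGER REFERENCES Prod_Dimension(ProductID),
    SalesUnits INTEGER
);
INSERT INTO Prod_Dimension VALUES (1, 'SKU-1', 'SYS'), (2, 'SKU-3', 'NSY');
INSERT INTO ProdType_Desc  VALUES ('SYS', 'Systems'), ('NSY', 'Nonsystems');
INSERT INTO CompSalesFactoid VALUES (1, 120), (1, 80), (2, 500);
""")

# The fact table joins on the compact artificial key; the description
# table turns cryptic level codes into user-friendly row labels.
rows = conn.execute("""
SELECT d.ProdTypeName, SUM(f.SalesUnits)
FROM CompSalesFactoid AS f
JOIN Prod_Dimension AS p ON p.ProductID = f.ProductID
JOIN ProdType_Desc  AS d ON d.ProdTypeCode = p.ProdTypeCode
GROUP BY d.ProdTypeName
ORDER BY d.ProdTypeName
""").fetchall()
print(rows)
```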

Figure 51.6 illustrates the desired hierarchies and facts for CompSales International’s requirements.

FIGURE 51.6 CompSales International’s multidimensional OLAP requirements


OLAP Cube Creation

A star-schema data mart/warehouse named CompSales2008 is used as the basis for creating the OLAP cube example in this chapter. You can download this data mart, CompsSales2008.zip, from the Sams Publishing website for this book title at www.samspublishing.com, and it is also on this book’s CD. You can easily unzip and attach this database to any SQL Server 2008 database instance. This is not an SSAS database; it is a SQL Server database of a star-schema data warehouse/mart. We use this SQL Server database as the source for the exercises in this chapter. You will build the SSAS OLAP cube yourself (by following the steps outlined here).

You’ll spend most of the construction phase using SQL Server Business Intelligence Development Studio (BIDS; also known as Visual Studio) and Microsoft SQL Server Management Studio (SSMS). All wizards and editors are invoked from either BIDS or SSMS. As mentioned earlier, Microsoft has moved to a project orientation. For this reason, you need to start out in BIDS (which actually invokes Visual Studio with the BI plug-ins). You must have already installed SSAS. In general, here’s what you’ll be doing in this example:

1. Create a BI project.
2. Identify data sources and data source views that you want to use for a new cube.
3. Define the basic dimensions for the cube (Time, Geography, Product).
4. Define the hierarchies.
5. Process the dimensions.
6. Create a cube structure.
7. Define the measure groups/measures.
8. Process the cube.
9. Deploy the solution.
10. Use the cube.

Using SQL Server BIDS

The SQL Server BIDS (a.k.a. Visual Studio with the BI plug-ins) is launched from the SQL Server 2008 program group on the Start menu or from the Visual Studio 2008 program group on the Start menu. We assume you have installed Visual Studio and SQL Server Analysis Services. When BIDS is open, you choose File, New Project, Business Intelligence Projects. Figure 51.7 shows the New Project dialog, in which you should highlight the Analysis Services Project template option and specify a project name, project location, and solution name for this new BI project. In this case, the solution name is CompSalesUnleashed.

NOTE

You can also start a new project by leveraging any other existing SSAS database project. You can easily clone an existing project and tweak it a bit to fit your new needs. To do this, you use the Import Analysis Services Database option.


FIGURE 51.7 The SQL Server BIDS New Project dialog

After you create a new project, a set of objects is presented to you in the upper-right pane, which is the Solution Explorer. Figure 51.8 shows the Solution Explorer for the new project. All OLAP project objects reside here, including data sources, dimensions, cubes, mining structures, and roles.

FIGURE 51.8 The Solution Explorer view for the new CompSalesUnleashed project
