FIGURE 51.57 Creating KPIs in the cube designer
As you create dimensions, you can even choose a data mining model as the basis for a dimension.
Basically, a data mining model is a reference structure that represents the grouping and predictive analysis of relational or multidimensional data. It is composed of rules, patterns, and other statistical information about the data it analyzed. The units of data being analyzed are called cases. A case set is simply a means of viewing the physical data, and different case sets can be constructed from the same physical data; basically, a case is defined from a particular point of view. If the algorithm you are using supports the view, you can use mining models to make predictions based on these findings.
Another aspect of a data mining model is its use of training data. This process determines the relative importance of each attribute in a data mining model. It does this by recursively partitioning data into smaller groups until no more splitting can occur. During this partitioning process, information is gathered from the attributes used to determine the split, and a probability can be established for each categorization of data in these splits. This type of data can be used to help determine factors about other data, utilizing these probabilities. This training data, in the form of dimensions, levels, member properties, and measures, is used to process the OLAP data mining model and further define the data mining column structure for the case set.
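The recursive-partitioning idea can be sketched in a few lines of Python. This is an illustrative reconstruction, not SSAS code: the attribute names and toy cases are invented, and a real decision-tree algorithm would recurse on each resulting group until no further split improves things.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction gained by partitioning the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Toy cases: did the customer buy the product?
rows = [
    {"region": "west", "segment": "retail"},
    {"region": "west", "segment": "wholesale"},
    {"region": "east", "segment": "retail"},
    {"region": "east", "segment": "retail"},
]
labels = ["yes", "yes", "no", "no"]

# 'region' separates the labels perfectly, so it wins the split
gains = {a: information_gain(rows, labels, a) for a in ("region", "segment")}
best_split = max(gains, key=gains.get)
print(best_split)  # region
```

The probabilities mentioned above fall out of the same partitioning: within each split group, the label frequencies are the estimated probabilities used later for prediction.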
In SSAS, Microsoft provides several data mining algorithms (or techniques):

- Microsoft Association Rules identifies items that are likely to appear together in a transaction. The rules help predict when the presence of one item is likely with another item (which has appeared in the same type of transaction before).
- Microsoft Clustering groups the records of a dataset into clusters that contain similar characteristics. This is one of the best algorithms, and it can be used to find general groupings in data.
- Microsoft Sequence Clustering combines sequence analysis with clustering, and it identifies clusters of similarly ordered events in a sequence. The clusters can be used to predict the likely ordering of events in a sequence, based on known characteristics.
- Microsoft Decision Trees supports the prediction of both discrete and continuous attributes.
- Microsoft Linear Regression is a configuration variation of the Decision Trees algorithm, obtained by disabling splits. (The whole regression formula is built in a single root node.) The algorithm supports the prediction of continuous attributes.
- Microsoft Logistic Regression is a configuration variation of the Neural Network algorithm, obtained by eliminating the hidden layer. This algorithm supports the prediction of both discrete and continuous attributes.
- Microsoft Naive Bayes builds models that can be used for classification and predictive modeling. It supports only discrete attributes, and it considers all the input attributes to be independent, given the predictable attribute.
- Microsoft Neural Network uses multilayer networks to predict multiple attributes. It can be used for classification of discrete attributes as well as regression of continuous attributes.
- Microsoft Time Series is used to analyze time-related data, such as monthly sales data or yearly profits. The patterns it discovers can be used to predict values for future time steps across a time horizon.
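The clustering idea, finding natural groupings in multidimensional space, can be illustrated with a minimal k-means sketch in Python. K-means is a simple stand-in here, not the actual Microsoft Clustering implementation (which defaults to a more sophisticated expectation-maximization approach), and the sample points are invented.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=10):
    """Minimal k-means: assign each point to its nearest centroid, recompute."""
    # Deterministic init: spread starting centroids across the data
    centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Two obvious groupings of (units_sold, returns) values
points = [(1, 2), (2, 1), (1, 1), (10, 12), (11, 11), (12, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```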
To create an OLAP data mining model, SSAS uses either an existing source OLAP cube or an existing relational database/data warehouse, a particular data mining technique/algorithm, a case dimension and level, a predicted entity, and, optionally, training data. The source OLAP cube provides the information needed to create a case set for the data mining model. You then select the data mining technique (decision tree, clustering, or one of the others). SSAS uses the dimension and level that you choose to establish key columns for the case sets. The case dimension and level provide a certain orientation for the data mining model into the cube for creating a case set. The predicted entity can be a measure from the source OLAP cube, a member property of the case dimension and level, or any member of another dimension in the source OLAP cube.
NOTE
The Data Mining Wizard can also create a new dimension for a source cube, and it enables users to query the data mining model data just as they would query OLAP data (using the DMX extensions to SQL or the mining structures browser).
In Visual Studio, you simply initiate the Data Mining Wizard by right-clicking the Mining Structures entry in the Solution Explorer. You cannot create new mining structures from SSMS. When you are past the wizard's splash screen, you have the option of creating your mining model from either an existing relational database (or data warehouse) or an existing OLAP cube (as shown in Figure 51.58).
You want to define a data mining model that can shed light on product (SKU) sales characteristics and that will be based on the data and structure you have created so far in your Comp Sales Unleashed cube. For this example, you choose to use the OLAP cube you already have (the existing cube method).
You must now select the data mining technique you think will help you find value in your cube's data. Clustering is probably the best one to start with because it finds natural groupings of data in a multidimensional space. It is useful when you want to see general groupings in your data, such as hot spots. You are trying to find just such things with sales of products (for example, things that sell together or belong together). Figure 51.59 shows the Microsoft Clustering data mining technique being selected.
Now you have to identify the source cube dimension to use to build the mining structure. As you can see in Figure 51.60, you choose Product Dimension to fit the mining intentions stated earlier.

You then select the case key, or point of view, for the mining analysis. Figure 51.61 illustrates the case to be based on the product dimension and at the SKU level (that is, the individual product level).
FIGURE 51.58 Selecting the definition method to use for the mining structure in the Data Mining Wizard
FIGURE 51.59 Using clustering to identify natural groups in the Data Mining Wizard
FIGURE 51.60 Identifying the product dimension as the basis for the mining structure in the
Data Mining Wizard
You now specify the attributes and measures as case-level columns of the new mining structure. Figure 51.62 shows the possible selections. You can simply choose all the data measures for this mining structure. Then you click the Next button.

As you can see in Figure 51.63, the next few wizard dialogs allow you to specify the mining structure columns' content and data types (use the defaults that were detected for most items unless we specifically describe something different), identify a filtered slice to use for the model training (you don't need one now because you want the whole cube), and finally identify the number of cases to be reserved for model testing (set the percentage of data for testing to about 33%).
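Reserving a percentage of cases for testing is a standard holdout split. A rough Python sketch of what that 33% setting amounts to (the cases here are invented placeholders, and the wizard's actual sampling mechanics may differ):

```python
import random

def holdout_split(cases, test_fraction=0.33, seed=42):
    """Shuffle the cases, reserving a fraction for model testing."""
    rng = random.Random(seed)       # fixed seed keeps the split repeatable
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training, testing)

cases = list(range(100))            # stand-ins for mining-model cases
train, test = holdout_split(cases)
print(len(train), len(test))        # 67 33
```

The held-out cases never influence training, so they give an honest measure of how well the model generalizes.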
The mining model is now specified and must be named and processed. Figure 51.64 shows what you have named the mining structure (Product Dimension MS) and the mining model itself (Product Dimension MM). Also, you select the Allow Drill Through option so you can look further into the data in the mining model after it is processed. Then you click the Finish button.
When the Data Mining Wizard is complete, the mining structure viewer pops up, with your mining structure's case-level column specification (on the center left) and its correlation to your cube (see Figure 51.65).
You must now process the mining structure to see what you come up with. You do this by selecting the Mining Model toolbar option and then the Process option. You then see the usual Process dialog, and you have to choose to run this (process the mining structure). After the mining structure processing completes, a quick click on the Cluster Diagram tab shows the results of the clustering analysis (see Figure 51.66). Notice that because you selected to allow drill through, you can simply right-click any of the clusters identified and see the data that is part of the cluster (by choosing Drill Through). This viewer clearly shows that there is some clustering of SKU values that might indicate products that sell together or belong together.

FIGURE 51.61 Identifying the basic unit of analysis for the mining model in the Data Mining Wizard

FIGURE 51.62 Specifying the measures for the mining model in the Data Mining Wizard

FIGURE 51.63 Specifying a column's content, slice filters, and model data training percentages
FIGURE 51.64 Naming the mining model and completing the Data Mining Wizard
FIGURE 51.65 Your new mining structure in the mining structure viewer
If you click the Cluster Profiles tab of this viewer, you see the data value profile characteristics that were processed (see Figure 51.67).

Figure 51.68 shows the clusters of data values for each data measure in the data mining model. This characteristic information gives you a good idea of what the actual data values are and how they cluster together.
FIGURE 51.66 Clustering results and drilling through to the data in the mining model viewer
FIGURE 51.67 Cluster data profiles in the mining model viewer
Finally, you can see the cluster node contents at the detail level by changing the mining model viewer type to the Microsoft Generic Content Tree Viewer, which is just below the Mining Model Viewer tab on top. Figure 51.69 shows the detail contents of each model node and its technical specification in a report format.

If you want, you can now build new cube dimensions that can help you do predictive modeling based on the findings of the data mining structures you just processed. In this way, you could predict sales units of one SKU and the number of naturally clustered SKUs quite easily (based on the past data mining analysis). This type of predictive modeling is very powerful.
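A rough sketch of how cluster findings can feed predictive modeling: assign a new SKU to its nearest cluster and borrow that cluster's typical measure value as the prediction. The centroids and measures below are invented placeholders, not values from the Comp Sales Unleashed cube, and SSAS does this internally rather than through code like this.

```python
def nearest_cluster(point, centroids):
    """Index of the centroid closest to the point (squared distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))

# Hypothetical centroids over (returns, discount_rate) from a processed model,
# plus the average units sold observed in each cluster
centroids = [(4.0, 0.10), (1.0, 0.30)]
cluster_avg_units = [120.0, 15.0]

new_sku = (5.0, 0.12)               # known measures for a new SKU
c = nearest_cluster(new_sku, centroids)
predicted_units = cluster_avg_units[c]
print(c, predicted_units)           # 0 120.0
```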
FIGURE 51.68 Cluster characteristics of the data values for each measure in the mining
model viewer
FIGURE 51.69 The Microsoft Generic Content Tree Viewer of the cluster nodes in the mining
model viewer
SSIS
SSIS provides a robust means to move data between sources and targets. Data can be exported, validated, cleaned up, consolidated, transformed, and then imported into a destination of any kind. With any OLAP/SSAS implementation, you will undoubtedly have to transform, clean, or preprocess data in some way. You can now tap into SSIS capabilities from within the SSAS platform.
You can combine multiple column values into a single calculated destination column or divide values from a single source column into multiple destination columns. You might also need to translate values in operational systems. For example, many OLTP systems use product codes stored as numeric data, and few people are willing to memorize an entire collection of product codes. An entry of 100235 for a type of shampoo in a product dimension table is useless to a vice president of marketing who is interested in how much of that shampoo was sold in California in the past quarter.
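In Python terms, such a translation step amounts to a lookup against the product dimension. The codes and names below are invented examples; an SSIS package would express the same idea as a Lookup transformation rather than code.

```python
# Hypothetical lookup built from the product dimension table
product_names = {
    100235: "Silky Shine Shampoo 500ml",
    100236: "Silky Shine Conditioner 500ml",
}

def translate(fact_row):
    """Replace an opaque numeric product code with a readable name."""
    code = fact_row["product_code"]
    name = product_names.get(code, f"UNKNOWN CODE {code}")
    return {**fact_row, "product": name}

fact = {"product_code": 100235, "state": "CA", "units": 1200}
print(translate(fact)["product"])   # Silky Shine Shampoo 500ml
```

Flagging unmatched codes instead of dropping them keeps data-quality problems visible downstream.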
Cleanup and validation of data are critical to the data's value in the data warehouse. The old saying "garbage in, garbage out" applies. If data is missing, redundant, or inconsistent, high-level aggregations can be inaccurate, so you should at least know that these conditions exist. Perhaps data should be rejected for use in the warehouse until the source data can be reconciled. If the shampoo of interest to the vice president is called Shamp in one database and Shampoo in another, aggregations on either value would not produce complete information about the product.
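One way to reconcile such naming drift is fuzzy matching against a canonical list, sketched here with Python's standard difflib. The names and similarity cutoff are illustrative; in practice, automated matches like this should be reviewed by hand before they drive warehouse loads.

```python
import difflib

# Canonical product names agreed on for the warehouse (illustrative)
canonical = ["Shampoo", "Conditioner", "Body Wash"]

def reconcile(name, cutoff=0.6):
    """Map a source value onto its canonical name, or None if no close match."""
    match = difflib.get_close_matches(name, canonical, n=1, cutoff=cutoff)
    return match[0] if match else None

print(reconcile("Shamp"))      # Shampoo
print(reconcile("Shampoo"))    # Shampoo
print(reconcile("Widget"))     # None
```

Returning None for unmatched values lets the load process reject or quarantine them, as suggested above, instead of silently aggregating bad data.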
SSIS packages define the steps in a transformation workflow. You can execute the steps serially or in combinations of serial, parallel, and conditional execution. For more information on SSIS, refer to Chapter 46, "SQLCLR: Developing SQL Server Objects in .NET."
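The serial/parallel/conditional flow of a package can be sketched with Python's standard concurrent.futures. The step names are invented, and SSIS expresses this declaratively through precedence constraints in the designer rather than through code.

```python
from concurrent.futures import ThreadPoolExecutor

completed = []

def step(name):
    """A stand-in for one workflow step; real steps would move or clean data."""
    completed.append(name)
    return name

# Serial: the extract step must finish before anything else starts
step("extract")

# Parallel: two independent cleanup steps run concurrently
with ThreadPoolExecutor() as pool:
    list(pool.map(step, ["validate", "dedupe"]))

# Conditional: load only if every prior step completed
if {"extract", "validate", "dedupe"} <= set(completed):
    step("load")

print(completed[0], completed[-1])  # extract load
```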
OLAP Performance
Performance is a big emphasis of SSAS. Usage-based aggregation is at the heart of much of what you can do to help in this area. In addition, the proactive caching mechanism in SSAS has allowed much of what was previously a bottleneck (and a slowdown) to be circumvented.

When designing cubes for deployment, you should consider the data scope of all the data accesses (that is, all the OLAP queries that will ever touch the cube). You should build a cube only big enough to handle these known data scopes. If you don't have requirements for something, you shouldn't build it. This keeps things a smaller, more manageable size (that is, smaller cubes), which translates into faster overall performance for those who use the cube.
You can also take caching to the extreme by relocating the OLAP physical storage components to a solid-state disk device (that is, a persistent memory device). This can give you tenfold performance gains. The price of this type of technology has been dramatically