Learning Microsoft SQL Server 2008, part 154


Service view, exposing only what makes sense to them. From the designer’s perspective, limiting the number of cubes and databases keeps the number of linked dimensions and measures to a minimum.

Using the Cube Wizard has been covered in earlier sections, both from the top-down approach (see “Analysis Services Quick Start”), in which the cube design is used to generate the corresponding relational objects and Integration Services packages, and from the bottom-up approach (see “Creating a Cube”). Once the cube structure has been created, it is refined using the Cube Designer.

Open any cube from the Solution Explorer to use the Cube Designer, shown in Figure 71-7. The Cube Designer presents information in several tabbed views, described in the remainder of this section.

FIGURE 71-7

Cube Designer

Cube structure

The cube structure view is the primary design surface for defining a cube. Along with the ever-present Solution Explorer and Properties panes, three panes present the cube’s structure:

■ Data Source View: This pane, located in the center of the view, shows a chosen portion of the data source view on which the cube is built. Each table is color-coded: yellow for fact tables, blue for dimensions, and white for neither. The tables available can be changed by right-clicking on the design surface and choosing an option from the context menu. Right-clicking a table presents options to hide that table or to show related tables. Diagrams defined within the data source view can be used as well by selecting the Copy Diagram From option on the context menu. Additionally, the toolbar can be used to toggle between diagram and tree views of the table and relationship data; the tree view can be very useful for answering questions about complex diagrams.

■ Measures: This pane, located in the upper-left section of the view, lists all of the cube’s measures organized by measure group. Both the toolbar and the context menu toggle between the tree and grid views of measures.

■ Dimensions: This pane, located in the lower-left section of the view, lists all dimensions associated with the cube. This list may be a subset of the dimensions defined in the Solution Explorer if not every dimension is used in the current cube. Each dimension in the Dimensions pane shows the user hierarchies and attributes, and has a link to edit that dimension in the Dimension Designer.

Because the order in which measures and dimensions appear in their respective lists determines the order in which users see them, the lists can be reordered using either the right-click Move Up/Move Down options or drag-and-drop while in tree view. As with the Dimension Designer, changes to a cube must be deployed before they can be browsed.

Measures

Each measure is based on a column from the data source view and an aggregate function. The aggregate function determines how data is processed from the fact table and how it is summarized. For example, consider a simple fact table with columns for day, store, and sales amount being read into a cube with a sales amount measure, a stores dimension, and a time dimension with year, month, and day attributes. If the aggregate function for the sales amount measure is Sum, then rows are read into the cube’s leaf level by summing the sales amount for any rows with the same store/day combination. Higher levels, such as the store/month level, are determined by adding up the individual days in that month. However, if the aggregate function is Min, then the smallest value is saved from all the sales on a given day, and the store/month level is determined as the smallest of all the days in that month.
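The Sum versus Min behavior described above can be mimicked outside Analysis Services to see why the two functions give different month totals. The following Python sketch is only an illustration under stated assumptions: the fact rows, the store name, and the helpers `leaf_level` and `month_level` are hypothetical, and the server itself performs this rollup during processing and querying, not in client code.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, store, sales_amount); day is an ISO date string.
FACT_ROWS = [
    ("2008-01-01", "Store A", 100.0),
    ("2008-01-01", "Store A", 40.0),  # second sale for the same store/day
    ("2008-01-02", "Store A", 70.0),
    ("2008-02-01", "Store A", 90.0),
]


def leaf_level(rows, agg):
    """Combine fact rows that share a (store, day) key using agg."""
    groups = defaultdict(list)
    for day, store, amount in rows:
        groups[(store, day)].append(amount)
    return {key: agg(values) for key, values in groups.items()}


def month_level(leaves, agg):
    """Roll leaf cells up to (store, month) by applying agg to the day values."""
    groups = defaultdict(list)
    for (store, day), value in leaves.items():
        groups[(store, day[:7])].append(value)  # "YYYY-MM" prefix identifies the month
    return {key: agg(values) for key, values in groups.items()}


sum_months = month_level(leaf_level(FACT_ROWS, sum), sum)
min_months = month_level(leaf_level(FACT_ROWS, min), min)
print(sum_months[("Store A", "2008-01")])  # 210.0
print(min_months[("Store A", "2008-01")])  # 40.0
```

With Sum, January is 140 (day 1) plus 70 (day 2); with Min, day 1 keeps 40 and the month keeps the smaller of 40 and 70.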

Available aggregate functions include the following:

■ Sum: Adds the values of all children.

■ Min: Minimum value of children.

■ Max: Maximum value of children.

■ Count: Count of the corresponding rows in the fact table.

■ Distinct Count: Counts unique occurrences of the column value (e.g., Unique Customer Count).

■ None: No aggregation is performed. Any value not read directly from the fact table will be null.

■ AverageOfChildren: Averages non-empty children.

■ FirstChild: Value of the first child member as evaluated along the time dimension.

■ FirstNonEmpty: Value of the first non-empty child member as evaluated along the time dimension.

■ LastChild: Value of the last child member as evaluated along the time dimension.

■ LastNonEmpty: Value of the last non-empty child member as evaluated along the time dimension.

■ ByAccount: Aggregation varies based on the values in the Account dimension. The dimension’s Type property must be Accounts, and one of the dimension’s attributes must have its Type property set to AccountType. The column corresponding to AccountType contains defined strings that identify the type of account, and thus the aggregation method, to Analysis Services.
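To make the difference between the time-based functions concrete, this Python sketch evaluates several of them over one parent member’s children. It is a simplification: the quarter data and the function names are hypothetical, only a single level is evaluated, and Analysis Services applies these rules along the time dimension at every level of the cube.

```python
from typing import Optional, Sequence

# Child member values in time order; None marks an empty child (no fact data).
Children = Sequence[Optional[float]]


def first_child(values: Children) -> Optional[float]:
    return values[0] if values else None


def last_child(values: Children) -> Optional[float]:
    return values[-1] if values else None


def first_non_empty(values: Children) -> Optional[float]:
    return next((v for v in values if v is not None), None)


def last_non_empty(values: Children) -> Optional[float]:
    return next((v for v in reversed(values) if v is not None), None)


def average_of_children(values: Children) -> Optional[float]:
    non_empty = [v for v in values if v is not None]
    return sum(non_empty) / len(non_empty) if non_empty else None


# Hypothetical month-end inventory balances for one quarter; March is still empty.
quarter = [120.0, 80.0, None]
print(first_child(quarter), last_child(quarter),
      last_non_empty(quarter), average_of_children(quarter))
# 120.0 None 80.0 100.0
```

The contrast between `last_child` and `last_non_empty` is why balance-style measures usually favor the non-empty variants while the current period is still incomplete.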

The best way to add a new measure is to right-click in the Measures pane and choose New Measure. Specify the aggregation function and table/column combination in the New Measure dialog. The new measure will automatically be added to the appropriate measure group. Measure groups are created for each fact table, plus one for each distinct count measure defined. These groups correspond to different SQL queries that are run to retrieve the cube’s data.

Beyond measures derived directly from fact tables, calculated measures can be added by the Business Intelligence Wizard and directly via the calculations view.

For more information about calculated measures, see Chapter 72, ‘‘Programming MDX Queries.’’

Measures can be presented to the user grouped in folders by setting the DisplayFolder property to the name of the folder in which the measure should appear. It is also good practice to assign each measure a default format by setting the FormatString property, either by choosing one of the common formats from the list or by directly entering a custom format.

Each cube can have a default measure specified if desired, which provides a measure for queries when no measure is explicitly requested. To set the default measure, select the cube name at the top of the Measures pane tree view, and set the DefaultMeasure property by selecting a measure from the list.

Cube dimensions

The hierarchies and attributes for each dimension can be either disabled (Enabled and AttributeHierarchyEnabled properties, respectively) or made invisible (Visible and AttributeHierarchyVisible properties, respectively) if appropriate for a particular cube context (see “Visibility and Organization,” earlier in the chapter, for example scenarios). Access these settings in the Dimensions pane and then adjust the associated properties. These properties are specific to a dimension’s role in the cube and do not change the underlying dimension design.

Dimensions can be added to the cube by right-clicking the Dimensions pane and choosing New Dimension. Once the dimension has been added to the cube, review the dimension usage view to ensure that the dimension is appropriately related to all measure groups.

Dimension usage

The dimension usage view displays a table showing how each dimension is related to each measure group. With dimensions and measure groups as row and column headers, respectively, each cell of the table defines the relationship between the corresponding dimension/measure group pair. Drop-down lists in the upper-left corner enable rows and columns to be hidden to simplify large views.

The Cube Designer creates default relationships based on the data source view relationships, which are accurate in most cases, although any linked objects require special review because they are not derived from the data source view. Click the ellipsis (…) button in any table cell to launch the Define Relationship dialog and choose the relationship type. Different relationship types require different mapping information, as described in the following sections.

No relationship

For a database with more than one fact table, there will likely be dimensions that don’t relate to some measure groups. Signified by gray table cells with no annotation, this setting is expected for measure group/dimension pairs that don’t share a meaningful relationship. When a query that specifies dimension information unrelated to a given measure is run, that dimension information is ignored by default.

Regular

The regular relationship is a fact table relating directly to a dimension table, as in a star schema. Within the Define Relationship dialog, choose the Granularity attribute as the dimension attribute that relates directly to the measure group, usually the dimension’s key attribute. Once the granularity attribute has been chosen, specify the fact table column names that match the granularity attribute’s key columns in the relationships grid at the bottom of the dialog.

Choosing to relate a dimension to a measure group via a non-key attribute does work, but it must be considered in the context of the dimension’s natural hierarchy (see “Attribute Relationships,” earlier in the chapter). Think of the natural hierarchy as a tree with the key attribute at the bottom. Any attribute at or above the related attribute will be related to the measure group and behave as expected. Any attribute below or on a different branch from the related attribute will have “no relationship,” as described in the preceding section.
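The rule about non-key granularity attributes amounts to walking up the attribute relationships from the related attribute. The sketch below assumes a hypothetical Customer dimension whose key attribute rolls up to City, State, and Country on one branch and to Gender on another; the attribute names and the `related_attributes` helper are illustrative only and are not part of Analysis Services.

```python
# Hypothetical attribute relationships for a Customer dimension:
# each attribute maps to the attributes directly above it in the natural hierarchy.
ATTRIBUTE_PARENTS = {
    "Customer": ["City", "Gender"],  # key attribute at the bottom of the tree
    "City": ["State"],
    "State": ["Country"],
    "Country": [],
    "Gender": [],
}


def related_attributes(granularity: str) -> set[str]:
    """Attributes at or above the granularity attribute (reachable by walking up)."""
    related, stack = set(), [granularity]
    while stack:
        attribute = stack.pop()
        if attribute not in related:
            related.add(attribute)
            stack.extend(ATTRIBUTE_PARENTS[attribute])
    return related


print(sorted(related_attributes("City")))  # ['City', 'Country', 'State']
```

Relating the measure group at City leaves Customer (below) and Gender (another branch) with no relationship, which matches the behavior described above.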

Fact

Fact dimensions are those derived directly from the fact table when a fact table contains both fact and dimension data. No settings are required beyond the relationship type. Only one dimension can have a fact relationship with a given measure group, effectively requiring a single fact dimension per fact table containing all dimension data in that fact table.

Referenced

When dimension tables are connected to a fact table in a snowflake schema, the dimension could be implemented as a single dimension that has a regular relationship with the measure group, or as a regular dimension plus one or more referenced dimensions. A referenced dimension is indirectly related to the measure group through another dimension. The single dimension with a regular relationship is certainly simpler, but if a referenced dimension can be created once and used with multiple chains of different regular dimensions (e.g., a Geography dimension used with both Store and Customer dimensions), then the referenced option will be more efficient in both storage and processing. Referenced relationships can chain together dimensions to any depth.

Create the referenced relationship in the Define Relationship dialog by selecting an intermediate dimension through which the referenced dimension relates to the measure group. Then choose the attributes by which the referenced and intermediate dimensions relate. Normally, the Materialize option should be selected for best performance.
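Conceptually, a referenced relationship resolves each fact row through the intermediate dimension before reaching the referenced one. This Python sketch, with hypothetical Store and Geography tables, shows that two-hop lookup; it says nothing about how Analysis Services stores or materializes the link internally.

```python
from collections import defaultdict

# Hypothetical snowflake: fact -> Store (intermediate) -> Geography (referenced).
STORES = {
    1: {"name": "Downtown", "geo_key": 10},
    2: {"name": "Airport", "geo_key": 20},
}
GEOGRAPHY = {10: "Seattle", 20: "Portland"}
SALES = [(1, 50.0), (2, 30.0), (1, 20.0)]  # (store_key, amount)


def sales_by_geography(sales):
    """Aggregate sales by geography by walking fact -> store -> geography."""
    totals = defaultdict(float)
    for store_key, amount in sales:
        geo_key = STORES[store_key]["geo_key"]  # hop through the intermediate dimension
        totals[GEOGRAPHY[geo_key]] += amount
    return dict(totals)


print(sales_by_geography(SALES))  # {'Seattle': 70.0, 'Portland': 30.0}
```

A Customer table carrying the same `geo_key` column could reuse the same GEOGRAPHY table, which is the reuse scenario that makes the referenced option attractive.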

Many-to-many

Relationships discussed so far have all been one-to-many: One store has many sales transactions, one country has many customers. For an example of a many-to-many relationship, consider tracking book sales by book and author, whereby each book can have many authors and each author can create many books. The many-to-many relationship can be modeled in Analysis Services, but it requires a specific configuration beginning with the data source view (see Figure 71-8). The many-to-many relationship is implemented via an intermediate fact table that lists each pairing of the regular and many-to-many dimensions. For other, slightly simpler applications, the regular dimension can be omitted and the intermediate fact table related directly to the fact table.

FIGURE 71-8

Example of a many-to-many relationship: the fact table FactSalesItem (SalesItemID, BookID) relates to the regular dimension dimBook (BookID, Title), and the intermediate fact table FactBookAuthor (BookID, AuthorID) links dimBook to the many-to-many dimension dimAuthor (AuthorID, FirstName, LastName).

The Define Relationship dialog requires only the name of a measure group created on the intermediate fact table to configure the many-to-many relationship. Other configuration is derived from the data source view.

Many-to-many relationships have the query side effect of generating result sets that don’t total in an intuitive way. Using the book sales example, assume that many of the books sold have multiple authors. A query showing books by author will display a list of numbers whose arithmetic total is greater than the total number of books sold. Often this will be expected and understood behavior, although some applications will require MDX scripting to achieve the desired behavior in all views of the cube.
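The totals effect is easy to reproduce with the tables from Figure 71-8. In the following sketch the rows are hypothetical: three sales items, two of them for a book with two authors. Counting sales per author through the intermediate fact table yields an arithmetic total larger than the number of items sold.

```python
from collections import Counter

# Hypothetical rows shaped like the tables in Figure 71-8.
FACT_SALES_ITEM = [(1, "B1"), (2, "B1"), (3, "B2")]               # (SalesItemID, BookID)
FACT_BOOK_AUTHOR = [("B1", "Ann"), ("B1", "Raj"), ("B2", "Ann")]  # (BookID, AuthorID)


def books_sold_by_author():
    """Count sales items per author by joining through the intermediate fact table."""
    authors_of = {}
    for book_id, author_id in FACT_BOOK_AUTHOR:
        authors_of.setdefault(book_id, []).append(author_id)
    counts = Counter()
    for _sales_item_id, book_id in FACT_SALES_ITEM:
        for author_id in authors_of.get(book_id, []):
            counts[author_id] += 1
    return counts


by_author = books_sold_by_author()
print(dict(by_author))                                # {'Ann': 3, 'Raj': 2}
print(sum(by_author.values()), len(FACT_SALES_ITEM))  # 5 3
```

In the cube, the All level of the author hierarchy would normally still report 3, because each fact row is counted once at that level; only the per-author rows add up to more.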

Calculations

The Calculations tab enables the definition of calculated measures, sets of dimension members, and dynamic control over cube properties. While the Calculations tab offers forms to view many of the objects defined here, the underlying language is MDX (Multidimensional Expressions), so details on how to manipulate calculations are covered in the next chapter.

For more information about defining scripting, see Chapter 72, ‘‘Programming MDX Queries.’’

KPIs

A Key Performance Indicator (KPI) is a server-side calculation meant to define an organization’s most important metrics. These metrics, such as net profit, client utilization, or funnel conversion rate, are frequently used in dashboards or other reporting tools for distribution at all levels throughout the organization. Using a KPI to host such a metric helps ensure consistent calculation and presentation.

Within the KPIs view, an individual KPI consists of several components:

■ The actual value of the metric, entered as an MDX expression that calculates the metric.

■ The goal for the metric: for example, what the budget says net profit should be. The goal is entered as an MDX expression that calculates the metric’s goal value.

■ The status for the metric, comparing the actual and goal values. This is entered as an MDX expression that returns values from -1 (very bad) to +1 (very good). A graphic can also be chosen as a suggestion to applications that present KPI data, helping to keep the presentation consistent across applications.

■ The trend for the metric, showing which direction the metric is headed. Like status, trend is entered as an MDX expression that returns values from -1 to +1, with a suggested graphic.
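Status and trend are simply expressions that map the actual, goal, and prior-period values onto the -1 to +1 range. The thresholds below (at or above goal is +1, within 10 percent below goal is 0, otherwise -1) are a hypothetical business rule, and the logic is written in Python only to show the shape of such an expression; in a cube it would be written in MDX.

```python
def kpi_status(actual: float, goal: float) -> int:
    """Hypothetical rule: +1 at or above goal, 0 within 10% below goal, -1 otherwise."""
    if goal <= 0:
        raise ValueError("this ratio-based rule assumes a positive goal")
    ratio = actual / goal
    if ratio >= 1.0:
        return 1
    if ratio >= 0.9:
        return 0
    return -1


def kpi_trend(current: float, previous: float) -> int:
    """Hypothetical rule: +1 if improving versus the prior period, -1 if worsening."""
    if current > previous:
        return 1
    if current < previous:
        return -1
    return 0


print(kpi_status(95.0, 100.0), kpi_trend(95.0, 80.0))  # 0 1
```

Keeping status and trend to a few discrete values makes it easy for client applications to map them onto the suggested graphics.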

As KPI definitions are entered, use the toolbar to switch between form (definition) and browser mode to view results. The Calculation Tools pane (lower left) provides cube metadata and the MDX functions list for drag-and-drop creation of MDX expressions. The Templates tab provides templates for some common KPIs.

Actions

The Actions tab of the Cube Designer provides a way to define actions that a client can perform in a given context. For example, a drillthrough action can show the detailed rows behind a total, or a reporting action can launch a report based on a dimension attribute’s value. Actions can be specific to any displayed data, including individual cells and dimension members, enabling more detailed analysis or even integration of the analysis application into a larger data management framework.

New in 2008

Drillthrough actions now use cube data to display their results. Prior versions required access to the underlying relational data to display detail data.

Partitions

Partitions are the unit of storage in Analysis Services, storing the data of a measure group. Initially, the Cube Designer creates a single MOLAP partition for each measure group. MOLAP is the preferred storage mode for most scenarios, but setting partition sizes and aggregations is key to both effective processing and efficient query execution.

Partition sizing

Cube development normally begins by using a small but representative slice of the data, yet production volumes are frequently quite large, with cubes summarizing a billion rows or more per quarter. A partitioning strategy is needed to manage data through both the relational and Analysis Services databases, beginning with the amount of data to be kept online and the size of the partitions that will hold that data.

The amount of data to be kept online is a trade-off between the desire for access to historical data and the cost of storing that data. Once the retention policy has been determined, there are many possible ways to partition that data into manageable chunks, but a time-based approach is widely used, usually keeping either a year’s or a month’s worth of data in a single partition. For partitions being populated at the front end, the size of the partition matters for the time it takes to process; processing time should be kept to a few hours at most. For partitions being deleted at the back end, the size of the partition determines how much data is removed at one time.

Matching the partition size and retention between the relational database and Analysis Services is a simple and effective approach. As the number of rows imported each day grows, smaller partition sizes (such as a week or a day) may be required to expedite initial processing. As long as the aggregation design is consistent across partitions, Analysis Services will allow smaller partitions to be merged, keeping the overall count at a manageable level.
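A time-based retention policy reduces to a sliding window of partition keys. This sketch assumes monthly partitions identified by hypothetical `YYYY_MM` keys and a retention period expressed in months; it only computes which partitions to create and which to drop, leaving the actual partition management to the tools described in this section.

```python
from datetime import date


def month_key(d: date) -> str:
    """Hypothetical partition key for the month containing d, e.g. '2008_04'."""
    return f"{d.year:04d}_{d.month:02d}"


def months_back(d: date, n: int) -> date:
    """First day of the month that is n months before d's month."""
    index = d.year * 12 + (d.month - 1) - n
    return date(index // 12, index % 12 + 1, 1)


def rolling_window(today: date, retention_months: int, existing: set[str]):
    """Return (to_create, to_drop) partition keys for a monthly sliding window."""
    wanted = {month_key(months_back(today, i)) for i in range(retention_months)}
    return sorted(wanted - existing), sorted(existing - wanted)


existing = {"2008_01", "2008_02", "2008_03"}
print(rolling_window(date(2008, 4, 15), 3, existing))
# (['2008_04'], ['2008_01'])
```

Using whole months keeps the front-end partition small enough to process quickly while the back-end drop removes a predictable amount of data.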

Best Practice

Take time to consider retention, processing, and partitioning strategies before an application goes into production. Once they are in place, changes may be very expensive given the large quantities of data involved.

Creating partitions

The key to accurate partitions is including every data row exactly once. Because the cube reports the combination of all partitions, including rows multiple times will inflate the results. A common mistake is to add new partitions while forgetting to delete the default partition created by the Cube Designer: because the new partitions contain one copy of all the source data, and the default partition contains another, cube results are exactly double the true values.
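One way to guard against double counting is to check, before deploying, that every source row falls into exactly one partition’s filter. The sketch below assumes partitions are defined by hypothetical half-open date ranges on the fact table’s date; it flags rows that match no partition or several, including the case of a forgotten default partition that reads the whole table.

```python
from datetime import date

# Hypothetical partitions as half-open [start, end) ranges on the fact date.
PARTITIONS = {
    "Orders_2003": (date(2003, 1, 1), date(2004, 1, 1)),
    "Orders_2004": (date(2004, 1, 1), date(2005, 1, 1)),
}


def coverage_problems(row_dates, partitions):
    """Return (row_date, matching_partitions) for rows not in exactly one partition."""
    problems = []
    for d in row_dates:
        hits = [name for name, (start, end) in partitions.items() if start <= d < end]
        if len(hits) != 1:
            problems.append((d, hits))
    return problems


rows = [date(2003, 6, 1), date(2004, 1, 1), date(2004, 12, 31)]
print(coverage_problems(rows, PARTITIONS))  # []

# A forgotten default partition that reads the whole table:
with_default = dict(PARTITIONS, Default=(date.min, date.max))
print(len(coverage_problems(rows, with_default)))  # 3 (every row counted twice)
```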

The partition view consists of one collapsible pane for each measure group, each pane containing a grid listing the currently defined partitions for that measure group. Highlighting a grid row selects that partition and displays its associated properties in the Properties pane.

Start the process of adding a partition by clicking the New Partition link, which launches a series of Partition Wizard dialogs:

■ Specify Source Information: Choose the appropriate measure group (the default is the measure group selected when the wizard is launched). If the source table is included as part of the data source view, then it will appear in the Available tables list and can be selected there. If the source table is not part of the data source view, then choose the appropriate data source from the Look in list and press the Find Tables button to list other tables with the same structure. Optionally, enter a portion of the source table’s name in the Filter Tables text box to limit the list of tables returned.

■ Restrict Rows: If the source table contains exactly the rows to be included in the partition, then skip this page. If the source table contains more rows than should be included in the partition, then select the “Specify query to restrict rows” option, and the Query box will be populated with a complete SELECT query missing only the WHERE clause. Supply the missing constraint(s) in the Query window and press the Check button to validate the syntax.

■ Processing and Storage Locations: The defaults will suffice for most situations. If necessary, choose options to balance load across disks and servers.

■ Completing the Wizard: Supply a name for the partition, generally the same name as the measure group suffixed with the partition slice (e.g., Internet_Orders_2004). If aggregations have not been defined, define them now. If aggregations have already been defined for another partition, then copy these existing aggregations from that partition to ensure consistency across partitions.
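The WHERE clause supplied on the Restrict Rows page is easiest to keep consistent when it is generated from the partition slice. This Python sketch is an assumption-laden illustration: the table `dbo.FactInternetSales`, the integer `OrderDateKey` column in YYYYMMDD form, and the `partition_query` helper are hypothetical, and `SELECT *` stands in for the column list the wizard actually generates. Half-open ranges (`>=` start, `<` end) keep adjacent partitions from overlapping.

```python
def partition_query(base_select: str, date_column: str,
                    start_key: int, end_key: int) -> str:
    """Append a half-open date filter; integer keys only, so no quoting is needed."""
    return (f"{base_select}\n"
            f"WHERE {date_column} >= {start_key} AND {date_column} < {end_key}")


# Hypothetical source query; the wizard would supply the real column list.
BASE = "SELECT * FROM dbo.FactInternetSales"

for year in (2003, 2004):
    start, end = year * 10000 + 101, (year + 1) * 10000 + 101  # e.g. 20030101
    print(f"-- Internet_Orders_{year}")
    print(partition_query(BASE, "OrderDateKey", start, end))
```

Because each partition’s end key is the next partition’s start key, every row lands in exactly one slice.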

Once a partition has been added, its name and source can be edited by clicking in the appropriate cell in the partition grid.

Aggregation design

The best trade-off between processing time, partition storage, and query performance is to define only aggregations that help answer queries commonly run against a cube. Analysis Services’ usage-based optimization tracks queries run against the cube and then designs aggregations to meet that query load. However, a representative query history usually requires a period of production use, so the aggregations can also be based on intelligent guesses.

New in 2008

The Cube Designer now includes an Aggregations tab that allows summary and detailed views of aggregations for each partition. It introduces the concept of named aggregation designs, which are groups of aggregations specific to a measure group that can be assigned to its associated partitions.

A good approach is to first create a modest number of aggregations using the Aggregation Design Wizard and assign that design to all active partitions. Then deploy the cube and collect a realistic query history by enabling query logging (see the Analysis Server “Log” properties by right-clicking the server in SQL Server Management Studio). Finally, use the query log to generate a more efficient aggregation design based on usage-based optimization.

Aggregation Design Wizard

The Aggregation Design Wizard creates aggregations based on intelligent guesses. Invoke the wizard from the toolbar on the Aggregations tab of the Cube Designer. The wizard steps through several pages:

■ Select Partitions to Modify: Each run of the wizard is specific to the measure group selected when the wizard is invoked. Check all the partitions to be updated with the new aggregation design. At least one partition must be selected, and designs can also be moved to other partitions later.

■ Review Aggregation Usage: All the attributes for every dimension related to the measure group are presented with their usage settings. The default generally suffices, but options include the following:

■ Full: Include this attribute in every aggregation.

■ None: Don’t include this attribute in any aggregation.

■ Unrestricted: Consider this attribute for inclusion in the design without restrictions.

■ Specify Object Counts: Accurate row counts for each partition and dimension table drive how aggregations are calculated. Pressing the Count button will provide current row counts, with the Estimated Count reflecting the total number of rows currently in the database, and the Partition Count reflecting the number of rows that will be included in the first partition. Numbers can be entered manually if the current data source is different from the target design (e.g., a small development data set).

■ Set Aggregation Options: This page actually designs the aggregations. Options on the left tell the designer when to stop creating new aggregations, while the graph on the right shows estimated storage versus performance gain. Press the Continue button to create an aggregation design before pressing the Next button.

There are no strict rules, but some general guidelines may help:

■ Unless storage is the primary constraint, target an initial performance gain of 10–20 percent. On the most complex cubes this will be difficult to obtain with a reasonable number of aggregations (and associated processing time). On simpler cubes more aggregations can be afforded, but queries are already so fast that the additional aggregations don’t buy much.

■ Keep the total number of aggregations under 200 (the aggregation count is shown at the bottom, just above the progress bar).

■ Look for an obvious knee (flattening of the curve) in the storage/performance graph and stop there.

■ Completing the Wizard: Give the new aggregation design a name. Choose either to save the design or to save and process it.
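The “knee” guideline can be made more concrete with a simple geometric heuristic: pick the point on the storage/performance curve that lies farthest from the straight line joining its endpoints. The sketch below applies that heuristic to a hypothetical curve; it is one common way to locate a knee, not the method the Aggregation Design Wizard uses.

```python
import math


def knee_index(points):
    """Return the index of the point farthest from the chord joining the endpoints.

    points: (storage, performance_gain) pairs sorted by storage.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    if chord == 0:
        return 0

    def distance(point):
        x, y = point
        # Perpendicular distance from (x, y) to the line through the endpoints.
        return abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / chord

    return max(range(len(points)), key=lambda i: distance(points[i]))


# Hypothetical (storage MB, % performance gain) readings from a design run.
curve = [(0, 0), (1, 10), (2, 15), (4, 18), (8, 20)]
print(curve[knee_index(curve)])  # (2, 15)
```

Past the knee, each extra megabyte buys little additional gain, which is the point at which the guideline suggests stopping.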

Best Practice

The best aggregations are usage-based: collect usage history in the query log and use it to optimize each partition’s aggregation design periodically. Query logging must be enabled in the Analysis Server’s server properties, in the Log\QueryLog section: set CreateQueryLogTable to true, define a QueryLogConnectionString, and specify a QueryLogTableName.
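The idea behind usage-based optimization is to favor attribute combinations that the logged queries actually request. The following sketch is a deliberately simplified illustration with a hypothetical log of (attribute set, duration) pairs; it ranks combinations by total query time and is neither the algorithm Analysis Services uses nor a reader for the real query log table.

```python
from collections import Counter

# Hypothetical query-log entries: (attributes the query grouped by, duration in ms).
QUERY_LOG = [
    (frozenset({"Calendar Year", "Product Line"}), 900),
    (frozenset({"Calendar Year", "Product Line"}), 1100),
    (frozenset({"Quarter", "Country"}), 300),
    (frozenset({"Calendar Year"}), 50),
]


def candidate_aggregations(log, k):
    """Rank attribute combinations by the total time spent answering them."""
    cost = Counter()
    for attributes, duration_ms in log:
        cost[attributes] += duration_ms
    return [attributes for attributes, _ in cost.most_common(k)]


for attributes in candidate_aggregations(QUERY_LOG, 2):
    print(sorted(attributes))
# ['Calendar Year', 'Product Line']
# ['Country', 'Quarter']
```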

Aggregations tab

The toolbar of this tab can launch both the wizard described earlier and the usage-based optimization wizard. The pane itself toggles between standard and advanced views. The standard view lists all the measure groups and summarizes which aggregation designs are assigned to which partitions. Right-click a design’s name to assign partitions to it.

The advanced view allows detailed exploration and manual modification of an aggregation design. Choose the measure group and design name in the header, and a table of dimensions vs. individual aggregations appears. Any check that appears in the table indicates that the aggregation (such as A5) includes summaries by the indicated dimension attributes (such as Product Line and Quarter). Manual updates to a design are generally not effective because usage-based optimization tends to be more accurate than individual judgment, but cases do arise in which problem queries can be addressed by a well-placed aggregation. Use the toolbar to copy an existing design to a new name, and then modify it as needed. New columns (aggregations) can be copied or added to the table using the toolbar as well.
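Whether an aggregation such as A5 can help a query depends on whether the queried attributes can be rolled up from the attributes the aggregation stores. This Python sketch assumes hypothetical attribute relationships (Month to Quarter to Calendar Year, Product to Product Line) and additive measures; it is a simplified model of the idea, not the server’s actual aggregation-matching logic.

```python
# Hypothetical attribute relationships: attribute -> attributes directly above it.
PARENTS = {
    "Month": ["Quarter"],
    "Quarter": ["Calendar Year"],
    "Calendar Year": [],
    "Product": ["Product Line"],
    "Product Line": [],
}


def ancestors_or_self(attribute):
    """All attributes reachable by rolling up from `attribute`, including itself."""
    seen, stack = set(), [attribute]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def can_answer(aggregation, query_attributes):
    """True if each queried attribute equals, or rolls up from, a stored attribute."""
    reachable = set()
    for attribute in aggregation:
        reachable |= ancestors_or_self(attribute)
    return set(query_attributes) <= reachable


A5 = {"Product Line", "Quarter"}
print(can_answer(A5, {"Product Line", "Calendar Year"}))  # True
print(can_answer(A5, {"Month"}))                          # False
```

Calendar Year can be summed from stored quarters, but Month is below Quarter and cannot be recovered from A5.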

Perspectives

A perspective is a view of a cube that hides items and functionality not relevant to a specific purpose. Perspectives appear as additional cubes to the end user, so each group within the company can have its own “cube,” each just a targeted view of the same data.

Add a perspective either by right-clicking or by using the toolbar, and a new column will appear. Overwrite the default name at the top of the column with a meaningful name, and then uncheck the items not relevant to the perspective. A default measure can be chosen for the perspective as well: look for the DefaultMeasure object type in the second row of the grid.

Data Storage

The data storage strategy chosen for a cube and its components determines not only how the cube will be stored, but also how it can be processed. Storage settings can be set at three different levels, with parent settings determining defaults for the children:

■ Cube: Begin by establishing storage settings at the cube level to set defaults for the entire cube (dimensions, measure groups, and partitions). Access the Cube Storage Settings dialog by choosing a cube in the Cube Designer and then clicking the ellipsis (…) button on the cube’s Proactive Caching property.

■ Measure Group: Used in the unlikely case that storage settings for a particular measure group differ from the cube defaults. Access the Measure Group Storage Settings dialog either by clicking the ellipsis button on the measure group’s Proactive Caching property in the Cube Designer or by choosing the Storage Settings link in the partition view without highlighting a specific partition.

■ Object level (specific partition or dimension): Sets the storage options for a single object. Access the Dimension Storage Settings dialog by clicking the ellipsis button on the dimension’s Proactive Caching property in the Dimension Designer. Access the Partition Storage Settings dialog by selecting a partition in the partition view and clicking the Storage Settings link.
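The “parent settings determine defaults” rule can be modeled as a fallback lookup from the most specific level to the least specific. In this sketch the measure group names, partition names, and storage-mode values are hypothetical, and `None` stands for “no explicit setting at this level”; it models the defaulting idea only, not how the storage settings dialogs actually persist their choices.

```python
from typing import Optional

CUBE_DEFAULT = "MOLAP"  # hypothetical cube-level storage mode

# None means "no explicit setting at this level; fall back to the parent".
MEASURE_GROUP_SETTINGS: dict[str, Optional[str]] = {
    "Internet Sales": None,
    "Reseller Sales": "HOLAP",
}
PARTITION_SETTINGS: dict[tuple[str, str], Optional[str]] = {
    ("Internet Sales", "Internet_Orders_2004"): None,
    ("Internet Sales", "Internet_Orders_Current"): "ROLAP",
    ("Reseller Sales", "Reseller_Orders_2004"): None,
}


def effective_storage(measure_group: str, partition: str) -> str:
    """The most specific explicit setting wins: partition, then measure group, then cube."""
    for setting in (PARTITION_SETTINGS.get((measure_group, partition)),
                    MEASURE_GROUP_SETTINGS.get(measure_group)):
        if setting is not None:
            return setting
    return CUBE_DEFAULT


for measure_group, partition in PARTITION_SETTINGS:
    print(measure_group, partition, effective_storage(measure_group, partition))
```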

Each of the storage settings dialogs is essentially the same, differing only in the scope of the setting’s effect. The main page of the dialog contains a slider that selects preconfigured option settings, from the most real-time (far left) to the least real-time (far right). Each “stop” on the slider displays a summary of the options available. Alternatively, position the slider and click the Options button to examine the options associated with a particular position. Beyond these few presets, the Storage Options dialog enables a wide range of behaviors.
