Service view, exposing only what makes sense to them. From the designer’s perspective, limiting the
number of cubes and databases keeps the number of linked dimensions and measures to a minimum.
Using the Cube Wizard has been covered in earlier sections, both from the top-down approach
(see “Analysis Services Quick Start”) using the cube design to generate corresponding relational and
Integration Services packages, and from the bottom-up approach (see “Creating a Cube”). Once the cube
structure has been created, it is refined using the Cube Designer.
Open any cube from the Solution Explorer to use the Cube Designer, shown in Figure 71-7. The Cube
Designer presents information in several tabbed views described in the remainder of this section.
FIGURE 71-7
Cube Designer
Cube structure
The cube structure view is the primary design surface for defining a cube. Along with the ever-present
Solution Explorer and Properties panes, three panes present the cube’s structure:
■ Data Source View: This pane, located in the center of the view, shows a chosen portion of the data
source view on which the cube is built. Each table is color-coded: yellow for fact
tables, blue for dimensions, and white for neither. The tables available can be changed by
right-clicking on the design surface and choosing an option from the context menu.
Right-clicking a table presents options to hide that table or to show related tables. Diagrams defined
within the data source view can be used as well by selecting the Copy Diagram From option
on the context menu. Additionally, the toolbar can be used to toggle between diagram and
tree views of the table and relationship data; the tree view can be very useful for answering
questions about complex diagrams.
■ Measures: This pane, located in the upper-left section of the view, lists all of the cube’s
measures organized by measure group. Both the toolbar and the context menu toggle between
the tree and grid view of measures.
■ Dimensions: This pane, located in the lower-left section of the view, lists all dimensions
associated with the cube. This list may be a subset of the defined dimensions from the Solution
Explorer if not every dimension is in the current cube. Each dimension in the Dimensions
pane shows the user hierarchies and attributes, and has a link to edit that dimension in the
Dimension Designer.
Because the order in which measures and dimensions appear in their respective lists determines the
order in which users see them presented, the lists can be reordered using either the right-click Move
Up/Move Down options or drag-and-drop while in tree view. As with the Dimension Designer, changes to a
cube must be deployed before they can be browsed.
Measures
Each measure is based on a column from the data source view and an aggregate function. The aggregate
function determines how data is processed from the fact table and how it is summarized. For example,
consider a simple fact table with columns of day, store, and sales amount being read into a cube with a
sales amount measure, a stores dimension, and a time dimension with year, month, and day attributes.
If the aggregate function for the sales amount measure is Sum, then rows are read into the cube’s leaf
level by summing the sales amount for any rows with the same store/day combination. Higher levels,
such as the store/month level, are determined by adding up individual days in that month. However, if
the aggregate function is Min, then the smallest value is saved from all the sales on a given day, and the
store/month level would be determined as the smallest of all the days in that month.
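The Sum versus Min behavior described above can be imitated outside Analysis Services. The following Python sketch is only an illustration: the fact rows, store name, and the roll_up helper are hypothetical, and the code mimics the two-step roll-up (rows to store/day leaves, then leaves to store/month) rather than reproducing how the server actually processes data.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, store, sales_amount).
# The first two rows share the same store/day combination.
fact_rows = [
    ("2008-01-01", "Store A", 10.0),
    ("2008-01-01", "Store A", 4.0),
    ("2008-01-02", "Store A", 7.0),
    ("2008-02-01", "Store A", 3.0),
]

def roll_up(rows, aggregate):
    """Aggregate rows to store/day leaves, then leaves to store/month."""
    leaf_inputs = defaultdict(list)
    for day, store, amount in rows:
        leaf_inputs[(store, day)].append(amount)
    leaves = {key: aggregate(values) for key, values in leaf_inputs.items()}

    month_inputs = defaultdict(list)
    for (store, day), value in leaves.items():
        month_inputs[(store, day[:7])].append(value)  # "YYYY-MM" prefix
    months = {key: aggregate(values) for key, values in month_inputs.items()}
    return leaves, months

sum_leaves, sum_months = roll_up(fact_rows, sum)
min_leaves, min_months = roll_up(fact_rows, min)

print(sum_months[("Store A", "2008-01")])  # January total of the daily sums
print(min_months[("Store A", "2008-01")])  # smallest single sale in January
```

With Sum, the January cell holds the total of all January sales; with Min, it holds the smallest individual sale seen in January, because each level keeps only the minimum of the level beneath it.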
Available aggregate functions include the following:
■ Sum: Adds the values of all children.
■ Min: Minimum value of children.
■ Max: Maximum value of children.
■ Count: Count of the corresponding rows in the fact table.
■ Distinct Count: Counts unique occurrences of the column value (e.g., Unique Customer
Count).
■ None: No aggregation performed. Any value not read directly from the fact table will be null.
■ AverageOfChildren: Averages non-empty children.
■ FirstChild: Value of the first child member as evaluated along the time dimension.
■ FirstNonEmpty: Value of the first non-empty child member as evaluated along the time
dimension.
■ LastChild: Value of the last child member as evaluated along the time dimension.
■ LastNonEmpty: Value of the last non-empty child member as evaluated along the time dimension.
■ ByAccount: Aggregation varies based on the values in the Account Dimension. The dimension’s
Type property must be Accounts, and one of the dimension’s attributes must have the
Type property set to AccountType. The column corresponding to AccountType contains defined strings
that identify the type of account, and thus the aggregation method, to Analysis Services.
The best way to add a new measure is to right-click in the Measures pane and choose New Measure.
Specify the aggregation function and table/column combination in the New Measure dialog. The new
measure will automatically be added to the appropriate measure group. Measure groups are created
for each fact table plus any distinct count measure defined. These groups correspond to different SQL
queries that are run to retrieve the cube’s data.
Beyond measures derived directly from fact tables, calculated measures can be added by the Business
Intelligence Wizard and directly via the calculations view.
For more information about calculated measures, see Chapter 72, “Programming MDX Queries.”
Measures can be presented to the user grouped in folders by setting the DisplayFolder property to
the name of the folder in which the measure should appear. It is also good practice to assign each
measure a default format by setting the FormatString property, either by choosing one of the common
formats from the list or by directly entering a custom format.
Each cube can have a default measure specified if desired, which provides a measure for queries when
no measure is explicitly requested. To set the default measure, select the cube name at the top of the
Measures pane tree view, and set the DefaultMeasure property by selecting a measure from the list.
Cube dimensions
The hierarchies and attributes for each dimension can be either disabled (Enabled and
AttributeHierarchyEnabled properties, respectively) or made invisible (Visible and
AttributeHierarchyVisible properties, respectively) if appropriate for a particular cube context
(see “Visibility and Organization,” earlier in the chapter, for example scenarios). Access these settings
in the Dimensions pane and then adjust the associated properties. These properties are specific to a
dimension’s role in the cube and do not change the underlying dimension design.
Dimensions can be added to the cube by right-clicking the Dimensions pane and choosing New
Dimension. Once the dimension has been added to the cube, review the dimension usage view to ensure that
the dimension is appropriately related to all measure groups.
Dimension usage
The dimension usage view displays a table showing how each dimension is related to each measure
group. With dimensions and measure groups as row and column headers, respectively, each cell of the
table defines the relationship between the corresponding dimension/measure group pair. Drop-down lists
in the upper-left corner enable rows and columns to be hidden to simplify large views.
The Cube Designer creates default relationships based on the data source view relationships, which are
accurate in most cases, although any linked objects require special review because they are not derived
from the data source view. Click the ellipsis in any table cell to launch the Define Relationship
dialog and choose the relationship type. Different relationship types require different mapping information,
as described in the following sections.
No relationship
For a database with more than one fact table, there will likely be dimensions that don’t relate to some
measure groups. Signified by gray table cells with no annotation, this setting is expected for measure
group/dimension pairs that don’t share a meaningful relationship. When a query is run that specifies
dimension information unrelated to a given measure, that dimension information is ignored by default.
Regular
The regular relationship is a fact table relating directly to a dimension table, as in a star schema. Within
the Define Relationship dialog, choose the Granularity attribute as the dimension attribute that relates
directly to the measure group, usually the dimension’s key attribute. Once the granularity attribute has
been chosen, specify the fact table column names that match the granularity attribute’s key columns in
the relationships grid at the bottom of the dialog.
Choosing to relate a dimension to a measure group via a non-key attribute does work, but it must be
considered in the context of the dimension’s natural hierarchy (see “Attribute Relationships,” earlier
in the chapter). Think of the natural hierarchy as a tree with the key attribute at the bottom. Any
attribute at or above the related attribute will be related to the measure group and behave as expected.
Any attribute below or on a different branch from the related attribute will have “no relationship,” as
described in the preceding section.
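The tree rule above can be made concrete with a small sketch. The attribute names and the related_to mapping below are hypothetical (a time dimension whose key attribute Day rolls up to both Month and Week); the function simply walks upward from the chosen granularity attribute to find which attributes stay related to the measure group.

```python
# Hypothetical attribute relationships for a time dimension.
# Each attribute maps to the attributes directly above it in the natural
# hierarchy; "Day" is the key attribute at the bottom of the tree.
related_to = {
    "Day": ["Month", "Week"],
    "Month": ["Quarter"],
    "Quarter": ["Year"],
    "Week": [],
    "Year": [],
}

def attributes_related(granularity):
    """Return the granularity attribute plus every attribute above it."""
    related = set()
    stack = [granularity]
    while stack:
        attribute = stack.pop()
        if attribute not in related:
            related.add(attribute)
            stack.extend(related_to[attribute])
    return related

related = attributes_related("Month")
unrelated = set(related_to) - related

print(sorted(related))    # attributes that behave as expected
print(sorted(unrelated))  # attributes that fall back to "no relationship"
```

Relating the measure group at Month leaves Month, Quarter, and Year usable, while Day (below the granularity) and Week (on a different branch) behave as unrelated.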
Fact
Fact dimensions are those derived directly from the fact table when a fact table contains both fact and
dimension data. No settings are required beyond the relationship type. Only one dimension can have a
fact relationship with a given measure group, effectively requiring a single fact dimension per fact table
containing all dimension data in that fact table.
Referenced
When dimension tables are connected to a fact table in a snowflake schema, the dimension could
be implemented as a single dimension that has a regular relationship with the measure group, or the
dimension could be implemented as a regular dimension plus one or more referenced dimensions. A
referenced dimension is indirectly related to the measure group through another dimension. The single
dimension with a regular relationship is certainly simpler, but if a referenced dimension can be created
and used with multiple chains of different regular dimensions (e.g., a Geography dimension used with
both Store and Customer dimensions), then the referenced option will be more efficient in both storage
and processing. Referenced relationships can chain together dimensions to any depth.
Create the referenced relationship in the Define Relationship dialog by selecting an intermediate
dimension by which the referenced dimension relates to the measure group. Then choose the attributes
by which the referenced and intermediate dimensions relate. Normally, the Materialize option should be
selected for best performance.
Many-to-many
Relationships discussed so far have all been one-to-many: One store has many sales transactions, one
country has many customers. For an example of a many-to-many relationship, consider tracking book
sales by book and author, whereby each book can have many authors and each author can create many
books. The many-to-many relationship can be modeled in Analysis Services, but it requires a specific
configuration beginning with the data source view (see Figure 71-8). The many-to-many relationship
is implemented via an intermediate fact table that lists each pairing of the regular and many-to-many
dimensions. For other slightly simpler applications, the regular dimension can be omitted and the
intermediate fact table related directly to the fact table.
FIGURE 71-8
Example of a many-to-many relationship
[Diagram: the fact table FactSalesItem (SalesItemID PK, BookID FK1) relates to the regular dimension
dimBook (BookID, Title, ...). The intermediate fact table FactBookAuthor (BookID PK/FK1, AuthorID
PK/FK2) pairs dimBook with the many-to-many dimension dimAuthor (AuthorID, FirstName, LastName).]
The Define Relationship dialog only requires the name of a measure group created on the
intermediate fact table to configure the many-to-many relationship. Other configuration is derived from the data
source view.
Many-to-many relationships have the query side effect of generating result sets that don’t total in
an intuitive way. Using the book sales example, assume that many of the books sold have multiple
authors. A query showing books by author will display a list of numbers whose arithmetic total is
greater than the total number of books sold. Often, this will be expected and understood behavior,
although some applications will require MDX scripting to gain the desired behavior in all views of
the cube.
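The totaling side effect is easy to see with a few rows. The sketch below uses made-up data shaped like the tables in Figure 71-8 (the book IDs and author names are hypothetical) and counts sales per author by joining through the intermediate fact table, which is roughly what a books-by-author query amounts to.

```python
# Hypothetical rows shaped like the tables in Figure 71-8.
fact_sales_item = [  # (sales_item_id, book_id)
    (1, "B1"),
    (2, "B1"),
    (3, "B2"),
]
fact_book_author = [  # (book_id, author_id): book B1 has two authors
    ("B1", "Ann"),
    ("B1", "Bob"),
    ("B2", "Ann"),
]

total_books_sold = len(fact_sales_item)

# Credit each sale to every author of the book that was sold.
sales_by_author = {}
for _, sold_book in fact_sales_item:
    for book_id, author_id in fact_book_author:
        if book_id == sold_book:
            sales_by_author[author_id] = sales_by_author.get(author_id, 0) + 1

print(sales_by_author)                # per-author counts
print(sum(sales_by_author.values()))  # arithmetic total of the author rows
print(total_books_sold)               # actual number of books sold
```

Each sale of a co-authored book is credited to both authors, so the per-author rows add up to more than the number of books actually sold, even though every individual row is correct.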
Calculations
The Calculations tab enables the definition of calculated measures, sets of dimension members, and
dynamic control over cube properties. While the Calculations tab offers forms to view many of the
objects defined here, the underlying language is MDX (Multidimensional Expressions), so details on how
to manipulate calculations are covered in the next chapter.
For more information about defining scripting, see Chapter 72, “Programming MDX Queries.”
KPIs
A Key Performance Indicator (KPI) is a server-side calculation meant to define an organization’s most
important metrics. These metrics, such as net profit, client utilization, or funnel conversion rate, are
frequently used in dashboards or other reporting tools for distribution at all levels throughout the
organization. Using a KPI to host such a metric helps ensure consistent calculation and presentation.
Within the KPI’s view, an individual KPI consists of several components:
■ The actual value of the metric, entered as an MDX expression that calculates the metric.
■ The goal for the metric (for example, what the budget says net profit should be). The goal is
entered as an MDX expression that calculates the metric’s goal value.
■ The status for the metric, comparing the actual and goal values. This is entered as an MDX
expression that returns values between -1 (very bad) and +1 (very good). A graphic can also be
chosen as a suggestion to applications that present KPI data, helping to keep the presentation
consistent across applications.
■ The trend for the metric, showing which direction the metric is headed. Like status, trend is entered
as an MDX expression that returns values between -1 and +1, with a suggested graphic.
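The status and trend components above are ordinary calculations that map a comparison onto the -1 to +1 range. In a cube they are written in MDX; the Python sketch below only illustrates the logic, and the function names and the 85 percent warning threshold are assumptions chosen for the example, not values from Analysis Services.

```python
def kpi_status(actual, goal, warn_ratio=0.85):
    """Map actual versus goal onto -1 (very bad), 0 (marginal), +1 (very good).

    warn_ratio is a hypothetical threshold: at or above the goal is good,
    at or above warn_ratio of the goal is marginal, anything lower is bad.
    """
    if goal <= 0:
        raise ValueError("this simple ratio test assumes a positive goal")
    ratio = actual / goal
    if ratio >= 1.0:
        return 1
    if ratio >= warn_ratio:
        return 0
    return -1

def kpi_trend(current, previous):
    """Return +1 if the metric is rising, -1 if falling, 0 if unchanged."""
    if current > previous:
        return 1
    if current < previous:
        return -1
    return 0

print(kpi_status(actual=90_000, goal=100_000))        # within the warning band
print(kpi_trend(current=90_000, previous=80_000))     # metric is rising
```

A client application would typically pair each returned value with the suggested graphic (for example, a traffic light or arrow) chosen in the KPI definition.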
As KPI definitions are entered, use the toolbar to switch between form (definition) and browser mode
to view results. The Calculations Tools pane (lower left) provides cube metadata and the MDX
functions list for drag-and-drop creation of MDX expressions. The Templates tab provides templates for some
common KPIs.
Actions
The Actions tab of the Cube Designer provides a way to define actions that a client can perform for a
given context. For example, a drillthrough action can show detailed rows behind a total, or a reporting
action can launch a report based on a dimension attribute’s value. Actions can be specific to any
displayed data, including individual cells and dimension members, resulting in more detailed analysis or
even integration of the analysis application into a larger data management framework.
New in 2008
Drillthrough actions now use cube data to display their results. Prior versions required access to the
underlying relational data to provide the display of detail data.
Partitions
Partitions are the unit of storage in Analysis Services, storing the data of a measure group. Initially, the
Cube Designer creates a single MOLAP partition for each measure group. MOLAP is the preferred
storage mode for most scenarios, but setting partition sizes and aggregations is key to both effective
processing and efficient query execution.
Partition sizing
Cube development normally begins by using a small but representative slice of the data, yet production
volumes are frequently quite large, with cubes summarizing a billion rows per quarter or more. A partitioning
strategy is needed to manage data through both the relational and Analysis Services databases, beginning
with the amount of data to be kept online and the size of the partitions that will hold that data.
The amount of data to be kept online is a trade-off between the desire for access to historical data and
the cost of storing that data. Once the retention policy has been determined, there are many possible
ways to partition that data into manageable chunks, but a time-based approach is widely used, usually
keeping either a year’s or a month’s worth of data in a single partition. For partitions being populated
on the front end, the size of the partition is important for the time it takes to process; processing time
should be kept to a few hours at most. For partitions being deleted at the back end, the size of the
partition is important for the amount of data it removes at one time.
Matching the partition size and retention between the relational database and Analysis Services is a
simple and effective approach. As the number of rows imported each day grows, smaller partition sizes
(such as week or day) may be required to expedite initial processing. As long as the aggregation design
is consistent across partitions, Analysis Services will allow smaller partitions to be merged, keeping the
overall count at a manageable level.
Best Practice
Take time to consider retention, processing, and partitioning strategies before an application goes into
production. Once in place, changes may be very expensive given the large quantities of data involved.
Creating partitions
The key to accurate partitions is including every data row exactly once. Because it is the combination
of all partitions that is reported by the cube, including rows multiple times will inflate the results. A
common mistake is to add new partitions while forgetting to delete the default partition created by the
Designer; because the new partitions contain one copy of all the source data, and the default partition
contains another, cube results are exactly double the true values.
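The "exactly once" rule can be checked mechanically before deployment. The sketch below is a hypothetical illustration: the fact rows, the partition filters, and the miscounted_rows helper are invented for the example, with each partition modeled as a predicate over the row's year.

```python
from collections import Counter

# Hypothetical fact rows: (row_id, year).
fact_rows = [(1, 2003), (2, 2003), (3, 2004), (4, 2004)]

# Each partition is modeled as a filter on the year column. "default" stands
# in for the unfiltered partition the Designer creates for a measure group.
partitions = {
    "default": lambda year: True,
    "2003": lambda year: year == 2003,
    "2004": lambda year: year == 2004,
}

def miscounted_rows(partition_defs):
    """Return {row_id: times_included} for rows not included exactly once."""
    counts = Counter()
    for keep in partition_defs.values():
        counts.update(row_id for row_id, year in fact_rows if keep(year))
    return {row_id: counts[row_id]
            for row_id, _ in fact_rows if counts[row_id] != 1}

print(miscounted_rows(partitions))  # every row appears twice

yearly_only = {name: keep for name, keep in partitions.items()
               if name != "default"}
print(miscounted_rows(yearly_only))  # empty: each row appears exactly once
```

The same check also reports rows with a count of zero, which corresponds to a gap between partitions rather than an overlap.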
The partition view consists of one collapsible pane for each measure group, each pane containing a grid
listing the currently defined partitions for that measure group. Highlighting a grid row will select that
partition and display its associated properties in the Properties pane.
Start the process of adding a partition by clicking the New Partition link, which launches a series of
Partition Wizard dialogs:
■ Specify Source Information: Choose the appropriate Measure group (the default is the
measure group selected when the wizard is launched). If the source table is included as part
of the data source view, then it will appear in the Available tables list and can be selected there.
If the source table is not part of the data source view, then choose the appropriate data source
from the Look in list and press the Find Tables button to list other tables with the same structure.
Optionally, enter a portion of the source table’s name in the Filter Tables text box to limit the
list of tables returned.
■ Restrict Rows: If the source table contains exactly the rows to be included in the partition,
then skip this page. If the source table contains more rows than should be included in the
partition, then select the “Specify query to restrict rows” option, and the Query box will be
filled with a complete SELECT query missing only the WHERE clause. Supply the
missing constraint(s) in the Query window and press the Check button to validate syntax.
■ Processing and Storage Locations: The defaults will suffice for most situations. If necessary,
choose options to balance load across disks and servers.
■ Completing the Wizard: Supply a name for the partition, generally the same name as the
measure group suffixed with the partition slice (e.g., Internet_Orders_2004). If
aggregations have not been defined, define them now. If aggregations have already been defined
for another partition, then copy these existing aggregations from that partition to ensure
consistency across partitions.
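When many time-based partitions are needed, the names and row restrictions entered in the wizard follow a regular pattern that can be generated ahead of time. The following sketch assumes yearly partitions and a hypothetical date column named OrderDate; it only builds the partition name and the WHERE clause text to paste into the Query box, and it uses half-open date ranges so that no row can fall into two partitions.

```python
from datetime import date

def yearly_partitions(measure_group, date_column, first_year, last_year):
    """Build (partition_name, where_clause) pairs, one per calendar year."""
    specs = []
    for year in range(first_year, last_year + 1):
        start = date(year, 1, 1)
        end = date(year + 1, 1, 1)  # exclusive upper bound
        where = (f"WHERE {date_column} >= '{start.isoformat()}' "
                 f"AND {date_column} < '{end.isoformat()}'")
        specs.append((f"{measure_group}_{year}", where))
    return specs

# "OrderDate" is an assumed column name used only for illustration.
specs = yearly_partitions("Internet_Orders", "OrderDate", 2003, 2004)
for name, where in specs:
    print(name, "->", where)
```

Because each range ends where the next begins, the generated clauses cover every date in the span exactly once, which is the property the partitions must have.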
Once a partition has been added, the name and source can be edited by clicking in the appropriate cell
in the partition grid.
Aggregation design
The best trade-off between processing time, partition storage, and query performance is defining only
aggregations that help answer queries commonly run against a cube. Analysis Services’ usage-based
optimization tracks queries run against the cube and then designs aggregations to meet that query load.
However, representative query history usually requires a period of production use, so the aggregations
can also be based on intelligent guesses.
New in 2008
The Cube Designer now includes an Aggregations tab that allows summary and detailed views of
aggregations for each partition. It introduces the concept of named aggregation designs, which are
groups of aggregations specific to a measure group that can be assigned to its associated partitions.
A good approach is to first create a modest number of aggregations using the Aggregation Design
Wizard and assign that design to all active partitions. Then deploy the cube for use to collect a realistic
query history by enabling query logging (see the Analysis Server “Log” properties by right-clicking
the server in SQL Server Management Studio). Finally, use the query log to generate a more efficient
aggregation design based on usage-based optimization.
Aggregation Design Wizard
The Aggregation Design Wizard will create aggregations based on intelligent guesses. Invoke the wizard
from the toolbar on the Aggregations tab of the Cube Designer. The wizard steps through several pages:
■ Select Partitions to Modify: Each run of the wizard is specific to the measure group selected
when the wizard is invoked. Check all the partitions to be updated with the new
aggregation design. At least one partition must be selected, and designs can also be moved to other
partitions later.
■ Review Aggregation Usage: All the attributes for every dimension related to the measure group
are presented with their usage settings. The default generally suffices, but options include the following:
  ■ Full: Include this attribute in every aggregation.
  ■ None: Don’t include this attribute in any aggregation.
  ■ Unrestricted: Considers this attribute for inclusion in the design without restrictions.
■ Specify Object Counts: Accurate row counts for each partition and dimension table drive how
aggregations are calculated. Pressing the Count button will provide current row counts, with the
Estimated Count reflecting the total number of rows currently in the database, and the Partition
Count reflecting the number of rows that will be included in the first partition.
Numbers can be manually entered if the current data source is different from the target design
(e.g., a small development data set).
■ Set Aggregation Options: This page actually designs the aggregations. Options on the left tell
the designer when to stop creating new aggregations, while the graph on the right provides estimated
storage versus performance gain. Press the Continue button to create an aggregation design before
pressing the Next button.
There are no strict rules, but some general guidelines may help:
  ■ Unless storage is the primary constraint, target an initial performance gain of 10–20 percent.
  On the most complex cubes this will be difficult to obtain with a reasonable number
  of aggregations (and associated processing time). On simpler cubes more aggregations can
  be afforded, but they are already so fast that the additional aggregations don’t buy much.
  ■ Keep the total number of aggregations under 200 (aggregation count is shown at the bottom,
  just above the progress bar).
  ■ Look for an obvious knee (flattening of the curve) in the storage/performance graph and stop there.
■ Completing the Wizard: Give the new aggregation design a name. Choose either to save the
design or to save and process it.
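The guidelines for the Set Aggregation Options page (stay under 200 aggregations and stop at the knee of the curve) can be expressed as a simple rule of thumb. The sketch below is not how the wizard decides anything; it is a hypothetical heuristic applied to made-up sample points read off the storage/performance graph, and the flat_ratio parameter is an assumption that defines "flattened" as a slope below one quarter of the initial slope.

```python
def pick_stop_point(curve, max_aggregations=200, flat_ratio=0.25):
    """Pick a stopping point on a storage/performance curve.

    curve: list of (aggregation_count, storage_mb, gain_percent) samples,
    in ascending order with strictly increasing storage_mb.
    """
    chosen = curve[0]
    first_slope = None
    for prev, cur in zip(curve, curve[1:]):
        if cur[0] > max_aggregations:
            break  # guideline: keep the aggregation count under the cap
        slope = (cur[2] - prev[2]) / (cur[1] - prev[1])  # gain per MB
        if first_slope is None:
            first_slope = slope
        elif slope < flat_ratio * first_slope:
            break  # the curve has flattened, so the knee is at prev
        chosen = cur
    return chosen

# Hypothetical samples: (aggregation_count, storage_mb, gain_percent).
samples = [(0, 0, 0), (20, 10, 8), (50, 30, 16), (90, 80, 19), (150, 200, 20)]
print(pick_stop_point(samples))
```

With these sample points, the gain per megabyte drops sharply after the third sample, so the heuristic stops there, at a 16 percent estimated gain, which falls inside the 10–20 percent target range.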
Best Practice
The best aggregations are usage-based: Collect usage history in the query log and use it to
optimize each partition’s aggregation design periodically. Query logging must be enabled in Analysis
Server’s Server properties, in the Log\QueryLog section: Set CreateQueryLogTable to true, define a
QueryLogConnectionString, and specify a QueryLogTableName.
Aggregations tab
The toolbar of this tab can launch the wizard as described earlier and the usage-based optimization
wizard as well. The pane itself toggles between standard and advanced views. Standard view lists all the
measure groups and summarizes which aggregation designs are assigned to which partitions. Right-click
a design’s name to assign partitions to it.
The advanced view allows detailed exploration and manual modification of an aggregation design.
Choose the measure group and design name in the header, and a table of dimensions vs. individual
aggregations appears. Any check that appears in the table indicates that the aggregation (such as A5)
includes summaries by the indicated dimension attributes (such as Product Line and Quarter). Manual
updates to a design are generally not effective because usage-based optimization tends to be more
accurate than individual judgment, but cases do arise in which problem queries can be addressed
by a well-placed aggregation. Use the toolbar to copy an existing design to a new name, and then
modify as needed. New columns (aggregations) can be copied/added to the table using the toolbar
as well.
Perspectives
A perspective is a view of a cube that hides items and functionality not relevant to a specific purpose.
Perspectives appear as additional cubes to the end user, so each group within the company can have its
own “cube,” each just a targeted view of the same data.
Add a perspective by either right-clicking or using the toolbar, and a new column will appear. Overwrite
the default name at the top of the column with a meaningful handle, and then uncheck the items not
relevant to the perspective. A default measure can be chosen for the perspective as well; look for the
DefaultMeasure object type in the second row of the grid.
Data Storage
The data storage strategy chosen for a cube and its components determines not only how the cube will
be stored, but also how it can be processed. Storage settings can be set at three different levels, with
parent settings determining defaults for the children:
■ Cube: Begin by establishing storage settings at the cube level to set defaults for the entire
cube (dimensions, measure groups, and partitions). Access the Cube Storage Settings dialog by
choosing a cube in the Cube Designer and then clicking the ellipsis on the Proactive Caching
property of the cube.
■ Measure Group: Used in the unlikely case that storage settings for a particular measure group
differ from cube defaults. Access the Measure Group Storage Settings dialog by either clicking
the ellipsis on the measure group’s Proactive Caching property in the Cube Designer or by
choosing the Storage Settings link in partition view without highlighting a specific partition.
■ Object level (specific partition or dimension): Sets the storage options for a single object.
Access the Dimension Storage Settings dialog by clicking the ellipsis on the dimension’s
Proactive Caching property in the Dimension Designer. Access the Partition Storage Settings dialog
by selecting a partition in the partition view and clicking the Storage Settings link.
The storage settings dialogs are essentially the same, differing only in the scope of the setting’s
effect. The main page of the dialog contains a slider that selects preconfigured option settings, from
the most real-time (far left) to the least real-time (far right). Each “stop” on the slider displays a
summary of the options available. Alternatively, position the slider and click the Options button to examine
the options associated with a particular position. Beyond these few presets, the Storage Options dialog
enables a wide range of behaviors.