Data can also be left in the relational database, or ROLAP store, which generally results in the fastest
processing times at the expense of query times. Without aggregations, queries against a ROLAP store
cause the equivalent SQL to be executed as needed. Aggregations can be pre-calculated for ROLAP,
but doing so requires processing all the detailed data, so MOLAP is the preferred option. A relational
database in this context is not limited to SQL Server, but may be any data source for which an OLE DB
provider exists.
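To make the ROLAP behavior concrete, a cube query asking for sales totals by category and year might be answered by generating relational SQL roughly like the following sketch. The table and column names here are illustrative only, not from any particular schema:

```sql
-- Hypothetical SQL of the kind Analysis Services generates against a
-- ROLAP store: the aggregation is computed on demand by the relational
-- engine rather than read from pre-processed MOLAP storage.
SELECT d.Category,
       t.CalendarYear,
       SUM(f.SalesAmount) AS TotalSales
FROM FactSales AS f
JOIN DimProduct AS d ON f.ProductKey = d.ProductKey
JOIN DimTime    AS t ON f.OrderDateKey = t.TimeKey
GROUP BY d.Category, t.CalendarYear;
```

Because each such query scans and groups the detail rows at query time, ROLAP trades slower queries for faster processing.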
A compromise between the speed of MOLAP storage and the need for preprocessing, called proactive
caching, serves queries out of MOLAP storage when possible, but queries the relational database to
retrieve the latest data not yet processed into the MOLAP store.
Finally, the Analysis Services server uses XML for Analysis (XMLA) as its sole protocol, which is why
you see XMLA inside the arrow in Figure 71-1.
Client
Clients communicate with Analysis Services, like any other Web Service, via the Simple Object Access
Protocol (SOAP). Client applications can hide XMLA and SOAP details by using the provided data access
interfaces to access Analysis Services:
■ All .NET languages can use ADOMD.NET
■ Win32 applications (such as C++) can use the OLE DB for OLAP driver
■ Other COM-based applications (such as VB6, VBA, scripting) can use ADOMD
While the server will only speak XMLA via TCP/IP, clients have the option of using the HTTP protocol
for their communications, if an appropriately configured IIS server is available to translate.
In addition to custom applications, Analysis Services can be accessed by several provided tools,
including the following:
■ Business Intelligence Development Studio, for defining database structure
■ SQL Server Management Studio, for managing and querying the server
■ Reporting Services, which can base report definitions on Analysis Services data
■ Excel features and add-ins, for querying and analyzing data
A wide variety of third-party tools are also available to exploit the features of Analysis Services.
Building a Database
An Analysis Services database is built by identifying the data to include in the database, specifying the
relationships between that data, defining dimension structures on that data, and finally building one or
more cubes to combine the dimensions and measures. This section describes the overall process with an
emphasis on gathering the data needed to define the database. Subsequent sections describe the many
facets of dimensions and cubes.
Business Intelligence Development Studio
The process of building an Analysis Services database begins by opening a new Analysis Services project
in the Business Intelligence Development Studio. Each project corresponds to a database that will be
created on the target server when the project is deployed.
Best Practice
Along with opening an Analysis Services project, it is also possible to directly open an existing database in
Business Intelligence Development Studio. While this is a useful feature for examining the configuration
of a running server, changes should be made in a project, deployed first to a development server, and
deployed to production only after testing. Keep the project and related files in source control.
Be sure to set the target server before attempting to deploy your new database. Right-click on the project
in the Solution Explorer and choose Properties. Set the target server in the deployment property page for
the configuration(s) of interest (for example, development vs. production). Taking care with this setup
when you create a project will prevent inadvertently creating a database on the wrong server.
Data sources
Define a data source for each distinct database or other source of data needed for the Analysis Services
database. Each data source encapsulates the connection string, authentication, and properties for reading
a particular set of data. A data source can be defined on any data for which an OLE DB provider exists,
enabling Analysis Services to use many types of data beyond the traditional relational sources.
Start the New Data Source Wizard by right-clicking the Data Sources folder in the Solution Explorer
and selecting the New option. After you view the optional welcome screen, the ‘‘Select how to define a
connection’’ screen appears and presents a list of connections. Select the appropriate connection if it exists.
If the appropriate connection does not exist, bring up the connection manager by clicking the New
button and add it. Within the connection manager, choose an appropriate provider, giving preference to
native OLE DB providers for best performance. Then enter the server name, authentication information,
database name, and any other properties required by the chosen provider. Review entries on the All tab
and test the connection before clicking OK to complete the connection creation.
Work through the remaining wizard screens, choosing the appropriate login (impersonation) information
for the target environment and finally the name of the data source. The choice of impersonation method
depends on how access is granted in your environment. Any method that provides access to the
necessary tables is sufficient for development.
■ Use a specific Windows user name and password allows the entry of the credential to be
used when connecting to the relational database. This option is best when the developer and
target server would not otherwise have access to the necessary data.
■ Use the service account will use the account that the Analysis Server service is logged in under to connect to the relational database. This is the simplest option, provided that the login specified for the service has been granted access to the relational database.
■ Use the credentials of the current user uses the current developer’s login to read the relational database. This can be a good choice for development, but it won’t work when the database is deployed to a server because there is no ‘‘current user.’’
■ Inherit uses the Analysis Services database impersonation method, which defaults to using the service account, but it can be changed in database properties.
When managing multiple projects in a single solution, basing a data source in one project on
information in another project can be useful. For those cases, instead of choosing a connection at the ‘‘Select
how to define a connection’’ window, select the option to ‘‘Create a data source based on another
object.’’ This leads the wizard through the ‘‘Data sources from existing objects’’ page. This page offers
two alternatives:
■ ‘‘Create a data source based on an existing data source in your solution’’ minimizes the number
of places in which connection information must be edited when it changes.
■ ‘‘Create a data source based on an Analysis Services project’’ enables two projects to share data. This functionality is similar to using the Analysis Services OLE DB provider to access
an existing database, but in this case the databases can be developed simultaneously without deployment complications.
Data source view
Whereas a data source describes where to look for tables of data, the data source view specifies which
available tables to use and how they relate to each other. The data source view also associates metadata,
such as friendly names and calculations, with those tables and columns.
Creating the data source view
The following steps create a data source view:
1. Add needed tables and named queries to a data source view.
2. Establish logical primary keys for tables without a primary key.
3. Establish relationships between related tables.
4. Annotate tables/columns with friendly names and calculations.
Begin by creating the data source view via the wizard: Right-click on the Data Source Views folder and
select the New option. There are several pages in the wizard:
■ Select a Data Source: Choose one of the data sources to be included in this data source view. If more than one data source is to be included in the data source view, then the first data source must be a SQL Server data source. Pressing the Advanced button to limit the schemas retrieved can be helpful if there are many tables in the source database.
■ Name Matching: This page appears only when no foreign keys exist in the source database, providing the option of defining relationships based on a selection of common naming conventions. Matching can also be enabled via the NameMatchingCriteria property once the data source view has been created, identifying matches as additional tables are added to an existing view.
■ Select Tables and Views: Move tables to be included from the left pane (available objects)
to the right (included objects) pane. To narrow the list of available objects, enter any part of
a table name in the Filter box and press the Filter button. To add objects related to included
objects, select one or more included objects and press the Add Related Tables button. This
same dialog is available as the Add/Remove Tables dialog after the data source view has been
created.
■ Completing the Wizard: Specify a name for the data source view.
Once the data source view has been created, more tables can be added by right-clicking in the diagram
and choosing Add/Remove Tables. Use this method to include tables from other data sources as well.
Similar to a SQL view, named queries can be defined, which behave as if they were tables. Either
right-click on the diagram and choose New Named Query, or right-click on a table and choose Replace
Table/with New Named Query to bring up a Query Designer to define the contents of the named query.
If the resulting named query will be similar to an existing table, then it is preferable to replace that table,
because the Query Designer will default to a query that is equivalent to the replaced table. Using named
queries avoids the need to define views in the underlying data sources and allows all metadata to be
centralized in a single model.
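As a sketch, a named query is just an ordinary SELECT statement that the data source view then treats as a table. The schema and column names below are hypothetical, chosen only to illustrate the idea of trimming and renaming source columns for the model:

```sql
-- A named query behaves like a table in the data source view.
-- This illustrative query keeps only the columns the model needs
-- and derives a display-friendly full name in a single pass.
SELECT CustomerID,
       FirstName + ' ' + LastName AS CustomerName,
       City,
       StateProvince
FROM dbo.Customer
WHERE IsActive = 1;
```

Defining this in the data source view rather than as a relational view keeps the transformation visible alongside the rest of the model's metadata.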
As tables are added to the data source view, primary keys and unique indexes in the underlying data
source are imported as primary keys in the model. Foreign keys and selected name matches (see Name
Matching presented earlier in the section ‘‘Creating the data source view’’) are automatically imported
as relationships between tables. For cases in which primary keys or relationships are not imported, they
must be defined manually.
For tables without primary keys, select one or more columns that define the primary key in a given
table, right-click and select Set Logical Primary Key. Once primary keys are in place, any tables without
appropriate relationships can be related by dragging and dropping the related columns between tables. If
the new relationship is valid, the model will show the new relationship without additional prompting.
If errors occur, the Edit Relationship dialog will appear. Resolving the error may be as simple as pressing
Reverse to correct the direction of the relationship, as shown in Figure 71-2, or it may take additional
effort depending on the type of error.
A common issue when working with multiple data sources is different data types. For example, a key
in one database may be a 16-bit integer, while another database may store the same information in
a 32-bit integer. This situation can be addressed by using a named query to cast the 16-bit integer as its
32-bit equivalent.
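For instance, a named query along these lines would align the key types; the table and column names are illustrative, not from any particular source system:

```sql
-- Named query aligning key data types across sources: the smallint
-- (16-bit) key from one database is cast to int (32-bit) so it can be
-- related to the 32-bit key used by the other database.
SELECT CAST(LegacyProductID AS int) AS ProductID,
       ProductName
FROM dbo.LegacyProduct;
```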
The Edit Relationship dialog can also be accessed by double-clicking an existing relationship, by
right-clicking the diagram, and from toolbar and menu selections. Be sure to define all relationships, including
relationships between different columns of the fact table and the same dimension table (for example,
OrderDate and ShipDate both relate to the Time dimension table), as this enables role-playing dimension
functionality when a cube is created.
Managing the data source view
As the number of tables participating in the data source view grows, it can become difficult to view
all the tables and relationships at once. An excellent way to manage the complexity is to divide the
tables into a number of diagrams. The Diagram Organizer pane in the upper-left corner of the Data
Source View page is initially populated with a single &lt;All Tables&gt; diagram. Right-click in the Diagram
Organizer pane and choose the New Diagram option to define a new diagram, and then drag and drop
tables from the Tables pane in the lower-left corner to add tables to the new diagram. Alternately, right-click
the diagram and use the Show Tables dialog to include tables currently in the &lt;All Tables&gt; diagram.
However, don’t confuse the Show Tables dialog, which determines which tables appear in a given
diagram, with the Add/Remove Tables dialog, which determines which tables are in
the data source view as a whole.
FIGURE 71-2
The Edit Relationship dialog
Other tools for managing data source views include the following:
■ Tables pane: All the tables in a data source view are listed in the Tables pane. Click on any table, and it will be shown and highlighted in the current diagram (provided the table exists
in the current diagram). You can also drag tables from the Tables pane onto diagrams as an alternative to the Show Tables dialog.
■ Find Table: Invoked from the toolbar or menu, this dialog lists only tables in the current diagram and allows filtering to speed the search process. Once chosen, the diagram shows and highlights the selected table.
■ Locator: The locator tool enables quick scrolling over the current diagram. Find it at the lower-right corner at the intersection of the scroll bars. Click and drag the locator to move around quickly within the diagram.
■ Switch layout: Right-click the diagram to toggle between rectangular and diagonal layout.
The rectangular layout is table oriented and good for understanding many relationships
at once. The diagonal layout is column oriented and thus good for inspecting relationship details.
■ Explore data: Looking at a sample of the data in a table can be very useful when
building a data source view. Right-click any table to open the Explore page, which presents
four tabbed views: The table view provides a direct examination of the sample data, while
the pivot table and pivot chart views enable exploration of patterns in the data. The
chart view shows a series of charts, breaking down the sample data by category based on
columns in the sample data. The columns selected for analysis are adjustable using the
drop-down at the top of the page, as are the basic charting options. The size and type of
sample is adjustable from the Sampling Options button on the page’s toolbar. After
adjusting sampling characteristics, press the Resample button to refresh the currently displayed
sample.
The data source view can be thought of as a cache of underlying schemas that enables a responsive
modeling environment, and like any cache it can become outdated. When the underlying schema
changes, right-click on the diagram and choose Refresh to reflect the latest version of the schema
in the data source view. The refresh function, also available from the toolbar and menu, opens
the Refresh Data Source View dialog, which lists all the changes affecting the data source view.
Before accepting the changes, scan the list for deleted tables, canceling changes if any deleted tables
are found. Inspect the underlying schema for renamed and restructured tables to determine how
equivalent data can be retrieved, and resolve any conflicts before attempting the refresh again. For
example, right-click on a renamed table and choose Replace Table/with Other Table to select the
new table. This approach prevents losing relationship and other context information during the
refresh.
Refining the data source view
One of the strengths of the UDM is that queries against that model do not require an understanding
of the underlying table structures and relationships. However, even the table name itself often conveys
important semantics to the user. For example, referencing a column as accounting.hr.staff
.employee.hourly_rate indicates that this hourly rate is on the accounting server, hr database, staff
schema, and employee table, which suggests this hourly rate column contains an employee pay rate
and not the hourly charge for equipment rental. Because the source of this data is hidden by the unified
dimensional model, these semantics will be lost.
The data source view enables the definition of friendly names for every table and column. It also
includes a description property for every table, column, and relationship. Friendly names and
descriptions enable the preservation of existing semantics and the addition of others as appropriate.
Best Practice
Make the data source view the place where metadata lives. If a column needs to be renamed to give it
context at query time, give it a friendly name in the data source view, rather than renaming a measure
or dimension attribute; the two names are displayed side by side in the data source view and help future
modelers understand how data is used. Use description properties for non-obvious notes, capturing the
results of research required in building and modifying the model.
Add a friendly name or description to any table or column by selecting the item and updating the
corresponding properties in the Properties pane. Similarly, add a description to any relationship by selecting
the relationship and updating the Properties pane, or by entering the description from the Edit
Relationship dialog. The display of friendly names can be toggled by right-clicking the diagram.
Best Practice
Applications and reports based on Analysis Services data are likely a large change for the target
organization. Assign friendly names that correspond to the names commonly used throughout the
organization to help speed adoption and understanding.
Many simple calculations are readily included in the data source view as well. As a rule of thumb, place
calculations that depend on a single row of a single table or named query in the data source view, but
implement multi-row or multi-table calculations in MDX. Add calculations to named queries by coding
them as part of the query. Add calculations to tables by right-clicking the table and choosing New Named
Calculation. Enter a name and any expression the underlying data provider can interpret. For example, if
SQL Server’s relational database is your data source, basic math, null replacement, and data conversion are
all available for creating named calculations (think of any expression that can be written in T-SQL).
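A few sketches of the kind of single-row T-SQL expressions that work as named calculations; the column names are illustrative, and each line below would be entered as a separate named calculation:

```sql
-- Basic math: extended amount computed from two columns of the same row
UnitPrice * OrderQty

-- Null replacement: substitute an empty string for a missing value
ISNULL(MiddleName, '')

-- Data conversion: render a date column as a yyyy-mm-dd string
CONVERT(char(10), OrderDate, 120)
```

Each expression references only the current row, which is what makes it a good fit for the data source view rather than MDX.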
Creating a cube
The data source view forms the basis for creating the cubes, which in turn present data to database
users. Running the Cube Wizard generally provides a good first draft of a cube. Begin by right-clicking
the Cubes folder and selecting New, and then work through these pages:
■ Select Build Method: Choose ‘‘Use existing tables.’’ The ‘‘generate tables in the data source’’
option is outlined previously in the ‘‘Analysis Services Quick Start’’ section. The option ‘‘Create
an empty cube’’ does exactly that, essentially bypassing the wizard.
■ Select Measure Group Tables: Choose the appropriate data source view from the drop-down, and then indicate which tables are to be used as fact tables — meaning they will contain measures. Pressing the Suggest button will make an educated guess about which tables
to check, but the guesses are not accurate in all cases.
■ Select Measures: The wizard presents a list of numeric columns that may be measures from the measure group tables. Check/uncheck columns as appropriate; measures can also be added/removed/adjusted at the conclusion of the wizard. Both Measure Groups and Measures can be renamed from this page.
■ Select Existing Dimensions: If the current project already has dimensions defined, then this page will be displayed to enable those dimensions to be included in the new cube.
Check/uncheck dimensions as appropriate for the created cube.
■ Select New Dimensions: The wizard presents a list of dimensions and the tables that will be used to construct those dimensions. Deselect any dimensions that are not desired or any tables that should not be included in that dimension. Dimensions, but not tables, can be renamed from this page.
■ Completing the Wizard: Enter a name for the new cube and optionally review the measures
and dimensions that will be created.
Upon completion of the wizard, a new cube and associated dimensions will be created.
Dimensions
Recall from the discussion of star schema that dimensions are useful categorizations used to summarize
the data of interest, the ‘‘group by’’ attributes that would be used in a SQL query. Dimensions created by
a wizard generally prove to be good first drafts, but they need refinement before deploying a database to
production.
Background on Business Intelligence and Data Warehousing concepts is presented in
Chapter 70, ‘‘BI Design.’’
Careful study of the capabilities of a dimension reveals a complex topic, but fortunately the bulk of
the work involves relatively simple setup. This section deals first with that core functionality and then
expands into more complex topics in ‘‘Beyond Basic Dimensions.’’
Dimension Designer
Open any dimension from the Solution Explorer to use the Dimension Designer, shown in Figure 71-3.
This designer presents information in four tabbed views:
■ Dimension Structure: Presents the primary design surface for defining the dimension.
Along with the ever-present Solution Explorer and Properties panes, three panes present the
dimension’s structure:
■ Data Source View (right): Shows the portion of the data source view on which the
dimension is built.
■ Attributes (left): Lists each attribute included in the dimension.
■ Hierarchies (center): Provides a space to organize attributes into common drill-down
paths.
■ Attribute Relationships: Displays a visual designer to detail how dimension attributes relate.
■ Translations: Provides a place to define alternative language versions of both object captions
and the data itself.
■ Browser: Displays the dimension’s data as last deployed to the target analysis server.
Unlike data sources and data source views, cubes and dimensions must be deployed before their
behavior (e.g., browsing data) can be observed. The process of deploying a dimension consists of
two parts. First, during the build phase, the dimension definition (or changes to the definition as
appropriate) is sent to the target analysis server. Examine the progress of the build process in the
Output pane. Second, during the process phase, the Analysis Services server queries underlying data
and populates dimension data. Progress of this phase is displayed in the Deployment Progress pane,
usually positioned as a tab of the Properties pane. The Business Intelligence Development Studio
attempts to build or process only the changed portions of the project to minimize the time required for
deployment.
FIGURE 71-3
Dimension Designer with AdventureWorks Customer dimension
New in 2008
Best Practice warnings are now displayed in the Dimension Designer, appearing as either caution icons or
blue underlines. These warnings are informational and are not applicable to every situation. Following
the procedures outlined below, such as establishing hierarchies and attribute relationships, will eliminate
many of these warnings. See the ‘‘Best Practice Warnings’’ topic later in this chapter for additional details.
Attributes
Attributes are the items that are available for viewing within the dimension. For example, a time
dimension might expose year, quarter, month, and day attributes. Dimensions built by the Cube Wizard only
have the key attribute defined. Other attributes must be manually added by dragging columns from the
Data Source View pane to the Attributes pane. Within the attribute list, the key icon denotes the key
attribute (Usage property = Key), which corresponds to the primary key in the source data used to
relate to the fact table. There must be exactly one key attribute for each dimension.
Attribute source columns and ordering
Columns from the data source view are assigned to an attribute’s KeyColumns and NameColumn
properties to drive which data is retrieved in populating the attribute. During processing, Analysis Services
will include both key and name columns in the SELECT DISTINCT it performs against the underlying
data to populate the attribute. The KeyColumns assignment determines which items will be included as
members in the attribute. The optional NameColumn assignment can give a display value to the key(s)
when the key itself is not adequately descriptive.
For example, a product dimension might assign a ProductID to the KeyColumns and the ProductName
to the NameColumn. For the majority of attributes, the single key column assigned when the attribute is
initially created will suffice. For example, an Address attribute in a customer dimension is likely to be
a simple string in the source table with no associated IDs or codes; the default of assigning that single
Address column as the KeyColumns value with no NameColumn will suffice.
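For the product example above, the processing query that populates the attribute would resemble the following sketch (the table name is illustrative):

```sql
-- Roughly what Analysis Services issues when populating a product
-- attribute whose KeyColumns is ProductID and NameColumn is ProductName:
-- both columns appear in the SELECT DISTINCT against the source table.
SELECT DISTINCT ProductID, ProductName
FROM dbo.DimProduct;
```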
Some scenarios beyond the simple case include the following:
■ Attributes with both an ID/code and a name: The approach for this case, which is very
common for dimension table primary keys (key attributes), depends on whether the ID or
code is commonly understood by those who will query the dimension. If the code is common,
then leave its NameColumn blank to avoid hiding the code. Instead, model the ID/Code and
Name columns as separate attributes. If the ID or code is an internal application or warehouse
value, then hide the ID by assigning both the KeyColumns and NameColumn properties on a
single attribute.
■ ID/Code exists without a corresponding name: If the ID or code can take on only a few
values (such as Yes or No), then derive a column to assign as the NameColumn by adding a
named calculation in the data source view. If the ID or code has many or unpredictable values,
then consider adding a new snowflaked dimension table to provide a name.
■ Non-unique keys: It is important that the KeyColumns assigned uniquely identify the
members of a dimension. For example, a time dimension table might identify months with
numbers 1 through 12, which are not unique keys from one year to the next. In this case,
it makes sense to include both year and month columns to provide a good key value. Once
multiple keys are used, a NameColumn assignment is required, so add a named calculation to
the data source view to synthesize a readable name (e.g., Nov 2008) from existing month and
year columns.
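One way to sketch such a named calculation, assuming integer Year and Month columns (the column names and the exact expression are illustrative T-SQL, compatible back to SQL Server 2005):

```sql
-- Named calculation producing a readable member name such as 'Nov 2008'
-- from numeric month and year columns. The fixed string holds the twelve
-- three-letter month abbreviations at known offsets.
SUBSTRING('JanFebMarAprMayJunJulAugSepOctNovDec', ([Month] - 1) * 3 + 1, 3)
    + ' ' + CAST([Year] AS char(4))
```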
In the preceding non-unique keys scenario, it might be tempting to use the named calculation results
(e.g., Jan 2009, Feb 2009) as the attribute’s key column were it not for ordering issues. Numeric year
and month data is required to keep the attribute’s members in calendar, rather than alphabetic, order.
The attribute’s OrderBy property enables members to be sorted by either key or name. Alternately,
the OrderBy options AttributeKey and AttributeName enable sorting of the current attribute’s
members based on the key or name of another attribute, provided that the other attribute has been
defined as a member property of the current attribute. Member properties are described in detail in the
next section.