Data can also be left in the relational database, or ROLAP store, which generally results in the fastest
processing times at the expense of query times. Without aggregations, queries against a ROLAP store
cause the equivalent SQL to be executed as needed. Aggregations can be pre-calculated for ROLAP,
but doing so requires processing all the detailed data, so MOLAP is the preferred option. A relational
database in this context is not limited to SQL Server, but may be any data source for which an OLE DB
provider exists.
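To make the ROLAP behavior concrete, a cube query asking for sales totals by category and year might be answered by generating relational SQL roughly like the following sketch. The table and column names here are illustrative only, not from any particular schema:

```sql
-- Hypothetical SQL of the kind Analysis Services generates against a
-- ROLAP store: the aggregation is computed on demand by the relational
-- engine rather than read from pre-processed MOLAP storage.
SELECT d.Category,
       t.CalendarYear,
       SUM(f.SalesAmount) AS TotalSales
FROM FactSales AS f
JOIN DimProduct AS d ON f.ProductKey = d.ProductKey
JOIN DimTime    AS t ON f.OrderDateKey = t.TimeKey
GROUP BY d.Category, t.CalendarYear;
```

Because each such query scans and groups the detail rows at query time, ROLAP trades slower queries for faster processing.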
A compromise between the speed of MOLAP storage and the need for preprocessing, called proactive
caching, serves queries out of MOLAP storage when possible, but queries the relational database to
retrieve the latest data not yet processed into the MOLAP store.
Finally, the Analysis Services server uses XML for Analysis (XMLA) as its sole protocol, which is why
you see XMLA inside the arrow in Figure 71-1.
Client
Clients communicate with Analysis Services, like any other Web Service, via the Simple Object Access
Protocol (SOAP). Client applications can hide XMLA and SOAP details by using the provided data access
interfaces to access Analysis Services:
■ All .NET languages can use ADOMD.NET
■ Win32 applications (such as C++) can use the OLE DB for OLAP driver
■ Other COM-based applications (such as VB6, VBA, scripting) can use ADOMD
While the server will only speak XMLA via TCP/IP, clients have the option of using the HTTP protocol
for their communications, if an appropriately configured IIS server is available to translate.
In addition to custom applications, Analysis Services can be accessed by several provided tools,
including the following:
■ Business Intelligence Development Studio, for defining database structure
■ SQL Server Management Studio, for managing and querying the server
■ Reporting Services, which can base report definitions on Analysis Services data
■ Excel features and add-ins, for querying and analyzing data
A wide variety of third-party tools are also available to exploit the features of Analysis Services.
Building a Database
An Analysis Services database is built by identifying the data to include in the database, specifying the
relationships between that data, defining dimension structures on that data, and finally building one or
more cubes to combine the dimensions and measures. This section describes the overall process with an
emphasis on gathering the data needed to define the database. Subsequent sections describe the many
facets of dimensions and cubes.
Business Intelligence Development Studio
The process of building an Analysis Services database begins by opening a new Analysis Services project
in the Business Intelligence Development Studio. Each project corresponds to a database that will be
created on the target server when the project is deployed.
Best Practice
Along with opening an Analysis Services project, it is also possible to directly open an existing database in
Business Intelligence Development Studio. While this is a useful feature for examining the configuration
of a running server, changes should be made in a project, deployed first to a development server, and
deployed to production only after testing. Keep the project and related files in source control.
Be sure to set the target server before attempting to deploy your new database. Right-click on the project
in the Solution Explorer and choose Properties. Set the target server in the deployment property page for
the configuration(s) of interest (for example, development vs. production). Taking care with this setup
when you create a project will prevent inadvertently creating a database on the wrong server.
Data sources
Define a data source for each distinct database or other source of data needed for the Analysis Services
database. Each data source encapsulates the connection string, authentication, and properties for reading
a particular set of data. A data source can be defined on any data for which an OLE DB provider exists,
enabling Analysis Services to use many types of data beyond the traditional relational sources.
Start the New Data Source Wizard by right-clicking the Data Sources folder in the Solution Explorer
and selecting the New option. After you view the optional welcome screen, the ‘‘Select how to define a
connection’’ screen appears and presents a list of connections. Select the appropriate connection if it exists.
If the appropriate connection does not exist, bring up the connection manager by clicking the New
button and add it. Within the connection manager, choose an appropriate provider, giving preference to
native OLE DB providers for best performance. Then enter the server name, authentication information,
database name, and any other properties required by the chosen provider. Review entries on the All tab
and test the connection before clicking OK to complete the connection creation.
Work through the remaining wizard screens, choosing the appropriate login (impersonation) information
for the target environment and finally the name of the data source. The choice of impersonation method
depends on how access is granted in your environment. Any method that provides access to the
necessary tables is sufficient for development.
■ Use a specific Windows user name and password allows the entry of the credential to be
used when connecting to the relational database. This option is best when the developer and
target server would not otherwise have access to the necessary data.
■ Use the service account will use the account that the Analysis Server service is logged in under to connect to the relational database. This is the simplest option, provided that the login specified for the service has been granted access to the relational database.
■ Use the credentials of the current user uses the current developer’s login to read the relational database. This can be a good choice for development, but it won’t work when the database is deployed to a server because there is no ‘‘current user.’’
■ Inherit uses the Analysis Services database impersonation method, which defaults to using the service account, but it can be changed in database properties.
When managing multiple projects in a single solution, basing a data source in one project on
information in another project can be useful. For those cases, instead of choosing a connection at the ‘‘Select
how to define a connection’’ window, select the option to ‘‘Create a data source based on another
object.’’ This leads the wizard through the ‘‘Data sources from existing objects’’ page. This page offers
two alternatives:
■ ‘‘Create a data source based on an existing data source in your solution’’ minimizes the number
of places in which connection information must be edited when it changes.
■ ‘‘Create a data source based on an Analysis Services project’’ enables two projects to share data. This functionality is similar to using the Analysis Services OLE DB provider to access
an existing database, but in this case the databases can be developed simultaneously without deployment complications.
Data source view
Whereas a data source describes where to look for tables of data, the data source view specifies which
available tables to use and how they relate to each other. The data source view also associates metadata,
such as friendly names and calculations, with those tables and columns.
Creating the data source view
The following steps create a data source view:
1. Add needed tables and named queries to a data source view.
2. Establish logical primary keys for tables without a primary key.
3. Establish relationships between related tables.
4. Annotate tables/columns with friendly names and calculations.
Begin by creating the data source view via the wizard: Right-click on the Data Source Views folder and
select the New option. There are several pages in the wizard:
■ Select a Data Source: Choose one of the data sources to be included in this data source view. If more than one data source is to be included in the data source view, then the first data source must be a SQL Server data source. Pressing the Advanced button to limit the schemas retrieved can be helpful if there are many tables in the source database.
■ Name Matching: This page appears only when no foreign keys exist in the source database, providing the option of defining relationships based on a selection of common naming conventions. Matching can also be enabled via the NameMatchingCriteria property once the data source view has been created, identifying matches as additional tables are added to an existing view.
■ Select Tables and Views: Move tables to be included from the left pane (available objects)
to the right (included objects) pane. To narrow the list of available objects, enter any part of
a table name in the Filter box and press the Filter button. To add objects related to included
objects, select one or more included objects and press the Add Related Tables button. This
same dialog is available as the Add/Remove Tables dialog after the data source view has been
created.
■ Completing the Wizard: Specify a name for the data source view.
Once the data source view has been created, more tables can be added by right-clicking in the diagram
and choosing Add/Remove Tables. Use this method to include tables from other data sources as well.
Similar to a SQL view, named queries can be defined, which behave as if they were tables. Either
right-click on the diagram and choose New Named Query, or right-click on a table and choose Replace
Table/with New Named Query to bring up a Query Designer to define the contents of the named query.
If the resulting named query will be similar to an existing table, then it is preferable to replace that table,
because the Query Designer will default to a query that is equivalent to the replaced table. Using named
queries avoids the need to define views in the underlying data sources and allows all metadata to be
centralized in a single model.
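As a sketch, a named query is just an ordinary SELECT statement that the data source view then treats as a table. The schema and column names below are hypothetical, chosen only to illustrate the idea of trimming and renaming source columns for the model:

```sql
-- A named query behaves like a table in the data source view.
-- This illustrative query keeps only the columns the model needs
-- and derives a display-friendly full name in a single pass.
SELECT CustomerID,
       FirstName + ' ' + LastName AS CustomerName,
       City,
       StateProvince
FROM dbo.Customer
WHERE IsActive = 1;
```

Defining this in the data source view rather than as a relational view keeps the transformation visible alongside the rest of the model's metadata.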
As tables are added to the data source view, primary keys and unique indexes in the underlying data
source are imported as primary keys in the model. Foreign keys and selected name matches (see Name
Matching presented earlier in the section ‘‘Creating the data source view’’) are automatically imported
as relationships between tables. For cases in which primary keys or relationships are not imported, they
must be defined manually.
For tables without primary keys, select one or more columns that define the primary key in a given
table, right-click and select Set Logical Primary Key. Once primary keys are in place, any tables without
appropriate relationships can be related by dragging and dropping the related columns between tables. If
the new relationship is valid, the model will show the new relationship without additional prompting.
If errors occur, the Edit Relationship dialog will appear. Resolving the error may be as simple as pressing
Reverse to correct the direction of the relationship, as shown in Figure 71-2, or it may take additional
effort depending on the type of error.
A common issue when working with multiple data sources is different data types. For example, a key
in one database may be a 16-bit integer, while another database may store the same information in
a 32-bit integer. This situation can be addressed by using a named query to cast the 16-bit integer as its
32-bit equivalent.
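For instance, a named query along these lines would align the key types; the table and column names are illustrative, not from any particular source system:

```sql
-- Named query aligning key data types across sources: the smallint
-- (16-bit) key from one database is cast to int (32-bit) so it can be
-- related to the 32-bit key used by the other database.
SELECT CAST(LegacyProductID AS int) AS ProductID,
       ProductName
FROM dbo.LegacyProduct;
```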
The Edit Relationship dialog can also be accessed by double-clicking an existing relationship, by
right-clicking the diagram, and from toolbar and menu selections. Be sure to define all relationships, including
relationships between different columns of the fact table and the same dimension table (for example,
OrderDate and ShipDate both relate to the Time dimension table), as this enables role-playing dimension
functionality when a cube is created.
Managing the data source view
As the number of tables participating in the data source view grows, it can become difficult to view
all the tables and relationships at once. An excellent way to manage the complexity is to divide the
tables into a number of diagrams. The Diagram Organizer pane in the upper-left corner of the Data
Source View page is initially populated with a single &lt;All Tables&gt; diagram. Right-click in the Diagram
Organizer pane and choose the New Diagram option to define a new diagram, and then drag and drop
tables from the Tables pane in the lower-left corner to add tables to the new diagram. Alternately, right-click
the diagram and use the Show Tables dialog to include tables currently in the &lt;All Tables&gt; diagram.
However, don’t confuse the Show Tables dialog, which determines which tables appear in a given
diagram, with the Add/Remove Tables dialog, which determines which tables are in
the data source view as a whole.
FIGURE 71-2
The Edit Relationship dialog
Other tools for managing data source views include the following:
■ Tables pane: All the tables in a data source view are listed in the Tables pane. Click on any table, and it will be shown and highlighted in the current diagram (provided the table exists
in the current diagram). You can also drag tables from the Tables pane onto diagrams as an alternative to the Show Tables dialog.
■ Find Table: Invoked from the toolbar or menu, this dialog lists only tables in the current diagram and allows filtering to speed the search process. Once chosen, the diagram shows and highlights the selected table.
■ Locator: The locator tool enables quick scrolling over the current diagram. Find it at the lower-right corner at the intersection of the scroll bars. Click and drag the locator to move around quickly within the diagram.
■ Switch layout: Right-click the diagram to toggle between rectangular and diagonal layout.
The rectangular layout is table oriented and good for understanding many relationships
at once. The diagonal layout is column oriented and thus good for inspecting relationship details.
■ Explore data: Looking at a sample of the data in a table can be very useful when
building a data source view. Right-click any table to open the Explore page, which presents
four tabbed views: The table view provides a direct examination of the sample data, while
the pivot table and pivot chart views enable exploration of patterns in the data. The
chart view shows a series of charts, breaking down the sample data by category based on
columns in the sample data. The columns selected for analysis are adjustable using the
drop-down at the top of the page, as are the basic charting options. The size and type of
sample is adjustable from the Sampling Options button on the page’s toolbar. After
adjusting sampling characteristics, press the Resample button to refresh the currently displayed
sample.
The data source view can be thought of as a cache of underlying schemas that enables a responsive
modeling environment, and like any cache it can become outdated. When the underlying schema
changes, right-click on the diagram and choose Refresh to reflect the latest version of the schema
in the data source view. The refresh function, also available from the toolbar and menu, opens
the Refresh Data Source View dialog, which lists all the changes affecting the data source view.
Before accepting the changes, scan the list for deleted tables, canceling changes if any deleted tables
are found. Inspect the underlying schema for renamed and restructured tables to determine how
equivalent data can be retrieved, and resolve any conflicts before attempting the refresh again. For
example, right-click on a renamed table and choose Replace Table/with Other Table to select the
new table. This approach prevents losing relationship and other context information during the
refresh.
Refining the data source view
One of the strengths of the UDM is that queries against that model do not require an understanding
of the underlying table structures and relationships. However, even the table name itself often conveys
important semantics to the user. For example, referencing a column as accounting.hr.staff
.employee.hourly_rate indicates that this hourly rate is on the accounting server, hr database, staff
schema, and employee table, which suggests this hourly rate column contains an employee pay rate
and not the hourly charge for equipment rental. Because the source of this data is hidden by the unified
dimensional model, these semantics will be lost.
The data source view enables the definition of friendly names for every table and column. It also
includes a description property for every table, column, and relationship. Friendly names and
descriptions enable the preservation of existing semantics and the addition of others as appropriate.
Best Practice
Make the data source view the place where metadata lives. If a column needs to be renamed to give it
context at query time, give it a friendly name in the data source view, rather than renaming a measure
or dimension attribute; the two names are displayed side by side in the data source view and help future
modelers understand how data is used. Use description properties for non-obvious notes, capturing the
results of research required in building and modifying the model.
Add a friendly name or description to any table or column by selecting the item and updating the
corresponding properties in the Properties pane. Similarly, add a description to any relationship by selecting
the relationship and updating the Properties pane, or by entering the description from the Edit
Relationship dialog. The display of friendly names can be toggled by right-clicking the diagram.
Best Practice
Applications and reports based on Analysis Services data are likely a large change for the target
organization. Assign friendly names that correspond to the names commonly used throughout the
organization to help speed adoption and understanding.
Many simple calculations are readily included in the data source view as well. As a rule of thumb, place
calculations that depend on a single row of a single table or named query in the data source view, but
implement multi-row or multi-table calculations in MDX. Add calculations to named queries by coding
them as part of the query. Add calculations to tables by right-clicking the table and choosing New Named
Calculation. Enter a name and any expression the underlying data provider can interpret. For example, if
SQL Server’s relational database is your data source, basic math, null replacement, and data conversion are
all available for creating named calculations (think of any expression that can be written in T-SQL).
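A few sketches of the kind of single-row T-SQL expressions that work as named calculations; the column names are illustrative, and each line below would be entered as a separate named calculation:

```sql
-- Basic math: extended amount computed from two columns of the same row
UnitPrice * OrderQty

-- Null replacement: substitute an empty string for a missing value
ISNULL(MiddleName, '')

-- Data conversion: render a date column as a yyyy-mm-dd string
CONVERT(char(10), OrderDate, 120)
```

Each expression references only the current row, which is what makes it a good fit for the data source view rather than MDX.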
Creating a cube
The data source view forms the basis for creating the cubes, which in turn present data to database
users. Running the Cube Wizard generally provides a good first draft of a cube. Begin by right-clicking
the Cubes folder and selecting New, and then work through these pages:
■ Select Build Method: Choose ‘‘Use existing tables.’’ The ‘‘generate tables in the data source’’
option is outlined previously in the ‘‘Analysis Services Quick Start’’ section. The option ‘‘Create
an empty cube’’ does exactly that, essentially bypassing the wizard.
■ Select Measure Group Tables: Choose the appropriate data source view from the drop-down, and then indicate which tables are to be used as fact tables — meaning they will contain measures. Pressing the Suggest button will make an educated guess about which tables
to check, but the guesses are not accurate in all cases.
■ Select Measures: The wizard presents a list of numeric columns that may be measures from the measure group tables. Check/uncheck columns as appropriate; measures can also be added/removed/adjusted at the conclusion of the wizard. Both Measure Groups and Measures can be renamed from this page.
■ Select Existing Dimensions: If the current project already has dimensions defined, then this page will be displayed to enable those dimensions to be included in the new cube.
Check/uncheck dimensions as appropriate for the created cube.
■ Select New Dimensions: The wizard presents a list of dimensions and the tables that will be used to construct those dimensions. Deselect any dimensions that are not desired or any tables that should not be included in that dimension. Dimensions, but not tables, can be renamed from this page.
■ Completing the Wizard: Enter a name for the new cube and optionally review the measures
and dimensions that will be created.
Upon completion of the wizard, a new cube and associated dimensions will be created.
Dimensions
Recall from the discussion of star schema that dimensions are useful categorizations used to summarize
the data of interest, the ‘‘group by’’ attributes that would be used in a SQL query. Dimensions created by
a wizard generally prove to be good first drafts, but they need refinement before deploying a database to
production.
Background on Business Intelligence and Data Warehousing concepts is presented in
Chapter 70, ‘‘BI Design.’’
Careful study of the capabilities of a dimension reveals a complex topic, but fortunately the bulk of
the work involves relatively simple setup. This section deals first with that core functionality and then
expands into more complex topics in ‘‘Beyond Basic Dimensions.’’
Dimension Designer
Open any dimension from the Solution Explorer to use the Dimension Designer, shown in Figure 71-3.
This designer presents information in four tabbed views:
■ Dimension Structure: Presents the primary design surface for defining the dimension.
Along with the ever-present Solution Explorer and Properties panes, three panes present the
dimension’s structure:
■ Data Source View (right): Shows the portion of the data source view on which the
dimension is built.
■ Attributes (left): Lists each attribute included in the dimension.
■ Hierarchies (center): Provides a space to organize attributes into common drill-down
paths.
■ Attribute Relationships: Displays a visual designer to detail how dimension attributes relate.
■ Translations: Provides a place to define alternative language versions of both object captions
and the data itself.
■ Browser: Displays the dimension’s data as last deployed to the target analysis server.
Unlike data sources and data source views, cubes and dimensions must be deployed before their
behavior (e.g., browsing data) can be observed. The process of deploying a dimension consists of
two parts. First, during the build phase, the dimension definition (or changes to the definition as
appropriate) is sent to the target analysis server. Examine the progress of the build process in the
Output pane. Second, during the process phase, the Analysis Services server queries underlying data
and populates dimension data. Progress of this phase is displayed in the Deployment Progress pane,
usually positioned as a tab of the Properties pane. The Business Intelligence Development Studio
attempts to build or process only the changed portions of the project to minimize the time required for
deployment.
FIGURE 71-3
Dimension Designer with AdventureWorks Customer dimension
New in 2008
Best Practice warnings are now displayed in the Dimension Designer, appearing as either caution icons or
blue underlines. These warnings are informational and are not applicable to every situation. Following
the procedures outlined below, such as establishing hierarchies and attribute relationships, will eliminate
many of these warnings. See the ‘‘Best Practice Warnings’’ topic later in this chapter for additional details.
Attributes
Attributes are the items that are available for viewing within the dimension. For example, a time
dimension might expose year, quarter, month, and day attributes. Dimensions built by the Cube Wizard only
have the key attribute defined. Other attributes must be manually added by dragging columns from the
Data Source View pane to the Attributes pane. Within the attribute list, the key icon denotes the key
attribute (Usage property = Key), which corresponds to the primary key in the source data used to
relate to the fact table. There must be exactly one key attribute for each dimension.
Attribute source columns and ordering
Columns from the data source view are assigned to an attribute’s KeyColumns and NameColumn
properties to drive which data is retrieved in populating the attribute. During processing, Analysis Services
will include both key and name columns in the SELECT DISTINCT it performs against the underlying
data to populate the attribute. The KeyColumns assignment determines which items will be included as
members in the attribute. The optional NameColumn assignment can give a display value to the key(s)
when the key itself is not adequately descriptive.
For example, a product dimension might assign a ProductID to the KeyColumns and the ProductName
to the NameColumn. For the majority of attributes, the single key column assigned when the attribute is
initially created will suffice. For example, an Address attribute in a customer dimension is likely to be
a simple string in the source table with no associated IDs or codes; the default of assigning that single
Address column as the KeyColumns value with no NameColumn will suffice.
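For the product example above, the processing query that populates the attribute would resemble the following sketch (the table name is illustrative):

```sql
-- Roughly what Analysis Services issues when populating a product
-- attribute whose KeyColumns is ProductID and NameColumn is ProductName:
-- both columns appear in the SELECT DISTINCT against the source table.
SELECT DISTINCT ProductID, ProductName
FROM dbo.DimProduct;
```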
Some scenarios beyond the simple case include the following:
■ Attributes with both an ID/code and a name: The approach for this case, which is very
common for dimension table primary keys (key attributes), depends on whether the ID or
code is commonly understood by those who will query the dimension. If the code is common,
then leave its NameColumn blank to avoid hiding the code. Instead, model the ID/Code and
Name columns as separate attributes. If the ID or code is an internal application or warehouse
value, then hide the ID by assigning both the KeyColumns and NameColumn properties on a
single attribute.
■ ID/Code exists without a corresponding name: If the ID or code can take on only a few
values (such as Yes or No), then derive a column to assign as the NameColumn by adding a
named calculation in the data source view. If the ID or code has many or unpredictable values,
then consider adding a new snowflaked dimension table to provide a name.
■ Non-unique keys: It is important that the KeyColumns assigned uniquely identify the
members of a dimension. For example, a time dimension table might identify months with
numbers 1 through 12, which are not unique keys from one year to the next. In this case,
it makes sense to include both year and month columns to provide a good key value. Once
multiple keys are used, a NameColumn assignment is required, so add a named calculation to
the data source view to synthesize a readable name (e.g., Nov 2008) from existing month and
year columns.
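One way to sketch such a named calculation, assuming integer Year and Month columns (the column names and the exact expression are illustrative T-SQL, compatible back to SQL Server 2005):

```sql
-- Named calculation producing a readable member name such as 'Nov 2008'
-- from numeric month and year columns. The fixed string holds the twelve
-- three-letter month abbreviations at known offsets.
SUBSTRING('JanFebMarAprMayJunJulAugSepOctNovDec', ([Month] - 1) * 3 + 1, 3)
    + ' ' + CAST([Year] AS char(4))
```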
In the preceding non-unique keys scenario, it might be tempting to use the named calculation results
(e.g., Jan 2009, Feb 2009) as the attribute’s key column were it not for ordering issues. Numeric year
and month data is required to keep the attribute’s members in calendar, rather than alphabetic, order.
The attribute’s OrderBy property enables members to be sorted by either key or name. Alternately,
the OrderBy options AttributeKey and AttributeName enable sorting of the current attribute’s
members based on the key or name of another attribute, provided that the other attribute has been
defined as a member property of the current attribute. Member properties are described in detail in the
next section.