CHAPTER 51 SQL Server 2008 Analysis Services
reduced within the past year or so, and the ease of transparently applying this type of
solution to OLAP is a natural fit. It affects both the OLAP data population process and the
day-to-day what-if usage by the end users. You should keep these types of surgical incisions in
mind when you face OLAP performance issues in this platform. They are easy to apply, the
gains are huge, and you quickly get a return on your investment.
MPP Data Warehouse Option from Microsoft
A few years ago, Microsoft acquired DATAllegro, a massively parallel data warehouse
appliance company. This basically lifted any limitations for data warehousing that SSAS or SQL
Server 2008 R2 itself had. Massively parallel means scaling horizontally on CPU and
storage to grow with your size and processing needs. There is no practical limit here. The
underlying architecture relies on standards-based technologies. Essentially, there is a
separation of storage and compute nodes that allows you to spread out your data across vast
storage (EMC storage) so that it is very shallow (easy to get to quickly across all data
storage). The compute power is also horizontally scalable, so any query can perform its
data access in parallel to surface the data it needs (and assemble it for delivery).
Figure 51.70 shows the high-level architecture of Microsoft’s DATAllegro v3 offering.
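The scatter-and-assemble idea described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not DATAllegro's implementation: the node contents, `scan_node`, and `parallel_total_by_region` are hypothetical names invented for the example, and threads stand in for separate compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each "storage node" holds one horizontal slice of a sales table.
STORAGE_NODES = [
    [("east", 120), ("west", 80)],
    [("east", 30), ("north", 50)],
    [("west", 20), ("north", 10)],
]


def scan_node(rows):
    """Compute a partial aggregate over the slice held by one node."""
    partial = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0) + amount
    return partial


def parallel_total_by_region(nodes):
    """Send the same query to every node at once, then assemble the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(scan_node, nodes))
    combined = {}
    for partial in partials:
        for region, amount in partial.items():
            combined[region] = combined.get(region, 0) + amount
    return combined


totals = parallel_total_by_region(STORAGE_NODES)
print(totals)
```

Because each slice is aggregated where it lives and only the small partial results are combined, adding nodes spreads both the storage and the scanning work, which is the horizontal scaling the appliance relies on.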
Not only is the DATAllegro v3 architecture massively parallel and fast, but the multinode
architecture also makes it highly available. If any node fails, hot spares kick in to pick up
the load. Any failed node can easily be replaced and brought online with zero processing
interruption. Moreover, multiple appliances can be combined on a common InfiniBand
backbone to create large-scale and extremely powerful multitier or hub-and-spoke data
warehouses with rapid, parallel data movement between the various appliances. Believe it
or not, there is an Ingres SQL engine at the heart of the database portion of this appliance.
FIGURE 51.70 The DATAllegro v3 MPP architecture (diagram labels: Ingres compute nodes with 16GB RAM and dual 4GB Fibre Channel controllers, a redundant Cisco InfiniBand network, storage nodes on a dual Fibre Channel network, and a hot spare)
An OLAP Requirements Example: CompSales International
Master Data Services
Completing the business intelligence picture is a new focus on the data quality that is
needed at all tiers of data information delivery. Microsoft has been pouring an enormous
amount of effort (and money) into creating and embedding master data services
throughout its BI and transactional platforms. By using Microsoft’s Master Data Services,
organizations can align operational and analytical data across the enterprise and across
line-of-business systems with a guaranteed level of data quality for most core data
categories (such as customer data, product data, and other core data of the business).
Microsoft has created data stewardship capabilities complete with workflows and
notifications to any business user who might be affected by a core data change. Managing
hierarchies is also an important part of mastering data that has a natural hierarchical structure,
such as customer hierarchies (parent company to subsidiaries and so on). Each master data
change within the system is treated as a transaction; the user, date, and time of each
change are logged, as well as pertinent audit details, such as type of change, member code,
and prior versus new value. In addition to being a very useful audit trail, the transaction
log can be used to selectively reverse changes. Customizable data quality rules create
default values, enable data validation, and trigger actions such as email notifications and
workflows. Rules can be built by IT professionals or business users directly from the
stewardship portal.
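To make the transaction log idea concrete, here is a minimal Python sketch of an append-only change log that records the audit details listed above and supports selective reversal. It does not use the Master Data Services API; `Change`, `MasterDataStore`, `set_attribute`, and `reverse` are hypothetical names, and the customer data is made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    """One logged transaction: who, when, what kind, and prior versus new value."""
    user: str
    timestamp: datetime
    change_type: str
    member_code: str
    attribute: str
    prior_value: object
    new_value: object


class MasterDataStore:
    def __init__(self):
        self.members = {}  # member_code -> {attribute: value}
        self.log = []      # append-only list of Change records

    def set_attribute(self, user, member_code, attribute, value, change_type="update"):
        """Apply a change and log it; returns the index of the log entry."""
        member = self.members.setdefault(member_code, {})
        prior = member.get(attribute)
        member[attribute] = value
        self.log.append(Change(user, datetime.now(timezone.utc), change_type,
                               member_code, attribute, prior, value))
        return len(self.log) - 1

    def reverse(self, user, index):
        """Selectively reverse one logged change by restoring its prior value.

        The reversal is itself logged, so the audit trail is never rewritten.
        """
        change = self.log[index]
        return self.set_attribute(user, change.member_code, change.attribute,
                                  change.prior_value, change_type="reversal")


store = MasterDataStore()
first = store.set_attribute("alice", "CUST-001", "parent", "Acme Holdings")
second = store.set_attribute("bob", "CUST-001", "parent", "Acme Global")
store.reverse("carol", second)
print(store.members["CUST-001"]["parent"])
```

Logging the reversal as a new transaction, rather than deleting the original entry, keeps the history complete. In this simplified sketch, reversing a member's very first change restores the value `None` instead of removing the attribute.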
Microsoft is still getting the kinks out of Master Data Services, so you should look for
much maturing to come in the next few years. Competing products with many
years’ head start already provide this capability to companies around the globe, but Microsoft is
catching up fast.
Security and Roles
Security is straightforward in SSAS. For each database or cube, roles are identified with
varying levels of granularity for users. Roles are used when accessing the data in cubes. The
process works like this: a role is defined, and then individual users or groups are
assigned to that role as members. To create the roles you need for this data, you
right-click the Roles entry in the Solution Explorer and select New Role. Figure 51.71
shows the creation of a database role with process database and read definition permissions.
The other tabs of the role designer allow you to further specify the controls, such as which
members you want to have this role (Membership tab), what data source access you want
(Data Sources tab), which cubes can be used (Cubes tab), what specific cell data the role
has access to (Cell Data tab), what dimensions can be accessed (Dimensions tab), what
dimensional data can be accessed (Dimension Data tab), and what mining structures are
allowed to be used (Mining Structures tab). These are additive. As you can see in Figure
51.72, you can also specify full MDX queries as part of the process of filtering what a
member and role can have access to.
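One way to read "additive" is that a user's effective access is the union of everything granted by the roles that user belongs to. The Python sketch below models only that idea. It is not the SSAS object model, and the role names, the `CompSales` object name, and the permission strings are hypothetical.

```python
# Hypothetical roles: each maps to the set of (object, permission) pairs it grants.
ROLES = {
    "Processors": {("CompSales", "process"), ("CompSales", "read_definition")},
    "Analysts": {("CompSales", "read")},
}

# Hypothetical membership: which roles each user has been assigned.
MEMBERSHIP = {
    "alice": ["Processors", "Analysts"],
    "bob": ["Analysts"],
}


def effective_permissions(user):
    """Union the grants of every role the user is a member of."""
    granted = set()
    for role in MEMBERSHIP.get(user, []):
        granted |= ROLES[role]
    return granted


def is_allowed(user, obj, permission):
    """A user may act only if some role they hold grants that permission."""
    return (obj, permission) in effective_permissions(user)


print(is_allowed("alice", "CompSales", "process"))
print(is_allowed("bob", "CompSales", "process"))
```

Under this model a user with no role membership has no access at all, so adding a role can only widen what a user can do.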
FIGURE 51.71 Creating a database role and permissions in the role designer
FIGURE 51.72 Specifying MDX-based filtering, using the role designer
Summary
This chapter discusses the OLAP approach, SSAS terms, and the tools Microsoft provides to
enable OLAP cubes. It presents a mini-methodology to follow that should help you get an
OLAP project off the ground and running smoothly. These efforts are typically not simple,
and a well-trained data warehouse analyst, BI specialist, or data architect is usually worth
his or her weight in gold because of the results (and value) that can be achieved through
good OLAP cube design.
Sometimes it is difficult to engage end users and get them to use an OLAP cube
successfully. Easy-to-use third-party tools can greatly help with this problem.
From an SSAS point of view, the ease of control of storage methods, dimension creation,
degrees of aggregation, cube partitioning, and usage-based optimization are features that
make this product a serious data warehousing tool. It is getting easier and easier to publish
OLAP data via websites or other means. SSAS is truly the land of the wizards, but having a
wizard lead you through a good OLAP cube design is critical. The wizards significantly
reduce the expense and complexity of a data warehouse or data mart OLAP solution,
enabling you to build many more much-needed solutions for your end users.
This chapter also introduces the new paths Microsoft is pursuing around massively parallel
data warehouse appliances and the integration of Master Data Services into its business
intelligence and transactional fabric to raise levels of performance and data quality
across the board.
The next chapter, “SQL Server 2008 Integration Services,” ventures into the very robust
offering from Microsoft for data enablement, manipulation, and aggregation, not only for
Analysis Services but also for most other production platforms that require complex data
transformations.
SQL Server Integration Services
IN THIS CHAPTER
What’s New with SSIS
SSIS Basics
SSIS Architecture and Concepts
SSIS Tools and Utilities
A Data Transformation Requirement
Running the SSIS Wizard
The SSIS Designer
The Package Execution Utility
Connection Projects in Visual Studio
Change Data Capture Addition with R2
Using bcp
Logged and Nonlogged Operations
As you may be aware, SQL Server 2000’s Data
Transformation Services (DTS) was completely redeployed
into and integrated with the Business Intelligence (BI)
Development Studio, Visual Studio environments, and SQL
Server Management Studio (SSMS). This chapter describes
the SQL Server Integration Services (SSIS) environment and
how SSIS addresses complex data movement and
integration needs.
SSIS focuses on importing, exporting, and transforming data
from one or more data sources to one or more data targets.
This is Microsoft’s version of extraction, transformation,
and loading (ETL) on steroids. Competing ETL products
include Informatica, but Microsoft has simply bundled this
functionality together with SQL Server, thus providing more
reasons to purchase SQL Server and avoid buying any
expensive competing products. Other Microsoft solutions
exist for importing and exporting data (such as the Bulk
Copy Program, bcp), but SSIS can be used for a larger variety
of data transformation purposes, and its strength is in direct
data access and complex data transformation.
If you have existing DTS implementations (that is, DTS
packages), you can convert them to SSIS packages with little
to no effort, or you can simply execute them as is (with
some restrictions).
If you still use the Bulk Copy Program (bcp), a section at the
end of this chapter describes this legacy SQL Server
capability. bcp is still the workhorse of many production
environments and cannot just be discarded every time a new
version of SQL Server comes along. We estimate that bcp
will be around for years to come.
CHAPTER 52 SQL Server Integration Services
The alternatives to SSIS and bcp in the Microsoft SQL Server 2008 environment include
replication, distributed queries, BULK INSERT, and SELECT INTO/INSERT. This chapter
helps you determine how and when to use both SSIS and bcp as opposed to these other
alternatives.
What’s New with SSIS
In SQL Server 2008, Microsoft has further extended the capabilities of SSIS into a much
more comprehensive and robust data integration platform, with the emphasis on the
word platform. The following are some of the highlights of SSIS 2008:
Continued support for SQL Server 2000 Data Transformation Services (DTS). This
includes the DTS runtime, the object model that it exposes, and the dtsrun.exe
command-line utility. This support will likely be deprecated in the next full release
of SQL Server, though. There are several 64-bit restrictions with DTS.
Extensive performance enhancements to leverage caching for lookup
transformations, previously a major performance bottleneck during transformations. This also
includes sharing caches in a single package and between separate packages.
New ADO.NET components for both sources and destinations.
New data profiling tasks and a Data Profile Viewer.
A new Integration Services Connections Project Wizard that speeds the creation of
the connection information needed by packages.
A new script environment called Visual Studio Tools for Applications (VSTA).
VSTA supports both Microsoft Visual Basic 2008 and Visual C# 2008.
Package upgrades from the 2005 (or earlier) format to the 2008 package format.
Enhanced data type handling in the SQL Server Import and Export Wizard and a few
new data types, such as the new Date and Time data types.
SQL statement enhancements that allow you to perform multiple data
manipulations at the same time with MERGE.
The ability to use SQL Server 2008’s Change Data Capture technology from within
Integration Services. This one is really a big deal and has been added for R2 via
Microsoft partners.
The ability to create debug dump files that provide information about your
package’s execution.
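The lookup caching highlight is easier to picture with a small example. The Python sketch below is not SSIS code. It only illustrates why loading a reference table into memory once, and sharing it across several data flows, avoids a per-row round trip to the reference source. The function names and the sample rows are hypothetical.

```python
def build_lookup_cache(reference_rows, key, value):
    """Read the reference data once and index it by the lookup key."""
    return {row[key]: row[value] for row in reference_rows}


def lookup_transform(rows, cache, key, output_column, default=None):
    """Enrich each incoming row from the in-memory cache instead of querying per row."""
    for row in rows:
        enriched = dict(row)
        enriched[output_column] = cache.get(row[key], default)
        yield enriched


# Hypothetical reference table and two independent data flows.
products = [{"sku": "A1", "category": "Bikes"}, {"sku": "B2", "category": "Helmets"}]
orders = [{"order_id": 1, "sku": "A1"}, {"order_id": 2, "sku": "B2"}, {"order_id": 3, "sku": "Z9"}]
returns = [{"return_id": 7, "sku": "B2"}]

# The cache is built once and then shared by both flows.
category_cache = build_lookup_cache(products, "sku", "category")
orders_out = list(lookup_transform(orders, category_cache, "sku", "category", default="Unknown"))
returns_out = list(lookup_transform(returns, category_cache, "sku", "category", default="Unknown"))
print(orders_out)
print(returns_out)
```

The row with SKU `Z9` has no match in the reference data, so it receives the default value rather than failing, which is one common way to handle unmatched lookups.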
SSIS Basics
As the world becomes ever more data oriented, much greater emphasis is being placed on
getting data from one place to another. To complicate matters, data can be stored in many
different formats, contexts, filesystems, and locations. In addition, the data often requires
significant transformation and conversion processing as it is being moved around.
Whether you are trying to move data from Excel to SQL Server, create a data mart (or data
warehouse), or distribute data to heterogeneous databases, you are essentially enabling
someone with data.
FIGURE 52.1 Distributing periodic updates to data marts (diagram labels: master data warehouse; data marts; SQL Server 2000, SQL Server 2005, SQL Server 2008, and Oracle platforms; SSIS on each distribution path)
This section describes the SSIS environment and how it is addressing these needs. As
mentioned earlier, the focus is on importing, exporting, and transforming data from one
or more data sources to one or more data targets.
Common requirements of SSIS might include the following:
Exporting data out of SQL Server tables to other applications and environments (for
example, ODBC or OLE DB data sources or via flat files).
Importing data into SQL Server tables from other applications and environments (for
example, ODBC or OLE DB data sources or via flat files).
Initializing data in some data replication situations, such as initial snapshots.
Aggregating data (that is, data transformation) for distribution to/from data marts or
data warehouses.
Changing the data’s context or format before importing or exporting it (that is, data
conversion).
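The import, conversion, and aggregation requirements above follow the familiar extract, transform, and load shape. The following Python sketch shows that shape using only the standard library, with an in-memory SQLite database standing in for the target. It is not an SSIS package, and the flat-file contents, the `region_sales` table, and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat file: note the inconsistent casing in the region column.
FLAT_FILE = """region,sale_date,amount
East,2008-01-05,100.50
east,2008-01-06,49.50
West,2008-01-05,75.00
"""


def extract(text):
    """Import: read delimited rows from a flat file."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Convert text to numbers, normalize region names, and aggregate by region."""
    totals = {}
    for row in rows:
        region = row["region"].strip().title()
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(conn, totals):
    """Load the aggregated rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS region_sales (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO region_sales (region, total) VALUES (?, ?)", totals
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(FLAT_FILE)))
result = conn.execute("SELECT region, total FROM region_sales ORDER BY region").fetchall()
print(result)
```

Keeping the three stages as separate functions mirrors how a package separates its source, transformation, and destination, so any one stage can be swapped without touching the others.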
Some typical business scenarios for SSIS might include the following:
Enabling data marts to receive data from a master data warehouse through periodic
updates (see Figure 52.1).
Populating a master data warehouse from legacy systems (see Figure 52.2).
Initializing heterogeneous replication subscriber tables on Oracle from a SQL Server
2008 Publisher (see Figure 52.3).
Pulling sales data directly into SQL Server 2008 from an Access or Excel application
(see Figure 52.4).
Exporting static time-reporting data files (that is, flat files) for distribution to remote
consultants.
Importing new orders directly or indirectly from a sales force automation or
distributed sales system.
FIGURE 52.2 Populating a data warehouse from one or more data sources
In general, you need SSIS if any of the following conditions exist:
You need to import data directly into SQL Server from one or more ODBC data
sources, .NET and OLE DB data providers, or via flat files.
You need to export data directly out of SQL Server to one or more ODBC data
sources, .NET and OLE DB data providers, or via flat files.
You need to perform data conversions, data cleansing/data standardization,
transformations, merges, or aggregations on data from one or more data sources for
distribution to one or more data targets. You also need SSIS if you need to access the data
directly via any ODBC data source, .NET or OLE DB data provider, or via flat files.
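As a rough illustration of the cleansing, standardization, and merge condition, the Python sketch below cleans customer records from two sources and merges them on a shared key. It is a simplified stand-in for what an SSIS data flow would do. The alias table, the source rows, and the rule that later sources win are all assumptions made for the example.

```python
# Hypothetical standardization table for free-text country values.
COUNTRY_ALIASES = {
    "usa": "United States",
    "u.s.": "United States",
    "united states": "United States",
    "uk": "United Kingdom",
}


def standardize(record):
    """Cleanse one record: collapse whitespace, fix casing, map country aliases."""
    name = " ".join(record["name"].split()).title()
    country_key = record["country"].strip().lower()
    country = COUNTRY_ALIASES.get(country_key, record["country"].strip())
    return {"customer_id": record["customer_id"], "name": name, "country": country}


def merge_sources(*sources):
    """Merge cleansed records on customer_id; later sources overwrite earlier ones."""
    merged = {}
    for source in sources:
        for record in source:
            clean = standardize(record)
            merged[clean["customer_id"]] = clean
    return [merged[key] for key in sorted(merged)]


crm_rows = [
    {"customer_id": 1, "name": "  acme   corp ", "country": "USA"},
    {"customer_id": 2, "name": "globex", "country": "uk"},
]
billing_rows = [
    {"customer_id": 2, "name": "Globex Ltd", "country": "United Kingdom"},
    {"customer_id": 3, "name": "initech", "country": "Canada"},
]

customers = merge_sources(crm_rows, billing_rows)
print(customers)
```

Which source should win on a conflict is a business decision. The sketch simply lets the last source listed take precedence, and a real data flow would make that rule explicit.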
FIGURE 52.3 Initializing a heterogeneous replication subscriber (such as Oracle)
FIGURE 52.4 Pulling data from other disparate applications