Data mining with SSAS enables you to create predictions based on trends in historical data stores of virtually any size.. PowerPivot is a new set of technologies that give end users the
Trang 1Beginning SQL Server 2008 for Developers
Accelerated SQL Server 2008
Pro SQL Server 2008 Analysis Services
Our desire is to pave your path to success by sharing our experience We show how SQL Server professionals use SSAS to address and solve real-world challenges every day We discuss SSAS features in detail, including the enhanced Cube and Dimension Designers For administrators, we discuss several areas in server man-agement These include cube processing and processing options, the Performance Monitor, scheduling, and security
Business intelligence solutions require metrics To deliver metrics to your users, you will learn how to define, create, and use key performance indicators (KPIs), cal-culated members, perspectives, and actions Data mining with SSAS enables you
to create predictions based on trends in historical data stores of virtually any size
SQL Server Analysis Services and Microsoft Office 2010 are now tightly
integrat-ed to enable self-service business intelligence capabilities across the enterprise
PowerPivot is a new set of technologies that give end users the tools needed to form complex analysis and data mining at their workspace We introduce you to PowerPivot, helping you get started with this new and important tool
per-We believe you will gain valuable experience with SQL Server Analysis Services,
as well as insights into developing BI solutions, by applying the methods strated throughout this book
Philo Janus & Guy Fouché
THE EXPERT’S VOICE® IN SQL SERVER
Pro
SQL Server 2008 Analysis Services
Philo Janus and Guy Fouché
Create value and competitive advantage through careful mining and analysis of your company’s business data
Covers
Release 2!
Trang 3Pro SQL Server 2008 Analysis Services
Philo Janus Guy Fouché
Trang 4
Pro SQL Server 2008 Analysis Services
Copyright © 2010 by Philo Janus and Guy Fouché All rights reserved No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher
ISBN-13 (pbk): 978-1-4302-1995-8 ISBN-13 (electronic): 978-1-4302-1996-5 Printed and bound in the United States of America 9 8 7 6 5 4 3 2 1 Trademarked names, logos, and images may appear in this book Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights
Publisher and President: Paul Manning Lead Editor: Jonathan Gennick
Technical Reviewers: Dana Hoffman and Fabio Claudio Ferrachiatti Editorial Board: Clay Andres, Steve Anglin, Mark Beckner, Ewan Buckingham, Gary Cornell, Jonathan Gennick, Jonathan Hassell, Michelle Lowman, Matthew Moodie, Duncan Parkes, Jeffrey Pepper, Frank Pohlmann, Douglas Pundick, Ben Renow-Clarke, Dominic Shakeshaft, Matt Wade, Tom Welsh
Coordinating Editors: Candace English and Fran Parnell Copy Editors: Sharon Wilkey and Mary Ann Fugate Compositor: Bytheway Publishing Services Indexer: John Collin
Artist: April Milne Cover Designer: Anna Ishchenko Distributed to the book trade worldwide by Springer Science+Business Media, LLC., 233 Spring Street, 6th Floor, New York, NY 10013 Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com
For information on translations, please e-mail rights@apress.com, or visit www.apress.com
Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use eBook versions and licenses are also available for most titles For more information, reference our Special Bulk Sales–eBook Licensing web page at www.apress.com/info/bulksales
The information in this book is distributed on an “as is” basis, without warranty Although every precaution has been taken in the preparation of this work, neither the author(s) nor Apress shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly
by the information contained in this work
The source code for this book is available to readers at www.apress.com You will need to answer questions pertaining to this book in order to successfully download the code
Trang 5To Jodi Fouché: For her poetry, being my biggest fan, and unequivocal love
— Guy Fouché
Trang 6 CONTENTS
Contents at a Glance
Contents v
About the Authors xiv
About the Technical Reviewers xv
Acknowledgments xvi
Introduction xvii
Chapter 1: Introduction to OLAP 1
Chapter 2: Cubes, Dimensions, and Measures 15
Chapter 3: SQL Server Analysis Services 41
Chapter 4: SSAS Developer and Admin Interfaces 75
Chapter 5: Creating a Data Source View 97
Chapter 6: Creating Dimensions 117
Chapter 7: Building a Cube 167
Chapter 8: Deploying and Processing 195
Chapter 9: MDX 219
Chapter 10: Cube Features 251
Chapter 11: Data Mining 275
Chapter 12: PowerPivot 311
Chapter 13: Administration 333
Chapter 14: User Interfaces 373
Appendix A: Setting Up Adventure Works 431
Appendix B: Data-Mining Resources 435
Index 437
Trang 7 CONTENTS
Contents
Contents at a Glance iv
Contents v
About the Authors xiv
About the Technical Reviewers xv
Acknowledgments xvi
Introduction xvii
Chapter 1: Introduction to OLAP 1
From Pivot Tables to Dimensional Processing 2
Data Warehousing 4
Applications of OLAP 5
History of OLAP 7
SQL Server Analysis Services 8
Data Mining 13
Summary 14
Chapter 2: Cubes, Dimensions, and Measures 15
Cubes and Their Components 15
Defining Measures and Dimensions 18
Schemas 18
Dimensions in Depth 20
Measures 27
Types of Aggregation 31
Trang 8 CONTENTS
Writeback 32
Calculated Measures 33
Actions 34
XMLA 35
Multidimensional Expressions (MDX) 36
Data Warehouses 37
Storage 38
Staging Databases 38
Storage Modes 38
Summary 39
Chapter 3: SQL Server Analysis Services 41
Requirements 41
Hardware 41
Virtualization 43
Software 44
Upgrading 44
Standard or Enterprise Edition? 44
Architecture 46
The Unified Dimensional Model 46
Logical Architecture 49
Physical Architecture 52
Storage 54
Cube Structures in SSAS 59
Data Sources 61
Data Source View 61
The Cube Structure Itself 63
Dimensions 64
Mining Structures 65
Trang 9 CONTENTS
What’s New in SQL Server 2008 65
Performance 65
Tools 69
Summary 73
Chapter 4: SSAS Developer and Admin Interfaces 75
Business Intelligence Development Studio 75
BIDS Is Visual Studio? 75
Panes 76
Solution Explorer 79
Properties Pane 80
Creating or Editing a Database Solution 82
SQL Server Management Studio 86
Managing Analysis Services 87
Executing MDX Queries 91
PowerShell 92
A Convincing Example 93
PowerShell for SQL Server 93
PowerShell with SSAS 94
Summary 95
Chapter 5: Creating a Data Source View 97
Cubes Need Data 97
Data Sources 98
Data Source Views 105
Designer Tour 105
Named Calculations and Queries 110
Summary 116
Trang 10 CONTENTS
Chapter 6: Creating Dimensions 117
Dimensional Analysis 117
Review of the Dimension Concept 118
Star or Snowflake? 119
Dimensions in SSAS 127
Creating a Dimension 127
Analysis Management Objects (AMO) Warnings 135
Dimension Properties 136
Attributes 148
Attribute Relationships 150
Attribute Properties 156
Parent-Child Dimensions 157
The Time Dimension 159
Summary 166
Chapter 7: Building a Cube 167
Dimensions and Cubes 169
Creating Cubes 170
Using Measure Group Tables 170
Selecting Dimensions 173
Defining Dimension Usage 179
Measures and Measure Groups 182
Measures 182
Measure Groups 184
Calculated Measures 185
Summary 193
Chapter 8: Deploying and Processing 195
Deploying a Project 195
Project Properties 195
Trang 11 CONTENTS
Deployment Methods 198
Using the Deployment Wizard 198
Running the Wizard 199
Input Files 201
Deployment Scripts 202
Synchronizing SSAS Databases 202
Processing 204
What Processing Does for Us 205
How to Initiate Processing from BIDS 208
Processing from SQL Server Management Studio 214
Processing via XMLA 215
Processing with Analysis Management Objects (AMO) 215
Scheduling OLAP Maintenance 215
Summary 218
Chapter 9: MDX 219
Why the Need? 219
Tuples and Sets 221
Notation 222
Tuples 223
Sets 228
MDX Queries 228
SELECT 229
WHERE 232
MDX Functions 233
Categories of Functions 239
Summary 249
Chapter 10: Cube Features 251
Business Intelligence 251
Trang 12 CONTENTS
Time Intelligence 252
Account Intelligence 254
Dimension Intelligence 255
Operators, Functions, and More 255
Unary Operators 255
Custom Member Formulas 256
Attribute Ordering 257
Currency Conversion 257
Calculations Tab 258
Calculated Measures 259
Named Sets 262
Other Cube Features 262
Key Performance Indicators 263
Actions 265
Perspectives 270
Translations 272
Summary 273
Chapter 11: Data Mining 275
Why Mine Data? 275
Using Data-Mining Algorithms 276
Microsoft Nạve Bayes 276
Microsoft Clustering 276
Microsoft Decision Trees 277
Creating the Accessory Buyers Marketing Campaign 277
Preparing the Data Warehouse 278
Creating the Accessory Buyers Views in AdventureWorks 278
Creating the Accessory Campaign Data Source View 281
Finding Accessory Buyers by Using the AdventureWorks EDW 282
Trang 13 CONTENTS
Using the Data Mining Model Designer 289
The Mining Structure View 290
The Mining Models View 291
The Mining Model Viewer View 292
The Mining Accuracy Chart View 297
The Mining Model Prediction View 299
Finding Accessory Buyers by Using Data Mining Extensions (DMX) 303
Use the DMX Development Environment 303
Create the Accessory Buyers Mining Structure 304
Add a Nạve Bayes Mining Model to the Accessory Buyers Campaign 305
Process the Accessory Buyers Campaign 305
View the Accessory Buyers Mining Model 306
Predict Our Accessory Buyers 308
Summary 310
Chapter 12: PowerPivot 311
PowerPivot Support in SQL Server 2008 R2 311
Master Data Services 311
Excel Writeback 313
PowerPivot from Excel 320
PowerPivot with SharePoint Server 2010 326
Summary 331
Chapter 13: Administration 333
DBA Tasks 333
Processing a Cube 333
Processing Options 335
Processing Architecture 336
Profiler 337
Performance Monitor 337
Trang 14 CONTENTS
Automation 339
XML for Analysis 340
Analysis Management Objects 340
PowerShell 343
Scheduling 343
SQL Server Integration Services 345
Security 348
Authentication 348
Authorization 349
Performance 352
Design 352
Aggregations 359
Scaling 367
Virtualization 369
SharePoint Server 2010 369
Summary 371
Chapter 14: User Interfaces 373
Excel 2007 373
Data Source Connections 374
Pivot Tables 379
Pivot Charts 386
Visio 2007 399
SQL Server Reporting Services 401
Reports 401
Tablix 402
Charts 410
Report Builder 2.0 418
MOSS 2007 420
Trang 15 CONTENTS
KPI Lists 421
Excel Services 423
PerformancePoint 423
Appendix A: Setting Up Adventure Works 431
Appendix B: Data-Mining Resources 435
Index 437
Trang 16About the Authors
Philo Janus is a senior technology specialist with Microsoft Over the years he has presented
Microsoft Office InfoPath to thousands of users and developers, and assisted with enterprise implementations of InfoPath solutions With that background, he is particularly sensitive to the difficulties users and developers have had with InfoPath
He graduated from the US Naval Academy with a bachelor of science in electrical engineering in
1989 to face a challenging career in the US Navy After driving an aircraft carrier around the Pacific Ocean and a guided-missile frigate through both the Suez and Panama Canals, and serving in the US Embassy in Cairo, a small altercation between his bicycle and an auto indicated a change of career (some would say that landing on his head in that accident would explain many things)
Philo’s software development career started with building a training and budgeting application in Access 2.0 in 1995 Since then he’s worked with Oracle, Visual Basic, SQL Server, and NET, building applications for federal agencies, commercial firms, and conglomerates In 2003 he joined Microsoft as a technology specialist, evangelizing Office as a development platform
Guy Fouché is a business intelligence and decision support system
consultant in the Dallas, Texas area Guy spends his evenings playing one of his eight trumpets and expanding his composition skills by using the current generation of music technologies On the weekend,
he puts as many miles as he can on his bright yellow Honda F4i sport motorcycle Guy and his wife Jodi enjoy taking nine-day trips in their Jeep 4×4, taking photographs and writing travelogs along the way You can view their photography at http://photography.fouche.ws
Trang 17About the Technical Reviewers
Fabio Claudio Ferrachiatti is a senior consultant and a senior analyst/developer of Microsoft
technologies He works for Brain Force at its Italian branch (www.brainforce.it) He is a Microsoft Certified Solution Developer for NET, a Microsoft Certified Application Developer for NET, and a Microsoft Certified Professional, as well as a prolific author and technical reviewer Over the past ten years, he’s written articles for Italian and international magazines and coauthored more than ten books
on a variety of computer topics
Born in Brooklyn, New York, Dana L Hoffman often jokes that her name should have been Data She
has always had a sharp eye for detail and an avid desire to create systems that are not just workable, but intuitive and easy to use She always tries to see things from the user’s point of view, and sees technical reviewing as an excellent opportunity to put her nitpicking skills to good use With a background in programming and database development, Dana currently works as a data analyst She lives in Connecticut and is nearly finished raising two sons
Trang 18 INTRODUCTION
Acknowledgments
I’d like to offer a huge thank-you to everyone at Apress who has had input into these pages! Guy Fouché
Trang 19Introduction
Pro SQL Server 2008 Analysis Services offers an in-depth look into the latest and greatest suite of analytic
tools from Microsoft This book will help you create business intelligence (BI) solutions that improve your company’s analysis and decision making by focusing on practical, solution-oriented application of the technologies available in SQL Server 2008 Analysis Services (SSAS)
Using the examples and exercises in this book, you will further your understanding of online analytical processing (OLAP), BI, data mining, and SSAS itself New SSAS features are also explained, including the Management Data Warehouse (MDW), dynamic management views (DMVs), and Aggregation Designer Improvements to the Cube and Dimension Designers are also covered
Chapters 1 and 2 introduce you to OLAP, and to the key concepts that are termed cubes, dimensions, and measures With that foundation laid, Chapters 3 and 4 introduce you to what SQL Server provides
You’ll get your first look at Analysis Services and its administration interface Chapters 5 through 7 show you how to design and build a cube for analysis The cube is the focal point of Analysis Services Once you’ve created a cube, Chapter 8 shows how to deploy it for use
After you’ve deployed a cube, it is available for you and other analyists to query It is partly through queries that one examines and analyzes the data at one’s disposal To that end, Chapter 9 is devoted to Multidimensional Expressions (MDX), which is the query language underpinning Analysis Services solutions
Key performance indicators (KPIs) are at the heart of every BI solution In Chapter 10, you will learn how to define, create, and use these metrics Chapter 10 also introduces you to perspectives, actions, and calculated members
Data-mining algorithms enable you to sift through huge amounts of historical data, and create predictions based on trends and patterns Working through Chapter 11, you will learn how to use data mining to create, execute, and validate a prediction model Chapter 11 will also introduce you to Microsoft’s Data Mining Extensions (DMX) language
PowerPivot is an exciting set of technologies that provide powerful BI abilities to all business users
By integrating with Office 2010, your users can perform complex analysis and data mining on their workstations Using SSAS language translation and automated currency conversions greatly enhances the usability of your company’s data across the enterprise
Finally, Chapter 13 offers important information for SSAS administrators To effectively manage SSAS at the server level, you need to understand processing tasks and options, the SQL Server Profiler, the Performance Monitor, scheduling, and security
Trang 21C H A P T E R 1
Introduction to OLAP
Online analytical processing (OLAP) is a technique for aggregating data to enable business users to dig
into transactional data to solve business problems You may be familiar with pivot tables from Microsoft Excel or other reporting solutions—for example, taking a list of order details (Figure 1-1) and creating a table that shows the total for each product ordered by month (Figure 1-2)
Figure 1-1 Tabular order data
Trang 22CHAPTER 1 INTRODUCTION TO OLAP
Figure 1-2 Summarizing products by month ordered
This is an interesting report, but what if we want to see the breakdown by quarter? Or by year? Or by fiscal year? Perhaps we want to combine the products into groups—for example, if we don’t care so much how a specific product is faring, but we do want to know how our condiments are selling overall
We might be able to create some of these reports from existing data, or we might be able to write a query
to do so Some will require modifications to the database
However, we have another problem: in this database, we have only 2,155 order details and 77 products—a pretty easy group of data to deal with What do we do when we have 500 products (or more—consider Amazon.com!) and tens of thousands of records? What about millions? We can’t expect Microsoft Excel to create pivot tables from all those records for us To solve this problem of analyzing large amounts of data, we turn to a server-based solution that can take large amounts of tabular data and create these aggregations for us
From Pivot Tables to Dimensional Processing
Take another look at Figure 1-2, order totals of products by month ordered We have two dimensions to
our data: product and month This is a pretty basic pivot table based on a tabular data source Now let’s say we want to see this same table, but broken down for each geographic region (Figure 1-3)
Trang 23CHAPTER 1 INTRODUCTION TO OLAP
Figure 1-3 Breaking down orders by geographic region
Now we have a third dimension: geographic region Consider arranging the data as shown in Figure
1-4
Figure 1-4 Understanding multidimensional data
Trang 24CHAPTER 1 INTRODUCTION TO OLAP
With three dimensions to the data, we have a cube We will be using that term a lot Of course, we
don’t have to stop with three dimensions We can break the orders down by customer, by discount level,
by product type, and so on As you start to consider the exponential impact of having multiple dimensions, with each dimension having dozens of members, you will start to appreciate that presenting a tool to a nontechnical user to deal with this type of analysis is a nontrivial problem
Looking at Figure 1-4, what does it look like when we want to also break it down by warehouse
location and shipping method? And yes, we still call it a cube when there are more than three
dimensions
Let’s take a step back and look at another problem we can face when aggregating data Let’s take our order data and create a pivot table to show products ordered by customer (Figure 1-5)
Figure 1-5 Products ordered by customer
Obviously, few customers would order every product, so we see there is a lot of empty space This is
referred to as a sparse data set The problem we have is that we are taking up space for every
combination of customer and product, and these empty cells add up pretty quickly OLAP servers are designed to optimize for storage of sparse data sets and aggregations
Data Warehousing
A term that you’ll frequently encounter in analytic processing—and that is horribly overused—is data
warehouse You may also hear data mart (I will confess to also occasionally using model when I’m more
interested in presenting the business problem we’re trying to solve than getting into a debate about semantics.) Although these terms are not completely synonymous, they are often used interchangeably
Data warehouse is easily the scariest word on the list As this book will show you, working with OLAP
technologies is relatively straightforward I don’t mean to trivialize the work necessary to build a robust dimensional solution, but the difficulty can be overhyped A data warehouse is, most basically, a compilation of data fundamental to the business that has been integrated from numerous other sources For example, you can pull salary data from an HR system, vendor data from an ERP system, customer
Trang 25CHAPTER 1 INTRODUCTION TO OLAP
billing information from a financial system, and create a financial data warehouse for the company (to start calculating profits and losses, profit margins, and so forth)
A data warehouse does not need to be dimensional You could create a normalized relational database for querying and reporting However, you will find it difficult to create many of the types of reports users expect from a very large storage of data (users and analysts will generally focus on aggregation and large-scale analysis as opposed to basic tabular reports)
Because data warehouse generally implies large-scale and cross-corporate information, people may
perceive, as I mentioned, that data warehouses are heavy engineering efforts requiring a great deal of big up-front design Where possible, I prefer to grow these types of resources organically: start with small
projects, solve individual business problems, and grow the resulting data marts as necessary, finally
bridging over to create an actual data warehouse if and when it is deemed necessary I am very much about iterative design, and the work in this book supports that approach in design and maintenance of OLAP solutions
“at a glance” views of how the business is doing (Figure 1-6)
Figure 1-6 A corporate scorecard built on OLAP data