
Business Intelligence: A Managerial Approach, 2nd edition, by David King, Chapter 02


Page 1

Chapter 2:

Data Warehousing

Page 2

Learning Objectives

• Understand the basic definitions and concepts of data warehouses

• Learn different types of data warehousing architectures and their comparative advantages and disadvantages

• Describe the processes used in developing and managing data warehouses

• Explain data warehousing operations

• Explain the role of data warehouses in decision support

Page 5

Main Data Warehousing Topics

Page 6

What is a Data Warehouse?

• A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format

• “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”

Page 8

Data Mart

Data mart: a departmental data warehouse that stores only relevant data

Dependent data mart: a subset that is created directly from a data warehouse

Independent data mart: a small data warehouse designed for a strategic business unit or a department
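A subset created directly from a data warehouse can be sketched as a table derived from a warehouse table. This is a minimal illustration using Python's built-in sqlite3 module; the table names, columns, and rows are assumptions made up for the example, not taken from the slides.

```python
import sqlite3

# In-memory database standing in for the warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("Marketing", "Campaign A", 1200.0),
        ("Finance", "Audit", 800.0),
        ("Marketing", "Campaign B", 300.0),
    ],
)

# The mart is built directly from the warehouse and keeps only
# the rows relevant to one department.
conn.execute(
    """
    CREATE TABLE marketing_mart AS
    SELECT product, amount
    FROM warehouse_sales
    WHERE department = 'Marketing'
    """
)

mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
```

An independent data mart, by contrast, would be loaded from source systems without going through the warehouse table.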

Page 9

Data Warehousing Definitions

Operational data store (ODS): a type of database often used as an interim area for a data warehouse

Oper mart: an operational data mart

Enterprise data warehouse (EDW): a data warehouse for the enterprise

Metadata: data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use
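As a rough sketch of what "data about data" might look like in practice, the record below describes one warehouse table: its contents, how it was acquired, and how it is meant to be used. The class name, field names, and values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical metadata record for one warehouse table."""
    table_name: str
    contents: str        # what the table holds
    source_system: str   # where the data was acquired from
    load_schedule: str   # how often the table is refreshed
    intended_use: str    # how the data is meant to be used
    columns: dict = field(default_factory=dict)  # column name -> description


sales_meta = TableMetadata(
    table_name="fact_sales",
    contents="One row per product sold per day",
    source_system="point-of-sale system",
    load_schedule="nightly",
    intended_use="sales reporting",
    columns={
        "units_sold": "Number of units sold",
        "date_id": "Key into the date dimension",
    },
)


def describe(meta):
    """Return a one-line summary built only from the metadata."""
    return (
        f"{meta.table_name}: {meta.contents} "
        f"(from {meta.source_system}, loaded {meta.load_schedule})"
    )
```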

Page 10

[Figure: data warehouse framework. Recoverable labels: ETL process (integrate, load); enterprise data warehouse with metadata; replication; data marts (Engineering, Marketing, Finance, ...); "no data marts" option; access; applications (visualization): routine business reporting, data/text mining, OLAP, dashboard, Web, custom-built applications.]

Page 11

DW Architecture

software

access and analyze data from the warehouse

Page 13

A Web-based DW Architecture

[Figure: client (Web browser), Internet/intranet/extranet, Web server (Web pages), application server, data warehouse.]

Page 14

Data Warehousing Architectures

• Issues to consider when deciding which architecture to use:

retrieval and analysis?

Page 15

Alternative DW Architectures

(a) Independent Data Marts Architecture: source systems → staging area (ETL) → independent data marts (atomic/summarized data) → end user access and applications

(b) Data Mart Bus Architecture with Linked Dimensional Data Marts: source systems → staging area (ETL) → dimensionalized data marts linked by conformed dimensions (atomic/summarized data) → end user access and applications

(c) Hub and Spoke Architecture (Corporate Information Factory): source systems → staging area (ETL) → normalized relational warehouse (atomic data) → dependent data marts (summarized/some atomic data) → end user access and applications

Page 16

Alternative DW Architectures

(d) Centralized Data Warehouse Architecture: source systems → staging area (ETL) → normalized relational warehouse (atomic/some summarized data) → end user access and applications

(e) Federated Architecture: existing data warehouses, data marts, and legacy systems → data mapping / metadata → logical/physical integration of common data elements → end user access and applications

Page 17

Alternative DW Architectures

1. Independent Data Marts

2. Data Mart Bus Architecture

3. Hub-and-Spoke Architecture

4. Centralized Data Warehouse

5. Federated Data Warehouse

• Each has pros and cons!

Page 18

Teradata Corp DW Architecture

Page 19

Data Warehousing Architectures

Ten factors that potentially affect the architecture selection decision:

7. Compatibility with existing

Page 20

Data Integration and the Extraction, Transformation, and Load (ETL) Process

Data integration: integration that comprises three major processes: data access, data federation, and change capture

Enterprise application integration (EAI): a technology that provides a vehicle for pushing data from source systems into a data warehouse

Enterprise information integration (EII): an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases

Page 21

Data Integration and the Extraction, Transformation, and Load (ETL) Process

Extraction, transformation, and load (ETL)

[Figure: ETL flow. Extract, transform, cleanse, and load steps feeding a data warehouse and a data mart.]
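The extract, transform, cleanse, and load steps can be sketched end to end with the Python standard library. This is only an illustration under stated assumptions: the CSV text stands in for a source system, and the column names, cleansing rule, and `sales` table are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical extract from an operational source system.
SOURCE_CSV = """order_id,region,amount
1, north ,100.50
2,SOUTH,
3,north,25
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: standardize formats (trim/lowercase region, parse numbers)."""
    out = []
    for row in rows:
        amount = row["amount"].strip()
        out.append(
            {
                "order_id": int(row["order_id"]),
                "region": row["region"].strip().lower(),
                "amount": float(amount) if amount else None,
            }
        )
    return out


def cleanse(rows):
    """Cleanse: drop rows whose amount is missing (an assumed rule)."""
    return [r for r in rows if r["amount"] is not None]


def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (:order_id, :region, :amount)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(cleanse(transform(extract(SOURCE_CSV))), conn)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
```

Keeping each step as its own function mirrors the separate stages of the process; a real ETL tool would add logging, error handling, and incremental loads.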

Page 22

ETL

• Issues affecting the purchase of an ETL tool

learning curve

• Important criteria in selecting an ETL tool

number of data sources/architectures

functional user

Page 23

Data Warehouse Development

• Data warehouse development approaches

• One alternative is the hosted warehouse

• Data warehouse structure:

• Real-time data warehousing?

Page 24

Hosted Data Warehouses

• Benefits:

• Requires minimal investment in infrastructure

• Frees up capacity on in-house systems

• Frees up cash flow

• Makes powerful solutions affordable

• Enables powerful solutions that provide for growth

• Offers better quality equipment and software

• Provides faster connections

• Enables users to access data remotely

• Allows a company to focus on core business

• Meets storage needs for large volumes of data

Page 25

Representation of Data in DW

• Dimensional modeling: supports high-volume query access

• Star schema: the simplest style of dimensional modeling; a fact table connected to dimension tables

• Fact table contains the descriptive attributes (numerical values) needed to perform decision analysis and query reporting

• Dimension tables contain classification and aggregation information about the values in the fact table

• Snowflake schema: a schema where the diagram resembles a snowflake in shape
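A fact table joined to dimension tables can be sketched with sqlite3 as follows. The table names (`fact_sales`, `dim_date`, `dim_product`), columns, and rows are assumptions chosen for illustration; the slides do not prescribe them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: classification and aggregation attributes.
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, brand TEXT);

    -- Fact table: numeric measures plus keys into each dimension.
    CREATE TABLE fact_sales (
        date_id    INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        units_sold INTEGER
    );

    INSERT INTO dim_date VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO dim_product VALUES (10, 'Acme'), (20, 'Zenith');
    INSERT INTO fact_sales VALUES (1, 10, 5), (1, 20, 3), (2, 10, 7), (2, 10, 1);
    """
)

# A typical star-schema query: join the fact table to its dimensions
# and aggregate the measure by dimension attributes.
units_by_quarter_and_brand = conn.execute(
    """
    SELECT d.quarter, p.brand, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_id = f.date_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY d.quarter, p.brand
    ORDER BY d.quarter, p.brand
    """
).fetchall()
```

In a snowflake variant, a dimension such as `dim_product` would itself reference further normalized tables (for example a separate brand table), adding one more join per level.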

Page 26

• Multidimensionality

The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)

• Multidimensional presentation

• Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry

• Measures: money, sales volume, head count, inventory profit, actual versus forecast

• Time: daily, weekly, monthly, quarterly, or yearly
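To make "sales by region, by product, by salesperson, and by time" concrete, the sketch below totals one measure over any chosen combination of dimensions. The records and the helper `total_by` are hypothetical.

```python
from collections import defaultdict

# Each record carries four dimensions and one measure (amount).
SALES = [
    {"region": "East", "product": "Widget", "salesperson": "Ann", "quarter": "Q1", "amount": 100},
    {"region": "East", "product": "Gadget", "salesperson": "Bob", "quarter": "Q1", "amount": 50},
    {"region": "West", "product": "Widget", "salesperson": "Ann", "quarter": "Q2", "amount": 70},
    {"region": "West", "product": "Widget", "salesperson": "Cid", "quarter": "Q2", "amount": 30},
]


def total_by(records, *dimensions, measure="amount"):
    """Sum the measure for every combination of the given dimensions."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        totals[key] += rec[measure]
    return dict(totals)


by_region = total_by(SALES, "region")
by_region_product = total_by(SALES, "region", "product")
by_salesperson_quarter = total_by(SALES, "salesperson", "quarter")
```

The same data answers different questions depending on which dimensions are passed in, which is the point of a multidimensional presentation.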

Page 27

Star vs Snowflake Schema

[Figure: a star schema and a snowflake schema, each built around a fact table SALES (UnitsSold). Dimension labels recovered from the figure: GEOGRAPHY (Country), DATE (Date), PEOPLE (Division), PRODUCT (LineItem), STORE (LocID), BRAND (Brand), CATEGORY (Category), LOCATION (State), MONTH (M_Name), QUARTER (Q_Name).]

Page 28

Analysis of Data in DW

• Online analytical processing (OLAP)

query the online system and to conduct analyses

• Data cubes, drill-down / rollup, slice & dice, …

• OLAP Activities

Page 29

Analysis of Data Stored in DW

OLTP vs OLAP

• OLTP (online transaction processing)

capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM, POS,

• OLAP (online analytical processing)

information extraction by providing effective and efficient ad hoc analysis of organizational data

Page 30

OLAP vs OLTP

Page 31

OLAP Operations

• Dice – a slice on more than two dimensions

• Drill down/up – navigating data ranging from the most summarized (up) to the most detailed (down)

• Roll up – computing the relationships for one or more dimensions

• Pivot – changing the orientation of a report or an ad hoc query-page display
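These operations can be sketched on a tiny cube held in a Python dictionary. The cube contents, the dimension order in `DIMS`, and the function names are assumptions for illustration; real OLAP servers implement these operations very differently.

```python
# A small cube: (region, product, month) -> sales volume.
DIMS = ("region", "product", "month")
CUBE = {
    ("East", "Widget", "2024-01"): 10,
    ("East", "Widget", "2024-02"): 12,
    ("East", "Gadget", "2024-01"): 4,
    ("West", "Widget", "2024-01"): 7,
    ("West", "Gadget", "2024-04"): 9,
}


def slice_(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


def roll_up(cube, *keep):
    """Roll up: sum away every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    out = {}
    for k, v in cube.items():
        key = tuple(k[i] for i in idx)
        out[key] = out.get(key, 0) + v
    return out


def pivot(table):
    """Pivot: swap the two dimensions of a two-dimensional result."""
    return {(b, a): v for (a, b), v in table.items()}


east = slice_(CUBE, "region", "East")
east_january = dice(CUBE, region={"East"}, month={"2024-01"})
by_region = roll_up(CUBE, "region")                     # most summarized
by_region_product = roll_up(CUBE, "region", "product")  # one level more detailed
by_product_region = pivot(by_region_product)
```

Moving from `by_region` to `by_region_product` is a drill down; going the other way is a drill up.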

Page 32

[Figure: a 3-dimensional OLAP cube with slicing. Cells are filled with numbers representing sales volumes. One slice shows sales volumes of a specific Region on variable Time and Products; another shows sales volumes of a specific Time on variable Region and Products.]

Page 33

Variations of OLAP

Multidimensional OLAP (MOLAP): OLAP implemented via a specialized multidimensional database (or data store) that summarizes transactions into multidimensional views ahead of time

Relational OLAP (ROLAP): the implementation of an OLAP database on top of an existing relational database

• Database OLAP and Web OLAP (DOLAP and WOLAP); Desktop OLAP, …
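The contrast between summarizing ahead of time and querying an existing relational database can be sketched as below. This is a simplified illustration, not how any particular OLAP product works; the `sales` table and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 100), ("East", "Gadget", 50), ("West", "Widget", 70)],
)


def rolap_total(conn, region):
    """Relational style: aggregate on demand with SQL over the base table."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


def build_summary_store(conn):
    """Multidimensional style: summarize transactions ahead of time."""
    store = {}
    rows = conn.execute("SELECT region, product, amount FROM sales")
    for region, product, amount in rows:
        for key in (("region", region), ("product", product), ("all", "*")):
            store[key] = store.get(key, 0) + amount
    return store


store = build_summary_store(conn)         # paid once, up front
east_on_demand = rolap_total(conn, "East")  # computed at query time
east_precomputed = store[("region", "East")]  # simple lookup
```

The precomputed store answers quickly but must be rebuilt when the data changes; the on-demand query always reflects the current table at the cost of scanning it.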

Page 34

DW Implementation Issues

• Establishment of service-level agreements and data-refresh requirements

• Identification of data sources and their governance policies

• Data quality planning

• Data model design

• ETL tool selection

• Relational database software and platform selection

Page 35

DW Implementation Guidelines

managers, and users

completed project

professionals

understood by the organization

Page 36

Successful DW Implementation

Things to Avoid

• Starting with the wrong sponsorship chain

• Setting expectations that you cannot meet

• Engaging in politically naive behavior

• Loading the data warehouse with information just because it is available

• Believing that data warehousing database design is the same as transactional database design

• Choosing a data warehouse manager who is technology oriented rather than user oriented

Page 37

Successful DW Implementation

Things to Avoid - Cont.

• Focusing on traditional internal record-oriented data and ignoring the value of external data and of text, images, etc.

• Delivering data with confusing definitions

• Believing promises of performance, capacity, and scalability

• Believing that your problems are over when the data warehouse is up and running

• Focusing on ad hoc data mining and periodic reporting instead of alerts

Page 38

Failure Factors in DW Projects

• Lack of executive sponsorship

• Unclear business objectives

• Cultural issues being ignored

• Unrealistic expectations

• Inappropriate architecture

• Low data quality / missing information

• Loading data just because it is available

Page 39

Massive DW and Scalability

• Scalability

• The main issues pertaining to scalability:

grow

• Good scalability means that queries and other data-access functions will grow linearly with the size of the warehouse

Page 40

Real-time/Active DW/BI

• Enabling real-time data updates for real-time analysis and real-time decision making is growing rapidly

• Push vs Pull (of data)

• Concerns about real-time BI

Page 41

Real-time/Active DW at Teradata

Page 42

Enterprise Decision Evolution and DW

Page 43

Traditional vs Active DW Environment

Page 44

DW Administration and Security

The data warehouse administrator (DWA) should:

• have the knowledge of high-performance software, hardware and networking technologies.

• possess solid business knowledge and insight.

• be familiar with the decision-making processes so as to suitably design/maintain the data warehouse structure.

• possess excellent communications skills.

Page 45

The Future of DW

• Open source software

• SaaS (software as a service)

• Cloud computing

• DW appliances

• Real-time DW

• Data management practices/technologies

• In-memory processing (“super-computing”)

• Advanced analytics

Page 46

BI / OLAP Portal for Learning

Page 47

End of the Chapter

• Questions, comments

Date posted: 18/12/2017, 15:10
