Chapter 2: Data Warehousing
Learning Objectives
Understand the basic definitions and concepts of data warehouses
Learn different types of data warehousing architectures and their comparative advantages and disadvantages
Describe the processes used in developing and managing data warehouses
Explain data warehousing operations
Explain the role of data warehouses in decision support
Main Data Warehousing Topics
What is a Data Warehouse?
A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
“The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”
Data Mart
A departmental data warehouse that stores only relevant data
Dependent data mart: a subset that is created directly from a data warehouse
Independent data mart: a small data warehouse designed for a strategic business unit or a department
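
As a minimal sketch of a data mart created directly from a data warehouse, the snippet below uses Python's standard-library sqlite3 module. The table names (warehouse_sales, mart_marketing), the columns, and the rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical enterprise warehouse table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("Marketing", "Campaign A", 1200.0),
        ("Finance", "Audit", 800.0),
        ("Marketing", "Campaign B", 450.0),
    ],
)

# A data mart as a departmental subset built directly from the warehouse.
conn.execute(
    "CREATE TABLE mart_marketing AS "
    "SELECT product, amount FROM warehouse_sales WHERE department = 'Marketing'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM mart_marketing ORDER BY product"
).fetchall()
print(mart_rows)
```

The mart holds only the rows relevant to one department, so departmental queries never touch the other departments' data.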
Data Warehousing Definitions
Operational data store (ODS): a type of database often used as an interim area for a data warehouse
Oper mart: an operational data mart
Enterprise data warehouse (EDW): a data warehouse for the enterprise
Metadata: data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use
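
To make "data about data" concrete, here is a small Python sketch of a metadata record for one warehouse table. The class, its field names, and the sample values are all assumptions made for illustration; real metadata repositories define their own schemas.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TableMetadata:
    """Data about one warehouse table (all field names are hypothetical)."""
    table_name: str
    description: str      # what the table contains
    source_system: str    # where the data were acquired from
    last_loaded: date     # when the data were last acquired
    intended_use: str     # how the data are meant to be used
    columns: dict = field(default_factory=dict)  # column name -> meaning


sales_meta = TableMetadata(
    table_name="sales_fact",
    description="One row per product sold per store per day",
    source_system="POS",
    last_loaded=date(2024, 1, 31),
    intended_use="Routine business reporting and OLAP",
    columns={"units_sold": "Number of units sold"},
)


def describe(meta: TableMetadata) -> str:
    """Render a one-line catalog entry from the metadata record."""
    return f"{meta.table_name}: {meta.description} (source: {meta.source_system})"


print(describe(sales_meta))
```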
[Figure: data warehouse framework. An ETL process (integrate, load) feeds the enterprise data warehouse, which holds metadata. Data are replicated to data marts (Engineering, Marketing, Finance, ...), with a no-data-marts option. Access leads to applications (visualization): routine business reporting; OLAP, dashboard, Web; data/text mining; custom-built applications.]
DW Architecture
software
access and analyze data from the warehouse
A Web-based DW Architecture
[Figure: a client (Web browser) connects over the Internet/intranet/extranet to a Web server that delivers Web pages; the Web server works with an application server and the data warehouse.]
Data Warehousing Architectures
Issues to consider when deciding which architecture to use:
retrieval and analysis?
Alternative DW Architectures
(a) Independent Data Marts Architecture: source systems -> ETL and staging area -> independent data marts (atomic/summarized data) -> end user access and applications
(b) Data Mart Bus Architecture with Linked Dimensional Data Marts: source systems -> ETL and staging area -> dimensionalized data marts linked by conformed dimensions (atomic/summarized data) -> end user access and applications
(c) Hub and Spoke Architecture (Corporate Information Factory): source systems -> ETL and staging area -> normalized relational warehouse (atomic data) -> dependent data marts (summarized/some atomic data) -> end user access and applications
Alternative DW Architectures
(d) Centralized Data Warehouse Architecture: source systems -> ETL and staging area -> normalized relational warehouse (atomic/some summarized data) -> end user access and applications
(e) Federated Architecture: existing data warehouses, data marts, and legacy systems -> data mapping / metadata -> logical/physical integration of common data elements -> end user access and applications
Alternative DW Architectures
1. Independent Data Marts
2. Data Mart Bus Architecture
3. Hub-and-Spoke Architecture
4. Centralized Data Warehouse
5. Federated Data Warehouse
Each has pros and cons!
Teradata Corp DW Architecture
Data Warehousing Architectures
Ten factors that potentially affect the architecture selection decision:
7. Compatibility with existing
Data Integration and the Extraction, Transformation, and Load (ETL) Process
Data integration: integration that comprises three major processes: data access, data federation, and change capture
Enterprise application integration (EAI): a technology that provides a vehicle for pushing data from source systems into a data warehouse
Enterprise information integration (EII): an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases
Data Integration and the Extraction, Transformation, and Load (ETL) Process
Extraction, transformation, and load (ETL)
[Figure: the ETL process. Data pass through extract, transform, cleanse, and load steps into the data warehouse and data mart.]
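
The extract, transform, cleanse, and load steps can be sketched in a few lines of Python. This is an illustrative toy pipeline, not a real ETL tool: the CSV text, the function names, the cleansing rules, and the sales table are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from a source system.
SOURCE_CSV = """customer,region,amount
 alice ,north,100.50
BOB,South,not-a-number
carol,NORTH,20
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform and cleanse: standardize formats, drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleanse: skip rows with an unparseable amount
        clean.append(
            (row["customer"].strip().title(), row["region"].strip().lower(), amount)
        )
    return clean


def load(conn, rows):
    """Load: write the cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer, region, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping the three stages as separate functions mirrors the process on the slide and lets each stage be tested or replaced on its own.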
ETL
Issues affecting the purchase of an ETL tool
learning curve
Important criteria in selecting an ETL tool
number of data sources/architectures
functional user
Data Warehouse Development
Data warehouse development approaches
One alternative is the hosted warehouse
Data warehouse structure:
Real-time data warehousing?
Hosted Data Warehouses
Benefits:
Requires minimal investment in infrastructure
Frees up capacity on in-house systems
Frees up cash flow
Makes powerful solutions affordable
Enables powerful solutions that provide for growth
Offers better quality equipment and software
Provides faster connections
Enables users to access data remotely
Allows a company to focus on core business
Meets storage needs for large volumes of data
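Representation of Data in DW
Dimensional modeling: a retrieval-based approach that supports high-volume query access
Star schema: the simplest style of dimensional modeling; it contains a fact table connected to several dimension tables
The fact table contains the numerical values (measures) needed to perform decision analysis and query reporting, along with keys that link it to the dimension tables
Dimension tables contain classification and aggregation information about the values in the fact table
Snowflake schema: an extension of the star schema where the diagram resembles a snowflake in shape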
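
A star schema can be sketched with Python's standard-library sqlite3 module: one fact table holding a numeric measure and keys, joined to dimension tables that classify it. The table names, columns, and rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE dim_geography (geo_id INTEGER PRIMARY KEY, country TEXT);
-- Fact table: a numeric measure plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    geo_id INTEGER REFERENCES dim_geography(geo_id),
    units_sold INTEGER
);
INSERT INTO dim_product VALUES (1, 'Acme'), (2, 'Zenith');
INSERT INTO dim_geography VALUES (1, 'USA'), (2, 'Canada');
INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (1, 1, 3);
""")

# Dimension attributes classify and aggregate the values in the fact table.
units_by_brand_country = conn.execute("""
    SELECT p.brand, g.country, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_geography AS g ON g.geo_id = f.geo_id
    GROUP BY p.brand, g.country
    ORDER BY p.brand, g.country
""").fetchall()
print(units_by_brand_country)
```

A snowflake variant would further normalize a dimension, for example by moving brand into its own table referenced from dim_product.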
Multidimensionality
The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)
Multidimensional presentation
Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry
Measures: money, sales volume, head count, inventory profit, actual versus forecast
Time: daily, weekly, monthly, quarterly, or yearly
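
As a rough illustration of analyzing one measure by several dimensions, the Python sketch below sums sales over any chosen subset of four dimensions. The records, the dimension names, and the total_by helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, salesperson, quarter, sales)
records = [
    ("East", "Widget", "Ann", "Q1", 100),
    ("East", "Gadget", "Ann", "Q1", 50),
    ("West", "Widget", "Raj", "Q1", 70),
    ("East", "Widget", "Ann", "Q2", 30),
]
DIMENSIONS = ("region", "product", "salesperson", "quarter")


def total_by(records, *dims):
    """Sum the sales measure grouped by any subset of the dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[i] for i in idx)
        totals[key] += rec[-1]  # the last field is the measure
    return dict(totals)


print(total_by(records, "region"))
print(total_by(records, "product", "quarter"))
```

The same measure can be viewed by region, by product and quarter, or by any other combination without changing the underlying records.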
Star vs Snowflake Schema
[Figure: a star schema and a snowflake schema side by side.
Star schema: fact table SALES (UnitsSold), with Brand and dimension GEOGRAPHY (Country).
Snowflake schema: fact table SALES (UnitsSold); dimensions DATE (Date), PEOPLE (Division), PRODUCT (LineItem), STORE (LocID), BRAND (Brand), CATEGORY (Category), LOCATION (State), MONTH (M_Name), QUARTER (Q_Name).]
Analysis of Data in DW
Online analytical processing (OLAP): activities used to query the online system and to conduct analyses
Data cubes, drill-down / rollup, slice & dice, …
OLAP Activities
Analysis of Data Stored in DW
OLTP vs OLAP
OLTP (online transaction processing): capturing and storing data related to day-to-day business functions in systems such as ERP, CRM, SCM, and POS
OLAP (online analytical processing): extracting information by providing effective and efficient ad hoc analysis of organizational data
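
The contrast between the two workloads can be sketched with Python's sqlite3 module: OLTP records individual business events in small transactions, while OLAP runs ad hoc summaries over many rows. The orders table and its values are hypothetical, and a real deployment would normally keep the two workloads on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
)

# OLTP-style work: many small transactions, each recording one business event.
for store, amount in [("S1", 20.0), ("S2", 35.0), ("S1", 15.0)]:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (store, amount) VALUES (?, ?)", (store, amount)
        )

# OLAP-style work: an ad hoc query that summarizes many rows for analysis.
sales_by_store = conn.execute(
    "SELECT store, COUNT(*), SUM(amount) FROM orders GROUP BY store ORDER BY store"
).fetchall()
print(sales_by_store)
```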
OLAP vs OLTP
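OLAP Operations
Slice – a subset of a multidimensional array
Dice – a slice on more than two dimensions
Drill down/up – navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
Roll up – computing all of the data relationships for one or more dimensions
Pivot – used to change the dimensional orientation of a report or an ad hoc query-page display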
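
The common OLAP operations (slice, dice, roll up, drill down, pivot) can be sketched on a tiny in-memory cube. This is a simplified Python illustration, not how an OLAP server is implemented; the cube contents, the axis names, and the helper functions are assumptions.

```python
from collections import defaultdict

# Hypothetical cube: (region, product, month) -> sales volume
cube = {
    ("East", "Widget", "Jan"): 10,
    ("East", "Widget", "Feb"): 12,
    ("East", "Gadget", "Jan"): 4,
    ("West", "Widget", "Jan"): 7,
    ("West", "Gadget", "Feb"): 9,
}
AXES = ("region", "product", "month")


def slice_cube(cube, axis, value):
    """Slice: fix one dimension to a single value."""
    i = AXES.index(axis)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed values per dimension."""
    return {
        k: v for k, v in cube.items()
        if all(k[AXES.index(a)] in vals for a, vals in allowed.items())
    }


def roll_up(cube, *keep):
    """Roll up: aggregate away every dimension not listed in keep."""
    idx = [AXES.index(a) for a in keep]
    out = defaultdict(int)
    for k, v in cube.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)


def pivot(table):
    """Pivot: swap the two dimensions of a 2-D table keyed by (row, column)."""
    return {(c, r): v for (r, c), v in table.items()}


east = slice_cube(cube, "region", "East")
jan_widgets = dice(cube, region={"East", "West"}, product={"Widget"}, month={"Jan"})
by_region = roll_up(cube, "region")                     # most summarized view
by_region_product = roll_up(cube, "region", "product")  # drill down one level
by_product_region = pivot(by_region_product)
print(by_region)
print(by_product_region)
```

Drilling down here is just rolling up to a finer set of dimensions, which is why both views come from the same roll_up helper.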
[Figure: A 3-dimensional OLAP cube with slicing. Cells are filled with numbers representing sales volumes. One slice shows sales volumes of a specific Region on variable Time and Products; another shows sales volumes of a specific Time on variable Region and Products.]
Variations of OLAP
Multidimensional OLAP (MOLAP): OLAP implemented via a specialized multidimensional database (or data store) that summarizes transactions into multidimensional views ahead of time
Relational OLAP (ROLAP): the implementation of an OLAP database on top of an existing relational database
Database OLAP and Web OLAP (DOLAP and WOLAP); Desktop OLAP, …
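
The difference between summarizing ahead of time and aggregating on top of a relational database can be sketched as follows. This is a toy Python comparison under stated assumptions: a dictionary stands in for the multidimensional store, sqlite3 stands in for the relational database, and the transactions and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

transactions = [  # hypothetical (region, product, amount) rows
    ("East", "Widget", 10), ("East", "Widget", 5), ("West", "Gadget", 8),
]

# MOLAP-style: summarize transactions into a multidimensional view ahead of time.
precomputed = defaultdict(int)
for region, product, amount in transactions:
    precomputed[(region, product)] += amount


def molap_lookup(region, product):
    """Answer from the pre-aggregated store; no scan of the transactions."""
    return precomputed.get((region, product), 0)


# ROLAP-style: keep the data in a relational database and aggregate at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transactions)


def rolap_lookup(region, product):
    """Answer by running SQL against the relational table on demand."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ? AND product = ?",
        (region, product),
    ).fetchone()
    return row[0]


print(molap_lookup("East", "Widget"), rolap_lookup("East", "Widget"))
```

Both lookups return the same total; the trade-off is that the pre-aggregated view answers quickly but must be rebuilt when data change, while the relational query is always current but does its work at query time.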
DW Implementation Issues
Establishment of service-level agreements and data-refresh requirements
Identification of data sources and their governance policies
Data quality planning
Data model design
ETL tool selection
Relational database software and platform selection
DW Implementation Guidelines
managers, and users
completed project
professionals
understood by the organization
Successful DW Implementation
Things to Avoid
Starting with the wrong sponsorship chain
Setting expectations that you cannot meet
Engaging in politically naive behavior
Loading the data warehouse with information just because it is available
Believing that data warehousing database design is the same as transactional database design
Choosing a data warehouse manager who is technology oriented rather than user oriented
Successful DW Implementation
Things to Avoid - Cont.
Focusing on traditional internal record-oriented data and ignoring the value of external data and of text, images, etc.
Delivering data with confusing definitions
Believing promises of performance, capacity, and scalability
Believing that your problems are over when the data warehouse is up and running
Focusing on ad hoc data mining and periodic reporting instead of alerts
Failure Factors in DW Projects
Lack of executive sponsorship
Unclear business objectives
Cultural issues being ignored
Unrealistic expectations
Inappropriate architecture
Low data quality / missing information
Loading data just because it is available
Massive DW and Scalability
Scalability
The main issues pertaining to scalability:
grow
Good scalability means that the cost of queries and other data-access functions (time and resources) grows, ideally, no faster than linearly with the size of the warehouse
Real-time/Active DW/BI
Enabling real-time data updates for real-time analysis and real-time decision making is growing rapidly
Push vs Pull (of data)
Concerns about real-time BI
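
One way to picture push versus pull is the Python sketch below: with push, the source sends each change to the warehouse as it happens; with pull, the warehouse polls the source for changes since its last read. The Source and Warehouse classes and their methods are hypothetical and exist only to illustrate the idea.

```python
class Source:
    """A hypothetical operational system that records changes."""

    def __init__(self):
        self.changes = []
        self.subscribers = []

    def record(self, change):
        self.changes.append(change)
        # Push: the source sends each change to subscribers as it happens.
        for callback in self.subscribers:
            callback(change)


class Warehouse:
    def __init__(self):
        self.rows = []
        self.cursor = 0  # position of the next unread change when pulling

    def on_change(self, change):
        """Push target: called by the source immediately."""
        self.rows.append(change)

    def pull(self, source):
        """Pull: ask the source for changes since the last poll."""
        new = source.changes[self.cursor:]
        self.rows.extend(new)
        self.cursor = len(source.changes)
        return len(new)


source = Source()
pushed, pulled = Warehouse(), Warehouse()
source.subscribers.append(pushed.on_change)

source.record("order-1")
source.record("order-2")
print(pushed.rows)  # kept current by the source
print(pulled.rows)  # stale until the next poll
pulled.pull(source)
print(pulled.rows)
```

The pushed warehouse is current after every change, while the pulled warehouse is only as fresh as its most recent poll, which is the trade-off behind real-time versus periodic loading.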
Real-time/Active DW at Teradata
Enterprise Decision Evolution and DW
Traditional vs Active DW Environment
DW Administration and Security
The data warehouse administrator (DWA) should:
have knowledge of high-performance software, hardware, and networking technologies
possess solid business knowledge and insight
be familiar with the decision-making processes so as to suitably design/maintain the data warehouse structure
possess excellent communication skills
The Future of DW
Open source software
SaaS (software as a service)
Cloud computing
DW appliances
Real-time DW
Data management practices/technologies
In-memory processing (“super-computing”)
Advanced analytics
BI / OLAP Portal for Learning
End of the Chapter
Questions, comments