Chapter 13
Prepared by Coby Harmon University of California, Santa Barbara Westmont College
SECOND EDITION
Data and Databases
Study Objectives
1. The need for data collection and storage
2. Methods of storing data and the interrelationship between storage and processing
3. The differences between batch processing and real-time processing
4. The importance of databases and the historical progression from flat-file databases to relational databases
5. The need for normalization of data in a relational database
6. Data warehouse and the use of a data warehouse to analyze data
7. The use of OLAP and data mining as analysis tools
8. Distributed databases and advantages of the use of distributed data
9. Controls for data and databases
10. Ethical issues related to data collection and storage, and their use in IT systems
Real World
Think about the volume of sales transactions that occur on the Websites of large Internet retailers such as L.L. Bean, Lands’ End, and J.Crew. These companies each process an average of approximately 120,000 transactions each day on their Websites. For each of these transactions, important data must be collected about the customer, location, payment, and the items sold.
Even more overwhelming is the volume of sales transactions that are processed by Wal-Mart on any given day. In addition to its Web-based sales, consider Wal-Mart’s thousands of retail centers with several check-out lines at each location and long hours of operation. Think about the number of accountants and computers that might be required to manage all of the related records. It is no wonder that Wal-Mart has one of the largest databases of any business organization in the world.
The Wal-Mart database continually grows with new transactions. Some estimate that Wal-Mart adds 1 billion rows of data per day. Besides being large, the database is also growing at an increasing rate. The company attaches RFID chips to merchandise so that inventory purchases, movement to stores, and sales are tracked in real time. Because the data for these events are added to the database so quickly, the database grows and becomes more useful for immediate analysis. This allows Wal-Mart to more quickly analyze and forecast inventory needs.
SO 1 The need for data collection and storage
The Need for Data Collection and Storage
Data are the set of facts collected from transactions, whereas information is the interpretation of data that have been processed.
Main reasons to store transaction data:
1. To complete transactions from beginning to end.
2. To follow up with customers or vendors and to expedite future transactions.
3. To create accounting reports and financial statements.
4. To provide feedback to management.
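The distinction between data and information drawn above can be illustrated with a minimal sketch. The transaction records, field names, and the function sales_by_customer are hypothetical and chosen only for illustration: the raw list stands for data, and the per-customer summary produced by processing it stands for information.

```python
from collections import defaultdict

# Hypothetical raw transaction data: one record per sale.
transactions = [
    {"customer": "C001", "item": "jacket", "amount": 120.00},
    {"customer": "C002", "item": "boots", "amount": 89.50},
    {"customer": "C001", "item": "scarf", "amount": 25.00},
]

def sales_by_customer(records):
    """Process raw records into a per-customer sales summary."""
    totals = defaultdict(float)
    for record in records:
        totals[record["customer"]] += record["amount"]
    return dict(totals)

# The summary is what a manager would actually read and act on.
summary = sales_by_customer(transactions)
print(summary)
```

The same stored records could feed other summaries (by item, by day), which is one reason the transaction detail is kept rather than only the totals.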
Typical storage and processing techniques:
1. The storage media types for data: sequential and random access
2. Methods of processing data: batch and real time
3. Databases and relational databases
4. Data warehouses, data mining, and OLAP
5. Distributed data processing and distributed databases
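The difference between sequential and random access, the first item in the list above, can be sketched in a few lines. This is an illustration under assumed data, not a model of any particular storage device: the record list, the index dictionary, and both function names are hypothetical. Sequential access reads records in order until it reaches the one it wants; random (direct) access uses an index to go straight to it.

```python
# Hypothetical customer records, stored in key order as in a sequential file.
records = [
    ("C001", "Adams"),
    ("C002", "Baker"),
    ("C003", "Chen"),
    ("C004", "Diaz"),
]

def sequential_read(records, key):
    """Read records one after another until the key is found.

    Returns the matching name and the number of records read.
    """
    reads = 0
    for record_key, name in records:
        reads += 1
        if record_key == key:
            return name, reads
    return None, reads

# Random (direct) access: an index maps each key straight to its record.
index = {record_key: name for record_key, name in records}

def random_read(index, key):
    """Go directly to the record: one lookup regardless of its position."""
    return index.get(key), 1

print(sequential_read(records, "C004"))
print(random_read(index, "C004"))
```

The cost of the sequential read grows with the position of the record, while the indexed read does not, which is why sequential storage pairs naturally with batch processing and random access with real-time processing.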
Concept Check
Which of the following best describes the relationship between data and information?
(The answer choices are missing from the source; only the fragment "making" survives.)
SO 2 Methods of storing data and the interrelationship between storage and processing
Storing and Accessing Data
Data Storage Terminology
Data Storage Media
Early Days of Mainframe Computers
Systems
Concept Check
Magnetic tape is a form of
SO 3 The differences between batch processing and real-time processing
Data Processing Techniques
Exhibit 13-2 Comparison of Batch and Real-Time Processing (columns: Batch Processing; Real-Time Processing)
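The contrast between the two processing methods can be sketched as follows. This is a simplified illustration, not the content of Exhibit 13-2: the inventory balances, the sales list, and the names post, real_time, and BatchProcessor are all hypothetical. Real-time processing updates the master record as each transaction occurs; batch processing collects transactions and posts them together later, so balances are out of date until the batch runs.

```python
def post(balances, txn):
    """Apply one sale to the inventory balance."""
    item, quantity = txn
    balances[item] -= quantity

def real_time(balances, txn):
    """Real-time processing: update the master record as each sale occurs."""
    post(balances, txn)

class BatchProcessor:
    """Batch processing: hold transactions, then post the group together."""

    def __init__(self):
        self.pending = []

    def collect(self, txn):
        self.pending.append(txn)

    def run(self, balances):
        # Sort by item key first, as a sequentially stored master file requires.
        for txn in sorted(self.pending):
            post(balances, txn)
        count = len(self.pending)
        self.pending.clear()
        return count

sales = [("widget", 2), ("gadget", 1), ("widget", 3)]

rt_balances = {"widget": 10, "gadget": 5}
for sale in sales:
    real_time(rt_balances, sale)   # balances are current after every sale

batch_balances = {"widget": 10, "gadget": 5}
batch = BatchProcessor()
for sale in sales:
    batch.collect(sale)            # balances stay unchanged until the run
stale = dict(batch_balances)       # snapshot taken before the batch is posted
posted = batch.run(batch_balances)
```

Both approaches end with the same balances; the difference is when the records become current.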
Concept Check
Which of the following is not an advantage of using real-time data processing?
(The answer choices are incomplete in the source; the surviving fragments are "and customer satisfaction" and "transactions".)
SO 4 The importance of databases and the historical progression from flat-file databases to relational databases
Databases
A database is a collection of data stored in a form that allows the data to be easily accessed, retrieved, manipulated, and stored.
Exhibit 13-3 Traditional File-Oriented Approach (labeled issues: data redundancy, concurrency)
A database management system (DBMS) is software that manages the database and controls the access and use of data by individual users and applications.
The History of Databases
Hierarchical Database Model
► Inverted tree structure
► Parent–child linkages represent one-to-many relationships
Linkages in a Hierarchical Database
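The inverted tree structure of the hierarchical model can be sketched with nested records. The customer, order, and line-item names and the function find_path are hypothetical. Each child has exactly one parent and each parent may have many children, so a record is reached by walking down from the root.

```python
# Hypothetical parent-child data: each child has exactly one parent,
# and each parent may have many children (one-to-many).
tree = {
    "name": "Customer A",
    "children": [
        {"name": "Order 1", "children": [
            {"name": "Line item 1", "children": []},
            {"name": "Line item 2", "children": []},
        ]},
        {"name": "Order 2", "children": []},
    ],
}

def find_path(node, target, path=()):
    """Walk down from the root; return the chain of parents to the target."""
    path = path + (node["name"],)
    if node["name"] == target:
        return path
    for child in node["children"]:
        found = find_path(child, target, path)
        if found:
            return found
    return None

print(find_path(tree, "Line item 2"))
```

Every lookup follows the parent-child linkages, which is efficient for the relationships built into the tree but awkward for questions that cut across branches.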
Network Database Model
► Inverted tree structure
► More complex relationship linkages by use of shared branches
► Not very popular, rarely used
Relational Database Model
► Developed in 1969
► Stores data in two-dimensional tables
► Most widely used database structure today
► Examples include IBM DB2, Oracle Database, and Microsoft Access®
Concept Check
If a company stores data in separate files in its different departmental locations and is able to update all files simultaneously, it would not have problems with
c. industrial espionage
d. concurrency
Concept Check
When the data contained in a database are stored in large, two-dimensional tables, the database is referred to as a
SO 5 The need for normalization of data in a relational database
The Need for Normalized Data
Relational databases consist of several small tables. Small tables can be joined in ways that represent relationships among the data.
Exhibit 13-6 Relational Database in Microsoft Access (the bolded field is the primary key)
A relational database offers flexibility in retrieving data. Structured query language (SQL) has become the industry standard for querying relational databases.
Exhibit 13-7 Relational Database in Microsoft Access
SELECT Customers.CustomerID, Customers.CompanyName,
       Orders.OrderID, Orders.ShippedDate
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
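The query above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the exhibit uses Microsoft Access, whereas SQLite is swapped in here because it needs no installation; the table layouts and sample rows are hypothetical; and the join condition is written with "=", the usual equi-join operator. An ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two small tables; CustomerID in Orders refers back to Customers.
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID  TEXT PRIMARY KEY,
        CompanyName TEXT NOT NULL
    );
    CREATE TABLE Orders (
        OrderID     INTEGER PRIMARY KEY,
        CustomerID  TEXT NOT NULL REFERENCES Customers(CustomerID),
        ShippedDate TEXT
    );
    INSERT INTO Customers VALUES ('ALFKI', 'Alpha Foods'), ('BOLID', 'Bolid Co');
    INSERT INTO Orders VALUES
        (1, 'ALFKI', '2011-04-01'),
        (2, 'ALFKI', '2011-04-09'),
        (3, 'BOLID', '2011-05-02');
""")

# Join the two tables on the shared CustomerID column.
rows = conn.execute("""
    SELECT Customers.CustomerID, Customers.CompanyName,
           Orders.OrderID, Orders.ShippedDate
    FROM Customers
    INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Orders.OrderID;
""").fetchall()

for row in rows:
    print(row)
```

Each result row combines columns from both tables, even though the company name is stored only once per customer.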
The process of converting data into tables that meet the definition of a relational database is called data normalization.
► There are seven rules of data normalization, and they are additive.
► Most relational databases are in third normal form.
► The first three rules of data normalization are:
1. Eliminate repeating groups.
2. Eliminate redundant data.
3. Eliminate columns not dependent on the primary key.
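The effect of the first three rules can be sketched on a small example. The flat order records, field names, and the function normalize are hypothetical, and the sketch is a loose illustration rather than a formal normalization procedure: the repeating item group becomes one row per item, and the customer name, which depends on the customer rather than on the order, is stored once in its own table.

```python
# Hypothetical unnormalized order records: the items repeat within a row,
# and the customer name is duplicated on every order.
flat_orders = [
    {"order_id": 1, "customer_id": "C001", "customer_name": "Adams",
     "items": [("jacket", 1), ("scarf", 2)]},
    {"order_id": 2, "customer_id": "C001", "customer_name": "Adams",
     "items": [("boots", 1)]},
]

def normalize(flat):
    """Split flat order records into customers, orders, and order lines."""
    customers = {}     # keyed by customer_id
    orders = {}        # keyed by order_id
    order_lines = []   # one row per (order_id, item)
    for row in flat:
        # The name depends only on customer_id, so it is stored once.
        customers[row["customer_id"]] = {"customer_name": row["customer_name"]}
        orders[row["order_id"]] = {"customer_id": row["customer_id"]}
        # The repeating group becomes one row per item.
        for item, quantity in row["items"]:
            order_lines.append(
                {"order_id": row["order_id"], "item": item, "quantity": quantity}
            )
    return customers, orders, order_lines

customers, orders, order_lines = normalize(flat_orders)
print(customers)
print(orders)
print(order_lines)
```

After the split, changing a customer's name means updating one row instead of every order that customer ever placed.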
► Most organizations are willing to accept less transaction processing efficiency in exchange for better query opportunities.
c It stores data in a tree formation
SO 6 Data warehouse and the use of a data warehouse to analyze data
Use of a Data Warehouse to Analyze Data
Exhibit 13-8 The Data Warehouse and Operational Databases
Management often needs data from several fiscal periods from across the whole organization.
► Build the data warehouse
► Identify the data
► Standardize the data
► Cleanse, or scrub, the data
► Upload the data
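The standardize, cleanse, and upload steps listed above can be sketched as a small pipeline. Everything here is an assumption made for illustration: the two source systems, their field names and date formats, and the cleansing rule (drop rows with no customer or a non-positive amount). A real warehouse load would use dedicated tools and far richer rules.

```python
# Hypothetical extracts from two operational systems that record the same
# kind of sale in different formats.
web_sales = [{"cust": "c001 ", "total": "120.00", "date": "2011-04-01"}]
store_sales = [
    {"customer_id": "C002", "amount": 89.5, "sold_on": "04/02/2011"},
    {"customer_id": "", "amount": 10.0, "sold_on": "04/03/2011"},  # bad row
]

def standardize_web(row):
    """Map a web-system row to the common warehouse layout."""
    return {"customer_id": row["cust"].strip().upper(),
            "amount": float(row["total"]),
            "date": row["date"]}

def standardize_store(row):
    """Map a store-system row to the common layout (MM/DD/YYYY to ISO date)."""
    month, day, year = row["sold_on"].split("/")
    return {"customer_id": row["customer_id"].strip().upper(),
            "amount": float(row["amount"]),
            "date": f"{year}-{month}-{day}"}

def cleanse(rows):
    """Drop rows that lack a customer or have a non-positive amount."""
    return [r for r in rows if r["customer_id"] and r["amount"] > 0]

def upload(warehouse, rows):
    """Append to the warehouse; existing rows are never changed."""
    warehouse.extend(rows)
    return len(rows)

warehouse = []
standardized = ([standardize_web(r) for r in web_sales]
                + [standardize_store(r) for r in store_sales])
loaded = upload(warehouse, cleanse(standardized))
print(loaded, warehouse)
```

Standardizing before cleansing means a single set of quality rules can be applied to data from every source.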
Concept Check
A collection of several years’ nonvolatile data used to support strategic decision-making is a(n)
SO 7 The use of OLAP and data mining as analysis tools
Data Analysis Tools
Data mining is the process of searching for identifiable patterns in data that can be used to predict future behavior.
Online analytical processing (OLAP) is a set of software tools that allow online analysis of the data within a data warehouse. Analytical methods in OLAP usually include:
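Two operations commonly associated with OLAP tools, consolidation (rolling detail up into totals) and drill down (breaking a total back into its detail), can be sketched as follows. These two are chosen as an assumption, since the slide's own list of methods is not present in the source; the sales rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical warehouse rows: (region, store, quarter, sales).
facts = [
    ("West", "Los Angeles", "Q1", 100),
    ("West", "Los Angeles", "Q2", 120),
    ("West", "Seattle", "Q1", 80),
    ("East", "Boston", "Q1", 90),
]

def consolidate(rows, level):
    """Roll sales up to one dimension: 0 = region, 1 = store, 2 = quarter."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[level]] += row[3]
    return dict(totals)

def drill_down(rows, region):
    """Break one region's total into its store-level detail."""
    return consolidate([r for r in rows if r[0] == region], level=1)

by_region = consolidate(facts, level=0)
west_detail = drill_down(facts, "West")
print(by_region)
print(west_detail)
```

A manager who sees the regional totals can drill into one region to find which stores account for its result.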
SO 8 Distributed databases and advantages of the use of distributed data
Distributed Data Processing
Today’s IT Environment
Distributed data processing (DDP)
Distributed databases (DDB)
Real World
McDonald’s has restaurants, warehouses, and offices located throughout the world; yet its corporate headquarters is in Oak Brook, Illinois. If McDonald’s management decided that all data, including prices, must be stored in a database at corporate headquarters, what would have to happen when you order a cheeseburger at a McDonald’s in Los Angeles? The cash register system would have to read pricing data from the database in Oak Brook, Illinois. This would be inefficient for several reasons. First, each McDonald’s restaurant would be trying to read the same database simultaneously in order to fill customer orders all around the world. Each of the McDonald’s restaurants would need to be networked to that data in Illinois and would need to be able to read price data quickly in order to process the sale. This would generate so much network traffic that it would very likely overwhelm the network and computer system. In addition, if prices are stored only at corporate headquarters, it would become more difficult for each location to set its own prices. Certainly, it would be much more efficient for McDonald’s to maintain pricing data at the local restaurants or in regional centers.
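The trade-off in this example can be sketched in code. The classes CentralDatabase and LocalStore, the prices, and the read counter are hypothetical and stand in for a real network and database: a restaurant that keeps its own copy of the price data serves every sale locally and can set its own prices, while the headquarters database handles no per-sale reads at all.

```python
class CentralDatabase:
    """Hypothetical headquarters database; counts the reads it serves."""

    def __init__(self, prices):
        self.prices = prices
        self.reads = 0

    def get_price(self, item):
        self.reads += 1          # every call represents a network round trip
        return self.prices[item]

class LocalStore:
    """Hypothetical restaurant keeping its own copy of the price data."""

    def __init__(self, central):
        self.local_prices = dict(central.prices)   # one-time replication

    def set_price(self, item, price):
        self.local_prices[item] = price            # local pricing control

    def get_price(self, item):
        return self.local_prices[item]             # no network round trip

central = CentralDatabase({"cheeseburger": 1.00})
los_angeles = LocalStore(central)
los_angeles.set_price("cheeseburger", 1.29)

for _ in range(1000):                              # 1,000 local sales
    price = los_angeles.get_price("cheeseburger")

print(price, central.reads)
```

Had each of the 1,000 sales called central.get_price instead, the headquarters database would have served 1,000 reads for this one restaurant alone.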
Distributing the processing and data offers the following advantages:
1. Reduced hardware cost
2. Improved responsiveness
3. Easier incremental growth
4. Increased user control and user involvement
5. Automatic integrated backup
The most popular type of distributed system is a client/server system.
Concept Check
A set of small databases where data are collected, processed, and stored on multiple computers within a
... Smart-Cloud)
A company can buy data storage from these providers.
This arrangement is called Database as a Service (DaaS).
The cloud provider generally provides
► data storage space and
► software tools to manage and control the database.
Real World
The best-selling jet airplane of the Boeing Corporation is the 737. In 2011, Boeing rolled out a new function called “737 Explained,” a cloud-based database and application using Microsoft Azure cloud storage. This cloud database stores 20,000 high-resolution photos of the Boeing 737, which are accessible by the Boeing salespeople who may be traveling to any location in the world to seek customers. 737 Explained can show 360-degree tours of the airplane, as well as individual parts and features. The director of marketing at Boeing said, “737 Explained is one of the best marketing tools I’ve seen because it allows us to show prospective customers the new features and improvements without bringing them to an airport.”
SO 10 Controls for data and databases
IT Controls for Data and Databases
To ensure integrity (completeness and accuracy) of data in the database, IT application controls should be used. These controls are input, processing, and output controls, such as
1. data validation,
2. control totals and reconciliation, and
3. reports that are analyzed by managers.
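Two of the controls named above, data validation and control totals with reconciliation, can be sketched briefly. The validation rules, field names, and functions are hypothetical and far simpler than a production control: each incoming transaction is checked before it is accepted, and the total of accepted amounts is compared with the total actually posted.

```python
def validate(txn):
    """Input control: return a list of validation errors for one transaction."""
    errors = []
    if not txn.get("customer_id"):
        errors.append("missing customer_id")
    if not isinstance(txn.get("amount"), (int, float)) or txn["amount"] <= 0:
        errors.append("amount must be a positive number")
    return errors

def control_total(txns):
    """Sum of amounts in cents, used to reconcile input against posted data."""
    return sum(round(t["amount"] * 100) for t in txns)

batch = [
    {"customer_id": "C001", "amount": 120.00},
    {"customer_id": "", "amount": 15.00},        # fails validation
    {"customer_id": "C002", "amount": 89.50},
]

accepted = [t for t in batch if not validate(t)]
rejected = [t for t in batch if validate(t)]
posted = list(accepted)                          # stand-in for the database write

# Processing control: accepted and posted totals must agree.
reconciled = control_total(accepted) == control_total(posted)
print(len(accepted), len(rejected), reconciled)
```

Totals are kept in whole cents so that the reconciliation compares exact integers rather than floating-point sums. The rejected list is the kind of exception report a manager would review.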
SO 11 Ethical issues related to data collection and storage, and their use in IT systems
Ethical Issues Related to Data Collection
Ethical Responsibilities of the Company
Data collected and stored in databases in many instances consist of information that is private between the company and its customer.
Ten privacy practices for online companies:
7. Disclosure to third parties
8. Security for privacy
9. Quality
10. Monitoring and enforcement
Real World
No matter how extensive the controls in place, it is never possible to completely eliminate unauthorized access. In April of 2011, Netflix disclosed that it had fired an unnamed call center employee for stealing credit card information from customers he had spoken with on the phone. The company declined to disclose the number of customers affected. The “monitoring and enforcement” practice mentioned above is intended to help discover problems such as this and to fix them quickly. In this case, a Netflix spokesperson said, “We do everything we can to safeguard our members’ personal data and privacy, and when there’s an issue like this, we deal with it swiftly and decisively.”