Lecture Business management information system - Lecture 26: Data mining

Lecture Business management information system - Lecture 26: Data mining. In this chapter, the following content will be discussed: What is data mining? Why data mining? What applications? What techniques? What process? What software?

Data Mining

Lecture 26

Today’s Lecture

• What is data mining?
• Why data mining?
• What applications?
• What techniques?
• What process?
• What software?

Data mining may be defined as follows:

Data mining is a collection of techniques for the efficient, automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases. The patterns must be actionable so that they can be used in an enterprise’s decision making.

What is Data Mining?

• Efficient, automated discovery of previously unknown patterns in large volumes of data
• Patterns must be valid, novel, useful and understandable
• Businesses are mostly interested in discovering past patterns to predict future behaviour
• A data warehouse, as discussed earlier, is an enterprise’s memory. Data mining can provide intelligence using that memory

• amazon.com uses associations. Recommendations to customers are based on past purchases and on what other customers are purchasing
• “Just for Feet”, a retailer in the USA, has about 200 stores, each carrying up to 6000 shoe styles, each style in several sizes. Data mining is used to find the right shoes to stock in the right store
• More examples in case studies to be discussed later

Data Mining

• We assume we are dealing with large data, perhaps gigabytes, perhaps terabytes
• Although data mining is possible with smaller amounts of data, the bigger the data, the higher the confidence in any unknown pattern that is discovered
• There is considerable hype about data mining at present, and Gartner Group has listed data mining as one of the top ten technologies to watch
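The link between data volume and confidence can be illustrated with a standard statistical result that is not part of the lecture itself: under the normal approximation, the 95% margin of error of an observed rate shrinks in proportion to the square root of the sample size. The sketch below assumes a hypothetical pattern seen in 30% of records; the function name and all figures are illustrative.

```python
import math

def margin_of_error(rate, n, z=1.96):
    """95% margin of error for an observed proportion (normal approximation)."""
    return z * math.sqrt(rate * (1 - rate) / n)

# Hypothetical pattern observed in 30% of records, at growing sample sizes.
for n in (100, 10_000, 1_000_000):
    points = margin_of_error(0.30, n) * 100
    print(f"n={n:>9,}: 30% +/- {points:.2f} percentage points")
```

Each hundredfold increase in the number of records cuts the margin of error by a factor of ten, which is one way to read the statement that bigger data gives higher confidence.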

Why Data Mining Now?

• Growth in generation and storage of corporate data: the information explosion
• Need for sophisticated decision making: current database systems are Online Transaction Processing (OLTP) systems. OLTP data is difficult to use for such applications. Why?
• Evolution of technology: from much cheaper storage, easier data collection and better database management to data analysis and understanding

Information explosion

• Database systems have been in use since the 1960s in Western countries (perhaps since the 1980s in India). These systems have generated mountains of data
• Point-of-sale terminals and bar codes on many products, railway bookings, educational institutions, the huge number of mobile phones and electronic commerce all generate data
• Government is now collecting a lot of information

Information explosion

• Internet banking via networked computers and ATMs
• Credit and debit cards
• Medical data, doctors, hospitals
• Transportation: Indian railways, automatic toll collection on toll roads, growing air travel
• Passports, NRI visas, other visas, NRI money transfers

Question: Can you think of other examples of data collection?

Information explosion

Many adults in India generate:

• Mobile phone transactions. More than 300 million phones in India, reportedly growing at the rate of 10,000 new ones every hour! Mobile companies must save information about calls
• Growing middle class with a growing number of credit and debit card transactions. About 25m credit cards and 70m debit cards in 2007. Annual growth rates of about 30% and 40% respectively. Could be 55m credit cards and 200m debit cards in 2010, resulting in perhaps 500m
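The 2010 card projections follow from compounding the 2007 figures at the stated growth rates for three years. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def project(count_millions, annual_growth, years):
    """Compound a starting count forward at a fixed annual growth rate."""
    return count_millions * (1 + annual_growth) ** years

credit_2010 = project(25, 0.30, 3)   # 25m credit cards in 2007, 30% a year
debit_2010 = project(70, 0.40, 3)    # 70m debit cards in 2007, 40% a year
print(f"credit cards in 2010: about {credit_2010:.0f}m")
print(f"debit cards in 2010: about {debit_2010:.0f}m")
```

This gives roughly 55m credit cards and 192m debit cards, consistent with the rounded 55m and 200m quoted on the slide.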

Information explosion

• India has some huge enterprises, for example Indian Railways, perhaps the busiest network in the world, with 2.5m employees, 10,000 locomotives, 10,000 passenger trains daily, 10,000 freight trains daily and 20m passengers daily
• Growing airline traffic with more than ten airlines. Perhaps 30m passengers annually
• Growing number of motor vehicles: registration, insurance, driver licence
• Internet surfing records

As noted earlier, most enterprise database systems were designed in the 1970s or 1980s, mainly to automate office procedures such as order entry, student enrolment, patient registration and airline reservations. These are well-structured, repetitive operations that are easily automated.

Decision Making

• Need for business memory and intelligence
• Need to serve customers better by learning from past interactions
• OLTP data is not a good basis for maintaining an enterprise memory
• The intelligence hidden in data could be the secret weapon in a competitive business world, but given the information explosion, not even a small fraction of it could be examined by the human eye

Question: Why is OLTP not good for maintaining an enterprise memory?

OLTP vs Decision Making

The clerical view of data focuses on the details required for the day-to-day running of an enterprise.

The management view of data focuses on summary data to identify trends, challenges and opportunities.

The detailed data view is the operational view, while the management view is the decision-support view.

Comparison of the two views:

Operational vs Management View
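The contrast between the two views can be sketched in a few lines of code. The example below is only an illustration: the order records, field names and function names are all hypothetical. The operational view retrieves the detail of one transaction, while the management view summarises all of them.

```python
from collections import defaultdict

# Hypothetical order records, as an OLTP system might store them.
orders = [
    {"order_id": 1, "month": "2007-01", "customer": "A", "amount": 120.0},
    {"order_id": 2, "month": "2007-01", "customer": "B", "amount": 80.0},
    {"order_id": 3, "month": "2007-02", "customer": "A", "amount": 200.0},
]

def find_order(order_id):
    """Operational view: fetch the full detail of a single transaction."""
    return next(o for o in orders if o["order_id"] == order_id)

def sales_by_month():
    """Management view: summarise every transaction to expose a trend."""
    totals = defaultdict(float)
    for o in orders:
        totals[o["month"]] += o["amount"]
    return dict(totals)

print(find_order(2))
print(sales_by_month())
```

The first call answers a clerical question about one order; the second scans every record to produce the kind of summary a decision maker would look at.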

Question: How much does a 100GB disk cost? What does a PC cost, and what is its CPU performance?

Decline in Hard Drive cost

Growth in Worldwide Disk Capacity

Evolution of Technology

Question: What do the graphs in the last two slides tell us? What scales are used in them? What was the pink line in the first graph?

Evolution of Technology

• Database technology has improved over the years
• Data collection is often much better and cheaper now
• The need for analyzing and synthesizing information is growing in today’s fiercely competitive business environment

Question: Why can OLTP not be used for sales forecasting and analysis?

Why Data Mining Now?

As noted earlier, the reasons may be summarized as:

• Accumulation of large amounts of data
• Increased affordable computing power enabling data mining processing
• Statistical and learning algorithms
• Availability of software
• Strong business competition

Large amount of data

As already discussed, many enterprises have large amounts of data accumulated over 30+ years.

As noted earlier, some enterprises collect information specifically for analysis. For example, supermarkets in the USA offer loyalty cards in exchange for shopper information. Loyalty cards in Australia also collect information using a reward system.

Growth of cards

A recent survey in the USA found that the percentages of US adults using the following types of cards were:

Affordable computing power

A variety of statistical and learning algorithms from fields like statistics and artificial intelligence have been adapted for data mining. With the new focus on data mining, new algorithms are being developed.

Availability of Software

A large variety of DM software is now available. Some of the more widely used packages are:

• IBM - Intelligent Miner and more
• SAS - Enterprise Miner
• Silicon Graphics - MineSet
• Oracle - Thinking Machines - Darwin
• Angoss - knowledgeSEEKER

Strong Business Competition

In finance, telecom, insurance and retail:

• Loan/credit card approval

Loan/Credit card approvals

In a modern society, a bank does not know its customers. The only knowledge a bank has is the information about them stored in its computers.

Credit agencies and banks collect a lot of behavioural data about customers from many sources. This information is used to predict the chances of a customer paying back a loan.
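One very simple way to picture such a prediction is to group past loans by a behavioural attribute and use the observed repayment fraction in each group as an estimate. This is only an illustrative sketch, not the method any bank or credit agency is known to use; the records, the choice of attribute and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical past loans: (late payments on record, whether the loan was repaid)
history = [(0, True), (0, True), (0, True), (0, False),
           (1, True), (1, False),
           (2, False), (2, False), (2, True), (2, False)]

def repayment_rates(records):
    """Fraction of past loans repaid, grouped by late-payment count."""
    repaid = defaultdict(int)
    total = defaultdict(int)
    for late, was_repaid in records:
        total[late] += 1
        repaid[late] += was_repaid      # True counts as 1, False as 0
    return {late: repaid[late] / total[late] for late in total}

rates = repayment_rates(history)
for late in sorted(rates):
    print(f"{late} late payments: estimated repayment chance {rates[late]:.0%}")
```

Real credit scoring combines many more variables, but the principle of estimating future behaviour from recorded past behaviour is the same.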

Market Segmentation

• Large amounts of data about customers contain valuable information
• The market may be segmented into many subgroups according to variables that are good discriminators
• It is not always easy to find variables that will help in market segmentation

Fraud Detection

• Very challenging, since it is difficult to define the characteristics of fraud. Detection is often based on spotting changes from the norm
• In statistics it is common to throw out outliers, but in data mining it may be useful to identify them, since they could be due either to errors or perhaps to fraud
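Detecting changes from the norm can be sketched with a basic z-score test: flag any value that lies more than a chosen number of standard deviations from the mean. The threshold of 2.0, the function name and the spending figures are assumptions made for this illustration, not part of the lecture.

```python
import statistics

def flag_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * spread]

# Hypothetical daily card spend for one customer; the last value is unusual.
daily_card_spend = [42, 38, 45, 40, 41, 39, 44, 43, 40, 950]
print(flag_outliers(daily_card_spend))
```

A flagged value is a candidate for review, not a verdict, since it could be a data error as easily as fraud. A single extreme value also inflates the standard deviation, so more robust measures may be preferable on real data.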

Better Marketing

When customers buy new products, other products may be suggested to them when they are ready.

As noted earlier, in mail order marketing for example, one wants to know:

- will the customer respond?
- will the customer buy, and how much?
- will the customer return the purchase?
- will the customer pay for the purchase?

Better Marketing

It has been reported that some mail order marketing companies hold more than 1000 variable values on each customer.

The aim is to “lift” the response rate.
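Lift is commonly measured as the response rate of a targeted mailing divided by the response rate of an untargeted one. The sketch below uses invented campaign figures and hypothetical function names to show the calculation.

```python
def response_rate(responses, mailed):
    """Fraction of mailed customers who responded."""
    return responses / mailed

def lift(targeted_rate, baseline_rate):
    """How many times better the targeted mailing did than the baseline."""
    return targeted_rate / baseline_rate

baseline = response_rate(200, 10_000)   # hypothetical untargeted mailing: 2%
targeted = response_rate(120, 2_000)    # hypothetical model-selected mailing: 6%
print(f"lift = {lift(targeted, baseline):.1f}")
```

With these figures the lift is 3.0: the selected customers respond three times as often as customers mailed at random.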

Trend analysis

In a large company, not all trends are always visible to the management. It is then useful to use data mining software that will identify trends.

Trends may be long-term trends, cyclic trends or seasonal trends.
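One simple way to separate a long-term trend from a seasonal pattern is a moving average whose window matches the length of the season. The sketch below assumes hypothetical quarterly sales with a peak every fourth quarter; the data and function name are illustrative.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical quarterly sales over three years:
# a seasonal peak every fourth quarter on top of slow growth.
sales = [100, 110, 105, 150, 108, 118, 113, 158, 116, 126, 121, 166]
trend = moving_average(sales, window=4)
print(trend)
```

Because every window of four quarters contains exactly one seasonal peak, the peaks cancel out and the averaged series rises steadily, exposing the underlying growth.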

Market Basket Analysis

• Aims to find what customers buy and what they buy together
• This may be useful in designing store layouts or in deciding which items to put on sale
• Basket analysis can also be used for applications other than just analysing what items customers buy together
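Finding what customers buy together is usually expressed through two measures: support (how often a pair of items appears among all baskets) and confidence (how often the second item appears given the first). The baskets and function names below are hypothetical and serve only to illustrate the two measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets
                      for pair in combinations(sorted(b), 2))

def support(a, b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)

def confidence(a, b):
    """Of the baskets containing a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]

print(support("bread", "milk"), confidence("bread", "milk"))
print(confidence("butter", "bread"))
```

Here bread and milk appear together in three of five baskets (support 0.6), and three of the four bread baskets also contain milk (confidence 0.75). Every basket with butter also contains bread, so that rule has confidence 1.0.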

makes customers loyal.

• It is cheaper to develop a retention plan and retain an old customer than to bring in a new customer

holders that don’t use the card, bank customers with very small amounts of money in their accounts

Web site design

• A Web site is effective only if visitors easily find what they are looking for
• Data mining can help discover the affinity of visitors to pages, and the site layout may be modified based on this information

Data Mining Process

Successful data mining involves carefully determining the aims and selecting appropriate data.

The following steps should normally be followed:

1. Requirements analysis
2. Data selection and collection
3. Cleaning and preparing data
4. Data mining exploration and validation
5. Implementing, evaluating and monitoring
6. Results visualisation

Requirements Analysis

The enterprise decision makers need to formulate goals that the data mining process is expected to achieve. The business problem must be clearly defined. One cannot use data mining without a good idea of what kind of outcomes the enterprise is looking for.

If objectives have been clearly defined, it is easier to evaluate the results of the project.

Data Selection and Collection

Find the best source databases for the data that is required. If the enterprise has implemented a data warehouse, then most of the data could be available there. Otherwise, source OLTP systems need to be identified and the required information extracted and stored in some temporary system.

In some cases, only a sample of the available data may be required.
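When only a sample is needed, a simple random sample without replacement is a common starting point. The sketch below assumes a hypothetical table of customer identifiers and a 5% sampling fraction; the function name and the fixed seed are choices made for the illustration.

```python
import random

def sample_records(records, fraction, seed=0):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)          # fixed seed makes the sample repeatable
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)

customer_ids = list(range(1, 10_001))   # hypothetical customer table keys
subset = sample_records(customer_ids, fraction=0.05)
print(len(subset))
```

Fixing the seed means the same subset can be drawn again later, which helps when results from different techniques need to be compared on identical data.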

Cleaning and Preparing Data

This may not be an onerous task if a data warehouse containing the required data exists, since most of this work must already have been done when the data was loaded into the warehouse.

Otherwise this task can be very resource intensive; perhaps more than 50% of the effort in a data mining project is spent on this step. Essentially, a data store that integrates data from a number of databases may need to be created. When integrating data, one often encounters problems like identifying data, dealing with missing data, data conflicts and ambiguity. An ETL (extraction, transformation and loading) tool may be used to overcome these problems.
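The integration problems listed above can be made concrete with a small sketch. It is not an ETL tool, only an illustration under stated assumptions: the records, field names and the rule of keeping the first non-missing value on conflict are all hypothetical.

```python
def clean(records):
    """Normalise names, drop unidentifiable records, merge duplicates by id."""
    merged = {}
    for rec in records:
        cust_id = rec.get("id")
        if cust_id is None:                 # record cannot be identified: skip
            continue
        name = (rec.get("name") or "").strip().title()
        city = rec.get("city")              # may be missing (None)
        current = merged.setdefault(
            cust_id, {"id": cust_id, "name": name, "city": city})
        # Fill gaps from later records; keep the first value on conflict.
        if not current["name"]:
            current["name"] = name
        if current["city"] is None:
            current["city"] = city
    return list(merged.values())

raw = [
    {"id": 1, "name": " alice smith ", "city": None},      # from system A
    {"id": 1, "name": "Alice Smith", "city": "Chennai"},   # from system B
    {"id": None, "name": "unknown", "city": "Delhi"},      # no usable key
    {"id": 2, "name": "RAVI KUMAR", "city": "Mumbai"},
]
print(clean(raw))
```

The two records for customer 1 are merged into one, the missing city is filled from the second source, and the record without an identifier is dropped.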

Exploration and Validation

Assuming that the user has access to one or more data mining tools, a data mining model may be constructed based on the enterprise’s needs. It may be possible to take a sample of data and apply a number of relevant techniques. For each technique, the results should be evaluated and their significance interpreted.

This is likely to be an iterative process, which should lead to the selection of one or more techniques that are suitable for further exploration, testing and validation.
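Applying several techniques to a sample and evaluating each can be sketched with a holdout split: build each model on one part of the data and measure its accuracy on the part it has not seen. The two techniques, the data and all names below are hypothetical and deliberately simple.

```python
# Hypothetical labelled records: (annual_spend, responded_to_offer)
data = [(120, False), (900, True), (300, False), (750, True),
        (200, False), (820, True), (450, False), (600, True),
        (150, False), (700, True), (500, True), (350, False)]

train, test = data[:8], data[8:]    # hold out the last third for validation

def majority_model(train_rows):
    """Technique 1: always predict the most common outcome in training."""
    positives = sum(label for _, label in train_rows)
    guess = positives * 2 >= len(train_rows)
    return lambda spend: guess

def threshold_model(train_rows):
    """Technique 2: pick the spend cut-off that best separates training rows."""
    def correct(cut):
        return sum((spend >= cut) == label for spend, label in train_rows)
    best_cut = max((spend for spend, _ in train_rows), key=correct)
    return lambda spend: spend >= best_cut

def accuracy(model, rows):
    """Fraction of rows whose outcome the model predicts correctly."""
    return sum(model(spend) == label for spend, label in rows) / len(rows)

for name, build in (("majority", majority_model), ("threshold", threshold_model)):
    print(name, accuracy(build(train), test))
```

On this data the threshold rule scores 0.75 on the held-out rows against 0.5 for the majority guess. The threshold rule fits the training rows perfectly but not the held-out ones, which is why validation on unseen data matters.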

Implementing, Evaluating and Monitoring

Once a model has been selected and validated, the model can be implemented for use by the decision makers. This may involve software development for generating reports or for results visualisation and explanation for managers.

If more than one technique is available for the given data mining task, it is necessary to evaluate the results and choose the best. This may involve checking the accuracy and effectiveness of each technique.

Implementing, Evaluating and Monitoring

Regular monitoring of the performance of the techniques that have been implemented is required. Every enterprise evolves with time and so must the data mining system. Monitoring may from time to time lead to the refinement of tools and techniques that have been implemented.
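Regular monitoring can be as simple as comparing recent performance with the level measured at validation time. The sketch below is one possible check, not a prescribed procedure; the tolerance, window size, function name and accuracy figures are all assumptions.

```python
def needs_review(accuracy_history, baseline, tolerance=0.05, window=3):
    """Flag the model when recent average accuracy falls well below baseline."""
    recent = accuracy_history[-window:]
    return sum(recent) / len(recent) < baseline - tolerance

# Hypothetical monthly accuracy of a deployed model, validated at 0.82.
monthly_accuracy = [0.82, 0.81, 0.80, 0.76, 0.73, 0.71]
print(needs_review(monthly_accuracy[:3], baseline=0.82))
print(needs_review(monthly_accuracy, baseline=0.82))
```

The first three months stay within tolerance, while the most recent three have drifted far enough below the baseline to suggest that the technique should be refined.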

Results Visualisation

Explaining the results of data mining to the decision makers is an important step. Most DM software includes data visualisation modules, which should be used in communicating data mining results to the managers.

Clever data visualisation tools are being developed to display results that deal with more than two dimensions. The visualisation tools available should be tried and used if found effective for the given problem.

• What is data mining?
• Why data mining?
• What applications?
• What techniques?
• What process?
• What software?

Date posted: 18/01/2020, 17:28