Machine Learning
■ Machine learning is a scientific discipline that explores the construction and study of algorithms that learn from data
■ ML is used to train computers to do things that are difficult or impossible to program explicitly in advance (e.g., handwriting recognition, fraud detection)
■ ML is an important part of Data Mining, KDD, and Data Science
■ ML has strong ties to statistics and mathematical optimization; statistics and optimization techniques are usually at the core of ML algorithms
Examples
■ Predicting stock prices based on current and historical data
■ Predicting how much inventory to stock in the case of hurricanes (Walmart)
■ How to group customers based on their characteristics and buying behaviors?
■ Email classification (spam vs. non-spam)
■ Predicting which customers will quit using your service (MegaTelCo)
Machine Learning Tasks
■ Supervised learning
■ Unsupervised learning
■ Reinforcement learning
Supervised Learning
■ Given a set of example input-output pairs (the training data), the goal of supervised learning is to learn/find a general rule that maps inputs to outputs
– Classification: target variable is discrete (e.g., spam email)
– Regression: target variable is real-valued (e.g., stock price)
[Diagram: a learning algorithm takes a training set as input and produces a mapping function f]
Example: User's Preferences
The dataset's columns are: Example, Author, Thread, Length, Where Read, and User Action (the target).
These are some training and test examples obtained from observing a user deciding whether to read articles posted to a threaded discussion board, depending on whether the author is known or not (source: http://artint.info/html/ArtInt_171.html).
Example: Write-off
Figure 3-1. Data mining terminology for a supervised classification problem. The problem is supervised because it has a target attribute and some "training" data where we know the value for the target attribute. It is a classification (rather than regression) problem because the target is a category (yes or no) rather than a number.
…the Black-Scholes model of option pricing, and so on. Each of these abstracts away details that are not relevant to their main purpose and keeps those that are.
In data science, a predictive model is a formula for estimating the unknown value of interest: the target. The formula could be mathematical, or it could be a logical statement such as a rule. Often it is a hybrid of the two. Given our division of supervised data mining into classification and regression, we will consider classification models (and class-probability estimation models) and regression models.
Terminology: Prediction
In common usage, prediction means to forecast a future event. In data science, prediction more generally means to estimate an unknown value. This value could be something in the future (in common usage, true prediction), but it could also be something in the present or in the past. Indeed, since data mining usually deals with historical data, models very often are built and tested using events from the past. Predictive models for credit scoring estimate the likelihood that a potential customer will default (become a write-off). Predictive models for spam filtering estimate whether a given piece of email is spam. Predictive models for fraud detection judge whether an account has been defrauded.
(Provost and Fawcett, 2013)
Supervised learning algorithms include:
➡ Artificial Neural Network (ANN)
➡ Support Vector Machine (SVM)
Example: Write-off
■ Solving the write-off problem (binary classification) with a decision tree algorithm (see the sketch below)
Figure 3-3. Entropy of a two-class set as a function of p(+).

entropy(S) = -0.7 × log2(0.7) - 0.3 × log2(0.3)
           ≈ -0.7 × (-0.51) - 0.3 × (-1.74)
           ≈ 0.88
Entropy is only part of the story. We would like to measure how informative an attribute is with respect to our target: how much gain in information it gives us about the value of the target variable. An attribute segments a set of instances into several subsets. Entropy only tells us how impure one individual subset is. Fortunately, with entropy to measure how disordered any set is, we can define information gain (IG) to measure how much an attribute improves (decreases) entropy over the whole segmentation it creates. Strictly speaking, information gain measures the change in entropy due to any amount of new information being added; here, in the context of supervised segmentation, we consider the information gained by splitting the set on all values of a single attribute. Let's say the attribute we split on has k different values. Let's call the original set of examples the parent set, and the result of splitting on the attribute values the k children sets. Thus, information gain is a function of both a parent set and of the children.
entropy = -p1 × log2(p1) - p2 × log2(p2) - …
where pi is the probability (relative proportion) of class i in the set

IG(parent, children) = entropy(parent) - [p(c1) × entropy(c1) + p(c2) × entropy(c2) + …]
where p(ci) is the proportion of the parent's instances that fall into child ci
Example: Write-off
[Diagram: the parent set has entropy 0.99 (high impurity); splitting on an attribute yields child segments with entropies 0.79 and 0.39]
Example: Write-off
Figure 3-15. A classification tree and the partitions it imposes in instance space. The black dots correspond to instances of the class Write-off; the plus signs correspond to instances of class non-Write-off. The shading shows how the tree leaves correspond to segments of the population in instance space.
(Provost and Fawcett, 2013)
Example: Write-off
■ Linear discriminant function: perceptron, logistic regression, support vector machine
Figure 4-5. Many different possible linear boundaries can separate the two groups of points of Figure 4-4.
Unfortunately, it's not trivial to choose the "best" line to separate the classes. Let's consider a simple case, illustrated in Figure 4-4. Here the training data can indeed be separated by class using a linear discriminant. However, as shown in Figure 4-5, there actually are many different linear discriminants that can separate the classes perfectly. They have very different slopes and intercepts, and each represents a different model of the data. In fact, there are infinitely many lines (models) that classify this training set perfectly. Which should we pick?

Optimizing an Objective Function
This brings us to one of the most important fundamental ideas in data mining—one that surprisingly is often overlooked even by data scientists themselves: we need to ask, what should be our goal or objective in choosing the parameters? In our case, this would allow us to answer the question: what weights should we choose? Our general procedure will be to define an objective function that represents our goal, and can be calculated for a particular set of weights and a particular set of data. We will then find the optimal value for the weights by maximizing or minimizing the objective function. What can easily be overlooked is that these weights are "best" only if we believe that the objective function truly represents what we want to achieve, or practically speaking, is the best proxy we can come up with. We will return to this later in the book.
Unfortunately, creating an objective function that matches the true goal of the data mining is usually impossible, so data scientists often choose based on faith² and experience. (² And sometimes it can be surprisingly hard for them to admit it.)
f(x) = w0 + w1·x1 + w2·x2 + …
Unsupervised Learning
■ Unsupervised learning studies how systems can learn to
represent particular input patterns (unlabeled) in a way that reflects the statistical structure of the overall collection of
input patterns
- Clustering
- Principal components analysis (PCA)
- Self-organizing map (SOM)
- Evolutionary Computation
Cluster Analysis
■ Cluster analysis aims to search for patterns in a data set by
grouping the (multivariate) observations into clusters
■ The goal is to find an optimal grouping for which the
observations or objects within each cluster are similar, but the
clusters are dissimilar to each other
K-means clustering
1. Randomly assign a number, from 1 to K, to each of the observations. These serve as initial cluster assignments for the observations.
2. Iterate until the cluster assignments stop changing:
   - For each of the K clusters, compute the cluster centroid. The kth cluster centroid is the vector of the p feature means for the observations in the kth cluster.
   - Assign each observation to the cluster whose centroid is closest (based on Euclidean distance).
A minimal sketch of this loop follows below.
Example: K = 3
FIGURE 10.6. The progress of the K-means algorithm on the example of Figure 10.5 with K=3. Top left: the observations are shown. Top center: in Step 1 of the algorithm, each observation is randomly assigned to a cluster. Top right: in Step 2(a), the cluster centroids are computed. These are shown as large colored disks. Initially the centroids are almost completely overlapping because the initial cluster assignments were chosen at random. Bottom left: in Step 2(b), each observation is assigned to the nearest centroid. Bottom center: Step 2(a) is once again performed, leading to new cluster centroids. Bottom right: the results obtained after ten iterations.

Because K-means finds a local rather than a global optimum, it is important to run the algorithm multiple times from different random initial configurations. Then one selects the best solution, i.e., that for which the objective (10.11) is smallest. Figure 10.7 shows the local optima obtained by running K-means clustering six times using six different initial cluster assignments, using the toy data from Figure 10.5. In this case, the best clustering is the one with an objective value of 235.8.

As we have seen, to perform K-means clustering, we must decide how many clusters we expect in the data. The problem of selecting K is far from simple. This issue, along with other practical considerations that arise in performing K-means clustering, is addressed in Section 10.3.3.
Example: group similar news
Source: http://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/

Example: Facebook friends
Reinforcement Learning
■ Reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal
■ Learning about, from, and while interacting with an external environment
■ The learner is not told which actions to take, as in most forms
of machine learning, but instead must discover which actions yield the most reward by trying them
Practical Issues
■ Feature scaling/normalization: standardize the range of independent variables or features of the data (see the sketch below)
■ Feature manipulation: includes feature selection and feature construction
■ Interpretability: how easily we can explain the results/models obtained by ML algorithms
A quick demonstration
■ Titanic
[Timeline figure: Francis Galton and Karl Pearson quantify the relationship between offspring and parental characteristics (regression); R. A. Fisher uses linear discriminant function analysis to solve a taxonomic problem; 1950s-2000s: evolution of AI, machine learning, and …]
KDD/Data Mining
■ Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data [Fayyad]
■ Data Mining is a problem-solving methodology that finds a logical or mathematical description, of a complex nature, of patterns and regularities in a set of data [Decker and Focardi]
■ Data Mining is often related to learning/adaptive algorithms and methods
■ KDD/DM is not a set of new techniques but rather a multi-disciplinary field of research: many fields all make a contribution (more later)
Business Analytics
■ Business analytics (BA) refers to the skills, technologies, and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning [Bartlett, 2013]
■ Using data to make better decisions; basically operations research with an emphasis on data
[Venn diagram: Business Analytics at the intersection of Econometrics and Operations Research]

[Venn diagram: Business Analytics at the intersection of Econometrics, Operations Research, and Machine Learning]
Common applications of BA
Competing on Analytics
…that they recognize those methods' limitations—which factors are being weighed and which ones aren't. When the CEOs need help grasping quantitative techniques, they turn to experts who understand the business and how analytics can be applied to it. We interviewed several leaders who had retained such advisers, and these executives stressed the need to find someone who can explain things in plain language and be trusted not to spin the numbers.

A few CEOs we spoke with had surrounded themselves with very analytical people—professors, consultants, MIT graduates, and the like. But that was a personal preference rather than a necessary practice.

Of course, not all decisions should be grounded in analytics—at least not wholly so. Personnel matters, in particular, are often well and appropriately informed by instinct and anecdote. More organizations are subjecting recruiting and hiring decisions to statistical analysis (see the sidebar "Going to Bat for Stats"). But research shows that human beings can make quick, surprisingly accurate assessments of personality and character based on simple observations. For analytics-minded leaders, then, the challenge boils down to knowing when to run with the numbers and when to run with their guts.

Their Sources of Strength
Analytics competitors are more than simple number-crunching factories. Certainly, they apply technology—with a mixture of brute force and finesse—to multiple business problems. But they also direct their energies toward finding the right focus, building the right culture, and hiring the right people to make optimal use of the data they constantly churn. In the end, people and strategy, as much as information technology, give such organizations strength.

The right focus. Although analytics competitors encourage universal fact-based decisions, they must choose where to direct resource-intensive efforts. Generally, they pick several functions or initiatives that together serve an overarching strategy. Harrah's, for example, has aimed much of its analytical activity at increasing customer loyalty, customer service, and related areas like pricing and promotions. UPS has broadened its focus from logistics to customers, in the interest of providing superior service. While such multipronged strate…
THINGS YOU CAN COUNT ON
Analytics competitors make expert use of statistics and modeling to improve a wide variety of functions. Here are some common applications:
- Supply chain: simulate and optimize supply chain flows; reduce inventory and stock-outs. (Dell, Wal-Mart, Amazon)
- Customer selection, loyalty, and service: identify customers with the greatest profit potential; increase the likelihood that they will want the product or service offering; retain their loyalty. (Harrah's, Capital One, Barclays)
- Pricing: identify the price that will maximize yield, or profit. (Progressive, Marriott)
- Human capital: select the best employees for particular tasks or jobs, at particular compensation levels. (New England Patriots, Oakland A's, Boston Red Sox)
- Product and service quality: detect quality problems early and minimize them. (Honda, Intel)
- Financial performance: better understand the drivers of financial performance and the effects of nonfinancial factors. (MCI, Verizon)
- Research and development: improve quality, efficacy, and, where applicable, safety of products and services. (Novartis, Amazon, Yahoo)
Thomas H. Davenport (2006), "Competing on Analytics", Harvard Business Review