Pattern Recognition
> Introduction to Pattern Recognition System
> Feature Extraction: Haar-like Feature, Integral Image
> Dimension Reduction: PCA
> Bayesian Decision Theory
> Bayesian Discriminant Function for Normal Density
> Linear Discriminant Analysis
> Linear Discriminant Functions
> Support Vector Machine
> K Nearest Neighbor
> Statistical Clustering
Introduction
Example applications:
— OCR (Optical Character Recognition)
— DNA sequence identification
Components of Pattern Classification System [6]
[Figure: processing pipeline with sensing devices, preprocessing, dimensionality reduction, prediction, and selection stages; associated elements include sensors, feature selection, and cross-validation]
Types of Prediction Problems [6]
■ Classification
  • The PR problem of assigning an object to a class
  • The output of the PR system is an integer label
    ◦ e.g., classifying a product as "good" or "bad" in a quality control test
■ Regression
  • A generalization of a classification task
  • The output of the PR system is a real-valued number
    ◦ e.g., predicting the share value of a firm based on past performance and stock market indicators
■ Clustering
  • The problem of organizing objects into meaningful groups
  • The system returns a (sometimes hierarchical) grouping of objects
    ◦ e.g., organizing life forms into a taxonomy of species
■ Description
  • The problem of representing an object in terms of a series of primitives
  • The PR system produces a structural or linguistic description
    ◦ e.g., labeling an ECG signal in terms of P, QRS and T complexes
Feature and Pattern [6]
■ Feature
  • A feature is any distinctive aspect, quality or characteristic
    ◦ Features may be symbolic (i.e., color) or numeric (i.e., height)
  • Definitions
    ◦ The combination of d features is represented as a d-dimensional column vector called a feature vector
    ◦ The d-dimensional space defined by the feature vector is called the feature space
    ◦ Objects are represented as points in feature space; this representation is called a scatter plot (see the sketch after this list)
■ Pattern
  • A pattern is a composite of traits or features characteristic of an individual
  • In classification tasks, a pattern is a pair of variables {x, ω}, where
    ◦ x is a collection of observations or features (feature vector)
    ◦ ω is the concept behind the observation (label)
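To make the feature-vector and scatter-plot definitions above concrete, here is a minimal Python sketch (not from the slides): it draws two hypothetical classes as points in a 2-D feature space. The class centers, spreads, and sample sizes are made-up illustrative values.

```python
# Minimal sketch: objects from two classes represented as 2-D feature vectors
# and visualized as a scatter plot in feature space. All values are illustrative.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Each row is one object; each column is one feature (d = 2 here).
class_1 = rng.normal(loc=[2.0, 3.0], scale=0.5, size=(50, 2))
class_2 = rng.normal(loc=[4.0, 5.0], scale=0.5, size=(50, 2))

plt.scatter(class_1[:, 0], class_1[:, 1], label="class 1")
plt.scatter(class_2[:, 0], class_2[:, 1], label="class 2")
plt.xlabel("feature 1")
plt.ylabel("feature 2")
plt.legend()
plt.title("Objects as points in a 2-D feature space (scatter plot)")
plt.show()
```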
Feature and Pattern [6]
■ What makes a "good" feature vector?
  • The quality of a feature vector is related to its ability to discriminate examples from different classes
    ◦ Examples from the same class should have similar feature values
    ◦ Examples from different classes have different feature values
  [Figure: examples of "good" features vs. "bad" features]
■ More feature properties
  [Figure: linear separability, non-linear separability, highly correlated features, multi-modal]
Classifier [6]
■ The task of a classifier is to partition feature space into class-labeled decision regions
  • Borders between decision regions are called decision boundaries
  • The classification of a feature vector x consists of determining which decision region it belongs to, and assigning x to this class
  [Figure: feature space partitioned into class-labeled decision regions]
■ A classifier can be represented as a set of discriminant functions
  • The classifier assigns a feature vector x to class ω_i if g_i(x) > g_j(x) ∀ j ≠ i (a minimal sketch follows)
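A minimal sketch of the discriminant-function view: evaluate one discriminant per class and assign x to the class with the largest value. The nearest-centroid discriminants and the class means below are illustrative assumptions, not the slides' specific classifier.

```python
# Minimal sketch: classification by maximizing a set of discriminant functions.
# Here g_i(x) is the negative squared distance to an (assumed) class mean.
import numpy as np

class_means = [np.array([0.0, 0.0]), np.array([3.0, 3.0]), np.array([0.0, 4.0])]
discriminants = [lambda x, m=m: -np.sum((x - m) ** 2) for m in class_means]

x = np.array([2.5, 2.0])
scores = [g(x) for g in discriminants]
i = int(np.argmax(scores))        # class i such that g_i(x) > g_j(x) for all j != i
print(f"x = {x} is assigned to class {i + 1}; discriminant values: {np.round(scores, 2)}")
```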
Pattern Recognition Approaches [6]
■ Statistical (StatPR)
  • Patterns are classified based on an underlying statistical model of the features
    ◦ The statistical model is defined by a family of class-conditional probability density functions P(x|c_i) (the probability of feature vector x given class c_i)
■ Neural (NeurPR)
  • Classification is based on the response of a network of processing units (neurons) to an input stimulus (pattern)
    ◦ "Knowledge" is stored in the connectivity and strength of the synaptic weights
  • NeurPR is a trainable, non-algorithmic, black-box strategy
  • NeurPR is very attractive since
    ◦ it requires minimum a priori knowledge
    ◦ with enough layers and neurons, an ANN can create any complex decision region
■ Syntactic (SyntPR)
  • Patterns are classified based on measures of structural similarity
    ◦ "Knowledge" is represented by means of formal grammars or relational descriptions (graphs)
  • SyntPR is used not only for classification, but also for description
    ◦ Typically, SyntPR approaches formulate hierarchical descriptions of complex patterns built up from simpler subpatterns
Pattern Recognition Approaches [6]
[Figure: comparison of the neural, statistical, and structural approaches; feature extraction counts the number of intersections and the number of right oblique lines. *Neural approaches may also employ feature extraction.]
Machine Perception [2]
"salmon"
FIGURE 1.1 The objects to be classified are first sensed by a transducer (camera),
whose signals are preprocessed Next the features are extracted and finally the clas-
sification is emitted, here either “salmon” or “sea bass.” Although the information flow
is often chosen to be from the source to the classifier, some systems employ information
flow in which earlier levels of processing can be altered based on the tentative or pre-
liminary response in later levels (gray arrows) Yet others combine two or more stages
into a unified step, such as simultaneous segmentation and feature extraction From:
Richard O Duda, Peter E Hart, and David G Stork, Pattern Classification Copyright
© 2001 by John Wiley & Sons, Inc
Preprocessing: use a segmentation operation to isolate fishes from one another and from the background.
Feature extraction: the information from a single fish is sent to a feature extractor, whose purpose is to reduce the data by measuring certain features or properties.
Feature Selection [2]
The length of the fish as a possible feature for discrimination
FIGURE 1.2 Histograms for the length feature for the two categories (salmon and sea bass). No single threshold value of the length will serve to unambiguously discriminate between the two categories; using length alone, we will have some errors. The value marked l* will lead to the smallest number of errors, on average. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
■ The length is a poor feature alone!
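As a rough illustration of picking the threshold l* that gives the smallest number of errors, the sketch below scans candidate thresholds on synthetic length data; the Gaussian length distributions and sample sizes are assumptions, not the textbook's measurements.

```python
# Minimal sketch: choose the single length threshold that minimizes training errors
# for the rule "decide sea bass if length > threshold" on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
salmon_len = rng.normal(loc=10.0, scale=2.0, size=200)    # hypothetical lengths
seabass_len = rng.normal(loc=14.0, scale=2.5, size=200)

lengths = np.concatenate([salmon_len, seabass_len])
labels = np.concatenate([np.zeros(200), np.ones(200)])    # 0 = salmon, 1 = sea bass

candidates = np.sort(lengths)
errors = [np.sum((lengths > t) != labels) for t in candidates]
best = candidates[int(np.argmin(errors))]
print(f"best threshold l* = {best:.2f}, training errors = {min(errors)} / {len(labels)}")
```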
FIGURE 1.3 Histograms for the lightness feature for the two categories. No single threshold value x* (decision boundary) will serve to unambiguously discriminate between the two categories; using lightness alone, we will have some errors. The value x* marked will lead to the smallest number of errors, on average. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
FIGURE 1.4 The two features of lightness and width for sea bass and salmon. The dark line could serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
[...] shown leads it to be classified as a sea bass. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
Generalization: Model Selection [3]
Polynomial Curve Fitting
[Figure: 0th-order and 1st-order polynomial fits]
Generalization: Model Selection [3]
Polynomial Curve Fitting, Over-fitting
Generalization: Sample Size [3]
Generalization: Regularization [3]
Polynomial Curve Fitting
Regularization: Penalize large coefficient values
E(w) = (1/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }^2 + (λ/2) ‖w‖^2
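A minimal sketch of fitting a polynomial under the regularized error function above, using the closed-form ridge solution w = (Φ^T Φ + λI)^{-1} Φ^T t. The sinusoidal data, the polynomial order M = 9, and the value of λ are illustrative assumptions.

```python
# Minimal sketch: regularized (ridge) polynomial curve fitting.
import numpy as np

rng = np.random.default_rng(2)
N, M, lam = 10, 9, 1e-3
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)   # noisy targets

Phi = np.vander(x, M + 1, increasing=True)                  # polynomial design matrix
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M + 1), Phi.T @ t)

print("fitted coefficients w:", np.round(w, 3))
print("regularized error E(w):", 0.5 * np.sum((Phi @ w - t) ** 2) + 0.5 * lam * np.sum(w ** 2))
```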
Learning and Adaptation [2]
Linear Discriminant Functions [6]
■ A two-class linear discriminant function has the form g(x) = w^T x + w_0
  • Decide x ∈ ω_1 if g(x) > 0, and x ∈ ω_2 if g(x) < 0 (a minimal sketch follows)
  • where w is the weight vector and w_0 is the threshold weight or bias (not to be confused with that of the bias-variance dilemma)
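A minimal sketch of the two-class rule above; the weight vector and bias values are arbitrary illustrative choices.

```python
# Minimal sketch: two-class linear discriminant g(x) = w^T x + w0.
import numpy as np

w = np.array([1.0, -2.0])   # weight vector (illustrative)
w0 = 0.5                    # threshold weight / bias (illustrative)

def classify(x):
    g = w @ x + w0
    return "omega_1" if g > 0 else "omega_2"

print(classify(np.array([3.0, 1.0])))   # g = 3 - 2 + 0.5 = 1.5 > 0  -> omega_1
print(classify(np.array([0.0, 1.0])))   # g = -2 + 0.5 = -1.5 < 0    -> omega_2
```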
Feature Extraction: Haar-like Feature, Integral Image
Haar-like Feature [7]
The simple features used are reminiscent of Haar basis functions which have been used by Papageorgiou et al. (1998).
Three kinds of features: two-rectangle feature, three-rectangle feature, and four-rectangle feature.
Given that the base resolution of the detector is 24×24, the exhaustive set of rectangle features is quite large: 160,000.
Haar-like Feature: Integral Image [7]
Rectangle features can be computed very rapidly using an intermediate representation for the image which we call the integral image. The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:
ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)
where ii(x, y) is the integral image and i(x, y) is the original image (see Fig. 2). Using the following pair of recurrences:
s(x, y) = s(x, y − 1) + i(x, y)    (1)
ii(x, y) = ii(x − 1, y) + s(x, y)    (2)
(where s(x, y) is the cumulative row sum, s(x, −1) = 0, and ii(−1, y) = 0), the integral image can be computed in one pass over the original image (a minimal sketch follows below).
[Figure 2: the value of the integral image at point (x, y) is the sum of all the pixels above and to the left.]
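A minimal Python sketch of recurrences (1) and (2): one pass over the image, maintaining the cumulative row sum s and reusing the previous column of ii. The small test image and the NumPy cumulative-sum check are illustrative.

```python
# Minimal sketch: compute the integral image in one pass using the recurrences
#   s(x, y)  = s(x, y-1)  + i(x, y)
#   ii(x, y) = ii(x-1, y) + s(x, y)
import numpy as np

def integral_image(img):
    h, w = img.shape
    s = np.zeros((h, w))    # cumulative row sums s(x, y)
    ii = np.zeros((h, w))   # integral image ii(x, y)
    for x in range(w):
        for y in range(h):
            s[y, x] = (s[y - 1, x] if y > 0 else 0.0) + img[y, x]
            ii[y, x] = (ii[y, x - 1] if x > 0 else 0.0) + s[y, x]
    return ii

img = np.arange(12, dtype=float).reshape(3, 4)
assert np.allclose(integral_image(img), img.cumsum(axis=0).cumsum(axis=1))  # sanity check
```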
Haar-like Feature: Integral Image [7]
Using the integral image, any rectangular sum can be computed in four array references (see Fig. 3).
The value of the integral image at location 1 is the sum of the pixels in rectangle A. The value at location 2 is A + B, at location 3 is A + C, and at location 4 is A + B + C + D.
The sum within D can be computed as 4 + 1 − (2 + 3).
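A minimal sketch of the four-reference rectangle sum. The helper rect_sum and its row/column indexing convention are assumptions made for illustration; it evaluates exactly the combination D = 4 + 1 − (2 + 3) from Fig. 3.

```python
# Minimal sketch: sum over rows r0..r1 and columns c0..c1 with four references
# into an inclusive integral image ii.
import numpy as np

def rect_sum(ii, r0, c0, r1, c1):
    total = ii[r1, c1]                       # "4": everything up to the bottom-right corner
    if r0 > 0:
        total -= ii[r0 - 1, c1]              # "2": strip above the rectangle
    if c0 > 0:
        total -= ii[r1, c0 - 1]              # "3": strip to the left of the rectangle
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]          # "1": corner subtracted twice, add it back
    return total

img = np.arange(36, dtype=float).reshape(6, 6)
ii = img.cumsum(axis=0).cumsum(axis=1)       # inclusive integral image
assert rect_sum(ii, 2, 1, 4, 3) == img[2:5, 1:4].sum()
```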
Our hypothesis, which is borne out by experiment, is that a very small number of these features can be combined to form an effective classifier. The main challenge is to find these features.
Dimension Reduction: PCA
Abstract [1]
Principal component analysis (PCA) is a technique that is useful for the compression and classification of data. The purpose is to reduce the dimensionality of a data set (sample) by finding a new set of variables, smaller than the original set of variables, that nonetheless retains most of the sample's information.
By information we mean the variation present in the sample, given by the correlations between the original variables. The new variables, called principal components (PCs), are uncorrelated, and are ordered by the fraction of the total information each retains.
Geometric Picture of Principal Components [1]
A sample of n observations in the 2-D space x = (x_1, x_2)
Goal: to account for the variation in a sample in as few variables as possible, to some accuracy
Geometric Picture of Principal Components [1]
• The 1st PC is a minimum distance fit to a line in X space
• The 2nd PC is a minimum distance fit to a line in the plane perpendicular to the 1st PC
PCs are a series of linear least squares fits to a sample, each orthogonal to all the previous.
Usage of PCA: Data Compression [1]
Because the kth PC retains the kth greatest fraction of the variation, we can approximate each observation by truncating the sum at the first m < p PCs.
Usage of PCA: Data Compression [1]
Reduce the dimensionality of the data from p to m < p by approximating
X ≈ X^(m) = Z^(m) (A^(m))^T
where Z^(m) is the n × m portion of Z, and A^(m) is the p × m portion of A.
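A minimal sketch of PCA-based compression as described above: center the data, take the eigenvectors of the covariance matrix as the columns of A, keep the first m PC scores, and reconstruct. The data dimensions and the random correlated sample are illustrative assumptions.

```python
# Minimal sketch: approximate X by Z^(m) A^(m)^T using the first m principal components.
import numpy as np

rng = np.random.default_rng(3)
n, p, m = 200, 5, 2
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # correlated sample (illustrative)
Xc = X - X.mean(axis=0)                                  # center the data

eigvals, A = np.linalg.eigh(np.cov(Xc, rowvar=False))    # columns of A: PC directions
order = np.argsort(eigvals)[::-1]                        # sort by decreasing variance
eigvals, A = eigvals[order], A[:, order]

Z = Xc @ A                                               # PC scores (n x p)
X_m = Z[:, :m] @ A[:, :m].T                              # rank-m approximation of Xc
print("fraction of variation retained:", eigvals[:m].sum() / eigvals.sum())
print("relative reconstruction error:", np.linalg.norm(Xc - X_m) / np.linalg.norm(Xc))
```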
Derivation of PCA using the Covariance Method [8]
Let X be a d-dimensional random vector expressed as a column vector. Without loss of generality, assume X has zero mean. We want to find a d × d orthonormal transformation matrix P such that
Y = P^T X
with the constraint that cov(Y) is a diagonal matrix and P^{−1} = P^T
⇒ Y = P^T X is a random vector with all its distinct components pairwise uncorrelated.
By substitution and matrix algebra, we obtain:
cov(Y) = P^T cov(X) P
Derivation of PCA using the Covariance Method [8]
Writing P as a matrix of column vectors, P = [P_1, P_2, ..., P_d], and cov(Y) as the diagonal matrix diag(λ_1, λ_2, ..., λ_d), and substituting into the equation above, we obtain:
[λ_1 P_1, λ_2 P_2, ..., λ_d P_d] = [cov(X) P_1, cov(X) P_2, ..., cov(X) P_d]
Notice that in λ_i P_i = cov(X) P_i, P_i is an eigenvector of the covariance matrix of X. Therefore, by finding the eigenvectors of the covariance matrix of X, we find a projection matrix P that satisfies the original constraints.
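A minimal numerical check of this derivation: take the columns of P to be the eigenvectors of cov(X); then cov(Y) = P^T cov(X) P comes out diagonal, i.e., the components of Y = P^T X are pairwise uncorrelated. The mixing matrix and sample size are illustrative assumptions.

```python
# Minimal sketch: verify that projecting onto the eigenvectors of cov(X)
# decorrelates the components (cov(Y) is diagonal).
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 10_000
X = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.5, 1.0]]) @ rng.normal(size=(d, n))   # correlated data, d x n
X = X - X.mean(axis=1, keepdims=True)                       # zero-mean column vectors

eigvals, P = np.linalg.eigh(np.cov(X))   # columns of P: eigenvectors of cov(X)
Y = P.T @ X                              # Y = P^T X
cov_Y = np.cov(Y)
off_diag = cov_Y - np.diag(np.diag(cov_Y))
print("largest off-diagonal entry of cov(Y):", np.abs(off_diag).max())   # ~ 0
```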
Bayesian Decision Theory
State of Nature [2]
We let ω denote the state of nature, with ω = ω_1 for sea bass and ω = ω_2 for salmon.
Because the state of nature is so unpredictable, we consider ω to be a variable that must be described probabilistically.
P(ω_1) = P(ω_2) (uniform priors); P(ω_1) + P(ω_2) = 1 (exclusivity and exhaustivity)
More generally, we assume that there is some a priori probability (or simply prior) P(ω_1) that the next fish is sea bass, and some prior probability P(ω_2) that it is salmon.
P(ω_1) + P(ω_2) = 1 (exclusivity and exhaustivity)
Decision rule with only the prior information (a minimal sketch follows):
Decide ω_1 if P(ω_1) > P(ω_2); otherwise decide ω_2
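A minimal sketch of the prior-only rule: with no measurement, every fish receives the label of the class with the larger prior, and the probability of error equals the smaller prior. The prior values are illustrative assumptions.

```python
# Minimal sketch: decision using only prior probabilities.
P_w1, P_w2 = 0.7, 0.3            # P(omega_1) = sea bass, P(omega_2) = salmon (illustrative)
decision = "omega_1 (sea bass)" if P_w1 > P_w2 else "omega_2 (salmon)"
print("always decide:", decision)
print("probability of error:", min(P_w1, P_w2))
```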
Class-Conditional Probability Density [2]
In most circumstances we are not asked to make decisions with so little information. In our example, we might for instance use a lightness measurement x to improve our classifier.
We consider x to be a continuous random variable whose distribution depends on the state of nature and is expressed as p(x|ω). This is the class-conditional probability density function: the probability density function for x given that the state of nature is ω.
* p(x|ω): the probability density for value x given that the pattern is in category ω
Posterior, likelihood, evidence [2]
Suppose that we know both the prior probabilities P(ω_j) and the conditional densities p(x|ω_j) for j = 1, 2.
Suppose further that we measure the lightness of a fish and discover that its value is x.
How does this measurement influence our attitude concerning the true state of nature, that is, the category of the fish?
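A minimal sketch of how the lightness measurement enters through Bayes' formula, P(ω_j | x) = p(x | ω_j) P(ω_j) / p(x). The Gaussian class-conditional densities, their parameters, and the priors below are illustrative assumptions.

```python
# Minimal sketch: posterior probabilities from priors and class-conditional densities.
import numpy as np

def gauss(x, mu, sigma):
    """Gaussian probability density (used here as an assumed p(x | omega_j))."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

priors = np.array([2 / 3, 1 / 3])                 # P(omega_1), P(omega_2)
x = 12.0                                          # measured lightness
likelihoods = np.array([gauss(x, 11.0, 1.5),      # p(x | omega_1)
                        gauss(x, 13.0, 1.0)])     # p(x | omega_2)

evidence = np.sum(likelihoods * priors)           # p(x)
posteriors = likelihoods * priors / evidence      # P(omega_j | x)
print("posteriors:", np.round(posteriors, 3),
      "-> decide omega_%d" % (1 + int(np.argmax(posteriors))))
```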