Image Processing in Mechatronics: Machine Vision, Chapter 7: Object Recognition

IMAGE PROCESSING IN MECHATRONICS

Machine Vision

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Lecturer: Dr. Nguyễn Thành Hùng
Unit: Department of Mechatronics, School of Mechanical Engineering

Hanoi, 2021

❖ 4 Artificial Neural Networks

Simon Achatz, State of the art of object recognition techniques, Technische Universität München.

1 Introduction

▪ Object recognition: localizing and classifying objects in an image.

▪ General concept:

➢ Training datasets contain images with known, labelled objects

➢ Depending on the chosen algorithm, different types of information (colours, edges, geometric forms) are extracted from these images

➢ For any new image, the same information is gathered and compared to the training dataset to find the most suitable classification
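The general concept above can be illustrated with a minimal nearest-neighbour sketch. This is only one possible way to "compare to the training dataset"; the two-dimensional feature vectors, their meaning and the class labels are hypothetical values chosen for illustration.

```python
import math

def nearest_label(training_set, features):
    """Return the label of the training example closest to `features`.

    training_set: list of (feature_vector, label) pairs.
    """
    best_label, best_dist = None, math.inf
    for vector, label in training_set:
        dist = math.dist(vector, features)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical features: (mean red intensity, edge density), both in [0, 1].
training_set = [
    ((0.9, 0.2), "stop sign"),
    ((0.8, 0.3), "stop sign"),
    ((0.2, 0.7), "pedestrian"),
    ((0.3, 0.8), "pedestrian"),
]
new_image_features = (0.25, 0.75)
predicted = nearest_label(training_set, new_image_features)
print(predicted)
```

The same feature extraction must be applied to training and new images; otherwise the distances are not comparable.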

▪ Applications:

➢ Robots in industrial environments

➢ Face or handwriting recognition

➢ Autonomous systems such as modern cars, which use object recognition for pedestrian detection, emergency brake assistance and so on

➢ …

➢ Artificial Neural Networks

▪ General Object Recognition Strategies: Appearance-based method

➢ Face or handwriting recognition

➢ Reference training images

➢ This dataset is compressed to obtain a lower-dimensional subspace, also called the eigenspace

➢ Parts of the new input image are projected onto the eigenspace, and the correspondence with the reference images is then examined

▪ General Object Recognition Strategies: Feature-based Method

➢ Features are characteristic for each object

➢ Typical features: colours, contour lines, geometric forms or edges

➢ The basic concept of feature-based object recognition strategies is as follows:

• Every input image is searched for a specific type of feature

• This feature is then compared to a database containing models of the objects in order to verify whether any known objects are present

▪ General Object Recognition Strategies: Feature-based method

➢ Features and their descriptors can be found either by considering the whole image (global feature) or by observing just small parts of the image (local feature)

➢ A histogram of the pixel intensities or colours is a simple example of a global feature

➢ It is not always reasonable to compare the whole image, because even slight changes in illumination, position (occlusion) or rotation lead to significant differences, and correct recognition is then no longer possible
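A small sketch can show why a global intensity histogram is sensitive to illumination. The 4-bin histogram, the histogram-intersection similarity and the tiny test image are assumptions made for illustration, not part of the original slides.

```python
def intensity_histogram(image, bins=4, max_value=256):
    """Normalised histogram of pixel intensities (a global feature)."""
    counts = [0] * bins
    pixels = [p for row in image for p in row]
    for p in pixels:
        counts[min(p * bins // max_value, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

image = [[10, 20, 200, 210],
         [15, 25, 205, 215]]
# Simulated illumination change: every pixel is 60 levels brighter (clipped).
brighter = [[min(p + 60, 255) for p in row] for row in image]

h_original = intensity_histogram(image)
h_brighter = intensity_histogram(brighter)
same = histogram_intersection(h_original, h_original)
shifted = histogram_intersection(h_original, h_brighter)
print(same, shifted)
```

Although both images show the same scene, the brightness change moves the dark pixels into a different bin, so the similarity of the two global descriptors drops noticeably.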

▪ General Object Recognition Strategies: Feature-based method

➢ Descriptors of local features are more robust against these problems, and therefore algorithms with local features often outperform global feature-based methods

Two patches are cut from different images and compared; they are considered a match if the error between the patches is below a certain threshold.

▪ General Object Recognition Strategies: Interpretation Tree

➢ An interpretation tree is a depth-first search algorithm for model matching

➢ Algorithms based on this approach often try to recognise n-dimensional geometric objects; therefore, a database containing models with known features is necessary

➢ The feature set might consist of distance, angle and direction constraints between points on the surface of the objects
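The depth-first search described above might be sketched as follows. This is a simplified illustration that uses only pairwise distance constraints between 2-D points; the tolerance `tol`, the triangle model and the scene points are hypothetical, and a real system would typically add angle and direction constraints as well.

```python
import math

def interpretation_tree(model, scene, tol=0.1):
    """Depth-first search for an assignment model point -> scene point
    that preserves all pairwise distances within `tol`.

    Returns a list of scene indices (one per model point) or None.
    """
    def consistent(assignment, candidate):
        k = len(assignment)  # index of the model point being matched
        for j, scene_idx in enumerate(assignment):
            d_model = math.dist(model[j], model[k])
            d_scene = math.dist(scene[scene_idx], scene[candidate])
            if abs(d_model - d_scene) > tol:
                return False
        return True

    def search(assignment):
        if len(assignment) == len(model):
            return assignment  # every model point has been matched
        for candidate in range(len(scene)):
            if candidate in assignment:
                continue
            if consistent(assignment, candidate):
                result = search(assignment + [candidate])
                if result is not None:
                    return result
        return None  # dead end: backtrack to the previous level

    return search([])

# Hypothetical model: a 3-4-5 right triangle.
model = [(0, 0), (3, 0), (0, 4)]
# Scene: the same triangle translated by (10, 10), listed in a different
# order, plus one clutter point at index 1.
scene = [(10, 14), (20, 20), (10, 10), (13, 10)]
match = interpretation_tree(model, scene)
print(match)
```

Each level of the tree corresponds to one model point, and a branch is pruned as soon as a constraint is violated, which is what keeps the search tractable. Distance constraints alone cannot distinguish an object from its mirror image.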

Procedure of an interpretation tree algorithm

▪ General Object Recognition Strategies: Pattern Matching

➢ Pattern matching methods, sometimes called template matching, are often used because of their simplicity

➢ Template matching is a technique for finding small parts of an image which match a template image

➢ One famous application of template matching is traffic sign recognition: small parts of the input image are matched against a database of different images of traffic signs

➢ This approach has many disadvantages, such as problems with occlusion, rotation, scaling and different illumination

▪ General Object Recognition Strategies: Artificial neural networks

➢ A model consists of several layers, each of which is composed of a certain number of neurons

A neural network containing one input layer, two hidden layers and one output layer

➢ An input layer and an output layer are the minimum a network can have, but normally hidden layers are included so that the network can learn more complex tasks such as object recognition

➢ In a fully connected network, every neuron in one layer is connected to every neuron in the next layer, which can create a huge network with millions of parameters

➢ Each of these connections has a weight, which is updated during the learning phase. A neuron computes the weighted sum of its input signals, and an activation function determines its output; with a simple threshold activation, the neuron is activated only if this sum is above a certain threshold
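The behaviour of such layers can be sketched with a tiny fully connected network. The weights below are hand-picked (hypothetical) so that the network computes XOR with a threshold activation; a trained network would learn its weights instead, and practical networks usually use smooth activation functions rather than a hard step.

```python
def step(z, threshold=0.0):
    """Threshold activation: the neuron fires only above the threshold."""
    return 1.0 if z > threshold else 0.0

def dense_layer(inputs, weights, biases, activation):
    """Fully connected layer: every input feeds every neuron.

    weights[j][i] is the weight of the connection from input i to neuron j.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

# Hypothetical, hand-picked parameters (not learned).
hidden_w = [[1.0, 1.0], [1.0, 1.0]]  # neuron 0 acts as OR, neuron 1 as AND
hidden_b = [-0.5, -1.5]
output_w = [[1.0, -2.0]]             # OR and not AND -> XOR
output_b = [-0.5]

def xor_network(x1, x2):
    hidden = dense_layer([x1, x2], hidden_w, hidden_b, step)
    return dense_layer(hidden, output_w, output_b, step)[0]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_network(a, b))
```

The hidden layer is what makes XOR possible here: no single threshold neuron connected directly to the two inputs can separate these four cases.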

➢ There are different types of networks, such as feed-forward and recurrent networks, with different numbers and types of hidden layers, while the sizes of the input layer (e.g. the number of pixels) and the output layer (the number of classes) are fixed

➢ Convolutional neural networks and their hidden layers are explained in more detail in Section 4. New inputs pass through the trained network in the same way: some neurons are activated, and this finally leads to the most suitable classification

➢ Reliability and Accuracy

❖ Performance Analysis: Invariances and Robustness

▪ First, the algorithms are analysed to check which invariances they provide and what level of robustness they have

❖ Performance Analysis: Complexity

▪ Secondly, the algorithms are compared regarding complexity, especially in terms of computational load and memory usage

❖ Performance Analysis: Reliability and Accuracy

Development of the accuracy rates of traditional computer vision and deep learning methods on ImageNet

2 Pattern Matching

❖ Template matching is a technique for finding areas of an image that match (are similar to) a template image (patch)

❖ How does it work?

▪ We need two primary components:

▪ Source image (I): the image in which we expect to find a match to the template image

▪ Template image (T): the patch image which will be compared to the source image

▪ Our goal is to detect the highest matching area

https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html

❖ Template matching

▪ To identify the matching area, we have to compare the template image against the source image by sliding it.

▪ By sliding, we mean moving the patch one pixel at a time (left to right, top to bottom). At each location, a metric is calculated that represents how "good" or "bad" the match at that location is (or how similar the patch is to that particular area of the source image).

▪ For each location of T over I, the metric is stored in the result matrix R. Each location (x, y) in R contains the match metric.

▪ In the result image R obtained by sliding the patch with the metric TM_CCORR_NORMED, the brightest locations indicate the highest matches. The location marked by the red circle is probably the one with the highest value, so that location (the rectangle formed by that point as a corner, with width and height equal to those of the patch image) is considered the match.
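The sliding procedure can be sketched in plain Python for a small grayscale image stored as nested lists. The metric below is the normalised cross-correlation that corresponds to TM_CCORR_NORMED; the function names and the test image are hypothetical, and in practice one would call OpenCV's `cv2.matchTemplate` followed by `cv2.minMaxLoc` rather than loop in Python.

```python
import math

def match_template_ccorr_normed(image, template):
    """Slide `template` over `image` and return the result matrix R,
    where R[y][x] is the normalised cross-correlation at offset (x, y)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    t_energy = sum(t * t for row in template for t in row)
    result = []
    for y in range(ih - th + 1):          # top to bottom
        result_row = []
        for x in range(iw - tw + 1):      # left to right
            cross = 0.0
            i_energy = 0.0
            for dy in range(th):
                for dx in range(tw):
                    pixel = image[y + dy][x + dx]
                    cross += pixel * template[dy][dx]
                    i_energy += pixel * pixel
            denom = math.sqrt(t_energy * i_energy)
            result_row.append(cross / denom if denom > 0 else 0.0)
        result.append(result_row)
    return result

def best_match(result):
    """Return ((x, y), score) for the maximum of R, i.e. the top-left
    corner of the best-matching rectangle."""
    score, x, y = max(
        (value, x, y)
        for y, row in enumerate(result)
        for x, value in enumerate(row)
    )
    return (x, y), score

image = [[10, 10, 10, 10, 10],
         [10, 10, 10, 10, 10],
         [10, 10, 200, 50, 10],
         [10, 10, 50, 200, 10]]
template = [[200, 50],
            [50, 200]]

R = match_template_ccorr_normed(image, template)
location, score = best_match(R)
print(location, round(score, 3))
```

R has one entry per valid placement of T, so for a W×H image and a w×h template its size is (W − w + 1) × (H − h + 1).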

3 Feature-based Methods

Richard Szeliski, Computer Vision: Algorithms and Applications, Springer-Verlag London Limited, 2011.

❖ Feature detectors

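▪ The simplest possible matching criterion for comparing two image patches is their weighted summed square difference:

$$E_{\mathrm{WSSD}}(\mathbf{u}) = \sum_i w(\mathbf{x}_i)\,[I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i)]^2$$

where I0 and I1 are the two images being compared, u = (u, v) is the displacement vector, w(x) is a spatially varying weighting (or window) function, and the summation i is over all the pixels in the patch.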
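The criterion described above, a weighted sum of squared differences between a patch in I0 and the displaced patch in I1, can be sketched as follows. The function name, the square patch shape, the uniform default window and the tiny test images are assumptions made for illustration.

```python
def wssd(image0, image1, top_left, size, displacement, weights=None):
    """E_WSSD(u) = sum_i w(x_i) * (I1(x_i + u) - I0(x_i))**2 over a square patch.

    top_left: (x, y) of the patch in image0; size: patch side length;
    displacement: u = (u, v); weights: optional size x size window w
    (uniform weights of 1 are assumed when omitted).
    """
    x0, y0 = top_left
    u, v = displacement
    error = 0.0
    for dy in range(size):
        for dx in range(size):
            w = 1.0 if weights is None else weights[dy][dx]
            diff = image1[y0 + dy + v][x0 + dx + u] - image0[y0 + dy][x0 + dx]
            error += w * diff * diff
    return error

image0 = [[0, 0, 0, 0, 0],
          [0, 9, 5, 0, 0],
          [0, 5, 9, 0, 0],
          [0, 0, 0, 0, 0]]
# image1 is image0 shifted one pixel to the right.
image1 = [[0] + row[:-1] for row in image0]

errors = {u: wssd(image0, image1, (1, 1), 2, (u, 0)) for u in (0, 1, 2)}
best_u = min(errors, key=errors.get)
print(errors, best_u)
```

The error vanishes at the true displacement and grows on either side, which is what makes this textured patch easy to localise; for a textureless patch the error would be flat in every direction.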

Aperture problems for different image patches: (a) stable ("corner-like") flow; (b) classic aperture problem (barber-pole illusion); (c) textureless region. The two images I0 (yellow) and I1 (red) are overlaid. The red vector u indicates the displacement between the patch centers, and the w(x_i) weighting function (patch window) is shown as a dark circle.

▪ Auto-correlation function or surface

Three auto-correlation surfaces E_AC(Δu) shown as both grayscale images and surface plots: (a) the original image is marked with three red crosses to denote where the auto-correlation surfaces were computed; (b) this patch is from the flower bed (good unique minimum); (c) this patch is from the roof edge (one-dimensional aperture problem); and (d) this patch is from the cloud (no good peak). Each grid point in figures b–d is one value of Δu.

Uncertainty ellipse corresponding to an eigenvalue analysis of the auto-correlation matrix A.
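The eigenvalue analysis of the auto-correlation matrix A can be illustrated with a small sketch. Here A is accumulated from products of image gradients over a window; the central-difference gradients, the uniform 3×3 window and the three synthetic test images are simplifying assumptions.

```python
import math

def autocorrelation_matrix(image, cx, cy, radius=1):
    """Sum the gradient outer products over a (2r+1) x (2r+1) window
    centred at (cx, cy), using central differences and uniform weights.
    Returns (a, b, c) for the symmetric matrix A = [[a, b], [b, c]]."""
    a = b = c = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            ix = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            a += ix * ix
            b += ix * iy
            c += iy * iy
    return a, b, c

def eigenvalues(a, b, c):
    """Eigenvalues (lambda_min, lambda_max) of [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    spread = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean - spread, mean + spread

flat = [[5] * 6 for _ in range(6)]                       # textureless region
edge = [[0, 0, 0, 10, 10, 10] for _ in range(6)]         # vertical edge
corner = [[10 if (x >= 3 and y >= 3) else 0 for x in range(6)]
          for y in range(6)]                             # bright quadrant

for name, img in [("flat", flat), ("edge", edge), ("corner", corner)]:
    print(name, eigenvalues(*autocorrelation_matrix(img, 2, 2)))
```

Two small eigenvalues indicate a textureless region, one large and one small eigenvalue indicate an edge (the one-dimensional aperture problem), and two large eigenvalues indicate a corner-like patch that can be localised reliably. The smaller eigenvalue is one commonly used interest measure.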

Interest operator responses: (a) sample image, (b) Harris response, and (c) DoG response. The circle sizes and colors indicate the scale at which each interest point was detected. Notice how the two detectors tend to respond at complementary locations.

▪ Adaptive non-maximal suppression (ANMS)

Adaptive non-maximal suppression (ANMS) (Brown, Szeliski, and Winder 2005): The upper two images show the strongest 250 and 500 interest points, while the lower two images show the interest points selected with adaptive non-maximal suppression, along with the corresponding suppression radius r. Note how the latter features have a much more uniform spatial distribution across the image.
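A minimal sketch of the idea behind ANMS: each interest point receives a suppression radius equal to its distance to the nearest sufficiently stronger point, and the points with the largest radii are kept. The robustness factor of 0.9, the brute-force O(n²) search and the sample detections are assumptions made for illustration.

```python
import math

def anms(points, num_keep, robust=0.9):
    """points: list of (x, y, strength). Returns the `num_keep` points with
    the largest suppression radius, i.e. the distance to the nearest point
    that is sufficiently stronger (strength_i < robust * strength_j)."""
    scored = []
    for i, (xi, yi, si) in enumerate(points):
        radius = math.inf  # the globally strongest point is never suppressed
        for j, (xj, yj, sj) in enumerate(points):
            if i != j and si < robust * sj:
                radius = min(radius, math.hypot(xi - xj, yi - yj))
        scored.append((radius, (xi, yi, si)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [point for _, point in scored[:num_keep]]

# Hypothetical detections: a tight cluster of strong points near the origin
# and one weaker, isolated point far away.
points = [(0, 0, 10.0), (1, 0, 8.0), (0, 1, 7.0), (50, 50, 3.0)]

strongest = sorted(points, key=lambda p: p[2], reverse=True)[:2]
selected = anms(points, 2)
print(strongest)
print(selected)
```

Selecting purely by strength keeps two neighbouring points from the cluster, whereas the ANMS selection keeps the strongest point and the isolated one, giving a more uniform spatial distribution.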

▪ Rotational invariance and orientation estimation

A dominant orientation estimate can be computed by creating a histogram of all the gradient orientations (weighted by their magnitudes or after thresholding out small gradients) and then finding the significant peaks in this distribution (Lowe 2004).
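The histogram-based orientation estimate can be sketched as below. The use of 36 bins of 10 degrees, the choice of returning only the single highest peak (rather than all significant peaks) and the sample gradients are assumptions made for illustration.

```python
import math

def dominant_orientation(gradients, bins=36):
    """gradients: list of (gx, gy). Build a magnitude-weighted histogram of
    gradient orientations and return the centre angle (in degrees) of the
    highest bin."""
    histogram = [0.0] * bins
    bin_width = 360.0 / bins
    for gx, gy in gradients:
        magnitude = math.hypot(gx, gy)
        angle = math.degrees(math.atan2(gy, gx)) % 360.0
        histogram[int(angle // bin_width) % bins] += magnitude
    peak = max(range(bins), key=lambda k: histogram[k])
    return (peak + 0.5) * bin_width

# Hypothetical patch gradients: five strong gradients at 45 degrees,
# eight weak ones at 200 degrees and one at 100 degrees.
weak = (0.2 * math.cos(math.radians(200)), 0.2 * math.sin(math.radians(200)))
other = (3.0 * math.cos(math.radians(100)), 3.0 * math.sin(math.radians(100)))
gradients = [(1.0, 1.0)] * 5 + [weak] * 8 + [other]

print(dominant_orientation(gradients))
```

Because votes are weighted by gradient magnitude, the eight weak gradients do not outvote the five strong ones. Rotating the patch by the negative of the estimated angle before computing a descriptor is what provides rotational invariance.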

▪ Rotational invariance and orientation estimation

Affine region detectors used to match two images taken from dramatically different viewpoints (Mikolajczyk and Schmid 2004).

Affine normalization using the second moment matrices, as described by Mikolajczyk, Tuytelaars, Schmid et al. (2005): after image coordinates are transformed using the matrices $A_0^{-1/2}$ and $A_1^{-1/2}$, they are related by a pure rotation R, which can be estimated using a dominant orientation technique.

❖ Feature descriptors

▪ Bias and gain normalization (MOPS)

MOPS descriptors are formed using an 8×8 sampling of bias and gain normalized intensity values, with a sample spacing of five pixels relative to the detection scale (Brown, Szeliski, and Winder 2005). This low frequency sampling gives the features some robustness to interest point location error and is achieved by sampling at a higher pyramid level than the detection scale.
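Bias and gain normalization can be sketched as rescaling the sampled intensities to zero mean and unit variance. The 2×2 patch below stands in for the 8×8 sampling purely for brevity, and the function name and the handling of constant patches are assumptions.

```python
import math

def normalize_bias_gain(patch):
    """Flatten a patch and rescale it to zero mean and unit variance."""
    values = [float(v) for row in patch for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0.0:
        return [0.0] * len(values)  # constant patch: no contrast to normalise
    return [(v - mean) / std for v in values]

patch = [[10, 20],
         [30, 40]]
# The same patch under an affine intensity change I' = gain * I + bias.
relit = [[2 * v + 15 for v in row] for row in patch]

d0 = normalize_bias_gain(patch)
d1 = normalize_bias_gain(relit)
print(d0)
print(d1)
```

Both descriptors come out the same, which shows that subtracting the mean removes the bias and dividing by the standard deviation removes the gain.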

▪ Scale invariant feature transform (SIFT)

A schematic representation of Lowe's (2004) scale invariant feature transform (SIFT): (a) Gradient orientations and magnitudes are computed at each pixel and weighted by a Gaussian fall-off function (blue circle). (b) A weighted gradient orientation histogram is then computed in each subregion, using trilinear interpolation. While this figure shows an 8 × 8 pixel patch and a 2 × 2 descriptor array, Lowe's actual implementation uses 16 × 16 patches and a 4 × 4 array of eight-bin histograms.

▪ Gradient location-orientation histogram (GLOH)

The gradient location-orientation histogram (GLOH) descriptor uses log-polar bins instead of square bins to compute orientation histograms (Mikolajczyk and Schmid 2005).

Title	Object Recognition
Author	Simon Achatz
Supervisor	Dr. Nguyễn Thành Hùng
Institution	Hanoi University of Science and Technology
Major	Mechatronics
Type	Thesis
Year	2021
City	Hanoi
