
Introduction to Neural Networks and Deep Learning

Introduction to the Convolutional Network

Andres Mendez-Vazquez

March 28, 2021

Outline

1 Introduction

The Long Path

The Problem of Image Processing

Multilayer Neural Network Classification

Fixing the Problem, the ReLU Function

Back to the Non-Linearity Layer

Rectification Layer

Local Contrast Normalization Layer

Sub-sampling and Pooling

Strides

Normalization Layer AKA Batch Normalization

Finally, The Fully Connected Layer


The Long Path [1]

[Timeline figure (2014–2018): AlexNet and VGG (spatial exploitation, parallelism, the Inception block, bottleneck and factorization ideas), Highway Net and ResNet (the depth revolution), multi-path connectivity, SE Net, CMPE-SE, the Residual Attention Module, CBAM, and the Channel Boosted CNN (channel attention, feature-map and width exploitation). The phases are labeled: First Results, The Revolution, Residual and Multipath Architectures, The Beginning of Attention?, and Complex Architectures and The Attention Revolution.]


Digital Images as pixels in a digitized matrix [2]

[Figure: an illumination source lighting a scene and the digitized output image.]

Further [2]

Pixel values typically represent

Gray levels, colors, heights, opacities, etc.

Something Notable

Remember, digitization implies that a digital image is an approximation of a real scene.

Common image formats include

One sample per point (B&W or Grayscale)

Three samples/pixel per point (Red, Green, and Blue)

Four samples/pixel per point (Red, Green, Blue, and “Alpha”)
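
As a small, hypothetical illustration of these formats (the array sizes and names below are made up, not from the slides), here is how they typically look as NumPy arrays:

```python
import numpy as np

# Illustrative shapes only: how the three common formats are stored once digitized.
h, w = 480, 640                                 # assumed image size

gray = np.zeros((h, w), dtype=np.uint8)         # one sample per point (B&W / grayscale)
rgb  = np.zeros((h, w, 3), dtype=np.uint8)      # three samples per point (R, G, B)
rgba = np.zeros((h, w, 4), dtype=np.uint8)      # four samples per point (R, G, B, alpha)

print(gray.shape, rgb.shape, rgba.shape)        # (480, 640) (480, 640, 3) (480, 640, 4)
```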

Therefore, we have the following process

Low Level Process

[Figure: Image → Noise removal → Edge Detection]

Mid Level Process

[Figure: Object Segmentation → Object Recognition]

It would be nice to automate all these processes

We would solve a lot of headaches when setting up such a process.

Why not use the data sets?

By using a Neural Network that replicates the process.


Multilayer Neural Network Classification

We have the following classification [3]
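
A bare-bones sketch of the kind of multilayer classifier being referred to, assuming flattened 28×28 grayscale inputs and ten classes; all layer sizes and weights here are illustrative, not taken from [3]:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One hidden layer on flattened pixels (sizes are assumptions for illustration).
x = rng.random(28 * 28)                          # a flattened grayscale image
W1, b1 = 0.01 * rng.standard_normal((128, 784)), np.zeros(128)
W2, b2 = 0.01 * rng.standard_normal((10, 128)), np.zeros(10)

h = np.tanh(W1 @ x + b1)                         # hidden representation
probs = softmax(W2 @ h + b2)                     # probabilities over 10 classes
print(probs.argmax(), round(probs.sum(), 6))     # predicted class, 1.0
```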


Drawbacks of previous neural networks

The number of trainable parameters becomes extremely large (a rough count follows below).

In addition, little or no invariance to shifting, scaling, and other forms of distortion.

The topology of the input data is completely ignored.

[Figure: a fully connected network reading a large N-pixel image of a character (A–Z), and the same character shifted to the left.]
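
As a rough, back-of-the-envelope count for the first drawback (the image size and hidden width below are assumptions, not from the slides):

```python
# Parameter count of a single fully connected layer on a raw color image.
height, width, channels = 224, 224, 3            # assumed input size
hidden_units = 1000                              # assumed hidden width

inputs = height * width * channels               # 150,528 input values
weights = inputs * hidden_units                  # one weight per input per hidden unit
biases = hidden_units

print(f"trainable parameters: {weights + biases:,}")   # about 150.5 million, for one layer
```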

For Example

We have [figure: a set of sample character images]

If we have an element that the network has never seen [figure: a shifted version of one of the characters]

Possible Solution

We can minimize these drawbacks by getting

A fully connected network of sufficient size can produce outputs that are invariant with respect to such variations.

Problem!!!

Training time

Network size

Free parameters


Convolutional Neural Networks (CNN) were invented by [5]

In 1989, Yann LeCun and Yoshua Bengio introduced the concept of Convolutional Neural Networks.

[Figure: a feature hierarchy going from patterns of local contrast, to face features, to faces, at the output.]

About CNNs

In addition

A CNN is a feed-forward network that can extract topological properties from an image.

Like almost every other neural network, they are trained with a version of the back-propagation algorithm.

Convolutional Neural Networks are designed to recognize visual patterns directly from pixel images with minimal preprocessing.

They can recognize patterns with extreme variability.


Local Connectivity

We have the following idea [6]

Instead of using full connectivity

[Figure: a fully connected layer over the input image]

We would have something like this

[Figure: each hidden unit connected only to a patch of the input image]

We decide only to connect the neurons in a local way

Each hidden unit is connected only to a subregion (patch) of the input image.

It is connected to all channels (1 for a grayscale image, 3 for an RGB image).

For gray scale, we get something like this

[Figure: Input Image, with a local patch feeding one hidden unit]

Then, our formula changes

$$y_i = f\left(\sum_{i \in L_p} w_i x_i\right)$$

where $L_p$ is the local patch of pixels connected to the hidden unit.

In the case of the 3 channels

[Figure: Input Image, with a local patch over all three channels]

Thus

$$y_i = f\left(\sum_{i \in L_p,\, c} w_i x_i^c\right)$$

where $c$ ranges over the channels.
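
A minimal sketch of the formula above: one hidden unit connected to a 5×5 patch over all three channels (the patch location, sizes, and the sigmoid choice for f are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((32, 32, 3))            # H x W x C input (assumed size)
patch = image[10:15, 10:15, :]             # the local region L_p: a 5x5 patch, all 3 channels
weights = rng.random(patch.shape)          # one weight per connected input value

def f(a):                                  # activation; a sigmoid is just one possible choice
    return 1.0 / (1.0 + np.exp(-a))

y_i = f(np.sum(weights * patch))           # y_i = f( sum over i in L_p and c of w_i * x_i^c )
print(y_i)
```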

Solving the following problems

[Figure: how local connectivity addresses the drawbacks listed earlier]

How this looks in the image

We have [figure: hidden units covering local patches of the input image]


Parameter Sharing

Second Idea

Share a matrix of parameters across certain units.

These units are organized into the same feature "map", where the units share the same parameters (for example, the same mask).
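
A rough count of what sharing buys, assuming a 28×28 map of hidden units each looking at a 5×5 patch (the sizes are illustrative):

```python
# Without sharing, every hidden unit owns its own 5x5 weights;
# with sharing, the whole feature map reuses a single 5x5 mask.
map_h, map_w = 28, 28            # hidden units in one feature map (assumed)
k_h, k_w = 5, 5                  # size of the local patch / mask (assumed)

without_sharing = map_h * map_w * k_h * k_w      # 19,600 weights
with_sharing = k_h * k_w                         # 25 weights for the whole map

print(without_sharing, with_sharing)
```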

We have something like this

[Figure: Feature Map 1, Feature Map 2, Feature Map 3, each with its own shared set of weights]

Now, in our notation

We have a collection of matrices representing this connectivity, one per feature map.

In each cell of these matrices is the weight to be multiplied with the local input to the local neuron.

And now, why the name of convolution?

Yes!!! The definition is coming now.


Digital Images

In computer vision [2, 7]

We usually operate on digital (discrete) images:

Sample the 2D space on a regular grid.

Quantize each sample (round to nearest integer).

The image can now be represented as a matrix of integer values.
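
A tiny sketch of these two steps, using a made-up continuous scene (the sinusoidal pattern and the 8×8 grid are assumptions purely for illustration):

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 8)                    # sample the 2D space on a regular grid
ys = np.linspace(0.0, 1.0, 8)
scene = np.sin(4 * np.pi * xs)[None, :] * np.cos(4 * np.pi * ys)[:, None]   # "continuous" intensities

image = np.round((scene + 1.0) / 2.0 * 255).astype(np.uint8)   # quantize each sample to an integer in [0, 255]
print(image.dtype, image.shape)                  # uint8 (8, 8) -- a matrix of integer values
```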

Many times we want to eliminate noise in an image

For example, a moving average.
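
A minimal sketch of a 2D moving average (box) filter for noise removal, assuming a grayscale image stored as a 2D float array and a "valid" output region:

```python
import numpy as np

def box_blur(image, k=3):
    """Replace each pixel by the mean of the k x k window around it (valid region only)."""
    h, w = image.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = image[y:y + k, x:x + k].mean()
    return out

noisy = np.random.default_rng(1).random((8, 8))   # stand-in for a noisy grayscale image
print(box_blur(noisy, k=3).shape)                 # (6, 6)
```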

This can be generalized to 2D images

[Figure: Left, the image I; Right, the filtered image I ∗ K]

Thus, we can define the concept of convolution

Yes, using the previous ideas.

Definition

Let $I : [a, b] \times [c, d] \to [0, 255]$ be the image and $K : [e, f] \times [h, i] \to \mathbb{R}$ be the kernel. The output of convolving $I$ with $K$ is denoted $I \ast K$.
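
A minimal sketch of the discrete version of this definition (zero-based indexing, a "valid" output, and the kernel flip that distinguishes convolution from correlation are my assumptions about the convention intended here):

```python
import numpy as np

def convolve2d(I, K):
    """Discrete 2D convolution of image I with kernel K, 'valid' borders."""
    kh, kw = K.shape
    Kf = K[::-1, ::-1]                            # flip the kernel
    H, W = I.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(I[y:y + kh, x:x + kw] * Kf)
    return out

I = np.arange(25, dtype=float).reshape(5, 5)
K = np.ones((3, 3)) / 9.0                         # the moving-average kernel from before
print(convolve2d(I, K))                           # each entry is the mean of a 3x3 patch
```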

Now, why not expand this idea

Imagine that a three-channel image is split into three feature maps.

[Figure: Feature Maps]

Mathematically, we have the following

For output map $o$:

$$(I \ast k)[x, y, o] = \sum_{c=1}^{3} \left(I_c \ast k_{c,o}\right)[x, y]$$

where $I_c$ is channel $c$ of the input and $k_{c,o}$ is the kernel slice connecting channel $c$ to output map $o$.

For Example, an Encoder

We have the following situation

We have the following

$Y_j^{(l)}$ is a matrix representing the $j$-th feature map of layer $l$.

$K_{ij}^{(l)}$ is the kernel filter connecting the $j$-th input map to the $i$-th output map of layer $l$.

Therefore

We can see the convolution as a fusion of information from different feature maps:

$$\sum_{j=1}^{m_1^{(l-1)}} Y_j^{(l-1)} \ast K_{ij}^{(l)}$$

Where

$Y_i^{(l)}$ is the $i$-th feature map in layer $l$.

$B_i^{(l)}$ is the bias matrix for output map $i$.

$K_{ij}^{(l)}$ is a filter of size $\left(2h_1^{(l)} + 1\right) \times \left(2h_2^{(l)} + 1\right)$.

Putting these together, the $i$-th output feature map of layer $l$ is

$$Y_i^{(l)} = B_i^{(l)} + \sum_{j=1}^{m_1^{(l-1)}} Y_j^{(l-1)} \ast K_{ij}^{(l)}$$
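
A sketch of one layer's forward pass under this notation, with three input maps, one output map, and a 5×5 kernel (so h1 = h2 = 2); the sizes are illustrative and the borders are "valid":

```python
import numpy as np

def conv2d_valid(Y, K):
    """2D convolution with 'valid' borders (kernel flipped)."""
    kh, kw = K.shape
    H, W = Y.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(Y[y:y + kh, x:x + kw] * K[::-1, ::-1])
    return out

rng = np.random.default_rng(0)
prev_maps = [rng.random((16, 16)) for _ in range(3)]     # Y_j^(l-1), with m_1^(l-1) = 3
kernels = [rng.random((5, 5)) for _ in range(3)]         # K_ij^(l) for one output map i
bias = np.zeros((12, 12))                                # B_i^(l), matching the output size

Y_i = bias + sum(conv2d_valid(Yj, Kij) for Yj, Kij in zip(prev_maps, kernels))
print(Y_i.shape)    # (12, 12) = (16 - 2*2, 16 - 2*2)
```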

The output of layer l

It consists of $m_1^{(l)}$ feature maps of size $m_2^{(l)} \times m_3^{(l)}$.

Something Notable

$m_2^{(l)}$ and $m_3^{(l)}$ are influenced by border effects.

Therefore, when the convolutional sum is defined properly ("valid" borders), the output feature maps have size

$$m_2^{(l)} = m_2^{(l-1)} - 2h_1^{(l)}, \qquad m_3^{(l)} = m_3^{(l-1)} - 2h_2^{(l)}$$
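
For instance, assuming an input map of size $32 \times 32$ and a $5 \times 5$ kernel (so $h_1^{(l)} = h_2^{(l)} = 2$), the output map has size $(32 - 4) \times (32 - 4) = 28 \times 28$.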

Why? The Border

Example

[Figure: convolutional maps shrinking at the border]

Special Case

When l = 1

The input is a single image I consisting of one or more channels.


Written element-wise, the convolution above expands into terms of the form

$$\left(K_{ij}^{(l)}\right)_{k,t} \left(Y_j^{(l-1)}\right)_{x-k,\, y-t}$$

summed over the kernel offsets $k$ and $t$.

Here, an interesting case

Only a Historical Note

The foundations for deconvolution came from Norbert Wiener of the Massachusetts Institute of Technology in his book "Extrapolation, Interpolation, and Smoothing of Stationary Time Series" (1949).

Given the kernels, we look for the layer that we want to recover, such that

$$Y_i^{(l)} \ast K_{ij}^{(l)} = Y_j^{(l-1)}$$

Typically, p = 1, although other values are possible.

They look for the arguments that minimize a cost function over a set

$$\arg\min_{Y_j^{(l)},\; K_{ij}^{(l)}} C(y)$$

$Y_i^{(l-1,k)}$ are the feature maps from the previous layer.

$g_{ij}^{(l)}$ is a fixed binary matrix that determines the connectivity between feature maps at different layers.

This can be seen as

We have the following layer

[Figure: the layer combining the previous feature maps]

They noticed some drawbacks

Using the following optimizations

Direct Gradient Descent

Iterative Reweighted Least Squares

Stochastic Gradient Descent

All of them presented problems!!!

They solved it using a new cost function
