Introduction to Neural Networks and Deep Learning
Introduction to the Convolutional Network
Andres Mendez-Vazquez
March 28, 2021
Outline
1 Introduction
The Long Path
The Problem of Image Processing
Multilayer Neural Network Classification
Fixing the Problem, the ReLU Function
Back to the Non-Linearity Layer
Rectification Layer
Local Contrast Normalization Layer
Sub-sampling and Pooling
Strides
Normalization Layer, AKA Batch Normalization
Finally, The Fully Connected Layer
The Long Path [1]
[Timeline of CNN architectures: 2014, spatial exploitation and parallelism with the inception block, bottleneck, and factorization (AlexNet, VGG); 2015, the depth revolution (Highway Net, ResNet); 2016-2017, residual and multi-path connectivity architectures; 2017-2018, feature-map and width exploitation with channel attention (SE Net, CMPE-SE, Residual Attention Module, CBAM, Channel Boosted CNN); from the first results, to the beginning of attention, to complex architectures and the attention revolution.]
Digital Images as Pixels in a Digitized Matrix [2]
[Figure: an illumination source lights a scene; the imaging system samples the reflected light and outputs a digitized matrix of pixels.]
Further [2]
Pixel values typically represent
Gray levels, colors, heights, opacities, etc.
Something Notable
Remember, digitization implies that a digital image is an approximation of a real scene.
Common image formats include
One sample per point (B&W or grayscale)
Three samples per point (Red, Green, and Blue)
Four samples per point (Red, Green, Blue, and "Alpha")
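As a minimal illustration of these formats as arrays (a numpy sketch; the dimensions and variable names are arbitrary assumptions, not part of the slides):

```python
import numpy as np

h, w = 4, 6                                   # arbitrary example dimensions
gray = np.zeros((h, w), dtype=np.uint8)       # one sample per point
rgb  = np.zeros((h, w, 3), dtype=np.uint8)    # Red, Green, Blue
rgba = np.zeros((h, w, 4), dtype=np.uint8)    # Red, Green, Blue, Alpha
print(gray.shape, rgb.shape, rgba.shape)      # (4, 6) (4, 6, 3) (4, 6, 4)
```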
Therefore, we have the following process
Low-level process: remove noise from the image and detect edges.
Mid-level process: segment the objects.
High-level process: object recognition.
It would be nice to automate all these processes
We would solve a lot of headaches when setting up such a process. Why not use the data sets?
By using a neural network that replicates the process.
Multilayer Neural Network Classification
We have the following classification [3]
Drawbacks of previous neural networks
The number of trainable parameters becomes extremely large.
[Figure: a fully connected network with a large number N of hidden units classifying the letters A and Z.]
In addition, there is little or no invariance to shifting, scaling, and other forms of distortion.
[Figure: the same network when the input letter A is shifted to the left.]
The topology of the input data is completely ignored.
For Example
We have: [Figure]
If we have an element that the network has never seen: [Figure]
Possible Solution
We can minimize these drawbacks as follows
A fully connected network of sufficient size can produce outputs that are invariant with respect to such variations.
Problem!!!
Training time
Network size
Free parameters
Convolutional Neural Networks (CNN) were invented by [5]
In 1989, Yann LeCun and Yoshua Bengio introduced the concept of Convolutional Neural Networks.
[Figure: a hierarchy of learned features, from patterns of local contrast, to face features, to faces, to the OUTPUT.]
About CNNs
In addition
A CNN is a feed-forward network that can extract topological properties from an image.
Like almost every other neural network, they are trained with a version of the back-propagation algorithm.
Convolutional Neural Networks are designed to recognize visual patterns directly from pixel images with minimal preprocessing.
They can recognize patterns with extreme variability.
Local Connectivity
We have the following idea [6]
Instead of using full connectivity over the input image, we would have something like this:
[Figure: each hidden unit connects to a small patch of the input image.]
Local Connectivity
We decide to connect the neurons only in a local way
Each hidden unit is connected only to a subregion (patch) of the input image.
It is connected to all channels of that patch.
For gray scale, we get something like this
[Figure: input image with a patch $L_p$ feeding one hidden unit.]
Then, our formula changes to
$y_i = f\left(\sum_{i \in L_p} w_i x_i\right)$
where $L_p$ is the set of input pixels in the patch.
In the case of the 3 channels
[Figure: input image; the unit sees the same patch in all three channels.]
Thus
$y_i = f\left(\sum_{i \in L_p,\, c} w_i x_i^c\right)$
where $c$ ranges over the channels.
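A minimal numpy sketch of one such locally connected unit, covering both the grayscale and the 3-channel formula; the function name, the patch location, and the tanh nonlinearity are illustrative assumptions, not part of the slides:

```python
import numpy as np

def local_unit(image, weights, top, left, f=np.tanh):
    """One hidden unit connected to a single patch L_p of the input.

    image:   (H, W) grayscale or (H, W, C) multi-channel array
    weights: array with the same shape as the patch it covers
    """
    ph, pw = weights.shape[:2]
    patch = image[top:top + ph, left:left + pw]
    return f(np.sum(weights * patch))        # y_i = f( sum_{i in L_p} w_i x_i )

rng = np.random.default_rng(1)
gray = rng.random((8, 8))                    # grayscale image
rgb = rng.random((8, 8, 3))                  # 3-channel image
w_gray = rng.standard_normal((3, 3))
w_rgb = rng.standard_normal((3, 3, 3))       # one weight per pixel and channel
print(local_unit(gray, w_gray, 2, 2), local_unit(rgb, w_rgb, 2, 2))
```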
Solving the following problems
[Figure]
How this looks in the image
We have: [Figure]
Parameter Sharing
Second Idea
Share a matrix of parameters across certain units.
These units are organized into the same feature "map", where the units share the same parameters (for example, the same mask).
We have something like this
[Figure: Feature Map 1, Feature Map 2, Feature Map 3, each produced by its own shared mask.]
Now, in our notation
We have a collection of matrices representing this connectivity, one per feature map.
In each cell of these matrices is the weight to be multiplied with the local input to the local neuron.
And now, why the name "convolution"?
Yes!!! The definition is coming now.
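To see why sharing matters, a back-of-the-envelope parameter count; the 32x32 input, the 100 hidden units, and the 5x5 mask are assumed sizes for illustration only:

```python
# Trainable weights for a 32x32 grayscale input (illustrative sizes).
fully_connected = 32 * 32 * 100   # 100 hidden units, each sees every pixel
locally_connected = 100 * 5 * 5   # 100 units, each sees its own 5x5 patch
shared_mask = 5 * 5               # one 5x5 mask shared by a whole feature map
print(fully_connected, locally_connected, shared_mask)  # 102400 2500 25
```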
Digital Images
In computer vision [2, 7]
We usually operate on digital (discrete) images:
Sample the 2D space on a regular grid.
Quantize each sample (round to nearest integer).
The image can now be represented as a matrix of integer values.
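A minimal sketch of this sample-and-quantize pipeline; the intensity function and the 8x8 grid are arbitrary assumptions:

```python
import numpy as np

# Sample a continuous intensity function on a regular 8x8 grid...
xs, ys = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
intensity = 127.5 * (np.sin(6 * xs * ys) + 1)    # continuous values in [0, 255]
# ...then quantize each sample by rounding to the nearest integer.
image = np.round(intensity).astype(np.uint8)     # matrix of integer values
print(image)
```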
Many times we want to eliminate noise in an image
For example, a moving average, as in the sketch below.
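A minimal 1D moving-average sketch (numpy assumed; the window width and the noisy signal are illustrative choices):

```python
import numpy as np

def moving_average(signal, width=3):
    """Smooth a signal by replacing each sample with the mean of its window."""
    kernel = np.ones(width) / width               # uniform averaging kernel
    return np.convolve(signal, kernel, mode="same")

rng = np.random.default_rng(0)
noisy = np.linspace(0, 1, 20) + 0.1 * rng.standard_normal(20)
print(moving_average(noisy))                      # a visibly smoother ramp
```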
This can be generalized to 2D images
[Figure: left, the input image I; right, the filtered result I ∗ K.]
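A direct numpy sketch of I ∗ K with "valid" borders; the kernel flip is what distinguishes convolution from correlation, and the 5x5 image and averaging kernel are illustrative:

```python
import numpy as np

def convolve2d(I, K):
    """'Valid' 2D convolution of image I with kernel K (pure numpy)."""
    kh, kw = K.shape
    Kf = K[::-1, ::-1]                       # flip: convolution, not correlation
    H = I.shape[0] - kh + 1
    W = I.shape[1] - kw + 1
    out = np.zeros((H, W))
    for x in range(H):
        for y in range(W):
            out[x, y] = np.sum(I[x:x + kh, y:y + kw] * Kf)
    return out

I = np.arange(25, dtype=float).reshape(5, 5)
K = np.ones((3, 3)) / 9.0                    # 3x3 moving-average kernel
print(convolve2d(I, K))                      # smoothed 3x3 output
```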
Thus, we can define the concept of convolution
Yes, using the previous ideas.
Definition
Let $I : [a, b] \times [c, d] \to [0, 255]$ be the image and $K : [e, f] \times [h, i] \to \mathbb{R}$ be the kernel. The output of convolving $I$ with $K$ is
$(I * K)(x, y) = \sum_{u} \sum_{v} K(u, v)\, I(x - u, y - v)$
Now, why not expand this idea?
Imagine that a three-channel image is split into three feature maps.
[Figure: Feature Maps]
Mathematically, we have the following
For output map $o$:
$(I * K)[x, y, o] = \sum_{c=1}^{3} \left(I_c * K_{c,o}\right)[x, y]$
i.e., the convolutions over the three input channels are summed into one output map.
For Example, Encoder
We have the following situation: [Figure]
We have the following
$Y_j^{(l)}$ is a matrix representing the $j$-th feature map of layer $l$.
$K_{ij}^{(l)}$ is the kernel filter connecting input map $j$ to output map $i$ of layer $l$.
Therefore
We can see the convolution as a fusion of information from different feature maps:
$Y_i^{(l)} = B_i^{(l)} + \sum_{j=1}^{m_1^{(l-1)}} Y_j^{(l-1)} * K_{ij}^{(l)}$
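A minimal numpy sketch of this layer formula with "valid" borders; a scalar bias per output map is assumed here (the slides allow a full bias matrix $B_i^{(l)}$), and the map counts and kernel size are illustrative:

```python
import numpy as np

def conv_layer(Y_prev, K, B, f=np.tanh):
    """Y_i^(l) = f( B_i^(l) + sum_j Y_j^(l-1) * K_ij^(l) ).

    Y_prev: (m1_prev, H, W) feature maps of layer l-1
    K:      (m1, m1_prev, kh, kw) kernels
    B:      (m1,) one scalar bias per output map (simplifying assumption)
    """
    m1, m1_prev, kh, kw = K.shape
    H = Y_prev.shape[1] - kh + 1
    W = Y_prev.shape[2] - kw + 1
    out = np.zeros((m1, H, W))
    for i in range(m1):
        acc = np.full((H, W), B[i])              # start from the bias
        for j in range(m1_prev):
            Kf = K[i, j, ::-1, ::-1]             # flipped kernel: convolution
            for x in range(H):
                for y in range(W):
                    acc[x, y] += np.sum(Y_prev[j, x:x + kh, y:y + kw] * Kf)
        out[i] = f(acc)                          # non-linearity
    return out

rng = np.random.default_rng(2)
print(conv_layer(rng.random((3, 8, 8)),          # 3 input maps, 8x8
                 rng.standard_normal((4, 3, 3, 3)),
                 np.zeros(4)).shape)             # (4, 6, 6)
```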
Where
$Y_i^{(l)}$ is the $i$-th feature map in layer $l$.
$B_i^{(l)}$ is the bias matrix for output map $i$.
$K_{ij}^{(l)}$ is the filter of size $\left[2h_1^{(l)} + 1\right] \times \left[2h_2^{(l)} + 1\right]$.
The output of layer l
It consists of $m_1^{(l)}$ feature maps of size $m_2^{(l)} \times m_3^{(l)}$.
Something Notable
$m_2^{(l)}$ and $m_3^{(l)}$ are influenced by border effects.
Therefore, when the convolutional sum is defined properly, the output feature maps have size
$m_2^{(l)} = m_2^{(l-1)} - 2h_1^{(l)}$
$m_3^{(l)} = m_3^{(l-1)} - 2h_2^{(l)}$
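A one-line check of these size formulas; the 32x32 input map and 5x5 filter (i.e. $h_1 = h_2 = 2$) are assumed for illustration:

```python
def output_size(m2_prev, m3_prev, h1, h2):
    """Valid-convolution sizes: m2 = m2_prev - 2*h1, m3 = m3_prev - 2*h2."""
    return m2_prev - 2 * h1, m3_prev - 2 * h2

# A 32x32 feature map convolved with a 5x5 filter (h1 = h2 = 2) yields 28x28.
print(output_size(32, 32, 2, 2))  # (28, 28)
```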
Why? The Border
Example
[Figure: convolutional maps shrink at the border because the kernel must fit entirely inside the image.]
Special Case
When l = 1
The input is a single image I consisting of one or more channels.
Written entry-wise, the convolution is
$\left(Y_j^{(l-1)} * K_{ij}^{(l)}\right)_{x,y} = \sum_{k=-h_1^{(l)}}^{h_1^{(l)}} \sum_{t=-h_2^{(l)}}^{h_2^{(l)}} \left(K_{ij}^{(l)}\right)_{k,t} \left(Y_j^{(l-1)}\right)_{x-k,\, y-t}$
Here, an interesting case
Only a Historical Note
The foundations for deconvolution came from Norbert Wiener of the Massachusetts Institute of Technology in his book "Extrapolation, Interpolation, and Smoothing of Stationary Time Series" (1949).
Given the output, we have a layer that we want to recover:
$Y_i^{(l)} * K_{ij}^{(l)} = Y_j^{(l-1)}$
Typically, p = 1, although other values are possible.
They look for the arguments that minimize a cost function over a set:
$\underset{Y_j^{(l)} * K_{ij}^{(l)}}{\arg\min}\; C(y)$
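To make the inverse problem concrete, a minimal 1D sketch: build the matrix of the convolution operator and recover the input by least squares. This is an illustrative toy (unregularized, p absent), not the cost function the cited authors use; the kernel and signal are arbitrary:

```python
import numpy as np

def conv_matrix(k, n):
    """Matrix A such that A @ y == np.convolve(y, k) for a length-n signal y."""
    A = np.zeros((n + len(k) - 1, n))
    for i, kv in enumerate(k):
        for j in range(n):
            A[i + j, j] = kv
    return A

k = np.array([1.0, -1.0, 0.5])                 # a known kernel
y_true = np.array([0.0, 1.0, 2.0, 1.0, 0.0])   # the signal to recover
z = np.convolve(y_true, k)                     # observed "layer output"
A = conv_matrix(k, len(y_true))
y_rec, *_ = np.linalg.lstsq(A, z, rcond=None)  # arg min ||A y - z||^2
print(np.round(y_rec, 6))                      # recovers y_true
```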
Where
$Y_i^{(l-1,k)}$ are the feature maps from the previous layer.
$g_{ij}^{(l)}$ is a fixed binary matrix that determines the connectivity between feature maps at different layers.
This can be seen as
We have the following layer: [Figure]
They noticed some drawbacks
Using the following optimizations
Direct Gradient Descent
Iterative Reweighted Least Squares
Stochastic Gradient Descent
All of them presented problems!!!
They solved it using a new cost function.