An Application Improving the Accuracy of Image Classification

Pham Tuan Dat
Faculty of Information Technology, Vietnam Maritime University
Hai Phong, Vietnam
datpt@vimaru.edu.vn

Nguyen Kim Anh
Faculty of Information Technology, Vietnam Maritime University
Hai Phong, Vietnam
anhnk@vimaru.edu.vn
Abstract—There have been various research approaches to the problem of image classification so far. For image data containing kinds of objects in the wild, many machine learning algorithms give unreliable results. Meanwhile, deep learning networks are appropriate for big data, and they can deal with the problem effectively. Therefore, this paper aims to build an application combining a ResNet model and image manipulation to improve the accuracy of classification. The classifier performs the training phases on CIFAR-10 in a feasible time. In addition, it achieves around 93% accuracy on the test data. This result is better than that of some recently published studies.
Keywords—augmentation, cutmix, normalization
I. INTRODUCTION

Social networks have stored and managed a massive volume of information on the Internet. To meet the needs of users, social networks have to build useful applications. From a given keyword, the search services need to find relevant information on the same subject exactly and fast. Obviously, relevant information does not just contain text but also includes images. A challenge for such applications is that they must develop an effective mechanism that can classify patterns into the same subject if they represent the same kind of object.
In fact, the problem of image classification is not a new issue. Machine learning algorithms have long been applied to cope with this problem. For instance, K-Nearest Neighbor and Support Vector Machines solve the problem of handwritten digit classification on MNIST very well [12]. But many conventional algorithms only achieve poor performance on data sets such as CIFAR-10 and CIFAR-100, which contain various kinds of objects in the wild [12,13].
In recent years, deep learning networks have overcome the weaknesses of machine learning algorithms. Deep learning networks can be trained on big data and obtain optimal training results. A problem of deep learning networks is that when the number of layers increases, they generate more training errors, which makes the accuracy saturate. ResNet is a typical deep learning network; its key point is the residual block, which may cope with the degradation problem [1,2]. Residual blocks reduce the above drawback and allow ResNet to achieve impressive accuracy even when more layers are added.
On the other hand, the effectiveness of a classifier does not just depend on the network architecture but also comes from the data. The lack of data diversity makes deep learning networks work inefficiently. By modifying patterns of the training data, augmented images represent a more comprehensive data set [6]. Consequently, image augmentation minimizes the difference between patterns in the training data and those in the validation data, as well as the test data.
Therefore, the objective of this paper is to propose an application combining a ResNet model and image manipulation to improve the accuracy of classification on CIFAR-10. The estimated accuracy of the classifier is around 93% on the test set, and this result is better than that of the CNN and Attentive CutMix ResNet-34.
II. THEORETICAL BACKGROUND
A. Image Augmentation
In some cases, deep learning networks may give very high accuracy on the training data but achieve unreliable results on the test data. Image augmentation is a solution to this situation. It generates new data from the original data, while new patterns still keep the original nature of the patterns. On the basis of data diversity, deep learning networks decrease validation errors and increase test accuracy.
There are two practical approaches to image augmentation: image manipulation and deep learning. However, the experiments in this paper and the published studies [7,8,9] apply image manipulation to the problem of image classification. Thus, the paper only presents an overview of image manipulation.
Image manipulation needs a small amount of memory to transform and store data. It takes a lower computational cost than the deep learning approach. Generally, image manipulation [6] includes geometric transformations, color jitter, mixing images, and several other techniques.
Typical geometric transformations are shifting, flipping, cropping, and rotation. When images are taken in the wild, they do not just contain the informative regions of objects, so a classifier sometimes predicts the labels of patterns incorrectly. Cropping can reduce the possibility of misclassification for such images. The use of geometric transformations does not guarantee effectiveness for every data set. For a data set including patterns of letters and digits, rotation or flipping changes the shapes of patterns, so the labels of patterns are incorrectly classified. Nevertheless, for images of objects in the wild, rotation or flipping does not lose the labels of patterns. In Fig. 1, observers can see the same kind of object in the images after a series of transformations.
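As an illustration only, the following sketch applies a few such geometric transformations with the torchvision library; the chosen transforms, parameters, and the input file name are assumptions for demonstration, not settings taken from the paper.

    # Minimal illustration of geometric transformations (assumed parameters).
    from PIL import Image
    from torchvision import transforms

    geometric = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),   # random left-right flip
        transforms.RandomRotation(degrees=10),    # small random rotation
        transforms.RandomCrop(32, padding=4),     # pad the borders, then crop
    ])

    img = Image.open("example.png")               # hypothetical 32x32 input image
    augmented = geometric(img)                    # a new training instance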
Color jitter is another technique of image manipulation. For the problem of letter classification, images of letters are relatively simple, and they are usually converted into binary images, so color jitter is not really necessary. By contrast, images of objects in the wild are much more sophisticated, and the poor quality of images reduces the effectiveness of classification. In this case, color jitter may bring noticeable effects for data augmentation. Color jitter consists of brightness change and hue and saturation adjustment. Brightness change makes dark images brighter. Over-saturated images look artificial, whereas many actual images often give impure colors. Hence, the brightness, saturation, and hue of such images need to be adjusted.

Fig. 1. A series of translations and rotations for a pattern.
Mixing images has been seen as a potential technique for data augmentation. It combines patterns into new training instances. CutMix [7] is a typical example of this technique. For each pair of images, it replaces a removed region on the first image with a patch from the second image. The ground truth labels are mixed proportionally to the area of the patches. New training instances of CutMix do not lose their nature, compared with a few regional dropout strategies [10,11]. But CutMix is unable to capture the most informative regions of images. Attentive CutMix [8] adjusts the strategy of CutMix: it takes a 7×7 grid map from the first image and picks the top N (the optimal value is in the range of 1 to 15) attentive patches. These patches are pasted onto the second image at their respective original locations (the images have the same size).
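For concreteness, a minimal sketch of the basic CutMix operation on a batch of PyTorch tensors is given below. It follows the uniform box sampling of [7] and mixes the labels in proportion to the pasted area; the function name and the Beta parameter are illustrative assumptions.

    import numpy as np
    import torch

    def cutmix_batch(images, labels, alpha=1.0):
        # Sketch of CutMix [7]: paste a random patch from a shuffled copy of the
        # batch and mix the labels in proportion to the area of the patch.
        lam = np.random.beta(alpha, alpha)            # initial mixing ratio
        perm = torch.randperm(images.size(0))         # pair each image with another
        _, _, h, w = images.shape

        # Sample a box whose area is roughly (1 - lam) of the image.
        cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
        cy, cx = np.random.randint(h), np.random.randint(w)
        y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
        x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)

        images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
        lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)     # exact replaced-area ratio
        return images, labels, labels[perm], lam

The training loss is then computed as lam times the cross-entropy on the original labels plus (1 - lam) times the cross-entropy on the permuted labels.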
B. Batch Normalization
Training neural networks might become ineffective if they encounter high learning rates or too small weights [14,16] when carrying out back-propagation. This destroys the learning ability and does not enhance the performance of networks. An ordinary solution to the problem of vanishing gradients is using ReLU and choosing small learning rates, but this way is not good enough. Batch Normalization (BN) [5] is a better alternative to this approach: it normalizes the input data and speeds up the convergence of learning networks. In fact, BN stabilizes the growth of parameters during training phases, so networks are able to work with a broader range of learning rates without the risk of divergence.
There are opposing viewpoints about the link between BN and internal covariate shift (ICS) [17], or the link between BN and the exploding gradient problem [14]. One viewpoint indicates that the use of BN improves the accuracy of networks, but it does not decrease ICS in several test cases. Another opinion shows that adding BN layers may exacerbate the problem of exploding gradients [14]. Nonetheless, the experiments in [17] do not deny a clear improvement in terms of gradient change and loss variation for VGG networks. Furthermore, BN allows a VGG network (with different learning rates) to achieve acceptable results on the test data.
In practice, BN does not normalize the entire training set at a time. Instead, it splits the training set into mini-batches. Next, BN calculates the mean and the variance over each mini-batch, as described in (1) and (2). Afterward, BN normalizes each activation as in (3), and each normalized activation becomes the input of the scale-and-shift transformation in (4):

\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i \qquad (1)

\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2 \qquad (2)

\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \qquad (3)

y_i = \gamma \hat{x}_i + \beta \qquad (4)

For deep learning networks such as CNN [3], BN operates as a layer, which usually goes with ReLU functions and convolutional layers. In learning networks, one convolutional layer can receive BN(x) as its input data instead of x.
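A minimal sketch of the training-time transform in (1)-(4), written directly with PyTorch tensor operations, is shown below; in a real network one would normally use torch.nn.BatchNorm2d, which also tracks running statistics for inference.

    import torch

    def batch_norm_train(x, gamma, beta, eps=1e-5):
        # x: mini-batch of shape (m, num_features); gamma, beta: learnable vectors.
        mu = x.mean(dim=0)                          # (1) mini-batch mean
        var = x.var(dim=0, unbiased=False)          # (2) mini-batch variance
        x_hat = (x - mu) / torch.sqrt(var + eps)    # (3) normalization
        return gamma * x_hat + beta                 # (4) scale and shift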
C. The Overview of ResNet
As mentioned above, the problem of vanishing gradients in learning networks can be addressed by a solution such as BN. However, there are still difficulties in optimizing deep learning networks. The degradation problem is exposed when the depth of networks increases: networks generate more training errors and the accuracy gets saturated. In this situation, over-fitting [15] is not the reason.
ResNet is a deep learning network overcoming the degradation problem. It shares the idea of LSTM and the components of CNN. Nevertheless, it does not have gates controlling the data flow in its units. ResNet builds residual blocks in which the activation of any deeper block is the sum of the activation of a shallower block and a residual function. Kaiming He and his colleagues investigate the benefits of identity shortcuts [1,2], which make ResNet achieve higher accuracy. ResNet consists of residual blocks, and each one has the overall structure illustrated in Fig. 2. In one residual block, ReLU activations and weight layers are placed alternately. To accelerate the convergence of ResNet, batch normalization may be inserted into each block. Moreover, ResNet also includes pooling layers.
Let x_l and f(F(x_l) + h(x_l)) denote the input of the l-th residual block and the output of this block, respectively. F is defined as a residual function, which includes two or three convolutional layers; if F contains only one layer, it brings fewer advantages. The identity mapping is h(x_l) = x_l, and f is a ReLU function.
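With this notation, a residual block with two 3×3 convolutional layers, BN, an identity shortcut h(x) = x, and f = ReLU can be sketched in PyTorch as follows; the exact layer ordering and filter counts of the blocks in Fig. 2 and Fig. 3b are not fully specified in the paper, so this is an approximation rather than the authors' implementation.

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # Computes y = f(F(x) + h(x)) with h(x) = x and f = ReLU.
        def __init__(self, channels):
            super().__init__()
            self.residual = nn.Sequential(          # F: two conv layers with BN
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )
            self.relu = nn.ReLU(inplace=True)       # f, applied after the addition

        def forward(self, x):
            return self.relu(self.residual(x) + x)  # identity shortcut h(x) = x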
From these hypotheses, the authors indicate that the output of the L-th unit is the summation of the output of the l-th unit (l from 0 to L−1) and the outputs of all intermediate residual functions. In an extremely deep learning network, when the identity mapping in the l-th layer is replaced with h(x_l) = λ_l x_l, the authors obtain the following equation:
x_L = \left(\prod_{i=l}^{L-1} \lambda_i\right) x_l + \sum_{i=l}^{L-1} \hat{F}(x_i, W_i) \qquad (5)

The factor \prod_{i=l}^{L-1} \lambda_i in (5) can be exponentially large if all λ_i > 1, and it can be exponentially small if all λ_i < 1 (i from l to L−1).
(i from l to L-1) This result will cause exploding or vanishing
Trang 3Fig 2 A residual block
λi = 1, the gradient will not vanish in each layer when the
weights are arbitrarily small [2]
Other shortcut techniques do not perform better than identity shortcuts. For example, the use of exclusive gating generates more test errors than identity shortcuts in ResNet-110. The authors also investigate 1×1 convolutional shortcuts, which give poor performance for ResNet-110.
Identity shortcuts take no extra parameters and do not increase the computational complexity too much. ResNets are able to be trained by common optimization algorithms, and they are easy to implement with basic libraries without much modification.
III. EXPERIMENT AND COMPARISON
A. The Application and Network Model
To build the ResNet and the experimental application, this paper uses the Python language and the necessary libraries, such as PyTorch and Keras. As shown in Fig. 3a, the application consists of the data augmentation, training, and classification functions. Before performing the training phase, the patterns in the training set are augmented to minimize the difference between the patterns of the training and validation data. After finishing the training phase, the classifier can predict the output for the test set.
The function of image manipulation combines transformations including horizontal flipping, random cropping, random rotation, and color jitter. Geometric transformations change the directions and shapes of patterns, while color jitter is used to adjust the brightness, saturation, and hue of patterns. Patterns are randomly rotated by small angles in the range of −5° to 5°. The application uses the vision library of PyTorch to implement image manipulations on the training data, and this function takes a short time to finish the task.
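A plausible torchvision composition of this augmentation function is sketched below; only the transform types and the ±5° rotation range come from the paper, while the flip probability, crop padding, and jitter strengths are assumed values.

    from torchvision import transforms

    # Training-time image manipulation: flip, crop, small rotation, color jitter.
    train_transform = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomCrop(32, padding=4),
        transforms.RandomRotation(degrees=5),       # rotate between -5 and 5 degrees
        transforms.ColorJitter(brightness=0.2, saturation=0.2, hue=0.05),
        transforms.ToTensor(),
    ])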
Fig. 3b represents the ResNet containing six convolutional blocks and three residual blocks; it can be seen as an abridged version of ResNet-18. Although this ResNet has a smaller number of residual blocks, the two models have an insignificant difference in the number of filters in each convolutional layer. Besides, when training on CIFAR-10, this ResNet takes less time than ResNet-18.
In the ResNet, each convolutional block includes one convolutional layer, while each residual block includes two convolutional layers and an identity shortcut. Every block has at least a BN layer and a ReLU activation, but only several blocks have pooling layers. In each block, convolutional layers and BN layers are placed alternately. The 3×3 convolutional layers have from 64 to 512 filters. The last layer of the model acts as one fully connected layer, which converts the data of the previous layers into one-dimensional data. From that, the classifier estimates the output labels.
Fig. 3. (a) The functions of the application; (b) the ResNet model.

Like other learning networks, the ResNet needs to integrate with an optimizer, which allows the training process to decrease the number of training errors and validation errors. This leads to an increase in accuracy on the test set. In this model, the application chooses Adam [4].
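Under the assumption of a standard PyTorch training loop, integrating the model with the Adam optimizer might look like the following sketch; the learning rate, device, and loss function are placeholder choices rather than the paper's exact settings.

    import torch
    import torch.nn as nn

    def train(model, train_loader, val_loader, epochs=30, lr=1e-3, device="cuda"):
        # Hypothetical training loop: Adam [4] with cross-entropy loss.
        model = model.to(device)
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)

        for epoch in range(epochs):
            model.train()
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()

            # Validation accuracy, used for tuning hyper-parameters.
            model.eval()
            correct = total = 0
            with torch.no_grad():
                for images, labels in val_loader:
                    images, labels = images.to(device), labels.to(device)
                    correct += (model(images).argmax(dim=1) == labels).sum().item()
                    total += labels.size(0)
            print(f"epoch {epoch + 1}: validation accuracy {correct / total:.3f}")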
B. The Experiments and Comparison
The application experiments with the ResNet classifier on CIFAR-10. This data set contains 60000 samples, which are divided into three sets (the training, validation, and test data) in the ratio of 4:1:1. The validation set is used for tuning hyper-parameters in training phases and making the performance on the test set better. The effectiveness of the classifier is evaluated by the loss and the accuracy on both the training set and the test set.
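One way to obtain this 4:1:1 split (40000/10000/10000 samples) is sketched below; pooling the official CIFAR-10 train and test parts before re-splitting, as well as the random seed, are assumptions about how the data may have been partitioned.

    import torch
    from torch.utils.data import ConcatDataset, random_split
    from torchvision import datasets, transforms

    # Pool all 60000 CIFAR-10 samples, then split them in the ratio 4:1:1.
    to_tensor = transforms.ToTensor()
    full = ConcatDataset([
        datasets.CIFAR10("data", train=True, download=True, transform=to_tensor),
        datasets.CIFAR10("data", train=False, download=True, transform=to_tensor),
    ])
    train_set, val_set, test_set = random_split(
        full, [40000, 10000, 10000], generator=torch.Generator().manual_seed(0))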
The position of the BN layers in the blocks makes slightly different outcomes on the validation set: if the BN layers are executed first in the blocks, the accuracy of the ResNet increases stably during the training phase; if the BN layers are executed after the convolutional layers, the ResNet produces fluctuating accuracy in the middle epochs, as illustrated in Fig. 5. Nonetheless, the first choice does not give better overall results on the validation data, and the accuracy on the test data also decreases by a slight amount.
As shown in Fig. 4, the training error reduces quickly, so after finishing the phase, the training loss is approximately 6%. In other words, the ResNet gives very high classification accuracy on the training set (over 98%). This result does not reflect the benefits of image manipulation, because the non-augmentation ResNet also gives an extremely low loss (below 0.5%). In Table I, the results show a small increase in the accuracy of the ResNet on the validation data. Furthermore, the loss of the ResNet is much lower than that of the non-augmentation ResNet on the validation set (0.26 compared with 0.45), and the accuracy of the ResNet increases by 3.4% on the test data (from 89.6% to 93.0%).

Fig. 4. The loss of the ResNet during a training phase.

Fig. 5. The accuracy of the ResNet on the validation set.

Fig. 6. The confusion matrix of the ResNet on the test data.
According to the confusion matrix in Fig. 6, the misclassification rates of the ten classes are quite low. Generally, the maximal correct classification rate belongs to the automobile class (over 0.96). In contrast, the minimal correct classification rate belongs to the cat class (over 0.84), because the ResNet confuses many cat objects with dog objects.
To make the comparison fair, the application compares the ResNet with the CNN (with unchanged proportions for the three sets of CIFAR-10). Both classifiers take the augmented images as the training data. The CNN has 8 convolutional layers, 4 max-pooling layers, 1 fully connected layer, and some BN layers. The 3×3 convolutional layers of this network also have from 64 to 512 filters.

TABLE I. IMAGE MANIPULATION IMPROVES THE ACCURACY

TABLE II. COMPARING THE RESNET WITH THE CNN

TABLE III. APPLYING MIXING IMAGES TO RESNET-34
Method              Accuracy on test data
Attentive CutMix    0.9040
CutMix              0.8875
In the experiment, the ResNet outperforms the CNN on both the training and test data, as shown in Table II. Although the CNN classifier converges quickly in the first half of the training phase, its accuracy on the validation data gets saturated in the last epochs. Finally, it achieves 87% accuracy on the test data. Meanwhile, after 30 epochs, the ResNet gets around 93% accuracy.
Applying mixing images to ResNet improves the accuracy of classification on CIFAR-10. From the reports in a recent study [8], the method of Mixup has the least effective performance, but it still gains a 1.58% accuracy improvement over the baseline method. Attentive CutMix is able to capture the most informative regions of images, so its accuracy improvement exceeds that of CutMix (3.28% compared with 1.63%). Consequently, CutMix ResNet-34 only gains 88.75% accuracy, while Attentive CutMix ResNet-34 gains 90.40% accuracy (Table III). However, mixing images does not bring more advantages than geometric transformations and color jitter.
IV. CONCLUSION

This paper aims at building an application combining a ResNet model and image manipulation to improve the accuracy of classification on CIFAR-10. The experiments in the paper and the reports from recently published studies show that the use of geometric transformations and color jitter is a suitable alternative to mixing images. The ResNet achieves high accuracy of image classification, with around 93% on the test data. The classifier obtains an accuracy increase of 3.4% over the non-augmentation ResNet. Additionally, this growth outweighs that of Attentive CutMix ResNet-34.
REFERENCES

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition", Conference on Computer Vision and Pattern Recognition, June 2016.
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Identity Mappings in Deep Residual Networks", European Conference on Computer Vision, September 2016.
[3] Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, and Tsuhan Chen, "Recent Advances in Convolutional Neural Networks", Elsevier, October 2017.
[4] Diederik P. Kingma and Jimmy Lei Ba, "Adam: A Method for Stochastic Optimization", ICLR, 2015.
[5] Sergey Ioffe and Christian Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", vol. 37, Proceedings of the 32nd International Conference on Machine Learning, July 2015.
[6] Connor Shorten and Taghi M. Khoshgoftaar, "A Survey on Image Data Augmentation for Deep Learning", Journal of Big Data, 2019.
[7] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo, "CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features", International Conference on Computer Vision, August 2019.
[8] Devesh Walawalkar, Zhiqiang Shen, Zechun Liu, and Marios Savvides, "Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification", International Conference on Acoustics, Speech and Signal Processing, May 2020.
[9] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz, "mixup: Beyond Empirical Risk Minimization", ICLR, April 2018.
[10] Terrance DeVries and Graham W. Taylor, "Improved Regularization of Convolutional Neural Networks with Cutout", arxiv.org/abs/1708.04552, November 2017.
[11] Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang, "Random Erasing Data Augmentation", arxiv.org/abs/1708.04896, November 2017.
[12] Sonika Dahiya, Rohit Tyagi, and Nishchal Gaba, "Comparison of ML", https://easychair.org/publications/preprint_open/KnC4, July 2020.
[13] Karttikeya Mangalam and Vinay Prabhu, "Do Deep Neural Networks Learn Shallow Learnable Examples First?", Proceedings of the Workshop on Identifying and Understanding Deep Learning Phenomena at the 36th International Conference on Machine Learning, 2019.
[14] George Philipp, Dawn Song, and Jaime G. Carbonell, "Gradients Explode - Deep Networks are Shallow - ResNet Explained", International Conference on Learning Representations, 2018.
[15] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", Journal of Machine Learning Research, 2014.
[16] Yoshua Bengio, Patrice Simard, and Paolo Frasconi, "Learning Long-Term Dependencies with Gradient Descent is Difficult", IEEE Transactions on Neural Networks, February 1994.
[17] Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry, "How Does Batch Normalization Help Optimization?", 32nd Conference on Neural Information Processing Systems, 2018.