Handwriting recognition based on convolutional neural network


PHAM TUAN DAT, LE THE ANH

Faculty of Information Technology, Vietnam Maritime University

Abstract

In recent years, a new research approach has emerged based on the convolutional neural network, which is known as one of the most advanced deep learning models. Convolutional neural networks have been widely applied to artificial intelligence problems such as object recognition, feature detection and text classification. To meet the needs of those problems, applications must have an efficient algorithm. In this paper, the authors present an overview of the convolutional neural network model, including the main function of each layer and the transformations in the network. Based on that theory, the authors design an application for English handwriting recognition using a convolutional neural network and run comparative tests against other recognition algorithms.

Keywords: convolutional, perceptron, nearest neighbor, feature map, pooling, receptive field, backpropagation, cross-entropy

Tóm tắt (Abstract, translated from Vietnamese)

In recent years, a new line of research has emerged based on the convolutional neural network, known as one of the most advanced deep learning models. In practice, convolutional neural networks are widely applied to artificial intelligence problems such as object recognition, feature detection and text classification. To meet the requirements of these problems, applications must have an efficient algorithm. In this paper, the authors introduce the overview model of the convolutional neural network, including the function of each layer and the transformations in the network. On that basis, the authors design an English handwriting recognition application using a convolutional neural network and compare the results of several recognition algorithms.

Keywords: convolution, supervised learning algorithm, nearest neighbor, feature map, pooling, receptive field, backpropagation, cross-entropy

1 Introduction

The convolutional neural network (CNN) is an advanced deep learning model that supports a range of artificial intelligence problems. For instance, major problems such as feature detection and object recognition in digital images can be solved by applying a CNN model. Many applications of face recognition and natural language processing are now deployed on computers and mobile devices.

Compared with other recognition algorithms, neural networks can learn from data, tolerate faults, and classify samples into different classes. The CNN is an improvement on the traditional neural network, so it retains these advantages and also offers high recognition accuracy. Furthermore, a trained neural network can produce parameters that serve as input to other models such as the support vector machine. Any network model can be applied in artificial intelligence, but only some are efficient enough for recognition problems. Therefore, in this paper the authors present the overview model and comparative experiments on English handwriting recognition based on the CNN, the multilayer perceptron network (MLP), and, as a non-neural baseline, the nearest neighbor algorithm.

2 Theoretical Background

A CNN differs from an MLP in several respects. Firstly, a CNN consists of hidden layers that are linked together through convolution operations and non-linear functions. Secondly, each neuron of a hidden layer is connected only to the neurons in a local region of the preceding layer. Lastly, a CNN builds on three basic concepts: the local receptive field, shared weights and pooling.

Local receptive field: let the input of the network be a digital image of size 28*28, scanned by a window of size 5*5 as depicted in Figure 1. If the window is moved one pixel at a time (from left to right and from top to bottom of the image), it visits 24*24 local regions, since 28 - 5 + 1 = 24 window positions fit along each axis. Each region feeds one of the 24*24 neurons of the convolutional layer, and the resulting grid of neurons is known as a feature map.


Figure 1 Neuron is created from the local region
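The count of 24*24 local regions can be checked with a short sketch. The function names `feature_map_size` and `receptive_fields` are illustrative and do not come from the paper; the sketch assumes a stride of one pixel and no padding, as in the description above.

```python
def feature_map_size(input_size, window_size, stride=1):
    """Number of window positions along one axis (no padding)."""
    return (input_size - window_size) // stride + 1


def receptive_fields(height, width, window, stride=1):
    """Yield the top-left corner (row, col) of every local receptive field,
    scanning from left to right, then from top to bottom."""
    for row in range(0, height - window + 1, stride):
        for col in range(0, width - window + 1, stride):
            yield row, col


side = feature_map_size(28, 5)               # 28 - 5 + 1
fields = list(receptive_fields(28, 28, 5))   # one entry per neuron of the feature map
print(side, len(fields))
```

Each corner in `fields` corresponds to one neuron of the feature map, so the list length equals the number of neurons in that map.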

Shared weights: each feature map has 26 parameters, namely 25 shared weights (one per position of the 5*5 window) and 1 bias, so 6 feature maps have 6*26 = 156 parameters in total. In practice, recognition applications often use dozens of feature maps.
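The parameter count can be reproduced in a few lines. The helper `conv_parameters` is a hypothetical name, and the comparison with a fully connected layer is an added illustration that does not appear in the paper; it only suggests why sharing weights keeps the model small.

```python
def conv_parameters(window_size, feature_maps, input_maps=1):
    """Parameters of a convolutional layer with shared weights:
    one window of weights per input map, plus one bias, for each feature map."""
    weights_per_map = window_size * window_size * input_maps
    return feature_maps * (weights_per_map + 1)


shared = conv_parameters(window_size=5, feature_maps=6)   # 6 * (25 + 1)

# Illustration only: a fully connected layer mapping the 28*28 input to the
# same 6 * 24 * 24 neurons would need one weight per input pixel and one bias
# for every neuron.
dense = (28 * 28 + 1) * (6 * 24 * 24)

print(shared, dense)
```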

Figure 2 Max-pooling procedure

Pooling: its task is to reduce the number of neurons. Each region of 2*2 neurons produces 1 neuron in the next layer. Two common pooling procedures are max-pooling and L2-pooling.

Figure 3 describes the overview model of one CNN. The network consists of an input layer, 2 convolutional layers, 2 pooling layers, 1 fully connected layer and an output layer. The input size is 28*28, the first convolutional layer has 6 feature maps, and the output layer recognizes the labels 0 to 9. If the input size or the number of classes is larger, the network generally needs more layers.

Figure 3 The overview model of one CNN
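The two pooling procedures named above can be sketched as follows. This is a minimal illustration on a small 4*4 map, assuming non-overlapping 2*2 regions; the function names are hypothetical and the example values are made up for the demonstration.

```python
import math


def pool2x2(feature_map, reduce_region):
    """Apply reduce_region to each non-overlapping 2*2 block of feature_map."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            region = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            pooled_row.append(reduce_region(region))
        pooled.append(pooled_row)
    return pooled


def max_pool(region):
    """Max-pooling: keep the largest activation of the region."""
    return max(region)


def l2_pool(region):
    """L2-pooling: square root of the sum of squared activations."""
    return math.sqrt(sum(x * x for x in region))


example = [[1, 2, 0, 0],
           [3, 4, 0, 5],
           [0, 0, 3, 0],
           [0, 6, 4, 0]]

print(pool2x2(example, max_pool))   # [[4, 5], [6, 4]]
print(pool2x2(example, l2_pool))
```

Either procedure halves each side of the map, so a 24*24 feature map becomes 12*12.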

Transformations from local regions to 1st convolutional layer:

\[
C_p^1(i,j) = \sigma\Big( \sum_{u=-2}^{2} \sum_{v=-2}^{2} I(i-u,\, j-v)\, k_p^1(u,v) + b_p^1 \Big), \qquad \sigma(x) = \frac{1}{1+e^{-x}}, \qquad p = 1,\dots,6;\ i,j = 1,\dots,24 \tag{1}
\]
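The transformation from the input image to one feature map of the 1st convolutional layer, Equation (1), can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, and output position (i, j) is assumed to be centred on image pixel (i + 2, j + 2) in 0-based indexing so that the 5*5 window always stays inside the image.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv_feature_map(image, kernel, bias):
    """One feature map: sigmoid(sum over u, v of I(i-u, j-v) * k(u, v) + b).

    image  : list of rows (height x width)
    kernel : 5 x 5 list, kernel[u + 2][v + 2] holds k(u, v) for u, v in -2..2
    Assumption: output (i, j) is centred on image pixel (i + 2, j + 2).
    """
    half = len(kernel) // 2
    out_rows = len(image) - 2 * half
    out_cols = len(image[0]) - 2 * half
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = bias
            for u in range(-half, half + 1):
                for v in range(-half, half + 1):
                    pixel = image[i + half - u][j + half - v]
                    total += pixel * kernel[u + half][v + half]
            row.append(sigmoid(total))
        feature_map.append(row)
    return feature_map


# Toy input: a black 28*28 image with a single bright pixel.
image = [[0.0] * 28 for _ in range(28)]
image[10][10] = 1.0

# Toy kernel: k(0, 0) = 1 and zero elsewhere, so the pre-activation is the centre pixel.
kernel = [[0.0] * 5 for _ in range(5)]
kernel[2][2] = 1.0

fmap = conv_feature_map(image, kernel, bias=0.0)
print(len(fmap), len(fmap[0]))   # 24 24
```

With six different kernels and biases, the same function would produce the six feature maps indexed by p.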

The activation function in this case is the sigmoid; the tanh function is another common choice in CNNs.

Transformations from 1st convolutional layer to 1st pooling layer:

\[
S_p^1(i,j) = \frac{1}{4} \sum_{u=0}^{1} \sum_{v=0}^{1} C_p^1(2i-u,\, 2j-v), \qquad p = 1,\dots,6;\ i,j = 1,\dots,12 \tag{2}
\]


Transformations from 1st pooling layer to 2nd convolutional layer:

\[
C_q^2(i,j) = \sigma\Big( \sum_{p=1}^{6} \sum_{u=-2}^{2} \sum_{v=-2}^{2} S_p^1(i-u,\, j-v)\, k_{p,q}^2(u,v) + b_q^2 \Big), \qquad q = 1,\dots,12;\ i,j = 1,\dots,8 \tag{3}
\]

Transformations from 2nd convolutional layer to 2nd pooling layer:

\[
S_q^2(i,j) = \frac{1}{4} \sum_{u=0}^{1} \sum_{v=0}^{1} C_q^2(2i-u,\, 2j-v), \qquad q = 1,\dots,12;\ i,j = 1,\dots,4 \tag{4}
\]

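In Equations (2) and (4), each pooled value is the mean of a 2*2 region, so the procedure is average (mean) pooling; L2-pooling would instead take the square root of the sum of squares over the region.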

Transformations from 2nd pooling layer to fully connected layer:

\[
f = F\big(\{S_q^2\}_{q=1,\dots,12}\big), \qquad y = \sigma(W * f + b)
\]
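The layer sizes implied by the transformations above can be traced with a short sketch. It assumes 5*5 windows without padding, 2*2 pooling, 6 feature maps in the 1st convolutional layer and 12 in the 2nd, and that F simply concatenates all pooled maps into one vector; the helper names are hypothetical.

```python
def conv_size(size, window=5):
    """Side length after a convolution without padding, stride 1."""
    return size - window + 1


def pool_size(size, region=2):
    """Side length after pooling over non-overlapping regions."""
    return size // region


layers = [("input", 28, 1)]
size = 28
for name, maps in (("conv1", 6), ("conv2", 12)):
    size = conv_size(size)
    layers.append((name, size, maps))
    size = pool_size(size)
    layers.append((name.replace("conv", "pool"), size, maps))

# Length of the vector f if all 12 maps of the 2nd pooling layer are concatenated.
flattened = size * size * 12

for name, side, maps in layers:
    print(f"{name}: {maps} map(s) of {side}*{side}")
print("fully connected input length:", flattened)
```

Under these assumptions the weight matrix W of the fully connected layer would have one column per element of f and one row per output label.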



Estimation error between true label and prediction: there are two choices, cross-entropy and mean squared error. Recent work indicates that the cross-entropy cost usually gives better results than the mean squared error cost in neural networks [4]. To reach good recognition accuracy, the CNN is trained with the backpropagation algorithm, which is described in detail in [1], [3].
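One reason usually given for preferring cross-entropy, and the one developed in [4], is that it avoids the learning slowdown of a saturated sigmoid output. The sketch below compares the gradient of both costs with respect to the pre-activation z of a single sigmoid output neuron. It is a simplified single-neuron illustration, not the training code of the experiments, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mse_gradient(z, target):
    """d/dz of 0.5 * (a - target)**2 with a = sigmoid(z)."""
    a = sigmoid(z)
    return (a - target) * a * (1.0 - a)


def cross_entropy_gradient(z, target):
    """d/dz of -(target * ln(a) + (1 - target) * ln(1 - a)) with a = sigmoid(z)."""
    a = sigmoid(z)
    return a - target


# A badly wrong, saturated neuron: output close to 1 while the target is 0.
z, target = 5.0, 0.0
print("MSE gradient:          ", mse_gradient(z, target))
print("cross-entropy gradient:", cross_entropy_gradient(z, target))
```

The extra factor a * (1 - a) in the mean squared error gradient becomes very small when the neuron saturates, so the correction shrinks exactly when the error is large; the cross-entropy gradient stays proportional to the error.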

3 Experimental Work

The authors carried out experiments on English handwriting recognition with three recognition algorithms: the CNN [2], the MLP, and nearest neighbor. The application was built with Python and related libraries.

Patterns were collected from “www.ee.surrey.ac.uk/CVSSP/demos/chars74k/” and cover 62 classes (0-9, a-z, A-Z). Of these patterns, 484 are digits, 1297 are upper-case letters and 1139 are lower-case letters. The patterns were split into training and testing sets at a ratio of 7:3.
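A 7:3 split can be sketched as below. The paper does not say how the split was drawn, so the shuffling, the fixed seed, the rounding down of the training size and the per-group splitting are all assumptions made for illustration; `split_indices` is a hypothetical helper.

```python
import random


def split_indices(count, train_ratio=0.7, seed=0):
    """Shuffle the pattern indices 0..count-1 and cut them at train_ratio.
    Assumption: the training size is rounded down."""
    indices = list(range(count))
    random.Random(seed).shuffle(indices)
    cut = int(count * train_ratio)
    return indices[:cut], indices[cut:]


# Pattern counts reported for the three groups.
group_sizes = {"digits": 484, "upper letters": 1297, "lower letters": 1139}

for name, count in group_sizes.items():
    train, test = split_indices(count)
    print(f"{name}: {len(train)} training, {len(test)} testing")
```

Fixing the seed makes the split repeatable, which matters when the same training and testing patterns must be reused for all three algorithms.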

The CNN used in the experiments has the same structure as the model above, with some differences: the input size is 28*40, the number of kernels is 16/32, the learning rate is 0.01, the transformations from convolutional layers to pooling layers use max-pooling, and the estimation function is cross-entropy.

The MLP network has 3 hidden layers with 1024 features each and, like the CNN, uses the cross-entropy function, while the nearest neighbor algorithm uses the Manhattan distance (L1 norm) as its estimation function [5].
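The nearest neighbor rule with Manhattan distance can be sketched in a few lines. This is a minimal 1-nearest-neighbor illustration on made-up 4-element vectors, not the experimental code; in the experiments each vector would hold the pixel values of one pattern. The function names are hypothetical.

```python
def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbor(train_vectors, train_labels, query):
    """Return the label of the training vector closest to query."""
    best = min(range(len(train_vectors)),
               key=lambda i: manhattan(train_vectors[i], query))
    return train_labels[best]


# Toy training set: three flattened "images" with their labels.
train_vectors = [[0, 0, 1, 1],
                 [1, 1, 0, 0],
                 [1, 0, 1, 0]]
train_labels = ["a", "b", "c"]

print(nearest_neighbor(train_vectors, train_labels, [1, 1, 0, 1]))   # b
```

There is no training phase: every query is compared with all stored patterns, so the cost of recognition grows with the size of the training set.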

The process for training and testing upper letters on CNN:

Figure 4 Training and testing upper letters on CNN

The experiment detects handwritten letters in a digital image according to the labels of the above model:

Figure 5 Recognizing lower letters


Figure 6 shows the recognition accuracy of the algorithms. The CNN model recognizes upper-case letter patterns well, the MLP network also gives fairly good results, and nearest neighbor achieves the highest accuracy. Nevertheless, for lower-case letters the algorithms do not give the expected recognition results.

Figure 6 The accuracy of recognition results (groups shown: digits, lower letters, upper letters)

4 Conclusion

English text recognition is not a new subject, and many applications have been implemented with the multilayer perceptron approach. For the more difficult problem of English handwriting recognition, the authors present a new approach based on the CNN model and compare its recognition accuracy with that of other methods. In the experiments, training and testing on the CNN gave more accurate results than on the MLP network. In addition, a notable advantage of neural networks is that the number of layers in the network structure can be changed to improve accuracy. However, the recognition accuracy of the CNN is not higher than that of the nearest neighbor algorithm. Moreover, the limitations of the CNN have not been fully resolved: if the patterns have complex shapes or the input data is of poor quality, recognition performance decreases.

REFERENCES

[1] Gavin Hackeling, Mastering Machine Learning with scikit-learn, Packt Publishing, October 2014.

[2] Rodolfo Bonnin, Building Machine Learning Projects with TensorFlow, Packt Publishing, November 2016.

[3] Zhifei Zhang, “Derivation of Backpropagation in Convolutional Neural Network (CNN)”, October 2016.

[4] Michael Nielsen, Neural Networks and Deep Learning, Determination Press, 2015.

[5] Deepak Sinwar and Rahul Kaushik, “Study of Euclidean and Manhattan Distance Metrics using Simple K-Means Clustering”, International Journal for Research in Applied Science and Engineering Technology, Vol. 2, Issue V, May 2014.

Received: 11 January 2018

Revised: 22 January 2018

Accepted: 26 January 2018
