HANDWRITTEN NUMBER RECOGNITION AND ITS APPLICATION AT
DANANG UNIVERSITY OF TECHNOLOGY
Authors: Duong Thi Kim Cuc, Dinh Quang Huy, Tran Hoang An, Nguyen Van Trong
Da Nang University of Technology, Center of Excellence, ECE08
Advisors: Hoang Le Uyen Thuc, M.S., Pham Van Tuan, Ph.D
Da Nang University of Technology, Department of Electronics and Telecommunications
ABSTRACT
This paper presents the results of handwritten digit recognition on well-known image databases using state-of-the-art feature extraction and classification techniques. The tested databases are obtained from MNIST [1] and from collected samples of digits handwritten by teachers at Da Nang University of Technology. For feature extraction, two features are chosen: Hu’s seven moments and image averaging (resizing the images to ones with fewer pixels for easier comparison). These features are combined with two classifiers: a Neural Network classifier and Euclidean Distance. So far, with the matching dictionary collected at Da Nang University of Technology, the combination of the image averaging feature and the Euclidean Distance gives the best accuracy (more than 93%), which can be further improved with a more comprehensive database.
1 Introduction
One of the most troublesome and tedious tasks that teachers at Da Nang University of Technology face is manually entering exam grades into computers. This project aims to spare them from copying the grades by hand by presenting a method of automatically importing grades into computers. The method employs a well-known pattern recognition procedure called OCR (optical character recognition).
The performance of character recognition largely depends on the feature extraction approach and the classification/learning scheme. Various approaches have been proposed for feature extraction in character recognition. Hu’s seven moments have been extensively employed as invariant global features of images in pattern recognition. Averaging is a rather simple process of representing a square block of pixels by a single pixel, so that an image is expressed by a smaller image.
An artificial neural network (ANN) consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. When the network structure is appropriately designed and the training sample size is large, neural networks are able to give high classification accuracy on unseen test data. OCR using template matching recognizes a character by comparing two images of digits. We implement a template-matching technique that minimizes the Euclidean Distance between the pattern to be recognized and the sample patterns provided by the dictionary.
2 Main process
2.1 Proposed System
The scanned image is first preprocessed to give the normalized image. To ease the classification process, the normalized image is represented by a set of features for comparison. Finally, the results recognized from the JPEG file are written to an xls file.
Figure 1 Proposed System Overview
2.2 Preprocessing
There are four main steps in this stage. First, the scanned RGB image is converted to a gray-scale image using the formula Intensity = 0.2989*red + 0.5870*green + 0.1140*blue [2]. Second, the image is thresholded to obtain a binary image. The threshold level, chosen to be 70 in this case, depends on the quality of the scanned image and the background noise. The output binary image has a value of 1 (white) for all pixels in the input image with luminance greater than 70 and 0 (black) for all other pixels. Third, the “opening” morphological operation [3] is applied to smooth the digit and eliminate small noise regions. Finally, normalization is used to regulate the size, position and shape of the image so that the differences between samples in one class are reduced. The key idea behind normalization involves bilinear interpolation [4]. These steps are depicted in Figure 2.
Figure 2 Preprocessing Steps (a) RGB Image (b) Gray-scale Image (c) Binary Image (d) Image after Morphology (e) Normalized Image
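The first three preprocessing steps can be sketched in Python with NumPy as follows. This is a minimal illustration, not the implementation used in the experiments: the function names, the 3x3 square structuring element and the border handling are assumptions, and the normalization step is not covered.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Weighted sum of the R, G and B channels, using the weights quoted in the text."""
    weights = np.array([0.2989, 0.5870, 0.1140])
    return rgb[..., :3].astype(float) @ weights

def binarize(gray, level=70):
    """Return 1 (white) where luminance exceeds `level`, 0 (black) elsewhere."""
    return (gray > level).astype(np.uint8)

def _neighbourhood_stack(binary, pad_value):
    """Stack the nine shifted copies that make up each pixel's 3x3 neighbourhood."""
    padded = np.pad(binary, 1, constant_values=pad_value)
    h, w = binary.shape
    return np.stack([padded[r:r + h, c:c + w] for r in range(3) for c in range(3)])

def opening(binary):
    """Morphological opening: erosion followed by dilation.

    The 3x3 square structuring element is an assumption; the text does not
    state which element was used.
    """
    eroded = _neighbourhood_stack(binary, pad_value=1).min(axis=0)
    return _neighbourhood_stack(eroded, pad_value=0).max(axis=0)
```

With this choice of structuring element, white regions smaller than 3x3 pixels are removed, while larger regions keep their original extent.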
2.3 Feature extraction
The features used in our experiment are Hu’s seven moments and image averaging.
2.3.1 Hu’s seven moments (SM)
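An essential issue in the field of pattern analysis is the recognition of objects and characters regardless of their position, size and orientation, as illustrated in Figure 1. The idea of using moments in shape recognition gained prominence when Hu (1962) [5] derived a set of invariants using algebraic invariants. The two-dimensional (p + q)th order moments of an image with density function f(x, y) are defined in terms of Riemann integrals as:
$$m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^{p} y^{q} f(x, y)\,dx\,dy, \qquad p, q = 0, 1, 2, \ldots \qquad (1)$$
The central moments are defined as:
$$\mu_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y)\,dx\,dy \qquad (2)$$
where
$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}} \qquad (3)$$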
In particular, Hu (1962) defines seven values, computed by normalizing central moments through order three, that are invariant to object scale, position, and orientation.
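As an illustration, the seven values can be computed for a digital image by replacing the integrals with sums over pixels. The sketch below uses the standard form of Hu’s invariants, with normalized central moments eta_pq = mu_pq / mu_00^(1 + (p + q)/2). The function name hu_moments and the pixel-grid coordinates are assumptions made for this example.

```python
import numpy as np

def hu_moments(image):
    """Return Hu's seven moment invariants of a 2-D gray-scale or binary image."""
    img = np.asarray(image, dtype=float)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    x_bar = (x * img).sum() / m00  # centroid, m10 / m00
    y_bar = (y * img).sum() / m00  # centroid, m01 / m00

    def eta(p, q):
        """Normalized central moment of order (p + q)."""
        mu = ((x - x_bar) ** p * (y - y_bar) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])
```

On a pixel grid, invariance to translation and to rotation by multiples of 90 degrees holds up to floating-point error; invariance to scaling and to other rotation angles is only approximate because of resampling.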
2.3.2 Image Averaging
Since the matrix representations of the ten digits from 0 to 9 are very different, it is reasonable to recognize them by checking each ‘pseudo pixel’, which represents a 4x4 block of a particular digit image. Each 4x4 block has its own average value and can be considered a single ‘pixel’. By choosing 4x4 blocks, we reduce the complexity of the recognition process while still maintaining the shape of the image. Figure 3 shows an image of the digit zero before and after averaging.
Figure 3 Example of Image Averaging (a) Initial Image (b) Average Image
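Image averaging over 4x4 blocks can be sketched as below. The function name block_average is hypothetical, and cropping the image when its size is not a multiple of the block size is an assumption; the text does not say how such cases are handled.

```python
import numpy as np

def block_average(image, block=4):
    """Replace each `block` x `block` tile of a 2-D image by its mean value."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # Assumption: drop trailing rows and columns that do not fill a whole block.
    h, w = h - h % block, w - w % block
    tiles = img[:h, :w].reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3))
```

For example, a 28x28 image is reduced to a 7x7 array, so each sample is described by 49 values instead of 784.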
2.4 Classification algorithm
2.4.1 Artificial neural network
Artificial neural networks [6], which are inspired by studies of biological nervous systems, are composed of many simple nonlinear computational elements (neurons or nodes) connected by links with variable weights. The inherent parallelism of these networks allows rapid pursuit of many hypotheses in parallel, resulting in high computation rates. Moreover, they provide a greater degree of robustness or fault tolerance than conventional computers because of the many processing nodes, each of which is responsible for a small portion of the task. Damage to a few nodes or links thus does not impair overall performance significantly. Neural networks can perform different tasks, one of which is supervised classification. This is a decision-making process which requires the net to identify the class or category which best represents an input pattern. It is assumed that the net has already adapted to the classes it is expected to recognize through a learning process using labeled training prototypes from each category.
Figure 4 General structure of a neural network [6]
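To make the idea of a supervised classifier concrete, here is a small feed-forward network trained by gradient descent on labeled prototypes. The paper does not state the architecture or training settings, so everything in this sketch (one hidden layer, tanh units, softmax output, cross-entropy loss, learning rate, all function names) is an assumption chosen for illustration.

```python
import numpy as np

def init_network(n_in, n_hidden, n_out, seed=0):
    """Small random weights and zero biases for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(net, X):
    """Return hidden activations and class probabilities for each row of X."""
    hidden = np.tanh(X @ net["W1"] + net["b1"])
    logits = hidden @ net["W2"] + net["b2"]
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return hidden, probs

def train_step(net, X, labels, lr=0.2):
    """One gradient-descent update; returns the cross-entropy loss before it."""
    hidden, probs = forward(net, X)
    n = len(X)
    grad_logits = probs.copy()
    grad_logits[np.arange(n), labels] -= 1
    grad_logits /= n
    grad_hidden = (grad_logits @ net["W2"].T) * (1 - hidden ** 2)
    net["W2"] -= lr * hidden.T @ grad_logits
    net["b2"] -= lr * grad_logits.sum(axis=0)
    net["W1"] -= lr * X.T @ grad_hidden
    net["b1"] -= lr * grad_hidden.sum(axis=0)
    return -np.log(probs[np.arange(n), labels]).mean()

def predict(net, X):
    """Pick the class with the highest output probability."""
    return forward(net, X)[1].argmax(axis=1)
```

In use, each row of X would hold the feature vector of one digit (for instance the seven moments or the averaged image), and labels would hold the digit classes 0 to 9.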
2.4.2 Template matching using Euclidean Distance
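Template matching with the Euclidean Distance [7] assigns a testing sample to the class of the entry with the smallest ‘distance’, or error, in a dictionary that is built up in advance:
$$d(x, y) = \sqrt{\sum_{j=1}^{k} (x_j - y_j)^2}$$
where x_j and y_j denote the j-th of the k feature values of the testing sample and of a dictionary entry, respectively.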
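The template-matching rule described above amounts to a nearest-neighbour search over the dictionary. A minimal sketch follows; the function name classify and the layout of the dictionary (one feature vector per row, with a parallel list of labels) are assumptions.

```python
import numpy as np

def classify(sample, dictionary, labels):
    """Return the label of the dictionary entry closest to `sample`,
    together with the corresponding Euclidean distance."""
    diffs = np.asarray(dictionary, dtype=float) - np.asarray(sample, dtype=float)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    best = int(distances.argmin())
    return labels[best], float(distances[best])
```

A more comprehensive dictionary adds rows to the search, which matches the expectation stated in the abstract that accuracy can improve with a larger database, at the cost of more distance computations per sample.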
3 Experimental results
Two different features and two classifiers result in four experiments: Hu’s SM and Neural Network, SM and Euclidean Distance, Image Averaging and Neural Network, and Image Averaging and Euclidean Distance.
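The four experiments are simply all pairings of a feature with a classifier, and each is scored by its error rate. The sketch below assumes the usual definition of error rate, misclassified samples divided by total samples, which the text does not spell out; the names are illustrative.

```python
from itertools import product

FEATURES = ["Hu's seven moments", "Image averaging"]
CLASSIFIERS = ["Neural Network", "Euclidean Distance"]

# Every feature paired with every classifier: 2 x 2 = 4 experiments.
EXPERIMENTS = list(product(FEATURES, CLASSIFIERS))

def error_rate(predicted, actual):
    """Fraction of samples whose predicted digit differs from the true digit."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and equally long")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)
```

Under this definition, an accuracy of more than 93% corresponds to an error rate below 7%.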
Table 1: Error rates for data from MNIST (over 1000 samples). Features: Hu’s seven moments, Image averaging. Classifiers: Neural Network, Euclidean Distance.
Table 2: Error rates for data from DUT (over 90 samples). Features: Hu’s seven moments, Image averaging. Classifiers: Neural Network, Euclidean Distance.
Figure 6 shows the actual result from the Graphical User Interface (GUI).
Figure 6 The experimental results of DUT and MNIST databases (a) DUT database
Figure 7 introduces the GUI. The user types the name of the JPEG file and the
corresponding number of students, and then clicks the Convert button to get an xls file containing
the extracted scores. To view the xls file, the user clicks the “Open to view xls file” button.
Figure 7 Graphic User Interface
4 Conclusion
The experimental results indicate that Image Averaging and Euclidean Distance
give more stable and smaller errors than the combination of Neural Network and SM,
while the best performance is obtained using the Neural Network classifier. Based on these
results, we decided to implement Image Averaging and Euclidean Distance in the final
number recognition product. In the future, this program will be upgraded to recognize
scores written as decimal numbers (such as 9.5 or 10). A system for recognizing scores
written in words will also be added to check the result.
REFERENCES
[1] Yann LeCun, Corinna Cortes, The MNIST Database of Handwritten Digits,
http://yann.lecun.com/exdb/mnist/
[2] http://www.mathworks.com/help/toolbox/images/ref/rgb2gray.html
[3] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, second
edition (2001), pp. 528-532
[4] http://en.wikipedia.org/wiki/Bilinear_interpolation
[5] Ming-Kuei Hu, “Visual Pattern Recognition by Moment Invariants” (1962)
[6] V. Venugopal, W. Baets, Neural Networks and Statistical Techniques in Marketing
Research: A Conceptual Comparison (1994), Vol. 12, Iss. 7, pp. 30-38
[7] Cheng-Lin Liu, “Handwritten Digit Recognition: Benchmarking of state-of-the-art
techniques”, Elsevier Ltd, (2002)