Experimental Analysis of Neural Network Based Feature Extractors for
Cursive Handwriting Recognition
Ling Gang, Brijesh Verma and Siddhi Kulkarni School of Information Technology, Griffith University-Gold Coast Campus
PMB 50, GCMC, Qld 9726, Australia E-mail: B.Verma@mailbox.gu.edu.au, S.Kulkarni@mailbox.gu.edu.au
Web: http://intsun.int.gu.edu.au
ABSTRACT
Artificial neural networks have been widely used in many real world applications, including the classification of cursive handwritten segmented characters. However, the feature extraction ability of MLP based neural networks has not been investigated properly. In this paper, a new MLP based approach, an auto-associator for feature extraction from segmented handwritten characters, is proposed. The performance of the Auto-Associator (AA), Multilayer Perceptron (MLP) and Multi-MLP as feature extractors has been investigated and compared. The results and a detailed analysis of our investigation are presented in the paper.
1 INTRODUCTION
1.1 Motivations and aims of the research
There are a number of classification techniques widely used by researchers in many real world applications. However, very few researchers have tried MLP based neural networks as feature extractors. The need for research to further improve current character recognition techniques has been widely recognised, and it is also recognised that the type of feature extractor used contributes to some of the errors. Therefore, there is a need to find a new feature extractor and to investigate NN-based feature extraction techniques, in order to show which are indeed the best and most efficient techniques to use.
1.2 Background
Only a few empirical comparative studies of NN-based feature extraction paradigms have been made. The paradigms in Mao and Jain [1] are compared only for exploratory data projection and two-dimensional classification, and those in Lerner et al. [2] only for one database. In the research carried out by Lerner, Guterman and Aladjem [3], complex architectures of more than two layers were not considered as candidates for the classifier, and the number of output units was three, which is quite small. Comparative studies of different MLP-based feature extractors have not yet been carried out, so more work on this issue is necessary. The primary aim of this research is to investigate the feature extraction ability of the Auto-Associator, the MLP and the Multi-MLP, to determine which one is more suitable and reliable for use in real-world handwritten character recognition systems.
The origins of character recognition [4-6] can be found as early as 1870. It first appeared as an aid to the visually handicapped, and the first successful attempt was made by the Russian scientist Tyurin in 1900 [7]. Since then, many papers about neural networks [8-15] and their applications have been presented and widely used in pattern recognition. The modern version of character recognition appeared in the middle of the 1940s with the development of digital computers; thenceforth it was realised as a data processing approach with applications in the business world. The principal motivation for the development of character recognition is the need to cope with the enormous quantities of paper, such as bank cheques, commercial forms, government records, credit card imprints and mail, generated by an expanding technological society. Presently, the methodologies in character recognition have advanced from the earlier use of primitive techniques for the recognition of machine printed numerals and a limited number of English letters to the application of sophisticated techniques for the recognition of a wide variety of complex handwritten characters, symbols and words/scripts.
1.3 Organization of the paper
This paper consists of six sections. Section 1 presents the motivations and background. Section 2 outlines the proposed research methodology. Section 3 details the experimental method, describing the techniques employed in this research. Section 4 presents the results obtained during the experiments. Section 5 provides a discussion and analysis of the experimental results and compares the three techniques that were investigated. Section 6 presents the conclusions drawn from this research.
2 PROPOSED RESEARCH METHODOLOGY
Figure 1 below outlines the proposed research methodology, which is described in the following sections.
Figure 1. Block diagram of research methodology.
2.1 Character acquisition and preprocessing
Before experiments could be carried out, the original images needed to be processed. The techniques employed to prepare the input files for the various techniques are discussed in the following sections.
2.1.1 Character database acquisition
The training and test characters/words used in this
research came from the following directories on CEDAR
CD-ROM (Benchmark Database):
TRAIN/BINANUMS/BD/*
TEST/BINANUMS/BD/*
TRAIN/CITIES/BD/*
TEST/CITIES/BD/*
All the images were black and white lowercase characters stored in PBM format. All useless white space around the images was removed.
2.1.2 Character resizing
Resizing was the first technique used to process the images. The resizing process partially employed an existing program written by R. Crane [16] in the C programming language, which we modified, and all the images were resized to 30 rows by 40 columns.
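The paper does not state which sampling method the resizing used, so the following minimal C sketch assumes simple nearest-neighbour sampling to the 30 x 40 target size; the function and parameter names are ours.

/* Minimal nearest-neighbour resize sketch for binary character images.
   The paper fixes only the 30 x 40 output size; nearest-neighbour
   sampling is an assumption here. */
#define OUT_ROWS 30
#define OUT_COLS 40

void resize_nn(const unsigned char *src, int src_rows, int src_cols,
               unsigned char dst[OUT_ROWS][OUT_COLS])
{
    for (int r = 0; r < OUT_ROWS; r++) {
        for (int c = 0; c < OUT_COLS; c++) {
            /* Map each output pixel back to its nearest source pixel. */
            int sr = r * src_rows / OUT_ROWS;
            int sc = c * src_cols / OUT_COLS;
            dst[r][c] = src[sr * src_cols + sc];
        }
    }
}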
2.1.3 Chain code feature extraction
For all the training and test characters, the character images were first reduced to their boundaries: all the pixels of each image were changed to the background colour except the outermost ones. The images were then processed using a chain code technique with 8 directions. After chain coding, each image was divided into small sub-images of 10 rows by 10 columns. The occurrences of each direction within a single sub-window were counted and these counts were recorded for later use. After all the images were chain coded, all the counts were divided by the largest among them to create inputs with a maximum value of 1 and a minimum value of 0, so each character had 12 * 8 = 96 inputs.
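To make the feature layout concrete, the following C sketch (the names are ours) builds the 96-element feature vector. It is a simplification: a full implementation would trace each contour and record the Freeman code of every move, whereas here the per-sub-window direction histogram is approximated by counting, for each boundary pixel, the 8-neighbour directions in which another boundary pixel lies, then normalising by the global maximum as the paper describes.

/* Simplified sketch of the 8-direction chain-code feature extraction. */
#define ROWS 30
#define COLS 40
#define SUB  10                              /* sub-window is 10 x 10 pixels */
#define NSUB ((ROWS / SUB) * (COLS / SUB))   /* 3 x 4 = 12 sub-windows       */
#define NDIR 8
#define NFEAT (NSUB * NDIR)                  /* 12 * 8 = 96 features         */

/* Freeman neighbour offsets for directions 0..7 (E, NE, N, NW, W, SW, S, SE). */
static const int dr[NDIR] = { 0, -1, -1, -1,  0,  1, 1, 1 };
static const int dc[NDIR] = { 1,  1,  0, -1, -1, -1, 0, 1 };

void chain_code_features(const unsigned char img[ROWS][COLS], double feat[NFEAT])
{
    int count[NFEAT] = { 0 };
    int max = 1;                             /* avoid division by zero */

    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++) {
            if (!img[r][c]) continue;        /* skip background pixels */
            int win = (r / SUB) * (COLS / SUB) + (c / SUB);
            for (int d = 0; d < NDIR; d++) {
                int nr = r + dr[d], nc = c + dc[d];
                if (nr >= 0 && nr < ROWS && nc >= 0 && nc < COLS && img[nr][nc])
                    count[win * NDIR + d]++;
            }
        }
    for (int i = 0; i < NFEAT; i++)          /* find the global maximum count */
        if (count[i] > max) max = count[i];
    for (int i = 0; i < NFEAT; i++)          /* normalise to [0, 1] as described */
        feat[i] = (double)count[i] / max;
}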
3 EXPERIMENTAL METHOD
A total of three character feature extraction/recognition techniques were investigated in this research: the AA, the MLP and the Multi-MLP. The BP algorithm was employed as the common training algorithm. The networks used were feed-forward neural networks, each with a single hidden layer. The number of neurons in the input layer of all these extractors was governed by the size of the sub-windows of the training characters. Training character matrices had 30 rows by 40 columns and each sub-window was 10 rows by 10 columns, so each image had 12 sub-windows. Since each sub-window contributed 8 elements, the number of units in the input layer was 96.
3.1 Auto-associator (AA) feature extractor
An AA, as its name implies, is a network that learns an input-output mapping such that the output is the same as the input; in other words, the target data set is identical to the input data set. Hence, an AA has a d:m:d configuration, with d units in both the input and output layers and m < d units in the hidden layer.
The dimensionality of the input and output is obviously the same, and the network is trained, using the error back propagation algorithm, to generate output vectors o as close as possible to the input vectors x by minimizing the mean squared error over all patterns in the training set:
E = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{d} (x_k^n - o_k^n)^2

where o_k^n represents the kth output for the nth input vector x^n = (x_1^n, ..., x_k^n, ..., x_d^n) and N is the number of training patterns.
The key aspect of the auto-associative MLP is that the number of hidden units at the centre of the network is usually chosen to be much smaller than the input/output dimensionality. As a result of this bottleneck, the hidden units extract a low-dimensional representation of the input data, and such a network can therefore be used for feature extraction.
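To make the bottleneck idea concrete, here is a minimal C sketch of one forward pass through a d:m:d auto-associator with sigmoid units, returning the squared reconstruction error for one pattern. The weight layout, the sigmoid activation and all names are our assumptions; the paper fixes only the topology, BP training, and the learning rate and momentum of 0.1.

#include <math.h>

#define D 96                 /* input/output dimensionality           */
#define M 26                 /* bottleneck units (26 or 96 were used) */

static double sigmoid(double a) { return 1.0 / (1.0 + exp(-a)); }

/* After training, the hidden vector h is taken as the extracted
   feature vector; the target of the output layer is the input itself. */
double aa_forward(const double x[D],
                  const double w1[M][D + 1],   /* input->hidden weights (+bias)  */
                  const double w2[D][M + 1],   /* hidden->output weights (+bias) */
                  double h[M], double o[D])
{
    double err = 0.0;
    for (int j = 0; j < M; j++) {
        double a = w1[j][D];                   /* bias term */
        for (int i = 0; i < D; i++) a += w1[j][i] * x[i];
        h[j] = sigmoid(a);
    }
    for (int k = 0; k < D; k++) {
        double a = w2[k][M];                   /* bias term */
        for (int j = 0; j < M; j++) a += w2[k][j] * h[j];
        o[k] = sigmoid(a);
        err += 0.5 * (x[k] - o[k]) * (x[k] - o[k]);  /* target = input */
    }
    return err;
}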
To be consistent across all the neural networks compared in this research, the training files were set to the same format; they all consisted of two parts. The first part was obtained by chain coding as described previously. For the AA, the second part was exactly the same as the first part.
3.2 MLP
The values of the MLP's learning rate and momentum were both set to 0.1, the same as those of the AA. The numbers of inputs and outputs were set to 96 and 24, respectively, and the number of hidden units of the MLP was set to 26. The first part of each training pair, the input vector of the MLP, was obtained by employing the chain code feature extractor as described previously; the second part was the output vector, which indicates to the network which class the current character belongs to.
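A minimal sketch of the implied 1-of-N output encoding follows; the mapping from letters to output indices is an assumption, since the paper fixes only the 24 output units.

#define NOUT 24

/* Build the MLP's desired output vector: one unit per class. The
   class_index -> letter mapping is hypothetical. */
void mlp_target(int class_index, double target[NOUT])
{
    for (int k = 0; k < NOUT; k++)
        target[k] = (k == class_index) ? 1.0 : 0.0;
}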
3.3 Multi-MLP
For the Multi-MLP feature extractor, the situation was different. The number of possible outputs governed the number of neural networks instead of the number of output units. Each neural network had only 2 units in the output layer; in other words, each neural network was dedicated to recognising one letter. The number of hidden units of these networks was set to 26.
The input vector to the Multi-MLP was the same as for the AA and the MLP, but the desired output of each Multi-MLP network had only two classes.
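In contrast to the MLP's 24-unit target, each of the 16 networks needs only a one-vs-rest target. The sketch below shows one way to build it; the 2-unit "yes/no" encoding detail and the names are our assumptions.

#define NLETTERS 16

/* Target for network `net` (0..15) given the true class of a character:
   unit 0 fires for "this letter", unit 1 for "any other letter". */
void multi_mlp_target(int net, int true_class, double target[2])
{
    target[0] = (true_class == net) ? 1.0 : 0.0;
    target[1] = (true_class == net) ? 0.0 : 1.0;
}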
3.4 Criteria for training termination
When using the back propagation algorithm, the usual criterion for termination is the reduction of the Root Mean Squared (RMS) error to an acceptable level. There is no general level for the RMS error; however, the smaller, the better. It was found in the experiments that the convergence of the RMS error was very slow; therefore, a second criterion for termination was considered, namely stopping training after a certain number of iterations.
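The combined stopping rule can be sketched as below; the RMS threshold is an assumption (the paper fixes no specific level), while the iteration caps of 2000 and 5000 are those used in the experiments.

#define MAX_ITERS 5000
#define RMS_GOAL  0.01   /* assumed; no specific level is given */

extern double train_one_epoch(void);   /* returns the current RMS error */

/* Train until the RMS error is acceptable or the iteration cap is hit. */
void train(void)
{
    double rms = 1.0;
    for (int it = 0; it < MAX_ITERS && rms > RMS_GOAL; it++)
        rms = train_one_epoch();
}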
4 EXPERIMENTAL RESULTS
The proposed approaches were all implemented and run on the SP2 supercomputer and on the NIAS (UNIX) machine at Griffith University. The SP2 is an IBM product consisting of eight RS/6000 390 machines and 14 RS/6000 590 machines connected by a high speed switch; its operating system is UNIX. The programming language used for the implementation was C.
This section shows the results obtained using the three neural network based character feature extraction techniques. In this research, the training data set consists of 16 lowercase letters: a, b, c, d, e, h, i, l, m, n, o, r, s, t, u, x. After training, the three techniques were assessed by their classification rates for both the training data and the test data. Many experiments were conducted, all of them very time consuming; some took more than several days. Only the most relevant results are shown in this and the following sections.
4.1 Preliminary results
Before training these networks with 16 letters, some preliminary experiments were conducted. In these experiments, only the AA and the MLP were trained and, instead of 16 letters, only four letters were chosen: a, c, d, e. All training characters were hand printed. There were 96 training pairs, and both the MLP and the AA were trained with 2000 iterations. The number of test characters was 181. The results are displayed in Table 1 below. Two AAs were trained: one AA had 26 hidden units and the other had 96 hidden units.
As can be seen from Table 1, the classification rates for the training data were 100%. The classification rates for the test data were quite high as well, all of them between 82% and 87%. Of course, the classification rates would be lower if the number of classes increased, but these figures were very promising for further research. Subsequent experiments used more classes and larger data sets; more and more images were segmented from handwritten words, which were very hard to recognise, so that the ANNs could be trained with a more diverse and challenging training set.
TABLE 1. PRELIMINARY RESULTS (#inputs: 96)
(Columns: MLP/AA; number of hidden units; number of outputs; RMS error; classification rate (%) on the training set; classification rate (%) on the test set.)
4.2 Classification rates for the MLP
After the preliminary experiments, the number of letters in the training and test databases was increased from 4 to 16, and the numbers of characters in the training and test databases were increased as well. The MLP is a very popular character recognition technique and has been widely used in many fields, so the major comparison was conducted between the MLP and the AA. The results in Table 2 (rows 1, 2) were obtained by training the MLP with 352 hand printed characters; the test data set presented to the MLP contained 280 hand printed characters. The MLP was trained with 2000 and 5000 iterations, respectively. As can be seen from Table 2, the classification rate for the training set was 99.7% for both MLPs, near 100%. The test sets of the two MLPs also obtained high figures: 78.2%.
TABLE 2. CLASSIFICATION RATES USING THE MLP (#inputs: 96, #outputs: 24, #hidden units: 26)
(Columns: number of training pairs; number of iterations; RMS error; classification rate (%) on the training set; classification rate (%) on the test set.)
The results in Table 2 (rows 3, 4, 5) were obtained by adding more and more cursive characters to the previous training data set. At the beginning there were no cursive characters, only 352 hand printed characters in the training data set. The number of cursive characters was then increased from 0 to 144, 304 and 599; accordingly, the number of training characters increased from 352 to 506, 656 and 951.
TABLE 3. CLASSIFICATION RATES FOR TWO MLPS
(Columns: number of training pairs; number of iterations; RMS error; classification rate (%) on the training set; on the test set; on the test set (top 5).)
Table 3 contains the classification rates for two MLPs. The MLP in the first row was trained on all 16 letters, including l, but tested without l; the second one was trained and tested without l.
4.3 Classification rates for the AA
The comparison of the performance of the AA and the MLP was the major aim of this research. The results in Table 4 (rows 1, 2, 3, 4) were obtained by training the AA with 352 characters; the test data set presented to the AA contained 280 characters. To achieve better results, two AAs with different numbers of hidden units were trained. One AA had 26 hidden units, so the structure of its classifier was 26 inputs, 26 hidden units and 24 output units. The second AA had 96 hidden units; accordingly, the structure of its classifier was 96 input units, 26 hidden units and 24 output units. The two AAs were trained for 2000 and 5000 iterations, respectively.
TABLE 4. CLASSIFICATION RATES FOR THE AA (#inputs: 96, #outputs: 96)
(Columns: number of training characters; number of hidden units; number of iterations; RMS error; classification rate (%) on the training set; classification rate (%) on the test set.)
As can be seen from Table 4, the AA with 96 hidden units outperformed the AA with 26 hidden units; therefore the subsequent experiments were conducted using only the AA with 96 hidden units.
The results listed in Table 4 (last 3 rows) were obtained by adding some cursive characters from data set B, as used with the MLP, to the previous training and test data sets, which contained only hand printed characters. The number of iterations for all experiments was 5000. The AA had 96 hidden units and the classifier had 26 hidden units. The number of training pairs was increased from 352 to 506, 656 and 951, and the test set had 1,056 pairs. As can be seen from the table, the classification rates increased as the number of training pairs increased. Of course, since the number of training iterations for each AA was fixed at 5000, the classification rate for the training set decreased when the number of training pairs increased: as the RMS error increased, the classification rate for the training set decreased, in this case from 95.1% to 94.3%.
Compared with its MLP counterpart, some letters obtained higher classification rates with the AA. For example, the letter l obtained a 17.9% increase in classification rate, whereas the letter a decreased by 11.9%, and the classification rate of the letter x did not change. In total, 8 letters obtained higher classification rates with the AA than with the MLP, while 7 letters obtained lower classification rates: the letters a, b, d, e, h, i and m.
4.4 Classification rates for the Multi-MLP
The Multi-MLP consisted of 16 neural networks, each of which had only two classes. The training files were different for each of these networks. Each network was designed to respond to a particular letter: its first class was trained to respond to the given letter, and the other class was trained to respond to the rest of the letters. Since training was very time consuming, only one experiment was conducted so far; the number of training pairs for each network was 951, the same as for the MLP and the AA. Some of its test results are listed in Table 5.
TABLE 5. CLASSIFICATION RATES OF THE MULTI-MLP (#inputs: 96, #outputs: 2, #hidden units: 26)
(Columns: Multi-MLP network; RMS error; classification rate (%) on the training set; on the test set; on the test set (top 5).)
These results were obtained by training the Multi-MLP with the same training characters as those used to train the MLP and the AA, and the test characters were the same as well, but the number of classes of each network was only 2 rather than 24. The number of iterations for all networks was 5000. As can be seen, the RMS errors of the different networks varied dramatically and their classification rates also differed: the highest was 81.3% and the lowest was 31.9%.
5 DISCUSSION
5.1 Classification rate
As can be seen from the tables in the previous section, among the three feature extraction techniques the AA provided the top results for character recognition: 61.9% for the test set and 94.3% for the training set. The second best was the MLP, followed by the Multi-MLP, whose classification rate was 94.1% for the training set and 57.1% for the test set. From the experiments we can observe that increasing the number of training pairs for the AA or the MLP is a good way to increase the classification rates, but the characters added were all cursive characters. Does that increase the classification rates for the hand printed character section, which contains 280 out of the total of 1,056 characters in the test data set? The classification rates for only the 280 hand printed characters, rather than the whole test set, were therefore calculated. We can observe that when the number of cursive characters in the training data set of the MLP increased from 0 to 599, the classification rate for printed characters increased by only 0.7%; meanwhile, the classification rate for cursive characters increased dramatically, from 30.4% to 53.2%. Similarly, when the number of cursive characters in the training data set of the AA increased from 0 to 599, the classification rate for printed characters increased by only 1.9%. Even so, this is better than the MLP's 0.7% increase. Meanwhile, the classification rate for cursive characters increased dramatically from 26.2% to 55.9%, an increase of 29.7%, while the MLP counterpart was 22.8%.
5.2 General problems with classification rates
It was found that when the number of training pairs was increased to 951 and the number of iterations was set to 5,000, the best classification rate for handwritten characters using the three feature extraction techniques was less than 62%. Increasing the number of iterations did little to improve the classification rates. It was deduced that four main factors influenced the classification rates obtained by these techniques: the small training data set, the difficulty of the training and test data, resizing problems, and the similarity of characters.
5.2.1 Small training data set
As can be seen from the previous experiments, the classification rate increased when the number of training pairs increased. For example, when the number of training pairs increased by 150, from 506 to 656, the classification rate increased by 3.4 percent. When 295 more training pairs were added, for a total of 951, the classification rate increased by a further 5.2 percent. A 60 percent classification rate is quite high considering that a maximum of 951 training pairs was used. Due to time constraints, more training pairs were not employed.
5.2.2 Difficulty of training and testing data
The other factor that influenced the recognition rate was the nature of the handwritten data. As the characters sampled were real world characters, it was found that the actual writing styles of two different people could be extremely diverse, and misclassification could easily occur in both human and automated systems. The diversity of characters was evident not only between people, but also between samples from the same person. This increased the difficulty of the training and test data dramatically. The characters in Figure 2 are some samples of lowercase b:
Figure 2. Examples of lowercase b.
The other important reason was that some training and test samples were segmented from handwritten words; these sometimes became very hard to recognise (Figure 3), even for a human.
Figure 3. Examples of segmented characters: (a) lowercase r, (b) lowercase s, (c) lowercase x.
5.2.3 Resizing problems
Because the sizes of the training and test images differed from each other, all the characters were resized to the same size before being chain coded, in order to obtain more comparable features. However, one of the major disadvantages of resizing is that it can cause some of a character's characteristics to be lost, which may be critical for feature extraction.
5.2.4 Similarity of characters
After analysing the tables describing the target and actual outputs of each class, we found that some classes had very high classification rates, whereas others were as low as 40%. On further analysis, it was found that some letters were easily recognised as other particular letters; for example, the letter l was easily recognised as the letter e or the letter i. It was found that about 20 percent of some letters were recognised as other letters; for instance, 19.6 percent of l's were recognised as e's.
6 CONCLUSIONS
We have investigated three neural network based feature extraction techniques. The Auto-Associator feature extractor proposed by us achieved the highest recognition rates: the highest rate for difficult handwritten characters from the CEDAR benchmark database was approximately 61.9%. This classification rate is quite high considering that only 951 training pairs and 5,000 iterations were used. The classification rates of the MLP were lower than those of the AA; its best classification result for handwritten characters was 60.0% (1.9% less than the AA). The recognition rates and overall performance of the Multi-MLP were the lowest of the three techniques tested: the highest classification rate it provided was 57.1% (4.8% lower than the AA and 2.9% lower than the MLP), and this method took the longest training time.
References
[1] J. Mao and A. K. Jain, "Artificial neural networks for feature extraction and multivariate data projection", IEEE Trans. Neural Networks, Vol. 6, pp. 296-317, 1995.
[2] B. Lerner, "Toward a completely automatic neural network based human chromosome analysis", IEEE Trans. Syst. Man Cybern., Part B, Vol. 28, Special issue on artificial neural networks, pp. 544-552, 1998.
[3] B. Lerner, H. Guterman and M. Aladjem, "A comparative study of neural network based feature extraction paradigms", Pattern Recognition, Vol. 20, 1999.
[4] M. E. Stevens, "Introduction to the special issue on optical character recognition (OCR)", Pattern Recognition, Vol. 2, pp. 147-150, 1970.
[5] J. Rabinow, "Whither OCR and whence", Datamation, pp. 38-42, July 1969.
[6] P. L. Andersson, "Optical character recognition - a survey", Datamation, pp. 43-48, July 1969.
[7] J. Mantas, "An overview of character recognition methodologies", Pattern Recognition, Vol. 19, pp. 425-430, 1986.
[8] R. Davis and J. L. Yall, "Recognition of handwritten characters - a review", Image and Vision Computing, Vol. 4, pp. 208-218.
[9] Y. Cheng and C. H. Leung, "Chain-code transform for Chinese character recognition", Proc. IEEE Int. Conf. on Cybernetics and Society, Tucson, AZ, USA, pp. 42-45, 1985.
[10] H. I. Avi-Itzhak, T. A. Diep and H. Garland, "High accuracy optical character recognition using neural networks with centroid dithering", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 17, pp. 218-224, 1995.
[11] S.-W. Lee, "Off-line recognition of totally unconstrained handwritten numerals using multilayer cluster neural network", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 18, pp. 648-652, 1996.
[12] S.-B. Cho, "Neural-network classifiers for recognizing totally unconstrained handwritten numerals", IEEE Trans. Neural Networks, Vol. 8, pp. 43-53, 1997.
[13] N. W. Strathy, C. Y. Suen and A. Krzyzak, "Segmentation of handwritten digits using contour features", ICDAR '93, pp. 577-580, 1993.
[14] B. A. Yanikoglu and P. A. Sandon, "Off-line cursive handwriting recognition using style parameters", Tech. Report PCS-TR93-192, Dartmouth College, NH, 1993.
[15] J.-H. Chiang, "A hybrid neural model in handwritten word recognition", Neural Networks, Vol. 11, pp. 337-346, 1998.
[16] R. Crane, A Simplified Approach to Image Processing: Classical and Modern Techniques in C, Prentice Hall, 1996.