F. Li et al. (Eds.): ICWL 2008, LNCS 5145, pp. 344–355, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Attributed Relational Graph Matching
Zhihui Hu1,2,3, Howard Leung2,3, and Yun Xu1,2
1 Department of Computer Science and Technology, University of Science & Technology of China, Hefei, China
2 Joint Research Lab of Excellence, CityU-USTC Advanced Research Institute, Suzhou, China
3 Department of Computer Science, City University of Hong Kong, Hong Kong S.A.R.
kittyhu@mail.ustc.edu.cn, howard@cityu.edu.hk, xuyun@ustc.edu.cn
Abstract. Due to the complex shapes and various writing styles of Chinese characters, it is a challenge to automatically detect the errors in people’s handwriting. In this paper, we use an attributed relational graph to represent a Chinese character. To model the spatial relationships between the strokes in a Chinese character, a refined interval relationship that considers more granular levels is proposed. A novel interval neighborhood graph is also proposed to compute the distances among the refined interval relationships. Error-tolerant graph matching is used to locate stroke production errors, stroke sequence errors and spatial relationship errors. We also propose a pruning strategy to speed up the graph matching. Experimental results show that our proposed method outperforms existing approaches in terms of accuracy as well as in its ability to handle more kinds of handwriting errors in less computational time.
Keywords: Chinese handwriting error detection, attributed relational graph, stroke spatial relationship error, error-tolerant graph matching.
1 Introduction
A Chinese character is an ideogram composed of many strokes. Correct handwriting should follow the correct position, proportion and order of each stroke. Law et al. [1] show the following handwriting errors that children often make: 1) stroke production errors, which include missing, extra, broken, and concatenated strokes; 2) stroke sequence errors. Besides, there exist other handwriting errors such as spatial relationship errors resulting from problems in the relative length or position between strokes. When a student makes a handwriting mistake, he/she often does not even realize it. It is thus essential for the student to receive feedback about his/her handwriting in order to correct any mistakes.
Traditionally, the teacher can help students find their handwriting errors in class; however, the teacher’s available time for each student is limited. As a result, we are motivated to build a Chinese handwriting education system that can assist the teacher, in particular when the teacher is absent. In this system, a student first writes a Chinese character by following a template character from the teacher; the system then automatically checks the handwriting and gives feedback to indicate whether and where there are any errors.
The existing handwriting education systems can be divided into two categories. The first is the view-only system: the student can see how a Chinese character should be written but cannot practice handwriting through the system [2, 3]. The other category allows the student to practice handwriting and gives some feedback to indicate whether there are errors in the handwriting. These systems can be further divided into four main streams. The first focuses on locating the production errors [4, 5, 6]. The second can only evaluate the stroke sequence errors [7]. The third can detect the spatial relationship errors among strokes [8]. The last is a combination of the previous types: the system in [9] can find both the stroke production and sequence errors, but without considering the spatial relationship errors. As a result, we are motivated to explore a method that can identify the stroke sequence, production and spatial relationship errors at the same time.
In this paper, we propose a method that can identify not only the stroke production errors and sequence errors but also the spatial relationship errors between strokes, given an input online Chinese handwriting. This is achieved by using attributed relational graph (ARG) matching. The attributed relational graph is a powerful tool to represent the relational structure of a pattern. It has been used in 2D recognition [10, 11] as well as Chinese handwriting education [6]. In our application, the Chinese character is represented by a complete ARG. The nodes in the ARG describe the strokes of the character and the edges denote the relations between any two strokes. As the relations between the strokes of Chinese characters are rather complex, we propose to extend the existing interval relationship to refine its granularity. The optimal detailed matching between the two ARGs is the mapping between corresponding strokes. In order to find this detailed matching, error-tolerant graph matching [13, 14] is used with the graph edit operations: deletion, insertion, substitution, merging and splitting of the nodes and the edges. The A* algorithm is applied to perform the state-space search of such a graph matching. The resulting operations reflect the graph distortions. In addition, the operations on the edges show the spatial relationships between strokes. However, the computational complexity of graph matching should not be ignored, so we propose a pruning strategy to reduce the matching time.
The main contributions of this paper are as follows: 1) we propose an algorithm that can analyze an input online Chinese handwriting and determine stroke production errors, stroke sequence errors and stroke spatial relationship errors at the same time; 2) we define a refined interval relationship to model the spatial relationship between strokes and extend the interval neighborhood graph to obtain the distance measures for the refined interval relationships; 3) we propose a pruning strategy to reduce the state-space search time when applying error-tolerant graph matching.
The remainder of this paper is organized as follows. In Section 2, the proposed ARG matching method incorporating the spatial relationships is described. Experiments and results are discussed in Section 3. Conclusions and future work are provided in Section 4.
2 Our Proposed Method
2.1 Overview
The flowchart of our method is illustrated in Figure 1. First, the sample handwriting input by the student and the template character that the student should follow are both represented as ARGs. Then error-tolerant graph matching is applied to the two ARGs in order to find the stroke production and sequence errors in the sample handwriting. Afterwards, post processing detects the stroke relationship errors. Finally, feedback that locates all the errors is provided to the student.
[Flowchart: sample handwriting and template handwriting → representation → character matching → post processing → feedback]
Fig. 1. Flowchart of our method
2.2 Spatial Relationship in Chinese Character
A Chinese character consists of many strokes that form a particular structure unique to that character. The spatial relationship between strokes is one important factor in determining whether a student’s Chinese handwriting is written correctly. In object recognition, the spatial relationship between objects has been studied: Allen first introduced 13 interval relationships in [15], and the spatial relationships between objects have been described in [16, 17]. Nevertheless, these interval relationships are not sufficient to fully describe the spatial relationship between strokes. This can be illustrated by the example in Figure 2. The strokes in Figure 2(a), (b) and (c) all have the same ‘during’ (d) relation as defined in Allen’s interval relationships, meaning that the duration of stroke a is within the duration of stroke b. However, only Figure 2(b) shows the standard handwriting of this character. The handwritings in Figure 2(a) and (c) are non-standard because stroke a in Figure 2(a) is too long whereas the one in Figure 2(c) is too short.
(a) Non-standard handwriting (b) Standard handwriting (c) Non-standard handwriting
Fig. 2. Example of spatial relationships in a Chinese character
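The limitation described above can be reproduced with a minimal sketch of Allen's 13 basic interval relationships. The function name `allen_relation`, the tuple encoding of intervals, and the numeric extents below are illustrative assumptions and are not taken from the paper; the symbols follow Allen's usual notation.

```python
def allen_relation(a, b):
    """Return Allen's relation of interval a = (a1, a2) with respect to b = (b1, b2).

    Assumes proper intervals (a1 < a2 and b1 < b2).
    """
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:
        return "<"    # a before b
    if a2 == b1:
        return "m"    # a meets b
    if b2 < a1:
        return ">"    # a after b
    if b2 == a1:
        return "mi"   # a met by b
    # From here on the two intervals share interior points.
    if a1 == b1 and a2 == b2:
        return "="
    if a1 == b1:
        return "s" if a2 < b2 else "si"   # starts / started by
    if a2 == b2:
        return "f" if a1 > b1 else "fi"   # finishes / finished by
    if b1 < a1 and a2 < b2:
        return "d"    # a during b
    if a1 < b1 and b2 < a2:
        return "di"   # a contains b
    return "o" if a1 < b1 else "oi"       # overlaps / overlapped by


# Hypothetical extents of stroke b and three versions of stroke a
# (too long, standard, too short), loosely mirroring Figure 2.
stroke_b = (0.0, 10.0)
for name, stroke_a in [("long", (1.0, 9.0)),
                       ("standard", (3.0, 7.0)),
                       ("short", (4.5, 5.5))]:
    print(name, allen_relation(stroke_a, stroke_b))
```

All three cases yield the same symbol `d`, which is why the basic relationships alone cannot separate standard from non-standard handwriting.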
As illustrated in Figure 2, the relationship between the strokes involves not only the topological relationship but also the relative distance between the strokes. A more granular definition of the interval relationship is able to distinguish among the three cases in Figure 2. In particular, we propose to further refine the interval relationship into three levels (f, m, l) by considering the distance information. The refined interval relationships of the strokes in Figure 2(a), (b) and (c) become ‘dl’, ‘dm’ and ‘df’ respectively. The refinement into three additional levels based on distance can also be applied to the other existing interval relationships. The resulting refined relationships are summarized in Figure 3.
[Table with columns: Relation | Symbol | Symbol for inverse | Example]
Fig. 3. Refined interval relationships with more granular levels
2.3 Complete ARG Representation of Chinese Character
The ARG was first described in [10] to represent the structural information of a pattern as g = (V, E, α, β). In our application, the set of nodes V describes the strokes of the Chinese character, and the set of edges E describes the relationships between any two strokes as defined in Figure 3. The ARG representation is given as follows.
Nodes in the ARG. Each node stores the x and y coordinates of a stroke. The node labeling function α: V → L_V returns n data points for each stroke [6].
Edges in the ARG. Each edge stores the relation between the two nodes (strokes) that it connects. The edge labeling function β: E → L_E returns (μ, λ), where μ and λ are the refined interval relationships along the x-axis and y-axis respectively.
As an example, a Chinese character and its stroke spatial relationships are shown in Figure 4(a). The ARG representation of this character is shown in Figure 4(b). The strokes a, b and c in the character are represented by the nodes a, b and c in the ARG. The term r_s1s2 is the relationship between strokes s1 and s2, where s1, s2 ∈ {a, b, c} and s1 ≠ s2.
r_ac: (df, mi)
r_ca: (dif, m)
r_ab: (df, dif)
r_ba: (dif, df)
r_bc: (dm, >m)
r_cb: (dim, <m)
(a) A Chinese character (b) Corresponding ARG
Fig. 4. ARG representation of a Chinese character
In this example, r_ac is denoted by (df, mi), r_ab by (df, dif), and r_bc by (dm, >m). Note that r_ca is formed simply by taking the inverse of each component of the relationship used to represent r_ac, and is therefore denoted by (dif, m).
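The complete ARG of this example can be sketched as a small data structure. The class name `ARG`, its fields, and the `INVERSE` table are hypothetical and serve only as an illustration; the table covers just the symbols that appear in Figure 4, and the stroke data points are left empty because they are not given in the text.

```python
from dataclasses import dataclass, field

# Inverse of each refined symbol, restricted to the symbols used in Figure 4
# (an assumption for illustration; the full list is summarized in Figure 3).
INVERSE = {
    "df": "dif", "dif": "df",
    "dm": "dim", "dim": "dm",
    "m": "mi", "mi": "m",
    ">m": "<m", "<m": ">m",
}


@dataclass
class ARG:
    # stroke name -> list of (x, y) data points (node label)
    nodes: dict = field(default_factory=dict)
    # (s1, s2) -> (mu, lam): relationship along the x-axis and y-axis (edge label)
    edges: dict = field(default_factory=dict)

    def add_stroke(self, name, points=()):
        self.nodes[name] = list(points)

    def add_relation(self, s1, s2, mu, lam):
        """Store r_s1s2 and derive r_s2s1 by inverting each component."""
        self.edges[(s1, s2)] = (mu, lam)
        self.edges[(s2, s1)] = (INVERSE[mu], INVERSE[lam])


g = ARG()
for stroke in ("a", "b", "c"):
    g.add_stroke(stroke)
g.add_relation("a", "c", "df", "mi")
g.add_relation("a", "b", "df", "dif")
g.add_relation("b", "c", "dm", ">m")

print(g.edges[("c", "a")])  # inverse of r_ac
```

Storing only one direction per stroke pair and deriving the other keeps the two edge labels consistent by construction.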
2.4 Error-Tolerant Graph Matching
As illustrated in Figure 1, the input (sample) handwriting is represented as an ARG g1 = (V1, E1, α1, β1) and the template handwriting is represented as another ARG g2 = (V2, E2, α2, β2). In order to decide whether the two ARGs differ, we find an error-tolerant graph matching from g1 to g2, which is a transformation denoted by the function f [13, 14]. This function f consists of edit operations performed on both nodes and edges. The node operations have been defined by the authors in [6]: node substitution, merging, splitting, deletion and insertion. We extend the work in [6] by adding the edge operations defined as follows: 1) edge substitution, implying that both nodes sharing this edge are correct; 2) edge deletion, implying that one or both of the nodes sharing this edge is an extra or broken stroke; 3) edge insertion, implying that one or both of the nodes sharing this edge is a missing or concatenated stroke.
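The link between node operations and the implied edge operations can be sketched as follows. This is a simplified illustration under stated assumptions: only node substitution, deletion and insertion are considered (merging and splitting are left out), edges are treated as ordered stroke pairs of a complete graph, and the function name `edge_operations` and the stroke names are hypothetical.

```python
from itertools import permutations


def edge_operations(sample_nodes, template_nodes, node_map):
    """Derive edge operations from a node mapping.

    node_map maps each substituted sample stroke to its template stroke.
    Sample strokes absent from node_map are treated as deleted (extra strokes);
    template strokes that are not mapped to are treated as inserted (missing strokes).
    """
    ops = []
    matched_templates = set(node_map.values())
    for s1, s2 in permutations(sample_nodes, 2):
        if s1 in node_map and s2 in node_map:
            # Both end nodes are matched, so the edge is substituted.
            ops.append(("substitute", (s1, s2), (node_map[s1], node_map[s2])))
        else:
            # At least one end node is an extra stroke, so the edge is deleted.
            ops.append(("delete", (s1, s2), None))
    for t1, t2 in permutations(template_nodes, 2):
        if t1 not in matched_templates or t2 not in matched_templates:
            # At least one end node is a missing stroke, so the edge is inserted.
            ops.append(("insert", None, (t1, t2)))
    return ops


# Hypothetical case: the sample has an extra stroke x and lacks template stroke c.
ops = edge_operations(["s1", "s2", "x"], ["a", "b", "c"], {"s1": "a", "s2": "b"})
for op in ops:
    print(op)
```

In this case the two edges between s1 and s2 are substituted, the four edges touching x are deleted, and the four template edges touching c are inserted.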
Edge substitution. The cost of an edge substitution is the matching cost between an edge in the sample character and an edge in the template. We use Rt to denote the set of edges in the template and Rs to denote the set of edges in the sample. Note that an edge represents the spatial relationship between two strokes in a handwriting. The i-th template edge Rt_i can be denoted by (μt_i, λt_i) and the j-th sample edge Rs_j by (μs_j, λs_j). The dissimilarity between (μt_i, λt_i) and (μs_j, λs_j) is defined as D(Rt_i, Rs_j), which is derived from the idea of the interval neighborhood graph [16]. Two interval relationships are neighbors if they can be transformed into one another by continuous deformation (shortening, lengthening, and moving) [17]. We construct a new interval neighborhood graph in Figure 5, which considers our proposed refined relationships with three levels (f, m, l) in each relationship defined in Figure 3. Note that the three levels of the same interval relationship are close to each other in the refined interval neighborhood graph, since they can be transformed from one to another by shortening or lengthening the distance between the two strokes.
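The idea of measuring dissimilarity on a neighborhood graph can be sketched with the 13 basic interval relationships. This is not the refined graph of Figure 5, which adds the three levels to each relationship. The edge list below encodes the neighborhood obtained by moving one interval endpoint at a time, and two choices are assumptions made for illustration only: taking the distance as the shortest path length, and summing the x-axis and y-axis distances.

```python
from collections import deque

# Neighborhood edges among Allen's 13 basic relations (one endpoint moved at a time).
NEIGHBOR_EDGES = [
    ("<", "m"), ("m", "o"), ("o", "fi"), ("o", "s"),
    ("fi", "di"), ("fi", "="), ("s", "="), ("s", "d"),
    ("di", "si"), ("=", "si"), ("=", "f"), ("d", "f"),
    ("si", "oi"), ("f", "oi"), ("oi", "mi"), ("mi", ">"),
]

ADJ = {}
for u, v in NEIGHBOR_EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)


def neighborhood_distance(r1, r2):
    """Shortest path length between two relations (breadth-first search)."""
    if r1 == r2:
        return 0
    seen = {r1}
    queue = deque([(r1, 0)])
    while queue:
        rel, dist = queue.popleft()
        for nxt in ADJ[rel]:
            if nxt == r2:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {r1!r} and {r2!r}")


def edge_dissimilarity(template_edge, sample_edge):
    """Assumed form of D: sum of the graph distances along the x-axis and y-axis."""
    (mu_t, lam_t), (mu_s, lam_s) = template_edge, sample_edge
    return neighborhood_distance(mu_t, mu_s) + neighborhood_distance(lam_t, lam_s)


print(edge_dissimilarity(("d", "<"), ("s", "m")))
```

With a refined graph, the three levels of one relationship would be adjacent nodes, so a stroke that is merely too long or too short would incur a small distance while a change of topological relationship would incur a larger one.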