Lecture Notes in Computer Science (P73)



Fig. 5. The refined interval neighborhood graph

The distance between two interval relationships $\mu^t_i$ and $\mu^s_j$, or between $\lambda^t_i$ and $\lambda^s_j$, is defined as the topological distance between the two relationships. This is the length of the shortest path from $\mu^t_i$ to $\mu^s_j$ (or from $\lambda^t_i$ to $\lambda^s_j$) in the interval neighborhood graph. The final spatial relationship distance $D(R^t_i, R^s_j)$ is:

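$$D(R^t_i, R^s_j) = d(\mu^t_i, \mu^s_j) + d(\lambda^t_i, \lambda^s_j)$$

where $d(\cdot,\cdot)$ denotes the topological distance defined above.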
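The topological distance is a shortest-path length in an unweighted graph, so breadth-first search can compute it. The sketch below is only an illustration. Figure 5 is not reproduced here, so the graph uses hypothetical placeholder labels A to E instead of the real interval relations. It also assumes that each relation is an (x-axis, y-axis) pair and that $D$ adds the two per-axis distances. The names `build_graph`, `topo_distance` and `relation_distance` are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets built from a list of (u, v) edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def topo_distance(graph, a, b):
    """Length of the shortest path from relation a to relation b (BFS)."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, d = queue.popleft()
        for nb in graph.get(node, ()):
            if nb == b:
                return d + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, d + 1))
    raise ValueError(f"{b!r} is not reachable from {a!r}")


def relation_distance(graph, rel_t, rel_s):
    """Assumed D: sum of the x-axis and y-axis topological distances."""
    return (topo_distance(graph, rel_t[0], rel_s[0])
            + topo_distance(graph, rel_t[1], rel_s[1]))


# Hypothetical chain A-B-C-D-E standing in for the Figure 5 graph.
chain = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
print(relation_distance(chain, ("A", "C"), ("E", "C")))  # prints 4
```

BFS fits here because every edge in the neighborhood graph counts as one step. A weighted variant would need Dijkstra's algorithm instead.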

Edge deletion. It is used together with the node deletion operation. If some extra nodes need to be removed by node deletion, then the corresponding edges should also be removed by edge deletion.

Edge insertion. It is similar to edge deletion, but it is used together with the node insertion operation. If some missing nodes need to be added by node insertion, then the corresponding edges should also be added by edge insertion.
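Each node deletion or insertion brings edge deletions or insertions with it. The two stay consistent most easily when a single helper performs both. The following is a minimal sketch. It assumes an ARG stored as a dict of node attributes plus a dict of directed edges keyed by (u, v) node pairs. `delete_node` and `insert_node` are hypothetical names, not the authors' code.

```python
def delete_node(nodes, edges, n):
    """Node deletion plus the edge deletions it implies.

    Returns the removed edges so that an edit cost can be charged for them.
    """
    del nodes[n]
    removed = [e for e in edges if n in e]  # every edge incident to n
    for e in removed:
        del edges[e]
    return removed


def insert_node(nodes, edges, n, attrs, new_edges):
    """Node insertion plus the edge insertions that connect the new node.

    new_edges maps (u, v) pairs to spatial relations; every pair must contain
    n and an endpoint that already exists in the graph.
    """
    for u, v in new_edges:  # validate first so a bad edge leaves the ARG unchanged
        if n not in (u, v):
            raise ValueError(f"edge {(u, v)} does not touch node {n}")
        other = v if u == n else u
        if other not in nodes:
            raise ValueError(f"endpoint {other} is not in the graph")
    nodes[n] = attrs
    edges.update(new_edges)
    return list(new_edges)


nodes = {1: "h", 2: "v", 3: "h"}
edges = {(1, 2): "x", (2, 3): "y", (1, 3): "z"}
print(delete_node(nodes, edges, 2))  # [(1, 2), (2, 3)]
print(edges)                         # {(1, 3): 'z'}
```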

The graph edit distance, which defines the overall cost of transforming $ARG_{g_1}$ into $ARG_{g_2}$ using the function $f$, is given as follows:

$$C(f, g_1, g_2) = C_{node}(f, g_1, g_2) + C_{edge}(f, g_1, g_2)$$

where $C_{node}$ is the node edit distance and $C_{edge}$ is the edge edit distance.

The node edit distance $C_{node}$ is built from $C^n_{sub}$, $C^n_{mer}$, $C^n_{spl}$, $C^n_{del}$ and $C^n_{ins}$, which are the costs of node substitution, merging, splitting, deletion and insertion, respectively.

The edge edit distance $C_{edge}$ is built from $C^e_{sub}$, $C^e_{del}$ and $C^e_{ins}$, which are the edge substitution, deletion and insertion costs, respectively. Many possible sequences of edit operations can transform $g_1$ into $g_2$, and an exhaustive search for the optimal matching that minimizes the overall graph edit distance is impractical. As a result, the A* algorithm, a state-space search strategy, is employed to identify the optimal ARG matching [6].
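The following minimal A* sketch makes the search concrete. Its simplifying assumptions are ours, not the paper's:

- Only node substitution, deletion and insertion are modelled. There is no merging or splitting.
- Each node carries one hypothetical numeric attribute (for example, a stroke angle). The absolute difference of this attribute is the substitution cost.
- An edge substitution costs `c_edge` when the relation labels of two mapped pairs differ.
- Edge deletions and insertions are folded into the node deletion and insertion costs.

Template nodes are assigned one at a time. The heuristic adds the cheapest remaining option for each template node to the unavoidable insertions, so it never overestimates. As a result, the first complete mapping popped from the queue is optimal under this cost model.

```python
import heapq
import itertools


def astar_match(t_attr, s_attr, t_rel, s_rel, c_del=1.0, c_ins=1.0, c_edge=1.0):
    """Minimum-cost mapping of template nodes to sample nodes (sketch).

    t_attr, s_attr: one numeric attribute per node (hypothetical, e.g. angle).
    t_rel, s_rel:   {(u, v): relation label} for ordered node pairs.
    Returns (cost, mapping); mapping[i] is the sample node matched to template
    node i, or None if template node i is deleted. Unmatched sample nodes are
    treated as insertions.
    """
    n_t, n_s = len(t_attr), len(s_attr)
    if n_t == 0:
        return n_s * c_ins, []

    def sub_cost(i, k):
        return abs(t_attr[i] - s_attr[k])

    def edge_cost(mapping, i, k):
        # Edge substitution: relation (j, i) in the template vs (m, k) in the sample.
        cost = 0.0
        for j, m in enumerate(mapping):
            if m is not None and t_rel.get((j, i)) != s_rel.get((m, k)):
                cost += c_edge
        return cost

    def heuristic(i, used):
        # Lower bound: cheapest option per remaining template node + forced insertions.
        free = [k for k in range(n_s) if k not in used]
        h = sum(min([c_del] + [sub_cost(r, k) for k in free]) for r in range(i, n_t))
        return h + max(0, len(free) - (n_t - i)) * c_ins

    tie = itertools.count()  # unique tie-breaker so mappings are never compared
    heap = [(heuristic(0, set()), 0.0, next(tie), ())]
    while heap:
        _, g, _, mapping = heapq.heappop(heap)
        i = len(mapping)
        if i == n_t:
            return g, list(mapping)  # complete states carry their exact total cost
        used = {m for m in mapping if m is not None}
        for k in [None] + [k for k in range(n_s) if k not in used]:
            if k is None:
                step, child_used = c_del, used
            else:
                step = sub_cost(i, k) + edge_cost(mapping, i, k)
                child_used = used | {k}
            g_child = g + step
            if i + 1 == n_t:
                g_child += (n_s - len(child_used)) * c_ins  # leftover sample nodes
                h = 0.0
            else:
                h = heuristic(i + 1, child_used)
            heapq.heappush(heap, (g_child + h, g_child, next(tie), mapping + (k,)))
    raise RuntimeError("unreachable: a complete mapping always exists")


# Two template strokes vs three sample strokes; sample stroke 2 is extra.
print(astar_match([0.0, 90.0], [90.0, 0.0, 45.0], {(0, 1): "x"}, {(1, 0): "x"}))
# (1.0, [1, 0])
```

Supporting merging and splitting would add successor states that map one template node to several sample nodes, or several template nodes to one sample node. That extra branching is exactly what the pruning rule in Section 2.5 targets.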


2.5 Pruning Strategy

In the handwriting education system, students expect immediate feedback, so the computational time of the graph matching must be kept low. The merging and splitting edit operations enlarge the search space of our problem, because every possibility for nodes to be merged or split has to be considered. In practice, however, not all nodes need to be considered for merging. For example, in Figure 6(a) strokes 2 and 3 are both horizontal strokes, in Figure 6(b) stroke 2 is enclosed in stroke 1, and in Figure 6(c) strokes 1 and 4 are far from each other. Strokes with such spatial relationships can never be pieces of a broken stroke, so there is no need to consider merging them. We are therefore motivated to examine the properties of the spatial relationships to determine under which conditions the merging operation can be skipped. This yields a reduced set of candidate nodes, which in turn reduces the search space for the graph matching.

Fig. 6. Examples of Chinese character handwriting, panels (a), (b) and (c)

By analyzing these relationships, we discovered the following rule. If the relationships between two nodes along both the x-axis and the y-axis can be found in the set {<f, <m, om, ol, fif, fim, dim, dil, sif, sim, sf, sm, dm, dl, ff, fm, oif, oim, >m, >l}, then there is no need to merge those two nodes. For example, in Figure 6(a) the relationship between strokes 2 and 3 is (dil, >m). Since both dil and >m are in the set above, there is no need to merge strokes 2 and 3. Likewise, in Figure 6(b) the relationship between strokes 1 and 2 is (dm, ol), and in Figure 6(c) the relationship between strokes 1 and 4 is (ol, >l). Each time we consider the merge/split operation for two strokes, we check whether their relationship is in the set above. If it is, there is no need to perform the merge operation. If it is not, the merging operation is performed on the two nodes.
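The pruning rule reduces to a set-membership test before a merge is proposed. Below is a minimal sketch. It assumes that each pair of strokes carries its relation as an (x-axis, y-axis) tuple of labels. The label set is copied from the rule above. The functions `is_merge_candidate` and `candidate_pairs`, and the pair (5, 6) in the usage line, are hypothetical.

```python
# Relation labels for which two strokes are never merged (rule of Section 2.5).
NO_MERGE = frozenset({
    "<f", "<m", "om", "ol", "fif", "fim", "dim", "dil", "sif", "sim",
    "sf", "sm", "dm", "dl", "ff", "fm", "oif", "oim", ">m", ">l",
})


def is_merge_candidate(rel_x, rel_y):
    """False when both axis relations are in NO_MERGE, so merging is skipped."""
    return not (rel_x in NO_MERGE and rel_y in NO_MERGE)


def candidate_pairs(relations):
    """Keep only the stroke pairs that still need to be considered for merging.

    relations: {(stroke_a, stroke_b): (rel_x, rel_y)}
    """
    return [pair for pair, (rx, ry) in relations.items()
            if is_merge_candidate(rx, ry)]


# Relations quoted for Figure 6 (from different characters) plus one hypothetical pair.
print(candidate_pairs({(2, 3): ("dil", ">m"), (1, 4): ("ol", ">l"), (5, 6): ("om", "<l")}))
# [(5, 6)]
```

A `frozenset` keeps each lookup at constant expected time, so the check costs almost nothing compared with the A* expansion it avoids.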

2.6 Post-processing for Detecting Spatial Relationship Errors

After the ARG matching, we obtain a mapping from the sample ARG to the template ARG. This mapping gives the stroke correspondence between the sample character and the template character. For the example in Figure 7, after applying the resulting edit operations, the stroke correspondence between the sample and the template is 1-a; 2,3-b; 4-c (stroke 5 is an extra stroke in the sample). We can then obtain a new graph to represent the sample, as shown in Figure 7. The edges in this graph describe the spatial relationships between the nodes (strokes). In Figure 7, the spatial relationships $r_{bc}$ and $r_{14}$ differ: the spatial relationship along the x-axis is dm for $r_{bc}$ but dim for $r_{14}$. From Figure 5, the shortest path from the dm node to the dim node takes 4 steps, so the spatial relationship distance between $r_{bc}$ and $r_{14}$ is 4. Since this is quite a large distance, we conclude that the spatial relationship between strokes 1 and 4 is incorrect compared with the corresponding template strokes b and c.
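The post-processing check can be sketched as follows. The sketch makes three assumptions:

- Relations are stored as (x-axis, y-axis) label pairs.
- The distance of a pair is the sum of its two per-axis topological distances.
- "Quite a large distance" is expressed as a numeric threshold.

The threshold value, the distance table and all function names are hypothetical. Only the dm-to-dim distance of 4 comes from the example above.

```python
def make_table_distance(table):
    """Per-axis distance from precomputed shortest-path lengths keyed by
    unordered label pairs; identical labels are at distance 0."""
    def axis_dist(a, b):
        return 0 if a == b else table[frozenset((a, b))]
    return axis_dist


def flag_relation_errors(template_rel, sample_rel, correspondence, axis_dist, threshold):
    """Compare each template edge with the matching edge of the new sample graph.

    template_rel:   {(p, q): (rel_x, rel_y)} for template strokes
    sample_rel:     {(u, v): (rel_x, rel_y)} for nodes of the new sample graph
    correspondence: {template stroke: sample node} from the ARG matching
    Returns (template pair, sample pair, distance) for distances above threshold.
    """
    errors = []
    for (p, q), (tx, ty) in template_rel.items():
        if p not in correspondence or q not in correspondence:
            continue  # unmatched stroke: left to the production-error report here
        pair = (correspondence[p], correspondence[q])
        if pair not in sample_rel:
            continue
        sx, sy = sample_rel[pair]
        d = axis_dist(tx, sx) + axis_dist(ty, sy)
        if d > threshold:
            errors.append(((p, q), pair, d))
    return errors


# Hypothetical data; only the dm/dim distance of 4 is taken from the text.
dist = make_table_distance({frozenset(("dm", "dim")): 4, frozenset(("<m", "om")): 1})
print(flag_relation_errors(
    {("a", "b"): ("dm", "<m"), ("a", "c"): ("<m", "<m")},
    {(1, 2): ("dim", "<m"), (1, 3): ("om", "<m")},
    {"a": 1, "b": 2, "c": 3},
    dist,
    threshold=3,
))
# [(('a', 'b'), (1, 2), 4)]
```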

Fig. 7. Matching of spatial relationships (panels: template, template graph, new sample graph and sample)

2.7 Feedback for Revealing Handwriting Errors

After the character matching and post-processing, we can locate the student's handwriting errors, including stroke sequence, stroke production and spatial relationship errors. First, the character matching reveals the stroke correspondence, which is used to detect stroke sequence and production errors. If the matched strokes in the sample are not written in the same sequence as in the template, there is a stroke sequence error, and our system can tell the student which strokes were written in the wrong order. The graph edit operations in Section 2.4 reveal any stroke production errors, and our system can tell the student which strokes have such problems.

In addition, the result of the post-processing described in Section 2.6 identifies and locates spatial relationship errors. Our system is thus able to handle multiple handwriting errors at the same time, namely stroke sequence, production and spatial relationship errors.
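Detecting a stroke sequence error from the stroke correspondence amounts to finding template strokes whose matched sample strokes appear in the opposite order. Below is a minimal sketch. It assumes that the correspondence is a list in which entry i holds the sample strokes matched to template stroke i (several for a broken stroke, none for a missing one). It also assumes that a group's writing position is its earliest sample stroke. `sequence_errors` is a hypothetical name.

```python
def sequence_errors(correspondence):
    """Return template stroke pairs (i, j), i < j, whose matched sample strokes
    were written in the opposite order.

    correspondence[i] lists the sample strokes matched to template stroke i
    (several for a broken stroke, empty for a missing stroke).
    """
    # Writing position of each matched template stroke: its earliest sample stroke.
    first = [(i, min(strokes)) for i, strokes in enumerate(correspondence) if strokes]
    errors = []
    for a in range(len(first)):
        for b in range(a + 1, len(first)):
            (i, pos_i), (j, pos_j) = first[a], first[b]
            if pos_i > pos_j:
                errors.append((i, j))
    return errors


# Correspondence 1-a; 2,3-b; 4-c from the Figure 7 example: no sequence error.
print(sequence_errors([[1], [2, 3], [4]]))  # []
print(sequence_errors([[2], [1], [3]]))     # [(0, 1)]
```

Reporting every inverted pair, rather than only adjacent ones, lets the feedback name each stroke that was written too early or too late.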

3 Experimental Results

3.1 Dataset

In our experiment, the 44 Chinese characters shown in Figure 8(a) were written by different people, giving 1247 Chinese handwriting samples. The number of strokes in these characters ranges from 2 to 13. The purpose of our system is to help children and foreigners learn to write Chinese characters. In the first stage, we will ask them to start with some simple characters consisting of few strokes. We are in the process of collecting more complex characters to expand our dataset.


Fig. 8. The dataset: (a) characters used in the dataset; (b) correct variations of the characters; (c) wrong variations of the characters

Different people may write the same character in different ways, which produces many variations of each character. In Figure 8(b), the handwritings show some variations due to the user's writing style, and these variations are acceptable. However, in Figure 8(c), the variations are so large that they are considered handwriting errors. Some handwritings even turn into a completely different character despite their similar appearance. Our algorithm aims to identify these kinds of variations and notify the student about the problems.

Some of the handwritings contain stroke production errors and spatial relationship errors. We manually checked these error types to obtain the ground truth. Our algorithm is then applied to identify handwriting errors, and its accuracy is checked against the ground truth.

3.2 Results and Discussions

We compare our proposed method with some existing methods on detecting different kinds of handwriting errors.

Stroke production error. We compared our proposed method with four existing methods: Tsay and Tsai [4], Tonouchi and Kawamura [5], Tang and Leung [9] and Hu and Leung [6]. The methods in [4] and [5] are based on string matching, which depends on stroke sequence. If the student's input character has a different stroke sequence from the template, these methods may fail to find the matching. Tang and Leung [9] proposed a system that allows students to practice handwriting freely and checks both stroke sequence errors and stroke production errors simultaneously. However, it relies heavily on threshold values to determine potential production errors. Hu and Leung [6] applied graph matching to find stroke production errors, but without considering sequence errors or spatial relationship errors. In contrast, our proposed method identifies all three kinds of handwriting errors with less computational time. The performance of finding stroke production errors is shown in Figure 9(a). Figure 10(a) shows some examples of production errors that our method identifies correctly.

Stroke relationship error. In Figure 8(b), the first character in each row is the template character, and the others are variations of users' handwritings. The last column in Figure 8(b) shows some handwritings that resemble the template character but in fact represent completely different characters. These characters have almost the same shape as the corresponding template character, with only one or two strokes that differ in relative length or position, which shows spatial relationship errors. We compared our method with the existing method in [8]. In the training step, this method finds the features of a given Chinese character by having the character written many times. After training, it obtains an invariant feature of the given character, and the difference is obtained by comparing the input character with this character. The disadvantage of this approach is that a new character added by the teacher must be trained many times at the back-end before it can be used at the front-end. In our proposed method, when the teacher wants to add a new character, he/she only needs to write the character once, and it is then stored in the database. Figure 10(b) shows spatial relationship errors that only our method can handle. Figure 10(c) shows some complex cases in which the handwriting contains both production errors and spatial relationship errors.

Fig. 9. Performance comparison: (a) performance of finding stroke production errors; (b) comparison of computational time

Fig. 10. Handwriting errors that can be identified by our proposed approach: (a) stroke production errors; (b) spatial relationship errors; (c) complex errors. The examples are shown as template/sample pairs and include missing stroke, extra stroke, broken stroke, concatenated stroke, extra + broken stroke, and missing + concatenated stroke errors.

Stroke sequence error. After the ARG matching, we can determine the stroke correspondence between the template and the student's input character. The standard stroke sequence
