
Genetic Evolution Processing of Data Structures for Image Classification

Siu-Yeung Cho, Member, IEEE, and Zheru Chi, Member, IEEE

Abstract—This paper describes a method of structural pattern recognition based on a genetic evolution processing of data structures with neural network representation. Conventionally, one of the most popular learning formulations of data structure processing is Backpropagation Through Structures (BPTS) [7]. The BPTS algorithm has been successfully applied to a number of learning tasks that involve structural patterns such as image, shape, and texture classifications. However, this BPTS-type algorithm suffers from the long-term dependency problem in learning very deep tree structures. In this paper, we propose genetic evolution for this data structure processing. The idea of this algorithm is to tune the learning parameters by genetic evolution with specified chromosome structures. The fitness evaluation, as well as the adaptive crossover and mutation for this structural genetic processing, are also investigated in this paper. An application to flower image classification by a structural representation is provided for the validation of our method. The obtained results significantly support the capabilities of our proposed approach to classify and recognize flowers in terms of generalization and noise robustness.

Index Terms—Adaptive processing of data structures, genetic algorithm, image classification, and neural networks.


1 INTRODUCTION

In structural pattern recognition and classification, it is more appropriate to model objects by data structures. The topological behavior in the structural representation provides significant information to describe the nature of objects. Unfortunately, most connectionist models assume that data are organized in relatively poor structures, such as arrays or sequences, rather than in a hierarchical manner. In recent years, machine learning models conceived for dealing with sequences have been straightforwardly adapted to process data structures. For instance, in image processing, a basic issue is how to understand a particular given scene. Fig. 1 shows a tree representation of a flower image that can be used for content-based flower image retrieval and flower classification. Obviously, the image can be segmented into two major regions (i.e., the background and foreground regions), and flower regions can then be extracted from the foreground region. A tree-structure representation (to some extent a semantic representation) can then be established and the image content can be better described. The leaf nodes of the tree represent individual flower regions and the root node represents the whole image. The intermediate tree nodes denote combined flower regions. For flower classification, such a representation takes into account both the flower regions and the background. All the flower regions and the background in the tree representation contribute to the flower classification to different extents, partially decided by the tree structure. The tree-structure processing by these specified models can be carried out on the sequential representation based upon the construction of trees. However, this approach has two major drawbacks. First, the sequential mapping of data structures necessarily breaks some regularities inherently associated with the data structures and hence yields poor generalization. Second, since the number of nodes grows exponentially with the depth of the trees, a large number of parameters need to be learned, which makes learning difficult and inefficient.
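The tree representation described above can be sketched as a small data structure; the class and field names here are illustrative, not from the paper:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RegionNode:
    """A node in the tree representation of a segmented image.

    Leaves are individual flower regions; intermediate nodes are merged
    regions; the root stands for the whole image. The attribute list is a
    hypothetical stand-in for per-region features (color, shape, texture).
    """
    label: str
    attributes: List[float] = field(default_factory=list)
    children: List["RegionNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def depth(self) -> int:
        # depth grows with each level of merged regions
        return 1 if self.is_leaf() else 1 + max(c.depth() for c in self.children)

# A miniature version of the Fig. 1 tree: background + foreground,
# with two flower regions under the foreground.
flowers = [RegionNode("flower_1"), RegionNode("flower_2")]
foreground = RegionNode("foreground", children=flowers)
root = RegionNode("image", children=[RegionNode("background"), foreground])
print(root.depth())  # prints 3
```

Deep trees of this kind are exactly where the gradient-based BPTS algorithm runs into the long-term dependency problem discussed below.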

Neural networks (NNs) for adaptive processing of data structures are of paramount importance for structural pattern recognition and classification [1]. The main motivation of this adaptive processing is that neural networks are able to classify static information or temporal sequences and to perform automatic inferring or learning [2], [3]. Sperduti and Starita proposed supervised neural networks for the classification of data structures [4]. This approach is based on using generalized recursive neurons [1], [5]. Most recently, some advances in this area have been presented and some preliminary results have been obtained [6], [7], [8]. The basic idea of a learning algorithm for this processing is to extend the Backpropagation Through Time (BPTT) algorithm [9] to encode data structures by recursive neurons. The term "recursive neurons" means that a copy of the same neural network is used to encode every node of the tree structure. In the BPTT algorithm, the gradients of the weights to be updated can be computed by backpropagating the error through the time sequence. Similarly, if learning is performed on a data structure such as a directed acyclic graph (DAG), the gradients can be computed by backpropagating the error through the data structures, which is known as the Backpropagation Through Structure (BPTS) algorithm [5]. However, this gradient-based learning algorithm has several shortcomings. First, the rate of convergence is slow, so that the learning process cannot be guaranteed to complete within a reasonable time for most complex problems. Although the algorithm can be accelerated simply by using a larger learning rate, this would probably introduce oscillation and might result in a failure to find an optimal solution.

S.-Y. Cho is with the Division of Computing Systems, School of Computer Engineering, Nanyang Technological University, 50 Nanyang Ave., Singapore 639798. E-mail: assycho@ntu.edu.sg.
Z. Chi is with the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. E-mail: enzheru@polyu.edu.hk.
Manuscript received 1 July 2003; revised 1 Jan. 2004; accepted 19 Apr. 2004; published online 17 Dec. 2004.
For information on obtaining reprints of this article, please send e-mail to: tkde@computer.org, and reference IEEECS Log Number TKDE-0109-0703.


Second, gradient-based algorithms are usually prone to local minima [10]. From a theoretical point of view, we believe that gradient-based learning is not very reliable for the rather complex error surfaces formulated in the data structure processing. Third, it is extremely difficult for the gradient-based BPTS algorithm to learn a very deep tree structure because of the problem of long-term dependency [11], [12]. Indeed, the gradient contribution disappears at a certain tree level when the error backpropagates through a deep tree structure (i.e., the learning information is latched). This is because the decreasing gradient terms tend to zero, since the backpropagating error is recursively multiplied by the derivative (between 0 and 1) of the sigmoid function in each neural node. This results in convergence stalling and yields a poor generalization.

In view of the rather complex error surfaces formulated by the adaptive processing of data structures, we need more sophisticated learning schemes to replace the gradient-based algorithm, so as to avoid the learning converging to a suboptimal solution. In our study, a Genetic-based Neural Network Processing of Data Structures (GNNPoDS) is developed to solve the problems of long-term dependency and local minima. The Genetic Algorithm (GA), or Evolutionary Computing (EC) [13], [14], [15], is a computational model inspired by population genetics. It has been used mainly as a function optimizer and has been demonstrated to be effective in global optimization. GA has also been successfully applied to many multiobjective optimizations. Genetic evolution learning for NNs [16], [17] has been introduced to perform a global exploration of the search space, thus avoiding the problem of stagnation that is characteristic of local search procedures. There are a number of different ways to implement a GA, as the genetic operations can be chosen in various combinations.

In evolving the parameters of our proposed NN processing, the usual approach is to code the NN as a string obtained by concatenating the parameter values one after another. The structure of the strings corresponds to the parameters to be learned and may vary depending on how we impose a certain fitness criterion. In our study, two string structures are proposed. The first one is called the "whole-in-one" structure: each parameter is represented by a 12-bit code and all parameters are arranged into one long string. A simple fitness criterion based on the error between the target and the output values can be applied to this kind of string structure, but the problem lies in the slow convergence because the dimension of the strings is large. As the string is not a simple chain like a DNA structure, but rather takes a multidimensional form, performing crossover becomes a rather complicated issue. A simple single-point crossover is not applicable for this structure; rather, a window crossover is suitable, in which a fixed window size of crossover segments is optimized. The second string structure is called the "4-parallel" structure: each parameter in the four groups is represented by a 12-bit code and all parameters are arranged into four parametric matrices, each of which is dealt with independently in the neural network processing of data structures. It is a much faster approach compared with the "whole-in-one" structure, but a correlation among the different groups of parameters to be learned may not be imposed directly by a fitness evaluation based only on the error between the target and output values. Therefore, introducing an appropriate fitness function is an important issue. Among the many different encoding schemes available, the binary encoding is applied because of its simplicity. Mutation and crossover size (i.e., window size in the "whole-in-one" structure) are determined and adjusted according to the best fitness among the population, which improves the GA convergence. Our proposed GA-based NN processing of data structures is evaluated by flower image classification [18]. In this application, semantic image contents are represented by a tree-structure representation in which the algorithm can characterize the image features at multiple levels, so as to benefit image classification using a small number of simple features. Experimental results illustrate that our proposed algorithm enhances the learning performance significantly in terms of quality of solution and the avoidance of the long-term dependency problem in the adaptive processing of data structures.

This paper is organized as follows: The basic idea of the neural network processing of data structures is presented in Section 2; a discussion on the problem of long-term dependency for this processing is also given in that section. Section 3 presents the genetic evolution of the proposed neural network processing. Section 4 describes the method of generating the flower image representation by means of the tree structure and illustrates the working principle of the proposed application. Section 5 gives the simulation results and a discussion of our study. Finally, a conclusion is drawn in Section 6.

2 NEURAL NETWORK PROCESSING OF DATA STRUCTURES (NNPoDS)

In this paper, the problem of devising neural network architectures and learning algorithms for the adaptive processing of data structures is addressed in the context of classification of structured patterns. The encoding method

Fig. 1. A tree representation of a flower image.


by recursive neural networks is based on, and modified from, the research works of [1], [4]. We consider a structured domain D and all graphs (the tree is a special case of the graph); in the following discussion, we will use either "graph" or "tree" where appropriate. G is a learning set representing the task of the adaptive processing of data structures. This representation by the recursive neural network is shown in Fig. 2.

As shown in Fig. 2, a copy of the same neural network (shown on the right side of Fig. 2b) is used to encode every node in the graph G. Such an encoding scheme is flexible enough to allow the model to deal with DAGs of different internal structures and with different numbers of nodes. Moreover, the model can also naturally integrate structural information into its processing. In the Directed Acyclic Graph (DAG) shown in Fig. 2a, the operation runs forward for each graph, i.e., from the terminal nodes (N3 and N4) to the root node (N1). The maximum number of children for a node (i.e., the maximum branch factor c) is predefined for a task domain. For instance, a binary tree (each node has two children only) has a maximum branch factor c equal to two. At the terminal nodes, there are no inputs from children; therefore, the terminal nodes are known as frontier nodes. The forward recall is in the direction from the frontier nodes to the root in a bottom-up fashion. The bottom-up processing from a child node to its parent node can be expressed by the shift operator q_k^{-1}, where q_k^{-1}y denotes the output taken from the k-th child of the current node. This operator is similar to the shift operator used in the time-series representation. Thus, the recursive network for the structural processing is formed as

x = F_n(A q^{-1} y + B u),   (1)
y = F_p(C x + D u),   (2)

where x, u, and y are the n-dimensional output vector of the hidden-layer neurons, the m-dimensional input attribute vector, and the p-dimensional output vector of the node, respectively. The input of a node is taken from its children, so that

q^{-1} y = (q_1^{-1} y, q_2^{-1} y, \ldots, q_c^{-1} y)^T.

The parametric matrix A is defined as A = (A^1, A^2, \ldots, A^c), where c denotes the maximum number of children in the task domain and each A^k = (a_{ij}^k) is an n \times p matrix. The vector-valued nonlinear function F_n is defined componentwise as

F_n(\alpha) = (f(\alpha_1), f(\alpha_2), \ldots, f(\alpha_n))^T,

with the sigmoid function f(\alpha) = 1/(1 + e^{-\alpha}).
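A minimal sketch of the bottom-up recall in (1) and (2), assuming a fixed branch factor c, zero vectors as the missing child inputs of frontier nodes, and randomly initialized parameter matrices (all dimensions and names here are illustrative assumptions, not the paper's experimental settings):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(node, params):
    """Bottom-up recall of the recursive network in (1)-(2).

    node: dict with 'u' (m-dim attribute vector) and 'children' (list of nodes).
    params: (A, B, C, D) with A of shape (n, c*p), B (n, m), C (p, n), D (p, m).
    Frontier nodes have no children, so missing child outputs are padded
    with zero vectors (an assumption about how frontier inputs are handled).
    Returns the p-dimensional output y of the node.
    """
    A, B, C, D = params
    n, cp = A.shape
    p = C.shape[0]
    c = cp // p
    kids = [forward(ch, params) for ch in node["children"]]
    kids += [np.zeros(p)] * (c - len(kids))   # pad to the branch factor c
    qy = np.concatenate(kids)                 # stacked child outputs, q^{-1}y
    x = sigmoid(A @ qy + B @ node["u"])       # hidden state, eq. (1)
    return sigmoid(C @ x + D @ node["u"])     # node output, eq. (2)

# Toy tree: branch factor c = 2, m = 3 attributes, n = 4 hidden, p = 2 outputs.
rng = np.random.default_rng(0)
params = (rng.normal(size=(4, 2 * 2)), rng.normal(size=(4, 3)),
          rng.normal(size=(2, 4)), rng.normal(size=(2, 3)))
leaf = {"u": np.ones(3), "children": []}
root = {"u": np.ones(3), "children": [leaf, leaf]}
y_root = forward(root, params)
print(y_root.shape)  # (2,)
```

The same copy of (A, B, C, D) is applied at every node, which is exactly the recursive-neuron weight sharing described above.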

Algorithm

In accordance with the work by Hammer and Sperschneider [19], based on the theory of universal approximation of the recursive neural network, a single hidden layer is sufficient to approximate any complicated mapping problem. The input-output learning task can be defined by estimating the parameters A, B, C, and D in the parameterization from a set of training (input-output) examples. Each input-output example can be formed as a tree data structure consisting of a number of nodes with their inputs and target outputs. Each node's inputs are described by a set of attributes u. The target output is denoted by t, where t is a p-dimensional vector. So, the cost function is defined as a total sum-squared-error function:

Fig. 2. An illustration of a data structure with its nodes encoded by a single-hidden-layer neural network. (a) A Directed Acyclic Graph (DAG) and (b) the encoded DAG.


J = \frac{1}{2} \sum_{i=1}^{N_T} (t_i - y_i^R)^T (t_i - y_i^R),

where y_i^R denotes the network output at the root node for the i-th training example. Note that, in the case of structural learning processing, it is often assumed that the attributes, u, are available at each node of the tree. The main step in the learning algorithm involves the following gradient learning step:

\theta(k+1) = \theta(k) - \eta \left. \frac{\partial J}{\partial \theta} \right|_{\theta = \theta(k)}, \qquad \theta \in \{A, B, C, D\},

where \eta is the learning rate and \partial J / \partial \theta |_{\theta = \theta(k)} is the partial derivative of the cost function with respect to the parameter, evaluated at \theta(k). The learning algorithm involves the evaluation of the partial derivative of the cost function with respect to the parameters in each node. Thus, the general form of the derivatives of the cost function with respect to the parameters is given by

\frac{\partial J}{\partial \theta} = -\sum_{i=1}^{N_T} (t_i - y_i^R)^T \nabla_\theta\, y_i^R,

where the derivative of the nonlinear activation function, F', is defined as an n-dimensional vector that is a function of the derivative of x with respect to the parameters. This derivative is evaluated recursively at each node by the same computation, such that the evaluation depends on the structure of the tree. This is called either the folding architecture algorithm [5] or the backpropagation through structure algorithm [4].

In the formulation of the learning structural processing task, it is not required to assume a priori knowledge of any data structures or any a priori information concerning the internal structures. However, we need to assume that the maximum number of children for each node in the tree is predefined. The parameterization of the structural processing problem is said to be an overparameterization if the predefined maximum number of children is much greater than that of the real trees, i.e., there are many more redundant parameters in the recursive network than are required to describe the behavior of the tree. The overparameterization may give rise to the problem of local minima in the BPTS learning algorithm. Moreover, the long-term dependency problem may also affect the learning performance of the BPTS approach due to the vanishing gradient information in learning deep trees. The learning information may disappear at a certain level of the tree before it reaches the frontier nodes, so that the convergence of the BPTS stalls and a poor generalization results. A detailed analysis of this problem will be given in the next section.

For backpropagation learning of multilayer perceptron (MLP) networks, it is well known that if there are too many hidden layers, the parameters at very deep layers are not updated. This is because backpropagating errors are multiplied by the derivative of the sigmoidal function, which is between 0 and 1, and, hence, the gradient for very deep layers could become very small. Bengio et al. [11] and Hochreiter and Schmidhuber [20] have analytically explained why backpropagation learning problems with long-term dependency are difficult. They stated that the recurrent MLP network is able to robustly store information for an application with long temporal sequences when the states of the network stay within the vicinity of a hyperbolic attractor, i.e., the eigenvalues of the Jacobian are within the unit circle. However, Bengio et al. have shown that if its eigenvalues are inside the unit circle, then the Jacobian at each time step is an exponentially decreasing function. This implies that the portion of the gradients becomes insignificant, a behavior called the effect of vanishing gradients or forgetting behavior [11]. In this section, we briefly describe some of the key aspects of the long-term dependency problem in the processing of data structures. The gradient-based learning algorithm updates a set of parameters of the node representation defined in (1) and (2), where the gradient operator with respect to the parameters can be denoted as

\frac{\partial}{\partial \theta} = \left( \frac{\partial}{\partial \theta_1}, \ldots, \frac{\partial}{\partial \theta_n} \right)^T.

By using the chain rule, the gradient can be expressed as

\frac{\partial J}{\partial \theta} = -\sum_{i=1}^{N_T} (t_i - y_i^R)^T \nabla_\theta\, y_i^R.

If we assume that computing the partial gradient with respect to the parameters of the node representation at different levels of a tree is independent, the total gradient is then equal to the sum of these partial gradients:

\frac{\partial J}{\partial \theta} = -\sum_{i=1}^{N_T} (t_i - y_i^R)^T \nabla_{x_R} y_i^R \left( \sum_{l=1}^{R} J_{x_R, x_l} \nabla_\theta\, x_l \right),

where J_{x_R, x_l} denotes the Jacobian of the hidden state x propagated from level R (the root node) backward to level l. The norm of this Jacobian is an exponentially decreasing function of the depth, since the backpropagating error is recursively multiplied by the derivative of the sigmoidal function, which lies between 0 and 1. Hence, the gradient contribution at the deep levels of a tree is insignificant compared with the portion at the upper levels of the tree. The effect of vanishing gradients is the main reason why the BPTS algorithm is not sufficiently reliable for discovering the relationships between desired outputs and inputs, which we term the problem of long-term dependency. Therefore, we propose a genetic evolution method to avoid this effect of vanishing gradients in the BPTS algorithm, so that the evaluation for updating the parameters becomes more robust for deep tree structures.


3 GENETIC EVOLUTION FOR PROCESSING OF DATA STRUCTURES

The genetic evolution neural network introduces an adaptive and global approach to learning, especially in the reinforcement learning and recurrent neural network learning paradigms, where gradient-based learning often experiences great difficulty in finding the optimal solution [16], [17]. This section presents the use of the genetic algorithm for evolving the neural network processing of data structures. In our study, the major objective is to determine the optimal parameters of the recursive network in (1) and (2) over the whole data structures. Our proposed genetic approach involves two major considerations. The first is the string representation of the parameters, i.e., either the "whole-in-one" or the "4-parallel" structure; these two string representations will be discussed in the next section. Based on these two different string structures, the objective function for the fitness criterion is the other main consideration. Different string representations and objective functions can lead to quite different learning performance. A typical cycle of the evolution of the learning parameters is shown in Fig. 3. The evolution terminates when the fitness is greater than a predefined value (i.e., the objective function reaches the stopping criterion) or the population has converged.

The genetic algorithm always uses binary strings to encode alternative solutions, often termed chromosomes. In such a representation scheme, each parameter is represented by a number of bits of a certain length. The recursive neural network is encoded by concatenating all the parameters in the chromosome. Basically, the merits of the binary representation lie in its simplicity and generality. It is straightforward to apply the classical crossover (such as the single-point or multipoint crossover) and mutation to binary strings. There are several encoding methods (such as uniform, gray, or exponential) that can be used in the binary representation. The gray code is suggested to alleviate the Hamming distance problem in our study: it ensures that the codes for adjacent integers always have a Hamming distance of one, so that the Hamming distance does not monotonously increase with the difference in integer values. In the string structure representation, a proper string structure for GA operations is selected depending on the fitness evaluation. One simple way is the "whole-in-one" structure, in which all parameters are encoded into one long string.
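The adjacency property of the gray code can be demonstrated with the standard reflected binary encoding:

```python
def to_gray(n: int) -> int:
    """Standard reflected binary Gray code."""
    return n ^ (n >> 1)

def from_gray(g: int) -> int:
    """Inverse mapping: prefix-XOR of the Gray bits."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Adjacent integers always differ by exactly one bit in Gray code,
# so a one-bit mutation corresponds to a small parameter change.
hamming = lambda a, b: bin(a ^ b).count("1")
print([hamming(to_gray(i), to_gray(i + 1)) for i in range(8)])  # all 1s
```

Under plain binary encoding, by contrast, 7 (0111) and 8 (1000) differ in four bits, which is the Hamming distance problem the gray code avoids.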

The encoding for the "whole-in-one" structure is simple, and the objective function is simply evaluated by the error between the target and the root output values of the data structures.

Fig. 3. The genetic evolution cycle for the neural network processing of data structures.


However, the dimension may be very high, so that the GA operations may be inefficient. Moreover, this "whole-in-one" structure representation has the permutation problem. It is caused by the many-to-one mapping from the chromosome representation to the recursive neural network, since two different networks can compute an equivalent function while having different chromosomes. This permutation problem makes the crossover operator very inefficient and ineffective in producing good offspring. Thus, another string structure representation, called the "4-parallel" structure, is used to overcome the above problem. The GA process becomes efficient when we apply it over each group of parameters individually. It is possible to perform a separate GA process on each group of parameters in parallel, but the limitation lies in its inability to impose the correlation constraints among the learning parameters of each node. The objective function is therefore specifically designed for this "4-parallel" string structure so as to evaluate the fitness criteria for GA operations of structural processing. From (1) and (2), the recursive network for the structural processing is rewritten in matrix form, and every parametric matrix can be encoded into one binary string for the "whole-in-one" structure. A very long chromosome is formed as

\mathrm{chromosome}(A, B, C, D) := \{00100 \ldots 0000110\} \big|_{d = n(cp) + nm + pn + pm}. \quad (15)

On the other hand, for the "4-parallel" structure, the parametric matrices A, B, C, and D are each encoded as a separate string. Note that d represents the number of parameters to be learned, so that the total size of the chromosome is d \times (number of encoding bits).
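For concreteness, the chromosome dimension in (15) can be computed for a small, hypothetical network configuration:

```python
def chromosome_dims(n, m, p, c, bits_per_param=12):
    """Number of parameters d = n(cp) + nm + pn + pm from (15), and the
    resulting chromosome length under the 12-bit-per-parameter encoding."""
    d = n * (c * p) + n * m + p * n + p * m
    return d, d * bits_per_param

# Example sizes (hypothetical): n = 4 hidden neurons, m = 3 attributes,
# p = 2 outputs, branch factor c = 2.
d, length = chromosome_dims(n=4, m=3, p=2, c=2)
print(d, length)  # 42 parameters, 504 bits
```

Even this toy configuration yields a 504-bit "whole-in-one" string, which illustrates why the dimension grows quickly and why the "4-parallel" split can help.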

The genetic algorithm with arithmetic crossover and nonuniform mutation is employed to optimize the parameters in the neural processing of data structures. The objective function is defined as the mean-squared error between the desired output and the network output at the root node:

E = \frac{1}{N_T} \sum_{i=1}^{N_T} (t_i - y_i^R)^T (t_i - y_i^R),

where t_i and y_i^R are, respectively, the desired output and the real output at the root node. For GA operations, the objective is to maximize the fitness value by adjusting the chromosome to find the optimal solution. For operations in the "whole-in-one" structure representation, the fitness evaluation can be simply defined as

\mathrm{fitness}(s) = \frac{1}{1 + \sqrt{E_a}},

where E_a denotes the above mean-squared error achieved by chromosome s.
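This fitness evaluation can be transcribed directly, with E_a as the root-node mean-squared error:

```python
import math

def fitness_whole_in_one(mse_at_root: float) -> float:
    """Fitness for the "whole-in-one" string: 1 / (1 + sqrt(E_a)).
    A zero root-node error gives the maximum fitness of 1, and the
    fitness decays smoothly as the error grows."""
    return 1.0 / (1.0 + math.sqrt(mse_at_root))

print(fitness_whole_in_one(0.0), fitness_whole_in_one(1.0))  # 1.0 0.5
```

The square root keeps the fitness sensitive to small errors near convergence, where the raw mean-squared error would change only slightly.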

Basically, the above fitness is applied to the "whole-in-one" structure but cannot be applied directly to the "4-parallel" string structure. The objective function for the "4-parallel" string representation is evaluated as follows. Let the error e_i be expanded in a Taylor series as

e_i(\theta + \Delta\theta) \approx e_i(\theta) + \frac{\partial y_i}{\partial \theta} \Delta\theta, \quad (21)

where \theta collects the four parameter groups of the proposed processing, so that \partial y_i / \partial \theta is assembled from \partial y_i / \partial A, \partial y_i / \partial B, \partial y_i / \partial C, and \partial y_i / \partial D. In (21), the first term is the initial error term, while the second term can be regarded as a smoothness constraint given by the output derivatives with respect to the learning parameters. Thus, the objective function of this constraint becomes

E_b = \frac{1}{N_T} \sum_{i=1}^{N_T} \left\| \frac{\partial y_i^R}{\partial \theta} \right\|^2.

So, the fitness evaluation for the "4-parallel" string structure representation is determined as

\mathrm{fitness}(s) = \frac{1}{1 + \lambda \sqrt{E_a} + (1 - \lambda) E_b},

where \lambda is a constant and (1 - \lambda) weights the smoothness constraint. It is noted that the range of the above fitness evaluation is within [0, 1]. This smoothness constraint is a trade-off between the ability of the GA to converge and the correlation among the four groups of parameters. In our study, we empirically set \lambda = 0.9.

Chromosomes in the population are selected for the generation of new chromosomes by a selection scheme. It is expected that a better chromosome will generate a larger number of offspring and has a higher chance of surviving in the subsequent generation. The well-known Roulette Wheel Selection [21] is used as the selection mechanism.

Each chromosome in the population is associated with a sector in a virtual wheel whose area is proportional to the fitness value of the chromosome: a chromosome with a higher fitness value occupies a larger sector, while one with a lower value takes a smaller sector. The selection rate of chromosome s_j is determined by

\mathrm{rate}(s_j) = \frac{\mathrm{fitness}(s_j)}{F},

where F is the sum of the fitness values of all chromosomes in the population. In our study, the selection rate is predefined such that a chromosome is selected if its rate is equal to or smaller than the predefined rate, which is set to 0.6.
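A sketch of roulette wheel selection as described above, with each chromosome's sector proportional to fitness(s_j)/F:

```python
import random

def roulette_select(population, fitnesses, rng=None):
    """Roulette Wheel Selection: spin a virtual wheel in which each
    chromosome occupies a sector proportional to fitness(s_j) / F,
    where F is the total fitness of the population."""
    rng = rng or random.Random()
    F = sum(fitnesses)
    spin = rng.uniform(0.0, F)
    acc = 0.0
    for chrom, fit in zip(population, fitnesses):
        acc += fit
        if spin <= acc:
            return chrom
    return population[-1]   # guard against floating-point round-off

pop = ["s1", "s2", "s3"]
fits = [0.1, 0.3, 0.6]
rng = random.Random(42)
picks = [roulette_select(pop, fits, rng) for _ in range(3000)]
print(picks.count("s3") / len(picks))  # close to 0.6
```

Over many spins, the empirical selection frequency of each chromosome approaches its fitness share, which is the selection-pressure property the scheme is chosen for.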

Another selection criterion for the chromosome may be considered through the constant \lambda in the fitness function (23), as follows. Assume that at least one chromosome s_j has been successfully generated in the population; its fitness evaluation becomes

\mathrm{fitness}(s_j) = \frac{1}{1 + \lambda \sqrt{E_a(s_j)} + (1 - \lambda) E_b(s_j)}.

When \lambda > 0 and \sqrt{E_a(s_j)} \gg E_b(s_j), this reduces to

\mathrm{fitness}(s_j) \approx \frac{1}{1 + \lambda \sqrt{E_a(s_j)}}.

Hence, \lambda is selected so that the fitness of such a chromosome remains below the best fitness in the population, and our empirical study defines the constant value satisfying the criterion in (28). To sum up, a test chromosome s_{test} is accepted when it satisfies the following condition:

\frac{E_b(s_{test})}{\sqrt{E_a(s_{test})} + E_b(s_{test})} < 0.9.

There are several ways to implement the crossover operation, depending on the chromosome structure. Single-point crossover is appropriate for the "4-parallel" structure, but it is not applicable for the "whole-in-one" structure because of its high dimension. It is more appropriate to implement the window crossover for the "whole-in-one" encoding, where the crossover point and the size of the window are taken within a valid range. Basically, the point crossover operation is performed when the probability test has passed (i.e., a random number is smaller than the crossover rate). Besides, the crossover window size is determined by the best fitness among the population: the window size is forced to decrease as the square of the best fitness value increases. So, the window size is

W_{size} = (N_{bit} - N_{crossover}) \times (1 - \mathrm{fitness}_{best}^2), \quad (29)

where N_{bit} is the number of bits of

the chromosome and N_{crossover} is the crossover point. The crossover operation of this "whole-in-one" structure is illustrated in Fig. 4. The parents are separated into two portions by a randomly defined crossover point, and the size of the portions is determined by (29). The new chromosome is then formed by combining the shaded portions of the two parents, as indicated in Fig. 4. For the "4-parallel" chromosome structure, since its size is smaller than that of the "whole-in-one" structure, the single-point crossover operation can be applied directly. Four crossover rates are assigned to the four groups of parameters, so that if a random number is smaller than the probability, the new chromosome is mated from the first portion of parent 1 and the last portion of parent 2. The crossover operation for this "4-parallel" structure is shown in Fig. 5.
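A hedged sketch of the window crossover with the shrinking window of (29); the exact placement of the copied window relative to the crossover point is an assumption, since Fig. 4 is not reproduced here:

```python
import random

def window_size(n_bits, crossover_point, best_fitness):
    """W_size = (N_bit - N_crossover) * (1 - best_fitness**2) from (29):
    the window shrinks as the square of the best fitness grows, so the
    search becomes more local as the population improves."""
    return int((n_bits - crossover_point) * (1.0 - best_fitness ** 2))

def window_crossover(parent1, parent2, best_fitness, rng=None):
    """Copy a window of parent2 into parent1, starting at a random
    crossover point (a sketch of the Fig. 4 operation)."""
    rng = rng or random.Random()
    n = len(parent1)
    point = rng.randrange(n)
    w = window_size(n, point, best_fitness)
    # the window never overruns the chromosome, since w <= n - point
    return parent1[:point] + parent2[point:point + w] + parent1[point + w:]

p1, p2 = [0] * 16, [1] * 16
child = window_crossover(p1, p2, best_fitness=0.5, rng=random.Random(1))
print(child)
```

With a best fitness near 1, the window collapses to zero and the offspring is essentially a copy of the first parent, matching the intended late-stage behavior.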

Mutation introduces variations of the model parameters into the chromosomes. It provides a global searching capability for the GA by randomly altering the values of strings in the chromosomes. Bit mutation is applied to the above two chromosome structures in their bit-string form: a mutation with a small probability (typically between 0.01 and 0.05) occurs, which alters the value of a string bit so as to introduce variations into the chromosome. A bit is flipped if the probability test is satisfied.
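Bit mutation as described can be transcribed directly; the 0.05 rate below is just one value from the quoted range:

```python
import random

def mutate(chromosome, p_mut=0.02, rng=None):
    """Bit mutation: each bit is flipped independently if a probability
    test is satisfied (rates of 0.01-0.05 are typical)."""
    rng = rng or random.Random()
    return [b ^ (rng.random() < p_mut) for b in chromosome]

chrom = [0] * 1000
mutated = mutate(chrom, p_mut=0.05, rng=random.Random(7))
print(sum(mutated))  # roughly 50 of 1000 bits flipped
```

Because each bit is tested independently, the expected number of flips is p_mut times the chromosome length, which keeps the perturbation small relative to the string.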

4 STRUCTURE-BASED FLOWER IMAGE CLASSIFICATION

Flower classification is a very challenging problem and will find a wide range of applications, including live plant resource and data management and education on flower taxonomy [18]. There are 250,000 named species of flowering plants, and many plant species have not yet been classified and named. In fact, flower classification or plant identification is a very demanding and time-consuming task, which has mainly been carried out by taxonomists/botanists. A significant improvement can be expected if flower classification can be carried out by a machine-learning model with the aid of image processing and computer vision techniques. Machine learning-based flower classification from color images is, and will continue to be, one of the most difficult tasks in computer vision due to the lack of proper models or representations, the large number of biological variations that a species of flower can take, and imprecise or ambiguous image preprocessing results. There are also still many problems in accurately locating flower regions when the background is complex, owing to the complex structure of flowers and their nature as 3D objects, which adds another dimension of difficulty in modeling. Flowers can basically be characterized by color, shape, and texture. Color is a main feature that can be used to differentiate flowers from the background, including leaves, stems, shadows, soils, etc. Color-based domain knowledge can be adopted to delete pixels that do not belong to flower regions. Das et al. [27] proposed an iterative segmentation algorithm with a knowledge-driven mechanism to extract flower regions from the background. Van der Heijden and Vossepoel proposed a general contour-oriented shape dissimilarity measure for a comparison of flowers of potato species [28]. In another study, a feature extraction and learning approach was developed by Saitoh and Kaneko for recognizing 16 wild flowers [29]. Four flower features together with two leaf features were used as the input for training the neural network flower classifier, and quite good performance was achieved by their holistic approach. However, the approach can only handle a single flower orientation per category; it cannot be directly extended to several different flower orientations within the same species (i.e., flowers of the same species but in different orientations and colors).

Fig. 4. Window crossover operation for the “whole-in-one” structure.

Fig. 5. Parallel crossover operation for the “4-parallel” string structure.

Image content representation has been a popular research topic in various image processing applications for the past few years. Most of the approaches represent the image content using only low-level visual features, either globally or locally. It is noted that global features (such as Fourier descriptors or wavelet domain descriptors) cannot characterize the image contents accurately by their spatial relationships, whereas local features (such as color, shape, or spatial texture) depend on error-prone segmentation results. In this study, we consider a region-based representation called a binary tree [22], [23], [24]. The construction of the image representation is based on the extraction of the relevant regions in the image. This is typically obtained by a region-based segmentation in which the algorithm can extract the interesting regions of flower images based on a color clustering technique in order to simulate human visual perception [30]. Once the regions of interest have been extracted, a node is added to the graph for each of these regions. Relevant regions describing the objects can be merged together based on a merging strategy. Binary trees can be formed as a semantic representation whose nodes correspond to the regions of the flower image and whose arcs represent the relationships among regions. Besides the extraction of the structure, a vector of real-valued attributes is computed to describe the image region associated with each node. The features include color information, shading/contrast properties, and invariant shape characteristics. The following sections describe how to construct the binary tree representation for flower images.
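To make the representation concrete, the attributes attached to each tree node (color information, shading/contrast, and shape descriptors) could be held in a small recursive structure. The sketch below is illustrative only; the class and field names are our own and are not taken from the paper:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RegionNode:
    """One node of the binary tree built over segmented image regions."""
    color: List[float]                     # e.g., mean (L*, H*, C*) of the region
    contrast: float                        # shading/contrast property
    shape: List[float]                     # invariant shape descriptors
    left: Optional["RegionNode"] = None    # child: one of the two merged subregions
    right: Optional["RegionNode"] = None   # child: the other merged subregion

def depth(node: Optional[RegionNode]) -> int:
    """Depth of the tree rooted at `node` (a single leaf has depth 1)."""
    if node is None:
        return 0
    return 1 + max(depth(node.left), depth(node.right))
```

A leaf corresponds to one segmented region, while an internal node holds the attributes of the merged region formed from its two children; the root represents the entire image.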

Fig. 6 illustrates the system architecture of the structure-based flower image classification. At the learning phase, a set of binary tree patterns representing flower images from different families is generated by the combined processes of segmentation, merging strategy, and feature extraction. All these tree patterns are used for training the model by our proposed genetic evolution processing of data structures. At the classification phase, a query image is classified automatically by the trained neural network, in which the binary tree is generated by the same processes used for generating the learning examples.

A color image is usually given by R (red), G (green), and B (blue) values at every pixel. However, the difficulty with the RGB color model is that it produces color components that do not closely follow those of the human visual system. A better color model produces color components that follow the understanding of color by H (hue), S (saturation), and I (intensity or luminance) [25]. Of these three components, the hue is considered a key component in human perception. However, the HSI color model has several limitations. First, the model gives equal weighting to the RGB components when computing the intensity or luminance of an image, which does not correspond with the brightness of a color as perceived by the eye. Second, the length of the maximum saturation vector varies depending on the hue of the color. Therefore, from the color clustering point of view, it is desired that the image be represented by color features that constitute a space possessing uniform characteristics, such as the cube-root color system; this system gives good results in segmenting the color image. The color conversion is achieved by transforming the (R, G, B) values into the (X, Y, Z) space, which is further converted to a cube-root system. The transformation is shown below:

\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix} 2.7690 & 1.7518 & 1.1300 \\ 1.0000 & 4.5907 & 0.0601 \\ 0.0000 & 0.0565 & 5.5943 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}

The (X, Y, Z) values are then converted into the cube-root system:

L^* = 116 (Y/Y_0)^{1/3} - 16,
a^* = 500 \left[ (X/X_0)^{1/3} - (Y/Y_0)^{1/3} \right],   (30)
b^* = 200 \left[ (Y/Y_0)^{1/3} - (Z/Z_0)^{1/3} \right],

where X_0, Y_0, and Z_0 denote the (X, Y, Z) values of the reference white color (i.e., 255 for the 8-bit gray-scale image). Thus, the cube-root system yields a simpler decision surface in accordance with human color perception. The hue and chroma are then given by

Fig. 6. System architecture of the flower classification.


hue:     H^* = \tan^{-1}\left( b^* / a^* \right),
chroma:  C^* = \sqrt{(a^*)^2 + (b^*)^2}.   (31)
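The full conversion from an RGB pixel to the lightness-hue-chroma features can be sketched as follows. This is an illustrative reading, not the authors' code: the third row of the RGB-to-(X, Y, Z) matrix and the normalization by a reference white at R = G = B = 255 are our assumptions where the source is ambiguous.

```python
import math

# RGB -> (X, Y, Z) matrix; the first two rows are from the text, the third
# row is assumed to be the standard CIE RGB -> XYZ row.
M = [[2.7690, 1.7518, 1.1300],
     [1.0000, 4.5907, 0.0601],
     [0.0000, 0.0565, 5.5943]]

def rgb_to_lhc(r, g, b, white=255.0):
    """Convert one (R, G, B) pixel to (lightness, hue, chroma)."""
    x, y, z = (row[0] * r + row[1] * g + row[2] * b for row in M)
    # Tristimulus values of the reference white (R = G = B = white).
    x0, y0, z0 = (sum(row) * white for row in M)
    fx, fy, fz = (x / x0) ** (1 / 3), (y / y0) ** (1 / 3), (z / z0) ** (1 / 3)
    L = 116 * fy - 16         # lightness
    a = 500 * (fx - fy)
    bb = 200 * (fy - fz)
    H = math.atan2(bb, a)     # hue
    C = math.hypot(a, bb)     # chroma
    return L, H, C
```

For the reference white itself this returns L = 100 and C = 0, and any gray pixel has zero chroma, consistent with a perceptually motivated space.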

The proposed segmentation uses the Euclidean distance to measure the similarity between the selected cluster and the image pixels within the above cube-root system. The first step of our method is to convert the RGB components into the lightness-hue-chroma channel based on (30) and (31). The Euclidean distance between each cluster centroid and the image pixels within the lightness-hue-chroma channel is given as:

D_i(x, y) = \sqrt{ \left( L^*(x, y) - L_i^* \right)^2 + \left( H^*(x, y) - H_i^* \right)^2 + \left( C^*(x, y) - C_i^* \right)^2 },   (32)

where (L_i^*, H_i^*, C_i^*) denotes the ith cluster centroid and (L^*(x, y), H^*(x, y), C^*(x, y)) is the image pixel at the coordinates x and y within the cube-root system. For clustering the regions of interest, the k-mean clustering method [25] is used such that a pixel (x, y) is identified as belonging to cluster j if D_j(x, y) = \min_i D_i(x, y). The determination of the cluster centroids is very crucial. They can be evaluated by:

L_i^* = \frac{1}{N_i} \sum_{(x, y) \in R_i} L^*(x, y), \quad
H_i^* = \frac{1}{N_i} \sum_{(x, y) \in R_i} H^*(x, y), \quad
C_i^* = \frac{1}{N_i} \sum_{(x, y) \in R_i} C^*(x, y),   (33)

where N_i is the number of pixels assigned to cluster i.
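The clustering loop then alternates the nearest-centroid assignment of (32) with the cluster-mean centroid update. A minimal sketch (plain k-means over (L*, H*, C*) triples; our own code, not the authors' implementation):

```python
import math

def kmeans_lhc(pixels, centroids, iters=10):
    """k-mean clustering of (L*, H*, C*) pixel triples.

    Each pixel is assigned to the centroid at the smallest Euclidean
    distance, then each centroid is recomputed as the mean of its cluster.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in pixels:
            d = [math.dist(p, c) for c in centroids]
            clusters[d.index(min(d))].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep an empty cluster's centroid unchanged
                n = len(members)
                centroids[i] = tuple(sum(m[k] for m in members) / n for k in range(3))
    return centroids, clusters
```

In the flower application the initial centroids would come from the dominant chroma-histogram peaks, so the loop typically converges within a few iterations.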

The number of assigned clusters is based on the number of the most dominant peaks determined by the k-mean clustering within the chroma channel. For example, Fig. 7 illustrates a flower image with a histogram of the chroma channel in which there are two dominant peaks (i.e., clusters “a” and “b”). Thus, two clusters can be assigned: one of them should be the background cluster, whereas the other should be the foreground cluster. The segmentation results of this example image are shown in Fig. 8, in which two images (Figs. 8a and 8b) are segmented with the two cluster centroids and the corresponding flower region is extracted as shown in Fig. 8c.
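The paper does not spell out its peak-detection rule, so the following is only one plausible reading: treat a histogram bin as a dominant peak if it is a local maximum and reaches a fixed fraction of the tallest bin. Both the criterion and the threshold value are assumptions.

```python
def dominant_peaks(hist, min_ratio=0.5):
    """Indices of dominant peaks in a chroma histogram.

    A bin counts as a peak if it exceeds its left neighbor, is not
    exceeded by its right neighbor, and is at least `min_ratio` of the
    global maximum. The 0.5 threshold is an assumption, not from the paper.
    """
    top = max(hist)
    peaks = []
    for i in range(1, len(hist) - 1):
        if hist[i] > hist[i - 1] and hist[i] >= hist[i + 1] and hist[i] >= min_ratio * top:
            peaks.append(i)
    return peaks
```

Under such a rule, a two-peak chroma histogram like the one in Fig. 7 yields two clusters: background and foreground.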

The idea of creating and processing a tree-structured image representation is an attempt to benefit from the attractive features of the segmentation results based on the method described in the previous section. In our study, we start from the terminal nodes and merge two similar neighboring regions associated with the child nodes based on their contents. This merging is operated iteratively by a recursive algorithm until the child nodes of the root node (i.e., the background and foreground regions) are reached. The following explains the proposed merging strategy to create a binary tree. Assume that the merged region pair is denoted by (R_i, R_j). The merging criterion is based on examining the entropy of all pairs of regions to identify which one is the maximum, and the merging is terminated when the last pair of regions is merged to become the entire image. At each step, the algorithm searches for the pair of most similar regions' contents, which should be the pair of child nodes linked with their parent node. The most similar region pair is determined by maximizing the entropy:

O(R_i, R_j) = \arg\max_{(R_i, R_j),\; i \neq j} M_{R_i \cup R_j},   (34)

where the merging measure M_{R_i \cup R_j} is computed based on the color homogeneity of the two subregions, which is defined as:

M_{R_i \cup R_j} \big|_{i \neq j} = -\left( \sum_{k=1}^{N_{R_i}} p_k^{R_i} \log_2 p_k^{R_i} + \sum_{k=1}^{N_{R_j}} p_k^{R_j} \log_2 p_k^{R_j} \right),   (35)

where p_k^{R_i} represents the percentage of the pixels at the kth color in region R_i.
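A minimal sketch of the merging measure of (35) and the pair-selection step, with each region given as a list of quantized pixel colors (our own data layout, chosen for illustration):

```python
import math
from collections import Counter

def entropy_sum(colors):
    """sum_k p_k * log2(p_k) over one region's color histogram (<= 0)."""
    n = len(colors)
    return sum((c / n) * math.log2(c / n) for c in Counter(colors).values())

def merge_measure(region_i, region_j):
    """M_{Ri u Rj}: the negated sum of the two regions' entropy terms."""
    return -(entropy_sum(region_i) + entropy_sum(region_j))

def most_similar_pair(regions):
    """Indices of the region pair maximizing the merging measure."""
    pairs = [(i, j) for i in range(len(regions)) for j in range(i + 1, len(regions))]
    return max(pairs, key=lambda ij: merge_measure(regions[ij[0]], regions[ij[1]]))
```

Repeatedly merging the selected pair and re-running the search builds the binary tree bottom-up, with the final merge producing the root (the entire image).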

Fig. 7. (a) A single flower image example and (b) its histogram of the chroma channel.
