Genetic Evolution Processing of Data Structures for Image Classification
Siu-Yeung Cho, Member, IEEE, and Zheru Chi, Member, IEEE
Abstract—This paper describes a method of structural pattern recognition based on a genetic evolution processing of data structures with neural network representation. Conventionally, one of the most popular learning formulations for data structure processing is Backpropagation Through Structures (BPTS) [7]. The BPTS algorithm has been successfully applied to a number of learning tasks that involve structural patterns, such as image, shape, and texture classifications. However, this BPTS-type algorithm suffers from the long-term dependency problem in learning very deep tree structures. In this paper, we propose a genetic evolution for this processing of data structures. The idea of this algorithm is to tune the learning parameters by genetic evolution with specified chromosome structures. Also, the fitness evaluation as well as the adaptive crossover and mutation for this structural genetic processing are investigated in this paper. An application to flower image classification with a structural representation is provided for the validation of our method. The obtained results strongly support the capability of our proposed approach to classify and recognize flowers in terms of generalization and noise robustness.
Index Terms—Adaptive processing of data structures, genetic algorithm, image classification, and neural networks.
1 INTRODUCTION
In many applications of pattern recognition and classification, it is more appropriate to model objects by data structures. The topological behavior in the structural representation provides significant information to describe the nature of objects. Unfortunately, most connectionist models assume that data are organized by relatively poor structures, such as arrays or sequences, rather than in a hierarchical manner. In recent years, machine learning models conceived for dealing with sequences have been straightforwardly adapted to process data structures. For instance, in image processing, a basic issue is how to understand a particular given scene. Fig. 1 shows a tree representation of a flower image that can be used for content-based flower image retrieval and flower classification. Obviously, the image can be segmented into two major regions (i.e., the background and foreground regions) and flower regions can then be extracted from the foreground region. A tree-structure representation (to some extent a semantic representation) can then be established and the image content can be better described. The leaf nodes of the tree represent individual flower regions and the root node represents the whole image. The intermediate tree nodes denote combined flower regions. For flower classification, such a representation will take into account both the flower regions and the background. All the flower regions and the background in the tree representation will contribute to the flower classification to different extents, partially decided by the tree structure. The tree-structure processing by these specified models can be carried out on the sequential representation based upon the construction of trees. However, this approach has two major drawbacks. First, the sequential mapping of data structures necessarily breaks some regularities inherently associated with the data structures and hence yields poor generalization. Second, since the number of nodes grows exponentially with the depth of the trees, a large number of parameters need to be learned, which makes learning difficult and inefficient.
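The region tree of Fig. 1 can be sketched in a few lines of code. This is an illustrative sketch only (the class and attribute names are ours, not the paper's): each node carries a feature vector for one image region, leaves are individual flower regions, and the root is the whole image.

```python
from dataclasses import dataclass, field

@dataclass
class RegionNode:
    attributes: list            # region features (e.g., color, shape descriptors)
    children: list = field(default_factory=list)

    def depth(self) -> int:
        """Depth of the subtree rooted at this node (a leaf has depth 1)."""
        if not self.children:
            return 1
        return 1 + max(child.depth() for child in self.children)

# Whole image = background + foreground; the foreground holds two flower regions.
flower_a = RegionNode([0.9, 0.1])
flower_b = RegionNode([0.8, 0.2])
foreground = RegionNode([0.7, 0.3], [flower_a, flower_b])
background = RegionNode([0.1, 0.9])
image = RegionNode([0.5, 0.5], [background, foreground])

print(image.depth())  # → 3
```

The depth of such trees is exactly the quantity that makes gradient-based learning difficult, as discussed below.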
Neural networks (NNs) for adaptive processing of data structures are of paramount importance for structural pattern recognition and classification [1]. The main motivation of this adaptive processing is that neural networks are able to classify static information or temporal sequences and to perform automatic inferring or learning [2], [3]. Sperduti and Starita proposed supervised neural networks for the classification of data structures [4]. This approach is based on using generalized recursive neurons [1], [5]. Most recently, some advances in this area have been presented and some preliminary results have been obtained [6], [7], [8]. The basic idea of a learning algorithm for this processing is to extend the Backpropagation Through Time (BPTT) algorithm [9] to encode data structures by recursive neurons. The term recursive neurons means that a copy of the same neural network is used to encode every node of the tree structure. In the BPTT algorithm, the gradients of the weights to be updated can be computed by backpropagating the error through the time sequence. Similarly, if learning is performed on a data structure such as a directed acyclic graph (DAG), the gradients can be computed by backpropagating the error through the data structures, which is known as the Backpropagation Through Structure (BPTS) algorithm [5]. However, this gradient-based learning algorithm has several shortcomings. First, the rate of convergence is slow, so that the learning process cannot be guaranteed to complete within a reasonable time for most complex problems. Although the algorithm can be accelerated simply by using a larger learning rate, this would probably introduce oscillation and might result in a failure to find an optimal solution.
S.-Y. Cho is with the Division of Computing Systems, School of Computer Engineering, Nanyang Technological University, 50 Nanyang Ave., Singapore 639798. E-mail: assycho@ntu.edu.sg.
Z. Chi is with the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. E-mail: enzheru@polyu.edu.hk.
Manuscript received 1 July 2003; revised 1 Jan. 2004; accepted 19 Apr. 2004; published online 17 Dec. 2004.
For information on obtaining reprints of this article, please send e-mail to: tkde@computer.org, and reference IEEECS Log Number TKDE-0109-0703.
Second, gradient-based algorithms are usually prone to local minima [10]. From a theoretical point of view, we believe that gradient-based learning is not very reliable for the rather complex error surfaces formulated in the data structure processing. Third, it is extremely difficult for the gradient-based BPTS algorithm to learn a very deep tree structure because of the problem of long-term dependency [11], [12]. Indeed, the gradient contribution disappears at a certain tree level when the error backpropagates through a deep tree structure (i.e., the learning information is latched). This is because the decreasing gradient terms tend to zero, since the backpropagating error is recursively multiplied by the derivative (between 0 and 1) of the sigmoid function in each neural node. This results in convergence stalling and yields poor generalization.
In view of the rather complex error surfaces formulated by the adaptive processing of data structures, we need more sophisticated learning schemes to replace the gradient-based algorithm so as to avoid the learning converging to a suboptimal solution. In our study, a Genetic-based Neural Network Processing of Data Structures (GNNPoDS) is developed to solve the problems of long-term dependency and local minima. The Genetic Algorithm (GA), or Evolutionary Computing (EC) [13], [14], [15], is a computational model inspired by population genetics. It has been used mainly as a function optimizer and has been demonstrated to be effective in global optimization. Also, the GA has been successfully applied to many multiobjective optimizations. Genetic evolution learning for NNs [16], [17] has been introduced to perform a global exploration of the search space, thus avoiding the problem of stagnation that is characteristic of local search procedures. There are a number of different ways to implement a GA, as the genetic operations can be chosen in various combinations.
In evolving the parameters of our proposed NN processing, the usual approach is to code the NN as a string obtained by concatenating the parameter values one after another. The structure of the strings corresponds to the parameters to be learned and may vary depending on how we impose a certain fitness criterion. In our study, two string structures are proposed. The first one is called the "whole-in-one" structure: each parameter is represented by a 12-bit code and all parameters are arranged into one long string. A simple fitness criterion based on the error between the target and the output values can be applied to this kind of string structure, but the problem lies in the slow convergence because the dimension of the strings is large. As the string is not a simple chain like a DNA structure, but rather takes a multidimensional form, performing crossover becomes a rather complicated issue. A simple single-point crossover is not applicable for this structure; rather, a window crossover is suitable, in which a fixed window size of crossover segments is optimized. The second string structure is called the "4-parallel" structure: each parameter in four groups is represented by a 12-bit code and all parameters are arranged into four parametric matrices, each of which is dealt with independently in the neural network processing of data structures. It is a much faster approach compared with the "whole-in-one" structure, but a correlation among different groups of parameters to be learned may not be imposed directly by a fitness evaluation based only on the error between the target and output values. Therefore, introducing an appropriate fitness function is an important issue. Among the many different kinds of encoding schemes available, binary encoding is applied because of its simplicity. The mutation and crossover sizes (i.e., the window size in the "whole-in-one" structure) are determined and adjusted according to the best fitness among the population, which improves the GA convergence. Our proposed GA-based NN processing of data structures is evaluated on flower image classification [18]. In this application, semantic image contents are represented by a tree-structure representation in which the algorithm can characterize the image features at multiple levels, which benefits image classification using a small number of simple features. Experimental results illustrate that our proposed algorithm enhances the learning performance significantly in terms of the quality of solution and the avoidance of the long-term dependency problem in the adaptive processing of data structures.

This paper is organized as follows: The basic idea of the neural network processing of data structures is presented in Section 2. A discussion on the problem of long-term dependency for this processing is also given in this section. Section 3 presents the genetic evolution of the proposed neural network processing. Section 4 describes the method of generating the flower image representation by means of the tree structure and illustrates the working principle of this proposed application. Section 5 gives the simulation results and discussion of our study. Finally, a conclusion is drawn in Section 6.
2 NEURAL NETWORK PROCESSING OF DATA
STRUCTURES (NNPODS)
In this paper, the problem of devising neural network architectures and learning algorithms for the adaptive processing of data structures is addressed in the context of the classification of structured patterns. The encoding method
Fig. 1. A tree representation of a flower image.
by recursive neural networks is based on and modified from the research works of [1], [4]. We consider a structured domain D of all graphs (the tree is a special case of the graph). In the following discussion, we will use either graph or tree where appropriate. G is a learning set representing the task of the adaptive processing of data structures. This representation by the recursive neural network is shown in Fig. 2.
As shown in Fig. 2, a copy of the same neural network (shown on the right side of Fig. 2b) is used to encode every node in the graph G. Such an encoding scheme is flexible enough to allow the model to deal with DAGs of different internal structures and with different numbers of nodes. Moreover, the model can also naturally integrate structural information into its processing. In the Directed Acyclic Graph (DAG) shown in Fig. 2a, the operation is run forward for each graph, i.e., from the terminal nodes (N3 and N4) to the root node (N1). The maximum number of children for a node (i.e., the maximum branch factor c) is predefined for a task domain. For instance, a binary tree (each node has two children only) has a maximum branch factor c equal to two. At the terminal nodes, there are no inputs from children. Therefore, the terminal nodes are known as frontier nodes. The forward recall is in the direction from the frontier nodes to the root in a bottom-up fashion. The bottom-up processing from a child node to its parent node can be represented by the operator q_k^{-1}, which takes the output of the kth child of the current node. This operator is similar to the shift operator used in the time-series representation. Thus, the recursive network for the structural processing is formed as

x = F_n(A q^{-1} y + B u),   (1)
y = F_p(C x + D u),   (2)

where x, u, and y are the n-dimensional output vector of the hidden-layer neurons, the m-dimensional vector of input attributes, and the p-dimensional output vector of the network, respectively. The notation q^{-1} y indicates that the input to a node is taken from its children, so that

q^{-1} y = (q_1^{-1} y, q_2^{-1} y, ..., q_c^{-1} y)^T.

The parametric matrix A is defined as A = (A_1, A_2, ..., A_c), where c denotes the maximum number of children in the trees and each A_k, k = 1, 2, ..., c, is an n x p matrix. The vector-valued activation function F_n(.) applies the sigmoidal function f elementwise:

F_n(a) = (f(a_1), f(a_2), ..., f(a_n))^T,   with f(a) = 1/(1 + e^{-a}).
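The bottom-up recall of (1) and (2) can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation; it assumes sigmoidal activations throughout and pads missing child outputs with zero vectors at the frontier nodes (which the paper describes as having no inputs from children).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(node, A, B, C, D, c, p):
    """Bottom-up recall of (1)-(2): x = F_n(A q^{-1}y + B u), y = F_p(C x + D u).

    `node` is a dict {"u": attributes, "children": [...]}.  Frontier nodes
    contribute zero vectors in place of missing children (an assumption here).
    """
    child_ys = [forward(ch, A, B, C, D, c, p) for ch in node["children"]]
    child_ys += [np.zeros(p)] * (c - len(child_ys))   # pad to branch factor c
    qy = np.concatenate(child_ys)                     # stacked child outputs
    u = np.asarray(node["u"], dtype=float)
    x = sigmoid(A @ qy + B @ u)                       # hidden state of the node
    return sigmoid(C @ x + D @ u)                     # node output y

# Toy binary tree (c = 2) with n = 3 hidden neurons, m = 2 attributes, p = 2 outputs.
rng = np.random.default_rng(0)
n, m, p, c = 3, 2, 2, 2
A, B = rng.normal(size=(n, c * p)), rng.normal(size=(n, m))
C, D = rng.normal(size=(p, n)), rng.normal(size=(p, m))
leaf = {"u": [0.2, 0.8], "children": []}
root = {"u": [0.5, 0.5], "children": [leaf, leaf]}
y_root = forward(root, A, B, C, D, c, p)
print(y_root.shape)  # → (2,)
```

Note how the same parameter matrices A, B, C, and D are reused at every node, which is exactly the "copy of the same neural network" encoding described above.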
Learning Algorithm
In accordance with the research work by Hammer and Sperschneider [19], based on the theory of the universal approximation of the recursive neural network, a single hidden layer is sufficient to approximate any complicated mapping problems. The input-output learning task can be defined by estimating the parameters A, B, C, and D in the parameterization from a set of training (input-output) examples. Each input-output example can be formed as a tree data structure consisting of a number of nodes with their inputs and target outputs. Each node's inputs are described by a set of attributes u. The target output is denoted by t, where t is a p-dimensional vector. So, the cost function is defined as a total sum-squared-error function:
Fig. 2. An illustration of a data structure with its nodes encoded by a single-hidden-layer neural network. (a) A Directed Acyclic Graph (DAG) and (b) the encoded DAG.
J = (1/2) Σ_{i=1}^{N_T} (t_i − y_i^R)^T (t_i − y_i^R),

where N_T is the number of training examples and y_i^R denotes the network output at the root node of the ith example. Note that, in the case of structural learning processing, it is often assumed that the attributes, u, are available at each node of the tree. The main step in the learning algorithm involves the following gradient learning step:

θ(k + 1) = θ(k) − η (∂J/∂θ)|_{θ = θ(k)},   θ ∈ {A, B, C, D},

where η is the learning rate and (∂J/∂θ)|_{θ = θ(k)} is the partial derivative of the cost function with respect to the parameters, evaluated at θ = θ(k). The learning algorithm involves the evaluation of the partial derivative of the cost function with respect to the parameters in each node. Thus, the general form of the derivatives of the cost function with respect to the parameters is given by:

∂J/∂θ = −Σ_{i=1}^{N_T} (t_i − y_i^R)^T (∂y_i^R/∂θ),

where the derivative of the nonlinear activation function is defined as an n-dimensional vector that is a function of the derivative of x with respect to the parameters. It can be evaluated recursively: each node repeats the same computation, so that the evaluation depends on the structure of the tree. This is called either the folding architecture algorithm [5] or the backpropagation through structure algorithm [4].
In the formulation of the learning structural processing task, it is not required to assume a priori knowledge of any data structures or any a priori information concerning the internal structures. However, we need to assume that the maximum number of children for each node in the tree is predefined. The parameterization of the structural processing problem is said to be an overparameterization if the predefined maximum number of children is much greater than that of the real trees, i.e., there are more redundant parameters in the recursive network than required to describe the behavior of the tree. The overparameterization may give rise to the problem of local minima in the BPTS learning algorithm. Moreover, the long-term dependency problem may also affect the learning performance of the BPTS approach, due to the vanishing gradient information in learning deep trees. The learning information may disappear at a certain level of the tree before it reaches the frontier nodes, so that the convergence of the BPTS stalls and poor generalization results. A detailed analysis of this problem is given in the next section.
For backpropagation learning of multilayer perceptron (MLP) networks, it is well known that if there are too many hidden layers, the parameters at very deep layers are not updated. This is because backpropagating errors are multiplied by the derivative of the sigmoidal function, which is between 0 and 1, and hence the gradient for very deep layers could become very small. Bengio et al. [11] and Hochreiter and Schmidhuber [20] have analytically explained why backpropagation learning problems with long-term dependency are difficult. They stated that the recurrent MLP network is able to robustly store information for an application of long temporal sequences when the states of the network stay within the vicinity of a hyperbolic attractor, i.e., the eigenvalues of the Jacobian are within the unit circle. However, Bengio et al. have shown that if its eigenvalues are inside the unit circle, then the Jacobian at each time step is an exponentially decreasing function. This implies that the portion of the gradients becomes insignificant. This behavior is called the effect of vanishing gradient or forgetting behavior [11]. In this section, we briefly describe some of the key aspects of the long-term dependency problem in learning in the processing of data structures. The gradient-based learning algorithm updates the set of parameters θ of the node representation defined in (1) and (2), with gradient

∂J/∂θ = (∂J/∂θ_1, ∂J/∂θ_2, ..., ∂J/∂θ_n)^T.

By using the chain rule, the gradient can be expressed as:

∂J/∂θ = −Σ_{i=1}^{N_T} (t_i − y_i^R)^T (∂y_i^R/∂θ).

If we assume that computing the partial gradient with respect to the parameters of the node representation at different levels of a tree is independent, the total gradient is then equal to the sum of these partial gradients:

∂J/∂θ = −Σ_{i=1}^{N_T} (t_i − y_i^R)^T ∇_{x^R} y_i^R ( Σ_{l=1}^{R} J_{x^R, x^l} ∇_θ x^l ),

where J_{x^R, x^l} = ∂x^R/∂x^l denotes the Jacobian of the hidden states propagated from level R (the root node) back to level l. Based on the idea of Bengio et al. [11], the norm of J_{x^R, x^l} is an exponentially decreasing function of the number of levels traversed, since the backpropagating error is multiplied by the derivative of the sigmoidal function (between 0 and 1) at every level. Hence, the portion of the gradient contributed at the deep levels of a tree is insignificant compared to the portion at the upper levels of trees. The effect of vanishing gradients is the main reason why the BPTS algorithm is not sufficiently reliable for discovering the relationships between desired outputs and inputs, which we term the problem of long-term dependency. Therefore, we propose a genetic evolution method to avoid this effect of vanishing gradients in the BPTS algorithm, so that the evaluation for updating the parameters becomes more robust for deep tree structures.
3 GENETIC EVOLUTION FOR PROCESSING OF DATA STRUCTURES
The genetic evolution neural network introduces an adaptive and global approach to learning, especially in the reinforcement learning and recurrent neural network learning paradigms, where gradient-based learning often experiences great difficulties in finding the optimal solution [16], [17]. This section presents the use of the genetic algorithm for evolving the neural network processing of data structures. In our study, the major objective is to determine the parameters of the recursive network in (1) and (2) over the whole data structures. Our proposed genetic approach consists of two major considerations. The first one is the string representation of the parameters, i.e., either in the form of the "whole-in-one" or the "4-parallel" structure. These two string representations will be discussed in the next section. Based on these two different string structures, the objective function for the fitness criterion is the other main consideration. Different string representations and objective functions can lead to quite different learning performance. A typical cycle of the evolution of learning parameters is shown in Fig. 3. The evolution terminates when the fitness is greater than a predefined value (i.e., the objective function reaches the stopping criterion) or the population has converged.
The genetic algorithm uses binary strings, often termed chromosomes, to encode alternative solutions. In such a representation scheme, each parameter is represented by a number of bits of a certain length. The recursive neural network is encoded by concatenating all the parameters in the chromosome. Basically, the merits of the binary representation lie in its simplicity and generality. It is straightforward to apply the classical crossover (such as single-point or multipoint crossover) and mutation to binary strings. There are several encoding methods (such as uniform, gray, or exponential) that can be used in the binary representation. The gray code is adopted in our study to alleviate the Hamming distance problem. It ensures that the codes for adjacent integers always have a Hamming distance of one, so that the Hamming distance does not increase monotonously with the difference in integer values. In the string structure representation, a proper string structure for the GA operations is selected depending on the fitness evaluation. One simple way is a "whole-in-one" structure in which all parameters are encoded into one long string.
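The Gray-code property mentioned above is easy to verify in code. This is a minimal sketch (function names are ours): adjacent integers always differ in exactly one bit of their Gray codes, even across power-of-two boundaries where plain binary flips many bits at once.

```python
def binary_to_gray(b: int) -> int:
    """Gray code of integer b: adjacent integers differ in exactly one bit."""
    return b ^ (b >> 1)

def gray_to_binary(g: int) -> int:
    """Inverse mapping: XOR-fold the Gray code back into plain binary."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# 7 (0111) and 8 (1000) differ in 4 bits in plain binary, but their Gray
# codes 0100 and 1100 differ in only one bit.
print(bin(binary_to_gray(7)), bin(binary_to_gray(8)))  # → 0b100 0b1100
print(gray_to_binary(binary_to_gray(42)))              # → 42
```

This is why Gray encoding makes small genotype changes (single-bit mutations) correspond to small parameter changes, which plain binary encoding does not guarantee.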
The encoding for the “whole-in-one” structure is simple and the objective function is simply evaluated by the error between the target and the root output values of data
Fig. 3. The genetic evolution cycle for the neural network processing of data structures.
structures. But the dimension may be very high, so that the GA operations may be inefficient. Moreover, this "whole-in-one" structure representation has the permutation problem. It is caused by the many-to-one mapping from the chromosome representation to the recursive neural network, since two different networks can have equivalent functions but different chromosomes. This permutation problem makes the crossover operator very inefficient and ineffective in producing good offspring. Thus, another string structure representation, called the "4-parallel" structure, is used to overcome the above problem. The GA process becomes efficient when we apply it over each group of parameters individually. It is possible to perform a separate GA process on each group of parameters in parallel, but the limitation lies in its inability to enforce the correlation constraints among the learning parameters of each node. The objective function is essentially designed for this "4-parallel" string structure so as to evaluate the fitness criteria for the GA operations of structural processing. From (1) and (2), the recursive network for the structure processing is rewritten in matrix form with the four parametric matrices A, B, C, and D. The four matrices can be encoded into one binary string for the "whole-in-one" structure. A very long chromosome is formed as:

chromosome(A, B, C, D) := {00100 ... 0000110},   with d = n(cp) + nm + pn + pm.   (15)

On the other hand, for the "4-parallel" structure, four separate chromosomes for A, B, C, and D are formed. Note that d represents the number of parameters to be learned, so that the total size of this chromosome is d times the number of encoding bits.
The genetic algorithm with the arithmetic crossover and nonuniform mutation is employed to optimize the parameters in the neural processing of data structures. The objective function is defined as the mean-squared error between the desired output and the network output at the root node:

E_a = (1/N_T) Σ_{i=1}^{N_T} (t_i − y_i^R)^T (t_i − y_i^R),

where t_i and y_i^R denote the desired output and the real output at the root node, respectively. For the GA operations, the objective is to maximize the fitness value by setting the chromosome to find the optimal solution. In order to perform operations in the "whole-in-one" structure representation, the fitness evaluation can simply be defined as

fitness = 1/(1 + sqrt(E_a)).
Basically, the above fitness applies to the "whole-in-one" structure but cannot be applied directly to the "4-parallel" string structure. The objective function for the "4-parallel" string representation is evaluated as follows: Let an error function e_i(θ) be expanded by a first-order Taylor series as

e_i(θ + Δθ) ≈ e_i(θ) + (∂e_i/∂θ) Δθ,   (19)

where θ denotes the learning parameters (A, B, C, D) of the proposed processing and, so,

∂e_i/∂θ = (∂e_i/∂A, ∂e_i/∂B, ∂e_i/∂C, ∂e_i/∂D).   (20)

Therefore, (19) becomes:

e_i(θ + Δθ) ≈ e_i(θ) − (∂y_i/∂θ) Δθ.   (21)

In (21), the first term is the initial error term, while the second term can be regarded as a smoothness constraint given by the output derivatives with respect to the learning parameters. Thus, the objective function of this constraint becomes

E_b = (1/N_T) Σ_{i=1}^{N_T} ||∂y_i^R/∂θ||²,   (22)

and the fitness evaluation for the "4-parallel" string structure representation is thus determined as

fitness = λ/(1 + sqrt(E_a)) + (1 − λ)/(1 + E_b),   (23)

where λ is a constant and (1 − λ) weights the smoothness constraint. It is noted that the range of the above fitness evaluation is within [0, 1]. This smoothness constraint is a tradeoff between the ability of the GA convergence and the correlation among the four groups of parameters. In our study, we empirically set λ = 0.9.
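The two fitness evaluations above can be sketched as follows. This is a hedged reading, not the authors' code: the "4-parallel" form assumes the λ-weighted combination of the root-error term and the smoothness term, with both mapped into (0, 1].

```python
import math

def fitness_whole_in_one(E_a: float) -> float:
    """Fitness for the "whole-in-one" string: 1 / (1 + sqrt(E_a))."""
    return 1.0 / (1.0 + math.sqrt(E_a))

def fitness_4_parallel(E_a: float, E_b: float, lam: float = 0.9) -> float:
    """One possible reading of the "4-parallel" fitness: a lambda-weighted
    mix of the error term and the smoothness-constraint term (lam = 0.9 is
    the paper's empirical setting)."""
    return lam / (1.0 + math.sqrt(E_a)) + (1.0 - lam) / (1.0 + E_b)

print(fitness_whole_in_one(0.0))                    # → 1.0 (perfect fit)
print(0.0 < fitness_4_parallel(0.25, 0.1) <= 1.0)   # → True
```

Both functions are monotonically decreasing in their error arguments, so maximizing fitness drives the root error (and, for the "4-parallel" form, the output sensitivity) toward zero.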
Chromosomes in the population are selected for the generation of new chromosomes by a selection scheme. It is expected that a better chromosome will generate a larger number of offspring and has a higher chance of surviving into the subsequent generation. The well-known Roulette Wheel Selection [21] is used as the selection mechanism. Each chromosome in the population is associated with a sector in a virtual wheel. According to the fitness value of the chromosome, which is proportional to the area of the sector, a chromosome that has a higher fitness value will occupy a larger sector, while a lower value takes the slot of a smaller sector. The selection rate of chromosome s is determined by the ratio of its fitness value to F, the sum of the fitness values of all chromosomes in the population. In our study, the selection rate is predefined such that a chromosome is selected if its rate is equal to or smaller than the predefined rate, which is set to 0.6.
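The roulette-wheel mechanism can be sketched in a few lines. This is an illustrative sketch (function and variable names are ours): each chromosome occupies a wheel sector proportional to its fitness, so a random spin lands on fitter chromosomes more often.

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Roulette Wheel Selection: pick one chromosome with probability
    proportional to its share of the total fitness."""
    total = sum(fitnesses)
    spin = rng.uniform(0.0, total)
    acc = 0.0
    for chromosome, fit in zip(population, fitnesses):
        acc += fit
        if spin <= acc:
            return chromosome
    return population[-1]   # guard against floating-point round-off

pop = ["s1", "s2", "s3"]
fit = [0.1, 0.3, 0.6]
rng = random.Random(7)
picks = [roulette_select(pop, fit, rng) for _ in range(10_000)]
print(picks.count("s3") > picks.count("s1"))  # → True
```

Over many spins, "s3" (60% of the total fitness) is selected roughly six times as often as "s1", which is exactly the sector-area behavior described above.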
Another selection criterion for the chromosome may be considered through the constant λ in the fitness function (23), as follows. Assume that at least one chromosome s_j has been successfully generated such that E_a(s_j) > 0 and sqrt(E_a(s_j)) >> E_b(s_j); the fitness evaluation of s_j is then dominated by the term 1/(1 + sqrt(E_a(s_j))). Hence, λ is selected to ensure that fitness(s_j) < fitness(s_i) for any chromosome s_i with a smaller root error, so that the smoothness term cannot outweigh a genuine reduction of the root error. Our empirical study defines the constant value of λ to satisfy this criterion. To sum up, a test chromosome s_test is accepted when its error terms E_a(s_test) and E_b(s_test) keep the resulting fitness below 0.9.
There are several ways to implement the crossover operation, depending on the chromosome structure. The single-point crossover is appropriate for the "4-parallel" structure, but it is not applicable for the "whole-in-one" structure because of its high dimension. It is more appropriate to implement the window crossover for the "whole-in-one" encoding, where the crossover point and the size of the window are taken within a valid range. Basically, the point crossover operation is performed once the probability test has passed (i.e., a random number is smaller than the crossover rate). Besides, the crossover window size is determined by the best fitness among the population. The idea is that the window size is forced to decrease as the square of the best fitness value increases. So, the window size is:

W_size = (N_bit − N_crossover)(1 − fitness_best²),   (29)

where N_bit is the total number of bits and N_crossover is the crossover point within
the chromosome. The crossover operation of this "whole-in-one" structure is illustrated in Fig. 4. The parents are separated into two portions by a randomly defined crossover point, and the size of the portions is determined by (29). The new chromosome is then formed by combining the shaded portions of the two parents, as indicated in Fig. 4. For the other chromosome structure, the "4-parallel" structure, since its size is smaller than that of the "whole-in-one" structure, the single-point crossover operation can be applied directly. There are four crossover rates to be assigned to the four groups of parameters, so that if a random number is smaller than the probability, the new chromosome is mated from the first portion of parent 1 and the last portion of parent 2. The crossover operation for this "4-parallel" structure is shown in Fig. 5.
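Both crossover variants can be sketched on bit strings. This is an illustrative sketch, not the authors' implementation; in particular, it assumes N_crossover in (29) is the randomly chosen window start, so the window shrinks as the best fitness approaches 1.

```python
import random

def window_crossover(p1: str, p2: str, best_fitness: float, rng=random) -> str:
    """Window crossover for the long "whole-in-one" string (Fig. 4): copy a
    window from parent 2 into parent 1; the window size follows (29) and
    shrinks as the square of the best fitness grows."""
    n_bit = len(p1)
    start = rng.randrange(n_bit)                              # crossover point
    w = max(1, int((n_bit - start) * (1.0 - best_fitness ** 2)))
    return p1[:start] + p2[start:start + w] + p1[start + w:]

def single_point_crossover(p1: str, p2: str, rng=random) -> str:
    """Single-point crossover for each group of the "4-parallel" structure
    (Fig. 5): first portion of parent 1, last portion of parent 2."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:]

a, b = "0" * 12, "1" * 12
child = single_point_crossover(a, b, random.Random(0))
print(len(child) == 12 and "0" in child and "1" in child)  # → True
```

In a full GA loop, `window_crossover` would be applied once per generation to the "whole-in-one" string, while `single_point_crossover` would run independently on each of the four parameter-group strings.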
Mutation introduces variations of the model parameters into the chromosomes. It provides a global searching capability for the GA by randomly altering the values of strings in the chromosomes. Bit mutation is applied to the above two chromosome structures in the form of bit strings. A mutation with a small probability (typically between 0.01 and 0.05) occurs, which alters the value of a string bit so as to introduce variations into the chromosome. A bit is flipped if a probability test is satisfied.
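The bit-mutation operator above amounts to an independent per-bit probability test. This sketch (an illustrative assumption about how the test is applied per bit) flips each bit with a small mutation rate:

```python
import random

def bit_mutate(chromosome: str, rate: float = 0.02, rng=random) -> str:
    """Bit mutation: each bit is flipped independently with a small
    probability (typically between 0.01 and 0.05)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )

mutant = bit_mutate("0" * 1000, rate=0.05, rng=random.Random(1))
print(set(mutant) <= {"0", "1"} and len(mutant) == 1000)  # → True
```

With Gray-coded parameters (as adopted above), each such single-bit flip corresponds to a small step in parameter space, which is what gives the mutation its gentle global-search behavior.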
4 STRUCTURE-BASED FLOWER IMAGE
CLASSIFICATION
Flower classification is a very challenging problem and will find a wide range of applications, including live plant resource and data management and education on flower taxonomy [18]. There are 250,000 named species of flowering plants, and many plant species have not yet been classified and named. In fact, flower classification or plant identification is a very demanding and time-consuming task, which has mainly been carried out by taxonomists/botanists. A significant improvement can be expected if the flower classification can be carried out by a machine-learning model with the aid of image processing and computer vision techniques. Machine learning-based flower classification from color images is, and will continue to be, one of the most difficult tasks in computer vision due to the lack of proper models or representations, the large number of biological variations that a species of flowers can take, and imprecise or ambiguous image preprocessing results. Also, there are still many problems in accurately locating flower regions when the background is complex. This is due to the complex structure and the nature of 3D objects, which adds another dimension of difficulty in modeling. Flowers can basically be characterized by color, shape, and texture. Color is a main feature that can be used to differentiate flowers from the background, including leaves, stems, shadows, soils, etc. Color-based domain knowledge can be
adopted to delete pixels that do not belong to flower regions. Das et al. [27] proposed an iterative segmentation algorithm with a knowledge-driven mechanism to extract flower regions from the background. Van der Heijden and Vossepoel proposed a general contour-oriented shape dissimilarity measure for a comparison of flowers of potato species [28]. In another study, a feature extraction and learning approach was developed by Saitoh and Kaneko for recognizing 16 wild flowers [29]. Four flower features together with two leaf features were used as the input for training the neural network flower classifier. Quite good performance was achieved by their holistic approach. However, the approach can only handle a single flower orientation to classify the corresponding category. It cannot be directly extended to several different flower orientations within the same species (i.e., flowers of the same species but in different orientations and colors).
Image content representation has been a popular research topic in various image processing applications for the past few years. Most of the approaches represent the image content using only low-level visual features, either globally or locally. It is noted that global features (such as Fourier descriptors or wavelet domain descriptors) cannot characterize the image contents accurately by their spatial relationships, whereas local features (such as color, shape, or spatial texture) depend on error-prone segmentation results. In this study, we consider a region-based representation called the binary tree [22], [23], [24]. The construction of the image representation is based on the extraction of the relevant regions in the image. This is typically obtained by a region-based segmentation in which the algorithm extracts the interesting regions of flower images based on a color clustering technique in order to simulate human visual perception [30]. Once the regions of
Fig. 4. Window crossover operation for the "whole-in-one" structure.
Fig. 5. Parallel crossover operation for the "4-parallel" string structure.
interest have been extracted, a node is added to the graph for each of these regions. Relevant regions describing the objects can be merged together based on a merging strategy. Binary trees can be formed as a semantic representation whose nodes correspond to the regions of the flower image and whose arcs represent the relationships among regions. Besides the extraction of the structure, a vector of real-valued attributes is computed to describe the image region associated with each node. The features include color information, shading/contrast properties, and invariant shape characteristics. The following sections describe how to construct the binary tree representation for flower images.
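The binary tree described above can be sketched as a small data structure. The class and field names below are illustrative assumptions for exposition, not the authors' implementation:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RegionNode:
    """One node of the binary-tree image representation: a segmented
    region (leaf) or a merged pair of regions (internal node), carrying
    a real-valued attribute vector describing color, shading/contrast,
    and invariant shape characteristics."""
    features: List[float]                 # region attribute vector
    left: Optional["RegionNode"] = None   # first merged sub-region
    right: Optional["RegionNode"] = None  # second merged sub-region

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    def depth(self) -> int:
        """Tree depth; very deep trees are where BPTS-style learning
        suffers from the long-term dependency problem."""
        if self.is_leaf():
            return 1
        children = [c for c in (self.left, self.right) if c is not None]
        return 1 + max(c.depth() for c in children)

# A minimal tree: root = whole image, children = background/foreground.
background = RegionNode(features=[0.1, 0.2, 0.3])
foreground = RegionNode(features=[0.8, 0.7, 0.6])
root = RegionNode(features=[0.5, 0.5, 0.5], left=background, right=foreground)
```

A real tree is grown bottom-up by the merging strategy of Section 4, with the root's two children corresponding to the background and foreground regions.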
Fig. 6 illustrates the system architecture of the structure-based flower image classification. In the learning phase, a set of binary tree patterns representing flower images of different families was generated by the combined processes of segmentation, merging strategy, and feature extraction. All these tree patterns were used for training the model by our proposed genetic evolution processing of data structures. In the classification phase, a query image is classified automatically by the trained neural network, in which the binary tree is generated by the same processes used for generating the learning examples.
A color image is usually given by R (red), G (green), and B (blue) values at every pixel. However, the difficulty with the RGB color model is that it produces color components that do not closely follow those of the human visual system. A better color model produces color components that follow the understanding of color by H (hue), S (saturation), and I (intensity or luminance) [25]. Of these three components, the hue is considered a key component in human perception. However, the HSI color model has several limitations. First, the model gives equal weighting to the RGB components when computing the intensity or luminance of an image. This does not correspond with the brightness of a color as perceived by the eye. The second is that the length of the maximum saturation vector varies depending on the hue of the color. Therefore, from the color clustering point of view, it is desirable that the image be represented by color features which constitute a space possessing uniform characteristics, such as the cube-root color system. This system gives good results in segmenting color images. It is obtained by transforming the (R, G, B) values into the (X, Y, Z) space, which is further converted to a cube-root system. The transformation is shown below:
$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 2.7690 & 1.7518 & 1.1300 \\ 1.0000 & 4.5907 & 0.0601 \\ 0.0000 & 0.0565 & 5.5943 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \qquad (30)$$

The (X, Y, Z) values are then converted to the cube-root system:

$$L = 116\,(Y/Y_0)^{1/3} - 16, \quad a = 500\big[(X/X_0)^{1/3} - (Y/Y_0)^{1/3}\big], \quad b = 200\big[(Y/Y_0)^{1/3} - (Z/Z_0)^{1/3}\big] \qquad (31)$$

where $X_0$, $Y_0$, and $Z_0$ are the tristimulus values of the reference white color (i.e., of 255 for the 8-bit gray-scale image). Thus, the cube-root system yields a simpler decision surface in accordance with human color perception. The hue and chroma are given by
Fig. 6. System architecture of the flower classification.
$$\text{hue:}\ H = \tan^{-1}\!\left(\frac{b}{a}\right), \qquad \text{chroma:}\ C = \sqrt{(a)^2 + (b)^2}.$$
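As a hedged sketch of the color conversion pipeline, the following assumes the classical CIE RGB-to-XYZ coefficients and a CIELAB-style cube-root formula with the reference white taken as the transform of (255, 255, 255); the paper's exact coefficients may differ:

```python
import math

# Classical CIE 1931 RGB -> XYZ transform (assumed coefficients).
M = [[2.7690, 1.7518, 1.1300],
     [1.0000, 4.5907, 0.0601],
     [0.0000, 0.0565, 5.5943]]

def rgb_to_xyz(r, g, b):
    return tuple(m[0] * r + m[1] * g + m[2] * b for m in M)

# Reference white: tristimulus values of the maximum gray level 255.
X0, Y0, Z0 = rgb_to_xyz(255.0, 255.0, 255.0)

def rgb_to_lhc(r, g, b):
    """(R, G, B) -> (lightness, hue, chroma) via the cube-root system."""
    x, y, z = rgb_to_xyz(r, g, b)
    fx = (x / X0) ** (1 / 3)
    fy = (y / Y0) ** (1 / 3)
    fz = (z / Z0) ** (1 / 3)
    L = 116.0 * fy - 16.0            # lightness
    a = 500.0 * (fx - fy)            # red-green axis
    b_ = 200.0 * (fy - fz)           # yellow-blue axis
    H = math.atan2(b_, a)            # hue, H = tan^-1(b/a)
    C = math.sqrt(a * a + b_ * b_)   # chroma
    return L, H, C
```

For a pure white pixel this yields L = 100 and C = 0, and any gray pixel likewise has (near-)zero chroma, which is why the chroma channel separates colored flower regions from achromatic background.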
The proposed segmentation uses the Euclidean distance to measure the similarity between the selected cluster and the image pixels within the above cube-root system. The first step of our method is to convert the RGB components into the lightness-hue-chroma channel based on (30) and (31). The Euclidean distance between each cluster centroid and the image pixels within the lightness-hue-chroma channel is given as:
$$D_i(x, y) = \sqrt{\big(L(x, y) - L_i\big)^2 + \big(H(x, y) - H_i\big)^2 + \big(C(x, y) - C_i\big)^2} \qquad (32)$$

where $(L_i, H_i, C_i)$ is the centroid of the $i$th cluster and $(L(x, y), H(x, y), C(x, y))$ is the image pixel at the coordinates $x$ and $y$ within the cube-root system. For clustering the regions of interest, the k-mean clustering method [25] is used such that a pixel $(x, y)$
is identified as belonging to the background cluster $j$ if $D_j(x, y)$ is the minimum distance among all clusters. The determination of the cluster centroids is very crucial. They can be evaluated by:
$$L_i = \frac{1}{N_i} \sum_{(x, y) \in i} L(x, y), \qquad H_i = \frac{1}{N_i} \sum_{(x, y) \in i} H(x, y), \qquad C_i = \frac{1}{N_i} \sum_{(x, y) \in i} C(x, y) \qquad (33)$$

where $N_i$ is the number of pixels assigned to cluster $i$. The
number of assigned clusters is based on the number of the most dominant peaks determined by the k-mean clustering within the chroma channel. For example, Fig. 7 illustrates a flower image with a histogram of the chroma channel in which there are two most dominant peaks (i.e., clusters "a" and "b"). Thus, two clusters can be assigned: one of them should be the background cluster, whereas the other should be the foreground cluster. The segmentation results of this example image are shown in Fig. 8, in which two images (Figs. 8a and 8b) are segmented with the two cluster centroids and the corresponding flower region is extracted as shown in Fig. 8c.
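A minimal sketch of the clustering step follows: a generic k-means over (L, H, C) triples using the Euclidean distance of (32) for assignment and the centroid mean of (33) for the update. Peak-based initialization from the chroma histogram is assumed to be done beforehand:

```python
def kmeans_lhc(pixels, centroids, iters=10):
    """Assign each (L, H, C) pixel to its nearest centroid, then
    recompute each centroid as the mean of its members; repeat."""
    k = len(centroids)
    centroids = [list(c) for c in centroids]
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for n, p in enumerate(pixels):
            labels[n] = min(
                range(k),
                key=lambda j: sum((p[d] - centroids[j][d]) ** 2 for d in range(3)),
            )
        # Update step: centroid = mean of assigned pixels (cf. eq. (33)).
        for j in range(k):
            members = [p for n, p in enumerate(pixels) if labels[n] == j]
            if members:
                centroids[j] = [
                    sum(p[d] for p in members) / len(members) for d in range(3)
                ]
    return labels, centroids

# Two well-separated pixel groups -> background and foreground clusters.
pixels = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (50.0, 1.0, 40.0), (52.0, 1.0, 40.0)]
labels, cents = kmeans_lhc(pixels, centroids=[(1.0, 0.0, 0.0), (49.0, 0.0, 39.0)])
```

With the two initial centroids seeded near the dominant chroma peaks, the pixels split into labels [0, 0, 1, 1] and the centroids converge to the two group means.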
The idea of creating and processing a tree-structured image representation is an attempt to benefit from the attractive features of the segmentation results based on the method described in the previous section. In our study, we start from the terminal nodes and merge two similar neighboring regions associated with the child nodes based on their contents. This merging is iteratively operated by a recursive algorithm until only the child nodes of the root node (i.e., the background and foreground regions) remain. The following explains the proposed merging strategy to create a binary tree. Assume that the merged regions pair is denoted by $O(R_i, R_j)$. The merging criterion is based on examining the entropy of all pairs of regions to identify which one is the maximum, and the merging terminates when the last pair of regions is merged to become the entire image. At each step, the algorithm searches for the pair of most similar regions' contents, which should be the pair of child nodes linked with their parent node. The most similar regions pair is determined by maximizing the entropy:
$$O(R_i, R_j) = \arg\max_{O(R_i, R_j) \in \Omega_{i,j},\ i \neq j} M_{R_i \cup R_j} \qquad (34)$$

where the merged-region measure $M_{R_i \cup R_j}$ is
computed based on the color homogeneity of two sub-regions, which is defined as:
$$M_{R_i \cup R_j} = -\left( \sum_{k=1}^{N_{R_i}} p_k^{R_i} \log_2 p_k^{R_i} + \sum_{k=1}^{N_{R_j}} p_k^{R_j} \log_2 p_k^{R_j} \right), \quad i \neq j \qquad (35)$$
where $p_k^{R_i}$ represents the percentage of the pixels at the $k$th color in region $R_i$.
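The merging criterion of (34)-(35) can be sketched as follows. Regions are represented here simply as lists of quantized pixel colors, and the function names are illustrative, not the authors' implementation:

```python
import math
from collections import Counter

def color_entropy(region):
    """-sum_k p_k log2 p_k over the region's color distribution,
    where p_k is the fraction of pixels having the kth color."""
    n = len(region)
    return -sum((c / n) * math.log2(c / n) for c in Counter(region).values())

def best_merge(regions, candidate_pairs):
    """Among candidate neighboring pairs (i, j), pick the pair whose
    combined color entropy is maximal (cf. eq. (34))."""
    return max(
        candidate_pairs,
        key=lambda ij: color_entropy(regions[ij[0]]) + color_entropy(regions[ij[1]]),
    )

# Three regions: uniform color, two colors, four colors.
regions = [["a"] * 4, ["a", "b", "a", "b"], ["a", "b", "c", "d"]]
pair = best_merge(regions, [(0, 1), (1, 2)])  # -> (1, 2): entropies 1 + 2 = 3
```

The selected pair becomes the two child nodes of a new parent node, and the procedure repeats until the whole image is merged into the root.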
Fig. 7. (a) A single flower image example and (b) its histogram of the chroma channel.