Emphasis is given toward understanding the relationship between problem difficulty and the loss of diversity.. Frequency of ancestor occurrence nA specifically refers to the number of in
Trang 1Visualizing the Loss of Diversity in Genetic Programming
Jason M Daida, David J Ward, Adam M Hilss, Stephen L Long, Mark R Hodges, and
Jason T Kriesel
Center for the Study of Complex Systems and Space Physics Research Laboratory
2455 Hayward Avenue Ann Arbor, Michigan 48109-2143 daida@eecs.umich.edu
Abstract- This paper introduces visualization
techniques that allow for a multivariate approach in
understanding the dynamics that underlie genetic
programming (GP) Emphasis is given toward
understanding the relationship between problem
difficulty and the loss of diversity The visualizations
raise questions about diversity and problem solving
efficacy, as well as the role of the initial population in
determining solution outcomes.
I INTRODUCTION
There has been a significant amount of evolutionary
computation on the matter of diversity Part of this interest
extends well into some of the earliest investigations (e.g.,
[1-4]) Work such as these has helped to shape our current
understanding of the need for diversity in genetic and
evolutionary computation: namely, that diversity helps to
escape premature convergence It suggests that increasing
diversity could lead to more robust problem solving
However, there are a few instances that argue for a
nuanced understanding of the role of diversity in problem
solving with GP These include the following:
• Burke et al [5] systematically examined various measures
of diversity, some of which have been devised in support
of a particular diversity-promoting technique They
concluded that the relationship between diversity and run
performance is not straightforward More robust problem
solving is not necessarily a result of more diversity,
depending which definition is used
• Luke et al [6] demonstrated that dynamically decreasing
the size of a population over the course of a run might be
a computationally effective method Their idea and
subsequent method run counter to the assumption that a
large, presumably diverse population needs to be
maintained at all times It could be argued that Luke et
al.’s work shows that less diversity could lead to more
robust diversity
Unfortunately, a nuanced understanding of diversity has
been difficult to achieve This difficulty exists partly
because such studies require large amounts of data for
examination Typical diversity studies require capturing data
on each individual that is created over the course of a GP
run Even modest studies that involve relatively small
populations of 500 individuals can generate on the order of
one terabyte of data, if only because multiple trials need to
be run This difficulty exists also because the dynamics associated with GP also tends to be multivariate and not well understood There are few standard methods for conducting multivariate analyses over large data sets in which some of the key variables and interactions are unknown
In situations like these, it is subsequently common to employ data exploration methods—quantitative multivariate visualization techniques—to glean possible linkages between possible key variables and interactions Unfortunately, such visualization techniques are non-standard for more than a handful of variables and have not been available for GP
This paper builds upon the methods and results of two key papers [5, 7] that have been published on the subject of diversity We do so by incorporating newly developed techniques for data visualizing in GP Some of these techniques have been borrowed from other work (i.e., [8]) for visualizing populations of trees Other techniques have been developed specifically by us and are reported here In using these visualization techniques, we corroborate what these two papers have found and reveal for the first time some of the temporal dynamics that underlie population histories in GP
The organization of this paper is as follows Section 2 details our motivation and our subsequent definitions Section 3 summarizes the procedure that was used to generate our data Section 4 addresses methods and results for visualizing data from single runs of GP, while Section 5 addresses methods and results for multiple runs Section 6 discusses implications of the data visualized Section 7 concludes this paper
II BACKGROUND
A Motivation
This paper builds from the findings and suggestions of two studies [5, 7] that focused on understanding diversity in the context of population dynamics in GP Our motivations are described here
Both works indirectly suggest the possibility for a multivariate approach to understanding diversity McPhee and Hopper’s work suggested that population dynamics would be of interest, but their analysis featured one-variable statistics and did not figure in time as a variable Burke et al.’s work was almost exclusively based on two-variable statistics, but they concluded that such an approach did not
Trang 2allow for them to understand the dynamics of a population.
Consequently, this paper describes quantitative multivariate
visualization techniques to identify possible linkages
between multiple variables It would serve as a precursor to
a multivariate statistical analysis
McPhee and Hopper described a rigorous method, but
replication and extension of their method has been difficult
In comparison to the other measures of diversity reported in
Burke et al.’s work, McPhee and Hopper’s method is
arguably the most rigorous One could derive other
measures of diversity from a data set generated using
McPhee and Hopper’s method, which is not possible to do
so the other way around Unfortunately, the analysis
conducted by McPhee and Hopper was based on 20-trial
statistics—it was perhaps unwieldy at the time to do more
It did not help that their experiments used non-standard GP
settings (i.e., steady-state and the use of a size
normalization threshold) Consequently, this paper used a
form of McPhee and Hopper’s method on a substantially
larger study that allows for 200-trial statistics
Burke et al partially replicated McPhee and Hopper’s
work, but not the part that tracked the history of initial
population individuals The omission is noteable, because
they were looking for early indicators for run performance
and because McPhee and Hopper indicated that only a few
initial population individuals contribute to a final
population The omission is also understandable, because
the amount of work required to do the bookkeeping of
nodes to initial population individuals is nontrivial
Nevertheless, we focused on the initial population
individuals because of their potential importance to
understanding GP dynamics
B Definitions
There are several terms that are used throughout this
paper They are all nonstandard, given that much of the
previous literature has been influenced by a Markovian type
of analysis (i.e., focus on the transition from time t i to t i+1)
In contrast, our analysis has been motivated with the intent
of using the initial population as a predictor of solution
outcomes Our terms and their associated definitions are
subsequently reflective of this particular intent These
definitions presume that standard GP is being discussed,
although extensions to other methods in evolutionary
computation are also possible
• V0 Given an initial population P0, let every node / vertex
in this population be uniquely identified and labeled to
form a set V0 For example, an initial population P0 of
500 individuals could consists of well over 14,000 nodes,
depending on the type of population initalization scheme
that was used The membership of V0 presumes that each
of these 14,000+ nodes is considered as uniquely labeled
• Ancestor An ancestor A i specifically refers to an
individual from the initial population P0 Since all nodes
in the initial population are unique, it follows that an
ancestor Ai consists of a unique set of nodes A i ⊇ V0 that
is mutually exclusive from the set of nodes A j, which
characterizes an ancestor Aj , where i ≠ j For example, a
three-node tree in P0 is presumed to be uniquely identified
by a set A = {a1, a2, a3}, given that A ⊇ V0 and the
elements a1, a2, and a3 are not a part of any other tree in
P0,
• Lineage Lineage LB for an individual B at some
generation t i , i ≠ 0, specifically refers to the set of
ancestors A that contribute to that individual Consequently,
LB≡ ∀A ∈ P{ 0:∃a ∈ A ← A and a ∈ B ← B} (1) For example, an individual B can have a lineage that consists of {A1, A2, A3} if there is some node from each
of these ancestors that exists in B
• Frequency of Ancestor Occurrence Frequency of ancestor occurrence nA specifically refers to the number of individuals in a given population that have ancestor A as part of their lineage For example, in the initial
population P0, each individual Ai has a frequency of
ancestor occurance nA = 1
• Surviving Number of Ancestors Surviving number of ancestors n s for a given population refers to the number of ancestors that contribute at least one node to that
population For example, in the initial population P0 that
is of population size M, n s = M.
III EXPERIMENTAL PROCEDURE
We analyzed GP on a particular, well documented,
tunably difficult test problem (i.e., binomial-3) The
problem been designed as a probe for understanding GP dynamics and represents an instance of data modeling
The binomial-3 is discussed in detail in [9, 10] In brief,
the problem is an instance taken from symbolic regression
and involves solving for the function f(x) = 1 + 3x + 3x2 +
x3 Fitness cases are 50 equidistant points generated from
f(x) over the interval [-1, 0) The function set is {+, –, ×,
÷}, which corresponds to arithmetic operators of addition, subtraction, multiplication, and protected division A terminal set was {X, R}, where X is the symbolic variable
and R is the set of ephemeral random constants that are
distributed uniformly over the interval [-α, α] The tuning parameter is α, which is a real number The binomial-3 can
be tuned from a relatively easy problem (e.g., α = 1) to a difficult one (e.g., α = 1000)
We used a modified version of lilgp [11] similar to that used in [9] Most of the modifications were for bug fixes and for the replacement of the random number generator with the Mersenne Twister [12] Other significant modifications included augmenting the data structure associated with each node to include an integer I D that serves as that node’s serial number Each ID is unique to a node and is generated once during population initialization
We configured lilgp to run as a single thread
The ID labeling scheme was a result of [9], but is similar
to that desccribed in [7] McPhee and Hopper’s scheme called for tagging each node in the initial population with
Trang 3integer label pairs (ID:memID) The ID part of their label is
assigned just once at population initialization and consists
of an integer that is unique to a node relative to the set of
nodes that make up the initial population (i.e., V0) ID is
used as a serial number that can be used to track individual
nodes memID is used for providing an audit trail for
subtree memberships For our work, we implemented what
amounts to just the ID portion of their integer pair
Table 1 lists the parameter settings considered in this
paper Most of the GP parameters were similar to those
mentioned in [13], Chapter 7
TABLE 1 PARAMETER SETTINGS
Initialization Method Ramped Half-and-Half
Initialization Depths 2–6 Levels
Internal Node Bias 90% internal, 10% terminals
Termination Criteria Run reaches G
Binomial-3 α 1 or 1000
We used four different experimental configurations, given
that we considered two different selection methods and two
different difficulty settings There were 200 trials taken per
configuration for a total of 800 trials
Although the number of trials is modest, what was
unusual were the data requirements specified for each trial
Each trial resulted in recording each individual in every
population that was generated during a trial for a total of
201 generations (i.e., 200 specified generations and the
initial population) The total amount of data that was
generated by these four configurations was about 0.5 TB
Post-processing of this population data was done in two
stages: the first stage reduced individuals to their
appropriate lineages and structures, while the second stage
visualized the reduced data Both stages were custom-coded:
the first stage was coded in PERL; the second stage, in
Mathematica There were 80,400,000 trees that were parsed
and analyzed in this manner (i.e., 4 configurations, 200
trials per configuration, 201 generations per trial, 500
individual trees per generation) Post-processing time was
about one CPU-month per configuration
GP trials and data reduction were run on a Linux
workstation Visualization was done on a Power Mac
IV SINGLE-TRIAL METHODS AND RESULTS
The visualization principles employed in this work were
those described by Tufte in [14-16] Tufte’s work has been
influential in the design of data graphics He is known for a
particular, minimalist style of visualization that we use in
the construction of nonstandard graphics This particular
style is not commonly used in the evolutionary
computation community For that reason, it is worthwhile
quoting several of his design principles from [14] (pp 105, 121):
• Above all else show the data
• Maximize the data-ink ratio
• Erase non-data-ink
• Erase redundant data-ink
We have used these principles to generate visualizations
of single- and multiple-trial dynamics
In comparison to the one- and two-variable techniques in [5, 7], our technique is a ten-variable visualization: population structure (three variables), lineage mapping (i.e., ancestors to individuals in a current population), frequency
of ancestor occurrence nA, rank ordering of individuals in the current generation, rank ordering of ancestors, surviving
number of ancestors n s, time, and problem difficulty
Of these ten variables, visualizing the three variables associated with population structure is based on a relatively new technique that was introduced in [8] The technique calls for each tree to be layed out on a circular grid, as shown in Figures 1 and 2
1
3
2
1 7
3 2
6 5
4
11 10 9
14
13
12
Figure 1 Mapping a full binary tree to a circular gird (a) Full binary tree
of depth 3 (b) Corresponding circular grid.
1
3
2
2
5 4
11 10 9
8
Figure 2 Mapping a partial binary tree to a circular grid (a) Binary tree
of depth 3 (b) Corresponding mapping of this tree on a circular grid.
Cumulative distributions for a population were then computed for each grid point This step is analogous to overlaying tree structures one on top of the other As a result, darker lines correspond to links that are more frequently used in a population The variables that this visualization would encode for are structure (two dimensions) and cumulative distribution One typically uses this method to see which structures are more frequently used
in a population The method is useful inasmuch as structure has been an integral part of GP theory (e.g., [17-19]) Figure 3a shows an example of visualization of a population of 500
Trang 4A visualization that shows every occupied grid point for
a population can detract away from understanding which
structures are most typically used This happens because
visible lines can be made only so thin and unintentional
occlusion takes place Consequently, we displayed only
those structures that are common to a majority of a
population Figure 3b shows the same example as given in
Figure 3a, except that only the structures that are in the
majority is shown
Figure 3 Visualization of Population Structure The data is from
generation 200 for the configuration α = 1000 and tournament selection.
(a) Visualization of a population of 500 trees Darker lines indicate
greater frequencies of those structures that appear in that population The
gray circle is for reference only and corresponds to depth level 26 (b)
Majority-only view This view shows the same data as in (a), but only for
those structures in which 50% or more of population uses are shown.
Visualizing the next five variables was accomplished by
using a modified form of a histogram The five variables
that were considered in this part of the visualization
included the following: lineage mapping (i.e., ancestors to
individuals in a current population), nA, rank ordering of
individuals in the current generation, rank ordering of
ancestors, and n s These variables address the issue of
content use at the level of individual and population We
note that structure is implicit at these levels of examination,
since individuals are identified by their nodes and nodes are
organized by their associated tree structure
Figure 4 gives an example of the modified histogram
This modified histogram consists of two parts: the upper
part, which shows the nA; and the lower part, which shows
the associated lineage mapping from ancestors to currrent
generation individuals Concerning the upper part, the x-axis
of the frequency-of-ancestor-occurrence histogram is rank ordered This means that ancestors are positionally identified from the least fit (left) to the most fit (right) The ordering of ancestors is determined just once at generation 0
and does not change during subsequent generations The
y-axis of the histogram is normalized according to population
size M, since that is the maximum value that is attainable
by any one ancestor The frequency data that is displayed is specific to the given current generation An absence of a bar
in the histogram part of this visualization means that an ancestor has not survived This absense is indicative of the complement of the variable that describes the surviving
number of ancestors n s The lower part of this modified histogram shows the associated lineage mapping from ancestors to the current generation As in the upper part, the ancestors are positionally identified from the least fit (left) to the most fit (right) The current-generation individuals are ordered likewise Figure 4 shows an instance where an ancestor at
rank position a has a lineage that maps to multiple individuals in the current generation Arrows at b and c
identify a few of those mappings The lines that link an ancestor to a current-generation individual are darker as the rank of the current-generation individual increases
Both frequency-of-ancestor-occurrence histogram and lineage mapping are correlated For example, the ancestor at
position a in Figure 4 shows a frequency of occurance that
is low Likewise, the number of mappings from that ancestor to the current-generation individuals is sparse Likewise, if the frequency of ancestor occurrence were at
maximum (i.e., M ), it would mean that parts of that
ancestor are distributed throughout the entire population Figure 5 shows the complete ten-variable visualization for an “easy” problem (which featured tournament selection and α = 1) To show the first eight variables, the visualizations for population structure and the modified histogram for content were shown in tandem for a given generation
To show the variable of time, we used the technique of small multiples (see [14, 15]) The number that is embedded in the frequency-of-ancestor-occurence part of the modified histogram corresponds to the current generation Consequently, Figure 5 shows the dynamics of population structure and content for the first 22 generations, which were
Lineage Mapping
Frequency of Ancestor Occurrence (Histogram)
Ancestors A
Current Generation Individuals B
Rank
a
Figure 4 Visualization of Content at the Level of Individual and Population There are two main parts to this visualization: frequency of ancestor occurrence histogram and lineage mapping The visualization that is shown is from generation 3 of α = 1, fitness proportionate selection.
Trang 5sampled for visualization every two generations.
To show the variable of problem difficulty, we used a
simple thermometer icon at the bottom of the figure The
particular metric that is displayed is the percentage of
successful trials that produced a “perfect” solution
Consequently, the easier a problem is to solve, the greater
the number of trials out of the total number of trials that
end up producing a “perfect” solution The icon that is
shown in Figure 5 corresponds to a relatively easy problem:
about 7 out of every 10 trials produced a “perfect” solution
We note that the data that was used to produce the thermometer icon was based off of results from [10] This was done in part because the metrics associated with difficult problems correspond to low probabilities (i.e., < 1%) The results given in were based on 600 trials, instead
of the 200 trials given here However, the statistics concerning this metric between data sets were comparable Figure 6 shows contrasting results for a “difficult” problem
0
2
4
6
8
10
12
14
16
18
20
22
70%
Figure 5 Ten-Variable Visualization of Structure and Content for an “Easy” Problem (i.e., tournament selection and α= 1) The data that is shown is for the first 22 generations of a successful trial.
Trang 6V MULTI-TRIAL METHODS AND RESULTS
There is a trade-off in visualizing single versus multiple
trials Although single-trial results provide insight on
nuances in GP dynamics that occur during the course of a
run, it is difficult to place those nuances in an appropriate
statistical context Unfortunately, the ten-variable
visualization of Section IV does not scale easily to account
for multiple trials
For that reason, we reduced the number of variables that can be examined at once In doing so, we chose to emphasize just a handful of variables so that more of the data can be shown We used visualizations that were generated using methods from Section IV to identify possible variables for further study
We settled on a five-variable visualization: surviving
number of ancestors n s, time, problem difficulty, selection method, and cumulative distribution The visualization is constructed from common methods in data visualization,
0
2
4
6
8
10
12
14
16
18
20
22
1%
Figure 6 Ten-Variable Visualization of Structure and Content for a “Very Difficult” Problem (i.e., fitness proportionate selection and α = 1000) The data that is shown is for the first 22 generations of a successful trial.
Trang 7whereby the each of the four major elements of this
visualization correspond to a density plot The x-axis is
time (in generations) and the y-axis is the surviving number
of ancestors The maximum surviving number of ancestors
is M, which occurs at generation 0.
Oftentimes in evolutionary computation, the plot most
commonly employed to display quantities that vary as a
function of time is a line graph In this case, the cumulative
distributions are inferred by the number of lines that occur
per square unit in a plot Instead of resorting to this
technique, however, cumulative distributions were
computed for every nonzero tuple of surviving number of
ancestors and time A density plot was then used to display
this data instead of line graphs The convention used in this
paper has darker tones indicating higher cumulative
distributions Consequently, each density plot describes
three variables (i.e., surviving number of ancestors n s, time,
cumulative distribution)
The remaining two variables—selection method and
problem difficulty—were displayed using the method of
small multiples Each density plot corresponds to a
variation of one of these variables Density plots are
subsequently arranged as a two-dimensional matrix
However, because problem difficulty resulted in values that
were not binary quantities, we used the thermometer icon
(as was described in Section IV)
Figure 7 shows the results of the data described in Section III This visualization is a complete (non-sampled) summary of approximately 80 million trees that were parsed into lineages of 400,000 ancestors It represents a view of approximately 0.5 TB of data (Some data is not visualized since only the range [0, 150] for the surviving number of
ancestors is shown The complete range is [0, M], where
M = 500.)
VI DISCUSSION
Our results do corroborate many of the findings and speculations in [5, 7] For example, the results bear out the speculations in [5], which suggested a nuanced approach in understanding diversity Not only do the results show signficant temporal behaviors as was shown in [5], but also show that other previously neglected factors—like problem difficulty and selection method—can signficantly influence
GP dynamics (i.e., Figure 7) Furthermore, the results also corroborate some of the earliest finding concerning diversity
in GP [7], even though that work was based on a limited statistical sample and used nonstandard GP settings Even though the argument in [7] for an “Eve” is perhaps overstated, the results in Figure 7 do show that for certain configurations, the amount of individuals that contribute to
a population is but a fraction of the original population Figures 5 and 6 also corroborate what is suggested in [7]
0
20
40
60
80
100
120
140
0 20 40 60 80 100 120 140
0
20
40
60
80
100
120
140
0 20 40 60 80 100 120 140
70%
34%
37%
1%
(a)
(b)
(c)
(d)
Figure 7 Five-Variable Visualization of the Loss in Diversity by Problem Difficulty and Selection Method Each plot corresponds to a particular selection method and tuning parameter setting (a) Fitness proportionate, α = 1 (b) Tournament, α = 1 (c) Fitness proportionate, α = 1000 (d) Tournament, α = 1000.
Trang 8concerning structure—i.e., that there is a convergence of
both structure and content
The visualizations shown in Figures 5, 6, and 7 raise
signficant issues about the dynamics that occur during the
course of a GP run Although it is beyond the scope of this
paper to describe these issues at length, we summarize two
of them here
• What is the relationship between diversity and problem
solving ability? Fitness proportionate selection was one
of the earliest methods that were used to enhance diversity
for the purpose of robust problem solving [2, 3] On one
hand, the results in Figure 7 do support the contention
that fitness proportionate selection does enhance diversity,
since there is much more of V0 that was retained
However, the results indicated that simply enhancing
diversity during the course of a GP run was not
intrinsically helpful In both cases, “hard” selection
resulted in significantly better performances (i.e.,
problems became significantly easier to solve under
tournament selection) This occurred in spite of large
losses in diversity This finding supports previous, albeit
controversial work in [20, 21]
• What are the dynamics between population size and
diversity? On one hand, the theory that describes the
dynamics under fitness proportionate selection is
complex, which makes it difficult to make comparisons
of this work with existing theory On the other hand,
there have been some investigations in tournament
selection (e.g., [22, 23]) that point to dynamics that are
independent of fitness distributions for a given
population Given an analysis of ancestors and lineage,
their work would only apply to predicting the number of
ancestors in generation 1 However, Figures 5 and 6
indicate that mixing between ancestors eventually
becomes so thorough that many of the individuals in a
current population can trace its lineage to most of the
surviving ancestors For that reason, it could be that the
loss of diversity measures in [22, 23] might also apply to
the steady state that occurs in Figures 7b and 7d The
existence of this steady state under tournament selection
would also indicate why a population implosion method,
as described in [6] would work In particular, if only a
fraction of ancestors are used to form a solution, then it
might make sense to dynamically decrease the population
size over the course of a GP run
VII CONCLUSIONS
This paper introduced several visualization techniques
that enabled a multivariate exploration of GP dynamics and
diversity The resulting visualizations produced detailed
patterns of temporal behaviors that occurred in what
amounted to one of of the largest studies of its kind to date
This study required the tracking the lineage of 80,000,000
to some 400,000 individuals in initial populations The
visualizations summarize approximately 0.5 TB of data
The results corroborate previous theoretical and empirical
studies that have been published on the subject of diversity
in GP The visualizations also raise further questions on the role of diversity in GP
REFERENCES
[1] D B Fogel, Evolutionary Computation: The Fossil Record,
Piscataway: IEEE Press, 1998, pp 641.
[2] M Pincus, “An Evolutionary Strategy,” Journal of Theoretical
Biology, vol 28, pp 483–488, 1970.
[3] R Galar, “Handicapped Individua in Evolutionary Processes,”
Biological Cybernetics, vol 53, pp 1–9, 1985.
[4] J H Holland, Adaptation in Natural and Artificial Systems Ann
Arbor: University of Michigan Press, 1975.
[5] E Burke, et al., “A Survey and Analysis of Diversity Measures in
Genetic Programming,” in GECCO-2002, W B Langdon, et al.,
Eds San Francisco: Morgan Kaufmann Publishers, 2002, pp 716–723.
[6] S Luke, et al., “Population Implosion in Genetic Programming,” in
GECCO 2003, E Cantú-Paz, et al., Eds Berlin: Springer-Verlag,
2003, pp 1729–1739.
[7] N F McPhee and N J Hopper, “Analysis of Genetic Diversity
through Population History,” in GECCO ’99, W Banzhaf, et al., Eds.
San Francisco: Morgan Kaufmann Publishers, 1999, pp 1112 – 1120.
[8] J M Daida, et al., “Visualizing Tree Structures in Genetic
Programming,” in GECCO ’03, E Cantú-Paz, et al., Eds Berlin:
Springer-Verlag, 2003, pp 1652–1664.
[9] J M Daida, et al., “Analysis of Single-Node (Building) Blocks in
Genetic Programming,” in Advances in Genetic Programming 3, L.
Spector, et al., Eds Cambridge: The MIT Press, 1999, pp 217–241 [10] J M Daida, et al., “What Makes a Problem GP-Hard? Analysis of a
Tunably Difficult Problem in Genetic Programming,” Genetic
Programming and Evolvable Machines, vol 2, pp 165–191, 2001.
[11] D Zongker and W Punch, “lilgp,” v 1.02 ed Lansing: Michigan State University Genetic Algorithms Research and Applications Group, 1995.
[12] M Matsumoto and T Nishimura, “Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudorandom Number
Generator,” ACM Transactions on Modeling and Computer
Simulation, vol 8, pp 3–30, 1998.
[13] J R Koza, Genetic Programming: On the Programming of
Computers by Means of Natural Selection Cambridge: The MIT
Press, 1992.
[14] E R Tufte, The Visual Display of Quantitative Information.
Cheshire, CT: Graphics Press, 1983.
[15] E R Tufte, Envisioning Information Cheshire, CT: Graphics Press,
1990.
[16] E R Tufte, Visual Explanations: Images and Quantities, Evidence
and Narrative Cheshire, CT: Graphics Press, 1997.
[17] J P Rosca and D H Ballard, “Rooted-Tree Schemata in Genetic
Programming,” in Advances in Genetic Programming 3, L Spector,
et al., Eds Cambridge: The MIT Press, 1999, pp 243–271.
[18] W B Langdon and R Poli, Foundations of Genetic Programming.
Berlin: Springer-Verlag, 2002.
[19] J M Daida and A Hilss, “Identifying Structural Mechanisms in
Standard Genetic Programming,” in GECCO ’03, E Cantú-Paz, et
al., Eds Berlin: Springer-Verlag, 2003, pp 1639–1651.
[20] J.-J Kim and B.-Y Zhang, “Effects of Selection Schemes in
Genetic Programming for Time Series Prediction,” in CEC ’99, vol.
1 Piscataway: IEEE Press, 1999, pp 252–258.
[21] B.-Y Zhang and J.-J Kim, “Comparison of Selection Methods for
Evolutionary Computation,” Evolutionary Optimization, vol 2, pp.
55–70, 2000.
[22] T Bickle and L Thiele, “A Mathematical Analysis of Tournament
Selection,” in ICGA ’95, L J Eshelman, Ed San Francisco: Morgan
Kaufmann Publishers, 1995, pp 9–16.
[23] T Motoki, “Calculating the Expected Loss of Diversity of Selection
Schemes,” Evolutionary Computation, vol 10, pp 397–422, 2002.