GENETIC
ALGORITHMS
in Search,
Optimization &
Machine Learning
DAVID E. GOLDBERG
As a graduate student at the University of
Michigan, David E. Goldberg spearheaded
a successful project applying genetic algorithms
and classifier systems to the control of natural gas
pipelines. After receiving his Ph.D. at the University
of Michigan, Dr. Goldberg joined the faculty
of the University of Alabama at Tuscaloosa,
where he is now Associate Professor
of Engineering Mechanics.

Dr. Goldberg has continued his research
in genetic algorithms and classifier systems
and received a 1985 NSF Presidential
Young Investigator Award for his
work. Dr. Goldberg has 12 years of consulting
experience in industry and government
and has published numerous
articles and papers.

Genetic Algorithms in
Search, Optimization, and
Machine Learning
David E. Goldberg
The University of Alabama
ADDISON-WESLEY PUBLISHING COMPANY, INC.
Reading, Massachusetts · Menlo Park, California · Sydney
Don Mills, Ontario · Madrid · San Juan · New York · Singapore
Amsterdam · Wokingham, England · Tokyo · Bonn

The procedures and applications presented in this book have been included for their
instructional value. They have been tested with care but are not guaranteed for any
particular purpose. The publisher does not offer any warranties or representations, nor
does it accept any liabilities with respect to the programs or applications.
Library of Congress Cataloging-in-Publication Data
Goldberg, David E. (David Edward), 1953–
  Genetic algorithms in search, optimization, and machine learning.
  Bibliography: p.
  Includes index.
  1. Combinatorial optimization. 2. Algorithms.
3. Machine learning. I. Title.
QA402.5.G635 1989    006.3    88-6276
ISBN 0-201-15767-5
Reprinted with corrections January, 1989
Copyright © 1989 by Addison-Wesley Publishing Company, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted, in any form or by any means, electronic, mechanical, photocopying,
recording, or otherwise, without the prior written permission of the publisher. Printed
in the United States of America. Published simultaneously in Canada.

DEFGHIJ-HA-943210

Foreword
I first encountered David Goldberg as a young, PhD-bound Civil Engineer inquiring
about my course Introduction to Adaptive Systems. He was something of an
anomaly because his severely practical field experience in the gas-pipeline industry,
and his evident interest in that industry, did not seem to dampen his interest
in what was, after all, an abstract course involving a lot of "biological stuff." After
he enrolled in the course, I soon realized that his expressed interests in control,
gas pipelines, and AI were the surface tell-tales of a wide-ranging curiosity and a
talent for careful analysis. He was, and is, an engineer interested in building, but
he was, and is, equally interested in ideas.

Not long thereafter, Dave asked if I would be willing to co-chair (with Ben
Wylie, the chairman of our Civil Engineering Department) a dissertation investigating
the use of genetic algorithms and classifier systems in the control of gas-pipeline
transmission. My first reaction was that this was too difficult a problem
for a dissertation—there are no closed analytic solutions to even simple versions
of the problem, and actual operation involves long, craftsmanlike apprenticeships.
Dave persisted, and in a surprisingly short time produced a dissertation that, in
turn, produced for him a 1985 NSF Presidential Young Investigator Award. So
much for my intuition as to what constitutes a reasonable dissertation.
In the past few years GAs have gone from an arcane subject known to a few
of my students, and their students, to a subject engaging the curiosity of many
different research communities including researchers in economics, political sci-
ence, psychology, linguistics, immunology, biology, and computer science. A ma-
jor reason for this interest is that GAs really work. GAs offer robust procedures
that can exploit massively parallel architectures and, applied to classifier systems,
they provide a new route toward an understanding of intelligence and adaptation.
David Goldberg's book provides a turnpike into this territory.
One cannot be around David Goldberg for long without being infected by
enthusiasm and energy. That enthusiasm comes across in this book. It is also
an embodiment of his passion for clear explanations and carefully worked examples.
His book does an exceptional job of making the methods of GAs and
classifier systems available to a wide audience. Dave is deeply interested in the
intellectual problems of GAs and classifier systems, but he is interested even more
in seeing these systems used. This book, I think, will be instrumental in realizing
that ambition.

John Holland
Ann Arbor, Michigan

Preface
This book is about genetic algorithms (GAs)—search procedures based on the
mechanics of natural selection and natural genetics. In writing it, I have tried to
bring together the computer techniques, mathematical tools, and research results
that will enable you to apply genetic algorithms to problems in your field. If you
choose to do so, you will join a growing group of researchers and practitioners
who have come to appreciate the natural analogues, mathematical analyses, and
computer techniques comprised by the genetic algorithm methodology.
The book is designed to be a textbook and a self-study guide. I have tested
the draft text in a one-semester, senior-level undergraduate/first-year graduate
course devoted to genetic algorithms. Although the students came from different
backgrounds (biochemistry, chemical engineering, computer science, electrical
engineering, engineering mechanics, English, mathematics, mechanical engineering,
and physics) and had wide differences in mathematical and computational
maturity, they all acquired an understanding of the basic algorithm and its theory
of operation. To reach such a diverse audience, the tone of the book is intention-
ally casual, and rigor has almost always been sacrificed in the interest of building
intuition and understanding. Worked-out examples illustrate major topics, and
computer assignments are available at the end of each chapter.
I have minimized the mathematics, genetics, and computer background re-
quired to read this book. An understanding of introductory college-level mathe-
matics (algebra and a little calculus) is assumed. Elementary notions of counting
and finite probability are used, and Appendix A summarizes the important con-
cepts briefly. I assume no particular knowledge of genetics and define all required
genetic terminology and concepts within the text. Last, some computer program-
ming ability is necessary. If you have programmed a computer in any language,
you should be able to follow the computer examples I present. All computer code
in this book is written in Pascal, and Appendix B presents a brief introduction to
the essentials of that language.
Although I have not explicitly subdivided the book into separate parts, the
chapters may be grouped in two major categories: those dealing with search and
optimization and those dealing with machine learning.
The first five chapters are devoted to genetic algorithms in search and optimization.
Chapter 1 introduces the topic of genetic search; it also describes a
simple genetic algorithm and illustrates the GA's application through a hand calculation.
Chapter 2 introduces the essential theoretical basis of GAs, covering
topics including schemata, the fundamental theorem, and extended analysis. If
you dislike theory, you can safely skip Chapter 2 without excessive loss of con-
tinuity; however, before doing so, I suggest you try reading it anyway. The mathematical
underpinnings of GAs are not difficult to follow, but their ramifications
are subtle; some attention to analysis early in the study of GAs promotes fuller
understanding of algorithm power. Chapter 3 introduces computer implementa-
tion of genetic algorithms through example. Specifically, a Pascal code called the
simple genetic algorithm (SGA) is presented along with a number of extensions.
Chapter 4 presents a historical account of early genetic algorithms together with
a potpourri of current applications. Chapter 5 examines more advanced genetic
operators and presents a number of applications illustrating their use. These include
applications of micro- and macro-level operators as well as hybrid
techniques.
Chapters 6 and 7 present the application of genetic algorithms in machine
learning systems. Chapter 6 gives a generic description of one type of genetics-based
machine learning (GBML) system, a classifier system. The theory of operation
of such a system is briefly reviewed, and one Pascal implementation called
the simple classifier system (SCS) is presented and applied to the learning of a
boolean function. Chapter 7 rounds out the picture of GBML by presenting a
historical review of early GBML systems together with a selective survey of other
current systems and topics.
ACKNOWLEDGMENTS
In writing acknowledgments for a book on genetic algorithms, there is no ques-
tion who should get top billing. I thank John H. Holland from the University of
Michigan for his encouragement of this project and for giving birth to the infant
we now recognize as the genetic algorithms movement. It hasn't been easy nurturing
such a child. At times she showed signs of stunted intellectual growth, and
the other kids on the block haven't always treated her very nicely. Nonetheless,
John stood by his baby with the quiet confidence only a father can possess, knowing
that his daughter would one day take her rightful place in the community of
ideas.
I also thank two men who have influenced me in more ways than they know:
E. Benjamin Wylie and William D. Jordan. Ben Wylie was my dissertation adviser
in Civil Engineering at the University of Michigan. When I approached him with
the idea for a dissertation about gas pipelines and genetic algorithms, he was
appropriately skeptical, but he gave me the rope and taught me the research and
organizational skills necessary not to hang myself. Bill Jordan was my Department
Head in Engineering Mechanics at The University of Alabama (he retired in
1986). He was and continues to be a model of teaching quality and administrative
fairness that I still strive to emulate.
I thank my colleagues in the Department of Engineering Mechanics at Ala-
bama, A. E. Carden, C. H. Chang, C. R. Evces, S. C. Gambrell, J. L. Hill, S. E. Jones,
D. C. Raney, and H. B. Wilson, for their encouragement and support. I also thank
my many colleagues in the genetic algorithms community. Particular thanks are
due Stewart Wilson at the Rowland Institute for Science for providing special
encouragement and a sympathetic ear on numerous occasions.
I thank my students (the notorious Bama Gene Hackers), including C. L.
Bridges, K. Deb, C. L. Karr, C. H. Kuo, R. Lingle, Jr., M. P. Samtani, P. Segrest, T.
Sivapalan, R. E. Smith, and M. Valenzuela-Rendon, for lots of long hours and hard
work. I also recognize the workmanlike assistance rendered by a string of right-hand
persons: A. L. Thomas, S. Damsky, B. Korb, and K. Y. Lee.
I acknowledge the editorial assistance provided by Sarah Bane Wood at Alabama.
I am also grateful to the team at Addison-Wesley, including Peter Gordon,
Helen Goldstein, Helen Wythe, and Cynthia Benn, for providing expert advice
and assistance during this project.
I thank the reviewers, Ken De Jong, John Holland, and Stewart Wilson, for
their comments and suggestions.
A number of individuals and organizations have granted permission to reprint
or adapt materials originally printed elsewhere. I gratefully acknowledge the per-
mission granted by the following individuals: L. B. Booker, G. E. P. Box, K. A. De
Jong, S. Forrest, J. J. Grefenstette, J. H. Holland, J. D. Schaffer, S. F. Smith, and S. W.
Wilson. I also acknowledge the permission granted by the following organizations:
Academic Press, Academic Press London Ltd. (Journal of Theoretical Biology),
the American Society of Civil Engineers, the Association for Computing
Machinery, the Conference Committee of the International Conference on Genetic
Algorithms, Kluwer Academic Publishers (Machine Learning), North-Holland
Physics Publishing, the Royal Statistical Society (Journal of the Royal
Statistical Society, C), and John Wiley and Sons, Inc.
I thank my spouse and still best friend, Mary Ann, for her patience and assis-
tance. There were more than a few evenings and weekends I didn't come home
when I said I would, and she proofread the manuscript, judiciously separating my
tolerable quips from my unacceptable quirks. Untold numbers of readers would
thank you, Nary (sic), if they knew the fate they have been spared by your sound
judgment.
This material is based upon work supported by the National Science Foundation
under Grant MSM-8451610. I am also grateful for research support provided
by the Alabama Research Institute, Digital Equipment Corporation, Intel
Corporation, Mr. Peter Prater, the Rowland Institute for Science, Texas Instruments
Incorporated, and The University of Alabama.
Last, it has become a cliché in textbooks and monographs: after thanking one
and all for their assistance, the author gallantly accepts blame for all remaining
errors in the text. This is usually done with no small amount of pomp and circumstance—a
ritualistic incantation to ward off the evil spirits of error. I will
forgo this exercise and close these acknowledgments by paraphrasing a piece of
graffiti that I first spotted on the third floor of the West Engineering Building at
the University of Michigan:

To err is human. To really foul up, use a computer.

Unfortunately, in writing this book, I find myself subject to both of these sources
of error, and no doubt many mistakes remain. I can only take comfort in knowing
that error is the one inevitable side effect of our human past and the probable
destiny of our artificially intelligent future.

Contents
FOREWORD iii
PREFACE v
1 A GENTLE INTRODUCTION TO GENETIC ALGORITHMS 1
What Are Genetic Algorithms? 1
Robustness of Traditional Optimization and Search Methods 2
The Goals of Optimization 6
How Are Genetic Algorithms Different from Traditional Methods? 7
A Simple Genetic Algorithm 10
Genetic Algorithms at Work—a Simulation by Hand 15
Grist for the Search Mill—Important Similarities 18
Similarity Templates (Schemata) 19
Learning the Lingo 21
Summary 22
Problems 23
Computer Assignments 25
2 GENETIC ALGORITHMS REVISITED: MATHEMATICAL FOUNDATIONS 27
Who Shall Live and Who Shall Die? The Fundamental Theorem 28
Schema Processing at Work: An Example by Hand Revisited 33
The Two-armed and k-armed Bandit Problem 36
How Many Schemata Are Processed Usefully? 40
The Building Block Hypothesis 41
Another Perspective: The Minimal Deceptive Problem 46
Schemata Revisited: Similarity Templates as Hyperplanes 53
Summary 54
Problems 55
Computer Assignments 56
3 COMPUTER IMPLEMENTATION OF A GENETIC ALGORITHM 59
Data Structures 60
Reproduction, Crossover, and Mutation 62
A Time to Reproduce, a Time to Cross 66
Get with the Main Program 68
How Well Does it Work? 70
Mapping Objective Functions to Fitness Form 75
Fitness Scaling 76
Codings 80
A Multiparameter, Mapped, Fixed-Point Coding 82
Discretization 84
Constraints 85
Summary 86
Problems 87
Computer Assignments 88
4 SOME APPLICATIONS OF GENETIC ALGORITHMS 89
The Rise of Genetic Algorithms 89
Genetic Algorithm Applications of Historical Interest 92
De Jong and Function Optimization 106
Improvements in Basic Technique 120
Current Applications of Genetic Algorithms 125
Summary 142
Problems 143
Computer Assignments 145
5 ADVANCED OPERATORS AND TECHNIQUES IN GENETIC SEARCH 147
Dominance, Diploidy, and Abeyance 148
Inversion and Other Reordering Operators 166
Other Micro-operators 179
Niche and Speciation 185
Multiobjective Optimization 197
Knowledge-Based Techniques 201
Genetic Algorithms and Parallel Processors 208
Summary 212
Problems 213
Computer Assignments 214
6 INTRODUCTION TO GENETICS-BASED MACHINE LEARNING 217
Genetics-Based Machine Learning: Whence It Came 218
What is a Classifier System? 221
Rule and Message System 223
Apportionment of Credit: The Bucket Brigade 225
Genetic Algorithm 229
A Simple Classifier System in Pascal 230
Results Using the Simple Classifier System 245
Summary 256
Problems 258
Computer Assignments 259
7 APPLICATIONS OF GENETICS-BASED MACHINE LEARNING 261
The Rise of GBML 261
Development of CS-1, the First Classifier System 265
Smith’s Poker Player 270
Other Early GBML Efforts 276
A Potpourri of Current Applications 293
Summary 304
Problems 306
Computer Assignments 307
8 A LOOK BACK, A GLANCE AHEAD 309
APPENDIXES 313
A REVIEW OF COMBINATORICS
AND ELEMENTARY PROBABILITY 313
Counting 313
Permutations 314
Combinations 316
Binomial Theorem 316
Events and Spaces 317
Axioms of Probability 318
Equally Likely Outcomes 319
Conditional Probability 321
Partitions of an Event 321
Bayes’ Rule 322
Independent Events 322
Two Probability Distributions: Bernoulli and Binomial 323
Expected Value of a Random Variable 323
Limit Theorems 324
Summary 324
Problems 325
PASCAL WITH RANDOM NUMBER GENERATION FOR
FORTRAN, BASIC, AND COBOL PROGRAMMERS 327
Simple1: An Extremely Simple Code 327
Simple2: Functions, Procedures, and More I/O 330
Let's Do Something 332
Last Stop Before Freeway 338
Summary 341
A SIMPLE GENETIC ALGORITHM
(SGA) IN PASCAL 343
A SIMPLE CLASSIFIER SYSTEM
(SCS) IN PASCAL 351
PARTITION COEFFICIENT TRANSFORMS FOR
PROBLEM-CODING ANALYSIS 373
Partition Coefficient Transform 374
An Example: f(x) = x² on Three Bits a Day 375
What do the Partition Coefficients Mean? 376
Using Partition Coefficients to Analyze Deceptive Problems 377
Designing GA-Deceptive Problems with Partition Coefficients 377
Summary 378
Problems 378
Computer Assignments 379
BIBLIOGRAPHY 381
INDEX 403

A Gentle Introduction to
Genetic Algorithms
In this chapter, we introduce genetic algorithms: what they are, where they came
from, and how they compare to and differ from other search procedures. We
illustrate how they work with a hand calculation, and we start to understand their
power through the concept of a schema, or similarity template.
WHAT ARE GENETIC ALGORITHMS?
Genetic algorithms are search algorithms based on the mechanics of natural se-
lection and natural genetics. They combine survival of the fittest among string
structures with a structured yet randomized information exchange to form a
search algorithm with some of the innovative flair of human search. In every
generation, a new set of artificial creatures (strings) is created using bits and
pieces of the fittest of the old; an occasional new part is tried for good measure.
While randomized, genetic algorithms are no simple random walk. They efficiently
exploit historical information to speculate on new search points with expected
improved performance.
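The generational cycle just described is compact enough to sketch in a few lines. The book's own programs are written in Pascal (Chapter 3 and Appendix C present the SGA); what follows is only an illustrative sketch in Python, not the book's code, using fitness-proportionate selection, single-point crossover, and bitwise mutation on a made-up count-of-ones fitness function:

```python
import random

def genetic_algorithm(fitness, n_bits=10, pop_size=20, p_cross=0.7,
                      p_mut=0.01, generations=30, seed=1):
    """One run of a minimal generational GA over fixed-length bit strings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        # Fitness-proportionate ("roulette wheel") selection of parents.
        parents = rng.choices(pop, weights=fits, k=pop_size)
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            # Single-point crossover with probability p_cross.
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            # Bitwise mutation: flip each bit with small probability p_mut.
            nxt.append([bit ^ (rng.random() < p_mut) for bit in a])
            nxt.append([bit ^ (rng.random() < p_mut) for bit in b])
        pop = nxt
    return max(pop, key=fitness)

# Hypothetical fitness: 1 + number of ones (the +1 keeps all weights positive).
best = genetic_algorithm(lambda ind: 1 + sum(ind))
print(best)
```

After a few dozen generations the best string in the final population typically carries far more ones than a random string, illustrating how new creatures are built from bits and pieces of the fittest of the old.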
Genetic algorithms have been developed by John Holland, his colleagues, and
his students at the University of Michigan. The goals of their research have been
twofold: (1) to abstract and rigorously explain the adaptive processes of natural
systems, and (2) to design artificial systems software that retains the important
mechanisms of natural systems. This approach has led to important discoveries
in both natural and artificial systems science.
The central theme of research on genetic algorithms has been robustness,
the balance between efficiency and efficacy necessary for survival in many different
environments. The implications of robustness for artificial systems are manifold.
If artificial systems can be made more robust, costly redesigns can be
reduced or eliminated. If higher levels of adaptation can be achieved, existing
systems can perform their functions longer and better. Designers of artificial systems—both
software and hardware, whether engineering systems, computer systems,
or business systems—can only marvel at the robustness, the efficiency, and
the flexibility of biological systems. Features for self-repair, self-guidance, and reproduction
are the rule in biological systems, whereas they barely exist in the
most sophisticated artificial systems.
Thus, we are drawn to an interesting conclusion: where robust performance
is desired (and where is it not?), nature does it better; the secrets of adaptation
and survival are best learned from the careful study of biological example. Yet
we do not accept the genetic algorithm method by appeal to this beauty-of-nature
argument alone. Genetic algorithms are theoretically and empirically proven to
provide robust search in complex spaces. The primary monograph on the topic
is Holland's (1975) Adaptation in Natural and Artificial Systems. Many papers
and dissertations establish the validity of the technique in function optimization
and control applications. Having been established as a valid approach to problems
requiring efficient and effective search, genetic algorithms are now finding more
widespread application in business, scientific, and engineering circles. The rea-
sons behind the growing numbers of applications are clear. These algorithms are
computationally simple yet powerful in their search for improvement. Further-
more, they are not fundamentally limited by restrictive assumptions about the
search space (assumptions concerning continuity, existence of derivatives, unimodality,
and other matters). We will investigate the reasons behind these attractive
qualities; but before this, we need to explore the robustness of more widely
accepted search procedures.
ROBUSTNESS OF TRADITIONAL OPTIMIZATION
AND SEARCH METHODS
This book is not a comparative study of search and optimization techniques.
Nonetheless, it is important to question whether conventional search methods
meet our robustness requirements. The current literature identifies three main
types of search methods: calculus-based, enumerative, and random. Let us ex-
amine each type to see what conclusions may be drawn without formal testing.
Calculus-based methods have been studied heavily. These subdivide into two
main classes: indirect and direct. Indirect methods seek local extrema by solving
the usually nonlinear set of equations resulting from setting the gradient of the
objective function equal to zero. This is the multidimensional generalization of
the elementary calculus notion of extremal points, as illustrated in Fig. 1.1. Given
a smooth, unconstrained function, finding a possible peak starts by restricting
search to those points with slopes of zero in all directions.

FIGURE 1.1 The single-peak function is easy for calculus-based methods.

On the other hand,
direct (search) methods seek local optima by hopping on the function and mov-
ing in a direction related to the local gradient. This is simply the notion of hill-
climbing: to find the local best, climb the function in the steepest permissible
direction. While both of these calculus-based methods have been improved,
extended, hashed, and rehashed, some simple reasoning shows their lack of
robustness.
First, both methods are local in scope; the optima they seek are the best in a
neighborhood of the current point. For example, suppose that Fig. 1.1 shows a
portion of the complete domain of interest; a more complete picture is shown in
Fig. 1.2. Clearly, starting the search or zero-finding procedures in the neighborhood
of the lower peak will cause us to miss the main event (the higher peak).
Furthermore, once the lower peak is reached, further improvement must be
sought through random restart or other trickery. Second, calculus-based methods
depend upon the existence of derivatives (well-defined slope values). Even if we
allow numerical approximation of derivatives, this is a severe shortcoming. Many
practical parameter spaces have little respect for the notion of a derivative and
the smoothness this implies. Theorists interested in optimization have been too
willing to accept the legacy of the great eighteenth- and nineteenth-century
mathematicians who painted a clean world of quadratic objective functions, ideal
constraints, and ever-present derivatives. The real world of search is fraught with
discontinuities and vast multimodal, noisy search spaces, as depicted in the less
calculus-friendly function of Fig. 1.3. It comes as no surprise that methods depending
upon the restrictive requirements of continuity and derivative existence
are unsuitable for all but a very limited problem domain.

FIGURE 1.2 The multiple-peak function causes a dilemma. Which hill should we climb?

For this reason and
because of their inherently local scope of search, we must reject calculus-based
methods. They are insufficiently robust in unintended domains.
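The local scope that sinks these methods is easy to demonstrate. The following is a hedged sketch (in Python rather than the book's Pascal) of one-dimensional steepest-ascent hill climbing on a hypothetical two-peak function: started on the wrong slope, the climber stops at the lower peak and never sees the higher one.

```python
def hill_climb(f, x, step=0.01, max_iters=10_000):
    """Steepest-ascent hill climbing in one dimension: move to the better
    neighbor until no neighbor improves, then stop at a (possibly local) peak."""
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=f)
        if best == x:          # no neighbor is better: a peak (local or global)
            break
        x = best
    return x

# A made-up two-peak landscape: a low peak at x = -1 (height 0)
# and the main peak at x = 2 (height 4).
f = lambda x: -(x + 1) ** 2 if x < 0.5 else 4 - (x - 2) ** 2

print(round(hill_climb(f, x=-3), 2))  # started on the lower hill: stuck near -1
print(round(hill_climb(f, x=1), 2))   # started on the higher hill: climbs to ~2
```

Which peak is found depends entirely on the starting point; only a restart (or "other trickery") rescues the first run.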
Enumerative schemes have been considered in many shapes and sizes. The
idea is fairly straightforward; within a finite search space, or a discretized infinite
search space, the search algorithm starts looking at objective function values at
every point in the space, one at a time.

FIGURE 1.3 Many functions are noisy and discontinuous and thus unsuitable for search by traditional methods.

Although the simplicity of this type of
algorithm is attractive, and enumeration is a very human kind of search (when
the number of possibilities is small), such schemes must ultimately be discounted
in the robustness race for one simple reason: lack of efficiency. Many practical
spaces are simply too large to search one at a time and still have a chance of using
the information to some practical end. Even the highly touted enumerative
scheme dynamic programming breaks down on problems of moderate size and
complexity, suffering from a malady melodramatically labeled the "curse of dimensionality"
by its creator (Bellman, 1961). We must conclude that less clever
enumerative schemes are similarly, and more abundantly, cursed for real
problems.
Random search algorithms have achieved increasing popularity as researchers
have recognized the shortcomings of calculus-based and enumerative
schemes. Yet, random walks and random schemes that search and save the best
must also be discounted because of the efficiency requirement. Random searches,
in the long run, can be expected to do no better than enumerative schemes. In
our haste to discount strictly random search methods, we must be careful to
separate them from randomized techniques. The genetic algorithm is an example
of a search procedure that uses random choice as a tool to guide a highly exploitative
search through a coding of a parameter space. Using random choice as a
tool in a directed search process seems strange at first, but nature contains many
examples. Another currently popular search technique, simulated annealing,
uses random processes to help guide its form of search for minimal energy states.
A recent book (Davis, 1987) explores the connections between simulated annealing
and genetic algorithms. The important thing to recognize at this juncture
is that randomized search does not necessarily imply directionless search.
While our discussion has been no exhaustive examination of the myriad
methods of traditional optimization, we are left with a somewhat unsettling con-
clusion: conventional search methods are not robust. This does not imply that
they are not useful. The schemes mentioned and countless hybrid combinations
and permutations have been used successfully in many applications; however, as
more complex problems are attacked, other methods will be necessary. To put
this point in better perspective, inspect the problem spectrum of Fig. 1.4. In the
figure a mythical effectiveness index is plotted across a problem continuum for a
specialized scheme, an enumerative scheme, and an idealized robust scheme. The
gradient technique performs well in its narrow problem class, as we expect, but
it becomes highly inefficient (if useful at all) elsewhere. On the other hand, the
enumerative scheme performs with egalitarian inefficiency across the spectrum
of problems, as shown by the lower performance curve. Far more desirable would
be a performance curve like the one labeled Robust Scheme. It would be worth-
while sacrificing peak performance on a particular problem to achieve a relatively
high level of performance across the spectrum of problems. (Of course, with
broad, efficient methods we can always create hybrid schemes that combine the
best of the local search method with the more general robust scheme. We will
have more to say about this possibility in Chapter 5.) We shall soon see how
genetic algorithms help fill this robustness gap.

FIGURE 1.4 Many traditional schemes work well in a narrow problem domain.
Enumerative schemes and random walks work equally inefficiently across a broad
spectrum. A robust method works well across a broad spectrum of problems.
THE GOALS OF OPTIMIZATION
Before examining the mechanics and power of a simple genetic algorithm, we
must be clearer about our goals when we say we want to optimize a function or
a process. What are we trying to accomplish when we optimize? The conventional
view is presented well by Beightler, Phillips, and Wilde (1979, p. 1):

Man's longing for perfection finds expression in the theory of optimization.
It studies how to describe and attain what is Best, once one knows
how to measure and alter what is Good or Bad. ... Optimization theory
encompasses the quantitative study of optima and methods for finding
them.

Thus optimization seeks to improve performance toward some optimal point or
points. Note that this definition has two parts: (1) we seek improvement to approach
some (2) optimal point. There is a clear distinction between the process
of improvement and the destination or optimum itself. Yet, in judging optimization
procedures we commonly focus solely upon convergence (does the method
reach the optimum?) and forget entirely about interim performance. This emphasis
stems from the origins of optimization in the calculus. It is not, however, a
natural emphasis.
Consider a human decision maker, for example, a businessman. How do we
judge his decisions? What criteria do we use to decide whether he has done
a good or bad job? Usually we say he has done well when he makes adequate
selections within the time and resources allotted. Goodness is judged relative
to his competition. Does he produce a better widget? Does he get it to market
more efficiently? With better promotion? We never judge a businessman by an
attainment-of-the-best criterion; perfection is all too stern a taskmaster. As a result,
we conclude that convergence to the best is not an issue in business or in
most walks of life; we are only concerned with doing better relative to others.
Thus, if we want more humanlike optimization tools, we are led to a reordering
of the priorities of optimization. The most important goal of optimization is improvement.
Can we get to some good, "satisficing" (Simon, 1969) level of performance
quickly? Attainment of the optimum is much less important for
complex systems. It would be nice to be perfect; meanwhile, we can only strive
to improve. In the next chapter we watch the genetic algorithm for these qualities;
here we outline some important differences between genetic algorithms and
more traditional methods.
HOW ARE GENETIC ALGORITHMS DIFFERENT FROM
TRADITIONAL METHODS?
In order for genetic algorithms to surpass their more traditional cousins in the
quest for robustness, GAs must differ in some very fundamental ways. Genetic
algorithms are different from more normal optimization and search procedures
in four ways:
1. GAs work with a coding of the parameter set, not the parameters themselves.
2. GAs search from a population of points, not a single point.
3. GAs use payoff (objective function) information, not derivatives or other
auxiliary knowledge.
4. GAs use probabilistic transition rules, not deterministic rules.
Genetic algorithms require the natural parameter set of the optimization
problem to be coded as a finite-length string over some finite alphabet. As an
example, consider the optimization problem posed in Fig. 1.5. We wish to maximize
the function f(x) = x² on the integer interval [0, 31]. With more traditional
methods we would be tempted to twiddle with the parameter x, turning it like
the vertical hold knob on a television set, until we reached the highest objective
function value. With GAs, the first step of our optimization process is to code the
parameter x as a finite-length string. There are many ways to code the x parameter,
and Chapter 3 examines some of these in detail. At the moment, let's consider
an optimization problem where the coding comes a bit more naturally.

Consider the black box switching problem illustrated in Fig. 1.6. This problem
concerns a black box device with a bank of five input switches. For every
setting of the five switches, there is an output signal f, mathematically f = f(s),
FIGURE 1.5 A simple function optimization example, the function f(x) = x² on
the integer interval [0, 31].
where s is a particular setting of the five switches. The objective of the problem
is to set the switches to obtain the maximum possible f value. With other methods
of optimization we might work directly with the parameter set (the switch
settings) and toggle switches from one setting to another using the transition
rules of our particular method. With genetic algorithms, we first code the
switches as a finite-length string. A simple code can be generated by considering
a string of five 1's and 0's where each of the five switches is represented by a 1 if
the switch is on and a 0 if the switch is off. With this coding, the string 11110
codes the setting where the first four switches are on and the fifth switch is off.
Some of the codings introduced later will not be so obvious, but at this juncture
we acknowledge that genetic algorithms use codings. Later it will be apparent
that genetic algorithms exploit coding similarities in a very general way. As a
result, they are largely unconstrained by the limitations of other methods (continuity,
derivative existence, unimodality, and so on).

FIGURE 1.6 A black box optimization problem with five on-off switches illustrates
the idea of a coding and a payoff measure. Genetic algorithms only require
these two things: they don't need to know the workings of the black box.
In many optimization methods, we move gingerly from a single point in the
decision space to the next using some transition rule to determine the next point.
This point-to-point method is dangerous because it is a perfect prescription for
locating false peaks in multimodal (many-peaked) search spaces. By contrast, GAs
work from a rich database of points simultaneously (a population of strings),
climbing many peaks in parallel; thus, the probability of finding a false peak is
reduced over methods that go point to point. As an example, let's consider our
black box optimization problem (Fig. 1.6) again. Other techniques for solving
this problem might start with one set of switch settings, apply some transition
rules, and generate a new trial switch setting. A genetic algorithm starts with a
population of strings and thereafter generates successive populations of strings.
For example, in the five-switch problem, a random start using successive coin
flips (head = 1, tail = 0) might generate the initial population of size n = 4
(small by genetic algorithm standards):
01101
11000
01000
10011
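A random start like this is easy to sketch in Python (an illustrative sketch, not code from the text; the function name random_population is our own):

```python
import random

def random_population(n, l, rng=random):
    """Build n binary strings of length l, one fair coin flip per bit."""
    # heads = 1, tails = 0, as in the text's coin-toss procedure
    return ["".join(rng.choice("01") for _ in range(l)) for _ in range(n)]

population = random_population(n=4, l=5)
```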
After this start, successive populations are generated using the genetic algorithm.
By working from a population of well-adapted diversity instead of a single point,
the genetic algorithm adheres to the old adage that there is safety in numbers;
we will soon see how this parallel flavor contributes to a genetic algorithm's
robustness.
Many search techniques require much auxiliary information in order to work
properly. For example, gradient techniques need derivatives (calculated analytically
or numerically) in order to be able to climb the current peak, and other
local search procedures like the greedy techniques of combinatorial optimization
(Lawler, 1976; Syslo, Deo, and Kowalik, 1983) require access to most if not all
tabular parameters. By contrast, genetic algorithms have no need for all this auxiliary
information: GAs are blind. To perform an effective search for better and
better structures, they only require payoff values (objective function values) associated
with individual strings. This characteristic makes a GA a more canonical
method than many search schemes. After all, every search problem has a metric
(or metrics) relevant to the search; however, different search problems have
vastly different forms of auxiliary information. Only if we refuse to use this auxiliary
information can we hope to develop the broadly based schemes we desire.
On the other hand, the refusal to use specific knowledge when it does exist can
place an upper bound on the performance of an algorithm when it goes head to
head with methods designed for that problem. Chapter 5 examines ways to use
nonpayoff information in so-called knowledge-directed genetic algorithms; however,
at this juncture we stress the importance of the blindness assumption to
pure genetic algorithm robustness.
Unlike many methods, GAs use probabilistic transition rules to guide their
search. To persons familiar with deterministic methods this seems odd, but the
use of probability does not suggest that the method is some simple random
search; this is not decision making at the toss of a coin. Genetic algorithms use
random choice as a tool to guide a search toward regions of the search space with
likely improvement.

Taken together, these four differences—direct use of a coding, search from a
population, blindness to auxiliary information, and randomized operators—contribute
to a genetic algorithm's robustness and resulting advantage over other
more commonly used techniques. The next section introduces a simple three-operator
genetic algorithm.
A SIMPLE GENETIC ALGORITHM
The mechanics of a simple genetic algorithm are surprisingly simple, involving
nothing more complex than copying strings and swapping partial strings. The
explanation of why this simple process works is much more subtle and powerful.
Simplicity of operation and power of effect are two of the main attractions of the
genetic algorithm approach.

The previous section pointed out how genetic algorithms process populations
of strings. Recalling the black box switching problem, remember that the
initial population had four strings:

01101
11000
01000
10011
Also recall that this population was chosen at random through 20 successive flips
of an unbiased coin. We now must define a set of simple operations that take this
initial population and generate successive populations that (we hope) improve
over time.
A simple genetic algorithm that yields good results in many practical prob-
lems is composed of three operators:
1. Reproduction
2. Crossover
3. Mutation
Reproduction is a process in which individual strings are copied according
to their objective function values, f (biologists call this function the fitness function).
Intuitively, we can think of the function f as some measure of profit, utility,
or goodness that we want to maximize. Copying strings according to their fitness
values means that strings with a higher value have a higher probability of contributing
one or more offspring in the next generation. This operator, of course,
is an artificial version of natural selection, a Darwinian survival of the fittest
TABLE 1.1 Sample Problem Strings and Fitness Values

No.     String    Fitness    % of Total
1       01101     169        14.4
2       11000     576        49.2
3       01000      64         5.5
4       10011     361        30.9
Total             1170      100.0
among string creatures. In natural populations fitness is determined by a creature's
ability to survive predators, pestilence, and the other obstacles to adulthood
and subsequent reproduction. In our unabashedly artificial setting, the
objective function is the final arbiter of the string-creature's life or death.

The reproduction operator may be implemented in algorithmic form in a
number of ways. Perhaps the easiest is to create a biased roulette wheel where
each current string in the population has a roulette wheel slot sized in proportion
to its fitness. Suppose the sample population of four strings in the black box
problem has objective or fitness function values f as shown in Table 1.1 (for now
we accept these values as the output of some unknown and arbitrary black box—
later we will examine a function and coding that generate these same values).
Summing the fitness over all four strings, we obtain a total of 1170. The
percentage of population total fitness is also shown in the table. The corresponding
weighted roulette wheel for this generation's reproduction is shown in Fig.
1.7. To reproduce, we simply spin the weighted roulette wheel thus defined four
times. For the example problem, string number 1 has a fitness value of 169, which
represents 14.4 percent of the total fitness. As a result, string 1 is given 14.4
percent of the biased roulette wheel, and each spin turns up string 1 with probability
0.144.

FIGURE 1.7 Simple reproduction allocates offspring strings using a roulette
wheel with slots sized according to fitness. The sample wheel is sized for the
problem of Tables 1.1 and 1.2.

Each time we require another offspring, a simple spin of the
weighted roulette wheel yields the reproduction candidate. In this way, more
highly fit strings have a higher number of offspring in the succeeding generation.
Once a string has been selected for reproduction, an exact replica of the string
is made. This string is then entered into a mating pool, a tentative new population,
for further genetic operator action.
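The biased roulette wheel can be sketched as follows (a minimal illustration using the fitness values of Table 1.1; roulette_select is our own name, not the book's):

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """One spin of a wheel whose slots are sized in proportion to fitness."""
    spin = rng.uniform(0, sum(fitnesses))   # a point on the wheel's rim
    cumulative = 0.0
    for string, fit in zip(population, fitnesses):
        cumulative += fit                   # walk the wheel slot by slot
        if spin <= cumulative:
            return string
    return population[-1]                   # guard against round-off

population = ["01101", "11000", "01000", "10011"]
fitnesses = [169, 576, 64, 361]             # the Table 1.1 values
mating_pool = [roulette_select(population, fitnesses) for _ in population]
```

Over many spins, string 2 (which holds 49.2 percent of the wheel) turns up roughly half the time.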
After reproduction, simple crossover (Fig. 1.8) may proceed in two steps.
First, members of the newly reproduced strings in the mating pool are mated at
random. Second, each pair of strings undergoes crossing over as follows: an integer
position k along the string is selected uniformly at random between 1 and
the string length less one [1, l - 1]. Two new strings are created by swapping all
characters between positions k + 1 and l inclusively. For example, consider
strings A₁ and A₂ from our example initial population:

A₁ = 0 1 1 0 | 1
A₂ = 1 1 0 0 | 0

Suppose in choosing a random number between 1 and 4, we obtain a k = 4 (as
indicated by the separator symbol |). The resulting crossover yields two new
strings where the prime (′) means the strings are part of the new generation:

A₁′ = 0 1 1 0 0
A₂′ = 1 1 0 0 1
FIGURE 1.8 A schematic of simple crossover shows the alignment of two
strings and the partial exchange of information, using a cross site chosen at
random.
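In code, simple crossover is a swap of string tails (an illustrative sketch; with k = 4 it reproduces the A₁, A₂ example above):

```python
import random

def crossover(a, b, k=None, rng=random):
    """Swap all characters after site k (k uniform on [1, l-1] if not given)."""
    if k is None:
        k = rng.randint(1, len(a) - 1)
    return a[:k] + b[k:], b[:k] + a[k:]

a1_prime, a2_prime = crossover("01101", "11000", k=4)
# a1_prime == "01100" and a2_prime == "11001", as in the text
```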
The mechanics of reproduction and crossover are surprisingly simple, involving
random number generation, string copies, and some partial string exchanges.
Nonetheless, the combined emphasis of reproduction and the structured, though
randomized, information exchange of crossover give genetic algorithms much of
their power. At first this seems surprising. How can two such simple (and computationally
trivial) operators result in anything useful, let alone a rapid and robust
search mechanism? Furthermore, doesn't it seem a little strange that chance
should play such a fundamental role in a directed search process? We will examine
a partial answer to the first of these two questions in a moment; the answer
to the second question was well recognized by the mathematician J. Hadamard
(1949, p. 29):

We shall see a little later that the possibility of imputing discovery to
pure chance is already excluded. ... On the contrary, that there is an
intervention of chance but also a necessary work of unconsciousness,
the latter implying and not contradicting the former. ... Indeed, it is obvious
that invention or discovery, be it in mathematics or anywhere else,
takes place by combining ideas.

Hadamard suggests that even though discovery is not a result—cannot be a result—of
pure chance, it is almost certainly guided by directed serendipity. Furthermore,
Hadamard hints that a proper role for chance in a more humanlike
discovery mechanism is to cause the juxtaposition of different notions. It is interesting
that genetic algorithms adopt Hadamard's mix of direction and chance
in a manner that efficiently builds new solutions from the best partial solutions
of previous trials.
To see this, consider a population of n strings (perhaps the four-string population
for the black box problem) over some appropriate alphabet, coded so
that each is a complete idea or prescription for performing a particular task (in
this case, each string is one complete switch-setting idea). Substrings within each
string (idea) contain various notions of what is important or relevant to the task.
Viewed in this way, the population contains not just a sample of n ideas; rather,
it contains a multitude of notions and rankings of those notions for task performance.
Genetic algorithms ruthlessly exploit this wealth of information by (1)
reproducing high-quality notions according to their performance and (2) crossing
these notions with many other high-performance notions from other strings.
Thus, the action of crossover with previous reproduction speculates on new ideas
constructed from the high-performance building blocks (notions) of past trials.

In passing, we note that despite the somewhat fuzzy definition of a notion, we
have not limited a notion to simple linear combinations of single features or pairs
of features. Biologists have long recognized that evolution must efficiently process
the epistasis (positionwise nonlinearity) that arises in nature. In a similar
manner, the notion processing of genetic algorithms must effectively process notions
even when they depend upon their component features in highly nonlinear
and complex ways.
Exchanging of notions to form new ideas is appealing intuitively, if we think
in terms of the process of innovation. What is an innovative idea? As Hadamard
suggests, most often it is a juxtaposition of things that have worked well in the
past. In much the same way, reproduction and crossover combine to search potentially
pregnant new ideas. This experience of emphasis and crossing is analogous
to the human interaction many of us have observed at a trade show or
scientific conference. At a widget conference, for example, various widget experts
from around the world gather to discuss the latest in widget technology.
After the lecture sessions, they all pair off around the bar to exchange widget
stories. Well-known widget experts, of course, are in greater demand and exchange
more ideas, thoughts, and notions with their lesser known widget colleagues.
When the show ends, the widget people return to their widget
laboratories to try out a surfeit of widget innovations. The process of reproduction
and crossover in a genetic algorithm is this kind of exchange. High-performance
notions are repeatedly tested and exchanged in the search for better and
better performance.
If reproduction according to fitness combined with crossover gives genetic
algorithms the bulk of their processing power, what then is the purpose of the
mutation operator? Not surprisingly, there is much confusion about the role of
mutation in genetics (both natural and artificial). Perhaps it is the result of too
many B movies detailing the exploits of mutant eggplants that consume mass
quantities of Tokyo or Chicago, but whatever the cause for the confusion, we find
that mutation plays a decidedly secondary role in the operation of genetic algorithms.
Mutation is needed because, even though reproduction and crossover
effectively search and recombine extant notions, occasionally they may become
overzealous and lose some potentially useful genetic material (1's or 0's at particular
locations). In artificial genetic systems, the mutation operator protects
against such an irrecoverable loss. In the simple GA, mutation is the occasional
(with small probability) random alteration of the value of a string position. In the
binary coding of the black box problem, this simply means changing a 1 to a 0
and vice versa. By itself, mutation is a random walk through the string space.
When used sparingly with reproduction and crossover, it is an insurance policy
against premature loss of important notions.
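Bitwise mutation is equally brief in code (an illustrative sketch; the rate 0.001 matches the hand simulation later in the chapter):

```python
import random

def mutate(string, p_m=0.001, rng=random):
    """Flip each bit independently with small probability p_m."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[bit] if rng.random() < p_m else bit for bit in string)

mutated = mutate("11011")   # almost always returned unchanged at p_m = 0.001
```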
To see that the mutation operator plays a secondary role in the simple GA, we simply
note that the frequency of mutation needed to obtain good results in empirical
genetic algorithm studies is on the order of one mutation per thousand bit (position)
transfers. Mutation rates are similarly small (or smaller) in natural populations,
leading us to conclude that mutation is appropriately considered as a
secondary mechanism of genetic algorithm adaptation.
Other genetic operators and reproductive plans have been abstracted from
the study of biological example. However, the three examined in this section,
reproduction, simple crossover, and mutation, have proved to be both computationally
simple and effective in attacking a number of important optimization
problems. In the next section, we perform a hand simulation of the simple genetic
algorithm to demonstrate both its mechanics and its power.
GENETIC ALGORITHMS AT WORK—A SIMULATION BY HAND
Let's apply our simple genetic algorithm to a particular optimization problem step
by step. Consider the problem of maximizing the function f(x) = x², where x is
permitted to vary between 0 and 31, a function displayed earlier as Fig. 1.5. To
use a genetic algorithm we must first code the decision variables of our problem
as some finite-length string. For this problem, we will code the variable x simply
as a binary unsigned integer of length 5. Before we proceed with the simulation,
let's briefly review the notion of a binary integer. As decadigited creatures, we
have little problem handling base 10 integers and arithmetic. For example, the
five-digit number 53,095 may be thought of as

5·10⁴ + 3·10³ + 0·10² + 9·10¹ + 5·10⁰ = 53,095.

In base 2 arithmetic, we of course only have two digits to work with, 0 and 1,
and as an example the number 10011 decodes to the base 10 number

1·2⁴ + 0·2³ + 0·2² + 1·2¹ + 1·2⁰ = 16 + 2 + 1 = 19.
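The same decoding is easy to express in code (a sketch; Python's built-in int(s, 2) performs the identical computation):

```python
def decode(string):
    """Decode an unsigned binary string to its base 10 value."""
    value = 0
    for bit in string:
        value = value * 2 + int(bit)   # shift left one place, add the new bit
    return value

decode("10011")   # 16 + 2 + 1 = 19
decode("11111")   # 31, the largest five-bit value
```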
With a five-bit (binary digit) unsigned integer we can obtain numbers between
0 (00000) and 31 (11111). With a well-defined objective function and coding,
we now simulate a single generation of a genetic algorithm with reproduction,
crossover, and mutation.
To start off, we select an initial population at random. We select a population
of size 4 by tossing a fair coin 20 times. We can skip this step by using the initial
population created in this way earlier for the black box switching problem. Looking
at this population, shown on the left-hand side of Table 1.2, we observe that
the decoded x values are presented along with the fitness or objective function
values f(x). To make sure we know how the fitness values f(x) are calculated
from the string representation, let's take a look at the third string of the initial
population, string 01000. Decoding this string as an unsigned binary integer, we
note that there is a single one in the 2³ = 8's position. Hence for string 01000
we obtain x = 8. To calculate the fitness or objective function we simply square
the x value and obtain the resulting fitness value f(x) = 64. Other x and f(x)
values may be obtained similarly.

You may notice that the fitness or objective function values are the same as
the black box values (compare Tables 1.1 and 1.2). This is no coincidence, and
the black box optimization problem was well represented by the particular function,
f(x), and coding we are now using. Of course, the genetic algorithm need
not know any of this; it is just as happy to optimize some arbitrary switching
function (or any other finite coding and function for that matter) as some polynomial
function with straightforward binary coding. This discussion simply reinforces
one of the strengths of the genetic algorithm: by exploiting similarities in
codings, genetic algorithms can deal effectively with a broader class of functions
than can many other procedures.
A generation of the genetic algorithm begins with reproduction. We select
the mating pool of the next generation by spinning the weighted roulette wheel
TABLE 1.2 A Genetic Algorithm by Hand

         Initial
         Population    x Value      f(x)    pselect_i   Expected    Actual Count
         (Randomly     (Unsigned    = x²    (f_i/Σf)    Count       (from Roulette
No.      Generated)    Integer)                         (f_i/f̄)     Wheel)
1        01101         13           169     0.14        0.58        1
2        11000         24           576     0.49        1.97        2
3        01000          8            64     0.06        0.22        0
4        10011         19           361     0.31        1.23        1
Sum                                 1170    1.00        4.00        4
Average                              293    0.25        1.00        1
Max                                  576    0.49        1.97        2
(shown in Fig. 1.7) four times. Actual simulation of this process using coin tosses
has resulted in string 1 and string 4 receiving one copy in the mating pool, string
2 receiving two copies, and string 3 receiving no copies, as shown in the center
of Table 1.2. Comparing this with the expected number of copies (n·pselect_i) we
have obtained what we should expect: the best get more copies, the average stay
even, and the worst die off.
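The expected copy counts are easy to verify (a small check using the Table 1.2 fitness values; n·pselect_i reduces to f_i divided by the population average f̄):

```python
fitnesses = [169, 576, 64, 361]               # Table 1.2 fitness values
average = sum(fitnesses) / len(fitnesses)     # f-bar = 292.5
expected = [f / average for f in fitnesses]   # roughly 0.58, 1.97, 0.22, 1.23
# rounding each to the nearest integer gives the actual counts 1, 2, 0, 1
```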
With an active pool of strings looking for mates, simple crossover proceeds
in two steps: (1) strings are mated randomly, using coin tosses to pair off the
happy couples, and (2) mated string couples cross over, using coin tosses to
select the crossing sites. Referring again to Table 1.2, random choice of mates has
selected the second string in the mating pool to be mated with the first. With a
crossing site of 4, the two strings 01101 and 11000 cross and yield two new
strings 01100 and 11001. The remaining two strings in the mating pool are
crossed at site 2; the resulting strings may be checked in the table.

The last operator, mutation, is performed on a bit-by-bit basis. We assume
that the probability of mutation in this test is 0.001. With 20 transferred bit positions
we should expect 20 × 0.001 = 0.02 bits to undergo mutation during a
given generation. Simulation of this process indicates that no bits undergo mutation
for this probability value. As a result, no bit positions are changed from 0
to 1 or vice versa during this generation.
Following reproduction, crossover, and mutation, the new population is
ready to be tested. To do this, we simply decode the new strings created by the
simple genetic algorithm and calculate the fitness function values from the x
values thus decoded. The results of a single generation of the simulation are
shown at the right of Table 1.2. While drawing concrete conclusions from a single
trial of a stochastic process is, at best, a risky business, we start to see how genetic
algorithms combine high-performance notions to achieve better performance. In
the table, note how both the maximal and average performance have improved
in the new population. The population average fitness has improved from 293 to
TABLE 1.2 (Continued)

Mating Pool after      Mate         Crossover Site   New          x       f(x)
Reproduction           (Randomly    (Randomly        Population   Value   = x²
(Cross Site Shown)     Selected)    Selected)
0110|1                 2            4                01100        12      144
1100|0                 1            4                11001        25      625
11|000                 4            2                11011        27      729
10|011                 3            2                10000        16      256
Sum                                                                      1754
Average                                                                   439
Max                                                                       729

Notes:
1) Initial population chosen by four repetitions of five coin tosses where heads = 1, tails = 0.
2) Reproduction performed through 1 part in 8 simulation of roulette wheel selection (three
coin tosses).
3) Crossover performed through binary decoding of 2 coin tosses (TT = 00₂ = 0 = cross site 1,
HH = 11₂ = 3 = cross site 4).
4) Crossover probability assumed to be unity, p_c = 1.0.
5) Mutation probability assumed to be 0.001, p_m = 0.001. Expected mutations = 5·4·0.001 =
0.02. No mutations expected during a single generation. None simulated.
439 in one generation. The maximum fitness has increased from 576 to 729 during
that same period. Although random processes help cause these happy circumstances,
we start to see that this improvement is no fluke. The best string of the
first generation (11000) receives two copies because of its high, above-average
performance. When this combines at random with the next highest string
(10011) and is crossed at location 2 (again at random), one of the resulting
strings (11011) proves to be a very good choice indeed.
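The whole hand-simulated generation can be condensed into one routine (a compact sketch under the same assumptions, p_c = 1.0 with mutation omitted since its expected effect here is only 0.02 bit flips; it uses Python's random.choices for the roulette wheel rather than the text's coin-toss simulation, and generation is our own name):

```python
import random

def fitness(string):
    return int(string, 2) ** 2            # f(x) = x squared on the decoded value

def generation(population, rng):
    """Roulette-wheel reproduction, random pairing, single-point crossover."""
    fits = [fitness(s) for s in population]
    pool = rng.choices(population, weights=fits, k=len(population))
    rng.shuffle(pool)                     # pair off the mating pool at random
    offspring = []
    for a, b in zip(pool[::2], pool[1::2]):
        k = rng.randint(1, len(a) - 1)    # cross site uniform on [1, l-1]
        offspring += [a[:k] + b[k:], b[:k] + a[k:]]
    return offspring

rng = random.Random(0)
new_pop = generation(["01101", "11000", "01000", "10011"], rng)
```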
This event is an excellent illustration of the ideas and notions analogy developed
in the previous section. In this case, the resulting good idea is the combination
of two above-average notions, namely the substrings 11--- and ---11.
Although the argument is still somewhat heuristic, we start to see how genetic
algorithms effect a robust search. In the next section, we expand our understanding
of these concepts by analyzing genetic algorithms in terms of schemata or
similarity templates.
The intuitive viewpoint developed thus far has much appeal. We have compared
the genetic algorithm with certain human search processes commonly
called innovative or creative. Furthermore, hand simulation of the simple genetic
algorithm has given us some confidence that indeed something interesting is
going on here. Yet, something is missing. What is being processed by genetic
algorithms and how do we know whether processing it (whatever it is) will lead
to optimal or near optimal results in a particular problem? Clearly, as scientists,
engineers, and business managers we need to understand the what and the how
of genetic algorithm performance.
To obtain this understanding, we examine the raw data available for any
search procedure and discover that we can search more effectively if we exploit
important similarities in the coding we use. This leads us to develop the important
notion of a similarity template, or schema. This in turn leads us to a keystone
of the genetic algorithm approach, the building block hypothesis.
GRIST FOR THE SEARCH MILL—IMPORTANT SIMILARITIES
For much too long we have ignored a fundamental question. In a search process
given only payoff data (fitness values), what information is contained in a population
of strings and their objective function values to help guide a directed
search for improvement? To ask this question more clearly, consider the strings
and fitness values originally displayed in Table 1.1 from the simulation of the
previous section (the black box problem) and gathered below for convenience:

String    Fitness
01101     169
11000     576
01000      64
10011     361
What information is contained in this population to guide a directed search for
improvement? On the face of it, there is not very much: four independent samples
of different strings with their fitness values. As we stare at the page, however,
quite naturally we start scanning up and down the string column, and we notice
certain similarities among the strings. Exploring these similarities in more depth,
we notice that certain string patterns seem highly associated with good performance.
The longer we stare at the strings and their fitness values, the greater is
the temptation to experiment with these high fitness associations. It seems perfectly
reasonable to play mix and match with some of the substrings that are
highly correlated with past success. For example, in the sample population, the
strings starting with a 1 seem to be among the best. Might this be an important
ingredient in optimizing this function? Certainly with our function (f(x) = x²)
and our coding (a five-bit unsigned integer) we know it is (why is this true?).

But what are we doing here? Really, two separate things. First, we are seeking
similarities among strings in the population. Second, we are looking for causal
relationships between these similarities and high fitness. In so doing, we admit a
wealth of new information to help guide a search. To see how much and precisely
what information we admit, let us consider the important concept of a schema
(plural, schemata), or similarity template.
SIMILARITY TEMPLATES (SCHEMATA)
In some sense we are no longer interested in strings as strings alone. Since important
similarities among highly fit strings can help guide a search, we question
how one string can be similar to its fellow strings. Specifically we ask, in what
ways is a string a representative of other string classes with similarities at certain
string positions? The framework of schemata provides the tool to answer these
questions.

A schema (Holland, 1968, 1975) is a similarity template describing a subset
of strings with similarities at certain string positions. For this discussion, let us
once again limit ourselves without loss of generality to the binary alphabet {0, 1}.
We motivate a schema most easily by appending a special symbol to this alphabet;
we add the * or don't care symbol. With this extended alphabet we can now
create strings (schemata) over the ternary alphabet {0, 1, *}, and the meaning of
the schema is clear if we think of it as a pattern matching device: a schema
matches a particular string if at every location in the schema a 1 matches a 1 in
the string, a 0 matches a 0, or a * matches either. As an example, consider the
strings and schemata of length 5. The schema *0000 matches two strings, namely
{10000, 00000}. As another example, the schema *111* describes a subset with
four members {01110, 01111, 11110, 11111}. As one last example, the schema
0*1** matches any of the eight strings of length 5 that begin with a 0 and have a
1 in the third position. As you can start to see, the idea of a schema gives us a
powerful and compact way to talk about all the well-defined similarities among
finite-length strings over a finite alphabet. We should emphasize that the * is only
a metasymbol (a symbol about other symbols); it is never explicitly processed
by the genetic algorithm. It is simply a notational device that allows description
of all possible similarities among strings of a particular length and alphabet.
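The pattern-matching role of a schema is captured by a two-line predicate (an illustrative sketch; matches is our own name):

```python
def matches(schema, string):
    """True if every schema position is * or equals the string's character."""
    return all(s in ("*", c) for s, c in zip(schema, string))

matches("*111*", "01110")   # True: one of the subset's four members
matches("0*1**", "01000")   # False: the third position must be a 1
```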
Counting the total number of possible schemata is an enlightening exercise.
In the previous example, with ℓ = 5, we note there are 3·3·3·3·3 = 3^5 = 243
different similarity templates because each of the five positions may be a 0, 1,
or *. In general, for alphabets of cardinality (number of alphabet characters) k
there are (k + 1)^ℓ schemata. At first blush, it appears that schemata are making
the search more difficult. For an alphabet with k elements there are only (only?) k^ℓ
different strings of length ℓ. Why consider the (k + 1)^ℓ schemata and enlarge
the space of concern? Put another way, the length 5 example now has only 2^5 =
32 different alternative strings. Why make matters more difficult by considering
3^5 = 243 schemata? In fact, the reasoning discussed in the previous section makes
things easier. Do you recall glancing up and down the list of four strings and
fitness values and trying to figure out what to do next? We recognized that if we
considered the strings separately, then we only had four pieces of information;
Chapter 1 / A Gentle Introduction to Genetic Algorithms
however, when we considered the strings, their fitness values, and the similarities
among the strings in the population, we admitted a wealth of new information to
help direct our search. How much information do we admit by considering the
similarities? The answer to this question is related to the number of unique sche-
mata contained in the population. To count this quantity exactly requires knowl-
edge of the strings in a particular population. To get a bound on the number of
schemata in a particular population, we first count the number of schemata con-
tained in an individual string, and then we get an upper bound on the total num-
ber of schemata in the population.
To see this, consider a single string of length 5: 11111, for example. This
string is a member of 2^5 schemata because each position may take on its actual
value or a don't care symbol. In general, a particular string contains 2^ℓ schemata.
As a result, a population of size n contains somewhere between 2^ℓ and n·2^ℓ
schemata, depending upon the population diversity. This fact verifies our earlier
intuition. The original motivation for considering important similarities was to get
more information to help guide our search. The counting argument shows that a
wealth of information about important similarities is indeed contained in even
moderately sized populations. We will examine how genetic algorithms effec-
tively exploit this information. At this juncture, some parallel processing appears
to be needed if we are to make use of all this information in a timely fashion.
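The bound of 2^ℓ to n·2^ℓ schemata is simple to check by enumeration. A Python sketch (the sample population below is illustrative, not taken from the text):

```python
from itertools import product

def schemata_of(string):
    """The 2**len(string) schemata that a single string belongs to:
    each position either keeps its bit or is generalized to '*'."""
    return {''.join(s) for s in product(*[(c, '*') for c in string])}

# An illustrative population of n = 4 strings of length l = 5
population = ["01101", "11000", "01000", "10011"]
unique = set().union(*(schemata_of(s) for s in population))

print(len(schemata_of("01101")))          # 32 schemata per string (2**5)
print(2**5 <= len(unique) <= 4 * 2**5)    # True: between 2**l and n * 2**l
```

Because strings share schemata (every string matches *****, for example), the unique count falls strictly between the two bounds for any diverse population.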
These counting arguments are well and good, but where does this all lead?
More pointedly, of the 2^ℓ to n·2^ℓ schemata contained in a population, how many
are actually processed in a useful manner by the genetic algorithm? To obtain the
answer to this question, we consider the effect of reproduction, crossover, and
mutation on the growth or decay of important schemata from generation to gen-
eration. The effect of reproduction on a particular schema is easy to determine;
since more highly fit strings have higher probabilities of selection, on average we
give an ever increasing number of samples to the observed best similarity pat-
terns (this is a good thing to do, as is shown in the next chapter); however,
reproduction alone samples no new points in the space. What then happens to a
particular schema when crossover is introduced? Crossover leaves a schema un-
scathed if it does not cut the schema, but it may disrupt a schema when it does.
For example, consider the two schemata 1***0 and **11*. The first is likely to be
disrupted by crossover, whereas the second is relatively unlikely to be destroyed.
As a result, schemata of short defining length are left alone by crossover and
reproduced at a good sampling rate by the reproduction operator. Mutation at nor-
mal, low rates does not disrupt a particular schema very frequently, and we are
left with a startling conclusion. Highly fit, short-defining-length schemata (we call
them building blocks) are propagated generation to generation by giving expo-
nentially increasing samples to the observed best; all this goes on in parallel with no
special bookkeeping or special memory other than our population of strings.
In the next chapter we will count how many schemata are processed usefully in
each generation. It turns out that the number is something like n^3. This compares
favorably with the number of function evaluations (n). Because this processing
leverage is so important (and apparently unique to genetic algorithms), we give
it a special name, implicit parallelism.
LEARNING THE LINGO
The power behind the simple operations of our genetic algorithm is at least in-
tuitively clearer if we think of building blocks. Some questions remain: How do
we know that building blocks lead to improvement? Why is it a near optimal
strategy to give exponentially increasing samples to the best? How can we cal-
culate the number of schemata usefully processed by the genetic algorithm?
These questions are answered fully in the next chapter, but first we need to mas-
ter the terminology used by researchers who work with genetic algorithms. Be-
cause genetic algorithms are rooted in both natural genetics and computer
science, the terminology used in the GA literature is an unholy mix of the natural
and the artificial. Until now we have focused on the artificial side of the genetic
algorithm's ancestry and talked about strings, alphabets, string positions, and the
like. We review the correspondence between these terms and their natural coun-
terparts to connect with the growing GA literature and also to permit our own
occasional slip of a natural utterance or two.
Roughly speaking, the strings of artificial genetic systems are analogous to
chromosomes in biological systems. In natural systems, one or more chromo-
somes combine to form the total genetic prescription for the construction and
operation of some organism. In natural systems the total genetic package is called
the genotype. In artificial genetic systems the total package of strings is called a
structure (in the early chapters of this book, the structure will consist of a single
string, so the text refers to strings and structures interchangeably until it is nec-
essary to differentiate between them). In natural systems, the organism formed
by the interaction of the total genetic package with its environment is called the
phenotype. In artificial genetic systems, the structures decode to form a partic-
ular parameter set, solution alternative, or point (in the solution space). The
designer of an artificial genetic system has a variety of alternatives for coding
both numeric and nonnumeric parameters. We will confront codings and coding
principles in later chapters; for now, we stick to our consideration of GA and
natural terminology.
In natural terminology, we say that chromosomes are composed of genes,
which may take on some number of values called alleles. In genetics, the position
of a gene (its locus) is identified separately from the gene's function. Thus, we
can talk of a particular gene, for example an animal's eye color gene, its locus,
position 10, and its allele value, blue eyes. In artificial genetic search we say that
strings are composed of features or detectors, which take on different values.
Features may be located at different positions on the string. The correspondence
between natural and artificial terminology is summarized in Table 1.3.
Thus far, we have not distinguished between a gene (a particular character)
and its locus (its position); the position of a bit in a string has determined its
meaning (how it decodes) uniformly throughout a population and throughout
time. For example, the string 10000 is decoded as a binary unsigned integer 16
(base 10) because implicitly the 1 is in the 16's place. It is not necessary to limit
codings like this, however. A later chapter presents more advanced structures
that treat locus and gene separately.
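The fixed positional decoding just described can be written out directly; a minimal Python sketch (the function name is ours):

```python
def decode(bits: str) -> int:
    """Decode a bit string as an unsigned binary integer with the
    leftmost bit most significant, as in the text's 10000 -> 16 example."""
    value = 0
    for b in bits:
        value = 2 * value + int(b)   # shift left, bring in the next bit
    return value

print(decode("10000"))  # 16
print(decode("11111"))  # 31
```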
TABLE 1.3 Comparison of Natural and GA Terminology
Natural          Genetic Algorithm
chromosome       string
gene             feature, character, or detector
allele           feature value
locus            string position
genotype         structure
phenotype        parameter set, alternative solution,
                 a decoded structure
epistasis        nonlinearity
SUMMARY
This chapter has laid the foundation for understanding genetic algorithms,
their mechanics and their power. We are led to these methods by our search for
robustness; natural systems are robust—efficient and efficacious—as they adapt
to a wide variety of environments. By abstracting nature's adaptation algorithm
of choice in artificial form we hope to achieve similar breadth of performance.
In fact, genetic algorithms have demonstrated their capability in a number of
analytical and empirical studies.
The chapter has presented the detailed mechanics of a simple, three-operator
genetic algorithm. Genetic algorithms operate on populations of strings, with the
string coded to represent some underlying parameter set. Reproduction, cross-
over, and mutation are applied to successive string populations to create new
string populations. These operators are simplicity itself, involving nothing more
complex than random number generation, string copying, and partial string ex-
changing; yet, despite their simplicity, the resulting search performance is wide-
ranging and impressive. Genetic algorithms realize an innovative notion exchange
among strings and thus connect to our own ideas of human search or discovery.
A simulation of one generation of the simple genetic algorithm has helped illus-
trate both the detail and the power of the method.
Four differences separate genetic algorithms from more conventional opti-
mization techniques:
1. Direct manipulation of a coding
2. Search from a population, not a single point
3. Search via sampling, a blind search
4. Search using stochastic operators, not deterministic rules
Genetic algorithms manipulate decision or control variable representations
at the string level to exploit similarities among high-performance strings. Other
methods usually deal with functions and their control variables directly. Because
genetic algorithms operate at the coding level, they are difficult to fool even when
the function may be difficult for traditional schemes.
Genetic algorithms work from a population; many other methods work from
a single point. In this way, GAs find safety in numbers. By maintaining a population
of well-adapted sample points, the probability of reaching a false peak is reduced.
Genetic algorithms achieve much of their breadth by ignoring information
except that concerning payoff. Other methods rely heavily on such information,
and in problems where the necessary information is not available or difficult to
obtain, these other techniques break down. GAs remain general by exploiting
information available in any search problem. Genetic algorithms process similar-
ities in the underlying coding together with information ranking the structures
according to their survival capability in the current environment. By exploiting
such widely available information, GAs may be applied in virtually any problem.
The transition rules of genetic algorithms are stochastic; many other methods
have deterministic transition rules. A distinction exists, however, between the
randomized operators of genetic algorithms and other methods that are simple
random walks. Genetic algorithms use random choice to guide a highly exploi-
tative search. This may seem unusual, using chance to achieve directed results
(the best points), but nature is full of precedent.
We have started a more rigorous appraisal of genetic algorithm performance
through the concept of schemata or similarity templates. A schema is a string
over an extended alphabet, {0, 1, *}, where the 0 and the 1 retain their normal
meaning and the * is a wild card or don't care symbol. This notational device
greatly simplifies the analysis of the genetic algorithm method because it explic-
itly recognizes all the possible similarities in a population of strings. We have
discussed how building blocks—short, high-performance schemata—are com-
bined to form strings with expected higher performance. This occurs because
building blocks are sampled at near optimal rates and recombined via crossover.
Mutation has little effect on these building blocks; like an insurance policy, it
helps prevent the irrecoverable loss of potentially important genetic material.
The simple genetic algorithm studied in this chapter has much to recom-
mend it. In the next chapter, we will analyze its operation more carefully. Follow-
ing this, we will implement the simple GA in a short computer program and
examine some applications in practical problems.
PROBLEMS
1.1. Consider a black box containing eight multiple-position switches. Switches
1 and 2 may be set in any of 16 positions. Switches 3, 4, and 5 are four-position
switches, and switches 6-8 have only two positions. Calculate the number of
unique switch settings possible for this black box device.
1.2. For the black box device of Problem 1.1, design a natural string coding that
uses eight positions, one position for each switch. Count the number of switch
settings represented by your coding and count the number of schemata or simi-
larity templates inherent in your coding.
1.3. For the black box device of Problem 1.1, design a minimal binary coding
for the eight switches and compare the number of schemata in this coding to the
number in your coding from Problem 1.2.
1.4. Consider a binary string of length 11, and consider a schema, 1**
Under crossover with uniform crossover site selection, calculate a lower limit on
the probability of this schema surviving crossover. Calculate survival probabilities
under the same assumptions for the following schemata: ****10"*"**,
MW ore Te9rte", P18 190",
1.5. If the distance between the outermost alleles of a particular schema is called
its defining length δ, derive an approximate expression for the survival proba-
bility of a particular schema of total length ℓ and defining length δ under the
operation of simple crossover.
1.6. Six strings have the following fitness function values: 5, 10, 15, 25, 50, 100.
Under roulette wheel selection, calculate the expected number of copies of each
string in the mating pool if a constant population size, n = 6, is maintained.
1.7. Instead of using roulette wheel selection during reproduction, suppose we
define a copy count for each string, ncount, as follows: ncount_i = f_i/f̄, where f_i
is the fitness of the ith string and f̄ is the average fitness of the population. The
copy count is then used to generate the number of members of the mating pool
by giving the integer part of ncount_i copies to the ith string and an additional
copy with probability equal to the fractional part of ncount_i. For example, with
f_i = 100 and f̄ = 80, string i would receive an ncount_i of 1.25, and thus would
receive one copy with probability 1.0 and another copy with probability 0.25.
Using the string fitness values in Problem 1.6, calculate the expected number of
copies for each of the six strings. Calculate the total number of strings expected
in the gene pool under this form of reproduction.
1.8. The form of reproduction discussed in Problem 1.7 is sometimes called
reproduction with expected number control. In a short essay, explain why this is
so. In what ways are roulette wheel selection and expected number control sim-
ilar? In what ways are they different?
1.9. Suppose the probability of a mutation at a single bit position is 0.1. Calculate
the probability of a 10-bit string surviving mutation without change. Calculate
the probability of a 20-bit string surviving mutation without change. Recalculate
the survival probabilities for both 10- and 20-bit strings when the mutation prob-
ability is 0.01.
1.10. Consider the strings and schemata of length 11. For the following schemata,
calculate the probability of surviving mutation if the probability of mutation is
0.1 at a single bit position: 0949, #111448", #1000010" LL
Recalculate the survival probabilities for a mutation probability p_m = 0.01.
COMPUTER ASSIGNMENTS
A. One of the primitive functions required in doing genetic algorithms on a
computer is the ability to generate pseudorandom numbers. The numbers are
pseudorandom because as von Neumann once said, "Anyone who considers ar-
ithmetical methods of producing random digits is, of course, in a state of sin." As
part of this assignment, go forth and sin some more. Use the random number
generator given in Appendix B to create a program where you generate 1000
random numbers between 0 and 1. Keep track of how many numbers are gen-
erated in each of the four quartiles, 0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1.0, and
compare the actual counts with the expected number. Is the difference within
reasonable limits? How can you quantify whether the difference is reasonable?
B. Suppose you have 10 strings with the following probabilities of selection in
the next generation: 0.1, 0.2, 0.05, 0.15, 0.0, 0.11, 0.07, 0.04, 0.00, 0.12, 0.16. Given
that these are the only possible alternatives, calculate whether the probabilities
are consistent. Write a computer program that simulates roulette wheel selection
for these 10 strings. Spin the wheel 1000 times and keep track of the number of
selections for each string, comparing this number to the expected number of
selections.
C. Write a function that generates a pseudorandom integer between some spec-
ified lower limit and some specified upper limit. Test the program by generating
1000 numbers between 3 and 12. Keep track of the quantity of each number
selected and compare these figures to the expected quantities.
D. Create a procedure that receives two binary strings and a crossing site value,
performs simple crossover, and returns two offspring strings. Test the program
by crossing the following strings of length 10: 1011101011, 0000110100. Try
crossing site values of -3, 1, 6, and 20.
E. Create a function mutation that complements a particular bit value with
specified mutation probability p_m. Test the function by performing 1000 calls to
mutation using mutation probabilities p_m = 0.001, 0.01, 0.1. Compare the real-
ized number of mutations to the expected number.
F. Using the simple crossover operator of Assignment D, repeatedly apply the
crossover operator to strings contained within the following population of size
n = 200 and ℓ = 5:
100 copies of 11100
100 copies of 00011
Perform crossover (p_c = 1.0) for 50 generations without replacement under no
selection. Compare the initial and final distributions of strings. Also compare the
expected quantity of each string to the realized quantity in generation 50.

Genetic Algorithms
Revisited: Mathematical
Foundations
The broad brush of Chapter 1 painted an accurate, if somewhat crude, picture of
genetic algorithms and their mechanics and power. Perhaps these brush strokes
appeal to your own sense of human discovery and search. That somehow a reg-
ular though randomized procedure can achieve some of the breadth and intuitive
flair of human search seems almost too good to be true. That this discovery pro-
cedure should mirror the natural processes that created the species possessing
the procedure is a recursion of which Gödel, Escher, or Bach (Hofstadter, 1979)
could each have been proud. Despite their intuitive appeal, and despite their
symmetry, it is crucial that we back these fuzzy feelings and speculations about
genetic algorithms using cold, mathematical facts.
Actually, we have already begun a more rigorous appraisal of GAs. Toward
the end of the last chapter, the fundamental concept of a schema or similarity
template was introduced. Quantitatively, we found that there are indeed a large
number of similarities to exploit in a population of strings. Intuitively, we saw
how genetic algorithms exploit in parallel the many similarities contained in
building blocks or short, high-performance schemata. In this chapter, we make
these observations more rigorous by doing several things. First, we count the
schemata represented within a population of strings and consider which grow
and which decay during any given generation. To do this, we consider the effect
of reproduction, crossover, and mutation on a particular schema. This analysis
leads to the fundamental theorem of genetic algorithms that quantifies these
growth and decay rates more precisely; it also points to the mathematical form
Chapter 2 / Genetic Algorithms Revisited: Mathematical Foundations
of this growth. This form is connected to an important and classical problem of
decision theory, the two-armed bandit problem (and its extension, the k-armed
bandit). The mathematical similarity between the optimal (minimal loss) solution
to the two-armed and k-armed bandit and the equation describing the number
of trials given to successive generations of schemata in the simple genetic algo-
rithm is striking. Counting the number of schemata that are usefully processed
by the simple genetic algorithm reveals tremendous leverage in the building
block processing. Finally, we consider an important question: How do we know
that combining building blocks leads to high performance in arbitrary problems?
The question sparks our consideration of some relatively new tools of genetic
algorithm analysis: schema transforms and the minimal deceptive problem.
WHO SHALL LIVE AND WHO SHALL DIE?
THE FUNDAMENTAL THEOREM
The operation of genetic algorithms is remarkably straightforward. After all, we
start with a random population of n strings, copy strings with some bias toward
the best, mate and partially swap substrings, and mutate an occasional bit value
for good measure. Even though genetic algorithms directly manipulate a popu-
lation of strings in this straightforward manner, in Chapter 1 we started to rec-
ognize that this explicit processing of strings really causes the implicit processing
of many schemata during each generation. To analyze the growth and decay of
the many schemata contained in a population, we need some simple notation to
add rigor to the discussion. We consider the operation of reproduction, crossover,
and mutation on the schemata contained in the population.
We consider strings, without loss of generality, to be constructed over the
binary alphabet V = {0, 1}. As a notational convenience, we refer to strings by
capital letters and individual characters by lowercase letters subscripted by their
position. For example, the seven-bit string A = 0111000 may be represented
symbolically as follows:

A = a_1 a_2 a_3 a_4 a_5 a_6 a_7

Here each of the a_i represents a single binary feature or detector (in accordance
with the natural analogy, we sometimes call the a_i's genes), where each feature
may take on a value 1 or 0 (we sometimes call the a_i values alleles). In the particular
string 0111000, a_1 is 0, a_2 is 1, a_3 is 1, etc. It is also possible to have strings where
detectors are not ordered sequentially as in string A; for example, a string A'
could carry the same seven detectors in some permuted order. A later chapter
explores the effect of extending the representation to allow features to be located
in a manner independent of their function. For now, assume that a feature's
function may be determined by its position.
Meaningful genetic search requires a population of strings, and we consider
a population of individual strings A_j, j = 1, 2, ..., n, contained in the population
A(t) at time (or generation) t, where the boldface is used to denote a population.
Besides notation to describe populations, strings, bit positions, and alleles,
we need convenient notation to describe the schemata contained in individual
strings and populations. Let us consider a schema H taken from the three-letter
alphabet V+ = {0, 1, *}. As discussed in the previous chapter, the additional
symbol, the asterisk or star *, is a don't care or wild card symbol which matches
either a 0 or a 1 at a particular position. For example, consider the length 7
schema H = *11*0**. Note that the string A = 0111000 discussed above is an
example of the schema H, because the string alleles a_i match schema positions
h_i at the fixed positions 2, 3, and 5.
From the results of the last chapter, recall that there are 3^ℓ schemata or sim-
ilarity templates defined over a binary string of length ℓ. In general, for alphabets
of cardinality k there are (k + 1)^ℓ schemata. Furthermore, recall that in a string
population with n members there are at most n·2^ℓ schemata contained in a pop-
ulation because each string is itself a representative of 2^ℓ schemata. These count-
ing arguments give us some feel for the magnitude of information being
processed by genetic algorithms; however, to really understand the important
building blocks of future solutions, we need to distinguish between different
types of schemata.
All schemata are not created equal. Some are more specific than others. For
example, the schema 011*1** is a more definite statement about important sim-
ilarity than the schema 0******. Furthermore, certain schemata span more of the
total string length than others. For example, the schema 1****1* spans a larger
portion of the string than the schema 1*1****. To quantify these ideas, we intro-
duce two schema properties: schema order and defining length.
The order of a schema H, denoted by o(H), is simply the number of fixed
positions (in a binary alphabet, the number of 1's and 0's) present in the template.
In the examples above, the order of the schema 011*1** is 4 (symbolically,
o(011*1**) = 4), whereas the order of the schema 0****** is 1.
The defining length of a schema H, denoted by δ(H), is the distance between
the first and last specific string position. For example, the schema 011*1** has
defining length δ = 4 because the last specific position is 5 and the first specific
position is 1, and the distance between them is δ(H) = 5 − 1 = 4. In the other
example (the schema 0******), the defining length is particularly easy to calcu-
late. Since there is only a single fixed position, the first and last specific positions
are the same, and the defining length δ = 0.
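Both properties are one-liners to compute. A Python sketch (the function names are ours):

```python
def order(schema: str) -> int:
    """o(H): the number of fixed (non-'*') positions in the template."""
    return sum(c != '*' for c in schema)

def defining_length(schema: str) -> int:
    """delta(H): distance between the outermost fixed positions,
    0 for a schema with at most one fixed position."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

# The examples from the text:
print(order("011*1**"), defining_length("011*1**"))   # 4 4
print(order("0******"), defining_length("0******"))   # 1 0
```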
Schemata and their properties are interesting notational devices for rigor-
ously discussing and classifying string similarities. More than this, they provide
the basic means for analyzing the net effect of reproduction and genetic operators
on building blocks contained within the population. Let us consider the individ-
ual and combined effect of reproduction, crossover, and mutation on schemata
contained within a population of strings.
The effect of reproduction on the expected number of schemata in the pop-
ulation is particularly easy to determine. Suppose at a given time step t there are
m examples of a particular schema H contained within the population A(t), where
we write m = m(H, t) (there are possibly different quantities of different sche-
mata H at different times t). During reproduction, a string is copied according to
its fitness, or more precisely a string A_i gets selected with probability p_i = f_i/Σf_j.
After picking a nonoverlapping population of size n with replacement from the
population A(t), we expect to have m(H, t + 1) representatives of the schema
H in the population at time t + 1 as given by the equation m(H, t + 1) =
m(H, t)·n·f(H)/Σf_j, where f(H) is the average fitness of the strings representing
schema H at time t. If we recognize that the average fitness of the entire popu-
lation may be written as f̄ = Σf_j/n, then we may rewrite the reproductive schema
growth equation as follows:

m(H, t + 1) = m(H, t)·f(H)/f̄
In words, a particular schema grows as the ratio of the average fitness of the
schema to the average fitness of the population. Put another way, schemata with
fitness values above the population average will receive an increasing number of
samples in the next generation, while schemata with fitness values below the
population average will receive a decreasing number of samples. It is interesting
to observe that this expected behavior is carried out with every schema H con-
tained in a particular population A in parallel. In other words, all the schemata in
a population grow or decay according to their schema averages under the oper-
ation of reproduction alone. In a moment, we examine why this might be a good
thing to do. For the time being, simply note that many things go on in parallel
with simple operations on the n strings in the population.
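The selection rule p_i = f_i/Σf_j used above can be sketched as a roulette wheel spin; in the following Python sketch, the fitness values are illustrative choices of ours, not data from the text:

```python
import random

def roulette_select(fitnesses):
    """Return an index i chosen with probability f_i / sum(f),
    i.e. fitness-proportionate (roulette wheel) selection."""
    spin = random.uniform(0, sum(fitnesses))
    running = 0.0
    for i, f in enumerate(fitnesses):
        running += f
        if spin <= running:
            return i
    return len(fitnesses) - 1  # guard against floating-point round-off

random.seed(7)                 # seeded so the demonstration is repeatable
f = [169, 576, 64, 361]        # illustrative fitness values
counts = [0] * len(f)
for _ in range(10_000):
    counts[roulette_select(f)] += 1
print(counts)  # counts roughly proportional to f; the fittest string is drawn most often
```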
The effect of reproduction on the number of schemata is qualitatively clear;
above-average schemata grow and below-average schemata die off. Can we learn
anything else about the mathematical form of this growth (decay) from the
schema difference equation? Suppose we assume that a particular schema H re-
mains above average an amount c·f̄ with c a constant. Under this assumption we
can rewrite the schema difference equation as follows:

m(H, t + 1) = m(H, t)·(f̄ + c·f̄)/f̄ = (1 + c)·m(H, t)

Starting at t = 0 and assuming a stationary value of c, we obtain the equation

m(H, t) = m(H, 0)·(1 + c)^t
Business-oriented readers will recognize this equation as the compound interest
equation, and mathematically oriented readers will recognize a geometric pro-
gression or the discrete analog of an exponential form. The effect of reproduction
is now quantitatively clear; reproduction allocates exponentially increasing (de-
creasing) numbers of trials to above- (below-) average schemata. We will connect
this rate of schemata allocation to the multiarmed bandit problem, but for right
now we will investigate how crossover and mutation affect this allocation of trials.
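Under the constant-c assumption, the geometric growth is easy to tabulate. A Python sketch (the starting count and the value of c are illustrative):

```python
def schema_count(m0: float, c: float, t: int) -> float:
    """m(H, t) = m(H, 0) * (1 + c)**t : expected copies of a schema
    that stays a constant fraction c above (c > 0) or below (c < 0)
    the population average fitness."""
    return m0 * (1 + c) ** t

# A schema 25% above average doubles its count in just over three generations:
for t in range(5):
    print(t, schema_count(4, 0.25, t))
```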
To some extent it is curious that reproduction can allocate exponentially
increasing and decreasing numbers of schemata to future generations in parallel;
many, many different schemata are sampled in parallel according to the same rule
through the use of n simple reproduction operations. On the other hand, repro-
duction alone does nothing to promote exploration of new regions of the search
space, since no new points are searched; if we only copy old structures without
change, then how will we ever try anything new? This is where crossover steps
in. Crossover is a structured yet randomized information exchange between
strings. Crossover creates new structures with a minimum of disruption to the
allocation strategy dictated by reproduction alone. The result is exponentially
increasing (or decreasing) proportions of trials allocated to many of the schemata
contained in the population.
To see which schemata are affected by crossover and which are not, consider
a particular string of length ℓ = 7 and two representative schemata within that
string:

A  = 0111000
H1 = *1****0
H2 = ***10**
Clearly the two schemata H1 and H2 are represented in the string A, but to see
the effect of crossover on the schemata, we first recall that simple crossover pro-
ceeds with the random selection of a mate, the random selection of a crossover
site, and the exchange of substrings from the beginning of the string to the cross-
over site inclusively with the corresponding substring of the chosen mate. Sup-
pose string A has been chosen for mating and crossover. In this string of length
7, suppose we roll a single die to choose the crossing site (there are six sites in
a string of length 7). Further suppose that the die turns up a 3, meaning that the
cross cut will take place between positions 3 and 4. The effect of this cross on
our two schemata H1 and H2 can be seen easily in the following example, where
the crossing site has been marked with the separator symbol |:

A  = 011 | 1000
H1 = *1* | ***0
H2 = *** | 10**
Unless string A's mate is identical to A at the fixed positions of the schema (a
possibility that we conservatively ignore), the schema H1 will be destroyed be-
cause the 1 at position 2 and the 0 at position 7 will be placed in different off-
spring (they are on opposite sides of the separator symbol marking the cross
point, or cut point). It is equally clear that with the same cut point (between bits
3 and 4), schema H2 will survive because the 1 at position 4 and the 0 at position
5 will be carried intact to a single offspring. Although we have used a specific cut
point for illustration, it is clear that schema H1 is less likely to survive crossover
than schema H2 because on average the cut point is more likely to fall between
the extreme fixed positions. To quantify this observation, we note that schema
H1 has a defining length of 5. If the crossover site is selected uniformly at random
Chapter 2 / Genetic Algorithms Revisited: Mathematical Foundations
among the ! ~ 1 = 7 ~ 1 = 6 possible sites, then clearly schema 1 is destroyed
with probability p, = 5(M,(/ — 1) = 5/6 (it survives with probability p, =
1 ~ p, = 16) Similarly, the schema H, has defining length 8(#,) = 1, and it is
destroyed during that one event in six where the cut site is selected to occur
between positions 4 and 5 such that p, = 1/6 of the survival probability is p,
1- p, = 56,
More generally, we see that a lower bound on crossover survival probability
p_s can be calculated for any schema. Because a schema survives when the cross
site falls outside the defining length, the survival probability under simple crossover
is p_s = 1 − δ(H)/(l − 1), since the schema is likely to be disrupted whenever
a site within the defining length is selected from the l − 1 possible sites. If
crossover is itself performed by random choice, say with probability p_c at a particular
mating, the survival probability may be given by the expression

    p_s ≥ 1 − p_c·δ(H)/(l − 1),

which reduces to the earlier expression when p_c = 1.0.
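The survival bound is easy to check numerically. The following is a minimal sketch in Python; the function name and the p_c keyword default are ours, not the book's:

```python
def crossover_survival(delta, l, p_c=1.0):
    """Lower bound on the probability that a schema with defining
    length delta survives one-point crossover on strings of length l."""
    return 1.0 - p_c * delta / (l - 1)

# The two schemata from the text, defined over strings of length 7:
# H1 = *1****0 has defining length 5; H2 = ***10** has defining length 1.
assert abs(crossover_survival(5, 7) - 1/6) < 1e-12   # H1 survives 1/6 of the time
assert abs(crossover_survival(1, 7) - 5/6) < 1e-12   # H2 survives 5/6 of the time
```

With p_c = 0 (no crossover), the bound correctly reduces to certain survival.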
The combined effect of reproduction and crossover may now be considered.
As when we considered reproduction alone, we are interested in calculating the
expected number of a particular schema H in the next generation. Assuming
independence of the reproduction and crossover operations, we obtain the
estimate:

    m(H, t + 1) ≥ m(H, t)·[f(H)/f̄]·[1 − p_c·δ(H)/(l − 1)]

Comparing this to the previous expression for reproduction alone, the combined
effect of crossover and reproduction is obtained by multiplying the expected
number of schemata for reproduction alone by the survival probability under
crossover p_s. Once again the effect of the operations is clear. Schema H grows or
decays depending upon a multiplication factor. With both crossover and reproduction,
that factor depends on two things: whether the schema is above or below
the population average and whether the schema has a relatively short or long
defining length. Clearly, those schemata with both above-average observed performance
and short defining lengths are going to be sampled at exponentially
increasing rates.
The last operator to consider is mutation. Using our previous definition, mutation
is the random alteration of a single position with probability p_m. In order
for a schema H to survive, all of the specified positions must themselves survive.
Therefore, since a single allele survives with probability (1 − p_m), and since each
of the mutations is statistically independent, a particular schema survives when
each of the o(H) fixed positions within the schema survives. Multiplying the
survival probability (1 − p_m) by itself o(H) times, we obtain the probability of
surviving mutation, (1 − p_m)^o(H). For small values of p_m (p_m << 1), the schema
survival probability may be approximated by the expression 1 − o(H)·p_m. We
therefore conclude that a particular schema H receives an expected number of
copies in the next generation under reproduction, crossover, and mutation as
given by the following equation (ignoring small cross-product terms):

    m(H, t + 1) ≥ m(H, t)·[f(H)/f̄]·[1 − p_c·δ(H)/(l − 1) − o(H)·p_m]
The addition of mutation changes our previous conclusions little. Short, low-order,
above-average schemata receive exponentially increasing trials in subsequent
generations. This conclusion is important, so important that we give it a
special name: the Schema Theorem, or the Fundamental Theorem of Genetic Algorithms.
Although the calculations that led us to prove the schema theorem
were not too demanding, the theorem's implications are far reaching and subtle.
To see this, we examine the effect of the three-operator genetic algorithm on
schemata in a population through another visit to the hand-calculated GA of
Chapter 1.
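The full three-operator bound can be sketched directly in Python. The function name is ours; the arguments mirror the symbols in the equation above:

```python
def schema_theorem_bound(m, f_H, f_avg, delta, order, l, p_c, p_m):
    """Schema theorem lower bound on the expected number of copies of
    schema H in the next generation (small cross-product terms ignored):
    m(H, t+1) >= m(H, t) * [f(H)/f-bar] * [1 - p_c*delta/(l-1) - o(H)*p_m]."""
    return m * (f_H / f_avg) * (1.0 - p_c * delta / (l - 1) - order * p_m)

# Schema H1 = 1**** from the hand calculation of the next section:
# m = 2, f(H1) = 468.5, population average 293, delta = 0, order 1, l = 5.
bound = schema_theorem_bound(2, 468.5, 293.0, 0, 1, 5, p_c=1.0, p_m=0.001)
assert 3.19 < bound < 3.20   # about 3.20, matching the hand calculation
```

Because δ(H1) = 0 and p_m is tiny, the bound here is essentially the pure-reproduction estimate.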
SCHEMA PROCESSING AT WORK: AN EXAMPLE
BY HAND REVISITED
Chapter 1 demonstrated the mechanics of the simple GA through a hand calculation
of a single generation. Let us return to that example, this time observing
how the GA processes schemata—not individual strings—within the population.
The hand calculation of Chapter 1 is reproduced in Table 2.1. In addition to the
information presented earlier, we also keep a running count of three particular
schemata, which we call H1, H2, and H3, where H1 = 1****, H2 = *10**, and
H3 = 1***0.
Observe the effect of reproduction, crossover, and mutation on the first
schema, H1. During the reproduction phase, the strings are copied probabilistically
according to their fitness values. Looking at the first column of the table, we
notice that strings 2 and 4 are both representatives of the schema 1****. After
reproduction, we note that three copies of the schema have been produced
(strings 2, 3, 4 in the mating pool column). Does this number correspond with
the value predicted by the schema theorem? From the schema theorem we expect
to have m·f(H)/f̄ copies. Calculating the schema average f(H1), we obtain
(576 + 361)/2 = 468.5. Dividing this by the population average f̄ = 293 and
multiplying by the number of H1 schemata at time t, m(H1, t) = 2, we obtain the
expected number of H1 schemata at time t + 1, m(H1, t + 1) = 2·468.5/293 =
3.20. Comparing this to the actual number of schemata (three), we see that we
have the correct number of copies. Taking this one step further, we realize that
crossover cannot have any further effect on this schema because a defining length
δ(H1) = 0 prevents disruption of the single bit. Furthermore, with the mutation
rate set at p_m = 0.001 we expect to have m·p_m = 3·0.001 = 0.003 or no bits
changed within the three schema copies in the three strings. As a result, we ob-
TABLE 2.1 GA Processing of Schemata—Hand Calculations

String Processing

String  Initial     x Value     Fitness    pselect   Expected   Actual Count
No.     Population  (Unsigned   f(x) = x²  f_i/Σf    Count      (from Roulette
                    Integer)                         f_i/f̄      Wheel)
1       01101       13          169        0.14      0.58       1
2       11000       24          576        0.49      1.97       2
3       01000        8           64        0.06      0.22       0
4       10011       19          361        0.31      1.23       1
Sum                             1170       1.00      4.00       4.0
Average                          293       0.25      1.00       1.0
Max                              576       0.49      1.97       2.0

Schema Processing

                 Before Reproduction
                 String            Schema Average
                 Representatives   Fitness f(H)
H1 = 1****       2, 4              469
H2 = *10**       2, 3              320
H3 = 1***0       2                 576
serve that for schema H1, we do obtain the expected exponentially increasing
number of schemata as predicted by the schema theorem.
So far, so good; but schema H1 with its single fixed bit seems like something
of a special case. What about the propagation of important similarities with longer
defining lengths? For example, consider the propagation of the schema H2 =
*10** and the schema H3 = 1***0. Following reproduction and prior to crossover,
the replication of schemata is correct. The case of H2 starts with two examples
in the initial population and ends with two copies following reproduction. This
agrees with the expected number of copies, m(H2) = 2·320/293 = 2.18, where
320 is the schema average and 293 is the population average fitness. The case of
H3 starts with a single example (string 2) and ends with two copies following
reproduction (strings 2 and 3 in the string copies column). This agrees with the
expected number of copies m(H3) = 1·576/293 = 1.97, where 576 is the schema's
average fitness and 293 is the population's average fitness. The circumstances
following crossover are a good bit different. Notice that for the short
schema, schema H2, the two copies are maintained even though crossover has
TABLE 2.1 (Continued)

String Processing

Mating Pool after     Mate        Crossover Site   New          x      f(x)
Reproduction          (Randomly   (Randomly        Population   Value  x²
(Cross Site Shown)    Selected)   Selected)
0110|1                2           4                01100        12     144
1100|0                1           4                11001        25     625
11|000                4           2                11011        27     729
10|011                3           2                10000        16     256
Sum                                                                    1754
Average                                                                 439
Max                                                                     729

Schema Processing

             After Reproduction                After All Operators
             Expected  Actual  String          Expected  Actual  String
             Count     Count   Representatives Count     Count   Representatives
H1 = 1****   3.20      3       2, 3, 4         3.20      3       2, 3, 4
H2 = *10**   2.18      2       2, 3            1.64      2       2, 3
H3 = 1***0   1.97      2       2, 3            0.0       1       4
occurred. Because the defining length is short we expect crossover to interrupt
the process only one time in four (l − 1 = 5 − 1 = 4). As a result, the schema
H2 survives with high probability. The expected number of H2 schemata is
thus m(H2, t + 1) = 2.18·0.75 = 1.64, and this compares well with the actual
count of two schemata. H3 is a schema of a different color. Because of the long
defining length (δ(H3) = 4), crossover usually destroys this schema.
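The expected counts in Table 2.1 can be recomputed in a few lines. The following sketch uses the schema averages and defining lengths from the table, assuming p_c = 1 as in the hand calculation:

```python
# Recompute the Table 2.1 expectations: reproduction scales the count
# by f(H)/f-bar; one-point crossover (p_c = 1 assumed) then scales it
# by the survival bound 1 - delta(H)/(l - 1).
l, f_avg = 5, 293.0
schemata = {            # name: (m(H, t), f(H), defining length)
    "H1 = 1****": (2, 468.5, 0),
    "H2 = *10**": (2, 320.0, 1),
    "H3 = 1***0": (1, 576.0, 4),
}
expected = {}
for name, (m, f_H, delta) in schemata.items():
    after_repro = m * f_H / f_avg
    expected[name] = (after_repro, after_repro * (1 - delta / (l - 1)))

assert round(expected["H1 = 1****"][0], 2) == 3.20  # after reproduction
assert round(expected["H2 = *10**"][1], 2) == 1.64  # survives crossover
assert round(expected["H3 = 1***0"][1], 2) == 0.00  # destroyed by crossover
```

The long-defining-length schema H3 gets an expected count of zero after crossover, exactly as the table records.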
The hand calculation has confirmed the theory developed earlier in this
chapter. Short, low-order schemata are given exponentially increasing or decreasing
numbers of samples depending on a schema's average fitness. In a moment
we will estimate the number of schemata that are processed in this way, but
before we do, we must ask and answer a pressing question. Why should we give
an exponentially increasing number of trials to the observed best schemata? In
other words, why is this particular allocation strategy a good road to follow? The
answer lies along what at first may seem like a detour, following a gambler among
the slot machines of Las Vegas.
THE TWO-ARMED AND K-ARMED BANDIT PROBLEM
The effects of reproduction, crossover, and mutation are now both quantitatively
and qualitatively clear. Schemata of short defining length, low order, and above-average
fitness (building blocks) receive exponentially increasing trials in future
generations. This is a fact, beyond all question. Yet despite careful proof of this
point, at least one nagging question remains: Why is this a good thing to do? More
to the point, why should exponentially increasing samples be given to the observed
best building blocks? This question leads to an important problem of statistical
decision theory, the two-armed bandit problem and its extension, the k-armed
bandit problem. Although this seems like a detour from our main concern
(after all, we are trying to understand genetic algorithms, not minimize our gambling
losses), we shall soon see that the optimal solution to the bandit problem
is very similar in form to the exponential allocation of trials from our tripartite
GA.
Suppose we have a two-armed slot machine, as depicted in Fig. 2.1, with two
arms named LEFT and RIGHT. Furthermore, let's assume we know that one of the
arms pays an award μ1 with variance σ1² and the other arm pays an award μ2 with
variance σ2², where μ1 ≥ μ2. Which arm should we play? Clearly, we would like to
play the arm that pays off more frequently (the arm with payoff μ1), but therein
lies the rub. Since we do not know beforehand which arm is associated with the

FIGURE 2.1 The two-armed bandit problem poses a dilemma: how do we
search for the right answer (exploration) at the same time we use that information
(exploitation)?
higher expected reward, we are faced with an interesting dilemma. Not only must
we make a decision (more precisely, a sequence of decisions) about which arm
to play, but we must at the same time collect information about which is the
better arm. This trade-off between the exploration for knowledge and the exploitation
of that knowledge is a recurrent and fundamentally important theme
in adaptive systems theory. How we address this dilemma will say a lot about the
ultimate success of our methods.
One way to approach the trade-off is to separate exploration from exploitation
by first performing a single experiment and thereafter making a single irreversible
decision that depends upon the outcome of the experiment. This is one
approach of traditional decision theory that we can describe rigorously as follows.
Suppose we have a total of N trials to allocate among the two arms. We first
allocate an equal number of trials n (2n < N) to each of the two arms
during the experimental phase. After the experiment, we allocate the remaining
N − 2n trials to the arm with best observed payoff. Assuming we know μ1, μ2,
σ1, and σ2, we can calculate the expected loss (De Jong, 1975):

    L(N, n) = |μ1 − μ2|·[(N − n)·q(n) + n·(1 − q(n))],

where q(n) is the probability that the worst arm is the observed best arm after
n trials have been attempted on each machine. This probability is well approximated
by the tail of the normal distribution:

    q(n) ≈ (1/√(2π))·e^(−x²/2)/x,  where x = ((μ1 − μ2)/√(σ1² + σ2²))·√n.
From these equations we can see that two sources of loss are associated with
the procedure. The first loss is a result of issuing n trials to the wrong arm during
the experiment. The second is a result of choosing the arm associated with the
lower payoff (μ2) even after performing the experiment. We cannot be absolutely
certain at the end of the experiment that we will pick the right arm, so we do
expect occasionally to pick the wrong arm and incur a loss on the remaining
N − 2n trials. We may solve for the optimal experiment size n* by taking the
derivative of the loss equation and setting it to zero. Figure 2.2 shows how the
optimal experiment size n* varies with the total number of trials N and the
signal-difference-to-noise ratio c, where c = (μ1 − μ2)/(σ1² + σ2²)^(1/2).
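The loss-minimizing experiment size can also be found by direct numerical scan rather than by differentiation. The following is a minimal sketch; the parameter values are hypothetical and the normal-tail form of q(n) is the approximation given above:

```python
import math

def q(n, mu1, mu2, s1, s2):
    """Normal-tail approximation to the probability that the worse arm
    is the observed best arm after n trials on each arm."""
    x = (mu1 - mu2) / math.sqrt(s1**2 + s2**2) * math.sqrt(n)
    return math.exp(-x * x / 2) / (x * math.sqrt(2 * math.pi))

def loss(N, n, mu1, mu2, s1, s2):
    """Expected loss L(N, n) of the one-shot experimental procedure."""
    qn = q(n, mu1, mu2, s1, s2)
    return abs(mu1 - mu2) * ((N - n) * qn + n * (1 - qn))

# Illustrative (hypothetical) parameters; scan for the loss-minimizing n*.
N, mu1, mu2, s1, s2 = 1000, 1.0, 0.9, 1.0, 1.0
n_star = min(range(1, N // 2), key=lambda n: loss(N, n, mu1, mu2, s1, s2))
```

The scan finds an interior optimum: too few experimental trials make the wrong-arm choice likely over the remaining N − 2n pulls, while too many waste trials on the worse arm.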
This simple procedure is easy to analyze, but there are no doubt better ways,
perhaps optimal ways, to allocate the trials to the better arm. Holland (1975) has
performed calculations that show how trials should be allocated between the two
arms to minimize expected losses. This results in the allocation of n* trials to the
worse arm and N − n* trials to the better arm, where n* is given by the following
equation:

    n* ≈ b²·ln[N²/(8π·b⁴·ln N²)],  where b = σ1/(μ1 − μ2).
FIGURE 2.2 The modified total number of trials (c²N) grows as a greater-than-exponential
function of the modified optimal experiment size (c²n*) in the one-shot,
decision-theory approach to the two-armed bandit problem.
Turning the equation around, we realize that the number of trials given to the
observed better arm is given by the equation:

    N − n* ≈ √(8π·b⁴·ln N²)·e^(n*/2b²).

In other words, to allocate trials optimally (in the sense of minimal expected
loss), we should give slightly more than exponentially increasing trials to the
observed best arm. Unfortunately, this strategy is unrealizable, as it requires
knowledge of outcomes before they occur. Nonetheless, it forms an important
bound on performance that a realizable strategy should try to approach. Certainly
many strategies can approach this ideal. The experimental approach analyzed earlier
showed how exponentially fewer trials were given to the worse arm as the
number of trials increased. Another method that comes even closer to the ideal
trial allocation is the three-operator genetic algorithm discussed earlier. The
schema theorem guarantees giving at least an exponentially increasing number
of trials to the observed best building blocks. In this way the genetic algorithm
is a realizable yet near-optimal procedure (Holland, 1973a, 1975) for searching
among alternative solutions.
With a genetic algorithm we are no longer solving a simple two-armed bandit
problem; in the usual genetic algorithm we consider the simultaneous solution
of many multiarmed bandits. To make this point more forcefully, we first consider
the form of the solution to a single k-armed bandit and then demonstrate that
the usual genetic algorithm may be thought of as the composition of many such
k-armed bandits.
The form of the bounding k-armed bandit solution was also discovered by
Holland (1973a). The minimal expected loss solution to the allocation of trials
to k competing arms is similar to the two-armed solution in that it dictates that greater
than exponentially increasing numbers of trials be given to the observed best of
the k arms. This result is not surprising, but it does connect nicely to our notions
of schema processing if we consider a set of competing schemata as a particular
k-armed bandit. To see this connection, we define the notion of a competing set
of schemata, and then count the number and size of the k-armed bandit problems
being solved within a genetic algorithm of given string length.

Two schemata A and B with individual positions a_i and b_i are competing if
at every position i = 1, 2, ..., l either a_i = b_i = *, or a_i ≠ * and b_i ≠ *, with a_i ≠ b_i for at
least one i value. For example, consider the set of eight schemata that compete
at locations 2, 3, and 5 in the following strings of length 7:
    *00*0**
    *00*1**
    *01*0**
    *01*1**
    *10*0**
    *10*1**
    *11*0**
    *11*1**

There are eight competing schemata over the three positions 2, 3, and 5
because each of the three positions may take on either a 1 or a 0 (2³ = 8).
We can start to see the connection to the k-armed bandit problem in our list
of eight competing schemata. Since these schemata are defined over the same
positions, they compete with one another for precious population slots. In order
to allocate the population slots properly, we need to allocate exponentially increasing
numbers to the observed best schemata, just as we give exponentially
increasing trials to the observed best arm in the k-armed bandit. One of the differences
between our situation in a genetic algorithm and the vanilla-flavored k-armed
bandit is that we have a number of problems proceeding in parallel. For
example, with three positions fixed over a string of length 7 there are C(7,3) = 35
such eight-armed (2³ = 8) bandit problems. In general, for schemata of order j
over strings of length l there are C(l,j) different k_j-armed bandit problems, where
k_j = 2^j. Not all of the Σ_j C(l,j) = 2^l problems are played out equally well because
crossover has a tendency to disrupt those bandits with long defining lengths, as
discussed earlier. In the next section, we count the number of schemata that are
usefully processed by the genetic algorithm.
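The counting in this paragraph is a straightforward exercise with binomial coefficients, sketched below (the l = 7, j = 3 case is the one worked in the text):

```python
from math import comb

l = 7
# Three fixed positions over a length-7 string: C(7, 3) = 35 distinct
# eight-armed (2**3 = 8) bandit problems, as in the text.
assert comb(l, 3) == 35 and 2 ** 3 == 8

# Schemata of order j define C(l, j) different k_j-armed bandits with
# k_j = 2**j arms; summed over all orders j, the number of distinct
# problems (choices of defining positions) is 2**l.
assert sum(comb(l, j) for j in range(l + 1)) == 2 ** l
```

The identity in the last line is just the binomial theorem evaluated at (1 + 1)^l.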
HOW MANY SCHEMATA ARE PROCESSED USEFULLY?
Our counting arguments thus far have indicated that we have somewhere between
2^l and n·2^l schemata being processed in a string population with length l
and size n. As we know, not all of these are processed with high probability because
crossover destroys those with relatively long defining lengths. In this section
we compute a lower bound on those schemata that are processed in a useful
manner—those that are sampled at the desirable exponentially increasing rate.

The most widely quoted counting of effective schema processing is Holland's
well-known, but poorly understood, O(n³) estimate (Goldberg, 1985a).
Simply stated, this estimate means that despite the processing of only n structures
each generation, a genetic algorithm processes something like n³ schemata. This
result is so important, Holland has given it a special name, implicit parallelism.
Even though each generation we perform computation proportional to the size
of the population, we get useful processing of something like n³ schemata in
parallel with no special bookkeeping or memory other than the population itself.
Let us rederive this estimate to understand its underlying assumptions and examine
the source of this computational leverage.
Consider a population of n binary strings of length l. We consider only schemata
that survive with a probability greater than p_s, a constant. As a result, assuming
the operation of simple crossover and a small mutation rate, we admit only
those schemata with an error rate ε < 1 − p_s. This leads us to consider only
those schemata of length l_s < ε(l − 1) + 1.

With a particular schema length, we can estimate a lower bound on the number
of unique schemata processed by an initially random population of strings.
To do this, we first count the number of schemata of length l_s or less. We then
multiply this by an appropriate population size, chosen so we expect, on average,
no more than one of each schema of order l_s/2. Suppose we wish to count the
schemata of length l_s = 5 in the following string of length l = 10:

    1011100010

To do this we calculate the number of schemata in the first cell of 5 bits,

    10111|00010

where the last bit in the cell is held fixed. That is, we want all
schemata of the form

    %%%%1*****

where the stars * are don't care symbols and the percent signs % take on either
the fixed value (the 1 or 0 at that position) or a don't care. Clearly there are
2^(l_s − 1) of these schemata because l_s − 1 = 4 positions can be fixed or take on
the don't care. To count the total number of these, we simply slide the template
of 5 along one space at a time:

    1|01110|0010

We perform this trick a total of l − l_s + 1 times, and we can estimate the total
number of schemata of length l_s or less as 2^(l_s − 1)·(l − l_s + 1). This count is the
number of such schemata in this particular string. To overestimate the number
of such schemata in the whole population, we could simply multiply by the population
size and obtain the count n·2^(l_s − 1)·(l − l_s + 1). This obviously overestimates
the correct count because surely there will be duplicates of low-order
schemata in large populations. To refine the estimate, we pick a population size
n = 2^(l_s/2). By choosing n in this manner, we expect to have one or fewer of all
schemata of order l_s/2 or more. Recognizing that the number of schemata is binomially
distributed, we conclude that half are of higher order than l_s/2 and half are
of lower order. If we count only the higher order ones, we estimate a lower
bound on the number of schemata n_s as follows:

    n_s ≥ n·2^(l_s − 1)·(l − l_s + 1)/2 = n·(l − l_s + 1)·2^(l_s − 2).

This differs from the previous overestimate by a factor of 1/2. Furthermore, the
restriction of the population size to the particular value n = 2^(l_s/2) results in the
expression:

    n_s ≥ (l − l_s + 1)·n³/4.

Since n_s = C·n³, we conclude that the number of schemata is proportional to the
cube of the population size and is thus of order n³, O(n³).
Thus we see that despite the disruption of long, high-order schemata by
crossover and mutation, genetic algorithms inherently process a large quantity of
schemata while processing a relatively small quantity of strings.
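The algebra behind the O(n³) estimate can be double-checked numerically. A minimal sketch, assuming an even l_s so that n = 2^(l_s/2) is an integer (the l = 64, l_s = 8 values are illustrative, not from the text):

```python
def schema_lower_bound(l, l_s):
    """Lower bound n_s on usefully processed schemata when the
    population size is chosen as n = 2**(l_s/2); l_s assumed even."""
    n = 2 ** (l_s // 2)
    return (l - l_s + 1) * n ** 3 // 4

# Illustrative values: l = 64, l_s = 8 gives population size n = 16.
# The two forms of the bound derived in the text must agree:
# n * (l - l_s + 1) * 2**(l_s - 2)  ==  (l - l_s + 1) * n**3 / 4.
l, l_s = 64, 8
n = 2 ** (l_s // 2)
assert n * (l - l_s + 1) * 2 ** (l_s - 2) == schema_lower_bound(l, l_s)
```

The key step is that with n = 2^(l_s/2), the factor n·2^(l_s − 2) equals n³/4, which is where the cube of the population size comes from.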
THE BUILDING BLOCK HYPOTHESIS
The picture of genetic algorithms' performance is much clearer with the perspective
afforded by schemata. Short, low-order, and highly fit schemata are sampled,
recombined, and resampled to form strings of potentially higher fitness. In
a way, by working with these particular schemata (the building blocks), we have
reduced the complexity of our problem; instead of building high-performance
strings by trying every conceivable combination, we construct better and better
strings from the best partial solutions of past samplings.

Because highly fit schemata of low defining length and low order play such
an important role in the action of genetic algorithms, we have already given them
a special name: building blocks. Just as a child creates magnificent fortresses
through the arrangement of simple blocks of wood, so does a genetic algorithm
seek near-optimal performance through the juxtaposition of short, low-order,
high-performance schemata, or building blocks.

There is, however, one catch. Chapter 1 repeatedly claimed that notions
combine to form better ideas. Just now, it was claimed that building blocks com-
bine to form better strings. While these claims seem perfectly reasonable, how
do we know whether they hold true or not?

Certainly there is a growing body of empirical evidence to support these
claims in a variety of problem classes. Starting two decades ago with two pioneering
dissertations (Bagley, 1967; Rosenberg, 1967) and continuing through
the many genetic algorithm applications demonstrated at recent conferences devoted
to genetic algorithms (Grefenstette, 1985a, 1987a), the building block hypothesis
has held up in many different problem domains. Smooth, unimodal
problems, noisy multimodal problems, and combinatorial optimization problems
have all been attacked successfully using virtually the same reproduction-crossover-mutation
GA. While limited empirical evidence proves no theory, it
does suggest that genetic algorithms are appropriate for many of the types of
problems we normally encounter.
More recently, Bethke (1981) has shed some light on this topic. Using Walsh
functions and a clever transformation of schemata, he has devised an efficient,
analytical method for determining schema average fitness values using Walsh
coefficients. This in turn permits us to identify whether, given a particular function
and coding, building blocks combine to form optima or near optima. Holland
(1987b) has extended Bethke's computation to the analysis of schema averages
when the population is not uniformly distributed.

Bethke's and Holland's Walsh-schema transform work is beyond the scope of
this discussion, although the interested reader should consult Appendix F, which
briefly discusses some important results. Nonetheless, the concepts behind these
discoveries are sufficiently important that we must try to understand the regularity
implied in building block processing, at least from an intuitive, graphic viewpoint.
To do this, let us return to the five-bit coding example we started in
Chapter 1. Recall, in that problem we were trying to maximize the function
f(x) = x², where x was coded as a five-bit unsigned integer. In this problem, what
do the building blocks look like and how do they lead to better solutions when
they are mixed with one another? Consider a simple schema, H1 = 1****. What
does this one-fixed-bit schema look like? The answer is shown in Fig. 2.3 as the
shaded region of the domain. Apparently, a schema with the high-order bit set
covers the upper half of the domain of interest. Similarly, the schema H2 = 0****
covers the lower half. Other one-bit examples prove illuminating, for example
the schema H3 = ****1 shown in Fig. 2.4. This schema covers the half domain
that decodes to an odd number (00001 = 1, 00011 = 3, 00101 = 5, etc.). The
schema H4 = ***0* also covers half the domain, but in the manner shown in Fig.
2.5. It seems that a one-bit schema covers half the domain, but the frequency of
oscillation depends on the position of the fixed bit.

Higher-order building blocks are certainly of interest. Consider the schema
H5 = 10*** as depicted in Fig. 2.6. This schema covers the lower quarter of the
upper half domain. Other two-bit schemata may be sketched similarly, such as
the schema H6 = **1*1 as shown in Fig. 2.7. Like schema H5, it too covers a
quarter domain (why is this?), but in a more broken-up fashion. For readers familiar
with Fourier series, the periodicity of the different schemata is suggestive.
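The coverage claims for these schemata can be verified by brute-force enumeration of the 32-point domain. A small sketch (the helper names are ours):

```python
def matches(schema, bits):
    """True if the 5-bit string bits is an instance of the schema
    (* marks a don't care position)."""
    return all(s == '*' or s == b for s, b in zip(schema, bits))

def members(schema):
    """All integers in [0, 32) whose 5-bit coding matches the schema."""
    return [x for x in range(32) if matches(schema, format(x, '05b'))]

# One-bit schemata each cover half the domain of f(x) = x**2:
assert members('1****') == list(range(16, 32))    # upper half (Fig. 2.3)
assert members('****1') == list(range(1, 32, 2))  # odd numbers (Fig. 2.4)

# Two-bit schemata cover a quarter of the domain:
assert members('10***') == list(range(16, 24))    # lower quarter of upper half
assert len(members('**1*1')) == 8                 # a quarter, but broken up
```

The same `members` helper also answers the "why is this?" question: any order-2 schema over 5 bits leaves 3 free positions, hence 2³ = 8 members.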
FIGURE 2.3 Sketch of schema 1**** overlaying the function f(x) = x².
FIGURE 2.4 Sketch of schema ****1.
FIGURE 2.5 Sketch of schema ***0*.
FIGURE 2.6 Sketch of schema 10***.
FIGURE 2.7 Sketch of schema **1*1.
In fact, it is this periodicity that permits the Walsh function analysis. Just as harmonic
analysis determines physical properties through an examination of the relative
magnitudes of Fourier coefficients, so does Walsh function analysis
determine the expected static performance of a genetic algorithm through an
analysis of the relative magnitudes of Walsh coefficients.
Although these transform methods are powerful mathematical tools for genetic
algorithm analysis in specific cases, generalization of these results to arbitrary
codings and functions has proved difficult. Bethke has generated a number
of test cases that are provably misleading for the simple three-operator genetic
algorithm (we call these coding-function combinations GA-deceptive). These results
suggest that functions and codings that are GA-deceptive tend to contain
isolated optima: the best points tend to be surrounded by the worst. Practically
speaking, many of the functions encountered in the real world do not have this
needle-in-a-haystack quality; there is usually some regularity in the function-coding
combination—much like the regularity of one or more schemata—that
may be exploited by the recombination of building blocks. Furthermore, we can
argue that finding a needle in a haystack is going to be difficult regardless of the
search technique adopted. Nevertheless, it is important to keep in mind that the
simple genetic algorithm depends upon the recombination of building blocks to
seek the best points. If the building blocks are misleading due to the coding used
or the function itself, the problem may require long waiting times to arrive at
near-optimal solutions.
ANOTHER PERSPECTIVE: THE MINIMAL DECEPTIVE PROBLEM
Schema visualization and Walsh-schema transforms provide insight into the workings
of genetic algorithms. From a practical standpoint, however, these techniques
are at least as computationally cumbersome as an enumerative search of
the discrete problem space. As a result, they are not widely used to analyze practical
problems in genetic search. Nonetheless, we still need to understand better
what makes a problem difficult for a simple GA. To investigate this matter further,
let's construct the simplest problem that should cause a GA to diverge from the
global optimum (Goldberg, 1987b). To do this, we want to violate the building
block hypothesis in the extreme. Put another way, we would like to have short,
low-order building blocks lead to incorrect (suboptimal) longer, higher-order
building blocks. The smallest problem where we can have such deception is a
two-bit problem. In this section, we briefly develop this minimal deceptive problem
(MDP). Despite our best efforts to fool a simple GA, it is somewhat surprising
that this GA-deceptive problem is not usually GA-hard (does not usually diverge
from the global optimum).
Suppose we have a set of four order-2 schemata over two defining positions,
each schema associated with a fitness value as follows:

    *...*0*...*0*...*    f00
    *...*0*...*1*...*    f01
    *...*1*...*0*...*    f10
    *...*1*...*1*...*    f11

The fitness values are schema averages, assumed to be constant with no variance
(this last restriction may be lifted without changing our conclusions, as we only
consider expected performance). To start, let's assume that f11 is the global
optimum:

    f11 > f00;  f11 > f01;  f11 > f10.

Since the problem is invariant to rotation or reflection in Hamming two-space,
the assumption of a particular global optimum is irrelevant to the generality of
our conclusions.
Next, we introduce the element of deception necessary to make this a tough
problem for a simple genetic algorithm. To do this, we want a problem where
one or both of the suboptimal, order-1 schemata are better than the optimal,
order-1 schemata. Mathematically, we want one or both of the following conditions
to hold:

    f(0*) > f(1*),
    f(*0) > f(*1).

In these expressions we have dropped consideration of all alleles other than the
two defining positions, and the fitness expression implies an average over all
strings contained within the specified similarity subset. Thus we would like the
following two expressions to hold:

    [f(00) + f(01)]/2 > [f(10) + f(11)]/2,
    [f(00) + f(10)]/2 > [f(01) + f(11)]/2.
Unfortunately, both expressions cannot hold simultaneously in the two-problem
(if they do, point 11 cannot be the global optimum), and without loss of generality
we assume that the first expression is true. Thus, the deceptive two-problem
is specified by the globality condition (f11 is the best) and one deception
condition (we choose f(0*) > f(1*)).
To put the problem into closer perspective, we normalize all fitness values
with respect to the fitness of the complement of the global optimum as follows:

    r = f11/f00,    c = f01/f00,    c' = f10/f00.

We may rewrite the globality condition in normalized form:

    r > c,    r > 1,    r > c'.

We may also rewrite the deception condition in normalized form:

    r < 1 + c − c'.

From these conditions, we may conclude a number of interesting facts:

    c' < 1,    c' < c.

(The first follows from the deception condition together with r > c; the second,
from the deception condition together with r > 1.)
From these, we recognize that there are two types of deceptive two-problem:

    Type I:   f01 > f00  (c > 1).
    Type II:  f00 > f01  (c < 1).
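The globality and deception conditions can be checked mechanically. The following Python sketch (the function name and structure are my own illustration, not the book's) classifies a normalized fitness triple as Type I, Type II, or not a valid minimal deceptive problem:

```python
# Hypothetical helper: decide whether normalized fitnesses (r, c, c') form a
# minimal deceptive problem, and if so, which type.
def classify_mdp(r, c, cp):
    """r = f11/f00, c = f01/f00, cp = f10/f00 (normalized fitness values)."""
    global_opt = r > c and r > 1 and r > cp   # globality: f11 is the best
    deceptive = r < 1 + c - cp                # deception: f(0*) > f(1*)
    if not (global_opt and deceptive):
        return None
    return "Type I" if c > 1 else "Type II"

print(classify_mdp(1.1, 1.05, 0.4))   # constants of Fig. 2.10 -> Type I
print(classify_mdp(1.1, 0.9, 0.5))    # constants of Fig. 2.11 -> Type II
```

Note that a triple failing either condition (for example, one where deception does not hold) is rejected outright.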
Figures 2.8 and 2.9 are representative sketches of these problems where the fitness
is graphed as a function of two boolean variables. Both cases are deceptive,
and it may be shown that neither case can be expressed as a linear combination
of the individual allele values; that is, neither case can be expressed in the form

    f(x1, x2) = b + Σ ai xi.

In the biologist's terms, we have an epistatic problem. Since it similarly may be
proved that no one-bit problem can be deceptive, the deceptive two-problem is
the smallest possible deceptive problem: it is the minimal deceptive problem
(MDP). With the MDP defined, we now turn toward a complete schema analysis
of its behavior.
Chapter 2 / Genetic Algorithms Revisited: Mathematical Foundations
FIGURE 2.8 Sketch of the Type I minimal deceptive problem (MDP): f01 > f00.

FIGURE 2.9 Sketch of the Type II minimal deceptive problem (MDP): f00 > f01.
An Extended Schema Analysis of the Two-Problem

So far we have constructed a generalized two-bit problem that seems capable of
misleading a genetic algorithm when the two defining bits of the problem become
widely separated on the string. Judging from the schema theorem, we expect
to have difficulty when the growth factor

    (f11/f̄)[1 − pc δ/(l − 1)]

is less than or equal to 1 (assuming that pm = 0). A more careful analysis requires
us to consider the details of crossover more closely.
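As a quick numeric check, the growth factor can be evaluated directly. The constants below are my own illustration (they are not worked in the text): the normalized fitness advantage r = 1.1 with the two defining bits at opposite ends of a 30-bit string.

```python
# Hypothetical illustration of the schema-theorem growth factor
# (fitness ratio) * (probability of surviving crossover).
def growth_factor(f_ratio, pc, delta, l):
    return f_ratio * (1.0 - pc * delta / (l - 1))

# Widely separated defining bits (delta = 29 on l = 30): disruption
# overwhelms the modest selective advantage.
print(growth_factor(1.1, 1.0, 29, 30) <= 1.0)   # True -- expect difficulty

# Adjacent defining bits (delta = 1): the schema still grows.
print(growth_factor(1.1, 1.0, 1, 30) > 1.0)     # True
```

This is exactly why the MDP is posed with a long defining length: the lower-bound analysis predicts trouble, and the extended analysis below checks whether trouble actually occurs.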
In an earlier section, we saw how the usual calculation of the expected number
of schemata in the next generation is a lower bound. This is so because the
derivation contains no source terms (one schema's loss is another's gain) and it
assumes that we lose the schema whenever a cross occurs between the schema's
outermost defining bits. In the two-problem this latter assumption is overly
conservative, because the mating and crossing of noncomplementary pairs conserve
the genetic material of the parents. For example, 00 crossed with 01 yields 01
and 00. The only time a loss of genetic material occurs is when complements
mate and cross. In these cases, a 00 mated and crossed with a 11 yields the pair
01 and 10, and likewise, a 01 mated and crossed with a 10 yields the pair 11 and
00. The full crossover yield table is shown in Table 2.2, where an S is used to
indicate that the offspring are the same as their parents.
In the yield table we see how complements lose material, although we also
see how this loss shows up as a gain to the other complementary pair of schemata.
Using this information, it is possible to write more accurate difference relationships
for the expected proportion P of each of the four competing schemata. To
do this we must account for the correct expected loss and gain of schemata due
TABLE 2.2 Crossover Yield Table in Two-Bit Problem

    ×     00       01       10       11
    00    S        S        S        01, 10
    01    S        S        00, 11   S
    10    S        00, 11   S        S
    11    01, 10   S        S        S
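The yield table's pattern is easy to verify mechanically. A short Python sketch (my own illustration, not from the book) crosses two-bit strings at the single interior site and confirms that only complementary pairs exchange genetic material:

```python
# Cross two 2-bit strings at the single interior crossover site.
def cross_two_bit(a, b):
    return (a[0] + b[1], b[0] + a[1])

print(cross_two_bit("00", "01"))   # ('01', '00') -- parents conserved (an S entry)
print(cross_two_bit("00", "11"))   # ('01', '10') -- complements lose material
print(cross_two_bit("01", "10"))   # ('00', '11') -- the other complementary pair
```

Each complementary cross produces exactly the other complementary pair, which is the source-term bookkeeping used in the difference equations that follow.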
to crossover. Assuming proportionate reproduction, simple crossover, and random
mating of the products of reproduction, we obtain the following autonomous,
nonlinear, difference equations:

    P00^(t+1) = P00^t (f00/f̄)[1 − p'c P11^t (f11/f̄)] + p'c P01^t P10^t (f01 f10/f̄²)
    P01^(t+1) = P01^t (f01/f̄)[1 − p'c P10^t (f10/f̄)] + p'c P00^t P11^t (f00 f11/f̄²)
    P10^(t+1) = P10^t (f10/f̄)[1 − p'c P01^t (f01/f̄)] + p'c P00^t P11^t (f00 f11/f̄²)
    P11^(t+1) = P11^t (f11/f̄)[1 − p'c P00^t (f00/f̄)] + p'c P01^t P10^t (f01 f10/f̄²)

In these equations, the superscripts are time indexes, and the subscript binary
numbers are schema indexes. The variable f̄ is simply the current (generation t)
population average fitness, which may be evaluated as follows:

    f̄ = P00 f00 + P01 f01 + P10 f10 + P11 f11.
The parameter p'c is the probability of having a cross that falls between the two
defining bits of the schema:

    p'c = pc δ/(l − 1).
Together, these equations predict the expected proportions of the four schemata
in the next generation. With specified initial proportions, we may follow
the trajectory of the expected proportions through succeeding generations. A
necessary condition for the convergence of the GA is that the expected proportion
of optimal schemata must go to unity in the limit as generations continue:

    lim (t→∞) P11^t = 1.
To examine the behavior of these equations more carefully, we look at several
numerical solutions of the extended schema equations for Type I and Type II problems.
Some theoretical results are presented briefly without proof.
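The extended schema equations above can be iterated directly. The following Python sketch (variable names and structure are my own, not the book's code) advances the four expected proportions one generation at a time under the Type I constants used in Fig. 2.10:

```python
# One generation of the extended schema equations.
# P = (P00, P01, P10, P11), f = (f00, f01, f10, f11),
# pc_prime = probability of a cross between the two defining bits.
def step(P, f, pc_prime):
    fbar = sum(p * fi for p, fi in zip(P, f))   # population average fitness
    P00, P01, P10, P11 = P
    f00, f01, f10, f11 = f
    loss = pc_prime / fbar
    gain_outer = pc_prime * P01 * P10 * f01 * f10 / fbar**2   # feeds 00 and 11
    gain_inner = pc_prime * P00 * P11 * f00 * f11 / fbar**2   # feeds 01 and 10
    return (P00 * f00 / fbar * (1 - loss * P11 * f11) + gain_outer,
            P01 * f01 / fbar * (1 - loss * P10 * f10) + gain_inner,
            P10 * f10 / fbar * (1 - loss * P01 * f01) + gain_inner,
            P11 * f11 / fbar * (1 - loss * P00 * f00) + gain_outer)

# Type I constants of Fig. 2.10: r = 1.1, c = 1.05, c' = 0.4, with pc' = 1.0
P = (0.25, 0.25, 0.25, 0.25)
f = (1.0, 1.05, 0.4, 1.1)
for _ in range(100):
    P = step(P, f, 1.0)
print(P)   # P11 grows toward 1: the Type I MDP converges
```

Because every loss term reappears as a gain to the complementary pair, the four proportions continue to sum to one at each generation, a useful sanity check on any implementation.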
MDP Results

Figure 2.10 presents computational results for a representative Type I problem.
At first, the optimum schema (schema 11) loses proportion; however, as schemata
10 and 00 lose proportion, the remaining battle is fought between schemata
11 and 01 alone, with 11 winning in the end. It may be shown that this result
generalizes to any Type I problem with nonzero starting proportions of the four
schemata. This result is surprising, as we originally designed the problem to cause
divergence from the global optimum. In short, dynamic analysis tells us that the
Type I minimal deceptive problem is not GA-hard.
Figures 2.11 and 2.12 present computational results for a Type II MDP. In Fig.
2.11 we see representative convergent results where (as in the Type I case) the
solution converges to the optimum despite its initial deception. Not all Type II
problems converge like this, however; when the complementary schema 00 has
too great an initial proportion, schema 11 may be overwhelmed, with resulting
convergence to the second-best solution. Representative divergent results are
shown in Fig. 2.12. Simple sufficient conditions for the convergence of a Type II
problem may be derived (Goldberg, 1987b); it is surprising that all Type II
problems converge to the best solution for most starting conditions.

Recent results (Bridges and Goldberg, 1987; Goldberg, 1987a) have extended
the exact schema analysis to higher order problems. Other work along
these lines may permit a constructive definition of the class of GA-hard problems.
Of course, all of this work assumes a fixed coding. The addition of reordering
operators such as inversion may be nature's answer to problems too difficult for
a simple GA. We consider such operators and their analysis in Chapter 5.
FIGURE 2.10 Numerical solution of a Type I minimal deceptive problem
(MDP): r = 1.1, c = 1.05, c' = 0.4. (Plot of population proportion versus generation.)
FIGURE 2.11 Numerical solution of a Type II minimal deceptive problem that
converges: r = 1.1, c = 0.9, c' = 0.5 with equal initial proportions. (Plot of
population proportion versus generation.)
FIGURE 2.12 Numerical solution of a Type II minimal deceptive problem that
diverges: r = 1.1, c = 0.9, c' = 0.5 with unequal initial proportions. (Plot of
population proportion versus generation.)
SCHEMATA REVISITED: SIMILARITY TEMPLATES AS HYPERPLANES

Over the past two sections we have looked at schema processing from two perspectives:
using schema visualization we have viewed schema processing as the
manipulation of important periodicities, and under the minimal deceptive problem
we have considered schema processing in a competitive, ecological setting.
Another useful vantage point may be reached if we take a more geometric view
of the underlying bit space.
To generate this geometrical vision of our search space, consider the strings
and schemata of length l = 3. Because the string length is so short, it is easy to
draw a picture of the search space (Fig. 2.13). In this representation, we view the
space as a three-dimensional vector space. Points in the space are strings or schemata
of order 3. Lines in the space, as indicated in the diagram, are schemata of
order 2. Planes in the space are schemata of order 1, and the whole space is
covered by the schema of order 0, the schema ***.

This result generalizes to spaces of higher dimension where we must abandon
the geometrical notions available to us in 3-space. Points, lines, and planes
described by schemata in three dimensions generalize to hyperplanes of varying
FIGURE 2.13 Visualization of schemata as hyperplanes in three-dimensional
space.
dimension in space. Thus we can think of a genetic algorithm cutting across
different hyperplanes to search for improved performance.
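The hyperplane view is easy to make concrete. The small Python sketch below (function names are my own illustration) enumerates the strings matched by a schema; a schema of order o over strings of length l covers a hyperplane of 2^(l − o) points:

```python
# Does a string lie on the hyperplane described by a schema?
def matches(schema, string):
    return all(s == '*' or s == b for s, b in zip(schema, string))

# Enumerate every point of the search space lying on the schema's hyperplane.
def hyperplane(schema):
    l = len(schema)
    return [format(i, f'0{l}b') for i in range(2 ** l)
            if matches(schema, format(i, f'0{l}b'))]

print(hyperplane('0*1'))        # a "line" in 3-space: ['001', '011']
print(len(hyperplane('1**')))   # a "plane" in 3-space: 4 points
print(len(hyperplane('***')))   # the whole space: 8 points
```

Counting points confirms the geometric hierarchy: order-2 schemata are lines (2 points), order-1 schemata are planes (4 points), and the order-0 schema covers all 8 vertices of the cube.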
SUMMARY

In this chapter, we have undertaken a more rigorous appraisal of genetic
algorithm performance using a more careful analysis of schemata (similarity
templates). The primary result, embodied in the fundamental theorem of genetic
algorithms, says that high-performance, short-defining-length, low-order schemata
receive at least exponentially increasing numbers of trials in successive generations.
This occurs because reproduction allocates more copies to the best
schemata and because simple crossover does not disturb short-defining-length
schemata with high frequency. Since mutation is fairly infrequent, it has little
effect on these important schemata. The exponential form of this trial allocation
turns out to be a rational way to allocate trials, as it connects with the optimal
solution to a two-armed bandit problem.

By processing similarities in this manner, a genetic algorithm reduces the
complexity of arbitrary problems. In a sense, these highly fit, short, low-order
schemata become the partial solutions to a problem (called building blocks), and
a GA discovers new solutions by speculating on many combinations of the best
partial solutions contained within the current population.
That building blocks do indeed lead to better performance is an underlying
assumption of the simple genetic algorithm called the building block hypothesis.
The Walsh-schema transform has provided us with an important tool for understanding
whether particular problems are amenable to simple GA solution. Work
in this area has also suggested that functions that are GA-hard (that is, difficult to
solve with the simple three-operator genetic algorithm) tend to have remote
optima, something akin to finding a needle in a haystack; many optimization
methods, not just genetic algorithms, have trouble finding answers in problems with
such isolated optima.

Additional insight into the workings of genetic algorithms has been obtained
through analysis of the minimal deceptive problem, the smallest problem that
can possibly be deceptive or misleading for the simple genetic algorithm. It is
surprising that for many likely initial conditions the MDP is not GA-hard. In other
words, even though we construct a difficult and misleading function (a badly
epistatic function), the genetic algorithm often refuses to be misled. This is
encouraging and is no doubt responsible for much of the empirical success of simple
genetic algorithms in epistatic problem domains.

Analysis of genetic algorithm performance has led us to understand better
why GAs work. In the next chapter a simple genetic algorithm is programmed
using the Pascal programming language, and we observe its performance in a trial
problem.
PROBLEMS

2.1. Consider three strings A1 = 11101111, A2 = 01000011, and six schemata
H1 = 1*..., H2 = 0*00..., H3 = 1*****1*, and H4 = *.... Which schemata are
matched by which strings? What are the order and defining length of each of the
schemata? Estimate the probability of survival of each schema under mutation
when the probability of a single mutation is pm = 0.001. Estimate the probability
of survival of each schema under crossover when the probability of crossover is
pc = 0.85.
2.2. A population contains the following strings and fitness values at generation
0:

    #    String    Fitness
    1    10001     20
    2    11100     10
    3    00011      5
    4    01110     15

The probability of mutation is pm = 0.01 and the probability of crossover is
pc = 1.0. Calculate the expected number of schemata of the form 1**** in
generation 1. Estimate the expected number of schemata of the form 0**1* in
generation 1.
2.3. Devise three methods of performing reproduction and calculate an estimate
of the expected number of schemata in the next generation under each method.
2.4. Suppose we perform a crossover-like operation where we pick two cross
sites and exchange string material between the two sites:

    xxx|xx|xx        xxxyyxx
    yyy|yy|yy        yyyxxyy

Calculate a lower bound on the survival probability of a schema of defining length
δ and order o under this operator. Recalculate the survival probability when we
treat the string as a ring (when the left end is assumed to be adjacent to the right
end).
2.5. How many unique schemata exist within strings of length l = 10, 20, and
30 when the underlying alphabet is binary? How many unique schemata of order
3 exist in binary strings of length l = 10, 20, and 30? Calculate reasonable upper
and lower bounds on the number of schemata processed using strings of length
10, 20, and 30 when the population size n = 50. Assume a significant building
block length equal to 10 percent of the total string length.
2.6. Suppose a schema H when present in a particular string causes the string to
have a fitness 25 percent greater than the average fitness of the current population.
If the destruction probabilities for this schema under mutation and crossover
are negligible, and if a single representative of the schema is contained in
the population at generation 0, determine when the schema H will overtake
populations of size n = 20, 50, 100, and 200.
2.7. Suppose a schema H when present in a particular string causes the string to
have a fitness 10 percent less than the average fitness of the current population.
If the destruction probabilities for this schema under mutation and crossover are
negligible, and if representatives of the schema are contained in 60 percent of
the population at generation 0, determine when the schema H will disappear
from populations of size n = 20, 50, 100, and 200.
2.8. A two-armed bandit pays equal awards with probabilities p1 = 0.7 and
p2 = 0.3. Estimate the number of trials that should be given to the observed best
arm after a total of 50 trials.
2.9. Derive a more accurate formula for calculating the number of unique schemata
contained in a randomly generated initial population of size n when the
string length is l. (Hint: Consider the probability of having no schemata of a
particular order and use the complementary probability to count the number of
schemata represented by one or more.)
2.10. Suppose a problem is coded as a single unsigned binary integer between 0
and 127 (base 10), where 0000000 = 0, 1000000 = 64, and 1111111 = 127.
Sketch the portion of the space covered by the following schemata: 1******,
*******, 1111111, 10***01, and *111***.
COMPUTER ASSIGNMENTS

A. A fitness function for a single-locus genetic algorithm is given by the function
values f1 = constant and f0 = constant. Derive the recursion relationship for the
expected proportion of 1's under reproduction alone and reproduction with mutation
in an infinitely large population. Program the finite difference relationship
and calculate the expected proportion of 1's between generation 0 and 100,
assuming equal proportions of 1's and 0's initially and a ratio of f1/f0 =
r = 1.1, 2, 10.

B. Redo Problem A including mutation using mutation probability values of
pm = 0.001, 0.01, 0.1.
C. Consider an order-2 schema where we assume constant fitness values as follows:

    *...*1*...*1*...*    f11
    *...*1*...*0*...*    f10
    *...*0*...*1*...*    f01
    *...*0*...*0*...*    f00

Write finite difference relationships for the proportion of the four schemata (11,
10, 01, 00) under reproduction, crossover, and mutation. Be sure to use the correct
loss and gain terms due to crossover. Program the finite difference relationships
and simulate the performance of a large population with the f constants of
Fig. 2.10. Compare and contrast results at pm values of 0.001, 0.01, 0.1 to the
results of Fig. 2.10.
D. For the function f(x) = x² on the integer interval [0, 31] coded as a five-bit,
unsigned binary integer, calculate the average fitness values for all 3⁵ schemata.
From this data, determine whether the function-coding combination is GA-deceptive or not.

E. Design a three-bit function that is GA-deceptive. Prove its deception by
calculating all 3³ schema averages.

Computer Implementation of a Genetic Algorithm
When first approaching genetic algorithms, many users hesitate, not knowing
where to start or how to begin. On the one hand, this aversive reaction seems
strange. After all, in the first two chapters we have seen how genetic algorithms
are mechanically quite simple, involving nothing more than random number generation,
string copies, and partial string exchanges. On the other hand, for many
business, scientific, and engineering users and programmers this stark simplicity
is itself part of the problem; these individuals are familiar with using and programming
high-level computer codes involving complex mathematics, interwoven
databases, and intricate computations. Moreover, this same audience is most
comfortable with the reassuring repeatability of deterministic computer programs.
The direct manipulation of bit strings, the construction of custom codings,
and even the randomness of GA operators can present a sequence of high hurdles
that prevent effective application.

In this chapter, we leap these obstacles by first constructing the data structures
and algorithms necessary to implement the simple genetic algorithm described
earlier. Specifically, we write a Pascal computer code called the simple
genetic algorithm (SGA), which contains nonoverlapping string populations, reproduction,
crossover, and mutation applied to the optimization of a simple function
of one variable coded as an unsigned binary integer. We also examine some
implementation issues such as discretization of parameters, coding of strings, enforcement
of constraints, and mapping of fitness that arise in applying GAs to
particular problems.
DATA STRUCTURES

Genetic algorithms process populations of strings. Therefore it comes as no surprise
that the primary data structure for the simple genetic algorithm is a string
population. There are any number of ways to implement populations. For the
SGA we choose the simplest; we construct a population as an array of individuals
where each individual contains the phenotype (the decoded parameter or parameters),
the genotype (the artificial chromosome or bit string), and the fitness
(objective function) value along with other auxiliary information. A schematic of
a population is shown in Fig. 3.1. The Pascal code of Fig. 3.2 declares a population
type corresponding to this model. For readers unfamiliar with Pascal, the essentials
of the language are presented in Appendix B; this appendix also presents
some random number generation routines and utilities. Even without formal
training, many readers should be able to decipher the essence of this code.

Referring to Fig. 3.2, we see the declaration of a number of constants: the
maximum population size, maxpop, and the maximum string length, maxstring.
These set upper bounds on the population size and the string length. Following
the constant declarations, we declare the population itself, along with its components,
in the type block. As we can see, the type population is an array of type
individual (indexed between 1 and maxpop). Type individual is a record composed
of a type chromosome called chrom, a real variable called fitness, and a
real type variable called x. These represent the artificial chromosome, the string
fitness value, and the decoded parameter value x respectively. Digging further,
we see that the type chromosome is itself an array of type allele (indexed between
1 and maxstring), which in this case is simply another name for the boolean
type (a single bit, true or false).
FIGURE 3.1 Schematic of a string population in a genetic algorithm.
const maxpop    = 100;
      maxstring = 30;

type  allele     = boolean;                        { Allele = bit position }
      chromosome = array[1..maxstring] of allele;  { String of bits }
      individual = record
                     chrom:chromosome;    { Genotype = bit string }
                     x:real;              { Phenotype = unsigned integer }
                     fitness:real;        { Objective function value }
                     parent1, parent2, xsite:integer;  { parents & cross pt }
                   end;
      population = array[1..maxpop] of individual;

FIGURE 3.2 A simple genetic algorithm, SGA, data type declarations in Pascal.
In the SGA, we apply genetic operators to an entire population at each generation,
as shown in Fig. 3.3. To implement this operation cleanly, we utilize two
nonoverlapping populations, thereby simplifying the birth of offspring and the
replacement of parents. The declarations of the two populations oldpop and
newpop are shown in Fig. 3.4 along with the declaration of a number of other global
program variables. With these two populations, it is a simple matter to create
new offspring from the members of oldpop using the genetic operators, place
those new individuals in newpop, and set oldpop to newpop when we are

FIGURE 3.3 Schematic of nonoverlapping populations used in the SGA.
var oldpop, newpop:population;             { Two non-overlapping populations }
    popsize, lchrom, gen, maxgen:integer;  { Integer global variables }
    pcross, pmutation, sumfitness:real;    { Real global variables }
    nmutation, ncross:integer;             { Integer statistics }
    avg, max, min:real;                    { Real statistics }

FIGURE 3.4 SGA global variable declarations in Pascal.
through. There are other, more storage-efficient methods of handling populations.
We could maintain a single overlapping population and pay more careful attention
to who replaces whom in successive populations. There is also no particular
reason to keep the population size constant. Natural populations certainly change
in size, and there may be motivation during artificial genetic search to permit
population size variation from generation to generation. There is, however,
stronger motivation in our current work to keep things simple, and this has
guided the choice of nonoverlapping populations of constant size. In our machine
learning work in later chapters we will need to come to terms with the population
issue once more.

With our data structures designed and built, we need to understand the three
operators (reproduction, crossover, and mutation) essential to SGA operation.
Before we can do this, we need to define some of the more important global
program variables that affect the operation of the entire code. Looking at Fig. 3.4
once again, we see a number of variables of type integer. Among them are the
variables popsize, lchrom, and gen. These important variables correspond to
what we have been calling population size (n), string length (l), and the generation
counter (t). Additionally the variable maxgen is an upper limit on the number
of generations. Also shown in Fig. 3.4 are a number of important global real
variables: pcross, pmutation, sumfitness, avg, max, and min. The variables
pcross and pmutation are the probabilities of crossover and mutation respectively
(pc and pm). The sumfitness variable is the sum of the population fitness
values (Σf). This variable is important during roulette wheel selection. There are
a few other global variables we have not discussed; a complete listing of the SGA
code is presented in Appendix C.
REPRODUCTION, CROSSOVER, AND MUTATION

The three operators of the simple tripartite algorithm can each be implemented
in straightforward code segments. This comes as no surprise, since we have been
touting the simple mechanics of these operators. Before we look at each routine,
we must remember the common thread running through the three operators:
each depends on random choice. In the code segments that follow, we assume
the existence of three random choice routines:

random   returns a real pseudorandom number between zero and one (a
         uniform random variable on the real interval [0, 1]).

flip     returns a boolean true value with specified probability (a Bernoulli
         random variable).

rnd      returns an integer value between specified lower and upper limits
         (a uniform random variable over a subset of adjacent integers).

A more complete discussion of these routines is contained in Appendix B where
several programming examples are given.
In the simple genetic algorithm, reproduction is implemented in the function
select as a linear search through a roulette wheel with slots weighted in proportion
to string fitness values. In the code shown in Fig. 3.5, we see that select
returns the population index value corresponding to the selected individual. To
do this, the partial sum of the fitness values is accumulated in the real variable
partsum. The real variable rand contains the location where the wheel has
landed after a random spin according to the computation

    rand := random * sumfitness

Here the sum of the population fitnesses (calculated in the procedure statistics)
is multiplied by the normalized pseudorandom number generated by random.
Finally the repeat-until construct searches through the weighted roulette wheel
until the partial sum is greater than or equal to the stopping point rand. The
function returns with the current population index value j assigned to select.
function select(popsize:integer; sumfitness:real;
                var pop:population):integer;
{ Select a single individual via roulette wheel selection }
var rand, partsum:real;  { Random point on wheel, partial sum }
    j:integer;           { Population index }
begin
  partsum := 0.0; j := 0;  { Zero out counter and accumulator }
  { Wheel point calc. uses random number [0,1] }
  rand := random * sumfitness;
  repeat  { Find wheel slot }
    j := j + 1;
    partsum := partsum + pop[j].fitness;
  until (partsum >= rand) or (j = popsize);
  { Return individual number }
  select := j;
end;

FIGURE 3.5 Function select implements roulette wheel selection.
This is perhaps the simplest way to implement selection. There are more
efficient codes to implement this operator (a binary search will certainly speed
things up), and there are many other ways to choose offspring with appropriate
bias toward the best. We will examine some of these in later chapters, but for
now we stay with this basic mechanism.
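The binary-search speedup mentioned above can be sketched briefly. The Python fragment below is my own illustration (not part of the SGA's Pascal code): the running fitness totals are computed once, and each spin of the wheel then costs a single binary search rather than a linear scan.

```python
import bisect
import itertools
import random

# Precompute the partial fitness sums: the "slot boundaries" of the wheel.
def make_wheel(fitnesses):
    return list(itertools.accumulate(fitnesses))

# One spin: pick a random point on the wheel, then binary-search its slot.
def select(wheel, rng=random):
    rand = rng.random() * wheel[-1]          # random point on the wheel
    return bisect.bisect_right(wheel, rand)  # index of the chosen individual

wheel = make_wheel([20.0, 10.0, 5.0, 15.0])  # fitness values of Problem 2.2
print(select(wheel))                          # an index in 0..3, biased to the fittest
```

The precomputation is amortized over a whole generation of spins, so each selection drops from O(n) to O(log n) comparisons.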
The code segment select gives us a straightforward way of choosing offspring
for the next generation. From our previous descriptions, we know our next step
is crossover. In the SGA the crossover operator is implemented in a procedure that
we, cleverly enough, have called crossover (Fig. 3.6). The routine crossover takes
two parent strings called parent1 and parent2 and generates two offspring strings
called child1 and child2. The probabilities of crossover and mutation, pcross and
pmutation, are passed to crossover, along with the string length lchrom, a crossover
count accumulator ncross, and a mutation count accumulator nmutation.

Within crossover the operations mirror our description in Chapter 1. At the
top of the routine, we determine whether we are going to perform crossover on
the current pair of parent chromosomes. Specifically, we toss a biased coin that
comes up heads (true) with probability pcross. The coin toss is simulated in the
boolean function flip, where flip in turn calls on the pseudorandom number
procedure crossover(var parent1, parent2, child1, child2:chromosome;
                    var lchrom, ncross, nmutation, jcross:integer;
                    var pcross, pmutation:real);
{ Cross 2 parent strings, place in 2 child strings }
var j:integer;
begin
  if flip(pcross) then begin    { Do crossover with p(cross) }
    jcross := rnd(1,lchrom-1);  { Cross between 1 and l-1 }
    ncross := ncross + 1;       { Increment crossover counter }
  end else                      { Otherwise set cross site to force mutation }
    jcross := lchrom;
  { 1st exchange, 1 to 1 and 2 to 2 }
  for j := 1 to jcross do begin
    child1[j] := mutation(parent1[j], pmutation, nmutation);
    child2[j] := mutation(parent2[j], pmutation, nmutation);
  end;
  { 2nd exchange, 1 to 2 and 2 to 1 }
  if jcross<lchrom then  { Skip if cross site is lchrom--no crossover }
    for j := jcross+1 to lchrom do begin
      child1[j] := mutation(parent2[j], pmutation, nmutation);
      child2[j] := mutation(parent1[j], pmutation, nmutation);
    end;
end;

FIGURE 3.6 Procedure crossover implements simple (single-point) crossover.
routine random. If a cross is called for, a crossing site is selected between 1 and
the last cross site. The crossing site is selected in the function rnd, which returns
a pseudorandom integer between specified lower and upper limits (between 1
and lchrom − 1). If no cross is to be performed, the cross site is selected as
lchrom (the full string length l) so a bit-by-bit mutation will take place despite
the absence of a cross. Finally, the partial exchange of crossover is carried out in
the two for-do constructs at the end of the code. The first for-do handles the
partial transfer of bits between parent1 and child1 and between parent2 and
child2. The second for-do construct handles the transfer and partial exchange of
material between parent2 and child1 and between parent1 and child2. In all
cases, a bit-by-bit mutation is carried out by the boolean (or allelean) function
mutation.

Mutation at a point is carried out by mutation as shown in Fig. 3.7. This
function uses the function flip (the biased coin toss) to determine whether or
not to change a true to a false (a 1 to a 0) or vice versa. Of course the function
flip will only come up heads (true) pmutation percent of the time as a result of
the call to the pseudorandom number generator random within flip itself. The
function also keeps tabs on the number of mutations by incrementing the variable
nmutation. As with reproduction, there are ways to improve our simple mutation
operator. For example, it would be possible to avoid much random number generation
if we decided when the next mutation should occur rather than calling
flip each time. Again, in this chapter, we avoid sophisticated niceties and stick
with the basics.
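The improvement hinted at above (deciding in advance where the next mutation falls) is a standard trick: the gap between successive mutations follows a geometric distribution. The sketch below is my own Python illustration of the idea, not SGA code:

```python
import math
import random

# Rather than flipping a biased coin once per bit, jump straight to the next
# mutation site using a geometrically distributed gap. Each bit is still
# flipped with probability pm, but the expected number of random draws per
# string drops from l to roughly pm*l + 1.
def mutate_string(bits, pm, rng=random):
    """Return a copy of the bit string with each bit flipped w.p. pm (0 < pm < 1)."""
    bits = list(bits)
    i = -1
    while True:
        u = rng.random()
        # Geometric gap: number of coin tosses until the next head
        i += int(math.log(1.0 - u) / math.log(1.0 - pm)) + 1
        if i >= len(bits):
            return ''.join(bits)
        bits[i] = '1' if bits[i] == '0' else '0'  # flip the mutated bit
```

At SGA-typical mutation rates (pm on the order of 0.001), almost every string passes through with a single random draw instead of lchrom of them.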
The three main pieces of our genetic algorithm puzzle have proven to be
none too puzzling. We have seen in this section how the three may be easily
coded and easily understood. The next section continues piecing together the
bigger GA picture as we coordinate reproduction, crossover, and mutation in a
single generation.
function mutation(alleleval:allele; pmutation:real;
                  var nmutation:integer):allele;
{ Mutate an allele w/ pmutation, count number of mutations }
var mutate:boolean;
begin
  mutate := flip(pmutation);  { Flip the biased coin }
  if mutate then begin
    nmutation := nmutation + 1;
    mutation := not alleleval;  { Change bit value }
  end else
    mutation := alleleval;  { No change }
end;

FIGURE 3.7 Function mutation implements a single-bit, point mutation.
A TIME TO REPRODUCE, A TIME TO CROSS

With the Big Three designed and built, creating a new population from an old
one is no big deal. The proper sequencing is shown in Fig. 3.8 in the procedure
generation. Starting at an individual index j = 1 and continuing until the population
size, popsize, has been exceeded, we pick two mates, mate1 and mate2,
using successive calls to select. We cross and mutate the chromosomes using
crossover (which itself contains the necessary invocations of mutation). In a final
flurry of mad activity, we decode the pair of chromosomes, evaluate the objective
(fitness) function values, and increment the population index j by 2.

Having already examined select, crossover, and mutation in detail, we need
only concern ourselves with the two problem-dependent routines hinted at
above. For any problem we must create a procedure that decodes the string to
procedure generation;
{ Create a new generation through select, crossover, and mutation }
{ Note: generation assumes an even-numbered popsize }
var j, mate1, mate2, jcross: integer;
begin
  j := 1;
  repeat  { select, crossover, and mutation until newpop is filled }
    mate1 := select(popsize, sumfitness, oldpop);  { pick pair of mates }
    mate2 := select(popsize, sumfitness, oldpop);
    { Crossover and mutation - mutation embedded within crossover routine }
    crossover(oldpop[mate1].chrom, oldpop[mate2].chrom,
              newpop[j].chrom, newpop[j+1].chrom,
              lchrom, ncross, nmutation, jcross, pcross, pmutation);
    { Decode string, evaluate fitness, & record parentage data for both children }
    with newpop[j] do begin
      x := decode(chrom, lchrom);
      fitness := objfunc(x);
      parent1 := mate1;
      parent2 := mate2;
      xsite := jcross;
    end;
    with newpop[j+1] do begin
      x := decode(chrom, lchrom);
      fitness := objfunc(x);
      parent1 := mate1;
      parent2 := mate2;
      xsite := jcross;
    end;
    { Increment population index }
    j := j + 2;
  until j > popsize
end;

FIGURE 3.8 Procedure generation generates a new population from the previous population.
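A compact Python rendering of the same generation loop may help clarify the control flow; passing select, crossover, decode, and objfunc in as plain functions is an assumption of this sketch rather than the book's Pascal calling convention:

```python
def generation(oldpop, select, crossover, decode, objfunc):
    """Create a new population two children at a time (even popsize assumed)."""
    newpop = []
    while len(newpop) < len(oldpop):
        mate1 = select(oldpop)        # pick a pair of mates
        mate2 = select(oldpop)
        child1, child2 = crossover(oldpop[mate1]["chrom"],
                                   oldpop[mate2]["chrom"])
        for chrom in (child1, child2):  # decode, evaluate, record parentage
            x = decode(chrom)
            newpop.append({"chrom": chrom, "x": x,
                           "fitness": objfunc(x),
                           "parents": (mate1, mate2)})
    return newpop
```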
create a parameter or set of parameters appropriate for that problem. We must also create a procedure that receives the parameter or set of parameters thus decoded and evaluates the figure of merit or objective function value associated with the given parameter set. These routines, which we call decode and objfunc, are the two places where the GA rubber meets the applications road. For different problems we will often need different decoding routines (although later on in this chapter we will examine some standard routines that have proven useful in a number of studies), and in different problems we will always need a different fitness function routine. Having said this, it is still useful to look at a particular decoding routine and a particular fitness function. To be consistent with work earlier in this book, we will continue to use binary unsigned integer coding, and we will continue to use a simple power function as the fitness function; however, we will increase the value of the exponent, using the function f(x) = x^10.
SGA uses the decoding routine shown in Fig. 3.9, the function decode. In this function, a single chromosome is decoded starting at the low-order bit (position 1) and mapped from right to left by accumulating the current power of 2 (stored in the variable powerof2) when the appropriate bit is set (value is true). The accumulated value, stored in the variable accum, is finally returned by the function decode.
The objective function used in SGA is a simple power function, similar to the function used in Chapter 1. In SGA we evaluate the function f(x) = (x/coef)^10. The actual value of coef is chosen to normalize the x parameter when a bit string of length lchrom = 30 is chosen. Thus coef = 2^30 - 1 = 1073741823. Since the x value has been normalized, the maximum value of the function will be f(x) = 1.0 when x = 2^30 - 1 for the case when lchrom = 30. A straightforward implementation of the power function is presented in Fig. 3.10 as the function objfunc.
function decode(chrom: chromosome; lbits: integer): real;
{ Decode string as unsigned binary integer - true=1, false=0 }
var j: integer;
    accum, powerof2: real;
begin
  accum := 0.0; powerof2 := 1;
  for j := 1 to lbits do begin
    if chrom[j] then accum := accum + powerof2;
    powerof2 := powerof2 * 2;
  end;
  decode := accum;
end;

FIGURE 3.9 Function decode decodes a binary string as a single, unsigned integer.
function objfunc(x: real): real;
{ Fitness function - f(x) = (x/coef)**n }
const coef = 1073741823.0;  { Coefficient to normalize domain }
      n = 10;
begin objfunc := power( x/coef, n ) end;

FIGURE 3.10 Function objfunc calculates the fitness function f(x) = (x/coef)^10 from the decoded parameter x.
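As a cross-check, the decode/objfunc pair can be written in a few lines of Python (a sketch, not the book's code; the default constant mirrors coef = 2^30 - 1):

```python
def decode(chrom):
    """Accumulate an unsigned integer from a low-order-first bit list."""
    accum, powerof2 = 0, 1
    for bit in chrom:
        if bit:
            accum += powerof2
        powerof2 *= 2
    return accum

def objfunc(x, coef=2**30 - 1, n=10):
    """Fitness f(x) = (x/coef)**n, normalized so the maximum is 1.0."""
    return (x / coef) ** n
```

Note that a chromosome of all ones decodes to exactly coef, so the best attainable fitness is 1.0, as the text states.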
GET WITH THE MAIN PROGRAM
We have described the data structures. We have built the genetic operators. We have decoded the strings, and we have figured the fitness values. Now is the time to wrap a ribbon around this parcel, test it, and ship it on for further use. In Fig. 3.11 we see the main program of SGA. At the top of the code, we start innocently enough by setting the generation counter to zero, gen := 0. We build steam as we read in program data, initialize a random population, calculate initial population statistics, and print out a special initial report using the procedure initialize. We won't dwell on the initialization code here. The interested reader should refer to Appendix C, which contains a complete copy of the SGA code.

At long last, with necessary preliminaries complete, we hit the main loop contained within the repeat-until construct. In rapid succession we increment the generation counter, generate a new generation in generation, calculate new generation statistics in statistics, print out the generation report in report, and advance the population in one fell swoop:

oldpop := newpop;

All this continues, step after relentless step, until the generation counter exceeds the maximum, thereby forcing the machinery to a grinding halt.
In our rush to see the big picture, we have missed some important details. The statistical routine statistics (Fig. 3.12) calculates the average, maximum, and minimum fitness values; it also calculates the sumfitness required by the roulette wheel. This version of statistics is again something of a minimally acceptable solution. Many other interesting population statistics could and probably should be tracked. For example, allele convergence statistics are often tabulated during a generation. Best string so far or best k strings so far could be stored for future reference. Population standard deviation or even population histograms might also be of interest in doing more detailed run postmortems. The separation of statistical functions in the routine statistics permits the easy addition of any or all of these computations.
begin  { Main program }
  gen := 0;  { Set things up }
  initialize;
  repeat  { Main iterative loop }
    gen := gen + 1;
    generation;
    statistics(popsize, max, avg, min, sumfitness, newpop);
    report(gen);
    oldpop := newpop;  { advance the generation }
  until (gen >= maxgen)
end.  { End main program }

FIGURE 3.11 Main program for a simple genetic algorithm, SGA.
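The overall control flow can be sketched in Python with the subroutines passed in as functions (an assumption of the sketch; in SGA they are global Pascal procedures operating on global state):

```python
def run_sga(initialize, generation, statistics, report, maxgen):
    """Top-level loop: initialize, then evolve until gen reaches maxgen."""
    pop = initialize()
    gen = 0
    while gen < maxgen:
        gen += 1
        pop = generation(pop)   # oldpop := newpop, in one assignment
        report(gen, statistics(pop))
    return pop
```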
The procedure report presents the full population report, including strings, fitnesses, and parameter values. A listing of report and its single subprocedure writechrom are presented in Fig. 3.13. Once again, a wide array of tabular and graphic reporting options may be useful in genetic algorithm work. The simple report procedure is a good tool because it permits side-by-side comparison of consecutive generations. In turn, this allows checking of operators and analysis of the events leading to the construction of the best individuals.
procedure statistics(popsize: integer;
                     var max, avg, min, sumfitness: real;
                     var pop: population);
{ Calculate population statistics }
var j: integer;
begin
  { Initialize }
  sumfitness := pop[1].fitness;
  min := pop[1].fitness;
  max := pop[1].fitness;
  { Loop for max, min, sumfitness }
  for j := 2 to popsize do with pop[j] do begin
    sumfitness := sumfitness + fitness;  { Accumulate fitness sum }
    if fitness > max then max := fitness;  { New max }
    if fitness < min then min := fitness;  { New min }
  end;
  { Calculate average }
  avg := sumfitness/popsize;
end;

FIGURE 3.12 Procedure statistics calculates important population statistics.
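The same bookkeeping in Python, as a one-pass sketch over a list of fitness values:

```python
def statistics(fitnesses):
    """Return (max, avg, min, sumfitness) for a nonempty fitness list."""
    sumfitness = maxf = minf = fitnesses[0]
    for f in fitnesses[1:]:
        sumfitness += f          # accumulate fitness sum
        if f > maxf:
            maxf = f             # new max
        if f < minf:
            minf = f             # new min
    return maxf, sumfitness / len(fitnesses), minf, sumfitness
```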
{ report.sga: contains writechrom, report }

procedure writechrom(var out: text; chrom: chromosome; lchrom: integer);
{ Write a chromosome as a string of 1's (true's) and 0's (false's) }
var j: integer;
begin
  for j := lchrom downto 1 do
    if chrom[j] then write(out, '1')
    else write(out, '0');
end;

procedure report(gen: integer);
{ Write the population report }
const linelength = 132;
var j: integer;
begin
  repchar(lst, '-', linelength); writeln(lst);
  repchar(lst, ' ', 50); writeln(lst, 'Population Report');
  repchar(lst, ' ', 23); write(lst, 'Generation ', gen-1:2);
  repchar(lst, ' ', 57); writeln(lst, 'Generation ', gen:2);
  writeln(lst);
  write(lst,   ' #          string          x       fitness');
  write(lst,   '    #  parents  xsite');
  writeln(lst, '          string          x       fitness');
  repchar(lst, '-', linelength); writeln(lst);
  for j := 1 to popsize do begin
    write(lst, j:2, ') ');
    { Old string }
    with oldpop[j] do begin
      writechrom(lst, chrom, lchrom);
      write(lst, ' ', x:10, ' ', fitness:6:4, ' | ');
    end;
    { New string }
    with newpop[j] do begin
      write(lst, ' ', j:2, ') (', parent1:2, ',', parent2:2, ') ',
            xsite:2, ' ');
      writechrom(lst, chrom, lchrom);
      writeln(lst, ' ', x:10, ' ', fitness:6:4);
    end;
  end;
  repchar(lst, '-', linelength); writeln(lst);
  { Generation statistics and accumulated values }
  writeln(lst, ' Note: Generation ', gen:2, ' & Accumulated Statistics: ',
          'max=', max:6:4, ', min=', min:6:4, ', avg=', avg:6:4, ', sum=',
          sumfitness:6:4, ', nmutation=', nmutation, ', ncross=', ncross);
  repchar(lst, '-', linelength); writeln(lst);
  page(lst);
end;

FIGURE 3.13 Procedures report and writechrom implement population reports.
HOW WELL DOES IT WORK?
We have trudged through the SGA code, step by step, inch by inch, gaining a
better feel for some of the ins and outs of genetic algorithm programming Of
course, this is not the only way to skin a GA cat, and a number of public domainHow Well Does It Work? n
codes are available with numerous bells and whistles for effective optimization
in a variety of domains (Booker and De Jong, 1985; De Jong, 1982; Grefenstette,
1984a, 1984b). Actually, we will be adding several important features of our own
in this and later chapters. Let us resist temptation and hold off the fancy features.
In this section, we stick with the bare-bones GA and sce how welll it works.
We have already specified our simple test problem. The bit string decodes as
an unsigned 30-bit integer. The fitness function f is the power function f(x) =
(x/c)", where c has been chosen to normalize x and m has been chosen as 10,
Some of you may cry foul, wondering why we have chosen a different function
from the one followed in Chapters 1 and 2 (f(x) = =), Actually, we have
changed the problem to make things tougher for the GA, as illustrated in Fig.
3.14, With the larger exponent, the average function value is lower, and a smaller
Proportion of the domain maps to values above some specified quantity. AS a
result, the random starting population will not contain very good points to begin;
this is a better test of GA performance.
To specify our computer simulations more precisely, let's choose a trial set of GA parameters. In De Jong's (1975) study of genetic algorithms in function optimization, a series of parametric studies across a five-function suite of problems suggested that good GA performance requires the choice of a high crossover probability, a low mutation probability (inversely proportional to the population size), and a moderate population size. Following these suggestions we adopt the following parameters for our first computer simulations:

pmutation = 0.0333  (probability of mutation)
pcross    = 0.6     (probability of crossover)
popsize   = 30      (population size, n)
FIGURE 3.14 Comparison of the functions x^2 and x^10 on the unit interval.
The string length for the hand simulations of a genetic algorithm in Chapter 1 was short (short by genetic algorithm standards): l = 5. This translated into a ridiculously small space with only 2^5 = 32 points, where there was little practical need for genetic search. Any enumerative search or random walk would have found good points quickly. Of course, at that time our aim was pedagogical clarity, and the size of the search space was of little interest. Now that we are interested in seeing a stiffer test of GA performance, the string length is increased and the exponent of the test function is increased. With a string length lchrom = 30, the search space is much larger and random walk or enumeration should not be so profitable. With lchrom = 30 there are 2^30 = 1.07(10^9) points. With over 1.07 billion points in the space, one-at-a-time methods are unlikely to do very much very quickly. Moreover, the increase in exponent has adjusted the space so that only 1.05 percent of the points have a value greater than 0.9, as shown in Fig. 3.14. These two modifications make the problem a better test of GA performance.
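The 1.05 percent figure is easy to verify: for x uniform on [0, 1], the fraction with x^10 > 0.9 is 1 - 0.9^(1/10). A two-line check in Python:

```python
# Fraction of the unit interval whose image under x**10 exceeds 0.9
p = 1.0 - 0.9 ** (1.0 / 10.0)
print(f"{100 * p:.2f} percent")  # about 1.05 percent
```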
We start the simple genetic algorithm and let it run for seven generations. The statistical report for the run is shown in Fig. 3.15 and the initial generation (gen = 0) and the first generation are shown side by side in Fig. 3.16. The initial population starts out with a population average fitness of 0.0347. The average fitness of the function on the specified interval may be calculated to be 0.0909. In some sense, we have been unlucky (but not unrealistically so) in our random choice of a population. Additionally, glancing at our best member fitness in the initial population, f_max = 0.2824, we should expect to have 30(1 - 0.2824^0.1) = 3.56 or approximately four strings in a random population of 30 with fitness greater than the best value actually observed.
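The expected-count arithmetic can be checked the same way (Python; 0.2824 is the observed best fitness from the initial report):

```python
# Expected number of members, out of 30 uniform-random strings, whose
# fitness x**10 exceeds the observed best value 0.2824
p_better = 1.0 - 0.2824 ** (1.0 / 10.0)
expected = 30 * p_better
print(round(expected, 2))  # about 3.56
```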
Population size (popsize)           =  30
Chromosome length (lchrom)          =  30
Maximum # of generations (maxgen)   =  10
Crossover probability (pcross)      =  6.0000000000E-01
Mutation probability (pmutation)    =  3.3300000000E-02

Initial population maximum fitness  =  2.8241322552E-01
Initial population average fitness  =  3.4713832780E-02
Initial population minimum fitness  =  9.3406159573E-10
Initial population sum of fitness   =  1.0414149834E+00

FIGURE 3.15 Initial report from an SGA, simple genetic algorithm, run.
[Population report tables, Figures 3.16 through 3.18: each report lists, for every one of the 30 members, the old string, its decoded x value, and its fitness alongside the new string with its parents, crossing site, x value, and fitness. The generation 7 statistics line records max = 0.9807, min = 0.0408, sum = 26.2997, with accumulated counts nmutation = 201 and ncross = 71.]

FIGURE 3.18 SGA run, generation report t = 6-7.
MAPPING OBJECTIVE FUNCTIONS TO FITNESS FORM

In many problems, the objective is more naturally stated as the minimization of some cost function g(x) rather than the maximization of some utility or profit function u(x). Even if the problem is naturally stated in maximization form, this alone does not guarantee that the utility function will be nonnegative for all x as we require in a fitness function (recall that a fitness function must be a nonnegative figure of merit). As a result, it is often necessary to map the underlying natural objective function to a fitness function form through one or more mappings.
The duality of cost minimization and profit maximization is well known. In normal operations research work, to transform a minimization problem to a maximization problem we simply multiply the cost function by a minus one. In genetic algorithm work, this operation alone is insufficient because the measure thus obtained is not guaranteed to be nonnegative in all instances. With GAs the following cost-to-fitness transformation is commonly used:

f(x) = C_max - g(x)   when g(x) < C_max,
     = 0              otherwise.

There are a variety of ways to choose the coefficient C_max. C_max may be taken as an input coefficient, as the largest g value observed thus far, as the largest g value in the current population, or as the largest of the last k generations. Perhaps more appropriately, C_max should vary depending on the population variance. We will consider this last possibility in Chapter 4.
When the natural objective function formulation is a profit or utility function we have no difficulty with the direction of the function: maximized profit or utility leads to desired performance. We may still have a problem with negative utility function u(x) values. To overcome this, we simply transform fitness according to the equation:

f(x) = u(x) + C_min   when u(x) + C_min > 0,
     = 0              otherwise.

We may choose C_min as an input coefficient, as the absolute value of the worst u value in the current or last k generations, or as a function of the population variance. We postpone further consideration of these possibilities to a later chapter.
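Both mappings are straightforward to encode; a minimal Python sketch of the two transformations:

```python
def cost_to_fitness(g, c_max):
    """Minimization: f = c_max - g when g < c_max, else 0."""
    return c_max - g if g < c_max else 0.0

def utility_to_fitness(u, c_min):
    """Maximization with possibly negative utility:
    f = u + c_min when that sum is positive, else 0."""
    return u + c_min if u + c_min > 0 else 0.0
```

Either way, every fitness the roulette wheel sees is guaranteed nonnegative.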
All this monkeying about with objective functions should arouse suspicion about the underlying relationship between objective functions and fitness functions. In nature, fitness (the number of offspring that survive to reproduction) is a tautology. Large numbers of offspring survive because they are fit, and they are fit because large numbers of offspring survive. Survival in natural populations is the ultimate and only taskmaster of any import. By contradistinction, in genetic algorithm work we have the opportunity and perhaps the duty to regulate the level of competition among members of the population to achieve the interim and ultimate algorithm performance we desire. This is precisely what we do when we perform fitness scaling.
FITNESS SCALING
Regulation of the number of copies is especially important in small population genetic algorithms. At the start of GA runs it is common to have a few extraordinary individuals in a population of mediocre colleagues. If left to the normal selection rule (pselect_i = f_i / sum of f_j), the extraordinary individuals would take over a