1.
Introduction to Genetic Algorithms (GA)
Genetic algorithms (GA) are an evolutionary optimization approach, which are an
alternative to traditional optimization methods. GA is one of the most appropriate
methods for complex non-linear models where location of the global optimum is a
difficult task. GA follows the concept of solution evolution by stochastically
developing generations of solution populations using a given fitness. They are
particularly applicable to problems, which are large, non-linear and possibly
discrete in nature.
1.1 Difference between GAs and other normal optimization and search
procedures:
Advantages of GA:
    GAs work with a coding for the parameter set, not the parameters
     themselves.
    GAs search from a population of points, not a single point.
    GAs use probabilistic transition rules, not deterministic rules.
    The Genetic Algorithms are first suited for the optimization problem.
    GAs are robust with respect to local minima, maxima.
    GAs work well on mixed discrete/continuous problems.
    GAs can operate on various representations.
    GAs are stochastic.
    GAs are easily parallelised.
    GAs are easy to implement and lend themselves well to hybridization.
Limitations of GA
    GA implementation is still an art. They often require a lot of tweaking, then
     again sometimes you can tweak all you want and they will still find the
     same result.
    GAs are computationally expensive.
    GAs typically require less information about the problem (e.g. no gradients),
     but designing an objective function and getting the representation and
     operators right can be difficult.
Summarizing: it is suited for the dynamic program, because the genetic
programming paradigm parallels nature in that it is a never-ending process.
However, as a practical matter, a run of the genetic programming paradigm
terminates when the 'termination criterion is satisfied'.
1.2 From Biology to Genetic Algorithms
We can borrow some terms (and ideas) from biology . . .
    Chromosome: The coding of a possible solution for a given problem, usually
      represented with an array of bits or characters
    Gene: A single bit or a set of bits coding part of the solution
    Allele: One of the elements used to code the genes
    Fitness: Evaluation of the actual solution . . . to model a learning process as
      evolution:
    Crossover: Generate new solution by “mixing” two existing solutions
    Mutation: Random change in the solution
2. Working Principle of GA
The GA procedure is based on the Darwinian principle of survival of the fittest.
    An initial population is created containing a predefined number of
      individuals, each represented by a genetic string.
    Each individual has an associated fitness measure. The concept that the
      fittest (or best) individuals in a population will produce a fitter offspring is
      then implemented in order to reproduce the next population.
    Selected individuals are chosen for reproduction (or crossover) at each
      generation, with an appropriate mutation factor to randomly modify the
      genes of an individual, in order to develop the new population.
    The result is another set of individuals based on the original subjects leading
      to subsequent populations with better fitness and those with lower fitness
      will naturally get discarded from the population.
The GA consists of four main stages: evaluation, selection, crossover and
mutation.
    The evaluation procedure measures the fitness of each individual solution in
        the population and assigns to it a score.
    The selection procedure randomly selects individuals of the current
        population for development of the next generation.
    The crossover procedure takes two selected individuals and combines them
        about a crossover point thereby creating two new individuals.
    The mutation procedure randomly modifies the genes of an individual
        subject to a small mutation factor, introducing further randomness into the
        population. This iterative process continues until one of the possible
        termination criteria is met: if a known optimal or acceptable solution level is
        attained; or if a maximum number of generations have been performed; or if
        a given number of generations without fitness improvement occur,
        (Goldberg 1989; Lewin 1994, 1996).
3. Structure of GAs
GAs are categorised under an umbrella term Evolutionary Algorithms, which are
used to describe computer-based problem solving systems which use
computational models of evolutionary processes as key elements in their design
and implementation. A variety of evolutionary algorithms have been proposed, of
which the major ones are: GAs, evolutionary programming, evolution strategies,
classifier systems, and genetic programming (check out www.google.com on these
keywords). They all share a common conceptual base of simulating the 'evolution'
of individual structures via processes of selection, mutation and reproduction.
The existing GAs are founded upon the following main principles:
   1.   Reproduction
   2.   Fitness
   3.   Crossover
   4.   Mutation
There are various flavours of GAs in circulation, varying in implementation of
these three parameters, but in essence the algorithms all follow a standard
procedure:
   1. [Start] Generate random population of n chromosomes (suitable solutions)
   2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population
 3. [New population] Create a new population by repeating following steps
    until the new population is complete
       (a) [Selection] Select two parent chromosomes from a population
            according to their fitness (the better fitness, the bigger chance to be
            selected)
       (b) [Crossover] With a crossover probability cross over the parents to
       form new offspring. If no crossover was performed, offspring is the exact
       copy of parents.
       (c) [Mutation] With a mutation probability mutate new offspring at each
       locus
       (d) [Accepting] Place new offspring in the new population
 4. [Replace] Use new generated population for a further run of the algorithm
 5. [Test] If the end condition is satisfied, return the best solution in current
    population
 6. [Loop] Go to step 2
            Figure 1. Simplified flow chart of a Genetic Algorithm.
PSEUDOCODE
 Algorithm GA is
      // start with an initial time
      t := 0;
      // initialise a usually random population of individuals
        initpopulation P (t);
       // evaluate fitness of all initial individuals in population
         evaluate P (t);
       // test for termination criterion (time, fitness, etc.)
         while not done do
            // increase the time counter
            t := t + 1;
            // select sub-population for offspring production
            P' := selectparents P (t);
            // recombine the "genes" of selected parents
            recombine P' (t);
            // perturb the mated population stochastically
            mutate P' (t);
            // evaluate it's new fitness
            evaluate P' (t);
            // select the survivors from actual fitness
            P := survive P,P' (t);
       od
  end GA.
4. Genetic Algorithms: Encoding Operators
4.1 Encoding (I)
Binary encoding is the most common one (mainly because the first research of
GA used this type of encoding)
   In binary encoding, every chromosome is a string of bits (0 or 1)
   Simple Implementation of the genetic operators
   Not always natural for many problems
                   Cromosome 1        1101100100110110
                   Cromosome 2        1110111000011110
Example of Problem: Knapsack problem
    There are things with given value and size. The knapsack has given
     capacity. Select things to maximize the value of things in knapsack, but do
     not extend knapsack capacity.
    Each bit says, whether the corresponding thing is in knapsack.
4.2 Encoding (II)
Permutation encoding can be used in ordering problems
    Every chromosome is a string of numbers that represent a position in a
     sequence
    Crossover and mutation must be designed to leave the chromosome
     consistent (i.e. have real sequence in it)
             Cromosome 1:        15 7 8 3 5 1310111612 1 14 2 4 6 9
             Cromosome 2:        2 9 14 1 11 5 8 1513 6 1216 7 3 10 4
Example of Problem: Travelling salesman problem (TSP)
    There are cities and given distances between them. Travelling salesman has
     to visit all of them, but he does not want to travel more than necessary. Find
     a sequence of cities with a minimal travelled distance.
    Chromosome describes the order of cities
4.3 Encoding (III)
Direct value encoding can be used in problems where some more complicated
values are required
    Every chromosome is a sequence of some values connected to the problem,
     such as (real) numbers, chars or any objects
    Good choice for some special problems, but necessary to develop some
     specific crossover and mutation
                     Cromosome: D S A B H Y V V
                     Cromosome: 2.5678 1.4361 3.3426 7.8761
                     Cromosome: open walk back close
Example of Problem: Finding weights for a neural network
    A neural network is given with defined architecture. Find weights between
     neurons to get the desired output from the network
    Real values in chromosomes represent weights in the neural network
4.4 Encoding (IV)
Tree encoding is used mainly for evolving programs or expressions (i.e., genetic
programming)
    Every chromosome is a tree of some objects, such as functions or commands
     in programming language.
    Programming language LISP is often used for this purpose, so crossover and
     mutation can be done relatively easily.
                    Cromosome (+ X (/ 5 y))
Example of Problem: Finding a function that would approximate given pairs of
values
    Input and output values are given. The task is to find a function that will
     give the best outputs for all inputs.
    Chromosome are functions represented in a tree
5. Genetic Algorithms: Selection Operators
According to Darwin’s theory of evolution the best chromosome survive to create
new offspring. There are many methods in selecting the best chromosomes:
    Roulette wheel selection,
    Rank selection
    Tournament selection
    Boltzmann selection
5.1 Roulette wheel selection
The simplest selection scheme is roulette-wheel selection, also called stochastic
sampling with replacement. This is a stochastic algorithm and involves the
following technique:
The individuals are mapped to contiguous segments of a line, such that each
individual's segment is equal in size to its fitness. A random number is generated
and the individual whose segment spans the random number is selected. The
process is repeated until the desired number of individuals is obtained (called
mating population). This technique is analogous to a roulette wheel with each slice
proportional in size to the fitness.
This can be simulated by the following algorithm:
   1. [Sum] Calculate sum S of all chromosome fitness values in population.
   2. [Select] Generate random number r from interval (0, S).
   3. [Loop] Go through the population and sum fitness values from 0 - S. When
      the sum is greater then r, stop and return the chromosome where you are.
PSEUDOCODE
            for all members of population
              sum += fitness of this individual
            end for
            for all members of population
              probability = sum of probabilities + (fitness / sum)
              sum of probabilities += probability
            end for
            loop until new population is full
               do this twice
                  number = Random between 0 and 1
                for all members of population
                   if number > probability but less than next probability
                       then you have been selected
                end for
               end
               create offspring
            end loop
Table shows the fitness value and the selection probability for 8 individuals.
Individual 5 is the most fit individual and occupies the largest interval, and
individual 1 is the least fit individual having the smallest interval on the line. For
selecting the mating population the appropriate number of uniformly distributed
random numbers (uniform distributed between 0.0 and 1.0) is independently
generated.
No.    Initial       Fitness     Probability       Expected        Cummulative    Random         Chromosome       Selected
                     value Fi                      Count=          Pi             No.            No.              Population
       Population                Pi = Fi/sum       (n=8)* Pi
1      0000 0000     1           0.0429            0.33            0.0429         0.259          3                0001 0101
2      0010 0001     2.1         0.09              0.72            0.1329         0.038          1                0000 0000
3      0001 0101     3.11        0.1336            1.064           0.266          0.486          5                0110 1010
4      0010 1000     4.01        0.1723            1.368           0.438          0.428          4                0010 1000
5      0110 1010     4.66        0.2               1.6             0.638          0.095          2                0010 0001
6      1110 1000     1.91        0.082             0.656           0.72           0.3            4                0010 1000
7      1110 1101     1.93        0.0829            0.664           0.809          0.616          5                0110 1010
8      0111 1100     4.55        0.1955            1.56            1              0.897          8                0111 1100
Sum of all fitness values: sum = 23.264
Average fitness = 23.264/8 = 2.908
After one iteration, instead of the least fit individual 1 the next least fit individual 6
has been discarded from the pool. This shows the noisiness in roulette-wheel
selection. After repeating the selection procedure n = 8 times the count of each
chromosome is shown below.
No.                  1           2             3               4            5             6           7            8
Initial Population   0000 0000   0010 0001     0001 0101       00101000     0110 1010     1110 1000   1110 1101    0111 1100
Count in the pool    1           1             1               2            2             0           0            1
Selected             0000 0000   0010 0001     0001 0101       00101000     00101000      0110 1010   0110 1010    0111 1100
Population
Referring the expected counts 5 and 8 gets two copies and 1 and 6 get no copies.
After 8 iterations the pool we get is also very similar to the expected count.
5.2 Rank Selection
The previous selection will have problems when the fitness values differ very
much. For example, if the best chromosome fitness is 90% of all the roulette wheel
then the other chromosomes will have very few chances to be selected.
Rank selection first ranks the population and then every chromosome receives
fitness from this ranking. The worst will have fitness 1, second worst 2 etc. and the
best will have fitness N (number of chromosomes in population).
After this all the chromosomes have a chance to be selected. But this method can
lead to slower convergence, because the best chromosomes do not differ so much
from other ones.
5.3 Tournament Selection
All methods above rely on global population statistics.
          – Could be a bottleneck esp. on parallel machines
          – Relies on presence of external fitness function which might not exist:
            e.g. evolving game players
Tournament selection is an informal Procedure:
          – Pick k members at random then select the best of these
          – Repeat to select more individuals
       Probability of selecting i will depend on:
          – Rank of i
          – Size of sample k
         higher k increases selection pressure
          – Whether contestants are picked with replacement
       Picking without replacement increases selection pressure
          – Whether fittest contestant always wins (deterministic) or this happens
            with probability p
       For k = 2, time for fittest individual to take over population is the same
        as linear ranking with s = 2 • p
5.4 Steady-State Selection
This is not particular method of selecting parents. Main idea of this selection is
that big part of chromosomes should survive to next generation.
GA then works in a following way. In every generation are selected a few (good -
with high fitness) chromosomes for creating a new offspring. Then some (bad -
with low fitness) chromosomes are removed and the new offspring is placed in
their place. The rest of population survives to new generation.
5.5 Elitism
Idea of elitism has been already introduced. When creating new population by
crossover and mutation, we have a big chance, that we will loose the best
chromosome.
Elitism is name of method, which first copies the best chromosome (or a few best
chromosomes) to new population. The rest is done in classical way. Elitism can
very rapidly increase performance of GA, because it prevents losing the best found
solution.
6. Genetic Algorithm: Crossover and Mutation operators
There are many methods for crossovers and mutations depending on the kind of
encoding applied to represent the chromosomes.
6.1 Crossover/Mutation (I)
For binary encoding we have many Crossover operators:
    Single point crossover: one crossover point is selected, binary string from
     the beginning of the chromosome to the crossover point is copied from the
     first parent, the rest is copied from the other parent
                             11001011+11011111 = 11001111
    Two point crossover: two crossover points are selected, binary string from
     the beginning of the chromosome to the first crossover point is copied from
     the first parent, the part from the first to the second crossover point is copied
     from the other parent and the rest is copied from the first parent again.
                           11001011 + 11011111 = 11011111
    Uniform crossover: bits are randomly copied from the first or from the
     second parent
                           11001011 + 11011101 = 11011111
    Arithmetic crossover: some arithmetic operation is performed to make a
     new offspring (e.g., logic AND)
                       11001011 + 11011111 = 11001001 (AND)
Mutation: inversion of selected bits
                                 11001001 => 10001001
6.2 Crossover/Mutation (II)
For permutation encoding we have to preserve consistency:
    Single point crossover: one crossover point is selected, the permutation is
     copied from the first parent till the crossover point, then the other parent is
     scanned looking the other numbers
            (1 2 3 4 5 6 7 8 9) + (4 5 3 6 8 9 7 2 1) = (1 2 3 4 5 6 8 9 7)
Order changing mutation: two numbers are selected and exchanged
                      (1 2 3 4 5 6 8 9 7) => (1 8 3 4 5 6 2 9 7)
6.3 Crossover/Mutation (III)
For real value encoding we can reuse crossover from binary encoding:
Mutation: a small number is added (or subtracted) to selected values
             (1.29 5.68 2.86 4.11 5.55) => (1.29 5.68 2.73 4.22 5.55)
6.4 Crossover/Mutation (IV)
Specific operators have to be selected also for tree encoding
    Tree crossover: one crossover point is selected in both parents, parents are
     divided in that point and the parts below crossover points are exchanged to
     produce new offspring.
Changing mutation: the operator, number, or variable in a randomly selected
node is changed.
Parameters of GA
Today there is no general theory which would describe parameters of GA for any
problem. Recommendations are often results of some empiric studies of GAs,
which were often performed only on binary encoding.
Crossover rate:
Crossover rate generally should be high, about 80%-95%. (However some results
show that for some problems crossover rate about 60% is the best.)
Mutation rate:
On the other side, mutation rate should be very low. Best rates reported are about
0.5%-1%.
Population size:
It may be surprising, that very big population size usually does not improve
performance of GA (in meaning of speed of finding solution). Good population
size is about 20-30, however sometimes sizes 50-100 are reported as best. Some
research also shows that best population size depends on encoding, on size of
encoded string. It means, if you have chromosome with 32 bits, the population
should be say 32, but surely two times more than the best population size for
chromosome with 16 bits.
Selection:
Basic roulette wheel selection can be used, but sometimes rank selection can be
better. Also some more sophisticated method, which changes parameters of
selection during run of GA. Basically they behave like simulated annealing.
Selection again depends on the problem.
Encoding:
Encoding depends on the problem and also on the size of instance of the problem.
Check chapter about encoding for some suggestions or look to other resources.
Crossover and mutation type:
Operators depend on encoding and on the problem.
Applications of GA
Genetic algorithm has been used for difficult problems (such as NP-hard
problems), for machine learning and also for evolving simple programs. They have
been also used for some art, for evolving pictures and music.
Advantage of GAs is in their parallelism. GA is travelling in a search space with
more individuals (and with genotype rather than phenotype) so they are less likely
to get stuck in a local extreme like some other methods.
They are also easy to implement. Once you have some GA, you just have to write
new chromosome (just one object) to solve another problem. With the same
encoding you just change the fitness function and it is all. On the other hand,
choosing encoding and fitness function can be difficult.
Disadvantage of GAs is in their computational time. They can be slower than some
other methods. But with today’s computers it is not so big problem.
To get an idea about problems solved by GA, here is a short list of some
applications:
    Nonlinear dynamical systems - predicting, data analysis
    Designing neural networks, both architecture and weights
    Robot trajectory
    Evolving LISP programs (genetic programming)
    Strategy planning
    Finding shape of protein molecules
    TSP and sequence scheduling
    Functions for creating images