9.1 MOTIVATION
Genetic algorithms (GAs) provide a learning method motivated by an analogy to
biological evolution. Rather than search from general-to-specific hypotheses, or
from simple-to-complex, GAs generate successor hypotheses by repeatedly mutat-
ing and recombining parts of the best currently known hypotheses. At each step,
a collection of hypotheses called the current population is updated by replacing
some fraction of the population by offspring of the most fit current hypotheses.
The process forms a generate-and-test beam-search of hypotheses, in which vari-
ants of the best current hypotheses are most likely to be considered next. The
popularity of GAs is motivated by a number of factors including:
• Evolution is known to be a successful, robust method for adaptation within
biological systems.
• GAs can search spaces of hypotheses containing complex interacting parts,
where the impact of each part on overall hypothesis fitness may be difficult
to model.
• Genetic algorithms are easily parallelized and can take advantage of the
decreasing costs of powerful computer hardware.

GA(Fitness, Fitness_threshold, p, r, m)
Fitness: A function that assigns an evaluation score, given a hypothesis.
Fitness_threshold: A threshold specifying the termination criterion.
p: The number of hypotheses to be included in the population.
r: The fraction of the population to be replaced by Crossover at each step.
m: The mutation rate.

• Initialize population: P ← Generate p hypotheses at random
• Evaluate: For each h in P, compute Fitness(h)
• While [max_h Fitness(h)] < Fitness_threshold do
Create a new generation, P_S:
1. Select: Probabilistically select (1 − r)p members of P to add to P_S. The probability Pr(h_i) of
selecting hypothesis h_i from P is given by

    Pr(h_i) = Fitness(h_i) / Σ_{j=1}^{p} Fitness(h_j)        (9.1)

2. Crossover: Probabilistically select (r · p)/2 pairs of hypotheses from P, according to Pr(h_i) given
above. For each pair ⟨h_1, h_2⟩, produce two offspring by applying the Crossover operator.
Add all offspring to P_S.
3. Mutate: Choose m percent of the members of P_S with uniform probability. For each, invert
one randomly selected bit in its representation.
4. Update: P ← P_S.
5. Evaluate: For each h in P, compute Fitness(h)
• Return the hypothesis from P that has the highest fitness.
TABLE 9.1
A prototypical genetic algorithm. A population containing p hypotheses is maintained. On each itera-
tion, the successor population P_S is formed by probabilistically selecting current hypotheses according
to their fitness and by adding new hypotheses. New hypotheses are created by applying a crossover
operator to pairs of most fit hypotheses and by creating single point mutations in the resulting gener-
ation of hypotheses. This process is iterated until sufficiently fit hypotheses are discovered. Typical
crossover and mutation operators are defined in Table 9.2.
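A minimal Python sketch of this loop may help make the algorithm concrete. It assumes hypotheses are fixed-length bit strings and uses single-point crossover; the function names, the one-max example, and all parameter values are illustrative assumptions, not from the text:

```python
import random

def ga(fitness, fitness_threshold, p, r, m, bits):
    """Prototypical GA of Table 9.1 over fixed-length bit-string hypotheses."""
    # Initialize population: p hypotheses generated at random
    population = [[random.randint(0, 1) for _ in range(bits)]
                  for _ in range(p)]
    scores = [fitness(h) for h in population]
    while max(scores) < fitness_threshold:
        total = sum(scores)                       # assumes total fitness > 0
        weights = [s / total for s in scores]     # Pr(h_i), Equation (9.1)
        # Select: probabilistically carry (1 - r) * p members into P_S
        new_gen = [h[:] for h in random.choices(
            population, weights=weights, k=round((1 - r) * p))]
        # Crossover: (r * p) / 2 pairs, two offspring each (single-point)
        for _ in range(round(r * p / 2)):
            h1, h2 = random.choices(population, weights=weights, k=2)
            cut = random.randrange(1, bits)
            new_gen.append(h1[:cut] + h2[cut:])
            new_gen.append(h2[:cut] + h1[cut:])
        # Mutate: invert one randomly chosen bit in m percent of P_S
        for h in random.sample(new_gen, round(m / 100 * len(new_gen))):
            i = random.randrange(bits)
            h[i] = 1 - h[i]
        population = new_gen
        scores = [fitness(h) for h in population]
    return population[scores.index(max(scores))]

# Example: evolve an all-ones string ("one-max"); fitness = number of 1 bits.
best = ga(fitness=sum, fitness_threshold=20, p=50, r=0.6, m=5, bits=20)
```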
9.2.2 Genetic Operators
The generation of successors in a GA is determined by a set of operators that
recombine and mutate selected members of the current population. Typical GA
operators for manipulating bit string hypotheses are illustrated in Table 9.2. These
operators correspond to idealized versions of the genetic operations found in bi-
ological evolution. The two most common operators are crossover and mutation.

The crossover operator produces two new offspring from two parent strings,
by copying selected bits from each parent. The bit at position i in each offspring
is copied from the bit at position i in one of the two parents. The choice of which
parent contributes the bit for position i is determined by an additional string called
the crossover mask. To illustrate, consider the single-point crossover operator at
the top of Table 9.2. Consider the topmost of the two offspring in this case. This
offspring takes its first five bits from the first parent and its remaining six bits
from the second parent, because the crossover mask 11111000000 specifies these
choices for each of the bit positions. The second offspring uses the same crossover
mask, but switches the roles of the two parents. Therefore, it contains the bits that
were not used by the first offspring. In single-point crossover, the crossover mask
is always constructed so that it begins with a string containing n contiguous 1s,
followed by the necessary number of Os to complete the string. This results in
offspring in which the first n bits are contributed by one parent and the remaining
bits by the second parent. Each time the single-point crossover operator is applied,
the crossover point n is chosen at random, and the crossover mask is then created
and applied.
Initial strings          Crossover mask          Offspring

Single-point crossover:
11101001000                                      11101010101
                         11111000000
00001010101                                      00001001000

Two-point crossover:
11101001000                                      11001011000
                         00111110000
00001010101                                      00101000101

Uniform crossover:
11101001000                                      10001000100
                         10011010011
00001010101                                      01101011001

Point mutation:          11101001000  →  11101011000

TABLE 9.2
Common operators for genetic algorithms. These operators form offspring of hypotheses represented
by bit strings. The crossover operators create two descendants from two parents, using the crossover
mask to determine which parent contributes which bits. Mutation creates a single descendant from a
single parent by changing the value of a randomly chosen bit.
In two-point crossover, offspring are created by substituting intermediate
segments of one parent into the middle of the second parent string. Put another
way, the crossover mask is a string beginning with n0 zeros, followed by a con-
tiguous string of n1 ones, followed by the necessary number of zeros to complete
the string. Each time the two-point crossover operator is applied, a mask is gen-
erated by randomly choosing the integers n0 and n1. For instance, in the example
shown in Table 9.2 the offspring are created using a mask for which n0 = 2 and
n1 = 5. Again, the two offspring are created by switching the roles played by the
two parents.
Uniform crossover combines bits sampled uniformly from the two parents,
as illustrated in Table 9.2. In this case the crossover mask is generated as a random
bit string with each bit chosen at random and independent of the others.
In addition to recombination operators that produce offspring by combining
parts of two parents, a second type of operator produces offspring from a single
parent. In particular, the mutation operator produces small random changes to the
bit string by choosing a single bit at random, then changing its value. Mutation is
often performed after crossover has been applied as in our prototypical algorithm
from Table 9.1.
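The mask-based operators of Table 9.2 are straightforward to implement. The following is a minimal sketch; the representation (hypotheses and masks as Python lists of 0/1 bits) and the function names are assumptions for illustration, not from the text:

```python
import random

def crossover(parent1, parent2, mask):
    """Offspring 1 takes bit i from parent1 where mask[i] == 1, else from
    parent2; offspring 2 switches the roles of the two parents."""
    child1 = [a if m else b for a, b, m in zip(parent1, parent2, mask)]
    child2 = [b if m else a for a, b, m in zip(parent1, parent2, mask)]
    return child1, child2

def single_point_mask(length):
    # n contiguous 1s followed by 0s; n is the random crossover point
    n = random.randrange(1, length)
    return [1] * n + [0] * (length - n)

def two_point_mask(length):
    # n0 zeros, then n1 ones, then zeros to complete the string
    n0 = random.randrange(0, length)
    n1 = random.randrange(1, length - n0 + 1)
    return [0] * n0 + [1] * n1 + [0] * (length - n0 - n1)

def uniform_mask(length):
    # each mask bit chosen at random, independent of the others
    return [random.randint(0, 1) for _ in range(length)]

def point_mutation(h):
    # flip the value of one randomly chosen bit
    i = random.randrange(len(h))
    return h[:i] + [1 - h[i]] + h[i + 1:]
```

For instance, `crossover([1,1,1,0,1,0,0,1,0,0,0], [0,0,0,0,1,0,1,0,1,0,1], [1]*5 + [0]*6)` reproduces the single-point example of Table 9.2, returning the offspring 11101010101 and 00001001000.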
Some GA systems employ additional operators, especially operators that are
specialized to the particular hypothesis representation used by the system. For
example, Grefenstette et al. (1991) describe a system that learns sets of rules
for robot control. It uses mutation and crossover, together with an operator for
specializing rules. Janikow (1993) describes a system that learns sets of rules
using operators that generalize and specialize rules in a variety of directed ways
(e.g., by explicitly replacing the condition on an attribute by “don’t care”).

9.2.3 Fitness Function and Selection
The fitness function defines the criterion for ranking potential hypotheses and for
probabilistically selecting them for inclusion in the next generation population. If
the task is to learn classification rules, then the fitness function typically has a
component that scores the classification accuracy of the rule over a set of provided
training examples. Often other criteria may be included as well, such as the com-
plexity or generality of the rule. More generally, when the bit-string hypothesis is
interpreted as a complex procedure (e.g., when the bit string represents a collec-
tion of if-then rules that will be chained together to control a robotic device), the
fitness function may measure the overall performance of the resulting procedure
rather than performance of individual rules.
In our prototypical GA shown in Table 9.1, the probability that a hypothesis
will be selected is given by the ratio of its fitness to the fitness of other members
of the current population as seen in Equation (9.1). This method is sometimes
called fitness proportionate selection, or roulette wheel selection. Other methods
for using fitness to select hypotheses have also been proposed. For example, in
tournament selection, two hypotheses are first chosen at random from the current
population. With some predefined probability p the more fit of these two is then
selected, and with probability (1 — p) the less fit hypothesis is selected. Tourna-
ment selection often yields a more diverse population than fitness proportionate
selection (Goldberg and Deb 1991). In another method called rank selection, the
hypotheses in the current population are first sorted by fitness. The probability
that a hypothesis will be selected is then proportional to its rank in this sorted
list, rather than its fitness.
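As an illustration, here is a minimal Python sketch of the three selection schemes just described; the function names and the tournament parameter p = 0.75 are illustrative assumptions, not from the text:

```python
import random

def fitness_proportionate(population, fitness):
    # Roulette wheel: Pr(h_i) = Fitness(h_i) / sum_j Fitness(h_j), Eq. (9.1)
    total = sum(fitness(h) for h in population)
    return random.choices(
        population, weights=[fitness(h) / total for h in population])[0]

def tournament(population, fitness, p=0.75):
    # Pick two hypotheses at random; return the fitter one with probability p
    h1, h2 = random.sample(population, 2)
    better, worse = (h1, h2) if fitness(h1) >= fitness(h2) else (h2, h1)
    return better if random.random() < p else worse

def rank_selection(population, fitness):
    # Selection probability proportional to rank in the fitness-sorted list
    ranked = sorted(population, key=fitness)        # rank 1 = least fit
    return random.choices(ranked,
                          weights=list(range(1, len(ranked) + 1)))[0]
```

9.5 GENETIC PROGRAMMING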
Genetic programming (GP) is a form of evolutionary computation in which the in-
dividuals in the evolving population are computer programs rather than bit strings.
Koza (1992) describes the basic genetic programming approach and presents a
broad range of simple programs that can be successfully learned by GP.
9.5.1 Representing Programs
Programs manipulated by a GP are typically represented by trees correspond-
ing to the parse tree of the program. Each function call is represented by a
node in the tree, and the arguments to the function are given by its descendant
nodes. For example, Figure 9.1 illustrates this tree representation for the function
sin(x) + √(x² + y). To apply genetic programming to a particular domain, the user
must define the primitive functions to be considered (e.g., sin, cos, √, +, −, ex-
ponentials), as well as the terminals (e.g., x, y, constants such as 2). The genetic
programming algorithm then uses an evolutionary search to explore the vast space
of programs that can be described using these primitives.
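A minimal sketch of such a parse-tree representation, assuming programs are encoded as nested Python tuples of the form (function_name, arg1, arg2, ...) with strings as terminals (this encoding is my assumption, not Koza's):

```python
import math

# sin(x) + sqrt(x**2 + y), the program of Figure 9.1, as a nested tuple
program = ("+", ("sin", "x"), ("sqrt", ("+", ("**", "x", "2"), "y")))

FUNCTIONS = {"+": lambda a, b: a + b,
             "**": lambda a, b: a ** b,
             "sin": math.sin,
             "sqrt": math.sqrt}

def evaluate(node, env):
    if isinstance(node, tuple):                    # internal node: a call
        args = [evaluate(child, env) for child in node[1:]]
        return FUNCTIONS[node[0]](*args)
    if node in env:                                # terminal: a variable
        return env[node]
    return float(node)                             # terminal: a constant

print(evaluate(program, {"x": 1.0, "y": 3.0}))     # sin(1) + sqrt(1 + 3)
```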
As in a genetic algorithm, the prototypical genetic programming algorithm
maintains a population of individuals (in this case, program trees). On each it-
eration, it produces a new generation of individuals using selection, crossover,
and mutation. The fitness of a given individual program in the population is typ-
ically determined by executing the program on a set of training data. Crossover
operations are performed by replacing a randomly chosen subtree of one parent
program by a subtree from the other parent program. Figure 9.2 illustrates a typical
crossover operation.

FIGURE 9.1
Program tree representation in genetic programming. Arbitrary programs are represented by their
parse trees.

FIGURE 9.2
Crossover operation applied to two parent program trees (top). Crossover points (nodes shown in
bold at top) are chosen at random. The subtrees rooted at these crossover points are then exchanged
to create children trees (bottom).
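A minimal sketch of this subtree-exchange operation, using the nested-tuple program representation from the sketch above (the function names are assumptions for illustration):

```python
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is the sequence of
    child indices leading from the root down to that node."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def graft(tree, path, replacement):
    """Return a copy of tree with the subtree at path replaced."""
    if not path:
        return replacement
    i = path[0]
    return tree[:i] + (graft(tree[i], path[1:], replacement),) + tree[i + 1:]

def gp_crossover(parent1, parent2):
    # Choose a random crossover point (node) in each parent, then
    # exchange the subtrees rooted at those points, as in Figure 9.2.
    path1, sub1 = random.choice(list(subtrees(parent1)))
    path2, sub2 = random.choice(list(subtrees(parent2)))
    return graft(parent1, path1, sub2), graft(parent2, path2, sub1)
```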
Koza (1992) describes a set of experiments applying a GP to a number of
applications. In his experiments, 10% of the current population, selected prob-
abilistically according to fitness, is retained unchanged in the next generation.
The remainder of the new generation is created by applying crossover to pairs
of programs from the current generation, again selected probabilistically accord-
ing to their fitness. The mutation operator was not used in this particular set of
experiments.

9.5.2 Illustrative Example
One illustrative example presented by Koza (1992) involves learning an algorithm
for stacking the blocks shown in Figure 9.3. The task is to develop a general algo-
rithm for stacking the blocks into a single stack that spells the word “universal,”
independent of the initial configuration of blocks in the world. The actions avail-
able for manipulating blocks allow moving only a single block at a time. In
particular, the top block on the stack can be moved to the table surface, or a
block on the table surface can be moved to the top of the stack.

FIGURE 9.3
A block-stacking problem. The task for GP is to discover a program that can transform an arbitrary
initial configuration of blocks into a stack that spells the word “universal.” A set of 166 such initial
configurations was provided to evaluate the fitness of candidate programs (after Koza 1992).
As in most GP applications, the choice of problem representation has a
significant impact on the ease of solving the problem. In Koza’s formulation, the
primitive functions used to compose programs for this task include the following
three terminal arguments:
• CS (current stack), which refers to the name of the top block on the stack,
or F if there is no current stack.
• TB (top correct block), which refers to the name of the topmost block on
the stack, such that it and those blocks beneath it are in the correct order.
• NN (next necessary), which refers to the name of the next block needed
above TB in the stack, in order to spell the word “universal,” or F if no
more blocks are needed.
As can be seen, this particular choice of terminal arguments provides a natu-
ral representation for describing programs for manipulating blocks for this task.
Imagine, in contrast, the relative difficulty of the task if we were to instead define
the terminal arguments to be the x and y coordinates of each block.
In addition to these terminal arguments, the program language in this appli-
cation included the following primitive functions:
• (MS x) (move to stack), if block x is on the table, this operator moves x to
the top of the stack and returns the value T. Otherwise, it does nothing and
returns the value F.
• (MT x) (move to table), if block x is somewhere in the stack, this moves the
block at the top of the stack to the table and returns the value T. Otherwise,
it returns the value F.
• (EQ x y) (equal), which returns T if x equals y, and returns F otherwise.
• (NOT x), which returns T if x = F, and returns F if x = T.
• (DU x y) (do until), which executes the expression x repeatedly until ex-
pression y returns the value T.
To allow the system to evaluate the fitness of any given program, Koza
provided a set of 166 training example problems representing a broad variety of
initial block configurations, including problems of differing degrees of difficulty.
The fitness of any given program was taken to be the number of these examples
solved by the algorithm. The population was initialized to a set of 300 random
programs. After 10 generations, the system discovered the following program,
which solves all 166 problems.
(EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))
Notice this program contains a sequence of two DU, or “Do Until” state-
ments. The first repeatedly moves the current top of the stack onto the table, until
the stack becomes empty. The second “Do Until” statement then repeatedly moves
the next necessary block from the table onto the stack. The role played by the
top level EQ expression here is to provide a syntactically legal way to sequence
these two “Do Until” loops.
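The behavior of this program can be checked with a small, hypothetical Python simulation of the block world and the primitives above; the class and helper names here are mine, not Koza's:

```python
GOAL = list("universal")                  # target configuration, bottom to top

class World:
    def __init__(self, stack, table):
        self.stack = list(stack)          # blocks in the stack, bottom first
        self.table = set(table)           # blocks lying on the table

    def correct_prefix(self):
        # number of blocks at the bottom of the stack already in order
        n = 0
        while n < len(self.stack) and self.stack[n] == GOAL[n]:
            n += 1
        return n

    def CS(self):                         # current stack: top block, or F
        return self.stack[-1] if self.stack else False

    def TB(self):                         # top correct block, or F
        n = self.correct_prefix()
        return self.stack[n - 1] if n else False

    def NN(self):                         # next necessary block, or F
        n = self.correct_prefix()
        return GOAL[n] if n < len(GOAL) else False

    def MS(self, x):                      # move block x from table to stack
        if x in self.table:
            self.table.remove(x)
            self.stack.append(x)
            return True
        return False

    def MT(self, x):                      # if x is in the stack, move the
        if x in self.stack:               # top block to the table
            self.table.add(self.stack.pop())
            return True
        return False

def DU(x, y):
    # (DU x y): execute expression x repeatedly until y returns T
    while not y():
        x()
    return True

def discovered_program(w):
    # (EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))
    a = DU(lambda: w.MT(w.CS()), lambda: not w.CS())   # empty the stack
    b = DU(lambda: w.MS(w.NN()), lambda: not w.NN())   # rebuild it in order
    return a == b                         # EQ merely sequences the two loops

w = World(stack="uni", table="versal")
discovered_program(w)
print("".join(w.stack))                   # -> universal
```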
Somewhat surprisingly, after only a few generations, this GP was able to
discover a program that solves all 166 training problems. Of course the ability
of the system to accomplish this depends strongly on the primitive arguments
and functions provided, and on the set of training example cases used to evaluate
fitness.

9.6 MODELS OF EVOLUTION AND LEARNING
In many natural systems, individual organisms learn to adapt significantly during
their lifetime. At the same time, biological and social processes allow their species
to adapt over a time frame of many generations. One interesting question regarding
evolutionary systems is “What is the relationship between learning during the
lifetime of a single individual, and the longer time frame species-level learning
afforded by evolution?”
9.6.1 Lamarckian Evolution
Lamarck was a scientist who, in the early nineteenth century, proposed that evo-
lution over many generations was directly influenced by the experiences of indi-
vidual organisms during their lifetime. In particular, he proposed that experiences
of a single organism directly affected the genetic makeup of its offspring: If
an individual learned during its lifetime to avoid some toxic food, it could pass
this trait on genetically to its offspring, which therefore would not need to learn
the trait. This is an attractive conjecture, because it would presumably allow for
more efficient evolutionary progress than a generate-and-test process (like that of
GAs and GPs) that ignores the experience gained during an individual's lifetime.
Despite the attractiveness of this theory, current scientific evidence overwhelm-
ingly contradicts Lamarck’s model. The currently accepted view is that the genetic
makeup of an individual is, in fact, unaffected by the lifetime experience of one’s
biological parents. Despite this apparent biological fact, recent computer studies
have shown that Lamarckian processes can sometimes improve the effectiveness
of computerized genetic algorithms (see Grefenstette 1991; Ackley and Littman
1994; and Hart and Belew 1995).

9.6.2 Baldwin Effect
Although Lamarckian evolution is not an accepted model of biological evolution,
other mechanisms have been suggested by which individual learning can alter
the course of evolution. One such mechanism is called the Baldwin effect, after
J. M. Baldwin (1896), who first suggested the idea. The Baldwin effect is based
on the following observations:
• If a species is evolving in a changing environment, there will be evolution-
ary pressure to favor individuals with the capability to learn during their
lifetime. For example, if a new predator appears in the environment, then
individuals capable of learning to avoid the predator will be more successful
than individuals who cannot learn. In effect, the ability to learn allows an
individual to perform a small local search during its lifetime to maximize its
fitness. In contrast, nonlearning individuals whose fitness is fully determined
by their genetic makeup will operate at a relative disadvantage.
• Those individuals who are able to learn many traits will rely less strongly
on their genetic code to “hard-wire” traits. As a result, these individuals
can support a more diverse gene pool, relying on individual learning to
overcome the “missing” or “not quite optimized” traits in the genetic code.
This more diverse gene pool can, in turn, support more rapid evolutionary
adaptation. Thus, the ability of individuals to learn can have an indirect
accelerating effect on the rate of evolutionary adaptation for the entire pop-
ulation.
To illustrate, imagine some new change in the environment of some species,
such as a new predator. Such a change will selectively favor individuals capa-
ble of learning to avoid the predator. As the proportion of such self-improving
individuals in the population grows, the population will be able to support a
more diverse gene pool, allowing evolutionary processes (even non-Lamarckian
generate-and-test processes) to adapt more rapidly. This accelerated adaptation
may in turn enable standard evolutionary processes to more quickly evolve a
genetic (nonlearned) trait to avoid the predator (e.g., an instinctive fear of this
animal). Thus, the Baldwin effect provides an indirect mechanism for individ-
ual learning to positively impact the rate of evolutionary progress. By increas-
ing survivability and genetic diversity of the species, individual learning sup-
ports more rapid evolutionary progress, thereby increasing the chance that the
species will evolve genetic, nonlearned traits that better fit the new environ-
ment.
There have been several attempts to develop computational models to study
the Baldwin effect. For example, Hinton and Nowlan (1987) experimented with
evolving a population of simple neural networks, in which some network weights
were fixed during the individual network “lifetime,” while others were trainable.
The genetic makeup of the individual determined which weights were train-
able and which were fixed. In their experiments, when no individual learning
was allowed, the population failed to improve its fitness over time; allowing
individual learning substantially accelerated evolutionary adaptation, consis-
tent with the Baldwin effect.

• Genetic programming is a variant of genetic algorithms in which the hy-
potheses being manipulated are computer programs rather than bit strings.
Operations such as crossover and mutation are generalized to apply to pro-
grams rather than bit strings. Genetic programming has been demonstrated
to learn programs for tasks such as simulated robot control (Koza 1992) and
recognizing objects in visual scenes (Teller and Veloso 1994).