Natural Computing
Lecture 9
Michael Herrmann
[email protected]
phone: 0131 6 517177
Informatics Forum 1.42
18/10/2011
Genetic Programming: Examples and Theory
see: http://www.genetic-programming.org, http://www.geneticprogramming.us
Example 1: Learning to Plan using GP
Aim:
To find a program that transforms any initial configuration of blocks into a stack spelling UNIVERSAL
Genetic Programming: Learning to Plan
Terminals:
CS: returns the top block of the current stack
TB: returns the highest correct block in the stack (or NIL)
NN: returns the next needed block, i.e. the one above TB in the goal
Functions:
MS(x): move block x from the table to the current stack; returns T if it does something, else NIL
MT(x): move block x to the table
DU(exp1, exp2): do exp1 until exp2 becomes TRUE
NOT(exp1): logical not (or: exp1 is not executable)
EQ(exp1, exp2): test for equality
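To make these primitives concrete, here is a minimal Python sketch (an illustrative re-implementation, not Koza's original code; the list-based state, the iteration bound in DU and the plan() wrapper are assumptions of this sketch):

    GOAL = list("UNIVERSAL")

    class World:
        def __init__(self, stack, table):
            self.stack = list(stack)          # bottom ... top
            self.table = set(table)           # blocks lying on the table
        def _correct(self):                   # count correct blocks from the bottom
            n = 0
            while (n < len(self.stack) and n < len(GOAL)
                   and self.stack[n] == GOAL[n]):
                n += 1
            return n
        def CS(self):                         # top block of the current stack (or None)
            return self.stack[-1] if self.stack else None
        def TB(self):                         # highest correct block (or None)
            n = self._correct()
            return self.stack[n - 1] if n > 0 else None
        def NN(self):                         # next needed block: the one above TB in the goal
            n = self._correct()
            return GOAL[n] if n < len(GOAL) else None
        def MS(self, x):                      # table -> stack; True if it did something
            if x in self.table:
                self.table.remove(x)
                self.stack.append(x)
                return True
            return False
        def MT(self, x):                      # top of stack -> table
            if self.stack and self.stack[-1] == x:
                self.table.add(self.stack.pop())
                return True
            return False

    def DU(body, test, limit=1000):           # do body until test is true (bounded for safety)
        while not test() and limit > 0:
            body()
            limit -= 1
        return test()

    # The generation-10 program from the results below:
    # (EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))
    def plan(w):
        a = DU(lambda: w.MT(w.CS()), lambda: w.CS() is None)   # unstack everything
        b = DU(lambda: w.MS(w.NN()), lambda: w.NN() is None)   # rebuild in goal order
        return a == b                                          # EQ

    w = World(stack="UVI", table="NERSAL")
    plan(w)
    print("".join(w.stack))                   # -> UNIVERSAL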
Learning to Plan: Results
Generation 0: (EQ (MT CS) NN) — solves 0 fitness cases
Generation 5: (DU (MS NN) (NOT NN)) — solves 10 fitness cases
Generation 10: (EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN))) — solves 166 fitness cases
Population size: 500
Koza shows how to amend the fitness function to obtain efficient, small programs: the combined fitness measure rewards
correctness (number of solved fitness cases)
AND efficiency (moving as few blocks as possible)
AND a small number of tree nodes (parsimony: number of symbols in the string)
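A hedged sketch of such a combined measure (the weights of the three terms are assumptions chosen for illustration, not Koza's published values):

    def combined_fitness(solved, total, moves, tree_size,
                         w_eff=0.1, w_pars=0.01):
        correctness = solved / total                 # fraction of solved fitness cases
        efficiency = w_eff * moves / total           # penalise unnecessary block moves
        parsimony = w_pars * tree_size               # penalise large trees
        return correctness - efficiency - parsimony

Correctness dominates; the efficiency and parsimony penalties mainly break ties between equally correct programs.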
Automatically Defined Functions
Efficient code: loops, subroutines, functions, classes, or ... variables
Automatically defined iterations (ADIs), automatically defined loops (ADLs) and automatically defined recursions (ADRs) provide means to re-use code. (Koza)
Automatically defined stores (ADSs) provide means to re-use the result of executing code.
Solution: function-defining branches (i.e., ADFs) and result-producing branches (the RPB)
e.g. RPB: ADF(ADF(ADF(x))), where ADF: arg0 * arg0
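Reading the slide's ADF as the product (* arg0 arg0), the re-use amounts to repeated squaring; a minimal sketch:

    def ADF(arg0):                 # function-defining branch: (* arg0 arg0)
        return arg0 * arg0

    def RPB(x):                    # result-producing branch re-uses ADF three times
        return ADF(ADF(ADF(x)))    # ((x^2)^2)^2 = x^8

    print(RPB(2))                  # -> 256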
Example 2: The Santa Fe Trail
Objective: To evolve a program which eats all the food on a trail
without searching too much when there are gaps in the trail.
The ant's sensor can see only the next cell in the direction it is facing
Terminals: move, (turn) left, (turn) right
Functions: if-food-ahead, progn2, progn3 (unconditional
connectives: evaluate 2 or 3 arguments in the given order)
Program with high fitness:
(if-food-ahead move
    (progn3 left
            (progn2 (if-food-ahead move right)
                    (progn2 right (progn2 left right)))
            (progn2 (if-food-ahead move left) move)))
Fitness: e.g. the amount of food collected in 400 time steps
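As a sketch, the evolved individual above can be transcribed into a straight-line interpreter; the toroidal 32x32 grid, per-action time cost and toy trail are assumptions in the spirit of the standard benchmark, not the actual Santa Fe trail:

    HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]     # E, S, W, N on a torus

    class Ant:
        def __init__(self, food, size=32, steps=400):
            self.food = set(food)                      # cells containing food
            self.size, self.steps = size, steps
            self.pos, self.heading, self.eaten = (0, 0), 0, 0
        def ahead(self):                               # cell in front of the ant
            dr, dc = HEADINGS[self.heading]
            return ((self.pos[0] + dr) % self.size, (self.pos[1] + dc) % self.size)
        def food_ahead(self):
            return self.ahead() in self.food
        def move(self):
            self.steps -= 1
            self.pos = self.ahead()
            if self.pos in self.food:
                self.food.discard(self.pos)
                self.eaten += 1
        def left(self):
            self.steps -= 1
            self.heading = (self.heading - 1) % 4
        def right(self):
            self.steps -= 1
            self.heading = (self.heading + 1) % 4

    def run(ant):
        while ant.steps > 0 and ant.food:              # re-run the program until time is up
            if ant.food_ahead():                       # (if-food-ahead move (progn3 ...))
                ant.move()
            else:
                ant.left()
                if ant.food_ahead():                   # (if-food-ahead move right)
                    ant.move()
                else:
                    ant.right()
                ant.right(); ant.left(); ant.right()   # (progn2 right (progn2 left right))
                if ant.food_ahead():                   # (if-food-ahead move left)
                    ant.move()
                else:
                    ant.left()
                ant.move()
        return ant.eaten                               # fitness: food eaten within the budget

    print(run(Ant(food=[(0, 1), (0, 2), (1, 2), (2, 2)])))   # toy trail -> 4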
Genetic programming: A practical example
[Photo: Selena von Eichendorf]
Evolving Structures
Example: design of electronic circuits by composing components:
Non-terminals: e.g. frequency multiplier, integrator, rectifier, resistors, wiring ...
Terminals: input and output, pulse waves, noise generator
Structure usually not tree-like: meaningful substructures (boxes or subtrees) for crossover and structural mutations
Fitness by desired input-output relation (e.g. by wide-band frequency response)
Initialisation
The initial population is typically lost quickly, but its general features may determine the eventual solutions
Assume the functions and terminals are sufficient
Consider structural properties of the expected solution (uniformity, symmetry, depth, ...)
Practical: start at the root and choose k = 0, ..., K with probability p(k); choose a non-terminal with k > 0 arguments, or a terminal for k = 0. While k > 0, repeat until no non-terminals are left or the maximal depth is reached (then force k = 0); see the sketch below
Lagrange initialisation: crossover can be shown to produce programs with a typical distribution (a Lagrange distribution of the second kind), which can also be used for initialisation
Seeding: start with many copies of good candidates
Riccardo Poli, William B Langdon, Nicholas F. McPhee (2008) A Field Guide to Genetic Programming.
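A minimal sketch of the depth-limited random-tree initialisation described above, assuming an illustrative primitive set and arity distribution p(k):

    import random

    FUNCTIONS = {"+": 2, "*": 2, "neg": 1}            # non-terminals with arity k > 0
    TERMINALS = ["x", "y", "1"]                        # arity k = 0
    ARITIES, P_K = [0, 1, 2], [0.4, 0.2, 0.4]          # assumed distribution p(k)

    def random_tree(depth, max_depth):
        if depth >= max_depth:                         # at maximal depth, force k = 0
            return random.choice(TERMINALS)
        k = random.choices(ARITIES, weights=P_K)[0]
        if k == 0:
            return random.choice(TERMINALS)            # a terminal closes the branch
        f = random.choice([g for g, a in FUNCTIONS.items() if a == k])
        return [f] + [random_tree(depth + 1, max_depth) for _ in range(k)]

    print(random_tree(0, max_depth=4))                 # e.g. ['+', 'x', ['neg', '1']]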
Genetic Programming: General Points
Sufficiency of the representation: appropriate choice of non-terminals
Variables: terminals (variables) implied by the problem
Is there a bug in the code? Closure: typed algorithms, grammar-based encoding
Program structure: terminals also for auxiliary variables or pointers to (automatically defined) functions
There are no silver bullets: expect multiple runs (each with a population of solutions)
Local search: terminals (numbers) can often be found by hill-climbing (see the sketch below)
Can you trust your results? Fitness: from fitness cases using cross-validation (e.g. for symbolic regression)
Tree-related operators: shrink, hoist, grow (in addition to standard mutation and crossover)
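For the local-search point above, a sketch of hill-climbing the numeric terminals of a fixed expression (the toy model a*x + b and the Gaussian step size are assumptions):

    import random

    def hillclimb(err, theta, sigma=0.1, iters=2000):
        best = err(theta)
        for _ in range(iters):
            cand = [t + random.gauss(0, sigma) for t in theta]   # perturb the constants
            e = err(cand)
            if e < best:                                          # keep only improvements
                theta, best = cand, e
        return theta

    data = [(x, 3.0 * x + 1.0) for x in range(10)]                # target: a = 3, b = 1
    sse = lambda th: sum((th[0] * x + th[1] - y) ** 2 for x, y in data)
    print(hillclimb(sse, [0.0, 0.0]))                             # approaches [3.0, 1.0]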
Genetic programming: Troubleshooting
Study your populations: analyse means and variances of fitness, depth, size, code used, run time, ... and correlations among these
Runs can be very long: checkpoint results (e.g. mean fitness)
Control bloat in order to obtain small, efficient programs: size limitations, e.g. soft thresholds, prevent unreasonable growth of programs
Control parameters during run-time
Small changes can have big effects
Big changes can have no effect
Encourage diversity and save good candidates
Embrace approximation: no program is error-free
GP: Application Areas
Problem areas involving many variables that are interrelated in a non-linear or unknown way (e.g. predicting electricity demand)
A good approximate solution is satisfactory: design, control (e.g. in simulations), classification and pattern recognition, data mining, system identification and forecasting
Discovery of the size and shape of the solution is a major part of the problem
Areas where humans find it difficult to write programs: parallel computers, cellular automata, multi-agent strategies, distributed AI, FPGAs
"Black art" problems: synthesis of topology and sizing of analog circuits, synthesis of topology and tuning of controllers, quantum computing circuits
Areas where you simply have no idea how to program a solution, but where the objective (fitness measure) is clear (e.g. generation of financial trading rules)
Areas where large computerised databases are accumulating and computerised techniques are needed to analyse the data
Genetic programming: Theory
Schema theorem (sub-tree at a particular position):
worst case (Koza 1992)
exact for one-point crossover (Poli 2000)
for many types of crossover (Poli et al., 2003)
Markov chain theory
[Figure: program trees composed of NAND gates; two functions are equivalent if they coincide after a permutation of inputs.]
Distribution of fitness in search space (see figure): as the length of programs increases, the proportion of programs implementing a given function approaches a limit
Halting probability: for programs of length L it is of order 1/√L, while the expected number of instructions executed by halting programs is of order √L.
Genetic programming: Bloat
Bloat is an increase in program size that is not accompanied by any corresponding increase in fitness. Problem: the optimal solution might still be a large program.
[Figure from: Genetic Programming by Riccardo Poli]
Theories (none of these is universally accepted) focus on:
replication accuracy theory
inactive code
nature of program search-spaces theory
crossover bias (one-step mean constant, but Lagrange variance)
Size-evolution equation (similar to the exact schema theorem)
Practical solutions: size and depth limits, parsimony pressure (fitness reduced by size: f(i) - c·l(i))
Genetic programming: Bloat Control
Constant: constant target size of 150.
Sin: target size sin((generation + 1)/50) × 50 + 150.
Linear: target size 150 + generation.
Limited: no size control until the size reaches 250, then a hard limit is applied.
Local: adaptive target size via c = Cov(l, f)/Var(l): allows a certain amount of drift, avoids runaway bloat.
Figure: average program size over 500 generations for multiple runs of the 6-MUX problem [decode a 2-bit address and return the value from the corresponding register] with various forms of parsimony pressure.
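A sketch of the "Local" variant: estimate the covariant parsimony coefficient c = Cov(l, f)/Var(l) from the current population and subtract c·l(i) from raw fitness (the population values below are made up for illustration):

    def covariant_parsimony_c(sizes, fits):
        n = len(sizes)
        ml, mf = sum(sizes) / n, sum(fits) / n
        cov = sum((l - ml) * (f - mf) for l, f in zip(sizes, fits)) / n
        var = sum((l - ml) ** 2 for l in sizes) / n
        return cov / var if var > 0 else 0.0          # c = Cov(l, f) / Var(l)

    sizes, fits = [12, 40, 7, 25], [0.60, 0.70, 0.50, 0.65]
    c = covariant_parsimony_c(sizes, fits)
    adjusted = [f - c * l for f, l in zip(fits, sizes)]   # f(i) - c * l(i)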
Riccardo Poli, William B Langdon, Nicholas F. McPhee (2008) A Field Guide to Genetic Programming.
Genetic programming: Theory
GP applied to the one-then-zeros problem: independently of the tree structure, fitness is maximal if all nodes have an identical symbol. Expected to bloat, but doesn't. Why?
E. Crane and N. McPhee: The Effects of Size and Depth Limits on Tree Based Genetic Programming.
Genetic programming: Control parameters
Representation and fitness function
Population size (thousands or millions of individuals)
Probabilities of applying genetic operators:
reproduction (unmodified): 0.08
crossover: 0.9
mutation: 0.01
architecture-altering operations: 0.01
Limits on the size, depth and run time of the programs
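The slide's values as an illustrative configuration (the depth and generation limits are assumptions added for completeness, not from the slide):

    GP_PARAMS = {
        "population_size": 500_000,              # thousands or millions of individuals
        "p_reproduction": 0.08,                  # copied unmodified
        "p_crossover": 0.90,
        "p_mutation": 0.01,
        "p_architecture_altering": 0.01,         # operator probabilities sum to 1.0
        "max_tree_depth": 17,                    # assumed limit (a common default)
        "max_generations": 50,                   # assumed
    }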
Exact Schema Theory
Following: Genetic Programming by Riccardo Poli (University of Essex)
Exact schema theoretic models of GP have become available only recently (first proof for a simplified case: Poli 2001)
For a given schema H, the selection/crossover/mutation process can be seen as a Bernoulli trial, because a newly created individual either samples or does not sample H
Therefore, the number of individuals sampling H at the next generation, m(H, t+1), is a binomially distributed stochastic variable
So, if we denote by α(H, t) the success probability of each trial (i.e. the probability that a newly created individual samples H), an exact schema theorem is simply E[m(H, t+1)] = M α(H, t), where M is the population size and E[·] is the mathematical expectation.
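In symbols, restating the Bernoulli-trial argument above:

    m(H, t+1) \sim \mathrm{Binomial}\big(M,\ \alpha(H,t)\big)
    \quad\Rightarrow\quad
    \mathbb{E}[m(H, t+1)] = M\,\alpha(H,t)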
Exact Schema Theory
The variable-size tree structure does not permit the same definition of a schema as in GAs
A schema is a (sub-)tree with some don't-care nodes (=)
A don't-care node = represents any primitive function (or a terminal)
E.g. H = (= x (+ y =)) represents the programs
{(+ x (+ y x)), (+ x (+ y y)), (* x (+ y x)), ...}
(prefix notation; = can stand for a terminal or a non-terminal)
Exact Schema Theory
Assume: only reproduction and one-offspring crossover are performed (no mutation)
α(H, t), the success probability, can be calculated because the two operators are mutually exclusive:
α(H, t) = Pr[an individual in H is obtained via reproduction]
        + Pr[an offspring matching H is produced by crossover]
Reproduction is performed with probability pr and crossover with probability pc (pr + pc = 1), so
α(H, t) = pr · Pr[an individual in H is selected for cloning]
        + pc · Pr[parents and crossover points are such that the offspring matches H]
where Pr[an individual in H is selected for cloning] = p(H, t)
Exact Schema Theory
Pr[parents and crossover points are such that the offspring matches H]
= Σ (over all pairs of parent shapes k, l) Σ (over all crossover points i, j in shapes k, l)
  Pr[choosing crossover points i and j in shapes k and l]
  × Pr[selecting parents with shapes k and l such that, crossed over at points i and j, they produce an offspring in H]
Exact Schema Theory
Crossover excises a subtree rooted at the chosen crossover point in one parent and replaces it with a subtree excised at the chosen crossover point in the other parent.
This means that the offspring will have the right shape and primitives to match the schema of interest if and only if, after the excision of the chosen subtree, the first parent has shape and primitives compatible with the schema, and the subtree to be inserted has shape and primitives compatible with the schema.
Assume that crossover points are selected with uniform probability:
Pr[choosing crossover points i and j in shapes k and l]
= 1 / (number of nodes in shape k) × 1 / (number of nodes in shape l)
Exact Schema Theory
Pr[selecting parents with shapes k and l such that, crossed over at points i and j, they produce an offspring in H]
= Pr[selecting a root-donating parent with shape k such that its upper part w.r.t. crossover point i matches the upper part of H w.r.t. j]
× Pr[selecting a subtree-donating parent with shape l such that its lower part w.r.t. crossover point j matches the lower part of H w.r.t. i]
These two selection probabilities can be calculated exactly, but this
requires a bit more work .... cf. R. Poli and N. F. McPhee (2003)
General schema theory for GP with subtree swapping crossover:
Parts I&II. Evolutionary Computation 11 (1&2).
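Assembling the factors from the last four slides gives, in informal notation (my own consolidation, not a formula from the slides; N_k denotes the number of nodes in shape k):

    \alpha(H,t) = p_r\,p(H,t)
      + p_c \sum_{k,l} \sum_{i \in k} \sum_{j \in l} \frac{1}{N_k N_l}\,
        \Pr[\text{root-donor of shape } k \text{ matches the upper part of } H]\,
        \Pr[\text{subtree-donor of shape } l \text{ matches the lower part of } H]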
Conclusions on GP
In order to be successful, GP algorithms need well-structured problems and lots of computing power
GP has proven very successful in many applications; see the lists of success stories in Poli's talk, in Koza's tutorial and in "GA in the news" (many of these were actually GPs)
GP provides an interesting view on the art of programming
Exact schema theoretic models of GP have started shedding some light on fundamental questions regarding how and why GP works, and have also started providing useful recipes for practitioners.
Next time: Ant Colony Optimisation (ACO)