Genetic Algorithm for Variable Selection
Jennifer Pittman ISDS Duke University
Genetic Algorithms Step by Step
Example: Protein Signature Selection in Mass Spectrometry
[Figure: relative intensity vs. molecular weight. Source: http://www.uni-mainz.de/~frosc/fbg_po3.html]
Genetic Algorithm (Holland)
* heuristic method based on "survival of the fittest"
* useful when search space very large or too complex for analytic treatment
* in each iteration (generation) possible solutions or individuals represented as strings of numbers
* all individuals in population evaluated by fitness function
* individuals allowed to reproduce (selection), crossover, mutate
[Figure: example individual 3021 3058 3240 with its bit-string encoding. Source: http://www.spectroscopynow.com]
Flowchart of GA
[Figure source: http://ib-poland.virtualave.net/ee/genetic1/3geneticalgorithms.htm]
(a simplified example)
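The flowchart steps (initialize, evaluate, select, cross over, mutate, repeat) can be sketched as a short loop. This is a minimal illustration, not the method used in the slides: the function name `run_ga`, the fitness-proportional sampling via `random.choices`, and the toy fitness (count of 1 bits) are assumptions made for the example. The defaults for population size, string length, and operator probabilities follow the worked example in these slides.

```python
import random

def run_ga(fitness, n_bits=24, pop_size=12, generations=30,
           pc=0.6, pm=0.05, seed=0):
    """Minimal generational GA on bit strings (lists of 0/1).

    Hypothetical helper for illustration; pop_size must be even.
    """
    rng = random.Random(seed)
    # Initialization: random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Evaluation.
        scores = [fitness(ind) for ind in pop]
        # Selection: sample parents with probability proportional to fitness.
        # A tiny constant keeps the weights valid if every score is zero.
        weights = [s + 1e-9 for s in scores]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Crossover: one-point, applied to each pair with probability pc.
        nxt = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < pc:
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            nxt += [a, b]
        # Mutation: flip each bit independently with probability pm.
        pop = [[bit ^ (rng.random() < pm) for bit in ind] for ind in nxt]
        # Remember the best individual seen so far.
        best = max(pop + [best], key=fitness)
    return best

# Toy run: fitness is the number of 1 bits in the string.
best = run_ga(sum)
print(sum(best), "ones out of", len(best))
```

A fixed seed makes the run repeatable; a real application would replace `sum` with a fitness function that decodes the bit string and scores the resulting signature.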
Initialization
* proteins corresponding to 256 mass spectrometry values from 3000-3255 m/z
* assume optimal signature contains 3 peptides represented by their m/z values in binary encoding
* population size ~ M = L/2, where L is signature length
[Figure: Initial Population of bit strings, M = 12, L = 24]
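The sizes on this slide follow from the encoding: 256 candidate m/z values need 8 bits each, so a 3-peptide signature gives L = 24 and M = L/2 = 12. A minimal sketch of drawing such an initial population, assuming uniformly random bits (the slides do not say how the initial strings were drawn, and `init_population` is a hypothetical name):

```python
import random

N_PEPTIDES = 3                     # signature size assumed on the slide
BITS_PER_VALUE = 8                 # 256 candidate m/z values -> 8 bits each
L = N_PEPTIDES * BITS_PER_VALUE    # chromosome length: 24
M = L // 2                         # population size: 12

def init_population(m=M, length=L, seed=None):
    """Return m random bit strings (lists of 0/1) of the given length."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(m)]

population = init_population(seed=1)
print(len(population), len(population[0]))  # 12 24
```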
Searching
* search space defined by all possible encodings of solutions
* selection, crossover, and mutation perform "pseudo-random" walk through search space
* operations are non-deterministic yet directed
[Figure: Phenotype Distribution. Source: http://www.ifs.tuwien.ac.at/~aschatt/info/ga/genetic.html]
Evaluation and Selection
* evaluate fitness of each solution in current population (e.g., ability to classify/discriminate) [involves genotype-phenotype decoding]
* selection of individuals for survival based on probabilistic function of fitness
* on average mean fitness of individuals increases
* may include elitist step to ensure survival of fittest individual
[Figure: Roulette Wheel Selection. Source: http://www.softchitech.com/ec_intro_html]
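Roulette wheel selection gives each individual a slice of the wheel proportional to its fitness and spins once per survivor. The sketch below is one common way to write it, with an optional elitist step that copies the fittest individual first. The function name, its parameters, and the requirement that fitness values be non-negative are assumptions made for this example.

```python
import bisect
import random
from itertools import accumulate

def roulette_select(population, fitnesses, k, rng=random, elitist=True):
    """Pick k survivors with probability proportional to fitness.

    Assumes non-negative fitness values with a positive total.
    """
    cumulative = list(accumulate(fitnesses))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("roulette selection needs a positive total fitness")
    survivors = []
    if elitist and k > 0:
        # Elitist step: the fittest individual always survives.
        best = max(range(len(population)), key=fitnesses.__getitem__)
        survivors.append(population[best])
    while len(survivors) < k:
        spin = rng.random() * total
        # Index of the first slice whose cumulative fitness exceeds the spin.
        idx = min(bisect.bisect_right(cumulative, spin), len(population) - 1)
        survivors.append(population[idx])
    return survivors

# Usage with four labelled individuals and illustrative fitness values.
names = ["A", "B", "C", "D"]
fits = [0.67, 0.23, 0.45, 0.94]
print(roulette_select(names, fits, k=4, rng=random.Random(0)))
```

Because sampling is with replacement, a fit individual can appear several times among the survivors, which is how above-average solutions gain copies in the next generation.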
Crossover
* combine two individuals to create new individuals for possible inclusion in next generation
* main operator for local search (looking close to existing solutions)
* perform each crossover with probability pc {0.5, ..., 0.8}
* crossover points selected at random
* individuals not crossed carried over in population
[Figure: Initial Strings and Offspring for Single-Point, Two-Point, and Uniform crossover]
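The three crossover variants named on this slide differ only in which positions the parents exchange. A minimal sketch on lists of bits follows; the function names and the 0.5 swap probability in the uniform variant are assumptions made for illustration.

```python
import random

def single_point(a, b, rng=random):
    """Swap everything after one random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def two_point(a, b, rng=random):
    """Swap the segment between two random cut points (needs len >= 3)."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def uniform(a, b, rng=random):
    """Swap each position independently with probability 0.5."""
    mask = [rng.random() < 0.5 for _ in a]
    c1 = [y if m else x for x, y, m in zip(a, b, mask)]
    c2 = [x if m else y for x, y, m in zip(a, b, mask)]
    return c1, c2

def crossover(a, b, pc=0.6, op=single_point, rng=random):
    """Apply op with probability pc; otherwise carry the parents over."""
    if rng.random() < pc:
        return op(a, b, rng)
    return a[:], b[:]

# Usage: all-zero and all-one parents make the exchanged positions visible.
zeros, ones = [0] * 8, [1] * 8
for op in (single_point, two_point, uniform):
    print(op.__name__, op(zeros, ones, random.Random(3)))
```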
Mutation
* each component of every individual is modified with probability pm
* main operator for global search (looking at new areas of the search space)
* pm usually small {0.001, ..., 0.01}; rule of thumb = 1/no. of bits in chromosome
* individuals not mutated carried over in population
[Figure source: http://www.softchitech.com/ec_intro_html]
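Bit-flip mutation with the rule-of-thumb rate can be sketched in a few lines. The function name `mutate` is an assumption; leaving `pm` unset applies the rule of thumb above, 1/(number of bits), which flips about one bit per individual on average.

```python
import random

def mutate(individual, pm=None, rng=random):
    """Flip each bit independently with probability pm.

    If pm is None, use the rule of thumb pm = 1 / (number of bits).
    """
    if pm is None:
        pm = 1.0 / len(individual)
    return [1 - bit if rng.random() < pm else bit for bit in individual]

# Usage: mutate a 24-bit individual and count the flipped positions.
original = [0] * 24
mutated = mutate(original, rng=random.Random(7))
print(sum(mutated), "bit(s) flipped")
```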
[Figure: one GA generation on four individuals, each shown as phenotype (m/z values), genotype (bit string), and fitness, passing through selection, one-point crossover (p = 0.6), and mutation (p = 0.05); bit strings omitted]
starting generation
phenotype (m/z)      fitness
3021  3058  3240     0.67
3017  3059  3165     0.23
3036  3185  3120     0.45
3197  3088  3106     0.94

next generation
phenotype (m/z)      fitness
3021  3049  3122     0.80
3166  3184  3240     0.77
3197  3120  3106     0.42
3213  3088  3042     0.98

(genotype bit strings omitted)
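The genotype-phenotype mapping in this example can be sketched as follows. It assumes each m/z value is stored as its offset from 3000 in plain 8-bit binary, most significant bit first. The slides say only "binary encoding", so the exact bit layout and the names `encode` and `decode` are assumptions.

```python
MZ_MIN = 3000   # lowest candidate m/z value
BITS = 8        # 256 candidate values -> 8 bits per value

def encode(mz_values):
    """Phenotype (tuple of m/z values) -> genotype (bit string)."""
    for mz in mz_values:
        if not MZ_MIN <= mz < MZ_MIN + 2 ** BITS:
            raise ValueError(f"m/z value {mz} outside encodable range")
    return "".join(format(mz - MZ_MIN, f"0{BITS}b") for mz in mz_values)

def decode(bits):
    """Genotype (bit string) -> phenotype (tuple of m/z values)."""
    return tuple(MZ_MIN + int(bits[i:i + BITS], 2)
                 for i in range(0, len(bits), BITS))

genotype = encode((3021, 3058, 3240))
print(genotype)          # 000101010011101011110000
print(decode(genotype))  # (3021, 3058, 3240)
```

Under this layout the genetic operators act on the 24-character string, and the fitness function sees only the decoded m/z triple.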
GA Evolution
[Figure: Accuracy in Percent vs. Generations. Source: http://www.sdsc.edu/skidl/projects/bio-SKIDL/]
[Figure: genetic algorithm learning, Fitness criteria vs. Generations. Source: http://www.demon.co.uk/apl385/apl96/skom.htm]
[Figure: Fitness value (scaled) vs. iteration]
References
* Holland, J. (1992), Adaptation in natural and artificial systems, 2nd Ed. Cambridge: MIT Press.
* Davis, L. (Ed.) (1991), Handbook of genetic algorithms. New York: Van Nostrand Reinhold.
* Goldberg, D. (1989), Genetic algorithms in search, optimization and machine learning. Addison-Wesley.
* Fogel, D. (1995), Evolutionary computation: Towards a new philosophy of machine intelligence. Piscataway: IEEE Press.
* Bäck, T., Hammel, U., and Schwefel, H. (1997), "Evolutionary computation: Comments on the history and the current state", IEEE Trans. On Evol. Comp. 1, (1)
Online Resources
* http://www.spectroscopynow.com
* http://www.cs.bris.ac.uk/~colin/evollect1/evollect0/index.htm
* IlliGAL (http://www-illigal.ge.uiuc.edu/index.php3)
* GAlib (http://lancet.mit.edu/ga/)
Schema and GAs
* a schema is template representing set of bit strings
  [example schema with * wildcards and its matching bit strings omitted]
* every schema s has an estimated average fitness f(s):
  $E_{t+1} \ge k \, [f(s)/f(\mathrm{pop})] \, E_t$
* schema s receives exponentially increasing or decreasing numbers depending upon ratio f(s)/f(pop)
* above average schemas tend to spread through population while below average schema disappear (simultaneously for all schema: "implicit parallelism")
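A small sketch can make the schema idea concrete. It checks whether a bit string matches a schema written with `*` as a wildcard, estimates f(s) as the average fitness of the matching individuals, and iterates the growth relation as the recurrence E_{t+1} = k [f(s)/f(pop)] E_t. The toy population, the fitness (count of 1 bits), the function names, and holding k and the fitness ratio fixed across generations are all assumptions for illustration.

```python
def matches(schema, bits):
    """True if bits fits the schema; '*' matches either bit value."""
    return len(schema) == len(bits) and all(
        s in ("*", b) for s, b in zip(schema, bits))

def schema_fitness(schema, population, fitness):
    """Average fitness of the individuals matching the schema."""
    members = [ind for ind in population if matches(schema, ind)]
    return sum(map(fitness, members)) / len(members) if members else 0.0

def expected_count(e_t, f_s, f_pop, k=1.0):
    """One step of the recurrence E_{t+1} = k * (f_s / f_pop) * E_t."""
    return k * (f_s / f_pop) * e_t

# Toy population; fitness = number of 1 bits (hypothetical).
population = ["1101", "1000", "0111", "1011"]
fitness = lambda ind: ind.count("1")

f_s = schema_fitness("1**1", population, fitness)            # 1101 and 1011 match
f_pop = sum(map(fitness, population)) / len(population)

count = 2.0  # two individuals currently match the schema
for generation in range(1, 6):
    count = expected_count(count, f_s, f_pop)
    print(generation, round(count, 3))
```

With a ratio above 1 the expected count grows geometrically from one generation to the next; a ratio below 1 makes it shrink at the same kind of rate.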
[Figure: MALDI-TOF MS. Source: www.protagen.de/pics/main/maldi2.html]