langdon jet 25 nov 2004h

Uploaded from authorPOINTLite
Views:
 
Category: Education
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Repeated Sequences in Genetic Programming: 

Repeated Sequences in Genetic Programming W. B. Langdon Computer Science

Introduction: 

Introduction Langdon + Banzhaf in Memorial University, Canada Emergence: Repeated Sequences Repeated Sequences in Biology Linear and Tree Genetic Programming Test problems Repeated sequences, fragments and subtrees Movies So what? Where does this lead next? Other emergent phenomena? Conclusions

Emergence: 

Emergence Emergence of effects that have not been explicitly programmed into the system. Simple rules lead to complex behaviour. Intelligence emerging from many trivial interactions. Particle Swarm Optimisation (PSO) Flocking Boids Swarm intelligence Genetic Programming Bloat Repeated Sequences

Repeats in DNA: 

Repeats in DNA Many different types of repeated DNA sequence. Classified by repeat sequence length, number of repeats, location in DNA molecule etc. etc. Some may have biological meaning, e.g. as a clock counting cell divisions and enforcing limit, cell life limited, so cancer prevented. Repeated sequences in both expressed (protein coding) and non-expressed DNA. DNA whose sequence is not maintained by selection will develop periodicities as a result of random crossover [G.P. Smith, 1976].

Demonstration problems: 

Demonstration problems Want to run GP for many generations. Hard problems, not immediately solved. Want range of different problems Time series modeling. One variable, short integers (byte) arithmetic Bioinformatics. Binary classification, floating point, 20 inputs.

Mackey-Glass Chaotic Time Series: 

Mackey-Glass Chaotic Time Series Hard (impossible) since chaotic time series. IEEE benchmark, 1201 data points. Fast signal processing (integer arithmetic) 7 time lags: 1, 2, 4, …, 128 steps ago.

Mackey-Glass: 

Mackey-Glass

Predicting Protein Location: 

Predicting Protein Location Given only number of each amino acid (i.e. cheap info, Swissprot) in a protein, predict what it is. Very hard. Easier: predict where the protein will be found Simplified (A. Reinhardt and T. Hubbard, 1998) which covers animals and microbes, to just animals and two classes: In the cell nucleus or not.

Animal Nuclear Proteins: 

Animal Nuclear Proteins Non-linear 2D projection from 20 Dimensional Space

Animal Nuclear Proteins: 

Animal Nuclear Proteins Non-linear 2D projection from 20 Dimensional Space

Genetic Programming Approaches: 

Genetic Programming Approaches Linear GPengine (Nordin) crossover with mutation Headless chicken mutation (HCX) only Linear Machine Code Discipulus Tree GP

Linear Genetic Programming: 

Linear Genetic Programming Chromosome is program. A linear sequence instructions Executed from start to end (no loops) GPengine - interpreted. Discipulus Intel 486 instructions

Linear GP Chromosome: 

Linear GP Chromosome GPengine instruction format 90% Crossover 40% Mutation. Pop 500. Two point (4 crossover chosen independently) Homologous (parent crossover points aligned)

Performance (all approaches solve problems): 

Performance (all approaches solve problems)

Evolution of Mackey-Glass error: 

Evolution of Mackey-Glass error

Evolution of M-G program length: 

Evolution of M-G program length

Length of Repeated Sequences: 

Length of Repeated Sequences

Longest Repeats M-G and Protein: 

Longest Repeats M-G and Protein

Slide19: 

Red arrow indicates length of program. Single repeated instructions are not shown. Repeated pairs of instructions are shown in red. Repeated sequence of 3 instructions in blue. Four or more are plotted with purple lines. Length and Fitness, RMS error, as numbers.

Evolution of Location of Repeated Instructions : 

Evolution of Location of Repeated Instructions First two point crossover Mackey-Glass GPengine run int6.0.all.rep2_movie.gif

Slide21: 

Dot at i,j means instruction at location i is identical to that at location j. 1-10 repeated instructions are shown with red. 11 or more repeated sequence shown in blue. Length and Fitness, RMS error, given numerically. Same Mackey-Glass 2point crossover run

Animation: 

Animation 250 generations Mackey-Glass GPengine int6.0.250.movie.gif

Effective Code: 

Effective Code Majority of instructions have no effect on the output of the programs. No obvious link between repeat and effectiveness

Introns and Repeats evolved in one Mackey-Glass program: 

Introns and Repeats evolved in one Mackey-Glass program

Information Content: 

Information Content Lempel-Ziv compression shows bloated programs’ contain less information than random program of same length.

Evolution of Information Content: 

Evolution of Information Content

Repeats in largest Protein Prediction program: 

Repeats in largest Protein Prediction program Red 133 Blue 101-132 Black 33-100 Grey 11-32

Important Nodes: 

Important Nodes Black changes >10 training cases

Discussion: 

Discussion In trees, can get diffuse introns whereby whole program depends only on fraction of tree. Not classic introns, since most functions do depend on both arguments. Crossover evolves trees similar fractal shape properties as random trees BUT Repeats not random. Many subtrees have high fitness and pass information towards root, BUT Much of program can be discarded with little impact on fitness Genetic programming on simple problems assembles complete solutions by gradually, randomly, reusing existing partial solutions to get small improvements, rendering existing parts less important.

Conclusions: 

Conclusions On different problems and different GPs (2 linear and tree) where length is not constrained, repeated sequences/subtrees/fragments emerge from crossover Repeats cover large fraction of fit programs. This is an example of emergence. Are there examples in your EA of effects (which were not pre-programmed) which spontaneously evolved?

More information: 

More information More information on GP http://www.cs.ucl.ac.uk/staff/W.Langdon/ Foundations of GP, Springer, 2002 GP and Data Structures, Kluwer, 1998 http://liinwww.ira.uka.de/bibliography/Ai/genetic.programming.html http://www.cs.ucl.ac.uk/staff/W.Langdon/lisp2dot.html References: Repeated Sequences in Linear GP Genomes, W.B. Langdon and W. Banzhaf, (GECCO'2004 late breaking paper PDF gzipped postscript). Movie. Poster Smith, G.P. (1976) "Evolution of Repeated DNA Sequences by Unequal Crossover." Science, 191(4227), 528-535. [PDF]).

GPengine Mackey-Glass: 

GPengine Mackey-Glass

Discipulus Protein Prediction: 

Discipulus Protein Prediction

Tree Mackey-Glass (Protein Localisation): 

Tree Mackey-Glass (Protein Localisation)