logging in or signing up langdon jet 25 nov 2004h Maria Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 84 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: January 11, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Repeated Sequences in Genetic Programming: Repeated Sequences in Genetic Programming W. B. Langdon Computer Science Introduction: Introduction Langdon + Banzhaf in Memorial University, Canada Emergence: Repeated Sequences Repeated Sequences in Biology Linear and Tree Genetic Programming Test problems Repeated sequences, fragments and subtrees Movies So what? Where does this lead next? Other emergent phenomena? ConclusionsEmergence: Emergence Emergence of effects that have not been explicitly programmed into the system. Simple rules lead to complex behaviour. Intelligence emerging from many trivial interactions. Particle Swarm Optimisation (PSO) Flocking Boids Swarm intelligence Genetic Programming Bloat Repeated Sequences Repeats in DNA: Repeats in DNA Many different types of repeated DNA sequence. Classified by repeat sequence length, number of repeats, location in DNA molecule etc. etc. Some may have biological meaning, e.g. as a clock counting cell divisions and enforcing limit, cell life limited, so cancer prevented. Repeated sequences in both expressed (protein coding) and non-expressed DNA. DNA whose sequence is not maintained by selection will develop periodicities as a result of random crossover [G.P. Smith, 1976].Demonstration problems: Demonstration problems Want to run GP for many generations. Hard problems, not immediately solved. Want range of different problems Time series modeling. One variable, short integers (byte) arithmetic Bioinformatics. Binary classification, floating point, 20 inputs.Mackey-Glass Chaotic Time Series: Mackey-Glass Chaotic Time Series Hard (impossible) since chaotic time series. IEEE benchmark, 1201 data points. Fast signal processing (integer arithmetic) 7 time lags: 1, 2, 4, …, 128 steps ago.Mackey-Glass: Mackey-GlassPredicting Protein Location: Predicting Protein Location Given only number of each amino acid (i.e. cheap info, Swissprot) in a protein, predict what it is. Very hard. Easier: predict where the protein will be found Simplified (A. Reinhardt and T. Hubbard, 1998) which covers animals and microbes, to just animals and two classes: In the cell nucleus or not.Animal Nuclear Proteins: Animal Nuclear Proteins Non-linear 2D projection from 20 Dimensional SpaceAnimal Nuclear Proteins: Animal Nuclear Proteins Non-linear 2D projection from 20 Dimensional SpaceGenetic Programming Approaches: Genetic Programming Approaches Linear GPengine (Nordin) crossover with mutation Headless chicken mutation (HCX) only Linear Machine Code Discipulus Tree GPLinear Genetic Programming: Linear Genetic Programming Chromosome is program. A linear sequence instructions Executed from start to end (no loops) GPengine - interpreted. Discipulus Intel 486 instructionsLinear GP Chromosome: Linear GP Chromosome GPengine instruction format 90% Crossover 40% Mutation. Pop 500. Two point (4 crossover chosen independently) Homologous (parent crossover points aligned)Performance (all approaches solve problems): Performance (all approaches solve problems)Evolution of Mackey-Glass error: Evolution of Mackey-Glass errorEvolution of M-G program length: Evolution of M-G program lengthLength of Repeated Sequences: Length of Repeated SequencesLongest Repeats M-G and Protein: Longest Repeats M-G and ProteinSlide19: Red arrow indicates length of program. Single repeated instructions are not shown. Repeated pairs of instructions are shown in red. Repeated sequence of 3 instructions in blue. Four or more are plotted with purple lines. Length and Fitness, RMS error, as numbers. Evolution of Location of Repeated Instructions: Evolution of Location of Repeated Instructions First two point crossover Mackey-Glass GPengine run int6.0.all.rep2_movie.gifSlide21: Dot at i,j means instruction at location i is identical to that at location j. 1-10 repeated instructions are shown with red. 11 or more repeated sequence shown in blue. Length and Fitness, RMS error, given numerically. Same Mackey-Glass 2point crossover runAnimation: Animation 250 generations Mackey-Glass GPengine int6.0.250.movie.gifEffective Code: Effective Code Majority of instructions have no effect on the output of the programs. No obvious link between repeat and effectivenessIntrons and Repeats evolvedin one Mackey-Glass program: Introns and Repeats evolved in one Mackey-Glass programInformation Content: Information Content Lempel-Ziv compression shows bloated programs’ contain less information than random program of same length.Evolution of Information Content: Evolution of Information ContentRepeats in largest Protein Prediction program: Repeats in largest Protein Prediction program Red 133 Blue 101-132 Black 33-100 Grey 11-32Important Nodes: Important Nodes Black changes >10 training casesDiscussion: Discussion In trees, can get diffuse introns whereby whole program depends only on fraction of tree. Not classic introns, since most functions do depend on both arguments. Crossover evolves trees similar fractal shape properties as random trees BUT Repeats not random. Many subtrees have high fitness and pass information towards root, BUT Much of program can be discarded with little impact on fitness Genetic programming on simple problems assembles complete solutions by gradually, randomly, reusing existing partial solutions to get small improvements, rendering existing parts less important. Conclusions: Conclusions On different problems and different GPs (2 linear and tree) where length is not constrained, repeated sequences/subtrees/fragments emerge from crossover Repeats cover large fraction of fit programs. This is an example of emergence. Are there examples in your EA of effects (which were not pre-programmed) which spontaneously evolved?More information: More information More information on GP http://www.cs.ucl.ac.uk/staff/W.Langdon/ Foundations of GP, Springer, 2002 GP and Data Structures, Kluwer, 1998 http://liinwww.ira.uka.de/bibliography/Ai/genetic.programming.html http://www.cs.ucl.ac.uk/staff/W.Langdon/lisp2dot.html References: Repeated Sequences in Linear GP Genomes, W.B. Langdon and W. Banzhaf, (GECCO'2004 late breaking paper PDF gzipped postscript). Movie. Poster Smith, G.P. (1976) "Evolution of Repeated DNA Sequences by Unequal Crossover." Science, 191(4227), 528-535. [PDF]). GPengine Mackey-Glass: GPengine Mackey-GlassDiscipulus Protein Prediction: Discipulus Protein PredictionTree Mackey-Glass (Protein Localisation): Tree Mackey-Glass (Protein Localisation) You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
langdon jet 25 nov 2004h Maria Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 84 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: January 11, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Repeated Sequences in Genetic Programming: Repeated Sequences in Genetic Programming W. B. Langdon Computer Science Introduction: Introduction Langdon + Banzhaf in Memorial University, Canada Emergence: Repeated Sequences Repeated Sequences in Biology Linear and Tree Genetic Programming Test problems Repeated sequences, fragments and subtrees Movies So what? Where does this lead next? Other emergent phenomena? ConclusionsEmergence: Emergence Emergence of effects that have not been explicitly programmed into the system. Simple rules lead to complex behaviour. Intelligence emerging from many trivial interactions. Particle Swarm Optimisation (PSO) Flocking Boids Swarm intelligence Genetic Programming Bloat Repeated Sequences Repeats in DNA: Repeats in DNA Many different types of repeated DNA sequence. Classified by repeat sequence length, number of repeats, location in DNA molecule etc. etc. Some may have biological meaning, e.g. as a clock counting cell divisions and enforcing limit, cell life limited, so cancer prevented. Repeated sequences in both expressed (protein coding) and non-expressed DNA. DNA whose sequence is not maintained by selection will develop periodicities as a result of random crossover [G.P. Smith, 1976].Demonstration problems: Demonstration problems Want to run GP for many generations. Hard problems, not immediately solved. Want range of different problems Time series modeling. One variable, short integers (byte) arithmetic Bioinformatics. Binary classification, floating point, 20 inputs.Mackey-Glass Chaotic Time Series: Mackey-Glass Chaotic Time Series Hard (impossible) since chaotic time series. IEEE benchmark, 1201 data points. Fast signal processing (integer arithmetic) 7 time lags: 1, 2, 4, …, 128 steps ago.Mackey-Glass: Mackey-GlassPredicting Protein Location: Predicting Protein Location Given only number of each amino acid (i.e. cheap info, Swissprot) in a protein, predict what it is. Very hard. Easier: predict where the protein will be found Simplified (A. Reinhardt and T. Hubbard, 1998) which covers animals and microbes, to just animals and two classes: In the cell nucleus or not.Animal Nuclear Proteins: Animal Nuclear Proteins Non-linear 2D projection from 20 Dimensional SpaceAnimal Nuclear Proteins: Animal Nuclear Proteins Non-linear 2D projection from 20 Dimensional SpaceGenetic Programming Approaches: Genetic Programming Approaches Linear GPengine (Nordin) crossover with mutation Headless chicken mutation (HCX) only Linear Machine Code Discipulus Tree GPLinear Genetic Programming: Linear Genetic Programming Chromosome is program. A linear sequence instructions Executed from start to end (no loops) GPengine - interpreted. Discipulus Intel 486 instructionsLinear GP Chromosome: Linear GP Chromosome GPengine instruction format 90% Crossover 40% Mutation. Pop 500. Two point (4 crossover chosen independently) Homologous (parent crossover points aligned)Performance (all approaches solve problems): Performance (all approaches solve problems)Evolution of Mackey-Glass error: Evolution of Mackey-Glass errorEvolution of M-G program length: Evolution of M-G program lengthLength of Repeated Sequences: Length of Repeated SequencesLongest Repeats M-G and Protein: Longest Repeats M-G and ProteinSlide19: Red arrow indicates length of program. Single repeated instructions are not shown. Repeated pairs of instructions are shown in red. Repeated sequence of 3 instructions in blue. Four or more are plotted with purple lines. Length and Fitness, RMS error, as numbers. Evolution of Location of Repeated Instructions: Evolution of Location of Repeated Instructions First two point crossover Mackey-Glass GPengine run int6.0.all.rep2_movie.gifSlide21: Dot at i,j means instruction at location i is identical to that at location j. 1-10 repeated instructions are shown with red. 11 or more repeated sequence shown in blue. Length and Fitness, RMS error, given numerically. Same Mackey-Glass 2point crossover runAnimation: Animation 250 generations Mackey-Glass GPengine int6.0.250.movie.gifEffective Code: Effective Code Majority of instructions have no effect on the output of the programs. No obvious link between repeat and effectivenessIntrons and Repeats evolvedin one Mackey-Glass program: Introns and Repeats evolved in one Mackey-Glass programInformation Content: Information Content Lempel-Ziv compression shows bloated programs’ contain less information than random program of same length.Evolution of Information Content: Evolution of Information ContentRepeats in largest Protein Prediction program: Repeats in largest Protein Prediction program Red 133 Blue 101-132 Black 33-100 Grey 11-32Important Nodes: Important Nodes Black changes >10 training casesDiscussion: Discussion In trees, can get diffuse introns whereby whole program depends only on fraction of tree. Not classic introns, since most functions do depend on both arguments. Crossover evolves trees similar fractal shape properties as random trees BUT Repeats not random. Many subtrees have high fitness and pass information towards root, BUT Much of program can be discarded with little impact on fitness Genetic programming on simple problems assembles complete solutions by gradually, randomly, reusing existing partial solutions to get small improvements, rendering existing parts less important. Conclusions: Conclusions On different problems and different GPs (2 linear and tree) where length is not constrained, repeated sequences/subtrees/fragments emerge from crossover Repeats cover large fraction of fit programs. This is an example of emergence. Are there examples in your EA of effects (which were not pre-programmed) which spontaneously evolved?More information: More information More information on GP http://www.cs.ucl.ac.uk/staff/W.Langdon/ Foundations of GP, Springer, 2002 GP and Data Structures, Kluwer, 1998 http://liinwww.ira.uka.de/bibliography/Ai/genetic.programming.html http://www.cs.ucl.ac.uk/staff/W.Langdon/lisp2dot.html References: Repeated Sequences in Linear GP Genomes, W.B. Langdon and W. Banzhaf, (GECCO'2004 late breaking paper PDF gzipped postscript). Movie. Poster Smith, G.P. (1976) "Evolution of Repeated DNA Sequences by Unequal Crossover." Science, 191(4227), 528-535. [PDF]). GPengine Mackey-Glass: GPengine Mackey-GlassDiscipulus Protein Prediction: Discipulus Protein PredictionTree Mackey-Glass (Protein Localisation): Tree Mackey-Glass (Protein Localisation)