Share PowerPoint. Anywhere!

BGU HIT

Uploaded from authorPOINT Lite
Download as Download Not Available PPT
Presentation Description

No description available

Views: 4
Like it  ( Likes) Dislike it  ( Dislikes)
Added: March 04, 2008 This presentation is Public
Presentation Category :Education
Tags Add Tags
Presentation StatisticsNew!
Views on authorSTREAM: 4
Presentation Transcript

Slide1 : Self-Reconfigurable Software in Bionformatics Dr. Natalio Krasnogor Natalio.Krasnogor@nottingham.ac.uk http://slater.chem.nott.ac.uk/~natk/Public The Proteins Structure Comparison Case


Presentation contents : Presentation contents Acknowledgements (~1 min) A case for Self-Reconfigurable Software (~ 5 mins) SRS in bioinformatics Initial concepts The Comparison of Protein Structures (~ 15 mins) Problem definition Previous approaches SRS for structure comparisons (~ 20 mins) The memetic embodiment Preliminary results (~ 5 mins) Conclusions (~5 mins) Questions (~9 mins)


Acknowledgements : Acknowledgements J.E. Smith – University of the West of england, UK J.D. Hirst – University of Nottingham, UK W.E. Hart, R.D.Carr, M. Collins – Sandia National Labs, USA B. Wallencz – Celera Genomics, USA G. Lancia – University of Udine, Italy


A Case for Self-Reconfigurable Software in Optimization : A Case for Self-Reconfigurable Software in Optimization The most successful metaheuristics for combinatorial optimisation depend on: the existence of specialized (human-made) operators (eg. Local search, constructive methods, genetic operators) the appropriate coordination of these operators detailed fine tuning of parameters of both the metaheuristic and the operators. A conservation of competence principle applies: the better one algorithm is solving one specific instance (class) the worst it is solving a different instance (class) [Wolpert et.al.] It cannot be expected that a black-box metaheuristic will suit all problem classes and instances. We need embryonic metaheuristics which mature, grow and adapt during production. We need to better integrate machine learning (e.g.. COLT, MDL, etc) with optimisation technology


SRS in Bioinformatics : SRS in Bioinformatics Bioinformatics is a (relatively) young discipline. Unlike some operational research type of problems, for which certain knowledge exists as to what (meta)heuristic is/isn’t good for it, such knowledge is non-existent for bioinformatics (classes of) instances/problems. Biological databases (e.g. PDB) are growing very fast so it is unreasonable to expect that black-box metaheuristics built based on existing data will be useful with data yet to come. Bioinformatics is an excellent arena where to try a completely new approach to optimisation which differs philosophically from what has been done in other optimisation domains.


Self-Reconfigurable Software : Self-Reconfigurable Software Is more (or rather should be  ) : General Easy to deploy and use Adaptable to unforeseen circumstances Robust Reliable Efficient Non-perennial


Slide7 : SRS requires:


Slide8 : Memetics: a society of SRS


The Comparison of Protein Structures : The Comparison of Protein Structures The comparison of protein structures is at the core of biomedical research: to find relations between structures and functions to evaluate ab-initio, threading or homology structure predictors. A variety of structure comparison methods have been proposed, eg. SCO, DALI, LGA, etc


Problems with existing methodologies : Problems with existing methodologies Implicitly accept that a suitable scoring function can be defined for which optimal values corresponds to the best structural match. Numerical instabilities (eg. In those that use (variations of) RMSD and differences of distance matrices) Improper ranking due to ambiguous definition of structural similarity They ignore alternative (different) solutions. Etc.


Desideratum for a similarity measure [Goldman et.al] : Desideratum for a similarity measure [Goldman et.al] It should not penalize too heavily insertions and deletions It should be reasonably robust It should be easy to compute (or at least rigorously approximate) It must be able to discover local and global alignments It must take into account the self-avoidance of the structure It must be empirically tested and accepted in the field


Contact Maps : Contact Maps A contact map is an undirected graph G=(V,E) V = The set of residues E = The set of pairs edges (i,j) where i,j are residues that are closer than  (distance threshold) to each other Usually 2Å <=  <= 9Å measured from either: Cα atoms of i,j Any two atoms of i,j Center of mass of the side chains of i,j


Contact Maps Alignments : Contact Maps Alignments An alignment of two contact maps (C1,C2) is an assignment of residues from the C1 to C2: a=(r(i,c1),r(j,c2)) The value of an alignment (the overlap) is the number of size 4 cycles made with edges from C1,C2 and alignments. The MAX CMO problem was introduced in [Godzik et.al] and probed NP-Complete in [Goldman et.al] and later in [Krasnogor]


IP Formulation : IP Formulation A LP is used to obtain upper bounds on the value of alignments A metaheuristic is used to obtain lower bounds on the overlaps The methodology was introduced in [Lancia et.al] for local search and GAs. [Krasnogor et.al.] improved results using Memetic Algorithms (GAs+LS).


Self-Reconfigurable Memetic Algorithms (the embodiment) for the MAX-CMO Problem : Self-Reconfigurable Memetic Algorithms (the embodiment) for the MAX-CMO Problem Memetic Algorithms (MAs) are inspired by: Models of adaptation in natural systems that combine evolutionary adaptation of population of individuals (GAs) + Individual learning (LS) within a lifetime (others consider the LS as development stage). + R.Dawkin’s concept of a meme which represents a unit of cultural evolution that can exhibit refinement.


Slide16 : What happens inside an MA? This is my solution to your problem I used strategy X When I grow I’ll need to decide whether to used Dad’s strategy, X or my own


Details of the Evolutionary Algorithm : Details of the Evolutionary Algorithm A chromosome c  [0,…,m]n where m = |C1| and n=|C2| and m>=n A position j  c, c[j], specifies that the jth residue of C1 is to be aligned to the c[j]th residue of C2


Crossover : Crossover


Mutation : Mutation


Local Search and Selection : Local Search and Selection In previous work [Krasnogor, Krasnogor et.al.]the algorithms employed 6 human-designed, pre-existing, local search operators (memes) Individuals self-adapted to use the most profitable meme at different stages of the search. A (,) strategy was employed for replacement


Memetic Aspects of the Algorithm : Memetic Aspects of the Algorithm In current work: The algorithm self-generates the local searchers as memeplexes. A memeplex represents a set of co-adapted local search behaviours. It can adapt by changing its parameters or its code. We use a grammar to represent memeplexes. The grammar allows to specify: acceptance strategy (e.g. next or steepest ascent, random walk, etc) number of neighbours sampled number of iterations the move operator itself etc


Memeplexes Grammar : Memeplexes Grammar A syntactically valid sentence in the language generated by this grammar is a valid meme (i.e. complex local searcher)


Memetic Processes : Memetic Processes There are various processes that guide the Agent’s cultural evolution of local search strategies: Inheritance: an agent inherits the meme of the most successful of its parents Imitation: an agent imitates a successful non-genetically-related individual Innovation: an agent accidentally (randomly) change its meme Mental Simulation: an agent purposely improve its meme


Results on Randomly Generated Problems : Results on Randomly Generated Problems A random problem is specified by: r : number of residues d : density of random edges n : #patterns pi : edge (i,i+pi) pri : probability for the pattern We can control de difficulty of the instances


Slide25 : One pattern, different number of residues, identical contact maps


Slide26 : Simple two patterns contact maps


Slide27 : More complex random problems, 3 patterns on top


Slide28 : More complex random problems, 3 patterns


Slide33 : Results on Real Proteins the worst SGMA is worst than a GA in 4 out of 18 pairs the SGMA are worst than a MMA in 7 cases, better in 3 and equivalent in 8.


Conclusions : Conclusions We propose a SRS for optimisation in bioinformatics. The embodiment presented here is that of Memetic Algorithms but it could have been TS, SA, VNS, etc. SRS develop and learn; they are never finished. It pays to discover on-the-fly which local search to use and when to use it. Different problem(s) and instance(s) require different configurations of the software The discovered configurations are competitive (but still not better) with human-designed local searchers


Future Research : Future Research Test the approach with a larger set of real proteins We are developing a structural comparison web server that integrates the LP and the SRS Explore other embodiments


Slide36 : Questions???? Suggestions???? Interested in collaborating in this problem/methodology???? Drop me a line to Natalio.Krasnogor@nottingham.ac.uk Thanks for inviting me to give this talk!!


Bibliography : Bibliography [Wolpert et.al.] D.Wolpert and W.G.MacReady. No Free Lunch Theorems for Optimization. IEEE Trans. on Evolutionary Computation, 1(1) : pp 67-82, 1997. [Goldman et.al.] D.Goldman, S. Istrail and C.Papadimitriou. Algorithmic aspects of protein structure similarity. Proceedings of the 40th Annual Symposium on Foundations of Computer Sciences. 1999 [Krasnogor] N.Krasnogor. Studies on the Theory and Design Space of Memetic Algorithms. Doctoral dissertation. University of the West of England. 2002. Http://slater.chem.nott.ac.uk/~natk/Public/papers.html [Godzik et.al.] A.Godzik, J.Skolnick and A.Kolinski. A topology fingerprint approach to inverse protein folding problem. Journal of Molecular Biology. 227:227-238,1992. [Krasnogor et.al.] B.Carr, W.E. Hart, N.Krasnogor, E.K. Burke, J.D. Hirst and J.E. Smith. Alignment of Protein Structures with a Memetic Evolutionary Algorithm. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2002). Morgan Kaufman, 2002.


Bibliography : Bibliography [Lancia et.al.] G.Lancia, R.Carr, B. Walenz and S. Istrail. 101 Optimal pdb Structure Alignments: a branch-and-cut Algorithm for the Maximum Contact Map Overlap Problem. Proceedings of the Fifth Annual International Conference on Computational Molecular Biology. RECOMB 2001, 2001.