Stamatakis18 3 04

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Parallel & Distributed Systems and Algorithms for Inference of Large Phylogenetic Trees with Maximum Likelihood : 

Parallel & Distributed Systems and Algorithms for Inference of Large Phylogenetic Trees with Maximum Likelihood Alexandros Stamatakis LRR TU München Contact: stamatak@cs.tum.edu

Outline: 

Outline Motivation Introduction to phylogenetic tree inference Statistical inference methods Maximum Likelihood & associated problems Solutions: 2 simple heuristics parallel & distributed implementation Results Conclusion Availability & Future Work

Motivation: Towards a „Tree of Life“: 

Motivation: Towards a „Tree of Life“ 30.000 organisms available, current trees <= 1000 Where we are:

Motivation: Towards a „Tree of Life“: 

Motivation: Towards a „Tree of Life“ 30.000 organisms available, current trees <= 1000 Where we want to get:

Phylogenetic Tree Inference: 

Phylogenetic Tree Inference Input: „good“ multiple alignment of a distinguished, highly conserved part of DNA sequences Output: unrooted binary tree with the sequences at its leaves (all nodes: either degree 1 or 3) Various methods for phylogenetic tree inference Differ in computational complexity and quality of trees Most accurate methods: Maximum Likelihood Method (ML) and Bayesian Phylogenetic Inference: + most sound and flexible methods + other methods not suited for large/complex trees -- most computationally intensive methods

ML and Bayesian methods: 

ML and Bayesian methods T.Williams et al (March 2003) comparative analysis with simulated data shows: MrBayes is best program Guidon et al (May 2003) PHYML very fast & accurate ML program for real & simulated data: faster than MrBayes ML (PHYML, RAxML2): + Significantly faster than MrBayes + Reference/starting trees for bayesian methods -- Less powerful statistical model Bayesian Inference (MrBayes): + Powerful statistical model -- MCMC convergence problem Memory requirements for 1000/10000-taxon alignment: RAxML: 200MB/750MB PHYML: 900MB/8.8GB MrBayes: 1150MB/unknown

MCMC Convergence Problem: 

MCMC Convergence Problem

What does ML compute?: 

What does ML compute? Maximum Likelihood calculates: Topologies Branch lengths v[i] Likelihood of the tree Goal: Find tree topology wich maximizes likelihood Problem I: Number of possible topologies is exponential in n Problem II: Computation of likelihood value + branch length optimization is expensive Solution: Algorithmic Optimizations (previous work) + New heuristics + HPC S1 S2 S3 S4 S5 v1 v2 v3 v4 v5 v6 v7

New Heuristics for RAxML: 

New Heuristics for RAxML Two common methods to build a tree: Progressive addition of organisms e.g. stepwise addition algorithm Use a (random, simple) starting tree containing all organisms and optimize likelihood by application of topological changes RAxML (Randomized Axelerated Maximum Likelihood) computes parsimony starting tree with dnapars -> fast and relatively „good“ initial likelihood dnapars uses stepwise addition -> randomized sequence input order to obtain distinct starting trees Optimize starting tree by application of rearrangements Accelerate rearrangements by two simple ideas

Subtree Rearrangements: 

Subtree Rearrangements

Subtree Rearrangements: 

Subtree Rearrangements ST5 ST2 ST6 ST4 ST3 ST1

Subtree Rearrangements: 

Subtree Rearrangements ST5 ST2 ST6 ST4 ST3 ST1 +1

Subtree Rearrangements: 

Subtree Rearrangements ST5 ST2 ST6 ST4 ST3 ST1 +1

Subtree Rearrangements: 

Subtree Rearrangements ST5 ST2 ST6 ST4 ST3 ST1 +1

Subtree Rearrangements: 

Subtree Rearrangements ST5 ST2 ST6 ST4 ST3 ST1 +1

Subtree Rearrangements: 

Subtree Rearrangements ST5 ST2 ST6 ST4 ST3 ST1 +2

Subtree Rearrangements: 

Subtree Rearrangements ST5 ST2 ST6 ST4 ST3 ST1 +2

Subtree Rearrangements: 

Subtree Rearrangements ST5 ST2 ST6 ST4 ST3 ST1 Optimize all branches

Subtree Rearrangements: 

Subtree Rearrangements ST5 ST2 ST6 ST4 ST3 ST1 Need to optimize all branches ?

Idea 1: Local Optimization of Branch Length: 

Idea 1: Local Optimization of Branch Length ST5 ST2 ST6 ST4 ST3 ST1

Idea 1: Local Optimization of Branch Length: 

Idea 1: Local Optimization of Branch Length ST5 ST2 ST6 ST4 ST3 ST1

Why is Idea 1 useful?: 

Why is Idea 1 useful? Local optimization of branch lengths: Update less likelihood vectors -> significantly faster Allows higher rearrangement settings -> better trees Likelihood depends strongly on topology Fast exploration of large number of topologies Straight-forward parallelization Store best 20 trees from each rearrangement step Branch length optimization of best 20 trees only Experimental results justify this mechanism

Idea 2:Subsequent Application of Topological Changes : 

Idea 2:Subsequent Application of Topological Changes

Idea 2:Subsequent Application of Topological Changes : 

Idea 2:Subsequent Application of Topological Changes ST3

Idea 2:Subsequent Application of Topological Changes : 

Idea 2:Subsequent Application of Topological Changes ST3 ST3

Idea 2:Subsequent Application of Topological Changes : 

Idea 2:Subsequent Application of Topological Changes ST5 ST2 ST6 ST4 ST3 ST1 ST5 ST2 ST6 ST4 ST1 ST3 ST3 ST3

Why is Idea 2 useful?: 

Why is Idea 2 useful? During inital 5-10 rearrengement steps many improved topologies are encountered Acceleration of likelihood improvment in initial optimization phase Enables fast optimization of random starting trees

Remainder of this Talk: 

Remainder of this Talk Motivation Introduction to phylogenetic tree inference Statistical inference methods Maximum Likelihood & associated problems Solutions: 2 simple heuristics parallel & distributed implementation Results Conclusion Availability & Future Work

Basic Parallel & Distributed Algorithm: 

Basic Parallel & Distributed Algorithm Basic idea: Distribute work by subtrees instead of topologies (e.g. parallel fastDNAml) Simple Master-Worker architecture Subsequent application of topological changes introduces non-determinism ST5 ST2 ST6 ST4 ST3 ST1

Basic Parallel & Distributed Algorithm: 

Basic Parallel & Distributed Algorithm Basic idea: Distribute work by subtrees instead of topologies (e.g. parallel fastDNAml) Simple Master-Worker architecture Subsequent application of topological changes introduces non-determinism ST5 MPI_Send(ST3_ID, tree) ST6 ST4 ST3 ST1 ST2

Basic Parallel & Distributed Algorithm: 

Basic Parallel & Distributed Algorithm Basic idea: Distribute work by subtrees instead of topologies (e.g. parallel fastDNAml) Simple Master-Worker architecture Subsequent application of topological changes introduces non-determinism ST5 MPI_Send(ST3_ID, tree) ST6 ST4 ST3 ST1 ST2 MPI_Send(ST2_ID, tree)

Differences between Parallel & Distributed Algorithm: 

Differences between Parallel & Distributed Algorithm Parallel: best tree list of max(20, #workers) maintained and merged at the master Parallel: Master distributes max(20, #workers) as toplogy-strings to workers for branch length optimization Distributed: Each worker maintains local best list of 20 trees Distributed: Worker performs fast branch length optimizations locally on all 20 trees -> returns only best topology to the master

Sequential Results: 

Sequential Results 50 distinct simulated 100-taxon alignments Measured average execution times & topological distance (RF-rate) from „true“ tree PHYML: 35.21 seconds, RF-rate: 0.0796 MrBayes: 945.32 seconds, RF-rate: 0.0741 RAxML: 29.27 seconds, RF-rate: 0.0818 9 distinct real alignments containing 101-1000 taxa Measured execution times & final likelihood values RAxML yields best-known likelihood for all data sets RAxML faster than PHYML & MrBayes

Sequential Results: Real Data: 

Sequential Results: Real Data

Sequential Results: Real Data: 

Sequential Results: Real Data

Sequential Results: Real Data: 

Sequential Results: Real Data

Sequential Results: Real Data: 

Sequential Results: Real Data

Sequential Results: Real Data: 

Sequential Results: Real Data

Parallel Results: Speedup 1000_ARB: 

Parallel Results: Speedup 1000_ARB

Distributed Results: First Tests : 

Distributed Results: First Tests Platforms: Infiniband-Cluster: 10 Intel Xeon 2.4 GHz Sunhalle: 50 Sun-Workstations for CS students Alignments: 1000_ARB 2025_ARB Larger trees to come .......... Results: Program executed correctly & terminated RAxML@home yielded best-known tree for 2025_ARB

Biological Results: 1st ML 10.000-taxon tree: 

Biological Results: 1st ML 10.000-taxon tree Calculated 5 parsimony starting trees + 3-4 initial rearrangement steps sequentially on Xeon 2.4GHz Further rearrangements of those 5 trees in parallel on 32 or 64 Xeon 2.66GHz at RRZE Accumulated CPU hours/tree ~ 3200hours Best ln likelihood: -949539 worst: -950026 Problems: Quality assessment? bootstrap not feasible Consense crashes for > 5 trees MrBayes/PHYML crash on 32-bit/4GB MrBayes crashed on Itanium Visualization?

Conclusion: 

Conclusion RAxML not able to handle protein data RAxML not able to perform model parameter optimization BUT: RAxML easy to parallelize/distribute Accurate & fast for large trees Significantly lower memory requirements than MrBayes/PHYML Conclusion: Imlement model parameter optimization & protein data in RAxML

Availability & Future Work: 

Availability & Future Work Further development & distribution of RAxML@home Big production runs with RAxML@home Survey: ML supertrees vs. integral trees Alignment split-up methods for ML supertrees RAxML implementation on GPUs RAxML2 download, benchmark, code: wwwbode.in.tum.de/~stamatak RAxML@home development: www.sourceforge.com/projects/axml