wabi2002 rome

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Slide1: 

Modified Mincut Supertrees Roderic Page University of Glasgow

Tree of Life: 

Tree of Life About 1.7 million species described. What we have so far: TreeBASE database (15,000 taxa) Ribosomal Database Project (RDP II) (20,000 sequences) The Tree of Life Project (11,000 taxa)

Slide3: 

Recent interest in the Tree of Life Assembling the Tree of Life: Science, Relevance, and Challenges AMNH, New York, May 2002 $US 10 million “to construct a phylogeny for the 1.7 million described species of Life” announced February 15th 2002 NSF sponsored “Tree of Life” workshops (2000-2001) European initiative (ATOL) under FP6

Problem: how to build the tree of life: 

Problem: how to build the tree of life Solutions: Find one or more “magic markers” that will allow us to recover the whole tree in one go (problems: combinability and complexity) Assemble big tree from many smaller trees derived from many kinds of data (supertrees)

Tree terminology: 

Tree terminology a b c d { a,b } { a,b,c } { a,b,c,d } root leaf internal node cluster edge

Nestings and triplets: 

Nestings and triplets a b c d {a,b} <T {a,b,c,d} {b,c} <T {a,b,c,d} (bc)d bc|d Nestings Triplets

Supertree: 

Supertree a b c b c d a b c d supertree T 1 T 2 + =

Some desirable properties of a supertree method (Steel et al., 2000): 

Some desirable properties of a supertree method (Steel et al., 2000) The supertree can be computed in polynomial time A grouping in one or more trees that is not contradicted by any other tree occurs in the supertree

Slide9: 

Homo sapiens 1 1 1 Pan paniscus 1 1 1 Gorilla gorilla 1 1 0 Pongo pygmaeus 1 0 0 Hylobates 0 0 0 1 2 3 1 2 3 MRP (Matrix Representation Parsimony) NP-hard Can generate many solutions

Aho et al.’s algorithm (OneTree): 

Aho et al.’s algorithm (OneTree) Aho, A. V., Sagiv, Y., Syzmanski, T. G., and Ullman, J. D. 1981. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J. Comput. 10: 405-421. Input: set of rooted trees 1. If set is compatible (i.e., will agree on a tree), output that tree. 2. If set is not compatible, stop!

Slide11: 

a b c b c d T 1 T 2 Aho et al.’s OneTree algorithm supertree

Mincut supertrees: 

Mincut supertrees Semple, C., and Steel, M. 2000. A supertree method for rooted trees. Discrete Appl. Math. 105: 147-158. Modifies OneTree by cutting graph Requires rooted trees (no analogue of OneTree for unrooted trees) Recursive Polynomial time

Slide13: 

a b c d e a b c d T 1 T 2 { T 1 , T 2 } S Semple and Steel (2000)

Collapsing the graph (Semple and Steel mincut algorithm): 

a b c d e a,b c d e 1 1 1 1 1 1 1 2 { T 1 , T 2 } S max S / E { T 1 , T 2 } { T 1 , T 2 } Collapsing the graph (Semple and Steel mincut algorithm) This edge has maximum weight

Cut the graph to get supertree: 

Cut the graph to get supertree a b c d e supertree a,b c d e 1 1 1 max S / E { T 1 , T 2 } { T 1 , T 2 }

My mincut supertree implementation darwin.zoology.gla.ac.uk/~rpage/supertree: 

My mincut supertree implementation darwin.zoology.gla.ac.uk/~rpage/supertree Written in C++ Uses GTL (Graph Template Library) to handle graphs (formerly a free alternative to LEDA) Finds all mincuts of a graph faster than Semple and Steel’s algorithm

A counter example: two input trees...: 

A counter example: two input trees... a b c x 1 x 2 x 3 c b a y 1 y 2 y 3 y 4

Mincut gives this (strange) result: 

Mincut gives this (strange) result c x 1 x 2 x 3 b a y 1 y 2 y 3 y 4 Disputed relationships among a, b, and c are resolved x1, x2, and x3 collapsed into polytomy

Problem: Cuts depend on connectivity (in this example it is a function of tree size): 

Problem: Cuts depend on connectivity (in this example it is a function of tree size) a x1 x2 y1 y3 y4 x3 y2 c b

So, mincut doesn’t work: 

So, mincut doesn’t work But, Semple and Steel said it did My program seems to work Argh!!! What is happening….?

What mincut does… …and does not do: 

What mincut does… …and does not do Mincut supertree is guaranteed to include any nesting which occurs in all input trees Makes no claims about nestings which occur in only some of the trees “Does exactly what it says on the tin™”

Modifying mincut supertree: 

Modifying mincut supertree Can we incorporate more of the information in the input trees? Three categories of information Unanimous (all trees have that grouping) Contradicted (trees explicitly disagree) Uncontradicted (some trees have information that no other tree disagrees with)

Uncontradicted information assume we have k input trees: 

Uncontradicted information assume we have k input trees a b a and b co-occur in a tree a and b nested in a tree a b c n c - n = 0  uncontradicted (if c = k then unanimous) c - n > 0  contradicted

Uncontradicted information assume we have k input trees: 

Uncontradicted information assume we have k input trees a b a and b co-occur in a tree a and b nested in a tree a b c n c - n -f = 0  uncontradicted (if c = k then unanimous) c - n - f > 0  contradicted a b a and b in a fan f

Classifying edges: 

a b c x 1 x x 3 y 1 y 2 y 3 y 4 2 a b c y 1 y 3 y 4 x 1 x 2 x 3 y 2 Uncontradicted Uncontradicted but adjacent to contradicted Contradicted Classifying edges { T 1 , T 2 } S

Modified mincut: 

Modified mincut Species a, b, and c form a polytomy x1, x2, and x3 resolved as per the input tree modified mincut a b c x 1 x 2 x 3 y 1 y 2 y 3 y 4

If no tree contradicts an item of information, is that information always in the supertree?: 

(12)5 (45)1 (23)5 (34)1 If no tree contradicts an item of information, is that information always in the supertree?

No! Steel, Dress, & Böcker 2000: 

No! Steel, Dress, & Böcker 2000 The four trees display (12)5, (23)5, (34)1, and (45)1 No tree displays (IK)J or (JK)I for any (IJ)K above Triplets are uncontradicted, but cannot form a tree

Future directions: 

Future directions Improve handling of uncontradicted information Add support for constraints Visualising very big trees Better integration into phylogeny databases (www.treebase.org) darwin.zoology.gla.ac.uk/~rpage/supertree