logging in or signing up SK BayesianMethods WoodRock Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 58 Category: News & Reports.. License: All Rights Reserved Like it (0) Dislike it (0) Added: September 17, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Slide1: Lecture 6 Bayesian Inference and Molecular Dating Thomas Bayes Slide2: Bayesian Inference: The explanation with the highest posterior probability Pr(H D) = Pr(H) Pr(D H) Pr(D) 2. Bayes’ Theorem Posterior probability, the probability of the hypothesis given the data Prior probability, the probability of the hypothesis on previous knowledge Likelihood function, probability of the data given the hypothesis Unconditional probability of the data, a normalizing constant ensuring the posterior probabilities sum to 1.00 Pr(A and B) = Pr(A) Pr(B A) = Pr(B) Pr(A B) 1. Definition of conditional probability Slide3: Pr(H D) = Pr(H) Pr(D H) Pr(D) Odds ratios are the simplest usage for Bayes’ theorem Felsenstein’s example: He gives his prior belief in 'martians' Pr(H) = 1/4. A space probe sent to mars has probability of 1/3 of finding martians if they are there. It finds none, so Pr(D H) = 1/3, where H is that martians exist is 1/3 = 1/4 1/3 1 = 1/12 Assuming all the data is correct Slide4: When the prior is a probability distribution, the posterior probability can be given as a probability distribution (often considered a density distribution under a curve) 0 24 Flat prior 0 24 Prior Skewed Prior Parameter value Parameter value Slide5: Influence of the Prior diminishes if the likelihood ratio between hypotheses increases with data collection Posterior pheads 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 10 coin tosses, 3 heads 100 coin tosses, 30 heads 'heads' 'tails' pheads Slide6: First use in phylogenetics: Li (1996, PhD thesis), Rannala and Yang (1996) A useful property of Bayesian inference for phylogenetics is that with flat priors (all hypotheses equal before the data is examined), posterior probabilities for two trees are proportional to their likelihood ratio. Bayesian inference in phylogenetics Tree1 Tree2 Tree3 Tree1 Tree2 Tree3 Tree1 Tree2 Tree3 Prior Posterior Likelihood Slide7: (ti D) = (ti) (D ti) (tj) (D tj) Where (D ti) = b (D ti,b, ) (b, ) dbd T J=1 Bayesian inference in phylogenetic notation Posterior distribution Prior distribution Likelihood distribution Summing over this integral becomes too complex with so many tree/parameter hypotheses ti = Treei = substitution model b = branch-length Slide8: Markov chain Monte Carlo (MCMC) analysis Sampling the MCMC provides a valid approximation for the posterior distribution of trees (over 100,000s – 1,000,000s of generations) - without having to know the denominator Tree 1 Tree 2 Tree 3 Generation: 1 2 3 4 5 6 New state accepted New state rejected Bayesian Posterior Probability for Tree 1 (BPPtree 1)= 4/6 The chain Slide9: Metropolis-Hastings algorithm (the proposal mechanism) 1. Start at a random (or predetermined) tree, Ti 2. Randomly select a tree that is a neighbour in tree space, the proposal tree, Tj 3 2 e.g. 5 4 1 4 2 5 3 1 3. Compute the acceptance ratio R = Pr(Tj) Pr(D Tj ) Pr(Ti) Pr(D Ti ) 4. If R ≥ 1, accept Tj 5. If R andlt; 1, randomly draw from 0-1, if andlt; R, accept Tj 6. Otherwise reject Tj and keep Ti 7. Return to step 2 Slide10: Given an efficient proposal mechanism and sufficient generations, the MCMC with reach an equilibrium distribution The problem of summing across all hypotheses cancels out in the acceptance ratio for Tj / Ti The section of the MCMC that is 'finding its way' towards the equilibrium distribution (the burnin) can be discarded - it is not a valid approximation for the posterior probability Useful properties of MCMC for Bayesian inference Slide11: Metropolis-coupled MCMC (MC3) n chains, of which n-1 are 'heated' such that they can more easily move across peaks and valleys in the landscape of trees. After all n chains have gone one step a swap between randomly chosen chains is proposed (in much the same way as between generations). If accepted the two chains switch states. The cold chain (which could be stuck in a local optimum) can escape when a proposed swap with a hot chain is successful. Slide12: 4100 -- (-15395.901) (-15376.724) [-15367.235] 4200 -- (-15395.854) (-15377.874) [-15368.146] 4300 -- (-15381.440) [-15376.072] (-15368.164) 4400 -- (-15375.695) (-15369.686) [-15369.127] 4500 -- [-15374.458] (-15372.051) (-15370.876) 4600 -- [-15358.899] (-15366.374) (-15365.927) 4700 -- [-15350.419] (-15366.082) (-15366.168) 4800 -- (-15369.614) [-15368.152] (-15369.902) Sampling three simultaneous chains (MrBayes 3.0: Heulsenbeck and Ronquist, 2001) Generation chain 1 chain 2 chain 3 Proposing new trees along individual chains and swapping trees between chains (to more efficiently explore tree space) Slide13: Inference is based only on sampling from the cold chain - for which the acceptance ratio for changes between generations is appropriate for approximating the posterior probability. tree tree_1 [p = 0.455, P = 0.455] (Ostrich,Rhea,(Moa,(Kiwi,(Emu,Cassowary)))) tree tree_2 [p = 0.215, P = 0.670] (Ostrich,(Kiwi,(Moa,Rhea)),(Emu,Cassowary)) tree tree_3 [p = 0.118, P = 0.788] (Ostrich,Moa,(Rhea,(Kiwi,(Emu,Cassowary))) tree tree_4 [p = 0.081, P = 0.869] (Ostrich,(Moa,Rhea),(Kiwi,(Emu,Cassowary))) tree tree_5 [p = 0.050, P = 0.919] (Ostrich,Kiwi,(Moa,(Rhea,(Emu,Cassowary)))) From Phillips, unpub. Ostrich Emu Moa Rhea Kiwi Cassowary Slide14: H1: Anapsida H3: Archosauria H2: Diapsida Amphibia (outgroup) Squamata Aves Crocodilia Mammalia A Return to the Turtle example Slide15: Bayesian inference in MrBayes 3.0 The same data matrix as for the likelihood example (16 taxa, 3110 nucleotides) GTR+I+ model (AIC recommendation in ModelTest) Four MC3 chains (3 are 'hot'): 2,000,000 generations sample every 1000th tree from the 'cold' chain Slide16: +------------------------------------------------------------+ -25419.04 | *****************************************************| | ** | | | | * | | * | | * | | | | | | * | | | | | | | |* | | | | | +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+ -33441.42 ^ ^ 1 10000 Estimated marginal likelihood = -25423.76 (arithmetic mean) = -25451.08 (harmonic mean) Discard the 1st 250,000 generations as the 'burnin' 2,000,000 generations likelihood Equilibrium distribution Slide17: 95% Cred. Interval -------------------- Parameter Mean Variance Lower Upper Median -------------------------------------------------------------- TL 2.513384 0.005489 2.373000 2.663000 2.512000 r(Gandlt;-andgt;T) 1.000000 0.000000 1.000000 1.000000 1.000000 r(Candlt;-andgt;T) 14.895438 2.952511 11.968691 18.597121 14.745678 r(Candlt;-andgt;G) 0.088475 0.004142 0.011433 0.241428 0.074255 r(Aandlt;-andgt;T) 2.489403 0.105833 1.952249 3.220099 2.453488 r(Aandlt;-andgt;G) 8.309381 0.935429 6.684136 10.510313 8.229202 r(Aandlt;-andgt;C) 2.481878 0.108561 1.928505 3.190486 2.453910 pi(A) 0.323330 0.000044 0.310754 0.336207 0.323446 pi(C) 0.215915 0.000035 0.204413 0.227673 0.216025 pi(G) 0.209200 0.000035 0.197702 0.221016 0.209162 pi(T) 0.251555 0.000037 0.239747 0.263308 0.251377 alpha 1.017169 0.015644 0.785668 1.261082 1.013706 pinvar 0.226459 0.000787 0.164235 0.274568 0.229746 -------------------------------------------------------------- Model parameter summaries for the 1750 samples from the MCMC chain Slide18: Unlike likelihood, for which parameters are optimised, Bayesian inference gives a Posterior density distribution for parameter values (integrated over all trees sampled from the chain) 0 0.2 0.4 frequency Proportion of invariant sites 95% credible interval Slide19: Statistic Mean Estimated sample size Sampling analysis (e.g. Tracer 1.0) Rambaut and Drummond All ESS should be at least andgt;100 Slide20: Has the MCMC run for sufficient generations? Have the chains converged on similar likelihood values and swaps being made between each of them? Do different runs of the same analyses converge? Sampling over generation plots reveal that likelihoods have levelled (equilibrium distribution is reached) Estimated sample sizes for all parameters are andgt;100, indicating substantial coverage of parameter space Slide21: Salamander Caecilian Skink Iguana 1.00 Green Turtle Painted Turtle 1.00 Alligator Caiman 1.00 Cassowary Penguin 1.00 0.99 1.00 1.00 Platypus Echidna 1.00 Dog 3 toed Sloth 1.00 Kangaroo Opossum 1.00 0.79 1.00 1.00 Archosauria Diapsidia Mammalia Amphibia Bayesian inference tree with Posterior probabilities Slide22: Birds Crocodilia Amphibia Turtles Squamates Mammalia Birds Crocodilia Amphibia Turtles Squamates Mammalia Birds Crocodilia Amphibia Mammalia Squamates Turtles Tree 1 Tree 2 Tree 3 Tree1 Tree2 Tree3 -lnL +36.1 +11.7 andlt;bestandgt; KH 0.002 0.044 -- SH 0.003 0.153 -- AU 0.001 0.054 -- NPBP 0.005 0.041 -- SOWH andlt;0.001 andlt;0.001 -- BPP andlt;0.001 andlt;0.001 -- Slide23: Controversies with Bayesian inference 1. Flat prior? It depends on perspective 0.0 0.25 0.5 0.75 Branch-length as the probability of change 0 1 2 3 4 5 Priors for the same branch lengths (under a Jukes-Cantor model) Branch-length as expected substitutions per site Slide24: 2. A flat prior on branch-lengths is not a flat prior on clade probabilities A flat prior across all topologies for the whole tree favours the largest and smallest clades over clades of intermediate taxon inclusion However, the influence on posterior probabilities maybe quite small Pickett and Randle (MPE, 2005) Slide25: 3. Bounds on priors 0.0 100 0.0 100 Branch-length Branch-length Prior Posterior 95% credible interval A zero to infinity 'bound' provides a worst case scenario MLE Slide26: Simmons et al. (MBE, 2004) 4. Bayesian posterior probabilities tend to overestimate clade support (underestimate sampling effects) when the substitution model is misspecified – Which is essentially always when using biological datasets True Slide27: Bayesian inference methods are now being incorporated into phylogenetic analysis of morphological data Right scapula of an echidna (a.) and a marsupial cat (b) in distal view. ap, acromion process; gl, glenoid; if, infraspinous fossa; mc, metacoracoid; sf, supraspinous fossa; sp, scapular spine; vpg, ventral process of glenoid. a. 0 0 1 0 0 b. 0 1 1 1 1 mc fused to scapular glenoid (gl) orientation Supraspinous fossa (sf) ap relative to gl vpg points venteral gl gl if if ap ap mc mc sf sp sp vpg vpg medial medial (b.) (a.) Slide28: The Assumption for Likelihood that sites evolve by common mechanisms is not clear for morphological data The state '1' for one character does not mean the same thing as state '1' for another character Oak ATGACCGCTGCCAG Ash ACGCTCGCCATCAG Maple ATGCTCGCTACCGG Transitions at six sites, only one transversion is observed An ML model would allow for different transition and transversion substitution rates a. 0 0 1 0 0 b. 0 1 1 1 1 mc fused to scapular glenoid (gl) orientation Supraspinous fossa (sf) ap relative to gl vpg points venteral Slide29: Lewis (Syst. Biol., 2001) recognized that ML estimation of branch-lengths could help interpret the nature of morphological change among taxa (e.g. as resulting from shared ancestry or convergence) Mkv model for morphology resembles Jukes-Cantor (JC): All states assumed to have the = equilibrium frequency Changes between all states are equally probable Differences: Multiple (andgt;4) states allowed conditional on no characters being constant across all taxa Slide30: The root of the Australidelphian marsupial tree dasyurids bandicoots diprotodonts S. American marsupials Australasian Marsupiarnivora Slide31: dasyurids bandicoots diprotodonts S. American marsupials syndactylus Non-syndactylus Syndactyla Slide32: Phillips et al. (Syst. Biol, 2006) ML, MP and Bayesian inference on 17,864 nuclear and mitochondrial nucleotide sites Australasian Marsupicarnivora Slide33: Horovitz and Sánchez-Villagra (Cladistics, 2002) Parsimony on 230 morphological characters Syndactyla + Dromiciops Slide34: Mayulestes Pucadelphys Shrew opossum Monito del monte Pygmy possum Sugar glider Ringtail possum Cuscus Brushtail possum Tree kangaroo Pademellon Kangaroo Forest wallaby Wombat Koala Mulgara Quoll Phascogale Marsupial mole Spiny bandicoot Long-nosed bandicoot Virginia opossum Grey Short-tailed opossum Bayesian inference (Mkv-model) on the 230 morphological characters of Horovitz and Sánchez-Villagra (2002) Australasian Marsupicarnivora Slide35: Combined Molecular and Morphological data analysis Allows model-based estimation for molecular sequences to be combined with morphological data (including for fossil taxa) Taxon1 ACGTAAGTC 0000110 Taxon2 ATGGAAATT 1110302 Taxon3 ACATAAATC 1020111 Taxon4 ACGCTAGTC 0010012 Partition 1 (e.g. GTR model) Partition 2 (Mkv model) (ti D12) = (ti) (D1 ti) (D2 ti) (tj) (D1 tj) (D2 tj) T J=1 Slide36: Glenner et al. (Curr. Biol., 2005) Deuterostomes (inc. echinoderms, sea squirts, vertebrates) Ecdysozoa (inc. arthropods, nematodes) (Platyhelminthes, rotifers, acorn worms) Lophotrochozoa (inc. molluscs, annalids) Combined (DNA/morphology) Bayesian analysis of Metazoans Slide37: Bayesian character state reconstruction: Shell hairs on the Trochulus snails Pfenninger et al. (BMC Evol. Biol., 2005) Slide38: Relaxed-clock Bayesian Phylogenetics r = rate (substitutions/site/time); t = node height; branch-length (b) = rt; n = number of taxa A. Clock B. No-clock C. Relaxed-clock t1ri b4 b3 b2 b1 b1 b5 b4 b7 b3 b2 b6 t3rk t4rl t2rj Forces a molecular clock (r is constant) – assumes deviation from this is sampling error. Different rate for each branch – no sampling error is assumed. Rates are influenced by sampling error and vary according to an underlying distribution (e.g. exponential or lognormal). Slide39: A. Clock B. No-clock C. Relaxed-clock t1ri b4 b3 b2 b1 b1 b5 b4 b7 b3 b2 b6 t3rk t4rl t2rj n-1 branch-length parameters 2n-3 branch-length parameters n-1 node heights and 1-2 rate distribution parameters Most sequence datasets have not evolved in a clock-like fashion, and so the assumption of a clock often produces incorrect tree topologies Slide40: By allowing for the influence of sampling error, relaxed-clock models can more accurately infer underlying substitution rates and hence provide greater phylogenetic accuracy. By estimating fewer branch-length parameters, less variance will tend to be associated with relaxed-clock model estimates, thus providing for greater phylogenetic precision than no-clock models However, we need to know how well fitting the rate distribution models are Relaxed-clocks : the holy grail of phylogenetics? Slide41: A. B. Yeasts (106) C. Plants (61) D. Animals (99) E. Primates (500) B. C. D. E. A. Bacteria (102) Drummond et al. (PLoS Biol., 2006) Slide42: Drummond et al. (PLoS Biol., 2006) Relaxed phylogenetics allows co-estimation of phylogeny and divergence timing You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
SK BayesianMethods WoodRock Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 58 Category: News & Reports.. License: All Rights Reserved Like it (0) Dislike it (0) Added: September 17, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Slide1: Lecture 6 Bayesian Inference and Molecular Dating Thomas Bayes Slide2: Bayesian Inference: The explanation with the highest posterior probability Pr(H D) = Pr(H) Pr(D H) Pr(D) 2. Bayes’ Theorem Posterior probability, the probability of the hypothesis given the data Prior probability, the probability of the hypothesis on previous knowledge Likelihood function, probability of the data given the hypothesis Unconditional probability of the data, a normalizing constant ensuring the posterior probabilities sum to 1.00 Pr(A and B) = Pr(A) Pr(B A) = Pr(B) Pr(A B) 1. Definition of conditional probability Slide3: Pr(H D) = Pr(H) Pr(D H) Pr(D) Odds ratios are the simplest usage for Bayes’ theorem Felsenstein’s example: He gives his prior belief in 'martians' Pr(H) = 1/4. A space probe sent to mars has probability of 1/3 of finding martians if they are there. It finds none, so Pr(D H) = 1/3, where H is that martians exist is 1/3 = 1/4 1/3 1 = 1/12 Assuming all the data is correct Slide4: When the prior is a probability distribution, the posterior probability can be given as a probability distribution (often considered a density distribution under a curve) 0 24 Flat prior 0 24 Prior Skewed Prior Parameter value Parameter value Slide5: Influence of the Prior diminishes if the likelihood ratio between hypotheses increases with data collection Posterior pheads 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 10 coin tosses, 3 heads 100 coin tosses, 30 heads 'heads' 'tails' pheads Slide6: First use in phylogenetics: Li (1996, PhD thesis), Rannala and Yang (1996) A useful property of Bayesian inference for phylogenetics is that with flat priors (all hypotheses equal before the data is examined), posterior probabilities for two trees are proportional to their likelihood ratio. Bayesian inference in phylogenetics Tree1 Tree2 Tree3 Tree1 Tree2 Tree3 Tree1 Tree2 Tree3 Prior Posterior Likelihood Slide7: (ti D) = (ti) (D ti) (tj) (D tj) Where (D ti) = b (D ti,b, ) (b, ) dbd T J=1 Bayesian inference in phylogenetic notation Posterior distribution Prior distribution Likelihood distribution Summing over this integral becomes too complex with so many tree/parameter hypotheses ti = Treei = substitution model b = branch-length Slide8: Markov chain Monte Carlo (MCMC) analysis Sampling the MCMC provides a valid approximation for the posterior distribution of trees (over 100,000s – 1,000,000s of generations) - without having to know the denominator Tree 1 Tree 2 Tree 3 Generation: 1 2 3 4 5 6 New state accepted New state rejected Bayesian Posterior Probability for Tree 1 (BPPtree 1)= 4/6 The chain Slide9: Metropolis-Hastings algorithm (the proposal mechanism) 1. Start at a random (or predetermined) tree, Ti 2. Randomly select a tree that is a neighbour in tree space, the proposal tree, Tj 3 2 e.g. 5 4 1 4 2 5 3 1 3. Compute the acceptance ratio R = Pr(Tj) Pr(D Tj ) Pr(Ti) Pr(D Ti ) 4. If R ≥ 1, accept Tj 5. If R andlt; 1, randomly draw from 0-1, if andlt; R, accept Tj 6. Otherwise reject Tj and keep Ti 7. Return to step 2 Slide10: Given an efficient proposal mechanism and sufficient generations, the MCMC with reach an equilibrium distribution The problem of summing across all hypotheses cancels out in the acceptance ratio for Tj / Ti The section of the MCMC that is 'finding its way' towards the equilibrium distribution (the burnin) can be discarded - it is not a valid approximation for the posterior probability Useful properties of MCMC for Bayesian inference Slide11: Metropolis-coupled MCMC (MC3) n chains, of which n-1 are 'heated' such that they can more easily move across peaks and valleys in the landscape of trees. After all n chains have gone one step a swap between randomly chosen chains is proposed (in much the same way as between generations). If accepted the two chains switch states. The cold chain (which could be stuck in a local optimum) can escape when a proposed swap with a hot chain is successful. Slide12: 4100 -- (-15395.901) (-15376.724) [-15367.235] 4200 -- (-15395.854) (-15377.874) [-15368.146] 4300 -- (-15381.440) [-15376.072] (-15368.164) 4400 -- (-15375.695) (-15369.686) [-15369.127] 4500 -- [-15374.458] (-15372.051) (-15370.876) 4600 -- [-15358.899] (-15366.374) (-15365.927) 4700 -- [-15350.419] (-15366.082) (-15366.168) 4800 -- (-15369.614) [-15368.152] (-15369.902) Sampling three simultaneous chains (MrBayes 3.0: Heulsenbeck and Ronquist, 2001) Generation chain 1 chain 2 chain 3 Proposing new trees along individual chains and swapping trees between chains (to more efficiently explore tree space) Slide13: Inference is based only on sampling from the cold chain - for which the acceptance ratio for changes between generations is appropriate for approximating the posterior probability. tree tree_1 [p = 0.455, P = 0.455] (Ostrich,Rhea,(Moa,(Kiwi,(Emu,Cassowary)))) tree tree_2 [p = 0.215, P = 0.670] (Ostrich,(Kiwi,(Moa,Rhea)),(Emu,Cassowary)) tree tree_3 [p = 0.118, P = 0.788] (Ostrich,Moa,(Rhea,(Kiwi,(Emu,Cassowary))) tree tree_4 [p = 0.081, P = 0.869] (Ostrich,(Moa,Rhea),(Kiwi,(Emu,Cassowary))) tree tree_5 [p = 0.050, P = 0.919] (Ostrich,Kiwi,(Moa,(Rhea,(Emu,Cassowary)))) From Phillips, unpub. Ostrich Emu Moa Rhea Kiwi Cassowary Slide14: H1: Anapsida H3: Archosauria H2: Diapsida Amphibia (outgroup) Squamata Aves Crocodilia Mammalia A Return to the Turtle example Slide15: Bayesian inference in MrBayes 3.0 The same data matrix as for the likelihood example (16 taxa, 3110 nucleotides) GTR+I+ model (AIC recommendation in ModelTest) Four MC3 chains (3 are 'hot'): 2,000,000 generations sample every 1000th tree from the 'cold' chain Slide16: +------------------------------------------------------------+ -25419.04 | *****************************************************| | ** | | | | * | | * | | * | | | | | | * | | | | | | | |* | | | | | +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+ -33441.42 ^ ^ 1 10000 Estimated marginal likelihood = -25423.76 (arithmetic mean) = -25451.08 (harmonic mean) Discard the 1st 250,000 generations as the 'burnin' 2,000,000 generations likelihood Equilibrium distribution Slide17: 95% Cred. Interval -------------------- Parameter Mean Variance Lower Upper Median -------------------------------------------------------------- TL 2.513384 0.005489 2.373000 2.663000 2.512000 r(Gandlt;-andgt;T) 1.000000 0.000000 1.000000 1.000000 1.000000 r(Candlt;-andgt;T) 14.895438 2.952511 11.968691 18.597121 14.745678 r(Candlt;-andgt;G) 0.088475 0.004142 0.011433 0.241428 0.074255 r(Aandlt;-andgt;T) 2.489403 0.105833 1.952249 3.220099 2.453488 r(Aandlt;-andgt;G) 8.309381 0.935429 6.684136 10.510313 8.229202 r(Aandlt;-andgt;C) 2.481878 0.108561 1.928505 3.190486 2.453910 pi(A) 0.323330 0.000044 0.310754 0.336207 0.323446 pi(C) 0.215915 0.000035 0.204413 0.227673 0.216025 pi(G) 0.209200 0.000035 0.197702 0.221016 0.209162 pi(T) 0.251555 0.000037 0.239747 0.263308 0.251377 alpha 1.017169 0.015644 0.785668 1.261082 1.013706 pinvar 0.226459 0.000787 0.164235 0.274568 0.229746 -------------------------------------------------------------- Model parameter summaries for the 1750 samples from the MCMC chain Slide18: Unlike likelihood, for which parameters are optimised, Bayesian inference gives a Posterior density distribution for parameter values (integrated over all trees sampled from the chain) 0 0.2 0.4 frequency Proportion of invariant sites 95% credible interval Slide19: Statistic Mean Estimated sample size Sampling analysis (e.g. Tracer 1.0) Rambaut and Drummond All ESS should be at least andgt;100 Slide20: Has the MCMC run for sufficient generations? Have the chains converged on similar likelihood values and swaps being made between each of them? Do different runs of the same analyses converge? Sampling over generation plots reveal that likelihoods have levelled (equilibrium distribution is reached) Estimated sample sizes for all parameters are andgt;100, indicating substantial coverage of parameter space Slide21: Salamander Caecilian Skink Iguana 1.00 Green Turtle Painted Turtle 1.00 Alligator Caiman 1.00 Cassowary Penguin 1.00 0.99 1.00 1.00 Platypus Echidna 1.00 Dog 3 toed Sloth 1.00 Kangaroo Opossum 1.00 0.79 1.00 1.00 Archosauria Diapsidia Mammalia Amphibia Bayesian inference tree with Posterior probabilities Slide22: Birds Crocodilia Amphibia Turtles Squamates Mammalia Birds Crocodilia Amphibia Turtles Squamates Mammalia Birds Crocodilia Amphibia Mammalia Squamates Turtles Tree 1 Tree 2 Tree 3 Tree1 Tree2 Tree3 -lnL +36.1 +11.7 andlt;bestandgt; KH 0.002 0.044 -- SH 0.003 0.153 -- AU 0.001 0.054 -- NPBP 0.005 0.041 -- SOWH andlt;0.001 andlt;0.001 -- BPP andlt;0.001 andlt;0.001 -- Slide23: Controversies with Bayesian inference 1. Flat prior? It depends on perspective 0.0 0.25 0.5 0.75 Branch-length as the probability of change 0 1 2 3 4 5 Priors for the same branch lengths (under a Jukes-Cantor model) Branch-length as expected substitutions per site Slide24: 2. A flat prior on branch-lengths is not a flat prior on clade probabilities A flat prior across all topologies for the whole tree favours the largest and smallest clades over clades of intermediate taxon inclusion However, the influence on posterior probabilities maybe quite small Pickett and Randle (MPE, 2005) Slide25: 3. Bounds on priors 0.0 100 0.0 100 Branch-length Branch-length Prior Posterior 95% credible interval A zero to infinity 'bound' provides a worst case scenario MLE Slide26: Simmons et al. (MBE, 2004) 4. Bayesian posterior probabilities tend to overestimate clade support (underestimate sampling effects) when the substitution model is misspecified – Which is essentially always when using biological datasets True Slide27: Bayesian inference methods are now being incorporated into phylogenetic analysis of morphological data Right scapula of an echidna (a.) and a marsupial cat (b) in distal view. ap, acromion process; gl, glenoid; if, infraspinous fossa; mc, metacoracoid; sf, supraspinous fossa; sp, scapular spine; vpg, ventral process of glenoid. a. 0 0 1 0 0 b. 0 1 1 1 1 mc fused to scapular glenoid (gl) orientation Supraspinous fossa (sf) ap relative to gl vpg points venteral gl gl if if ap ap mc mc sf sp sp vpg vpg medial medial (b.) (a.) Slide28: The Assumption for Likelihood that sites evolve by common mechanisms is not clear for morphological data The state '1' for one character does not mean the same thing as state '1' for another character Oak ATGACCGCTGCCAG Ash ACGCTCGCCATCAG Maple ATGCTCGCTACCGG Transitions at six sites, only one transversion is observed An ML model would allow for different transition and transversion substitution rates a. 0 0 1 0 0 b. 0 1 1 1 1 mc fused to scapular glenoid (gl) orientation Supraspinous fossa (sf) ap relative to gl vpg points venteral Slide29: Lewis (Syst. Biol., 2001) recognized that ML estimation of branch-lengths could help interpret the nature of morphological change among taxa (e.g. as resulting from shared ancestry or convergence) Mkv model for morphology resembles Jukes-Cantor (JC): All states assumed to have the = equilibrium frequency Changes between all states are equally probable Differences: Multiple (andgt;4) states allowed conditional on no characters being constant across all taxa Slide30: The root of the Australidelphian marsupial tree dasyurids bandicoots diprotodonts S. American marsupials Australasian Marsupiarnivora Slide31: dasyurids bandicoots diprotodonts S. American marsupials syndactylus Non-syndactylus Syndactyla Slide32: Phillips et al. (Syst. Biol, 2006) ML, MP and Bayesian inference on 17,864 nuclear and mitochondrial nucleotide sites Australasian Marsupicarnivora Slide33: Horovitz and Sánchez-Villagra (Cladistics, 2002) Parsimony on 230 morphological characters Syndactyla + Dromiciops Slide34: Mayulestes Pucadelphys Shrew opossum Monito del monte Pygmy possum Sugar glider Ringtail possum Cuscus Brushtail possum Tree kangaroo Pademellon Kangaroo Forest wallaby Wombat Koala Mulgara Quoll Phascogale Marsupial mole Spiny bandicoot Long-nosed bandicoot Virginia opossum Grey Short-tailed opossum Bayesian inference (Mkv-model) on the 230 morphological characters of Horovitz and Sánchez-Villagra (2002) Australasian Marsupicarnivora Slide35: Combined Molecular and Morphological data analysis Allows model-based estimation for molecular sequences to be combined with morphological data (including for fossil taxa) Taxon1 ACGTAAGTC 0000110 Taxon2 ATGGAAATT 1110302 Taxon3 ACATAAATC 1020111 Taxon4 ACGCTAGTC 0010012 Partition 1 (e.g. GTR model) Partition 2 (Mkv model) (ti D12) = (ti) (D1 ti) (D2 ti) (tj) (D1 tj) (D2 tj) T J=1 Slide36: Glenner et al. (Curr. Biol., 2005) Deuterostomes (inc. echinoderms, sea squirts, vertebrates) Ecdysozoa (inc. arthropods, nematodes) (Platyhelminthes, rotifers, acorn worms) Lophotrochozoa (inc. molluscs, annalids) Combined (DNA/morphology) Bayesian analysis of Metazoans Slide37: Bayesian character state reconstruction: Shell hairs on the Trochulus snails Pfenninger et al. (BMC Evol. Biol., 2005) Slide38: Relaxed-clock Bayesian Phylogenetics r = rate (substitutions/site/time); t = node height; branch-length (b) = rt; n = number of taxa A. Clock B. No-clock C. Relaxed-clock t1ri b4 b3 b2 b1 b1 b5 b4 b7 b3 b2 b6 t3rk t4rl t2rj Forces a molecular clock (r is constant) – assumes deviation from this is sampling error. Different rate for each branch – no sampling error is assumed. Rates are influenced by sampling error and vary according to an underlying distribution (e.g. exponential or lognormal). Slide39: A. Clock B. No-clock C. Relaxed-clock t1ri b4 b3 b2 b1 b1 b5 b4 b7 b3 b2 b6 t3rk t4rl t2rj n-1 branch-length parameters 2n-3 branch-length parameters n-1 node heights and 1-2 rate distribution parameters Most sequence datasets have not evolved in a clock-like fashion, and so the assumption of a clock often produces incorrect tree topologies Slide40: By allowing for the influence of sampling error, relaxed-clock models can more accurately infer underlying substitution rates and hence provide greater phylogenetic accuracy. By estimating fewer branch-length parameters, less variance will tend to be associated with relaxed-clock model estimates, thus providing for greater phylogenetic precision than no-clock models However, we need to know how well fitting the rate distribution models are Relaxed-clocks : the holy grail of phylogenetics? Slide41: A. B. Yeasts (106) C. Plants (61) D. Animals (99) E. Primates (500) B. C. D. E. A. Bacteria (102) Drummond et al. (PLoS Biol., 2006) Slide42: Drummond et al. (PLoS Biol., 2006) Relaxed phylogenetics allows co-estimation of phylogeny and divergence timing