logging in or signing up Comp Gen Phylo HMM Dabby Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 644 Category: Product Traini.. License: All Rights Reserved Like it (0) Dislike it (0) Added: September 25, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Phylogenetic Hidden Markov Models: A Tool for Comparative Genomics: Phylogenetic Hidden Markov Models: A Tool for Comparative Genomics COMP571 Fall 2006 Comparative Genomics: Comparative Genomics Comparing genetic material from different species to understand evolution, conservation, gene function, disease inheritance, etc. A 'feature' that is present/conserved across species may not be detected by looking at a single genome What Is Compared?: What Is Compared? Gene location (physical location on the chromosome) Gene structure (numbers and lengths of the various 'components') Gene characteristics (codon usage, conserved synteny, etc.) Today there is great interest in analyzing and understanding conserved non-coding sequences (CNS) Slide4: Regions of the human and mouse homologous genes: Coding exons (white), noncoding exons (gray}, introns (dark gray), and intergenic regions (black). Corresponding strong (white) and weak (gray) alignment regions of GLASS are shown connected with arrows. Dark lines connecting the alignment regions denote very weak or no alignment. The predicted coding regions of ROSETTA in human, and the corresponding regins in mouse, are shown (white) between the genes and the alignment regions. From 'Human and Mouse Gene Structure: Comparative Analysis and Application to Exon Prediction', Batzoglou et al., Genome Research. Slide5: Regions of conserved synteny between Mmu 16 and the human genome. The analysis was done at the protein level with the MUMmmer program; each line represents a pair of orthologous genes present in the mouse and human. Cyto, cytogenetic markers; SCF, scaffold distribution; and Ori, orientation of scaffolds. From 'A Comparison of Whole-genome Shotgun- Derived Mouse Chromosome 16 and the Human Genome', Mural et al., Science. Combining Phylogeny and HMMs: Combining Phylogeny and HMMs Molecular evolution can be viewed as a combination of two Markov processes One that operates in the dimension of space (along a genome) One that operates in the dimension of time (along the branches of a phylogenetic tree) Phylo-HMMs model this combination A Brief History of Phylo-HMMs: A Brief History of Phylo-HMMs First proposed as a way of improving phylogenetic modeling that allow for variation among sites in the rate of substitution (Felsensteinandamp;Churchill, 1996; Yang, 1995) Adapted for the problem of secondary structure prediction (Goldman et al., 1996; Thorne et al., 1996) Later used for recombination detection (Husmeierandamp;Wright, 2001) Recently, there has been a wide revival of interest in these models in connection with the explosion in the availability of comparative sequence data Phylo-HMMs: Phylo-HMMs A phylo-HMM is a probabilistic machine that generates a multiple alignment, column by column, such that each column is defined by a phylogenetic model Unlike single-sequence HMMs, the emission probabilities of phylo-HMMs are complex distributions defined by phylogenetic models Slide9: Single-sequence HMM Phylo-HMM Phylogenetic models (1): Phylogenetic models (1) Stochastic process of substitution that operates independently at each site in a genome A character is first drawn at random from the background distribution and assigned to the root of the tree; character substitutions then occur randomly along the tree branches, from root to leaves The characters at the leaves define an alignment column Phylogenetic Models (2): Phylogenetic Models (2) The different phylogenetic models associated with the states of a phylo-HMM may reflect different overall rates of substitution (as in conserved and non-conserved regions), different patterns of substitution or background distributions (as in coding and non-coding regions), or even different tree topologies (as with recombination) Phylo-HMMs: Formal Definition: Phylo-HMMs: Formal Definition A phylo-HMM is a 4-tuple : : set of states : set of associated phylogenetic models : transition probabilities : initial-state probabilities The Phylogenetic Model (1) : The Phylogenetic Model (1) : : substitution rate matrix : background frequencies : binary tree : branch lengths The Phylogenetic Model (2): The Phylogenetic Model (2) The model is defined with respect to an alphabet whose size is denoted d The substitution rate matrix has dimension dxd The background frequencies vector has dimension d The tree has n leaves, corresponding to n extant taxa The branch lengths are associated with the tree Probability of the Data: Probability of the Data Let X be an alignment consisting of L columns and n rows, with the ith column denoted Xi The probability that column Xi is emitted by state sj is simply the probability of Xi under the corresponding phylogenetic model, This is the likelihood of the column given the tree, which can be computed efficiently using Felsenstein’s 'pruning' algorithm (which we described earlier) Substitution Probabilities (1): Substitution Probabilities (1) Felsenstein’s algorithm requires the conditional probabilities of substitution for all bases a,b and branch lengths tj The probability of substitution of a base b for a base a along a branch of length t, denoted is based on a continuous-time Markov model of substitution, defined by the rate matrix Qj Substitution Probabilities (2): Substitution Probabilities (2) In particular, for any given non-negative value t, the conditional probabilities for all a,b are given the dxd matrix , where Example: HKY model: Example: HKY model represents the transition/transversion rate ratio for The - symbols indicate quantities required to make each row sum to zero Paths in Phylo-HMMs (1): Paths in Phylo-HMMs (1) A 'path' through the phylo-HMM is a sequence of states such that The joint probability of a path and and alignment is Paths in Phylo-HMMs (2): Paths in Phylo-HMMs (2) The likelihood is given by the sum over all paths The maximum-likelihood path is Computing the Probabilities: Computing the Probabilities The likelihood can be computed efficiently using the forward algorithm The maximum-likelihood path can be computed efficiently using the Viterbi algorithm The forward and backward algorithms can be combined to compute the posterior probability Higher-order Markov Models for Emissions : Higher-order Markov Models for Emissions It is common with gene-finding HMMs to condition the emission probability of each observation on the observations that immediately precede it in the sequence For example, in a 3-rd-codon-position state, the emission of a base xi='A' might have a fairly high probability if the previous two bases are xi-2='G' and xi-1='A' (GAA=Glu), but should have zero probability if the previous two bases are xi-2='T' and xi-1='A' (TAA=stop) Higher-order Markov Models for Emission: Higher-order Markov Models for Emission Considering the N observations preceding each xi corresponds to using an Nth order Markov model for emissions An Nth order model for emissions is typically parameterized in terms of (N+1)-tuples of observations, and conditional probabilities are computed as Nth Order Phylo-HMMs: Nth Order Phylo-HMMs Sum over all possible alignment columns Y (can be calculated efficiently by a slight modification of Felsenstein’s 'pruning' algorithm Probability of the N-tuple You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
Comp Gen Phylo HMM Dabby Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 644 Category: Product Traini.. License: All Rights Reserved Like it (0) Dislike it (0) Added: September 25, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Phylogenetic Hidden Markov Models: A Tool for Comparative Genomics: Phylogenetic Hidden Markov Models: A Tool for Comparative Genomics COMP571 Fall 2006 Comparative Genomics: Comparative Genomics Comparing genetic material from different species to understand evolution, conservation, gene function, disease inheritance, etc. A 'feature' that is present/conserved across species may not be detected by looking at a single genome What Is Compared?: What Is Compared? Gene location (physical location on the chromosome) Gene structure (numbers and lengths of the various 'components') Gene characteristics (codon usage, conserved synteny, etc.) Today there is great interest in analyzing and understanding conserved non-coding sequences (CNS) Slide4: Regions of the human and mouse homologous genes: Coding exons (white), noncoding exons (gray}, introns (dark gray), and intergenic regions (black). Corresponding strong (white) and weak (gray) alignment regions of GLASS are shown connected with arrows. Dark lines connecting the alignment regions denote very weak or no alignment. The predicted coding regions of ROSETTA in human, and the corresponding regins in mouse, are shown (white) between the genes and the alignment regions. From 'Human and Mouse Gene Structure: Comparative Analysis and Application to Exon Prediction', Batzoglou et al., Genome Research. Slide5: Regions of conserved synteny between Mmu 16 and the human genome. The analysis was done at the protein level with the MUMmmer program; each line represents a pair of orthologous genes present in the mouse and human. Cyto, cytogenetic markers; SCF, scaffold distribution; and Ori, orientation of scaffolds. From 'A Comparison of Whole-genome Shotgun- Derived Mouse Chromosome 16 and the Human Genome', Mural et al., Science. Combining Phylogeny and HMMs: Combining Phylogeny and HMMs Molecular evolution can be viewed as a combination of two Markov processes One that operates in the dimension of space (along a genome) One that operates in the dimension of time (along the branches of a phylogenetic tree) Phylo-HMMs model this combination A Brief History of Phylo-HMMs: A Brief History of Phylo-HMMs First proposed as a way of improving phylogenetic modeling that allow for variation among sites in the rate of substitution (Felsensteinandamp;Churchill, 1996; Yang, 1995) Adapted for the problem of secondary structure prediction (Goldman et al., 1996; Thorne et al., 1996) Later used for recombination detection (Husmeierandamp;Wright, 2001) Recently, there has been a wide revival of interest in these models in connection with the explosion in the availability of comparative sequence data Phylo-HMMs: Phylo-HMMs A phylo-HMM is a probabilistic machine that generates a multiple alignment, column by column, such that each column is defined by a phylogenetic model Unlike single-sequence HMMs, the emission probabilities of phylo-HMMs are complex distributions defined by phylogenetic models Slide9: Single-sequence HMM Phylo-HMM Phylogenetic models (1): Phylogenetic models (1) Stochastic process of substitution that operates independently at each site in a genome A character is first drawn at random from the background distribution and assigned to the root of the tree; character substitutions then occur randomly along the tree branches, from root to leaves The characters at the leaves define an alignment column Phylogenetic Models (2): Phylogenetic Models (2) The different phylogenetic models associated with the states of a phylo-HMM may reflect different overall rates of substitution (as in conserved and non-conserved regions), different patterns of substitution or background distributions (as in coding and non-coding regions), or even different tree topologies (as with recombination) Phylo-HMMs: Formal Definition: Phylo-HMMs: Formal Definition A phylo-HMM is a 4-tuple : : set of states : set of associated phylogenetic models : transition probabilities : initial-state probabilities The Phylogenetic Model (1) : The Phylogenetic Model (1) : : substitution rate matrix : background frequencies : binary tree : branch lengths The Phylogenetic Model (2): The Phylogenetic Model (2) The model is defined with respect to an alphabet whose size is denoted d The substitution rate matrix has dimension dxd The background frequencies vector has dimension d The tree has n leaves, corresponding to n extant taxa The branch lengths are associated with the tree Probability of the Data: Probability of the Data Let X be an alignment consisting of L columns and n rows, with the ith column denoted Xi The probability that column Xi is emitted by state sj is simply the probability of Xi under the corresponding phylogenetic model, This is the likelihood of the column given the tree, which can be computed efficiently using Felsenstein’s 'pruning' algorithm (which we described earlier) Substitution Probabilities (1): Substitution Probabilities (1) Felsenstein’s algorithm requires the conditional probabilities of substitution for all bases a,b and branch lengths tj The probability of substitution of a base b for a base a along a branch of length t, denoted is based on a continuous-time Markov model of substitution, defined by the rate matrix Qj Substitution Probabilities (2): Substitution Probabilities (2) In particular, for any given non-negative value t, the conditional probabilities for all a,b are given the dxd matrix , where Example: HKY model: Example: HKY model represents the transition/transversion rate ratio for The - symbols indicate quantities required to make each row sum to zero Paths in Phylo-HMMs (1): Paths in Phylo-HMMs (1) A 'path' through the phylo-HMM is a sequence of states such that The joint probability of a path and and alignment is Paths in Phylo-HMMs (2): Paths in Phylo-HMMs (2) The likelihood is given by the sum over all paths The maximum-likelihood path is Computing the Probabilities: Computing the Probabilities The likelihood can be computed efficiently using the forward algorithm The maximum-likelihood path can be computed efficiently using the Viterbi algorithm The forward and backward algorithms can be combined to compute the posterior probability Higher-order Markov Models for Emissions : Higher-order Markov Models for Emissions It is common with gene-finding HMMs to condition the emission probability of each observation on the observations that immediately precede it in the sequence For example, in a 3-rd-codon-position state, the emission of a base xi='A' might have a fairly high probability if the previous two bases are xi-2='G' and xi-1='A' (GAA=Glu), but should have zero probability if the previous two bases are xi-2='T' and xi-1='A' (TAA=stop) Higher-order Markov Models for Emission: Higher-order Markov Models for Emission Considering the N observations preceding each xi corresponds to using an Nth order Markov model for emissions An Nth order model for emissions is typically parameterized in terms of (N+1)-tuples of observations, and conditional probabilities are computed as Nth Order Phylo-HMMs: Nth Order Phylo-HMMs Sum over all possible alignment columns Y (can be calculated efficiently by a slight modification of Felsenstein’s 'pruning' algorithm Probability of the N-tuple