Efficient Parameterized Algorithm for Biopolymer Structure-Sequence Alignment: Efficient Parameterized Algorithm for Biopolymer Structure-Sequence Alignment by Christian Ehrlich
Song1, C. Liu1, X. Huang2, R.L. Malmberg3, Y. Xu4, and L.Cai1
1 Dept. of Computer Science, Univ. of Georgia, USA
2 Dept. of Computer Science, Arkansas State Univ., USA
3 Dept. of Plant Bio., Univ. of Georgia, USA
4 Dept. of Biochemistry and Molecular Bio., Univ. of Georgia, USA
WABI2005, LNBI3692, pp. 376-388, 2005.
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Structure-sequence alignment http://legionella.cu-genome.org http://www.sanger.ac.uk
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Motivation
non-coding (nc)RNA
- gene regulation
- chromosome replication
- RNA modification
Why not simply search for the consensus sequence?
structure search is way more accurate !!!
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Existing methods for structure-sequence alignment
Integer Linear Programming (ILP) (Lenhof, Reinert, Vingron; 1998)
huge number of linear constrains
Divide-and-conquer based on ‘open links’ (Xu, Xu, Uberbacher; 1998)
only useable for very small RNAs
Standard DP extended to handle pseudoknots (Umura et al.; 1999)
memory overhead, computational intractable
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Main concept (Song et al.; 2005) 1 2 3 4 5 6 RNA structure 1 2 3 4 5 6 C N Structure graph a b c d e f Genomic sequence N C a b c d e f Sequence graph Tree-decomposition Dynamic programming Struc-Seq aligned!
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Constructing the structure graph H 1 2 3 4 5 6 RNA consensus structure Obtain RNA consensus structure from multiple alignment or DB (e.g. Rfam) Identify base-pairing regions Each base-pairing region is modeled as vertex.
Each structural interaction is modeled as undirected edge.
Each loop is modeled as directed edge. N- and C-terminus are modeled as source and sink, respectively.. done!
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Constructing the sequence graph G Genomic sequence Identify candidate region in the sequence (take the k best) For each set of candidates construct a structure graph Connect all pairing candidate regions Add N- and C-terminus done!
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Tree-decomposition 1 2 3 4 5 6 C N Structure graph H 1 2 3 1 2 6 1 6 5 N 1 C 4 5 6 3 4 5 2 3 4 Tree-decomposition T ‘all‘ tree properties are conserved! u v
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Dynamic programming approach (bottom up) 1 2 6 1 6 5 4 5 6 Tree-decomposition g h f A mapping is valid if
for any set of overlapping
candidates, at most one can
be selected in the subgraph and 1. 2. d1 d2 d3
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Objective function S1 - the alignment between a structural unit u and its candidate
S2 - the alignment between the interaction of two structural units (u,v)
and the interaction of the coresponding candidates f(u),f(v)
S3 - the alignment between a loop and the coresponding candidate loop Dynamic programming
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Sensitivity/Selectivity
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Performance
Take this massage home!: Take this massage home! Main concept 1 2 3 4 5 6 RNA structure 1 2 3 4 5 6 C N Structure graph a b c d e f Genomic sequence N C a b c d e f Sequence graph Tree-decomposition Dynamic programming Struc-Seq aligned!
Take this massage home!: Take this massage home! Why use struc.-seq. alignment based on tree-decomposition?
Sensitivity and selectivity comparable to other struc-seq
alignment models (e.g. CMs)
Speed-up of about 30 to 40 times over other methods
Applicable in large sequences (e.g. virus/bacteria genome)
Handling of pseudoknot-structures is very easy
Introduction Graph generation Alignment Results: Introduction Graph generation Alignment Results Referenzes Yinglei Song et al.: Efficient Parameterized Algorithm for Biopolymer Structure-Sequence Alignment. WABI2005, LNBI3692, pp. 376-388, 2005.
Yinglei Song et al.: Tree Decomposition Based Fast Search of RNA Structures Including Pseudoknots in Genomes. unpublished
Yinglei Song et al.: A Tree-Decomposition Approach to Protein Structure Prediction. unpublished
H.L. Bodlaende: A Tourist Guide through Treewidth. Acta Cybernetica, 11:1–21, 1193.
A. Ehrenfeucht et al.: An O(n2) Divide-and-Concer Algorithm for the Prime Tree Decomposition of Two-Structures and Modular Decomposition of Graphs. Journal of Algorithms 16, 283-294 (1994)
Any questions?: Any questions?