In silico gene mapping

Presentation Description

For the students of genetics and molecular biology

Presentation Transcript

PowerPoint Presentation: 

In Vivo, In Vitro, In Silico
AGB 706: In Silico Gene Mapping
R. Jayashree, Asst. Prof. (AGB), Veterinary College, Bangalore
Prepared for the AGB706 course

Eukaryotic gene structure:

- Promoter
- TSS (transcription start site)
- 5' UTR, followed by the translation start signal (ATG)
- ORF (exons and introns, i.e. intervening non-coding sequences)
- Translational stop
- 3' UTR signals

Patterns recognised in DNA sequence:

- Identifying genes: regions that code for proteins (exons)
- Identifying signals: promoters, enhancers, start and stop codons, donor and acceptor splice sites, motifs, CpG islands
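As a rough illustration of how one of these signals can be quantified, the sketch below computes the GC fraction and the observed/expected CpG ratio of a sequence. The function name `cpg_stats` is hypothetical and not from the slides. A commonly cited rule of thumb treats a region of at least 200 bp with GC content above 50% and a ratio above 0.6 as a candidate CpG island; real tools apply this over sliding windows, which is left out here.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently: (C * G) / N
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))
    print(cpg_stats("ATATAT"))
```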

PowerPoint Presentation: 

Bioinformatics

PowerPoint Presentation: 

Broad classification of biological databases

Gene Discovery Pipelines:

- DNA
- EST database
- Compare gene expression
- Trait
- Link to marker and position on chromosome
- Fine map
- Find useful variant

Databases of Biological Sequences and Structures:

>BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.
MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSG
DLPENGPGYWGNYKTFHDNAQKMGLKIARLNVEWSRIFPNPLPRPQNFDE
SKQDVTEVEINENELKRLDEYANKDALNHYREIFKDLKSRGLYFILNMYH
WPLPLWLHDPIRVRRGDFTGPSGWLSTRTVYEFARFSAYIAWKFDDLVDE
YSTMNEPNVVGGLGYVGVKSGFPPGYLSFELSRRHMYNIIQAHARAYDGI
KSVSKKPVGIIYANSSFQPLTDKDMEAVEMAENDNRWWFFDAIIRGEITR
GNEKIVRDDLKGRLDWIGVNYYTRTVVKRTEKGYVSLGGYGHGCERNSVS
LAGLPTSDFGWEFFPEGLYDVLTKYWNRYHLYMYVTENGIADDADYQRPY
YLVSHVYQVHRAINSGADVRGYLHWSLADNYEWASGFSMRFGLLKVDYNT
KRLYWRPSALVYREIATNGAITDEIEHLNSVPPVKPLRH

- NCBI: 15,465,325 sequences; 17,089,143,893 nucleotides
- Swiss-Prot: 107,523 sequences; 39,529,396 residues
- PDB: 17,869 structures
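The record above is in FASTA format: a header line starting with ">" followed by sequence lines. A minimal sketch of reading such a record is given below, assuming plain text input; `parse_fasta` is an illustrative helper, not part of any database toolkit, and only the first two sequence lines of the slide's record are used.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line -> sequence string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # join wrapped sequence lines
    return records


example = """>BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.
MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSG
DLPENGPGYWGNYKTFHDNAQKMGLKIARLNVEWSRIFPNPLPRPQNFDE
"""

if __name__ == "__main__":
    for name, seq in parse_fasta(example).items():
        print(name, len(seq))
```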

Bioinformatics tools available free on the Net:

- GENSCAN: predicts complete gene structures in genomic sequences, including exons and introns.
- Translate tool (ExPASy): translates a nucleotide (DNA/RNA) sequence into a protein sequence.
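To make the translation step concrete, here is a minimal sketch of what a translate tool does for a single forward reading frame, assuming the standard genetic code. It is not the ExPASy implementation; the names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate DNA or RNA in frame 1; '*' is a stop, 'X' an unknown codon."""
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):  # trailing partial codon is ignored
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGTTCATGTCCTATGTGATAG"))
```

A full tool would also report the other two forward frames and the three frames of the reverse complement.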

GenBank:

The GenBank sequence database is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations. It is maintained by the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database Collaboration (INSDC).

PowerPoint Presentation: 

EMBL

PowerPoint Presentation: 

DNA Data Bank of Japan (DDBJ)

Sequence profiling tool:

A sequence profiling tool is software that presents information related to a genetic sequence, gene name, or keyword input. Such tools generally take a query, such as a DNA, RNA, or protein sequence or a keyword, and search one or more databases for information related to it. For example, SEQUEROME can be accessed freely.

Approaches to gene finding:

- Finding ORFs
- Homology/similarity (evolutionary basis)
- Content-based method, ab initio (based on distinct nucleotide features in coding regions)
- Signal-based method
- Combination of ab initio and homology
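The first approach, finding ORFs, can be sketched as a scan of the three forward reading frames for an ATG followed by an in-frame stop codon. This is a simplified illustration only: it ignores the reverse strand and introns, and the name `find_orfs` and the `min_codons` cutoff are assumptions made for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the index of the A in ATG, end is exclusive and includes the
    stop codon, frame is 0, 1 or 2. min_codons counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    seq = "CCATGAAATTTGGGTAACC"
    for start, end, frame in find_orfs(seq):
        print(frame, seq[start:end])
```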

Homology Based:

- Direct comparison of genomic sequence with databases of expressed sequence tags using BLASTN (BLAST: Basic Local Alignment Search Tool) and AAT
- Comparison with protein sequence databases using BLASTX
- Spliced alignment of a genomic sequence with a homologous protein using PROCRUSTES

Homology Based (Contd):

- Comparison of predicted peptides derived from GENSCAN with protein sequence databases.
- Comparison of a translated genomic sequence with a translated genomic or cDNA sequence using TBLASTX.
- Comparison of gene sequences of closely related species (human vs. mouse) using BLAST and ClustalW to identify conserved regions.

Homology based gene prediction:

Alignment of EST/cDNA/mRNA sequences against the genomic sequence (the most reliable way of gene prediction). E.g., whole-genome shotgun (WGS) contigs of the cattle genome assembly (more than 2 million in silico SNPs).

Global and local alignment: 

- Global: aligns the entire sequences, using as many characters as possible, up to both ends of each sequence.
- Local: finds the stretches of sequence with the highest density of matches.
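The difference between the two can be seen by scoring the same pair of sequences both ways. The sketch below fills a Needleman-Wunsch (global) and a Smith-Waterman (local) matrix side by side and returns only the best scores. The match, mismatch and gap values are arbitrary assumptions chosen for the example, and no traceback is performed.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    glob = [[0] * cols for _ in range(rows)]
    loc = [[0] * cols for _ in range(rows)]
    # Global alignment pays for gaps at the sequence ends; local does not.
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap
    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            # Local alignment may restart anywhere, so scores never drop below 0.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])
    return glob[-1][-1], best_local


if __name__ == "__main__":
    print(alignment_scores("ACGT", "ACGT"))
    print(alignment_scores("ACGT", "GGGGACGTGGGG"))
```

In the second call the short sequence sits inside the longer one, so the local score stays high while the global score is dragged down by the end gaps.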

Pairwise alignment tools -sim: 


Multiple sequence alignment: 


Clustal W:

1. Compute the pairwise alignments for all sequences against all sequences.
2. Convert the sequence similarity matrix values to distance measures, reflecting the evolutionary distance between each pair of sequences.
3. Construct a guide tree that sets the order in which pairs of sequences are aligned and combined with previous alignments.
4. Progressively align the sequences/alignments at each branch point of the guide tree, starting with the least distant pairs of sequences.
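The first steps of this procedure can be sketched on toy data. The example below is a deliberate simplification, not the Clustal W algorithm itself: it assumes equal-length sequences and uses ungapped percent identity instead of real pairwise alignments, and it only ranks pairs by distance rather than building a full guide tree. All function names are illustrative.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions (assumes equal-length toy sequences)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def pairwise_distances(seqs):
    """All-against-all comparison, with similarity converted to distance."""
    return {
        (p, q): 1.0 - identity(seqs[p], seqs[q])
        for p, q in combinations(sorted(seqs), 2)
    }


def joining_order(seqs):
    """Pairs sorted from least to most distant; the closest pair aligns first."""
    distances = pairwise_distances(seqs)
    return sorted(distances, key=distances.get)


if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTGTACCC"}
    for pair in joining_order(toy):
        print(pair, pairwise_distances(toy)[pair])
```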

T-Coffee:

T-Coffee (Tree-based Consistency Objective Function for Alignment Evaluation) performs multiple sequence alignment based on both local and global alignments.

Ab initio or statistical approach:

Two broad categories of sequence patterns/features:

- Content features (e.g. exon length, GC content)
- Signal features (e.g. ATG, splice sites)

Content Based Method:

- Coding measure: a number correlated with protein-coding function.
- Composition measure: the frequency of each nucleotide at each codon position.
- Fourier method: looks at the correlation between letters (frequency of the bases).
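A composition measure of this kind can be sketched as follows, assuming the input is already in the reading frame of interest. The function name `codon_position_composition` is hypothetical. A strong skew between the three positions is the sort of content feature a gene finder would score.

```python
from collections import Counter


def codon_position_composition(dna):
    """Return three dicts with the A/C/G/T frequencies at codon positions 1-3."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop any trailing partial codon
    result = []
    for pos in range(3):
        counts = Counter(dna[pos:usable:3])  # every third base from this offset
        total = sum(counts.values())
        result.append(
            {base: (counts.get(base, 0) / total if total else 0.0)
             for base in "ACGT"}
        )
    return result


if __name__ == "__main__":
    for position, freqs in enumerate(codon_position_composition("ATGGCCGCG"), 1):
        print(position, freqs)
```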

Signal Based method:

Looks for short sequences in and around a protein-coding region that are involved in gene transcription and in post-transcriptional modification.

1. Start (ATG) and stop signals (TAA, TAG, TGA)

5' - TAGATGTTCATGTCCTATGTGATAGATC - 3'
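The start and stop signals in the example sequence can be located with a simple scan. The sketch below reports every occurrence regardless of reading frame, which is why signal detection alone over-predicts and is normally combined with other evidence. `find_signals` is an illustrative name.

```python
import re


def find_signals(dna):
    """Return (start positions, [(position, stop codon), ...]), 0-based."""
    dna = dna.upper()
    # Lookaheads let overlapping occurrences be reported as well.
    starts = [m.start() for m in re.finditer(r"(?=ATG)", dna)]
    stops = [(m.start(), m.group(1))
             for m in re.finditer(r"(?=(TAA|TAG|TGA))", dna)]
    return starts, stops


if __name__ == "__main__":
    starts, stops = find_signals("TAGATGTTCATGTCCTATGTGATAGATC")
    print("ATG at:", starts)     # [3, 9, 16]
    print("stops at:", stops)    # [(0, 'TAG'), (19, 'TGA'), (22, 'TAG')]
```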

Splice sites:

2. Conserved intronic signals

- Donor splice site (5' splice site): AG / GTRAGT
- Acceptor site (3' splice site), pyrimidine-rich: YTTYYYYYYNCAG / G
- Less conserved branch site: YRYYRY

5' - exon1 | GT ... A (branch site, YRYYRY) ... AG | exon2 - 3', with intron1 spanning GT to AG
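Consensus strings such as these use IUPAC codes (R = A or G, Y = C or T, N = any base). One way to search for them, shown here as a sketch with hypothetical helper names, is to expand the consensus into a regular expression. Exact matching is an assumption made for brevity; practical gene finders score sites with position weight matrices because real sites deviate from the consensus.

```python
import re

# Only the IUPAC codes that appear in the slide's consensus strings.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Expand an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[ch] for ch in consensus.upper())


def find_sites(dna, consensus):
    """Return 0-based start positions of exact consensus matches."""
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    # Donor site: exon ends in AG, intron begins with GTRAGT.
    print(find_sites("CCAGGTAAGTCC", "AGGTRAGT"))
```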

Poly(A) signal:

3. Conserved poly(A) signal: 5' - AATAAA - 3'

- 90% of mRNAs have a perfect copy of this signal (yeast, rice, Arabidopsis, fly, mouse and human).
- 'C' is frequent on the upstream side and 'T' or 'C' on the downstream side.
- 'U'-rich sequence upstream of the poly(A) signal.

Integration of the scores:

Different features are integrated into a single score, e.g. (splice site + codon bias + hexamer frequency). Genes are predicted … Finding the highest score for a given sequence, leading to exon assembly and single-transcript prediction.
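A toy version of this integration step is sketched below, assuming a plain weighted sum. The weights, feature values and candidate names are invented purely for illustration; actual predictors typically combine evidence through probabilistic models rather than hand-set weights.

```python
def combined_score(features, weights):
    """Weighted sum of feature scores for one candidate exon."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Return the name of the candidate with the highest combined score."""
    return max(candidates, key=lambda name: combined_score(candidates[name], weights))


# Hypothetical weights and feature scores, chosen only for the example.
WEIGHTS = {"splice_site": 1.0, "codon_bias": 2.0, "hexamer": 0.5}
CANDIDATES = {
    "exon_a": {"splice_site": 0.9, "codon_bias": 0.2, "hexamer": 0.4},
    "exon_b": {"splice_site": 0.5, "codon_bias": 0.8, "hexamer": 0.6},
}

if __name__ == "__main__":
    for name, feats in CANDIDATES.items():
        print(name, round(combined_score(feats, WEIGHTS), 3))
    print("best:", best_candidate(CANDIDATES, WEIGHTS))
```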

Factors influencing gene prediction:

- Deviations in genetic code
- Overlapping genes
- Genes within genes
- Horizontal gene transfers

Addresses of databases:

Database               Address
EMBL / Nucleic acid    http://www/ebi.ac.uk/ebi db/ebi/toopembl.html
EMBL SRS               http://www/ebi.ac.uk/srs/srsc
GenBank                http://www.ncbi.nlm.nih.gov/Web/Search/index.html
GenBank Entrez         http://www3.ncbi.nlm.nih.gov/Enterz/index.html
PIR                    http://www.gdb.org/Dan/proteins/pir.html
PUMA                   http://www.mcs.anl.gov/home/compbio/PUMA/Production/puma.html
RDP                    http://rdp.life.uiuc.edu/
rRNA server            http://rrna.uia.ac.be/
SwissProt              http://www.ebi.ac.uk/ebi docs/swissprot db/swisshome.html

PowerPoint Presentation: 

HMG CoA Reductase – Sequence

PowerPoint Presentation: 

FASTA format of HMG CoA reductase

PowerPoint Presentation: 

BLAST searches for HMG CoA reductase

PowerPoint Presentation: 

Protein Model by SwissPDB Server

Future Challenges for Gene Prediction:

- Greater understanding of CpG islands and methylation patterns
- Understanding the alternative splicing process
- Predicting short and very long exons more accurately
- Identifying non-translated exons
- Predicting insulators and boundary elements, and matrix-attachment and scaffold-attachment regions
- Predicting replication origins and recombination hot spots

PowerPoint Presentation: 

Gene identification in silico, Dr. Nita Parekh, International Institute of Information Technology, Hyderabad.

Thank you