Beyond the Human Genome: Transcriptomics
Dr Jen Taylor, Henry Wellcome Centre for Gene Function, Bioinformatics, Department of Statistics, taylor@stats.ox.ac.uk
Added: June 19, 2007. License: All Rights Reserved.
Slide2: Beyond the Human Genome:
Image: Commemorative stained glass window for F.C. Crick, designed by Maria McClafferty (Photograph: Paul Forster), Gonville & Caius College, Cambridge, UK.
- 1995: Human Genome sequencing begins in earnest ('Mapping the Book of Life')
- 1999: Human Genome = approx. 140,000 genes
- 2000: First Draft Human Genome = 30,000 – 40,000 genes ??
- 2003: Essential Completion Human Genome = 24,195 genes !!!???
Slide3:
Image: Commemorative stained glass window for F.C. Crick, designed by Maria McClafferty (Photograph: Paul Forster), Gonville & Caius College, Cambridge, UK.
Beyond the Human Genome: Gene Number ≠ Complexity
Slide4:
- Introduction: The scope of transcriptomics – a definition of the transcriptome
- Part I: Observing the transcriptome
  - Experimental methodology
  - Data analysis
- Part II: Using the transcriptome
  - The regulation of the transcriptome
  - The transcriptome and the genome
  - The transcriptome and the proteome
- Beyond the Human Transcriptome
Slide5: Transcriptome:
'transcriptome, the mRNAs expressed by a genome at any given time..' (Abbott, 1999)
Central Dogma of Molecular Biology:
Image: Access Excellence, National Institutes of Health
- mRNA – single-stranded RNA molecule
- Complementary to DNA
- Processed (spliced and polyadenylated) RNA transcript
- Carries the sequence of a gene out of the nucleus into the cytoplasm, where it can be translated into a protein
Transcriptome: An evolving definition:
- (the population of) mRNAs expressed by a genome at any given time (Abbott, 1999)
- The complete collection of transcribed elements of the genome (Affymetrix, 2004)
  - mRNAs: 35,913 transcripts (including alternatively spliced variants)
  - Non-coding RNAs
    - tRNAs (497 genes)
    - rRNAs (243 genes)
    - snmRNAs (small non-messenger RNAs)
    - microRNAs and siRNAs (small interfering RNAs)
    - snoRNAs (small nucleolar RNAs)
    - snRNAs (small nuclear RNAs)
  - Pseudogenes (~2,000)
The human transcriptome:
- Kampa et al., Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Research 2004
  - High-density oligonucleotide arrays across 11 different cell lines
  - ~70% of transcripts non-coding
  - ~79-88% have multiple transcripts
- Kapranov et al., 2002
  - ~90% of transcribed nucleotides outside annotated exons
- The dimensions of the unique transcriptome?? >>> current 40,000 estimate
Transcriptomics:
Scope
- the population of functional RNA transcripts
- the mechanisms that regulate the production of RNA transcripts
- dynamics of the transcriptome (time, cell type, genotype, external stimuli)
Definition
- The study of the characteristics and regulation of the functional RNA transcript population of a cell, cells or organism at a specific time.
Slide10: (outline repeated from Slide4)
Observing the transcriptome:
- High-throughput friendly
- Context dependent and dynamic
- Regulatory network
- Predicts biology
- [Diagram labels: Transcriptome, Genome, Proteome] (Li et al., 2004)
Slide12: Publications: Expression Profiling vs Proteomics (data from PubMed)
Observing the transcriptome?:
Classic Human Transcriptome Profiling Studies: Transcriptome reflects Biology
- Golub et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999.
  - ALL – acute lymphoblastic leukemia
  - AML – acute myeloid leukemia
- Scherf et al., A gene expression database for the molecular pharmacology of cancer. Nature Genetics 2000
  - 60 human cancer cell lines
Observing the transcriptome:
Focussed Experimental Approaches:
- Northern blotting analysis
- Real-time PCR (quantitative or semi-quantitative)
High-throughput Approaches:
- Closed System Profiling: Microarray expression profiling
- Open System Profiling: Serial analysis of gene expression (SAGE); Massively Parallel Signature Sequencing (MPSS)
Slide15:
- Limit of detection: 1 in 30,000 transcripts, ~20 transcripts/cell
- Red – increase of Cy5 sample transcripts
- Green – increase of Cy3 sample transcripts
- Yellow – equal abundance
Experimental overview:
Slide17: (repeats Slide15)
Platforms and Formats:
- Isotope: Nylon – cDNA (300-900 nt)
- Two-colour: Glass – cDNA or oligo (80 nt); 500 – 11,000 elements
- Affymetrix: Silicon – oligo (20 nt); 22,000 elements
- Tissue Arrays: Glass – tissue discs (20-150)
Slide19: Affymetrix GeneChip®
- Limits: 1 in 100,000 transcripts, ~5 transcripts/cell
Slide20: http://www.affymetrix.com
Affymetrix: Gene Expression Arrays (Transcripts/Genes)
- Arabidopsis Genome: 24,000
- C. elegans Genome: 22,500
- Drosophila Genome: 18,500
- E. coli Genome: 20,366
- Human Genome U133 Plus: 47,000
- Mouse Genome: 39,000
- Yeast Genome: 5,841 (S. cerevisiae) & 5,031 (S. pombe)
- Rat Genome: 30,000
- Zebrafish: 14,900
- Plasmodium/Anopheles: 4,300 (P. falciparum) & 14,900 (A. gambiae)
- Barley (25,500), Soybean (37,500 + 23,300 pathogen), Grape (15,700)
- Canine (21,700), Bovine (23,000)
- B. subtilis (5,000), S. aureus (3,300 ORFs), Xenopus (14,400)
Microarray and GeneChip Approaches:
Advantages:
- Rapid
- Method and data analysis well described and supported
- Robust
- Convenient for directed and focussed studies
Disadvantages:
- Closed system approach
- Difficult to correlate with absolute transcript number
- Sensitive to alternative splicing ambiguities
Serial Analysis of Gene Expression (SAGE):
The principles (Velculescu et al., Science 1995):
- A transcript (known or novel) can be recognised by a small subset (e.g. 14) of its nucleotides – a tag
- Linking tags allows for rapid sequencing
- Open system for transcript profiling
- [Diagram: a 14 nt TAG is taken from each polyadenylated transcript (AAAAAAAAA – 3’); the tags are linked and sequenced, e.g. AGCTTGAACCGTGACATCATGGCCATTGGCCCCAATTGAGACAGTGAGTTCAATGC]
Modified SAGE methods:
- LongSAGE (21 nt)
- SAGE-lite, micro-SAGE, mini-SAGE
- RASL/DASL methods (5’ and 3’ tags)
SAGE:
Advantages:
- Potential ‘open’ system method – new transcripts can be identified
- Accuracy of unambiguous transcript observation
- Digital output of data
- Quantitative and qualitative information
Disadvantages:
- Characterising novel transcripts from short tag sequences is often computationally difficult
- Tag specificity (tag length recently increased to 21 bp)
- Length of tags can vary (restriction enzyme activity varies with temperature)
- A subset of transcripts does not contain the enzyme recognition sequence
- Sensitive to a subset of alternative splice variants
Slide25: (outline repeated from Slide4)
Slide26: Biological question; Biological verification and interpretation; Microarray experiment; Experimental design; Platform Choice; Image analysis; Normalization;
Clustering; Pattern Discovery; Sample Attributes; 16-bit TIFF Files; (Rspot, Rbkg), (Gspot, Gbkg); Data Mining; Classification; Statistical Analysis
Analysis:
- Liver: 47,000 x 2 x 2 datapoints = 188,000
- Brain: 47,000 x 2 x 2 datapoints = 188,000
- Lymphocyte: 47,000 x 2 x 2 datapoints = 188,000
Analysis:
Essential problem: given a large dataset with technical and biological noise, find:
- A) Transcripts: patterns (common themes or differences); measures of robustness or some idea of uncertainty
- B) Sample: similarities or differences between samples on a global/multi-gene level
Analysis:
- Which transcripts are different?
- What are the patterns?
- [Figure labels: Liver, Brain, Lymphocytes]
Biologist's Nightmare: Statistician's Playground:
Characteristics of the expression profiling data:
- High dimensionality: sample number (n) low and observation number (p) high
- Non-independence of observations
- Complex patterns: visualisation and extraction
- Incorporation of contextual information
- Standardisation and data sharing
- Integration of, and with, other data types
Analysis Methods:
- Classical parametric & non-parametric statistical tests for hypothesis testing
- Unsupervised clustering algorithms: hierarchical clustering, K-means and Self-Organising Maps
- Classification, e.g. machine learning and linear discriminant analysis
- Dimensionality reduction or Principal Component Analysis, e.g. Gene Shaving and Multi-dimensional Scaling
- Probabilistic modelling: Dynamic Bayesian Networks, Markov Models
Analysis Methods: Classical Parametric Statistical Analysis:
[Figure labels: Liver, Brain, Lymphocyte]
Tools:
- t-test
- ANOVA
- Mann-Whitney U test
- Fold change
Analysis Methods: Classical Parametric Statistical Analysis:
Difficulties:
- Assumes that observations are normally distributed and independent
- ‘Statistical significance’ does not equal biological significance
- Appropriate multiple testing corrections are difficult ???
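A minimal sketch of why the multiple-testing point is hard, assuming one p-value per transcript is already available. It reproduces the arithmetic the slides give (P = 0.01 across 20,000 transcripts is about 200 chance hits) and shows two standard corrections, Bonferroni and Benjamini-Hochberg. Neither correction is named in the slides; the function names and the example p-values are hypothetical and chosen only for illustration.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing the threshold when no transcript truly differs."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of the tests rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    # Rank the tests from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value sits under the BH line fdr * rank / m.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    # The slide's numbers: 20,000 transcripts tested at P = 0.01.
    print(expected_false_positives(20_000, 0.01))
    print(bonferroni_threshold(20_000, 0.01))

    # Hypothetical p-values for ten transcripts (illustration only).
    example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
                 0.060, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example_p, fdr=0.05))
```

Bonferroni controls the chance of any false positive and becomes very strict at 20,000 tests, whereas Benjamini-Hochberg controls the expected fraction of false positives among the transcripts called significant. Which trade-off is appropriate depends on the study.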
Multiple testing example: (P = 0.01) x 20,000 transcripts = 200 transcripts expected to pass the threshold by chance alone
Analysis Methods:
Algorithms:
- Hierarchical clustering
- K-means clustering
- Self-organising maps
Clustering Approaches:
- Divide or group genes/samples into groups ('clusters'), based on similarities and differences
- Number of groups is user defined
Distance Metrics:
Distance between 2 expression vectors:
- Euclidean: 4.2, 1.4
- Pearson (r x -1): -1.00, -0.90
- [Plot axis: Time]
Distance Metric:
- [Figure labels: Transcription Factor Transcript, Target Transcript 1, Target Transcript 2]
Hierarchical Clustering:
- g1 is most like g8
- g4 is most like {g1, g8}
Hierarchical Tree:
Clustering: Case Study:
- Sorlie et al., 2001
- Breast tissue subtypes
- Hierarchical clustering
Slide40: K-means clustering
- Partition or centroid algorithms
- Step 1: User specifies K clusters (K = 3)
- [Plot axes: Brain Expression Level, Liver Expression Level; centroids marked x]
Slide41: K-means clustering (K = 3)
- Step 2: Using Euclidean distance, nearest points are assigned to clusters (k)
- Step 3: New centroids calculated
Slide42: K-means clustering (K = 3)
- Step 4: Points re-assigned to nearest centroid
- Step 5: New centroids calculated
Slide43: Classification (adapted from Florian Markowetz)
- K-nearest neighbour methods (KNN)
- Linear Discriminant Analysis (LDA)
- Machine Learning: Support Vector Machines; Neural Network Analysis
- [Plot axes: Transcript A, Transcript B]
Slide44: Classification
- Training Set: 2/3 sample set
- Test Set: 1/3 sample set
- Define Classification Rule
- Linear Discriminant Analysis; KNN
- [Plot axes: Gene A, Gene B]
Slide45: Classification: more complex classifiers (adapted from Florian Markowetz)
- KNN – voting scheme (k = 3): use the three closest points to classify
- [Plot axes: Gene A, Gene B]
Slide46: Probabilistic Modelling
- Incorporate dependencies and prior knowledge into the identification of patterns/clusters:
  - relationships in time between samples
  - relationships between genes
- Handle measures of uncertainty well
- Conceptually simple, consideration needed on implementation
- Markov modelling
- Dynamic Bayesian networks
Analysis Methods:
- Classical parametric & non-parametric statistical tests for hypothesis testing
- Unsupervised clustering algorithms: hierarchical clustering, K-means and Self-Organising Maps
- Classification: machine learning and linear discriminant analysis
- Dimensionality reduction or Principal Component Analysis: Gene Shaving and Multi-dimensional Scaling
- Probabilistic modelling: Dynamic Bayesian Networks and pattern recognition, Markov Models
Slide48: (outline repeated from Slide4, with 'Data analysis' shown as 'Data curation and analysis pipelines')
Slide49: … to be continued.
Slide50: (outline repeated from Slide48)
Regulation of Gene Expression:
Abundance (transcript) = Rate of Transcription – Rate of Decay
Transcription:
- Protein/DNA interactions
- cis and trans regulatory sequence motifs
- chromatin structure
- Methylation
Decay:
- Protein/RNA interactions
- cis-acting regulatory motifs
- secondary structure
Regulation of Transcription: (Wray et al., 2003)
Slide53: Regulation of Decay
- Stabilisation – facilitates rapid increase in potential protein production
- Destabilisation – facilitates precise time and dose control of transcripts
- Sequence-mediated mRNA decay – AU-rich elements (AREs)
  - 3’ UTR, 50 – 150 nucleotides
  - usually multiple copies (e.g. AUUUA x 5)
  - protein recruitment for destabilisation
  - size and content variation (functionally critical motif unknown)
- >30% of vertebrate homologous mRNAs have highly conserved elements in the 3’ UTR, often in both sequence & position
- [Plots: Abundance against Time, labelled 'Stable' and 'Decay']
Slide54: The importance of the decay process
- BMP2 (bone morphogenetic protein 2): developmentally critical, highly conserved protein in vertebrates (Fritz et al., 2004)
  - 3’ UTR conservation: 73% / 100 nucleotides over 450 myr of evolution; 95% within mammals
- Cancer-related genes: c-fos, c-myc, c-jun, MMP-13, Cyclooxygenase-2, Cyclin D, Cyclin E, Cyclins A and B, Cdk inhibitors, DNA methyltransferase 1 ... (Review: Audic and Hartley, 2004)
Regulation of Transcription: (Wray et al., 2003)
Regulation of Transcription:
- Diverse orientations, structure and functional properties of regulatory modules (Wray et al., 2003)
Regulation of the transcriptome:
- Finding regulatory elements using co-abundant transcripts
- Assumption: shared abundance profile = same cluster = shared regulatory machinery
- Pennacchio and Rubin, 2001
Slide58: (outline repeated from Slide4)
The transcriptome & the genome:
- Using the genome to infer/observe the transcriptome: construction of whole genome/transcriptome arrays and SAGE tags
- Using sequence features to predict gene expression: Beer and Tavazoie. Predicting gene expression from sequence. Cell 2004
- Using chromatin structure to predict regulation of gene expression: Sabo et al. Genome-wide identification of DNaseI hypersensitive sites. PNAS 2004
- Quantitative trait loci mapping:
  - Morley et al., Genetic analysis of genome-wide variation in human gene expression. Nature 2004
  - Schadt et al., Genetics of gene expression surveyed in mouse, human and maize. Nature 2003
Transcriptome & Genome: (Beer and Tavazoie, Cell 2004)
- [Figure labels: Abundance profile; Transcription factor binding site]
- Predict potential gene expression patterns
Transcriptome & Genome: (Beer and Tavazoie, Cell 2004)
- [Panel titles: AND Logic; OR Logic, NOT Logic; AND Logic, OR Logic]
- Combinatorial patterns help identify groups of transcripts predicted to show similar abundance profiles
- Solid: actual expression; Dashed: predicted
Slide62: (outline repeated from Slide4)
The transcriptome & the proteome:
- Functional annotations of co-abundant genes
- Yang et al., 2003. Decay rates of human mRNAs: Correlation with functional characteristics and sequence attributes. Genome Research.
  - Co-ordinated patterns of decay rates within functional classes of transcripts
  - Transcription factor functional classes have 'fast-decaying' mRNAs (<2 hr half-lives)
  - Transcripts of multi-subunit proteins have correlated decay patterns and rates
The transcriptome & the proteome: Do they agree?
- Studies of direct correlation between mRNA abundance and protein abundance (r = 0.6) (Hegde et al., 2003)
- Biological issues:
  - Post-translational modifications
  - Protein stability and folding
  - Alternative splicing products
- Technical issues:
  - Inter-platform variability (microarray and RT-PCR: r = 0.8)
  - Protein abundance measures – 2D gel electrophoresis
The transcriptome & the proteome:
- The integration of transcriptomics and proteomics (Hegde et al., 2003)
- Synergistic approaches to biological problems using both transcriptomics and proteomics
Beyond the Human Transcriptome:
Challenges for the Future (short and long term):
- Integration of different data types: sequence, exon structure, transcript abundance, protein abundance and function
- Dealing with alternative splice variants
- The regulatory processes behind any given RNA abundance
- Dealing with gene ontologies in a quantitative manner
Beyond the Human Transcriptome:
Future Directions:
- ‘Open’ systems for comprehensively cataloguing the transcriptome
  - between tissues/cells/developmental time points
  - between individuals
- Variation of the transcriptome between individuals
  - coding variants, epigenetic variation and inheritance
- Clinical deployment of transcriptome profiling approaches in diagnostics and pharmacogenetics
- Human Regulatory Network
- Resources for Tissues
Acknowledgements:
- Oxford Centre for Gene Function
- Jotun Hein, Chris Holmes, Gerton Lunter, Lizhong Hao, Ben Holtom, Karen Lees
http://www.stats.ox.ac.uk/~taylor/Presentations
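As a sketch of the 'Do they agree?' comparison, the direct correlation between mRNA and protein abundance (reported in the slides as r = 0.6) can be computed as a Pearson coefficient over paired measurements. The abundance values below are hypothetical and are not taken from Hegde et al.; the function name is likewise an assumption for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical log2 abundances for six genes (illustration only).
    mrna_log2 = [5.1, 7.3, 6.0, 9.2, 4.4, 8.1]
    protein_log2 = [4.0, 6.9, 7.1, 8.0, 5.2, 6.3]
    print(f"r = {pearson_r(mrna_log2, protein_log2):.2f}")
```

The same coefficient, multiplied by -1, is the Pearson distance listed under Distance Metrics, so two profiles that rise and fall together get a distance near -1.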
taylor Arkwright26 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 991 Category: News & Reports.. License: All Rights Reserved Like it (0) Dislike it (0) Added: June 19, 2007 This Presentation is Public Favorites: 1 Presentation Description No description available. Comments Posting comment... By: vmani83 (31 month(s) ago) Hello , hw ru ? This ppt is very nice to understand human transcriptome for my studies..plz send to my mail ID: vmani83@gmail.com. Thanking you Saving..... Post Reply Close Saving..... Edit Comment Close Premium member Presentation Transcript Beyond the Human Genome:Transcriptomics: Beyond the Human Genome: Transcriptomics Dr Jen Taylor Henry Wellcome Centre for Gene Function Bioinformatics Department of Statistics taylor@stats.ox.ac.uk Slide2: Commemorative stained glass window for F.C. Crick, designed by Maria McClafferty.(Photograph: Paul Forster) Gonville andamp; Caius College, Cambridge, UK. Beyond the Human Genome: 1995 Human Genome sequencing begins in earnest 'Mapping the Book of Life' 1999 Human Genome 2000 - First Draft Human Genome 2003 - Essential Completion Human Genome = approx 140, 000 genes = 30, 000 – 40,000 genes ?? = 24, 195 genes !!!??? Slide3: Commemorative stained glass window for F.C. Crick, designed by Maria McClafferty.(Photograph: Paul Forster) Gonville andamp; Caius College, Cambridge, UK. 
Beyond the Human Genome: Gene Number ≠ Complexity Gene Slide4: Introduction: The scope of transcriptomics – a definition of the transcriptome Part I: Observing the transcriptome Experimental methodology Data analysis Part II: Using the transcriptome The regulation of the trancriptome The transcriptome and the genome The transcriptome and the proteome Beyond the Human Transcriptome Slide5: Transcriptome: 'transcriptome, the mRNAs expressed by a genome at any given time..' (Abbott, 1999) Central Dogma of Molecular Biology: Central Dogma of Molecular Biology Image: Access Excellence, National Institutes of Heath mRNA – single stranded RNA molecule Complementary to DNA Processed (spliced and polyadenylated) RNA transcript Carries the sequence of a gene out of the nucleus into the cytoplasm where it can be translated into a protein structure Transcriptome: An evolving definition: Transcriptome: An evolving definition (the population of) mRNAs expressed by a genome at any given time (Abbott, 1999) The complete collection of transcribed elements of the genome. (Affymetrix, 2004) mRNAs: 35, 913 transcripts (including alternative spliced variants) Non-coding RNAs tRNAs (497 genes) rRNAs (243 genes) snmRNAs (small non-messenger RNAs) microRNAs and siRNAs (small interferring RNAs) snoRNAs (small nucleolar RNAs) snRNAs (small nuclear RNAs) Pseudogenes (~ 2,000) The human transcriptome: The human transcriptome Kampa et al., Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Research. 2004 High density oligonucleotide arrays across 11 different cell lines ~ 70% of transcripts non-coding ~79-88% have multiple transcripts Kapranov et al., 2002 ~ 90% of transcribed nucleotides outside annotated exons Nucleotides The dimensions of the unique transcriptome?? andgt;andgt;andgt; current 40,000 estimate Transcriptomics: Transcriptomics Scope the population of functional RNA transcripts. 
the mechanisms that regulate the production of RNA transcripts dynamics of the trancriptome (time, cell type, genotype, external stimuli) Definition The study of characteristics and regulation of the functional RNA transcript population of a cell/s or organism at a specific time. Slide10: Introduction: The scope of transcriptomics – a definition of the transcriptome Part I: Observing the transcriptome Experimental methodology Data analysis Part II: Using the transcriptome The regulation of the trancriptome The transcriptome and the genome The transcriptome and the proteome Beyond the Human Transcriptome Observing the transcriptome: Observing the transcriptome High-throughput friendly Context dependent and dynamic Regulatory network Predicts Biology Transcriptome Genome Proteome **Li et al., 2004 ** Slide12: Data from PubMed Publications: Expression Profiling vs Proteomics Observing the transcriptome?: Observing the transcriptome? Classic Human Transcriptome Profiling Studies: Trancriptome reflects Biology Golub et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999. ALL – acute lymphoblastic leukemia AML – acute myeloid leukemia Scherf et al., A gene expression database for the molecular pharmacology of cancer. 
Nature Genetics 2000 60 human cancer cell lines Observing the transcriptome: Observing the transcriptome Focussed Experimental Approaches: Northern Blotting Analysis Real time PCR (quantitative or semi-quantitative) Highthroughput Approaches: Closed System Profiling: Microarray expression profiling Open System Profiling: Serial analysis of gene expression (SAGE) Massively Parallel Signature Sequencing (MPSS) Slide15: Limit of Detection: 1 in 30,000 transcripts ~ 20 transcripts/cell Red – increase of Cy5 sample transcripts Green – increase of Cy3 sample transcripts Yellow – equal abundance Experimental overview:: Experimental overview: Slide17: Limit of Detection: 1 in 30,000 transcripts ~ 20 transcripts/cell Red – increase of Cy5 sample transcripts Green – increase of Cy3 sample transcripts Yellow – equal abundance Platforms and Formats: Platforms and Formats Isotope Nylon – cDNA (300-900 nt) Two-colour Glass cDNA or Oligo (80 nt) 500 – 11,000 elements Affymetrix Silicone – oligo (20 nt) 22 ,000 elements Tissue Arrays Glass Tissue Discs (20-150) Slide19: Affymetrix GeneChip® Affymetrix GeneChip® Limits: 1: 100,000 transcripts ~ 5 transcripts/cell Slide20: http://www.affymetrix.com Affymetrix:: Affymetrix: Gene Expression Arrays Transcripts/Genes Arabidopsis Genome 24,000 C. elegans Genome 22,500 Drosophila Genome 18, 500 E. coli Genome 20, 366 Human Genome U133 Plus 47,000 Mouse Genome 39, 000 Yeast Genome 5, 841 (S. cerevisiae) andamp; 5, 031 (S. pombe) Rat Genome 30, 000 Zebrafish 14, 900 Plasmodium/Anopheles 4,300 (P. falciparum) andamp; 14,900 (A. gambiae) Barley (25,500), Soybean (37,500 + 23,300 pathogen), Grape (15,700) Canine (21,700), Bovine (23,000) B.subtilis (5,000), S. 
aureus (3,300 ORFS), Xenopus (14, 400) Microarray and GeneChip Approaches: Microarray and GeneChip Approaches Advantages: Rapid Method and data analysis well described and supported Robust Convenient for directed and focussed studies Disadvantages: Closed system approach Difficult to correlate with absolute transcript number Sensitive to alternative splicing ambiguities Serial Analysis of Gene Expression (SAGE): Serial Analysis of Gene Expression (SAGE) The principles: Velculescu et al., Science 1995 A transcript (new or novel) can be recognised by a small subset (e.g. 14) of its nucleotides – a tag Linking tags allows for rapid sequencing. Open system for transcript profiling AAAAAAAAA – 3’ TAG AAAAAAAAA – 3’ TAG AAAAAAAAA – 3’ TAG 14 nt TAG TAG TAG AAAAAAAAA – 3’ TAG TAG Sequence AGCTTGAACCGTGACATCATGGCCATTGGCCCCAATTGAGACAGTGAGTTCAATGC Modified SAGE methods LongSAGE (21 nt) SAGE-lite, micro-SAGE, mini-SAGE RASL/DASL methods (5’ and 3’ Tags) SAGE: SAGE Advantages: Potential ‘open’ system method – new transcripts can be identified Accuracy of unambiguous transcript observation Digital output of data Quantitative and qualitative information Disadvantages: Characterising novel transcripts is often computationally difficult from short tag sequences Tag specificity (recently increased length to 21 bp) Length of tags can vary (RE enzyme activity variable with temperature) A subset of transcripts do not contain enzyme recognition sequence Sensitive to a subset of alternative splice variants Slide25: Introduction: The scope of transcriptomics – a definition of the transcriptome Part I: Observing the transcriptome Experimental methodology Data analysis Part II: Using the transcriptome The regulation of the trancriptome The transcriptome and the genome The transcriptome and the proteome Beyond the Human Transcriptome Slide26: Biological question Biological verification and interpretation Microarray experiment Experimental design Platform Choice Image analysis Normalization 
Clustering Pattern Discovery Sample Attributes 16-bit TIFF Files (Rspot, Rbkg), (Gspot, Gbkg) Data Mining Classification Statistical Analysis Analysis: Analysis 47,000 x 2 x 2 datapoints 47,000 x 2 x 2 datapoints Liver Brain 47,000 x 2 x 2 datapoints Lymphocyte 188, 000 188, 000 188, 000 Analysis: Analysis Essential problem: Given a large dataset with technical and biological noise: Find: A) Transcripts: patterns (common themes or differences) measures of robustness or some idea of uncertainty B) Sample: similarities or differences between samples on global/multi-gene level Analysis: Analysis Which transcripts are different? What are the patterns? Liver Brain Lymphocytes Biologists Nightmare: Statisticians Playground: Biologists Nightmare: Statisticians Playground Characteristics of the expression profiling data: High dimensionality Sample number (n) low and observation number high (p) Non-independence of observations Complex patterns: visualisation and extraction Incorporation of contextual information Standardisation and data sharing Integration of andamp; with other data types Analysis Methods: Analysis Methods Classical parametric andamp; non-parametric statistical tests for hypothesis testing Unsupervised clustering algorithms Hierarchical clustering Kmeans and Self-Organising Maps Classification e.g. Machine learning and Linear discriminant analysis Dimensionality Reduction or Principal Component Analysis e.g. Gene Shaving and Multi-dimensional Scaling Probabilistic Modelling Dynamic Bayesian Networks Markov Models Analysis Methods: Classical Parametric Statistical Analysis: Liver Brain Lymphocyte Analysis Methods Tools: T-test ANOVA Mann Whitney U Test Fold Change Analysis Methods: Classical Parametric Statistical Analysis: Analysis Methods Difficulties Assumes that observations are normally distributed and independent ‘Statistical significance’ does not equal biological significance Appropriate multiple testing corrections are difficult ??? 
(P=0.01) 20,000 transcripts = 200 transcripts Analysis Methods: Analysis Methods Algorithms: Hierarchical clustering Kmeans clustering Self organising maps Clustering Approaches: Divides or groups genes/samples into groups 'clusters', based on similarities and differences Number of groups is user defined Distance Metrics: Distance Metrics Euclidean Pearson(r*-1) Distance between 2 expression vectors 4.2 1.4 -1.00 -0.90 Time Distance Metric: Distance Metric Transcription Factor Transcript Target Transcript 1 Target Transcript 2 Hierarchical Clustering: Hierarchical Clustering g1 is most like g8 g4 is most like {g1, g8} Hierarchical Tree: Hierarchical Tree Clustering: Case Study: Clustering: Case Study Sorlie et al., 2001 Breast tissue subtypes Hierarchical clustering Slide40: K-means clustering Partition or centroid algorithms Step 1: User specifies K clusters K = 3 x x x Brain Expression Level Liver Expression Level Slide41: Step 2 – Using Euclidean distance nearest points assigned to clusters (k) K = 3 x x x K-means clustering Step 3 – New centroids calculated Slide42: K = 3 K-means clustering Step 4 – Points re-assigned to nearest centroid Step 5 – New centroids calculated Slide43: Classification Adapted from Florian Markowetz Transcript A Transcript B K-nearest neighbour methods (KNN) Linear Discriminant Analysis (LDA) Machine Learning: Support Vector Machines Neural Network Analysis Slide44: Classification Training Set 2/3 sample set Test Set 1/3 sample set Define Classification Rule Gene A Gene B Linear Discriminant Analysis KNN Slide45: Classification More complex classifiers Adapted from Florian Markowetz Gene A Gene B KNN – Voting scheme – (k=3) Use three closest points to classify Slide46: Probabilistic Modelling Incorporate dependencies and prior knowledge into the identification of patterns/clusters: - relationships in time between samples - relationships between genes Handle measures of uncertainty well Conceptually simple, consideration needed on 
implementation
- Markov modelling
- Dynamic Bayesian networks

Analysis Methods:
- Classical parametric & non-parametric statistical tests for hypothesis testing
- Unsupervised clustering algorithms: Hierarchical clustering, K-means and Self-Organising Maps
- Classification: Machine learning and Linear discriminant analysis
- Dimensionality Reduction or Principal Component Analysis: Gene Shaving and Multi-dimensional Scaling
- Probabilistic Modelling: Dynamic Bayesian Networks and Pattern recognition, Markov Models

Slide48:
Introduction: The scope of transcriptomics - a definition of the transcriptome
Part I: Observing the transcriptome
- Experimental methodology
- Data curation and analysis pipelines
Part II: Using the transcriptome
- The regulation of the transcriptome
- The transcriptome and the genome
- The transcriptome and the proteome
Beyond the Human Transcriptome

Slide49: ... to be continued.

Slide50:
Introduction: The scope of transcriptomics - a definition of the transcriptome
Part I: Observing the transcriptome
- Experimental methodology
- Data curation and analysis pipelines
Part II: Using the transcriptome
- The regulation of the transcriptome
- The transcriptome and the genome
- The transcriptome and the proteome
Beyond the Human Transcriptome

Regulation of Gene Expression:
Abundance (transcript) = Rate of Transcription - Rate of Decay
Transcription:
- Protein/DNA interactions
- cis and trans regulatory sequence motifs
- chromatin structure
- Methylation
Decay:
- Protein/RNA interactions
- cis-acting regulatory motifs
- secondary structure

Regulation of Transcription:
Wray et al., 2003

Slide53: Regulation of Decay
- Stabilisation: facilitates rapid increase in potential protein production
- Destabilisation: facilitates precise time and dose control of transcripts
Sequence-mediated mRNA decay: AU-rich elements (AREs)
- 3' UTR, 50 - 150 nucleotides
- usually multiple copies (e.g.
AUUUA x 5)
- protein recruitment for destabilisation
- size and content variation (functionally critical motif unknown)
>30% of vertebrate homologous mRNAs have highly conserved elements in the 3' UTR, often in both sequence & position
[Figure: two panels of Abundance against Time, labelled Stable and Decay]

Slide54: The importance of the decay process
BMP2 (bone morphogenetic protein 2): developmentally critical, highly conserved protein in vertebrates (Fritz et al., 2004)
3' UTR conservation:
- 73% /100 nucleotides, 450 myr evolution
- 95% within mammals
Cancer related genes: C-fos, C-myc, C-jun, MMP-13, Cyclooxygenase-2, Cyclin D, Cyclin E, Cyclins A and B, Cdk inhibitors, DNA methyltransferase 1 ... (Review: Audic and Hartley, 2004)

Regulation of Transcription:
Wray et al., 2003

Regulation of Transcription:
Diverse orientations, structure and functional properties of regulatory modules (Wray et al., 2003)

Regulation of the transcriptome:
Finding regulatory elements using co-abundant transcripts
Assumption: shared abundance profile = same cluster = shared regulatory machinery
Pennacchio and Rubin, 2001

Slide58:
Introduction: The scope of transcriptomics - a definition of the transcriptome
Part I: Observing the transcriptome
- Experimental methodology
- Data analysis
Part II: Using the transcriptome
- The regulation of the transcriptome
- The transcriptome and the genome
- The transcriptome and the proteome
Beyond the Human Transcriptome

The transcriptome & the genome:
- Using the genome to infer/observe the transcriptome: construction of whole genome/transcriptome arrays and SAGE tags
- Using sequence features to predict gene expression: Beer and Tavazoie. Predicting gene expression from sequence. Cell 2004
- Using chromatin structure to predict regulation of gene expression: Sabo et al. Genome-wide identification of DNaseI hypersensitive sites.
PNAS 2004
- Quantitative trait loci mapping:
  Morley et al., Genetic analysis of genome-wide variation in human gene expression. Nature 2004
  Schadt et al., Genetics of gene expression surveyed in mouse, human and maize. Nature 2003

Transcriptome & Genome: (Beer and Tavazoie, Cell. 2004)
[Figure labels: Abundance profile; Predict potential gene expression patterns; Transcription factor binding site]

Transcriptome & Genome: (Beer and Tavazoie, Cell. 2004)
[Figure panels: AND Logic; OR Logic, NOT Logic; AND Logic, OR Logic]
Combinatorial patterns help identify groups of transcripts predicted to show similar abundance profiles
Solid: Actual expression; Dashed: Predicted

Slide62:
Introduction: The scope of transcriptomics - a definition of the transcriptome
Part I: Observing the transcriptome
- Experimental methodology
- Data analysis
Part II: Using the transcriptome
- The regulation of the transcriptome
- The transcriptome and the genome
- The transcriptome and the proteome
Beyond the Human Transcriptome

The transcriptome & the proteome:
Functional annotations of co-abundant genes
Yang et al., 2003. Decay rates of human mRNAs: Correlation with functional characteristics and sequence attributes. Genome Research.
- Co-ordinated patterns of decay rates within functional classes of transcripts
- Transcription factor functional classes have 'fast-decaying' mRNAs (<2 hr half lives)
- Transcripts of multi-subunit proteins have correlated decay patterns and rates

The transcriptome & the proteome:
Do they agree?
Studies of direct correlation between mRNA abundance and protein abundances (r = 0.6) (Hegde et al., 2003)
Biological Issues:
- Post-translational modifications
- Protein stability and folding
- Alternative splicing products
Technical Issues:
- Inter-platform variability (microarray and RT PCR: r = 0.8)
- Protein abundance measures: 2D gel electrophoresis

The transcriptome & the proteome:
The integration of transcriptomics and proteomics (Hegde et al., 2003)
Synergistic approaches to biological problems using both transcriptomics and proteomics

Beyond the Human Transcriptome:
Challenges for the Future (short and long term):
- Integration of different datatypes: sequence, exon structure, transcript abundance, protein abundance and function
- Dealing with alternative splice variants
- The regulatory processes behind any given RNA abundance
- Dealing with gene ontologies in a quantitative manner

Beyond the Human Transcriptome:
Future Directions:
- 'Open' systems for comprehensively cataloguing the transcriptome: between tissues/cells/developmental time points; between individuals
- Variation of transcriptome between individuals: coding variants, epigenetic variation and inheritance
- Clinical deployment of transcriptome profiling approaches in diagnostics and pharmacogenetics
- Human Regulatory Network Resources for Tissues

Acknowledgements:
Oxford Centre for Gene Function
Jotun Hein, Chris Holmes, Gerton Lunter, Lizhong Hao, Ben Holtom, Karen Lees
http://www.stats.ox.ac.uk/~taylor/Presentations
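Supplementary sketch: the K-means procedure on Slides 40 to 42 (choose K, assign each point to the nearest centroid by Euclidean distance, recalculate the centroids, repeat) can be written out as below. This is a minimal pure-Python illustration, not the implementation used for the presentation. The function names, the toy (liver, brain) expression values and the choice of starting centroids are all assumptions made for the example.

```python
"""Minimal K-means sketch following Steps 1 to 5 (all names are hypothetical)."""
import math


def euclidean(a, b):
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Steps 2 and 4: label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Steps 3 and 5: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            # An empty cluster keeps its previous centroid.
            new_centroids.append(old)
    return new_centroids


def kmeans(points, initial_centroids, max_iter=100):
    """Step 1 is the caller's choice of K starting centroids; then iterate to stability."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    # Toy (liver, brain) expression levels for six transcripts; values are invented.
    data = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (9.0, 1.0), (8.8, 1.2)]
    labels, centroids = kmeans(data, [data[0], data[2], data[4]])  # K = 3
    print(labels)
    print(centroids)
```

Passing the starting centroids explicitly keeps the run reproducible. In practice the starting points are often chosen at random and the procedure is repeated several times, because K-means can settle on different clusterings from different starts.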