Oxford 0410 1 Arundel0
Added: June 19, 2007
Presentation Transcript

Slide1: Ontologies for Informatics. Infrastructure for Systems Biology. Oxford, October 19 2004

Slide2: To provide structured controlled vocabularies for the representation of biological knowledge in biological databases.

Slide3: Manifesto of Liberation Bioinformatics
Be open source
Use open standards
Make data & code available without constraint
Involve your community

Slide4: Gene Ontology - 1998
FlyBase - Drosophila - Cambridge, EBI, Harvard, Berkeley & Bloomington.
SGD - Saccharomyces - Stanford.
MGI - Mus - Jackson Labs., Bar Harbor.

Gene Ontology - 2004:
Fruitfly - FlyBase
Budding yeast - Saccharomyces Genome Database (SGD)
Mouse - Mouse Genome Database (MGD & GXD)
Rat - Rat Genome Database (RGD)
Weed - The Arabidopsis Information Resource (TAIR)
Worm - WormBase
Dictyostelium discoideum - Dictybase
InterPro/UniProt at EBI - InterPro
Fission yeast - Pombase
Human - UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
Parasites - Plasmodium, Trypanosoma, Leishmania - GeneDB - Sanger
Microbes - Vibrio, Shewanella, B. anthracis, … - TIGR
Grasses - rice & maize - Gramene database
Zebrafish - ZFIN
Coming: Xenopus, Chlamydomonas, Tetrahymena, Gallus & more.

GO Three (Orthogonal) Ontologies:
Biological Process - goal or objective within cell, tissue ..
Molecular Function - elemental activity or task
Cellular Component - location or complex

Slide7: Content of GO
molecular function    7422 terms
biological process    8972 terms
cellular component    1472 terms
all                   17,866 terms
definitions           16,600 (93%)

What is the least complex data structure that is sufficient?:
What data structure to use?
Key word list?
Hierarchical tree?
Directed acyclic graph?
Other?

Directed Acyclic Graph:
[Figure panels: tree; directed acyclic graph]

Classes of parent-child relationship:
ISA (hypernymy/hyponymy), as in: an elephant is a mammal
PARTOF (meronymy/holonymy), as in: a trunk is part of an elephant
REGULATES, as in: regulation of carbohydrate metabolism regulates carbohydrate metabolism

Slide11: Structure of the Ontologies
ISA (%)   PARTOF (<)
Cellular component
%membrane
%vacuolar membrane
%nuclear membrane
%intracellular
%cell
<cytoplasm
<vacuole
<vacuolar membrane
<vacuolar lumen
<nucleus
<nuclear membrane

Slide12: GO terms are defined & have unique IDs
term: chloroplast
go_id: GO:0009507
definition: A chlorophyll-containing plastid with thylakoids organized into grana and frets, or stroma thylakoids, and embedded in a stroma.
definition_reference: ISBN:0471245208

term: ketone catabolism
go_id: GO:0042182
definition: The breakdown into simpler components of ketones, a class of organic compounds that contain the carbonyl group, CO, and in which the carbonyl group is bonded only to carbon atoms. The general formula for a ketone is RCOR', where R and R' are alkyl or aryl groups.
definition_reference: GO:curators

Annotation of GO terms to gene products:
literature curation:
Inferred from Mutant Phenotype
Inferred from Direct Assay
Inferred from Genetic Interaction
Inferred from Physical Interaction
Inferred from Expression Pattern
Traceable Author Statement
Non-traceable Author Statement
'homologies':
Inferred from Sequence Similarity
computed annotation:
Inferred from Electronic Annotation

Slide14: GO Gene Association Tables
Herpes viruses
Vibrio cholerae, B. anthracis, Coxiella burnetii, Pseudomonas syringae, Shewanella oneidensis …
Dictyostelium discoideum
Saccharomyces cerevisiae, Schizosaccharomyces pombe
Trypanosoma brucei, Leishmania major, Plasmodium falciparum
Caenorhabditis elegans
Drosophila melanogaster, Glossina morsitans
Danio rerio
Mus 'domesticus', Rattus norvegicus, Homo sapiens bioinformaticus
Arabidopsis thaliana, Oryza sativa

Slide15:
FB   FBgn0015567  α-Adaptin  GO:0005886  FB:FBrf0093110|PMID:9118220   IDA  C
FB   FBgn0015567  α-Adaptin  GO:0007269  FB:FBrf0108281|PMID:10218159  NAS  P
FB   FBgn0015567  α-Adaptin  GO:0016192  FB:FBrf0124164                NAS  P
FB   FBgn0015567  α-Adaptin  GO:0030122  FB:FBrf0115359                NAS  C
FB   FBgn0015567  α-Adaptin  GO:0030122  FB:FBrf0124164                NAS  C
FB   FBgn0015567  α-Adaptin  GO:0006901  FB:FBrf0108281|PMID:10218159  TAS  P
FB   FBgn0015567  α-Adaptin  GO:0008021  FB:FBrf0108281|PMID:10218159  TAS  C
FB   FBgn0015567  α-Adaptin  GO:0016181  FB:FBrf0141528|PMID:11697879  TAS  P
FB   FBgn0015567  α-Adaptin  GO:0016183  FB:FBrf0108281|PMID:10218159  TAS  P
FB   FBgn0015567  α-Adaptin  GO:0030135  FB:FBrf0108281|PMID:10218159  TAS  C
FB   FBgn0010215  α-Cat      GO:0003779  FB:FBrf0132100                ISS  F
FB   FBgn0010215  α-Cat      GO:0007016  FB:FBrf0129868|PMID:10908592  ISS  P
FB   FBgn0010215  α-Cat      GO:0008092  FB:FBrf0132100                ISS  F
FB   FBgn0010215  α-Cat      GO:0016342  FB:FBrf0129868|PMID:10908592  ISS  C
FB   FBgn0010215  α-Cat      GO:0016343  FB:FBrf0129868|PMID:10908592  ISS  F
FB   FBgn0010215  α-Cat      GO:0005912  FB:FBrf0151280|PMID:12147138  NAS  C
SGD  S0004660     AAC1       GO:0005743  SGD_REF:12031|PMID:2167309    TAS  C
SGD  S0004660     AAC1       GO:0006854  SGD_REF:12031|PMID:2167309    IDA  P
SGD  S0004660     AAC1       GO:0005471  SGD_REF:12031|PMID:2167309    IDA  F
SGD  S0000289     AAC3       GO:0005743  SGD_REF:13606|PMID:1915842    TAS  C
SGD  S0000289     AAC3       GO:0006854  SGD_REF:13606|PMID:1915842    IMP  P
SGD  S0000289     AAC3       GO:0005471  SGD_REF:13606|PMID:1915842    IMP  F  ADP/ATP translocator  YBR085W|ANC3  gene  taxid:4932  20010213  SGD
go/gene_associations/

Slide16: Curated GO Annotations
                 1.12.2001   1.12.2003
Gene products    42421       253962
GO terms         4262        7741

Slide17:
Expression studies:
Human ontogenic tumor gene expression
Human breast cancer gene expression
Human endothelial cell gene expression
Human fibrosarcoma cell cDNAs
Human osteoblast progenitor cell gene expression
Human fibrosarcoma cell gene expression
Mouse cDNAs - FANTOM/FANTOM2 Projects
Mouse lung gene expression
Mouse dendritic cell gene expression
Mouse hepatic and hippocampal gene expression
Mouse liver tumor gene expression
Drosophila gene expression during aging
Drosophila embryo gene expression
Affymetrix Probe Sets
Protein annotation:
Vertebrate nuclear proteins
Human GPCR proteins
Mouse proteome
PANTHER protein families
EST collections:
Cattle ESTs, Pig ESTs, Dog ESTs
Paracoccidioides brasiliensis ESTs
Plasmodium falciparum ESTs
Honey bee ESTs
Schizophyllum commune ESTs
Meloidogyne incognita ESTs
Plasmodium vivax ESTs
Amblyomma variegatum ESTs
Genomic annotation:
Drosophila melanogaster genome
Caenorhabditis briggsae genome
Anopheles gambiae genome
Schizosaccharomyces pombe genome
Plasmodium yoelii genome
Plasmodium falciparum genome
Dictyostelium genome
Rice genome
Plant alternatively spliced genes
Human pseudogenes
http://www.geneontology.org/GO.biblio.html

Database annotations:
SGD: Dwight et al. 2002

Annotation summaries:
Meloidogyne incognita: McCarter et al. 2003

Slide20: The combinatorial nightmare

Combinatoric explosion:
Process
Body part
Regulation
Negative or Positive
2 * 1 * (# of processes - 1)
Induction
2 * 2 * (# of processes - 2)
2 * 2 * (# of processes - 2) * (# of body parts)

Slide22:
Slide23:

Slide24: OBOL - Open Biological Ontologies Language
Chris Mungall

The OBOL System:
Approach: annotation-time term composition vs tools for maintenance of large directed acyclic graphs
Requires new generalization hierarchies
Term decomposition using grammars
Generating computable logical definitions
Using logical definitions - term creation and error checking

A Formal Grammar for OBO terms:
All GO terms are NOUN-PHRASES
A NOUN-PHRASE is (recursively) made from:
a NOUN (includes inflected verbs; e.g. binding)
an ADJECTIVE followed by a NOUN-PHRASE
a NOUN-PHRASE preceded by a NOUN-PHRASE acting as ADJECTIVE; e.g. clathrin coat
a NOUN-PHRASE then PREPOSITION then NOUN-PHRASE; e.g. regulation of transcription
an (optional) NOUN-PHRASE then a RELATIONAL ADJECTIVE then a NOUN-PHRASE; e.g. clathrin-coated vesicle
Precedence rules are also required to prune the parse forest

Slide27: Gene Ontology Software
Browsers - AmiGO
Database - MySQL
Editor - DAG-Edit
geneontology.sourceforge.net
Third party software (e.g. Spotfire; TreeMap; GoFish; FatiGO)

Slide28:
Slide29:
Slide30:
Slide31:

Slide32: OBO-Edit - a powerful editor for directed acyclic graphs.
data adaptors
multiple edits on same graph
define your own relationship types
plug-in architecture - e.g. add an external in-line dictionary

Slide33:

Slide34: The importance of community feedback
Everyone can suggest new terms for GO and tell us what errors we have made.
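
The choice of a directed acyclic graph over a tree (Slide11 and the slides before it) can be illustrated with a minimal Python sketch. The term names come from Slide11; the edge list itself is an assumption, since the slide's indentation did not survive in this transcript, and the helper names `build_parents` and `ancestors` are hypothetical. The point it shows is that a term such as "vacuolar membrane" has two parents, one via ISA and one via PARTOF, which a strict tree could not express.

```python
from collections import defaultdict

# (child, relationship, parent) edges. Term names are from Slide11; the exact
# edge set is an assumption, because the slide's nesting was lost.
EDGES = [
    ("vacuolar membrane", "is_a", "membrane"),
    ("nuclear membrane", "is_a", "membrane"),
    ("vacuolar membrane", "part_of", "vacuole"),
    ("vacuolar lumen", "part_of", "vacuole"),
    ("nuclear membrane", "part_of", "nucleus"),
    ("vacuole", "part_of", "cytoplasm"),
    ("cytoplasm", "part_of", "cell"),
    ("nucleus", "part_of", "cell"),
]


def build_parents(edges):
    """Map each child term to its list of (relationship, parent) pairs."""
    parents = defaultdict(list)
    for child, rel, parent in edges:
        parents[child].append((rel, parent))
    return parents


def ancestors(term, parents):
    """Return every term reachable by following parent edges upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _rel, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(EDGES)
# Two direct parents: this is what makes the structure a DAG, not a tree.
print(PARENTS["vacuolar membrane"])
print(sorted(ancestors("vacuolar membrane", PARENTS)))
```

Storing the relationship type on each edge keeps ISA and PARTOF distinguishable, so a query could follow only one kind of link if needed.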
geneontology.sourceforge.net
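
The gene association rows on Slide15 can be read as seven fields: database, object identifier, symbol, GO id, references, evidence code, and ontology aspect. The following is a rough Python sketch of parsing one such row, not the GO project's own tooling. It assumes whitespace-separated fields as they appear in this transcript (the distributed association files are tab-delimited and carry further columns, as the last row on the slide suggests); `Association` and `parse_row` are hypothetical names. The evidence-code expansions match the categories listed on the annotation slide.

```python
from dataclasses import dataclass

# Evidence codes and their expansions, as listed on the annotation slide.
EVIDENCE = {
    "IMP": "Inferred from Mutant Phenotype",
    "IDA": "Inferred from Direct Assay",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IEP": "Inferred from Expression Pattern",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "ISS": "Inferred from Sequence Similarity",
    "IEA": "Inferred from Electronic Annotation",
}

# One-letter aspect codes for the three ontologies.
ASPECT = {
    "P": "biological process",
    "F": "molecular function",
    "C": "cellular component",
}


@dataclass
class Association:
    db: str
    object_id: str
    symbol: str
    go_id: str
    references: list
    evidence: str
    aspect: str


def parse_row(row):
    """Parse one seven-field row (assumed whitespace-separated)."""
    fields = row.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    db, object_id, symbol, go_id, refs, evidence, aspect = fields
    if evidence not in EVIDENCE:
        raise ValueError(f"unknown evidence code: {evidence}")
    if aspect not in ASPECT:
        raise ValueError(f"unknown aspect: {aspect}")
    # References are pipe-separated, e.g. a database reference plus a PMID.
    return Association(db, object_id, symbol, go_id,
                       refs.split("|"), evidence, aspect)


row = "SGD S0004660 AAC1 GO:0006854 SGD_REF:12031|PMID:2167309 IDA P"
assoc = parse_row(row)
print(assoc.symbol, assoc.go_id, EVIDENCE[assoc.evidence], ASPECT[assoc.aspect])
```

Splitting on whitespace would break on symbols or names that contain spaces, which is one reason a tab-delimited layout is the safer choice for real files.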