Chemical Semantics - European Conference of Computational Chemistry

Presentation Description

Publication and Retrieval of Computational Chemical-Physics Data via The Semantic Web

Presentation Transcript

ChemicalSemantics, Inc.:

ChemicalSemantics, Inc. Publication and Retrieval of Computational Chemical-Physics Data via The Semantic Web. Applying the Semantic Web to Computational Chemistry

What is this all about?:

What is this all about? The principal objective of our enterprise is to create a testbed for comprehensive exploration of the ideas behind the practical application of the Semantic Web in computational chemistry. This working testbed (the Chemical Semantics Portal) is initially limited to computational chemistry and to a limited class of users. In addition, we will focus on semi-empirical, ab initio and density functional (DFT) calculations of quantum chemistry and their typical results. The purpose of this talk is to present the ideas of the Semantic Web and their possible application in computational chemistry, and to present the working prototype of the Chemical Semantics Portal.

INTRODUCTION The Basics of Semantic Web:

INTRODUCTION The Basics of Semantic Web

The evolution of the Web:

The evolution of the Web
WEB 1.0 - Web of documents
WEB 2.0 - Social, Read/Write Web
WEB 3.0 - Semantic Web = Web of Data?
WEB 4.0 - Intelligent Web?
The web is only 8287 days* (23 years) old!
Print – 203,800 days
Newspapers – 142,800 days
Radio – 41,200 days
TV – 28,000 days
* Assuming Christmas 1990 as its beginning (http://en.wikipedia.org/wiki/History_of_the_World_Wide_Web)

Web 1.0 – Web of documents:

Web 1.0 – Web of documents 1989-2000 - Web of Hyperlinked documents

Web 2.0 – Social/Read-Write Web:

Web 2.0 – Social/Read-Write Web 2000-2010 - The Web of Social Networks and the “Wisdom of the Crowds”

Web 3.0 – Semantic Web:

Web 3.0 – Semantic Web 2010-2020 (?) - Web of Data, Linked Data Web
[Diagram: resources (Organization, HR, Services, Products, People, Product) connected by typed links (hasPeople, humanResources, hasServices, hasProducts, hasProduct, colleague)]

What is wrong with today’s Web?:

What is wrong with today’s Web?

Web 1.0 & 2.0 major issues:

Web 1.0 & 2.0 major issues
The WEB is TOO BIG to know
Social Web dwells in isolated silos
Data Deluge - Scientific data stored in isolated silos?
People look at the Web through Google’s Goggles

THE SOLUTION: Semantic Web – Web 3.0:

THE SOLUTION: Semantic Web – Web 3.0

What is Semantic Web?:

What is Semantic Web? The Semantic Web is a Web of data. It is an extension of the current Web that provides an easier way to find, share, reuse and combine information. “The vision of the Semantic Web is to extend principles of the Web from documents to data. (...) This also means creation of a common framework that allows data to be shared and reused across application, enterprise, and community boundaries, to be processed automatically by tools as well as manually, including revealing possible new relationships among pieces of data.” http://www.w3.org/2001/sw/

Foundations of Semantic Web:

Foundations of Semantic Web
“Semantic” in “Semantic Web” is about the MEANING of data, not about the syntax it is expressed in.
Semantic Web = Web Full of Meaning = Web of meaningful Data
Semantic Web is about the representation of THINGS (OBJECTS and CONCEPTS) and their properties on the Web, not just about documents.
Semantic Web uses a global NAMING scheme to identify THINGS, not just to address documents.
Semantic Web links THINGS with TYPED LINKS, not with “blind” hyperlinks.
Semantic Web allows DISCOVERY of new FACTS about THINGS, not just browsing through pages.
* Picture by Roger Sayle (http://pubs.acs.org/doi/abs/10.1021/ci800243w)

Example:

Example
“Plavix” (Clopidogrel)
SMILES: COC(=O)[C@H](C1=CC=CC=C1Cl)N2CCC3=C(C2)C=CS3
InChI=1S/C16H16ClNO2S/c1-20-16(19)15(12-4-2-3-5-13(12)17)18-8-6-14-11(10-18)7-9-21-14/h2-5,7,9,15H,6,8,10H2,1H3/t15-/m0/s1
InChIKey=GKTWGGQPFAXNFI-HNNXBMFYSA-N
http://www.chemspider.com/InChIKey=GKTWGGQPFAXNFI-HNNXBMFYSA-N
* Based on “Foreign Language Translation of Chemical Nomenclature by Computer” by Roger Sayle (DOI: 10.1021/ci800243w)

How do we represent THINGS on SW:

How do we represent THINGS on SW
On the Semantic Web we represent THINGS using elementary UNITS of data: TRIPLES. We can create logical and structural relations between the elements of a triple, build taxonomies, vocabularies and classes, and finally “reason” on large sets of triples. The format we store the triples in is called RDF (Resource Description Framework).
A triple has three parts:
Subject (Thing) - Predicate (Property) - Object (Value)
For example:
:H2O gnvc:hasInChIString "1S/H2O/h1H2"
:H2O :hasMolecularMass "18.0153"
“RDF is for THINGS as HTML is for DOCUMENTS”
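As a rough illustration of the triple model on this slide, the sketch below stores the slide's two water statements as plain Python tuples and looks them up by pattern. The `match` helper and the in-memory list are assumptions made for illustration only; a real application would use an RDF library and a triple store.

```python
# A triple is (subject, predicate, object); the values follow the slide's example.
Triple = tuple[str, str, str]

triples: list[Triple] = [
    (":H2O", "gnvc:hasInChIString", "1S/H2O/h1H2"),
    (":H2O", ":hasMolecularMass", "18.0153"),
]


def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Everything known about :H2O
    print(match(triples, s=":H2O"))
    # Only the molecular mass statement
    print(match(triples, p=":hasMolecularMass"))
```

Leaving any position as a wildcard is the same idea that SPARQL variables such as `?s ?p ?o` express.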

How do we Identify Things on the Semantic Web:

How do we Identify Things on the Semantic Web
For unambiguous identification of things (objects) and their properties on the Web, the Semantic Web uses URIs (Uniform Resource Identifiers), a generalization of URLs, i.e. ordinary Web addresses:
Water: http://www.chemicalsemantics.com/h2o
Molecular Mass: http://purl.org/chem/ns#MM
“18.0153”: a number

RDF Serialization – preliminary example:

RDF Serialization – preliminary example
RDF/XML or Turtle (Terse RDF Triple Language)
@prefix cs: <http://ChemicalSemantics.com/chem/dictionary/ns#> .
@prefix mol: <http://ChemicalSemantics.com/chem/molecules/simplewater.ttl#> .
@prefix xs: <http://www.w3.org/2001/XMLSchema#> .
mol:molecule_31 a cs:molecule ;
    cs:name "water" ;
    cs:atom _:atom31_1 ;
    cs:atom _:atom31_2 ;
    cs:atom _:atom31_3 ;
    cs:bond _:bond31_1 ;
    cs:bond _:bond31_2 .
_:atom31_1 cs:atomType cs:O ;
    cs:x3 "-0.381950"^^xs:double ;
    cs:y3 "0.243825"^^xs:double ;
    cs:z3 "0.000000"^^xs:double .
_:atom31_2 cs:atomType cs:H ;
    cs:x3 "-0.381950"^^xs:double ;
    cs:y3 "1.203825"^^xs:double ;
    cs:z3 "0.000000"^^xs:double .
_:atom31_3 cs:atomType cs:H ;
    cs:x3 "0.523148"^^xs:double ;
(.....)

Semantic Web allows Discovery:

Semantic Web allows Discovery Semantic Web tools for building “intelligent” vocabularies, RDFS (RDF Schema) and OWL ontologies, allow for simple logical INFERENCES and the discovery of IMPLICIT facts. For example: when a user searches for a molecule with specific properties, it is possible to automatically offer them other molecules that belong to the same “class” of molecules.
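The kind of inference described on this slide can be sketched in a few lines. The class hierarchy, the `ex:` names and the helper functions below are hypothetical and are not taken from the portal or from any published ontology. The sketch only shows how following subclass links turns one explicit type into several implicit ones, which in turn lets a search suggest other members of the same class.

```python
# Hypothetical class hierarchy (child -> parent), in the spirit of rdfs:subClassOf.
SUBCLASS_OF = {
    "ex:Flavonol": "ex:Flavonoid",
    "ex:Flavonoid": "ex:Polyphenol",
}

# Hypothetical explicit typing of a few molecules (rdf:type statements).
TYPE_OF = {
    "ex:quercetin": "ex:Flavonol",
    "ex:kaempferol": "ex:Flavonol",
    "ex:catechin": "ex:Flavonoid",
    "ex:water": "ex:InorganicCompound",
}


def superclasses(cls):
    """Yield cls and each ancestor reached by following subclass links."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def inferred_types(mol):
    """The explicit type plus every type implied by the hierarchy."""
    return set(superclasses(TYPE_OF[mol]))


def related(mol, cls):
    """Other molecules that are explicit or implicit members of cls."""
    return sorted(m for m in TYPE_OF if m != mol and cls in inferred_types(m))


if __name__ == "__main__":
    print(inferred_types("ex:quercetin"))
    print(related("ex:quercetin", "ex:Flavonoid"))
```

Nothing in the data states that quercetin is a polyphenol; that fact is implicit and only appears once the hierarchy is followed.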

Semantic Web = GGG (Giant Global Graph):

Semantic Web = GGG (Giant Global Graph)
GGG – term coined by Tim Berners-Lee in 2007
[Diagram: resources (Organization, HR, Services, Products, People, Product) connected by typed links (hasPeople, humanResources, hasServices, hasProducts, hasProduct, colleague)]
Oops … sorry, but it’s BIG

Core Semantic Web Technologies:

Core Semantic Web Technologies
RDF (Resource Description Framework): deals with THINGS
RDFa (RDF “in attributes”): makes it possible to embed RDF in ordinary HTML Web pages
RDFS (RDF Schema): deals with SETS and CLASSES of THINGS
OWL (Web Ontology Language): deals with intelligent VOCABULARIES (with logical relations between concepts)
SPARQL (SPARQL Protocol and RDF Query Language): allows searching through graphs of triples stored in “triple stores”
RIF (Rule Interchange Format): makes it possible to express and interchange generalized IF...THEN constructs

... and one about Semantic Web Philosophy:

... and one about Semantic Web Philosophy
AAA: Anyone can say Anything about Any Topic.
OWA: Open World Assumption. We must assume that a new piece of information may arrive at any time, so we cannot assume that we have ALL the information at the moment we consume it. It also means that not knowing something does not necessarily imply that it is false!
Hendler Hypothesis: “A Little Semantics Goes A Long Way”
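The practical difference made by the Open World Assumption can be shown with a toy lookup. The fact set and the two functions are hypothetical, and using `None` for "unknown" is only one possible convention.

```python
# Hypothetical set of known triples.
FACTS = {("ex:water", "ex:hasAtom", "ex:O")}


def closed_world(triple):
    """Closed world: anything that is not stated is treated as false."""
    return triple in FACTS


def open_world(triple):
    """Open world: anything that is not stated is unknown, not false."""
    return True if triple in FACTS else None  # None stands for "unknown"


if __name__ == "__main__":
    question = ("ex:water", "ex:hasAtom", "ex:H")
    print(closed_world(question))  # False
    print(open_world(question))    # None
```

Under the open-world reading, the missing hydrogen statement is treated as a gap in the data that a later publication may fill, not as evidence that water has no hydrogen atoms.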

Hendler Hypothesis in action...:

Hendler Hypothesis in action...
Linked Data: Four Principles
Use WEB ADDRESSES (URLs) as names for things.
Use ADDRESSES THAT WORK ON THE WEB, so that people can look up those names.
When someone looks up a URL, PROVIDE USEFUL INFORMATION, USING THE STANDARDS (like RDF).
Include LINKS TO OTHER URLs, so that they can discover more things.
“The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.” (Tim Berners-Lee)

Ontologies:

Ontologies
“An ontology formally represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts. It can be used to model a domain and support reasoning about concepts.” (Wikipedia)
The fundamental goals of ontologies:
Define concepts used in Semantic graphs (like RDF)
Enable terminological standardisation
Provide tools for building intelligent dictionaries with synonyms and cross-references
Enable encoding of taxonomies (hierarchical definitions)
Enable reasoning and inferencing – discovering implicit knowledge

Early ideas in ontology:

Early ideas in ontology
“We think only through the medium of words. Languages are true analytical methods. (…) The art of reasoning is nothing more than a language well arranged. Thus, while I thought myself employed only in forming a Nomenclature, and while I proposed to myself nothing more than to improve the chemical language, my work transformed itself by degrees, without my being able to prevent it, into a treatise upon the Elements of Chemistry.”
Antoine Lavoisier, “Traité élémentaire de chimie”

Example of Ontology “Hello world”:

Example of Ontology “Hello world”
Nivaldo J. Tro, “Chemistry. A Molecular Approach”
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix chem: <http://purl.org/chem/simple_classification#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foo: <http://example.com/this/> .

## Classes
chem:Matter a rdfs:Class ;
    rdfs:label "Matter"@en ;
    rdfs:label "Matière"@fr ;
    rdfs:label "Materia"@pl .
chem:PureSubstances a rdfs:Class ;
    rdfs:label "Pure Substances"@en ;
    rdfs:label "Substances Pures"@fr ;
    rdfs:label "Substancja"@pl ;
    rdfs:subClassOf chem:Matter .
chem:Mixture a rdfs:Class ;
    rdfs:label "Mixture"@en ;
    rdfs:label "Mélange"@fr ;
    rdfs:label "Mieszanina"@pl ;
    rdfs:subClassOf chem:Matter .
chem:Heterogeneous a rdfs:Class ;
    rdfs:label "Heterogeneous"@en ;
    rdfs:label "Hétérogène"@fr ;
    rdfs:label "Heterogeniczny"@pl ;
    rdfs:subClassOf chem:Mixture .
chem:Homogeneous a rdfs:Class ;
    rdfs:label "Homogeneous"@en ;
    rdfs:label "Homogène"@fr ;
    rdfs:label "Jednorodny"@pl ;
    rdfs:subClassOf chem:Mixture .

## Properties
chem:atomicNumber a rdf:Property ;
    rdfs:domain chem:Element ;
    rdfs:range rdfs:Literal .
chem:moleculeName a rdf:Property ;
    rdfs:domain chem:Compound ;
    rdfs:range rdfs:Literal .
chem:componentName a rdf:Property ;
    rdfs:domain chem:Mixture ;
    rdfs:range chem:Matter .

Non-Trivial Ontologies in Chemistry:

Non-Trivial Ontologies in Chemistry
ChEBI – Chemical Entities of Biological Interest
http://www.ebi.ac.uk/chebi/
Project of EMBL-EBI: European Bioinformatics Institute (Cambridge) of the European Molecular Biology Lab (Heidelberg)
OBO Foundry Ontology (http://www.obofoundry.org/), The Open Biological and Biomedical Ontologies
Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. The term ‘molecular entity’ refers to any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity. The molecular entities in question are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI incorporates an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified.

Non-Trivial Ontologies in Chemistry:

Non-Trivial Ontologies in Chemistry
ChemINF – Chemical Information Ontology
https://code.google.com/p/semanticchemistry/
Janna Hastings, Nico Adams, Christoph Steinbeck (EBI), Leonid Chepelev, Michel Dumontier, Egon Willighagen, Nico Adams
OBO Foundry Candidate
ChemINF describes:
Chemical graphs, and various formats for encoding them.
Chemical descriptors, with definitions and axioms describing what they are specifically about.
Specifications for certain descriptors.
Algorithms and their software implementations, and axioms describing their inputs and outputs.
Chemical data representation formalisms and formats.

Chemical Semantics Ontology:

Chemical Semantics Ontology
http://purl.org/gc/gc.owl
Gainesville Core (alpha edition)
Gainesville Core describes: Molecular Publications, Molecular Systems, Molecular Calculations
Molecular Systems contain Molecules
The Molecules may have Residues (for biopolymers and polymers)
Molecular Calculations contain Initial Data and Results
The Initial Data may have Methods, Basis Sets, Functionals, etc.
The Results may have Energies, Wave Functions and Spectra, etc.
GC aims at a complete description of a typical Computational Chemistry experiment

Chemical Semantics Ontology:

Chemical Semantics Ontology gc.owl with Protege

Related Ontologies ...:

Related Ontologies ...
SIO – Semanticscience Integrated Ontology
OPB – Ontology of Physics for Biology
RXNO – Name Reaction Ontology
CMO – Chemical Methods Ontology
MOP – Molecular Processes Ontology
SO – The Sequence Ontology Project

Importance of Structural Data Structures:

Importance of Structural Data Structures
CML – Chemical Markup Language
“CML is not 'just another file format'; it is capable of holding extremely complex information structures and so acting as an interchange mechanism or for archival. It interfaces easily with modern database architectures such as relational databases or object-oriented databases. Most importantly, a large amount of generic XML software to process and transform it is already available from the community.” P. Murray-Rust, H. S. Rzepa, 2001
CML “paved the road” to Semantics in Chemistry. It is extremely useful as an interchange format between computational chemistry software and the Semantic Web.
Our position: Chemical Semantics will use CSX, a similar structural format enriched by an explicit description of molecular constituents and an enriched description of computation inputs and results.

A timeline of Semantic Web:

A timeline of Semantic Web
RDF – 1999
CML (Chemical Markup Language) – 1999
FOAF – 2000
RDFa – 2004
DBPedia – 2007
ChEBI (Chemical Entities of Biological Interest) – 2007
GoodRelations – 2008 (Google adoption: November 2, 2010)
Schema.org – June 2011
Google’s Knowledge Graph – May 2012
Facebook Graph Search – January 2013

Conclusion:

Conclusion An emerging successor to the web, the Semantic Web, will likely profoundly change the very nature of how scientific knowledge is produced and shared, in ways that we can now barely imagine.

Chemical Semantics Portal http://portal.chemicalsemantics.com/cs:

Chemical Semantics Portal http://portal.chemicalsemantics.com/cs

CS Portal main targets:

CS Portal main targets
Interoperable PUBLISHING of Computational Chemistry calculations
FEDERATION of published data with existing web-based chemical datasets
Cloud-like ARCHIVING of Computational Chemistry calculation results, input/output files, etc.

http://portal.chemicalsemantics.com/cs:

http://portal.chemicalsemantics.com/cs

http://portal.chemicalsemantics.com/cs:

http://portal.chemicalsemantics.com/cs
Manual publication (upload)
Automated publication directly from Modelling Software, via Web API

http://portal.chemicalsemantics.com/cs:

http://portal.chemicalsemantics.com/cs Automated generation of permanent URIs

Permanent Chemical URIs:

Permanent Chemical URIs
Automated generation of permanent URIs
http://purl.org/chem/pub/2013-08-04-quercetin
[Callouts on the parts of the URI:]
Owned & controlled by OCLC (Online Computer Library Center). Is claimed to be persistent and eternal.
Owned by OCLC, controlled by Chemical Semantics, Inc.
Generated by Chemical Semantics, Inc. for the user. Owned by the user.

URI naming scheme:

URI naming scheme
Publication: http://purl.org/chem/pub/2013-08-05-betacyanin
Molecular Calculations: http://purl.org/chem/pub/2013-08-05-betacyanin/mol-calc
Molecular System: http://purl.org/chem/pub/2013-08-05-betacyanin/molSys
A Molecule of the system: http://purl.org/chem/pub/2013-08-05-betacyanin/molSys/m1
Bonds between atoms in the molecule: http://purl.org/chem/pub/2013-08-05-betacyanin/molSys/m1/a1a12
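The naming scheme on this slide is regular enough to be generated mechanically. The sketch below rebuilds the betacyanin URIs from their parts. The function names are hypothetical, and reading `a1a12` as "bond between atom 1 and atom 12" is an assumption based only on the slide's example, not on portal documentation.

```python
BASE = "http://purl.org/chem/pub/"


def publication_uri(date, name):
    """Publication URI: base + ISO date + '-' + short name."""
    return f"{BASE}{date}-{name}"


def calculation_uri(pub):
    """URI of the molecular calculations of a publication."""
    return f"{pub}/mol-calc"


def system_uri(pub):
    """URI of the molecular system of a publication."""
    return f"{pub}/molSys"


def molecule_uri(pub, index):
    """URI of the index-th molecule of the system."""
    return f"{system_uri(pub)}/m{index}"


def bond_uri(pub, mol_index, atom_a, atom_b):
    """URI of a bond, assuming the 'a<i>a<j>' pattern names the two bonded atoms."""
    return f"{molecule_uri(pub, mol_index)}/a{atom_a}a{atom_b}"


if __name__ == "__main__":
    pub = publication_uri("2013-08-05", "betacyanin")
    for uri in (pub, calculation_uri(pub), system_uri(pub),
                molecule_uri(pub, 1), bond_uri(pub, 1, 1, 12)):
        print(uri)
```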

Dual nature of the URIs:

Dual nature of the URIs Realizes Linked Data Principles For Humans (i.e. as seen via web browser) http://purl.org/chem/pub/2013-08-02-pyridine_base Returns:

Dual nature of the URIs:

Dual nature of the URIs Realizes Linked Data Principles For Machines (i.e. as seen via Semantic Tools (rdfEditor, Fiddler)) http://purl.org/chem/pub/2013-08-02-pyridine_base Returns: Content negotiation: “One gets what one asks for”
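Content negotiation means the same URI is requested with a different HTTP `Accept` header depending on who is asking. The sketch below only builds the two requests and does not send them. The media types listed for the machine case are an assumption, since the slides do not say which RDF serializations the portal returns.

```python
from urllib.request import Request

URI = "http://purl.org/chem/pub/2013-08-02-pyridine_base"


def build_request(uri, wants_rdf):
    """Build a GET request whose Accept header asks for RDF or for HTML."""
    if wants_rdf:
        # Assumed media types; the portal's actual offerings are not documented here.
        accept = "text/turtle, application/rdf+xml;q=0.9"
    else:
        accept = "text/html"
    return Request(uri, headers={"Accept": accept})


if __name__ == "__main__":
    for wants_rdf in (False, True):
        req = build_request(URI, wants_rdf)
        print(req.full_url, "->", req.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the actual lookup; whether the URI still resolves today is not something this sketch checks.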

More on “Human-oriented” views:

More on “Human-oriented” views “Results” – a prototype for future publication “digest”

More on “Human-oriented” views:

More on “Human-oriented” views “Molecules” – generic, webGL based molecular viewer

More on “Human-oriented” views:

More on “Human-oriented” views “Wave function” – visualization of orbital energies

More on “Human-oriented” views:

More on “Human-oriented” views “Graph” – explore the knowledge structure about your system

More on “Human-oriented” views:

More on “Human-oriented” views “Data Federation” – explore Semantic Links to external resources

More on “Human-oriented” views:

More on “Human-oriented” views “Data sets” – use CS Portal for archiving purposes

SPARQL queries on CS Portal:

SPARQL queries on CS Portal
Counting number of triples in the graphs of the CS Portal
SELECT ?graph (COUNT(*) AS ?count)
WHERE {
    GRAPH ?graph { ?s ?p ?o . }
}
GROUP BY ?graph
ORDER BY DESC(?count)
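For readers less familiar with SPARQL, the grouping and ordering done by this query can be mirrored in plain Python. The quads below are hypothetical sample data, not portal content, and `triples_per_graph` is an illustrative helper, not part of any portal API.

```python
from collections import Counter

# Hypothetical quads: (graph, subject, predicate, object).
quads = [
    ("g:water", "m:1", "gc:hasAtom", "a:1"),
    ("g:water", "a:1", "gc:isElement", "O"),
    ("g:water", "m:1", "gc:hasAtom", "a:2"),
    ("g:hcl", "m:2", "gc:hasAtom", "a:3"),
    ("g:hcl", "a:3", "gc:isElement", "Cl"),
]


def triples_per_graph(quads):
    """Count triples per named graph, largest first (GROUP BY + ORDER BY DESC)."""
    return Counter(graph for graph, _s, _p, _o in quads).most_common()


if __name__ == "__main__":
    for graph, count in triples_per_graph(quads):
        print(graph, count)
```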

SPARQL queries on CS Portal:

SPARQL queries on CS Portal
Counting number of elements in all molecular systems on the CS Portal
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gc: <http://purl.org/gc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?element (COUNT(*) AS ?count)
WHERE {
    ?atom gc:isElement ?element .
}
GROUP BY ?element
ORDER BY DESC(?count)

SPARQL queries on CS Portal:

SPARQL queries on CS Portal
Number of different calculations in all molecular systems of the CS Portal
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gc: <http://purl.org/gc/>
SELECT ?resultType (COUNT(*) AS ?count)
WHERE {
    GRAPH ?graph {
        ?calc rdf:type gc:Calculation ;
            gc:hasResult ?result .
        ?result rdf:type ?resultType .
    }
}
GROUP BY ?resultType
ORDER BY DESC(?count)

SPARQL queries on CS Portal:

SPARQL queries on CS Portal
Number of molecular systems with halogen atoms on the CS Portal
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gc: <http://purl.org/gc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?graph
WHERE {
    GRAPH ?graph {
        {
            ?something gc:hasAtom ?atom1 ;
                rdf:type ?somethingType ;
                rdfs:label ?somethingLabel .
            ?atom1 gc:isElement "F" .
        }
        UNION
        {
            ?something gc:hasAtom ?atom2 ;
                rdf:type ?somethingType ;
                rdfs:label ?somethingLabel .
            ?atom2 gc:isElement "Cl" .
        }
        UNION
        {
            ?something gc:hasAtom ?atom3 ;
                rdf:type ?somethingType ;
                rdfs:label ?somethingLabel .
            ?atom3 gc:isElement "Br" .
        }
        UNION
        {
            ?something gc:hasAtom ?atom4 ;
                rdf:type ?somethingType ;
                rdfs:label ?somethingLabel .
            ?atom4 gc:isElement "I" .
        }
        UNION
        {
            ?something gc:hasAtom ?atom5 ;
                rdf:type ?somethingType ;
                rdfs:label ?somethingLabel .
            ?atom5 gc:isElement "At" .
        }
    }
}

SPARQL queries on CS Portal:

SPARQL queries on CS Portal
Number of inorganic molecular systems
## Show all molecules that contain atoms other than C,O,N,H
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gc: <http://purl.org/gc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?graph
WHERE {
    { GRAPH ?graph { ?mol gc:hasAtom ?atom } }
    MINUS
    { GRAPH ?graph { ?a gc:isElement "C" } }
    MINUS
    { GRAPH ?graph { ?b gc:isElement "O" } }
    MINUS
    { GRAPH ?graph { ?b gc:isElement "N" } }
    MINUS
    { GRAPH ?graph { ?b gc:isElement "H" } }
}
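The comment in this query ("contain atoms other than C, O, N, H") and a chain of MINUS clauses over whole graphs can select different sets of graphs. Under standard MINUS semantics, with `?graph` as the shared variable, the query appears to keep only graphs that contain none of the four elements. The sketch below spells out both readings on hypothetical data so the difference is visible; the graph names and helper functions are illustrative only.

```python
ORGANIC_ELEMENTS = {"C", "O", "N", "H"}

# Hypothetical element symbols found in each named graph.
elements_by_graph = {
    "g:water": {"H", "O"},
    "g:chloroform": {"C", "H", "Cl"},
    "g:nacl": {"Na", "Cl"},
}


def graphs_with_other_elements(elements_by_graph):
    """Reading 1: graphs with at least one element outside C, O, N, H."""
    return sorted(
        g for g, elements in elements_by_graph.items()
        if elements - ORGANIC_ELEMENTS
    )


def graphs_without_conh(elements_by_graph):
    """Reading 2: graphs that contain none of C, O, N, H at all."""
    return sorted(
        g for g, elements in elements_by_graph.items()
        if not (elements & ORGANIC_ELEMENTS)
    )


if __name__ == "__main__":
    print(graphs_with_other_elements(elements_by_graph))
    print(graphs_without_conh(elements_by_graph))
```

Chloroform is the deciding case: it contains chlorine, so it satisfies the first reading, but it also contains carbon and hydrogen, so the second reading drops it.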

SPARQL queries on CS Portal:

SPARQL queries on CS Portal
Energy values computed for all molecular systems
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gc: <http://purl.org/gc/>
SELECT ?sysEnergy ?energyValue ?energyName
WHERE {
    GRAPH ?graph {
        ?molSys rdf:type gc:MolecularSystem ;
            gc:hasCalculationOn ?molCalc .
        ?molCalc rdf:type gc:Calculation ;
            gc:hasResult ?sysEnergy .
        ?sysEnergy rdf:type gc:SystemEnergies ;
            ?p ?o .
        ?o gc:hasFloatValue ?energyValue ;
            rdfs:label ?energyName .
    }
}
ORDER BY ?energyName

Stay tuned ...:

Stay tuned ... If you want to work with us, or just share your opinions, do not hesitate to contact us at: info@chemicalsemantics.com

Thank you…:

Thank you …
Neil Ostlund, Hypercube, Inc., 1115 NW 4th St., Gainesville, FL 32608, USA. Phone: (352) 371 7744. Web: www.hyper.com. eMail: ostlund@hyper.com
Mirek Sopek, MakoLab SA, Demokratyczna 46, 93-430 Lodz, Poland. Phone: +48 600 814 537. Web: www.makolab.com. eMail: sopek@makolab.com
