MashQL

Presentation Transcript

Slide 1: 

Towards Data Mashups and Pipes: MashQL. Reading: Mustafa Jarrar and Marios D. Dikaiakos: MashQL: A Query-by-Diagram Topping SPARQL - Towards Semantic Data Mashups. In ONISW’08 workshop, part of the CIKM'08 conference, ACM, 2008. http://www.jarrar.info/publications/JD08.pdf

Slide 2: 

Imagine: We are in 3008. The internet is a database. Information about every little thing. Structured, granular data. Semantics, linked data. How will we yahoo/google this knowledge?

Outline : 

Outline: The Data Web and the role of mashups. Mashup challenges. MashQL (a new mashup language). Conclusions and discussion. Jarrar-University of Cyprus

Slide 4: 

Web 2.0 and the phenomenon of APIs

Slide 11: 

Web 2.0 and the phenomenon of APIs: and many, many other APIs.

Slide 13: 

Mashups: An application that combines data from multiple sources (APIs).

Mashups (Example) : 

Mashups (Example)

How can I build a mashup? : 

How can I build a mashup? What do you want to do? Which data do you need? Are APIs/RSS feeds available? How are your programming skills? Geek: start coding. Semi-technical skills: use mashup editors and start configuring. Mashup editors: Microsoft Popfly, Yahoo! Pipes, QEDWiki by IBM, Google Mashup Editor (coming), Serena Business Mashups, Dapper, JackBe Presto Wires.

Mashup Editors : 

Mashup Editors

Mashup Editors : 

Mashup Editors, Limitations: They focus only on providing encapsulated access to (some) public APIs and feeds, rather than on querying data sources. They still require programming skills. They cannot play the role of a general-purpose data retrieval tool, as mashups are sophisticated applications. They lack a formal framework for pipelining mashups.

Vision and Challenges : 

Vision and Challenges: Instead of accessing a method in an API in a programmatic style, can these APIs act as query end-points over HTTP (i.e., a URL is a query)? Regard the internet as a database, where a data source is seen as a table and a mashup is a query. A mashup can be a simple inquiry (e.g., Hacker’s articles after 2000). In short, allow casual users to search and consume the Data Web intuitively, the way we use search engines (or at least the “advanced search” in search engines). But the problem then is: users need to know the schema and technical details of the data sources they want to query.
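To make the "a URL is a query" idea concrete, here is a minimal Python sketch, assuming an endpoint that accepts a query in the `query` parameter of a GET request (the convention of the SPARQL protocol). The endpoint address and the query text are hypothetical and are not taken from the talk.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "http://site1.example/sparql"


def query_url(endpoint: str, sparql: str) -> str:
    """Encode a query string into a GET URL using the `query` parameter."""
    return endpoint + "?" + urlencode({"query": sparql})


# Illustrative query text (the `:` prefix is assumed to be declared by the endpoint).
sparql = "SELECT ?t WHERE { ?x :Title ?t . ?x :Year ?y . FILTER (?y > 2000) }"
url = query_url(ENDPOINT, sparql)

# Decoding the URL recovers the original query text, so the URL alone
# fully identifies the query.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

The point of the sketch is only that the whole query travels inside the URL, so a mashup can be shared, bookmarked, or fed to another mashup as a plain link.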

Vision and Challenges : 

Vision and Challenges: How can a user query a source without knowing its schema, structure, and vocabulary? Example over three data sources:
SELECT S.Title FROM Google.Scholar S WHERE (S.Author = 'Hacker')
UNION
SELECT P.PatentTitle FROM Google.Patent P WHERE (P.Inventor = 'Hacker')
UNION
SELECT A.Title FROM Citeseer A WHERE (A.Author = 'Hacker')

Vision and Challenges : 

Vision and Challenges: Some data sources may come without a schema at all. Example (Hacker’s articles after 2000):
PREFIX S1: <http://site1.com/rdf>
PREFIX S2: <http://site2.com/rdf>
SELECT ?ArticleTitle
FROM <http://site1.com/rdf>
FROM <http://site2.com/rdf>
WHERE {
{{?X S1:Title ?ArticleTitle} UNION {?X S2:Title ?ArticleTitle}}
{?X S1:Author ?X1} UNION {?X S2:Author ?X1}
{?X S1:PubYear ?X2} UNION {?X S2:Year ?X2}
FILTER regex(?X1, "^Hacker")
FILTER (?X2 > 2000)}
Programmers usually explore such sources by eye and remember the vocabulary and structure. What about casual users?

Slide 28: 

MashQL

MashQL : 

MashQL: A simple query language for the Data Web, in a mashup style. MashQL allows querying one or more dataspaces without any prior knowledge about their schema, vocabulary, or technical details (a source may not have a schema at all), so users can explore an unknown graph. It does not assume any knowledge about RDF, SPARQL, XML, or any other technology to get started. Users only use drop-down lists to formulate queries (query-by-diagram/interaction).

MashQL Example 1 : 

MashQL Example 1: Hacker’s Articles after 2000? MashQL From: RDF Input http://www.site1.com/rdf http://www.site2.com/rdf Everything Title ArticleTitle Author “^Hacker” Year\PubYear > 2000

MashQL Example 1 : 

MashQL Example 1: Hacker’s Articles after 2000? Interactive query formulation: MashQL From: RDF Input http://www.site1.com/rdf http://www.site2.com/rdf Everything

MashQL Example 1 : 

MashQL Example 1: Hacker’s Articles after 2000? MashQL From: RDF Input http://www.site1.com/rdf http://www.site2.com/rdf Everything Title ArticleTitle

MashQL Example 1 : 

MashQL Example 1: Hacker’s Articles after 2000? MashQL From: RDF Input http://www.site1.com/rdf http://www.site2.com/rdf Everything Title ArticleTitle Author Contains Hacker

MashQL Example 1 : 

MashQL Example 1: Hacker’s Articles after 2000? MashQL From: RDF Input http://www.site1.com/rdf http://www.site2.com/rdf Everything Title ArticleTitle Author “^Hacker” Year\PubYear MoreThan 2000

MashQL Example 1 : 

MashQL Example 1: Hacker’s Articles after 2000? MashQL From: RDF Input http://www.site1.com/rdf http://www.site2.com/rdf Everything Title ArticleTitle Author “^Hacker” Year\PubYear > 2000. Generated SPARQL:
PREFIX S1: <http://site1.com/rdf>
PREFIX S2: <http://site2.com/rdf>
SELECT ?ArticleTitle
FROM <http://site1.com/rdf>
FROM <http://site2.com/rdf>
WHERE {
{{?X S1:Title ?ArticleTitle} UNION {?X S2:Title ?ArticleTitle}}
{?X S1:Author ?X1} UNION {?X S2:Author ?X1}
{?X S1:PubYear ?X2} UNION {?X S2:Year ?X2}
FILTER regex(?X1, "^Hacker")
FILTER (?X2 > 2000)}

MashQL Example 2 : 

MashQL Example 2: The recent articles from Cyprus. Retrieve every article that has a title and is written by an author who has an address whose country is Cyprus, where the article was published after 2008. MashQL URL: RDF Input http://www4.wiwiss.fu-berlin.de/dblp/ Article Title ArticleTitle Author Address Country “Cyprus” Year > 2008

The Intuition of MashQL : 

The Intuition of MashQL: A query is a tree. The root is called the query subject. Each branch is a restriction. Branches can be expanded (information paths). Objects can carry value filters.
Def. A Query Q with a subject S, denoted by Q(S), is a set of restrictions on S: Q(S) = R1 AND … AND Rn.
Def. A Subject S ∈ (I ∪ V), where I is an identifier and V is a variable.
Def. A Restriction R = <Rx, P, Of>, where Rx is an optional restriction prefix that can be (maybe | without), P is a predicate (P ∈ I ∪ V), and Of is an object filter.
Example: MashQL URL: RDF Input http://www4.wiwiss.fu-berlin.de/dblp/ Article Title ArticleTitle Author Address Country “Cyprus” Year > 2008
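The tree view of a query can be sketched as a small Python data structure. This is only an illustration under stated assumptions: the class names `Query` and `Restriction`, the field names, and the `depth` helper are hypothetical and do not come from the MashQL implementation; filters are kept as plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Restriction:
    predicate: str                           # P: an identifier or a variable
    obj: Union[str, "Query", None] = None    # object: value, filter text, or sub-query
    prefix: Optional[str] = None             # Rx: None (required), "maybe", or "without"


@dataclass
class Query:
    subject: str                             # S: the root of the tree
    restrictions: List[Restriction] = field(default_factory=list)


def depth(q: Query) -> int:
    """Number of levels in the tree, i.e. the longest information path."""
    subs = [depth(r.obj) for r in q.restrictions if isinstance(r.obj, Query)]
    return 1 + max(subs, default=0)


# Example 2 from the slides: recent articles from Cyprus.
example2 = Query("Article", [
    Restriction("Title", "?ArticleTitle"),
    Restriction("Author", Query("?Author", [
        Restriction("Address", Query("?Address", [
            Restriction("Country", '"Cyprus"'),
        ])),
    ])),
    Restriction("Year", "> 2008"),
])

print(depth(example2))
```

Each nested `Query` plays the role of an expanded branch: Article, then Author, then Address form the information path that ends in the Country filter.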

The Intuition of MashQL : 

The Intuition of MashQL: An object filter is one of: Equals, Contains, MoreThan, LessThan, Between, OneOf, Not(f), or an information path (sub-query).
Def. An object filter Of = <O, f>, where O is an object and f is a filtering function, is one of:
Of = <O>, where O is an object, O ∈ V ∪ I.
Of = <O, Equals(X, T, Lt)>, where X can be a variable or a constant, T is a datatype, and Lt is a language tag.
Of = <O, Contains(X, T, Lt)>, where O is an object variable, X is a regex literal, T is a datatype, and Lt is a language tag.
Of = <O, MoreThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype.
Of = <O, LessThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype identifier.
Of = <O, Between(X, Y, T)>, where X and Y are variables or constants, and T is a datatype identifier.
Of = <O, OneOf(V)>, where O is an object variable, and V is a set of values {v1, ... , vn}, where each vi is a variable or a constant.
Of = <O, Not(f)>, where f is one of the functions defined above.
Of = <O, Qi(O)>, where O is an object (O ∈ V ∪ I), and Qi(O) is a sub-query with O being the query subject.
Example: MashQL URL: RDF Input http://www4.wiwiss.fu-berlin.de/dblp/ Article Title ArticleTitle Author Address Country “Cyprus” Year > 2008
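One way to read the filter definitions is as predicates over a single object value. The Python sketch below is an assumption-laden illustration, not the MashQL implementation: the function names are hypothetical, the datatype T and language tag Lt are ignored, and Between is treated as inclusive on both ends.

```python
import re
from typing import Any, Callable, Iterable

Filter = Callable[[Any], bool]


def equals(x: Any) -> Filter:
    return lambda o: o == x


def contains(pattern: str) -> Filter:
    # X is a regex literal, so "^Hacker" anchors at the start of the value.
    rx = re.compile(pattern)
    return lambda o: rx.search(str(o)) is not None


def more_than(x: Any) -> Filter:
    return lambda o: o > x


def less_than(x: Any) -> Filter:
    return lambda o: o < x


def between(x: Any, y: Any) -> Filter:
    # Assumed inclusive on both ends.
    return lambda o: x <= o <= y


def one_of(values: Iterable[Any]) -> Filter:
    allowed = set(values)
    return lambda o: o in allowed


def not_(f: Filter) -> Filter:
    return lambda o: not f(o)


# Filters from Example 1: Author contains "^Hacker", Year more than 2000.
author_ok = contains("^Hacker")
year_ok = more_than(2000)
print(author_ok("Hacker, J."), year_ok(2001))
```

Because every filter is just a function from a value to a boolean, Not(f) composes with any of the others without special cases.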

More MashQL Constructs : 

More MashQL Constructs: Restriction Operators {Required, Maybe, or Without}. All restrictions are required (i.e. AND), unless they are prefixed with “maybe” or “without”.
SELECT ?PersonName ?University
WHERE {
?Person :Name ?PersonName.
?Person :WorkFor :Yahoo.
OPTIONAL{?Person :StudyAt ?University}
OPTIONAL{?Person :Salary ?X1} FILTER (!Bound(?X1))}

More MashQL Constructs : 

More MashQL Constructs: Union operator (denoted as “\”) between Objects, Predicates, Subjects, and Queries.
Union between objects:
SELECT ?Person WHERE { {?Person :WorkFor :Google} UNION {?Person :WorkFor :Yahoo} }
Union between predicates:
SELECT ?FName WHERE { {?Person :Surname ?FName} UNION {?Person :Firstname ?FName} }
Union between subjects:
SELECT ?AgentName ?AgentPhone WHERE { {?Person rdf:type :Person. ?Person :Name ?AgentName. ?Person :Phone ?AgentPhone} UNION {?Company rdf:type :Company. ?Company :Name ?AgentName. ?Company :Phone ?AgentPhone} }
Union between queries:
SELECT ?CustName WHERE { {?Person :Name ?CustName} UNION {?Company :Title ?CustName. ?Company :City ?X1. FILTER regex(?X1, "Paris")} }

More MashQL Constructs : 

More MashQL Constructs: And several other constructs, including Types and Reverse Predicates, and Datatypes and Language Tags.

MashQL Queries : 

MashQL Queries: In the background, MashQL queries are translated into and executed as SPARQL queries. At the moment, we focus on RDF (/RDFa) as a data format, and SPARQL (/Oracle’s SPARQL) as a backend query language. However, MashQL can be easily mapped to other query languages. MashQL is not merely a user interface, but also a query language with its own intuition (it focuses on path patterns rather than triple patterns).

Slide 43: 

MashQL-SPARQL Mapping Rules (also mapped into SQL and Oracle’s SPARQL)
Rule 1. The marker symbol before a variable means that it will be returned in the results, i.e., included in the SELECT part of the SPARQL query. If the output of the query is input to another, use “CONSTRUCT *”.
Rule 2. In any of the following rules, if a subject, predicate, or object is italicized, it is seen as a SPARQL variable, i.e. prefixed with “?”.
Rule 3. If S is a subject and R = < , P, Of>, the mapping is: {S P O}.
Rule 4. If S is a subject and R = <maybe, P, Of>, the mapping is: {OPTIONAL{S P O}}.
Rule 5. If S is a subject and R = <without, P, Of>, the mapping is: {OPTIONAL{S P O} FILTER (!bound(?O))}.
Rule 6. If Of = <O, Equals(X, T, Lt)>: Append the mapping with: FILTER(?O = X). If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T). If Lt ≠ Null: Append the mapping with: FILTER(lang(?O) = Lt).
Rule 7. If Of = <O, Contains(X, T, Lt)>: Append the mapping with: FILTER regex(?O, X). If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T). If Lt ≠ Null: Append the mapping with: FILTER(lang(?O) = Lt).
Rule 8. If Of = <O, MoreThan(X, T)>: Append the mapping with: FILTER(?O > X). If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T).
Rule 9. If Of = <O, LessThan(X, T)>: Append the mapping with: FILTER(?O < X). If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T).
Rule 10. If Of = <O, Between(X, Y, T)>: Append the mapping with: FILTER(?O >= X && ?O <= Y). If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T).
Rule 11. If Of = <O, OneOf(V)>: Append the mapping with: FILTER(?O = V1 || . . . || ?O = Vn). If Vi is a regex-ed literal, the ith condition above should be replaced with: regex(?O, Vi).
Rule 12. If Of = <O, Not(f)>: The f filter will be generated as above, but with a negation.
Rule 13. If Of = <O, Qi(O)>: Repeat all mapping rules to generate Qi(O).
Rule 14. If a subject S is prefixed with “a” or “an”: Append the mapping with: {?S rdf:type :S}.
Rule 15. If an object O is prefixed with “a” or “an”: Append the mapping with: {?O rdf:type :O}.
Rule 16. Given O1, ..., On, if n > 1 and Oi ∈ I: The mapping in rules 3-4 will be: {{S P :O1} UNION . . . UNION {S P :On}}.
Rule 17. Given P1, ..., Pn, if n > 1 and Pi ∈ I: The mapping in rules 3-4 will be: {{S :P1 O} UNION . . . UNION {S :Pn O}}.
Rule 18. Given S1, ..., Sn, if n > 1 and Si ∈ I: Regenerate the query n times, each time with Si as a root, and with a UNION between the queries.
Rule 19. Given Q1, ..., Qn, if n > 1: Add UNION between the n queries.
Rule 20. If S is a subject and R = <~P, O>, the mapping is: {O P S}.
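A few of these rules can be sketched as a small Python translator. This is a hedged illustration, not the MashQL compiler: it covers only the SELECT part of Rule 1, the required/maybe/without prefixes, and four simple filters; the `Restriction` class and function names are hypothetical, and the "without" case is assumed to follow the OPTIONAL plus !bound pattern used in the restriction-operator example earlier in the deck.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Restriction:
    predicate: str                        # e.g. ":Name"
    obj: str                              # object term, e.g. "?PersonName" or ":Yahoo"
    prefix: Optional[str] = None          # None (required), "maybe", or "without"
    flt: Optional[Tuple[str, str]] = None  # e.g. ("MoreThan", "2000")
    output: bool = False                  # True if the object is returned (Rule 1)


def filter_clause(obj: str, flt: Tuple[str, str]) -> str:
    """Map a simple object filter to a SPARQL FILTER (datatype/lang omitted)."""
    kind, x = flt
    if kind == "Equals":
        return f"FILTER({obj} = {x})"
    if kind == "Contains":
        return f"FILTER regex({obj}, {x})"
    if kind == "MoreThan":
        return f"FILTER({obj} > {x})"
    if kind == "LessThan":
        return f"FILTER({obj} < {x})"
    raise ValueError(f"unsupported filter: {kind}")


def to_sparql(subject: str, restrictions: List[Restriction]) -> str:
    selected = [r.obj for r in restrictions if r.output]
    parts = []
    for r in restrictions:
        triple = f"{subject} {r.predicate} {r.obj}"
        if r.prefix == "maybe":
            part = f"OPTIONAL{{{triple}}}"
        elif r.prefix == "without":
            # Assumed pattern: match optionally, then keep only unbound rows.
            part = f"OPTIONAL{{{triple}}} FILTER(!bound({r.obj}))"
        else:
            part = f"{triple}."
        if r.flt is not None:
            part += " " + filter_clause(r.obj, r.flt)
        parts.append(part)
    return f"SELECT {' '.join(selected)} WHERE {{ {' '.join(parts)} }}"


query = to_sparql("?Person", [
    Restriction(":Name", "?PersonName", output=True),
    Restriction(":WorkFor", ":Yahoo"),
    Restriction(":StudyAt", "?University", prefix="maybe", output=True),
    Restriction(":Salary", "?X1", prefix="without"),
])
print(query)
```

Unions, sub-queries, type prefixes, and reverse predicates (Rules 13 to 20) are deliberately left out to keep the sketch short.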

MashQL Compilation : 

MashQL Compilation: MashQL Markup: an XML Schema to represent pipes in XML. The reference grammar (technical specification).

MashQL Compilation : 

MashQL Compilation: Depending on the pipeline structure, MashQL generates either SELECT or CONSTRUCT queries. SELECT returns the results in tabular form (e.g. ArticleTitle, Author). CONSTRUCT returns the results in triple form (e.g. Subject, Predicate, Object).
…
CONSTRUCT * WHERE {
?Job :JobIndustry ?X1. ?Job :Type ?X2. ?Job :Currency ?X3. ?Job :Salary ?X4.
FILTER(?X1="Education" || ?X1="HealthCare")
FILTER(?X2="Full-Time" || ?X2="Fulltime" || ?X2="Contract")
FILTER(regex(?X3, "^Euro") || regex(?X3, "^€"))
FILTER(?X4>=75000 && ?X4<=120000)}
…
SELECT ?Job ?Firm WHERE {
?Job :Location ?X1. ?X1 :Country ?X2.
FILTER(?X2="Italy" || ?X2="Spain" || ?X2="Greece" || ?X2="Cyprus")
OPTIONAL{{?Job :Organization ?Firm} UNION {?Job :Employer ?Firm}}}

Slide 46: 

System Model (Online Mashup Editor). Diagram components: Client, Mashup Server, Query Loader, Results Render. Diagram calls: Download (http), Bulk-load, B.Query (AJAX), RunQuery (http), DataSources (AJAX), Results (http). Example timings (Wikipedia Titles, 28 MB zip, 316 MB nt, 2.7 M triples): download 37 s at 600 KB/s; bulk-load into Oracle RDF 70 s at 40K triples per second; query one or a few seconds. The output of a mashup can be an input to another, enabling people to collaborate and innovate by building on each other’s results.

Slide 47: 

MashQL Editor (under construction)

Slide 48: 

MashQL Firefox Add-On (Light-mashups @ your browser)

Use Case: Job Seeking : 

Use Case: Job Seeking. A mashup of job vacancies based on Google Base and on Jobs.ac.uk.
…
CONSTRUCT * WHERE {
{{?Job :Category :Health} UNION {?Job :Category :Medicine}}
?Job :Role ?X1. ?Job :Salary ?X2. ?X2 :Currency :UPK. ?X2 :Minimum ?X3.
FILTER(?X1="Research" || ?X1="Academic")
FILTER(?X3 > 50000)}
…
CONSTRUCT * WHERE {
?Job :JobIndustry ?X1. ?Job :Type ?X2. ?Job :Currency ?X3. ?Job :Salary ?X4.
FILTER(?X1="Education" || ?X1="HealthCare")
FILTER(?X2="Full-Time" || ?X2="Fulltime" || ?X2="Contract")
FILTER(regex(?X3, "^Euro") || regex(?X3, "^€"))
FILTER(?X4>=75000 && ?X4<=120000)}
…
SELECT ?Job ?Firm WHERE {
?Job :Location ?X1. ?X1 :Country ?X2.
FILTER(?X2="Italy" || ?X2="Spain" || ?X2="Greece" || ?X2="Cyprus")
OPTIONAL{{?Job :Organization ?Firm} UNION {?Job :Employer ?Firm}}}

Use Case: My Citations : 

Use Case: My Citations. A mashup of cited articles by Hacker (excluding self-citations), over Scholar and Citeseer.

Use Case: eHealth Research : 

Use Case: eHealth Research. A mashup based on an eHealth database to find what causes prostate cancer. Add or remove restrictions until you retrieve all and only the people with prostate cancer (the restrictions are the symptoms).

Use Case: Retailers : 

Use Case: Retailers. A retailer mashup of three RDF data sources with user input of some barcode numbers. When scanning a product, retrieve its English and French titles directly from the manufacturer’s online catalog.

Use Case: Car Rental business Auditing : 

Use Case: Car Rental Business Auditing. A government connects to the databases of car rental companies to audit whether they comply with the local regulations. Each query is a business rule; if the results are not empty, there is a violation. Examples: vehicles rented without being insured; rentals to people without licenses; rentals to people without proper licenses.

Evaluation : 

Evaluation. First: Query Execution. The performance of executing a MashQL query is bounded by the performance of executing it in its backend language (i.e. SPARQL/SQL). A query of medium complexity takes one or a few seconds (Oracle’s SPARQL, [Chong et al 2007]).

Evaluation : 

Evaluation. Second: Background Queries. These are the queries that the MashQL editor performs in the background (to generate drop-down lists) while a user formulates his/her query. Executing background queries should be fast enough to allow efficient query formulation. Experiments over: DBLP data (12 million triples, 700 MB) and DBPedia data (25 million triples, 2.x GB).

Evaluation : 

Evaluation: MashQL From: RDF Input http://www.informatik.uni-trier.de/~ley (12 Million Triples). Article Title ArticleTitle Creator Name “^Berners-Lee^” Year > 1993

Evaluation of the Background Queries : 

Evaluation of the Background Queries: MashQL From: RDF Input http://www.informatik.uni-trier.de/~ley (12 Million Triples). Article [00.00], Everything. Background query: Select O FROM … (?S <rdf:type> ?O) … Group by O Order by O;

Evaluation of the Background Queries : 

Evaluation of the Background Queries: MashQL From: RDF Input http://www.informatik.uni-trier.de/~ley (12 Million Triples). Article [00.00] Title [00.03] ArticleTitle. Background query: Select P FROM … (?S <rdf:type> ?O) (?O ?P ?O1) … Group by P Order by P;

Evaluation of the Background Queries : 

Evaluation of the Background Queries: MashQL From: RDF Input http://www.informatik.uni-trier.de/~ley (12 Million Triples). Article [00.00] Title [00.03] ArticleTitle Creator [00.03]. Background query: Select P FROM … (?S <rdf:type> ?O) (?O ?P ?O1) … Group by P Order by P;

Evaluation of the Background Queries : 

Evaluation of the Background Queries: MashQL From: RDF Input http://www.informatik.uni-trier.de/~ley (12 Million Triples). Article [00.00] Title [00.03] ArticleTitle Creator [00.03] Name [00.43] Contains Berners-Lee. Background query: Select P FROM … (?S <rdf:type> ?O) (?O <:Creator> ?O1) (?O1 ?P ?O2) … Group by P Order by P;

Evaluation of the Background Queries : 

Evaluation of the Background Queries: MashQL From: RDF Input http://www.informatik.uni-trier.de/~ley (12 Million Triples). Article [00.00] Title [00.03] ArticleTitle Creator [00.03] Name [00.43] “^Berners-Lee^” Year [00.03] More 1994. Background query: Select P FROM … (?S <rdf:type> ?O) (?O ?P ?O1) … Group by P Order by P;

Evaluation of the Background Queries : 

Evaluation of the Background Queries: MashQL From: RDF Input http://www.informatik.uni-trier.de/~ley (12 Million Triples). Article [00.00] Title [00.03] ArticleTitle Creator [00.03] Name [00.43] “^Berners-Lee^” Year [00.03] > 1993
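The background queries in the preceding slides fill the editor's drop-down lists. As a rough sketch of what they compute, the Python below runs the same three lookups over a tiny in-memory list of triples instead of a triple store. The function names and the sample triples are made up for illustration; the real editor issues queries against the backend, as the slides show.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def list_types(triples: List[Triple]) -> List[str]:
    """Drop-down for the query subject: every distinct type in the source."""
    return sorted({o for _, p, o in triples if p == RDF_TYPE})


def list_properties(triples: List[Triple], type_name: str) -> List[str]:
    """Drop-down for a restriction: predicates used by instances of a type."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == type_name}
    return sorted({p for s, p, _ in triples if s in instances})


def list_properties_via(triples: List[Triple], type_name: str,
                        predicate: str) -> List[str]:
    """Drop-down one step down an information path (e.g. Article, Creator)."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == type_name}
    targets = {o for s, p, o in triples if s in instances and p == predicate}
    return sorted({p for s, p, _ in triples if s in targets})


# Made-up sample data, not taken from DBLP.
data: List[Triple] = [
    ("a1", RDF_TYPE, "Article"),
    ("a1", "Title", "Some Title"),
    ("a1", "Creator", "p1"),
    ("a1", "Year", "2001"),
    ("p1", RDF_TYPE, "Person"),
    ("p1", "Name", "Berners-Lee"),
]

print(list_types(data))
print(list_properties(data, "Article"))
print(list_properties_via(data, "Article", "Creator"))
```

Each lookup corresponds to one user step: pick a subject type, pick a property of that type, then expand a property to see what its values offer.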

Evaluation of the Background Queries : 

Evaluation of the Background Queries, Summary: Our goal is not to benchmark whether Oracle is fast and scalable, but to know whether Oracle’s speed is sufficient for MashQL interactivity. Answer: yes.

Conclusions : 

Conclusions: A formal yet simple query language for the Data Web, in a mashup and declarative style. Allows people to discover and navigate unknown data spaces (graphs) without prior knowledge about the schema or technical details. Can be used for general-purpose data retrieval and filtering (rather than only for sophisticated mashups). Query cursors: to cache the history of information paths. Formal framework for query pipelines: caching, materialization. Query distribution and scheduling.

Question : 

Question

Thank You : 

Thank You