Share PowerPoint. Anywhere!

XMLDB M8 2005

Uploaded from authorPOINT Lite
Download as Download Not Available PPT
Presentation Description

No description available

Views: 8
Like it  ( Likes) Dislike it  ( Dislikes)
Added: November 01, 2007 This presentation is Public
Presentation Category :Entertainment
Tags Add Tags
Presentation StatisticsNew!
Views on authorSTREAM: 8
Presentation Transcript

Module 8 : Module 8 Push-based communication (RSS, Publish/Subscribe, Information Filtering)


Outline : Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information Filtering


Detour into web services : Detour into web services This is an excerpt of module 4 (which was omitted) But web service also provide a good motivation on the rest of today‘s lecture Important: Web services are XML all over


Technology overview : Technology overview XML SOAP: message format WSDL: service description UDDI: service directory XQuery BPEL XL …


SOAP : SOAP SOAP = Simple Object Access Protocol W3C Standard; current version 1.2 Communication between applications (e.g. RPC, Streams of Sensor Data) Defines Layout (Type) of Messages Use in Internet and through Firewalls Platform- and PL independent Based on XML Simple and extensible Basis for further standards (Encryption, ...)


WSDL : WSDL Web Service Description Language Describes the Interface of a Web Service Call of a Web Service done via SOAP Allows the registration of services Basis for UDDI Syntax is XML


UDDI : UDDI Universal Description Discovery Integration Directory which stores WSDL „Jini“ for the Web, „yellow pages“ Communicates via SOAP Messages Organized in white, yellow and green pages white, yellow pages: Informationen about Providers green pages: WSDL of Services IBM and Microsoft have public UDDI Server Today, typically used in Intranet


Use cases for Web Services : Use cases for Web Services Automization of Processes Enterprise Application Integration (EAI) Workflow Management Data Integration Enterprise Information Integration (EII) (Connectivity, Global Data Model) Portals Integration, Integration, Integration


Application Integration : Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IX


Application Integration : Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IX XII XI


Application Integration : Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IX X XI What impact do delays have? Who is affected by a change in one interface? How can this process be optimized? What about humans? How to exploit a Grid of machines?


Loose Coupling of Apps : Loose Coupling of Apps App App App App Message Broker


Loose Coupling of Apps (Web Services) : Loose Coupling of Apps (Web Services) App App App App Message Broker Web Services SOAP SOAP WSDL WSDL WSDL WSDL Routing? Virtualisation?


Virtualisierung von Anwendungen : Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & Bind


Virtualisierung von Anwendungen : Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & Bind x1 I want y!


Virtualisierung von Anwendungen : Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & Bind x1 x2,x3 I want y! y


Virtualisierung von Anwendungen : Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! x1,x2,x3 Find & Bind x1 x2,x3 I want y! y


Summary Web Services : Summary Web Services There is a lot more in web services see module 4 Important: Loose coupling of applications Virtualization Service discovery Common protocols And now back to our main track – push-style interaction


Outline : Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information Filtering


Is „Pull“ the winner? : Is „Pull“ the winner? Most of our interaction with the Web and Databases is „Pull“ Browse the web to find the pictures of my friend‘s vacation on some remote island Query the database to find out which books were the the top sellers in Zurich around Christmas Invoke a web service to compute π up to ten billion digits Is this the whole story?


Examples of „Push“ : Examples of „Push“ E-Mail communication: Who uses a Blackberry? Who sends more than a 100 SMS/month? Event notification Information about offers (apartments, cars, jobs) News tickers Sensor data Put 100.000 very cheap (10 cent) sensors all over ETH Notify building security if temperature exceeds 55 degree centigrade


Factors favoring „Push“ : Factors favoring „Push“ Long-standing interest Very low or very high update rate 1 update per week 100 updates per second Large number of independant sources Watching 100.000 news sites all over the world is impossible, but watching an RSS feed via Google News is certainly possible Scalability: many users want the same thing E.g., this lecture, TV, …


Outline : Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information Filtering


RSS : RSS Content syndication: News tickers Blogs Alerts Simple XML format Lightweight Still some get it wrong 


RSS 2.0 : RSS 2.0 Simple Message Format for Data Push    ...    ...


RSS Items and Types : RSS Items and Types


RSS 2.0 example : RSS 2.0 example D-INFK Events Events of the Department of Computer Science, ETH Zurich http://www.inf.ethz.ch/news/events/ http://www.inf.ethz.ch/rssTue, 17 Jan 2006 11:06:04 GMT http://www.inf.ethz.ch/rss/inf-logo.png Department of Computer Science http://www.inf.ethz.ch/ 140 35 Establishing trust in electronic business correspondence http://www.inf.ethz.ch/news/events/details/index?id=593 ZISC Colloquium Tuesday, 17 January 2006 17:15, by: Dr. Ralf Hauser Privasphere AG 2006-01-17 http://www.inf.ethz.ch/news/events/details/index?id=593


RSS Notes : RSS Notes Format wars: RSS 0.91, 1.0, 2.0, Atom All major news sites use it now Blogs would not work without it Currently targeted to human-machine communication Might be a good candidate for push-style machine-machine communication, too Ironically, RSS currently uses push-only as interaction model pull at the communication level


Outline : Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information Filtering


XML Message Brokering : XML Message Broker                 Weather and Tide Updates for Norfolk By John Smith ……. client queries query results XML Message Brokering XML messages           Weather and Tide Updates for Norfolk …….   Weather and Tide Updates for Norfolk By John Smith ……. Q1 Q2 Q3 Q4   Filtering Transformation Routing


Message-based Middleware : Message-based Middleware Publish/Subscribe Subscribers express interests, later notified of relevant data from publishers. Loose coupling at the communication level. XML, a de facto standard for online data exchange Flexible, extensible, self-describing. Enhanced functionality: XSLT, XQuery, … Loose coupling at the content level. XML message brokering Publish/subscribe + XML = flexibility at communication and content levels. Declarative XML queries provide high functionality.


New Applications : Message brokering supports a large number of emerging distributed applications: Application integration Personalized newspaper generation Stock tickers Network monitoring Mobile services … New Applications


Problem Statement : Problem Statement Inputs: (1) continuously arriving XML messages (usually small) (2) a set of XQuery queries representing client interests Main functions of an XML message broker: Filtering: matches messages to query predicates. Transformation: restructures the matching messages. Routing: directs messages to queries over a network of brokers. Challenges: providing this functionality for large numbers of queries (e.g., 10’s thousands of them) high volumes of XML messages (e.g., tens or hundreds/sec)


Design Space : Design Space TIBCO MQ Pub/Sub JMS Pub/Sub Siena[SIGCOMM03] Gryphon [PODC99] xmlBlaster Snoeren et al.[SOSP01] Le Subscribe [SIGMOD01] YFilter [VLDB03] ONYX [VLDB04] Oracle Advanced Queuing Subject- based Predicate- based XML filtering XML filtering & transformation Distribution Expressive-ness         Weather and Tide Updates for Norfolk


YFilter & ONYX : YFilter & ONYX YFilter, a system for XML filtering and transformation. Filtering exploiting sharing: Order-of-magnitude performance benefits over previous work. Scalable to 100’s thousands of distinct queries. YFilter 1.0 release: used in research projects and product development, being integrated into Apache Hermes for WS-Notification. Transformation exploiting sharing: The first algorithm for transformation for a large set of queries. Scalable up to 10’s of thousands of distinct queries. Routing (ONYX): an overlay network of brokers with routing abilities, providing flexible, Internet-scale XML dissemination services.


The Filtering Problem : The Filtering Problem Full XPath/XQuery too expensive  Query language: path expression = ( (‘/’ | ‘//’) (ElementName | ‘*’) Predicate* )+ The filtering problem: Given (1) a set Q = Q, …, Qn of path queries, where each Qi has an associated query identifier, and (2) a stream of XML documents. Compute, for each document D, the set of query identifiers corresponding to the XPath queries that match D.


Constructing an FSM for a Query : Constructing an FSM for a Query Location steps FSM fragments Map location steps to FSM fragments. Concatenate FSM fragments for location steps in a query. Query “/a//b” Key Idea: represent query paths as state machine that are driven by the XML parser (SAX) Simple paths: ( (“/” | “//”) (ElementName | “*”) )+ A finite state machine (FSM) for each path: mapping steps to machine states.


Constructing the Combined FSM : YFilter builds a single combined FSM for all paths! Complete prefix sharing among paths. Nondeterministic Finite Automaton (NFA)-based implementation: a small machine size, flexible, easy to maintain, etc. Output function (Moore machine): accepting states → partition of query ids. Constructing the Combined FSM Q1=/a/b Q2=/a/c Q3=/a/b/c Q4=/a//b/c Q5=/a/*/b Q6=/a//c Q7=/a/*/*/c Q8=/a/b/c


Execution Algorithm : YFilter uses a stack mechanism to handle XML Backtracking in the NFA. No repeated work for the same element! Execution Algorithm Runtime Stack NFA


DFA vs. NFA : DFA vs. NFA DFA has exponential number of states Large main-memory requirements Or I/O needed in order to process messages DFA has high maintenance costs Need to rerun Myhill/Büchi algorithm, everytime a new profile is posted or deleted NFA is slower than DFA NFA: entries in stack can grow exponentially In practice, XML documents are fairly flat NFA is the clear winner (current trade-offs)!


Performance results for YFilter : Performance results for YFilter YFilter scales to 150,000 distinct path queries w/o predicates. Consistently takes 30 msec or less. Achieves a 25x performance improvement over previous approaches Deep element nesting: No exponential blow-up of active states. Sensitivity to ‘*’ and “//”: Little, due to effective prefix sharing. NFA maintenance for query updates: Tens of milliseconds for inserting 1000 queries. YFilter handles 100’s thousands of queries with predicates. No real competition before Mechanism not shown here. What are the difficulties?


Outline : Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information Filtering


Context-aware information filtering : Context-aware information filtering This is our (my) current research project Publish/Subscribe systems like YFilter assume a fairly static profile set: „I am interested in all messages regarding soccer“ In reality, matches are often influenced by some state, external events: „Give me all soccer results if they influence the ranking of my favorite club. If I am in the office, send me a small video clip, if I only have a mobile phone, send me a SMS“ We call those influencing factors „context“


Example of context : Example of context I am at work I am at home I am on a plane I have 200 unread mails I have IBM stock I sold all my Euros Some state (context) is referenced in the profile and included into the matching decision


Examples of context usage in profiles : Examples of context usage in profiles Only send me today‘s mensa menu if I am in less than 500 m distance to HG/CAB Only send me the train delay messages if I plan to take that train or need to pick someone up who is on that train Route the order message to a warehouse that has all items in stock


Context-Aware Information Filter : Context-Aware Information Filter State updates can come from external sources State change does not directly trigger a reaction – it merely affects further matching decisions Context updates compete with messages in the filter Information Filter Stream of messages Stream of (Message, Matched Profiles) profiles/ rules Stream of context updates


Challenges with Context-Aware IF : Challenges with Context-Aware IF Information filters are optimized for: High throughput of messages Large number of profiles Static profiles Context usage adds: Dynamic profiles with many context updates Fluctuation in update rates => Integrated IF + Context Management


Components of an IF : Components of an IF Profile Indexes: produce candidates for matching profiles (eliminate most other candidates) Merge: combine results of index lookups, (evaluate combinations of simple predicates) Postfilter: evaluate predicates individually for each profile (eliminate all false positives) Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M


Architecture of an CIF : Architecture of an CIF Context Management: keep track of context state Propagate the changes to the indexes where this information used Postfilter also refers to context information Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U


The solution: AGILE (Adaptive Generic Indexing with Local Escalations) : The solution: AGILE (Adaptive Generic Indexing with Local Escalations) Treat index as filter (allow false positives) Adapt to workload: High update rate: Make index less accurate (Fewer updates, more false positives) Decreasing update rate: Make index more accurate (More updates, fewer false positives) Control accuracy in granularity of profile


Decrease accuracy: Escalation example : Decrease accuracy: Escalation example 5 {} {} {} 6 {} {} {} 2 {} { ,B} {} A = 2 B = 2 A 3 { } A<5 Update A = 3 Index Context Management


Probe on escalated item : Probe on escalated item 6 {} {} {} 2 {} {B} {} {A} Result for Probe(2): { A ,B} {} {} 5 A = B = 2 3


Update on escalated item : Update on escalated item 6 {} {} {} 2 {} {B} {} {A} Update A = 1 5 {} {} A = 3 B = 2 1


Deescalation of an escalated item : Deescalation of an escalated item 6 {} {} {} 2 { } {B} {} { } 5 {} {} A A = B = 2 1 1 {} {} {A}


Index accuracy control in AGILE : Index accuracy control in AGILE Two new operations: Escalate: decrease accuracy Deescalate: increase accuracy Generic, applicable to all tree indexes Fine-grained adaptivity Open Issue: Control Policy Our solution: Escalation on every update (unless not necessary) Deescalation by feedback of false positives


Accuracy control by feedback : Accuracy control by feedback Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Cost for message handling Cost for update handling Index cost Postfilter cost Feedback of false positives Accuracy balance High Low


Accuracy control: more updates : Accuracy control: more updates Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Feedback of false positives Decrease accuracy High Low


Accurcy control: fewer updates : Accurcy control: fewer updates Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Feedback of false positives Increase accuracy High Low


AGILE benefits : AGILE benefits Adaptation to changes in update rate No Updates => Accurate Index Many Updates => Increasingly fuzzy index Adaptation to value skew in workload Updates in one area, probes in another => Fuzzy in the first, accurate in the other When skew changes, the index will change, too Overhead: Filtering false positives Accuracy control


Some performance numbers : Some performance numbers 500 K profiles, 8 attributes, range predicates No updates: 450 messages/sec Only updates: 1 million updates/sec Change between those extremes with some hundred messages


Status of Context-Aware IF : Status of Context-Aware IF Using context in IF enables richer profiles, but introduces frequent updates AGILE deals with the update/probe balance: Flexible index accuracy Automatic tuning by feedback Applicable to any tree-like index AGILE works well: Best performance in middle ground Low overhead at the extremes Good adaptation to load changes No manual intervention needed during runtime


Research Topics in IF (also Thesis Ad) : Research Topics in IF (also Thesis Ad) AGILE on “classic” indexes (B-Tree, R-Tree), also on disk Reliability How do you make sure that all events are processed, even if there machine crashes or network outages? QoS issues in IF How do you provide certain properties of service, e.g. latency or throughput? Adaptivity/Feedback for other aspects Go to http://www.dbis.ethz.ch/education/Theses/infofilt (No RSS yet )