logging in or signing up XMLDB M8 2005 Natalya Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 77 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 01, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Module 8: Module 8 Push-based communication (RSS, Publish/Subscribe, Information Filtering)Outline: Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information FilteringDetour into web services: Detour into web services This is an excerpt of module 4 (which was omitted) But web service also provide a good motivation on the rest of today‘s lecture Important: Web services are XML all overTechnology overview: Technology overview XML SOAP: message format WSDL: service description UDDI: service directory XQuery BPEL XL …SOAP: SOAP SOAP = Simple Object Access Protocol W3C Standard; current version 1.2 Communication between applications (e.g. RPC, Streams of Sensor Data) Defines Layout (Type) of Messages Use in Internet and through Firewalls Platform- and PL independent Based on XML Simple and extensible Basis for further standards (Encryption, ...)WSDL: WSDL Web Service Description Language Describes the Interface of a Web Service Call of a Web Service done via SOAP Allows the registration of services Basis for UDDI Syntax is XMLUDDI: UDDI Universal Description Discovery Integration Directory which stores WSDL „Jini“ for the Web, „yellow pages“ Communicates via SOAP Messages Organized in white, yellow and green pages white, yellow pages: Informationen about Providers green pages: WSDL of Services IBM and Microsoft have public UDDI Server Today, typically used in IntranetUse cases for Web Services: Use cases for Web Services Automization of Processes Enterprise Application Integration (EAI) Workflow Management Data Integration Enterprise Information Integration (EII) (Connectivity, Global Data Model) Portals Integration, Integration, IntegrationApplication Integration: Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IXApplication Integration: Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IX XII XIApplication Integration: Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IX X XI What impact do delays have? Who is affected by a change in one interface? How can this process be optimized? What about humans? How to exploit a Grid of machines?Loose Coupling of Apps: Loose Coupling of Apps App App App App Message BrokerLoose Coupling of Apps (Web Services): Loose Coupling of Apps (Web Services) App App App App Message Broker Web Services SOAP SOAP WSDL WSDL WSDL WSDL Routing? Virtualisation?Virtualisierung von Anwendungen: Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & BindVirtualisierung von Anwendungen: Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & Bind x1 I want y!Virtualisierung von Anwendungen: Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & Bind x1 x2,x3 I want y! yVirtualisierung von Anwendungen: Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! x1,x2,x3 Find & Bind x1 x2,x3 I want y! ySummary Web Services: Summary Web Services There is a lot more in web services see module 4 Important: Loose coupling of applications Virtualization Service discovery Common protocols And now back to our main track – push-style interactionOutline: Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information FilteringIs „Pull“ the winner?: Is „Pull“ the winner? Most of our interaction with the Web and Databases is „Pull“ Browse the web to find the pictures of my friend‘s vacation on some remote island Query the database to find out which books were the the top sellers in Zurich around Christmas Invoke a web service to compute π up to ten billion digits Is this the whole story?Examples of „Push“: Examples of „Push“ E-Mail communication: Who uses a Blackberry? Who sends more than a 100 SMS/month? Event notification Information about offers (apartments, cars, jobs) News tickers Sensor data Put 100.000 very cheap (10 cent) sensors all over ETH Notify building security if temperature exceeds 55 degree centigrade Factors favoring „Push“: Factors favoring „Push“ Long-standing interest Very low or very high update rate 1 update per week 100 updates per second Large number of independant sources Watching 100.000 news sites all over the world is impossible, but watching an RSS feed via Google News is certainly possible Scalability: many users want the same thing E.g., this lecture, TV, …Outline: Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information FilteringRSS: RSS Content syndication: News tickers Blogs Alerts Simple XML format Lightweight Still some get it wrong RSS 2.0: RSS 2.0 Simple Message Format for Data Push <channel> <item> ... <cal:startTime>...</cal:startTime> </item>… </channel> RSS Items and Types: RSS Items and TypesRSS 2.0 example: RSS 2.0 example <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <channel> <title>D-INFK Events</title> <description>Events of the Department of Computer Science, ETH Zurich</description> <link>http://www.inf.ethz.ch/news/events/</link> <docs>http://www.inf.ethz.ch/rss</docs> … <pubDate>Tue, 17 Jan 2006 11:06:04 GMT</pubDate> <image> <url>http://www.inf.ethz.ch/rss/inf-logo.png</url> <title>Department of Computer Science</title> <link>http://www.inf.ethz.ch/</link> <width>140</width> <height>35</height> </image> <item rdf:about="http://www.inf.ethz.ch/news/events/details/index?id=593"> <title>Establishing trust in electronic business correspondence</title> <link>http://www.inf.ethz.ch/news/events/details/index?id=593</link> <category>ZISC Colloquium</category> <description>Tuesday, 17 January 2006 17:15, by: Dr. Ralf Hauser Privasphere AG</description> <dc:date>2006-01-17</dc:date> <guid>http://www.inf.ethz.ch/news/events/details/index?id=593</guid> </item> RSS Notes: RSS Notes Format wars: RSS 0.91, 1.0, 2.0, Atom All major news sites use it now Blogs would not work without it Currently targeted to human-machine communication Might be a good candidate for push-style machine-machine communication, too Ironically, RSS currently uses push-only as interaction model pull at the communication levelOutline: Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information FilteringXML Message Brokering: XML Message Broker <?xml version="1.0" ?> <nitf version="-//IPTC-NAA//DTD NITF-XML 2.1//EN" > <head> <tobject tobject.type="news"> <tobject.subject tobject.subject.type="Weather"/> <tobject.subject tobject.subject.matter="Statistics"/> </tobject> <docdata doc-idref="iptc.32.a"> <doc-id id-string="iptc.32.b" /> <evloc city="Norfolk" state-prov="VA" iso-cc="US" /> <series series.name="Tide Forecasts" series.part="5"/> </docdata> </head> <body> <body.head> <hedline><hl1>Weather and Tide Updates for Norfolk</hl1> </hedline> <byline>By <person>John Smith</person></byline> </body.head> ……. client queries query results XML Message Brokering XML messages <?xml version="1.0" ?> <nitf version="-//IPTC-NAA//DTD NITF-XML 2.1//EN" > <head> <tobject tobject.type="news"> <tobject.subject tobject.subject.type="Weather"/> <tobject.subject tobject.subject.matter="Statistics"/> </tobject> </head> <body> <body.head> <hedline><hl1>Weather and Tide Updates for Norfolk</hl1> </hedline> </body.head> ……. <?xml version="1.0" ?> <nitf version="-//IPTC-NAA//DTD NITF-XML 2.1//EN" > <body> <body.head> <hedline><hl1>Weather and Tide Updates for Norfolk</hl1> </hedline> <byline>By <person>John Smith</person></byline> </body.head> ……. Q1 Q2 Q3 Q4 Filtering Transformation RoutingMessage-based Middleware: Message-based Middleware Publish/Subscribe Subscribers express interests, later notified of relevant data from publishers. Loose coupling at the communication level. XML, a de facto standard for online data exchange Flexible, extensible, self-describing. Enhanced functionality: XSLT, XQuery, … Loose coupling at the content level. XML message brokering Publish/subscribe + XML = flexibility at communication and content levels. Declarative XML queries provide high functionality.New Applications: Message brokering supports a large number of emerging distributed applications: Application integration Personalized newspaper generation Stock tickers Network monitoring Mobile services … New ApplicationsProblem Statement: Problem Statement Inputs: (1) continuously arriving XML messages (usually small) (2) a set of XQuery queries representing client interests Main functions of an XML message broker: Filtering: matches messages to query predicates. Transformation: restructures the matching messages. Routing: directs messages to queries over a network of brokers. Challenges: providing this functionality for large numbers of queries (e.g., 10’s thousands of them) high volumes of XML messages (e.g., tens or hundreds/sec)Design Space: Design Space TIBCO MQ Pub/Sub JMS Pub/Sub Siena[SIGCOMM03] Gryphon [PODC99] xmlBlaster Snoeren et al.[SOSP01] Le Subscribe [SIGMOD01] YFilter [VLDB03] ONYX [VLDB04] Oracle Advanced Queuing Subject- based Predicate- based XML filtering XML filtering & transformation Distribution Expressive-ness <?xml version="1.0" ?> <nitf version="-//DTD NITF-XML 2.1//EN" > <head> <tobject tobject.type="news"> <tobject.subject tobject.subject.type="Weather"/> </tobject> </head> <body> <hedline><hl1>Weather and Tide Updates for Norfolk</hl1> </body> </nitf>YFilter & ONYX: YFilter & ONYX YFilter, a system for XML filtering and transformation. Filtering exploiting sharing: Order-of-magnitude performance benefits over previous work. Scalable to 100’s thousands of distinct queries. YFilter 1.0 release: used in research projects and product development, being integrated into Apache Hermes for WS-Notification. Transformation exploiting sharing: The first algorithm for transformation for a large set of queries. Scalable up to 10’s of thousands of distinct queries. Routing (ONYX): an overlay network of brokers with routing abilities, providing flexible, Internet-scale XML dissemination services. The Filtering Problem: The Filtering Problem Full XPath/XQuery too expensive Query language: path expression = ( (‘/’ | ‘//’) (ElementName | ‘*’) Predicate* )+ The filtering problem: Given (1) a set Q = Q, …, Qn of path queries, where each Qi has an associated query identifier, and (2) a stream of XML documents. Compute, for each document D, the set of query identifiers corresponding to the XPath queries that match D.Constructing an FSM for a Query: Constructing an FSM for a Query Location steps FSM fragments Map location steps to FSM fragments. Concatenate FSM fragments for location steps in a query. Query “/a//b” Key Idea: represent query paths as state machine that are driven by the XML parser (SAX) Simple paths: ( (“/” | “//”) (ElementName | “*”) )+ A finite state machine (FSM) for each path: mapping steps to machine states.Constructing the Combined FSM : YFilter builds a single combined FSM for all paths! Complete prefix sharing among paths. Nondeterministic Finite Automaton (NFA)-based implementation: a small machine size, flexible, easy to maintain, etc. Output function (Moore machine): accepting states → partition of query ids. Constructing the Combined FSM Q1=/a/b Q2=/a/c Q3=/a/b/c Q4=/a//b/c Q5=/a/*/b Q6=/a//c Q7=/a/*/*/c Q8=/a/b/cExecution Algorithm: YFilter uses a stack mechanism to handle XML Backtracking in the NFA. No repeated work for the same element! <b> <c> </c> Execution Algorithm Runtime Stack NFA DFA vs. NFA: DFA vs. NFA DFA has exponential number of states Large main-memory requirements Or I/O needed in order to process messages DFA has high maintenance costs Need to rerun Myhill/Büchi algorithm, everytime a new profile is posted or deleted NFA is slower than DFA NFA: entries in stack can grow exponentially In practice, XML documents are fairly flat NFA is the clear winner (current trade-offs)!Performance results for YFilter : Performance results for YFilter YFilter scales to 150,000 distinct path queries w/o predicates. Consistently takes 30 msec or less. Achieves a 25x performance improvement over previous approaches Deep element nesting: No exponential blow-up of active states. Sensitivity to ‘*’ and “//”: Little, due to effective prefix sharing. NFA maintenance for query updates: Tens of milliseconds for inserting 1000 queries. YFilter handles 100’s thousands of queries with predicates. No real competition before Mechanism not shown here. What are the difficulties?Outline: Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information FilteringContext-aware information filtering: Context-aware information filtering This is our (my) current research project Publish/Subscribe systems like YFilter assume a fairly static profile set: „I am interested in all messages regarding soccer“ In reality, matches are often influenced by some state, external events: „Give me all soccer results if they influence the ranking of my favorite club. If I am in the office, send me a small video clip, if I only have a mobile phone, send me a SMS“ We call those influencing factors „context“Example of context: Example of context I am at work I am at home I am on a plane I have 200 unread mails I have IBM stock I sold all my Euros Some state (context) is referenced in the profile and included into the matching decisionExamples of context usage in profiles: Examples of context usage in profiles Only send me today‘s mensa menu if I am in less than 500 m distance to HG/CAB Only send me the train delay messages if I plan to take that train or need to pick someone up who is on that train Route the order message to a warehouse that has all items in stockContext-Aware Information Filter: Context-Aware Information Filter State updates can come from external sources State change does not directly trigger a reaction – it merely affects further matching decisions Context updates compete with messages in the filter Information Filter Stream of messages Stream of (Message, Matched Profiles) profiles/ rules Stream of context updatesChallenges with Context-Aware IF: Challenges with Context-Aware IF Information filters are optimized for: High throughput of messages Large number of profiles Static profiles Context usage adds: Dynamic profiles with many context updates Fluctuation in update rates => Integrated IF + Context ManagementComponents of an IF: Components of an IF Profile Indexes: produce candidates for matching profiles (eliminate most other candidates) Merge: combine results of index lookups, (evaluate combinations of simple predicates) Postfilter: evaluate predicates individually for each profile (eliminate all false positives) Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* MArchitecture of an CIF: Architecture of an CIF Context Management: keep track of context state Propagate the changes to the indexes where this information used Postfilter also refers to context information Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U The solution: AGILE (Adaptive Generic Indexing with Local Escalations): The solution: AGILE (Adaptive Generic Indexing with Local Escalations) Treat index as filter (allow false positives) Adapt to workload: High update rate: Make index less accurate (Fewer updates, more false positives) Decreasing update rate: Make index more accurate (More updates, fewer false positives) Control accuracy in granularity of profile Decrease accuracy: Escalation example: Decrease accuracy: Escalation example 5 {} {} {} 6 {} {} {} 2 {} { ,B} {} A = 2 B = 2 A 3 { } A<5 Update A = 3 Index Context ManagementProbe on escalated item : Probe on escalated item 6 {} {} {} 2 {} {B} {} {A} Result for Probe(2): { A ,B} {} {} 5 A = B = 2 3Update on escalated item : Update on escalated item 6 {} {} {} 2 {} {B} {} {A} Update A = 1 5 {} {} A = 3 B = 2 1Deescalation of an escalated item: Deescalation of an escalated item 6 {} {} {} 2 { } {B} {} { } 5 {} {} A A = B = 2 1 1 {} {} {A}Index accuracy control in AGILE: Index accuracy control in AGILE Two new operations: Escalate: decrease accuracy Deescalate: increase accuracy Generic, applicable to all tree indexes Fine-grained adaptivity Open Issue: Control Policy Our solution: Escalation on every update (unless not necessary) Deescalation by feedback of false positivesAccuracy control by feedback: Accuracy control by feedback Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Cost for message handling Cost for update handling Index cost Postfilter cost Feedback of false positives Accuracy balance High LowAccuracy control: more updates: Accuracy control: more updates Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Feedback of false positives Decrease accuracy High Low Accurcy control: fewer updates: Accurcy control: fewer updates Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Feedback of false positives Increase accuracy High Low AGILE benefits: AGILE benefits Adaptation to changes in update rate No Updates => Accurate Index Many Updates => Increasingly fuzzy index Adaptation to value skew in workload Updates in one area, probes in another => Fuzzy in the first, accurate in the other When skew changes, the index will change, too Overhead: Filtering false positives Accuracy controlSome performance numbers: Some performance numbers 500 K profiles, 8 attributes, range predicates No updates: 450 messages/sec Only updates: 1 million updates/sec Change between those extremes with some hundred messagesStatus of Context-Aware IF: Status of Context-Aware IF Using context in IF enables richer profiles, but introduces frequent updates AGILE deals with the update/probe balance: Flexible index accuracy Automatic tuning by feedback Applicable to any tree-like index AGILE works well: Best performance in middle ground Low overhead at the extremes Good adaptation to load changes No manual intervention needed during runtimeResearch Topics in IF(also Thesis Ad): Research Topics in IF (also Thesis Ad) AGILE on “classic” indexes (B-Tree, R-Tree), also on disk Reliability How do you make sure that all events are processed, even if there machine crashes or network outages? QoS issues in IF How do you provide certain properties of service, e.g. latency or throughput? Adaptivity/Feedback for other aspects Go to http://www.dbis.ethz.ch/education/Theses/infofilt (No RSS yet ) You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
XMLDB M8 2005 Natalya Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 77 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 01, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Module 8: Module 8 Push-based communication (RSS, Publish/Subscribe, Information Filtering)Outline: Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information FilteringDetour into web services: Detour into web services This is an excerpt of module 4 (which was omitted) But web service also provide a good motivation on the rest of today‘s lecture Important: Web services are XML all overTechnology overview: Technology overview XML SOAP: message format WSDL: service description UDDI: service directory XQuery BPEL XL …SOAP: SOAP SOAP = Simple Object Access Protocol W3C Standard; current version 1.2 Communication between applications (e.g. RPC, Streams of Sensor Data) Defines Layout (Type) of Messages Use in Internet and through Firewalls Platform- and PL independent Based on XML Simple and extensible Basis for further standards (Encryption, ...)WSDL: WSDL Web Service Description Language Describes the Interface of a Web Service Call of a Web Service done via SOAP Allows the registration of services Basis for UDDI Syntax is XMLUDDI: UDDI Universal Description Discovery Integration Directory which stores WSDL „Jini“ for the Web, „yellow pages“ Communicates via SOAP Messages Organized in white, yellow and green pages white, yellow pages: Informationen about Providers green pages: WSDL of Services IBM and Microsoft have public UDDI Server Today, typically used in IntranetUse cases for Web Services: Use cases for Web Services Automization of Processes Enterprise Application Integration (EAI) Workflow Management Data Integration Enterprise Information Integration (EII) (Connectivity, Global Data Model) Portals Integration, Integration, IntegrationApplication Integration: Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IXApplication Integration: Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IX XII XIApplication Integration: Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IX X XI What impact do delays have? Who is affected by a change in one interface? How can this process be optimized? What about humans? How to exploit a Grid of machines?Loose Coupling of Apps: Loose Coupling of Apps App App App App Message BrokerLoose Coupling of Apps (Web Services): Loose Coupling of Apps (Web Services) App App App App Message Broker Web Services SOAP SOAP WSDL WSDL WSDL WSDL Routing? Virtualisation?Virtualisierung von Anwendungen: Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & BindVirtualisierung von Anwendungen: Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & Bind x1 I want y!Virtualisierung von Anwendungen: Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & Bind x1 x2,x3 I want y! yVirtualisierung von Anwendungen: Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! x1,x2,x3 Find & Bind x1 x2,x3 I want y! ySummary Web Services: Summary Web Services There is a lot more in web services see module 4 Important: Loose coupling of applications Virtualization Service discovery Common protocols And now back to our main track – push-style interactionOutline: Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information FilteringIs „Pull“ the winner?: Is „Pull“ the winner? Most of our interaction with the Web and Databases is „Pull“ Browse the web to find the pictures of my friend‘s vacation on some remote island Query the database to find out which books were the the top sellers in Zurich around Christmas Invoke a web service to compute π up to ten billion digits Is this the whole story?Examples of „Push“: Examples of „Push“ E-Mail communication: Who uses a Blackberry? Who sends more than a 100 SMS/month? Event notification Information about offers (apartments, cars, jobs) News tickers Sensor data Put 100.000 very cheap (10 cent) sensors all over ETH Notify building security if temperature exceeds 55 degree centigrade Factors favoring „Push“: Factors favoring „Push“ Long-standing interest Very low or very high update rate 1 update per week 100 updates per second Large number of independant sources Watching 100.000 news sites all over the world is impossible, but watching an RSS feed via Google News is certainly possible Scalability: many users want the same thing E.g., this lecture, TV, …Outline: Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information FilteringRSS: RSS Content syndication: News tickers Blogs Alerts Simple XML format Lightweight Still some get it wrong RSS 2.0: RSS 2.0 Simple Message Format for Data Push <channel> <item> ... <cal:startTime>...</cal:startTime> </item>… </channel> RSS Items and Types: RSS Items and TypesRSS 2.0 example: RSS 2.0 example <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <channel> <title>D-INFK Events</title> <description>Events of the Department of Computer Science, ETH Zurich</description> <link>http://www.inf.ethz.ch/news/events/</link> <docs>http://www.inf.ethz.ch/rss</docs> … <pubDate>Tue, 17 Jan 2006 11:06:04 GMT</pubDate> <image> <url>http://www.inf.ethz.ch/rss/inf-logo.png</url> <title>Department of Computer Science</title> <link>http://www.inf.ethz.ch/</link> <width>140</width> <height>35</height> </image> <item rdf:about="http://www.inf.ethz.ch/news/events/details/index?id=593"> <title>Establishing trust in electronic business correspondence</title> <link>http://www.inf.ethz.ch/news/events/details/index?id=593</link> <category>ZISC Colloquium</category> <description>Tuesday, 17 January 2006 17:15, by: Dr. Ralf Hauser Privasphere AG</description> <dc:date>2006-01-17</dc:date> <guid>http://www.inf.ethz.ch/news/events/details/index?id=593</guid> </item> RSS Notes: RSS Notes Format wars: RSS 0.91, 1.0, 2.0, Atom All major news sites use it now Blogs would not work without it Currently targeted to human-machine communication Might be a good candidate for push-style machine-machine communication, too Ironically, RSS currently uses push-only as interaction model pull at the communication levelOutline: Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information FilteringXML Message Brokering: XML Message Broker <?xml version="1.0" ?> <nitf version="-//IPTC-NAA//DTD NITF-XML 2.1//EN" > <head> <tobject tobject.type="news"> <tobject.subject tobject.subject.type="Weather"/> <tobject.subject tobject.subject.matter="Statistics"/> </tobject> <docdata doc-idref="iptc.32.a"> <doc-id id-string="iptc.32.b" /> <evloc city="Norfolk" state-prov="VA" iso-cc="US" /> <series series.name="Tide Forecasts" series.part="5"/> </docdata> </head> <body> <body.head> <hedline><hl1>Weather and Tide Updates for Norfolk</hl1> </hedline> <byline>By <person>John Smith</person></byline> </body.head> ……. client queries query results XML Message Brokering XML messages <?xml version="1.0" ?> <nitf version="-//IPTC-NAA//DTD NITF-XML 2.1//EN" > <head> <tobject tobject.type="news"> <tobject.subject tobject.subject.type="Weather"/> <tobject.subject tobject.subject.matter="Statistics"/> </tobject> </head> <body> <body.head> <hedline><hl1>Weather and Tide Updates for Norfolk</hl1> </hedline> </body.head> ……. <?xml version="1.0" ?> <nitf version="-//IPTC-NAA//DTD NITF-XML 2.1//EN" > <body> <body.head> <hedline><hl1>Weather and Tide Updates for Norfolk</hl1> </hedline> <byline>By <person>John Smith</person></byline> </body.head> ……. Q1 Q2 Q3 Q4 Filtering Transformation RoutingMessage-based Middleware: Message-based Middleware Publish/Subscribe Subscribers express interests, later notified of relevant data from publishers. Loose coupling at the communication level. XML, a de facto standard for online data exchange Flexible, extensible, self-describing. Enhanced functionality: XSLT, XQuery, … Loose coupling at the content level. XML message brokering Publish/subscribe + XML = flexibility at communication and content levels. Declarative XML queries provide high functionality.New Applications: Message brokering supports a large number of emerging distributed applications: Application integration Personalized newspaper generation Stock tickers Network monitoring Mobile services … New ApplicationsProblem Statement: Problem Statement Inputs: (1) continuously arriving XML messages (usually small) (2) a set of XQuery queries representing client interests Main functions of an XML message broker: Filtering: matches messages to query predicates. Transformation: restructures the matching messages. Routing: directs messages to queries over a network of brokers. Challenges: providing this functionality for large numbers of queries (e.g., 10’s thousands of them) high volumes of XML messages (e.g., tens or hundreds/sec)Design Space: Design Space TIBCO MQ Pub/Sub JMS Pub/Sub Siena[SIGCOMM03] Gryphon [PODC99] xmlBlaster Snoeren et al.[SOSP01] Le Subscribe [SIGMOD01] YFilter [VLDB03] ONYX [VLDB04] Oracle Advanced Queuing Subject- based Predicate- based XML filtering XML filtering & transformation Distribution Expressive-ness <?xml version="1.0" ?> <nitf version="-//DTD NITF-XML 2.1//EN" > <head> <tobject tobject.type="news"> <tobject.subject tobject.subject.type="Weather"/> </tobject> </head> <body> <hedline><hl1>Weather and Tide Updates for Norfolk</hl1> </body> </nitf>YFilter & ONYX: YFilter & ONYX YFilter, a system for XML filtering and transformation. Filtering exploiting sharing: Order-of-magnitude performance benefits over previous work. Scalable to 100’s thousands of distinct queries. YFilter 1.0 release: used in research projects and product development, being integrated into Apache Hermes for WS-Notification. Transformation exploiting sharing: The first algorithm for transformation for a large set of queries. Scalable up to 10’s of thousands of distinct queries. Routing (ONYX): an overlay network of brokers with routing abilities, providing flexible, Internet-scale XML dissemination services. The Filtering Problem: The Filtering Problem Full XPath/XQuery too expensive Query language: path expression = ( (‘/’ | ‘//’) (ElementName | ‘*’) Predicate* )+ The filtering problem: Given (1) a set Q = Q, …, Qn of path queries, where each Qi has an associated query identifier, and (2) a stream of XML documents. Compute, for each document D, the set of query identifiers corresponding to the XPath queries that match D.Constructing an FSM for a Query: Constructing an FSM for a Query Location steps FSM fragments Map location steps to FSM fragments. Concatenate FSM fragments for location steps in a query. Query “/a//b” Key Idea: represent query paths as state machine that are driven by the XML parser (SAX) Simple paths: ( (“/” | “//”) (ElementName | “*”) )+ A finite state machine (FSM) for each path: mapping steps to machine states.Constructing the Combined FSM : YFilter builds a single combined FSM for all paths! Complete prefix sharing among paths. Nondeterministic Finite Automaton (NFA)-based implementation: a small machine size, flexible, easy to maintain, etc. Output function (Moore machine): accepting states → partition of query ids. Constructing the Combined FSM Q1=/a/b Q2=/a/c Q3=/a/b/c Q4=/a//b/c Q5=/a/*/b Q6=/a//c Q7=/a/*/*/c Q8=/a/b/cExecution Algorithm: YFilter uses a stack mechanism to handle XML Backtracking in the NFA. No repeated work for the same element! <b> <c> </c> Execution Algorithm Runtime Stack NFA DFA vs. NFA: DFA vs. NFA DFA has exponential number of states Large main-memory requirements Or I/O needed in order to process messages DFA has high maintenance costs Need to rerun Myhill/Büchi algorithm, everytime a new profile is posted or deleted NFA is slower than DFA NFA: entries in stack can grow exponentially In practice, XML documents are fairly flat NFA is the clear winner (current trade-offs)!Performance results for YFilter : Performance results for YFilter YFilter scales to 150,000 distinct path queries w/o predicates. Consistently takes 30 msec or less. Achieves a 25x performance improvement over previous approaches Deep element nesting: No exponential blow-up of active states. Sensitivity to ‘*’ and “//”: Little, due to effective prefix sharing. NFA maintenance for query updates: Tens of milliseconds for inserting 1000 queries. YFilter handles 100’s thousands of queries with predicates. No real competition before Mechanism not shown here. What are the difficulties?Outline: Outline Detour into web services Motivation for Push RSS YFilter XML Path Filtering XML Transformations Semantic Overlays Context-Aware Information FilteringContext-aware information filtering: Context-aware information filtering This is our (my) current research project Publish/Subscribe systems like YFilter assume a fairly static profile set: „I am interested in all messages regarding soccer“ In reality, matches are often influenced by some state, external events: „Give me all soccer results if they influence the ranking of my favorite club. If I am in the office, send me a small video clip, if I only have a mobile phone, send me a SMS“ We call those influencing factors „context“Example of context: Example of context I am at work I am at home I am on a plane I have 200 unread mails I have IBM stock I sold all my Euros Some state (context) is referenced in the profile and included into the matching decisionExamples of context usage in profiles: Examples of context usage in profiles Only send me today‘s mensa menu if I am in less than 500 m distance to HG/CAB Only send me the train delay messages if I plan to take that train or need to pick someone up who is on that train Route the order message to a warehouse that has all items in stockContext-Aware Information Filter: Context-Aware Information Filter State updates can come from external sources State change does not directly trigger a reaction – it merely affects further matching decisions Context updates compete with messages in the filter Information Filter Stream of messages Stream of (Message, Matched Profiles) profiles/ rules Stream of context updatesChallenges with Context-Aware IF: Challenges with Context-Aware IF Information filters are optimized for: High throughput of messages Large number of profiles Static profiles Context usage adds: Dynamic profiles with many context updates Fluctuation in update rates => Integrated IF + Context ManagementComponents of an IF: Components of an IF Profile Indexes: produce candidates for matching profiles (eliminate most other candidates) Merge: combine results of index lookups, (evaluate combinations of simple predicates) Postfilter: evaluate predicates individually for each profile (eliminate all false positives) Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* MArchitecture of an CIF: Architecture of an CIF Context Management: keep track of context state Propagate the changes to the indexes where this information used Postfilter also refers to context information Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U The solution: AGILE (Adaptive Generic Indexing with Local Escalations): The solution: AGILE (Adaptive Generic Indexing with Local Escalations) Treat index as filter (allow false positives) Adapt to workload: High update rate: Make index less accurate (Fewer updates, more false positives) Decreasing update rate: Make index more accurate (More updates, fewer false positives) Control accuracy in granularity of profile Decrease accuracy: Escalation example: Decrease accuracy: Escalation example 5 {} {} {} 6 {} {} {} 2 {} { ,B} {} A = 2 B = 2 A 3 { } A<5 Update A = 3 Index Context ManagementProbe on escalated item : Probe on escalated item 6 {} {} {} 2 {} {B} {} {A} Result for Probe(2): { A ,B} {} {} 5 A = B = 2 3Update on escalated item : Update on escalated item 6 {} {} {} 2 {} {B} {} {A} Update A = 1 5 {} {} A = 3 B = 2 1Deescalation of an escalated item: Deescalation of an escalated item 6 {} {} {} 2 { } {B} {} { } 5 {} {} A A = B = 2 1 1 {} {} {A}Index accuracy control in AGILE: Index accuracy control in AGILE Two new operations: Escalate: decrease accuracy Deescalate: increase accuracy Generic, applicable to all tree indexes Fine-grained adaptivity Open Issue: Control Policy Our solution: Escalation on every update (unless not necessary) Deescalation by feedback of false positivesAccuracy control by feedback: Accuracy control by feedback Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Cost for message handling Cost for update handling Index cost Postfilter cost Feedback of false positives Accuracy balance High LowAccuracy control: more updates: Accuracy control: more updates Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Feedback of false positives Decrease accuracy High Low Accurcy control: fewer updates: Accurcy control: fewer updates Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Feedback of false positives Increase accuracy High Low AGILE benefits: AGILE benefits Adaptation to changes in update rate No Updates => Accurate Index Many Updates => Increasingly fuzzy index Adaptation to value skew in workload Updates in one area, probes in another => Fuzzy in the first, accurate in the other When skew changes, the index will change, too Overhead: Filtering false positives Accuracy controlSome performance numbers: Some performance numbers 500 K profiles, 8 attributes, range predicates No updates: 450 messages/sec Only updates: 1 million updates/sec Change between those extremes with some hundred messagesStatus of Context-Aware IF: Status of Context-Aware IF Using context in IF enables richer profiles, but introduces frequent updates AGILE deals with the update/probe balance: Flexible index accuracy Automatic tuning by feedback Applicable to any tree-like index AGILE works well: Best performance in middle ground Low overhead at the extremes Good adaptation to load changes No manual intervention needed during runtimeResearch Topics in IF(also Thesis Ad): Research Topics in IF (also Thesis Ad) AGILE on “classic” indexes (B-Tree, R-Tree), also on disk Reliability How do you make sure that all events are processed, even if there machine crashes or network outages? QoS issues in IF How do you provide certain properties of service, e.g. latency or throughput? Adaptivity/Feedback for other aspects Go to http://www.dbis.ethz.ch/education/Theses/infofilt (No RSS yet )