Module 8 : Module 8 Push-based communication
(RSS, Publish/Subscribe, Information Filtering)
Outline : Outline Detour into web services
Motivation for Push
RSS
YFilter
XML Path Filtering
XML Transformations
Semantic Overlays
Context-Aware Information Filtering
Detour into web services : Detour into web services This is an excerpt of module 4 (which was omitted)
But web service also provide a good motivation on the rest of today‘s lecture
Important: Web services are XML all over
Technology overview : Technology overview XML
SOAP: message format
WSDL: service description
UDDI: service directory
XQuery
BPEL
XL
…
SOAP : SOAP SOAP = Simple Object Access Protocol
W3C Standard; current version 1.2
Communication between applications (e.g. RPC, Streams of Sensor Data)
Defines Layout (Type) of Messages
Use in Internet and through Firewalls
Platform- and PL independent
Based on XML
Simple and extensible
Basis for further standards (Encryption, ...)
WSDL : WSDL Web Service Description Language
Describes the Interface of a Web Service
Call of a Web Service done via SOAP
Allows the registration of services
Basis for UDDI
Syntax is XML
UDDI : UDDI Universal Description Discovery Integration
Directory which stores WSDL
„Jini“ for the Web, „yellow pages“
Communicates via SOAP Messages
Organized in white, yellow and green pages
white, yellow pages: Informationen about Providers
green pages: WSDL of Services
IBM and Microsoft have public UDDI Server
Today, typically used in Intranet
Use cases for Web Services : Use cases for Web Services Automization of Processes
Enterprise Application Integration (EAI)
Workflow Management
Data Integration
Enterprise Information Integration (EII) (Connectivity, Global Data Model)
Portals
Integration, Integration, Integration
Application Integration : Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IX
Application Integration : Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IX XII XI
Application Integration : Application Integration App App App App I/VI II/V III/IV VII/VIII VII/IX X XI What impact do delays have?
Who is affected by a change in one interface?
How can this process be optimized?
What about humans? How to exploit a Grid of machines?
Loose Coupling of Apps : Loose Coupling of Apps App App App App Message Broker
Loose Coupling of Apps (Web Services) : Loose Coupling of Apps (Web Services) App App App App Message Broker Web Services SOAP SOAP WSDL WSDL WSDL WSDL Routing?
Virtualisation?
Virtualisierung von Anwendungen : Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & Bind
Virtualisierung von Anwendungen : Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & Bind x1 I want y!
Virtualisierung von Anwendungen : Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! Find & Bind x1 x2,x3 I want y! y
Virtualisierung von Anwendungen : Virtualisierung von Anwendungen App App App App Message Broker Web Services WSDL WSDL WSDL WSDL I want all x! x1,x2,x3 Find & Bind x1 x2,x3 I want y! y
Summary Web Services : Summary Web Services There is a lot more in web services see module 4
Important:
Loose coupling of applications
Virtualization
Service discovery
Common protocols
And now back to our main track – push-style interaction
Outline : Outline Detour into web services
Motivation for Push
RSS
YFilter
XML Path Filtering
XML Transformations
Semantic Overlays
Context-Aware Information Filtering
Is „Pull“ the winner? : Is „Pull“ the winner? Most of our interaction with the Web and Databases is „Pull“
Browse the web to find the pictures of my friend‘s vacation on some remote island
Query the database to find out which books were the the top sellers in Zurich around Christmas
Invoke a web service to compute π up to ten billion digits
Is this the whole story?
Examples of „Push“ : Examples of „Push“ E-Mail communication:
Who uses a Blackberry?
Who sends more than a 100 SMS/month?
Event notification
Information about offers (apartments, cars, jobs)
News tickers
Sensor data
Put 100.000 very cheap (10 cent) sensors all over ETH
Notify building security if temperature exceeds 55 degree centigrade
Factors favoring „Push“ : Factors favoring „Push“ Long-standing interest
Very low or very high update rate
1 update per week
100 updates per second
Large number of independant sources
Watching 100.000 news sites all over the world is impossible, but watching an RSS feed via Google News is certainly possible
Scalability: many users want the same thing
E.g., this lecture, TV, …
Outline : Outline Detour into web services
Motivation for Push
RSS
YFilter
XML Path Filtering
XML Transformations
Semantic Overlays
Context-Aware Information Filtering
RSS : RSS Content syndication:
News tickers
Blogs
Alerts
Simple XML format
Lightweight
Still some get it wrong
RSS 2.0 : RSS 2.0 Simple Message Format for Data Push
- ... ...
…
RSS Items and Types : RSS Items and Types
RSS 2.0 example : RSS 2.0 example
D-INFK Events
Events of the Department of Computer Science, ETH Zurich
http://www.inf.ethz.ch/news/events/
http://www.inf.ethz.ch/rss
…
Tue, 17 Jan 2006 11:06:04 GMT
http://www.inf.ethz.ch/rss/inf-logo.png
Department of Computer Science
http://www.inf.ethz.ch/
140 35
-
Establishing trust in electronic business correspondence http://www.inf.ethz.ch/news/events/details/index?id=593
ZISC Colloquium
Tuesday, 17 January 2006 17:15, by: Dr. Ralf Hauser Privasphere AG
2006-01-17 http://www.inf.ethz.ch/news/events/details/index?id=593
RSS Notes : RSS Notes Format wars: RSS 0.91, 1.0, 2.0, Atom
All major news sites use it now
Blogs would not work without it
Currently targeted to human-machine communication
Might be a good candidate for push-style machine-machine communication, too
Ironically, RSS currently uses
push-only as interaction model
pull at the communication level
Outline : Outline Detour into web services
Motivation for Push
RSS
YFilter
XML Path Filtering
XML Transformations
Semantic Overlays
Context-Aware Information Filtering
XML Message Brokering : XML
Message
Broker
Weather and Tide Updates for Norfolk
By John Smith
……. client
queries query
results XML Message Brokering XML messages
Weather and Tide Updates for Norfolk
…….
Weather and Tide Updates for Norfolk
By John Smith
……. Q1 Q2 Q3 Q4 Filtering Transformation Routing
Message-based Middleware : Message-based Middleware Publish/Subscribe
Subscribers express interests, later notified of relevant data from publishers.
Loose coupling at the communication level.
XML, a de facto standard for online data exchange
Flexible, extensible, self-describing.
Enhanced functionality: XSLT, XQuery, …
Loose coupling at the content level.
XML message brokering
Publish/subscribe + XML = flexibility at communication and content levels.
Declarative XML queries provide high functionality.
New Applications : Message brokering supports a large number of emerging distributed applications:
Application integration
Personalized newspaper generation
Stock tickers
Network monitoring
Mobile services
… New Applications
Problem Statement : Problem Statement Inputs: (1) continuously arriving XML messages (usually small)
(2) a set of XQuery queries representing client interests
Main functions of an XML message broker:
Filtering: matches messages to query predicates.
Transformation: restructures the matching messages.
Routing: directs messages to queries over a network of brokers.
Challenges: providing this functionality for
large numbers of queries (e.g., 10’s thousands of them)
high volumes of XML messages (e.g., tens or hundreds/sec)
Design Space : Design Space TIBCO
MQ Pub/Sub
JMS Pub/Sub Siena[SIGCOMM03]
Gryphon [PODC99] xmlBlaster
Snoeren et al.[SOSP01] Le Subscribe
[SIGMOD01] YFilter
[VLDB03] ONYX
[VLDB04] Oracle Advanced
Queuing Subject-
based Predicate-
based XML
filtering XML filtering
& transformation Distribution Expressive-ness
Weather and Tide
Updates for Norfolk
YFilter & ONYX : YFilter & ONYX YFilter, a system for XML filtering and transformation.
Filtering exploiting sharing:
Order-of-magnitude performance benefits over previous work.
Scalable to 100’s thousands of distinct queries.
YFilter 1.0 release: used in research projects and product development, being integrated into Apache Hermes for WS-Notification.
Transformation exploiting sharing:
The first algorithm for transformation for a large set of queries.
Scalable up to 10’s of thousands of distinct queries.
Routing (ONYX): an overlay network of brokers with routing abilities, providing flexible, Internet-scale XML dissemination services.
The Filtering Problem : The Filtering Problem Full XPath/XQuery too expensive
Query language: path expression =
( (‘/’ | ‘//’) (ElementName | ‘*’) Predicate* )+
The filtering problem:
Given (1) a set Q = Q, …, Qn of path queries, where each Qi has an associated query identifier, and (2) a stream of XML documents.
Compute, for each document D, the set of query identifiers corresponding to the XPath queries that match D.
Constructing an FSM for a Query : Constructing an FSM for a Query Location
steps FSM
fragments Map location steps to FSM fragments. Concatenate FSM fragments for location steps in a query. Query “/a//b” Key Idea: represent query paths as state machine that are driven by the XML parser (SAX)
Simple paths: ( (“/” | “//”) (ElementName | “*”) )+
A finite state machine (FSM) for each path: mapping steps to machine states.
Constructing the Combined FSM : YFilter builds a single combined FSM for all paths!
Complete prefix sharing among paths.
Nondeterministic Finite Automaton (NFA)-based implementation: a small machine size, flexible, easy to maintain, etc.
Output function (Moore machine): accepting states → partition of query ids. Constructing the Combined FSM Q1=/a/b Q2=/a/c Q3=/a/b/c Q4=/a//b/c Q5=/a/*/b Q6=/a//c Q7=/a/*/*/c Q8=/a/b/c
Execution Algorithm : YFilter uses a stack mechanism to handle XML
Backtracking in the NFA.
No repeated work for the same element! Execution Algorithm Runtime Stack NFA
DFA vs. NFA : DFA vs. NFA DFA has exponential number of states
Large main-memory requirements
Or I/O needed in order to process messages
DFA has high maintenance costs
Need to rerun Myhill/Büchi algorithm, everytime a new profile is posted or deleted
NFA is slower than DFA
NFA: entries in stack can grow exponentially
In practice, XML documents are fairly flat
NFA is the clear winner (current trade-offs)!
Performance results for YFilter : Performance results for YFilter YFilter scales to 150,000 distinct path queries w/o predicates.
Consistently takes 30 msec or less.
Achieves a 25x performance improvement over previous approaches
Deep element nesting: No exponential blow-up of active states.
Sensitivity to ‘*’ and “//”: Little, due to effective prefix sharing.
NFA maintenance for query updates: Tens of milliseconds for inserting 1000 queries.
YFilter handles 100’s thousands of queries with predicates.
No real competition before
Mechanism not shown here. What are the difficulties?
Outline : Outline Detour into web services
Motivation for Push
RSS
YFilter
XML Path Filtering
XML Transformations
Semantic Overlays
Context-Aware Information Filtering
Context-aware information filtering : Context-aware information filtering This is our (my) current research project
Publish/Subscribe systems like YFilter assume a fairly static profile set: „I am interested in all messages regarding soccer“
In reality, matches are often influenced by some state, external events: „Give me all soccer results if they influence the ranking of my favorite club. If I am in the office, send me a small video clip, if I only have a mobile phone, send me a SMS“
We call those influencing factors „context“
Example of context : Example of context I am at work I am at home I am on a plane I have 200 unread mails I have IBM stock I sold all my Euros Some state (context) is referenced in the profile and included into the matching decision
Examples of context usage in profiles : Examples of context usage in profiles
Only send me today‘s mensa menu if I am in less than 500 m distance to HG/CAB
Only send me the train delay messages if I plan to take that train or need to pick someone up who is on that train
Route the order message to a warehouse that has all items in stock
Context-Aware Information Filter : Context-Aware Information Filter State updates can come from external sources
State change does not directly trigger a reaction – it merely affects further matching decisions
Context updates compete with messages in the filter Information Filter Stream of messages Stream of (Message, Matched Profiles) profiles/ rules Stream of context updates
Challenges with Context-Aware IF : Challenges with Context-Aware IF Information filters are optimized for:
High throughput of messages
Large number of profiles
Static profiles
Context usage adds:
Dynamic profiles with many context updates
Fluctuation in update rates => Integrated IF + Context Management
Components of an IF : Components of an IF Profile Indexes: produce candidates for matching profiles (eliminate most other candidates)
Merge: combine results of index lookups, (evaluate combinations of simple predicates)
Postfilter: evaluate predicates individually for each profile (eliminate all false positives) Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M
Architecture of an CIF : Architecture of an CIF Context Management: keep track of context state
Propagate the changes to the indexes where this information used
Postfilter also refers to context information Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U
The solution: AGILE (Adaptive Generic Indexing with Local Escalations) : The solution: AGILE (Adaptive Generic Indexing with Local Escalations) Treat index as filter (allow false positives)
Adapt to workload:
High update rate: Make index less accurate
(Fewer updates, more false positives)
Decreasing update rate: Make index more accurate
(More updates, fewer false positives)
Control accuracy in granularity of profile
Decrease accuracy: Escalation example : Decrease accuracy: Escalation example 5 {} {} {} 6 {} {} {} 2 {} { ,B} {} A = 2 B = 2 A 3 { } A<5 Update A = 3 Index Context Management
Probe on escalated item : Probe on escalated item 6 {} {} {} 2 {} {B} {} {A} Result for Probe(2): { A ,B} {} {} 5 A = B = 2 3
Update on escalated item : Update on escalated item 6 {} {} {} 2 {} {B} {} {A} Update A = 1 5 {} {} A = 3 B = 2 1
Deescalation of an escalated item : Deescalation of an escalated item 6 {} {} {} 2 { } {B} {} { } 5 {} {} A A = B = 2 1 1 {} {} {A}
Index accuracy control in AGILE : Index accuracy control in AGILE Two new operations:
Escalate: decrease accuracy
Deescalate: increase accuracy
Generic, applicable to all tree indexes
Fine-grained adaptivity
Open Issue: Control Policy
Our solution:
Escalation on every update (unless not necessary)
Deescalation by feedback of false positives
Accuracy control by feedback : Accuracy control by feedback Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Cost for message handling Cost for update handling Index cost Postfilter cost Feedback of false positives Accuracy
balance High Low
Accuracy control: more updates : Accuracy control: more updates Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Feedback of false positives Decrease accuracy High Low
Accurcy control: fewer updates : Accurcy control: fewer updates Ù Ú Indexes Merge Postfilter Matches for single index Result with false positives P* P* P* P* P* M U Context Management U U Feedback of false positives Increase accuracy High Low
AGILE benefits : AGILE benefits Adaptation to changes in update rate
No Updates => Accurate Index
Many Updates => Increasingly fuzzy index
Adaptation to value skew in workload
Updates in one area, probes in another => Fuzzy in the first, accurate in the other
When skew changes, the index will change, too
Overhead:
Filtering false positives
Accuracy control
Some performance numbers : Some performance numbers 500 K profiles, 8 attributes, range predicates
No updates: 450 messages/sec
Only updates: 1 million updates/sec
Change between those extremes with some hundred messages
Status of Context-Aware IF : Status of Context-Aware IF Using context in IF enables richer profiles, but introduces frequent updates
AGILE deals with the update/probe balance:
Flexible index accuracy
Automatic tuning by feedback
Applicable to any tree-like index
AGILE works well:
Best performance in middle ground
Low overhead at the extremes
Good adaptation to load changes
No manual intervention needed during runtime
Research Topics in IF(also Thesis Ad) : Research Topics in IF (also Thesis Ad) AGILE on “classic” indexes (B-Tree, R-Tree), also on disk
Reliability
How do you make sure that all events are processed, even if there machine crashes or network outages?
QoS issues in IF
How do you provide certain properties of service, e.g. latency or throughput?
Adaptivity/Feedback for other aspects
Go to http://www.dbis.ethz.ch/education/Theses/infofilt (No RSS yet )