Safety Guarantee of Continuous Join Queries over Punctuated Data Streams : Safety Guarantee of Continuous Join Queries over Punctuated Data Streams
Hua-Gang Li *, Songting Chen, Junichi Tatemura
Divykant Agrawal, K. Selcuk Candan and Wang-Pin Hsiung
NEC Laboratories America
* University of California, Santa Barbara
Stream Query Processing : Stream Query Processing Continuous Queries Stream Query
Engine Streaming Data Streaming Data Online transaction management
Network analysis
Sensor network monitoring
…
Motivating Example : Motivating Example Window approach
However, window size may be hard to determine
Exploiting stream constraints
Uniqueness, sorted input, etc
Punctuations
Punctuation : Punctuation Punctuation
A predicate that must be evaluated to false for every element following the punctuation
Representation [Tucker et al. TKDE 2003]
A special tuple (*, c, *, *)
E.g., Item(sellerid,itemid,name,initialprice)
A punctuation 'no more item with itemid = 1'
is denoted as (*, 1, *, *)
State of the Art : State of the Art Semantic modeling of punctuations [Tucker et al. TKDE 2003]
Punctuation-aware query optimization
Binary join [Ding et al. EDBT 2004]
Group By [Li et al. SIGMOD 2005]
Generation of useful punctuations, i.e., heartbeats, from time domain [Srivastava et al. PODS 2004]
However, one fundamental problem is not addressed
Whether a query can benefit from available punctuations, refer to as 'safety checking' problem
Outline : Outline Formulate safety checking problem for continuous join queries
Sound and complete safety condition for simple punctuations
Sounds and complete safety condition for complex punctuations
Conclusion and future work
Punctuation Scheme : Punctuation Scheme Punctuation scheme
Describe the types of punctuation instances that a data stream can have at runtime
Can be viewed as metadata of punctuation instances
Representation
Simple punctuation schemes: e.g., Item(sellerid, itemid, name, initialprice). punctuation scheme (–,+,–,–), instance (*, 1, *, *)
Complex punctation schemes: e.g., Bid(bidderid, itemid, increase). punctuation scheme (+,+,–), instance (1, 1, *)
Determined by application semantics
Safety Checking Problem : Safety Checking Problem Given a continuous join query Q (CJQ) and a set of punctuation schemes,
Determine If Q still requires unbounded memory consumption no matter what punctuation instances (described by the punctuation schemes) may occur
For example:
Unsafe if we only have following two punctuation schemes
Item(sellerid,itemid,name,initialprice) (–, +, –, –)
Bid(bidderid,itemid,increase) (+, +, –)
Safety .vs. Runtime memory consumption
Unsafe query always requires infinite runtime memory
However, safe query does not guarantee low runtime memory consumption
Concepts : Join State
Refer to the space used for storing the inputs of each join operator
Purgeability
Purgeability of a join state
for every tuple t, there exists a finite set of punctuation instances such that t will not
produce any join results with any new tuples
Purgeability of a join operator
Safe Execution Plan
Every join operator involved is purgeable
Safe CJQ
There exists at least one safe execution plan Concepts … … √
Purging for Binary Join Operator : Purging for Binary Join Operator Purge S2 is similar.
Hence we need
punctuation schemes
S1 (–, +), S2 (+, –)
A CJQ with no Safe Binary Join Plan : A CJQ with no Safe Binary Join Plan S1.A = S3.A Punctuation Schemes
S1 (A–, B+), S2 (B–, C+), S3 (C–, A+) CJQ Unsafe Plan
Purging for M-Way Join Operator : Purging for M-Way Join Operator
Chained Purge Strategy : Chained Purge Strategy There is a punctuation
propagation effect for
M-way join operator!
Punctuation Graph (simple punctuation scheme) : Punctuation Graph (simple punctuation scheme) Capture such punctuation propagation effect
Purgeability of a Join Operator : THEOREM 1. The join state S is purgeable iff there exists a path from S to every other node Si in the punctuation graph
COROLLARY 1. A join operator is purgeable iff its punctuation graph is a strongly connected graph.
Purgeability of a Join Operator S S1 S2 S3 … … S’
Safety for CJQ : Safety for CJQ Safe CJQ requires at least one safe execution plan
However, the number of execution plans is exponential
THEOREM 2. A CJQ is safe iff its M-join plan is safe
→ If M-join plan is unsafe, no other safe plan exists
→ Linear safety checking for simple punctuation schemes
Handling Complex Punctuation Schemes : Handling Complex Punctuation Schemes S3: (+,+) cannot purge either S1 or S2, but can purge S1 S2 S3
(A, C)
Generalized Punctuation Graph : Generalized Punctuation Graph Intermediate result Purge of raw data stream Purge of intermediate result
CJQ Safety under Complex Punctuations Schemes : CJQ Safety under Complex Punctuations Schemes Intuition: intermediate results have to be purgeable as well
Transformed Punctuation Graph
1. Identify strongly connected sub-graph, merge them into a single merged node
2. Take the generalized punctuation edges of merged node into account, continue Step 1
THEOREM 3. A CJQ is safe iff transformed punctuation graph ends up in a single merged node
Polynomial safety checking for complex punctuation schemes
Conclusion & Future Work : Conclusion andamp; Future Work Formulate the safety checking problem for CJQ
Sound and complete safety conditions
Based on novel punctuation graph
Linear for simple punctuation schemes
Polynomial for complex punctuation schemes
Future work
Optimization of Chained Purge Strategy for M-join
M-join purge .vs. a tree binary-join purge
Optimization of CJQ
Purge plan .vs. join plan
Adaptive purge plan
Generation of Punctuations
Slide21 :