Tutorial on Topk/Skyline Queries (p2)

Views:
 
Category: Education
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Efficient Top-k Query Processing : 

Efficient Top-k Query Processing

FA(Fagin‘s Algorithm) : 

FA(Fagin‘s Algorithm) Fagin‘s Algorithm [Fag96] (PODS) Sorted list on Mileage F(id)=2*mileage+age (monotonic) K = 2 Sorted list on Age

QuickCombine / TA : 

QuickCombine / TA QuickCombine [GBK00] (VLDB), Threshold Algorithm [FLN01] (PODS) F(id)=2*mileage+age Sorted list on Age Threshold Sorted list on Mileage K = 2

StreamCombine / NRA : 

StreamCombine [GBK00] (ITCC), No Random Access [FLN01] (PODS) F(id)=2*mileage+age K = 2 Threshold Sorted list on Mileage Sorted list on Age StreamCombine / NRA

Slide 5: 

Expensive predicates No pre-computed indexes for zero-time sorted-access Need a probe to evaluate each object (similar to sequential scan) Unified abstraction for: User-defined functions: functional extensibility Query conditions can be arbitrary, user-specific e.g., cheap(price, size) External predicates: data extensibility Source interface may require one probe per object e.g., safe(zip) access crime rate from apbnews.com Fuzzy joins Associations of relations can be arbitrary e.g., close(house.zip, park.zip) Mpro(Minimal Probing) [BGM02] (ICDE), [CH02] (SIGMOD)

Slide 6: 

Goal: Perform only necessary random accesses (or, “probes”) Necessary probes A probe is necessary if top-k answers cannot be determined by any algorithm without it, regardless of the outcomes of other probes. Optimal algorithm An algorithm is probe-optimal if it performs only the necessary probes. Mpro(Minimal Probing)

Slide 7: 

k=1, F=min(x,p1,p2); suppose H=(p1,p2) OID x p1 p2 F=min(x,p1,p2) a 0.9 b 0.8 c 0.7 d 0.6 e 0.5 ? ? ? ? ? 1 1 0.9 1 1 0.7 1 1 0.6 1 1 0.5 top 1 Maybe Not! £ 0.8 Mpro(Minimal Probing)

Slide 8: 

OID x p1 p2 F=min(x,p1,p2) a 0.9 b 0.8 c 0.7 d 0.6 e 0.5 ? £ 0.9 1 1 0.7 1 1 0.6 1 1 0.5 top 1? Necessary! 1 1 0.8 Mpro(Minimal Probing)

Slide 9: 

Objects in current top-k must be further probed Probe-optimal object scheduling [CH02] (SIGMOD) Use a priority queue with ceiling scores as priorities b:0.78 pr(a,p1) =0.85 pr(a,p2) =0.75 pr(b,p1) =0.78 pr(b,p2) =0.90 top 1 Mpro(Minimal Probing)

Unified Algorithm : 

Summary: FA, TA, QuickCombine r =1 (cheap) r = h (expensive) r = ¥ (impossible) CA, SR-Combine NRA, StreamCombine TAz, MPro, Upper s =1 (cheap) s = h (expensive) s = ¥ (impossible) Random Access Sorted Access FA, TA, QuickCombine TAz, MPro, Upper NRA, StreamCombine Unified Algorithm Unified

Unified Algorithm : 

11 Cost-based optimization [HC05] (ICDE) Finding optimal algorithm, with minimum cost, from a space  General across a wide range of scenarios One “algorithm” for all Adaptive to the specific one at run time Truly optimal (in principle) Unified Algorithm