s409 guha

Uploaded from authorPOINTLite
Views:
 
Category: Education
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

SPACE EFFICENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS: 

SPACE EFFICENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS Sudipto Guha UPENN

Synopses: 

Synopses Given n input numbers, summarize the input using B numbers, minimizing some error. Examples Histograms – piecewise constant repn. Wavelets – uses the wavelet basis Fourier, Bessel, SVD, what have you…

Why space efficiency: 

Why space efficiency “Interestingly, according to modern astronomers, space is finite. This is a very comforting thought – particularly for people who can never remember where they left things.” Woody Allen. From a computational viewpoint however…

Space is the cruelest resource : 

Space is the cruelest resource Resources Time : tweedle thumbs Access (stream): make more passes Program simply will not run – or if data is shifted to disk, will run quite slow(er). Further, if we had more space, maybe we can compute a better (more accurate) synopsis

Examples - I: 

Examples - I Histograms Many error measures V-OPT, Jagadish etal, 1998 O(n2B) time O(nB) space Only O(n) space at a time (working space) O(n2B2) time and O(n) space Is that the best ? Here: O(n2B) time O(n) space.

Example - II: 

Example - II (Haar) Wavelets Orthonormal systems For l2 error store the largest B coeffs of input Does not work for non l2 Find the best B coeffs to retain (note, restricted). Garofalakis & Kumar, 04 O(n2B log B) time O(n2B) space, but O(nB) needed at a time (for l1 ) Here O(n) space, and O(n2) time

Example - III: 

Example - III Extended Wavelets Multiple measures Optimization is similar to Knapsack with choices. Previous best – Deligiannakis and Rossopoulos, 04, O(Mn(B+ log n)) time and space O(MnB), but needing O(nM+MB) at a time Guha, Kim, Shim, 04, reduced space to O(BM+min {nM,B2}) Here, O(BM) space

What we will not talk about: 

What we will not talk about Approximation algorithms for histograms Range Query Histograms Basically improvement of a factor B in space across the board. B is not always small, specially when n is large

The main idea: 

The main idea Can we solve using a non DP paradigm ? Well, divide & conquer … Small details – how do we divide ? Interaction Does a small interaction partitioning exist ? How (much size) to represent it ? Ease of finding it (in the given representation) ?

A case study - Histograms: 

A case study - Histograms Formally, given a signal X find a piecewise constant representation H with at most B pieces minimizing ||X-H||2 Consider one bucket. The mean is the best value. A natural DP …

The DP for histograms: 

The DP for histograms Err[i,b] = Error of approximating x1,…,xi using b buckets For i=1 to n do For 2 to B do For j=1 to i-1 do Err[i,b] = min Err[i,b], Err[j,b-1] + error(j+1,i) B n

What if: 

What if We could figure out what was the story at the middlepoint ! Two questions So what ? How ? (use a DP)

Wait a minute …: 

Wait a minute … We just replaced a DP by another and claimed something … !!! Exactly. The second DP needs only O(n) space. So as the conquer steps re-use/share the same space; the total space is O(n) too. The idea is to use divide and conquer; and use a (small) DP to find the divide step. Is it really that simple ?

The code: 

The code

The end of working space: 

The end of working space If you can partition a problem using the working space – you can recompute the solution of the parts at a little extra cost. Working space = total space.

How much is little ?: 

How much is little ?

Wavelets: 

Wavelets A set of vectors {1,-1,0,0,0,0…}, {0,0,1,-1,0,0,…},{0,0,0,0,1,-1,0,0},{0,0,0,0,0,0,1-1} {1,1,-1,-1,0,0,0,0},{0,0,0,0,1,1,-1,-1} {1,1,1,1,-1,-1,-1,-1},{1,1,1,1,1,1,1,1} A natural multi-resolution

Wavelet Synopsis Construction : 

Wavelet Synopsis Construction Formally, given a signal X and the Haar basis {i} find a representation F=i zi i with at most B non-zero zi minimizing some error which a fn of X-F Restriction. Zi is either 0 or h X,i i Debate. Unrestricted or restricted. Omit.

Wavelets: 

Wavelets ||X-F||1 Long history Matias, Vitter Wang ’98 Garofalakis, Gibbons, ’02 Garofalakis, Kumar, ’04 State of the Art O(n2B log B) time O(n2B) space O(nB) working space Here O(n2log B) time O(n) space SEE ALSO NEXT TALK …

What happens to wavelets [GK04] ?: 

What happens to wavelets [GK04] ?

Extensions: 

Extensions Approximation Algorithms Range Query Histograms Extended Wavelets

Histograms: 

Histograms Saves space across all algorithms except algorithms which extend to general error measure over streams

Range Query: 

Range Query Same story Open Q: faster algorithm obeying synopsis size

That’s all folks: 

That’s all folks