logging in or signing up s409 guha Pumbaa Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 32 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: January 03, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript SPACE EFFICENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS: SPACE EFFICENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS Sudipto Guha UPENNSynopses: Synopses Given n input numbers, summarize the input using B numbers, minimizing some error. Examples Histograms – piecewise constant repn. Wavelets – uses the wavelet basis Fourier, Bessel, SVD, what have you… Why space efficiency: Why space efficiency “Interestingly, according to modern astronomers, space is finite. This is a very comforting thought – particularly for people who can never remember where they left things.” Woody Allen. From a computational viewpoint however… Space is the cruelest resource : Space is the cruelest resource Resources Time : tweedle thumbs Access (stream): make more passes Program simply will not run – or if data is shifted to disk, will run quite slow(er). Further, if we had more space, maybe we can compute a better (more accurate) synopsisExamples - I: Examples - I Histograms Many error measures V-OPT, Jagadish etal, 1998 O(n2B) time O(nB) space Only O(n) space at a time (working space) O(n2B2) time and O(n) space Is that the best ? Here: O(n2B) time O(n) space.Example - II: Example - II (Haar) Wavelets Orthonormal systems For l2 error store the largest B coeffs of input Does not work for non l2 Find the best B coeffs to retain (note, restricted). Garofalakis & Kumar, 04 O(n2B log B) time O(n2B) space, but O(nB) needed at a time (for l1 ) Here O(n) space, and O(n2) timeExample - III: Example - III Extended Wavelets Multiple measures Optimization is similar to Knapsack with choices. Previous best – Deligiannakis and Rossopoulos, 04, O(Mn(B+ log n)) time and space O(MnB), but needing O(nM+MB) at a time Guha, Kim, Shim, 04, reduced space to O(BM+min {nM,B2}) Here, O(BM) space What we will not talk about: What we will not talk about Approximation algorithms for histograms Range Query Histograms Basically improvement of a factor B in space across the board. B is not always small, specially when n is largeThe main idea: The main idea Can we solve using a non DP paradigm ? Well, divide & conquer … Small details – how do we divide ? Interaction Does a small interaction partitioning exist ? How (much size) to represent it ? Ease of finding it (in the given representation) ?A case study - Histograms: A case study - Histograms Formally, given a signal X find a piecewise constant representation H with at most B pieces minimizing ||X-H||2 Consider one bucket. The mean is the best value. A natural DP … The DP for histograms: The DP for histograms Err[i,b] = Error of approximating x1,…,xi using b buckets For i=1 to n do For 2 to B do For j=1 to i-1 do Err[i,b] = min Err[i,b], Err[j,b-1] + error(j+1,i) B n What if: What if We could figure out what was the story at the middlepoint ! Two questions So what ? How ? (use a DP) Wait a minute …: Wait a minute … We just replaced a DP by another and claimed something … !!! Exactly. The second DP needs only O(n) space. So as the conquer steps re-use/share the same space; the total space is O(n) too. The idea is to use divide and conquer; and use a (small) DP to find the divide step. Is it really that simple ?The code: The codeThe end of working space: The end of working space If you can partition a problem using the working space – you can recompute the solution of the parts at a little extra cost. Working space = total space.How much is little ?: How much is little ?Wavelets: Wavelets A set of vectors {1,-1,0,0,0,0…}, {0,0,1,-1,0,0,…},{0,0,0,0,1,-1,0,0},{0,0,0,0,0,0,1-1} {1,1,-1,-1,0,0,0,0},{0,0,0,0,1,1,-1,-1} {1,1,1,1,-1,-1,-1,-1},{1,1,1,1,1,1,1,1} A natural multi-resolution Wavelet Synopsis Construction : Wavelet Synopsis Construction Formally, given a signal X and the Haar basis {i} find a representation F=i zi i with at most B non-zero zi minimizing some error which a fn of X-F Restriction. Zi is either 0 or h X,i i Debate. Unrestricted or restricted. Omit. Wavelets: Wavelets ||X-F||1 Long history Matias, Vitter Wang ’98 Garofalakis, Gibbons, ’02 Garofalakis, Kumar, ’04 State of the Art O(n2B log B) time O(n2B) space O(nB) working space Here O(n2log B) time O(n) space SEE ALSO NEXT TALK …What happens to wavelets [GK04] ?: What happens to wavelets [GK04] ?Extensions: Extensions Approximation Algorithms Range Query Histograms Extended WaveletsHistograms: Histograms Saves space across all algorithms except algorithms which extend to general error measure over streamsRange Query: Range Query Same story Open Q: faster algorithm obeying synopsis sizeThat’s all folks: That’s all folks You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
s409 guha Pumbaa Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 32 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: January 03, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript SPACE EFFICENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS: SPACE EFFICENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS Sudipto Guha UPENNSynopses: Synopses Given n input numbers, summarize the input using B numbers, minimizing some error. Examples Histograms – piecewise constant repn. Wavelets – uses the wavelet basis Fourier, Bessel, SVD, what have you… Why space efficiency: Why space efficiency “Interestingly, according to modern astronomers, space is finite. This is a very comforting thought – particularly for people who can never remember where they left things.” Woody Allen. From a computational viewpoint however… Space is the cruelest resource : Space is the cruelest resource Resources Time : tweedle thumbs Access (stream): make more passes Program simply will not run – or if data is shifted to disk, will run quite slow(er). Further, if we had more space, maybe we can compute a better (more accurate) synopsisExamples - I: Examples - I Histograms Many error measures V-OPT, Jagadish etal, 1998 O(n2B) time O(nB) space Only O(n) space at a time (working space) O(n2B2) time and O(n) space Is that the best ? Here: O(n2B) time O(n) space.Example - II: Example - II (Haar) Wavelets Orthonormal systems For l2 error store the largest B coeffs of input Does not work for non l2 Find the best B coeffs to retain (note, restricted). Garofalakis & Kumar, 04 O(n2B log B) time O(n2B) space, but O(nB) needed at a time (for l1 ) Here O(n) space, and O(n2) timeExample - III: Example - III Extended Wavelets Multiple measures Optimization is similar to Knapsack with choices. Previous best – Deligiannakis and Rossopoulos, 04, O(Mn(B+ log n)) time and space O(MnB), but needing O(nM+MB) at a time Guha, Kim, Shim, 04, reduced space to O(BM+min {nM,B2}) Here, O(BM) space What we will not talk about: What we will not talk about Approximation algorithms for histograms Range Query Histograms Basically improvement of a factor B in space across the board. B is not always small, specially when n is largeThe main idea: The main idea Can we solve using a non DP paradigm ? Well, divide & conquer … Small details – how do we divide ? Interaction Does a small interaction partitioning exist ? How (much size) to represent it ? Ease of finding it (in the given representation) ?A case study - Histograms: A case study - Histograms Formally, given a signal X find a piecewise constant representation H with at most B pieces minimizing ||X-H||2 Consider one bucket. The mean is the best value. A natural DP … The DP for histograms: The DP for histograms Err[i,b] = Error of approximating x1,…,xi using b buckets For i=1 to n do For 2 to B do For j=1 to i-1 do Err[i,b] = min Err[i,b], Err[j,b-1] + error(j+1,i) B n What if: What if We could figure out what was the story at the middlepoint ! Two questions So what ? How ? (use a DP) Wait a minute …: Wait a minute … We just replaced a DP by another and claimed something … !!! Exactly. The second DP needs only O(n) space. So as the conquer steps re-use/share the same space; the total space is O(n) too. The idea is to use divide and conquer; and use a (small) DP to find the divide step. Is it really that simple ?The code: The codeThe end of working space: The end of working space If you can partition a problem using the working space – you can recompute the solution of the parts at a little extra cost. Working space = total space.How much is little ?: How much is little ?Wavelets: Wavelets A set of vectors {1,-1,0,0,0,0…}, {0,0,1,-1,0,0,…},{0,0,0,0,1,-1,0,0},{0,0,0,0,0,0,1-1} {1,1,-1,-1,0,0,0,0},{0,0,0,0,1,1,-1,-1} {1,1,1,1,-1,-1,-1,-1},{1,1,1,1,1,1,1,1} A natural multi-resolution Wavelet Synopsis Construction : Wavelet Synopsis Construction Formally, given a signal X and the Haar basis {i} find a representation F=i zi i with at most B non-zero zi minimizing some error which a fn of X-F Restriction. Zi is either 0 or h X,i i Debate. Unrestricted or restricted. Omit. Wavelets: Wavelets ||X-F||1 Long history Matias, Vitter Wang ’98 Garofalakis, Gibbons, ’02 Garofalakis, Kumar, ’04 State of the Art O(n2B log B) time O(n2B) space O(nB) working space Here O(n2log B) time O(n) space SEE ALSO NEXT TALK …What happens to wavelets [GK04] ?: What happens to wavelets [GK04] ?Extensions: Extensions Approximation Algorithms Range Query Histograms Extended WaveletsHistograms: Histograms Saves space across all algorithms except algorithms which extend to general error measure over streamsRange Query: Range Query Same story Open Q: faster algorithm obeying synopsis sizeThat’s all folks: That’s all folks