pods03

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Concise Description of Structured Sets: 

Concise Description of Structured Sets Alberto O. Mendelzon Ken Q. Pu University of Toronto

Motivation: A true story: 

Motivation: A true story An OLAP report writing software: GUI: Range Selection SQL Translation RDBMS Report Writer / Plotting

The GUI: 

The GUI User selects the region of interest via GUI: CTRL-A selects ALL, Click’n-drag selects LOTS.

An Engineering Problem and Solution: 

An Engineering Problem and Solution The Problem: SQL statement has a length limit. User selection is LARGE, especially when the visualization is plotting. The Solution: Break a large selection to a collection of SQL queries. The Complaint: Kind of slow!

A formalization: 

A formalization A tree-structured categorical set: (dimensions) A selected subset: (user selection) Find the shortest possible description of the selected subset: (translated SQL query)

Outline: 

Outline Formal definition of structured sets, expressions and the Minimal Descriptive Length (MDL) problem Partition structures Hierarchical structures Multidimensional product structures

The structured set: 

The structured set A finite universe: U A finite alphabet:  An interpretation function: [  ] :   P(U) E.g. [ a ] = { 1, 2, 3 } [ 1 ] = { 1 } U/ is the set cover naturally induced by  on U.

Expressions: 

Expressions A language L An evaluation function: [  ] : L  P(U) The Propositional Language: L (U, ) L =  | L + L | L – L | L ·L The L+ (U, ) :L+ =  | L+ + L+ “+” is union, “ – ” is difference, “ · ” is intersection.

The L -MDL Problem: 

The L -MDL Problem Let V  U, its descriptive length: ||V||L = min { ||s|| : [ s ] = V and s  L } The L -MDL decision problem: ||V||L  k ? The L -MDL problem: Find a (non-unique) compact (shortest) expression for V in L.

Complexity Issues: 

Complexity Issues The L (U, )-MDL problem is NP-complete. The L+-MDL problem is NP-complete. [reduction from set-cover] Is there anything good we can do? Back to Engineering

Tractable MDL-Problems: 

Tractable MDL-Problems Strategy: reduce the complexity of the MDL problem by restricting to a class of structures. Two tractable cases: Partition:  = 1  U, U/1 is a partition Hierarchy:  = N  N-1  … 1  U, and U/i  U/i+1 for all i > 0.

Partition: 

Partition Toronto Ottawa Windsor San Jose San Francisco San Diego U = { Ottawa, Toronto, Windsor, Kingston San Jose, San Francisco, San Diego } 1 = { Ontario, California } Kingston

Efficient Symbols: 

Efficient Symbols V = { Ottawa, Toronto, Kingston, San Francisco } #(V) = { Ontario } California is not efficient. Efficient symbols for V: #(V) = {  1 : | [ ]  V |  | [ ] – V | + 1 }

Normal Form for Partitions: 

Normal Form for Partitions N is a sub-language of L : t = (1 + 2 + 3 + …) + (a1 + a2 + a3 + …) – (b1 + b2 + b3 + …) t is N -compact for V if S = #(V) A = V – [ S ] B = [ S ] – V S A B

L-MDL for Partitions: 

L-MDL for Partitions Every expression in L can be reduced to an equivalent expression in N ! To compute a compact expression for V:  (V) = #(V) (V) = (V – [ (V)] , [ (V)] – V )

An Example: 

An Example Toronto Ottawa Windsor San Jose San Francisco San Diego Kingston V = { Ottawa, Toronto, Kingston, San Francisco }  (V) = { Ontario }, (V) = ({San Francisco}, {Windsor}) t = Ontario + San Francisco – Windsor

A Normal Form for Hierarchy: 

A Normal Form for Hierarchy Levels: (0=U)  1  2  3 …  N A recursive definition for N : The expression t is normal w.r.t level i if t = t’ + A – B, with A, B  i and t’ is normal w.r.t. level (i+1). Example: t = (( Canada + California – Ontario ) + Toronto – San Jose) t’ A B

L-MDL for Hierarchies: 

L-MDL for Hierarchies Every expression in L can be reduced to an equivalent expression in N ! t = t’ + A – B is a N -compact expression for V if [ t’ ]1 = 1#(V) and is N -compact (A, B) = (V)

An iterative algorithm: 

An iterative algorithm V ( VN + AN - BN ) ( s3 + A3 - B3 ) ( s1 + A1 - B1 ) is compact

Back to Engineering: 

Back to Engineering

Multiple dimensions: 

What about multi-dimensional sets: V  U1  U2 U1 and U2 are hierarchically structured sets: U1 has  = 0  1  2  … U2 has  = 0  1  2  … Form a new (product) structured set: U = U1  U2  =    Interpretation: Multiple dimensions .

Example:: 

Example: U1 -- Time = { Jan, Feb, Mar } U2 -- Geography = { Ontario, California } [Jan  Ontario] = {(Jan, Ontario)} L-MDL for product structure is NP-complete.

Reduction by example: 

Reduction by example Reduction from 3-set cover

Related Work: 

Related Work “The generalized MDL approach for summarization”, Lakshmanan, Ng, Wang, Zhou, Johnson, VLDB, 1999 limited expressiveness for multidimensional data sets; optimal expression is computed in polynomial time Multidimensional MDL problem resembles rectilinear polygon packing and covering problem “Covering rectilinear polygons with axis-parallel rectangles”, Kumar, Ramesh, ACM STOC, 1999

What’s next: 

What’s next More complex structures -- ordering More complex operators -- range operator Conjectured and known NP-hardness of the MDL problems: L (, +) is NP-complete , polynomial if  is total. ?

Slide26: 

The End.