mondrian

MONDRIAN: Annotating and querying databases through colors and blocks: 

Henrico Dolfing, Seminar Digital Information Curation
MONDRIAN: Annotating and querying databases through colors and blocks

Outline: 

Outline
Introduction
Colors and Blocks
Color Algebra
Mondrian System
Discussion
References

Introduction: 

Introduction
Geerts, F., Kementsietsidis, A., Milano, D., 'MONDRIAN: Annotating and querying databases through colors and blocks', accepted for ICDE 2006.
An annotation-oriented data model for manipulating and querying both data and annotations.
MONDRIAN: a prototype implementation of the annotation mechanism.

Motivation: 

Motivation
Scientific databases: huge amounts of data in different formats (flat text, images, XML, ...).
Challenges: integrate, annotate and cross-reference such diverse collections of data; maintain data provenance.
Pressing needs of biological databases.

Use Case (1/2): 

Use Case (1/2)
GDB, a human genome database
SwissProt, a protein database

Use Case (2/2): 

Use Case (2/2)
PIR, a protein sequence database
SwissProt & PIR → UniProt

Colors and Blocks (1/2): 

Colors and Blocks (1/2)
PID      GID      SID
I78825   120231   P21359
A45770   120232   P35240
A01399   120233   P01138
A25218   120234   P08138
Block annotations: John; Mary; Peter; Mary; John; John, Mary

Colors and Blocks (2/2): 

Colors and Blocks (2/2)
Block = annotated group of attribute values.
Color = each annotation is represented by a color.
Block overlapping; inheritance; transitivity.
Color queries = queries on annotated databases, written in a 'Color Algebra'.
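The definitions above can be made concrete with a minimal Python sketch. It is only an illustration: the class names `Block` and `AnnotatedTuple` and the method `colors_of` are hypothetical, and the placement of the two blocks on the tuple is assumed, since the transcript does not record which cells each block covers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Block:
    """An annotated group of attribute values within one tuple."""
    color: str             # the annotation, here an annotator's name
    attributes: frozenset  # names of the attributes the block covers


@dataclass
class AnnotatedTuple:
    values: dict                                # attribute name -> value
    blocks: list = field(default_factory=list)  # blocks may overlap

    def colors_of(self, attribute):
        """Return the colors of all blocks that cover `attribute`."""
        return {b.color for b in self.blocks if attribute in b.attributes}


# Values taken from the use-case relation; the block placement is assumed.
row = AnnotatedTuple(
    values={"pid": "A01399", "gid": "120233", "sid": "P01138"},
    blocks=[
        Block("John", frozenset({"pid", "gid"})),
        Block("Mary", frozenset({"gid", "sid"})),
    ],
)

# Both blocks cover gid, so the two blocks overlap on that attribute.
overlap = row.colors_of("gid")
```

Keeping a block as a (color, attribute set) pair lets several annotations share a cell, which is what block overlapping requires.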

Color Algebra (1/2): 

Color Algebra (1/2)
Projection
Selection
Cartesian product
Block selection
Block projections
Merge
Recoloring
Renaming
Union

Color Algebra (2/2): 

Color Algebra (2/2)
Definition: The color algebra consists of all expressions obtained by composing a finite number of the operators.
Theorem: The set of operators in the color algebra is minimal.

Projection: 

Projection

L-Type Block Projection: 

L-Type Block Projection

U-Type Block Projection: 

U-Type Block Projection
PID      GID
I78825   120231
A45770   120232
A01399   120233
A25218   120234

Combined Block Projection: 

Combined Block Projection
PID      GID
I78825   120231
A45770   120232
A01399   120233
A25218   120234

Query example: 

Query example
Consider the original relation in our use case. Assume we want to find all the tuples that have a block annotated by Mary, or that concern the protein with sid P08138. Assume we are only interested in keeping the {gid, sid} attributes of these tuples.
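The query above can be read as a block selection on Mary, unioned with a selection on sid, followed by a projection onto {gid, sid}. The Python sketch below walks through that composition. It rests on several assumptions that the slides do not confirm: the block placement in `relation` is invented, block selection is taken to keep every tuple that has at least one block of the given color, and projection is taken to restrict each block to the kept attributes and drop blocks that become empty. All function names are hypothetical.

```python
# Each tuple is (values, blocks); a block is (color, set of attribute names).
# Attribute values come from the use-case relation; block placement is assumed.
relation = [
    ({"pid": "I78825", "gid": "120231", "sid": "P21359"}, [("Mary", {"pid", "gid"})]),
    ({"pid": "A45770", "gid": "120232", "sid": "P35240"}, [("Mary", {"gid", "sid"})]),
    ({"pid": "A01399", "gid": "120233", "sid": "P01138"},
     [("John", {"pid"}), ("Mary", {"sid"})]),
    ({"pid": "A25218", "gid": "120234", "sid": "P08138"}, []),
]


def block_select(rel, color):
    """Keep tuples that have at least one block of the given color (assumed semantics)."""
    return [t for t in rel if any(c == color for c, _ in t[1])]


def select(rel, attribute, value):
    """Ordinary selection on an attribute value."""
    return [t for t in rel if t[0][attribute] == value]


def union(r1, r2):
    """Union that preserves order and removes duplicate tuples."""
    out = list(r1)
    for t in r2:
        if t not in out:
            out.append(t)
    return out


def project(rel, attributes):
    """Keep the given attributes; restrict blocks to them (assumed semantics)."""
    out = []
    for values, blocks in rel:
        kept_values = {a: values[a] for a in attributes}
        kept_blocks = [(c, attrs & attributes) for c, attrs in blocks if attrs & attributes]
        out.append((kept_values, kept_blocks))
    return out


result = project(
    union(block_select(relation, "Mary"), select(relation, "sid", "P08138")),
    {"gid", "sid"},
)
```

Under these assumptions the result holds the gid and sid values of all four tuples, matching the rows shown on the Union and Projection slides.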

Block Selection: 

Block Selection
PID      GID      SID
I78825   120231   P21359
A45770   120232   P35240
A01399   120233   P01138
A25218   120234   P08138
Block annotations: John; Mary; Peter; Mary; John; John, Mary

Block Selection: 

Block Selection
PID      GID      SID
I78825   120231   P21359
A45770   120232   P35240
A01399   120233   P01138
Block annotations: Mary; Mary; John, Mary

Selection: 

Selection
PID      GID      SID
I78825   120231   P21359
A45770   120232   P35240
A01399   120233   P01138
A25218   120234   P08138

Selection: 

Selection
PID      GID      SID
A25218   120234   P08138

Union: 

Union

Union: 

Union
PID      GID      SID
I78825   120231   P21359
A45770   120232   P35240
A01399   120233   P01138
A25218   120234   P08138
Block annotations: Mary; Mary; John, Mary

Projection: 

Projection
PID      GID      SID
I78825   120231   P21359
A45770   120232   P35240
A01399   120233   P01138
A25218   120234   P08138
Block annotations: Mary; Mary; John, Mary

Projection: 

Projection
GID      SID
120231   P21359
120232   P35240
120233   P01138
120234   P08138
Block annotations: Mary; Mary; John, Mary

Cartesian Product: 

Cartesian Product

Cartesian Product: 

Cartesian Product
PID      GID      GID’     SID’
I78825   120231   120231   P21359
A45770   120232   120232   P35240
A25218   120234   120234   P08138

Merge: 

Merge
Projecting out GID’
PID      GID      SID’
I78825   120231   P21359
A45770   120232   P35240
A25218   120234   P08138

Merge: 

Merge
Projecting out GID
PID      GID’     SID’
I78825   120231   P21359
A45770   120232   P35240
A25218   120234   P08138

Merge: 

Merge
PID      GID’     SID’
I78825   120231   P21359
A45770   120232   P35240
A25218   120234   P08138

Mondrian System : 

Mondrian System
Piet Mondria(a)n: Dutch painter whose paintings mainly consist of color blocks.
Victory Boogie Woogie (€ 40.000.000)

Desirable properties: 

Desirable properties
No restructuring of the existing database schema: only extra tables need to be added.
Minimum overhead in terms of space and query execution time.
Annotations should be treated as first-class citizens of the database, i.e. it must be possible to query them.

Current state of Mondrian System: 

Current state of Mondrian System
Text-based or graphical CA query → equivalent CRA query → equivalent SQL query → MySQL relational DBMS → result

Relational Representation: 

Relational Representation
Assume assoc(pid, bpid), assoc(gid, bgid) and assoc(sid, bsid).
Data is separated from the annotation representation.
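The slide does not show the actual schema, so the following Python sketch is only one possible reading of it. It assumes that assoc(a, ba) pairs each data attribute a with a 0/1 flag attribute ba recording whether a belongs to a block, that each row of an extra table describes one block together with its color, and that pid identifies a tuple. The table names `protein` and `protein_blocks` and the `color` column are hypothetical, the sample block is invented, and SQLite stands in for the MySQL back end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used as a stand-in for MySQL
conn.executescript("""
    -- Existing data table, left unchanged.
    CREATE TABLE protein (pid TEXT PRIMARY KEY, gid TEXT, sid TEXT);
    -- Extra table (hypothetical): one row per block, with one flag per attribute.
    CREATE TABLE protein_blocks (
        pid TEXT, bpid INTEGER, bgid INTEGER, bsid INTEGER, color TEXT
    );
""")
conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", [
    ("I78825", "120231", "P21359"),
    ("A45770", "120232", "P35240"),
    ("A01399", "120233", "P01138"),
    ("A25218", "120234", "P08138"),
])
# Invented block: Mary annotates the gid and sid values of the first tuple.
conn.execute("INSERT INTO protein_blocks VALUES ('I78825', 0, 1, 1, 'Mary')")

# Annotations are queried like ordinary data: tuples with a block colored Mary.
rows = conn.execute("""
    SELECT p.pid, p.gid, p.sid, b.bpid, b.bgid, b.bsid
    FROM protein AS p
    JOIN protein_blocks AS b ON b.pid = p.pid
    WHERE b.color = 'Mary'
""").fetchall()
```

In this sketch the original table is never altered, and the annotation can be queried with plain SQL, which is in line with the desirable properties listed earlier.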

Current state: 

Current state
Text-based or graphical CA query → equivalent CRA query → equivalent SQL query → MySQL relational DBMS → result

Experimental Results: 

Experimental Results

Discussion: 

Discussion

Literature : 

Literature
[Geerts et al., 2005] Geerts, F., Kementsietsidis, A., and Milano, D., 'MONDRIAN: Annotating and querying databases through colors and blocks', accepted for ICDE 2006, 2005.
[Buneman et al., 2005] Buneman, P., Bose, R., and Ecklund, D., 'Annotation in Scientific Data: a Scoping Report', 2005.
[Gray et al., 2002] Gray, J., Szalay, A.S., Thakar, A.R., Stoughton, C., and van den Berg, J., 'Online Scientific Data Curation, Publication, and Archiving', Technical Report MSR-TR-2002-74, Microsoft Research, 2002.
