ismb03 slides1 15

Category: Education

Presentation Description

No description available.


Presentation Transcript

Processing of the natural language queries to a relational database: 

Processing of the natural language queries to a relational database Maria Samsonova, Andrei Pisarev and Maxim Blagov St.Petersburg State Polytechnical University, St.Petersburg, Russia


FlyEx database Contains Images of segmentation gene expression in individual embryos. Quantitative data on segmentation gene expression in each nucleus of an individual embryo.  Averaged data on expression of each segmentation gene at each time point and at cellular resolution. Natural language interface is available at


Conceptual Schema “is characterized by”(“EMBRYO”,“DEVELOPMENTAL TIME”) “is an instance of”(“TIME FROM ONSET OF CYCLE 14A”, “DEVELOPMENTAL TIME”) “is derived from”(“TIME FROM ONSET OF CYCLE 14A”, “INVAGINATING MEMBRANE”), etc.


Main Steps of Processing of a Query in NL


Processing of the NL Query ‘Which embryos are scanned for expression of bcd and belong to stage 4?’


Transformation of Synonyms and Concepts to Logical Level Terms 1. Transformation rules: ‘stage 4 ’ -> ‘cleavage cycle 11 or 12 or 13 ’, ‘stage 5’ -> ‘cleavage cycle 14A’, ‘embryonic syncytium’ -> ‘nuclear mask’, etc. 2. Dictionary, containing all word forms of these concepts. 3. Table, which contains conditional probabilities of correspondence of each concept’s main word form to the term of logical level.


Transformation of Synonyms and Concepts to Logical Level Terms: an example ‘Which embryos are scanned for expression of bcd and belong to stage 4?’ Stage 4 -> cleavage cycle 11 or 12 or 13 , as transformation condition r = 1 is satisfied. L = [1 1 0 0 0] A = R = [1 0.5 0] 0.5 0.5 0 0.5 0 0 0 0.5 0 0 0 0.5 0 0 0.5


Domain-oriented Dictionary `Which embryos are scanned for expression of bcd and belong to cleavage cycle 11 or 12 or 13?’ English dictionary: .................. cleavage/S; cycle/DGRS; embryo/MS which; bcd; Embryo/MS: S: embryo + -s = embryos; M: embryo + ‘s = embryo’s Cycle/DGRS: D: cycle + -ed = cycled, G: cycle + -ing = cycling, R: cycle + -er = cycler, S: cycle + -s = cycles  


Algorithm of Word Conversion to the Main Word Form 1. Check if the word given is in the dictionary. If yes, it means that the main word form was used already in the query. 2. Otherwise the main word form is to be constructed.   Apply rules of generation of word forms in reverse order, e.g. Rule M: embryos – ‘s = no result Rule S: embryos - -s = embryo Check a possibility of generation of the initial word form from the main word form:   S: embryo + -s = embryos .


Initial Chain of Semantic Components ‘Which embryos are scanned for expression of bcd and belong to cleavage cycle 11 or 12 or 13?’   embryos(table,metadata.embryo,metadata.embryo) bcd(value,,’bicoid') and(logic,-,and) cleavage(field,metadata.embryo.cleavage, metadata.embryo.cleavage) cycle(field,metadata.embryo.cleavage, metadata.embryo.cleavage) 11(value, metadata.embryo.cleavage,11) or(logic,-,or) 12(value, metadata.embryo.cleavage,12) or(logic,-,or) 13(value, metadata.embryo.cleavage,13)   In brackets type, orientation and body of each semantic component are shown.


Types of Semantic Components Type Description   Table is mapped on the database table Field is mapped on the field of the database table Value value of the table field Relation relation operations (>, <, =, etc.) between semantic components of other types. Half interval one-side intervals of the field values > 5, < = 5 Aggregate main aggregation operations (COUNT, SUM, MIN, MAX) Logic operands AND, OR, NOT. Predicate represents a condition of selection of rows in a SQL query Function combines the component of the type aggregate with the component of the type table (e.g., COUNT (embryo.*)) or field (e.g., COUNT(embryo.embrid)).


Construction of semantic network Productions rules if <applicability condition> then <operator>   IF c1.type – field; c2.type – value, c3.type – logic, c4.type – value, …….. c1.orientation = c2.orientation = c4.orientation …   THEN c = (predicate,c1.orientation,(c1.body = c2.body c3.body c1.body = c4.body …))  embryos (table, metadata.embryo, metadata.embryo) bcd (value,, ‘bicoid') and (logic, -, and) (predicate, metadata.embryo.cleavage, ((metadata.embryo.cleavage=11) or (metadata.embryo.cleavage=12) or (metadata.embryo.cleavage=13)))


SQL query generation SELECT fieldlist FROM tablelist WHERE condition   Semantic network: embryos (table, metadata.embryo, metadata.embryo) (predicate, -,‘bicoid' and ((metadata.embryo.cleavage=11) or (metadata.embryo.cleavage=12) or (metadata.embryo.cleavage=13))) SQL query: SELECT DISTINCT metadata.embryo.cleavage as "Cleavage cycle", as "Embryo name", metadata.embryo.temporal as "Temporal class“ FROM metadata.embryochannel, metadata.protein, metadata.embryo WHERE ('bicoid') and (metadata.embryo.cleavage=11) or (metadata.embryo.cleavage=12) or (metadata.embryo.cleavage=13)) and (metadata.protein.prid=metadata.embryochannel.prid and metadata.embryo.embrid=metadata.embryochannel.embrid)


SQL query optimization


SQL query optimization: the algorithm 1. Construction of a matrix of the shortest paths using the Floyd algorithm. 2. Search of the shortest path between each pair of nodes. All nodes, appearing in these paths, form a set of nodes. 3. Construction of a sub-graph, which contains only the nodes, which appear in the set. 4. Construction of the minimum skeleton tree of the sub-graph by means of the Kruskal algorithm. 5. All terminal nodes, which have only one adjacent edge and were not listed as a requested semantic components, are deleted from the skeleton.

authorStream Live Help