logging in or signing up ismb03 slides1 15 Teodora Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 63 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: February 07, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Processing of the natural language queries to a relational database: Processing of the natural language queries to a relational database Maria Samsonova, Andrei Pisarev and Maxim Blagov St.Petersburg State Polytechnical University, St.Petersburg, Russia Slide2: FlyEx database Contains Images of segmentation gene expression in individual embryos. Quantitative data on segmentation gene expression in each nucleus of an individual embryo. Averaged data on expression of each segmentation gene at each time point and at cellular resolution. Natural language interface is available at http://urchin.spbcas.ru/NLP/NLP.htmlSlide3: Conceptual Schema “is characterized by”(“EMBRYO”,“DEVELOPMENTAL TIME”) “is an instance of”(“TIME FROM ONSET OF CYCLE 14A”, “DEVELOPMENTAL TIME”) “is derived from”(“TIME FROM ONSET OF CYCLE 14A”, “INVAGINATING MEMBRANE”), etc. Slide4: Main Steps of Processing of a Query in NLSlide5: Processing of the NL Query ‘Which embryos are scanned for expression of bcd and belong to stage 4?’ Slide6: Transformation of Synonyms and Concepts to Logical Level Terms 1. Transformation rules: ‘stage 4 ’ -> ‘cleavage cycle 11 or 12 or 13 ’, ‘stage 5’ -> ‘cleavage cycle 14A’, ‘embryonic syncytium’ -> ‘nuclear mask’, etc. 2. Dictionary, containing all word forms of these concepts. 3. Table, which contains conditional probabilities of correspondence of each concept’s main word form to the term of logical level. Slide7: Transformation of Synonyms and Concepts to Logical Level Terms: an example ‘Which embryos are scanned for expression of bcd and belong to stage 4?’ Stage 4 -> cleavage cycle 11 or 12 or 13 , as transformation condition r = 1 is satisfied. L = [1 1 0 0 0] A = R = [1 0.5 0] 0.5 0.5 0 0.5 0 0 0 0.5 0 0 0 0.5 0 0 0.5Slide8: Domain-oriented Dictionary `Which embryos are scanned for expression of bcd and belong to cleavage cycle 11 or 12 or 13?’ English dictionary: .................. cleavage/S; cycle/DGRS; embryo/MS which; bcd; Embryo/MS: S: embryo + -s = embryos; M: embryo + ‘s = embryo’s Cycle/DGRS: D: cycle + -ed = cycled, G: cycle + -ing = cycling, R: cycle + -er = cycler, S: cycle + -s = cycles Slide9: Algorithm of Word Conversion to the Main Word Form 1. Check if the word given is in the dictionary. If yes, it means that the main word form was used already in the query. 2. Otherwise the main word form is to be constructed. Apply rules of generation of word forms in reverse order, e.g. Rule M: embryos – ‘s = no result Rule S: embryos - -s = embryo Check a possibility of generation of the initial word form from the main word form: S: embryo + -s = embryos .Slide10: Initial Chain of Semantic Components ‘Which embryos are scanned for expression of bcd and belong to cleavage cycle 11 or 12 or 13?’ embryos(table,metadata.embryo,metadata.embryo) bcd(value, metadata.protein.name,’bicoid') and(logic,-,and) cleavage(field,metadata.embryo.cleavage, metadata.embryo.cleavage) cycle(field,metadata.embryo.cleavage, metadata.embryo.cleavage) 11(value, metadata.embryo.cleavage,11) or(logic,-,or) 12(value, metadata.embryo.cleavage,12) or(logic,-,or) 13(value, metadata.embryo.cleavage,13) In brackets type, orientation and body of each semantic component are shown. Slide11: Types of Semantic Components Type Description Table is mapped on the database table Field is mapped on the field of the database table Value value of the table field Relation relation operations (>, <, =, etc.) between semantic components of other types. Half interval one-side intervals of the field values > 5, < = 5 Aggregate main aggregation operations (COUNT, SUM, MIN, MAX) Logic operands AND, OR, NOT. Predicate represents a condition of selection of rows in a SQL query Function combines the component of the type aggregate with the component of the type table (e.g., COUNT (embryo.*)) or field (e.g., COUNT(embryo.embrid)). Slide12: Construction of semantic network Productions rules if <applicability condition> then <operator> IF c1.type – field; c2.type – value, c3.type – logic, c4.type – value, …….. c1.orientation = c2.orientation = c4.orientation … THEN c = (predicate,c1.orientation,(c1.body = c2.body c3.body c1.body = c4.body …)) embryos (table, metadata.embryo, metadata.embryo) bcd (value, metadata.protein.name, ‘bicoid') and (logic, -, and) (predicate, metadata.embryo.cleavage, ((metadata.embryo.cleavage=11) or (metadata.embryo.cleavage=12) or (metadata.embryo.cleavage=13))) Slide13: SQL query generation SELECT fieldlist FROM tablelist WHERE condition Semantic network: embryos (table, metadata.embryo, metadata.embryo) (predicate, -, metadata.protein.name=‘bicoid' and ((metadata.embryo.cleavage=11) or (metadata.embryo.cleavage=12) or (metadata.embryo.cleavage=13))) SQL query: SELECT DISTINCT metadata.embryo.cleavage as "Cleavage cycle", metadata.embryo.name as "Embryo name", metadata.embryo.temporal as "Temporal class“ FROM metadata.embryochannel, metadata.protein, metadata.embryo WHERE (metadata.protein.name='bicoid') and (metadata.embryo.cleavage=11) or (metadata.embryo.cleavage=12) or (metadata.embryo.cleavage=13)) and (metadata.protein.prid=metadata.embryochannel.prid and metadata.embryo.embrid=metadata.embryochannel.embrid) Slide14: SQL query optimization Slide15: SQL query optimization: the algorithm 1. Construction of a matrix of the shortest paths using the Floyd algorithm. 2. Search of the shortest path between each pair of nodes. All nodes, appearing in these paths, form a set of nodes. 3. Construction of a sub-graph, which contains only the nodes, which appear in the set. 4. Construction of the minimum skeleton tree of the sub-graph by means of the Kruskal algorithm. 5. All terminal nodes, which have only one adjacent edge and were not listed as a requested semantic components, are deleted from the skeleton. You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
ismb03 slides1 15 Teodora Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 63 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: February 07, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Processing of the natural language queries to a relational database: Processing of the natural language queries to a relational database Maria Samsonova, Andrei Pisarev and Maxim Blagov St.Petersburg State Polytechnical University, St.Petersburg, Russia Slide2: FlyEx database Contains Images of segmentation gene expression in individual embryos. Quantitative data on segmentation gene expression in each nucleus of an individual embryo. Averaged data on expression of each segmentation gene at each time point and at cellular resolution. Natural language interface is available at http://urchin.spbcas.ru/NLP/NLP.htmlSlide3: Conceptual Schema “is characterized by”(“EMBRYO”,“DEVELOPMENTAL TIME”) “is an instance of”(“TIME FROM ONSET OF CYCLE 14A”, “DEVELOPMENTAL TIME”) “is derived from”(“TIME FROM ONSET OF CYCLE 14A”, “INVAGINATING MEMBRANE”), etc. Slide4: Main Steps of Processing of a Query in NLSlide5: Processing of the NL Query ‘Which embryos are scanned for expression of bcd and belong to stage 4?’ Slide6: Transformation of Synonyms and Concepts to Logical Level Terms 1. Transformation rules: ‘stage 4 ’ -> ‘cleavage cycle 11 or 12 or 13 ’, ‘stage 5’ -> ‘cleavage cycle 14A’, ‘embryonic syncytium’ -> ‘nuclear mask’, etc. 2. Dictionary, containing all word forms of these concepts. 3. Table, which contains conditional probabilities of correspondence of each concept’s main word form to the term of logical level. Slide7: Transformation of Synonyms and Concepts to Logical Level Terms: an example ‘Which embryos are scanned for expression of bcd and belong to stage 4?’ Stage 4 -> cleavage cycle 11 or 12 or 13 , as transformation condition r = 1 is satisfied. L = [1 1 0 0 0] A = R = [1 0.5 0] 0.5 0.5 0 0.5 0 0 0 0.5 0 0 0 0.5 0 0 0.5Slide8: Domain-oriented Dictionary `Which embryos are scanned for expression of bcd and belong to cleavage cycle 11 or 12 or 13?’ English dictionary: .................. cleavage/S; cycle/DGRS; embryo/MS which; bcd; Embryo/MS: S: embryo + -s = embryos; M: embryo + ‘s = embryo’s Cycle/DGRS: D: cycle + -ed = cycled, G: cycle + -ing = cycling, R: cycle + -er = cycler, S: cycle + -s = cycles Slide9: Algorithm of Word Conversion to the Main Word Form 1. Check if the word given is in the dictionary. If yes, it means that the main word form was used already in the query. 2. Otherwise the main word form is to be constructed. Apply rules of generation of word forms in reverse order, e.g. Rule M: embryos – ‘s = no result Rule S: embryos - -s = embryo Check a possibility of generation of the initial word form from the main word form: S: embryo + -s = embryos .Slide10: Initial Chain of Semantic Components ‘Which embryos are scanned for expression of bcd and belong to cleavage cycle 11 or 12 or 13?’ embryos(table,metadata.embryo,metadata.embryo) bcd(value, metadata.protein.name,’bicoid') and(logic,-,and) cleavage(field,metadata.embryo.cleavage, metadata.embryo.cleavage) cycle(field,metadata.embryo.cleavage, metadata.embryo.cleavage) 11(value, metadata.embryo.cleavage,11) or(logic,-,or) 12(value, metadata.embryo.cleavage,12) or(logic,-,or) 13(value, metadata.embryo.cleavage,13) In brackets type, orientation and body of each semantic component are shown. Slide11: Types of Semantic Components Type Description Table is mapped on the database table Field is mapped on the field of the database table Value value of the table field Relation relation operations (>, <, =, etc.) between semantic components of other types. Half interval one-side intervals of the field values > 5, < = 5 Aggregate main aggregation operations (COUNT, SUM, MIN, MAX) Logic operands AND, OR, NOT. Predicate represents a condition of selection of rows in a SQL query Function combines the component of the type aggregate with the component of the type table (e.g., COUNT (embryo.*)) or field (e.g., COUNT(embryo.embrid)). Slide12: Construction of semantic network Productions rules if <applicability condition> then <operator> IF c1.type – field; c2.type – value, c3.type – logic, c4.type – value, …….. c1.orientation = c2.orientation = c4.orientation … THEN c = (predicate,c1.orientation,(c1.body = c2.body c3.body c1.body = c4.body …)) embryos (table, metadata.embryo, metadata.embryo) bcd (value, metadata.protein.name, ‘bicoid') and (logic, -, and) (predicate, metadata.embryo.cleavage, ((metadata.embryo.cleavage=11) or (metadata.embryo.cleavage=12) or (metadata.embryo.cleavage=13))) Slide13: SQL query generation SELECT fieldlist FROM tablelist WHERE condition Semantic network: embryos (table, metadata.embryo, metadata.embryo) (predicate, -, metadata.protein.name=‘bicoid' and ((metadata.embryo.cleavage=11) or (metadata.embryo.cleavage=12) or (metadata.embryo.cleavage=13))) SQL query: SELECT DISTINCT metadata.embryo.cleavage as "Cleavage cycle", metadata.embryo.name as "Embryo name", metadata.embryo.temporal as "Temporal class“ FROM metadata.embryochannel, metadata.protein, metadata.embryo WHERE (metadata.protein.name='bicoid') and (metadata.embryo.cleavage=11) or (metadata.embryo.cleavage=12) or (metadata.embryo.cleavage=13)) and (metadata.protein.prid=metadata.embryochannel.prid and metadata.embryo.embrid=metadata.embryochannel.embrid) Slide14: SQL query optimization Slide15: SQL query optimization: the algorithm 1. Construction of a matrix of the shortest paths using the Floyd algorithm. 2. Search of the shortest path between each pair of nodes. All nodes, appearing in these paths, form a set of nodes. 3. Construction of a sub-graph, which contains only the nodes, which appear in the set. 4. Construction of the minimum skeleton tree of the sub-graph by means of the Kruskal algorithm. 5. All terminal nodes, which have only one adjacent edge and were not listed as a requested semantic components, are deleted from the skeleton.