Minimally Supervised Learning of Semantic Knowledge from Query Logs
IJCNLP-08, Hyderabad, India, 2008/3/17
Mamoru Komachi (†) and Hisami Suzuki (‡)
(†) Nara Institute of Science and Technology, Japan
(‡) Microsoft Research, USA

Task
- Learn semantic categories from web search query logs by bootstrapping with minimal supervision
- Semantic category: a set of interrelated words, such as named entities, technical terms, and paraphrases
- Useful for applications such as search ads
- Example figure: Darjeeling, Chai (Indian tea), and Kombucha (Japanese tea), connected by "similar" links

Approach

Our Contribution

Table of Contents

Bootstrapping
- Iteratively alternate pattern induction and instance extraction, starting from seed instances
- Can expand a small set of seed instances
- Figure (instances, contextual patterns, query log as corpus): the seed "vaio" in the query "Compare vaio laptop" yields the pattern "Compare # laptop" (#: slot), which matches "Compare toshiba satellite laptop" and "Compare HP xb3000 laptop" and extracts the new instances "Toshiba satellite" and "HP xb3000"

Instance lookup and pattern induction
- Example: the query-log entry "ANA 予約" ("ANA reservation") contains the instance "ANA", from which the pattern "# 予約" is extracted
- The pattern "# 予約" is ambiguous: restaurant reservation? flight reservation?
- Generic patterns: broad coverage, but noisy

Instance/Pattern Scoring Metrics
- P: patterns in the corpus; I: instances in the corpus
- PMI: pointwise mutual information; r: reliability score
- The reliability of an instance and the reliability of a pattern are defined in terms of each other
- PMI is normalized by its maximum over all P and I

Problems of Espresso

The Tchai Algorithm

Comparison of methods

Experiments

Results (Travel, Finance)
- Ambiguity in hand labeling (e.g. Tokyo Disney Land)
- Includes common nouns related to Travel (e.g. rental car)
- High precision (92.1%)
- Learned 251 novel words

Sample of Instances (Travel category)
- Able to learn several sub-categories for which no seed words were given

Impact of Pattern Induction

Effect of each modification
- The scaling factor has the most impact
- Filtering consistently outperforms no filtering

System Performance (Travel, Finance)
- Relative recall (Pantel et al., 2004)
- High precision and recall
- High precision but low relative recall due to strict filtering

Cumulative precision: Travel
- Tchai achieved the best precision

Cumulative precision: Finance
- Both Basilisk and Espresso suffered from acquiring generic patterns in early iterations

Sample Extracted Patterns
- Basilisk and Espresso extracted location names as context patterns, which may be too generic for the Travel domain
- Tchai found context patterns that are characteristic of the domain

Conclusion and future work

Thank you for listening!
Tchai
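The Bootstrapping, Instance lookup and Instance/Pattern Scoring slides describe the loop only in outline, and the scoring formulas themselves are not in the transcript. As a rough illustration only, here is a minimal Python sketch of an Espresso-style loop over a toy query log. It assumes whitespace-tokenized queries with one "#" slot per pattern; PMI estimated from query counts as log(c(i,p) * N / (c(i) * c(p))), clipped at zero; and the Espresso reliability definitions, which match the slide's description: r_pi(p) = 1/|I| * sum over i of pmi(i,p)/max_pmi * r_iota(i), and r_iota(i) = 1/|P| * sum over p of pmi(i,p)/max_pmi * r_pi(p). It does not implement Tchai's own modifications (filtering, scaling factor), whose details are not given here. The query log and the names `induce_patterns`, `extract_instance`, `pmi_table` and `bootstrap` are hypothetical.

```python
import math
from collections import Counter

SLOT = "#"  # slot marker, as in the slide example "Compare # laptop"


def induce_patterns(query, instance):
    """Pattern induction: replace each whole-token occurrence of `instance` with the slot."""
    q, inst = query.split(), instance.split()
    n = len(inst)
    return {
        " ".join(q[:k] + [SLOT] + q[k + n:])
        for k in range(len(q) - n + 1)
        if q[k:k + n] == inst
    }


def extract_instance(query, pattern):
    """Instance extraction: return the slot filler if `query` matches `pattern`, else None."""
    q, pat = query.split(), pattern.split()
    k = pat.index(SLOT)
    prefix, suffix = pat[:k], pat[k + 1:]
    if len(q) <= len(prefix) + len(suffix):  # the slot must cover at least one token
        return None
    if q[:len(prefix)] != prefix or q[len(q) - len(suffix):] != suffix:
        return None
    return " ".join(q[len(prefix):len(q) - len(suffix)])


def pmi_table(queries, instances, patterns):
    """PMI (assumed count-based estimate, clipped at 0) for co-occurring (instance, pattern) pairs."""
    n = len(queries)
    fills = [{p: extract_instance(q, p) for p in patterns} for q in queries]
    c_p = Counter(p for f in fills for p, i in f.items() if i is not None)
    c_ip = Counter((i, p) for f in fills for p, i in f.items() if i in instances)
    c_i = Counter(i for q in queries for i in instances if induce_patterns(q, i))
    return {
        (i, p): max(0.0, math.log(c * n / (c_i[i] * c_p[p])))
        for (i, p), c in c_ip.items()
    }


def bootstrap(queries, seeds, iterations=2, top_patterns=1, top_instances=2):
    """Alternate pattern scoring and instance scoring, Espresso-style (hypothetical sketch)."""
    r_inst = {s: 1.0 for s in seeds}  # seed instances are fully reliable
    for _ in range(iterations):
        # Pattern induction: contexts of every known instance in the log.
        cand_pats = {p for q in queries for i in r_inst for p in induce_patterns(q, i)}
        # Instance lookup: every slot filler of the candidate patterns.
        cand_insts = {extract_instance(q, p) for q in queries for p in cand_pats} - {None}
        pmi = pmi_table(queries, cand_insts | set(r_inst), cand_pats)
        max_pmi = max(pmi.values(), default=0.0) or 1.0  # normalizer over all scored pairs

        # r_pi(p) = 1/|I| * sum over i in I of pmi(i, p) / max_pmi * r_iota(i)
        r_pat = {
            p: sum(pmi.get((i, p), 0.0) / max_pmi * r for i, r in r_inst.items()) / len(r_inst)
            for p in cand_pats
        }
        best_pats = dict(sorted(((p, s) for p, s in r_pat.items() if s > 0),
                                key=lambda ps: (-ps[1], ps[0]))[:top_patterns])
        if not best_pats:
            break

        # r_iota(i) = 1/|P| * sum over p in P of pmi(i, p) / max_pmi * r_pi(p)
        r_new = {
            i: sum(pmi.get((i, p), 0.0) / max_pmi * r for p, r in best_pats.items()) / len(best_pats)
            for i in cand_insts - set(r_inst)
        }
        best_insts = sorted(((i, s) for i, s in r_new.items() if s > 0),
                            key=lambda ist: (-ist[1], ist[0]))
        r_inst.update(best_insts[:top_instances])  # earlier instances keep their scores
    return r_inst


# Hypothetical toy query log modelled on the slides' laptop example.
QUERY_LOG = [
    "compare vaio laptop", "compare toshiba satellite laptop", "compare hp xb3000 laptop",
    "vaio price", "toshiba satellite price", "cheap flights price", "gold price",
    "hotel price", "weather tokyo", "tokyo hotels",
]
print(sorted(bootstrap(QUERY_LOG, ["vaio"])))  # ['hp xb3000', 'toshiba satellite', 'vaio']
```

In this toy log the generic pattern "# price" also fills with "cheap flights", "gold" and "hotel", but its PMI with the seed "vaio" is zero, so its reliability is zero and only "compare # laptop" is kept. This mirrors the "broad coverage, but noisy" remark on generic patterns; a real query log would be far larger and would need the additional filtering the slides mention.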
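The System Performance slide reports relative recall (Pantel et al., 2004), which compares two systems when no complete list of correct instances is available. For systems A and B it is R_A|B = C_A / C_B = (P_A * |A|) / (P_B * |B|), where C is the number of correct instances a system extracts, P its precision and |A| its output size. The cumulative precision slides plot precision as the output grows; the sketch below assumes this means the precision of the top k extracted instances for each k. The helper names `relative_recall` and `cumulative_precision` and all numbers in the usage lines are hypothetical, not results from the slides.

```python
def relative_recall(precision_a, size_a, precision_b, size_b):
    """Relative recall R_A|B = C_A / C_B, with C_X = precision_X * |X| correct instances."""
    if precision_b * size_b == 0:
        raise ValueError("system B extracted no correct instances")
    return (precision_a * size_a) / (precision_b * size_b)


def cumulative_precision(labels):
    """Precision of the top-k outputs for k = 1..n, given ranked correct/incorrect judgments."""
    correct, curve = 0, []
    for k, is_correct in enumerate(labels, start=1):
        correct += bool(is_correct)
        curve.append(correct / k)
    return curve


# Hypothetical numbers, for illustration only.
print(relative_recall(0.5, 200, 0.25, 100))  # 4.0
print(cumulative_precision([True, True, False, True]))  # [1.0, 1.0, 0.6666666666666666, 0.75]
```

A relative recall below 1 means system A finds fewer correct instances than system B even if its precision is higher, which is how strict filtering can give high precision but low relative recall.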