Statistische Methoden in der Computerlinguistik (Statistical Methods in Computational Linguistics)
2a. Course projects
Jonas Kuhn, Universität Potsdam, 2007
Added: July 09, 2007

Course requirements
- Exercises (not graded)
- 2-3 larger programming assignments (to be handed in; graded)
- Participation in a 'project team' (2-5 members each)
- Connection to an overall course project (see below)
- Research on a sub-topic (literature and/or available tools and resources)
- (Short) presentation of results, possibly a small tutorial on techniques of general interest
- Experiments with tools, or programming
- Documentation of the project work (broken down by participant)

The Spock Challenge
The Entity Resolution Problem: A common problem that we face is that there are many people with the same name. Given that, how do we distinguish a document about Michael Jackson the singer from Michael Jackson the football player?
- World-wide contest for a software solution: http://challenge.spock.com/
- Winning team receives a $50,000 prize
- NOTE RULES! 'Upon acceptance of the prize, the winning Software Submissions and all source code and algorithms related thereto becomes the sole and exclusive property of Spock.'

The Spock Challenge
With billions of documents and people on the web, we need to identify and cluster web documents accurately to the people they are related to. Mapping these named entities from documents to the correct person is the essence of the Spock Challenge.

The Spock Challenge: Data set
The complete data set is divided into training and test sets containing roughly 25,000 and 75,000 documents, respectively. Along with a set of documents we've included a set of target names. You can assume that each document contains only one of the target names (even though most documents contain many names). The challenge is to partition all the documents relevant to a target name by their referent. Consider the following two documents with the target name 'Michael Jackson':
- Michael Jackson - The King of Pop or Wacko Jacko?
- Michael Jackson statistics - pro-football-reference.com
The referents of these articles are the pop star and football player, respectively. We've included the ground truth for the training set so you have something to compare against.

The Spock Challenge: Test/Application
Once you're done training, you can run your algorithm on the test set and submit your results on this site (http://challenge.spock.com/). We will provide instant feedback in the form of a percentage rank score (using the F-measure). This way you can see how you stack up against the other teams. What good is a problem without a little competition?

Course projects inspired by Spock challenge
- Experiment with various (mostly statistical) NLP techniques on the data set
- Any ideas?

Sub-tasks (we need a team for each)
- State of the art in entity resolution (a.k.a. deduplication, or merge-purge)
- Clustering. Starting point: Manning/Schütze 1999, ch. 14
- Information/Document Retrieval (?). Starting point: Manning/Schütze 1999, ch. 15
- Term weighting techniques
- Possibly build additional data sets
- Named Entity Detection
- Coreference Resolution
- Parsing, Semantic Role Labelling
- Using WordNet (and other ontological resources)
- Using Wikipedia (and other encyclopaedic resources)
- Word Sense Disambiguation (possibly similar techniques)
- Software Integration, Testing
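
As one possible starting point for the clustering and term-weighting sub-tasks, the sketch below partitions the documents for a single target name by their likely referent. It is only an illustration under stated assumptions, not the method used by Spock or prescribed by the course: documents are reduced to whitespace-separated lowercase tokens, weighted with a smoothed TF-IDF, and linked into the same cluster whenever their cosine similarity reaches a threshold. The function names, the example documents, and the threshold value 0.3 are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf (an assumption): terms found in every document keep weight 1.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_referent(docs, threshold=0.3):
    """Single-link clustering: return one cluster label per document."""
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(docs))]


# Hypothetical documents for the target name 'Michael Jackson'.
example_docs = [
    "michael jackson king of pop album thriller",
    "michael jackson pop singer thriller tour",
    "michael jackson football statistics touchdowns",
    "michael jackson football player career statistics",
]
labels = cluster_by_referent(example_docs)
```

With these four toy documents, the first two and the last two end up with matching labels. On real web pages the threshold would have to be tuned on the training set, and the all-pairs comparison is quadratic in the number of documents per name.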
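
The slides say that submissions are scored with the F-measure but do not say which variant. To compare a predicted partition against the training-set ground truth, one common choice for clustering output is the pairwise F-measure, sketched below. Treat the pairwise definition as an assumption, since the challenge's actual scoring may differ; the function name and the example labels are hypothetical.

```python
from itertools import combinations


def pairwise_f_measure(predicted, truth):
    """Pairwise F1 between two partitions given as per-document label lists."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must cover the same documents")
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1  # pair correctly placed in the same cluster
        elif same_pred:
            fp += 1  # pair merged although the referents differ
        elif same_true:
            fn += 1  # pair split although the referent is the same
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and system output for four documents.
truth = ["singer", "singer", "player", "player"]
predicted = [0, 0, 0, 1]
score = pairwise_f_measure(predicted, truth)  # precision 1/3, recall 1/2
```

Only the grouping matters here, not the label names, so the system's cluster ids do not need to match the ground-truth referent names.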