March2006 english final (Javier). Added: October 26, 2007.
Presentation Transcript
Slide1: Bilingual Russian-English Thesaurus and Domain Ontologies. Thesaurus-Based Technologies and Value-Added Services at University Information System RUSSIA
Slide2: University Information System RUSSIA (Russian inter-University Social Sciences Information and Analytical consortium), www.cir.ru. Prepared for the Seminar at the Finnish Social Science Data Archive, Helsinki, March 9-10, 2006, by Tatyana N. Yudina, Leading researcher, Ph.D. (history), Moscow State University Research Computing Center, and Anna Bogomolova, Assistant professor, Ph.D. (economics), Moscow State University Economic faculty. yudina@mail.cir.ru; bogo@mail.cir.ru. Moscow State University Research Computing Center; NCO Center for Information Research
Slide6: University Information System RUSSIA Collections: 2,000,000 / 20 Gb (www.cir.ru)
UIS RUSSIA: Collections of documents in English: OECD Health Data; RePEc (Research Papers in Economics, www.repec.org) abstracts and full texts; Council of Europe documents; European Court of Human Rights archive; publications of the Kennan Institute, USA.
Slide8: NLP technology in UIS RUSSIA. Automatic Linguistic Text Processing / Linguistic Processors; *.POD, *.OUT, *.LEM, *.HDR; ORACLE; WEB www.cir.ru (Apache; OAS); Administrator.
holdings converters *.HTM
Slide9: Automatic Linguistic Text Processing (ALTP) is the UIS RUSSIA team's know-how. ALTP is designed to process by content and to integrate all main types of business prose text corpora (documents and statistics): government publications, daily records of the parliament chambers, think tank reports, scientific journals, mass media, public opinion polls. Content-based processing includes conceptual indexing, coherent summarization and text categorisation.
Thesaurus
Slide11: Sociopolitical Thesaurus: 29,000 concepts, 75,000 terms, 110,000 conceptual relations. Constructed specially as a tool for automatic text processing; contains terms from the economic, financial, political, military, social, legislative and cultural domains; regularly tested during automatic text processing. The set of relations is adjusted to serve content-based search, navigation and query refinement.
General Structure of Thesaurus
English-Russian Sociopolitical Thesaurus: Hierarchical conceptual net of 65 thousand English terms. Manual work: use of general and special-domain English-Russian and Russian-English dictionaries; study of conventional American and British dictionaries and thesauri; cross-checking of translations; Internet search checking.
Thesaurus terminology in social and political domain
Adding languages to Thesaurus: It is a challenge to develop a multilingual Sociopolitical thesaurus, to describe terms of the social and political domains from different languages and arrange them in a multilingual hierarchical net. A project under discussion is to add the Tatar language to the bilingual thesaurus.
The Tatars are the second-largest ethnic group in Russia.
Term Extraction for Russian Official Documents (RF Government Regulation N604 26.06.1995)
Slide17: Thematic Lines of Thesaurus Terms (RF Government Regulation N604 26.06.1995)
Slide18: Network of Thematic Nodes (RF Government Regulation N604 26.06.1995)
Slide19: Network of Thematic Nodes in English (RF Government Regulation N604 26.06.1995)
Slide20: Structure of Thematic Representation: Main Thematic Nodes; Specific Thematic Nodes
Structural Thematic Summary (RF Government Regulation N604 26.06.1995)
THESAURUS for Information Retrieval in Sociopolitical Domain: The Thesaurus provides for query refinement, reformulation and expansion. Its terminology covers 95-98% of business prose: terms of Russian government publications, academic papers and mass media texts from 1991. The Thesaurus is a main element of the ALTP (automatic linguistic text processing) technology at UIS RUSSIA.
Query Refinement
Navigation in Thesaurus
Bilingual Information Retrieval
Document content representation in two languages (scheme): Document in Russian; Document in English; Content representation in English; Content representation in Russian; Content representation of a document
Document content representation in two languages (example): Document in Russian; Document in English; Content representation in English; Content representation in Russian; Content representation of a document
Bilingual Search in UIS RUSSIA
Slide33: www.cir.ru/is4/
Text Categorization
Expert-made classification: 60% coincidence; high accuracy; not high relevance
Classification in automatic mode
Text Categorization Using Thematic Representation: Systems of Subject Headings: UIS RUSSIA system of subject headings; RF Central Election Committee Legal Subject Headings (450 items; 4 levels); 80 Top Terms of the Legislative Indexing Vocabulary (LIV), Congressional Research Service of the US Congress.
English-Russian Sociopolitical Thesaurus: new applications: Automatic text categorization of research papers in economics using JEL subject headings (700 categories); automatic text processing of statistical tables; automatic text processing of documents of European organizations (European Court of Human Rights, Council of Europe, European Union).
System of Subject Headings for Budget Data: 87 hierarchic categories. First-level categories are: Macroeconomic Indicators; Budget Revenues and Expenditures; Tax Concessions; Budget Deficit/Surplus; State and Municipal Debt; Budget Process; Budget Federalism; Extra-Budgetary Funds; State Authorities; Fiscal Misconduct
Foreign Exchange rate: 1. ((US Dollar OR Euro Currency OR Ruble) AND Foreign Exchange Rate) OR 2. ((US Dollar OR Euro Currency) AND Ruble AND Economic Development (Economic Crisis; Economic Forecasting; Economic Indicator; Economic Growth; Economic Laws; Economic Situation))
Slide44: Thank you! Tatyana N. Yudina, Leading researcher, Ph.D. (history), Moscow State University Research Computing Center, yudina@mail.cir.ru. Anna Bogomolova, Assistant professor, Ph.D. (economics), Moscow State University Economic faculty, bogo@mail.cir.ru
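Returning to the "Foreign Exchange rate" heading rule in the budget-data slides: the Boolean expression can be read as a test over the set of thesaurus concepts found in a document. The sketch below is one possible reading, not the authors' code. It assumes that the parenthesized list after "Economic Development" names concepts that also satisfy that operand, and the function name and the `concepts` input are hypothetical.

```python
# Assumption: "Economic Development (...)" means the concept itself or any
# of the listed related concepts satisfies this operand of the rule.
ECONOMIC_DEVELOPMENT = {
    "Economic Development",
    "Economic Crisis",
    "Economic Forecasting",
    "Economic Indicator",
    "Economic Growth",
    "Economic Laws",
    "Economic Situation",
}


def matches_foreign_exchange_rate(concepts: set[str]) -> bool:
    """Apply the two-clause 'Foreign Exchange rate' rule to a document's concepts."""
    has_foreign_currency = bool(concepts & {"US Dollar", "Euro Currency"})
    has_ruble = "Ruble" in concepts
    # Clause 1: any of the three currencies together with Foreign Exchange Rate.
    clause_1 = (has_foreign_currency or has_ruble) and "Foreign Exchange Rate" in concepts
    # Clause 2: a foreign currency, the ruble, and an economic-development concept.
    clause_2 = has_foreign_currency and has_ruble and bool(concepts & ECONOMIC_DEVELOPMENT)
    return clause_1 or clause_2
```

Writing the rule over concepts rather than raw words means it applies equally to Russian and English documents once both are indexed against the same thesaurus.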