Study of Natural Language Processing Issues and Case Studies- Him


Presentation Description

This is the slide deck for a paper I presented at a national conference at Corporate Institute, Bhopal.


Presentation Transcript

PowerPoint Presentation:

Study of Natural Language Processing Issues and Application Areas
Author(s): Himanshu Paliwal, Ashutosh Yadav
National Seminar on “Recent Advances in Communication and Signal Processing”, RACSP-2013


Contents:

Introduction to Natural Language Processing
Formal Language
Linguistics and language processing
Steps of natural language processing
Terms related to linguistics
Existing Morphological Analyzer
Our Approach
Studying Hindi Derivations
Derivational Rules
Algorithm for Derivational Analysis
Applications
Challenges in implementation
Case Study


Communication…??
Written
Oral


Introduction:

A natural language is a language spoken by people. The essentials of a language are its alphabet, words, sentences and, most importantly, grammar. Natural language processing is a subfield of artificial intelligence that enables computers to understand and generate natural languages.

Work of NLP system:

Fig: Natural Language Processing System (Natural Language → Natural Language Understanding → Computer → Natural Language Generation → Natural Language; stages numbered 1 and 2)

Level of Linguistics:


Steps of Natural Language Processing (1):

Morphological and lexical analysis: Morphology is the identification, description and analysis of word structure. The lexicon is the vocabulary of a language.
Syntactic analysis: Analysis of words to check the grammatical structure. Ex: “rose beautiful a is garden in” (valid words in an ungrammatical order)
Semantic analysis: Analysis of words to check their meaning. Ex: “get out” or “colourless green ideas”

Steps of Natural Language Processing (2):

Discourse integration: Analysis of dependencies between sentences. Ex: “it's poisonous”, “drink water”, “cyanide is a chemical”
Pragmatic analysis: Extracting knowledge from common sense or gestures. Ex: “You know the killer” is different from “You know the killer?”

Terms related to linguistic analysis (1):

Phones – Acoustic pattern, e.g. [ROLLER]
Phonetics – Classifies phonemes
Phonology – Tells how phonemes are grouped
Strings – English alphabet. Ex: {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z}
Lexicon – Ex: {“pig”: N, V, ADJ}
Words – Words like bear, car, house are different from words like run, sleep, think, and both are different from words like in, under, about.

Terms related to linguistic analysis (2):

Determiner – Indicates the kind of reference. Ex: the boy, a bus, our car, these children, both hospitals
Morphology – Analysis of words into morphemes and of morphemes into words
Morphemes – Smallest meaningful units of a word. Ex: “un + break + able”
Syntax
Semantics
Pragmatics

Existing Morphological Analyzer:

The morphological analyzer developed by Vishal Goyal and Gurpreet Singh Lehal [12] stores all the commonly used word forms for all Hindi root words in its database.
The morphological analyzer developed by Niraj Aswani and Robert Gaizauskas [13] extracts a set of suffix replacement rules from a corpus and a dictionary.
Our derivational analyzer is an extension of an existing inflectional morphological analyzer developed at IIIT Hyderabad (Bharati Akshar et al., 1995).

Our Approach (1):

Studying Hindi Derivations
We first conducted a study of nouns. Ex: the word maxaxagAra (helper) is derived from the word maxaxa (help): maxaxagAra = maxaxa + gAra. Likewise, gunAhagAra (criminal) is derived from gunAha (crime). Together, these examples show that gAra is a derivational suffix.
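The observation above can be illustrated with a minimal Python sketch. It is not the authors' tool: the function names `candidate_suffix` and `count_suffixes` are hypothetical, and the only data used are the two word pairs from the slide. The idea is that a leftover suffix shared by several (derived word, base word) pairs is a candidate derivational suffix.

```python
from collections import Counter

# (derived word, base word) pairs taken from the slide, in its transliteration.
PAIRS = [
    ("maxaxagAra", "maxaxa"),  # helper <- help
    ("gunAhagAra", "gunAha"),  # criminal <- crime
]


def candidate_suffix(derived, base):
    """Return what is left of `derived` after removing `base` from its start.

    Returns None when `derived` does not start with `base` or nothing is left.
    """
    if derived.startswith(base) and len(derived) > len(base):
        return derived[len(base):]
    return None


def count_suffixes(pairs):
    """Count how often each leftover suffix occurs across the word pairs."""
    counts = Counter()
    for derived, base in pairs:
        suffix = candidate_suffix(derived, base)
        if suffix is not None:
            counts[suffix] += 1
    return counts


suffix_counts = count_suffixes(PAIRS)
print(suffix_counts.most_common())
```

With only two pairs the count is merely suggestive; a real study would need many more attested pairs before treating a suffix as productive.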

Our Approach (2):

Algorithm for Derivational Analysis


Examples:

bAgabAnoM (gardeners): bAgabAnoM is a noun, plural in number. The analysis also gives information (category, gender) about the root word bAga (garden).
kirAexAroM (tenants): The normal form of the input word kirAexAroM is kirAexAra. Rule: noun = noun/adj + xAra. The root word is kirAe because kirAexAra = kirAe + xAra.
ppppwA (invalid word): This is an invalid word because it is not present in the wiki.
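The flow of these examples can be sketched roughly in Python. This is an illustration under stated assumptions, not the algorithm from the presentation: `NORMAL_FORMS` is a hypothetical stand-in for the inflectional analyzer, `KNOWN_WORDS` is a hypothetical stand-in for the wiki lookup, and the gAra entry in `RULES` is an assumed rule written in the same shape as the xAra rule given on the slide.

```python
# Hypothetical stand-in for the inflectional analyzer: inflected form -> normal form.
NORMAL_FORMS = {"kirAexAroM": "kirAexAra"}

# Hypothetical stand-in for the wiki lookup used to decide whether a word is valid.
KNOWN_WORDS = {
    "kirAexAra", "kirAe",
    "maxaxagAra", "maxaxa",
    "gunAhagAra", "gunAha",
}

# Derivational rules: suffix -> (category of derived word, allowed root categories).
# The xAra rule is from the slide; the gAra entry is an assumption for illustration.
RULES = {
    "xAra": ("noun", ("noun", "adj")),
    "gAra": ("noun", ("noun",)),
}


def analyze(word):
    """Return a small analysis record for `word`.

    Steps: map to the normal form, reject unknown words, then try each
    suffix rule and accept it only if the remaining root is a known word.
    """
    normal = NORMAL_FORMS.get(word, word)
    if normal not in KNOWN_WORDS:
        return {"input": word, "valid": False}
    for suffix, (category, root_categories) in RULES.items():
        if normal.endswith(suffix):
            root = normal[: -len(suffix)]
            if root in KNOWN_WORDS:
                return {
                    "input": word,
                    "valid": True,
                    "normal_form": normal,
                    "root": root,
                    "suffix": suffix,
                    "category": category,
                    "root_categories": root_categories,
                }
    # Known word, but no derivational rule applies: treat it as its own root.
    return {"input": word, "valid": True, "normal_form": normal,
            "root": normal, "suffix": None}


for w in ("kirAexAroM", "maxaxagAra", "ppppwA"):
    print(analyze(w))
```

Requiring the stripped root to be a known word is one simple way to avoid splitting words that merely happen to end in a suffix string; the root category is not actually checked here, since that would need the category information the inflectional analyzer provides.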


Applications:

Human-machine interaction systems
Routing
Fusion

Challenges in implementation:

Reading text
Hearing speech
Writing text
Translation
Lack of leverage
Training of researchers

Case Study:

Design of a multi-language translator
Issues: Parsing (lexicon and grammar)
Fig: Translation Support System
Fig: Dictionary – Data Sets (Corpus)


References:

[1] e/838/816 (as visited on 7/10/2009).
[2] as visited on 8/10/2009
[3] px as visited on 8/10/2009
[4] iiithyd.pdf as visited on 8/10/2009
[5] (as visited on 9/10/2009)
[6] K. Ryan, “The role of natural language in requirements engineering”, IEEE International Symposium on Requirements Engineering, IEEE Computer Society Press, 1992, pp. 240–242 (as visited on 9/10/2009).
[7] Alon Itai, Erel Segal. 2003. A Corpus Based Morphological Analyzer for Unvocalized Modern Hebrew. Department of Computer Science, Technion Israel Institute of Technology, Haifa, Israel.
[8] Anandan P., Ranjani Parthasarathy, Geetha T.V. 2002. Morphological Analyzer for Tamil. ICON 2002, RCILTS-Tamil, Anna University, India.
[9] Taku Kudo. 2005. CRF++: Yet Another CRF Toolkit.
[10] K. Rajan. Corpus analysis and tagging for Tamil. In: Proceedings of the Symposium on Translation Support System STRANS-2002.
[11] Haas S. Tools for Natural language processing. 2011. (accessed 1 Jun 2011).
[12] Vishal Goyal, Gurpreet Singh Lehal. 2008. Hindi Morphological Analyzer and Generator, pp. 1156–1159. IEEE Computer Society Press, California, USA.
[13] Niraj Aswani, Robert Gaizauskas. 2010. Developing Morphological Analysers for South Asian Languages: Experimenting with the Hindi and Gujarati Languages. In Proceedings of LREC.

Thank You