Internationalisation R L issues

Presentation Transcript

Slide1: 

Internationalization: Regional and Linguistic Perspectives
Summit on Internationalization
Dr. B. Mallikarjun
Central Institute of Indian Languages, Mysore – 570006, India
mallikarjun@ciil.stpmy.soft.net

Outline of Presentation: 

Indian multilingualism
Language technology concerns
Use of standards: CIIL experience

Slide3: 

Indian Multilingualism

Slide5: 

Multilingual India
The Census had 10,400 raw returns.
These were rationalized into 1576 mother tongues.
They were further rationalized into 216 mother tongues.
These were grouped under 114 languages.
Mother tongues with fewer than 10,000 speakers, or which could not be identified on the basis of the available linguistic information, have gone under 'others'.

Slide6: 

Linguistically India is made of many mini-Indias

Sharing of languages Indian multilingualism is unique: 

Massive numbers of people are involved in the use of multilingualism.
Bilingualism is often taken as a given fact.
It is acquired in these contexts from early childhood; there is no need to go to school to learn to use two or more languages.
Sponsored bilingualism: the English-Hindi institutional arrangement.
Bilingualism is also used as a denominator of the movement of various populations from one region or province to another.
Naturally evolved bilingualism, coupled with bilingualism evolving through schooling, has become a big language resource, and it is exploited mainly by the mass media for enhancing its reach across the population.
Bilingualism figures are often used to make political claims and seek privileges in administration, education, mass communication, and other departments of public life in general.
Multilingualism on the increase: 1961 - 9.70%; 1971 - 13.04%; 1981 - 13.34%; 1991 - 19.44%.
Major languages: 18.72% bilinguals, 7.22% trilinguals.
Minor languages: 38.14% bilinguals, 28% trilinguals.
English: 8% as second language, 3.15% as third language.
Hindi: 6.15% as second language, 2.16% as third language.

Multilingualism: 

Multilingualism

Sharing of Linguistic Features : 

India as a Linguistic Area (Emeneau 1956).
Common linguistic features across language families.
Coexistence and continuing interaction of the people who speak these languages on a day-to-day basis.
People who live in villages and towns that lie on the political boundaries of two or more linguistically re-organized states may continue to use the same grammar of their own language with different vocabularies drawn from another language of the border to communicate among themselves and with the groups across the border.
“Whatever may be the difference in the languages, they all belong to the same great family; similar laws regulate the idiom, construction, style, and various kinds of composition, which prevail in the dialects of the north and the south; when you describe one art of India, you have, in many respects, described the whole; the manners, the customs, and the habits of the people, with trifling variations, correspond from Cape Comorin to the Himalayas; and their superstition, in all its great lineaments, is exactly the same.” (Campbell 1839)

Sharing Scripts Scripts have no Language Borders : 

Using the same script to write different languages:
Devanagari script - Bodo, Dogri, Hindi, Maithili, Marathi, Nepali, Konkani, Rajasthani, Sanskrit, etc., and many tribal and minor languages.
Kannada script - Kannada, Kodava, Tulu, Banjari, Konkani, Sanskrit, etc.
Using different scripts to write the same language:
Sanskrit - written using the Devanagari, Kannada, Telugu, Tamil, Malayalam and many other scripts.
Kashmiri - written using the Perso-Arabic, Sharada and Devanagari scripts.
Sindhi - written using the Perso-Arabic and Devanagari scripts.
Rabha - Assamese script in Assam, Roman script in Meghalaya, Bengali script in West Bengal.
Santali - Bengali script in West Bengal, Oriya script in Orissa; Ol Chiki, evolved by the community.
Bodo - Assamese script in Assam, Bengali script in West Bengal; the community is in conflict over Devanagari or Roman.
The normal convention is one language, one script.
The pluralistic tradition of India depended upon oral transmission, so many languages do not have a script of their own.
Indian languages use more than 14 scripts.
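
The many-to-many relation between scripts and languages described on this slide can be sketched as a small lookup structure. This is only an illustration: the table holds the subset of pairings named on the slide, and the names SCRIPT_TO_LANGUAGES, scripts_for and languages_for are hypothetical, not part of any CIIL tool.

```python
from collections import defaultdict

# Illustrative subset of the pairings listed on the slide; not a complete inventory.
SCRIPT_TO_LANGUAGES = {
    "Devanagari": ["Bodo", "Dogri", "Hindi", "Maithili", "Marathi", "Nepali",
                   "Konkani", "Rajasthani", "Sanskrit", "Kashmiri", "Sindhi"],
    "Kannada": ["Kannada", "Kodava", "Tulu", "Banjari", "Konkani", "Sanskrit"],
    "Perso-Arabic": ["Kashmiri", "Sindhi"],
    "Sharada": ["Kashmiri"],
}


def build_language_index(script_to_languages):
    """Invert the script -> languages table into language -> set of scripts."""
    index = defaultdict(set)
    for script, languages in script_to_languages.items():
        for language in languages:
            index[language].add(script)
    return dict(index)


LANGUAGE_TO_SCRIPTS = build_language_index(SCRIPT_TO_LANGUAGES)


def scripts_for(language):
    """Scripts in which a language is written (sorted; empty list if unknown)."""
    return sorted(LANGUAGE_TO_SCRIPTS.get(language, set()))


def languages_for(script):
    """Languages written in a script (empty list if unknown)."""
    return list(SCRIPT_TO_LANGUAGES.get(script, []))
```

A system built on such a table can answer both questions the slide raises: which languages must be considered when a script is encoded, and which scripts must be supported when a language is localized.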

Language Clustering: 

First kind of clustering: Scheduled/Modern/Major Indian languages and others.
Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sanskrit, Tamil, Telugu and Urdu; Sindhi; Konkani, Manipuri and Nepali; Bodo, Santhali, Maithili and Dogri.
Claims of 33 more languages for inclusion are under consideration.
This list is open-ended and has become a tool to bargain and gain benefits for the languages.
It is an important language policy statement.
Second kind of clustering: at the level of mother tongues into languages.
There is no rationale for clustering the Indian languages into these categories.
Scheduled languages are not normally treated on par with Non-Scheduled languages: they receive preferential treatment and are considered first for almost every language development activity, including facilities to absorb the language technology initiatives of the government.
Technology Development in Indian Languages (TDIL) did not, and under present circumstances would not, percolate beyond these languages.

Governance: 

Natural communication policies to suit their realities, with a genuine understanding of inter-woven relations.
Official language: the language(s) used in administration.
Official languages - 16 in different states and union territories: Assamese, Bengali, English, Gujarati, Hindi, Kannada, Konkani, Malayalam, Nepali, Manipuri, Marathi, Oriya, Punjabi, Tamil, Telugu and Urdu.
However, English remains the language of the judiciary at the higher level.

Education: 

Policy suggests three languages in the schools: (i) Home Language/Regional Language, (ii) English, and (iii) Hindi in non-Hindi speaking states and any other Modern Indian Language in Hindi speaking states.
As one goes up the ladder of education, the number of languages available for study and as medium of instruction decreases.
Though many languages are media of instruction at the lower level, only English is the medium of technical and management education.
School languages - 41
Media of instruction - 19

Number of school languages and medium of instruction: 

Number of school languages and medium of instruction

Mass Communication: 

Print media: the print media in India began in 1780.
People's choice of languages.
No bar on starting newspapers in any language or dialect.
No bar on any language being written in any script.
Newspapers and magazines: 123 languages.
Hindi - 2507; Urdu - 534; English - 407; Marathi - 395; Tamil - 395; Kannada - 364; Malayalam - 225; Telugu - 180; Gujarati - 159; Punjabi - 107; Bengali - 103

Language Technology Concerns: 

Language Technology Concerns

Slide18: 

Reach of Media (Source: NRS 2005)

Karnataka : An Example: 

Karnataka : An Example

GDP by Language 1975-2002: 

GDP by Language 1975-2002

GDP by Language 2003-2010 : 

GDP by Language 2003-2010

Context of discussion : 

Economic resources (per capita) of people to cross the digital divide.
GDP by language, 2002: Bengali 0.44%; Gujarati 0.26%; Hindi 2.14%; Marathi 0.4%; Tamil 0.34%; Telugu 0.42%; Urdu 0.28%; English 29.94% (Source: World Bank, CIA Factbook).
Many related technologies are converging and technology is updated at speed, so short-term and long-term strategies to cope with this must be thought out right from the beginning.

DIGITAL DIVIDE: 

Computer penetration: 7.5 per 1000 people.
Internet subscription: 0.4 percent of the population.
Internet reach: about 1 percent of the total population.
What people want in the digital world is not available in their languages.
The government and the people are fast moving towards introducing English at the earliest level in education.
Persistent and intense maintenance of the digital divide may result in more retrograde and disastrous steps than all other divides put together.

Empowerment of Languages: 

IT as a tool for empowering Indian languages and their speakers.
Access to information and communication technology in their own language is one of the ways to empower the people and enhance the vitality of a language.
Language vitality, the capacity of a language to live, grow, and develop, depends upon various factors. Some of these are social status, demography, access to technology and institutional support.

Localization : 

Localization is only a small part of this process of empowerment.
Localization in the case of Indian languages is not single-layered: since one script is used to write many languages, it is multilayered. For example, localization is done once for Kannada and then again for languages written in the Kannada script, like Tulu and Kodagu.
It covers language localization, localization of culture-specific aspects, and standardization.
Cultural localization will give a fillip to the revival of some of the traditions relating to date, day, time and some other important formats.
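
One way to picture the multilayered localization described here is a lookup that falls back from a language layer to a shared script layer and then to general defaults. The sketch below is only an illustration under stated assumptions: the setting names and values are invented placeholders, the function resolve is hypothetical, and the codes kn, tcy, kfa (languages) and Knda (script) are the standard ISO identifiers for Kannada, Tulu, Kodava and the Kannada script.

```python
# Layer 3: general defaults, used when nothing more specific is defined.
DEFAULTS = {"digits": "international", "date_format": "default-date-format"}

# Layer 2: settings shared by every language written in a given script.
# Values are placeholders for illustration only.
SCRIPT_LAYER = {
    "Knda": {"digits": "local", "collation": "kannada-script-order"},
}

# Layer 1: settings specific to one language, plus the script it is written in.
LANGUAGE_LAYER = {
    "kn": {"script": "Knda", "date_format": "kannada-date-format"},
    "tcy": {"script": "Knda"},  # Tulu: nothing language-specific defined yet
    "kfa": {"script": "Knda"},  # Kodava
}


def resolve(language, key):
    """Look a setting up in the language layer, then the script layer, then defaults."""
    language_settings = LANGUAGE_LAYER.get(language, {})
    if key in language_settings:
        return language_settings[key]
    script_settings = SCRIPT_LAYER.get(language_settings.get("script"), {})
    if key in script_settings:
        return script_settings[key]
    return DEFAULTS[key]
```

With this arrangement, work done once at the script layer is inherited by Tulu and Kodava, while anything that differs for a particular language can still be overridden at the language layer.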

Slide26: 

Building consensus for standards is not an easy task.
Standards should capture and reflect the essence of languages.
Adopt an already standardized one, or standardize.
Standards are for emulation and should be guiding principles, not restrictions that halt the growth of languages.
There has to be a place for variation, and it has to be honored.
Systems should follow standards and at the same time allow the user to choose variations.
Just as we are concerned with the maintenance of biodiversity, we are to be concerned with the maintenance of linguistic diversity.
Internationalization has to take place keeping the end-user in mind rather than technology experts.

Slide27: 

The notion of Unicode has evolved from the context of one script for one language. But our context is different.
All languages using a script have to be considered together for encoding.
All languages using different scripts should have automatic transliteration.
Retain the aesthetics of the scripts used for Indian languages.
All languages are equal and deserve equal treatment, including in the absorption of technology.
Terminology (hardware/software): transliterate, translate, or coin afresh.
Sorting order: the place of anuswaara, consonant clusters.
Numerals: local or international.
Contemporary use of the language.
Contemporary conventions of the language.
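
The points on automatic transliteration and on local versus international numerals can be illustrated with a minimal sketch. It relies on the fact that the Unicode blocks for the major Indic scripts follow a broadly parallel layout, so a Devanagari character (block starting at U+0900) can often be mapped to its Kannada counterpart (block starting at U+0C80) by a fixed offset. The function names are hypothetical, and the mapping is only approximate: some code points have no counterpart or a non-matching one, and real transliteration needs language-specific rules on top of this.

```python
import unicodedata

DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
KANNADA_START = 0x0C80


def devanagari_to_kannada(text):
    """Approximate transliteration by shifting code points between parallel blocks."""
    offset = KANNADA_START - DEVANAGARI_START
    result = []
    for ch in text:
        code_point = ord(ch)
        if DEVANAGARI_START <= code_point <= DEVANAGARI_END:
            candidate = chr(code_point + offset)
            # Keep the original character when the target position is unassigned.
            result.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            result.append(ch)
    return "".join(result)


def to_international_digits(text):
    """Replace any script's decimal digits with the digits 0-9; leave other text alone."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch for ch in text
    )
```

Keeping the numeral choice in a separate function reflects the slide's view that local or international numerals should be a user-selectable variation rather than something fixed by the encoding.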

Slide28: 

Use of Standards : CIIL Experience

Corpora in Indian Languages : 

Corpora in Indian Languages

Parallel Corpora - I: 

Parallel Corpora - I

Parallel Corpora - II: 

Parallel Corpora - II

Slide32: 

Search/sorting on the Web
Standard markup language for texts
Texts of Indian languages on the Web: Bengali, Bodo, Dogri, Hindi, Manipuri, Maithili, Nepali
Automatic transliteration tools for Indian languages
Language processing tools on the Web

Slide33: 

Thank You