Added: January 15, 2008 (Category: Education)
Presentation Transcript

Discovering and testing linguistic generalizations using interactive concordances
Larry Hayashi
SIL Language Software Development
larry_hayashi@sil.org
7500 W Camp Wisdom Road
Dallas, TX 75236
972-708-7400
www.sil.org


Empirical Linguistics
“If it happens once, you don't know anything. If it happens twice, it suggests further investigation. If it happens three or more times, then you have something to write about!”


concordance: n.
1. an alphabetical index of all the words in a text or corpus of texts, showing every contextual occurrence of a word.
2. an index of any recurring object or analysis in a text corpus, showing every contextual occurrence of that object, e.g. a part of speech, phone, lemma, gloss, morph, or syntactic phrase.
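Both senses can be illustrated with one small function. This is a minimal sketch, assuming the corpus is already a list of tokens; the name `concordance` and its `key` and `width` parameters are hypothetical. With the default key it builds the alphabetical word index of sense 1; passing a key that extracts some other object (a part of speech, a gloss, a lemma) gives the generalized index of sense 2.

```python
from collections import defaultdict

def concordance(tokens, key=str.lower, width=2):
    """Index every token under key(token), recording each occurrence with
    up to `width` tokens of context on either side.

    With the default key this is sense 1 (an alphabetical word index);
    a different key (part of speech, gloss, lemma, ...) gives sense 2.
    """
    index = defaultdict(list)
    for i, tok in enumerate(tokens):
        left = tokens[max(0, i - width):i]
        right = tokens[i + 1:i + 1 + width]
        index[key(tok)].append((i, left, tok, right))
    # Sort the headings so the index reads alphabetically.
    return dict(sorted(index.items()))

if __name__ == "__main__":
    words = "the child cried and the child laughed".split()
    for i, left, tok, right in concordance(words)["child"]:
        print(i, " ".join(left), f"[{tok}]", " ".join(right))
```

For sense 2, tokens can be tuples such as `(form, part_of_speech)` with `key=lambda t: t[1]`.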


Traditional concordances
Printed as a separate volume
Require looking up each reference in the corpus


Traditional computational concordance tools
The user defines a query.
The software searches through the corpus and generates a separate, static text file.
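The batch workflow can be sketched as follows. This is a hypothetical illustration, not any particular tool: the corpus is assumed to be a dictionary from a reference to a line of text, and the function names are invented. The point is that the output is a copy, so editing the corpus afterwards leaves the generated file unchanged.

```python
import re
from pathlib import Path

def static_concordance(corpus, pattern):
    """Run one query over the corpus and return the hits as plain text,
    one 'reference<TAB>line' per hit. The result is a copy of the data."""
    query = re.compile(pattern)
    return "".join(f"{ref}\t{line}\n"
                   for ref, line in corpus.items() if query.search(line))

def write_concordance_file(corpus, pattern, path):
    # The separate, static text file: later edits to the corpus never reach it.
    Path(path).write_text(static_concordance(corpus, pattern), encoding="utf-8")
```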


Using object-oriented and relational database technologies for concordances
Concordances are a view of the data instances themselves rather than copies in a separate file.


Advantages of using relational or object-oriented databases for concordances
The ability to jump to the broader context of each data instance in the concordance
Immediate update of concordances when data is edited or analyses are changed
Concordances can be interactive, allowing the user to apply the consequences of a hypothesis across a collection of data
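The first two advantages follow from the concordance holding references to the data instances rather than copies. A minimal sketch, assuming plain Python objects stand in for database records (the classes `Word` and `Sentence` and the function `concordance` are hypothetical): an edit made through the concordance is immediately visible in the data, and each hit can follow its back-reference to the broader context.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Word:
    form: str
    gloss: str
    # Back-reference to the containing sentence (excluded from repr to avoid cycles).
    sentence: Optional["Sentence"] = field(default=None, repr=False)

@dataclass(eq=False)
class Sentence:
    ref: str
    translation: str
    words: List[Word] = field(default_factory=list)

    def add(self, form: str, gloss: str) -> Word:
        word = Word(form, gloss, self)
        self.words.append(word)
        return word

def concordance(sentences, predicate):
    """A concordance as a view: the matching Word objects themselves, not copies."""
    return [w for s in sentences for w in s.words if predicate(w)]
```

Because `concordance` returns the stored objects, setting `hit.gloss` changes the corpus itself, and `hit.sentence` is the jump to context.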


Examples from SIL software
Morphological analysis and text interlinearization in LinguaLinks
Phonetic analysis in Speech Analysis Tools


Textbook morphology problem
1   mailha reta        'Maila laughed.'
2   mailha rapa        'Maila cried.'
3   mija rapa          'The child cried.'
4   mija retle         'The child will laugh.'
5   arlam birhile      'The girl will be afraid.'
6   guma hoya          'Auntie shook.'
7   sila bhisa         'The jackal escaped.'
8   renjha retle       'The boy will laugh.'
9   sila birhia        'The jackal was afraid.'
10  mija lhomle        'The child will grow up.'
11  mija imang rahle   'The child will come home.'
… more data


Hypothesis positing and testing
Posit a hypothesis: There is a separate morpheme for the concept of future.
Test hypothesis: Find all occurrences of “will” in the free translations of the corpus.


LinguaLinks Morphology Explorer: concordance on the concept of FUTURE (search string: will)


Discovering further generalizations
Posit a hypothesis: There is a separate morpheme for the concept of future.
Test hypothesis: Find all occurrences of “will” in the free translations of the corpus.
Result: There is a morpheme with a sense of ‘future’ …


Testing a new hypothesis
Posit a hypothesis: -le is a separate morpheme for the concept of future.
Test hypothesis: Find all occurrences of “le” in the wordforms of the corpus.


From the LinguaLinks Morphology Explorer


From the LinguaLinks Morphology Explorer (search string: le)


Verifying that -le is the morpheme for FUTURE
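The two concordances behind this check can be sketched as queries over the eleven sentences shown in the textbook problem (the elided “more data” is not included). This is a minimal sketch, not the LinguaLinks implementation; the function name `hits` is hypothetical, and it searches for word-final “le” (an assumption, since -le is posited as a suffix; a plain substring search gives the same result on this sample). The hypothesis survives only if the two sets of sentence numbers coincide.

```python
import re

# Sentences 1-11 of the textbook problem; the elided "more data" is not included.
SENTENCES = [
    ("mailha reta", "Maila laughed."),
    ("mailha rapa", "Maila cried."),
    ("mija rapa", "The child cried."),
    ("mija retle", "The child will laugh."),
    ("arlam birhile", "The girl will be afraid."),
    ("guma hoya", "Auntie shook."),
    ("sila bhisa", "The jackal escaped."),
    ("renjha retle", "The boy will laugh."),
    ("sila birhia", "The jackal was afraid."),
    ("mija lhomle", "The child will grow up."),
    ("mija imang rahle", "The child will come home."),
]
VERNACULAR, TRANSLATION = 0, 1

def hits(pattern, column):
    """Numbers (1-based, as in the problem) of sentences whose column matches."""
    rx = re.compile(pattern)
    return {n for n, row in enumerate(SENTENCES, start=1) if rx.search(row[column])}

future = hits(r"\bwill\b", TRANSLATION)   # concordance on FUTURE ("will")
le_forms = hits(r"le\b", VERNACULAR)      # concordance on word-final "le"

# Both directions must hold: every 'will' sentence contains a -le word,
# and every -le word occurs in a 'will' sentence.
print(sorted(future), future == le_forms)  # [4, 5, 8, 10, 11] True
```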


Interactive concordances
Posit a hypothesis: -le is a separate morpheme for the concept of future.
Test hypothesis: Find all occurrences of “le” in the wordforms of the corpus. Verified!
Apply the hypothesis using an interactive concordance.
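Applying a verified hypothesis from the concordance means acting on every hit at once rather than editing each wordform by hand. A minimal sketch under stated assumptions: `Wordform`, `apply_suffix`, and the dictionary lexicon are hypothetical stand-ins, the suffix is split off by simple string matching, and stems are left unglossed (“?”) for the analyst to complete.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Wordform:
    form: str
    # Parsed analysis as (morph, gloss) pairs; empty until parsed.
    morphs: List[Tuple[str, str]] = field(default_factory=list)

def apply_suffix(wordforms: List[Wordform], suffix: str, gloss: str,
                 lexicon: Dict[str, str]) -> List[Wordform]:
    """Apply a verified hypothesis across every hit in one step: add the suffix
    to the lexicon, then split it off each unparsed wordform that ends in it."""
    if not suffix:
        raise ValueError("suffix must be non-empty")
    lexicon["-" + suffix] = gloss
    hits = [w for w in wordforms if w.form.endswith(suffix) and not w.morphs]
    for w in hits:
        stem = w.form[:-len(suffix)]
        # The stem is left unglossed ("?") for the analyst to fill in later.
        w.morphs = [(stem + "-", "?"), ("-" + suffix, gloss)]
    return hits
```

Running it a second time returns no hits, since already-parsed wordforms are skipped.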


Adding relevant lexical entries from the concordance (prototype)


Parsing wordforms from the concordance


Interlinearizing text examples from the concordance


Object-oriented modeling
The data model reflects “real” linguistic objects and the relationships between those objects.
Because the relationships between objects are stored, concordances are available for FREE!
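Why concordances come “for free” can be sketched with a two-way relationship. Assuming each word's category is an object that keeps back-references to its occurrences (the classes `Category` and `Word` and the `position` locator are hypothetical), setting a word's category updates both ends, so the concordance on a category is simply its list of occurrences, with no corpus search needed.

```python
class Category:
    """A word category (e.g. 'linker') that knows all of its occurrences."""
    def __init__(self, name):
        self.name = name
        self.occurrences = []  # back-references, maintained by Word

class Word:
    def __init__(self, form, position):
        self.form = form
        self.position = position  # hypothetical locator, e.g. (text, sentence, index)
        self._category = None

    @property
    def category(self):
        return self._category

    @category.setter
    def category(self, new):
        # Keep both directions of the relationship consistent, so the
        # category's concordance never needs to be rebuilt by searching.
        if self._category is not None:
            self._category.occurrences.remove(self)
        self._category = new
        if new is not None:
            new.occurrences.append(self)
```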


CELLAR
Computing Environment for Linguistic, Literary and Anthropological Research


Text object model


Text and Word analysis object model


Questions about Tuwali linkers
What are the syntactic characteristics of linkers?
Are there any other words that I have identified as linkers?


Following back-references from a word category to its occurrences in text (example shown: di)


Tuwali “linkers” concordance


Lexicon to Wordform to Text object model
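A lexical entry's concordance of corpus examples can be sketched as a two-step traversal: from the entry to the wordforms analysed with it, then from each wordform to its occurrences in texts. This is a simplified, hypothetical model (the classes and `corpus_examples` are invented for illustration, not the LinguaLinks or CELLAR schema); each object registers its back-reference when it is created, so the traversal never scans the corpus.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(eq=False)
class LexEntry:
    form: str
    gloss: str
    wordforms: List["Wordform"] = field(default_factory=list, repr=False)

@dataclass(eq=False)
class Wordform:
    form: str
    analysis: List[LexEntry]  # the lexical entries this wordform is parsed into
    occurrences: List["Occurrence"] = field(default_factory=list, repr=False)

    def __post_init__(self):
        for entry in self.analysis:
            entry.wordforms.append(self)  # lexicon -> wordform back-reference

@dataclass(eq=False)
class Occurrence:
    text: str
    sentence: int
    wordform: Wordform

    def __post_init__(self):
        self.wordform.occurrences.append(self)  # wordform -> text back-reference

def corpus_examples(entry: LexEntry) -> List[Occurrence]:
    """Concordance for a lexical entry: follow entry -> wordforms -> occurrences."""
    return [occ for wf in entry.wordforms for occ in wf.occurrences]
```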


LinguaLinks lexical entry with a concordance of corpus examples


Double-clicking an Att.Example jumps to its broader context


SIL Speech Analysis Tools


Speech Manager database view


Consonant chart: list of consonant phones


Concordance on phone [s]


Launch into the sound file


Look at more context in the sound file


Change a transcription


View the updated data in the concordance
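The speech workflow on these slides can be sketched with time-aligned transcription segments. A minimal sketch, assuming each phone is stored with start and end times in seconds (the `Segment` class and both functions are hypothetical, not the Speech Manager API, and no audio is actually played): the concordance on a phone is recomputed from the segments, so a changed transcription appears at once, and “more context” widens the playback span while keeping it inside the recording.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Segment:
    phone: str    # phonetic transcription of this stretch of audio
    start: float  # seconds from the start of the sound file
    end: float

def phone_concordance(segments: List[Segment], phone: str) -> List[int]:
    """Indices of every segment transcribed as `phone`. Because it is computed
    from the segments each time, a changed transcription shows up immediately."""
    return [i for i, seg in enumerate(segments) if seg.phone == phone]

def context_window(segments: List[Segment], i: int, margin: float,
                   duration: float) -> Tuple[float, float]:
    """Time span to play for hit i, widened by `margin` seconds on each side
    and clamped to the length of the recording."""
    seg = segments[i]
    return max(0.0, seg.start - margin), min(duration, seg.end + margin)
```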


Development process for FieldWorks
1. Model the linguistic objects.
2. Create an XML representation for each object class.
3. Run the XML files through a code generator to generate the database.
4. Build applications on top of the database.
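Steps 2 and 3 can be illustrated with a toy code generator. This is a hypothetical sketch only: the XML format (`class`, `attr`, `ref` elements), the type mapping, and the use of SQLite are assumptions for illustration, not the actual FieldWorks model files or generator. Each class definition becomes a table; basic attributes become columns and relationships become foreign keys.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML format for one object class; not the actual FieldWorks schema.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "boolean": "INTEGER"}

def generate_table(class_xml: str) -> str:
    """Turn one <class> definition into a CREATE TABLE statement."""
    cls = ET.fromstring(class_xml)
    cols = ["  id INTEGER PRIMARY KEY"]
    for attr in cls.findall("attr"):  # basic attributes become columns
        cols.append(f"  {attr.get('name')} {SQL_TYPES[attr.get('type')]}")
    for ref in cls.findall("ref"):    # relationships become foreign keys
        cols.append(f"  {ref.get('name')}_id INTEGER REFERENCES {ref.get('class')}(id)")
    return f"CREATE TABLE {cls.get('name')} (\n" + ",\n".join(cols) + "\n)"

def build_database(class_defs, path=":memory:"):
    """Generate and create one table per modeled class."""
    conn = sqlite3.connect(path)
    for class_xml in class_defs:
        conn.execute(generate_table(class_xml))
    return conn

TEXT = '<class name="Text"><attr name="title" type="string"/></class>'
PARAGRAPH = ('<class name="Paragraph"><attr name="contents" type="string"/>'
             '<ref name="text" class="Text"/></class>')
```

Identifiers are inserted unquoted, which is acceptable only for trusted model files.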


Interactive concordances in the linguistics classroom
Students learn good empirical methodology.
Students test their hypotheses empirically against the corpus data rather than against their intuition.
Using the data model in a field methods class reinforces the linguistic concepts students are learning.


Future SIL Software
FieldWorks: implements CELLAR 2; much faster, with an easier-to-use interface
Stealth-to-wealth analysis tools
Speech Manager 2




Web resources
SIL Computing: http://www.sil.org/computing/
SIL Speech Tools: http://www.sil.org/computing/speechtools/
LinguaLinks: http://www.sil.org/lingualinks/


Further resources
Object-oriented modeling tutorial
FieldWorks object models
LinguaLinks demonstration movies (QuickTime format)
Speech Tools (Speech Analyzer and Speech Manager software)