lrec metadata

Customizing the IMDI metadata schema for endangered languages: 

Heidi Johnson (AILLA)
Arienne Dwyer (DOBES)

Introduction: 

IMDI: the ISLE (International Standards for Language Engineering) Metadata Initiative
DOBES: the Volkswagen Foundation's Documentation of Endangered Languages initiative
AILLA: the Archive of the Indigenous Languages of Latin America

Types of resources: 

Audio and video recordings in various digital formats
Annotation text files, e.g. transcriptions and translations
Standalone texts, e.g. dictionaries, poetry
Wide range of genres: from verbal art to scholarly analyses

Bundles of resources: 

Session (IMDI, 2001): the resources resulting from a linguistic elicitation session, i.e. recordings and annotations.
The Session models only one kind of resource production: a recording session.
Collections will include a greater variety of resources, in sets of related materials.

Types of bundles: 

Canonical bundle: the original session. A digitized recording, in different formats, and some textual annotation files, also in different formats.
Minimal bundle: a single file. Examples: dictionary, poem, recording of uninterpretable chants.
Meta-bundle: a bundle containing other bundles. Example: a book about a set of annotated recordings.
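
The three bundle types can be told apart by what a bundle contains. The following is a minimal sketch in Python, not part of the IMDI schema: the `Bundle` class, its field names, and the file names are all hypothetical, and the classification rule is an assumption drawn only from the definitions above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bundle:
    """Hypothetical bundle model; field names are illustrative, not IMDI element names."""
    name: str
    files: List[str] = field(default_factory=list)           # resource files held directly
    children: List["Bundle"] = field(default_factory=list)   # nested bundles, if any

    def kind(self) -> str:
        # A bundle that contains other bundles is treated as a meta-bundle.
        if self.children:
            return "meta"
        # Exactly one file and no nested bundles: a minimal bundle.
        if len(self.files) == 1:
            return "minimal"
        # Anything else is treated as the canonical, session-style bundle.
        return "canonical"


# Illustrative data only.
session = Bundle("session-01", files=["story.wav", "story.mp3", "story.txt"])
poem = Bundle("poem-01", files=["poem.pdf"])
book = Bundle("book-01", files=["book.pdf"], children=[session])
```

Here `session.kind()` is "canonical", `poem.kind()` is "minimal", and `book.kind()` is "meta". A real archive would likely record the bundle type explicitly rather than infer it.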

Bundle elements: 

Current:
  Name of bundle
  Date and place of production
Proposed:
  Resource relations
  Date archived
  Last modified

Major subschemas: 

Project
Collector
Content
Participants
Resources
References

The Content Subschema: 

Genre is the top-level category:
  Interaction: conversation, interview …
  Explanation: description, recipe …
  Performance: narrative, poem, oratory …
  Teaching: primer, textbook …
  Analysis: grammar, dictionary …
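
The genre hierarchy is a two-level lookup, which can be sketched as a simple mapping. This is an illustration only: the `GENRES` table holds just the examples named on the slide (each list is open-ended in the original), and `top_level_genre` is a hypothetical helper, not an IMDI or AILLA function.

```python
from typing import Optional

# Only the examples listed on the slide; the original lists are open-ended.
GENRES = {
    "Interaction": ["conversation", "interview"],
    "Explanation": ["description", "recipe"],
    "Performance": ["narrative", "poem", "oratory"],
    "Teaching": ["primer", "textbook"],
    "Analysis": ["grammar", "dictionary"],
}


def top_level_genre(subgenre: str) -> Optional[str]:
    """Return the top-level genre for a subgenre, or None if it is not listed."""
    wanted = subgenre.strip().lower()
    for genre, subgenres in GENRES.items():
        if wanted in subgenres:
            return genre
    return None
```

For example, `top_level_genre("Recipe")` returns "Explanation", while an unlisted subgenre returns `None`.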

Other Content categories: 

Modality: speech, writing, gesture
Communication context:
  Interactivity
  Planning
  Involvement
Languages
Task
Description
Keys

AILLA’s Content Keys: 

Register: a characterization of how the discourse reflects the social context. Example: honorific speech.
Style: poetic and stylistic effects. Examples: parallelism, metered verse.

The Project subschema: 

Current elements:
  Name: a nickname or acronym
  Title: official title
  ID: a unique identifier
  Contact information
Proposed element:
  Funder: name of funding organization

The Collector subschema: 

AILLA renames this subschema Depositor, since the depositor is the individual the archive has to keep track of (e.g. for Level 3 access permission).
When the Depositor is not also the Collector, the Collector can be listed under Participants.

The Participants subschema: 

Type: functional role, e.g. creator
Role: family relationship
Name/Full name
Language(s)
Ethnic group, age, sex
Education
Anonymous: True if participant's Full name is reserved; False otherwise
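
The Anonymous flag controls whether a participant's Full name may be shown. A minimal sketch of that rule follows. The `Participant` class and `display_name` method are hypothetical, and the sketch assumes that Name is a short public label that can stand in for a reserved Full name; the slide does not state this.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant record; field names are illustrative."""
    type: str         # functional role, e.g. "creator"
    name: str         # assumed to be a short public label
    full_name: str    # reserved when anonymous is True
    anonymous: bool = False

    def display_name(self) -> str:
        # Show the Full name only when it is not reserved.
        return self.name if self.anonymous else self.full_name


# Illustrative data only.
open_speaker = Participant("creator", "SPK01", "Example Speaker One")
reserved_speaker = Participant("creator", "SPK02", "Example Speaker Two", anonymous=True)
```

With this rule, `open_speaker.display_name()` gives the full name and `reserved_speaker.display_name()` gives only "SPK02".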

AILLA additions to Participants: 

Origin: place (country, region, etc.) of origin of the creator of the primary resource in the bundle (e.g. the speaker whose voice is recorded).
Occupation: can be relevant in assessing the accuracy of some kinds of data.

The Resources subschema: 

Resources contains information about the formats and provenance of the files in a bundle.
Media Files: audio, video, etc.
Annotation Files: text files.
Proposal: call them all Media Files, to reduce redundancy in the database. (All have URL, size, etc. elements.)
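
The proposal to call every file a Media File amounts to one record type with a discriminating field in place of two parallel types. A minimal sketch, assuming a hypothetical `MediaFile` record with a `kind` field (neither name comes from the IMDI schema) and placeholder URLs:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MediaFile:
    """Hypothetical unified record: the shared elements are stored once for every file."""
    url: str
    size_bytes: int
    kind: str  # e.g. "audio", "video", "annotation"


def group_by_kind(files: List[MediaFile]) -> Dict[str, List[str]]:
    """Group file URLs by kind, recovering the old media/annotation split on demand."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for f in files:
        grouped[f.kind].append(f.url)
    return dict(grouped)


# Illustrative data only; the URLs are placeholders.
files = [
    MediaFile("https://example.org/story.wav", 1_000_000, "audio"),
    MediaFile("https://example.org/story.txt", 2_000, "annotation"),
    MediaFile("https://example.org/story.mp3", 100_000, "audio"),
]
by_kind = group_by_kind(files)
```

The distinction between recordings and annotations is kept as data in `kind`, so nothing is lost by merging the two record types.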

Text resources: 

Current elements:
  Type: type of annotation, e.g. phonetic transcription.
  Content encoding: annotation encoding scheme, e.g. EUROTYP.
  Character encoding: character set(s) used in a text file.

Text resources 2: 

Proposed elements:
  Transcription type
  Translation (aka Glossing) type
  Software: used to produce transcriptions, translations, and other annotations (e.g. Shoebox)
Describe the Annotator in Participants (along with Translator, etc.)

Proposed subschema: 

Place: composed of several elements:
  Continent
  Country
  Region
  Subregion (address)
Repeated at least twice, in Bundle and in Participants (Origin).
Might also be useful in the Language subschema.
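
Defining Place once and reusing it wherever a location is needed could look like the following sketch. The class and field names are hypothetical, as are the `BundleInfo` and `ParticipantInfo` holders and the sample values; only the four Place elements come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    """Hypothetical Place subschema with the four elements named on the slide."""
    continent: str
    country: str
    region: str = ""
    subregion: str = ""  # address-level detail

    def label(self) -> str:
        # Most specific element first; empty elements are skipped.
        parts = [self.subregion, self.region, self.country, self.continent]
        return ", ".join(p for p in parts if p)


@dataclass
class BundleInfo:
    name: str
    place: Place                    # place of production


@dataclass
class ParticipantInfo:
    name: str
    origin: Optional[Place] = None  # AILLA's proposed Origin


# Illustrative data only.
cusco = Place("South America", "Peru", "Cusco")
bundle = BundleInfo("session-01", place=cusco)
speaker = ParticipantInfo("SPK01", origin=cusco)
```

Both records share the same `Place` structure, so a change to the subschema only has to be made in one definition.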

Conclusion: 

The IMDI schema is a flexible tool.
Customization through Key/Value pairs allows local modifications.
Most of the proposed changes are terminological, moving from the DOBES in-house terminology to more general usage.
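
Customization through Key/Value pairs can be pictured as attaching archive-specific keys to a record without touching its core elements. The sketch below is an assumption-laden illustration, not the IMDI mechanism itself: `CORE_CONTENT_FIELDS` and `add_keys` are hypothetical names, and the example keys are AILLA's Register and Style from the Content Keys slide.

```python
from typing import Dict

# Hypothetical set of core Content elements that local keys must not shadow.
CORE_CONTENT_FIELDS = {"Genre", "Modality", "Languages", "Task", "Description"}


def add_keys(content: Dict[str, object], keys: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of a content record with local Key/Value pairs attached."""
    clash = CORE_CONTENT_FIELDS.intersection(keys)
    if clash:
        raise ValueError(f"local keys shadow core elements: {sorted(clash)}")
    merged = dict(content)       # the input record is left unchanged
    merged["Keys"] = dict(keys)  # local additions live under a single element
    return merged


# Illustrative data only.
content = {"Genre": "Performance", "Modality": "speech"}
ailla_keys = {"Register": "honorific speech", "Style": "parallelism"}
record = add_keys(content, ailla_keys)
```

Keeping local additions under one `Keys` element means an archive that does not know about Register or Style can still read the core record.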