An overview of TEI tagging or, Anyone for pizza?
Basic concepts: The TEI is a modular system, built like a Chicago pizza
Each module defines specific elements and attributes
Elements are classified structurally and semantically
TEI core modules: Infrastructure
defines all named element classes and macros
Core
the TEI header
elements “common to all kinds of text”
Structure
“book-like” structures of prose, verse, drama
Optional modules: Alternative structures
e.g. transcribed speech, dictionaries ...
Specialist applications
linking and alignment; analysis; feature structures; certainty; physical transcription; textual criticism; names and dates; language corpora; manuscript description ...
Caution! Under Construction!
There is NO SUCH THING as “the TEI dtd”: TEI Lite (http://www.tei-c.org/Lite/)
is our guess at what most people want, most of the time
realistic for existing texts, and for new document production, e.g. TEI technical documentation
At P5 the task of making your own TEI schema is much simplified
Basic structure(s): Every TEI-conformant document comprises a header followed by (at least one) text
the header contains:
mandatory file description
optional encoding, profile and revision descriptions
the header is essential for:
bibliographic control and identification
resource documentation and processing
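As a minimal sketch of the header-plus-text shape described above, the following Python snippet parses a small invented document with the standard library and checks that the header comes first and carries the mandatory file description. The root element name TEI follows P5 usage, the TEI namespace is omitted for brevity, and all sample content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document: a header followed by one text.
# The TEI namespace is left out to keep the sketch short.
TEI_DOC = """<TEI>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample text</title></titleStmt>
      <publicationStmt><p>Unpublished teaching example</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>Some prose.</p></body>
  </text>
</TEI>"""

root = ET.fromstring(TEI_DOC)

# The header must precede the text.
top_level = [child.tag for child in root]

# The file description is the one mandatory part of the header.
has_file_desc = root.find("teiHeader/fileDesc") is not None
title = root.findtext("teiHeader/fileDesc/titleStmt/title")
```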
Structure of a TEI text: A text may be unitary or composite
a unitary text contains
front matter
back matter
a body
in a composite text, the body is a group of texts (or nested groups)
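The difference between a unitary and a composite text can be sketched as follows. The sample is invented and un-namespaced; the helper name is_composite is hypothetical and simply tests whether a group stands where a body would.

```python
import xml.etree.ElementTree as ET

# Hypothetical composite text: shared front and back matter around a
# group of two unitary texts.
ANTHOLOGY = """<text>
  <front><div type="preface"><p>About this collection.</p></div></front>
  <group>
    <text><body><p>First collected text.</p></body></text>
    <text><body><p>Second collected text.</p></body></text>
  </group>
  <back><div type="index"><p>Index of first lines.</p></div></back>
</text>"""

def is_composite(text):
    """Treat a text as composite when it holds a <group> instead of a <body>."""
    return text.find("group") is not None

root = ET.fromstring(ANTHOLOGY)
parts = [child.tag for child in root]
members = root.findall("group/text")
outer_is_composite = is_composite(root)
members_are_composite = [is_composite(t) for t in members]
```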
TEI basic structure
A text usually has divisions: generic, hierarchic subdivisions
vanilla or numbered
type attribute
associated head and trailer elements from the divtop class
for example...
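One possible illustration of nested, un-numbered divisions, each with a type attribute, a head and (optionally) a trailer. The headings and prose are invented, and the helper outline is a hypothetical name for a small recursive walk over the div hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical body with two levels of generic <div> elements.
BODY = """<body>
  <div type="book" n="1">
    <head>Book the First</head>
    <div type="chapter" n="1">
      <head>Chapter One</head>
      <p>Opening prose.</p>
      <trailer>End of Chapter One</trailer>
    </div>
    <div type="chapter" n="2">
      <head>Chapter Two</head>
      <p>More prose.</p>
    </div>
  </div>
</body>"""

def outline(element, depth=0):
    """Yield (depth, type, n, head) for every division, outermost first."""
    for div in element.findall("div"):
        yield (depth, div.get("type"), div.get("n"), div.findtext("head"))
        yield from outline(div, depth + 1)

root = ET.fromstring(BODY)
toc = list(outline(root))
```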
TEI global attributes: Defined in the core module
id for unique identification (to become xml:id)
n for (non-unique) name or number
rend for rendition (appearance)
lang for language (to become xml:lang)
Defined in the linking module
corresp, synch, ana for specific association types
next, prev for aggregating fragmented elements
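The sketch below shows the global attributes at work on an invented sentence whose quotation is split in two and re-joined through next and prev. It assumes the P5 spellings xml:id and xml:lang and P5-style pointers of the form "#q2"; the helper name joined is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace.
XML = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical paragraph: one quotation interrupted by the narrator.
SAMPLE = (
    '<p xml:lang="en" n="12">'
    '<q xml:id="q1" next="#q2" rend="italic">I suppose,</q> she said, '
    '<q xml:id="q2" prev="#q1" rend="italic">that it will rain.</q>'
    '</p>'
)

root = ET.fromstring(SAMPLE)

# Index every element that carries a unique identifier.
by_id = {el.get(XML + "id"): el for el in root.iter() if el.get(XML + "id")}

def joined(start_id):
    """Follow the chain of next pointers and join the fragments' text."""
    parts = []
    el = by_id.get(start_id)
    while el is not None:
        parts.append(el.text)
        target = el.get("next")  # e.g. "#q2", or None at the end of the chain
        el = by_id.get(target.lstrip("#")) if target else None
    return " ".join(parts)

language = root.get(XML + "lang")
whole_quotation = joined("q1")
```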
Character Encoding Recommendations: non-normative
extend, using standard entity sets or transliteration
document transliteration scheme with formal Writing System Declaration
Text components (prose base): What are divisions composed of?
prose is mostly paragraphs (<p>)
verse is mostly lines (<l>), sometimes in hierarchic groups (<lg>)
drama is mostly speeches (<sp>) containing <p> or <l> and interspersed with stage directions (<stage>)
These may be mixed, and may also appear directly within undivided texts.
Verse: an example
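A hedged sketch of verse markup: lines (l) grouped into stanzas, which are in turn grouped into a poem (lg within lg). The verse text is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical poem: a line group containing two stanza-level line groups.
POEM = """<lg type="poem">
  <lg type="stanza" n="1">
    <l n="1">The first line of the first stanza,</l>
    <l n="2">and the second line that follows it.</l>
  </lg>
  <lg type="stanza" n="2">
    <l n="3">The second stanza opens here,</l>
    <l n="4">and here the poem ends.</l>
  </lg>
</lg>"""

root = ET.fromstring(POEM)

# Map each stanza number to the numbers of the lines it contains.
stanzas = {
    lg.get("n"): [line.get("n") for line in lg.findall("l")]
    for lg in root.findall("lg")
}
total_lines = len(list(root.iter("l")))
```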
Drama: an example
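A comparable sketch for drama: speeches (sp) holding a speaker label plus verse lines or prose paragraphs, interspersed with stage directions. The scene, the speakers and the who values are all invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical scene mixing verse and prose speeches.
SCENE = """<div type="scene">
  <stage type="setting">A room in the castle.</stage>
  <sp who="#king">
    <speaker>King</speaker>
    <l>Who knocks so late upon my chamber door?</l>
    <l>Admit him, if he comes in peace.</l>
  </sp>
  <stage type="entrance">Enter a Messenger.</stage>
  <sp who="#messenger">
    <speaker>Messenger</speaker>
    <p>My lord, the ships are in the harbour.</p>
  </sp>
</div>"""

root = ET.fromstring(SCENE)

# For each speech: who speaks, and what kind of chunks the speech contains.
speeches = [
    (sp.findtext("speaker"), [c.tag for c in sp if c.tag != "speaker"])
    for sp in root.findall("sp")
]
stage_directions = [st.get("type") for st in root.findall("stage")]
```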
Texts are not just words...: … but probably only people know that
an encoding may claim to capture
just visual salience,
just its assumed causes
both
encoding makes explicit one (or more) sets of interpretations
For example...: And this Indenture further witnesseth that the said Walter Shandy, merchant, in consideration of the said intended marriage...
Who does the work?: The TEI scheme allows for close reading -- and the reverse
can tag very detailed features of discourse function
can normalise or simplify (e.g. dates, numbers, names)
… or leave well alone
Core phrase level elements include...: phrases that are conventionally typographically distinct
“data-like” (names, numbers, dates, times, addresses)
editorial intervention (corrections, regularizations, additions, omissions ...)
cross references and links
for example...: Of writing lives in general, and particularly of Pamela, with a word by the bye of Colley Cibber and others.
It is a trite but true observation, that examples work more forcibly on the mind than precepts.…
Mr. Joseph Andrews, the hero of our ensuing history, was esteemed to be ...
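The original markup of this passage is not preserved here, so the following is only one plausible tagging: title for the work mentioned and name for the people. The type value "person" is an assumption. The snippet then shows that the plain text can still be recovered alongside the tagged phrases.

```python
import xml.etree.ElementTree as ET

# One possible phrase-level encoding of the passage (an assumption,
# not the markup shown on the original slide).
CHAPTER = (
    '<div type="chapter" n="1">'
    '<head>Of writing lives in general, and particularly of '
    '<title>Pamela</title>, with a word by the bye of '
    '<name type="person">Colley Cibber</name> and others.</head>'
    '<p><name type="person">Mr. Joseph Andrews</name>, the hero of our '
    'ensuing history, was esteemed to be ...</p>'
    '</div>'
)

root = ET.fromstring(CHAPTER)

titles = [t.text for t in root.iter("title")]
people = [n.text for n in root.iter("name") if n.get("type") == "person"]

# itertext() flattens the heading back to its untagged wording.
plain_head = "".join(root.find("head").itertext())
```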
Direct speech: Use the who attribute to show speakers
Speeches can be nested in other speeches
Foreign language phrases: The xml:lang attribute may be attached to any element
Use <foreign> if nothing else is available
Use ISO 639-2 code to identify language
Names and other referring strings: The <rs> (referring string) element is used for any kind of name or reference
Correction and Regularization: <corr> marks a correction
<sic> marks a (deliberate) non-correction
<reg> and <orig> for normalization (or the reverse)
use singly, or within <choice> if you want both
A table of green feelds
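Taking the phrase above as the source reading, here is a minimal sketch of how paired alternatives might be encoded and then rendered either as transcribed or as edited. It assumes the P5 choice wrapper; the surrounding words, the proposed correction and the helper name reading are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Each <choice> pairs a source reading with an editorial alternative.
LINE = (
    "<l>his nose was as sharp as a pen, and "
    "<choice><sic>a table</sic><corr>a' babbled</corr></choice>"
    " of green "
    "<choice><orig>feelds</orig><reg>fields</reg></choice></l>"
)

def reading(element, prefer):
    """Flatten to text, keeping only the preferred child of each <choice>."""
    if element.tag == "choice":
        return "".join(reading(c, prefer) for c in element if c.tag in prefer)
    parts = [element.text or ""]
    for child in element:
        parts.append(reading(child, prefer))
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(LINE)
as_transcribed = reading(root, {"sic", "orig"})
as_edited = reading(root, {"corr", "reg"})
```

Keeping both alternatives in the encoding leaves the decision to the processing stage rather than to the transcriber.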
Omissions, Deletions, Additions: <gap> omission by transcriber
<del> and <add> cancellation or addition in source
<subst> used to group addition and deletion together
<supplied> insertion by editor
<unclear> material uncertain because illegible
<damage> physical damage to text carrier
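A rough sketch of how a processor might derive a reading text from such markup: deletions are dropped, additions and editorial insertions kept, and transcriber omissions shown as a bracketed ellipsis. The sentence, the attribute values and the helper name final_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcription of a revised manuscript sentence.
SENTENCE = (
    "<p>I <subst><del>cannot</del><add place=\"above\">will not</add></subst>"
    " come <gap reason=\"illegible\" extent=\"2 words\"/> before "
    "<unclear>Tuesday</unclear>, <supplied reason=\"omitted\">if</supplied>"
    " at all.</p>"
)

def final_text(element):
    """Text after the source's own revisions, with omissions made visible."""
    if element.tag == "del":
        return ""        # cancelled in the source
    if element.tag == "gap":
        return "[...]"   # omitted by the transcriber
    parts = [element.text or ""]
    for child in element:
        parts.append(final_text(child))
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(SENTENCE)
result = final_text(root)
```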
The multiple hierarchy problem: SGML allows only one hierarchy at a time
Is a document
chapter-paragraph-phrase
gathering-page-leaf
or both?
discontinuous segments
links and milestones
Boundary markers: page, column, and line breaks (<pb>, <cb>, <lb>)
generic <milestone>
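To illustrate how empty boundary markers let a second hierarchy coexist with the first, the sketch below keeps paragraphs as the element hierarchy and recovers the page view from pb milestones. The sample text and the helper name walk are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical division: the first paragraph runs across a page break.
DIVISION = (
    '<div><pb n="1"/>'
    '<p>The first paragraph begins on page one '
    '<pb n="2"/>and ends on page two.</p>'
    '<p>The second stays on page two.</p>'
    '</div>'
)

def walk(element):
    """Yield page breaks and text chunks in document order."""
    if element.tag == "pb":
        yield ("pb", element.get("n"))
    if element.text:
        yield ("text", element.text)
    for child in element:
        yield from walk(child)
        if child.tail:
            yield ("text", child.tail)

root = ET.fromstring(DIVISION)

# Regroup the same text by page instead of by paragraph.
pages = {}
current = None
for kind, value in walk(root):
    if kind == "pb":
        current = value
        pages[current] = ""
    elif current is not None:
        pages[current] += value
```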
Some chunks are also phrases: lists of all kinds
notes (authorial or editorial)
pictures or figures
formulae
tables
bibliographic descriptions
Lists: use <list> for lists of any kind (use type attribute to distinguish)
use <label> in two-column lists as alternative to n attribute
may be nested as necessary
for example...
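A small, hedged illustration of a two-column (gloss) list, in which each label is paired with the item that follows it. The type value "gloss" and the entries are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical gloss list: labels alternate with items.
GLOSS = """<list type="gloss">
  <label>p</label><item>paragraph</item>
  <label>l</label><item>verse line</item>
  <label>sp</label><item>speech in a drama</item>
</list>"""

root = ET.fromstring(GLOSS)

# Pair each label with its item, in document order.
labels = [el.text for el in root.findall("label")]
items = [el.text for el in root.findall("item")]
glossary = dict(zip(labels, items))
list_type = root.get("type")
```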
Figures and graphics: The presence of a graphic is indicated by the <figure> element
The title of the graphic is tagged as a <head>
A description of the graphic may be supplied (as a <figDesc>) for use by software unable to render the graphic
The graphic itself is specified by an external link (URL)
for example...
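A sketch of a figure with a title, a prose description and an external link, together with the fallback a text-only processor might produce. It assumes the P5 graphic element with a url attribute; the file path, the wording and the helper name text_fallback are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical figure; the URL does not point at a real file.
FIGURE = """<figure>
  <head>A Chicago-style pizza</head>
  <figDesc>Cross-section of a deep-dish pizza with its layers labelled.</figDesc>
  <graphic url="images/pizza.png"/>
</figure>"""

def text_fallback(figure):
    """What software unable to render the graphic could show instead."""
    head = figure.findtext("head", default="")
    description = figure.findtext("figDesc", default="")
    return f"[Figure: {head}. {description}]"

root = ET.fromstring(FIGURE)
url = root.find("graphic").get("url")
fallback = text_fallback(root)
```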
Tables: a <table> element contains <row>s of <cell>s
spanning is indicated by rows and cols attributes
role attribute indicates whether row or column holds data or a label
embedded tables are permitted
for example...
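As a sketch, a table whose first row is marked with role="label" can be turned into records keyed by those labels. The table contents are invented, and the role values "label" and "data" are used as described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: one label row followed by two data rows.
TABLE = """<table rows="3" cols="2">
  <row role="label"><cell>Element</cell><cell>Used for</cell></row>
  <row role="data"><cell>p</cell><cell>paragraph</cell></row>
  <row role="data"><cell>l</cell><cell>verse line</cell></row>
</table>"""

root = ET.fromstring(TABLE)

# Column names come from the label row.
labels = [cell.text for cell in root.find("row[@role='label']")]

# Each data row becomes a dictionary keyed by column name.
records = [
    dict(zip(labels, (cell.text for cell in row)))
    for row in root.findall("row[@role='data']")
]
declared_rows = int(root.get("rows"))
actual_rows = len(root.findall("row"))
```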
Bibliography: Use simple <bibl> with optional subcomponents:
<respStmt> (for any kind of responsibility) or <author>, <editor>, etc.
<title> with optional level attribute
<imprint> groups publication details
<biblScope> adds page references etc.
Use <listBibl> for list of references
for example...
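A minimal sketch of loosely structured citations in a reference list. The two works are real and were already mentioned in the earlier example, but the choice of subcomponents and the level value "m" (monograph) are illustrative; publication details are given directly rather than grouped.

```python
import xml.etree.ElementTree as ET

# Two loosely structured citations; punctuation lives in the text itself.
REFERENCES = (
    "<listBibl>"
    "<bibl><author>Henry Fielding</author>, "
    "<title level=\"m\">Joseph Andrews</title>. "
    "<pubPlace>London</pubPlace>, <date>1742</date>.</bibl>"
    "<bibl><author>Samuel Richardson</author>, "
    "<title level=\"m\">Pamela</title>. "
    "<pubPlace>London</pubPlace>, <date>1740</date>.</bibl>"
    "</listBibl>"
)

root = ET.fromstring(REFERENCES)

# Structured access to the tagged subcomponents...
entries = [
    (b.findtext("author"), b.findtext("title"), b.findtext("date"))
    for b in root.findall("bibl")
]

# ...while the citation as written is still recoverable.
citations = ["".join(b.itertext()) for b in root.findall("bibl")]
```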
Notes: Use <note> for notes of any kind (editorial or authorial)
if in-line, use place attribute to specify location
if out of line, either
use target attribute to specify attachment point
or mark attachment point as a
for example...
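The two attachment styles can be sketched as follows: an in-line note located by its place attribute, and an out-of-line note that points at its attachment point through target. The passage is invented, and the P5-style "#p2" pointer and the type value are assumptions.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical passage with one in-line and one out-of-line note.
PASSAGE = (
    '<div>'
    '<p>The first claim<note place="foot" n="1">An in-line note.</note>'
    ' is annotated in place.</p>'
    '<p xml:id="p2">The second claim is annotated from elsewhere.</p>'
    '<note target="#p2" type="editorial">An out-of-line note.</note>'
    '</div>'
)

root = ET.fromstring(PASSAGE)
ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}

# In-line notes sit at their attachment point and only need a placement.
inline = [
    (n.get("place"), n.get("n"))
    for n in root.iter("note") if n.get("target") is None
]

# Out-of-line notes must point at an identifier that actually exists.
out_of_line = [
    (n.get("target"), n.get("target").lstrip("#") in ids)
    for n in root.iter("note") if n.get("target")
]
```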
The Spoken texts module: components:
contextual information in header
facilities for synchronization and timing
Features of speech
Utterances: Basic unit of discourse, corresponding to speaker turns
Optionally grouped into higher-level divisions (<div>s), e.g. to mark discourse function
Linked by who attribute to description in header
Vocals and events: Empty elements are used to mark paralinguistic phenomena
Voice quality and prosody: The <shift> element is used to mark changes in voice quality
Other prosodic features may be marked using specific kinds of or entity refs
Another example
Participant Description
Setting Description: e.g. from P2
Timing: Pausing
use <pause> element
Duration
use dur attribute
Overlap
use trans attribute
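A hedged sketch of a short transcribed exchange that combines utterances, a timed pause, an overlapping turn, a vocal event and a voice-quality shift. The dialogue, the speaker identifiers and the ISO 8601 duration format for dur are assumptions, as is treating a missing trans attribute as "smooth".

```python
import xml.etree.ElementTree as ET

# Hypothetical exchange between two participants described in the header.
EXCHANGE = (
    '<div type="exchange">'
    '<u who="#A">anyone for pizza <pause dur="PT2S"/> I am starving</u>'
    '<u who="#B" trans="overlap">yes please '
    '<vocal><desc>laughs</desc></vocal></u>'
    '<u who="#A" trans="latching">'
    '<shift feature="loud" new="f"/>right then</u>'
    '</div>'
)

root = ET.fromstring(EXCHANGE)

# One tuple per speaker turn: who, how the turn begins, what it contains.
turns = [
    (u.get("who"), u.get("trans", "smooth"), [child.tag for child in u])
    for u in root.findall("u")
]
pause_lengths = [p.get("dur") for p in root.iter("pause")]
```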
Overlap
Not covered here...: specialised front and back matter
dictionaries and terminology
analytic tagging
segmentation
interpretations
linking
the header
tags for documentation