ilash seminar final

Presentation Transcript

What’s in a link between an image and a text?: 

What’s in a link between an image and a text?
Dr Andrew Salway
ILASH Seminar, 3 May 2002

Overview: 

Overview
Representing the semantic content of aesthetic and scientific visual artefacts in digital libraries
Extracting information from collateral texts produced by experts who analyse visual information
Integrating multimedia information in computational systems

Slide3: 

[Diagram labels: Scientific and Aesthetic Disciplines; Multimedia Databases; Content Technologies; Information Access in Digital Libraries; Machine-level representation of complex visual artefacts; Image and Video Data; Collateral Text Corpus; Experts Analysing Visual Information; Query-retrieval, browsing, summarisation; Image and Video Processing; Natural Language Processing]

Research Questions: 

Research Questions
How do experts put images into words?
Special languages
Selection and organisation of information
How to instantiate the link between visual and textual information in multimedia computing systems?

Analysing Aesthetic Images: 

Analysing Aesthetic Images
Fine Art (Panofsky 1939): pre-iconographic; iconographic; iconological
Film (Metz 1974): physical; cinematic; diegetic; connotative; sub-textual
Dance (Adshead-Lansdale 1988): description of movements; discernment of form; interpretation; evaluation

Fine Art: collateral texts: 

Fine Art: collateral texts
[Painting and caption, other texts]
Turner, Joseph Mallord William 1775-1851
The Goddess of Discord Choosing the Apple of Contention in the Garden of the Hesperides (exhibited 1806)
Discord chooses the apple that will eventually be awarded by Paris to the goddess Aphrodite, leading to the Trojan War. First seen in the British Institution, this picture was shown again in Turner’s Gallery in 1808 when its classical grandeur, based on the work of Nicolas Poussin, must have formed a striking contrast to the English pastoral landscapes also shown that year. Its own background is based on Turner’s experience of the Alps in 1802.

Fine Art: corpus analysis: 

Fine Art: corpus analysis
Corpus: 804,939 words from Tate WWW-site (691,121 from painting captions and 113,818 from artist biographies)
Specialist terminology extracted, e.g.: art movements (surrealism, cubism, pop art, abstract expressionism), techniques (brushwork, mezzotint), types of work (watercolour, pencil sketch), features of a painting (monochromatic, naturalistic)
Contrast between abstract art (colour, form, technique) and figurative art (mood, feeling, representation)

Cues for content: 

Cues for content
depict (295 occurrences) and convey (119 occurrences)
this painting depicts a glass, two pears and a box
this work depicts a group struggling in a wind
this composition conveys the claustrophobia of the interior of an omnibus
an expressive use of colour and shape to convey the subject’s mood
depict / convey (in earlier 305,913 word corpus): pre-iconographical (56% / 0%); iconographical (41% / 9%); iconological (3% / 91%)
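The cue counts on this slide could be reproduced along the following lines. This is only a minimal sketch: it assumes captions are available as plain strings, treats the inflected forms listed in the pattern as instances of each cue, and uses hypothetical names (`CUE_PATTERNS`, `count_cues`, `cue_contexts`) that do not come from the talk.

```python
import re
from collections import Counter

# Inflected forms counted for each cue verb (an assumption of this sketch).
CUE_PATTERNS = {
    "depict": re.compile(r"\bdepict(?:s|ed|ing)?\b", re.IGNORECASE),
    "convey": re.compile(r"\bconvey(?:s|ed|ing)?\b", re.IGNORECASE),
}


def count_cues(captions):
    """Count occurrences of each cue verb across an iterable of caption strings."""
    counts = Counter()
    for caption in captions:
        for cue, pattern in CUE_PATTERNS.items():
            counts[cue] += len(pattern.findall(caption))
    return counts


def cue_contexts(captions, cue):
    """Return the text following each occurrence of a cue (a rough content candidate)."""
    pattern = CUE_PATTERNS[cue]
    contexts = []
    for caption in captions:
        for match in pattern.finditer(caption):
            contexts.append(caption[match.end():].strip())
    return contexts


# The four example fragments quoted on the slide.
captions = [
    "this painting depicts a glass, two pears and a box",
    "this work depicts a group struggling in a wind",
    "this composition conveys the claustrophobia of the interior of an omnibus",
    "an expressive use of colour and shape to convey the subject's mood",
]

print(count_cues(captions))
print(cue_contexts(captions, "depict"))
```

Assigning each context to a Panofsky level (pre-iconographical, iconographical, iconological) would still need manual or further automatic classification; the sketch only locates the candidates.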

Cues to chart the history of art: 

Cues to chart the history of art
influence (403 occurrences) and inspire (442 occurrences): about 80% passive
his paintings of the Thames were influenced by Whistler
where he was influenced by expressionism
this picture was inspired by a performance of Shakespeare’s play Macbeth
Severini was inspired by modern machinery
50% ARTIST influenced by ARTIST; 22% ARTIST influenced by MOVEMENT
31% WORK inspired by PERSON / ENVIRONMENT / WORLD / WORK
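One way the passive share and the "influenced by" agents might be pulled out of caption text is sketched below. The regular expressions and the names `passive_agent` and `passive_share` are illustrative assumptions, not the method used in the study, and the single active sentence in the sample is invented for contrast.

```python
import re

# Passive construction: auxiliary + participle + "by" + agent.
PASSIVE = re.compile(
    r"\b(?:was|were|is|are|been|being)\s+(influenced|inspired)\s+by\s+(.+)",
    re.IGNORECASE,
)
# Any use of either verb, active or passive.
ANY_USE = re.compile(r"\b(?:influenc|inspir)(?:e|es|ed|ing)\b", re.IGNORECASE)


def passive_agent(sentence):
    """Return (verb, agent) for a passive use, or None if there is none."""
    match = PASSIVE.search(sentence)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).strip()


def passive_share(sentences):
    """Fraction of sentences using either verb in which the use is passive."""
    uses = [s for s in sentences if ANY_USE.search(s)]
    if not uses:
        return 0.0
    return sum(1 for s in uses if passive_agent(s)) / len(uses)


sentences = [
    "his paintings of the Thames were influenced by Whistler",
    "where he was influenced by expressionism",
    "this picture was inspired by a performance of Shakespeare's play Macbeth",
    "Severini was inspired by modern machinery",
    "Whistler influenced his paintings of the Thames",  # invented active example
]

print(passive_agent(sentences[0]))
print(passive_share(sentences))
```

Typing the extracted agent as ARTIST, MOVEMENT, WORK and so on would need a lexicon or named-entity step that this sketch leaves out.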

Crime Scenes: 

Crime Scenes I can see what appears to be a male laying in the prone position on the floor. He is wearing a maroon striped shirt with white collar and cuffs, blue jeans, and has a pair of left and right training shoes which have become slightly dis-extended from the foot. There appears to be a green tie down by his right hand and I can see a possible footwear impression in blood on his right hand. Surrounding the body there are droplets of blood, footwear impression in blood and several pieces of broken glass and bottles.

Analysing moving images: Dance: 

Analysing moving images: Dance Swan Lake, Matthew Bourne (1995).

Slide12: 

Text Types and their Relationship to the Moving Image

Eliciting spoken commentaries: 

Eliciting spoken commentaries
Five dance experts were each asked to ‘Describe’ then to ‘Interpret’ five dance sequences as they watched them (20 minutes in total)
11,300 words of description
9,754 words of interpretation
There appear to be systematic contrasts between description and interpretation…
[Some resonance with literature on Protocol Analysis (Ericsson and Simon 1993) and studies of language production like Chafe (1980).]

Descriptions: 

Descriptions
Utterances: single words in rapid sequence to identify movements; spatio-temporal details and relationships between dancers
Most frequent open-class words referred literally to dancers, their movements and space: woman, arm, leg, turn, jump, spin, arabesque, pirouette, left, right
Descriptions clustered on a Kohonen Map according to both dance and expert
Cohesion by reference and lexical cohesion; potentially useful for dance segmentation

Interpretations: 

Interpretations
Most frequent open-class words referred non-literally to dancers, their movements and themes of the dance: swan, prince, wing, flight, ethereality
Longer utterances either referring to larger video intervals, or linking literal descriptions to interpreted ‘meaning’, conjoined by seems, as if, like, a sense of, suggest, appears to be
The stretching of the neck, like a swan
Aerial steps which could suggest flight
Moving faster as if something is driving him
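The connectives listed on this slide suggest a simple way to split an interpretive utterance into its literal description and its interpreted meaning. The sketch below is an illustration under stated assumptions: it matches only the exact connective forms given on the slide (so "suggests" would not match), and the name `split_on_connective` is hypothetical.

```python
import re

# Connectives taken from the slide; exact forms only.
CONNECTIVES = ["seems", "as if", "like", "a sense of", "suggest", "appears to be"]
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONNECTIVES) + r")\b",
    re.IGNORECASE,
)


def split_on_connective(utterance):
    """Split at the first connective into (literal, connective, interpreted).

    Returns None when no connective is present, which in this sketch is
    read as a purely descriptive utterance.
    """
    match = PATTERN.search(utterance)
    if match is None:
        return None
    literal = utterance[:match.start()].strip(" ,")
    interpreted = utterance[match.end():].strip()
    return literal, match.group(1).lower(), interpreted


for utterance in [
    "The stretching of the neck, like a swan",
    "Aerial steps which could suggest flight",
    "Moving faster as if something is driving him",
]:
    print(split_on_connective(utterance))
```

A word such as "like" is ambiguous in general English, so a real classifier would need more context than this pattern match provides.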

KAB: Knowledge-rich video Annotation and Browsing: 

KAB: Knowledge-rich video Annotation and Browsing
Keyword indexing of video data using time-coded commentaries; annotations can be layered
User can query for intervals or browse between moving images and texts, and can add their own term lists and annotations
Object-oriented design; implemented in Java with the Java Media Framework (JMF)
Limitations: (i) only one kind of collateral text; (ii) temporal association of text and video interval; (iii) keyword-based representation of content.
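The keyword indexing idea can be illustrated with a small sketch. It is written in Python for brevity rather than in the Java/JMF of the actual system, and nothing here reflects KAB's real classes: `CommentaryIndex`, its methods and the layer name are hypothetical, the sample fragments are the two dance descriptions quoted later in the talk, and their end times are assumed.

```python
from collections import defaultdict


class CommentaryIndex:
    """Keyword index from words of time-coded commentary to video intervals."""

    def __init__(self):
        # keyword -> list of (layer, start_seconds, end_seconds)
        self._index = defaultdict(list)

    def add(self, layer, start, end, text):
        """Index each distinct word of one commentary fragment under its interval."""
        words = {w.strip(".,;:!?") for w in text.lower().split()}
        words.discard("")
        for word in words:
            self._index[word].append((layer, start, end))

    def query(self, keyword, layer=None):
        """Return sorted (start, end) intervals for a keyword, optionally per layer."""
        hits = self._index.get(keyword.lower(), [])
        return sorted((s, e) for (l, s, e) in hits if layer is None or l == layer)


index = CommentaryIndex()
# Start times follow the 3:51 and 3:55 time codes; end times are assumptions.
index.add("description", 231, 235, "four dancers stand in a ring")
index.add("description", 235, 240, "a female dancer enters the ring")

print(index.query("ring"))
print(index.query("dancer"))
```

Because the index is purely keyword-based, "dancer" does not retrieve the fragment containing "dancers", which is a small instance of limitation (iii) on the slide.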

Audio Description: 

Audio Description
Audio description enhances TV and films for visually impaired viewers and is scripted before it is recorded: in effect the story told by the moving image is retold in words
Describers follow guidelines which restrict the language they use, i.e. normally the present tense, simple sentences and few pronominal references
Describers are encouraged not to make inferences on behalf of their audience (i.e. little / no interpretation)
We are interested in applying information extraction technology to generate machine-level representations of video content from audio description scripts: TIWO (Television in Words), EPSRC GR/R67194/01

Audio Description Script: 

Audio Description Script
[11.43] Hanna passes Jan some banknotes.
[11.55] Laughing, Jan falls back into her seat as the jeep overtakes the line of the lorries.
[12.01] An explosion on the road ahead.
[12.08] The jeep has hit a mine.
[12.09] Hanna jumps from the lorry.
[12.20] Desperately she runs towards the mangled jeep.
[12.27] Soldiers try to stop her.
[12.31] She struggles with the soldier who grabs hold of her firmly.
[12.35] He lifts her bodily from the ground, holding her tightly in his arms.
(NB. Some ‘cue’ information removed)
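A first step towards a machine-level representation is to turn the script into time-coded entries. The sketch below assumes the bracketed codes are minutes.seconds, which is inferred from the excerpt rather than stated in the talk, and uses hypothetical helper names (`parse_script`, `to_intervals`). The start of the next entry is taken as a provisional end time.

```python
import re

# Matches "[mm.ss] text" entries; the time format is an assumption.
ENTRY = re.compile(r"\[(\d+)\.(\d{2})\]\s*([^\[]+)")


def parse_script(script):
    """Return a list of (start_seconds, text) tuples in script order."""
    entries = []
    for minutes, seconds, text in ENTRY.findall(script):
        entries.append((int(minutes) * 60 + int(seconds), text.strip()))
    return entries


def to_intervals(entries):
    """Pair each entry with the next entry's start as a provisional end time."""
    intervals = []
    for (start, text), (next_start, _) in zip(entries, entries[1:]):
        intervals.append((start, next_start, text))
    if entries:
        last_start, last_text = entries[-1]
        intervals.append((last_start, None, last_text))  # end unknown
    return intervals


script = (
    "[11.43] Hanna passes Jan some banknotes. "
    "[11.55] Laughing, Jan falls back into her seat as the jeep overtakes "
    "the line of the lorries. "
    "[12.01] An explosion on the road ahead."
)

entries = parse_script(script)
intervals = to_intervals(entries)
for interval in intervals:
    print(interval)
```

The provisional end time is only an upper bound: the described action may finish well before the next description starts.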

Corpus Analysis: 

Corpus Analysis
Audio Description Corpus: 70,856 words (12 movies, various genres)
Temporal information, which may be used to: align fragment with interval (aspectual verbs); recover event-event relations (simultaneity and cause, using as); recover time period(s) of film – dates / (?costumes and props?)
Other kinds of information: 50 most frequent verbs – 84% material processes

Exploiting Collateral Text: 

Exploiting Collateral Text
Experts in a number of fields (both aesthetic and scientific) appear to use special languages to articulate the semantic content of visual information
Text types select and organise information about still and moving images differently
A distinction between Description and Interpretation, which is apparent in theoretical frameworks, seems to be realised in experts’ texts
The use of collateral text in digital libraries requires the integration of multimedia information…

Integrating Multimedia Information: WHY?: 

Integrating Multimedia Information: WHY? Functionality for Digital Libraries: synchronised presentations; hypermedia browsing; multimedia corpora; cross-modal IR; information fusion; multimedia summarisation; information conversion.

Integrating Multimedia Information: HOW?: 

Integrating Multimedia Information: HOW?
Image/Video-text data models
Statistical Image and Text Features
“Multimedia Thesaurus”
Intermediate Representations
? The development of such systems may benefit from a computational framework in which to instantiate the image-text link ?

Issues for Instantiating the Image-Text Link: 

Issues for Instantiating the Image-Text Link 1…many relationships CAPTION BIOGRAPHY TEXTBOOK JOURNAL EXHIBITION CATALOGUE

Issues for Instantiating the Image-Text Link: 

Issues for Instantiating the Image-Text Link
Text fragments and image regions
“a young mother is strolling with her little girl dressed in white with a salmon-coloured sash … at the extreme right, appears a scandalously hieratic-looking couple”

Issues for Instantiating the Image-Text Link: 

Issues for Instantiating the Image-Text Link
Text fragments and video intervals
3:51 four dancers stand in a ring
3:55 a female dancer enters the ring

Issues for Instantiating the Image-Text Link: 

Issues for Instantiating the Image-Text Link
Choice of content representation scheme
Keywords
Propositions
Spatio-Temporal Logics
Causal Relationships
‘Mental States’, e.g. for films
Mapping between levels of meaning, e.g. for interpretations
Maintaining multiple viewpoints

Issues for Instantiating the Image-Text Link: 

Issues for Instantiating the Image-Text Link
Typed Links?
Intuitively an image may, for example, illustrate a text, whilst a text may describe or explain an image
There may be a case for explicating the meaning of image-text links and considering precedents from the study of semantic networks (structural vs. assertional links) and from the development of hypertext systems (taxonomy of link types)
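To make the notion of a typed link concrete, the following sketch models the three relations named on the slide as an enumeration and stores links as small immutable records. Every name here (`LinkType`, `TypedLink`, `links_of_type`) and the identifiers used in the example are hypothetical; the talk does not propose a specific data model.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    """Link types mentioned on the slide; a real taxonomy would be richer."""
    ILLUSTRATES = "illustrates"  # image -> text
    DESCRIBES = "describes"      # text -> image
    EXPLAINS = "explains"        # text -> image


@dataclass(frozen=True)
class TypedLink:
    source: str          # identifier of an image, image region or text fragment
    target: str
    link_type: LinkType


def links_of_type(links, link_type):
    """Select the links carrying a given type."""
    return [link for link in links if link.link_type is link_type]


# Hypothetical identifiers for one painting and two collateral texts.
links = [
    TypedLink("caption:1", "painting:1", LinkType.DESCRIBES),
    TypedLink("catalogue:7", "painting:1", LinkType.EXPLAINS),
    TypedLink("painting:1", "textbook:3", LinkType.ILLUSTRATES),
]

for link in links_of_type(links, LinkType.DESCRIBES):
    print(link.source, link.link_type.value, link.target)
```

Keeping the type on the link rather than on either endpoint allows the same painting to be described by one text and explained by another, which matches the one-to-many relationships raised earlier in the talk.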

Closing Remarks: 

Closing Remarks
Great potential to exploit collateral text in specialist digital libraries
The development of multimedia information systems may benefit from a theoretical framework for instantiating the image-text link
Potential for synergy between different disciplines concerned with this link: computational systems may help in understanding the relationship between vision, language and knowledge

What’s in a link between an image and a text?: 

What’s in a link between an image and a text? Dr Andrew Salway Department of Computing, University of Surrey ILASH Seminar, 3 May 2002

Semiotic-based Frameworks for Multimedia: 

Semiotic-based Frameworks for Multimedia Gonzalez Purchase Warner..

The Family of Images (Mitchell 1986, Iconology: image, text and ideology. Chicago Uni. Press): 

The Family of Images (Mitchell 1986, Iconology: image, text and ideology. Chicago Uni. Press)
Image: likeness, resemblance, similitude
Optical: mirrors, projections
Graphic: pictures, statues, designs
Perceptual: sense data, “species”, appearances
Mental: dreams, memories, ideas, fantasmata
Verbal: metaphors, descriptions

Vision and Language: 

Vision and Language