Content-based Multimedia Information Retrieval: Challenges & Opportunities
Stefan Rüger et al
http://km.doc.ic.ac.uk
Content-based MM IR
Multimedia Information Retrieval
aims, applications and a retrieval example
Challenges
semantic gap
polysemy
the “multi” in multimedia
Video Search and Summarisation
Music Retrieval
Information Navigation
Need for Information Retrieval
Information is of no use unless you can actually access it.
Multimedia Information Retrieval
archive
text, video, images, speech, music, combinations
query
text, stills, sketch, speech, humming, examples
content-based
present results
browsing, summaries, story boards
document clustering, cluster summaries
utilise relevance feedback
Query-retrieval matrix
doc: text, video, images, speech, music, sketches, multimedia
query: text, stills, sketch, speech, sound, humming, examples
Some applications
medicine
get diagnosis of cases with similar scans
law enforcement
child pornography prosecution
copyright infringement (music, videos, images)
CCTV video retrieval (car park, public spaces)
digital libraries
searching, visualisation, summaries, browsing
Example: get me similar images!
extract, eg, 50,000 primitive features
provide positive image examples, generate negative examples at random
feature selection & learning: AdaBoost, K-NN, SVM, ...
eg, compute separating hyper-plane and rank all images in database accordingly
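The hyper-plane ranking step can be sketched as follows. This is only an illustration under stated assumptions, not the system's actual learner: a plain perceptron is swapped in for the learners named on the slide, 2-dimensional toy vectors stand in for the roughly 50,000 primitive features, and the names `train_perceptron` and `rank_by_hyperplane` are hypothetical.

```python
def train_perceptron(positives, negatives, epochs=100, lr=0.1):
    """Fit a separating hyper-plane w.x + b = 0 with the perceptron rule."""
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1) for x in positives] + [(x, -1) for x in negatives]
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: shift the hyper-plane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # all training examples separated
            break
    return w, b


def rank_by_hyperplane(database, w, b):
    """Return image ids sorted by signed score, most positive first."""
    scores = {
        name: sum(wi * xi for wi, xi in zip(w, x)) + b
        for name, x in database.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): positives are the user's examples,
# negatives stand in for randomly drawn database images.
positives = [[0.9, 0.8], [0.8, 0.9]]
negatives = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]
database = {"img_a": [0.85, 0.9], "img_b": [0.15, 0.1], "img_c": [0.5, 0.5]}

w, b = train_perceptron(positives, negatives)
ranking = rank_by_hyperplane(database, w, b)
print(ranking)
```

Images on the positive side of the hyper-plane come first, ordered by their signed score.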
Example: Jupiter video search
video segmentation: generate paragraphs
identify key frame of video paragraph
get Jupiter example images, eg, from the web (Google image search)
treat video search as image search [with Marcus Pickering and David Sinclair, CIVR 2002]
Result list of video key frames
The semantic gap
Bridging the semantic gap
region segmentation + region classification (grass, water, ...)
using simple models for complex concepts (grass+plates+people = barbeque)
Region segmentation
collaboration with AT&T Research, Cambridge
Region classifiers
visual categories
grass, sky (blue), sky (cloudy), skin, trees, wood, water, sand, brick, snow, tarmac
give regions a probability of membership
[diagram labels: Positive Examples, Negative Examples, Cluster, Prune, Cluster, Nearest Neighbours, Test region, Probability]
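One way to turn nearest neighbours into a probability of membership is sketched below. It is an assumption-laden illustration, not the classifier from the slides: the clustering and pruning stages are omitted, the probability is taken as the fraction of positive examples among the k nearest labelled examples, and the 3-dimensional colour-like features and the name `membership_probability` are hypothetical.

```python
import math


def membership_probability(test_region, positives, negatives, k=3):
    """Fraction of the k nearest labelled examples that are positive."""
    labelled = [(math.dist(test_region, p), 1) for p in positives]
    labelled += [(math.dist(test_region, n), 0) for n in negatives]
    labelled.sort(key=lambda pair: pair[0])  # nearest first
    nearest = labelled[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Toy "grass" classifier: feature vectors are made-up colour averages.
grass_examples = [[0.2, 0.8, 0.2], [0.25, 0.7, 0.2], [0.3, 0.75, 0.25]]
other_examples = [[0.5, 0.6, 0.9], [0.8, 0.8, 0.8], [0.6, 0.5, 0.4]]

p_green_region = membership_probability(
    [0.22, 0.78, 0.2], grass_examples, other_examples)
p_blue_region = membership_probability(
    [0.55, 0.6, 0.85], grass_examples, other_examples)
print(p_green_region, p_blue_region)
```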
Example: grass classifier
Modelling semantic concepts
Bayesian networks
[diagram nodes: outdoor, town, crowd, sky, grass, skin, tarmac]
Polysemy
old Volkswagen
colour contrast
road signs
outback
Relevance feedback
system needs plasticity (parameters)
images are quickly assessed and user can inform system explicitly or implicitly
system needs to learn from user = change the parameters
Relevance feedback mechanism
centre = query = ideal result
results are displayed such that distance to centre is the dissimilarity to the query
user indicates her/his idea of similarity by rearranging the displayed results
system recomputes optimal parameters for this specific query automatically
Example: relevance feedback
query
initial result
User action
After relevance feedback
number of relevant images has doubled
GUI
User modelling
simulate users who click at most three images
mean average precision increase
- weight space movement: 15%
- query change and weight change: 58%
[with Daniel Heesch, ECIR 2003]
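For readers unfamiliar with the measure, mean average precision can be computed as in the sketch below. It assumes the common non-interpolated definition; the slides do not say which variant was used, the rankings are made-up, and the code does not reproduce the 15% and 58% figures.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision for a single query."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with their ranked results and relevant images.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = 1/2
]
map_score = mean_average_precision(runs)
print(round(map_score, 4))
```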
The “multi” of multimedia
high-level features
words and phrases from text, speech recognition
medium-level features
face detector, region classifiers, outdoor etc
low-level features
Fourier transforms, wavelet decomposition, texture histograms, colour histograms, shape primitives, filter primitives
Unified theoretical framework
[diagram labels: document network (index time), query network (run time)]
System overview [with M Pickering, D Heesch, R O’Callaghan and D Bull, TREC 2002]
TREC 2002 evaluation: 10 best manual runs [with M Pickering, D Heesch, R O’Callaghan and D Bull, TREC 2002]
Video Summary
story-level segmentation
keyframe summary
videotext summary
full-text search
named entities
[with L Wong and M Pickering]
Polyphonic Music Indexing Technique
n-grams
encode music as text strings using pitch and onsets
index text words with text search engine
process query in the same way
application: eg, Query by Humming [with Shyamala Doraisamy, ISMIR 2000, ISMIR 2001, ISMIR 2002]
Monophonic pitch n-gramming
Interval: 0 +7 0 +2 0 -2 0 -2 0
Example: musical strings with interval-only representation
[0 +7 0 +2] ZGZB
[+7 0 +2 0] GZBZ
[0 +2 0 -2] ZBZb
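The interval-to-text encoding in this example can be sketched as below. The letter mapping is inferred from the four values shown on the slide (0 as Z, +2 as B, +7 as G, -2 as b) and extrapolated to other intervals as an assumption: upward intervals use upper-case letters, downward intervals lower-case. The function names are hypothetical.

```python
import string


def interval_to_letter(interval):
    """Map a pitch interval in semitones to one character."""
    if interval == 0:
        return "Z"
    if abs(interval) > 25:  # 'Z' is reserved for zero in this sketch
        raise ValueError("interval outside the range this sketch encodes")
    letters = string.ascii_uppercase if interval > 0 else string.ascii_lowercase
    return letters[abs(interval) - 1]


def interval_ngrams(intervals, n=4):
    """Encode the intervals as text and return all overlapping n-gram words."""
    encoded = "".join(interval_to_letter(i) for i in intervals)
    return [encoded[i:i + n] for i in range(len(encoded) - n + 1)]


# Interval sequence taken from the slide.
intervals = [0, 7, 0, 2, 0, -2, 0, -2, 0]
words = interval_ngrams(intervals)
print(words)
```

The resulting words can be handed to an ordinary text search engine, and a query melody is encoded in the same way.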
N-grams and polyphony
Polyphony: index all monophonic combinations
Encoded rhythm in similar way
Performed well with known-item search
Studied fault-tolerance
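One possible reading of "index all monophonic combinations" is sketched below: within each window of consecutive onsets, pick one pitch per onset in every possible way and record the resulting interval sequence. This interpretation, the window length and the function name are assumptions, not details confirmed by the slides.

```python
from itertools import product


def polyphonic_interval_sequences(onset_pitches, n=2):
    """Collect interval sequences of length n over all monophonic paths.

    onset_pitches: list of pitch lists, one list per onset, in time order.
    """
    sequences = set()
    for start in range(len(onset_pitches) - n):
        window = onset_pitches[start:start + n + 1]
        for path in product(*window):  # one pitch chosen per onset
            sequences.add(tuple(b - a for a, b in zip(path, path[1:])))
    return sequences


# Hypothetical fragment: a two-note chord, a single note, another chord.
fragment = [[60, 64], [67], [65, 69]]
sequences = polyphonic_interval_sequences(fragment)
print(sorted(sequences))
```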
Presentation of search results
ranked list adequate? [funded by NSF-EU: Cultural Heritage Language Technologies]
[with D Heesch et al]
Vision: labelled clusters
suggest keywords
refine query
drill down/up
Keyword computation
example: search for “computer”
related keywords: “hardware”, “software”, “IBM”, “Linux”, etc
Document representation
word histogram vectors (“bag of words”)
vocabulary: cost, dog, drug, hospital, hunt, impact, mafia, reform, …
documents: doc1, doc2, …
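A word histogram vector can be built as in the sketch below. The vocabulary is the one listed on the slide; the two document texts and the whitespace tokenisation are made-up assumptions for illustration.

```python
from collections import Counter


def word_histogram(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


vocabulary = ["cost", "dog", "drug", "hospital", "hunt", "impact", "mafia", "reform"]

# Hypothetical documents.
doc1 = "hospital reform cost impact hospital"
doc2 = "mafia drug hunt dog"

vec1 = word_histogram(doc1, vocabulary)
vec2 = word_histogram(doc2, vocabulary)
print(vec1)
print(vec2)
```

With a realistic vocabulary these vectors have many thousands of dimensions and are mostly zero.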
New document representation
use keywords only for returned documents
low-dimensional vector (10-30 dim)
efficient clustering
no curse of dimensionality
Sammon
Tree-Map
DendroVis
Radial
Conclusions
Multimedia Information Retrieval
Challenging research questions
Draws on computer vision, audio processing, natural language analysis, unstructured document analysis, information retrieval, information visualisation, computer human interaction, artificial intelligence
Collaborations
part of the High Performance Informatics area
existing collaborations with
Tufts’s Perseus Digital Library
Imperial’s Newton Project
AT&T Research, Cambridge
ISE Dept of the Ben Gurion University, Israel
EE Dept of Bristol University
the Greenstone Digital Library, U of Waikato, NZ
intended collaborations with
Center for Intelligent Information Retrieval, UMass
EIE Dept of Hong Kong Polytechnic University
Rhythm encoding
we use ratios, not absolute values, and onset time differences, not durations
$r_i = \frac{o_{i+2} - o_{i+1}}{o_{i+1} - o_i}$, where $o_i$ is the onset time of note $i$
we quantise this number (use 21 letters)
this is already invariant to tempo change
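The tempo invariance of the onset ratios can be checked with a small sketch. The onset times are made-up, the function name is hypothetical, and the quantisation into 21 letters is left out because the slides do not give the bin boundaries.

```python
def rhythm_ratios(onsets):
    """Ratios of consecutive onset-time differences.

    Onsets must be strictly increasing so that no difference is zero.
    """
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    return [later / earlier for earlier, later in zip(gaps, gaps[1:])]


# Hypothetical onset times in seconds, and the same rhythm at half tempo.
onsets = [0.0, 0.5, 1.0, 2.0, 2.5]
slower = [t * 2 for t in onsets]

ratios = rhythm_ratios(onsets)
ratios_slower = rhythm_ratios(slower)
print(ratios, ratios_slower)
```

Scaling all onsets by a constant scales every difference by the same constant, so it cancels in each ratio.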
Keyword computation
potentially interesting for the user
related to the returned documents
able to discriminate the returned documents
candidate keywords: medium document freq
rank words with $\frac{h}{d}\, h \log\frac{|H|}{h}$
h: returned-document frequency
d: document frequency
H: returned-document set
keywords: highly ranked candidates
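The ranking can be sketched as below, reading the slide's weight literally as the product (h/d) · h · log(|H|/h). That reading is an assumption. The candidate filter on medium document frequency is omitted because no thresholds are given, and the documents, collection frequencies and function names are hypothetical.

```python
import math


def keyword_score(h, d, num_returned):
    """Weight (h/d) * h * log(|H|/h) for one candidate word."""
    return (h / d) * h * math.log(num_returned / h)


def rank_keywords(returned_docs, doc_freq, top=3):
    """Rank words by score; h is counted over the returned documents."""
    num_returned = len(returned_docs)
    h_counts = {}
    for words in returned_docs:
        for w in set(words):  # count each word once per document
            h_counts[w] = h_counts.get(w, 0) + 1
    scores = {
        w: keyword_score(h, doc_freq[w], num_returned)
        for w, h in h_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical result set for the query "computer".
returned_docs = [
    ["computer", "hardware", "ibm"],
    ["computer", "software", "linux"],
    ["computer", "hardware", "linux"],
    ["computer", "software", "the"],
]
# Hypothetical document frequencies in the whole collection.
doc_freq = {"computer": 50, "hardware": 4, "ibm": 10,
            "software": 5, "linux": 2, "the": 1000}

keywords = rank_keywords(returned_docs, doc_freq)
print(keywords)
```

A word that occurs in every returned document, such as the query term itself, scores zero because log(|H|/h) vanishes, so it cannot discriminate between the returned documents.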
Hierarchical clustering
drill down
DendroVis