Mmir Challenges

Added: November 15, 2007
Presentation Transcript

Content-based Multimedia Information Retrieval: Challenges & Opportunities
Stefan Rüger et al
http://km.doc.ic.ac.uk


Content-based MM IR:
- Multimedia Information Retrieval: aims, applications and a retrieval example
- Challenges: semantic gap, polysemy, the “multi” in multimedia
- Video Search and Summarisation
- Music Retrieval
- Information Navigation


Need for Information Retrieval: Information is of no use unless you can actually access it.


Multimedia Information Retrieval:
- archive: text, video, images, speech, music, combinations
- query: text, stills, sketch, speech, humming, examples
- content-based
- present results: browsing, summaries, story boards
- document clustering, cluster summaries
- utilise relevance feedback


Query-retrieval matrix:
- doc: text, video, images, speech, music, sketches, multimedia
- query: text, stills, sketch, speech, sound, humming, examples


Some applications:
- medicine: get diagnosis of cases with similar scans
- law enforcement: child pornography prosecution, copyright infringement (music, videos, images), CCTV video retrieval (car park, public spaces)
- digital libraries: searching, visualisation, summaries, browsing


Example: get me similar images!:
- extract, eg, 50,000 primitive features
- provide positive image examples, generate negative examples at random
- feature selection & learning: AdaBoost, k-NN, SVM, ...
- eg, compute separating hyper-plane and rank all images in database accordingly
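The ranking step on the slide above can be sketched in a few lines. This is a minimal illustration, not the authors' system: it swaps in a plain perceptron as the hyper-plane learner (the slide names AdaBoost, k-NN and SVM), uses made-up two-dimensional feature vectors instead of 50,000 primitive features, and all function names and the toy `database` are hypothetical.

```python
import random

def train_perceptron(positives, negatives, epochs=100, lr=0.1):
    """Fit a separating hyper-plane w.x + b = 0 with the perceptron rule."""
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1) for x in positives] + [(x, -1) for x in negatives]
    for _ in range(epochs):
        for x, y in data:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * s <= 0:  # misclassified: nudge the plane towards x
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def rank_images(database, w, b):
    """Return image ids sorted by decreasing score w.x + b."""
    scored = {img_id: sum(wi * xi for wi, xi in zip(w, x)) + b
              for img_id, x in database.items()}
    return sorted(scored, key=scored.get, reverse=True)

def sample_negatives(database, positive_ids, k, seed=0):
    """Draw k negative examples at random from the non-positive images."""
    pool = sorted(i for i in database if i not in positive_ids)
    return random.Random(seed).sample(pool, k)

# Toy database: image id -> feature vector (values invented for illustration).
database = {
    "a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.2, 0.9],
    "d": [0.1, 0.8], "e": [0.7, 0.3],
}
```

With "a" and "b" as positive examples and "c" and "d" as negatives, `rank_images` places the unlabelled image "e" next to the positives, which is the behaviour the slide describes. Random negatives work in practice only because most of a large database is irrelevant to any given query.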


Example: Jupiter video search:
- video segmentation: generate paragraphs
- identify key frame of video paragraph
- get Jupiter example images, eg, from the web (Google image search)
- treat video search as image search
- [with Marcus Pickering and David Sinclair, CIVR 2002]


Result list of video key frames


The semantic gap


Bridging the semantic gap:
- region segmentation + region classification (grass, water, ...)
- using simple models for complex concepts (grass + plates + people = barbecue)


Region segmentation: collaboration with AT&T Research, Cambridge


Region classifiers:
- visual categories: grass, sky (blue), sky (cloudy), skin, trees, wood, water, sand, brick, snow, tarmac
- give regions a probability of membership
- [diagram labels: Positive Examples, Negative Examples, Cluster, Prune, Nearest Neighbours, Test region, Probability]
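One simple way to give a region "a probability of membership" from nearest neighbours is sketched below. It is an assumption-laden simplification: the clustering and pruning stages named on the slide are omitted, the probability is taken to be the fraction of positive examples among the k nearest labelled examples, and the function name and feature vectors are hypothetical.

```python
import math

def membership_probability(test_region, positives, negatives, k=3):
    """Fraction of the k nearest labelled examples that are positive."""
    labelled = [(math.dist(test_region, p), 1) for p in positives]
    labelled += [(math.dist(test_region, n), 0) for n in negatives]
    labelled.sort(key=lambda pair: pair[0])  # nearest first
    nearest = labelled[:k]
    return sum(label for _, label in nearest) / len(nearest)

# Invented 3-d colour features for a "grass" classifier.
grass = [(0.10, 0.80, 0.10), (0.20, 0.70, 0.10), (0.15, 0.75, 0.20)]
not_grass = [(0.80, 0.10, 0.10), (0.70, 0.20, 0.20)]
p_green = membership_probability((0.12, 0.78, 0.12), grass, not_grass)
p_red = membership_probability((0.75, 0.15, 0.15), grass, not_grass)
```

A greenish test region gets a high membership value and a reddish one a low value; a separate classifier of this kind would be needed per visual category.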


Example: grass classifier


Modelling semantic concepts:
- Bayesian networks
- [diagram nodes: outdoor, town, crowd, sky, grass, skin, tarmac]


Polysemy: old Volkswagen, colour contrast, road signs, outback


Relevance feedback:
- system needs plasticity (parameters)
- images are quickly assessed and user can inform system explicitly or implicitly
- system needs to learn from user = change the parameters


Relevance feedback mechanism:
- centre = query = ideal result
- results are displayed such that distance to centre is the dissimilarity to the query
- user indicates her/his idea of similarity by rearranging the displayed results
- system recomputes optimal parameters for this specific query automatically
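The "recompute parameters from rearranged results" step can be illustrated as a small least-squares fit. The sketch assumes, without confirmation from the slides, that the dissimilarity is a weighted sum of per-feature distances and that the parameters are those weights; the update rule (gradient step, clip at zero, renormalise) and all names are hypothetical choices.

```python
def combined_distance(weights, feature_distances):
    """Overall dissimilarity as a weighted sum of per-feature distances."""
    return sum(w * d for w, d in zip(weights, feature_distances))

def refit_weights(weights, feedback, steps=500, lr=0.1):
    """Adjust weights so displayed distances match where the user put images.

    feedback: list of (per-feature distances, distance chosen by the user).
    """
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for dists, target in feedback:
            err = combined_distance(w, dists) - target
            for i, d in enumerate(dists):
                grad[i] += 2 * err * d  # derivative of the squared error
        w = [max(0.0, wi - lr * gi) for wi, gi in zip(w, grad)]
        total = sum(w)
        if total > 0:
            w = [wi / total for wi in w]  # keep weights summing to one
    return w

# The user moved four images; these targets are consistent with weights (0.8, 0.2).
feedback = [([1.0, 0.0], 0.8), ([0.0, 1.0], 0.2),
            ([1.0, 1.0], 1.0), ([0.5, 0.2], 0.44)]
new_weights = refit_weights([0.5, 0.5], feedback)
```

Starting from equal weights, the fit shifts weight towards the first feature, so the next result list for this query is ranked mainly by that feature.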


Example: relevance feedback [figure: query, initial result]


User action


After relevance feedback: number of relevant images has doubled


GUI


User modelling:
- simulate users who click at most three images
- mean average precision increase
  - weight space movement: 15%
  - query change and weight change: 58%
- [with Daniel Heesch, ECIR 2003]
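For readers unfamiliar with the measure quoted above, mean average precision can be computed as follows. This is the standard definition, not code from the cited study; the function names and the toy rankings are illustrative.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at the rank of each relevant item."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked ids, relevant ids), one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

before = average_precision(["a", "x", "b", "y"], {"a", "b"})
after = average_precision(["a", "b", "x", "y"], {"a", "b"})
```

Moving the relevant image "b" from rank 3 to rank 2 raises average precision, which is the kind of gain the percentages on the slide summarise across many simulated queries.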


The “multi” of multimedia:
- high-level features: words and phrases from text, speech recognition
- medium-level features: face detector, region classifiers, outdoor etc
- low-level features: Fourier transforms, wavelet decomposition, texture histograms, colour histograms, shape primitives, filter primitives
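As a concrete instance of a low-level feature from the list above, a colour histogram can be computed like this. It is a minimal sketch assuming 8-bit RGB pixels and a uniform quantisation of each channel; the bin count and function name are illustrative, not taken from the talk.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised joint RGB histogram with bins_per_channel ** 3 entries.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    """
    n = bins_per_channel
    hist = [0] * n ** 3
    count = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..n-1.
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        hist[(rb * n + gb) * n + bb] += 1
        count += 1
    return [h / count for h in hist] if count else hist

hist = colour_histogram([(255, 0, 0), (250, 10, 5), (0, 0, 255)], bins_per_channel=2)
```

Two images can then be compared by any distance between their histogram vectors, which is what makes such features usable in the ranking and feedback steps described earlier.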


Unified theoretical framework: [diagram labels: document network (index time), query network (run time)]


System overview [with M Pickering, D Heesch, R O’Callaghan and D Bull, TREC 2002]


TREC 2002 evaluation: 10 best manual runs [with M Pickering, D Heesch, R O’Callaghan and D Bull, TREC 2002]


Video Summary:
- story-level segmentation
- keyframe summary
- videotext summary
- full-text search
- named entities
- [with L Wong and M Pickering]


Polyphonic Music Indexing Technique:
- n-grams: encode music as text strings using pitch and onsets
- index text words with text search engine
- process query in the same way
- application: eg, Query by Humming
- [with Shyamala Doraisamy, ISMIR 2000, ISMIR 2001, ISMIR 2002]


Monophonic pitch n-gramming:
Interval: 0 +7 0 +2 0 -2 0 -2 0
Example: musical strings with interval-only representation
[0 +7 0 +2]  ZGZB
[+7 0 +2 0]  GZBZ
[0 +2 0 -2]  ZBZb
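The encoding in the example above can be reproduced with a short sketch. The letter mapping is inferred from the three strings on the slide (0 maps to Z, +k to the k-th upper-case letter, -k to the k-th lower-case letter); clamping intervals beyond 25 semitones and the MIDI pitch numbers used as input are assumptions made for illustration.

```python
def interval_to_letter(interval):
    """Map a pitch interval in semitones to one letter (Z = repeated note)."""
    if interval == 0:
        return "Z"
    size = min(abs(interval), 25)  # assumption: clamp very large leaps
    letter = chr(ord("A") + size - 1)
    return letter if interval > 0 else letter.lower()

def pitch_ngrams(pitches, n=4):
    """Encode a monophonic pitch sequence as overlapping interval n-grams."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    letters = "".join(interval_to_letter(i) for i in intervals)
    return [letters[i:i + n] for i in range(len(letters) - n + 1)]

# Pitches chosen so that the intervals are 0 +7 0 +2 0 -2 0 -2 0.
melody = [60, 60, 67, 67, 69, 69, 67, 67, 65, 65]
words = pitch_ngrams(melody)
```

The first three entries of `words` are the strings shown on the slide. Because only intervals are encoded, transposing the melody leaves the words unchanged, and the words can be handed to an ordinary text search engine.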


N-grams and polyphony:
- polyphony: index all monophonic combinations
- encoded rhythm in similar way
- performed well with known-item search
- studied fault-tolerance
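"Index all monophonic combinations" can be read as: for every window of consecutive onsets, take every way of picking one note per onset and encode the resulting melody. The sketch below follows that reading, which is an interpretation of the slide rather than the published algorithm; the data layout (a list of pitch lists, one per onset) and the names are hypothetical, and the letter mapping is the one inferred from the monophonic example.

```python
from itertools import product

def interval_to_letter(interval):
    """0 -> Z, +k -> k-th upper-case letter, -k -> k-th lower-case letter."""
    if interval == 0:
        return "Z"
    letter = chr(ord("A") + min(abs(interval), 25) - 1)
    return letter if interval > 0 else letter.lower()

def polyphonic_ngrams(onsets, n=4):
    """Collect interval n-grams over all monophonic paths through the onsets.

    onsets: list of lists, the pitches sounding at each successive onset.
    """
    grams = set()
    for start in range(len(onsets) - n):
        window = onsets[start:start + n + 1]  # n intervals need n + 1 onsets
        for path in product(*window):         # one pitch picked per onset
            intervals = [b - a for a, b in zip(path, path[1:])]
            grams.add("".join(interval_to_letter(i) for i in intervals))
    return grams

grams = polyphonic_ngrams([[60], [60, 64], [67]], n=2)
```

The number of paths grows multiplicatively with chord size, which is one reason to keep the window short.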


Presentation of search results:
- ranked list adequate?
- [funded by NSF-EU: Cultural Heritage Language Technologies]
- [with D Heesch et al]


Vision: labelled clusters:
- suggest keywords
- refine query
- drill down/up


Keyword computation:
- example: search for “computer”
- related keywords: “hardware”, “software”, “IBM”, “Linux”, etc


Document representation:
- word histogram vectors (“bag of words”)
- [table: documents doc1, doc2, … against the vocabulary cost, dog, drug, hospital, hunt, impact, mafia, reform, …]
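A word histogram representation of the kind named above can be built as follows. This is a minimal sketch with whitespace tokenisation and lower-casing as simplifying assumptions; the function name and sample documents are invented.

```python
from collections import Counter

def bag_of_words(documents):
    """Return (vocabulary, vectors): one word-count vector per document."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors

vocabulary, vectors = bag_of_words(["dog hunt dog", "drug cost reform"])
```

Each vector has one entry per vocabulary word, so its length grows with the collection.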


New document representation:
- use keywords only for returned documents
- low-dimensional vector (10-30 dim)
- efficient clustering
- no curse of dimensionality


Slide43: Sammon


Tree-Map


Slide45: DendroVis


Slide46: Radial


Slide47: Radial


Conclusions:
- Multimedia Information Retrieval
- Challenging research questions
- Draws on computer vision, audio processing, natural language analysis, unstructured document analysis, information retrieval, information visualisation, computer human interaction, artificial intelligence


Collaborations:
- part of the High Performance Informatics area
- existing collaborations with
  - Tufts’s Perseus Digital Library
  - Imperial’s Newton Project
  - AT&T Research, Cambridge
  - ISE Dept of the Ben Gurion University, Israel
  - EE Dept of Bristol University
  - the Greenstone Digital Library, U of Waikato, NZ
- intended collaborations with
  - Center for Intelligent Information Retrieval, UMass
  - EIE Dept of Hong Kong Polytechnic University


Rhythm encoding:
- we use ratios, not absolute values, and onset time differences, not durations
- r_i = (o_{i+2} - o_{i+1}) / (o_{i+1} - o_i)
- we quantise this number (use 21 letters)
- this is already invariant to tempo change
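The ratio formula and its tempo invariance can be checked with a short sketch. The ratio computation follows the slide; the quantiser is purely hypothetical, since the slide gives only the alphabet size (21 letters) and not the bin boundaries. Here the base-2 logarithm of the ratio is clamped to [-2, 2] and mapped linearly onto the letters A to U.

```python
import math
import string

LETTERS = string.ascii_uppercase[:21]  # 21-letter alphabet, A..U

def onset_ratios(onsets):
    """r_i = (o_{i+2} - o_{i+1}) / (o_{i+1} - o_i) for increasing onsets."""
    return [(onsets[i + 2] - onsets[i + 1]) / (onsets[i + 1] - onsets[i])
            for i in range(len(onsets) - 2)]

def quantise(ratio, max_log=2.0):
    """Hypothetical binning: log2(ratio) in [-2, 2] spread over 21 letters."""
    x = max(-max_log, min(max_log, math.log2(ratio)))
    index = round((x + max_log) / (2 * max_log) * (len(LETTERS) - 1))
    return LETTERS[index]

def rhythm_string(onsets):
    """Encode a sequence of onset times as one letter per ratio."""
    return "".join(quantise(r) for r in onset_ratios(onsets))

slow = [0.0, 1.0, 2.0, 4.0, 5.0]
fast = [t / 2 for t in slow]  # the same rhythm played at double tempo
```

Scaling all onset times by a constant cancels in the ratio, so `slow` and `fast` yield the same string, which is the invariance claimed on the slide.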


Keyword computation:
- potentially interesting for the user
- related to the returned documents
- able to discriminate the returned documents
- candidate keywords: medium document frequency
- rank words with (h/d) h log(|H|/h)
  - h: returned-document frequency
  - d: document frequency
  - H: returned-document set
- keywords: highly ranked candidates
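The ranking expression above can be tried on toy data. The sketch reads the juxtaposed factors as a product, (h/d) times h times log(|H|/h), which is an assumption about the slide's notation; it also skips the "medium document frequency" pre-filter, and the function names and frequencies are invented for illustration.

```python
import math

def keyword_scores(returned_docs, document_frequency):
    """Score each word occurring in the returned-document set H.

    returned_docs: list of word sets, one per returned document.
    document_frequency: word -> d, its frequency in the whole collection.
    """
    size_h = len(returned_docs)
    h_counts = {}
    for words in returned_docs:
        for word in set(words):
            h_counts[word] = h_counts.get(word, 0) + 1
    return {word: (h / document_frequency[word]) * h * math.log(size_h / h)
            for word, h in h_counts.items()}

def top_keywords(returned_docs, document_frequency, k=5):
    """Return the k highest-scoring words."""
    scores = keyword_scores(returned_docs, document_frequency)
    return sorted(scores, key=scores.get, reverse=True)[:k]

returned = [{"computer", "hardware"}, {"computer", "software"},
            {"computer", "hardware", "linux"}, {"computer", "software"}]
df = {"computer": 1000, "hardware": 10, "software": 20, "linux": 100}
scores = keyword_scores(returned, df)
best = top_keywords(returned, df, k=2)
```

Under this reading, a word that appears in every returned document scores zero because log(|H|/h) vanishes, so the query term itself is not suggested; words that are frequent among the results but rare in the collection rise to the top.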


Hierarchical clustering


Slide55: drill down DendroVis