Lecture 14: MPEG-7: Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2003
http://www.sims.berkeley.edu/academics/courses/is202/f03/
SIMS 202: Information Organization and Retrieval
Lecture 14: MPEG-7
Lecture Overview
Review
XML and Markup
MPEG-7
MPEG-7 Standard
MPEG-7 Tools
Discussion Questions
Action Items for Next Time
XML as a Common Syntax
XML (and SGML) provide a way of expressing the structure of documents that can be verified and validated by document processing systems
“Documents” can be metadata structures
Such as the description of a particular photograph in our Phone project
XML thus provides a way of representing metadata descriptions as well as the content that they describe
XML as a Common Syntax
All XML documents follow some simple rules that make them interchangeable and usable across different systems
All data and markup are in Unicode
All elements are marked by begin and end tags (or a single empty-element tag such as <br/>)
All markup is case-sensitive
XML DTDs and/or Schemas define the valid structure (and sometimes content) of the documents
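The tag-pairing and case-sensitivity rules can be seen directly with Python's standard-library `xml.etree.ElementTree` parser, which rejects any document that is not well-formed. This is a minimal illustration rather than course material; the `photo` and `caption` element names are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Every element is closed by a matching end tag: well-formed.
good = "<photo><caption>Sunset over the bay</caption></photo>"
# Tag names are case-sensitive, so <Caption> ... </caption> is a mismatch.
bad_case = "<photo><Caption>Sunset over the bay</caption></photo>"
# A missing end tag is rejected as well.
bad_unclosed = "<photo><caption>Sunset over the bay</photo>"
# Non-ASCII character data is fine because XML text is Unicode.
unicode_doc = "<photo><caption>Café in Zürich</caption></photo>"

for label, doc in [("good", good), ("bad_case", bad_case),
                   ("bad_unclosed", bad_unclosed), ("unicode", unicode_doc)]:
    print(label, is_well_formed(doc))
# good True / bad_case False / bad_unclosed False / unicode True
```

Well-formedness is only the first hurdle; checking a document against a DTD or Schema (validity) is a separate step.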
Document Type Definitions
The DTD describes the structural elements and "shorthand" markup for a particular document type and defines:
Names of "legal" elements
How many times elements can appear
The order of elements in a document
Whether markup can be omitted (SGML only)
Contents of elements (i.e., nested structures)
Attributes associated with elements
Names of "entities"
Short-hand conventions for element tags (SGML only)
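A DTD content model such as `<!ELEMENT photo (title, date?, keyword*)>` is essentially a regular expression over the sequence of child element names. The comma fixes the order, `?` makes an element optional, and `*` allows zero or more occurrences. The sketch below imitates that order-and-count check with Python's `re` module. The `photo`/`title`/`date`/`keyword` vocabulary is hypothetical, and a real system would use a validating parser rather than this hand-rolled matcher.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD rule:  <!ELEMENT photo (title, date?, keyword*)>
# Rewritten as a regex over child tag names, each followed by one space
# so that, e.g., "keywords" cannot masquerade as "keyword".
PHOTO_CONTENT_MODEL = re.compile(r"(title )(date )?(keyword )*")

def children_match(xml_text: str) -> bool:
    """Check that the root's children appear in the order and counts the rule allows."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return PHOTO_CONTENT_MODEL.fullmatch(sequence) is not None

ok = ("<photo><title>Bay</title><date>2003-10-14</date>"
      "<keyword>sunset</keyword><keyword>water</keyword></photo>")
wrong_order = "<photo><date>2003-10-14</date><title>Bay</title></photo>"
print(children_match(ok), children_match(wrong_order))  # True False
```

Note what the DTD cannot say: the `date` element's content is just character data, so nothing here checks that it actually looks like a date.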
What are XML Schemas?
An XML vocabulary for expressing your data's structure AND content types, and even the business rules involved in processing the data
Written in XML themselves
Support namespaces for combining multiple schemas in the same document
The slides in this section are based on an XML tutorial by Roger L. Costello
Why Schemas?
Motivation for XML Schemas
People are dissatisfied with DTDs
It's a different syntax
You write your XML (instance) document using one syntax and the DTD using another syntax --> bad, inconsistent
Limited datatype capability
DTDs support a very limited capability for specifying datatypes. You can't, for example, express "I want the element to hold an integer with a range of 0 to 12,000"
Desire a set of datatypes compatible with those found in databases
DTDs support 10 datatypes; XML Schemas support 44+ datatypes
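In XML Schema, the "integer from 0 to 12,000" constraint quoted above is expressed by restricting `xs:integer` with `minInclusive` and `maxInclusive` facets. Python's standard library has no XML Schema validator, so the sketch below only emulates what those facets do to an element's text. The function name `valid_bounded_integer` is hypothetical.

```python
import re

# Schema-side idea being emulated (shown as a comment, not executed):
#   <xs:simpleType>
#     <xs:restriction base="xs:integer">
#       <xs:minInclusive value="0"/>
#       <xs:maxInclusive value="12000"/>
#     </xs:restriction>
#   </xs:simpleType>

# Lexical form of xs:integer: optional sign followed by decimal digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def valid_bounded_integer(text: str, low: int = 0, high: int = 12000) -> bool:
    """Emulate an xs:integer restricted by minInclusive/maxInclusive facets."""
    value_text = text.strip()  # surrounding whitespace is collapsed for xs:integer
    if INTEGER_LEXICAL.fullmatch(value_text) is None:
        return False  # not an integer at all, e.g. "high" or "12.5"
    return low <= int(value_text) <= high

for sample in ["8848", "12000", "12001", "-1", "12.5", "high"]:
    print(sample, valid_bounded_integer(sample))
```

With a DTD, the same element would be declared as `#PCDATA`, so every one of these samples would be accepted.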
Highlights of XML Schemas
XML Schemas are a tremendous advancement over DTDs:
Enhanced datatypes
44+ versus 10
Can create your own datatypes
Example: "This is a new type based on the string type and elements of this type must follow this pattern: ddd-dddd, where 'd' represents a digit".
Written in the same syntax as instance documents
less syntax to remember
Object-oriented-ish
Can extend or restrict a type (derive new type definitions on the basis of old ones)
Can express sets, i.e., can define the child elements to occur in any order
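The "ddd-dddd" type and the idea of deriving new types from old ones can be modelled in a few lines. In the toy sketch below, a simple type is a name plus a list of facet checks, and derivation by restriction keeps every inherited facet and adds new ones. The classes, helpers, and the `campusPhone` type are hypothetical illustrations, not an XML Schema implementation. Only restriction is shown here, not extension.

```python
import re
from dataclasses import dataclass
from typing import Callable, Tuple

Facet = Callable[[str], bool]

@dataclass(frozen=True)
class SimpleType:
    """Toy model of an XML Schema simple type: a name plus facet checks."""
    name: str
    facets: Tuple[Facet, ...] = ()

    def restrict(self, name: str, *new_facets: Facet) -> "SimpleType":
        """Derive by restriction: inherit every facet and add the new ones."""
        return SimpleType(name, self.facets + tuple(new_facets))

    def accepts(self, value: str) -> bool:
        return all(facet(value) for facet in self.facets)

def pattern(regex: str) -> Facet:
    """Pattern facet; XML Schema patterns match the whole value (implicitly anchored)."""
    compiled = re.compile(regex)
    return lambda value: compiled.fullmatch(value) is not None

string_type = SimpleType("string")  # no facets: any string is accepted
# "A new type based on the string type ... pattern ddd-dddd"
phone_type = string_type.restrict("phoneNumber", pattern(r"[0-9]{3}-[0-9]{4}"))
# A further (hypothetical) restriction must still satisfy the parent's pattern.
campus_phone = phone_type.restrict("campusPhone", pattern(r"642-.*"))

print(phone_type.accepts("555-1234"), phone_type.accepts("5551234"))        # True False
print(campus_phone.accepts("642-1234"), campus_phone.accepts("555-1234"))   # True False
```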
What is the Problem?
Today people cannot easily create, find, edit, share, and reuse media
Computers don’t understand media content
Media is opaque and data rich
We lack structured representations
Without content representation (metadata), manipulating digital media will remain like word-processing with bitmaps
The Search for Solutions
Current approaches to creating metadata don’t work
Signal-based analysis
Keywords
Natural language
Need standardized metadata framework
Designed for video and rich media data
Human and machine readable and writable
Standardized and scalable
Integrated into media capture, archiving, editing, distribution, and reuse
Standards Overview
Why do we need multimedia standards?
Reliability
Scalability
Interoperability
Layered architecture
De facto standards
Not legislated, but widely adopted
De jure standards
Legislated, but not necessarily widely adopted
Multimedia Standards Process
Market dominance
Microsoft
Examples: Internet Explorer, Windows Media Player
JVC
Examples: VHS
Sony
Examples: MiniDV
Adobe
Examples: PDF
International standards organizations
ISO
MPEG
SMPTE
MPEG Standards
MPEG-1
Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/sec
MPEG-2
Generic coding of moving pictures and associated audio information
MPEG Audio Layer-3 (MP3)
Audio compression
MPEG-4
Standardized technological elements enabling the integration of production, distribution, and content access paradigms
MPEG-4
Represents units of aural, visual, or audiovisual content, called “media objects”
These media objects can be of natural or synthetic origin (this means they could be recorded with a camera or microphone, or generated with a computer)
Describes the composition of these objects to create compound media objects that form audiovisual scenes
Synchronizes the data associated with media objects, so that they can be transported over network channels providing a QoS appropriate for the nature of the specific media objects
Allows interaction with the audiovisual scene generated at the receiver’s end
MPEG Standards
MPEG-7
Describes multimedia content data in a way that supports some degree of interpretation of the information’s meaning, which can be passed onto, or accessed by, a device or computer code
MPEG-21
A normative open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain
MPEG-7 Motivation
Create a standardized multimedia description framework
Enable content-based access to and processing of multimedia information on the basis of descriptions of multimedia content and structure (metadata)
Support range of abstraction levels for metadata from low-level signal characteristics to high-level semantic information
MPEG-7 Query Examples
Play a few notes on a keyboard and retrieve a list of musical pieces similar to the required tune, or images matching the notes in a certain way, e.g., in terms of emotions
Draw a few lines on a screen and find a set of images containing similar graphics, logos, ideograms,...
Define objects, including color patches or textures and retrieve examples among which you select the interesting objects to compose your design
On a given set of multimedia objects, describe movements and relations between objects and so search for animations fulfilling the described temporal and spatial relations
Describe actions and get a list of scenarios containing such actions
Use an excerpt of Pavarotti’s voice to obtain a list of Pavarotti’s records, video clips where Pavarotti is singing, and photographic material portraying Pavarotti
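Several of these queries follow the same query-by-example pattern: describe the example (a tune, a sketch, a colour patch), then rank stored items by how close their descriptions are. MPEG-7 standardizes the descriptions themselves, not the extraction or matching steps. The Euclidean distance and the tiny three-bin colour "histograms" below are therefore illustrative assumptions, and the file names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same number of bins")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_by_example(example: Sequence[float],
                     collection: Dict[str, Sequence[float]],
                     k: int = 3) -> List[str]:
    """Return up to k item ids whose descriptors are closest to the example."""
    scored = sorted((euclidean(example, desc), item_id)
                    for item_id, desc in collection.items())
    return [item_id for _, item_id in scored[:k]]

# Hypothetical 3-bin colour histograms (fraction of red, green, blue pixels).
collection = {
    "sunset.jpg": [0.7, 0.2, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "ocean.jpg": [0.1, 0.2, 0.7],
    "dusk.jpg": [0.6, 0.2, 0.2],
}
print(query_by_example([0.68, 0.2, 0.12], collection, k=2))  # ['sunset.jpg', 'dusk.jpg']
```

Swapping in a different descriptor (a melody contour, a shape outline) or a different distance changes nothing else in this loop. That is one reason the standard can leave matching to competing implementations.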
MPEG-7 Sample Application Areas
Architecture, real estate, and interior design
(e.g., searching for ideas)
Broadcast media selection
(e.g., radio channel, TV channel)
Cultural services
(history museums, art galleries, etc.)
Digital libraries
(e.g., image catalogue, musical dictionary, bio-medical imaging catalogues, film, video and radio archives)
E-Commerce
(e.g., personalized advertising, on-line catalogues, directories of e-shops)
Education
(e.g., repositories of multimedia courses, multimedia search for support material)
Home Entertainment
(e.g., systems for the management of personal multimedia collections, including manipulation of content, e.g., home video editing, searching a game, karaoke)
Investigation services
(e.g., human characteristics recognition, forensics)
Journalism
(e.g., searching speeches of a certain politician using his name, his voice, or his face)
Multimedia directory services
(e.g., yellow pages, tourist information, geographical information systems)
Multimedia editing
(e.g., personalized electronic news service, media authoring)
Remote sensing
(e.g., cartography, ecology, natural resources management)
Shopping
(e.g., searching for clothes that you like)
Social
(e.g., dating services)
Surveillance
(e.g., traffic control, surface transportation, non-destructive testing in hostile environments)
MPEG-7 Scope
MPEG-7 Metadata Framework
Data
“multimedia information that will be described using MPEG-7, regardless of storage, coding, display, transmission, medium, or technology.”
Feature
“a distinctive characteristic of the data [that] signifies something to somebody.”
MPEG-7 Metadata Framework
Descriptor
“A representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.”
Description Scheme
“The structure and semantics of the relationships between its components, which may be both Descriptors and Description Schemes.”
Description Definition Language (XML Schema)
“A language that allows the creation of new Description Schemes, and, possibly, new Descriptors. It also allows the extension and modification of existing Description Schemes.”
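The Descriptor and Description Scheme definitions describe a composite structure: a Description Scheme groups components that may themselves be Descriptors or further Description Schemes. In MPEG-7 that structure is written in the DDL (XML Schema). The Python dataclasses below stand in only to show the nesting. The scheme names and feature values are hypothetical examples.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union

@dataclass
class Descriptor:
    """Representation of one Feature: what is described, plus its value."""
    feature: str
    value: object

@dataclass
class DescriptionScheme:
    """Structure relating components, which may be Descriptors or Description Schemes."""
    name: str
    components: List[Union["Descriptor", "DescriptionScheme"]] = field(default_factory=list)

    def descriptors(self) -> Iterator[Descriptor]:
        """Walk the nesting depth-first and yield every Descriptor it contains."""
        for component in self.components:
            if isinstance(component, Descriptor):
                yield component
            else:
                yield from component.descriptors()

# Hypothetical example: a video-level scheme that nests a shot-level scheme.
shot = DescriptionScheme("ShotDS", [Descriptor("DominantColor", "orange"),
                                    Descriptor("Motion", "pan left")])
video = DescriptionScheme("VideoDS", [Descriptor("Title", "Sunset"), shot])
print([d.feature for d in video.descriptors()])  # ['Title', 'DominantColor', 'Motion']
```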
MPEG-7 Framework
MPEG-7 Standard Parts
MPEG-7 Systems
The binary format for encoding MPEG-7 descriptions and the terminal architecture
MPEG-7 Description Definition Language
The language for defining the syntax of the MPEG-7 Description Tools and for defining new Description Schemes
MPEG-7 Visual
The Description Tools dealing with (only) Visual descriptions
MPEG-7 Audio
The Description Tools dealing with (only) Audio descriptions
MPEG-7 Standard Parts
MPEG-7 Multimedia Description Schemes
The Description Tools dealing with generic features and multimedia descriptions
MPEG-7 Reference Software
A software implementation of relevant parts of the MPEG-7 Standard with normative status
MPEG-7 Conformance Testing
Guidelines and procedures for testing conformance of MPEG-7 implementations
MPEG-7 Extraction and Use of Descriptions
Informative material (in the form of a Technical Report) about the extraction and use of some of the Description Tools (under development)
MPEG-7 Description Tools
MPEG-7 Top Level Hierarchy
MPEG-7 Still Image Description
Referencing Temporal Media
Spatio-Temporal Region
MPEG-7 Video Segments Example
MPEG-7 Segment Relationship Graph
MPEG-7 Conceptual Description
MPEG-7 Summaries
MPEG-7 Collections
MPEG-7 Application Framework
MPEG-7 Applications Today
IBM MPEG-7 Annotation Tool
Assists in annotating video sequences with MPEG-7 metadata
Ricoh MPEG-7 MovieTool
A tool for creating video content descriptions conforming to MPEG-7 syntax interactively
Canon MPEG-7 Speech Recognition engine
A web site that lets you create an MPEG-7 Audio “SpokenContent” description file from an audio file in “wav” format
IBM MPEG-7 Annotation Tool
The IBM MPEG-7 Annotation Tool assists in annotating video sequences with MPEG-7 metadata
Each shot in the video sequence can be annotated with static scene descriptions, key object descriptions, event descriptions, and other lexicon sets
The annotated descriptions are associated with each video shot and are stored as MPEG-7 descriptions in an XML file
Can also open MPEG-7 files in order to display the annotations for the corresponding video sequence
Customized lexicons can be created, saved, downloaded, and updated
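To make "stored as MPEG-7 descriptions in an XML file" concrete, the sketch below uses Python's `xml.etree.ElementTree` to write one `VideoSegment` per annotated shot, with a free-text annotation and a start time and duration. The element names loosely follow MPEG-7 Multimedia Description Schemes, and the namespace is the one used by MPEG-7 (2001) documents. The output is a simplified illustration: it omits required `xsi:type` attributes, is not validated against the MPEG-7 schema, and is not the IBM tool's actual file format. The shot data is hypothetical.

```python
import xml.etree.ElementTree as ET

MPEG7_NS = "urn:mpeg:mpeg7:schema:2001"
ET.register_namespace("", MPEG7_NS)  # serialize as the default namespace

def mpeg7_tag(tag: str) -> str:
    """Qualify a tag name with the MPEG-7 namespace."""
    return f"{{{MPEG7_NS}}}{tag}"

def shots_to_xml(shots) -> str:
    """Build a simplified MPEG-7-style document: one VideoSegment per annotated shot."""
    root = ET.Element(mpeg7_tag("Mpeg7"))
    description = ET.SubElement(root, mpeg7_tag("Description"))
    content = ET.SubElement(description, mpeg7_tag("MultimediaContent"))
    video = ET.SubElement(content, mpeg7_tag("Video"))
    decomposition = ET.SubElement(video, mpeg7_tag("TemporalDecomposition"))
    for shot_id, start, duration, text in shots:
        segment = ET.SubElement(decomposition, mpeg7_tag("VideoSegment"), id=shot_id)
        annotation = ET.SubElement(segment, mpeg7_tag("TextAnnotation"))
        ET.SubElement(annotation, mpeg7_tag("FreeTextAnnotation")).text = text
        media_time = ET.SubElement(segment, mpeg7_tag("MediaTime"))
        ET.SubElement(media_time, mpeg7_tag("MediaTimePoint")).text = start
        ET.SubElement(media_time, mpeg7_tag("MediaDuration")).text = duration
    return ET.tostring(root, encoding="unicode")

# Hypothetical annotations: (shot id, start time, duration, free text).
shots = [("shot1", "T00:00:00", "PT5S", "Sun setting over the bay"),
         ("shot2", "T00:00:05", "PT8S", "Crowd on the pier")]
xml_text = shots_to_xml(shots)
print(xml_text)
```

Because the result is plain XML, the same file can be reopened later to display the annotations, as the tool does.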
Ricoh MovieTool
Creates an MPEG-7 description by loading video data
Provides visual clues to aid the user in creating the structure of the video
Automatically reflects the structure in the MPEG-7 descriptions
Visually shows the relationship between the structure and MPEG-7 descriptions
Presents candidate tags to help choose appropriate MPEG-7 tags
Validates the MPEG-7 descriptions against the MPEG-7 schema
Can describe all metadata defined in MPEG-7
Can reflect any future changes and extensions made to the MPEG-7 schema
Canon MPEG-7 ASR Tool
MPEG-7 Resources
http://mpeg.telecomitalialab.com/
http://www.mpeg-industry.com/
http://www.josseybass.com/WileyCDA/WileyTitle/productCd-0471486787.html
MPEG-7 Future
New application-specific profiles
Integration into media production and reuse cycle
Automated metadata creation in devices
Use of MPEG-7 metadata in multimedia applications
MPEG-21
Discussion Questions (MPEG-7)
Lisa de Larios-Heiman on MPEG-7
MPEG-7 is generic, “so not all descriptive tools are necessary for all applications.” Do you believe MPEG-7, as described in the two papers, has avoided being too generic? Could each application be too specific, affecting interoperability?
The developers of MPEG-7 state that they are leaving the best methods for feature extraction to be decided in the marketplace. Similarly, “competition and innovation will produce the best results” for the consumption-end of the chain. Their papers do not express any concern that those companies that already dominate the software marketplace might dominate this niche as well, either providing inferior products or developing proprietary flavors of MPEG-7. Either of these two scenarios would affect the success of MPEG-7 and interoperability of applications, but are they very likely?
Discussion Questions (MPEG-7)
Megan Finn on MPEG-7
What obstacles will there be to the adoption of MPEG-7? For example, people will be required to learn a visual description tool. What can be done to enable technology adoption? Who will be the first groups to adopt MPEG-7?
Media Streams and MPEG-7 have many of the same goals. How could Media Streams use some of the description tools of MPEG-7 (and vice versa)? For example, it seems that the Audio-Visual Description Scheme could use Media Streams' annotation system.
Discussion Questions (MPEG-7)
Jesse Mendelsohn on MPEG-7
The authors say that the actual methods of extracting features are not part of the MPEG-7 standard because it is not within the standard's scope and because competition among providers will cause innovation in extraction methods. Does this mean the Description format of the MPEG-7 standard runs the risk of being deemed difficult, too detailed, too abstract, or even impossible to comply with once the creation of applications for feature extraction is attempted? Otherwise stated, is the creation of a standard this detailed comparable to inventing the car before the wheel?
Who is MPEG-7 specifically for? What groups of people are going to be capable of adopting and using it? Images, moving images, sound, and semantics are part not only of motion pictures but of many other things as well (e.g., medical imaging). Is the MPEG-7 standard described here powerful enough that specialized communities can adapt it to their own needs?
Discussion Questions (MPEG-7)
Jeannie Yang on MPEG-7
An important factor in search success or search quality nowadays is ranking, or how relevant the search results are to the search terms. Is this concept applicable to searching through MPEG-7 media content at the highest abstraction level, semantic information? If so, how does one determine ranking for semantic content? Who is to say that one video of the “sun setting” is more relevant or correct than another video of the “sun setting”? Who determines if one video is more “passionate” in its depiction of the sun setting than another?
Also, if a ranking system for semantic content does prevail, is there a danger of a dwindling supply of creative interpretations because media reuse would always use the same image? For example, if a stock image of a “setting sun” is always reused, it becomes the de facto image and no more images of the “setting sun” will be taken or found. On the other hand, if the ranking or relevance concept is not applicable to searching through MPEG-7 media at the semantic level, would searching still be a useful application? How does MPEG-7 make searching for semantic content easier in this case, when there may be a thousand videos of the sun setting?
Discussion Questions (MPEG-7)
Joseph Hall on MPEG-7
Will businesses have to pay royalties to be able to use the MPEG-7 standard? How does this affect how this "standard" penetrates the global community? (A corollary: what's the impetus behind charging royalties for the use of standards?)
Phone Project Presentations
Flamenco Markup and Browsing of Photos
COMING SOON!
Readings for Next Time
Introduction to IR and the Search Process (RRL)
MIR Ch. 1
Social Navigation of Information Space (Alan Munro, Kristina Höök, and David Benyon)
Where Did You Put It? Issues in the Design and Use of a Group Memory (Lucy Berlin et al.)