MPEG-7 is a multimedia content description standard, formally called the Multimedia Content Description Interface and standardized as ISO/IEC 15938.[1][2][3][4] An MPEG-7 description is associated with the content it describes, allowing fast and efficient searching for material of interest to the user. Unlike MPEG-1, MPEG-2 and MPEG-4, it is therefore not a standard for the actual encoding of moving pictures and audio. It uses XML to store metadata, which can be attached to timecode in order to tag particular events or, for example, to synchronise lyrics to a song.
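
As a rough illustration of XML metadata attached to timecode, the following Python sketch builds a small description and searches it. The element names (Mpeg7, Description, Video, VideoSegment, TextAnnotation, FreeTextAnnotation, MediaTime, MediaTimePoint, MediaDuration) are borrowed from MPEG-7, but the nesting is deliberately simplified and the output is not validated against the MPEG-7 schema; the namespace URI and the helper names build_description and find_segments are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed for illustration; the output is NOT schema-validated.
MPEG7_NS = "urn:mpeg:mpeg7:schema:2001"


def _tag(name):
    """Return the namespace-qualified tag for an element name."""
    return f"{{{MPEG7_NS}}}{name}"


def build_description(events):
    """Build a simplified MPEG-7-style tree from (label, start, duration) tuples."""
    ET.register_namespace("", MPEG7_NS)
    root = ET.Element(_tag("Mpeg7"))
    description = ET.SubElement(root, _tag("Description"))
    video = ET.SubElement(description, _tag("Video"))
    for index, (label, start, duration) in enumerate(events):
        segment = ET.SubElement(video, _tag("VideoSegment"), id=f"seg{index}")
        annotation = ET.SubElement(segment, _tag("TextAnnotation"))
        ET.SubElement(annotation, _tag("FreeTextAnnotation")).text = label
        media_time = ET.SubElement(segment, _tag("MediaTime"))
        ET.SubElement(media_time, _tag("MediaTimePoint")).text = start
        ET.SubElement(media_time, _tag("MediaDuration")).text = duration
    return root


def find_segments(root, keyword):
    """Return (segment id, start time) for segments whose annotation mentions keyword."""
    ns = {"m": MPEG7_NS}
    hits = []
    for segment in root.iterfind(".//m:VideoSegment", ns):
        text = segment.findtext(
            "m:TextAnnotation/m:FreeTextAnnotation", default="", namespaces=ns
        )
        if keyword.lower() in text.lower():
            start = segment.findtext("m:MediaTime/m:MediaTimePoint", namespaces=ns)
            hits.append((segment.get("id"), start))
    return hits
```

Because the description lives apart from the media, a search such as find_segments(root, "goal") only has to read the XML, never the video itself.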

It was designed to standardize:

  • a set of Description Schemes ("DS") and Descriptors ("D")
  • a language to specify these schemes, called the Description Definition Language ("DDL")
  • a scheme for coding the description

The combination of MPEG-4 and MPEG-7 has sometimes been referred to as MPEG-47.[5]


MPEG-7 is intended to complement the previous MPEG standards by standardizing multimedia content descriptions: it represents information about the content, not the content itself ("the bits about the bits"). MPEG-7 can be used independently of the other MPEG standards; a description might even be attached to an analog movie. The representation defined within MPEG-4, i.e. the representation of audio-visual data in terms of objects, is nevertheless very well suited to what will be built on the MPEG-7 standard, because this representation is basic to the process of categorization. In addition, MPEG-7 descriptions could be used to improve the functionality of previous MPEG standards.

With the MPEG-7 tools, an MPEG-7 Description can be built and deployed. According to the requirements document, “a Description consists of a Description Scheme (structure) and the set of Descriptor Values (instantiations) that describe the Data.” A Descriptor Value is “an instantiation of a Descriptor for a given data set (or subset thereof).” The Descriptor is the syntactic and semantic definition of the content. Extraction algorithms are outside the scope of the standard because their standardization isn’t required to allow interoperability.


MPEG-7 (ISO/IEC 15938) consists of several parts, each covering a certain aspect of the whole specification.

MPEG-7 Parts[4][6]

| Part | Number | First public release date (first edition) | Latest public release date (edition) | Latest amendment | Title | Description |
|---|---|---|---|---|---|---|
| Part 1 | ISO/IEC 15938-1 | 2002 | 2002 | 2006 | Systems | The architectural framework of MPEG-7, the carriage of MPEG-7 content - TeM (Textual format for MPEG-7) and the binary format for MPEG-7 descriptions (BiM)[7] |
| Part 2 | ISO/IEC 15938-2 | 2002 | 2002 | | Description definition language | |
| Part 3 | ISO/IEC 15938-3 | 2002 | 2002 | 2010 | Visual | |
| Part 4 | ISO/IEC 15938-4 | 2002 | 2002 | 2006 | Audio | |
| Part 5 | ISO/IEC 15938-5 | 2003 | 2003 | 2015 | Multimedia description schemes | |
| Part 6 | ISO/IEC 15938-6 | 2003 | 2003 | 2011 | Reference software | |
| Part 7 | ISO/IEC 15938-7 | 2003 | 2003 | 2011 | Conformance testing | |
| Part 8 | ISO/IEC TR 15938-8 | 2002 | 2002 | 2011 | Extraction and use of MPEG-7 descriptions | |
| Part 9 | ISO/IEC 15938-9 | 2005 | 2005 | 2012 | Profiles and levels | |
| Part 10 | ISO/IEC 15938-10 | 2005 | 2005 | | Schema definition | |
| Part 11 | ISO/IEC TR 15938-11 | 2005 | 2005 | 2012 | MPEG-7 profile schemas | |
| Part 12 | ISO/IEC 15938-12 | 2008 | 2012 | | Query format | |
| Part 13 | ISO/IEC 15938-13 | 2015 | 2015 | | Compact descriptors for visual search | |

Relation between description and content

[Figure: Independence between description and content]

One MPEG-7 architectural requirement is that the description must be separate from the audiovisual content.

On the other hand, there must be a relation between the content and the description. Thus the description is multiplexed with the content itself, as the figure illustrates.

MPEG-7 tools

[Figure: Relation between different tools and elaboration process of MPEG-7]

MPEG-7 uses the following tools:

  • Descriptor (D): A representation of a feature, defined syntactically and semantically. A single object may be described by several descriptors.
  • Description Schemes (DS): Specify the structure and semantics of the relations between their components, which can be descriptors (D) or other description schemes (DS).
  • Description Definition Language (DDL): An XML-based language used to define the structural relations between descriptors. It allows the creation and modification of description schemes and also the creation of new descriptors (D).
  • System tools: These tools deal with binarization, synchronization, transport and storage of descriptors. They also deal with Intellectual Property protection.

The figure shows the relation between the MPEG-7 tools.
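
The relation between descriptors and description schemes can be pictured as a tree: a scheme holds descriptors and, recursively, other schemes. The Python sketch below models only that containment idea; the class names, fields and sample values are assumptions made for illustration and do not reproduce the types defined by the standard or by the DDL.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Descriptor:
    """A feature representation: a name plus a value for one data set."""
    name: str
    value: object


@dataclass
class DescriptionScheme:
    """A named structure whose components are descriptors or nested schemes."""
    name: str
    components: List[Union["Descriptor", "DescriptionScheme"]] = field(
        default_factory=list
    )

    def descriptors(self):
        """Yield every descriptor in this scheme, depth first."""
        for component in self.components:
            if isinstance(component, DescriptionScheme):
                yield from component.descriptors()
            else:
                yield component


# Hypothetical description of one shot: a color feature plus a nested motion scheme.
shot = DescriptionScheme(
    "Shot",
    [
        Descriptor("DominantColor", (200, 30, 30)),
        DescriptionScheme("Motion", [Descriptor("CameraMotion", "pan_left")]),
    ],
)
```

Walking shot.descriptors() visits the same object through several descriptors, which mirrors the point that one object may be described by more than one feature.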

MPEG-7 applications

Many applications and application domains can benefit from the MPEG-7 standard. A few examples are:

  • Digital library: Image/video catalogue, musical dictionary.
  • Multimedia directory services: e.g. yellow pages.
  • Broadcast media selection: Radio channel, TV channel.
  • Multimedia editing: Personalized electronic news service, media authoring.
  • Security services: Traffic control, production chains...
  • E-business: Searching process of products.
  • Cultural services: Art-galleries, museums...
  • Educational applications.
  • Biomedical applications.
  • Intelligent multimedia applications that leverage low-level multimedia semantics via formal representation and automated reasoning.[8]

Software and demonstrators for MPEG-7

  • Caliph & Emir: Annotation and retrieval of images based on MPEG-7 (GPL). Creates MPEG-7 XML files.[9]
  • C# Implementation: Open Source implementation of the MPEG-7 descriptors in C#.
  • Frameline 47 Video Notation: Frameline 47 from Versatile Delivery Systems. The first commercial MPEG-7 application, Frameline 47 uses a content schema based on MPEG-7 to notate entire video files, or segments and groups of segments within a video file, according to the MPEG-7 convention (commercial tool)
  • Eptascape ADS200: Uses a real-time MPEG-7 encoder on an analog camera video signal to identify events of interest, especially in surveillance applications (commercial tool)
  • IBM VideoAnnEx Annotation Tool: Creating MPEG-7 documents for video streams describing structure and giving keywords from a controlled vocabulary (binary release, restrictive license)
  • iFinder Medienanalyse- und Retrievalsystem: Metadata extraction and search engine based on MPEG-7 (commercial tool)
  • MPEG-7 Audio Encoder: Creating MPEG-7 documents for audio documents describing low level audio characteristics (binary & source release, Java, GPL)
  • MPEG-7 Visual Descriptor Extraction: Software to extract MPEG-7 visual descriptors from images and image regions.
  • XM Feature Extraction Web Service: The functionalities of the eXperimentation Model(XM) are made available via web service interface to enable automatic MPEG-7 low-level visual description characterization of images.
  • TU Berlin MPEG-7 Audio Analyzer (Web-Demo): Creating MPEG-7 documents (XML) for audio documents (WAV, MP3). All 17 MPEG-7 low level audio descriptors are implemented (commercial)
  • TU Berlin MPEG-7 Spoken Content Demonstrator (Web-Demo): Creating MPEG-7 documents (XML) with SpokenContent description from an input speech signal (WAV, MP3) (commercial)
  • MP7JRS C++ Library: Complete MPEG-7 implementation of parts 3, 4 and 5 (visual, audio and MDS) by Joanneum Research Institute for Information and Communication Technologies - Audiovisual Media Group.
  • BilVideo-7: MPEG-7 compatible, distributed video indexing and retrieval system, supporting complex, multimodal, composite queries; developed by Bilkent University Multimedia Database Group (BILMDG).
  • UniSay: Sophisticated Post-production file analysis and audio processing based on MPEG-7.



The MPEG-7 standard was originally written in XML Schema (XSD), which constitutes semi-structured data. For example, the running time of a movie annotated using MPEG-7 in XML is machine-readable: software agents will know that the number expressing the running time is a positive integer. Such data is not machine-interpretable, however (it cannot be understood by agents), because it does not convey semantics (meaning); this is known as the “Semantic Gap.” To address this issue, there were many attempts to map the MPEG-7 XML Schema to the Web Ontology Language (OWL), yielding structured-data equivalents of the terms of the MPEG-7 standard (MPEG-7Ontos, COMM, SWIntO, etc.). However, these mappings did not really bridge the “Semantic Gap,” because low-level video features alone are inadequate for representing video semantics.[10] In other words, annotating an automatically extracted video feature, such as color distribution, does not provide the meaning of the actual visual content.[11]



  • B.S. Manjunath (Editor), Philippe Salembier (Editor), and Thomas Sikora (Editor): Introduction to MPEG-7: Multimedia Content Description Interface. Wiley & Sons, April 2002 - ISBN 0-471-48678-7
  • Harald Kosch: Distributed Multimedia Database Technologies Supported by MPEG-7 and MPEG-21. CRC Press, January 2004 - ISBN 0-8493-1854-8
  • Giorgos Stamou (Editor) and Stefanos Kollias (Editor): Multimedia Content and the Semantic Web: Standards, Methods and Tools. Wiley & Sons, May 2005 - ISBN 0-470-85753-6
  • Hyoung-Gook Kim, Nicolas Moreau, and Thomas Sikora: MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval. Wiley & Sons, October 2005 - ISBN 0-470-09334-X
  1. ISO. "ISO/IEC 15938-1:2002 - Information technology -- Multimedia content description interface -- Part 1: Systems". Retrieved 2009-10-31.
  2. MPEG. "About MPEG - Achievements". Archived from the original on July 8, 2008. Retrieved 2009-10-31.
  3. MPEG. "Terms of Reference". Archived from the original on February 21, 2010. Retrieved 2009-10-31.
  4. MPEG. "MPEG standards - Full list of standards developed or under development". Archived from the original on April 20, 2010. Retrieved 2009-10-31.
  5. NetworkDictionary. "Complete Protocol dictionary, glossary and reference - M". Retrieved 2009-12-26.
  6. ISO/IEC JTC 1/SC 29 (2009-10-30). "MPEG-7 (Multimedia content description interface)". Archived from the original on 2013-12-31. Retrieved 2009-11-10.
  7. ISO/IEC JTC1/SC29/WG11 (October 2004). "MPEG-7 Overview (version 10)". Retrieved 2009-11-01.
  8. "MPEG-7 Ontology". Retrieved 29 June 2017.
  9. Lux, Mathias. "Caliph & Emir: MPEG-7 photo annotation and retrieval." Proceedings of the 17th ACM international conference on Multimedia. ACM, 2009.
  10. Sikos, Leslie F.; Powers, David M.W. (2015). "Knowledge-Driven Video Information Retrieval with LOD": 35–37. doi:10.1145/2810133.2810141.
  11. Boll, Susanne; Klas, Wolfgang; Sheth, Amit (1998). "Overview on Using Metadata to Manage Multimedia Data". Using Metadata to Integrate and Apply Digital Media. McGraw-Hill. p. 3. ISBN 978-0070577350.



AudioID

AudioID is a commercial technology for automatically identifying audio material using acoustic fingerprints. Audio data is recognized automatically and associated information (track or artist name, for example) is provided in real time. The technology was developed by the Fraunhofer Institute for Digital Media Technology (IDMT). The IDMT is managed by Prof. Karlheinz Brandenburg, who led the development of the MP3 format. AudioID technology is a part of the international ISO/IEC MPEG-7 audio standard of the Moving Picture Experts Group (MPEG). In 2005, the German company Magix AG acquired patents for the technology. Mufin is a commercial product based on AudioID.


BiM

For the architectural modelling technique, see Building information modeling.

BiM (Binary MPEG format for XML) is an international standard defining a generic binary format for encoding XML documents.

The technical specifications for BiM are found in MPEG systems technologies - Part 1: Binary MPEG format for XML (ISO/IEC 23001-1). It is also known as MPEG-B Part 1.

Color layout descriptor

A color layout descriptor (CLD) is designed to capture the spatial distribution of color in an image. The feature extraction process consists of two parts: grid-based representative color selection, and discrete cosine transform (DCT) with quantization.

Color is the most basic quality of visual content, so colors can be used to describe and represent an image. The MPEG-7 standard tested candidate procedures for describing color and selected those that gave the most satisfactory results. It proposes several color descriptors; one of them is the CLD, which also permits describing the color relation between sequences or groups of images.

The CLD captures the spatial layout of the representative colors on a grid superimposed on a region or image. The representation is based on coefficients of the DCT. The descriptor is very compact, which makes it highly efficient in fast browsing and search applications. It can be applied to still images as well as to video segments.
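
The two extraction stages described above (grid-based color selection, then DCT) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the normative extraction procedure: the 8x8 grid, the YCbCr conversion matrix, the zigzag read-out and the default of 6 luminance and 3 + 3 chrominance coefficients are choices made for this example, and simple rounding stands in for the standard's quantization step.

```python
import math

GRID = 8  # assumed grid size: 8x8 representative colors


def representative_colors(image):
    """Average the RGB pixels in each cell of a GRID x GRID partition.

    `image` is a list of rows of (r, g, b) tuples, at least GRID x GRID pixels.
    """
    height, width = len(image), len(image[0])
    grid = []
    for gy in range(GRID):
        y0, y1 = gy * height // GRID, (gy + 1) * height // GRID
        row = []
        for gx in range(GRID):
            x0, x1 = gx * width // GRID, (gx + 1) * width // GRID
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3)))
        grid.append(row)
    return grid


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601-style conversion (an assumed choice of matrix)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr


def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block (direct, unoptimized form)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def zigzag(block):
    """Read a square block in zigzag order, lowest frequencies first."""
    n = len(block)
    order = sorted(
        ((y, x) for y in range(n) for x in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )
    return [block[y][x] for y, x in order]


def color_layout(image, n_y=6, n_c=3):
    """Return [Y, Cb, Cr] coefficient lists for the image."""
    grid = representative_colors(image)
    ycbcr = [[rgb_to_ycbcr(*cell) for cell in row] for row in grid]
    descriptor = []
    for channel, keep in zip(range(3), (n_y, n_c, n_c)):
        plane = [[cell[channel] for cell in row] for row in ycbcr]
        coefficients = zigzag(dct_2d(plane))[:keep]
        # Plain rounding is a stand-in for the real quantization step.
        descriptor.append([round(c) for c in coefficients])
    return descriptor
```

Keeping only the first few zigzag coefficients is what makes the descriptor compact: two images can then be compared by a distance over a dozen numbers instead of over their pixels.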

Description Definition Language

DDL (Description Definition Language) is part of the MPEG-7 standard. It provides users with a set of tools to create their own Description Schemes (DSs) and Descriptors (Ds). DDL defines the syntax rules to define, combine, extend and modify Description Schemes and Descriptors.

Douglas Armati

Douglas "Doug" Armati (born 1950) is an Australian writer, researcher, consultant, business development executive and technical diplomat.

Doug Armati undertook research work on digital copyright issues at Murdoch University in Western Australia in 1990–91 before taking a role in international efforts to standardize the identification of digital objects.

In October 1994, at the Frankfurt Book Fair, he gave a speech to the International Association of Scientific, Technical, and Medical Publishers on the importance and potential economic benefits of a uniform approach to the identification of digitized copyright content. He then wrote two pivotal reports in 1995: the first for the STM group on Information Identification and the second for the Association of American Publishers on Uniform File Identifiers. Armati's work for these global publishing bodies was an important catalyst for the birth of the Digital Object Identifier Foundation.

His 1996 speech at the joint ICSU/UNESCO Electronic Publishing in Science conference in Paris on "Tools and standards for protection, control and archiving" and his book later that year on "Intellectual Property in Electronic Environments" both helped frame the legal, scientific and technical debate in the emerging field of Digital Rights Management. Armati was also part of the digital copyright experts group that worked closely with the World Intellectual Property Organization in the period leading up to the ratification of the WIPO Copyright Treaty in December 1996.

In 1996 Armati joined InterTrust Technologies, the leading company in the then nascent field of Digital Rights Management, where he was a member of the leadership group through the company's 1999 IPO until its sale to Sony and Philips in early 2003.

During his time with InterTrust, Armati was also active in international standards groups, having been a vice-chairman of the Recording Industry Association of America's international Secure Digital Music Initiative, a board member of the Open eBook Forum (now the International Digital Publishing Forum) and a significant contributor to the Moving Picture Experts Group (MPEG), particularly in the development of a standard for the management and protection of intellectual property in MPEG-4, MPEG-7 and MPEG-21.

MP7 (disambiguation)

MP7 or MP 7 may refer to:

  • Heckler & Koch MP7, a German submachine gun
  • A Mammal Paleogene zone during the Oligocene geological period
  • MPEG-7, a multimedia content description standard
  • Mario Party 7, a video game


MPEG-21

The MPEG-21 standard, from the Moving Picture Experts Group, aims at defining an open framework for multimedia applications. MPEG-21 is ratified in the standards ISO/IEC 21000 - Multimedia framework (MPEG-21).

MPEG-21 is based on two essential concepts:

  • the definition of a Digital Item (a fundamental unit of distribution and transaction)
  • users interacting with Digital Items

Digital Items can be considered the kernel of the Multimedia Framework, and users can be considered those who interact with them inside it. At its most basic level, MPEG-21 provides a framework in which one user interacts with another, and the object of that interaction is a Digital Item. The main objective of MPEG-21 is therefore to define the technology needed to support users in exchanging, accessing, consuming, trading or manipulating Digital Items in an efficient and transparent way.

MPEG-21 Part 9: File Format defined the storage of an MPEG-21 Digital Item in a file format based on the ISO base media file format, with some or all of Digital Item's ancillary data (such as movies, images or other non-XML data) within the same file. It uses filename extensions .m21 or .mp21 and MIME type application/mp21.


MPEG-3

MPEG-3 is the designation for a group of audio and video coding standards agreed upon by the Moving Picture Experts Group (MPEG) and designed to handle HDTV signals at 1080p in the range of 20 to 40 megabits per second. MPEG-3 was launched to address the need for an HDTV standard while work on MPEG-2 was underway, but it was soon discovered that MPEG-2, at high data rates, would accommodate HDTV. Thus, in 1992 HDTV was included as a separate profile in the MPEG-2 standard and MPEG-3 was rolled into MPEG-2.

MPEG-4 Part 11

See also: Banded Iron Formation

MPEG-4 Part 11, Scene description and application engine, was published as ISO/IEC 14496-11 in 2005. MPEG-4 Part 11 is also known as BIFS, XMT, MPEG-J. It defines:

  • the coded representation of the spatio-temporal positioning of audio-visual objects as well as their behaviour in response to interaction (scene description);
  • the coded representation of synthetic two-dimensional (2D) or three-dimensional (3D) objects that can be manifested audibly or visually;
  • the Extensible MPEG-4 Textual (XMT) format - a textual representation of the multimedia content described in MPEG-4 using the Extensible Markup Language (XML);
  • a system level description of an application engine (format, delivery, lifecycle, and behaviour of downloadable Java byte code applications). (The MPEG-J Graphics Framework eXtensions (GFX) is defined in MPEG-4 Part 21 - ISO/IEC 14496-21.)

Binary Format for Scenes (BIFS) is a binary format for two- or three-dimensional audiovisual content. It is based on VRML and part 11 of the MPEG-4 standard.

BIFS is the MPEG-4 scene description protocol used to compose MPEG-4 objects, to describe interaction with them and to animate them.

MPEG-4 Binary Format for Scenes (BIFS) is used in Digital Multimedia Broadcasting (DMB).

The XMT framework accommodates substantial portions of SMIL, W3C Scalable Vector Graphics (SVG) and X3D (the successor to VRML). Such a representation can be directly played back by a SMIL or VRML player, but can also be binarised to become a native MPEG-4 representation that can be played by an MPEG-4 player. Another bridge has been created with BiM (Binary MPEG format for XML).

MPEG Industry Forum

The MPEG Industry Forum (MPEGIF) is a non-profit consortium dedicated to "further the adoption of MPEG Standards, by establishing them as well accepted and widely used standards among creators of content, developers, manufacturers, providers of services, and end users."

The group is involved in many tasks, including promotion of MPEG standards (particularly MPEG-4, MPEG-4 AVC / H.264, MPEG-7 and MPEG-21); developing MPEG certification for products; organising educational events; and collaborating on development of new de facto MPEG standards.

MPEGIF, founded in 2000, has played a significant role in facilitating the widespread adoption and deployment of MPEG-4 AVC/H.264 as the industry’s standard video compression technology, powering next generation television and most mainstream content delivery and consumption applications, including packaged media. MPEGIF serves as a single point of information on technology, products and services for these standards; offers interoperability testing, a conformance program and marketing activities; and supports over 50 international trade shows and conferences per year.

The key activities of the forum are structured via three main Committees:

  • Technology & Engineering
  • Interoperability & Compliance
  • Marketing & Communication

Metadata standard

A metadata standard is a requirement intended to establish a common understanding of the meaning or semantics of data, to ensure its correct and proper use and interpretation by its owners and users. To achieve this common understanding, a number of characteristics, or attributes, of the data have to be defined; these are also known as metadata.

Moving Picture Experts Group

The Moving Picture Experts Group (MPEG) is a working group of authorities that was formed by ISO and IEC to set standards for audio and video compression and transmission. It was established in 1988 on the initiative of Hiroshi Yasuda (Nippon Telegraph and Telephone) and Leonardo Chiariglione, group Chair since its inception. The first MPEG meeting was held in May 1988 in Ottawa, Canada. As of late 2005, MPEG had grown to include approximately 350 members per meeting from various industries, universities, and research institutions. MPEG's official designation is ISO/IEC JTC 1/SC 29/WG 11 – Coding of moving pictures and audio (ISO/IEC Joint Technical Committee 1, Subcommittee 29, Working Group 11).

Multimedia information retrieval

Multimedia information retrieval (MMIR or MIR) is a research discipline of computer science that aims at extracting semantic information from multimedia data sources. Data sources include directly perceivable media such as audio, image and video; indirectly perceivable sources such as text, semantic descriptions and biosignals; and non-perceivable sources such as bioinformation and stock prices. The methodology of MMIR can be organized in three groups:

  • Methods for the summarization of media content (feature extraction). The result of feature extraction is a description.
  • Methods for the filtering of media descriptions (for example, elimination of redundancy).
  • Methods for the categorization of media descriptions into classes.
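
The three groups of methods can be pictured as stages of a small pipeline. The Python sketch below is a toy under stated assumptions: the chosen features (mean, spread, range), the redundancy filter (dropping dimensions that never vary) and the nearest-centroid classifier are illustrative stand-ins, and none of the function names come from an MMIR library.

```python
import math
from statistics import mean, pstdev


def extract_features(samples):
    """Summarization: reduce a sequence of numbers to a short description."""
    return (mean(samples), pstdev(samples), max(samples) - min(samples))


def drop_constant_features(descriptions):
    """Filtering: remove dimensions that have the same value in every description."""
    keep = [
        i for i in range(len(descriptions[0]))
        if len({d[i] for d in descriptions}) > 1
    ]
    return [tuple(d[i] for i in keep) for d in descriptions], keep


def nearest_centroid(description, centroids):
    """Categorization: return the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(description, centroids[label]))


# Hypothetical class centroids built from two example signals.
centroids = {
    "quiet": extract_features([0, 1, 0, 1]),
    "loud": extract_features([0, 10, 0, 10]),
}
```

A new signal is first summarized with extract_features and then assigned a class with nearest_centroid(description, centroids); the filtering stage would sit between the two when descriptions carry redundant dimensions.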

Query by humming

Query by humming (QbH) is a music retrieval system that branches off the original classification systems of title, artist, composer, and genre. It normally applies to songs or other music with a distinct single theme or melody. The system involves taking a user-hummed melody (input query) and comparing it to an existing database. The system then returns a ranked list of music closest to the input query.

One example of this would be a system involving a portable media player with a built-in microphone that allows for faster searching through media files.

The MPEG-7 standard includes provisions for QbH music searches.

Examples of QbH systems include ACRCloud, SoundHound, Musipedia, and Tunebot.
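
The matching step (compare a hummed melody with a database and return a ranked list) can be sketched as follows. This is one simple approach under stated assumptions, not the method used by the systems named above or prescribed by MPEG-7: melodies are given as lists of MIDI-style note numbers, they are reduced to pitch intervals so that the key of the humming does not matter, and similarity is measured with plain edit distance.

```python
def intervals(pitches):
    """Convert absolute pitches to successive intervals (key-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x with y
            ))
        previous = current
    return previous[-1]


def rank_by_humming(query_pitches, database):
    """Return titles from `database` ordered from closest to farthest match.

    `database` maps a title to its melody as a list of pitches.
    """
    query = intervals(query_pitches)
    scored = [
        (edit_distance(query, intervals(pitches)), title)
        for title, pitches in database.items()
    ]
    return [title for _, title in sorted(scored)]
```

Working on intervals rather than absolute pitches means a query hummed a tone too high still matches, and edit distance tolerates a few wrong or missing notes.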


Singingfish

Singingfish was an audio/video search engine that powered audio and video search for Windows Media Player, RealOne/RealPlayer, Real Guide, AOL Search, Dogpile and Metacrawler, among others. Launched in 2000, it was one of the earliest and longest-lived search engines dedicated to multimedia content. It was acquired by AOL in 2003 and slowly folded into the AOL search offerings; all web hits from RMC TV to Singingfish were redirected to AOL Video, and as of February 2007 Singingfish had ceased to exist as a separate service.

Singingfish powered audio search continues to live on for the time being at AOL Search and other AOL properties. However, little if any development has been done since August 2006. Singingfish powered video search is no longer publicly available and is now being re-directed to AOL Video Search.

Uniform Resource Name

A Uniform Resource Name (URN) is a Uniform Resource Identifier (URI) that uses the urn scheme.
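
A URN has the general shape urn:<NID>:<NSS>, a namespace identifier followed by a namespace-specific string. The Python sketch below splits one apart with the standard library; the helper name split_urn is an assumption made for this example, and the function ignores optional components such as queries and fragments.

```python
from urllib.parse import urlparse


def split_urn(urn):
    """Split 'urn:<NID>:<NSS>' into (namespace identifier, namespace-specific string)."""
    parsed = urlparse(urn)
    if parsed.scheme != "urn":  # urlparse lower-cases the scheme
        raise ValueError(f"not a URN: {urn!r}")
    nid, separator, nss = parsed.path.partition(":")
    if not (nid and separator and nss):
        raise ValueError(f"malformed URN: {urn!r}")
    # The namespace identifier is case-insensitive, so normalize it.
    return nid.lower(), nss
```

For example, split_urn("urn:isbn:0-471-48678-7") separates the isbn namespace from the book number.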

Vinod Vasudevan

Dr Vinod V Vasudevan is the Group CEO of Flytxt. He is responsible for driving the company's strategic direction and overall management. Vasudevan has over 20 years of experience in the software and telecommunications sectors, spanning India, Asia Pacific, Europe, and North America.

Throughout his career, he has been involved in the development and commercialization of innovative products and technologies, strategic consultation and planning for startup companies, technology and business planning and business development. He has led and managed several successful projects involving large scale system and network design, development and roll-out worth several hundreds of millions of dollars.

Visual descriptor

In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents in images, videos, or algorithms or applications that produce such descriptions. They describe elementary characteristics such as the shape, the color, the texture or the motion, among others.


This page is based on a Wikipedia article written by its contributors.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.