MPEG-D is a group of standards for audio coding formally known as ISO/IEC 23003 - MPEG audio technologies, published since 2007.[1][2][3][4]

MPEG-D consists of four parts:

  Part 1: MPEG Surround (ISO/IEC 23003-1)
  Part 2: Spatial Audio Object Coding (SAOC, ISO/IEC 23003-2)
  Part 3: Unified Speech and Audio Coding (USAC, ISO/IEC 23003-3)
  Part 4: Dynamic Range Control (ISO/IEC 23003-4)

References


  1. ^ MPEG. "MPEG standards - Full list of standards developed or under development". Archived from the original on 2010-04-20. Retrieved 2010-02-09.
  2. ^ MPEG. "Terms of Reference". Archived from the original on 2010-02-21. Retrieved 2010-02-09.
  3. ^ "Working documents, MPEG-D (MPEG Audio Technologies)". MPEG. Archived from the original on 2010-02-21. Retrieved 2010-02-09.
  4. ^ ISO/IEC JTC 1/SC 29 (2009-12-30). "Programme of Work (Allocated to SC 29/WG 11) - MPEG-D". Archived from the original on 2013-12-31. Retrieved 2009-12-30.
  5. ^ ISO (2007-01-29). "ISO/IEC 23003-1:2007 - Information technology -- MPEG audio technologies -- Part 1: MPEG Surround". ISO. Retrieved 2009-10-24.
  6. ^ ISO/IEC JTC1/SC29/WG11 (July 2005). "Tutorial on MPEG Surround Audio Coding". Retrieved 2010-02-09.
  7. ^ ISO (2010-10-06). "ISO/IEC 23003-2 - Information technology -- MPEG audio technologies -- Part 2: Spatial Audio Object Coding (SAOC)". Retrieved 2011-07-18.
  8. ^ "ISO/IEC DIS 23003-3 - Information technology -- MPEG audio technologies -- Part 3: Unified speech and audio coding". 2011-02-15. Retrieved 2011-07-18.
  9. ^ ISO (2015). "ISO/IEC 23003-4 - Information technology -- MPEG audio technologies -- Part 4: Dynamic Range Control". Retrieved 2016-12-21.

ECMA-407 is the world's first approved international 3D audio standard for the unrestricted delivery of channel-based, object-based and scene-based signals up to NHK 22.2. It was developed by Ecma TC32-TG22 in close cooperation with France Télévisions, Radio France, École Polytechnique Fédérale de Lausanne, and McGill University in Montreal.

ECMA-407 uses inverse coding in the time domain, an invention by the Swiss-Austrian mathematician Clemens Par, and achieves the lowest spatial bit rates reported to date: for instance, several minutes of NHK 22.2 may be represented by an encapsulated data package of 100 bytes. The invention was chosen for the World Intellectual Property Organization Award in 2009. Inverse coding goes back to Victor Ambartsumian's scientific legacy on inverse problems and represents the first solution of its kind in audio, separating sound sources at the same frequency by means of a time-level model.


G.723 is an ITU-T standard speech codec that extends G.721, providing voice quality over the 300-3400 Hz band using Adaptive Differential Pulse Code Modulation (ADPCM) at 24 and 40 kbit/s for digital circuit multiplication equipment (DCME) applications. G.723 is obsolete and has been superseded by G.726.

Note that this is a completely different codec from G.723.1.
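The ADPCM principle behind G.723 and G.726 can be illustrated with a toy codec: transmit a small quantized code for the *difference* between each sample and a prediction, adapting the quantizer step size as the signal evolves. The sketch below is a simplified, IMA-ADPCM-style scheme written for illustration only; the step-size table, 4-bit code layout, and adaptation rule are assumptions of this example, not the actual G.723/G.726 algorithms.

```python
# Toy ADPCM codec: quantize the prediction error with an adaptive step.
# Illustrative only -- NOT the G.723/G.726 bit-exact algorithms.

STEP_SIZES = [7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28,
              31, 34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]  # per 3-bit magnitude code

def _reconstruct(predicted, index, code):
    """Shared decoder step: rebuild the sample and adapt the step index."""
    step = STEP_SIZES[index]
    diff = step // 8
    if code & 4: diff += step
    if code & 2: diff += step // 2
    if code & 1: diff += step // 4
    predicted += -diff if code & 8 else diff
    index = max(0, min(len(STEP_SIZES) - 1, index + INDEX_ADJUST[code & 7]))
    return predicted, index

def adpcm_encode(samples):
    """Encode integer samples into 4-bit codes (sign bit + 3-bit magnitude)."""
    predicted, index, codes = 0, 0, []
    for s in samples:
        diff = s - predicted
        step = STEP_SIZES[index]
        code = 0
        if diff < 0:
            code, diff = 8, -diff          # sign bit
        # successive approximation of |diff| against the current step
        if diff >= step:
            code |= 4; diff -= step
        if diff >= step // 2:
            code |= 2; diff -= step // 2
        if diff >= step // 4:
            code |= 1
        # track exactly what the decoder will reconstruct
        predicted, index = _reconstruct(predicted, index, code)
        codes.append(code)
    return codes

def adpcm_decode(codes):
    predicted, index, out = 0, 0, []
    for code in codes:
        predicted, index = _reconstruct(predicted, index, code)
        out.append(predicted)
    return out
```

Because the encoder mirrors the decoder's reconstruction, prediction error does not accumulate: the decoded signal tracks the input to within roughly one quantizer step.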


H.323 is a recommendation from the ITU Telecommunication Standardization Sector (ITU-T) that defines the protocols to provide audio-visual communication sessions on any packet network. The H.323 standard addresses call signaling and control, multimedia transport and control, and bandwidth control for point-to-point and multi-point conferences. It is widely implemented by voice and videoconferencing equipment manufacturers, is used within various Internet real-time applications such as GnuGK and NetMeeting, and is widely deployed worldwide by service providers and enterprises for both voice and video services over IP networks.

It is a part of the ITU-T H.32x series of protocols, which also address multimedia communications over ISDN, the PSTN or SS7, and 3G mobile networks.

H.323 call signaling is based on the ITU-T Recommendation Q.931 protocol and is suited for transmitting calls across networks using a mixture of IP, PSTN, ISDN, and QSIG over ISDN. A call model, similar to the ISDN call model, eases the introduction of IP telephony into existing networks of ISDN-based PBX systems, including transitions to IP-based PBXs.

Within the context of H.323, an IP-based PBX might be a gatekeeper or other call control element which provides service to telephones or videophones. Such a device may provide or facilitate both basic services and supplementary services, such as call transfer, park, pick-up, and hold.

MPEG-4 Part 3

MPEG-4 Part 3 or MPEG-4 Audio (formally ISO/IEC 14496-3) is the third part of the ISO/IEC MPEG-4 international standard developed by the Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.

MPEG-4 Part 3 comprises a variety of audio coding technologies: lossy speech coding (HVXC, CELP), general audio coding (AAC, TwinVQ, BSAC), lossless audio compression (MPEG-4 SLS, Audio Lossless Coding, MPEG-4 DST), a Text-to-Speech Interface (TTSI), Structured Audio (using SAOL, SASL and MIDI), and many additional audio synthesis and coding techniques.

MPEG-4 Audio does not target a single application such as real-time telephony or high-quality audio compression. It applies to every application that requires advanced sound compression, synthesis, manipulation, or playback.

MPEG-4 Audio is a new type of audio standard that integrates numerous different types of audio coding: natural sound and synthetic sound, low bitrate delivery and high-quality delivery, speech and music, complex soundtracks and simple ones, traditional content and interactive content.

MPEG Surround

MPEG Surround (ISO/IEC 23003-1 or MPEG-D Part 1), also known as Spatial Audio Coding (SAC), is a lossy compression format for surround sound that provides a method for extending mono or stereo audio services to multi-channel audio in a backwards-compatible fashion. The total bit rate for the (mono or stereo) core plus the MPEG Surround data is typically only slightly higher than that of the core alone.

MPEG Surround adds a side-information stream to the (mono or stereo) core bit stream, containing spatial image data. Legacy stereo playback systems will ignore this side-information while players supporting MPEG Surround decoding will output the reconstructed multi-channel audio.
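The downmix-plus-side-information principle can be sketched in a few lines. The toy coder below illustrates the general idea only and is not the MPEG Surround algorithm: real MPEG Surround operates on time/frequency tiles and also transmits inter-channel correlation and phase cues, whereas this sketch keeps a single level-difference parameter per short frame.

```python
# Toy parametric stereo coder: a legacy player can use the mono downmix
# directly, while an aware decoder uses a tiny per-frame level-difference
# parameter to redistribute energy between the two channels.
# Illustration of the MPEG Surround *concept*, not the standard itself.

FRAME = 4  # samples per parameter frame (real codecs use larger tiles)

def encode(left, right):
    downmix, params = [], []
    for i in range(0, len(left), FRAME):
        l, r = left[i:i + FRAME], right[i:i + FRAME]
        downmix.extend(0.5 * (a + b) for a, b in zip(l, r))
        el = sum(a * a for a in l)
        er = sum(b * b for b in r)
        # channel level cue: fraction of frame energy in the left channel
        params.append(el / (el + er) if el + er else 0.5)
    return downmix, params

def decode(downmix, params):
    left, right = [], []
    for i, p in enumerate(params):
        frame = downmix[i * FRAME:(i + 1) * FRAME]
        # split the downmix energy between channels in the ratio p : (1 - p)
        gl, gr = (2 * p) ** 0.5, (2 * (1 - p)) ** 0.5
        left.extend(gl * s for s in frame)
        right.extend(gr * s for s in frame)
    return left, right
```

For identical channels the parameter is 0.5 and reconstruction is exact; the per-frame parameter stream is the analogue of MPEG Surround's much richer side information.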

The Moving Picture Experts Group (MPEG) issued a call for proposals on MPEG Spatial Audio Coding in March 2004. The group decided that the starting point for the standardization process would be a combination of the submissions from two proponents: Fraunhofer IIS / Agere Systems and Coding Technologies / Philips. The MPEG Surround standard was developed by MPEG (ISO/IEC JTC1/SC29/WG11) and published as ISO/IEC 23003-1 in 2007. It was the first standard in the MPEG-D group of standards, formally known as ISO/IEC 23003 - MPEG audio technologies.

MPEG Surround was also defined as one of the MPEG-4 Audio Object Types in 2007, and an MPEG-4 Low Delay MPEG Surround object type (LD MPEG Surround) followed in 2010. Spatial Audio Object Coding (SAOC) was published as MPEG-D Part 2 - ISO/IEC 23003-2 in 2010; it extends the MPEG Surround standard by re-using its spatial rendering capabilities while retaining full compatibility with existing receivers. The MPEG SAOC system allows users on the decoding side to interactively control the rendering of each individual audio object (e.g. individual instruments or vocals). Unified Speech and Audio Coding (USAC) was later defined in MPEG-D Part 3 - ISO/IEC 23003-3 and in ISO/IEC 14496-3:2009/Amd 3; the MPEG-D MPEG Surround parametric coding tools are integrated into the USAC codec.

The (mono or stereo) core can be coded with any (lossy or lossless) audio codec. Particularly low bit rates (64-96 kbit/s for 5.1 channels) are possible when using HE-AAC v2 as the core codec.
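The decoder-side interactivity that distinguishes SAOC can be sketched as follows. This is a deliberately crude model: it assumes each object contributes a fixed amplitude share of a mono downmix, whereas real SAOC estimates object contributions per time/frequency tile from transmitted object-level differences. The function name and parameter model are inventions of this example.

```python
# Toy illustration of the SAOC idea: the decoder receives only a downmix
# plus per-object parameters, yet can re-render the scene with
# user-chosen gains for each object (e.g. vocals up, accompaniment down).
# Hypothetical sketch -- not the SAOC algorithm.

def saoc_render(downmix, shares, user_gains):
    """Re-render a mono downmix given each object's amplitude share.

    shares[i]     -- object i's assumed fraction of the downmix amplitude
                     (a stand-in for SAOC's object-level differences)
    user_gains[i] -- the listener's chosen gain for object i
    """
    assert abs(sum(shares) - 1.0) < 1e-9
    out = []
    for s in downmix:
        # split the sample into estimated object contributions,
        # apply the interactive gains, and remix
        out.append(sum(g * (share * s) for g, share in zip(user_gains, shares)))
    return out
```

With all user gains at 1.0 the downmix passes through unchanged, which mirrors SAOC's compatibility with receivers that ignore the object parameters.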

Moving Picture Experts Group

The Moving Picture Experts Group (MPEG) is a working group of authorities that was formed by ISO and IEC to set standards for audio and video compression and transmission. It was established in 1988 on the initiative of Hiroshi Yasuda (Nippon Telegraph and Telephone) and Leonardo Chiariglione, who has chaired the group since its inception. The first MPEG meeting was held in May 1988 in Ottawa, Canada. As of late 2005, MPEG had grown to include approximately 350 members per meeting from various industries, universities, and research institutions. MPEG's official designation is ISO/IEC JTC 1/SC 29/WG 11 – Coding of moving pictures and audio (ISO/IEC Joint Technical Committee 1, Subcommittee 29, Working Group 11).


mp3PRO is an unmaintained proprietary audio compression codec that combines the MP3 audio format with the spectral band replication (SBR) compression method. At the time it was developed it could reduce the size of a stereo MP3 by as much as 50% while maintaining the same relative quality. This works, fundamentally, by discarding the higher half of the frequency range and algorithmically replicating that information while decoding.

The technology behind SBR was developed in the late 1990s by the former Swedish company Coding Technologies AB (acquired by Dolby Laboratories in 2007). It was included in their MPEG-2 AAC derived codec aacPlus, which would later be standardized as MPEG-4 HE-AAC. Thomson Multimedia (now Technicolor SA) licensed the technology and used it to extend the MP3 format, for which they held patents, hoping to also extend its profitable lifetime. This was released as mp3PRO in 2001.

It was originally claimed that mp3PRO files were compatible with existing MP3 decoders and that the SBR data could simply be ignored. In reality, MP3 players lacking specific mp3PRO decoding capability suffered a significant reduction in audio quality when playing mp3PRO files, as only the lower half of the original frequency range was available to them.

mp3PRO development has been abandoned. The format was never standardized, and no reference source code or documentation is publicly available. A very old software encoder/player exists but is not maintained. Nero's Soundtrax application, bundled in the Nero Multimedia Suite, can encode and decode this format. Some versions of the outdated MusicMatch Jukebox player could also decode and encode it. In the early 2000s, mp3PRO was usable in several portable music players and in popular music software, but its market share has since deteriorated rapidly. The codec is largely surpassed in quality and efficiency, as well as in device and application support, by modern codecs such as AAC and its HE-AAC variants, which employ the same SBR method.
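The replicate-and-rescale idea behind SBR can be sketched directly. The toy encoder below drops the upper half of a magnitude spectrum and keeps only a coarse per-band energy envelope of it; the decoder patches the missing high band by transposing the low band upward and rescaling it to match the transmitted envelope. Real SBR works on a filter-bank representation of the audio signal; for brevity this sketch operates on a plain list of spectral magnitudes, which is an assumption of the example.

```python
# Toy spectral band replication: transmit only the low band plus a coarse
# high-band energy envelope; regenerate the high band at the decoder.
# Conceptual sketch, not the SBR algorithm used by mp3PRO/HE-AAC.

BANDS = 2  # envelope resolution for the discarded high band

def sbr_encode(spectrum):
    half = len(spectrum) // 2
    low, high = spectrum[:half], spectrum[half:]
    size = len(high) // BANDS
    # per-band RMS-style energy of the discarded high band
    envelope = [sum(x * x for x in high[i * size:(i + 1) * size]) ** 0.5
                for i in range(BANDS)]
    return low, envelope  # far smaller than transmitting `high` itself

def sbr_decode(low, envelope):
    patch = list(low)  # transpose the low band up as raw material
    size = len(patch) // BANDS
    high = []
    for i, target in enumerate(envelope):
        band = patch[i * size:(i + 1) * size]
        energy = sum(x * x for x in band) ** 0.5
        gain = target / energy if energy else 0.0
        high.extend(gain * x for x in band)  # rescale to the sent envelope
    return low + high
```

The regenerated high band has the right energy but borrowed fine structure, which is why SBR-style extension sounds plausible rather than transparent.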

Unified Speech and Audio Coding

Unified Speech and Audio Coding (USAC) is an audio compression format and codec for music, speech, or any mix of the two, operating at very low bit rates between 12 and 64 kbit/s. It was developed by the Moving Picture Experts Group (MPEG) and was published in 2012 as international standard ISO/IEC 23003-3 (a.k.a. MPEG-D Part 3) and also as an MPEG-4 Audio Object Type in ISO/IEC 14496-3:2009/Amd 3.

USAC uses time-domain linear prediction and residual coding tools (ACELP-like techniques) for speech signal segments and transform coding tools (MDCT-based techniques) for music signal segments, and it is able to switch between the two tool sets dynamically in a signal-responsive manner. It was developed with the aim of a single, unified coder whose performance equals or surpasses that of dedicated speech coders and dedicated music coders over a broad range of bit rates. Enhanced variations of the MPEG-4 Spectral Band Replication (SBR) and MPEG-D MPEG Surround parametric coding tools are integrated into the USAC codec.
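The signal-responsive switching described above can be sketched with a per-frame mode decision. The heuristic below, choosing the ACELP-like path when a one-tap linear predictor pays off and the MDCT path otherwise, is a stand-in invented for this example; USAC's actual classifier is far more sophisticated, and the function names and threshold are assumptions.

```python
# Illustrative USAC-style mode switching: pick a time-domain "speech"
# tool set or a transform-domain "music" tool set per frame.
# Hypothetical heuristic, not the USAC decision algorithm.

def lp_gain(frame):
    """Prediction gain of a 1-tap linear predictor: high for signals that
    are well modelled sample-to-sample (e.g. voiced speech segments)."""
    num = sum(a * b for a, b in zip(frame, frame[1:]))
    den = sum(a * a for a in frame[:-1])
    if den == 0:
        return 0.0
    r = num / den  # least-squares optimal 1-tap predictor coefficient
    residual = sum((b - r * a) ** 2 for a, b in zip(frame, frame[1:]))
    signal = sum(b * b for b in frame[1:])
    return signal / residual if residual else float("inf")

def choose_modes(samples, frame_len=32, threshold=4.0):
    """Return one coding mode per frame: 'acelp' when linear prediction
    pays off, 'mdct' otherwise."""
    modes = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        modes.append("acelp" if lp_gain(frame) > threshold else "mdct")
    return modes
```

A slowly varying tone yields a high prediction gain and selects the ACELP-like path, while a rapidly oscillating, predictor-unfriendly frame falls through to the MDCT path.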


This page is based on a Wikipedia article written by its contributors.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.