MPEG-4 Part 3 or MPEG-4 Audio (formally ISO/IEC 14496-3) is the third part of the ISO/IEC MPEG-4 international standard developed by the Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.
MPEG-4 Part 3 comprises a variety of audio coding technologies: lossy speech coding (HVXC, CELP), general audio coding (AAC, TwinVQ, BSAC), lossless audio compression (MPEG-4 SLS, Audio Lossless Coding, MPEG-4 DST), a Text-To-Speech Interface (TTSI), Structured Audio (using SAOL, SASL, MIDI), and many additional audio synthesis and coding techniques.
MPEG-4 Audio does not target a single application such as real-time telephony or high-quality audio compression. It applies to every application which requires the use of advanced sound compression, synthesis, manipulation, or playback. MPEG-4 Audio is a new type of audio standard that integrates numerous different types of audio coding: natural sound and synthetic sound, low bitrate delivery and high-quality delivery, speech and music, complex soundtracks and simple ones, traditional content and interactive content.
|Edition||Release date||Latest amendment||Standard||Description|
|First edition||1999||2001||ISO/IEC 14496-3:1999||also known as "MPEG-4 Audio Version 1"|
|First edition||2000||||ISO/IEC 14496-3:1999/Amd 1:2000||also known as "MPEG-4 Audio Version 2", an amendment to the first edition|
|Second edition||2001||2005||ISO/IEC 14496-3:2001|
|Third edition||2005||2008||ISO/IEC 14496-3:2005|
|Fourth edition||2009||2015 and under development||ISO/IEC 14496-3:2009|
MPEG-4 Part 3 contains the following subparts:
MPEG-4 Audio includes a system for handling a diverse group of audio formats in a uniform manner. Each format is assigned a unique Audio Object Type to represent it. The Object Type distinguishes between different coding methods and directly determines the MPEG-4 tool subset required to decode a specific object. The MPEG-4 profiles are based on the object types, and each profile supports a different list of object types.
|Object Type ID||Audio Object Type||First public release date||Description|
|1||AAC Main||1999||contains AAC LC|
|2||AAC LC (Low Complexity)||1999||Used in the "AAC Profile". MPEG-4 AAC LC Audio Object Type is based on the MPEG-2 Part 7 Low Complexity profile (LC) combined with Perceptual Noise Substitution (PNS) (defined in MPEG-4 Part 3 Subpart 4).|
|3||AAC SSR (Scalable Sample Rate)||1999||MPEG-4 AAC SSR Audio Object Type is based on the MPEG-2 Part 7 Scalable Sampling Rate profile (SSR) combined with Perceptual Noise Substitution (PNS) (defined in MPEG-4 Part 3 Subpart 4).|
|4||AAC LTP (Long Term Prediction)||1999||contains AAC LC|
|5||SBR (Spectral Band Replication)||2003||used with AAC LC in the "High Efficiency AAC Profile" (HE-AAC v1)|
|7||TwinVQ||1999||audio coding at very low bitrates|
|8||CELP (Code Excited Linear Prediction)||1999||speech coding|
|9||HVXC (Harmonic Vector eXcitation Coding)||1999||speech coding|
|12||TTSI (Text-To-Speech Interface)||1999|
|13||Main synthesis||1999||contains 'wavetable' sample-based synthesis and Algorithmic Synthesis and Audio Effects|
|14||'wavetable' sample-based synthesis||1999||based on SoundFont and DownLoadable Sounds, contains General MIDI|
|16||Algorithmic Synthesis and Audio Effects||1999|
|17||ER AAC LC||2000||Error Resilient|
|19||ER AAC LTP||2000||Error Resilient|
|20||ER AAC Scalable||2000||Error Resilient|
|21||ER TwinVQ||2000||Error Resilient|
|22||ER BSAC (Bit-Sliced Arithmetic Coding)||2000||It is also known as "Fine Granule Audio" or fine grain scalability tool. It is used in combination with the AAC coding tools and replaces the noiseless coding and the bitstream formatting of MPEG-4 Version 1 GA coder. Error Resilient|
|23||ER AAC LD (Low Delay)||2000||Error Resilient, used with CELP, ER CELP, HVXC, ER HVXC and TTSI in the "Low Delay Profile", (commonly used for real-time conversation applications)|
|24||ER CELP||2000||Error Resilient|
|25||ER HVXC||2000||Error Resilient|
|26||ER HILN (Harmonic and Individual Lines plus Noise)||2000||Error Resilient|
|27||ER Parametric||2000||Error Resilient|
|28||SSC (SinuSoidal Coding)||2004|
|29||PS (Parametric Stereo)||2004 and 2006||used with AAC LC and SBR in the "HE-AAC v2 Profile". PS coding tool was defined in 2004 and Object Type defined in 2006.|
|30||MPEG Surround||2007||also known as MPEG Spatial Audio Coding (SAC), it is a type of spatial audio coding (MPEG Surround was also defined in ISO/IEC 23003-1 in 2007)|
|34||MPEG-1/2 Layer-3||2005||also known as "MP3onMP4"|
|35||DST (Direct Stream Transfer)||2005||lossless audio coding, used on Super Audio CD|
|36||ALS (Audio Lossless Coding)||2006||lossless audio coding|
|37||SLS (Scalable Lossless Coding)||2006||two-layer audio coding with lossless layer and lossy General Audio core/layer (e.g. AAC)|
|38||SLS non-core||2006||lossless audio coding without lossy General Audio core/layer (e.g. AAC)|
|39||ER AAC ELD (Enhanced Low Delay)||2008||Error Resilient|
|40||SMR (Symbolic Music Representation) Simple||2008||note: Symbolic Music Representation is also the MPEG-4 Part 23 standard (ISO/IEC 14496-23:2008)|
|42||USAC (Unified Speech and Audio Coding) (no SBR)||2012|
|43||SAOC (Spatial Audio Object Coding)||2010||note: Spatial Audio Object Coding is also the MPEG-D Part 2 standard (ISO/IEC 23003-2:2010)|
|44||LD MPEG Surround||2010||This object type conveys Low Delay MPEG Surround Coding side information (that was defined in MPEG-D Part 2 – ISO/IEC 23003-2) in the MPEG-4 Audio framework.|
|45||USAC||2012||also defined in MPEG-D Part 3 – ISO/IEC 23003-3|
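As an illustration of how an Object Type ID can select a decoder, the sketch below reads the object type from the start of a decoder configuration. The bit layout used here (a 5-bit field, with the value 31 acting as an escape that is followed by 6 more bits for IDs of 32 and above) is not stated in this article; it is recalled from the AudioSpecificConfig syntax of ISO/IEC 14496-3 and should be checked against the standard. `BitReader`, `read_audio_object_type` and `describe` are hypothetical names, and the lookup table is only a subset of the table above.

```python
# Subset of the Audio Object Type table above (ID -> name).
AUDIO_OBJECT_TYPES = {
    1: "AAC Main",
    2: "AAC LC",
    3: "AAC SSR",
    4: "AAC LTP",
    5: "SBR",
    29: "PS",
    36: "ALS",
    37: "SLS",
    42: "USAC (no SBR)",
}


class BitReader:
    """Reads big-endian bit fields from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_audio_object_type(reader: BitReader) -> int:
    """Assumed layout: 5 bits, with 31 as an escape to a 6-bit extension."""
    aot = reader.read(5)
    if aot == 31:
        aot = 32 + reader.read(6)
    return aot


def describe(config: bytes) -> str:
    aot = read_audio_object_type(BitReader(config))
    return AUDIO_OBJECT_TYPES.get(aot, f"unlisted object type {aot}")


print(describe(bytes([0x12, 0x10])))  # first 5 bits are 00010 -> ID 2
print(describe(bytes([0xF8, 0x80])))  # escape 11111, then 000100 -> ID 36
```

A real decoder would go on to read the sampling frequency and channel configuration that follow the object type; those fields are omitted here to keep the sketch short.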
The MPEG-4 Audio standard defines several profiles. These profiles are based on the object types, and each profile supports a different list of object types. Each profile may also have several levels, which limit some parameters of the tools present in a profile. These parameters are usually the sampling rate and the number of audio channels decoded at the same time.
|Audio Profile||Audio Object Types||First public release date|
|AAC Profile||AAC LC||2003|
|High Efficiency AAC Profile||AAC LC, SBR||2003|
|HE-AAC v2 Profile||AAC LC, SBR, PS||2006|
|Main Audio Profile||AAC Main, AAC LC, AAC SSR, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, Main synthesis||1999|
|Scalable Audio Profile||AAC LC, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI||1999|
|Speech Audio Profile||CELP, HVXC, TTSI||1999|
|Synthetic Audio Profile||TTSI, Main synthesis||1999|
|High Quality Audio Profile||AAC LC, AAC LTP, AAC Scalable, CELP, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER CELP||2000|
|Low Delay Audio Profile||CELP, HVXC, TTSI, ER AAC LD, ER CELP, ER HVXC||2000|
|Natural Audio Profile||AAC Main, AAC LC, AAC SSR, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD, ER CELP, ER HVXC, ER HILN, ER Parametric||2000|
|Mobile Audio Internetworking Profile||ER AAC LC, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD||2000|
|HD-AAC Profile||AAC LC, SLS||2009|
|ALS Simple Profile||ALS||2010|
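Because each profile is simply a list of object types, checking whether a stream fits a profile reduces to a subset test. The sketch below encodes a few rows of the table above; the names `PROFILES` and `profiles_supporting` are hypothetical, and levels (sampling rate and channel limits) are deliberately ignored.

```python
# A few rows from the profile table above: profile name -> supported object types.
PROFILES = {
    "AAC Profile": {"AAC LC"},
    "High Efficiency AAC Profile": {"AAC LC", "SBR"},
    "HE-AAC v2 Profile": {"AAC LC", "SBR", "PS"},
    "Speech Audio Profile": {"CELP", "HVXC", "TTSI"},
    "Low Delay Audio Profile": {
        "CELP", "HVXC", "TTSI", "ER AAC LD", "ER CELP", "ER HVXC",
    },
    "HD-AAC Profile": {"AAC LC", "SLS"},
}


def profiles_supporting(object_types):
    """Return the profiles whose object type list covers every requested type."""
    needed = set(object_types)
    return sorted(
        name for name, supported in PROFILES.items() if needed <= supported
    )


print(profiles_supporting(["AAC LC", "SBR"]))
print(profiles_supporting(["AAC LC", "SBR", "PS"]))
print(profiles_supporting(["ALS"]))  # empty: ALS Simple Profile is not in this subset
```

A stream that uses AAC LC with SBR fits both HE-AAC profiles but not the plain AAC Profile, which matches the way the table nests these three profiles.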
|Multiplex||ISO/IEC 14496-1||MPEG-4 Multiplex scheme (M4Mux)|
|Multiplex||ISO/IEC 14496-3||Low Overhead Audio Transport Multiplex (LATM)|
|Storage||ISO/IEC 14496-3 (informative)||Audio Data Interchange Format (ADIF) – only for AAC|
|Storage||ISO/IEC 14496-12||MPEG-4 file format (MP4) / ISO base media file format|
|Transmission||ISO/IEC 14496-3 (informative)||Audio Data Transport Stream (ADTS) – only for AAC|
|Transmission||ISO/IEC 14496-3||Low Overhead Audio Stream (LOAS), based on LATM|
There is no standard for transport of elementary streams over a channel, because the broad range of MPEG-4 applications has delivery requirements that are too wide to characterize easily with a single solution.
The capabilities of a transport layer and the communication between transport, multiplex, and demultiplex functions are described in the Delivery Multimedia Integration Framework (DMIF) in ISO/IEC 14496-6. A wide variety of delivery mechanisms exist below this interface, e.g., MPEG transport stream, Real-time Transport Protocol (RTP), etc.
Transport in Real-time Transport Protocol is defined in RFC 3016 (RTP Payload Format for MPEG-4 Audio/Visual Streams), RFC 3640 (RTP Payload Format for Transport of MPEG-4 Elementary Streams), RFC 4281 (The Codecs Parameter for "Bucket" Media Types) and RFC 4337 (MIME Type Registration for MPEG-4).
LATM and LOAS were defined for natural audio applications, which do not require sophisticated object-based coding or other functions provided by MPEG-4 Systems.
The Advanced Audio Coding in MPEG-4 Part 3 (MPEG-4 Audio) Subpart 4 was enhanced relative to the previous standard MPEG-2 Part 7 (Advanced Audio Coding), in order to provide better sound quality for a given encoding bitrate.
Any remaining differences between Part 3 and Part 7 are expected to be reconciled by the ISO standards body to avoid future bitstream incompatibilities. No player or codec incompatibilities are currently known, which may reflect how new the standard is.
The MPEG-2 Part 7 standard (Advanced Audio Coding) was first published in 1997 and offers three default profiles: Low Complexity profile (LC), Main profile and Scalable Sampling Rate profile (SSR).
The MPEG-4 Part 3 Subpart 4 (General Audio Coding) combined the profiles from MPEG-2 Part 7 with Perceptual Noise Substitution (PNS) and defined them as Audio Object Types (AAC LC, AAC Main, AAC SSR).
High-Efficiency Advanced Audio Coding is an extension of AAC LC using spectral band replication (SBR), and Parametric Stereo (PS). It is designed to increase coding efficiency at low bitrates by using partial parametric representation of audio.
AAC Scalable Sample Rate was introduced by Sony to the MPEG-2 Part 7 and MPEG-4 Part 3 standards. It was first published in ISO/IEC 13818-7, Part 7: Advanced Audio Coding (AAC) in 1997. The audio signal is first split into 4 bands using a 4-band polyphase quadrature filter (PQF) bank. These 4 bands are then further split using MDCTs with a size k of 32 or 256 samples. This is similar to normal AAC LC, which uses MDCTs with a size k of 128 or 1024 directly on the audio signal.
The advantage of this technique is that short block switching can be done separately for every PQF band. High frequencies can therefore be encoded using a short block to enhance temporal resolution, while low frequencies can still be encoded with high spectral resolution. However, due to aliasing between the 4 PQF bands, coding efficiency around (1, 2, 3) × fs/8 is worse than for normal MPEG-4 AAC LC.
The idea behind AAC-SSR was not only the advantage listed above, but also the possibility of reducing the data rate by removing 1, 2 or 3 of the upper PQF bands. A very simple bitstream splitter can remove these bands and thus reduce the bitrate and sample rate.
Note: although possible, the resulting quality is much worse than typical for this bitrate. So for normal 64 kbit/s AAC LC a bandwidth of 14–16 kHz is achieved by using intensity stereo and reduced NMRs. This degrades audible quality less than transmitting 6 kHz bandwidth with perfect quality.
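The arithmetic behind band removal can be sketched as follows. The function name `ssr_band_removal` and the 48 kHz input rate are assumptions chosen for illustration, and the sketch also assumes that the output sample rate scales in proportion to the number of bands kept; the text above only states that both bitrate and sample rate are reduced.

```python
PQF_BANDS = 4  # AAC SSR splits the signal into 4 PQF bands


def ssr_band_removal(sample_rate_hz, bands_kept):
    """Bandwidth and sample rate left after keeping only the lowest PQF bands."""
    if not 1 <= bands_kept <= PQF_BANDS:
        raise ValueError("bands_kept must be between 1 and 4")
    band_width_hz = sample_rate_hz / (2 * PQF_BANDS)  # each band spans fs/8
    return {
        "bandwidth_hz": bands_kept * band_width_hz,
        # Assumption: the output rate scales with the number of bands kept.
        "sample_rate_hz": sample_rate_hz * bands_kept / PQF_BANDS,
        # Boundaries between kept bands, where PQF aliasing hurts efficiency.
        "band_edges_hz": [k * band_width_hz for k in range(1, bands_kept)],
    }


# Coefficients per frame match AAC LC: 4 x 256 = 1024 (long), 4 x 32 = 128 (short).
assert PQF_BANDS * 256 == 1024 and PQF_BANDS * 32 == 128

for kept in range(PQF_BANDS, 0, -1):
    print(kept, ssr_band_removal(48000, kept))
```

With a 48 kHz input, keeping only the lowest band leaves a 6 kHz bandwidth, which is the case the note above compares against 14–16 kHz AAC LC.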
Bit Sliced Arithmetic Coding is an MPEG-4 standard (ISO/IEC 14496-3 subpart 4) for scalable audio coding. BSAC uses an alternative noiseless coding to AAC, with the rest of the processing being identical to AAC. This support for scalability allows for nearly transparent sound quality at 64 kbit/s and graceful degradation at lower bit rates. BSAC coding is best performed in the range of 40 kbit/s to 64 kbit/s, though it operates in the range of 16 kbit/s to 64 kbit/s. The AAC-BSAC codec is used in Digital Multimedia Broadcasting (DMB) applications.
2.2 Wavetable synthesis with SASBF: The SASBF wavetable-bank format had a somewhat complex history of development. The original specification was contributed by E-Mu Systems and was based on their "SoundFont" format. After integration of this component in the MPEG-4 reference software was complete, the MIDI Manufacturers Association (MMA) approached MPEG requesting that MPEG-4 SASBF be compatible with their "DownLoadable Sounds" format. E-Mu agreed that this compatibility was desirable, and so a new format was negotiated and designed collaboratively by all parties.
Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. Designed to be the successor of the MP3 format, AAC generally achieves better sound quality than MP3 at the same bit rate. The confusingly named AAC+ (HE-AAC) does so only at low bit rates and less so at high ones.
AAC has been standardized by ISO and IEC as part of the MPEG-2 and MPEG-4 specifications. Part of AAC, HE-AAC (AAC+), is part of MPEG-4 Audio and has also been adopted into the digital radio standards DAB+ and Digital Radio Mondiale, as well as the mobile television standards DVB-H and ATSC-M/H.
AAC supports inclusion of 48 full-bandwidth (up to 96 kHz) audio channels in one stream plus 16 low frequency effects (LFE, limited to 120 Hz) channels, up to 16 "coupling" or dialog channels, and up to 16 data streams. The quality for stereo is satisfactory to modest requirements at 96 kbit/s in joint stereo mode; however, hi-fi transparency demands data rates of at least 128 kbit/s (VBR). Tests of MPEG-4 audio have shown that AAC meets the requirements referred to as "transparent" for the ITU at 128 kbit/s for stereo, and 320 kbit/s for 5.1 audio.
AAC is the default or standard audio format for YouTube, iPhone, iPod, iPad, Nintendo DSi, Nintendo 3DS, iTunes, DivX Plus Web Player, PlayStation 3 and various Nokia Series 40 phones. It is supported on PlayStation Vita, Wii (with the Photo Channel 1.1 update installed), Sony Walkman MP3 series and later, Android and BlackBerry. AAC is also supported by manufacturers of in-dash car audio systems.
Audio Lossless Coding
MPEG-4 Audio Lossless Coding, also known as MPEG-4 ALS, is an extension to the MPEG-4 Part 3 audio standard to allow lossless audio compression. The extension was finalized in December 2005 and published as ISO/IEC 14496-3:2005/Amd 2:2006 in 2006. The latest description of MPEG-4 ALS was published as subpart 11 of the MPEG-4 Audio standard (ISO/IEC 14496-3:2009) (4th edition) in August 2009.
MPEG-4 ALS combines a short-term predictor and a long-term predictor. The short-term predictor is similar to FLAC in its operation: it is a quantized LPC predictor with a losslessly coded residual using Golomb-Rice coding or Block Gilbert-Moore Coding (BGMC). The long-term predictor is modeled by 5 long-term weighted residues, each with its own lag (delay). The lag can be hundreds of samples. This predictor improves the compression for sounds with rich harmonics (containing multiples of a single fundamental frequency, locked in phase), which are present in many musical instruments and the human voice.
BSAC
BSAC can stand for:
The British Screen Advisory Council
Bit Sliced Arithmetic Coding, audio coding from MPEG-4 Part 3
British South Africa Company
British Sub-Aqua Club
British Society for Antimicrobial Chemotherapy
Black Swamp Area Council
Benedictine Study and Arts Centre
Code-excited linear prediction
Code-excited linear prediction (CELP) is a speech coding algorithm originally proposed by M. R. Schroeder and B. S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algorithms, such as residual-excited linear prediction and linear predictive coding vocoders (e.g., FS-1015). Along with its variants, such as algebraic CELP, relaxed CELP, low-delay CELP and vector sum excited linear prediction, it is currently the most widely used speech coding algorithm. It is also used in MPEG-4 Audio speech coding. CELP is commonly used as a generic term for a class of algorithms and not for a particular codec.
Harmonic Vector Excitation Coding
Harmonic Vector Excitation Coding, abbreviated as HVXC, is a speech coding algorithm specified in the MPEG-4 Part 3 (MPEG-4 Audio) standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in fixed and variable bit rate modes at a sampling frequency of 8 kHz. It also operates at lower bit rates, such as 1.2 to 1.7 kbit/s, using a variable bit rate technique. The total algorithmic delay for the encoder and decoder is 36 ms.
It was published as subpart 2 of ISO/IEC 14496-3:1999 (MPEG-4 Audio) in 1999. An extended version of HVXC was published in MPEG-4 Audio Version 2 (ISO/IEC 14496-3:1999/Amd 1:2000).
The MPEG-4 Natural Speech Coding Tool Set uses two algorithms: HVXC and CELP (Code Excited Linear Prediction). HVXC is used at a low bit rate of 2 or 4 kbit/s. Bit rates above 4 kbit/s, as well as 3.85 kbit/s, are covered by CELP.
MPEG-4 Part 14
MPEG-4 Part 14 or MP4 is a digital multimedia container format most commonly used to store video and audio, but it can also be used to store other data such as subtitles and still images. Like most modern container formats, it allows streaming over the Internet. The only official filename extension for MPEG-4 Part 14 files is .mp4. MPEG-4 Part 14 (formally ISO/IEC 14496-14:2003) is a standard specified as a part of MPEG-4.
Portable media players are sometimes advertised as "MP4 Players", although some are simply MP3 Players that also play AMV video or some other video format, and do not necessarily play the MPEG-4 Part 14 format.
MPEG-4 SLS
MPEG-4 SLS, or MPEG-4 Scalable to Lossless as per ISO/IEC 14496-3:2005/Amd 3:2006 (Scalable Lossless Coding), is an extension to the MPEG-4 Part 3 (MPEG-4 Audio) standard to allow lossless audio compression scalable to lossy MPEG-4 General Audio coding methods (e.g., variations of AAC). It was developed jointly by the Institute for Infocomm Research (I2R) and Fraunhofer, which commercializes its implementation of a limited subset of the standard under the name of HD-AAC. Standardization of the HD-AAC profile for MPEG-4 Audio is under development (as of September 2009).
MPEG-4 SLS allows having both a lossy layer and a lossless correction layer, similar to WavPack Hybrid, OptimFROG DualStream and DTS-HD Master Audio, providing backwards compatibility to MPEG AAC-compliant bitstreams. MPEG-4 SLS can also work without a lossy layer (a.k.a. "SLS Non-Core"), in which case it will not be backwards compatible. Lossy compression of files is necessary for files that need to be streamed to the Internet or played in devices with limited storage.
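The two-layer idea can be illustrated with a toy example. This is not the SLS algorithm: a coarse integer quantizer stands in for the lossy AAC core, and the function names (`encode_layers`, `decode_lossy`, `decode_lossless`) are hypothetical. The sketch only shows why a decoder that ignores the correction layer still gets usable audio, while a decoder that reads both layers recovers the input exactly.

```python
def encode_layers(samples, step):
    """Split integer PCM samples into a coarse lossy layer and a correction layer."""
    # Coarse quantization stands in for the lossy core codec (assumption).
    lossy = [(s // step) * step for s in samples]
    correction = [s - approx for s, approx in zip(samples, lossy)]
    return lossy, correction


def decode_lossy(lossy):
    """A legacy decoder uses the lossy layer only."""
    return list(lossy)


def decode_lossless(lossy, correction):
    """A full decoder adds the correction layer back to recover the input exactly."""
    return [approx + c for approx, c in zip(lossy, correction)]


samples = [0, 5, -7, 1000, -32768, 32767]
lossy, correction = encode_layers(samples, step=16)

assert decode_lossless(lossy, correction) == samples   # bit-exact reconstruction
assert all(0 <= c < 16 for c in correction)            # the correction stays small

# "Non-core" operation: no lossy layer, so everything sits in the lossless layer.
non_core_lossy = [0] * len(samples)
assert decode_lossless(non_core_lossy, samples) == samples
```

In the non-core case there is nothing for a legacy decoder to play, which mirrors the loss of backwards compatibility described above.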
With DRM, ripping of the lossless data or playback on non DRM-enabled devices could be disabled.
MPEG-4 SLS is not related in any way to MPEG-4 ALS (Audio Lossless Coding).
MPEG-4 Structured Audio
MPEG-4 Structured Audio is an ISO/IEC standard for describing sound. It was published as subpart 5 of MPEG-4 Part 3 (ISO/IEC 14496-3:1999) in 1999.
It allows the transmission of synthetic music and sound effects at very low bit rates (from 0.01 to 10 kbit/s), and the description of parametric sound post-production for mixing multiple streams and adding effects to audio scenes. It does not standardize a particular set of synthesis methods, but a method for describing synthesis methods.
The sound descriptions generate audio when compiled (or interpreted) by a compliant decoder. MPEG-4 Structured Audio consists of the following major elements:
Structured Audio Orchestra Language (SAOL), an audio programming language. SAOL is historically related to Csound and other so-called Music-N languages. It was created by an MIT Media Lab grad student named Eric Scheirer while he was studying under Barry Vercoe during the 1990s.
Structured Audio Score Language (SASL) - is used to describe the manner in which algorithms described in SAOL are used to produce sound.
Structured Audio Sample Bank Format (SASBF) - allows for the transmission of banks of audio samples to be used in 'wavetable' sample-based synthesis (based on SoundFont and DownLoadable Sounds)
A normative Structured Audio scheduler description - it is the supervisory run-time element of the Structured Audio decoding process.
MIDI support - provides important backward-compatibility with existing content and authoring tools.
MPEG-4 Structured Audio was cited by CNN as one of the top-25 innovations to arise at the Media Laboratory.
MPEG program stream
Program stream (PS or MPEG-PS) is a container format for multiplexing digital audio, video and more. The PS format is specified in MPEG-1 Part 1 (ISO/IEC 11172-1) and MPEG-2 Part 1, Systems (ISO/IEC standard 13818-1/ITU-T H.222.0). The MPEG-2 Program Stream is analogous and similar to ISO/IEC 11172 Systems layer and it is forward compatible.
Program streams are used on DVD-Video discs and HD DVD video discs, but with some restrictions and extensions. The filename extensions are VOB and EVO respectively.
Nero Digital
Nero Digital is a brand name applied to a suite of MPEG-4-compatible video and audio compression codecs developed by Nero AG of Germany and Ateme of France. The audio codecs are integrated into the Nero Digital Audio+ audio encoding tool for Microsoft Windows, and the audio & video codecs are integrated into Nero's Recode DVD ripping software.
Nero certifies certain DVD player/recorder devices as Nero Digital compatible, and licenses the codec technology to integrated circuit manufacturers.
The video codecs were developed by Ateme, and according to an interview with Nero AG developer Ivan Dimkovic, the audio codecs are improved versions of Dimkovic's older PsyTEL AAC Encoder. The audio codec is now available as a free stand-alone package called Nero AAC Codec.
Parametric Stereo
Parametric Stereo (PS) is a lossy audio compression algorithm and an Audio Object Type (AOT) defined and used in MPEG-4 Part 3 (MPEG-4 Audio) to further enhance efficiency in low bandwidth stereo media. Advanced Audio Coding Low Complexity (AAC LC) combined with Spectral Band Replication (SBR) and Parametric Stereo (PS) was defined as HE-AAC v2. An HE-AAC v1 decoder will only give mono sound when decoding an HE-AAC v2 bitstream. Parametric Stereo performs sparse coding in the spatial domain, somewhat similar to what SBR does in the frequency domain.
PlayStation Vita system software
The PlayStation Vita system software is the official firmware and operating system for the PlayStation Vita and PlayStation TV video game consoles. It uses the LiveArea as its graphical shell. The PlayStation Vita system software has one optional add-on component, the PlayStation Mobile Runtime Package. The system is built on a Unix base derived from FreeBSD and NetBSD. The last version of the system software is 3.70, which was made available on January 14, 2019.
Spectral band replication
Spectral band replication (SBR) is a technology to enhance audio or speech codecs, especially at low bit rates and is based on harmonic redundancy in the frequency domain.
It can be combined with any audio compression codec: the codec itself transmits the lower and mid frequencies of the spectrum, while SBR replicates higher frequency content by transposing up harmonics from the lower and mid frequencies at the decoder. Some guidance information for reconstruction of the high-frequency spectral envelope is transmitted as side information.
When needed, it also reconstructs or adaptively mixes in noise-like information in selected frequency bands in order to faithfully replicate signals that originally contained no or fewer tonal components.
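The decoder-side idea of replicating the low band upward and then shaping it with a transmitted envelope can be sketched as below. This is a deliberately simplified illustration, not the SBR algorithm from the standard: it works on a plain list of spectral magnitudes, omits the noise mixing described above, and the names `sbr_reconstruct` and `envelope` are hypothetical.

```python
def sbr_reconstruct(low_band, envelope):
    """Rebuild a full spectrum from the transmitted low band plus envelope side info.

    low_band: magnitudes of the bins the core codec actually transmitted.
    envelope: target mean magnitude for each group of reconstructed high bins.
    Assumes len(low_band) is a multiple of len(envelope).
    """
    group = len(low_band) // len(envelope)
    high_band = []
    for i, target in enumerate(envelope):
        # Transpose a patch of the low band up into the missing high band.
        patch = low_band[i * group:(i + 1) * group]
        mean = sum(patch) / len(patch)
        # Scale the patch so its mean magnitude matches the side information.
        gain = target / mean if mean > 0 else 0.0
        high_band.extend(m * gain for m in patch)
    return low_band + high_band


low = [4.0, 2.0, 1.0, 1.0]   # bins sent by the core codec
side_info = [1.5, 0.5]       # coarse envelope for the missing high band
print(sbr_reconstruct(low, side_info))
```

Only two envelope values are sent for four reconstructed bins here, which is the source of the bitrate saving: the fine structure is borrowed from the low band rather than transmitted.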
The SBR idea is based on the principle that the psychoacoustic part of the human brain tends to analyse higher frequencies with less accuracy; thus harmonic phenomena associated with the spectral band replication process need only be accurate in a perceptual sense and not technically or mathematically exact.
Structured Audio Orchestra Language
Structured Audio Orchestra Language (SAOL) is an imperative, MUSIC-N programming language designed for describing virtual instruments, processing digital audio, and applying sound effects. It was published as subpart 5 of MPEG-4 Part 3 (ISO/IEC 14496-3:1999) in 1999.
As part of the MPEG-4 international standard, SAOL is one of the key components of the MPEG-4 Structured Audio toolset, along with:
Structured Audio Score Language (SASL)
Structured Audio Sample Bank Format (SASBF)
The MPEG-4 SA scheduler
TwinVQ (transform-domain weighted interleave vector quantization) is an audio compression technique developed by Nippon Telegraph and Telephone Corporation (NTT) Human Interface Laboratories (now Cyber Space Laboratories) in 1994. The compression technique has been used in both standardized and proprietary designs.