ISO base media file format (ISO/IEC 14496-12 – MPEG-4 Part 12) defines a general structure for time-based multimedia files such as video and audio. The identical text is published as ISO/IEC 15444-12 (JPEG 2000, Part 12).
It is designed as a flexible, extensible format that facilitates interchange, management, editing and presentation of the media. The presentation may be local, or via a network or other stream delivery mechanism. The file format is designed to be independent of any particular network protocol while enabling support for them in general. It is used as the basis for other media file formats (e.g. container formats MP4 and 3GP).
|ISO base media file format (MPEG-4 Part 12)|
|Type of format||Media container|
|Container for||Audio, video, text, data|
|Extended from||QuickTime .mov|
|Extended to||MP4, 3GP, 3G2, .mj2, .dvb, .dcf, .m21, .f4v|
|Standard||ISO/IEC 14496-12, ISO/IEC 15444-12|
The ISO base media file format is directly based on Apple's QuickTime container format. It was developed by MPEG (ISO/IEC JTC1/SC29/WG11). The first MP4 file format specification was created on the basis of the QuickTime format specification published in 2001. The MP4 file format known as "version 1" was published in 2001 as ISO/IEC 14496-1:2001, as a revision of MPEG-4 Part 1: Systems. In 2003, the first version of the MP4 file format was revised and replaced by MPEG-4 Part 14: MP4 file format (ISO/IEC 14496-14:2003), commonly known as MPEG-4 file format "version 2". The MP4 file format was then generalized into the ISO base media file format (ISO/IEC 14496-12:2004 or ISO/IEC 15444-12:2004), which defines a general structure for time-based media files. It is used as the basis for other file formats in the family (such as MP4, 3GP and Motion JPEG 2000).
|Edition||Release date||Latest amendment||Standard|
|First edition||2004|| ||ISO/IEC 14496-12:2004, ISO/IEC 15444-12:2004|
|Second edition||2005||2008||ISO/IEC 14496-12:2005, ISO/IEC 15444-12:2005|
|Third edition||2008||2009||ISO/IEC 14496-12:2008, ISO/IEC 15444-12:2008|
|Fourth edition||2012|| ||ISO/IEC 14496-12:2012, ISO/IEC 15444-12:2012|
|Fifth edition||2015|| ||ISO/IEC 14496-12:2015, ISO/IEC 15444-12:2015|
In January 2017, ISO/IEC 15444-12 was withdrawn.
The ISO base media file format is designed as an extensible file format. A list of all registered extensions for the ISO base media file format is published on the official registration authority website, www.mp4ra.org. The registration authority for code-points (identifier values) in "MP4 Family" files is Apple Inc., which is named in Annex D (informative) of MPEG-4 Part 12. Codec designers should register the codes they invent, but registration is not mandatory, and some code-points that have been invented and are in use are not registered. When a new specification is derived from the ISO base media file format, all the existing specifications should be used both as examples and as a source of definitions and technology. If an existing specification already covers how a particular media type is stored in the file format (e.g. MPEG-4 audio or video in MP4), that definition should be used and a new one should not be invented.
MPEG has standardized a number of specifications extending the ISO base media file format:
The MP4 file format (ISO/IEC 14496-14) defined some extensions over the ISO base media file format to support MPEG-4 visual/audio codecs and various MPEG-4 Systems features such as object descriptors and scene descriptions.
The MPEG-4 Part 3 (MPEG-4 Audio) standard also defined storage of some audio compression formats. Storage of MPEG-1/2 Audio (MP3, MP2, MP1) in the ISO base media file format was defined in ISO/IEC 14496-3:2001/Amd 3:2005.
The Advanced Video Coding (AVC) file format (ISO/IEC 14496-15) defined support for H.264/MPEG-4 AVC video compression.
The High Efficiency Image File Format (HEIF) is an image container format using the ISO base media file format as its basis. While HEIF can be used with any image compression format, it specifically includes support for HEVC intra-coded images and HEVC-coded image sequences that take advantage of inter-picture prediction.
Some of the above-mentioned MPEG standard extensions are used by other formats based on the ISO base media file format (e.g. 3GP). The 3GPP file format (.3gp) specification also defined extensions to support H.263 video, AMR-NB, AMR-WB and AMR-WB+ audio, and 3GPP Timed Text in files based on the ISO base media file format. The 3GPP2 file format (.3g2) defined extensions for usage of EVRC, SMV or 13K (QCELP) voice compression formats. The JPEG 2000 specification (ISO/IEC 15444-3) defined usage of Motion JPEG 2000 video compression and uncompressed audio (PCM) in the ISO base media file format (.mj2). The "DVB File Format" (.dvb) defined by the DVB Project allowed storage of DVB services in the ISO base media file format. It allows the storage of audio, video and other content in any of three main ways: encapsulated in an MPEG transport stream and stored as a reception hint track; encapsulated in an RTP stream and stored as a reception hint track; or stored directly as media tracks. The MPEG-21 File Format (.m21, .mp21) defined the storage of an MPEG-21 Digital Item in the ISO base media file format, with some or all of its ancillary data (such as movies, images or other non-XML data) within the same file. The OMA DRM Content Format (.dcf) specification from the Open Mobile Alliance defined the content format for DRM-protected encrypted media objects and associated metadata. There are also other extensions, such as the ISMA ISMACryp specification for encrypted/protected audio and video, the G.719 audio compression specification, AC-3 and E-AC-3 audio compression, DTS audio compression, Dirac video compression, the VC-1 video compression specification and others, which are named on the MP4 registration authority's website.
Some extensions of the ISO base media file format were not registered by the MP4 registration authority. In 2007, Adobe Systems introduced the F4V file format for Flash Video and declared that it is based on the ISO base media file format. The F4V file format was not registered by the MP4 registration authority, but the F4V technical specification is publicly available. This format can contain H.264 video compression and MP3 or AAC audio compression. In addition, the F4V file format can contain data corresponding to the ActionScript Message Format and still frames of video data in the GIF, JPEG and PNG image formats. In 2009, Microsoft Corporation announced a file format based on the ISO base media file format: ISMV (Smooth Streaming format), also known as Protected Interoperable File Format (PIFF). As announced, this format can, for example, contain the VC-1, WMA, H.264 and AAC compression formats. Microsoft published the Protected Interoperable File Format (PIFF) specification in 2010. It defined another usage of multiple encryption and DRM systems in a single file container. The PIFF brand was registered by the MP4 registration authority in 2010. Some extensions used by this format (e.g. for WMA support) were not registered. Usage of the WMA compression format in the ISO base media file format was not publicly documented, so such files may be unsupported on some platforms.
ISO base media file format contains the timing, structure, and media information for timed sequences of media data, such as audio-visual presentations. The file structure is object-oriented. A file can be decomposed into basic objects very simply and the structure of the objects is implied from their type.
Files conforming to the ISO base media file format are formed as a series of objects, called "boxes." All data is contained in boxes and there is no other data within the file. This includes any initial signature required by the specific file format. The "box" is an object-oriented building block defined by a unique type identifier and length. It was called "atom" in some specifications (e.g. the first definition of MP4 file format).
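As a rough illustration of the box structure described above, the following Python sketch walks a byte buffer and lists the boxes it contains. It assumes the header layout given in ISO/IEC 14496-12, which this article does not spell out: a 32-bit big-endian size followed by a four-character type, where a size of 1 signals that a 64-bit size follows and a size of 0 means the box runs to the end of the enclosing data. The function name iter_boxes is illustrative only and does not belong to any library.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_offset, payload_size) for each box in data[start:end]."""
    if end is None:
        end = len(data)
    pos = start
    while pos + 8 <= end:
        # Assumed header layout: 32-bit big-endian size, then a 4-byte type code.
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # Assumed convention: a 64-bit size follows the type field.
            if pos + 16 > end:
                raise ValueError(f"truncated 64-bit box header at offset {pos}")
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # Assumed convention: the box extends to the end of the enclosing data.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"invalid box size {size} at offset {pos}")
        yield box_type.decode("latin-1"), pos + header, size - header
        pos += size


if __name__ == "__main__":
    # A tiny hand-built buffer: a 16-byte 'ftyp' box followed by an empty 'free' box.
    sample = (struct.pack(">I4s", 16, b"ftyp") + b"isom" + bytes(4)
              + struct.pack(">I4s", 8, b"free"))
    for box_type, offset, size in iter_boxes(sample):
        print(box_type, offset, size)
```

Because container boxes hold further boxes in their payload, the same function can be called again with the payload offset as start and the payload end as end to descend one level.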
A presentation (motion sequence) may be contained in several files. All timing and framing (position and size) information must be in the ISO base media file, and the ancillary files may essentially use any format. They need only be capable of being described by the metadata defined in the ISO base media file format.
In order to identify the specifications with which a file based on the ISO base media file format complies, brands are used as identifiers in the file format. They are set in a box named File Type Box ('ftyp'), which must be placed at the beginning of the file. It is somewhat analogous to the so-called fourcc code, used for a similar purpose for media embedded in the AVI container format. A brand might indicate the type of encoding used, how the data of each encoding is stored, constraints and extensions that are applied to the file, the compatibility, or the intended usage of the file. Brands are printable four-character codes. A File Type Box contains two kinds of brands. One is "major_brand", which identifies the specification of the best use for the file. It is followed by "minor_version", an informative 4-byte integer for the minor version of the major brand. The second kind of brand is "compatible_brands", which identifies multiple specifications with which the file complies. All files shall contain a File Type Box, but for compatibility with an earlier version of the specification, files may be conformant to the ISO base media file format and not contain a File Type Box. In that case they should be read as if they contained an ftyp with major and compatible brand "mp41" (MP4 v1 – ISO 14496-1, Chapter 13). Many in-use brands (ftyps) are not registered and are documented only informally on various webpages.
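The brand fields described above can be read with a few lines of Python. This is a minimal sketch, not a complete parser: it assumes the File Type Box starts at offset 0 with an ordinary 8-byte header, that major_brand and each compatible brand occupy 4 bytes, and that minor_version is a 32-bit big-endian integer. The names FileType and parse_ftyp are hypothetical, and the minor_version of 0 used in the fallback is an assumption, since the text above only names the "mp41" brand.

```python
import struct
from typing import List, NamedTuple


class FileType(NamedTuple):
    major_brand: str
    minor_version: int
    compatible_brands: List[str]


def parse_ftyp(data: bytes) -> FileType:
    """Read the File Type Box at the start of data, or fall back to 'mp41'."""
    if len(data) >= 16:
        size, box_type = struct.unpack_from(">I4s", data, 0)
        if box_type == b"ftyp" and 16 <= size <= len(data):
            # Assumed payload layout: major_brand (4 bytes), minor_version (uint32).
            major, minor = struct.unpack_from(">4sI", data, 8)
            # The rest of the box is a list of 4-byte compatible brands.
            brands = [data[i:i + 4].decode("latin-1")
                      for i in range(16, size - 3, 4)]
            return FileType(major.decode("latin-1"), minor, brands)
    # No usable ftyp: behave as if the file declared the 'mp41' brand
    # (minor_version 0 is an assumption made for this sketch).
    return FileType("mp41", 0, ["mp41"])


if __name__ == "__main__":
    sample = struct.pack(">I4s4sI", 24, b"ftyp", b"isom", 512) + b"isom" + b"mp41"
    print(parse_ftyp(sample))
    print(parse_ftyp(b""))
```

A reader would typically test whether any brand it supports appears in compatible_brands rather than comparing against major_brand alone, since a file may comply with several specifications.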
A multimedia file structured upon the ISO base media file format may be compatible with more than one concrete specification, and it is therefore not always possible to speak of a single "type" or "brand" for the file. In this regard, the utility of the Multipurpose Internet Mail Extensions (MIME) type and file name extension is somewhat reduced. In spite of that, when a derived specification is written, a new file extension, a new MIME type and a new Macintosh file type will be used.
The ISO base media file format supports streaming of media data over a network as well as local playback. A file that supports streaming includes information about the data units to stream (how to serve the elementary stream data in the file over streaming protocols). This information is placed in additional tracks of the file called "hint" tracks. Separate "hint" tracks for different protocols may be included within the same file. The media will play over all such protocols without making any additional copies or versions of the media data. Existing media can be easily made streamable for other specific protocols by the addition of appropriate hint tracks. The media data itself need not be reformatted in any way. The streams sent by the servers under the direction of the hint tracks need contain no trace of file-specific information. When the presentation is played back locally (not streamed), the hint tracks may be ignored. Hint tracks may be created by an authoring tool, or may be added to an existing file (presentation) by a hinting tool. In media authored for progressive download, the moov box, which contains the index of frames, should precede the movie data (mdat) box.
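Whether a file follows the progressive-download ordering mentioned above can be checked by listing its top-level boxes and comparing the positions of moov and mdat. The sketch below is only an illustration on an in-memory buffer. It assumes the usual box header (32-bit big-endian size plus four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning the box runs to the end of the data), and the helper names make_box, top_level_types and moov_precedes_mdat are made up for this example.

```python
import struct
from typing import List


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a plain 8-byte header (helper for the demo below)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def top_level_types(data: bytes) -> List[str]:
    """Return the four-character types of the top-level boxes, in file order."""
    types: List[str] = []
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1 and pos + 16 <= len(data):
            # Assumed convention: 64-bit size stored after the type field.
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # Assumed convention: box extends to the end of the data.
            size = len(data) - pos
        if size < header:
            raise ValueError(f"invalid box size at offset {pos}")
        types.append(box_type.decode("latin-1"))
        pos += size
    return types


def moov_precedes_mdat(data: bytes) -> bool:
    """True if both boxes exist and the first moov comes before the first mdat."""
    types = top_level_types(data)
    if "moov" not in types or "mdat" not in types:
        return False
    return types.index("moov") < types.index("mdat")


if __name__ == "__main__":
    ftyp = make_box(b"ftyp", b"isom" + bytes(4))
    fast = ftyp + make_box(b"moov") + make_box(b"mdat", b"data")
    slow = ftyp + make_box(b"mdat", b"data") + make_box(b"moov")
    print(moov_precedes_mdat(fast), moov_precedes_mdat(slow))
```

For real files the mdat box can be very large, so a practical tool would read only the 8 or 16 header bytes of each box and seek past the payload instead of loading the whole file into memory.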
3GP (3GPP file format) is a multimedia container format defined by the Third Generation Partnership Project (3GPP) for 3G UMTS multimedia services. It is used on 3G mobile phones but can also be played on some 2G and 4G phones.
3G2 (3GPP2 file format) is a multimedia container format defined by the 3GPP2 for 3G CDMA2000 multimedia services. It is very similar to the 3GP file format but consumes less space and bandwidth, and it has some extensions and limitations in comparison to 3GP.
Adaptive Multi-Rate Wideband
Adaptive Multi-Rate Wideband (AMR-WB) is a patented wideband speech audio coding standard developed based on Adaptive Multi-Rate encoding, using similar methodology as algebraic code excited linear prediction (ACELP). AMR-WB provides improved speech quality due to a wider speech bandwidth of 50–7000 Hz compared to narrowband speech coders which in general are optimized for POTS wireline quality of 300–3400 Hz. AMR-WB was developed by Nokia and VoiceAge and it was first specified by 3GPP.
AMR-WB is codified as G.722.2, an ITU-T standard speech codec, formally known as Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB). G.722.2 AMR-WB is the same codec as the 3GPP AMR-WB. The corresponding 3GPP specifications are TS 26.190 for the speech codec and TS 26.194 for the Voice Activity Detector.
The AMR-WB format has the following parameters:
Frequency bands processed: 50-6400 Hz (all modes) plus 6400-7000 Hz (23.85 kbit/s mode only)
Delay frame size: 20 ms
Look ahead: 5 ms
AMR-WB codec employs a bandsplitting filter; the one-way delay of this filter is 0.9375 ms
Complexity: 38 WMOPS, RAM 5.3KWords
Voice activity detection, discontinuous transmission, comfort noise generator
Fixed point: Bit-exact C
Floating point: under work.
A common file extension for the AMR-WB file format is .awb. There also exists another storage format for AMR-WB that is suitable for applications with more advanced demands on the storage format, like random access or synchronization with video. This format is the 3GPP-specified 3GP container format based on the ISO base media file format. 3GP also allows use of AMR-WB bit streams for stereo sound.
Adaptive Multi-Rate audio codec
The Adaptive Multi-Rate (AMR, AMR-NB or GSM-AMR) audio codec is an audio compression format optimized for speech coding. The AMR speech codec consists of a multi-rate narrowband speech codec that encodes narrowband (200–3400 Hz) signals at variable bit rates ranging from 4.75 to 12.2 kbit/s, with toll quality speech starting at 7.4 kbit/s.
AMR was adopted as the standard speech codec by 3GPP in October 1999 and is now widely used in GSM and UMTS. It uses link adaptation to select from one of eight different bit rates based on link conditions.
AMR is also a file format for storing spoken audio using the AMR codec. Many modern mobile telephone handsets can store short audio recordings in the AMR format, and both free and proprietary programs exist to convert between this and other formats, although AMR is a speech format and is unlikely to give ideal results for other audio. The common filename extension is .amr. There also exists another storage format for AMR that is suitable for applications with more advanced demands on the storage format, like random access or synchronization with video. This format is the 3GPP-specified 3GP container format based on the ISO base media file format.
Codec
A codec is a device or computer program for encoding or decoding a digital data stream or signal. Codec is a portmanteau of coder-decoder.
A coder encodes a data stream or a signal for transmission or storage, possibly in encrypted form, and the decoder function reverses the encoding for playback or editing. Codecs are used in videoconferencing, streaming media, and video editing applications.
Digital container format
A container or wrapper format is a metafile format whose specification describes how different elements of data and metadata coexist in a computer file.
Among the earliest cross-platform container formats were Distinguished Encoding Rules and the 1985 Interchange File Format. Containers are frequently used in multimedia applications.
Flash Video
Flash Video is a container file format used to deliver digital video content (e.g., TV shows, movies, etc.) over the Internet using Adobe Flash Player version 6 and newer. Flash Video content may also be embedded within SWF files. There are two different video file formats known as Flash Video: FLV and F4V. The audio and video data within FLV files are encoded in the same manner as they are within SWF files. The F4V file format is based on the ISO base media file format and is supported starting with Flash Player 9 update 3. Both formats are supported in Adobe Flash Player and developed by Adobe Systems. FLV was originally developed by Macromedia.
In the early 2000s, Flash Video was the de facto standard for web-based streaming video (over RTMP). Notable users of it include Hulu, VEVO, Yahoo! Video, metacafe, Reuters.com, and many other news providers.
Flash Video FLV files usually contain material encoded with codecs following the Sorenson Spark or VP6 video compression formats. The most recent public releases of Flash Player (a collaboration between Adobe Systems and MainConcept) also support H.264 video and HE-AAC audio. All of these compression formats are restricted by patents. Flash Video is viewable on most operating systems via the Adobe Flash Player and web browser plugin or one of several third-party programs. Apple's iOS devices, along with almost all other mobile devices, do not support the Flash Player plugin and so require other delivery methods, such as those provided by the Adobe Flash Media Server.
G.719
G.719 is an ITU-T standard audio coding format providing high quality, moderate bit rate (32 to 128 kbit/s) wideband (20 Hz - 20 kHz audio bandwidth, 48 kHz audio sample rate) audio coding at low computational load. It was produced through a collaboration between Polycom and Ericsson.
G.719 incorporates elements of Polycom's Siren22 codec (22 kHz) and Ericsson codec technology, as well as Polycom's Siren7 and Siren14 codecs (G.722.1 and G.722.1 Annex C), which have been used in videoconferencing systems for many years. As ITU-T Recommendation G.719, it was approved on June 13, 2008.
G.719 is optimized for both speech and music. It is based on transform coding with adaptive time-resolution, adaptive bit-allocation and low complexity lattice vector quantization. The computational complexity is quite low (18 floating-point MIPS) for an efficient high-quality compressor. The codec operates on 20 ms frames, and the algorithmic delay end-to-end is 40 ms. The encoder input and decoder output are sampled at 48 kHz.
In addition to the nominal bit rates of 32, 48 and 64 kbit/s, the G.719 codec has an inherent feature of flexible rate selection. In fact, it is possible to accommodate any rate between 32 kbit/s and 64 kbit/s by steps of 4 kbit/s. Moreover, the codec can also provide higher rates than 64 kbit/s and up to 128 kbit/s.
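The rate grid described above, combined with the 20 ms frame length stated for the codec, gives a simple way to estimate how many bits each coded frame occupies. The Python sketch below is a back-of-the-envelope calculation only: it assumes every frame carries exactly bit rate times frame duration bits and ignores any transport or file-format overhead. The function names are invented for this example.

```python
from typing import List

FRAME_MS = 20  # frame duration in milliseconds, as stated for G.719


def flexible_rates_kbps() -> List[int]:
    """Rates from 32 to 64 kbit/s in 4 kbit/s steps, per the text above."""
    return list(range(32, 64 + 1, 4))


def bits_per_frame(rate_kbps: int) -> int:
    """Bits in one frame: (kbit/s) * (ms) = bits, assuming a constant rate."""
    return rate_kbps * FRAME_MS


if __name__ == "__main__":
    for rate in flexible_rates_kbps():
        bits = bits_per_frame(rate)
        print(f"{rate} kbit/s: {bits} bits ({bits // 8} bytes) per frame")
```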
Amendment 1 of the ITU-T G.719 specification defined the use of the ISO base media file format (ISO/IEC 14496-12 a.k.a. MPEG-4 Part 12) as container for the G.719 bitstream. It also defined stereo and multichannel use of G.719 bitstreams in the ISO base media file format. It addresses non-conversational use cases of the codec (e.g. call waiting music playback and recording of teleconferencing sessions, voice mail messages). Thus, media file formats such as MP4 (audio/mp4 or video/mp4) and 3GP (audio/3GPP and video/3GPP) can contain G.719-encoded audio.
RFC 5404 defined media type audio/G719.
High Efficiency Image File Format
High Efficiency Image File Format (HEIF) is a file format for individual images and image sequences. It was developed by the Moving Picture Experts Group (MPEG) and is defined by MPEG-H Part 12 (ISO/IEC 23008-12). The MPEG group claims that twice as much information can be stored in a HEIF image as in a JPEG image of the same size, resulting in a better quality image.
The HEIF specification also defines the means of storing High Efficiency Video Codec (HEVC)-encoded intra images and HEVC-encoded image sequences in which inter prediction is applied in a constrained manner.
HEIF files are compatible with the ISO Base Media File Format (ISOBMFF, ISO/IEC 14496-12) and can also include other media streams, such as timed text and audio.
HEIF image files are stored with filename extensions .heif or .heic.
ISMACryp
The ISMA Encryption and Authentication, Version 1.1 specification (or ISMACryp) specifies encryption and message authentication services for MPEG-4 over RTP streams. It was defined by the Internet Streaming Media Alliance and published on September 15, 2006.
The ISMA Encryption and Authentication, Version 2.0 specification specifies content encryption, message authentication (integrity) services, an RTP payload format and a file format for pre-encrypted content for ISMA 1.0, ISMA 2.0 and, more generally, any media that can be stored as an elementary stream in the ISO base media file format (ISO/IEC 14496-12). The specification was published on 15 November 2007. The ISMACryp specification defined extensions over the ISO base media file format, which were registered by the registration authority for code-points in "MP4 Family" files. The ISMACryp 2.0 specification, in an informative "Annex F", provides guidelines on how ISMACryp can be used together with the key and rights management system of OMA DRM v2 (Open Mobile Alliance DRM). The Packetized OMA DRM Content Format is largely based on the ISMACryp format.
There are two alternatives to ISMACryp, SRTP and IPsec, that can also be used to provide service and content protection. The difference between the three is the level at which encryption is done. Whereas ISMACryp encrypts MPEG-4 access units (that are in the RTP payload), SRTP encrypts the whole RTP payload, and IPsec encrypts packets at the IP layer.
JPEG 2000
JPEG 2000 (JP2) is an image compression standard and coding system. It was created by the Joint Photographic Experts Group committee in 2000 with the intention of superseding their original discrete cosine transform-based JPEG standard (created in 1992) with a newly designed, wavelet-based method. The standardized filename extension is .jp2 for ISO/IEC 15444-1 conforming files and .jpx for the extended part-2 specifications, published as ISO/IEC 15444-2. The registered MIME types are defined in RFC 3745. For ISO/IEC 15444-1 it is image/jp2.
JPEG 2000 code streams offer several mechanisms to support spatial random access or region-of-interest access at varying degrees of granularity. It is possible to store different parts of the same picture using different quality.
MPEG-21
The MPEG-21 standard, from the Moving Picture Experts Group, aims at defining an open framework for multimedia applications. MPEG-21 is ratified in the standards ISO/IEC 21000 - Multimedia framework (MPEG-21).
MPEG-21 is based on two essential concepts:
definition of a Digital Item (a fundamental unit of distribution and transaction)
users interacting with Digital Items
Digital Items can be considered the kernel of the Multimedia Framework, and the users are those who interact with them inside the Multimedia Framework. At its most basic level, MPEG-21 provides a framework in which one user interacts with another, and the object of that interaction is a Digital Item. The main objective of MPEG-21 is therefore to define the technology needed to support users in exchanging, accessing, consuming, trading or manipulating Digital Items in an efficient and transparent way.
MPEG-21 Part 9: File Format defined the storage of an MPEG-21 Digital Item in a file format based on the ISO base media file format, with some or all of the Digital Item's ancillary data (such as movies, images or other non-XML data) within the same file. It uses filename extensions .m21 or .mp21 and MIME type application/mp21.
MPEG-4
MPEG-4 is a method of defining compression of audio and visual (AV) digital data. It was introduced in late 1998 and designated a standard for a group of audio and video coding formats and related technology agreed upon by the ISO/IEC Moving Picture Experts Group (MPEG) (ISO/IEC JTC1/SC29/WG11) under the formal standard ISO/IEC 14496 – Coding of audio-visual objects. Uses of MPEG-4 include compression of AV data for web (streaming media) and CD distribution, voice (telephone, videophone) and broadcast television applications.
MPEG-4 Part 14
MPEG-4 Part 14 or MP4 is a digital multimedia container format most commonly used to store video and audio, but it can also be used to store other data such as subtitles and still images. Like most modern container formats, it allows streaming over the Internet. The only official filename extension for MPEG-4 Part 14 files is .mp4. MPEG-4 Part 14 (formally ISO/IEC 14496-14:2003) is a standard specified as a part of MPEG-4.
Portable media players are sometimes advertised as "MP4 Players", although some are simply MP3 Players that also play AMV video or some other video format, and do not necessarily play the MPEG-4 Part 14 format.
MPEG-4 Part 17
MPEG-4 Part 17, or MPEG-4 Timed Text, or MPEG-4 Streaming text format, is the text-based subtitle format for MPEG-4, published as ISO/IEC 14496-17 in 2006. It was developed in response to the need for a generic method for coding of text as one of the multimedia components within audiovisual presentations.
It is also streamable, which was one of the main aims when creating the format. It is mainly intended for use in the .mp4 container, but can also be used in the .3gp container (as 3GPP Timed Text), which is technically almost identical to .mp4 but more widely used in cell phones. 3GPP Timed Text is exactly the same as MPEG-4 Timed Text when used in the .mp4 container. It can also be used in other file formats based on the ISO base media file format.
3GPP approved the Timed Text format for 3G multimedia services in 3GPP TS 26.245 in 2004. MPEG-4 Part 17 (ISO/IEC 14496-17:2006) defined Text Streams that are capable of carrying 3GPP Timed Text. For 3GPP text streams, ISO/IEC 14496-17:2006 defined a generic framing structure suitable for transport of 3GPP text streams across a variety of networks (RTP, MPEG transport stream and MPEG program stream). The framing structure for text streams consists of so-called Timed Text Units (TTU).
MPEG-H
MPEG-H is a group of standards under development by the ISO/IEC Moving Picture Experts Group (MPEG) for a digital container standard, a video compression standard, an audio compression standard, and two conformance testing standards. The group of standards is formally known as ISO/IEC 23008 - High efficiency coding and media delivery in heterogeneous environments.
MPEG-H consists of the following parts:
MPEG-H Part 1: MPEG media transport (MMT) – A media streaming format similar to the Real-time Transport Protocol that is adaptable to different networks.
MPEG-H Part 2: High Efficiency Video Coding (under joint development with the ITU-T Video Coding Experts Group) – A video compression standard that doubles the data compression ratio compared to H.264/MPEG-4 AVC and can support resolutions up to 8192×4320.
MPEG-H Part 3: 3D Audio – An audio compression standard for 3D audio that can support many loudspeakers.
MPEG-H Part 4: MMT Reference Software
MPEG-H Part 5: HEVC Reference Software
MPEG-H Part 6: 3D Audio Reference Software
MPEG-H Part 7: MMT Conformance Testing
MPEG-H Part 8: HEVC Conformance Testing
MPEG-H Part 9: 3D Audio Conformance Testing
MPEG-H Part 10: MMT FEC Codes
MPEG-H Part 11: MMT Composition Coding
MPEG-H Part 12: High Efficiency Image File Format based on the ISO base media file format
MPEG-H Part 13: MMT Implementation Guidelines
MPEG Common Encryption
MPEG Common Encryption (abbreviated MPEG-CENC) refers to a set of two MPEG standards governing different container formats:
for ISOBMFF, Common encryption in ISO base media file format files (ISO/IEC 23001-7:2016)
for MPEG-TS, Common encryption of MPEG-2 transport streams (ISO/IEC 23001-9:2016)
The specifications are compatible, so that conversion between the encrypted formats can happen without re-encryption.
They define metadata, specific to each format, about which parts of the stream are encrypted and by which encryption scheme. Each encryption scheme may have different methods to retrieve the decryption key.
QuickTime
QuickTime is an extensible multimedia framework developed by Apple Inc., capable of handling various formats of digital video, picture, sound, panoramic images, and interactivity. QuickTime was first released in 1991; the latest Mac version, QuickTime X, is available on Mac OS X Snow Leopard and newer. Apple ceased support for the Windows version of QuickTime in 2016.
As of Mac OS X Lion, the underlying media framework for QuickTime, QTKit, is deprecated in favor of a newer graphics framework, AV Foundation. In iOS, the video player used to play videos from the Internet was QuickTime-based.
QuickTime File Format
QuickTime File Format (QTFF) is a computer file format used natively by the QuickTime framework.