In the IEEE 754-2008 standard, the 16-bit base-2 format is referred to as binary16. It is intended for storage of floating-point values in applications where higher precision is not essential for performing arithmetic computations.
Although implementations of IEEE half-precision floating point are relatively new, several earlier 16-bit floating-point formats existed, including those of Hitachi's HD61810 DSP (1982), Scott's WIF, and the 3dfx Voodoo Graphics processor.
Nvidia and Microsoft defined the half datatype in the Cg language, released in early 2002, and implemented it in silicon in the GeForce FX, released in late 2002. ILM was searching for an image format that could handle a wide dynamic range, but without the hard drive and memory cost of floating-point representations that are commonly used for floating-point computation (single and double precision). The hardware-accelerated programmable shading group led by John Airey at SGI (Silicon Graphics) invented the s10e5 data type in 1997 as part of the 'bali' design effort. This is described in a SIGGRAPH 2000 paper (see section 4.3) and further documented in US patent 7518615.
This format is used in several computer graphics environments including OpenEXR, JPEG XR, GIMP, OpenGL, Cg, and D3DX. The advantage over 8-bit or 16-bit binary integers is that the increased dynamic range allows for more detail to be preserved in highlights and shadows for images. The advantage over 32-bit single-precision binary formats is that it requires half the storage and bandwidth (at the expense of precision and range).
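The IEEE 754 standard specifies binary16 as having a sign bit (1 bit), an exponent field (5 bits), and a significand with 11 bits of precision, of which 10 are explicitly stored.
The format is laid out, from the most significant bit to the least significant bit, as the sign bit, the 5 exponent bits, and the 10 stored significand bits.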
The format is assumed to have an implicit leading bit with value 1 unless the exponent field is stored with all zeros. Thus only 10 bits of the significand appear in the memory format, but the total precision is 11 bits. In IEEE 754 parlance, there are 10 bits of significand, but there are 11 bits of significand precision (log10(2^11) ≈ 3.311 decimal digits, or 4 digits ± slightly less than 5 units in the last place).
The half-precision binary floating-point exponent is encoded using an offset-binary representation with a zero offset of 15, known as the exponent bias in the IEEE 754 standard.
Thus, to get the true exponent, the offset of 15 has to be subtracted from the stored exponent.
The stored exponents 00000₂ and 11111₂ are interpreted specially.
|Exponent||Significand = zero||Significand ≠ zero||Equation|
|00000₂||zero, −0||subnormal numbers||(−1)^signbit × 2^−14 × 0.significandbits₂|
|00001₂, ..., 11110₂||normalized value||normalized value||(−1)^signbit × 2^(exponent−15) × 1.significandbits₂|
|11111₂||±infinity||NaN (quiet, signalling)|
The minimum strictly positive (subnormal) value is 2^−24 ≈ 5.96 × 10^−8. The minimum positive normal value is 2^−14 ≈ 6.10 × 10^−5. The maximum representable value is (2 − 2^−10) × 2^15 = 65504.
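The interpretation rules in the table above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function names `decode_binary16` and `decode_with_struct` are chosen here for the example and are not part of any standard. The second function cross-checks the result against the `e` (binary16) format code of Python's standard `struct` module.

```python
import struct


def decode_binary16(bits: int) -> float:
    """Decode a 16-bit pattern (given as an int) following the table above."""
    sign = -1.0 if (bits >> 15) & 0x1 else 1.0
    exponent = (bits >> 10) & 0x1F  # 5-bit biased exponent
    fraction = bits & 0x3FF         # 10 stored significand bits

    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed exponent of -14.
        return sign * 2.0 ** -14 * (fraction / 1024)
    if exponent == 0x1F:
        # All ones: infinity if the fraction is zero, otherwise NaN.
        return sign * float("inf") if fraction == 0 else float("nan")
    # Normal number: implicit leading 1, exponent bias of 15.
    return sign * 2.0 ** (exponent - 15) * (1 + fraction / 1024)


def decode_with_struct(bits: int) -> float:
    """Cross-check using the standard library's binary16 support."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


for pattern in (0x0001, 0x0400, 0x3C00, 0x7BFF, 0xC000):
    print(f"{pattern:#06x} -> {decode_binary16(pattern)!r}")
```

The three branches correspond to the three rows of the table: subnormals use the fixed scale 2^−14 without the implicit leading 1, normal numbers subtract the bias of 15, and the all-ones exponent is reserved for infinities and NaNs.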
These examples are given in bit representation of the floating-point value. This includes the sign bit, (biased) exponent, and significand.
0 00000 0000000001₂ = 0001₁₆ = 2^−14 × 2^−10 = 2^−24 ≈ 0.000000059605 (smallest positive subnormal number)
0 00000 1111111111₂ = 03ff₁₆ = 2^−14 × (1 − 2^−10) ≈ 0.000060976 (largest subnormal number)
0 00001 0000000000₂ = 0400₁₆ = 2^−14 ≈ 0.000061035 (smallest positive normal number)
0 11110 1111111111₂ = 7bff₁₆ = 2^15 × (1 + (1 − 2^−10)) = 65504 (largest normal number)
0 01110 1111111111₂ = 3bff₁₆ = 1 − 2^−11 ≈ 0.99951 (largest number less than one)
0 01111 0000000000₂ = 3c00₁₆ = 1 (one)
0 01111 0000000001₂ = 3c01₁₆ = 1 + 2^−10 ≈ 1.001 (smallest number larger than one)
1 10000 0000000000₂ = c000₁₆ = −2
0 00000 0000000000₂ = 0000₁₆ = 0
1 00000 0000000000₂ = 8000₁₆ = −0
0 11111 0000000000₂ = 7c00₁₆ = infinity
1 11111 0000000000₂ = fc00₁₆ = −infinity
0 01101 0101010101₂ = 3555₁₆ = 0.333251953125 ≈ 1/3
With the default rounding mode (round to nearest, ties to even), 1/3 rounds down in binary16, as it does in double precision, because the significand has an odd number of bits (11). The bits beyond the rounding point are 0101..., which is less than 1/2 of a unit in the last place.
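The rounding of 1/3 can be checked with Python's standard library, whose `struct` module packs the `e` format with round-to-nearest, ties-to-even. The sketch below is only an illustration; the variable names are chosen for this example. It uses exact rational arithmetic to express the rounding error in units in the last place (ulp), which for values in [1/4, 1/2) is 2^−12 in binary16.

```python
import struct
from fractions import Fraction

# Pack 1/3 as binary16 and read the resulting bit pattern back as an integer.
bits = int.from_bytes(struct.pack("<e", 1 / 3), "little")
stored = struct.unpack("<e", struct.pack("<H", bits))[0]

# Exact rounding error in ulps; one ulp here is 2^-2 * 2^-10 = 2^-12.
ulp = Fraction(1, 2 ** 12)
error_in_ulps = (Fraction(1, 3) - Fraction(stored)) / ulp

print(hex(bits))       # 0x3555
print(stored)          # 0.333251953125
print(error_in_ulps)   # 1/3
```

The error of 1/3 ulp is positive, so the stored value lies below 1/3, and it is smaller than 1/2 ulp, which is why the value is rounded down rather than up.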
ARM processors support (via a floating point control register bit) an "alternative half-precision" format, which does away with the special case for an exponent value of 31 (11111₂). It is almost identical to the IEEE format, but there is no encoding for infinity or NaNs; instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008.
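The quoted range follows if codes with an exponent field of 31 are decoded by the ordinary normal-number rule, 2^(31−15) × 1.fraction. The short sketch below assumes exactly that rule; the function name is hypothetical and the code does not model any actual ARM instruction or register.

```python
def decode_exponent_31_alternative(fraction: int) -> float:
    """Value of a positive code with exponent field 31 in the alternative format.

    Assumption: the usual normal-number formula applies, with bias 15.
    """
    if not 0 <= fraction <= 0x3FF:
        raise ValueError("fraction must fit in 10 bits")
    return 2.0 ** (31 - 15) * (1 + fraction / 1024)


print(decode_exponent_31_alternative(0x000))  # 65536.0
print(decode_exponent_31_alternative(0x3FF))  # 131008.0
```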
The bfloat16 floating-point format is a computer number format occupying 16 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. This format is a truncated (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format (binary32) with the intent of accelerating machine learning and near-sensor computing. It preserves the approximate dynamic range of 32-bit floating-point numbers by retaining 8 exponent bits, but supports only an 8-bit precision rather than the 24-bit significand of the binary32 format. More so than single-precision 32-bit floating-point numbers, bfloat16 numbers are unsuitable for integer calculations, but this is not their intended use.
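Because bfloat16 is the upper half of a binary32 value, a conversion can be sketched by keeping the top 16 bits of the 32-bit pattern. The example below is a simplified illustration that truncates (rounds toward zero); it is an assumption of this sketch, and real hardware or libraries may round to nearest instead. The function names are made up for the example.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the binary32 bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Keep the sign, the 8 exponent bits and the top 7 fraction bits."""
    return float32_bits(x) >> 16


def from_bfloat16_bits(bits: int) -> float:
    """Widen a bfloat16 pattern back to binary32 by appending 16 zero bits."""
    return struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))[0]


pi_bits = to_bfloat16_bits(3.141592653589793)
print(hex(pi_bits))                 # 0x4049
print(from_bfloat16_bits(pi_bits))  # 3.140625
```

The round trip shows the loss of precision: only about two to three decimal digits of π survive, while the exponent range of binary32 is kept.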
The bfloat16 format is utilized in upcoming Intel AI processors, such as Nervana NNP-L1000, Xeon processors, and Intel FPGAs, Google Cloud TPUs, and TensorFlow.
DisplayID
DisplayID is a VESA standard for metadata describing display device capabilities to the video source. It is designed to replace the E-EDID standard and the EDID structure v1.4.
The DisplayID standard was initially released in December 2007. Version 1.1 was released in March 2009 and was followed by version 1.2, released in August 2011. Version 1.3 was released in June 2013, and the current version, 2.0, was released in September 2017.
DisplayID uses variable-length structures of up to 256 bytes each, which encompass all existing EDID extensions as well as new extensions for 3D displays, embedded displays, wide color gamut and HDR EOTF. The DisplayID format includes several blocks that describe logical parts of the display, such as video interfaces, display device technology, timing details and manufacturer information. Data blocks are identified with a unique tag. The length of each block can be variable or fixed to a specific number of bytes. Only the base data block is mandatory, while all extension blocks are optional. This variable structure is based on CEA EDID Extension Block Version 3, first defined in CEA-861-B.
The DisplayID standard is freely available and is royalty-free to implement.
F16
F16, F 16, F-16 or FI6 may refer to:
General Dynamics F-16 Fighting Falcon, a 1974 American multirole fighter jet aircraft
FI6 (antibody), an antibody that targets influenza A viruses
F 16 Uppsala, a Swedish air force base
Formula 16, a 5 metre catamaran
Volvo F16, a truck
the ICD-10 code for mental and behavioural disorders due to use of hallucinogens
Eric Fehr, a Canadian ice hockey player and author who wears #16 for the Toronto Maple Leafs of the National Hockey League
Half-precision floating-point format, a 16-bit computer number format
Minifloat
In computing, minifloats are floating-point values represented with very few bits. Predictably, they are not well suited for general-purpose numerical calculations. They are used for special purposes, most often in computer graphics, where iterations are small and precision has aesthetic effects. Additionally, they are frequently encountered as a pedagogical tool in computer-science courses to demonstrate the properties and structures of floating-point arithmetic and IEEE 754 numbers.
Minifloats with 16 bits are half-precision numbers (as opposed to single and double precision). There are also minifloats with 8 bits or even fewer.
Minifloats can be designed following the principles of the IEEE 754 standard. In this case they must obey the (not explicitly written) rules for the frontier between subnormal and normal numbers and must have special patterns for infinity and NaN. Normalized numbers are stored with a biased exponent. The new revision of the standard, IEEE 754-2008, has 16-bit binary minifloats.
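The IEEE 754 principles mentioned above (biased exponent, subnormals at exponent field zero, infinity and NaN at the all-ones exponent) can be sketched as one parameterized decoder. This is an illustrative sketch only: the function name and the 8-bit layouts used below (1 sign bit, 4 exponent bits, 3 significand bits) are hypothetical choices for the example, not formats defined by any standard.

```python
def decode_minifloat(bits: int, exp_bits: int, man_bits: int, bias: int) -> float:
    """Decode an IEEE-style minifloat with the given field widths and bias."""
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    scale = float(1 << man_bits)

    if exponent == (1 << exp_bits) - 1:
        # All-ones exponent: infinity or NaN.
        return sign * float("inf") if mantissa == 0 else float("nan")
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * (mantissa / scale) * 2.0 ** (1 - bias)
    # Normal: implicit leading 1.
    return sign * (1 + mantissa / scale) * 2.0 ** (exponent - bias)


# Hypothetical 8-bit layout with bias 7: the frontier between the largest
# subnormal and the smallest normal number is a single subnormal step.
largest_subnormal = decode_minifloat(0b0_0000_111, 4, 3, 7)
smallest_normal = decode_minifloat(0b0_0001_000, 4, 3, 7)
print(largest_subnormal, smallest_normal)

# Same layout with bias -2: every finite value is an integer, starting at 1.
print(decode_minifloat(0b0_0000_001, 4, 3, -2))  # 1.0
```

Using the same exponent, 1 − bias, for subnormals and for the smallest normal numbers is what makes the transition between the two ranges seamless. With 5 exponent bits, 10 significand bits and a bias of 15, the same function decodes half-precision values.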
The Radeon R300 and R420 GPUs used an "fp24" floating-point format with 7 bits of exponent and 16 bits (+1 implicit) of mantissa.
"Full Precision" in Direct3D 9.0 is a proprietary 24-bit floating-point format. Microsoft's D3D9 (Shader Model 2.0) graphics API initially supported both FP24 (as in ATI's R300 chip) and FP32 (as in Nvidia's NV30 chip) as "Full Precision", as well as FP16 as "Partial Precision" for vertex and pixel shader calculations performed by the graphics hardware.
In computer graphics, minifloats are sometimes used to represent only integral values. If subnormal values should exist at the same time, the least subnormal number has to be 1, and this requirement determines the bias value. Following the IEEE 754 convention for subnormals, a format with m stored significand bits has a least subnormal value of 2^(1 − bias − m), so the bias must be 1 − m.
Precision (computer science)
In computer science, the precision of a numerical quantity is a measure of the detail in which the quantity is expressed. This is usually measured in bits, but sometimes in decimal digits. It is related to precision in mathematics, which describes the number of digits that are used to express a value.
Some of the standardized precision formats are
Half-precision floating-point format
Single-precision floating-point format
Double-precision floating-point format
Quadruple-precision floating-point format
Octuple-precision floating-point format
Of these, octuple-precision format is rarely used. The single- and double-precision formats are most widely used and supported on nearly all platforms. The use of half-precision format has been increasing especially in the field of machine learning since many machine learning algorithms are inherently error-tolerant.
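Since precision is usually measured in bits but sometimes in decimal digits, the two can be related by multiplying the bit count by log10(2). The sketch below does this for the formats listed above, using the significand precisions of the corresponding IEEE 754 binary formats (including the implicit leading bit). The dictionary and function names are chosen for this example.

```python
import math

# Significand precision in bits, including the implicit leading bit.
PRECISION_BITS = {
    "half": 11,
    "single": 24,
    "double": 53,
    "quadruple": 113,
    "octuple": 237,
}


def decimal_digits(precision_bits: int) -> float:
    """Approximate number of decimal digits carried by a binary precision."""
    return precision_bits * math.log10(2)


for name, p in PRECISION_BITS.items():
    print(f"{name:>9}: {p:3d} bits, about {decimal_digits(p):.2f} decimal digits")
```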
See also platform-dependent and independent units of information