ISO/IEC 10967

ISO/IEC 10967, Language independent arithmetic (LIA), is a series of standards on computer arithmetic. It is compatible with ISO/IEC/IEEE 60559:2011, more known as IEEE 754-2008, and much of the specifications are for IEEE 754 special values (though such values are not required by LIA itself, unless the parameter iec559 is true). It was developed by the working group ISO/IEC JTC1/SC22/WG11, which was disbanded in 2011.[1]

LIA currently consists of three parts:

  • Part 1: Integer and floating point arithmetic, second edition published 2012.
  • Part 2: Elementary numerical functions, first edition published 2001.
  • Part 3: Complex integer and floating point arithmetic and complex elementary numerical functions, first edition published 2006.

Parts

Part 1

Part 1 deals with the basic integer and floating point datatypes (for multiple radices, including 2 and 10), but unlike IEEE 754-2008 not the representation of the values. Part 1 also deals with basic arithmetic, including comparisons, on values of such datatypes. The parameter iec559 is expected to be true for most implementations of LIA-1.

Part 1 was revised, to the second edition, to become more in line with the specifications in parts 2 and 3.

Part 2

Part 2 deals with some additional "basic" operations on integer and floating point datatype values, but focuses primarily on specifying requirements on numerical versions of elementary functions. Much of the specifications in LIA-2 are inspired by the specifications in Ada for elementary functions.

Part 3

Part 3 generalizes parts 1 and 2 to deal with imaginary and complex datatypes and arithmetic and elementary functions on such values. Much of the specifications in LIA-3 are inspired by the specifications for imaginary and complex datatypes and operations in C, Ada and Common Lisp.

Bindings

Each of the parts provide suggested bindings for a number of programming languages. These are not part of the LIA standards, just suggestions, and are not complete. Authors of a programming language standard may wish to alter the suggestions before any incorporation in the programming language standard.

Currently (2013) the standards for C, C++, and Modula-2 have partial bindings to LIA-1.

See also

References

  1. ^ "JTC1/SC22/WG11 – Binding Techniques". Home page. ISO/IEC. Retrieved 7 June 2017.

External links

  • ISO/IEC 10967-1:2012, complete text of Part 1: Integer and floating point arithmetic.
  • ISO/IEC 10967-2:2001, complete text of Part 2: Elementary numerical functions.
  • ISO/IEC 10967-3:2006, complete text of Part 3: Complex integer and floating point arithmetic and complex elementary numerical functions.
Bfloat16 floating-point format

The bfloat16 floating-point format is a computer number format occupying 16 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. This format is a truncated (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format (binary32) with the intent of accelerating machine learning and near-sensor computing. It preserves the approximate dynamic range of 32-bit floating-point numbers by retaining 8 exponent bits, but supports only an 8-bit precision rather than the 24-bit significand of the binary32 format. More so than single-precision 32-bit floating-point numbers, bfloat16 numbers are unsuitable for integer calculations, but this is not their intended use.

The bfloat16 format is utilized in upcoming Intel AI processors, such as Nervana NNP-L1000, Xeon processors, and Intel FPGAs, Google Cloud TPUs, and TensorFlow.

Decimal128 floating-point format

In computing, decimal128 is a decimal floating-point computer numbering format that occupies 16 bytes (128 bits) in computer memory.

It is intended for applications where it is necessary to emulate decimal rounding exactly, such as financial and tax computations.

Decimal128 supports 34 decimal digits of significand and an exponent range of −6143 to +6144, i.e. ±0.000000000000000000000000000000000×10^−6143 to ±9.999999999999999999999999999999999×10^6144. (Equivalently, ±0000000000000000000000000000000000×10^−6176 to ±9999999999999999999999999999999999×10^6111.) Therefore, decimal128 has the greatest range of values compared with other IEEE basic floating point formats. Because the significand is not normalized, most values with less than 34 significant digits have multiple possible representations; 1×102=0.1×103=0.01×104, etc. Zero has 12288 possible representations (24576 if you include both signed zeros).

Decimal128 floating point is a relatively new decimal floating-point format, formally introduced in the 2008 version of IEEE 754 as well as with ISO/IEC/IEEE 60559:2011.

Decimal32 floating-point format

In computing, decimal32 is a decimal floating-point computer numbering format that occupies 4 bytes (32 bits) in computer memory.

It is intended for applications where it is necessary to emulate decimal rounding exactly, such as financial and tax computations. Like the binary16 format, it is intended for memory saving storage.

Decimal32 supports 7 decimal digits of significand and an exponent range of −95 to +96, i.e. ±0.000000×10^−95 to ±9.999999×10^96. (Equivalently, ±0000000×10^−101 to ±9999999×10^90.) Because the significand is not normalized (there is no implicit leading "1"), most values with less than 7 significant digits have multiple possible representations; 1×102=0.1×103=0.01×104, etc. Zero has 192 possible representations (384 when both signed zeros are included).

Decimal32 floating point is a relatively new decimal floating-point format, formally introduced in the 2008 version of IEEE 754 as well as with ISO/IEC/IEEE 60559:2011.

Decimal64 floating-point format

In computing, decimal64 is a decimal floating-point computer numbering format that occupies 8 bytes (64 bits) in computer memory.

It is intended for applications where it is necessary to emulate decimal rounding exactly, such as financial and tax computations.

Decimal64 supports 16 decimal digits of significand and an exponent range of −383 to +384, i.e. ±0.000000000000000×10^−383 to ±9.999999999999999×10^384. (Equivalently, ±0000000000000000×10^−398 to ±9999999999999999×10^369.) In contrast, the corresponding binary format, which is the most commonly used type, has an approximate range of ±0.000000000000001×10^−308 to ±1.797693134862315×10^308. Because the significand is not normalized, most values with less than 16 significant digits have multiple possible representations; 1×102=0.1×103=0.01×104, etc. Zero has 768 possible representations (1536 if both signed zeros are included).

Decimal64 floating point is a relatively new decimal floating-point format, formally introduced in the 2008 version of IEEE 754 as well as with ISO/IEC/IEEE 60559:2011.

Half-precision floating-point format

In computing, half precision is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory.

In the IEEE 754-2008 standard, the 16-bit base-2 format is referred to as binary16. It is intended for storage of floating-point values in applications where higher precision is not essential for performing arithmetic computations.

Although implementations of the IEEE Half-precision floating point are relatively new, several earlier 16-bit floating point formats have existed including that of Hitachi's HD61810 DSP of 1982, Scott's WIF and the 3dfx Voodoo Graphics processor.Nvidia and Microsoft defined the half datatype in the Cg language, released in early 2002, and implemented it in silicon in the GeForce FX, released in late 2002. ILM was searching for an image format that could handle a wide dynamic range, but without the hard drive and memory cost of floating-point representations that are commonly used for floating-point computation (single and double precision). The hardware-accelerated programmable shading group led by John Airey at SGI (Silicon Graphics) invented the s10e5 data type in 1997 as part of the 'bali' design effort. This is described in a SIGGRAPH 2000 paper (see section 4.3) and further documented in US patent 7518615.This format is used in several computer graphics environments including OpenEXR, JPEG XR, GIMP, OpenGL, Cg, and D3DX. The advantage over 8-bit or 16-bit binary integers is that the increased dynamic range allows for more detail to be preserved in highlights and shadows for images. The advantage over 32-bit single-precision binary formats is that it requires half the storage and bandwidth (at the expense of precision and range).The F16C extension allows x86 processors to convert half-precision floats to and from single-precision floats.

IEEE 754

The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point arithmetic established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many problems found in the diverse floating-point implementations that made them difficult to use reliably and portably. Many hardware floating-point units use the IEEE 754 standard.

The standard defines:

arithmetic formats: sets of binary and decimal floating-point data, which consist of finite numbers (including signed zeros and subnormal numbers), infinities, and special "not a number" values (NaNs)

interchange formats: encodings (bit strings) that may be used to exchange floating-point data in an efficient and compact form

rounding rules: properties to be satisfied when rounding numbers during arithmetic and conversions

operations: arithmetic and other operations (such as trigonometric functions) on arithmetic formats

exception handling: indications of exceptional conditions (such as division by zero, overflow, etc.)The current version, IEEE 754-2008 revision published in August 2008, includes nearly all of the original IEEE 754-1985 standard plus IEEE 854-1987 Standard for Radix-Independent Floating-Point Arithmetic.

Language-independent specification

A language-independent specification (LIS) is a programming language specification providing a common interface usable for defining semantics applicable toward arbitrary language bindings.

LIS's are language-agnostic; they mitigate the risk that a certain language binding might reduce compatibility with other languages. An ideal LIS allows the language bindings to take advantage of features of a programming language uncompromisingly.

Examples of LIS include Interface description language, Simplified Wrapper and Interface Generator and Common Language Infrastructure.

Recursive transcompiling can be used to distribute a language independent specification across many different technologies, with each technology potentially keeping an authoritative description of a different part of the specification. Recursive transcompiling provides the general methodology for distributing this authoritative information through the rest of the derivative code pipeline.

List of International Organization for Standardization standards, 10000-10999

This is a list of published International Organization for Standardization (ISO) standards and other deliverables. For a complete and up-to-date list of all the ISO standards, see the ISO catalogue.The standards are protected by copyright and most of them must be purchased. However, about 300 of the standards produced by ISO and IEC's Joint Technical Committee 1 (JTC1) have been made freely and publicly available.

Octuple-precision floating-point format

In computing, octuple precision is a binary floating-point-based computer number format that occupies 32 bytes (256 bits) in computer memory. This 256-bit octuple precision is for applications requiring results in higher than quadruple precision. This format is rarely (if ever) used and very few environments support it.

Quadruple-precision floating-point format

In computing, quadruple precision (or quad precision) is a binary floating point–based computer number format that occupies 16 bytes (128 bits) with precision more than twice the 53-bit double precision.

This 128-bit quadruple precision is designed not only for applications requiring results in higher than double precision, but also, as a primary function, to allow the computation of double precision results more reliably and accurately by minimising overflow and round-off errors in intermediate calculations and scratch variables. William Kahan, primary architect of the original IEEE-754 floating point standard noted, "For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format ... That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arithmetic was framed."In IEEE 754-2008 the 128-bit base-2 format is officially referred to as binary128.

Single-precision floating-point format

Single-precision floating-point format is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.

A floating-point variable can represent a wider range of numbers than a fixed-point variable of the same bit width at the cost of precision. A signed 32-bit integer variable has a maximum value of 231 − 1 = 2,147,483,647, whereas an IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2−23) × 2127 ≈ 3.4028235 × 1038. All integers with 6 or fewer significant decimal digits, and any number that can be written as 2n such that n is a whole number from -126 to 127, can be converted into an IEEE 754 floating-point value without loss of precision.

In the IEEE 754-2008 standard, the 32-bit base-2 format is officially referred to as binary32; it was called single in IEEE 754-1985. IEEE 754 specifies additional floating-point types, such as 64-bit base-2 double precision and, more recently, base-10 representations.

One of the first programming languages to provide single- and double-precision floating-point data types was Fortran. Before the widespread adoption of IEEE 754-1985, the representation and properties of floating-point data types depended on the computer manufacturer and computer model, and upon decisions made by programming-language designers. E.g., GW-BASIC's single-precision data type was the 32-bit MBF floating-point format.

Single precision is termed REAL in Fortran, SINGLE-FLOAT in Common Lisp, float in C, C++, C#, Java, Float in Haskell, and Single in Object Pascal (Delphi), Visual Basic, and MATLAB. However, float in Python, Ruby, PHP, and OCaml and single in versions of Octave before 3.2 refer to double-precision numbers. In most implementations of PostScript, and some embedded systems, the only supported precision is single.

Unit in the last place

In computer science and numerical analysis, unit in the last place or unit of least precision (ULP) is the spacing between floating-point numbers, i.e., the value the least significant digit represents if it is 1. It is used as a measure of accuracy in numeric calculations.

ISO standards by standard number
1–9999
10000–19999
20000+
IEC standards
ISO/IEC standards
Related

This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.