Primitive data type

In computer science, a primitive data type is either of the following:

  • a basic type is a data type provided by a programming language as a basic building block. Most languages allow more complicated composite types to be recursively constructed starting from basic types.
  • a built-in type is a data type for which the programming language provides built-in support.

In most programming languages, all basic data types are built-in. In addition, many languages also provide a set of composite data types. Opinions vary as to whether a built-in type that is composite should be considered "primitive".

Depending on the language and its implementation, primitive data types may or may not have a one-to-one correspondence with objects in the computer's memory. However, one usually expects operations on basic primitive data types to be the fastest language constructs there are. Integer addition, for example, can be performed as a single machine instruction, and some processors offer specific instructions to process sequences of characters with a single instruction. In particular, the C standard mentions that "a 'plain' int object has the natural size suggested by the architecture of the execution environment". This means that int is likely to be 32 bits long on a 32-bit architecture. Basic primitive types are almost always value types.
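
As an illustrative sketch in C (a hosted implementation is assumed), the sizes a particular compiler chooses for the basic integer types can be inspected with the sizeof operator; the standard only guarantees minimum ranges, so the printed values vary by platform:

    #include <stdio.h>

    int main(void) {
        /* Sizes are implementation-defined; the standard only guarantees minimums. */
        printf("char:      %zu byte(s)\n", sizeof(char));       /* always 1 by definition */
        printf("short:     %zu byte(s)\n", sizeof(short));
        printf("int:       %zu byte(s)\n", sizeof(int));        /* typically 4 on 32- and 64-bit platforms */
        printf("long:      %zu byte(s)\n", sizeof(long));
        printf("long long: %zu byte(s)\n", sizeof(long long));
        return 0;
    }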

Most languages do not allow the behavior or capabilities of primitive (either built-in or basic) data types to be modified by programs. Exceptions include Smalltalk, which permits all data types to be extended within a program, adding to the operations that can be performed on them or even redefining the built-in operations.

Overview

The range of primitive data types that is available depends on the specific programming language being used. For example, in C#, strings are a composite but built-in data type, whereas in modern dialects of BASIC and in JavaScript, they are treated as a primitive data type that is both basic and built-in.

Classic basic primitive types may include integers, floating-point numbers, booleans, and characters.

More sophisticated types which can be built-in include fixed-point numbers, complex numbers, and strings; the most common of these are described in the sections below.

Specific primitive data types

Integer numbers

An integer data type represents some range of mathematical integers. Integers may be either signed (allowing negative values) or unsigned (non-negative integers only). Common ranges are:

The signed ranges below assume a two's-complement representation.

  • 1 byte (8 bits): byte, octet, minimum size of char in C99 (see limits.h CHAR_BIT). Signed range: −128 to +127. Unsigned range: 0 to 255.
  • 2 bytes (16 bits): x86 word, minimum size of short and int in C. Signed range: −32,768 to +32,767. Unsigned range: 0 to 65,535.
  • 4 bytes (32 bits): x86 double word, minimum size of long in C, actual size of int for most modern C compilers,[1] pointer for IA-32-compatible processors. Signed range: −2,147,483,648 to +2,147,483,647. Unsigned range: 0 to 4,294,967,295.
  • 8 bytes (64 bits): x86 quadruple word, minimum size of long long in C, actual size of long for most modern C compilers,[1] pointer for x86-64-compatible processors. Signed range: −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807. Unsigned range: 0 to 18,446,744,073,709,551,615.
  • Unlimited size (bignum): signed range −2^(n−1) to +(2^(n−1) − 1) and unsigned range 0 to 2^n − 1, where n is the unbounded number of bits in use.
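
In C, the limits an implementation actually chooses are exposed through the limits.h header; a minimal sketch that prints a few of them:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        printf("CHAR_BIT  = %d\n", CHAR_BIT);     /* bits in a char, at least 8 */
        printf("INT_MIN   = %d\n", INT_MIN);
        printf("INT_MAX   = %d\n", INT_MAX);
        printf("UINT_MAX  = %u\n", UINT_MAX);
        printf("LLONG_MAX = %lld\n", LLONG_MAX);
        return 0;
    }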

Literals for integers can be written as regular Arabic numerals, consisting of a sequence of digits and with negation indicated by a minus sign before the value. However, most programming languages disallow the use of commas or spaces for digit grouping. Examples of integer literals are:

  • 42
  • 10000
  • -233000

There are several alternate methods for writing integer literals in many programming languages; a brief C sketch follows the list:

  • Most programming languages, especially those influenced by C, prefix an integer literal with 0X or 0x to represent a hexadecimal value, e.g. 0xDEADBEEF. Other languages may use a different notation, e.g. some assembly languages append an H or h to the end of a hexadecimal value.
  • Perl, Ruby, Java, Julia, D, Rust and Python (starting from version 3.6) allow embedded underscores for clarity, e.g. 10_000_000, and fixed-form Fortran ignores embedded spaces in integer literals.
  • In C and C++, a leading zero indicates an octal value, e.g. 0755. This was primarily intended to be used with Unix modes; however, it has been criticized because normal integers may also lead with zero.[2] As such, Python, Ruby, Haskell, and OCaml prefix octal values with 0O or 0o, following the layout used by hexadecimal values.
  • Several languages, including Java, C#, Scala, Python, Ruby, and OCaml, can represent binary values by prefixing a number with 0B or 0b.
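
As a brief sketch, the following C program uses several of these notations; note that the 0b prefix and the ' digit separator are standard only as of C23 (many compilers accept 0b as an extension in earlier modes), and that C uses ' rather than the underscore shown above for digit grouping:

    #include <stdio.h>

    int main(void) {
        unsigned int hex = 0xDEADBEEF;   /* hexadecimal: 0x prefix */
        int perm         = 0755;         /* octal: leading zero, as used for Unix modes */
        int mask         = 0b101010;     /* binary: 0b prefix (C23; earlier a common extension) */
        long big         = 10'000'000;   /* digit separator ' (C23; C++ since C++14) */

        printf("%u %o %d %ld\n", hex, perm, mask, big);
        return 0;
    }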

Booleans

A boolean type, typically denoted "bool" or "boolean", is a logical type that can have one of two values, "true" or "false". Although only one bit is necessary to accommodate the value set "true" and "false", programming languages typically implement boolean types as one or more bytes.

Many languages (e.g. Java, Pascal and Ada) implement booleans adhering to the concept of boolean as a distinct logical type. Languages, though, may implicitly convert booleans to numeric types at times to give extended semantics to booleans and boolean expressions or to achieve backwards compatibility with earlier versions of the language. For example, ANSI C (C89) and earlier versions of the language did not have a dedicated boolean type; instead, a numeric value of zero is interpreted as "false" and any other value as "true". C99 adds a distinct boolean type, made available as bool by including stdbool.h, and C++ supports bool as a built-in type with "true" and "false" as reserved words.
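
A minimal C sketch contrasting the traditional integer convention with the C99 bool type:

    #include <stdbool.h>   /* C99: defines bool, true, false */
    #include <stdio.h>

    int main(void) {
        int old_style = 5;            /* pre-C99 convention: any nonzero value is "true" */
        bool flag = (old_style != 0);

        if (old_style)                /* nonzero, so this branch is taken */
            printf("old_style is treated as true\n");

        printf("flag = %d, sizeof(bool) = %zu byte(s)\n", flag, sizeof(bool));
        return 0;
    }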

Floating-point numbers

A floating-point number represents a limited-precision rational number that may have a fractional part. These numbers are stored internally in a format equivalent to scientific notation, typically in binary but sometimes in decimal. Because floating-point numbers have limited precision, only a subset of real or rational numbers are exactly representable; other numbers can be represented only approximately.

Many languages have both a single precision (often called "float") and a double precision type.

Literals for floating point numbers include a decimal point, and typically use e or E to denote scientific notation. Examples of floating-point literals are:

  • 20.0005
  • 99.9
  • -5000.12
  • 6.02e23

Some languages (e.g., Fortran, Python, D) also have a complex number type comprising two floating-point numbers: a real part and an imaginary part.
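
A short C sketch of single-precision, double-precision, and (C99) complex values:

    #include <complex.h>   /* C99 complex type */
    #include <stdio.h>

    int main(void) {
        float  f = 6.02e23f;             /* single precision: about 7 significant decimal digits */
        double d = 20.0005;              /* double precision: about 15-16 significant digits */
        double complex z = 3.0 + 4.0*I;  /* real part 3.0, imaginary part 4.0 */

        printf("f = %e\n", f);
        printf("d = %.4f\n", d);
        printf("|z| = %f\n", cabs(z));   /* magnitude of the complex number: 5.0 */
        return 0;
    }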

Fixed-point numbers

A fixed-point number represents a limited-precision rational number that may have a fractional part. These numbers are stored internally in a scaled-integer form, typically in binary but sometimes in decimal. Because fixed-point numbers have limited precision, only a subset of real or rational numbers are exactly representable; other numbers can be represented only approximately. Fixed-point numbers also tend to have a more limited range of values than floating point, and so the programmer must be careful to avoid overflow in intermediate calculations as well as the final result.
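
Standard C has no built-in fixed-point type (outside the optional Embedded C extensions), but the scaled-integer representation is easy to sketch. The following hypothetical Q16.16 format stores a value multiplied by 2^16 in a 32-bit integer:

    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t q16_16;                 /* hypothetical 16.16 fixed-point type */
    #define Q_ONE 65536                     /* 2^16: the scaling factor */

    static q16_16 q_from_double(double x) { return (q16_16)(x * Q_ONE); }
    static double q_to_double(q16_16 x)   { return (double)x / Q_ONE; }

    /* Multiplication must rescale; the intermediate product needs 64 bits
       to avoid the overflow in intermediate calculations warned about above. */
    static q16_16 q_mul(q16_16 a, q16_16 b) {
        return (q16_16)(((int64_t)a * b) >> 16);
    }

    int main(void) {
        q16_16 a = q_from_double(1.5);
        q16_16 b = q_from_double(2.25);
        printf("1.5 * 2.25 = %f\n", q_to_double(q_mul(a, b)));  /* prints 3.375 */
        return 0;
    }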

Characters and strings

A character type (typically called "char") may contain a single letter, digit, punctuation mark, symbol, formatting code, control code, or some other specialized code (e.g., a byte order mark). In C, char is defined as the smallest addressable unit of memory. On most systems this is 8 bits; several standards, such as POSIX, require it to be this size. Some languages have two or more character types, for example a single-byte type for ASCII characters and a multi-byte type for Unicode characters. The term "character type" is normally used even for types whose values more precisely represent code units, for example a UTF-16 code unit as in Java and JavaScript.

Characters may be combined into strings. String data can include digits and other numerical symbols, but it is treated as text.

Strings are implemented in various ways, depending on the programming language. The simplest way to implement strings is as an array of characters, followed by a delimiting character used to signal the end of the string, usually NUL. These are referred to as null-terminated strings, and are usually found in languages with a low level of hardware abstraction, such as C and assembly language. While easy to implement, null-terminated strings have been criticized for causing buffer overflows. Most high-level scripting languages, such as Python, Ruby, and many dialects of BASIC, have no separate character type; strings with a length of one are normally used to represent single characters. Some languages, such as C++ and Java, can use null-terminated strings (usually for backwards compatibility), but additionally provide their own string class (std::string and java.lang.String, respectively) in the standard library.
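
A minimal C sketch of a null-terminated string stored as an array of characters:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* The literal below occupies 6 bytes: 'H' 'e' 'l' 'l' 'o' '\0'.
           The terminating NUL is what strlen and printf("%s") look for. */
        char greeting[] = "Hello";

        printf("%s has length %zu\n", greeting, strlen(greeting));  /* 5 */
        printf("but occupies %zu bytes\n", sizeof(greeting));       /* 6 */
        return 0;
    }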

Languages also differ in whether their strings are mutable or immutable. Mutable strings may be altered after their creation, whereas immutable strings maintain a constant size and content; the only way to alter an immutable string is to create a new one. There are advantages and disadvantages to each approach: although immutable strings are much less flexible, they are simpler and completely thread-safe. Some examples of languages that use mutable strings include C++, Perl, and Ruby, whereas languages with immutable strings include JavaScript, Lua, Python, and Go. A few languages, such as Objective-C, provide different types for mutable and immutable strings.

Literals for characters and strings are usually surrounded by quotation marks: sometimes, single quotes (') are used for characters and double quotes (") are used for strings. Python accepts either variant for its string notation.

Examples of character literals in C syntax are:

  • 'A'

Examples of string literals in C syntax are:

  • "A"
  • "Hello World"

Numeric data type ranges

Each numeric data type has a maximum and minimum value known as its range. Attempting to store a number outside that range may lead to compiler or runtime errors, or to incorrect calculations (due to truncation), depending on the language being used.

The range of a variable is based on the number of bytes used to save the value, and an integer data type is usually able to store 2^n values (where n is the number of bits that contribute to the value). For other data types (e.g. floating-point values) the range is more complicated and will vary depending on the method used to store it. There are also some types that do not use entire bytes, e.g. a boolean that requires a single bit and represents a binary value (although in practice a byte is often used, with the remaining 7 bits being redundant). Some programming languages (such as Ada and Pascal) also allow the opposite direction, that is, the programmer defines the range and precision needed to solve a given problem and the compiler chooses the most appropriate integer or floating-point type automatically.
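
The following C sketch computes these ranges directly from a bit count n, assuming a two's-complement representation and 1 ≤ n < 64:

    #include <stdint.h>
    #include <stdio.h>

    /* Print the value ranges of an n-bit integer type, for 1 <= n < 64. */
    static void print_range(unsigned n) {
        uint64_t unsigned_max = ((uint64_t)1 << n) - 1;        /* 2^n - 1 */
        int64_t  signed_min   = -((int64_t)1 << (n - 1));      /* -2^(n-1) */
        int64_t  signed_max   =  ((int64_t)1 << (n - 1)) - 1;  /* 2^(n-1) - 1 */

        printf("%2u bits: signed %lld to %lld, unsigned 0 to %llu\n", n,
               (long long)signed_min, (long long)signed_max,
               (unsigned long long)unsigned_max);
    }

    int main(void) {
        print_range(8);    /* -128 to 127, 0 to 255 */
        print_range(16);
        print_range(32);
        return 0;
    }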

References

  1. ^ a b Fog, Agner (2010-02-16). "Calling conventions for different C++ compilers and operating systems: Chapter 3, Data Representation" (PDF). Retrieved 2010-08-30.
  2. ^ ECMAScript 6th Edition draft: https://people.mozilla.org/~jorendorff/es6-draft.html#sec-literals-numeric-literals Archived 2013-12-16 at the Wayback Machine

Bfloat16 floating-point format

The bfloat16 floating-point format is a computer number format occupying 16 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. This format is a truncated (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format (binary32) with the intent of accelerating machine learning and near-sensor computing. It preserves the approximate dynamic range of 32-bit floating-point numbers by retaining 8 exponent bits, but supports only an 8-bit precision rather than the 24-bit significand of the binary32 format. More so than single-precision 32-bit floating-point numbers, bfloat16 numbers are unsuitable for integer calculations, but this is not their intended use.
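
A small C sketch of the truncating conversion, which simply keeps the upper 16 bits of the binary32 encoding (rounding and NaN handling are omitted for brevity):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Convert an IEEE 754 binary32 value to bfloat16 by truncation:
       bfloat16 is the upper 16 bits (sign, 8 exponent bits, top 7 fraction bits). */
    static uint16_t float_to_bfloat16(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);      /* reinterpret the float's bit pattern */
        return (uint16_t)(bits >> 16);
    }

    static float bfloat16_to_float(uint16_t b) {
        uint32_t bits = (uint32_t)b << 16;   /* the lower 16 fraction bits become zero */
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    int main(void) {
        float x = 3.14159265f;
        uint16_t b = float_to_bfloat16(x);
        printf("%f -> 0x%04x -> %f\n", x, b, bfloat16_to_float(b));
        return 0;
    }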

The bfloat16 format is utilized in upcoming Intel AI processors, such as Nervana NNP-L1000, Xeon processors, and Intel FPGAs, Google Cloud TPUs, and TensorFlow.

Bit

The bit is a basic unit of information in information theory, computing, and digital communications. The name is a portmanteau of binary digit. In information theory, one bit is typically defined as the information entropy of a binary random variable that is 0 or 1 with equal probability, or the information that is gained when the value of such a variable becomes known. As a unit of information, the bit has also been called a shannon, named after Claude Shannon.

As a binary digit, the bit represents a logical value, having only one of two values. It may be physically implemented with a two-state device. These state values are most commonly represented as either 0 or 1, but other representations such as true/false, yes/no, +/−, or on/off are possible. The correspondence between these values and the physical states of the underlying storage or device is a matter of convention, and different assignments may be used even within the same device or program.

The symbol for the binary digit is either simply bit per recommendation by the IEC 80000-13:2008 standard, or the lowercase character b, as recommended by the IEEE 1541-2002 and IEEE Std 260.1-2004 standards. A group of eight binary digits is commonly called one byte, but historically the size of the byte is not strictly defined.

Byte

The byte is a unit of digital information that most commonly consists of eight bits, representing a binary number. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures.

The size of the byte has historically been hardware dependent and no definitive standards existed that mandated the size – byte-sizes from 1 to 48 bits are known to have been used in the past. Early character encoding systems often used six bits, and machines using six-bit and nine-bit bytes were common into the 1960s. These machines most commonly had memory words of 12, 24, 36, 48 or 60 bits, corresponding to two, four, six, eight or 10 six-bit bytes. In this era, bytes in the instruction stream were often referred to as syllables, before the term byte became common.

The modern de facto standard of eight bits, as documented in ISO/IEC 2382-1:1993, is a convenient power of two permitting the values 0 through 255 for one byte (2^8 = 256, counting zero as one of the values). The international standard IEC 80000-13 codified this common meaning. Many types of applications use information representable in eight or fewer bits and processor designers optimize for this common usage. The popularity of major commercial computing architectures has aided in the ubiquitous acceptance of the eight-bit size. Modern architectures typically use 32- or 64-bit words, built of four or eight bytes.

The unit symbol for the byte was designated as the upper-case letter B by the International Electrotechnical Commission (IEC) and Institute of Electrical and Electronics Engineers (IEEE) in contrast to the bit, whose IEEE symbol is a lower-case b. Internationally, the unit octet, symbol o, explicitly denotes a sequence of eight bits, eliminating the ambiguity of the byte.

Character (computing)

In computer and machine-based telecommunications terminology, a character is a unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language. Examples of characters include letters, numerical digits, common punctuation marks (such as "." or "-"), and whitespace. The concept also includes control characters, which do not correspond to symbols in a particular natural language, but rather to other bits of information used to process text in one or more languages. Examples of control characters include carriage return or tab, as well as instructions to printers or other devices that display or otherwise process text.

Characters are typically combined into strings.

Church encoding

In mathematics, Church encoding is a means of representing data and operators in the lambda calculus. The Church numerals are a representation of the natural numbers using lambda notation. The method is named for Alonzo Church, who first encoded data in the lambda calculus this way.

Terms that are usually considered primitive in other notations (such as integers, booleans, pairs, lists, and tagged unions) are mapped to higher-order functions under Church encoding. The Church-Turing thesis asserts that any computable operator (and its operands) can be represented under Church encoding. In the untyped lambda calculus the only primitive data type is the function.
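
For example, the first few Church numerals, the successor function, and the Church booleans are the following lambda terms:

    0     := λf. λx. x
    1     := λf. λx. f x
    2     := λf. λx. f (f x)
    succ  := λn. λf. λx. f (n f x)
    true  := λa. λb. a
    false := λa. λb. b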

The Church encoding is not intended as a practical implementation of primitive data types. Its use is to show that other primitive data types are not required to represent any calculation. The completeness is representational. Additional functions are needed to translate the representation into common data types, for display to people. It is not possible in general to decide if two functions are extensionally equal due to the undecidability of equivalence from Church's theorem. The translation may apply the function in some way to retrieve the value it represents, or look up its value as a literal lambda term.

Lambda calculus is usually interpreted as using intensional equality. There are potential problems with the interpretation of results because of the difference between the intensional and extensional definition of equality.

Complex data type

Some programming languages provide a complex data type for complex number storage and arithmetic as a built-in (primitive) data type.

In some programming environments the term complex data type (in contrast to primitive data types) is a synonym for the composite data type.

Description Definition Language

DDL (Description Definition Language) is part of the MPEG-7 standard. It provides a set of tools for users to create their own Description Schemes (DSs) and Descriptors (Ds). DDL defines the syntax rules to define, combine, extend and modify Description Schemes and Descriptors.

G-Day (disambiguation)

G-Day is a series of events held by Google in Latin America, the Middle East, Africa and India for developers and others.

G-Day may also refer to:

In the military designation of days and hours, G-Day is the unnamed day on which an order, normally national, is given to deploy a unit

G'day, an Australian English greeting

gDay, a primitive data type in XML Schema (W3C)

ISO 10303-28

STEP-XML is a short term for ISO 10303-28, Industrial automation systems and integration—Product data representation and exchange—Part 28: Implementation methods: XML representations of EXPRESS schema and data.

STEP-XML specifies the use of the Extensible Markup Language (XML) to represent EXPRESS schema (ISO 10303-11) and the data that is governed by those EXPRESS schema. It is an alternative method to STEP-File for the exchange of data according to ISO 10303.

The following specifications are within the scope of ISO 10303-28:

  • Late Bound XML markup declaration set, independent of all EXPRESS schemas, to describe the XML representation of the data governed by each schema
  • Early Bound XML markup declaration sets, for each of the schemas, to describe the XML representation of the data governed by that specific schema
  • The mapping between the schema-specific and schema-independent XML markup declarations
  • The form of XML documents containing EXPRESS schemas and/or data governed by EXPRESS schemas
  • The XML markup declarations that enable XML representation of EXPRESS schemas
  • The representation of EXPRESS primitive data type values as element content and as XML attribute values

The following specifications are outside the scope of ISO 10303-28:

  • XML markup declarations that depend on the semantic intent of the corresponding EXPRESS schema
  • The mapping from an XML markup declaration to an EXPRESS schema. Note: Given an XML markup declaration set and its corresponding data set(s), it is possible to create an EXPRESS schema that captures the semantic intent of the data. However, this would require an understanding of the meaning and use of the data that may not be captured by the XML markup declarations.
  • The mapping from an XML representation of an EXPRESS schema back to the initial EXPRESS schema
  • The mapping from the XML markup declarations that have been derived from an EXPRESS schema back to the initial EXPRESS schema
  • The mapping of the final use of an XML schema.

Index of object-oriented programming articles

This is a list of terms found in object-oriented programming. Some are related to object-oriented programming and some are not.

Integer

An integer (from the Latin integer meaning "whole") is a number that can be written without a fractional component. For example, 21, 4, 0, and −2048 are integers, while 9.75, 5 1/2, and √2 are not.

The set of integers consists of zero (0), the positive natural numbers (1, 2, 3, …), also called whole numbers or counting numbers, and their additive inverses (the negative integers, i.e., −1, −2, −3, …). The set of integers is often denoted by a boldface Z or blackboard bold ℤ (Unicode U+2124), standing for the German word Zahlen ([ˈtsaːlən], "numbers").

Z is a subset of the set of all rational numbers Q, in turn a subset of the real numbers R. Like the natural numbers, Z is countably infinite.

The integers form the smallest group and the smallest ring containing the natural numbers. In algebraic number theory, the integers are sometimes qualified as rational integers to distinguish them from the more general algebraic integers. In fact, the (rational) integers are the algebraic integers that are also rational numbers.

Octuple-precision floating-point format

In computing, octuple precision is a binary floating-point-based computer number format that occupies 32 bytes (256 bits) in computer memory. This 256-bit octuple precision is for applications requiring results in higher than quadruple precision. This format is rarely (if ever) used and very few environments support it.

SGML entity

In the Standard Generalized Markup Language (SGML), an entity is a primitive data type, which associates a string with either a unique alias (such as a user-specified name) or an SGML reserved word (such as #DEFAULT). Entities are foundational to the organizational structure and definition of SGML documents. The SGML specification defines numerous entity types, which are distinguished by keyword qualifiers and context. An entity string value may variously consist of plain text, SGML tags, and/or references to previously-defined entities. Certain entity types may also invoke external documents. Entities are called by reference.

Single-precision floating-point format

Single-precision floating-point format is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.

A floating-point variable can represent a wider range of numbers than a fixed-point variable of the same bit width at the cost of precision. A signed 32-bit integer variable has a maximum value of 2^31 − 1 = 2,147,483,647, whereas an IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2^−23) × 2^127 ≈ 3.4028235 × 10^38. All integers with 6 or fewer significant decimal digits, and any number that can be written as 2^n such that n is a whole number from −126 to 127, can be converted into an IEEE 754 floating-point value without loss of precision.
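
A short C sketch of this contrast, using the limits exposed by limits.h and float.h:

    #include <float.h>
    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        printf("INT_MAX = %d\n", INT_MAX);      /* 2^31 - 1 = 2147483647 on 32-bit int */
        printf("FLT_MAX = %e\n", FLT_MAX);      /* (2 - 2^-23) * 2^127, about 3.402823e+38 */
        printf("FLT_DIG = %d\n", FLT_DIG);      /* decimal digits preserved exactly: 6 */

        float f = 16777217.0f;                  /* 2^24 + 1 is not exactly representable... */
        printf("16777217 stored as %f\n", f);   /* ...and rounds to 16777216 */
        return 0;
    }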

In the IEEE 754-2008 standard, the 32-bit base-2 format is officially referred to as binary32; it was called single in IEEE 754-1985. IEEE 754 specifies additional floating-point types, such as 64-bit base-2 double precision and, more recently, base-10 representations.

One of the first programming languages to provide single- and double-precision floating-point data types was Fortran. Before the widespread adoption of IEEE 754-1985, the representation and properties of floating-point data types depended on the computer manufacturer and computer model, and upon decisions made by programming-language designers. E.g., GW-BASIC's single-precision data type was the 32-bit MBF floating-point format.

Single precision is termed REAL in Fortran, SINGLE-FLOAT in Common Lisp, float in C, C++, C#, Java, Float in Haskell, and Single in Object Pascal (Delphi), Visual Basic, and MATLAB. However, float in Python, Ruby, PHP, and OCaml and single in versions of Octave before 3.2 refer to double-precision numbers. In most implementations of PostScript, and some embedded systems, the only supported precision is single.

Symbol (programming)

A symbol in computer programming is a primitive data type whose instances have a unique human-readable form. Symbols can be used as identifiers. In some programming languages, they are called atoms. Uniqueness is enforced by holding them in a symbol table. The most common use of symbols by programmers is for performing language reflection (particularly for callbacks); the most common indirect use is to create object linkages.

In the most trivial implementation, they are essentially named integers (e.g. the enumerated type in C).
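
A brief C sketch of that trivial implementation, using an enumerated type whose enumerators are in effect named integers:

    #include <stdio.h>

    /* Enumerators are named integer constants; here RED is 0, GREEN is 1, BLUE is 2. */
    enum color { RED, GREEN, BLUE };

    int main(void) {
        enum color c = GREEN;
        printf("GREEN is stored as the integer %d\n", c);  /* prints 1 */
        return 0;
    }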

Value type and reference type

In computer programming, data types can be divided into two categories: value types and reference types. A value of a value type is the actual value itself, whereas a value of a reference type is a reference to another value.
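
A short C sketch of the distinction: assigning a struct copies the value itself, while assigning a pointer copies only a reference to it:

    #include <stdio.h>

    struct point { int x, y; };

    int main(void) {
        struct point a = {1, 2};
        struct point b = a;          /* value semantics: b is an independent copy of a */
        struct point *p = &a;        /* reference semantics: p refers to a */

        b.x = 10;                    /* does not affect a */
        p->x = 99;                   /* modifies a through the reference */

        printf("a = (%d, %d), b = (%d, %d)\n", a.x, a.y, b.x, b.y);  /* (99, 2), (10, 2) */
        return 0;
    }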

