Digital data

Digital data, in information theory and information systems, is the discrete, discontinuous representation of information or works. Numbers and letters are commonly used representations.

Digital data can be contrasted with analog signals which behave in a continuous manner, and with continuous functions such as sounds, images, and other measurements.

The word digital comes from the same source as the words digit and digitus (the Latin word for finger), as fingers are often used for discrete counting. Mathematician George Stibitz of Bell Telephone Laboratories used the word digital in reference to the fast electric pulses emitted by a device designed to aim and fire anti-aircraft guns in 1942.[1] The term is most commonly used in computing and electronics, especially where real-world information is converted to binary numeric form as in digital audio and digital photography.

Symbol to digital conversion

Since symbols (for example, alphanumeric characters) are not continuous, representing symbols digitally is rather simpler than conversion of continuous or analog information to digital. Instead of sampling and quantization as in analog-to-digital conversion, such techniques as polling and encoding are used.

A symbol input device usually consists of a group of switches that are polled at regular intervals to see which switches are pressed. Data will be lost if, within a single polling interval, two switches are pressed, or a switch is pressed, released, and pressed again. This polling can be done by a specialized processor in the device to avoid burdening the main CPU. When a new symbol has been entered, the device typically sends an interrupt, in a specialized format, so that the CPU can read it.

For devices with only a few switches (such as the buttons on a joystick), the status of each can be encoded as bits (usually 0 for released and 1 for pressed) in a single word. This is useful when combinations of key presses are meaningful, and is sometimes used for passing the status of modifier keys on a keyboard (such as shift and control). But it does not scale to support more keys than the number of bits in a single byte or word.
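The bit-per-switch encoding described above can be sketched as follows. This is a minimal illustration rather than any particular device's format; the button names and bit positions in `BUTTONS` are assumptions made up for the example.

```python
# Hypothetical button layout: each button is assigned one bit position.
BUTTONS = {"fire": 0, "jump": 1, "left": 2, "right": 3}


def encode_buttons(pressed):
    """Pack the names of the pressed buttons into a single integer word."""
    word = 0
    for name in pressed:
        word |= 1 << BUTTONS[name]  # 1 means pressed, 0 means released
    return word


def is_pressed(word, name):
    """Test one button's bit in the packed word."""
    return bool(word & (1 << BUTTONS[name]))


state = encode_buttons({"fire", "left"})
print(format(state, "04b"))  # bit order: right, left, jump, fire
```

Because every button has its own bit, any combination of simultaneous presses maps to a distinct word, which is why the scheme suits modifier keys. The limit mentioned above also shows up directly: a word of n bits can track at most n switches.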

Devices with many switches (such as a computer keyboard) usually arrange these switches in a scan matrix, with the individual switches on the intersections of x and y lines. When a switch is pressed, it connects the corresponding x and y lines together. Polling (often called scanning in this case) is done by activating each x line in sequence and detecting which y lines then have a signal, thus which keys are pressed. When the keyboard processor detects that a key has changed state, it sends a signal to the CPU indicating the scan code of the key and its new state. The symbol is then encoded, or converted into a number, based on the status of modifier keys and the desired character encoding.
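The scanning procedure can be simulated in a few lines. The sketch below is an assumption-laden model, not keyboard firmware: the set `closed_switches` stands in for the physical switches, and the function names are invented for the example.

```python
def scan_matrix(closed_switches, num_x, num_y):
    """Return the set of (x, y) keys detected as pressed in one scan.

    closed_switches is a set of (x, y) pairs standing in for the physical
    switches that currently connect line x to line y.
    """
    pressed = set()
    for x in range(num_x):        # activate one x line at a time
        for y in range(num_y):    # sense every y line
            if (x, y) in closed_switches:
                pressed.add((x, y))
    return pressed


def changed_keys(previous, current):
    """Report (key, new_state) events for keys whose state changed."""
    events = [(key, "pressed") for key in sorted(current - previous)]
    events += [(key, "released") for key in sorted(previous - current)]
    return events


before = scan_matrix({(0, 1)}, num_x=3, num_y=3)
after = scan_matrix({(0, 1), (2, 2)}, num_x=3, num_y=3)
print(changed_keys(before, after))
```

Comparing two successive scans is what lets the keyboard processor report only state changes, each of which would then be translated into a scan code.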

A custom encoding can be used for a specific application with no loss of data. However, using a standard encoding such as ASCII is problematic if a symbol such as 'ß' needs to be converted but is not in the standard.
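The ASCII limitation can be demonstrated with Python's built-in codecs. The helper name `try_encode` is made up for this sketch; the codec names are those shipped with the standard library.

```python
def try_encode(symbol, encoding):
    """Return the encoded bytes, or None if the encoding lacks the symbol."""
    try:
        return symbol.encode(encoding)
    except UnicodeEncodeError:
        return None


print(try_encode("A", "ascii"))    # b'A'
print(try_encode("ß", "ascii"))    # None, because 'ß' is not in ASCII
print(try_encode("ß", "latin-1"))  # b'\xdf'
print(try_encode("ß", "utf-8"))    # b'\xc3\x9f'
```

The same symbol becomes a different number, or no number at all, depending on which encoding the sender and receiver have agreed on.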

It is estimated that in 1986 less than 1% of the world's technological capacity to store information was digital, and that by 2007 the figure was already 94%.[2] The year 2002 is assumed to be the year when humankind was first able to store more information in digital than in analog format (the "beginning of the digital age").[3][4]


Digital data come in these three states: data at rest, data in transit and data in use. The confidentiality, integrity and availability have to be managed during the entire lifecycle from 'birth' to the destruction of the data.

Properties of digital information

All digital information possesses common properties that distinguish it from analog data with respect to communications:

  • Synchronization: Since digital information is conveyed by the sequence in which symbols are ordered, all digital schemes have some method for determining the beginning of a sequence. In written or spoken human languages, synchronization is typically provided by pauses (spaces), capitalization, and punctuation. Machine communications typically use special synchronization sequences.
  • Language: All digital communications require a formal language, which in this context consists of all the information that the sender and receiver of the digital communication must both possess, in advance, in order for the communication to be successful. Languages are generally arbitrary and specify the meaning to be assigned to particular symbol sequences, the allowed range of values, methods to be used for synchronization, etc.
  • Errors: Disturbances (noise) in analog communications invariably introduce some, generally small, deviation or error between the intended and actual communication. Disturbances in a digital communication do not result in errors unless the disturbance is so large that a symbol is misinterpreted as another symbol or the sequence of symbols is disturbed. It is therefore generally possible to have an entirely error-free digital communication. Further, techniques such as check codes may be used to detect errors and to correct them through redundancy or re-transmission. Errors in digital communications can take the form of substitution errors, in which a symbol is replaced by another symbol, or insertion/deletion errors, in which an extra incorrect symbol is inserted into or deleted from a digital message. Uncorrected errors in digital communications have an unpredictable and generally large impact on the information content of the communication.
  • Copying: Because of the inevitable presence of noise, making many successive copies of an analog communication is infeasible because each generation increases the noise. Because digital communications are generally error-free, copies of copies can be made indefinitely.
  • Granularity: The digital representation of a continuously variable analog value typically involves a selection of the number of symbols to be assigned to that value. The number of symbols determines the precision or resolution of the resulting datum. The difference between the actual analog value and the digital representation is known as quantization error. For example, if the actual temperature is 23.234456544453 degrees, but if only two digits (23) are assigned to this parameter in a particular digital representation, the quantizing error is: 0.234456544453. This property of digital communication is known as granularity.
  • Compressible: According to Miller, "Uncompressed digital data is very large, and in its raw form, it would actually produce a larger signal (therefore be more difficult to transfer) than analog data. However, digital data can be compressed. Compression reduces the amount of bandwidth space needed to send information. Data can be compressed, sent and then decompressed at the site of consumption. This makes it possible to send much more information and result in, for example, digital television signals offering more room on the airwave spectrum for more television channels."[4]
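The quantization error in the granularity example above can be reproduced with exact decimal arithmetic. This is a small sketch: the function name `quantize` and the choice to round down to a multiple of the step size are assumptions made for the illustration.

```python
from decimal import Decimal


def quantize(value, step):
    """Round value down to a multiple of step; return (quantized, error)."""
    steps = value // step  # whole number of steps that fit into value
    quantized = steps * step
    return quantized, value - quantized


actual = Decimal("23.234456544453")

coarse, coarse_error = quantize(actual, Decimal("1"))
print(coarse, coarse_error)  # 23 0.234456544453

fine, fine_error = quantize(actual, Decimal("0.1"))
print(fine, fine_error)      # 23.2 0.034456544453
```

Assigning one more digit to the value shrinks the step size and therefore the worst-case quantization error, at the cost of a longer representation.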

Historical digital systems

Even though digital signals are generally associated with the binary electronic digital systems used in modern electronics and computing, digital systems are actually ancient, and need not be binary or electronic.

  • DNA genetic code is a naturally occurring form of digital data storage.
  • Written text (due to the limited character set and the use of discrete symbols – the alphabet in most cases)
  • The abacus was created sometime between 1000 BC and 500 BC and later came into frequent use for calculation. It can be regarded as a basic digital calculator that uses beads on rows to represent numbers. Beads only have meaning in discrete up and down states, not in analog in-between states.
  • A beacon is perhaps the simplest non-electronic digital signal, with just two states (on and off). In particular, smoke signals are one of the oldest examples of a digital signal, where an analog "carrier" (smoke) is modulated with a blanket to generate a digital signal (puffs) that conveys information.
  • Morse code uses six digital states—dot, dash, intra-character gap (between each dot or dash), short gap (between each letter), medium gap (between words), and long gap (between sentences)—to send messages via a variety of potential carriers such as electricity or light, for example using an electrical telegraph or a flashing light.
  • The Braille system was the first binary format for character encoding, using a six-bit code rendered as dot patterns.
  • Flag semaphore uses rods or flags held in particular positions to send messages to the receiver watching them some distance away.
  • International maritime signal flags have distinctive markings that represent letters of the alphabet to allow ships to send messages to each other.
  • More recently invented, a modem modulates an analog "carrier" signal (such as sound) to encode binary electrical digital information, as a series of binary digital sound pulses. A slightly earlier, surprisingly reliable version of the same concept was to bundle a sequence of audio digital "signal" and "no signal" information (i.e. "sound" and "silence") on magnetic cassette tape for use with early home computers.
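The six-bit Braille code mentioned in the list above can be sketched as a bit field, one bit per dot. The function names are invented for the example; the rendering step relies on the Unicode Braille Patterns block, which starts at U+2800 and stores dot n in bit n-1.

```python
def braille_cell(dots):
    """Map a collection of raised dot numbers (1-6) to a six-bit value."""
    bits = 0
    for dot in dots:
        if not 1 <= dot <= 6:
            raise ValueError("six-dot Braille uses dots 1 through 6")
        bits |= 1 << (dot - 1)  # dot n is stored in bit n-1
    return bits


def to_unicode(bits):
    """Render the six-bit value using the Unicode Braille Patterns block."""
    return chr(0x2800 + bits)


letter_a = braille_cell([1])     # the letter "a" is dot 1 alone
letter_b = braille_cell([1, 2])  # the letter "b" is dots 1 and 2
print(to_unicode(letter_a), to_unicode(letter_b))
```

Six dots that are each either raised or flat give 2**6 = 64 possible cells, including the blank one.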

See also


  1. ^ Ceruzzi, Paul E (June 29, 2012). Computing: A Concise History. MIT Press. ISBN 978-0-262-51767-6.
  2. ^ Hilbert, Martin; López, Priscila (2011). "The World’s Technological Capacity to Store, Communicate, and Compute Information", especially Supporting online material. Science, 332(6025), 60–65.
  3. ^ "Video animation on The World’s Technological Capacity to Store, Communicate, and Compute Information from 1986 to 2010".
  4. ^ a b Miller, Vincent (2011). Understanding digital culture. London: Sage Publications. sec. "Convergence and the contemporary media experience". ISBN 978-1-84787-497-9.

Further reading

  • Tocci, R. 2006. Digital Systems: Principles and Applications (10th Edition). Prentice Hall. ISBN 0-13-172579-3

Audio codec

An audio codec is a codec (a device or computer program capable of encoding or decoding a digital data stream) that encodes or decodes audio. In software, an audio codec is a computer program implementing an algorithm that compresses and decompresses digital audio data according to a given audio file or streaming media audio coding format. The objective of the algorithm is to represent the high-fidelity audio signal with the minimum number of bits while retaining quality. This can effectively reduce the storage space and the bandwidth required for transmission of the stored audio file. Most software codecs are implemented as libraries which interface to one or more multimedia players.

In hardware, audio codec refers to a single device that encodes analog audio as digital signals and decodes digital back into analog. In other words, it contains both an analog-to-digital converter (ADC) and digital-to-analog converter (DAC) running off the same clock signal. This is used in sound cards that support both audio in and out, for instance. Hardware audio codecs send and receive digital data using buses such as AC-Link, I²S, SPI, I²C, etc. Most commonly the digital data is linear PCM, and this is the only format that most codecs support, but some legacy codecs support other formats such as G.711 for telephony.


Checksum

A checksum is a small-sized datum derived from a block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. It is usually applied to an installation file after it is received from the download server. By themselves, checksums are often used to verify data integrity but are not relied upon to verify data authenticity.

The actual procedure which yields the checksum from a data input is called a checksum function or checksum algorithm. Depending on its design goals, a good checksum algorithm will usually output a significantly different value, even for small changes made to the input. This is especially true of cryptographic hash functions, which may be used to detect many data corruption errors and verify overall data integrity; if the computed checksum for the current data input matches the stored value of a previously computed checksum, there is a very high probability the data has not been accidentally altered or corrupted.
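The compare-against-a-stored-value procedure can be sketched with Python's standard library. The choice of CRC-32 and SHA-256 here is an assumption for illustration, since the text does not name a specific algorithm, and the helper names are made up.

```python
import hashlib
import zlib


def crc32_checksum(data: bytes) -> int:
    """A simple non-cryptographic checksum (CRC-32)."""
    return zlib.crc32(data)


def sha256_digest(data: bytes) -> str:
    """A cryptographic hash function used here as a checksum."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, stored_digest: str) -> bool:
    """Recompute the digest and compare it with the stored value."""
    return sha256_digest(data) == stored_digest


original = b"digital data"
stored = sha256_digest(original)
corrupted = b"digital dbta"  # one character altered

print(verify(original, stored))   # True
print(verify(corrupted, stored))  # False
print(crc32_checksum(original) != crc32_checksum(corrupted))  # True
```

A matching value indicates with very high probability that the data was not accidentally altered. It says nothing about who produced the data, which is why a plain checksum is not an authenticity check.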

Checksum functions are related to hash functions, fingerprints, randomization functions, and cryptographic hash functions. However, each of those concepts has different applications and therefore different design goals. For instance, a function returning the start of a string can provide a hash appropriate for some applications but will never be a suitable checksum. Checksums are used as cryptographic primitives in larger authentication algorithms. For cryptographic systems with these two specific design goals, see HMAC.

Check digits and parity bits are special cases of checksums, appropriate for small blocks of data (such as Social Security numbers, bank account numbers, computer words, single bytes, etc.). Some error-correcting codes are based on special checksums which not only detect common errors but also allow the original data to be recovered in certain cases.
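A parity bit is the smallest such checksum and can be shown in a few lines. The sketch assumes even parity, and the function names are invented for the example.

```python
def even_parity_bit(word: int) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    return bin(word).count("1") % 2


def parity_ok(word: int, parity: int) -> bool:
    """Check that the word plus its parity bit hold an even number of 1s."""
    return (bin(word).count("1") + parity) % 2 == 0


word = 0b1011001                # contains four 1 bits
parity = even_parity_bit(word)  # 0, since the count is already even

one_flip = word ^ 0b0000100     # single-bit error
two_flips = word ^ 0b0000110    # two-bit error

print(parity_ok(word, parity))       # True
print(parity_ok(one_flip, parity))   # False, so the error is detected
print(parity_ok(two_flips, parity))  # True, so the error goes unnoticed
```

The last line shows the limit of a single parity bit: any even number of flipped bits cancels out. That is one reason larger blocks use longer checksums.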


Codec

A codec is a device or computer program for encoding or decoding a digital data stream or signal. Codec is a portmanteau of coder-decoder. A coder encodes a data stream or a signal for transmission or storage, possibly in encrypted form, and the decoder function reverses the encoding for playback or editing. Codecs are used in videoconferencing, streaming media, and video editing applications.

DNA digital data storage

DNA digital data storage is defined as the process of encoding and decoding binary data to and from synthesized DNA strands. DNA molecules are genetic blueprints for living cells and organisms. Although DNA data storage became a popular topic in the 21st century, it is not a modern-day idea. Its origins date back to 1964–65, when Mikhail Neiman, a Soviet physicist, published his works in the journal Radiotehnika. Neiman wrote about general considerations regarding the possibility of recording, storing, and retrieving information on DNA molecules. The physicist explained that he got the idea from an interview with Norbert Wiener, an American cyberneticist, mathematician, and philosopher, published in 1964.
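As a rough illustration of what encoding binary data as a nucleotide sequence can look like, the sketch below maps each pair of bits to one base. The mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions made for the example; the text does not specify a scheme, and practical systems add error correction and sequence constraints that are omitted here.

```python
BASES = "ACGT"  # hypothetical mapping: 00->A, 01->C, 10->G, 11->T


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    strand = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            strand.append(BASES[(byte >> shift) & 0b11])
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Reverse encode(): every four bases become one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


strand = encode(b"A")  # 0x41 is 01 00 00 01 in bit pairs
print(strand)          # CAAC
print(decode(strand))  # b'A'
```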

Daisy chain (electrical engineering)

In electrical and electronic engineering a daisy chain is a wiring scheme in which multiple devices are wired together in sequence or in a ring. Other than a full, single loop, systems which contain internal loops cannot be called daisy chains.

Daisy chains may be used for power, analog signals, digital data, or a combination thereof.

The term daisy chain may refer either to large scale devices connected in series, such as a series of power strips plugged into each other to form a single long line of strips, or to the wiring patterns embedded inside of devices. Other examples of devices which can be used to form daisy chains are those based on USB, FireWire, Thunderbolt and Ethernet cables.

Data (computing)

Data (DAY-tə, DAT-ə, DAH-tə; treated as singular, plural, or as a mass noun) is any sequence of one or more symbols given meaning by specific act(s) of interpretation.

Data (or datum – a single unit of data) requires interpretation to become information. To translate data to information, several known factors must be considered. The factors involved are determined by the creator of the data and the desired information. The term metadata is used to refer to data about the data. Metadata may be implied, specified or given. Data relating to physical events or processes will also have a temporal component. In almost all cases this temporal component is implied. This is the case when a device such as a temperature logger receives data from a temperature sensor. When the temperature is received, it is assumed that the data has a temporal reference of "now". So the device records the date, time and temperature together. When the data logger communicates temperatures, it must also report the date and time (metadata) for each temperature.
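The temperature logger described above can be sketched as a record that keeps the value and its metadata together. The class and function names, the Celsius unit, and the use of UTC are assumptions made for the illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    timestamp: datetime   # metadata: when the value was received
    temperature_c: float  # data: the sensor value, assumed to be Celsius


def log_reading(temperature_c, now=None):
    """Attach the implied "now" to a temperature as explicit metadata."""
    if now is None:
        now = datetime.now(timezone.utc)
    return Reading(timestamp=now, temperature_c=temperature_c)


reading = log_reading(21.5)
print(reading.timestamp.isoformat(), reading.temperature_c)
```

Storing the timestamp next to the value is what turns the implied temporal component into something the logger can report later.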

Digital data is data that is represented using the binary number system of ones (1) and zeros (0), as opposed to analog representation. In modern (post 1960) computer systems, all data is digital. Data within a computer, in most cases, moves as parallel data. Data moving to or from a computer, in most cases, moves as serial data. See Parallel communication and Serial communication. Data sourced from an analog device, such as a temperature sensor, must pass through an "analog to digital converter" or "ADC" (see Analog-to-digital converter) to convert the analog data to digital data.

Data representing quantities, characters, or symbols on which operations are performed by a computer are stored and recorded on magnetic, optical, or mechanical recording media, and transmitted in the form of digital electrical signals. A program is a set of data that consists of a series of coded software instructions to control the operation of a computer or other machine. Physical computer memory elements consist of an address and a byte/word of data storage. Digital data are often stored in relational databases, like tables or SQL databases, and can generally be represented as abstract key/value pairs.

Data can be organized in many different types of data structures, including arrays, graphs, and objects. Data structures can store data of many different types, including numbers, strings and even other data structures. Data pass in and out of computers via peripheral devices.

In an alternate usage, binary files (which are not human-readable) are sometimes called "data" as distinguished from human-readable "text". The total amount of digital data in 2007 was estimated to be 281 billion gigabytes (= 281 exabytes). Digital data comes in these three states: data at rest, data in transit and data in use.

Data storage

Data storage is the recording (storing) of information (data) in a storage medium. DNA and RNA, handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Recording is accomplished by virtually any form of energy. Electronic data storage requires electrical power to store and retrieve data.

Data storage in a digital, machine-readable medium is sometimes called digital data. Computer data storage is one of the core functions of a general purpose computer. Electronic documents can be stored in much less space than paper documents. Barcodes and magnetic ink character recognition (MICR) are two ways of recording machine-readable data on paper.

Data stream

In connection-oriented communication, a data stream is a sequence of digitally encoded coherent signals (packets of data or data packets) used to transmit or receive information that is in the process of being transmitted. In another usage, a data stream is a set of information extracted from a data provider: raw data gathered from users' browsing behavior on websites where a dedicated pixel is placed. Such streams are used by data scientists to supply Big Data and AI algorithms. The main data stream providers are data technology companies such as Lotame, ShareThis, AddThis, and 33 Across.

Data transmission

Data transmission (also data communication or digital communications) is the transfer of data (a digital bitstream or a digitized analog signal) over a point-to-point or point-to-multipoint communication channel. Examples of such channels are copper wires, optical fibers, wireless communication channels, storage media and computer buses. The data are represented as an electromagnetic signal, such as an electrical voltage, radiowave, microwave, or infrared signal.

Analog or analogue transmission is a method of conveying voice, data, image, signal or video information using a continuous signal which varies in amplitude, phase, or some other property in proportion to that of a variable. In digital transmission, by contrast, the messages are represented either by a sequence of pulses by means of a line code (baseband transmission), or by a limited set of continuously varying waveforms (passband transmission), using a digital modulation method. Passband modulation and the corresponding demodulation (also known as detection) are carried out by modem equipment. According to the most common definition of digital signal, both baseband and passband signals representing bit-streams are considered digital transmission, while an alternative definition considers only the baseband signal as digital, and passband transmission of digital data as a form of digital-to-analog conversion.

Data transmitted may be digital messages originating from a data source, for example a computer or a keyboard. It may also be an analog signal such as a phone call or a video signal, digitized into a bit-stream, for example, using pulse-code modulation (PCM) or more advanced source coding (analog-to-digital conversion and data compression) schemes. This source coding and decoding is carried out by codec equipment.


Demodulation

Demodulation is extracting the original information-bearing signal from a carrier wave. A demodulator is an electronic circuit (or computer program in a software-defined radio) that is used to recover the information content from the modulated carrier wave. There are many types of modulation so there are many types of demodulators. The signal output from a demodulator may represent sound (an analog audio signal), images (an analog video signal) or binary data (a digital signal).

These terms are traditionally used in connection with radio receivers, but many other systems use many kinds of demodulators. For example, in a modem, which is a contraction of the terms modulator/demodulator, a demodulator is used to extract a serial digital data stream from a carrier signal which is used to carry it through a telephone line, coaxial cable, or optical fiber.

Digital Data Storage

Digital Data Storage (DDS) is a computer data storage technology that is based upon the digital audio tape (DAT) format that was developed during the 1980s. DDS is primarily intended for use as off-line storage, especially for generating backup copies of working data.

A DDS cartridge uses tape with a width of 3.81 mm, with the exception of the latest formats, DAT-160 and DAT-320, both of which use 8 mm wide tape. Initially, the tape was 60 meters (197 feet) or 90 meters (295 feet) in length. Advancements in materials technology have allowed the length to be increased significantly in successive versions. A DDS tape drive uses helical scan recording, the same process used by a video cassette recorder (VCR).

Backward compatibility between newer drives and older cartridges is not assured; the compatibility matrices provided by manufacturers need to be consulted. Typically, drives can read and write tapes in the prior generation format, with most (but not all) also able to read and write tapes from two generations prior. Newer tape standards do not simply consist of longer tapes; with DDS-2, for example, the track was narrower than with DDS-1.

At one time, DDS competed against the Linear Tape-Open (LTO), Advanced Intelligent Tape (AIT), VXA, and Travan formats. However, AIT, Travan and VXA are no longer mainstream, and the capacity of LTO has far exceeded that of the most recent DDS standard, DAT-320.

Distributed ledger

A distributed ledger (also called a shared ledger or distributed ledger technology or DLT) is a consensus of replicated, shared, and synchronized digital data geographically spread across multiple sites, countries, or institutions. There is no central administrator or centralized data storage. A peer-to-peer network is required, as well as consensus algorithms to ensure that replication across nodes is undertaken. One form of distributed ledger design is the blockchain system, which can be either public or private.

Frame (networking)

A frame is a digital data transmission unit in computer networking and telecommunication. In packet switched systems, a frame is a simple container for a single network packet. In other telecommunications systems, a frame is a repeating structure supporting time-division multiplexing.

A frame typically includes frame synchronization features consisting of a sequence of bits or symbols that indicate to the receiver the beginning and end of the payload data within the stream of symbols or bits it receives. If a receiver is connected to the system during frame transmission, it ignores the data until it detects a new frame synchronization sequence.
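The receiver behavior described above, ignoring data until a synchronization sequence appears, can be sketched as follows. The two-byte `SYNC` marker and the fixed `PAYLOAD_LEN` are hypothetical values chosen for the example and do not represent any particular protocol.

```python
SYNC = b"\x7e\x7e"  # hypothetical synchronization sequence
PAYLOAD_LEN = 4     # hypothetical fixed payload size in bytes


def extract_frames(stream: bytes):
    """Return the payloads that follow each synchronization sequence."""
    frames = []
    pos = stream.find(SYNC)  # everything before the first marker is ignored
    while pos != -1:
        start = pos + len(SYNC)
        payload = stream[start:start + PAYLOAD_LEN]
        if len(payload) < PAYLOAD_LEN:
            break            # incomplete frame at the end of the stream
        frames.append(payload)
        pos = stream.find(SYNC, start + PAYLOAD_LEN)
    return frames


# A receiver that joins mid-frame sees a partial payload first.
stream = b"CD" + SYNC + b"EFGH" + SYNC + b"IJKL" + SYNC + b"MN"
print(extract_frames(stream))  # [b'EFGH', b'IJKL']
```

This sketch skips over each payload before searching again, so marker bytes inside a payload are not mistaken for a new frame. It does not recover if the receiver first locks onto marker-like bytes in the middle of a frame, which real protocols handle with techniques such as escaping or stuffing.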

Global Biodiversity Information Facility

The Global Biodiversity Information Facility (GBIF) is an international organisation that focuses on making scientific data on biodiversity available via the Internet using web services. The data are provided by many institutions from around the world; GBIF's information architecture makes these data accessible and searchable through a single portal. Data available through the GBIF portal are primarily distribution data on plants, animals, fungi, and microbes for the world, and scientific names data.

The mission of the Global Biodiversity Information Facility (GBIF) is to facilitate free and open access to biodiversity data worldwide to underpin sustainable development. Priorities, with an emphasis on promoting participation and working through partners, include mobilising biodiversity data, developing protocols and standards to ensure scientific integrity and interoperability, building an informatics architecture to allow the interlinking of diverse data types from disparate sources, promoting capacity building and catalysing development of analytical tools for improved decision-making.

GBIF strives to form informatics linkages among digital data resources from across the spectrum of biological organisation, from genes to ecosystems, and to connect these to issues important to science, society and sustainability by using georeferencing and GIS tools. It works in partnership with other international organisations such as the Catalogue of Life partnership, Biodiversity Information Standards, the Consortium for the Barcode of Life (CBOL), the Encyclopedia of Life (EOL), and GEOSS.

From 2002 to 2014, GBIF awarded a prestigious global award in the area of biodiversity informatics, the Ebbe Nielsen Prize, valued at €30,000 annually. As of 2018, the GBIF Secretariat presents two annual prizes: the GBIF Ebbe Nielsen Challenge and the Young Researchers Award.

Image compression

Image compression is a type of data compression applied to digital images, to reduce their cost for storage or transmission. Algorithms may take advantage of visual perception and the statistical properties of image data to provide superior results compared with generic data compression methods which are used for other digital data.


Modem

A modem (portmanteau of modulator-demodulator) is a hardware device that converts data into a format suitable for a transmission medium so that it can be transmitted from computer to computer (historically over telephone wires). A modem modulates one or more carrier wave signals to encode digital information for transmission and demodulates signals to decode the transmitted information. The goal is to produce a signal that can be transmitted easily and decoded to reproduce the original digital data. Modems can be used with almost any means of transmitting analog signals, from light-emitting diodes to radio. A common type of modem is one that turns the digital data of a computer into a modulated electrical signal for transmission over telephone lines, which is then demodulated by another modem at the receiver side to recover the digital data.

Modems are generally classified by the maximum amount of data they can send in a given unit of time, usually expressed in bits per second (symbol bit/s, sometimes abbreviated "bps") or bytes per second (symbol B/s). Modems can also be classified by their symbol rate, measured in baud. The baud unit denotes symbols per second, or the number of times per second the modem sends a new signal. For example, the ITU V.21 standard used audio frequency-shift keying with two possible frequencies, corresponding to two distinct symbols (or one bit per symbol), to carry 300 bits per second using 300 baud. By contrast, the original ITU V.22 standard, which could transmit and receive four distinct symbols (two bits per symbol), transmitted 1,200 bits per second by sending 600 symbols per second (600 baud) using phase-shift keying.
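The relationship between symbol rate and bit rate in the two examples above is simple arithmetic: the bit rate is the baud rate multiplied by the number of bits carried per symbol. The sketch below uses invented function names and assumes the number of distinct symbols is a power of two.

```python
def bits_per_symbol(distinct_symbols: int) -> int:
    """Bits carried by one symbol, for a power-of-two symbol count."""
    if distinct_symbols < 2 or distinct_symbols & (distinct_symbols - 1):
        raise ValueError("expected a power of two, at least 2")
    return distinct_symbols.bit_length() - 1  # log2 of a power of two


def bit_rate(baud: int, distinct_symbols: int) -> int:
    """Bit rate in bit/s from the symbol rate and the symbol alphabet size."""
    return baud * bits_per_symbol(distinct_symbols)


print(bit_rate(300, 2))  # V.21 figures from the text: 300
print(bit_rate(600, 4))  # V.22 figures from the text: 1200
```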

Open format

An open format is a file format for storing digital data, defined by a published specification usually maintained by a standards organization, and which can be used and implemented by anyone. For example, an open format can be implemented by both proprietary and free and open-source software, using the typical software licenses used by each. In contrast to open formats, closed formats are considered trade secrets. Open formats are also called free file formats if they are not encumbered by any copyrights, patents, trademarks or other restrictions (for example, if they are in the public domain) so that anyone may use them at no monetary cost for any desired purpose.

Time temperature indicator

A time temperature indicator (TTI) is a device or smart label that shows the accumulated time-temperature history of a product. Time temperature indicators are commonly used on food, pharmaceutical, and medical products to indicate exposure to excessive temperature (and time at temperature).

In contrast, a temperature data logger measures and records the temperatures for a specified time period. The digital data can be downloaded and analyzed.


Timestamp

A timestamp is a sequence of characters or encoded information identifying when a certain event occurred, usually giving date and time of day, sometimes accurate to a small fraction of a second. The term derives from rubber stamps used in offices to stamp the current date, and sometimes time, in ink on paper documents, to record when the document was received. Common examples of this type of timestamp are a postmark on a letter or the "in" and "out" times on a time card.

In modern times, usage of the term has expanded to refer to digital date and time information attached to digital data. For example, computer files contain timestamps that tell when the file was last modified, and digital cameras add timestamps to the pictures they take, recording the date and time the picture was taken.
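A file's last-modified timestamp can be read with Python's standard library, as in this small sketch. The helper name `last_modified` and the temporary file are made up for the example; converting to UTC is a choice made here for clarity.

```python
import os
import tempfile
from datetime import datetime, timezone


def last_modified(path):
    """Return the file's last-modification time as an aware UTC datetime."""
    seconds = os.path.getmtime(path)  # seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"example")
    example_path = handle.name

print(last_modified(example_path).isoformat())
os.remove(example_path)
```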

This page is based on a Wikipedia article written by its contributors.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.