Check digit

A check digit is a form of redundancy check used for error detection on identification numbers, such as bank account numbers, that are at least sometimes entered manually. It is analogous to a binary parity bit used to check for errors in computer-generated data. It consists of one or more digits computed by an algorithm from the other digits (or letters) in the input sequence.

With a check digit, one can detect simple errors in the input of a series of characters (usually digits) such as a single mistyped digit or some permutations of two successive digits.

Design

Check digit algorithms are generally designed to capture human transcription errors. In order of complexity, these include the following:[1]

  • single digit errors, such as 1 → 2
  • transposition errors, such as 12 → 21
  • twin errors, such as 11 → 22
  • jump transposition errors, such as 132 → 231
  • jump twin errors, such as 131 → 232
  • phonetic errors, such as 60 → 16 ("sixty" to "sixteen")

In choosing a system, a high probability of catching errors is traded off against implementation difficulty; simple check digit systems are easily understood and implemented by humans but do not catch as many errors as complex ones, which require sophisticated programs to implement.

A desirable feature is that left-padding with zeros should not change the check digit. This allows variable-length numbers to be used and the length to be changed. If there is a single check digit added to the original number, the system will not always capture multiple errors, such as two replacement errors (12 → 34), though double errors will typically be caught 90% of the time (a double error goes undetected only when the two changes shift the result by offsetting amounts).
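Whether zero-padding is harmless depends on where the weights are anchored. The sketch below is illustrative only: it assumes a hypothetical alternating 3, 1 weighting (GS1-style) and contrasts assigning the weights from the right end of the number with assigning them from the left end. Leading zeros contribute nothing to the sum, so only the left-anchored version, where padding shifts every other digit onto a different weight, changes its check digit.

```python
def check_digit_from_right(payload: str) -> int:
    """Hypothetical scheme: weights 3, 1, 3, 1, ... starting at the rightmost digit."""
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(payload)))
    return (10 - total % 10) % 10


def check_digit_from_left(payload: str) -> int:
    """Hypothetical scheme: weights 3, 1, 3, 1, ... starting at the leftmost digit."""
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(payload))
    return (10 - total % 10) % 10


for payload in ("12345", "012345"):
    print(payload, check_digit_from_right(payload), check_digit_from_left(payload))
# 12345 7 7
# 012345 7 3
```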

A very simple check digit method would be to take the sum of all digits (digital sum) modulo 10. This would catch any single-digit error, as such an error would always change the sum, but does not catch any transposition errors (switching two digits) as re-ordering does not change the sum.
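A minimal Python sketch of the digit-sum scheme (the function name is illustrative): the single mistyped digit changes the result, but the transposition of two neighboring digits does not.

```python
def digit_sum_check(payload: str) -> int:
    """Check digit = sum of all digits, modulo 10."""
    return sum(int(d) for d in payload) % 10


original = "4871"
typo = "4881"     # single-digit error: 7 -> 8
swapped = "4781"  # transposition of two neighboring digits: 87 -> 78

print(digit_sum_check(original))  # 0
print(digit_sum_check(typo))      # 1  (error detected)
print(digit_sum_check(swapped))   # 0  (error missed)
```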

A slightly more complex method is to take the weighted sum of the digits, modulo 10, with different weights for each number position.

For example, if the weights for a four-digit number were 5, 3, 2, 7 and the number to be coded was 4871, then one would take 5×4 + 3×8 + 2×7 + 7×1 = 65; 65 modulo 10 is 5, so the check digit would be 5, giving 48715.
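The worked example translates directly into code. This sketch reuses the weights 5, 3, 2, 7 from the example (the function name is illustrative) and, unlike the plain digit sum, also catches the transposition 4871 → 4781.

```python
EXAMPLE_WEIGHTS = (5, 3, 2, 7)  # the illustrative weights from the text


def weighted_check(payload: str, weights=EXAMPLE_WEIGHTS) -> int:
    """Weighted sum of the digits, modulo 10 (no further adjustment)."""
    if len(payload) != len(weights):
        raise ValueError("payload and weights must have the same length")
    return sum(w * int(d) for w, d in zip(weights, payload)) % 10


number = "4871"
print(number + str(weighted_check(number)))  # 48715
print(weighted_check("4781"))                # 4  (the 87 -> 78 swap is now caught)
```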

Systems with weights of 1, 3, 7, or 9, with the weights on neighboring numbers being different, are widely used: for example, 31 31 weights in UPC codes, 13 13 weights in EAN numbers (GS1 algorithm), and the 371 371 371 weights used in United States bank routing transit numbers. This system detects all single-digit errors and around 90% of transposition errors. The weights 1, 3, 7, and 9 are used because they are coprime to 10, so changing any digit changes the check digit; a weight divisible by 2 or 5 would lose information (because 5×0 = 5×2 = 5×4 = 5×6 = 5×8 = 0 modulo 10) and thus not catch some single-digit errors. Using different weights on neighboring numbers means that most transpositions change the check digit; however, because all weights differ by an even number, this does not catch transpositions of two digits that differ by 5 (0 and 5, 1 and 6, 2 and 7, 3 and 8, 4 and 9): swapping such digits changes the weighted sum by the weight difference (a multiple of 2) times the digit difference (5), which is a multiple of 10.
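Both claims can be checked by brute force. The sketch below (helper names are illustrative) enumerates every ordered pair of distinct neighboring digits under the UPC-style weights 3 and 1, lists the swaps that leave the weighted sum unchanged modulo 10, and shows why a weight of 5 loses information.

```python
from itertools import product


def weighted_sum_mod10(digits, weights) -> int:
    return sum(w * d for w, d in zip(weights, digits)) % 10


def missed_transpositions(w1: int, w2: int) -> list:
    """Ordered digit pairs (a, b), a != b, whose swap leaves the weighted sum unchanged."""
    return [
        (a, b)
        for a, b in product(range(10), repeat=2)
        if a != b
        and weighted_sum_mod10((a, b), (w1, w2)) == weighted_sum_mod10((b, a), (w1, w2))
    ]


missed = missed_transpositions(3, 1)
print(missed)
# [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 0), (6, 1), (7, 2), (8, 3), (9, 4)]
print(f"{1 - len(missed) / 90:.1%} of adjacent transpositions detected")  # 88.9%

# A weight of 5 collapses all even digits onto 0 and all odd digits onto 5:
print(sorted({5 * d % 10 for d in range(10)}))  # [0, 5]
```

The 88.9% figure (80 of 90 ordered pairs) is the "around 90%" quoted above for this weighting.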

The ISBN-10 code instead uses modulo 11, which is prime, and all the number positions have different weights 1, 2, ... 10. This system thus detects all single digit substitution and transposition errors (including jump transpositions), but at the cost of the check digit possibly being 10, represented by "X". (An alternative is simply to avoid using the serial numbers which result in an "X" check digit.) ISBN-13 instead uses the GS1 algorithm used in EAN numbers.

More complicated algorithms include the Luhn algorithm (1954), which captures 98% of adjacent transposition errors (it does not detect 09 ↔ 90), and the still more sophisticated Verhoeff algorithm (1969), which catches all single-digit substitution and adjacent transposition errors, and many (but not all) more complex errors. A similar method based on abstract algebra, the Damm algorithm (2004), also detects all single-digit errors and all adjacent transposition errors. These three methods use a single check digit and will therefore fail to capture around 10% of more complex errors. To reduce this failure rate, it is necessary to use more than one check digit (for example, the modulo 97 check, which uses two check digits; for the algorithm, see International Bank Account Number) and/or to use a wider range of characters in the check digit, for example letters plus numbers.

Examples

UPC

The final digit of a Universal Product Code is a check digit computed as follows:[2]

  1. Add the digits in the odd-numbered positions (first, third, fifth, etc.) together and multiply by three.
  2. Add the digits (up to but not including the check digit) in the even-numbered positions (second, fourth, sixth, etc.) to the result.
  3. Take the remainder of the result divided by 10 (modulo operation). If the remainder is equal to 0 then use 0 as the check digit, and if not 0 subtract the remainder from 10 to derive the check digit.

For instance, the UPC-A barcode for a box of tissues is "036000241457". The last digit is the check digit "7", and if the other numbers are correct then the check digit calculation must produce 7.

  1. Add the odd number digits: 0+6+0+2+1+5 = 14.
  2. Multiply the result by 3: 14 × 3 = 42.
  3. Add the even number digits: 3+0+0+4+4 = 11.
  4. Add the two results together: 42 + 11 = 53.
  5. To calculate the check digit, take the remainder of 53 divided by 10 (53 modulo 10), which is 3; since it is not 0, subtract it from 10: 10 - 3 = 7. Therefore, the check digit value is 7.

Another example: to calculate the check digit for the following food item "01010101010x".

  1. Add the odd number digits: 0+0+0+0+0+0 = 0.
  2. Multiply the result by 3: 0 × 3 = 0.
  3. Add the even number digits: 1+1+1+1+1 = 5.
  4. Add the two results together: 0 + 5 = 5.
  5. To calculate the check digit, take the remainder of (5 / 10), which is also known as (5 modulo 10), and if not 0, subtract from 10: i.e. (5 / 10) = 0 remainder 5; (10 - 5) = 5. Therefore, the check digit x value is 5.
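The three UPC steps can be sketched in Python as follows (function names are illustrative, not from a GS1 library). The usage lines reproduce both worked examples.

```python
def upc_check_digit(first11: str) -> int:
    """Check digit for the first 11 digits of a UPC-A code."""
    if len(first11) != 11 or any(c not in "0123456789" for c in first11):
        raise ValueError("expected exactly 11 digits")
    digits = [int(c) for c in first11]
    odd_sum = sum(digits[0::2])   # positions 1, 3, 5, 7, 9, 11
    even_sum = sum(digits[1::2])  # positions 2, 4, 6, 8, 10
    remainder = (odd_sum * 3 + even_sum) % 10
    return 0 if remainder == 0 else 10 - remainder


def upc_is_valid(code: str) -> bool:
    """True if a 12-digit UPC-A code ends with the correct check digit."""
    if len(code) != 12 or any(c not in "0123456789" for c in code):
        return False
    return upc_check_digit(code[:11]) == int(code[11])


print(upc_check_digit("03600024145"))  # 7
print(upc_check_digit("01010101010"))  # 5
print(upc_is_valid("036000241457"))    # True
print(upc_is_valid("036000241458"))    # False
```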

ISBN 10

The final character of a ten-digit International Standard Book Number is a check digit computed so that multiplying each digit by its position in the number (counting from the right) and taking the sum of these products modulo 11 gives 0. The rightmost digit (which is multiplied by 1) is the check digit, chosen to make the sum correct. It may need to have the value 10, which is represented as the letter X. For example, take the ISBN 0-201-53082-1: the sum of products is 0×10 + 2×9 + 0×8 + 1×7 + 5×6 + 3×5 + 0×4 + 8×3 + 2×2 + 1×1 = 99 ≡ 0 (mod 11), so the ISBN is valid. Positions can also be counted from the left, in which case the check digit is multiplied by 10; the validity test is the same: 0×1 + 2×2 + 0×3 + 1×4 + 5×5 + 3×6 + 0×7 + 8×8 + 2×9 + 1×10 = 143 ≡ 0 (mod 11).
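A Python sketch of the ISBN-10 rule, using the right-to-left weights 10 down to 1 from the example (function names are illustrative). The nine digits 080442957 are an arbitrary input chosen to exercise the X case.

```python
def isbn10_check_char(first9: str) -> str:
    """Check character for the first nine digits of an ISBN-10 ('X' stands for 10)."""
    if len(first9) != 9 or any(c not in "0123456789" for c in first9):
        raise ValueError("expected exactly 9 digits")
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), first9))  # weights 10..2
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def isbn10_is_valid(isbn: str) -> bool:
    """Weighted sum (weights 10 down to 1) must be divisible by 11."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for weight, ch in zip(range(10, 0, -1), chars):
        if ch in "0123456789":
            value = int(ch)
        elif ch == "X" and weight == 1:  # X is only allowed as the check character
            value = 10
        else:
            return False
        total += weight * value
    return total % 11 == 0


print(isbn10_check_char("020153082"))    # 1
print(isbn10_is_valid("0-201-53082-1"))  # True
print(isbn10_check_char("080442957"))    # X
```

Because 11 is prime and neighboring weights differ by 1, swapping any two different adjacent digits of a valid ISBN-10 always breaks the check.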

ISBN 13

ISBN 13 (in use since January 2007) is equal to the EAN-13 code found underneath a book's barcode. Its check digit is generated the same way as the UPC check digit, except that the digits in even-numbered positions are multiplied by 3 instead of those in odd-numbered positions.[3]
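Because the ISBN-13 check digit sits in the 13th (odd) position with weight 1, a complete ISBN-13 is valid exactly when the full weighted sum is a multiple of 10. A minimal validator, assuming hyphens and spaces are only separators; the function name and the sample ISBN are illustrative.

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Weights 1, 3, 1, 3, ... from the left; the total must be a multiple of 10."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or any(c not in "0123456789" for c in digits):
        return False
    total = sum(int(d) * (3 if i % 2 == 1 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


print(isbn13_is_valid("978-0-306-40615-7"))  # True
print(isbn13_is_valid("978-0-306-40615-8"))  # False
```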

EAN (GLN, GTIN, EAN numbers administered by GS1)

EAN (European Article Number) check digits (administered by GS1) are calculated by summing each of the odd-position digits multiplied by 3 and then adding the sum of the even-position digits. Positions are counted from right to left, starting with the rightmost digit before the check digit, so the first odd position is the last data digit of the code. The final digit of the result is subtracted from 10 to calculate the check digit (or left as-is if already zero). A GS1 check digit calculator and detailed documentation are available on GS1's website.[4] Another official calculator page shows that the mechanism for GTIN-13 is the same for Global Location Number/GLN.[5]
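Counting from the right is what lets one routine serve every GS1 length. A sketch under that reading (the function name is illustrative; the EAN-8 and EAN-13 inputs are illustrative values):

```python
def gs1_check_digit(data: str) -> int:
    """GS1 check digit for the data digits of a GTIN-8, -12, -13, -14 or a GLN.

    Positions are counted from the right: the rightmost data digit is
    multiplied by 3, the next by 1, and so on alternately.
    """
    if not data or any(c not in "0123456789" for c in data):
        raise ValueError("data must be a non-empty string of digits")
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(data)))
    return (10 - total % 10) % 10


print(gs1_check_digit("03600024145"))   # 7  (the UPC-A tissue-box example)
print(gs1_check_digit("978030640615"))  # 7  (an EAN-13 / ISBN-13)
print(gs1_check_digit("9638507"))       # 4  (an EAN-8)
```

For the 11 data digits of a UPC-A, these right-to-left weights coincide with the left-to-right odd/even rule given for UPC, and a leading zero (as when a UPC-A is carried in an EAN-13) adds nothing to the sum.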

Other examples of check digits

In Central America

  • The Guatemalan Tax Number (NIT, Número de Identificación Tributaria, "Tax Identification Number"), which is based on modulo 11.

References

  1. ^ Kirtland, Joseph (2001). Identification Numbers and Check Digit Schemes. Classroom Resource Materials. Mathematical Association of America. pp. 4–6. ISBN 978-0-88385-720-5.
  2. ^ "GS1 Check Digit Calculator". GS1 US. 2006. Archived from the original on 2008-05-09. Retrieved 2008-05-21.
  3. ^ "ISBN Users Manual". International ISBN Agency. 2005. Retrieved 2008-05-21.
  4. ^ "Check Digit Calculator". GS1. 2005. Retrieved 2008-05-21.
  5. ^ "Check Digit Calculator, at GS1 US official site". GS1 US. Retrieved 2012-08-09.
  6. ^ http://openfigi.com
  7. ^ "Unique Identification Card". Geek Gazette. IEEE Student Branch (Autumn 2011): 16. Archived from the original on 2012-10-24.
  8. ^ Khoo, Chong-Yee (20 January 2014). "New Format for Singapore IP Application Numbers at IPOS". Singapore Patent Blog. Cantab IP. Retrieved 6 July 2014.

Damm algorithm

In error detection, the Damm algorithm is a check digit algorithm that detects all single-digit errors and all adjacent transposition errors. It was presented by H. Michael Damm in 2004.
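A sketch of the Damm algorithm in Python. It assumes the order-10 weakly totally anti-symmetric quasigroup table that is commonly published with the algorithm; other valid tables exist, and check digits computed with one table are not compatible with another. The check digit is the final interim value, and a number with its check digit appended is valid when processing it ends at 0. Function names are illustrative.

```python
# Commonly published order-10 quasigroup table: TABLE[interim][digit] -> new interim.
TABLE = (
    (0, 3, 1, 7, 5, 9, 8, 6, 4, 2),
    (7, 0, 9, 2, 1, 5, 4, 8, 6, 3),
    (4, 2, 0, 6, 8, 7, 1, 3, 5, 9),
    (1, 7, 5, 0, 9, 8, 3, 4, 2, 6),
    (6, 1, 2, 3, 0, 4, 5, 9, 7, 8),
    (3, 6, 7, 4, 2, 0, 9, 5, 8, 1),
    (5, 8, 6, 9, 7, 2, 0, 1, 3, 4),
    (8, 9, 4, 5, 3, 6, 2, 0, 1, 7),
    (9, 4, 3, 8, 6, 1, 7, 2, 0, 5),
    (2, 5, 8, 1, 4, 3, 6, 7, 9, 0),
)


def damm_interim(number: str) -> int:
    """Run the digits through the table, starting from interim digit 0."""
    interim = 0
    for ch in number:
        interim = TABLE[interim][int(ch)]
    return interim


def damm_check_digit(payload: str) -> int:
    return damm_interim(payload)


def damm_is_valid(number: str) -> bool:
    return damm_interim(number) == 0


print(damm_check_digit("572"))  # 4
print(damm_is_valid("5724"))    # True
print(damm_is_valid("5742"))    # False (adjacent transposition)
```

The zero diagonal of the table is what makes appending the check digit drive the interim value to 0.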

European Community number

The European Community number (EC number) is a unique seven-digit identifier that was assigned to substances for regulatory purposes within the European Union by the European Commission. The EC Inventory comprises three individual inventories, EINECS, ELINCS and the NLP list.

ISO 6346

ISO 6346 is an international standard covering the coding, identification and marking of intermodal (shipping) containers used within containerized intermodal freight transport. The standard establishes a visual identification system for every container that includes a unique serial number (with check digit), the owner, a country code, a size, type and equipment category as well as any operational marks. The standard is managed by the International Container Bureau (BIC).

ISO 7064

ISO 7064 defines algorithms for calculating check digit characters.
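ISO 7064 specifies several check character systems. As one illustration, the sketch below implements the two-digit MOD 97-10 system as it is applied to International Bank Account Numbers (the modulo 97 check mentioned in the check digit article): letters are converted to the numbers 10 to 35, the country code and check digits are moved to the end, and a valid IBAN leaves a remainder of 1 when divided by 97. Function names are hypothetical, the sample IBAN is an illustrative value, and the sketch checks only the MOD 97-10 arithmetic, not country-specific lengths.

```python
def _to_digits(text: str) -> str:
    """Map 0-9 to themselves and A-Z to 10-35 (base-36 digit values)."""
    return "".join(str(int(ch, 36)) for ch in text)


def iban_is_valid(iban: str) -> bool:
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    return int(_to_digits(rearranged)) % 97 == 1


def iban_check_digits(country: str, bban: str) -> str:
    """Two check digits for a country code and BBAN, using placeholder '00'."""
    remainder = int(_to_digits((bban + country + "00").upper())) % 97
    return f"{98 - remainder:02d}"


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))   # True
print(iban_check_digits("GB", "WEST12345698765432"))  # 82
```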

International Article Number

The International Article Number (also known as European Article Number or EAN) is a standard describing a barcode symbology and numbering system used in global trade to identify a specific retail product type, in a specific packaging configuration, from a specific manufacturer. The standard has been subsumed in the Global Trade Item Number standard from the GS1 organization; the same numbers can be referred to as GTINs and can be encoded in other barcode symbologies defined by GS1. EAN barcodes are used worldwide for lookup at retail point of sale, but can also be used as numbers for other purposes such as wholesale ordering or accounting.

The most commonly used EAN standard is the thirteen-digit EAN-13, a superset of the original 12-digit Universal Product Code (UPC-A) standard developed in 1970 by George J. Laurer. An EAN-13 number includes a 3-digit GS1 prefix (indicating country of registration or special type of product). A prefix with a first digit of "0" indicates a 12-digit UPC-A code follows. A prefix with first two digits of "45" or "49" indicates a Japanese Article Number (JAN) follows.

The less commonly used 8-digit EAN-8 barcode was introduced for use on small packages, where EAN-13 would be too large. 2-digit EAN-2 and 5-digit EAN-5 are supplemental barcodes, placed on the right-hand side of EAN-13 or UPC. These are generally used for periodicals like magazines or books, to indicate the current year's issue number; and weighed products like food, to indicate the manufacturer's suggested retail price.

International Mobile Equipment Identity

The International Mobile Equipment Identity or IMEI is a number, usually unique, used to identify 3GPP and iDEN mobile phones, as well as some satellite phones. It is usually found printed inside the battery compartment of the phone, but can also be displayed on-screen on most phones by entering *#06# on the dialpad, or alongside other system information in the settings menu on smartphone operating systems.

GSM networks use the IMEI number to identify valid devices, and can stop a stolen phone from accessing the network. For example, if a mobile phone is stolen, the owner can have their network provider use the IMEI number to blacklist the phone. This renders the phone useless on that network and sometimes other networks, even if the thief changes the phone's subscriber identity module (SIM).

The IMEI only identifies the device and has no particular relationship to the subscriber. The phone identifies the subscriber by transmitting the International mobile subscriber identity (IMSI) number, which it stores on a SIM card that can, in theory, be transferred to any handset. However, the network's ability to know a subscriber's current, individual device enables many network and security features.

International Securities Identification Number

An International Securities Identification Number (ISIN) uniquely identifies a security. Its structure is defined in ISO 6166. The ISIN code is a 12-character alphanumeric code that serves for uniform identification of a security through normalization of the assigned National Number, where one exists, at trading and settlement.

International Standard Book Number

The International Standard Book Number (ISBN) is a numeric commercial book identifier which is intended to be unique. Publishers purchase ISBNs from an affiliate of the International ISBN Agency. An ISBN is assigned to each edition and variation (except reprintings) of a book. For example, an e-book, a paperback and a hardcover edition of the same book would each have a different ISBN. The ISBN is 13 digits long if assigned on or after 1 January 2007, and 10 digits long if assigned before 2007. The method of assigning an ISBN is nation-based and varies from country to country, often depending on how large the publishing industry is within a country.

The initial ISBN identification format was devised in 1967, based upon the 9-digit Standard Book Numbering (SBN) created in 1966. The 10-digit ISBN format was developed by the International Organization for Standardization (ISO) and was published in 1970 as international standard ISO 2108 (the SBN code can be converted to a ten-digit ISBN by prefixing it with a zero digit "0").

Privately published books sometimes appear without an ISBN. The International ISBN agency sometimes assigns such books ISBNs on its own initiative. Another identifier, the International Standard Serial Number (ISSN), identifies periodical publications such as magazines, and the International Standard Music Number (ISMN) covers musical scores.

International Standard Music Number

The International Standard Music Number or ISMN (ISO 10957) is a thirteen-character alphanumeric identifier for printed music developed by ISO.

International Standard Musical Work Code

International Standard Musical Work Code (ISWC) is a unique identifier for musical works, similar to ISBN for books. It is adopted as international standard ISO 15707. The ISO subcommittee with responsibility for the standard is TC 46/SC 9.

International Standard Serial Number

An International Standard Serial Number (ISSN) is an eight-digit serial number used to uniquely identify a serial publication, such as a magazine. The ISSN is especially helpful in distinguishing between serials with the same title. ISSNs are used in ordering, cataloging, interlibrary loans, and other practices in connection with serial literature. The ISSN system was first drafted as an International Organization for Standardization (ISO) international standard in 1971 and published as ISO 3297 in 1975. ISO subcommittee TC 46/SC 9 is responsible for maintaining the standard.

When a serial with the same content is published in more than one media type, a different ISSN is assigned to each media type. For example, many serials are published both in print and electronic media. The ISSN system refers to these types as print ISSN (p-ISSN) and electronic ISSN (e-ISSN), respectively. Conversely, as defined in ISO 3297:2007, every serial in the ISSN system is also assigned a linking ISSN (ISSN-L), typically the same as the ISSN assigned to the serial in its first published medium, which links together all ISSNs assigned to the serial in every medium.

Luhn algorithm

The Luhn algorithm or Luhn formula, also known as the "modulus 10" or "mod 10" algorithm, named after IBM scientist Hans Peter Luhn, is a simple checksum formula used to validate a variety of identification numbers, such as credit card numbers, IMEI numbers, National Provider Identifier numbers in the United States, Canadian Social Insurance Numbers, Israel ID Numbers, Greek Social Security Numbers (ΑΜΚΑ), and survey codes on McDonald's receipts. Luhn described it in U.S. Patent No. 2,950,048, filed on January 6, 1954, and granted on August 23, 1960.

The algorithm is in the public domain and is in wide use today. It is specified in ISO/IEC 7812-1. It is not intended to be a cryptographically secure hash function; it was designed to protect against accidental errors, not malicious attacks. Most credit cards and many government identification numbers use the algorithm as a simple method of distinguishing valid numbers from mistyped or otherwise incorrect numbers.
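A minimal Python sketch of the Luhn check, assuming the usual formulation: starting from the rightmost digit (the check digit), every second digit is doubled, 9 is subtracted from any doubled value above 9, and a number is valid when the total is a multiple of 10. Function names are illustrative, 79927398713 is an illustrative sample, and the last line shows the 09 ↔ 90 transposition that Luhn cannot detect.

```python
def luhn_total(number: str) -> int:
    """Luhn sum of a digit string, doubling every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total


def luhn_is_valid(number: str) -> bool:
    if not number or any(c not in "0123456789" for c in number):
        return False
    return luhn_total(number) % 10 == 0


def luhn_check_digit(payload: str) -> int:
    """Digit to append so that payload + digit passes luhn_is_valid."""
    # Appending a placeholder 0 puts the payload digits in their final positions.
    return (10 - luhn_total(payload + "0") % 10) % 10


print(luhn_is_valid("79927398713"))                    # True
print(luhn_check_digit("7992739871"))                  # 3
print(luhn_check_digit("09"), luhn_check_digit("90"))  # 1 1  (swap not detected)
```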

MSI Barcode

MSI (also known as Modified Plessey) is a barcode symbology developed by the MSI Data Corporation, based on the original Plessey Code symbology. It is a continuous symbology that is not self-checking. MSI is used primarily for inventory control, marking storage containers and shelves in warehouse environments.

Machine-readable passport

A machine-readable passport (MRP) is a machine-readable travel document (MRTD) with the data on the identity page encoded in optical character recognition format. Many countries began to issue machine-readable travel documents in the 1980s.

Most travel passports worldwide are MRPs. They are standardized by the ICAO Document 9303 (endorsed by the International Organization for Standardization and the International Electrotechnical Commission as ISO/IEC 7501-1) and have a special machine-readable zone (MRZ), which is usually at the bottom of the identity page at the beginning of a passport. The ICAO Document 9303 describes three types of documents. Usually passport booklets are issued in "Type 3" format, while identity cards and passport cards typically use the "Type 1" format.

The machine-readable zone of a Type 3 travel document spans two lines, and each line is 44 characters long. The following information must be provided in the zone: name, passport number, nationality, date of birth, sex, and passport expiration date. There is room for optional, often country-dependent, supplementary information.

The machine-readable zone of a Type 1 travel document spans three lines, and each line is 30 characters long.

Computers with a camera and suitable software can directly read the information on machine-readable passports. This enables faster processing of arriving passengers by immigration officials, and greater accuracy than manually read passports, as well as faster data entry, more data to be read and better data matching against immigration databases and watchlists.

Apart from optically readable information, many passports contain an RFID chip which enables computers to read a larger amount of information, for example a photo of the bearer. These passports are called biometric passports.

POSTNET

POSTNET (Postal Numeric Encoding Technique) is a barcode symbology used by the United States Postal Service to assist in directing mail. The ZIP Code or ZIP+4 code is encoded in half- and full-height bars. Most often, the delivery point is added, usually being the last two digits of the address or PO box number.

The barcode starts and ends with a full bar (often called a guard rail or frame bar and represented as the letter "S" in one version of the USPS TrueType Font) and has a check digit after the ZIP, ZIP+4, or delivery point.

Each individual digit is represented by a set of five bars, two of which are full bars (i.e. two-out-of-five code). The full bars represent "on" bits in a pseudo-binary code in which the places represent, from left to right: 7, 4, 2, 1, and 0. (Though in this scheme, zero is encoded as 11 in decimal, or in POSTNET "binary" as 11000.)
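The place-value rule is enough to rebuild the whole encoding table. The sketch below derives the five-bar pattern for each digit (1 = full bar, 0 = half bar) from the weights 7, 4, 2, 1, 0, mapping the sum 11 to zero, and renders a digit string with its two frame bars. Names are illustrative, and the '|' and '.' text rendering is only a convention for this example, not part of the USPS specification.

```python
from itertools import combinations

PLACE_VALUES = (7, 4, 2, 1, 0)  # bar weights, left to right


def build_postnet_table() -> dict:
    """Map each digit to a 5-character pattern with exactly two '1' (full) bars."""
    table = {}
    for full_bars in combinations(range(5), 2):
        value = sum(PLACE_VALUES[i] for i in full_bars)
        digit = 0 if value == 11 else value  # 7 + 4 = 11 stands for zero
        table[digit] = "".join("1" if i in full_bars else "0" for i in range(5))
    return table


POSTNET = build_postnet_table()


def postnet_bars(digits: str) -> str:
    """Text rendering: '|' = full bar, '.' = half bar, with a full frame bar at each end."""
    body = "".join(POSTNET[int(d)] for d in digits)
    return "|" + body.replace("1", "|").replace("0", ".") + "|"


print(POSTNET[0], POSTNET[1], POSTNET[7])  # 11000 00011 10001
print(postnet_bars("1"))                   # |...|||
```

The ten possible pairs of full bars give ten distinct sums, which is why every digit gets a unique pattern.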

Postal Alpha Numeric Encoding Technique

The Postal Alpha Numeric Encoding Technique (PLANET) barcode was used by the United States Postal Service to identify and track pieces of mail during delivery - the Post Office's "CONFIRM" services. It was fully superseded by Intelligent Mail Barcode by January 28, 2013.

A PLANET barcode is either 12 or 14 digits long.

The barcode:

  • identifies mailpiece class and shape
  • identifies the Confirm Subscriber ID
  • includes up to 6 digits of additional information that the Confirm subscriber chose, such as a mailing number, mailing campaign ID or customer ID
  • ends with a check digit

Like POSTNET, PLANET encodes the data in half- and full-height bars. Also like POSTNET, PLANET always starts and ends with a full bar (often called a guard rail), and each individual digit is represented by a set of five bars using a two-out-of-five code. However, in POSTNET the two marked bars are full bars, while in PLANET they are short bars. As with POSTNET, the check digit is calculated by summing the other digits and choosing the single digit which, when added to the sum, makes the total divisible by 10.
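The check digit rule shared by POSTNET and PLANET is a plain digit sum modulo 10. A minimal sketch (the function name is hypothetical; the inputs are illustrative):

```python
def postal_check_digit(digits: str) -> int:
    """Digit that brings the sum of all digits up to a multiple of 10."""
    if any(c not in "0123456789" for c in digits):
        raise ValueError("digits only")
    return (10 - sum(int(d) for d in digits) % 10) % 10


print(postal_check_digit("55555"))  # 5  (25 + 5 = 30)
print(postal_check_digit("12340"))  # 0  (10 is already a multiple of 10)
```

Because only the digit sum matters, this check catches any single wrong digit but, like the simple digit-sum scheme in the Design section, cannot detect two transposed digits.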

Universal Product Code

The Universal Product Code (UPC) is a barcode symbology that is widely used in the United States, Canada, United Kingdom, Australia, New Zealand, in Europe and other countries for tracking trade items in stores.

UPC (technically refers to UPC-A) consists of 12 numeric digits that are uniquely assigned to each trade item. Along with the related EAN barcode, the UPC is the barcode mainly used for scanning of trade items at the point of sale, per GS1 specifications. UPC data structures are a component of GTINs and follow the global GS1 specification, which is based on international standards. However, some retailers (clothing, furniture) do not use the GS1 system, relying instead on other barcode symbologies or article number systems. On the other hand, some retailers use the EAN/UPC barcode symbology, but without using a GTIN (for products sold in their own stores only).

Vehicle identification number

A vehicle identification number (VIN) is a unique code, including a serial number, used by the automotive industry to identify individual motor vehicles, towed vehicles, motorcycles, scooters and mopeds, as defined in ISO 3779 (content and structure) and ISO 4030 (location and attachment).

VINs were first used in 1954 in the United States. From 1954 to 1981, there was no accepted standard for these numbers, so different manufacturers used different formats.

In 1981, the National Highway Traffic Safety Administration of the United States standardized the format. It required all on-road vehicles sold to contain a 17-character VIN, which does not include the letters I (i), O (o), and Q (q) (to avoid confusion with numerals 1 and 0).

There are vehicle history services in several countries that help potential car owners use VINs to find vehicles that are defective or have been written off. See the Used car article for a list of countries where this service is available.

Wagon numbering system in India

(Not to be confused with British carriage and wagon numbering and classification which is commonly called the carriage numbering system)

A new wagon numbering system was adopted in Indian Railways in 2003. Wagons are allocated 11 digits, making it easy for identification and computerization of a wagon's information. The first two digits indicate Type of Wagon, the third and fourth digits indicate Owning Railway, the fifth and sixth digits indicate Year of Manufacture, the seventh through tenth digits indicate Individual Wagon Number, and the last digit is a Check digit.

This page is based on a Wikipedia article written by its authors.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.