ISO 639-2

ISO 639-2:1998, Codes for the representation of names of languages — Part 2: Alpha-3 code, is the second part of the ISO 639 standard, which lists codes for the representation of the names of languages. The three-letter codes given for each language in this part of the standard are referred to as "Alpha-3" codes. There are 487 entries in the list of ISO 639-2 codes.

The US Library of Congress is the registration authority for ISO 639-2 (referred to as ISO 639-2/RA). As registration authority, the Library of Congress receives and reviews proposed changes; it is also represented on the ISO 639-RA Joint Advisory Committee, which is responsible for maintaining the ISO 639 code tables.

History and relationship to other ISO 639 standards

Work was begun on the ISO 639-2 standard in 1989, because the ISO 639-1 standard, which uses only two-letter codes for languages, is not able to accommodate a sufficient number of languages. The ISO 639-2 standard was first released in 1998.

In practice, ISO 639-2 has largely been superseded by ISO 639-3 (2007), which includes codes for all the individual languages in ISO 639-2 plus many more. It also includes the special and reserved codes, and is designed not to conflict with ISO 639-2. ISO 639-3, however, does not include any of the collective languages in ISO 639-2; most of these are included in ISO 639-5.

B and T codes

While most languages are given one code by the standard, twenty of the languages described have two three-letter codes, a "bibliographic" code (ISO 639-2/B), which is derived from the English name for the language and was a necessary legacy feature, and a "terminological" code (ISO 639-2/T), which is derived from the native name for the language and resembles the language's two-letter code in ISO 639-1. There were originally 22 B codes; scc and scr are now deprecated.

In general the T codes are favored; ISO 639-3 uses ISO 639-2/T. However, ISO 15924 derives its codes from ISO 639-2/B when possible.
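
As a rough illustration of the B/T distinction, the following Python sketch converts a bibliographic code to its terminological counterpart. The table is a partial, hand-picked subset of the twenty pairs, and the names B_TO_T and to_terminological are hypothetical, chosen only for this example.

```python
# Partial table of ISO 639-2/B codes and their ISO 639-2/T equivalents.
# Only a handful of the twenty pairs are listed here.
B_TO_T = {
    "chi": "zho",  # Chinese
    "cze": "ces",  # Czech
    "dut": "nld",  # Dutch
    "fre": "fra",  # French
    "ger": "deu",  # German
    "gre": "ell",  # Modern Greek
    "per": "fas",  # Persian
    "wel": "cym",  # Welsh
}


def to_terminological(code: str) -> str:
    """Return the T code for a known B code; other codes pass through unchanged."""
    code = code.lower()
    return B_TO_T.get(code, code)
```

Codes that have a single form, such as eng, are returned as they are, so the function can be applied to any ISO 639-2 code before comparing it with ISO 639-3 data.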

Scopes and types

The codes in ISO 639-2 have a variety of "scopes of denotation", or types of meaning and use, some of which are described in more detail below.

Individual languages are further classified as to type:

  • Living languages
  • Extinct languages
  • Ancient languages
  • Historic languages
  • Constructed languages

Collections of languages

Some ISO 639-2 codes that are commonly used for languages do not precisely represent a particular language or a set of closely related varieties (as macrolanguages do). They are regarded as collective language codes and are excluded from ISO 639-3. For a definition of macrolanguages and collective languages see [1].

The collective language codes in ISO 639-2 are listed below.

The following code is identified as a collective code in ISO 639-2 but is (at present) missing from ISO 639-5:

Codes registered for 639-2 that are listed as collective codes in ISO 639-5 (and collective codes by name in ISO 639-2):

Reserved for local use

The range from qaa to qtz is reserved for local use and is used in neither ISO 639-2 nor ISO 639-3. These codes are typically used privately for languages not (yet) in either standard.
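
A check for the local-use range might look like the sketch below. The function name is_local_use is an assumption made for this example; the range boundaries come from the text above. Because all valid codes are three lowercase ASCII letters, a plain string comparison is enough.

```python
def is_local_use(code: str) -> bool:
    """Return True if code lies in the qaa..qtz range reserved for local use."""
    # Accept only three lowercase ASCII letters; anything else is not a code.
    if len(code) != 3 or not (code.isascii() and code.isalpha() and code.islower()):
        return False
    # Lexicographic comparison works because all codes have the same length.
    return "qaa" <= code <= "qtz"
```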

Special situations

There are four generic codes for special situations:

  • mis is listed as "uncoded languages" (originally an abbreviation for "miscellaneous")
  • mul (for multiple languages) is applied when several languages are used and it is not practical to specify all the appropriate language codes
  • und (for undetermined) is used in situations in which a language or languages must be indicated but the language cannot be identified.
  • zxx is listed in the code list as "no linguistic content", e.g. animal sounds (added 2006-01-11)

These four codes are also used in ISO 639-3.
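
One way an application might choose among these codes when tagging an item is sketched below. The function code_for_item and its selection policy are assumptions for illustration only; a cataloguer could equally decide to list every language instead of falling back to mul, and mis is left out because deciding that a language is uncoded requires a full code list.

```python
def code_for_item(languages, has_linguistic_content=True):
    """Pick a single ISO 639-2 code for an item (hypothetical policy).

    languages is a list of ISO 639-2 codes identified for the item.
    """
    if not has_linguistic_content:
        return "zxx"  # no linguistic content, e.g. animal sounds
    if not languages:
        return "und"  # a language is present but could not be identified
    if len(languages) == 1:
        return languages[0]
    return "mul"  # several languages, not practical to list them all
```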

Edo language

Edo (with diacritics, Ẹ̀dó), also called Bini (Benin), is a Volta–Niger language spoken in Edo State, Nigeria. It is the primary native language of the Edo people and was the primary language of the Benin Empire and its predecessor, Igodomigodo.

Efik language

Efik proper (Efik: Ikɔ Efik) is the native language of the Efik people of Nigeria, where it is a national language. It is the official language of Cross River State in Nigeria.

Filipino language

Filipino (Wikang Filipino [wɪˈkɐŋ ˌfiːliˈpiːno]) is the national language (Wikang pambansa/Pambansang wika) of the Philippines. Filipino is also designated, along with English, as an official language of the country. It is a standardized variety of the Tagalog language, an Austronesian regional language that is widely spoken in the Philippines. As of 2007, Tagalog is the first language of 28 million people, or about one-third of the Philippine population, while 45 million speak Tagalog as their second language. Tagalog is among the 185 languages of the Philippines identified in the Ethnologue.

Officially, Filipino is defined by the Commission on the Filipino Language (Komisyon sa Wikang Filipino in Filipino or simply KWF) as "the native dialect, spoken and written, in Metro Manila, the National Capital Region, and in other urban centers of the archipelago." Filipino is officially taken to be a pluricentric language, as it is further enriched and developed by the other existing Philippine languages according to the mandate of the 1987 Constitution. Indeed, there have been observed "emerging varieties of Filipino which deviate from the grammatical properties of Tagalog" in Cebu, Davao City, and Iloilo, which together with Metro Manila form the four largest metropolitan areas in the Philippines.

Gorontalo language

The Gorontalo language (also called Hulontalo) is a language spoken in Gorontalo Province (Northern Sulawesi, Indonesia, southern coast) by the Gorontaloan people. Dialects of Gorontalo are East Gorontalo, Gorontalo City, Tilamuta, Limboto and West Gorontalo.

Goykanadi

Goykānaḍī is an ancient script used in the territory of Goa. This script was also called kandavī. This script was used to write Konkani and sometimes Marathi. Similarly, it was used by the trading Saraswat and Daivajna families along with the Modi script to maintain their accounts.

Guardian of Scotland

The Guardians of Scotland were the de facto heads of state of Scotland during the First Interregnum of 1290–1292, and the Second Interregnum of 1296–1306. During the many years of minority in Scotland's subsequent history, there were many guardians of Scotland and the post was a significant constitutional feature in the course of development for politics in the country.

IETF language tag

An IETF BCP 47 language tag is a code to identify human languages. For example, the tag en stands for English; es-419 for Latin American Spanish; rm-sursilv for Sursilvan; gsw-u-sd-chzh for Zürich German; nan-Hant-TW for Min Nan Chinese as spoken in Taiwan using traditional Han characters. To distinguish language variants for countries, regions, writing systems etc., IETF language tags combine subtags from other standards such as ISO 639, ISO 15924, ISO 3166-1, and UN M.49. The tag structure has been standardized by the Internet Engineering Task Force (IETF) in Best Current Practice (BCP) 47; the subtags are maintained by the IANA Language Subtag Registry. IETF language tags are used by computing standards such as HTTP, HTML, XML, and PNG.
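
The way such a tag combines subtags can be sketched with a deliberately simplified splitter in Python. This is not a conforming BCP 47 parser: it assumes a well-formed tag, ignores extended language subtags and grandfathered tags, and does not validate subtags against the IANA registry. The function name parse_tag and the dictionary layout are hypothetical.

```python
def parse_tag(tag: str) -> dict:
    """Split a well-formed language tag into its main subtags (simplified)."""
    parts = tag.split("-")
    result = {
        "language": parts[0].lower(),
        "script": None,
        "region": None,
        "variants": [],
        "extensions": {},
    }
    i = 1
    # Script: four letters (ISO 15924), conventionally title-cased.
    if i < len(parts) and len(parts[i]) == 4 and parts[i].isalpha():
        result["script"] = parts[i].title()
        i += 1
    # Region: two letters (ISO 3166-1) or three digits (UN M.49).
    if i < len(parts) and (
        (len(parts[i]) == 2 and parts[i].isalpha())
        or (len(parts[i]) == 3 and parts[i].isdigit())
    ):
        result["region"] = parts[i].upper()
        i += 1
    # Variants: everything up to the first single-character subtag.
    while i < len(parts) and len(parts[i]) != 1:
        result["variants"].append(parts[i].lower())
        i += 1
    # Extensions: a single-character subtag followed by its own subtags.
    while i < len(parts):
        singleton = parts[i].lower()
        i += 1
        values = []
        while i < len(parts) and len(parts[i]) != 1:
            values.append(parts[i].lower())
            i += 1
        result["extensions"][singleton] = values
    return result
```

Applied to the examples above, nan-Hant-TW yields language nan, script Hant and region TW, while gsw-u-sd-chzh yields language gsw with the extension u carrying the subtags sd and chzh.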

ISO 639-1

ISO 639-1:2002, Codes for the representation of names of languages — Part 1: Alpha-2 code, is the first part of the ISO 639 series of international standards for language codes. Part 1 covers the registration of two-letter codes. There are 184 two-letter codes registered as of December 2018. The registered codes cover the world's major languages.

These codes are a useful international and formal shorthand for indicating languages.

Many multilingual web sites, such as Wikipedia, use these codes to prefix URLs of specific language versions of their web sites: for example, en.wikipedia.org is the English version of Wikipedia. See also IETF language tag. (Two-letter country-specific top-level-domain code suffixes are often different from these language-tag prefixes.)

ISO 639, the original standard for language codes, was approved in 1967. It was split into parts, and in 2002 ISO 639-1 became the new revision of the original standard. The last code added was ht, representing Haitian Creole on 2003-02-26. The use of the standard was encouraged by IETF language tags, introduced in RFC 1766 in March 1995, and continued by RFC 3066 from January 2001 and RFC 4646 from September 2006. The current version is RFC 5646 from September 2009. Infoterm (International Information Center for Terminology) is the registration authority for ISO 639-1 codes.

New ISO 639-1 codes are not added if an ISO 639-2 code exists, so systems that use ISO 639-1 and 639-2 codes, with 639-1 codes preferred, do not have to change existing codes. If an ISO 639-2 code that covers a group of languages is used, it might be overridden for some specific languages by a new ISO 639-1 code.
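
A system that prefers ISO 639-1 codes and falls back to ISO 639-2 might be sketched as follows. The lookup table is a tiny illustrative subset, and the names ALPHA3_TO_ALPHA2 and preferred_code are assumptions made for this example.

```python
# Tiny illustrative subset of ISO 639-2/T codes that have an ISO 639-1 code.
ALPHA3_TO_ALPHA2 = {
    "deu": "de",  # German
    "eng": "en",  # English
    "fra": "fr",  # French
    "hat": "ht",  # Haitian Creole
}


def preferred_code(alpha3: str) -> str:
    """Return the two-letter code if one is known, else the three-letter code."""
    alpha3 = alpha3.lower()
    return ALPHA3_TO_ALPHA2.get(alpha3, alpha3)
```

Because a two-letter code is not added once a three-letter code is in use, a value such as fil (Filipino) returned by this function is not expected to change later.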

There is no specification on treatment of macrolanguages (see ISO 639-3).

ISO 639-3

ISO 639-3:2007, Codes for the representation of names of languages – Part 3: Alpha-3 code for comprehensive coverage of languages, is an international standard for language codes in the ISO 639 series. It defines three-letter codes for identifying languages. The standard was published by ISO on 1 February 2007.

ISO 639-3 extends the ISO 639-2 alpha-3 codes with an aim to cover all known natural languages. The extended language coverage was based primarily on the language codes used in the Ethnologue (volumes 10-14) published by SIL International, which is now the registration authority for ISO 639-3. It provides an enumeration of languages as complete as possible, including living and extinct, ancient and constructed, major and minor, written and unwritten. However, it does not include reconstructed languages such as Proto-Indo-European.

ISO 639-3 is intended for use as metadata codes in a wide range of applications. It is widely used in computer and information systems, such as the Internet, in which many languages need to be supported. In archives and other information storage, the codes are used in cataloging systems, indicating what language a resource is in or about. The codes are also frequently used in the linguistic literature and elsewhere to compensate for the fact that language names may be obscure or ambiguous.

ISO 639-5

ISO 639-5:2008 "Codes for the representation of names of languages—Part 5: Alpha-3 code for language families and groups" is a highly incomplete international standard published by the International Organization for Standardization (ISO). It was developed by ISO Technical Committee 37, Subcommittee 2, and first published on May 15, 2008. It is part of the ISO 639 series of standards.

ISO 639 macrolanguage

A macrolanguage is a book-keeping mechanism for the ISO 639 international standard for language codes. Macrolanguages are established to assist mapping between different sets of ISO language codes. Specifically, there may be a many-to-one correspondence between ISO 639-3, intended to identify all the thousands of languages of the world, and either of two other sets, ISO 639-1, established to identify languages in computer systems, and ISO 639-2, which encodes a few hundred languages for library cataloguing and bibliographic purposes. When such many-to-one ISO 639-2 codes are included in an ISO 639-3 context, they are called "macrolanguages" to distinguish them from the corresponding individual languages of ISO 639-3. According to the ISO,

Some existing code elements in ISO 639-2, and the corresponding code elements in ISO 639-1, are designated in those parts of ISO 639 as individual language code elements, yet are in a one-to-many relationship with individual language code elements in [ISO 639-3]. For purposes of [ISO 639-3], they are considered to be macrolanguage code elements.

ISO 639-3 is curated by SIL International; ISO 639-2 is curated by the Library of Congress (USA).

The mapping often has the implication that it covers borderline cases where two language varieties may be considered strongly divergent dialects of the same language or very closely related languages (dialect continuums); it may also encompass situations when there are language varieties that are considered to be varieties of the same language on the grounds of ethnic, cultural, and political considerations, rather than linguistic reasons. However, this is not its primary function and the classification is not evenly applied.

For example, Chinese is a macrolanguage encompassing many languages that are not mutually intelligible, but the languages "Standard German", "Bavarian German", and other closely related languages do not form a macrolanguage, despite being more mutually intelligible. Other examples include Tajiki not being part of the Persian macrolanguage despite sharing much lexicon, and Urdu and Hindi not forming a macrolanguage despite forming a mutually intelligible dialect continuum. Even the dialects of Hindi are all treated as separate languages. In essence, ISO 639-2 and ISO 639-3 use different criteria for dividing language varieties into languages: 639-2 relies more on shared writing systems and literature, whereas 639-3 focuses on mutual intelligibility and shared lexicon. The macrolanguages exist within the ISO 639-3 code set to make mapping between the two sets easier.

As of 25 January 2019, there are fifty-eight language codes in ISO 639-2 that are considered to be macrolanguages in ISO 639-3. The use of this category of macrolanguage was applied in Ethnologue, starting in the 16th edition.

Some of the macrolanguages had no individual language (as defined by 639-3) in ISO 639-2, e.g. "ara" (Arabic), but ISO 639-3 recognizes different varieties of Arabic as separate languages under some circumstances. Others, like "nor" (Norwegian), had their two individual parts (nno Nynorsk, nob Bokmål) already in 639-2. That means some languages (e.g. "arb" Standard Arabic) that were considered by ISO 639-2 to be dialects of one language ("ara") are now in ISO 639-3 in certain contexts considered to be individual languages themselves. This is an attempt to deal with varieties that may be linguistically distinct from each other, but are treated by their speakers as forms of the same language, e.g. in cases of diglossia. For example:

  • Generic Arabic, 639-2 (ara)
  • Standard Arabic, 639-3 (arb)

ISO 639-2 also includes codes for collections of languages; these are not the same as macrolanguages. These collections of languages are excluded from ISO 639-3, because they never refer to individual languages. Most such codes are included in ISO 639-5.
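
The one-to-many relationship between a macrolanguage code and its individual language codes can be pictured as a simple lookup. The Python sketch below uses only the codes named in this section, so the member lists are incomplete (Arabic in particular has many more members), and the names MACROLANGUAGE_MEMBERS and macrolanguage_of are hypothetical.

```python
# Incomplete, illustrative mapping from macrolanguage code to member codes.
MACROLANGUAGE_MEMBERS = {
    "ara": ["arb"],         # Arabic: Standard Arabic (other members omitted)
    "nor": ["nno", "nob"],  # Norwegian: Nynorsk, Bokmål
}


def macrolanguage_of(code: str):
    """Return the macrolanguage containing code, or None if none is known."""
    for macro, members in MACROLANGUAGE_MEMBERS.items():
        if code in members:
            return macro
    return None
```

A record tagged arb under ISO 639-3 could then be matched against a catalogue that only knows the ISO 639-2 code ara.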

Karakalpak language

Karakalpak is a Turkic language spoken by Karakalpaks in Karakalpakstan. It is divided into two dialects, Northeastern Karakalpak and Southeastern Karakalpak. It developed alongside the neighboring Kazakh and Uzbek languages and was markedly influenced by both. Typologically, Karakalpak belongs to the Kipchak branch of the Turkic languages and is thus closely related to, and partially mutually intelligible with, Kazakh.

Kumyk language

Kumyk (ҡумуҡ тил, qumuq til) is a Turkic language, spoken by about 426,212 speakers — the Kumyks — in the Dagestan, North Ossetia, and Chechen republics of the Russian Federation.

List of ISO 639-1 codes

ISO 639 is a standardized nomenclature used to classify languages. Each language is assigned a two-letter (639-1) and three-letter (639-2 and 639-3) lowercase abbreviation, amended in later versions of the nomenclature.

This table lists all of:

  • ISO 639-1: two-letter codes, one per language for ISO 639 macrolanguage

And some of:

  • ISO 639-2/T: three-letter codes, for the same languages as 639-1
  • ISO 639-2/B: three-letter codes, mostly the same as 639-2/T, but with some codes derived from English names rather than native names of languages (in the following table, these differing codes are highlighted in boldface)
  • ISO 639-3: three-letter codes, the same as 639-2/T for languages, but with distinct codes for each variety of an ISO 639 macrolanguage

Note: Colors on the leftmost column represent the language family mentioned in the second column.

List of ISO 639-2 codes

ISO 639 is a set of international standards that lists short codes for language names. The following is a complete list of three-letter codes defined in part two (ISO 639-2) of the standard, including the corresponding two-letter (ISO 639-1) codes where they exist.

Where two ISO 639-2 codes are given in the table, the one with the asterisk is the bibliographic code (B code) and the other is the terminological code (T code).

Entries in the Scope and Type columns distinguish:

  • ancient languages (extinct since ancient times);
  • collections of languages (which are connected, for example genetically or by region);
  • constructed languages;
  • languages extinct in recent times;
  • historical languages (distinct from their modern form);
  • macrolanguages.

The standard includes some codes for special situations:

  • mis, for "uncoded languages";
  • mul, for "multiple languages";
  • qaa-qtz, a range reserved for local use;
  • und, for "undetermined";
  • zxx, for "no linguistic content; not applicable".

*Synonyms for terminology applications (ISO 639-2/T) and for *bibliographic applications (ISO 639-2/B)

Luba-Katanga language

Luba-Katanga, also known as Luba-Shaba and Kiluba, is one of the two major Bantu languages spoken in the Democratic Republic of the Congo called "Luba". (See Luba-Kasai.) It is spoken mostly in the south-east area of the country by the Luba people.

Kiluba is spoken in the area around Kabongo, Kamina, Luena, Lubudi, Malemba Nkulu, Mulongo, and Kaniama, mostly in Katanga. Some 500 years ago or more, the Luba Kasai left Katanga and settled in the Kasai; since then, Luba Kasai (Chiluba) has evolved until it is no longer mutually intelligible with Luba Katanga.

Nzema language

Nzema (Nzima), also known as Appolo, is a Central Tano language spoken by the Nzema people of southwestern Ghana and southeast Ivory Coast. It shares 60% intelligibility with Jwira-Pepesa and is close to Baoule.

Ramidava

Ramidava (Ancient Greek: Ραμίδαυα) was a Dacian town.

Soninke language

The Soninke language (Soninke: Sooninkanxanne) is a Mande language spoken by the Soninke people of Africa. The language has an estimated 1,096,795 speakers, primarily located in Mali, and also (in order of numerical importance of the communities) in Senegal, Ivory Coast, The Gambia, Mauritania, Guinea-Bissau, Guinea and Ghana. It enjoys the status of a national language in Mali, Senegal, The Gambia and Mauritania.

The language is relatively homogeneous, with only slight phonological, lexical, and grammatical variations.

Linguistically, its nearest relative is the Bozo language, which is centered on the Inner Niger Delta.

It is possible that the language of the Imraguen people and the Nemadi dialect are dialects of Soninke.

This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.