ISO 639-3

ISO 639-3:2007, Codes for the representation of names of languages – Part 3: Alpha-3 code for comprehensive coverage of languages, is an international standard for language codes in the ISO 639 series. It defines three-letter codes for identifying languages. The standard was published by ISO on 1 February 2007.[1]

ISO 639-3 extends the ISO 639-2 alpha-3 codes with an aim to cover all known natural languages. The extended language coverage was based primarily on the language codes used in the Ethnologue (volumes 10-14) published by SIL International, which is now the registration authority for ISO 639-3.[2] It provides an enumeration of languages as complete as possible, including living and extinct, ancient and constructed, major and minor, written and unwritten.[1] However, it does not include reconstructed languages such as Proto-Indo-European.[3]

ISO 639-3 is intended for use as metadata codes in a wide range of applications. It is widely used in computer and information systems, such as the Internet, in which many languages need to be supported. In archives and other information storage, the codes are used in cataloging systems to indicate what language a resource is in or about. The codes are also frequently used in the linguistic literature and elsewhere to compensate for the fact that language names may be obscure or ambiguous.

Language codes

ISO 639-3 includes all languages in ISO 639-1 and all individual languages in ISO 639-2. ISO 639-1 and ISO 639-2 focused on major languages, most frequently represented in the total body of the world's literature. Since ISO 639-2 also includes language collections and Part 3 does not, ISO 639-3 is not a superset of ISO 639-2. Where B and T codes exist in ISO 639-2, ISO 639-3 uses the T-codes.

Examples:

language    639-1   639-2 (B/T)     639-3 type    639-3 code
English     en      eng             individual    eng
German      de      ger/deu         individual    deu
Arabic      ar      ara             macro         ara
                                    individual    arb + others
Chinese     zh      chi/zho[4][5]   macro         zho
Mandarin                            individual    cmn
Cantonese                           individual    yue
Minnan                              individual    nan

As of 8 April 2019, the standard contains 7,863 entries.[6] The inventory of languages is based on a number of sources including: the individual languages contained in 639-2, modern languages from the Ethnologue, historic varieties, ancient languages and artificial languages from the Linguist List,[7] as well as languages recommended within the annual public commenting period.

Machine-readable data files are provided by the registration authority.[6] Mappings from ISO 639-1 or ISO 639-2 to ISO 639-3 can be done using these data files.
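As a minimal sketch of such a mapping, the following uses a few rows copied from the examples in this article rather than the registration authority's actual data files. The row layout and the names `ROWS`, `build_index` and `to_639_3` are assumptions made for illustration. Each ISO 639-1 code and each ISO 639-2 code (B or T) is resolved to the ISO 639-3 identifier, which follows the T code where B and T differ.

```python
# Illustrative rows: (639-1, 639-2/B, 639-2/T, 639-3), copied from the
# examples in this article. A real application would load the registration
# authority's data files instead of this hand-written list.
ROWS = [
    ("en", "eng", "eng", "eng"),
    ("de", "ger", "deu", "deu"),
    ("ar", "ara", "ara", "ara"),
    ("zh", "chi", "zho", "zho"),
]


def build_index(rows):
    """Map every 639-1, 639-2/B and 639-2/T code to its 639-3 identifier."""
    index = {}
    for part1, part2b, part2t, part3 in rows:
        for code in (part1, part2b, part2t):
            index[code] = part3
    return index


INDEX = build_index(ROWS)


def to_639_3(code):
    """Return the 639-3 identifier for a 639-1 or 639-2 code, or None."""
    return INDEX.get(code.lower())


print(to_639_3("de"))   # deu
print(to_639_3("chi"))  # zho
```

Returning `None` for unknown codes, rather than raising, is a design choice of this sketch; a caller might instead fall back to one of the special codes.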

ISO 639-3 is intended to assume distinctions based on criteria that are not entirely subjective.[8] It is not intended to document or provide identifiers for dialects or other sub-language variations.[9] Nevertheless, judgments regarding distinctions between languages may be subjective, particularly in the case of oral language varieties without established literary traditions, usage in education or media, or other factors that contribute to language conventionalization.

Code space

Since the code is three-letter alphabetic, one upper bound for the number of languages that can be represented is 26 × 26 × 26 = 17,576. Since ISO 639-2 defines special codes (4), a reserved range (520) and B-only codes (22), 546 codes cannot be used in part 3. Therefore, a stricter upper bound is 17,576 − 546 = 17,030.

The upper bound gets even stricter if one subtracts the language collections defined in 639-2 and the ones yet to be defined in ISO 639-5.
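The arithmetic behind the two bounds can be checked with a short script. The variable names are illustrative; the counts are the ones given in this section.

```python
import string

# Every three-letter combination of the 26 lowercase letters.
total = len(string.ascii_lowercase) ** 3

# Codes that ISO 639-3 cannot use, with the counts stated in this section.
special = 4      # special codes
reserved = 520   # reserved range
b_only = 22      # ISO 639-2 B-only codes
unavailable = special + reserved + b_only

upper_bound = total - unavailable
print(total, unavailable, upper_bound)  # 17576 546 17030
```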

Macrolanguages

There are 58 languages in ISO 639-2 which are considered, for the purposes of the standard, to be "macrolanguages" in ISO 639-3.[10]

Some of these macrolanguages had no individual language as defined by ISO 639-3 in the code set of ISO 639-2, e.g. 'ara' (Generic Arabic). Others like 'nor' (Norwegian) had their two individual parts ('nno' (Nynorsk), 'nob' (Bokmål)) already in ISO 639-2.

That means some languages (e.g. 'arb', Standard Arabic) that were considered by ISO 639-2 to be dialects of one language ('ara') are now in ISO 639-3 in certain contexts considered to be individual languages themselves.

This is an attempt to deal with varieties that may be linguistically distinct from each other, but are treated by their speakers as two forms of the same language, e.g. in cases of diglossia.

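For example, the ISO 639-2 code 'ara' (Generic Arabic) is a macrolanguage in ISO 639-3, and 'arb' (Standard Arabic) is one of the individual languages it contains.

See [11] for the complete list.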
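The one-to-many relationship between a macrolanguage and its individual languages can be sketched as a simple lookup table. The mapping below is deliberately partial and contains only codes mentioned in this article; the names `MACRO_TO_INDIVIDUAL`, `macrolanguage_of` and `is_macrolanguage` are hypothetical and not part of any standard API.

```python
# Partial mapping for illustration only, built from codes named in this
# article. The registration authority publishes the complete mappings.
MACRO_TO_INDIVIDUAL = {
    "ara": ["arb"],                # Standard Arabic; other members omitted
    "zho": ["cmn", "yue", "nan"],  # Mandarin, Cantonese, Minnan
    "nor": ["nno", "nob"],         # Nynorsk, Bokmål
}

# Reverse index: individual language code -> macrolanguage code.
INDIVIDUAL_TO_MACRO = {
    member: macro
    for macro, members in MACRO_TO_INDIVIDUAL.items()
    for member in members
}


def is_macrolanguage(code):
    """True if the code is a macrolanguage in this partial table."""
    return code in MACRO_TO_INDIVIDUAL


def macrolanguage_of(code):
    """Return the macrolanguage containing an individual language, or None."""
    return INDIVIDUAL_TO_MACRO.get(code)


print(macrolanguage_of("cmn"))   # zho
print(is_macrolanguage("nor"))   # True
```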

Collective languages

"A collective language code element is an identifier that represents a group of individual languages that are not deemed to be one language in any usage context."[12] These codes do not precisely represent a particular language or macrolanguage.

While ISO 639-2 includes three-letter identifiers for collective languages, these codes are excluded from ISO 639-3. Hence ISO 639-3 is not a superset of ISO 639-2.

ISO 639-5 defines 3-letter collective codes for language families and groups, including the collective language codes from ISO 639-2.

Special codes

Four codes are set aside in ISO 639-2 and ISO 639-3 for cases where none of the specific codes are appropriate. These are intended primarily for applications like databases where an ISO code is required regardless of whether one exists.

mis Uncoded languages
mul Multiple languages
und Undetermined
zxx No linguistic content / Not applicable
  • mis (originally an abbreviation for 'miscellaneous') is intended for languages which have not (yet) been included in the ISO standard.
  • mul is intended for cases where the data includes more than one language, and (for example) the database requires a single ISO code.
  • und is intended for cases where the language in the data has not been identified, such as when it is mislabeled or has never been labeled. It is not intended for cases such as Trojan where an unattested language has been given a name.
  • zxx is intended for data which is not a language at all, such as animal calls.[13]

In addition, 520 codes in the range qaa–qtz are 'reserved for local use'. For example, the Linguist List uses them for extinct languages. Linguist List has assigned one of them a generic value:

qnp unnamed proto-language (Linguist List only)

This is used for proposed intermediate nodes in a family tree that have no name.
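An application that validates language codes may need to tell the four special codes and the local-use range apart from ordinary identifiers. The following is a minimal sketch under that assumption; the function names and the category labels returned by `classify` are hypothetical. It also confirms that the range from qaa to qtz contains 520 codes.

```python
from itertools import product
from string import ascii_lowercase

# The four special codes described in this section.
SPECIAL_CODES = {
    "mis": "Uncoded languages",
    "mul": "Multiple languages",
    "und": "Undetermined",
    "zxx": "No linguistic content / Not applicable",
}


def is_local_use(code):
    """True if code is a three-letter lowercase code between qaa and qtz."""
    return (
        len(code) == 3
        and code.isascii()
        and code.isalpha()
        and code.islower()
        and "qaa" <= code <= "qtz"
    )


def classify(code):
    """Label a code as 'special', 'local use' or 'regular' (labels are ours)."""
    if code in SPECIAL_CODES:
        return "special"
    if is_local_use(code):
        return "local use"
    return "regular"


# Count the local-use codes by enumerating every three-letter combination.
local_use_count = sum(
    is_local_use("".join(letters))
    for letters in product(ascii_lowercase, repeat=3)
)
print(local_use_count)   # 520
print(classify("qnp"))   # local use
```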

Maintenance processes

The code table for ISO 639-3 is open to changes. In order to protect stability of existing usage, the changes permitted are limited to:[14]

  • modifications to the reference information for an entry (including names or categorizations for type and scope),
  • addition of new entries,
  • deprecation of entries that are duplicates or spurious,
  • merging one or more entries into another entry, and
  • splitting an existing language entry into multiple new language entries.

The code assigned to a language is not changed unless there is also a change in denotation.[15]

Changes are made on an annual cycle. Every request is given a minimum period of three months for public review.

The ISO 639-3 Web site has pages that describe "scopes of denotation"[16] (languoid types) and types of languages,[17] which explain what concepts are in scope for encoding and certain criteria that need to be met. For example, constructed languages can be encoded, but only if they are designed for human communication and have a body of literature, preventing requests for idiosyncratic inventions.

The registration authority documents on its Web site instructions made in the text of the ISO 639-3 standard regarding how the code tables are to be maintained.[18] It also documents the processes used for receiving and processing change requests.[19]

A change request form is provided, and there is a second form for collecting information about proposed additions. Any party can submit change requests. When submitted, requests are initially reviewed by the registration authority for completeness.

When a fully documented request is received, it is added to a published Change Request Index. Also, announcements are sent to the general LINGUIST discussion list at Linguist List and other lists the registration authority may consider relevant, inviting public review and input on the requested change. Any list owner or individual is able to request notifications of change requests for particular regions or language families. Comments that are received are published for other parties to review. Based on consensus in comments received, a change request may be withdrawn or promoted to "candidate status".

Three months prior to the end of an annual review cycle (typically in September), an announcement is sent to the LINGUIST discussion list and other lists regarding Candidate Status Change Requests. All requests remain open for review and comment through the end of the annual review cycle.

Decisions are announced at the end of the annual review cycle (typically in January). At that time, requests may be adopted in whole or in part, amended and carried forward into the next review cycle, or rejected. Rejections often include suggestions on how to modify proposals for resubmission. A public archive of every change request is maintained along with the decisions taken and the rationale for the decisions.[20]

Criticism

Linguists Morey, Post and Friedman raise various criticisms of ISO 639, and in particular ISO 639-3:[15]

  • The three-letter codes themselves are problematic, because while officially arbitrary technical labels, they are often derived from mnemonic abbreviations for language names, some of which are pejorative. For example, Yemsa was assigned the code [jnj], from pejorative "Janejero". These codes may thus be considered offensive by native speakers, but codes in the standard, once assigned, cannot be changed.
  • The administration of the standard is problematic because SIL is a missionary organization with inadequate transparency and accountability. Decisions as to what deserves to be encoded as a language are made internally. While outside input may or may not be welcomed, the decisions themselves are opaque, and many linguists have given up trying to improve the standard.
  • Permanent identification of a language is incompatible with language change.
  • Languages and dialects often cannot be rigorously distinguished, and dialect continua may be subdivided in many ways, whereas the standard privileges one choice. Such distinctions are often based instead on social and political factors.
  • ISO 639-3 may be misunderstood and misused by authorities that make decisions about people's identity and language, abrogating the right of speakers to identify or identify with their speech variety. Though SIL is sensitive to such issues, this problem is inherent in the nature of an established standard, which may be used (or mis-used) in ways that ISO and SIL do not intend.

Martin Haspelmath agrees with four of these points, but not the point about language change.[21] He disagrees because any account of a language requires identifying it, and we can easily identify different stages of a language. He suggests that linguists may prefer to use a codification that is made at the languoid level since "it rarely matters to linguists whether what they are talking about is a language, a dialect or a close-knit family of languages." He also questions whether an ISO standard for language identification is appropriate since ISO is an industrial organization, while he views language documentation and nomenclature as a scientific endeavor. He cites the original need for standardized language identifiers as having been "the economic significance of translation and software localization," for which purposes the ISO 639-1 and 639-2 standards were established. But he raises doubts about industry need for the comprehensive coverage provided by ISO 639-3, including as it does "little-known languages of small communities that are never or hardly used in writing and that are often in danger of extinction".

References

  1. "ISO 639-3 status and abstract". iso.org. 2010-07-20. Retrieved 2012-06-14.
  2. "Maintenance agencies and registration authorities". ISO.
  3. "Types of individual languages – Ancient languages". sil.org. Retrieved 2018-06-11.
  4. Ethnologue report for ISO 639 code: zho on ethnologue.com
  5. ISO639-3 on SIL.org
  6. "ISO 639-3 Code Set". Sil.org. 2007-10-18. Retrieved 2012-06-14.
  7. "ISO 639-3". sil.org.
  8. "Scope of Denotation: Individual Languages". sil.org.
  9. "Scope of Denotation: Dialects". sil.org.
  10. "Scope of denotation: Macrolanguages". sil.org. Retrieved 2012-06-14.
  11. "Macrolanguage Mappings". sil.org. Retrieved 2012-06-14.
  12. "Scope of denotation: Collective languages". sil.org. Retrieved 2012-06-14.
  13. Field Recordings of Vervet Monkey Calls. Entry in the catalog of the Linguistic Data Consortium. Retrieved 2012-09-04.
  14. "Submitting ISO 639-3 Change Requests: Types of Changes". sil.org.
  15. Morey, Stephen; Post, Mark W.; Friedman, Victor A. (2013). The language codes of ISO 639: A premature, ultimately unobtainable, and possibly damaging standardization. PARADISEC RRR Conference.
  16. "Scope of Denotation for Language Identifiers". sil.org.
  17. "Types of Languages". sil.org.
  18. "ISO 639-3 Change Management". sil.org.
  19. "Submitting ISO 639-3 Change Requests". sil.org.
  20. "ISO 639-3 Change Request Index". sil.org.
  21. Martin Haspelmath, "Can language identity be standardized? On Morey et al.'s critique of ISO 639-3", Diversity Linguistics Comment, 2013/12/04
  22. "OLAC Language Extension". language-archives.org. Retrieved 3 August 2015.
  23. "Over 7,000 languages, just 1 Windows". Microsoft. 2014-02-05.
  24. "Language proposal policy". wikimedia.org. Retrieved 3 August 2015.
  25. "BCP 47 – Tags for Identifying Languages". ietf.org. Retrieved 3 August 2015.
  26. "EPUB Publications 3.0". idpf.org. Retrieved 3 August 2015.
  27. "DCMI Metadata Terms". purl.org. Retrieved 3 August 2015.
  28. "Two-letter or three-letter ISO language codes". w3.org. Retrieved 3 August 2015.
  29. "Language Registry". Iana.org. Retrieved 2015-08-12.
  30. "3 Semantics, structure, and APIs of HTML documents — HTML5". w3.org. Retrieved 3 August 2015.
  31. "Elements – MODS User Guidelines: Metadata Object Description Schema: MODS (Library of Congress)". loc.gov. Retrieved 3 August 2015.
  32. "TEI element language". tei-c.org. Retrieved 3 August 2015.

Australian Aboriginal English

Australian Aboriginal English (AAE) refers to a dialect of Australian English used by a large section of the Indigenous Australian population. It is made up of a number of varieties which developed differently in different parts of Australia. These varieties are generally said to fit along a continuum ranging from light forms, close to Standard Australian English, to heavy forms, closer to Kriol. There are generally distinctive features of accent, grammar, words and meanings, as well as language use. AAE is not to be confused with Kriol, which is a separate language from English spoken by over 30,000 people in Australia. Speakers have been noted to change between different forms of AAE depending on whom they are speaking to, e.g. striving to speak more like Australian English when speaking to a non-Indigenous English-speaking person. Several features of AAE are shared with creole languages spoken in nearby countries, such as Tok Pisin in Papua New Guinea, Pijin in the Solomon Islands, and Bislama in Vanuatu.

AAE terms, or derivative terms, are sometimes used by the broader Australian community. Australian Aboriginal English is spoken amongst indigenous people generally but is especially evident in what are called "discrete communities" i.e. ex-government or mission reserves such as the DOGIT communities in Queensland. Because most Indigenous Australians live in urban and rural areas with strong social interaction across assumed rural and urban and remote divides, many so-called "urban" people also use Aboriginal English.

Burgundian language (Oïl)

The Burgundian language, also known by French names Bourguignon-morvandiau, Bourguignon, and Morvandiau, is an Oïl language spoken in Burgundy and particularly in the Morvan area of the region.

The arrival of the Burgundians brought Germanic elements into the Gallo-Romance speech of the inhabitants. The occupation of the Low Countries by the Dukes of Burgundy also brought Burgundian into contact with Dutch; e.g., the word for gingerbread couque derives from Old Dutch kooke (cake).

Dialects of the south along the Saône river, such as Brionnais-Charolais, have been influenced by the Arpitan language, which is spoken mainly in a neighbouring area that approximates the heartland of the original Kingdom of Burgundy.

Eugène de Chambure published a Glossaire du Morvan in 1878.

Central Plains Mandarin

Central Plains Mandarin, or Zhongyuan Mandarin (simplified Chinese: 中原官话; traditional Chinese: 中原官話; pinyin: zhōngyuán guānhuà), is a variety of Mandarin Chinese spoken in the central and southern parts of Shaanxi, Henan, southwestern part of Shanxi, southern part of Gansu, far southern part of Hebei, northern Anhui, northern parts of Jiangsu, southern Xinjiang and southern Shandong. The archaic dialect in Peking opera is a form of Zhongyuan Mandarin.

Among Hui people, Zhongyuan Mandarin is sometimes written with the Arabic alphabet, called Xiao'erjing ("Children's script").

Ewondo Populaire

Ewondo Populaire is a Beti-based pidgin of Cameroon, spoken in the area of the capital Yaoundé.

ISO 639 macrolanguage

A macrolanguage is a book-keeping mechanism for the ISO 639 international standard for language codes. Macrolanguages are established to assist mapping between different sets of ISO language codes. Specifically, there may be a many-to-one correspondence between ISO 639-3, intended to identify all the thousands of languages of the world, and either of two other sets, ISO 639-1, established to identify languages in computer systems, and ISO 639-2, which encodes a few hundred languages for library cataloguing and bibliographic purposes. When such many-to-one ISO 639-2 codes are included in an ISO 639-3 context, they are called "macrolanguages" to distinguish them from the corresponding individual languages of ISO 639-3. According to the ISO,

Some existing code elements in ISO 639-2, and the corresponding code elements in ISO 639-1, are designated in those parts of ISO 639 as individual language code elements, yet are in a one-to-many relationship with individual language code elements in [ISO 639-3]. For purposes of [ISO 639-3], they are considered to be macrolanguage code elements.

ISO 639-3 is curated by SIL International; ISO 639-2 is curated by the Library of Congress (USA).

The mapping often has the implication that it covers borderline cases where two language varieties may be considered strongly divergent dialects of the same language or very closely related languages (dialect continuums); it may also encompass situations when there are language varieties that are considered to be varieties of the same language on the grounds of ethnic, cultural, and political considerations, rather than linguistic reasons. However, this is not its primary function and the classification is not evenly applied.

For example, Chinese is a macrolanguage encompassing many languages that are not mutually intelligible, but the languages "Standard German", "Bavarian German", and other closely related languages do not form a macrolanguage, despite being more mutually intelligible. Other examples include Tajiki not being part of the Persian macrolanguage despite sharing much lexicon, and Urdu and Hindi not forming a macrolanguage despite forming a mutually intelligible dialect continuum. Even all dialects of Hindi are considered separate languages. In short, ISO 639-2 and ISO 639-3 use different criteria for dividing language varieties into languages: 639-2 relies more on shared writing systems and literature, whereas 639-3 focuses on mutual intelligibility and shared lexicon. The macrolanguages exist within the ISO 639-3 code set to make mapping between the two sets easier.

As of 8 April 2019, there are fifty-eight language codes in ISO 639-2 that are considered to be macrolanguages in ISO 639-3. The use of this category of macrolanguage was applied in Ethnologue, starting in the 16th edition.

Some of the macrolanguages had no individual language (as defined by 639-3) in ISO 639-2, e.g. "ara" (Arabic), but ISO 639-3 recognizes different varieties of Arabic as separate languages under some circumstances. Others, like "nor" (Norwegian), had their two individual parts (nno Nynorsk, nob Bokmål) already in 639-2. That means some languages (e.g. "arb" Standard Arabic) that were considered by ISO 639-2 to be dialects of one language ("ara") are now in ISO 639-3 in certain contexts considered to be individual languages themselves. This is an attempt to deal with varieties that may be linguistically distinct from each other, but are treated by their speakers as forms of the same language, e.g. in cases of diglossia. For example,

ara Generic Arabic, 639-2
arb Standard Arabic, 639-3

ISO 639-2 also includes codes for collections of languages; these are not the same as macrolanguages. These collections of languages are excluded from ISO 639-3, because they never refer to individual languages. Most such codes are included in ISO 639-5.

Languedocien dialect

Languedocien (French name) or Lengadocian (native name) is an Occitan dialect spoken in rural parts of southern France such as Languedoc, Rouergue, Quercy, Agenais and Southern Périgord. Due to its central position among the dialects of Occitan, it is often used as a basis for a Standard Occitan. About 10% of the population of Languedoc are fluent in the language (about 300,000), and another 20% (600,000) "have some understanding" of the language. All speak French as their first or second language.

Limousin dialect

Limousin (Occitan: Lemosin) is a dialect of the Occitan language, spoken in the three departments of Limousin, parts of Charente and the Dordogne in the southwest of France.

The first Occitan documents are in an early form of this dialect, particularly the Boecis, written around the year 1000.

Limousin is used primarily by people over age 50 in rural communities. All speakers speak French as a first or second language. Due to the French single language policy, it is not recognised by the government and might be disappearing. A revivalist movement around the Félibrige and the Institut d'Estudis Occitans is active in Limousin (as well as in other parts of Occitania).

List of ISO 639-3 codes

These are lists of ISO 639-3 language codes.

Index | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z

Lop dialect

Lop, also known as Lopnor or Lopnur, is a language spoken in Xinjiang, China.

Malaysian Tamil

Malaysian Tamil (Tamil: Malēciya tamiḻ moḻi), also known as Malaya Tamil, is a local variant of the Tamil language spoken in Malaysia. It is one of the languages of education in Malaysia, along with English, Malay and Mandarin. There are many differences in vocabulary between Malaysian Tamil and Indian Tamil.

Mecklenburgisch-Vorpommersch dialect

Mecklenburgisch-Vorpommersch is a Low German dialect spoken in the German state of Mecklenburg-Vorpommern. It belongs to the East Low German group.

In the western parts of the language area it is similar to some Low Saxon dialects, while the eastern parts are influenced by the Central Pomeranian (Mittelpommersch) dialect. It differs slightly from East Pomeranian, which used to be spoken widely in what is now the Polish part of Farther Pomerania and included much more Slavic Pomeranian and Kashubian influence.

The name Pomerania comes from Slavic po more, which means Land at the Sea.

Medan Hokkien

Medan Hokkien is a local variant of Hokkien spoken among the Chinese in Medan, Indonesia. It is the lingua franca in Medan as well as other northern city states of North Sumatra surrounding it, and is a subdialect of Zhangzhou (漳州) dialect, together with widespread use of Indonesian and English borrowed words. It is predominantly a spoken dialect: it is rarely written in Chinese characters, as Indonesia banned the use of Chinese characters during the New Order era, and there is no standard romanisation.

Monégasque dialect

Monégasque (natively Munegascu) is a variety of Ligurian, a Gallo-Italic language spoken in Monaco as well as nearby in Italy and France.

Monégasque is officially taught in the schools of Monaco and spoken by a minority of residents and as a common second language by many native residents. In Monaco-Ville, street signs are printed in both French and Monégasque.

Old Xiang

Old Xiang, also known as Lou-Shao (娄邵片 / 婁邵片), is a conservative form of Xiang Chinese. It is spoken in the central areas of Hunan where it has been to some extent isolated from neighboring Chinese varieties, Mandarin and Gan, and it retains the voiced plosives of Middle Chinese, which are otherwise only preserved in Wu dialects like Shanghainese. See Shuangfeng dialect for details.

Ramree dialect

Ramree, or Yangbye ("Rambray" in Arakanese) (Burmese: ရမ်းဗြဲဘာသာစကား, Burmese pronunciation: [jáɴbjɛ́ bàðà zəɡá]), is the main dialect spoken in Southern Arakan, especially in the Ramree Island region of Arakan State in Burma (Myanmar), and the Awagyun Island and southern coastal regions in Bangladesh. The Ramree language is also widely spoken along the South Arakan coast, including the western areas of the Irrawaddy Division, Burma.

Taihu Wu

Taihu Wu (吳語太湖片) or Northern Wu dialects (北部吳語) are a group of Wu dialects spoken over much of the southern part of Jiangsu province, including Suzhou, Wuxi, Changzhou, the southern part of Nantong, Jingjiang and Danyang; the municipality of Shanghai; and the northern part of Zhejiang province, including Hangzhou, Shaoxing, Ningbo, Huzhou, and Jiaxing. A notable exception is the dialect of the town of Jinxiang, which is a linguistic exclave of Taihu Wu in Zhenan Min-speaking Cangnan county of Wenzhou prefecture in Zhejiang province. This group makes up the largest population among all Wu speakers. The subdialects of this region are mutually intelligible with each other.

Vaccarizzo Albanian

Vaccarizzo Albanian, or Calabria Arbëresh, is a subdialect of the Arbëresh dialect of the Albanian language. Spoken in the villages of Vaccarizzo Albanese and San Giorgio Albanese in southern Italy by approximately 3,000 people, Vaccarizzo Albanian has retained many archaic features of the Tosk dialect, on which Standard Albanian is based.

Xuanzhou Wu dialects

Xuanzhou Wu (宣州吳語) is a western branch of Wu Chinese spoken in and around Xuancheng, Anhui province. The dialect has declined since the Taiping Rebellion, with an influx of Mandarin-speaking immigrants from north of the Yangtze River.

Yi-Liu dialect

Yi-Liu, sometimes called Yichun dialect (simplified Chinese: 宜春话; traditional Chinese: 宜春話) after its principal variety, is a dialect of Gan Chinese. It is spoken in Yichun in Jiangxi province and in Liuyang in Hunan, after which it is named, as well as in Shanggao, Qingjiang, Xingan, Xinyu City, Fenyi, Pingxiang City, Fengcheng, Wanzai in Jiangxi and in Liling in Hunan.

This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.