ArmSCII or ARMSCII is a set of obsolete single-byte character encodings for the Armenian alphabet defined by Armenian national standard 166-9. ArmSCII is an acronym for Armenian Standard Code for Information Interchange, similar to ASCII for the American standard. It has been superseded by the Unicode standard.
These encodings never came into wide use, for two reasons. First, the standard was published one year after the international standard ISO 10585, which defined a different 7-bit encoding and from which the Armenian encoding in the UCS (the Universal Coded Character Set, that is, ISO/IEC 10646 and Unicode) was derived a few years later. Second, the computer industry showed little interest in adding support for ArmSCII.
Very few systems support these encodings. Microsoft Windows does not support them, for example. It is usually better to use Unicode for proper interchange of Armenian text for web browsers and email, since most modern computers do not support ArmSCII by default.
The following three main variants are defined: ArmSCII-7, ArmSCII-8 and ArmSCII-8A.
Each ArmSCII encoding also has several minor variants, depending on the revision of the related Armenian standard; these may change the exact mapping and usage of a few punctuation characters and symbols. The standard was not made official before 1997 and was defined only informally before that, which has caused confusion. The mappings described below are best practice according to the latest (1997) revision of the Armenian standard.
None of the ArmSCII encodings has gained international approval, unlike ISO 10585, and despite the criticisms that the official Armenian standards body sent to ISO/DIS JTC 1/SC 2/WG 2 (the group working on single-byte coded character sets), because all international efforts since then have gone into the UCS (Unicode and ISO 10646).
ArmSCII-8 is intended for use on Unix and Windows systems, and for information interchange on the WWW and by email. However, Microsoft wanted users to adopt Unicode rather than introduce a plethora of new code pages, so ArmSCII-8 is not supported natively on Windows. The encoding simply remaps ArmSCII-7 into the upper range, above the standard US-ASCII range.
ArmSCII-8A is intended for use on DOS and Mac systems. It rearranges ArmSCII-8 to work with existing DOS and Mac code that reserves a range of code values for characters intended for presentation layout rather than text, and it relies on modified fonts. It is, however, considered a "hack" of the code pages over which it is applied: neither DOS (nor Windows, in the "OEM" compatibility code pages used by the text-only console) nor Mac OS has ever supported this encoding natively, notably in the file system (the same is true of the now-deprecated ISO 10585 standard). This encoding also cannot map all the punctuation characters normally needed for Armenian, so the missing characters must be approximated by falling back to ASCII punctuation. Some Armenian fonts may display those ASCII punctuation characters with the glyphs of the Armenian characters they stand in for.
|2x||SP||֎||և / §||։||)||(||»||«||―||·||՝||,||‐||֊||…||՜|
In this table, code value 21 is the eternity sign, which since 2013 has had a designated code point in Unicode, U+058E (LEFT-FACING ARMENIAN ETERNITY SIGN), with another for its right-facing variant, U+058D (RIGHT-FACING ARMENIAN ETERNITY SIGN). Some mappings incorrectly assign it the code point U+0530.
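The claim about U+0530 can be checked against the Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is hypothetical, and the result depends on the Unicode version bundled with the interpreter (any recent Python 3 release should know both eternity signs).

```python
import unicodedata

# Code points quoted in the text above.
LEFT_FACING = "\u058e"
RIGHT_FACING = "\u058d"
CLAIMED_BY_SOME_MAPPINGS = "\u0530"

def describe(ch: str) -> str:
    """Return the Unicode character name, or 'unassigned' if there is none."""
    return unicodedata.name(ch, "unassigned")

for ch in (LEFT_FACING, RIGHT_FACING, CLAIMED_BY_SOME_MAPPINGS):
    print(f"U+{ord(ch):04X}: {describe(ch)}")
```

A converter that emits U+0530 for the eternity sign therefore produces a code point with no character assigned to it.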
Code value 20 is the regular SPACE character; code values 00–1F and 7F are not assigned to characters by AST 34.005, though they may be the same as the ASCII control characters that are located in those positions.
Code value 22 is used to encode the Armenian ligature ew (և). In some variants, it encodes the section sign (§) instead. Because software and fonts render this code differently depending on the version of ArmSCII-7 they assume, it is strongly suggested to encode the ligature as the pair of ordinary Armenian small letters ech (yech) and yiwn (vyun), and to let the renderer generate the ligature.
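On the Unicode side, the same recommendation amounts to replacing U+0587 with the two letters U+0565 and U+0582 (the code points listed in the comparison table further down). A minimal sketch in Python follows; the function name `expand_ew` is hypothetical.

```python
import unicodedata

EW_LIGATURE = "\u0587"  # ligature ech yiwn (ew)
ECH = "\u0565"          # small letter ech
YIWN = "\u0582"         # small letter yiwn

def expand_ew(text: str) -> str:
    """Replace every ew ligature with the ech + yiwn letter pair."""
    return text.replace(EW_LIGATURE, ECH + YIWN)

# Unicode compatibility decomposition yields the same pair.
decomposed = unicodedata.normalize("NFKD", EW_LIGATURE)
```

Compatibility normalization (NFKD or NFKC) performs this expansion too, but it also rewrites many unrelated characters, so the targeted replacement is the narrower tool.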
Code value 7F may be used sometimes as a substitution for the non-breaking space.
Note that the characters encoded at code values 2D and 7E (Armenian hyphen and apostrophe) may not be visible with all fonts supporting Armenian.
ArmSCII-8 (below) is this same table shifted to higher code values by a fixed offset.
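Comparing the two tables (21 becomes A1, 7E becomes FE) suggests an offset of hex 80. The sketch below assumes that offset applies uniformly to code values 21 through 7E and that the space and control positions are left alone; the function name is hypothetical.

```python
OFFSET = 0x80            # inferred from 21 -> A1 and 7E -> FE in the tables
FIRST, LAST = 0x21, 0x7E  # ArmSCII-7 graphic range assumed to be shifted

def armscii7_to_armscii8(data: bytes) -> bytes:
    """Shift ArmSCII-7 graphic codes into the upper half used by ArmSCII-8."""
    return bytes(b + OFFSET if FIRST <= b <= LAST else b for b in data)

sample = bytes([0x20, 0x21, 0x32, 0x33, 0x7E])
converted = armscii7_to_armscii8(sample)
```

The reverse direction is lossy in general, because an ArmSCII-8 document may also contain ordinary ASCII text in the lower half.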
|֎||և / §||։||)||(||»||«||―||·||՝||,||‐||֊||…||՜|
In this table, code value 20 is reserved for the regular SPACE character, code value A0 is reserved for the non-breaking space, and code value A1 is assigned to the eternity sign, which since 2013 has had a designated code point in Unicode, U+058E (LEFT-FACING ARMENIAN ETERNITY SIGN), with another for its right-facing variant, U+058D (RIGHT-FACING ARMENIAN ETERNITY SIGN). Some mappings incorrectly assign it the code point U+0530.
Code values 00–1F and 7F–9F are not assigned to characters by AST 34.002, though they may be the same as the ISO-8859-1 control characters that are located in those positions.
The code value A2 is used to encode the Armenian ligature ew (և). In some variants it encodes the section sign (§) instead. Some Armenian fonts display this ligature at the position of the ASCII ampersand symbol, but it is strongly suggested to encode the ligature using the two standard Armenian small letters that compose it.
Code value FF may be assigned to the Armenian small-letter modifier apostrophe. That character has no mapping in Unicode and is shown here as the ASCII apostrophe so that it renders correctly with Unicode fonts. It is suggested that the small-letter modifier instead be represented by code value FE, with ligature control to change its position: the small-letter modifier occurs only after a small Armenian letter, whereas the Armenian apostrophe encoded at FE occurs only after a capital Armenian letter. Most implementations therefore leave code value FF unassigned.
This standard is the only one that makes an apparent distinction for the "mirrored" Armenian parentheses, because it was created by simply remapping the ArmSCII-7 standard. Many documents do not treat this as a useful distinction, however, and the usual ASCII parentheses are most commonly used instead of the ArmSCII-7-based mirrored parentheses, simply because Armenian keyboards and editors using ArmSCII-8 generated the lower ASCII codes (whose usage is just swapped in classical Armenian). The duplication of the ASCII comma at code value AB is likewise a result of the simple remapping of ArmSCII-7, so it does not differ from the ASCII comma that most ArmSCII-8 documents use.
Note that the characters encoded at code values AD and FE (Armenian hyphen and apostrophe) may not be visible with all fonts supporting Armenian.
In this table, code value 20 is the regular SPACE character, and code value DC is the eternity sign, which since 2013 has had a designated code point in Unicode, U+058E (LEFT-FACING ARMENIAN ETERNITY SIGN), with another for its right-facing variant, U+058D (RIGHT-FACING ARMENIAN ETERNITY SIGN). Some mappings incorrectly assign it the code point U+0530.
Code values 00–1F, 7F, and B0–DB are not assigned to characters by AST 34.002, though they may be the same as those used in a legacy DOS/OEM codepage 437 (box drawing characters) or Macintosh Roman.
Note that the characters encoded at code values DD and FE (Armenian hyphen and apostrophe) may not be visible with all fonts supporting Armenian.
For comparison, this is the 7-bit encoding in the international standard ISO/IEC 10585, which was used before the revision of the Armenian standard AST 34.002:1997 (ArmSCII-8).
In this standard (as well as in ISO/IEC 10646 and Unicode), only one Armenian apostrophe modifier letter is encoded, at 0x49, whereas Armenian uses two cased modifier letter apostrophes. U+055A represents the capital apostrophe but is not considered dual-cased in Unicode or in ISO 10585; the small-letter apostrophe is absent, and is generally represented by the ASCII apostrophe U+0027 in Unicode documents.
The left half-ring punctuation (a modifier letter) and the eternity symbol are also missing, and only one double quotation mark (U+2033) is encoded in code value 7A instead of double guillemets in the three ArmSCII variants.
However, this standard maps the Armenian full stop (whose glyph looks very close to the ASCII colon) in code value 4C and the Armenian abbreviation mark (that looks very similar to an angular grave accent) in code value 4F, that are both missing from all ArmSCII code charts.
Note that the characters encoded at code values 49 and 4A (Armenian apostrophe and hyphen) may not be visible with all fonts supporting Armenian.
Official Unicode Consortium code chart (PDF)
|Armenian subset of Alphabetic Presentation Forms|
|U+FB1x||ﬓ||ﬔ||ﬕ||ﬖ||ﬗ||(U+FB00–FB12, U+FB18–FB4F omitted)|
For comparison, these are the Unicode code charts for Armenian.
The Unicode encoding, in place since Unicode 1.1 (except for the Armenian hyphen U+058A, added in Unicode 3.0), was based on the earlier ISO 10585 7-bit international encoding standard rather than on ArmSCII, which lacked about a dozen characters present in ISO 10585. However, non-letters were reorganized by type, and some extensions were added for rare Armenian characters that were missing from all earlier 7-bit and 8-bit standards.
Capital letters are encoded in the first half of the block (terminated by modifier letters).
Lowercase letters are encoded in the second half of the block (terminated by Armenian punctuation signs).
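Because the two halves are laid out in parallel, case conversion within the block reduces to a fixed offset. The sketch below takes the Ayb code points from the comparison table (U+0531 and U+0561); the end of the capital range, U+0556, is an assumption not stated in the text, and the function name is hypothetical.

```python
CAPITAL_AYB = 0x0531
SMALL_AYB = 0x0561
CASE_OFFSET = SMALL_AYB - CAPITAL_AYB  # hex 30

# Assumed extent of the capital letters (38 letters starting at Ayb).
CAPITAL_LAST = 0x0556

def to_small(ch: str) -> str:
    """Map an Armenian capital letter to its small counterpart by offset."""
    cp = ord(ch)
    if CAPITAL_AYB <= cp <= CAPITAL_LAST:
        return chr(cp + CASE_OFFSET)
    return ch
```

In practice `str.lower()` already does this; the sketch only makes the block layout visible.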
Unlike the ArmSCII encodings, this encoding is stable and portable across systems, and contains all the characters needed for Armenian (the eternity sign was the one exception until 2013). Some Unicode-encoded fonts for Armenian map the eternity sign to code point U+0530. This is incorrect: since 2013 the sign has been allocated U+058E, with U+058D for its right-facing variant.
However, no distinction is kept for the Armenian (mirrored) parentheses, so the standard ASCII/Unicode punctuation must be used with its usual rendering. The left half-ring mark (a modifier letter) is encoded here, and some other marks are unified with those of other scripts (notably the quotation marks, middle dot and dashes).
Note that the characters encoded at code points U+055A and U+058A (Armenian apostrophe and hyphen, as in the charts for ArmSCII and ISO 10585), as well as U+0559 (the modifier mark for numerics, added specifically to ISO 10646-1 and Unicode), may not be visible with all fonts supporting Armenian.
Note that some transcodings below are shown in parentheses. These are only approximate fallbacks and do not map exactly to the intended character.
|Subset||Character||Armenian description or usage||Short name||Encodings||Notes|
|ArmSCII-7||ArmSCII-8||ArmSCII-8A||ISO 10585||Unicode ISO/IEC 10646|
|General purpose||space||space||20||20||20||20||0020||same as ASCII and Unicode|
|non-breaking space||nbsp||(20)||A0||FF||(20)||00A0||missing in ArmSCII-7 and ISO 10585|
|Armenian symbols||֎||eternity sign||armeternity||21||A1||DC||—||058E||right-facing variant at U+058D|
|և||ligature ech yiwn (ew)||armew||(3B,75)||(26) (or BB,F5)||(26) (or 89,F5)||(55,72)||0587 (or 0565,0582)||specific to Armenian : compatibility ligature of Armenian ech (yech) and yiwn (vyun) small letters, used as a symbol (similar to ampersand symbol in ASCII)|
|§||section sign||armsection||22||A2||—||—||00A7||from ISO 8859; missing in all ArmSCII variants|
|Armenian punctuation||։||full stop (vertsaket)||armfullstop||23||A3||(3A)||4C||0589||specific to Armenian : looks mostly like ASCII colon, but distinct usage ; missing in ArmSCII-8A (approximated by ASCII colon)|
|)||right parenthesis||armparenright||24||A4||29||(79)||0029||from ASCII, name and usage different and Unicode ; missing in ISO 10585 (suggested substitution uses dashes)|
|(||left parenthesis||armparenleft||25||A5||28||(79)||0028||from ASCII, name and usage different and Unicode ; missing in ISO 10585 (suggested substitution uses dashes)|
|»||right quotation mark||armquotright||26||A6||AF||(7A)||00BB||from ISO-8859, name and usage different and Unicode|
|«||left quotation mark||armquotleft||27||A7||AE||(7A)||00AB||from ISO-8859, name and usage different and Unicode|
|″||quotation mark||—||—||(22)||(22)||7A||2033||used for either left or right quotation mark in ISO 10585; missing in ArmSCII-8/8A (approximated by ASCII double quotation mark)|
|―||em-dash||armemdash||28||A8||(5F)||78||2015||from ISO-8859; missing in ArmSCII-8A (approximated by ASCII underscore)|
|.||middle dot (mijaket)||armdot||29||A9||(2E)||7C||2024||sometimes similar to ASCII full stop, but usage different in Armenian where the middle dot is preferred; missing in ArmSCII-8A (approximated by ASCII full stop)|
|՝||separation mark (but)||armsep||2A||AA||(60)||48||055D||usage specific to Armenian : used as a comma ; = bowt ; missing in ArmSCII-8A (approximated by ASCII backquote)|
|,||comma||armcomma||2B||AB||2C||4D||002C||same as ASCII and Unicode comma|
|‐||dash||armendash||2C||AC||(2D)||79||2010||similar to the short variant of the ASCII and Unicode minus-hyphen (shorter than the general purpose minus sign used in ASCII) ; missing in ArmSCII-8A (approximated by ASCII minus-hyphen)|
|Armenian modifier letters||֊||hyphen (yentamna)||armyentamna||2D||AD||DD||4A||058A||specific to Armenian : a modifier letter that modifies another Armenian normal letter (possibly with combining punctuation between them)|
|…||ellipsis||armellipsis||2E||AE||DE||(7C,7C,7C)||2026||from ISO-8859, but not a punctuation : a modifier letter that follows and modifies another normal Armenian letter (possibly with combining punctuation between them)|
|ՙ||numeric mark (left half-ring)||armnum||—||—||—||—||0559||specific to Armenian : a modifier letter that modifies another Armenian normal letter (possibly with combining punctuation between them) ; missing in all ArmSCII variants|
|՚||apostrophe (right half-ring)||armapostrophe||7E||FE||FE||49||055A||specific to Armenian : a modifier letter that modifies another Armenian normal letter (possibly with combining punctuation between them)|
|Armenian combining punctuation||՜||exclamation mark (amanak)||armexclam||2F||AF||(7E)||7E||055C||specific to Armenian : these diacritics encode punctuation but may appear on top of a letter in the middle of any word (it may be ignored in searches); Unicode handles them as modifier letters. However, they are normally not spacing ; = batsaganchakan nshan ; missing in ArmSCII-8A (approximated by ASCII tilde symbol)|
|՛||emphasis mark (shesht)||armaccent||30||B0||(27)||7D||055B||specific to Armenian : these diacritics encode punctuation but may appear on top of a letter in the middle of any word (it may be ignored in searches); Unicode handles them as modifier letters. However, they are normally not spacing ; missing in ArmSCII-8A (approximated by ASCII single quote)|
|՞||question mark (paruyk)||armquestion||31||B1||DF||4E||055E||specific to Armenian : these diacritics encode punctuation but may appear on top of a letter in the middle of any word (it may be ignored in searches); Unicode handles them as modifier letters. However, they are normally not spacing ; = hartsakan nshan|
|՟||abbreviation mark (patiw)||armabbrev||—||—||—||4F||055F||specific to Armenian : these diacritics encode punctuation but may appear on top of a letter in the middle of any word (it may be ignored in searches); Unicode handles them as modifier letters. However, they are normally not spacing|
|Armenian capital letters||Ա||Ayb||Armayb||32||B2||80||21||0531|
|Armenian small letters||ա||ayb||armayb||33||B3||81||51||0561|
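The ArmSCII-8 and Unicode columns of the table can be turned directly into a lookup table. The following is a partial sketch in Python covering only the rows listed above; it is not a complete ArmSCII-8 decoder, since the remaining letters are not shown in this excerpt. Code value A2 is mapped to the ew ligature here, although some variants use the section sign (U+00A7) instead. Bytes below hex 80 are assumed to be plain ASCII, and the names `ARMSCII8_TO_UNICODE` and `decode_armscii8` are hypothetical.

```python
# ArmSCII-8 byte -> Unicode code point, from the rows of the table above.
ARMSCII8_TO_UNICODE = {
    0xA0: 0x00A0,  # non-breaking space
    0xA1: 0x058E,  # eternity sign
    0xA2: 0x0587,  # ew ligature (some variants: section sign U+00A7)
    0xA3: 0x0589,  # full stop (vertsaket)
    0xA4: 0x0029,  # right parenthesis
    0xA5: 0x0028,  # left parenthesis
    0xA6: 0x00BB,  # right quotation mark
    0xA7: 0x00AB,  # left quotation mark
    0xA8: 0x2015,  # em-dash
    0xA9: 0x2024,  # middle dot (mijaket)
    0xAA: 0x055D,  # separation mark (but)
    0xAB: 0x002C,  # comma
    0xAC: 0x2010,  # dash
    0xAD: 0x058A,  # hyphen (yentamna)
    0xAE: 0x2026,  # ellipsis
    0xAF: 0x055C,  # exclamation mark (amanak)
    0xB0: 0x055B,  # emphasis mark (shesht)
    0xB1: 0x055E,  # question mark (paruyk)
    0xB2: 0x0531,  # capital Ayb
    0xB3: 0x0561,  # small ayb
    0xFE: 0x055A,  # apostrophe (right half-ring)
}

def decode_armscii8(data: bytes) -> str:
    """Decode the subset above; unknown high bytes become U+FFFD."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # assumption: lower half is ASCII
        else:
            out.append(chr(ARMSCII8_TO_UNICODE.get(b, 0xFFFD)))
    return "".join(out)
```

For example, `decode_armscii8(bytes([0xB2, 0xB3, 0xA3]))` yields capital Ayb, small ayb and the Armenian full stop.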
The Armenian alphabet (Armenian: Հայոց գրեր, Hayots' grer or Հայոց այբուբեն, Hayots' aybuben; Eastern Armenian: [haˈjotsʰ ajbuˈbɛn]; Western Armenian: [haˈjotsʰ ajpʰuˈpʰɛn]) is an alphabetic writing system used to write Armenian. It was developed around 405 AD by Mesrop Mashtots, an Armenian linguist and ecclesiastical leader. The system originally had 36 letters; eventually, three more were adopted.
The Armenian word for "alphabet" is այբուբեն (aybuben), named after the first two letters of the Armenian alphabet: ⟨Ա⟩ Armenian: այբ ayb and ⟨Բ⟩ Armenian: բեն ben. Armenian is written horizontally, left-to-right.
Armenian eternity sign
The Armenian eternity sign (Armenian: հավերժության նշան, haverzhut’yan nshan) or Arevakhach (Արևախաչ, "Sun Cross") is an ancient Armenian national symbol and a symbol of the national identity of the Armenian people. It is one of the most common symbols in Armenian architecture, carved on khachkars and on walls of churches.
Character encoding
Character encoding is used to represent a repertoire of characters by some kind of encoding system. Depending on the abstraction level and context, corresponding code points and the resulting code space may be regarded as bit patterns, octets, natural numbers, electrical pulses, etc. A character encoding is used in computation, data storage, and transmission of textual data. "Character set", "character map", "codeset" and "code page" are related, but not identical, terms.
Early character codes associated with the optical or electrical telegraph could only represent a subset of the characters used in written languages, sometimes restricted to upper case letters, numerals and some punctuation only. The low cost of digital representation of data in modern computer systems allows more elaborate character codes (such as Unicode) which represent most of the characters used in many written languages. Character encoding using internationally accepted standards permits worldwide interchange of text in electronic form.
Charset detection
Character encoding detection, charset detection, or code page detection is the process of heuristically guessing the character encoding of a series of bytes that represent text. The technique is recognised to be unreliable and is only used when specific metadata, such as an HTTP Content-Type: header, is either unavailable or assumed to be untrustworthy.
This algorithm usually involves statistical analysis of byte patterns, like frequency distribution of trigraphs of various languages encoded in each code page that will be detected; such statistical analysis can also be used to perform language detection. This process is not foolproof because it depends on statistical data.
In general, incorrect charset detection leads to mojibake.
One of the few cases where charset detection works reliably is detecting UTF-8. This is due to the large percentage of invalid byte sequences in UTF-8, so that text in any other encoding that uses bytes with the high bit set is extremely unlikely to pass a UTF-8 validity test. However, badly written charset detection routines do not run the reliable UTF-8 test first, and may decide that UTF-8 is some other encoding. For example, it was common that web sites in UTF-8 containing the name of the German city München were shown as MÃ¼nchen.
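The "UTF-8 first" ordering described here can be sketched in a few lines of Python. The function names and the choice of ISO-8859-1 as the fallback are assumptions for illustration, not part of any particular detector.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def decode_utf8_first(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Run the strict UTF-8 test before falling back to a legacy encoding."""
    if looks_like_utf8(data):
        return data.decode("utf-8")
    return data.decode(fallback)

utf8_bytes = "München".encode("utf-8")
latin1_bytes = "München".encode("iso-8859-1")

# Skipping the UTF-8 test and assuming ISO-8859-1 garbles the city name.
garbled = utf8_bytes.decode("iso-8859-1")
```

Here the ISO-8859-1 bytes fail the UTF-8 test, as the text predicts, so both inputs decode to the same string once the test runs first.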
UTF-16 is fairly reliable to detect due to the high number of newlines (U+000A) and spaces (U+0020) that should be found when dividing the data into 16-bit words, and the fact that few encodings use 16-bit words. This process is not foolproof; for example, some versions of the Windows operating system would mis-detect the phrase "Bush hid the facts" (without a newline) in ASCII as Chinese UTF-16LE.
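The misreading is easy to reproduce: the phrase has an even number of bytes, and each pair of ASCII bytes forms a 16-bit value that lands in the CJK ideograph range. A minimal sketch in Python follows; the helper `all_cjk` and the range U+4E00 to U+9FFF it checks are illustrative choices.

```python
phrase = b"Bush hid the facts"  # 18 ASCII bytes, no newline

# Interpret each pair of bytes as one little-endian 16-bit code unit.
misread = phrase.decode("utf-16-le")

def all_cjk(text: str) -> bool:
    """True if every character lies in the CJK Unified Ideographs block."""
    return all(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)

print(len(misread), all_cjk(misread))
```

A heuristic that sees only plausible ideographs and no invalid sequences has little reason to reject the UTF-16LE guess.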
Charset detection is particularly unreliable in Europe, in an environment of mixed ISO-8859 encodings. These are closely related eight-bit encodings that share an overlap in their lower half with ASCII. There is no technical way to tell these encodings apart and recognising them relies on identifying language features, such as letter frequencies or spellings.
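The following sketch illustrates why: the same bytes decode without error under several ISO-8859 parts, and only knowledge of the language tells the readings apart. The list of candidate codecs and the sample bytes are arbitrary choices for illustration.

```python
CANDIDATES = ["iso8859_1", "iso8859_2", "iso8859_5", "iso8859_7", "iso8859_9"]

def decode_all(data: bytes) -> dict:
    """Decode the same bytes under every candidate encoding."""
    return {name: data.decode(name) for name in CANDIDATES}

results = decode_all(b"caf\xe9")
for name, text in results.items():
    print(name, text)
```

Every candidate accepts the input; the Latin parts give an accented e, while the Cyrillic and Greek parts give a letter from their own script.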
Due to the unreliability of heuristic detection, it is better to properly label datasets with the correct encoding. HTML documents served across the web by HTTP should have their encoding stated out-of-band using the Content-Type: header.
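An isolated HTML document, such as one being edited as a file on disk, may imply such a header by a meta tag within the file, such as <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">, or with a new meta type in HTML5, <meta charset="utf-8">.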
If the document is Unicode, then some UTF encodings explicitly label the document with an embedded initial byte order mark (BOM).
Code page 1287
Code page 1287, also known as CP1287, DEC Greek (8-bit) and EL8DEC, is one of the code pages implemented for the VT220 terminals. It supports the Greek language.
Code page 1288
Code page 1288, also known as CP1288, DEC Turkish (8-bit) and TR8DEC, is one of the code pages implemented for the VT220 terminals. It supports the Turkish language.
ISO/IEC 6937
ISO/IEC 6937:2001, Information technology — Coded graphic character set for text communication — Latin alphabet, is a multibyte extension of ASCII, or rather of ISO/IEC 646-IRV. It was developed in common with ITU-T (then CCITT) for telematic services under the name of T.51, and first became an ISO standard in 1983. Certain byte codes are used as lead bytes for letters with diacritics (accents). The value of the lead byte often indicates which diacritic the letter has, and the follow byte then has the ASCII value of the letter that carries the diacritic. Only certain combinations of lead byte and follow byte are allowed, and there are some exceptions to the lead byte interpretation for some follow bytes. No combining characters at all are encoded in ISO/IEC 6937, but some free-standing diacritics can be represented, often by giving the follow byte the code for ASCII space.
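The lead-byte scheme can be sketched as follows. The four lead-byte values below (C1 grave, C2 acute, C3 circumflex, C8 diaeresis) are assumptions supplied for illustration and should be checked against the standard. The sketch does not enforce the list of allowed combinations or the exceptions mentioned above, and it does not handle the free-standing diacritic case. It uses Unicode combining marks only as an internal step to reach the precomposed letter.

```python
import unicodedata

# Assumed lead bytes -> Unicode combining mark (illustrative subset only).
LEAD_BYTES = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
}

def decode_6937_sketch(data: bytes) -> str:
    """Decode ASCII plus lead-byte/follow-byte pairs; other bytes -> U+FFFD."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in LEAD_BYTES and i + 1 < len(data):
            base = chr(data[i + 1])  # follow byte holds the ASCII letter
            out.append(unicodedata.normalize("NFC", base + LEAD_BYTES[b]))
            i += 2
        else:
            out.append(chr(b) if b < 0x80 else "\ufffd")
            i += 1
    return "".join(out)
```

Note the order: the diacritic byte comes before the letter, the opposite of Unicode, where a combining mark follows its base character.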
ISO/IEC 6937's architects were Hugh McGregor Ross, Peter Fenwick, Bernard Marti and Loek Zeckendorf.
ISO6937/2 defines 327 characters found in modern European languages using the Latin alphabet. Non-Latin European characters, such as Cyrillic and Greek, are not included in the standard. Also, some diacritics used with the Latin alphabet like the Romanian comma are not included, using cedilla instead as no distinction between cedilla and comma below was made at the time.
IANA has registered the charset names ISO_6937-2-25 and ISO_6937-2-add for two (older) versions of this standard (plus control codes). But in practice this character encoding is unused on the Internet.
The ISO/IEC 2022 escape sequence to specify the right-hand side of the ISO/IEC 6937 character set is ESC - R (hex 1B 2D 52).
ISO/IEC 8859-11
ISO/IEC 8859-11:2001, Information technology — 8-bit single-byte coded graphic character sets — Part 11: Latin/Thai alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 2001. It is informally referred to as Latin/Thai. It is nearly identical to the national Thai standard TIS-620 (1990). The sole difference is that ISO/IEC 8859-11 allocates non-breaking space to code 0xA0, while TIS-620 leaves it undefined. (In practice, this small distinction is usually ignored.)
ISO-8859-11 is not a main registered IANA charset name despite following the normal pattern for IANA charsets based on the ISO 8859 series. However, it is defined as an alias of the close equivalent TIS-620 (which lacks the non-breaking space), which can be used for ISO/IEC 8859-11 without problems, since the no-break space has a code that was unallocated in TIS-620. Microsoft has assigned code page 28601 a.k.a. Windows-28601 to ISO-8859-11 in Windows. A draft had the Thai letters in different positions.
As with all varieties of ISO/IEC 8859, the lower 128 codes are equivalent to ASCII. The additional characters, apart from no-break space, are found in Unicode in the same order, only shifted from 0xA1 to U+0E01 and so forth.
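Because the Thai characters keep their order, the upper half can be decoded with a single addition. A minimal sketch in Python follows; the function name is hypothetical, and the sketch assumes every byte from 0xA1 upward is assigned, which a real decoder would need to check against the code chart.

```python
THAI_SHIFT = 0x0E01 - 0xA1  # constant distance between byte and code point

def decode_8859_11_sketch(data: bytes) -> str:
    """Decode by offset: ASCII and control codes as-is, 0xA0 as no-break
    space, everything above shifted into the Thai block."""
    out = []
    for b in data:
        if b >= 0xA1:
            out.append(chr(b + THAI_SHIFT))
        else:
            out.append(chr(b))  # 0x00-0xA0 coincide with U+0000-U+00A0
    return "".join(out)

sample = bytes([0x41, 0xA0, 0xA1, 0xE0, 0xFB])
```

For assigned bytes the result can be compared with Python's built-in `iso8859_11` codec, for example `sample.decode("iso8859_11")`.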
The Microsoft Windows code page 874 as well as the code page used in the Thai version of the Apple Macintosh, MacThai, are extensions of TIS-620, though incompatible with each other.
ISO/IEC 8859-12
ISO/IEC 8859-12 would have been part 12 of the ISO/IEC 8859 character encoding standard series.
ISO 8859-12 was originally proposed to support the Celtic languages. ISO 8859-12 was later slated for Latin/Devanagari, but this was abandoned in 1997, during the 12th meeting of ISO/IEC JTC 1/SC 2/WG 3 in Iraklion-Crete, Greece, 4 to 7 July 1997. The Celtic proposal was changed to ISO 8859-14.
ISO/IEC 8859-16
ISO/IEC 8859-16:2001, Information technology — 8-bit single-byte coded graphic character sets — Part 16: Latin alphabet No. 10, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 2001. It is informally referred to as Latin-10 or South-Eastern European. It was designed to cover Albanian, Croatian, Hungarian, Polish, Romanian, Serbian and Slovenian, but also French, German, Italian and Irish Gaelic (new orthography).
ISO-8859-16 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429.
Microsoft has assigned code page 28606 a.k.a. Windows-28606 to ISO-8859-16.
ISO/IEC 8859-3
ISO/IEC 8859-3:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 3: Latin alphabet No. 3, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1988. It is informally referred to as Latin-3 or South European. It was designed to cover Turkish, Maltese and Esperanto, though the introduction of ISO/IEC 8859-9 superseded it for Turkish. The encoding remains popular with users of Esperanto, though use is waning as application support for Unicode becomes more common.
ISO-8859-3 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. Microsoft has assigned code page 28593 a.k.a. Windows-28593 to ISO-8859-3 in Windows. IBM has assigned code page 913 to ISO 8859-3.
ISO/IEC 8859-8
ISO/IEC 8859-8, Information technology — 8-bit single-byte coded graphic character sets — Part 8: Latin/Hebrew alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings. ISO/IEC 8859-8:1999 is its second and current revision, preceded by the first edition, ISO/IEC 8859-8:1988. It is informally referred to as Latin/Hebrew. ISO/IEC 8859-8 covers all the Hebrew letters, but no Hebrew vowel signs. IBM assigned code page 916 to it.
ISO-8859-8 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The text is (usually) in logical order, so bidi processing is required for display. Nominally ISO-8859-8 (code page 28598) is for "visual order", and ISO-8859-8-I (code page 38598) is for logical order. In practice, however, and as required for HTML and XML documents, ISO-8859-8 also stands for logical-order text. There is also ISO-8859-8-E, which supposedly requires directionality to be explicitly specified with special control characters; this latter variant is in practice unused.
This character set was also adopted by Israeli Standard SI1311:2002. Over a decade after the publication of that standard, Unicode is preferred, at least for the Internet (meaning UTF-8, the dominant encoding for web pages). ISO-8859-8 is used by less than 0.1% of websites.
ISO/IEC 8859-9
ISO/IEC 8859-9:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 9: Latin alphabet No. 5, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1989. It is informally referred to as Latin-5 or Turkish. It was designed to cover the Turkish language and to be of more use than the ISO/IEC 8859-3 encoding. It is identical to ISO/IEC 8859-1 except for six replacements of Icelandic characters with characters unique to the Turkish alphabet: Ğ, İ, Ş, ğ, ı and ş take the places of Ð, Ý, Þ, ð, ý and þ.
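The six differing positions can be recovered by comparing the two encodings byte by byte. The sketch below relies on Python's built-in `iso8859_1` and `iso8859_9` codecs; the function name is hypothetical.

```python
def differing_bytes(enc_a: str = "iso8859_1", enc_b: str = "iso8859_9"):
    """List (byte, char under enc_a, char under enc_b) where the two differ."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        a, c = raw.decode(enc_a), raw.decode(enc_b)
        if a != c:
            diffs.append((b, a, c))
    return diffs

for byte, latin1_char, latin5_char in differing_bytes():
    print(f"{byte:02X}: {latin1_char} -> {latin5_char}")
```

The same comparison works for any pair of single-byte codecs that define all 256 positions; codecs with unassigned bytes would need error handling around `decode`.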
ISO-8859-9 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. In modern applications Unicode and UTF-8 are preferred; as of February 2016, 0.1% of all web pages used ISO-8859-9. Microsoft has assigned code page 28599 a.k.a. Windows-28599 to ISO-8859-9 in Windows. IBM has assigned code page 920 to ISO-8859-9.
Lotus International Character Set
The Lotus International Character Set (LICS) is a proprietary single-byte character encoding introduced in 1985 by Lotus Development Corporation. It is based on the 1983 DEC Multinational Character Set (MCS) for VT220 terminals. As such, LICS is also similar to two other descendants of MCS, the ECMA-94 character set of 1985 and the ISO 8859-1 (Latin-1) character set of 1987.
LICS was first introduced as the character set of Lotus 1-2-3 Release 2 for DOS in 1985. It is also used by Releases 2.01, 2.2, 2.3 and 2.4 as well as by Symphony, and by a number of third-party spreadsheet products emulating the file format. LICS was superseded by the Lotus Multi-Byte Character Set (LMBCS) introduced by Lotus 1-2-3 Release 3 in 1989.
Mojibake
Mojibake (文字化け; IPA: [mod͡ʑibake]) is the garbled text that is the result of text being decoded using an unintended character encoding. The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system.
This display may include the generic replacement character ("�") in places where the binary representation is considered invalid. A replacement can also involve multiple consecutive symbols, as viewed in one encoding, when the same binary code constitutes one symbol in the other encoding. This is either because of differing constant length encoding (as in Asian 16-bit encodings vs European 8-bit encodings), or the use of variable length encodings (notably UTF-8 and UTF-16).
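Both effects can be reproduced in a few lines. In the sketch below, a three-letter Armenian word stored as UTF-8 is read as ISO-8859-1, so each letter turns into two unrelated symbols; separately, bytes that are invalid in UTF-8 come out as the replacement character. The function name `mojibake` and the sample inputs are chosen for illustration.

```python
def mojibake(text: str, actual: str = "utf-8", assumed: str = "iso-8859-1") -> str:
    """Encode with the real encoding, then decode with the wrong one."""
    return text.encode(actual).decode(assumed)

word = "\u0570\u0561\u0575"  # three Armenian letters, two UTF-8 bytes each
garbled = mojibake(word)     # six Latin-1 symbols instead of three letters

# Undoing the wrong decoding recovers the text, because Latin-1 is lossless.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

# Bytes that are not valid UTF-8 become U+FFFD when decoding leniently.
replaced = b"\xb2\xb3".decode("utf-8", errors="replace")
```

The repair step works only when the wrongly assumed encoding maps every byte to a distinct character; once replacement characters have been substituted, the original bytes are gone.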
Failed rendering of glyphs due to either missing fonts or missing glyphs in a font is a different issue that is not to be confused with mojibake. Symptoms of this failed rendering include blocks with the code point displayed in hexadecimal or using the generic replacement character ("�"). Importantly, these replacements are valid and are the result of correct error handling by the software.
National Replacement Character Set
The National Replacement Character Set, or NRCS for short, was a feature supported by later models of Digital's (DEC) computer terminal systems, starting with the VT200 series in 1983. NRCS allowed individual characters from one character set to be replaced by characters from another set, allowing the construction of different character sets on the fly. It was used to customize the character set to different local languages without having to change the terminal's ROM for different countries or, alternatively, include many different sets in a larger ROM. Many third-party terminals and terminal emulators supporting VT200 codes also supported NRCS.
Xerox Character Code Standard
The Xerox Character Code Standard (XCCS) is a historical 16-bit character encoding that was created by Xerox in 1980 for the exchange of information between elements of the Xerox Network Systems Architecture. It encodes the characters required for languages using the Latin, Arabic, Hebrew, Greek and Cyrillic scripts, the Chinese, Japanese and Korean writing systems, and technical symbols. It can be viewed as an early precursor of, and inspiration for, Unicode. The International Character Set (ICS) is a character set compatible with XCCS. The XCCS 2.0 revision (1990) covers the Latin, Arabic, Hebrew, Gothic, Armenian, Runic, Georgian, Greek, Cyrillic, Hiragana, Katakana and Bopomofo scripts, as well as technical and mathematical symbols.