ISO-8859-8
ISO/IEC 8859-8, Information technology — 8-bit single-byte coded graphic character sets — Part 8: Latin/Hebrew alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings. ISO/IEC 8859-8:1999 is its second and current revision; it was preceded by the first edition, ISO/IEC 8859-8:1988. It is informally referred to as Latin/Hebrew. ISO/IEC 8859-8 covers all the Hebrew letters, but no Hebrew vowel signs. IBM assigned code page 916 to it.
ISO-8859-8 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The text is usually in logical order, so bidi processing is required for display. Nominally, ISO-8859-8 (code page 28598) is for “visual order” and ISO-8859-8-I (code page 38598) is for logical order. In practice, however, ISO-8859-8 usually also stands for logical-order text, and this interpretation is required for HTML and XML documents. There is also ISO-8859-8-E, which supposedly requires directionality to be specified explicitly with special control characters; this variant is unused in practice.
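The labels, code pages and ordering conventions above can be summarized in a small lookup. This is a sketch under stated assumptions: the function name `assumed_text_order` and its `in_markup` flag are hypothetical, and the rule that plain ISO-8859-8 means logical order inside HTML or XML simply encodes the convention described in this article, not the behavior of any particular library or browser.

```python
from typing import Optional, Tuple

# IANA label -> (Windows code page, nominal text order), as described above.
# No code page is given for ISO-8859-8-E, so None is used for it.
_HEBREW_LABELS = {
    "ISO-8859-8": (28598, "visual"),
    "ISO-8859-8-I": (38598, "logical"),
    "ISO-8859-8-E": (None, "explicit"),
}


def assumed_text_order(label: str, in_markup: bool) -> Tuple[Optional[int], str]:
    """Return (code page, text order) to assume for an ISO-8859-8 family label.

    Hypothetical helper: in_markup=True means the text comes from an HTML or
    XML document, where plain ISO-8859-8 is treated as logical order.
    """
    key = label.strip().upper()  # charset labels are case-insensitive
    if key not in _HEBREW_LABELS:
        raise ValueError(f"not an ISO-8859-8 family label: {label!r}")
    code_page, order = _HEBREW_LABELS[key]
    if key == "ISO-8859-8" and in_markup:
        order = "logical"  # the practical convention overrides the nominal one
    return code_page, order
```

For example, `assumed_text_order("iso-8859-8", in_markup=True)` returns `(28598, "logical")`, while the same label on a plain-text file yields `(28598, "visual")`.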
This character set was also adopted by Israeli Standard SI1311:2002. Over a decade after the publication of that standard, Unicode is preferred, at least for the Internet (meaning UTF-8, the dominant encoding for web pages). ISO-8859-8 is used by less than 0.1% of websites.
Byte 0xFD is the left-to-right mark (U+200E) and 0xFE is the right-to-left mark (U+200F), as specified in a later amendment, ISO/IEC 8859-8:1999.
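Apart from ASCII, the part of the table that matters most for Hebrew text is small: the 27 Hebrew letters (including the five final forms) occupy bytes 0xE0 to 0xFA and map in order onto U+05D0 (alef) through U+05EA (tav). The sketch below decodes only ASCII, those letters and the two directional marks, and can be cross-checked against Python's built-in `iso8859_8` codec. The function name is hypothetical; real code should simply use the built-in codec, which also covers the symbols in 0xA0 to 0xBF.

```python
def decode_hebrew_8859_8(data: bytes) -> str:
    """Decode a subset of ISO-8859-8: ASCII, the 27 Hebrew letters and LRM/RLM.

    Any other byte raises ValueError, since this sketch does not map it.
    """
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # ASCII (and C0 controls) map to themselves
        elif 0xE0 <= b <= 0xFA:
            out.append(chr(0x05D0 + (b - 0xE0)))  # alef (U+05D0) .. tav (U+05EA)
        elif b == 0xFD:
            out.append("\u200e")  # LEFT-TO-RIGHT MARK
        elif b == 0xFE:
            out.append("\u200f")  # RIGHT-TO-LEFT MARK
        else:
            raise ValueError(f"byte 0x{b:02X} is not handled by this sketch")
    return "".join(out)
```

For instance, `decode_hebrew_8859_8(b"\xf9\xf8\xe4")` returns "שרה" (Sarah), stored in logical order: sin, resh, heh.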
Background: the problem of Hebrew and the Internet
Bi-directional text is text that contains runs in both directions: right-to-left (RTL, or dextrosinistral) and left-to-right (LTR, or sinistrodextral). It usually arises when different scripts are mixed, but the term may also refer to boustrophedon, in which the text direction alternates from row to row.
Some writing systems of the world, including the Arabic and Hebrew scripts or derived systems such as the Persian, Urdu, and Yiddish scripts, are written in a form known as right-to-left (RTL), in which writing begins at the right-hand side of a page and concludes at the left-hand side. This is different from the left-to-right (LTR) direction used by the dominant Latin script. When LTR text is mixed with RTL in the same paragraph, each type of text is written in its own direction, which is known as bi-directional text. This can get rather complex when multiple levels of quotation are used.
Many computer programs fail to display bi-directional text correctly.
For example, the Hebrew name Sarah (שרה) is spelled sin (ש), which appears rightmost, then resh (ר), and finally heh (ה), which should appear leftmost.
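The difference between logical order (the spelling order just described) and visual order (the left-to-right drawing order) can be illustrated with a deliberately simplified converter. It assumes a left-to-right base direction, treats only the Hebrew letters U+05D0 to U+05EA as right-to-left, and reverses each run of them, keeping spaces between Hebrew words inside the run. It is not the Unicode Bidirectional Algorithm (UAX #9): digits, punctuation, mirrored brackets and explicit directional marks are ignored, and the function name is hypothetical.

```python
import re

# A run starts and ends with a Hebrew letter and may contain spaces between
# letters, so a multi-word Hebrew phrase is reversed as one unit.
_HEBREW_RUN = re.compile(r"[\u05D0-\u05EA](?:[ \u05D0-\u05EA]*[\u05D0-\u05EA])?")


def logical_to_visual(line: str) -> str:
    """Rough logical-to-visual reordering for one LTR-base line (not UAX #9)."""
    return _HEBREW_RUN.sub(lambda m: m.group(0)[::-1], line)
```

With this sketch, `logical_to_visual("שרה")` returns the characters heh, resh, sin in that order, so that when they are drawn left to right, sin lands rightmost.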
Note: Some web browsers may display the Hebrew text in this article in the opposite direction.
DEC Hebrew
The DEC Hebrew character set is an 8-bit character set developed by Digital Equipment Corporation (DEC) to support the Hebrew alphabet. It was derived from DEC's Multinational Character Set (MCS) by removing the existing definitions from code points 192 to 223 and replacing code points 224 to 250 with the Hebrew letters. Since MCS is a predecessor of ISO/IEC 8859-1, DEC Hebrew is similar to ISO/IEC 8859-8 and Windows code page 1255: many characters in the range 160 to 191 are the same, and the Hebrew letters are at 224 to 250 in all three character sets.
Diacritic
A diacritic – also diacritical mark, diacritical point, diacritical sign, or accent – is a glyph added to a letter, or basic glyph. The term derives from the Ancient Greek διακριτικός (diakritikós, "distinguishing"), from διακρίνω (diakrī́nō, "to distinguish"). Diacritic is primarily an adjective, though sometimes used as a noun, whereas diacritical is only ever an adjective. Some diacritical marks, such as the acute ( ´ ) and grave ( ` ), are often called accents. Diacritical marks may appear above or below a letter, or in some other position such as within the letter or between two letters.
The main use of diacritical marks in the Latin script is to change the sound-values of the letters to which they are added. Examples are the diaereses in the borrowed French words naïve and Noël, which show that the vowel with the diaeresis mark is pronounced separately from the preceding vowel; the acute and grave accents, which can indicate that a final vowel is to be pronounced, as in saké and poetic breathèd; and the cedilla under the "c" in the borrowed French word façade, which shows it is pronounced /s/ rather than /k/. In other Latin-script alphabets, they may distinguish between homonyms, such as the French là ("there") versus la ("the") that are both pronounced /la/. In Gaelic type, a dot over a consonant indicates lenition of the consonant in question.
In other alphabetic systems, diacritical marks may perform other functions. Vowel pointing systems, namely the Arabic harakat ( ـِ ,ـُ ,ـَ, etc.) and the Hebrew niqqud ( ַ◌, ֶ◌, ִ◌, ֹ◌, ֻ◌, etc.) systems, indicate vowels that are not conveyed by the basic alphabet. The Indic virama ( ् etc.) and the Arabic sukūn ( ـْـ ) mark the absence of vowels. Cantillation marks indicate prosody. Other uses include the Early Cyrillic titlo stroke ( ◌҃ ) and the Hebrew gershayim ( ״ ), which, respectively, mark abbreviations or acronyms, and Greek diacritical marks, which showed that letters of the alphabet were being used as numerals. In the Hanyu Pinyin official romanization system for Chinese, diacritics are used to mark the tones of the syllables in which the marked vowels occur.
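Because ISO-8859-8 has the Hebrew letters but no niqqud, pointed Hebrew has to lose its vowel points before it can be encoded in it. One way to do that is canonical decomposition (NFD) followed by dropping nonspacing marks (Unicode general category Mn), which also strips Latin diacritics such as the diaeresis in naïve. This is a sketch: the helper names are hypothetical, the operation is lossy by design, and characters that ISO-8859-8 cannot represent still raise `UnicodeEncodeError`.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove nonspacing marks (niqqud, cantillation, Latin accents)."""
    # NFD splits precomposed characters such as U+00EF (i with diaeresis)
    # into a base letter followed by combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def encode_unpointed_hebrew(text: str) -> bytes:
    """Strip the marks, then encode with Python's built-in ISO-8859-8 codec."""
    return strip_marks(text).encode("iso8859_8")
```

Given pointed Sarah (sin with sin dot and qamats, resh with qamats, heh), `encode_unpointed_hebrew` yields the three bytes 0xF9 0xF8 0xE4.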
In orthography and collation, a letter modified by a diacritic may be treated either as a new, distinct letter or as a letter–diacritic combination. This varies from language to language, and may vary from case to case within a language. English is the only major modern European language requiring no diacritics for native words (although a diaeresis may be used in words such as "coöperation").
In some cases, letters are used as "in-line diacritics", with the same function as ancillary glyphs, in that they modify the sound of the letter preceding them, as in the case of the "h" in the English pronunciation of "sh" and "th".
ISO-8859-8-I
ISO-8859-8-I is the IANA charset name for the character encoding ISO/IEC 8859-8 used together with the control codes from ISO/IEC 6429 for the C0 (00–1F hex) and C1 (80–9F) parts. The characters are in logical order.
Escape sequences (from ISO/IEC 6429 or ISO/IEC 2022) are not to be interpreted. Most applications only interpret the control codes for LF, CR, and HT. A few applications also interpret VT, FF, and NEL (in C1). Very few applications interpret the other C0 and C1 control codes.
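A decoder can mirror that behavior by keeping only the control codes that applications commonly act on. The sketch assumes the text has already been decoded with Python's `iso8859_8` codec, which maps bytes 0x00 to 0x1F and 0x80 to 0x9F to the identically numbered C0 and C1 code points. The function name and the policy of dropping, rather than replacing, the remaining controls are hypothetical; dropping ESC means escape sequences are never acted on, although the rest of each sequence stays in the text as ordinary characters.

```python
# Controls that most applications interpret: LF, CR, HT.
BASIC_CONTROLS = frozenset("\n\r\t")
# Controls that a few applications also interpret: VT, FF and NEL (NEL is in C1).
EXTENDED_CONTROLS = frozenset("\x0b\x0c\x85")


def filter_controls(text: str, allow_extended: bool = False) -> str:
    """Drop C0 (U+0000-U+001F) and C1 (U+0080-U+009F) controls except the kept ones."""
    keep = (BASIC_CONTROLS | EXTENDED_CONTROLS) if allow_extended else BASIC_CONTROLS
    out = []
    for ch in text:
        cp = ord(ch)
        is_control = cp <= 0x1F or 0x80 <= cp <= 0x9F
        if is_control and ch not in keep:
            continue  # e.g. BEL, ESC and most C1 codes are discarded
        out.append(ch)
    return "".join(out)
```

For the bytes `b"\xf9\xf8\xe4\x07\r\n\x85"`, decoding and filtering keeps "שרה" followed by CR and LF; the BEL is dropped, and NEL survives only with `allow_extended=True`.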
ISO-8859-8 is sometimes in logical order (HTML, XML), and sometimes in visual (left-to-right) order (plain text without any markup).
Logical order for this charset requires bidi processing for display.
ISO/IEC 8859
ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12. The ISO working group maintaining this series of standards has been disbanded.
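As an illustration of the numbering gap, Python's standard codec registry includes a codec for each of the 15 published parts and none for part 12. The check below is a sketch that relies only on `codecs.lookup` and the registry's `iso8859_N` naming; the helper name is hypothetical.

```python
import codecs
from typing import List


def registered_8859_parts() -> List[int]:
    """Return the ISO/IEC 8859 part numbers for which Python has a codec."""
    parts = []
    for n in range(1, 17):
        try:
            codecs.lookup(f"iso8859_{n}")  # raises LookupError if unknown
        except LookupError:
            continue
        parts.append(n)
    return parts
```

On a standard CPython installation this returns the numbers 1 to 16 without 12.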
ISO/IEC 8859 parts 1, 2, 3, and 4 were originally Ecma International standard ECMA-94.
List of Ecma standards
This is a list of standards published by Ecma International, formerly the European Computer Manufacturers Association.
List of International Organization for Standardization standards, 8000-8999
This is a list of published International Organization for Standardization (ISO) standards and other deliverables. For a complete and up-to-date list of all the ISO standards, see the ISO catalogue. The standards are protected by copyright and most of them must be purchased. However, about 300 of the standards produced by ISO and IEC's Joint Technical Committee 1 (JTC1) have been made freely and publicly available.