JIS X 0213

JIS X 0213 is a Japanese Industrial Standard defining coded character sets for encoding the characters used in Japan. The standard extends JIS X 0208. It was first published in 2000 and revised in 2004 (JIS2004) and 2012.[1][2][3][4] As well as adding a number of special characters, characters with diacritic marks, and so on, it added 3,695 kanji. The full name of the standard is 7-bit and 8-bit double byte coded extended KANJI sets for information interchange (7ビット及び8ビットの2バイト情報交換用符号化拡張漢字集合, Nana-Bitto Oyobi Hachi-Bitto no Ni-Baito Jōhō Kōkan'yō Fugōka Kakuchō Kanji Shūgō).

JIS X 0213 has two "planes" (94×94 character tables). Plane 1 is a superset of JIS X 0208 and contains the level 1 to level 3 kanji sets together with non-kanji characters such as hiragana, katakana (including letters used to write the Ainu language), the Latin, Greek and Cyrillic alphabets, digits and symbols. Plane 2 contains only the level 4 kanji set. The total number of defined characters is 11,233. Each character can be encoded in two bytes.

This standard largely replaced the rarely used JIS X 0212-1990 "supplementary" standard, which included 5,801 kanji and 266 non-kanji. Of the additional 3,695 kanji in JIS X 0213, all but 952 were already in JIS X 0212.

JIS X 0213 defines several 7-bit and 8-bit encodings including EUC-JIS-2004, ISO-2022-JP-2004 and Shift JIS-2004. Also, it defines the mapping from each of these encodings to ISO/IEC 10646 (Unicode) for each character.
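As a rough illustration of these three encodings, the sketch below uses the codecs that ship with CPython under the names `euc_jis_2004`, `shift_jis_2004` and `iso2022_jp_2004`. The sample string and the helper name `encode_all` are illustrative assumptions and are not part of the standard.

```python
# Encode one string with the three JIS X 0213:2004 encodings, using the
# codecs bundled with CPython. SAMPLE is arbitrary illustrative text.
SAMPLE = "漢字とカナ"

CODECS = ("euc_jis_2004", "shift_jis_2004", "iso2022_jp_2004")


def encode_all(text):
    """Return {codec name: encoded bytes} for each JIS X 0213 encoding."""
    return {name: text.encode(name) for name in CODECS}


if __name__ == "__main__":
    for name, data in encode_all(SAMPLE).items():
        # Each encoding should decode back to the same Unicode string.
        assert data.decode(name) == SAMPLE
        print(f"{name:16} {data.hex()}")
```

The byte sequences differ between the three encodings, but each decodes back to the same Unicode text, which is what the mapping to ISO/IEC 10646 is meant to guarantee.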

Unicode version 3.2 incorporated all characters of JIS X 0213 except those that can be represented using combining characters. Because about 300 of the kanji are in Unicode Plane 2 (the Supplementary Ideographic Plane), Unicode implementations that support only the Basic Multilingual Plane cannot handle all of the JIS X 0213 characters. For most applications, however, this is not an issue.
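The Basic Multilingual Plane limitation can be checked with a small sketch. It assumes CPython's `euc_jis_2004` codec and uses U+2000B, a kanji in Unicode Plane 2, as the example; the helper names are hypothetical.

```python
# U+2000B is a kanji that lies in Unicode Plane 2, above the BMP.
SIP_KANJI = "\U0002000B"


def outside_bmp(text):
    """Return the characters of `text` whose code points lie above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def jisx0213_roundtrip(text, codec="euc_jis_2004"):
    """Encode with a JIS X 0213 codec and decode again."""
    return text.encode(codec).decode(codec)


if __name__ == "__main__":
    print(outside_bmp("漢" + SIP_KANJI))
    # In UTF-16 the character needs a surrogate pair (two 16-bit units).
    print(len(SIP_KANJI.encode("utf-16-le")) // 2)
```

Software that treats each 16-bit unit as a character, or that rejects code points above U+FFFF, would mishandle such a kanji even though the JIS X 0213 encodings represent it directly.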

Figure: Glyph variants changed by the 2004 edition of JIS X 0213.

The 2004 edition of JIS X 0213 changed the recommended renderings of 168 kanji.[5]

JIS X 0213
Language(s): Japanese, English, Ainu, Russian (partial support: Greek, Chinese)
Standard: JIS X 0213
Classification: ISO 2022, DBCS, CJK encoding
Extends: JIS X 0208
Encoding formats: Shift_JIS-2004, ISO-2022-JP-2004, EUC-JIS-2004
Preceded by: JIS X 0208, JIS X 0212

Figure: Euler diagram comparing the repertoires of JIS X 0208, JIS X 0212, JIS X 0213, Windows-31J, the Microsoft standard repertoire and Unicode.

References

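  1. ^ "日本工業標準調査会:データベース-JIS詳細表示" [Japanese Industrial Standards Committee: Database, JIS details]. 2012-02-20. Retrieved 15 Mar 2015.
  2. ^ "日本工業標準調査会:データベース-JIS規格詳細表示" [Japanese Industrial Standards Committee: Database, JIS standard details]. 2000-01-20. Retrieved 15 Mar 2015.
  3. ^ "日本工業標準調査会:データベース-JIS規格詳細表示" [Japanese Industrial Standards Committee: Database, JIS standard details]. 2004-02-20. Retrieved 15 Mar 2015.
  4. ^ "日本工業標準調査会:データベース-JIS規格詳細表示" [Japanese Industrial Standards Committee: Database, JIS standard details]. 2008-10-01. Retrieved 15 Mar 2015.
  5. ^ http://kakijun.jp/main/jis2004.html (in Japanese)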

CJK Compatibility

CJK Compatibility is a Unicode block containing square symbols (both CJK and Latin alphanumeric) encoded for compatibility with east Asian character sets.

Characters U+337B through U+337E are the Japanese era symbols Heisei (㍻), Shōwa (㍼), Taishō (㍽) and Meiji (㍾) (also available in certain legacy sets, such as the "NEC special characters" extension for JIS X 0208, as included in Microsoft's version and later JIS X 0213). The Reiwa era symbol (㋿) is in Enclosed CJK Letters and Months (the CJK Compatibility block having been fully allocated by the time of its commencement).
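Because these square symbols are compatibility characters, Unicode compatibility normalization expands them into ordinary kanji. A minimal sketch using Python's `unicodedata` module follows; the function name is illustrative.

```python
import unicodedata

# Era symbols named in the paragraph above, keyed by code point.
ERA_SYMBOLS = {
    "\u337b": "Heisei",
    "\u337c": "Shōwa",
    "\u337d": "Taishō",
    "\u337e": "Meiji",
}


def expand_square_symbols(text):
    """Apply NFKC, which replaces compatibility characters with their expansions."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for symbol, era in ERA_SYMBOLS.items():
        print(f"U+{ord(symbol):04X} {era}: {expand_square_symbols(symbol)}")
```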

CJK Compatibility Ideographs

CJK Compatibility Ideographs is a Unicode block created to contain Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings. Such encodings include the South Korean KS X 1001:1998 (U+F900–U+FA0B, 268 characters), Taiwanese Big5 (U+FA0C–U+FA0D, 2 characters), Japanese IBM 32 (CP932 variant; U+FA0E–U+FA2D, 32 characters), South Korean KS X 1001:2004 (U+FA2E–U+FA2F, 2 characters), Japanese JIS X 0213 (U+FA30–U+FA6A, 59 characters), Japanese ARIB STD-B24 (U+FA6B–U+FA6D, 3 characters) and the North Korean KPS 10721-2000 (U+FA70–U+FAD9, 106 characters) source standards.
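The ranges listed above can be turned into a small lookup table. The sketch below copies the ranges from the paragraph and, as a separate illustration, shows that Unicode normalization replaces a compatibility ideograph with its unified counterpart, which is why normalized text may not round-trip to the source encoding. The helper names are hypothetical.

```python
import unicodedata

# (first, last, source standard), as listed in the paragraph above.
SOURCE_RANGES = [
    (0xF900, 0xFA0B, "KS X 1001:1998"),
    (0xFA0C, 0xFA0D, "Big5"),
    (0xFA0E, 0xFA2D, "IBM 32"),
    (0xFA2E, 0xFA2F, "KS X 1001:2004"),
    (0xFA30, 0xFA6A, "JIS X 0213"),
    (0xFA6B, 0xFA6D, "ARIB STD-B24"),
    (0xFA70, 0xFAD9, "KPS 10721-2000"),
]


def source_of(ch):
    """Return the source standard for a compatibility ideograph, or None."""
    cp = ord(ch)
    for first, last, source in SOURCE_RANGES:
        if first <= cp <= last:
            return source
    return None


def to_unified(text):
    """NFC maps compatibility ideographs to their canonical equivalents."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    for first, last, source in SOURCE_RANGES:
        print(f"U+{first:04X}..U+{last:04X} {last - first + 1:4d} {source}")
```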

In ensuing versions of the standard, more characters have been added to the block. These even include a few regular ideographs (with the Unified_Ideograph property) that do not have duplicates (U+FA0E–U+FA0F, U+FA11, U+FA13–U+FA14, U+FA1F, U+FA21, U+FA23–U+FA24 and U+FA27–U+FA29).

The block has dozens of ideographic variation sequences registered in the Unicode Ideographic Variation Database (IVD). These sequences specify the desired glyph variant for a given Unicode character.

CJK Unified Ideographs

The Chinese, Japanese and Korean (CJK) scripts share a common background, collectively known as CJK characters. In the process called Han unification, the common (shared) characters were identified and named CJK Unified Ideographs. As of Unicode 12.0, Unicode defines a total of 87,887 CJK Unified Ideographs. The terms ideographs or ideograms may be misleading, since the Chinese script is not strictly a pictographic or ideographic system.

Historically, Vietnam used Chinese ideographs too, so sometimes the abbreviation "CJKV" is used. This system was replaced by the Latin-based Vietnamese alphabet in the 1920s.

Extended Unix Code

Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese.

The structure of EUC is based on the ISO-2022 standard, which specifies a way to represent character sets containing a maximum of 94 characters, or 8,836 (94²) characters, or 830,584 (94³) characters, as sequences of 7-bit codes. Only ISO-2022 compliant character sets can have EUC forms. Up to four coded character sets (referred to as G0, G1, G2, and G3 or as code sets 0, 1, 2, and 3) can be represented with the EUC scheme.

G0 is almost always an ISO-646 compliant coded character set such as US-ASCII, ISO 646:KR (KS X 1003) or ISO 646:JP (the lower half of JIS X 0201) that is invoked on GL (i.e. with the most significant bit cleared). An exception from US-ASCII is that 0x5C (backslash in US-ASCII) is often used to represent a Yen sign in EUC-JP (see below) and a Won sign in EUC-KR.

To get the EUC form of an ISO-2022 character, the most significant bit of each 7-bit byte of the original ISO 2022 codes is set (by adding 128 to each of these original 7-bit codes); this allows software to easily distinguish whether a particular byte in a character string belongs to the ISO-646 code or the ISO-2022 (EUC) code.
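The bit-setting rule can be sketched in a few lines. The example uses the kanji 漢, whose 7-bit JIS X 0208 code is 0x34 0x41; the function names are illustrative only.

```python
def iso2022_to_euc(code):
    """Set the most significant bit of every 7-bit byte (equivalent to adding 128)."""
    if any(b > 0x7F for b in code):
        raise ValueError("expected 7-bit ISO 2022 bytes")
    return bytes(b | 0x80 for b in code)


def is_g0_byte(b):
    """Bytes with the high bit clear belong to the ISO-646 (G0) set."""
    return b < 0x80


if __name__ == "__main__":
    euc = iso2022_to_euc(b"\x34\x41")
    print(euc.hex(), euc.decode("euc_jp"))
```

Because every byte of a multibyte character has its high bit set, a scanner can classify each byte in isolation, without tracking escape sequences as in the 7-bit ISO 2022 form.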

The most commonly used EUC codes are variable-width encodings with a character belonging to G0 (ISO-646 compliant coded character set) taking one byte and a character belonging to G1 (taken by a 94x94 coded character set) represented in two bytes. The EUC-CN form of GB2312 and EUC-KR are examples of such two-byte EUC codes. EUC-JP includes characters represented by up to three bytes whereas a single character in EUC-TW can take up to four bytes.
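To illustrate the variable width, the following sketch splits an EUC-JP byte string into characters. It assumes the usual EUC-JP layout, which the text above does not spell out: the single-shift byte 0x8E introduces a one-byte half-width katakana, and 0x8F introduces a two-byte JIS X 0212 character, giving three bytes in total.

```python
SS2 = 0x8E  # assumed: single shift 2, followed by one byte (half-width katakana)
SS3 = 0x8F  # assumed: single shift 3, followed by two bytes (JIS X 0212)


def split_euc_jp(data):
    """Split EUC-JP bytes into a list of per-character byte strings."""
    chars, i = [], 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            width = 1  # G0: ISO-646 character
        elif lead == SS2:
            width = 2  # G2: shift byte plus one byte
        elif lead == SS3:
            width = 3  # G3: shift byte plus two bytes
        else:
            width = 2  # G1: two-byte 94x94 set
        if i + width > len(data):
            raise ValueError("truncated multibyte sequence")
        chars.append(data[i:i + width])
        i += width
    return chars


if __name__ == "__main__":
    print(split_euc_jp("a\uff71漢".encode("euc_jp")))
```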

Modern applications are more likely to use UTF-8, which supports all of the glyphs of the EUC codes, and more, and is generally more portable with fewer vendor deviations and errors.

Extended shinjitai

Extended shinjitai (拡張新字体, kakuchō shinjitai, lit. "extended new character form") is the extension of the shinjitai (officially simplified kanji). They are the simplified versions of some of the hyōgaiji (表外字, kanji not included in the jōyō kanji list). They are unofficial characters; the official forms of these hyōgaiji are still kyūjitai (traditional characters).

GNU Unifont

The GNU Unifont by Roman Czyborra is a free Unicode bitmap font using an intermediate bitmapped font format. The main Unifont covers the entire Basic Multilingual Plane (BMP), the "Upper" companion covers significant parts of the Supplementary Multilingual Plane, and the "Unifont JP" companion contains Japanese kanji present in the JIS X 0213 character set.

It is present in most free operating systems and windowing systems such as Linux, XFree86 or the X.Org Server and some embedded firmware such as RockBox. The font is released under the GNU General Public License Version 2+ with a font embedding exception (embedding the font in a document does not require the document to be placed under the same license).

It became a GNU package in October 2013. The current maintainer is Paul Hardy.

Ga (kana)

が in hiragana, or ガ in katakana, is one of the Japanese kana, each of which represents one mora. Both represent [ɡa].

Gi (kana)

ぎ in hiragana, or ギ in katakana, is one of the Japanese kana, each of which represents one mora. Both represent [ɡi].

Hiragana (Unicode block)

Hiragana is a Unicode block containing hiragana characters for the Japanese language.
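As a small illustration using the ga and gi kana described earlier, the sketch below relies on two properties of the Unicode kana blocks: matching hiragana and katakana letters sit 0x60 code points apart, and voiced kana decompose into a base letter plus the combining voiced sound mark U+3099. The helper names are hypothetical.

```python
import unicodedata

KATAKANA_OFFSET = 0x60  # distance between matching letters in the two blocks


def hiragana_to_katakana(ch):
    """Map a hiragana letter to the katakana letter at the same block position."""
    return chr(ord(ch) + KATAKANA_OFFSET)


def split_voicing(ch):
    """Decompose a voiced kana into base letter plus combining mark (NFD)."""
    return unicodedata.normalize("NFD", ch)


if __name__ == "__main__":
    for kana in "がぎ":
        base, mark = split_voicing(kana)
        print(kana, hiragana_to_katakana(kana), unicodedata.name(kana),
              base, f"U+{ord(mark):04X}")
```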

JIS X 0208

JIS X 0208 is a 2-byte character set specified as a Japanese Industrial Standard, containing 6879 graphic characters suitable for writing text, place names, personal names, and so forth in the Japanese language. The official title of the current standard is 7-bit and 8-bit double byte coded KANJI sets for information interchange (7ビット及び8ビットの2バイト情報交換用符号化漢字集合, Nana-Bitto Oyobi Hachi-Bitto no Ni-Baito Jōhō Kōkan'yō Fugōka Kanji Shūgō). It was originally established as JIS C 6226 in 1978, and has been revised in 1983, 1990, and 1997. It is also called Code page 952 by IBM. The 1978 version is also called Code page 955 by IBM.

JIS X 0212

JIS X 0212 is a Japanese Industrial Standard defining a coded character set for encoding supplementary characters for use in Japanese. This standard is intended to supplement JIS X 0208 (Code page 952). It is numbered 953 or 5049 as an IBM code page (see below).

It is one of the source standards for Unicode's CJK Unified Ideographs.

JIS encoding

In computing, JIS encoding refers to several Japanese Industrial Standards for encoding the Japanese language. Strictly speaking, the term means either:

A set of standard coded character sets for Japanese, notably:

JIS X 0201, the Japanese version of ISO 646 (ASCII) containing the base 7-bit ASCII characters (with some modifications) and 64 half-width katakana characters.

JIS X 0208, the most common kanji character set containing 6,879 characters, including 6355 kanji and 524 other characters (one 94 by 94 plane)

JIS X 0212, a supplement to JIS X 0208 which adds 5,801 kanji, for a total of 12,156 kanji (a second 94 by 94 plane)

JIS X 0213, which extends JIS X 0208 (two planes)

JIS X 0202 (also known as ISO-2022-JP), a set of encoding mechanisms for sending JIS character data over transmission media that only support 7-bit data.

In practice, "JIS encoding" usually refers to JIS X 0208 character data encoded with JIS X 0202. For instance, the IANA uses the JIS_Encoding label to refer to JIS X 0202, and the ISO-2022-JP label to refer to the profile thereof defined by RFC 1468.

Other encoding mechanisms for JIS characters include the Shift JIS encoding and EUC-JP. Shift JIS adds the kanji, full-width hiragana and full-width katakana from JIS X 0208 to JIS X 0201 in a backward compatible way. Shift JIS is perhaps the most widely used encoding in Japan, as the compatibility with the single-byte JIS X 0201 character set made it possible for electronic equipment manufacturers (such as cash register manufacturers) to offer an upgrade from older cheaper equipment that was not capable of displaying kanji to newer equipment while retaining character-set compatibility.
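The 7-bit mechanism mentioned above can be observed with Python's `iso2022_jp` codec, which implements the RFC 1468 profile. This is only a sketch; the function name `encode_7bit` is an illustrative choice.

```python
def encode_7bit(text):
    """Encode text as ISO-2022-JP; escape sequences switch character sets."""
    return text.encode("iso2022_jp")


if __name__ == "__main__":
    data = encode_7bit("漢字")
    # ESC $ B selects JIS X 0208, ESC ( B returns to ASCII.
    print(data)
    # Every byte fits in 7 bits, so the data survives 7-bit channels.
    print(all(b < 0x80 for b in data))
```

The kanji themselves appear as pairs of printable ASCII-range bytes (here `4A` and `;z`), and only the surrounding escape sequences tell a decoder to interpret them as JIS X 0208 codes.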

EUC-JP is used on UNIX systems, where the JIS encodings are incompatible with POSIX standards.

A more recent alternative to JIS coded characters is Unicode (UCS coded characters), particularly in the UTF-8 encoding mechanism.

Japanese punctuation

Japanese punctuation (Japanese: 約物, Hepburn: yakumono) includes various written marks (besides characters and numbers), which differ from those found in European languages, as well as some not used in formal Japanese writing but frequently found in more casual writing, such as exclamation and question marks.

Japanese can be written horizontally or vertically, and some punctuation marks adapt to this change in direction. Parentheses, curved brackets, square quotation marks, ellipses, dashes, and swung dashes are rotated clockwise 90° when used in vertical text (see diagram).

Japanese punctuation marks are usually full width (that is, occupying an area that is the same as the surrounding characters).

Punctuation was not widely used in Japanese writing until translations from European languages became common in the 19th century.

Katakana (Unicode block)

Katakana is a Unicode block containing katakana characters for the Japanese and Ainu languages.

Katakana Phonetic Extensions

Katakana Phonetic Extensions is a Unicode block containing additional small katakana characters for writing the Ainu language, in addition to characters in the Katakana block.

Further small katakana are present in the Small Kana Extension block.

List of Japanese typographic symbols

This page lists Japanese typographic symbols that are not included in kana or kanji.

List of typefaces included with macOS

This list of fonts contains every font shipped with Mac OS X 10.0 through macOS 10.14, including any that shipped with language-specific updates from Apple (primarily Korean and Chinese fonts). For fonts shipped only with Mac OS X 10.5, please see Apple's documentation.

Meiryo

Meiryo (メイリオ, Meirio) is a Japanese sans-serif gothic typeface. Microsoft bundled Meiryo with Office Mac 2008 as part of the standard install, and it replaces MS Gothic as the default system font for Vista on Japanese systems.

A new Japanese font was considered necessary because the existing ones (mainly MS Gothic and MS Mincho) were incompatible with Microsoft's ClearType subpixel rendering technology; Meiryo is intended to increase the legibility of characters on LCD screens. ClearType has been available in Windows for Latin fonts since the release of Windows XP in October 2001. However, unlike Latin fonts, which use the ClearType hinting system at all sizes, the Japanese fonts distributed with Windows included embedded bitmap versions for small sizes. Although fonts using only hinted CJK glyphs exist (such as Arial Unicode MS), they had not been distributed with Windows prior to Vista.

Shift JIS

Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS) is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1. 0.4% of all web pages used Shift JIS in September 2018, a decline from 1.3% in July 2014.
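A short sketch with Python's `shift_jis` codec shows the mixed single-byte and double-byte layout of the encoding: ASCII-range characters and half-width katakana take one byte, while kanji and full-width kana take two. The helper name is illustrative.

```python
def shift_jis_widths(text):
    """Return (character, encoded byte length) pairs for Shift JIS."""
    return [(ch, len(ch.encode("shift_jis"))) for ch in text]


if __name__ == "__main__":
    # "A", half-width katakana a (U+FF71), full-width hiragana a, and a kanji.
    for ch, width in shift_jis_widths("A\uff71あ漢"):
        print(ch, width, ch.encode("shift_jis").hex())
```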


This page is based on a Wikipedia article written by its contributors.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.