Code page 852

Code page 852 (also known as CP 852, IBM 00852,[1] OEM 852 (Latin II),[2][3] MS-DOS Latin 2[4]) is a code page used under DOS to write Central European languages that use Latin script (such as Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak or Slovene).

Note that code page 852 (DOS Latin 2) is very different from ISO/IEC 8859-2 (ISO Latin-2), although both are informally referred to as "Latin-2" in different language regions.[5]
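A quick way to see the difference is to encode text with one table and decode the bytes with the other. The sketch below uses Python's standard `cp852` and `iso8859_2` codecs; the helper name `misread` is hypothetical and chosen only for illustration.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page and decode the bytes with another (hypothetical helper)."""
    return text.encode(written_as).decode(read_as)

# The same letter occupies different byte values in the two "Latin-2" tables.
print("ž".encode("cp852"))      # b'\xa7'
print("ž".encode("iso8859_2"))  # b'\xbe'

# Reading one as the other silently yields a different character.
print(misread("ž", "cp852", "iso8859_2"))  # §
print(misread("ž", "iso8859_2", "cp852"))  # ż
```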

Some of the box-drawing characters of the original DOS code page 437 were sacrificed to make room for more accented letters (all printable characters from ISO 8859-2 are included). These changes caused display glitches in DOS applications that used box-drawing characters to present a GUI-like interface in text mode (e.g. Norton Commander). Several local encodings were invented to avoid the problem, for example the Kamenický encoding for Czech and Slovak.[6]
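Both claims can be checked with Python's built-in `cp437`, `cp852` and `iso8859_2` codecs. The sketch below (the helper name `box_drawing_positions` is hypothetical) counts the characters from the Unicode Box Drawing block (U+2500 to U+257F) in the upper half of each table, shows how a border drawn with a CP437 junction character appears under CP852, and confirms that every printable ISO 8859-2 character also exists in CP852.

```python
def box_drawing_positions(codec: str) -> dict:
    """Map byte values 128-255 to their character if it lies in the Box Drawing block."""
    result = {}
    for byte in range(128, 256):
        ch = bytes([byte]).decode(codec)
        if 0x2500 <= ord(ch) <= 0x257F:
            result[byte] = ch
    return result

cp437_boxes = box_drawing_positions("cp437")
cp852_boxes = box_drawing_positions("cp852")
print(len(cp437_boxes), len(cp852_boxes))  # 40 22

# A top border with a "╤" junction, as a text-mode file manager might draw it.
border = b"\xc9\xcd\xd1\xcd\xbb"
print(border.decode("cp437"))  # ╔═╤═╗
print(border.decode("cp852"))  # ╔═Đ═╗

# Every printable ISO 8859-2 character (0xA0-0xFF) also appears somewhere in CP852.
iso_printable = set(bytes(range(0xA0, 0x100)).decode("iso8859_2"))
cp852_upper = set(bytes(range(0x80, 0x100)).decode("cp852"))
print(iso_printable <= cp852_upper)  # True
```

The box-drawing characters that CP852 keeps sit at the same byte positions as in CP437, which is why only some corners and junctions break.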

Character set

The following table shows code page 852.[2][7] Each character is shown with its equivalent Unicode code point. Only the upper half of the table (128–255) is shown; the lower half (0–127) is the same as in code page 437.

Code page 852
         _0    _1    _2    _3    _4    _5    _6    _7    _8    _9    _A    _B    _C    _D    _E    _F
8_ 128   Ç     ü     é     â     ä     ů     ć     ç     ł     ë     Ő     ő     î     Ź     Ä     Ć
         00C7  00FC  00E9  00E2  00E4  016F  0107  00E7  0142  00EB  0150  0151  00EE  0179  00C4  0106
9_ 144   É     Ĺ     ĺ     ô     ö     Ľ     ľ     Ś     ś     Ö     Ü     Ť     ť     Ł     ×     č
         00C9  0139  013A  00F4  00F6  013D  013E  015A  015B  00D6  00DC  0164  0165  0141  00D7  010D
A_ 160   á     í     ó     ú     Ą     ą     Ž     ž     Ę     ę     ¬     ź     Č     ş     «     »
         00E1  00ED  00F3  00FA  0104  0105  017D  017E  0118  0119  00AC  017A  010C  015F  00AB  00BB
B_ 176   ░     ▒     ▓     │     ┤     Á     Â     Ě     Ş     ╣     ║     ╗     ╝     Ż     ż     ┐
         2591  2592  2593  2502  2524  00C1  00C2  011A  015E  2563  2551  2557  255D  017B  017C  2510
C_ 192   └     ┴     ┬     ├     ─     ┼     Ă     ă     ╚     ╔     ╩     ╦     ╠     ═     ╬     ¤
         2514  2534  252C  251C  2500  253C  0102  0103  255A  2554  2569  2566  2560  2550  256C  00A4
D_ 208   đ     Đ     Ď     Ë     ď     Ň     Í     Î     ě     ┘     ┌     █     ▄     Ţ     Ů     ▀
         0111  0110  010E  00CB  010F  0147  00CD  00CE  011B  2518  250C  2588  2584  0162  016E  2580
E_ 224   Ó     ß     Ô     Ń     ń     ň     Š     š     Ŕ     Ú     ŕ     Ű     ý     Ý     ţ     ´
         00D3  00DF  00D4  0143  0144  0148  0160  0161  0154  00DA  0155  0170  00FD  00DD  0163  00B4
F_ 240   SHY   ˝     ˛     ˇ     ˘     §     ÷     ¸     °     ¨     ˙     ű     Ř     ř     ■     NBSP
         00AD  02DD  02DB  02C7  02D8  00A7  00F7  00B8  00B0  00A8  02D9  0171  0158  0159  25A0  00A0
(SHY: soft hyphen; NBSP: no-break space.)
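The rows of the table can be printed from Python's built-in `cp852` codec, which is a convenient way to check individual entries. The function name `cp852_row` is hypothetical; the sketch also confirms that the lower half (0–127) decodes identically under `cp852` and `cp437`.

```python
def cp852_row(high_nibble: int) -> list:
    """Return (character, code point) pairs for bytes high_nibble*16 .. high_nibble*16 + 15."""
    row = []
    for low in range(16):
        ch = bytes([high_nibble * 16 + low]).decode("cp852")
        row.append((ch, f"{ord(ch):04X}"))
    return row

# Print rows 8_ to F_ as "character=code point" cells.
for high in range(0x8, 0x10):
    cells = " ".join(f"{ch}={cp}" for ch, cp in cp852_row(high))
    print(f"{high:X}_ {cells}")

# The lower half (0-127) is identical to code page 437.
print(bytes(range(128)).decode("cp852") == bytes(range(128)).decode("cp437"))  # True
```

SHY and NBSP print as invisible characters, which is why the table labels them by name.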

See also

References

  1. ^ "Code Page CPGID 00852" (PDF). IBM. Retrieved 11 Nov 2011.
  2. ^ a b "OEM 852". Go Global Developer Center. Microsoft. Retrieved 11 Nov 2011.
  3. ^ "Code Pages Supported by Windows: OEM Code Pages". Go Global Developer Center. Microsoft. Retrieved 11 Oct 2011.
  4. ^ "Code Page 852 DOS Latin 2". Developing International Software. Microsoft. Retrieved 11 Nov 2011.
  5. ^ "PC Latin 2". The Czech and Slovak Character Encoding Mess Explained.
  6. ^ "Kamenicky". The Czech and Slovak Character Encoding Mess Explained.
  7. ^ "cp852_DOSLatin2 to Unicode table" (TXT). The Unicode Consortium. Retrieved 11 Nov 2011.
Code page

In computing, a code page is a character encoding and as such it is a specific association of a set of printable characters and control characters with unique numbers.
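In Python each code page is exposed as a codec, so the association between numbers and characters can be observed directly: the same byte value stands for a different character under each table. This sketch decodes byte 0x85 under three code pages mentioned in this article.

```python
# Byte 0x85 interpreted under three different code pages.
raw = bytes([0x85])
for codec in ("cp437", "cp852", "cp1250"):
    ch = raw.decode(codec)
    print(codec, ch, f"U+{ord(ch):04X}")
# cp437 à U+00E0
# cp852 ů U+016F
# cp1250 … U+2026
```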

The term "code page" originated from IBM's EBCDIC-based mainframe systems, but Microsoft, SAP, and Oracle Corporation are among the few vendors that use this term. Most vendors identify their character sets by name. When a vendor has a very large number of character sets (as IBM does), identifying them by number is a convenient way to distinguish them. Originally, the code page numbers referred to page numbers in the IBM standard character set manual, but that has not been the case for a long time. Vendors that use a code page system allocate their own code page number to a character encoding, even if it is better known by another name; for example, UTF-8 has been assigned code page numbers 1208 at IBM, 65001 at Microsoft, and 4110 at SAP.
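Because each vendor numbers encodings independently, software that accepts code page numbers needs a per-vendor lookup. The sketch below is a hypothetical table (the names `CODE_PAGES` and `codec_for` are illustrative, not part of any library) that resolves the UTF-8 numbers quoted above, plus the Microsoft numbers for code page 852 and ISO 8859-2 mentioned in this article, to Python codec names.

```python
import codecs

# Hypothetical (vendor, number) -> Python codec name table; numbers are taken from the text.
CODE_PAGES = {
    ("IBM", 1208): "utf-8",
    ("Microsoft", 65001): "utf-8",
    ("SAP", 4110): "utf-8",
    ("Microsoft", 852): "cp852",
    ("Microsoft", 28592): "iso8859_2",
}

def codec_for(vendor: str, number: int) -> codecs.CodecInfo:
    """Look up the Python codec for a vendor-specific code page number."""
    try:
        return codecs.lookup(CODE_PAGES[(vendor, number)])
    except KeyError:
        raise LookupError(f"no mapping for {vendor} code page {number}") from None

print(codec_for("SAP", 4110).name)  # utf-8
```

Three different numbers resolve to the same codec, which is exactly the situation the paragraph describes.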

Hewlett-Packard uses a similar concept in its HP-UX operating system and in its Printer Command Language (PCL) protocol for printers (whether HP printers or not). The terminology, however, is different: what others call a character set, HP calls a symbol set, and what IBM or Microsoft call a code page, HP calls a symbol set code. HP developed a series of symbol sets, each with an associated symbol set code, to encode both its own character sets and other vendors’ character sets.

The multitude of character sets leads many vendors to recommend Unicode.

Code page 437

Code page 437 is the character set of the original IBM PC (personal computer). It is also known as CP437, OEM-US, OEM 437, PC-8, or DOS Latin US. The set includes ASCII codes 32–126, extended codes for accented letters (diacritics), some Greek letters, icons, and line-drawing symbols. It is sometimes referred to as the "OEM font" or "high ASCII", or as "extended ASCII" (one of many mutually incompatible ASCII extensions).
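Python ships a `cp437` codec that can be used to inspect the set. One caveat: it maps bytes 0–31 to the ASCII control characters rather than to the icon glyphs (such as ☺) that the PC's hardware font displays there, so this sketch looks only at printable ranges.

```python
# Bytes 32-126 decode exactly as ASCII.
ascii_part = bytes(range(32, 127))
print(ascii_part.decode("cp437") == ascii_part.decode("ascii"))  # True

# Bytes 0xE0-0xEB hold mostly Greek letters (0xE1 is ß, 0xE6 is the micro sign).
print(bytes(range(0xE0, 0xEC)).decode("cp437"))  # αßΓπΣσµτΦΘΩδ

# Bytes 0xC9, 0xCD and 0xBB are double-line box-drawing pieces.
print(bytes([0xC9, 0xCD, 0xCD, 0xBB]).decode("cp437"))  # ╔══╗
```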

This character set remains the primary font in the core of any EGA- and VGA-compatible graphics card. Text shown when a PC reboots, before any other font can be loaded from a storage medium, is typically rendered in this character set. Many file formats developed at the time of the IBM PC are based on code page 437 as well.

Czech orthography

Czech orthography is a system of rules for correct writing (orthography) in the Czech language.

The Czech orthographic system relies on diacritics. The caron is added to standard Latin letters to express sounds that are foreign to Latin (although some digraphs, ch and dž, have been kept). The acute accent marks long vowels.
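Unicode decomposition makes the two Czech diacritics visible: canonical decomposition (NFD) splits each precomposed letter into a base letter plus a combining caron (U+030C) or a combining acute accent (U+0301). The sketch uses Python's standard `unicodedata` module.

```python
import unicodedata

for letter in "čřžáé":
    # Each of these letters decomposes into exactly two code points.
    base, mark = unicodedata.normalize("NFD", letter)
    print(letter, base, unicodedata.name(mark))
# č c COMBINING CARON
# ř r COMBINING CARON
# ž z COMBINING CARON
# á a COMBINING ACUTE ACCENT
# é e COMBINING ACUTE ACCENT
```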

The Czech orthography is considered the model for many other Slavic languages using the Latin alphabet; the Slovene and Slovak orthographies as well as Gaj's Latin alphabet are all based on Czech orthography, in that they use similar diacritics and also have a similar relationship between the letters and the sounds they represent.

Double acute accent

The double acute accent ( ˝ ) is a diacritic mark of the Latin script. It is used primarily in written Hungarian, and consequently is sometimes referred to by typographers as Hungarumlaut. The signs formed with a regular umlaut are letters in their own right in the Hungarian alphabet—for instance, they are separate letters for the purpose of collation. Letters with the double acute, however, are considered variants of their equivalents with the umlaut, being thought of as having both an umlaut and an acute accent.
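In Unicode, the Hungarian letters with double acute decompose into a base letter plus U+030B COMBINING DOUBLE ACUTE ACCENT, while code page 852 stores them as precomposed characters (Ő, ő, Ű, ű) and also provides the spacing ˝ (U+02DD). The sketch below checks this with Python's `unicodedata` module and `cp852` codec, and shows that code page 437 cannot encode these letters.

```python
import unicodedata

for letter in "őű":
    print(letter, [unicodedata.name(c) for c in unicodedata.normalize("NFD", letter)])
# ő ['LATIN SMALL LETTER O', 'COMBINING DOUBLE ACUTE ACCENT']
# ű ['LATIN SMALL LETTER U', 'COMBINING DOUBLE ACUTE ACCENT']

print("ŐőŰű˝".encode("cp852"))  # b'\x8a\x8b\xeb\xfb\xf1'

# Code page 437 has ö and ü, but no letters with double acute.
try:
    "ő".encode("cp437")
except UnicodeEncodeError as err:
    print("not in cp437:", err.reason)
```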

ISO/IEC 8859-2

ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as "Latin-2". It is generally intended for Central or "Eastern European" languages that are written in the Latin script. Note that ISO/IEC 8859-2 is very different from code page 852 (MS-DOS Latin 2, PC Latin 2) which is also referred to as "Latin-2" in Czech and Slovak regions. Code page 912 is an extension.

ISO-8859-2 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. As of December 2018, 0.1% of all web pages used ISO 8859-2. Microsoft has assigned code page 28592, a.k.a. Windows-28592, to ISO-8859-2 in Windows. IBM assigned code page 1111 to ISO 8859-2.
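Python's codec registry reflects the naming issue described above: the informal name `latin2` resolves to ISO/IEC 8859-2, not to code page 852. Like the IANA charset, the Python codec also maps bytes 0x80–0x9F to the C1 control codes. A short check with the standard `codecs` module:

```python
import codecs

# "latin2", "ISO-8859-2" and "iso8859_2" are aliases for one and the same codec...
names = {codecs.lookup(alias).name for alias in ("latin2", "ISO-8859-2", "iso8859_2")}
print(names)

# ...which is a different codec from code page 852.
print(codecs.lookup("cp852").name in names)  # False

# Bytes 0x80-0x9F decode to the C1 control characters U+0080-U+009F.
print(bytes([0x85]).decode("iso8859_2") == "\x85")  # True
```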

Windows-1250 is similar to ISO-8859-2 and has all the printable characters it has and more. However, a few of them are at different positions (unlike Windows-1252, which keeps all printable characters from ISO-8859-1 in the same place).
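The rearrangement can be listed by encoding each printable ISO-8859-2 character with both codecs and comparing the byte values. The sketch uses Python's standard `iso8859_2` and `cp1250` codecs and relies on the statement above that every such character exists in Windows-1250 (the `encode` call would raise otherwise).

```python
moved = {}
for byte in range(0xA0, 0x100):
    ch = bytes([byte]).decode("iso8859_2")
    cp1250_byte = ch.encode("cp1250")[0]
    if cp1250_byte != byte:
        moved[ch] = (byte, cp1250_byte)

# List moved characters in ISO-8859-2 byte order.
for ch, (iso, win) in sorted(moved.items(), key=lambda item: item[1]):
    print(f"{ch}: ISO-8859-2 0x{iso:02X} -> Windows-1250 0x{win:02X}")
```

For example, Š sits at 0xA9 in ISO-8859-2 but at 0x8A in Windows-1250, while letters such as ó keep the same byte value.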

These code values can be used for the following languages:

Albanian

Bosnian

Croatian

Czech

German (fully compatible with ISO/IEC 8859-1 for German texts)

Hungarian

Polish

Serbian Latin

Slovak

Slovene

Upper Sorbian

Lower Sorbian

Turkmen

It can also be used for Romanian, but it is not well suited to that language because it lacks the letters s and t with comma below (ș, ț), although it provides the similar-looking s and t with cedilla (ş, ţ). These letters were unified in the first versions of the Unicode standard, meaning that the appearance with cedilla or with a comma was treated as a glyph choice rather than as separate characters; fonts intended for use with Romanian should therefore, in theory, have displayed the comma-below form at those code points.
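When Romanian text must be stored in ISO/IEC 8859-2, the letters with comma below (U+0218 to U+021B) cannot be encoded. One workaround, sketched here with a hypothetical `TO_CEDILLA` table, is to substitute the cedilla forms that the encoding does provide before encoding.

```python
# Hypothetical substitution table: comma-below letters -> cedilla letters.
TO_CEDILLA = str.maketrans({
    "\u0218": "\u015e",  # Ș -> Ş
    "\u0219": "\u015f",  # ș -> ş
    "\u021a": "\u0162",  # Ț -> Ţ
    "\u021b": "\u0163",  # ț -> ţ
})

text = "\u0218tiin\u021b\u0103"  # "Știință" written with comma-below letters
try:
    text.encode("iso8859_2")
except UnicodeEncodeError:
    print("comma-below letters are not in ISO 8859-2")

encoded = text.translate(TO_CEDILLA).encode("iso8859_2")
print(encoded.decode("iso8859_2"))  # Ştiinţă (cedilla forms)
```

The substitution is lossy: once stored, the text no longer records which form the author used.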

In practice, Microsoft did not provide such fonts for computers sold in Romania. Still, ISO/IEC 8859-2 and Windows-1250 (which has the same problem) have been heavily used for Romanian. Unicode subsequently disunified the comma variants from the cedilla variants, and Unicode has since become the leading encoding for web pages, although those pages often still use s and t with cedilla. As of 2014, Unicode notes that disunifying the letters with comma below was a mistake that caused corruption of Romanian data: pre-existing data and input methods still contain the older cedilla code points, complicating text searching.
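Because both forms remain in circulation, search code has to treat them as equivalent. A minimal sketch, assuming that folding everything to the comma-below forms is acceptable for the application (the names `fold_romanian` and `contains` are hypothetical):

```python
# Map the cedilla forms (Ş ş Ţ ţ) to the comma-below forms (Ș ș Ț ț).
CEDILLA_TO_COMMA = str.maketrans("\u015e\u015f\u0162\u0163", "\u0218\u0219\u021a\u021b")

def fold_romanian(text: str) -> str:
    return text.translate(CEDILLA_TO_COMMA)

def contains(haystack: str, needle: str) -> bool:
    """Substring search that treats cedilla and comma-below letters as equal."""
    return fold_romanian(needle) in fold_romanian(haystack)

legacy = "\u015etiin\u0163\u0103"  # stored with cedillas
query = "\u0219tiin\u021b\u0103"   # typed with comma below

print(contains(legacy.lower(), query.lower()))  # True
```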

Kamenický encoding

The Kamenický encoding (Czech: kódování Kamenických), named for the brothers Jiří and Marian Kamenický, was a code page for personal computers running DOS that was very popular in Czechoslovakia (since 1993, the Czech Republic and Slovakia) around 1985–1995. Another name for this encoding is KEYBCS2, the name of the terminate-and-stay-resident (TSR) utility that implemented the matching keyboard driver. It was also called KAMENICKY.

It was based on code page 437 (which has accented characters for Western European languages), with most of the characters at code points 128 to 173 replaced by Czech and Slovak characters chosen so that the glyphs of the replacement characters resembled the originals as closely as possible, e.g. č in place of ç. This ensured that text in the Kamenický encoding was (barely) readable even on older or cheap computers with the original fonts (which were often in video card ROM, making modification difficult if not impossible).

A further advantage was that the block graphic and box-drawing characters of code page 437 remained unchanged. IBM's official Central European code page 852 did not have this property, so programs such as Norton Commander displayed accented letters in place of some corners and joints of their border lines.
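Python has no built-in Kamenický codec. The sketch below illustrates only the construction principle described above: a decoder that starts from code page 437 and overrides positions in the 128–173 range, so that the box-drawing characters above that range are inherited unchanged. Only the č-for-ç substitution at 0x87 named in the text is filled in; the table `KAMENICKY_OVERRIDES` and the function `decode_kamenicky` are hypothetical, and the remaining entries would have to be taken from a published mapping.

```python
# Hypothetical, deliberately incomplete override table for bytes 128-173.
# Only 0x87 (č in place of cp437's ç) comes from the description above.
KAMENICKY_OVERRIDES = {
    0x87: "\u010d",  # č replaces ç
}

def decode_kamenicky(data: bytes) -> str:
    """Decode using cp437 as the base, applying the override table on top."""
    return "".join(
        KAMENICKY_OVERRIDES.get(b, bytes([b]).decode("cp437")) for b in data
    )

# A border line followed by the letter č: the "╤" survives, unlike under cp852.
print(decode_kamenicky(b"\xc9\xcd\xd1\xcd\xbb \x87"))  # ╔═╤═╗ č
```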

Some ambiguity exists in the official code page assignment for the Kamenický encoding:

Some dot matrix printers of the NEC Pinwriter series, namely the P3200/P3300 (P20/P30), P6200/P6300 (P60/P70), P9300 (P90), P7200/P7300 (P62/P72), P22Q/P32Q, P3800/P3900 (P42Q/P52Q), P1200/P1300 (P2Q/P3Q), P2000 (P2X) and P8000 (P72X), supported the installation of optional font EPROMs. The optional ROM #2 "East Europe" included this encoding, invokable via the escape sequence ESC R (n) with (n) = 23. While named "Kamenický" in the documentation, it was originally advertised by NEC as code page 867 (CP867) or "Czech". (However, it was never registered with IBM under that ID; in 1998 IBM registered another, unrelated code page under that ID, Israel: Hebrew, based on CP862.) The Fujitsu DL6400 (Pro) / DL6600 (Pro) printers support the Kamenický encoding as well.

The encoding was also sometimes called code page 895 (CP895), for example in FoxPro, in the WordPerfect word processor and in the Arachne web browser for DOS, but IBM uses this code page number for a different encoding, CM/Group 2: 7-bit Latin SBCS: Japanese (EUC-JP JIS-Roman) or Japan 7-Bit Latin (00895), and IANA does not recognize the number at all. The DOS code page switching file NECPINW.CPI for NEC Pinwriters supported the Kamenický encoding under both code page 867 and code page 895.

Neither IBM's code page 852 nor Microsoft's Central European code page 1250, introduced with Windows 3.1, undermined the widespread use of the Kamenický encoding. Only with Windows 95 and the spreading deployment of Microsoft Office did users begin to use code page 1250, which in turn has since been made obsolete by Unicode.
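As a hedged illustration of the printer-side selection described above, the sketch builds the escape sequence ESC R (n) with n = 23 as raw bytes and prepends it to a Kamenický-encoded payload. It assumes that (n) is sent as a single binary byte, as in Epson-style ESC R commands; the function name `select_kamenicky` is hypothetical, and the NEC printer manuals should be checked before relying on this.

```python
ESC = 0x1B

def select_kamenicky(payload: bytes) -> bytes:
    """Prefix printer data with ESC R 23, assuming n is sent as one binary byte."""
    return bytes([ESC, ord("R"), 23]) + payload

job = select_kamenicky(b"\x87\r\n")  # 0x87 is č in the Kamenický encoding
print(job.hex(" "))  # 1b 52 17 87 0d 0a
```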

Lotus Multi-Byte Character Set

The Lotus Multi-Byte Character Set (LMBCS) is a proprietary multi-byte character encoding originally conceived in 1988 at Lotus Development Corporation with input from Bob Balaban and others. Created around the same time as Unicode and addressing some of the same problems, LMBCS could be viewed as a parallel development and possible alternative to Unicode. For maximum compatibility, later issues of LMBCS incorporate UTF-16 as a subset.

Commercially, LMBCS was first introduced as the default character set of Lotus 1-2-3 Release 3 for DOS in March 1989 and of Lotus 1-2-3/G Release 1 for OS/2 in 1990, replacing the 8-bit Lotus International Character Set (LICS) and ASCII used in earlier DOS-only versions of Lotus 1-2-3 and Symphony. LMBCS is also used in IBM/Lotus SmartSuite, Notes and Domino, as well as in a number of third-party products.

LMBCS encodes the characters required for languages using the Latin, Arabic, Hebrew, Greek and Cyrillic scripts, the Thai, Chinese, Japanese and Korean writing systems, and technical symbols.

Mazovia encoding

Mazovia encoding is used under DOS to represent Polish texts. Basically it is code page 437 with some positions filled with Polish letters. An important feature was that the block graphic characters of code page 437 remained unchanged. In contrast, IBM's official Central-European code page 852 did not preserve all block graphics, causing incorrect display in programs such as Norton Commander.
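The gap that Mazovia filled is easy to confirm with Python's codecs: of the 18 Polish letters with diacritics, code page 437 contains only ó, while code page 852 contains all of them. Python has no built-in Mazovia codec, so it is not shown; the helper name `missing_from` is hypothetical.

```python
POLISH = "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ"

def missing_from(codec: str, letters: str = POLISH) -> str:
    """Return the letters that the given code page cannot encode."""
    missing = []
    for ch in letters:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return "".join(missing)

print(missing_from("cp437"))  # ąćęłńśźżĄĆĘŁŃÓŚŹŻ
print(missing_from("cp852"))  # prints an empty line
```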

The Mazovia encoding was designed in 1984 by Jan Klimowicz of IMM. It was designed as part of a project to develop and produce a Polish IBM PC clone codenamed "Mazovia 1016". The code page was therefore optimized for that computer's typical peripheral devices: a graphics card with dual switchable graphics, a keyboard using US English and Russian layouts, and printers with Polish fonts. In 1986, the Polish National Bank (NBP) adopted the Mazovia encoding as a standard, thereby causing its widespread acceptance and distribution in Poland. They were also instrumental in Ipaco producing compatible computers with Taiwanese components under the direction of Zbigniew Jakubas and Krzysztof Sochacki.

Some ambiguity exists in the official code page assignment for the Mazovia encoding:

PTS-DOS and S/DOS support this encoding under code page 667 (CP667). The same encoding was also called code page 991 (CP991) in some Polish software; however, the FreeDOS implementation of code page 991 does not appear to be identical to the original encoding.

The DOS code page switching file NECPINW.CPI for NEC Pinwriters supports the Mazovia encoding under both code pages 667 and 991. FreeDOS has meanwhile introduced support for the original Mazovia encoding under code page 790 (CP790) as well. The Fujitsu DL6400 (Pro) / DL6600 (Pro) printers support the Mazovia encoding as well.


This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.