In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the Unicode Consortium. Currently, three private use areas are defined: one in the Basic Multilingual Plane (U+E000–U+F8FF), and one each in, and nearly covering, planes 15 and 16 (U+F0000–U+FFFFD, U+100000–U+10FFFD). The code points in these areas cannot be considered as standardized characters in Unicode itself. They are intentionally left undefined so that third parties may define their own characters without conflicting with Unicode Consortium assignments. Under the Unicode Stability Policy, the Private Use Areas will remain allocated for that purpose in all future Unicode versions.
Assignments to Private Use Area characters need not be "private" in the sense of strictly internal to an organisation; a number of assignment schemes have been published by several organisations. Such publication may include a font that supports the definition (showing the glyphs), and software making use of the private-use characters (e.g. a graphics character for a "print document" function). By definition, multiple private parties may assign different characters to the same code point, with the consequence that a user may see one private character from an installed font where a different one was intended.
Under the Unicode definition, code points in the Private Use Areas are assigned characters: they are not noncharacters, reserved, or unassigned. Their category is "Other, private use (Co)", and no character names are specified. No representative glyphs are provided, and character semantics are left to private agreement.
Private-use characters are assigned Unicode code points whose interpretation is not specified by this standard and whose use may be determined by private agreement among cooperating users. These characters are designated for private use and do not have defined, interpretable semantics except by private agreement.
No charts are provided for private-use characters, as any such characters are, by their very nature, defined only outside the context of this standard.
In the Basic Multilingual Plane (plane 0), the block titled Private Use Area has 6400 code points. Planes 15 and 16 are almost entirely assigned to two further Private Use Areas, Supplementary Private Use Area-A and Supplementary Private Use Area-B respectively; only the last two code points of each plane, which are noncharacters, are excluded.
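The ranges and sizes described above can be checked with a little arithmetic. The following is a minimal sketch; the names `PUA_RANGES`, `pua_block`, and `block_size` are illustrative and not part of any standard API.

```python
from typing import Optional

# Inclusive (first, last) code points of the three Private Use Areas.
PUA_RANGES = {
    "Private Use Area": (0xE000, 0xF8FF),
    "Supplementary Private Use Area-A": (0xF0000, 0xFFFFD),
    "Supplementary Private Use Area-B": (0x100000, 0x10FFFD),
}


def pua_block(code_point: int) -> Optional[str]:
    """Return the name of the PUA block containing code_point, or None."""
    for name, (first, last) in PUA_RANGES.items():
        if first <= code_point <= last:
            return name
    return None


def block_size(name: str) -> int:
    """Number of code points in the named block (the range is inclusive)."""
    first, last = PUA_RANGES[name]
    return last - first + 1
```

For instance, `pua_block(0xFFFFE)` returns `None`, because the last two code points of plane 15 lie outside Supplementary Private Use Area-A.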
Many people and institutions have created character collections for the PUA. Some of these private use agreements are published, so other PUA implementers can aim for unused or less used code points to prevent overlaps. Several characters and scripts previously encoded in private use agreements have actually been fully encoded in Unicode, necessitating mappings from the PUA to other Unicode code points.
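When a script moves from a private agreement into Unicode proper, existing text has to be remapped. The sketch below shows one way this could be done in Python with `str.translate`. The mapping table is purely hypothetical: both the PUA code points and their targets are placeholders chosen for illustration, not the assignments of any real agreement.

```python
# Hypothetical migration table: PUA code point -> officially encoded code point.
# These pairs are placeholders, not taken from any published agreement.
LEGACY_PUA_TO_OFFICIAL = {
    0xE000: 0x10450,
    0xE001: 0x10451,
    0xE002: 0x10452,
}


def migrate(text: str) -> str:
    """Replace mapped private-use characters; leave everything else untouched."""
    return text.translate(LEGACY_PUA_TO_OFFICIAL)
```

Private-use characters that are absent from the table pass through unchanged, which keeps the migration safe for text that mixes several private agreements.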
One of the more well-known and broadly implemented PUA agreements is maintained by the ConScript Unicode Registry (CSUR). The CSUR, which is not officially endorsed or associated with the Unicode Consortium, provides a mapping for constructed scripts, such as Klingon pIqaD and Ferengi script (Star Trek), Tengwar and Cirth (J.R.R. Tolkien's cursive and runic scripts), Alexander Melville Bell's Visible Speech, and Dr. Seuss' alphabet from On Beyond Zebra. The CSUR previously encoded the undeciphered Phaistos characters, as well as the Shavian and Deseret alphabets, which have all been accepted for official encoding in Unicode.
Another common PUA agreement is maintained by the Medieval Unicode Font Initiative (MUFI). This project is attempting to support all of the scribal abbreviations, ligatures, precomposed characters, symbols, and alternate letterforms found in medieval texts written in the Latin alphabet. The express purpose of MUFI is to experimentally determine which characters are necessary to represent these texts, and to have those characters officially encoded in Unicode. As of Unicode version 5.1, 152 MUFI characters have been incorporated into the official Unicode encoding.
Some agreed-upon PUA character collections exist in part or whole because the Unicode Consortium is in no hurry to encode them. Some, such as unrepresented languages, are likely to end up encoded in the future. Some unusual cases such as fictional languages are outside the usual scope of Unicode but not explicitly ruled out by the principles of Unicode, and may show up eventually (such as the Star Trek and Tolkien writing systems). In other cases, the proposed encoding violates one or more Unicode principles and hence is unlikely to ever be officially recognized by Unicode. This mostly applies where users want to directly encode alternate forms, ligatures, or base-character-plus-diacritic combinations (such as the TUNE scheme).
|Publishing organisation||Topic||PUA area used||Font|
|CSUR||Artificial scripts||PUA (BMP) and Plane 15||Code2000|
|MUFI||Medieval scripts||PUA (BMP)||several|
|SIL||Phonetics and languages||PUA (BMP)||Charis SIL|
|TITUS||Ancient and medieval scripts||PUA (BMP)||TITUS Cyberbit Basic|
Informally, the range U+F000 through U+F8FF is known as the Corporate Use Area.
U+F000 is a numeral succession starting at 13 or 18 in some video games like Agar.io.
U+E0FF is displayed as the "Circle Of Friends" logo and U+F200 is "ubuntu" in the Ubuntu typeface with a superscripted "Circle Of Friends".
U+E000 displays Tux, the mascot of Linux.
U+E003 is displayed as the Mozilla logo (the dinosaur head).
U+F8FE) in the Private Use Area for symbols not defined in Unicode. Of these,
U+F8FB is known to be reserved for a crown currency symbol ("Kr"), and
U+F8FD were later mapped to
There are three PUA blocks in Unicode.
|Private Use Area|
(6,400 code points)
|Assigned||6,400 code points|
|Unused||0 reserved code points|
|Unicode version history|
|Note: Version 1.0.1 moved and expanded the Private Use Area block (previously located at U+E800-U+FDFF in version 1.0.0).|
|Supplementary Private Use Area-A|
(65,536 code points)
|Assigned||65,534 code points|
|Unused||0 reserved code points |
|Supplementary Private Use Area-B|
(65,536 code points)
|Assigned||65,534 code points|
|Unused||0 reserved code points |
The concept of reserving specific code points for Private Use is based on similar earlier usage in other character sets. In particular, many otherwise obsolete characters in East Asian scripts continue to be used in specific names or other situations, and so some character sets for those scripts made allowance for private-use characters (such as the user-defined planes of CNS 11643, or gaiji in certain Japanese encodings). The Unicode standard references these uses under the name "End User Character Definition" (EUCD).
Additionally, the C1 control block contains two codes intended for private use "control functions" by ECMA-48: 0x91 private use one (PU1) and 0x92 private use two (PU2). Unicode includes these at U+0091 <control-0091> and U+0092 <control-0092> but defines them as control characters (category Cc), not private-use characters (category Co).
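The difference in classification can be observed with Python's standard `unicodedata` module, which reports the general category of a character. This is a small sketch; the helper name `general_category` is an assumption made for the example.

```python
import unicodedata


def general_category(code_point: int) -> str:
    """Return the two-letter Unicode general category for a code point."""
    return unicodedata.category(chr(code_point))


# PU1 and PU2 are control characters, while PUA code points are private use.
for cp in (0x0091, 0x0092, 0xE000, 0xF0000, 0x10FFFD):
    print(f"U+{cp:04X}: {general_category(cp)}")
```

The two C1 codes report `Cc`, while the three Private Use Area code points report `Co`.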
Encodings which do not have private use areas but have more or less unused areas, such as ISO/IEC 8859 and Shift JIS, have seen uncontrolled variants of these encodings evolve. For Unicode, software companies can use the Private Use Areas for their desired additions.
Invalid NTFS filename characters are encoded using the SFM (Services for Macintosh) private use Unicode characters.
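The general idea of hiding disallowed characters behind private-use code points can be sketched as follows. Note the assumptions: the offset `PUA_BASE` and the resulting code points are purely illustrative and are not the actual SFM mapping table, and the function names are hypothetical.

```python
# Characters that Windows does not allow in NTFS file names.
INVALID_CHARS = '"*/:<>?\\|'

# Assumption: an illustrative base inside the BMP Private Use Area.
# This is NOT the real SFM table, only a stand-in for the technique.
PUA_BASE = 0xF000

# Forward and reverse translation tables, keyed by code point.
ESCAPE = {ord(ch): PUA_BASE + ord(ch) for ch in INVALID_CHARS}
UNESCAPE = {pua: plain for plain, pua in ESCAPE.items()}


def escape_filename(name: str) -> str:
    """Swap each disallowed character for its private-use stand-in."""
    return name.translate(ESCAPE)


def unescape_filename(name: str) -> str:
    """Restore the original characters from their private-use stand-ins."""
    return name.translate(UNESCAPE)
```

Because the stand-ins are private-use characters, they only round-trip correctly between parties that share the same table, which is exactly the "private agreement" the Private Use Areas rely on.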
Code2000 is a serif and pan-Unicode digital font, which includes characters and symbols from a very large range of writing systems. Its final version, 1.171, was released in 2008. Code2000 was designed and implemented by James Kass to include as much of the Unicode 5.2 standard as practical (by comparison, 12.0 is the currently released version), and to support OpenType digital typography features. Code2000 supports the Basic Multilingual Plane. Code2001 and Code2002, related beta fonts created by James Kass, support characters in higher Unicode planes.
The Code2000 font was available as unrestricted shareware, and the Code2001 and Code2002 fonts as freeware, from the author's website until January 2011. The website subsequently went down, and the domain name was later taken by an Australian programming site. As of December 2011 there is no known official download site for the fonts.
Code point
In character encoding terminology, a code point or code position is any of the numerical values that make up the code space. Many code points represent single characters but they can also have other meanings, such as for formatting. For example, the character encoding scheme ASCII comprises 128 code points in the hexadecimal range 0 to 7F, Extended ASCII comprises 256 code points in the hexadecimal range 0 to FF, and Unicode comprises 1,114,112 code points in the hexadecimal range 0 to 10FFFF. The Unicode code space is divided into seventeen planes (the basic multilingual plane, and 16 supplementary planes), each with 65,536 (= 2^16) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
Constructed script
A constructed script is a new writing system specifically created by an individual or group, rather than having evolved as part of a language or culture like a natural script. Some are designed for use with constructed languages, although several of them are used in linguistic experimentation or for other more practical ends in existing languages.
The most prominent of constructed scripts may be Glagolitic, Korean Hangul and the International Phonetic Alphabet. Some, such as the Shavian alphabet, Quikscript, Alphabet 26, and the Deseret alphabet, were devised as English spelling reforms. Others, including Alexander Melville Bell's Visible Speech and John Malone's Unifon, were developed for pedagogical use. Blissymbols were developed as a written international auxiliary language. Shorthand systems may be considered constructed scripts.
Geometric Shapes
Geometric Shapes is a Unicode block of 96 symbols at code point range U+25A0-25FF.
ISO 14651
ISO/IEC 14651:2016, Information technology -- International string ordering and comparison -- Method for comparing character strings and description of the common template tailorable ordering, is an ISO Standard specifying an algorithm that can be used when comparing two strings. This comparison can be used when collating a set of strings. The standard also specifies a datafile specifying the comparison order, the Common Tailorable Template, CTT. The comparison order is supposed to be tailored for different languages (hence the CTT is regarded as a template and not a default, though the empty tailoring, not changing any weighting, is appropriate in many cases), since different languages have incompatible ordering requirements. One such tailoring is European ordering rules (EOR), which in turn is supposed to be tailored for different European languages.
The Common Tailorable Template (CTT) datafile of this ISO Standard is aligned with the Default Unicode Collation Element Table (DUCET) datafile of the Unicode Collation Algorithm (UCA) specified in Unicode Technical Standard #10.
This is the fourth edition of the standard. It was published on 2016-02-15, corrected on 2016-05-01, and covers up to and including Unicode 8.0. One additional amendment, Amd.1:2017, was published in September 2017 and covers up to and including Unicode 9.0.
Ideographic Rapporteur Group
The Ideographic Rapporteur Group (IRG) is a subgroup of the ISO/IEC JTC 1/SC 2 working group WG2.
International Ideographs Core
International Ideographs Core (IICore) is a subset of up to ten thousand CJK Unified Ideographs characters, which can be implemented on devices whose limited memory and capability make it infeasible to implement the full ISO 10646/Unicode standard.
Left-to-right mark
The left-to-right mark (LRM) is a control character (an invisible formatting character) used in computerized typesetting (including word processing in a program like Microsoft Word) of text that contains a mixture of left-to-right text (such as English or Russian) and right-to-left text (such as Arabic, Persian or Hebrew). It is used to set the way adjacent characters are grouped with respect to text direction.
List of precomposed Latin characters in Unicode
This is a list of precomposed Latin characters in Unicode. Unicode typefaces may be needed for these to display correctly.
Private use area
Private use area may refer to:
ISO/IEC 10646 / Unicode Private Use Areas
ISO 639-3 Private Use Area: qaa to qtz
ISO 15924 Private Use Area: Qaaa to Qabx
ISO 3166-1 alpha-2 user-assigned code elements: AA, QM-QZ, XA-XZ, ZZ
Address space in internet addressing for private networks
Specials (Unicode block)
Specials is a short Unicode block allocated at the very end of the Basic Multilingual Plane, at U+FFF0–FFFF. Of these 16 code points, five are assigned as of Unicode 12.0:
U+FFF9 INTERLINEAR ANNOTATION ANCHOR, marks start of annotated text
U+FFFA INTERLINEAR ANNOTATION SEPARATOR, marks start of annotating character(s)
U+FFFB INTERLINEAR ANNOTATION TERMINATOR, marks end of annotation block
U+FFFC ￼ OBJECT REPLACEMENT CHARACTER, placeholder in the text for another unspecified object, for example in a compound document.
U+FFFD � REPLACEMENT CHARACTER used to replace an unknown, unrecognized or unrepresentable character
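As one illustration of U+FFFD in practice, Python's built-in decoder substitutes it for bytes that cannot be decoded when asked to use the `replace` error handler. This is a minimal sketch; the helper name `decode_lossy` is an assumption made for the example.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def decode_lossy(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for any undecodable byte."""
    return data.decode("utf-8", errors="replace")


# 0xFF can never appear in well-formed UTF-8, so it is replaced.
text = decode_lossy(b"caf\xc3\xa9 \xff")
print(REPLACEMENT_CHARACTER in text)
```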
UTF-1 is one way of transforming ISO 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes searching for substrings and error recovery difficult. It reuses the ASCII printing characters for multi-byte encodings, making it unsuited for some uses (for instance Unix filenames cannot contain the byte value used for forward slash). UTF-1 is also slow to encode or decode due to its use of division and multiplication by a number which is not a power of 2. Due to these issues, it did not gain acceptance and was quickly replaced by UTF-8.
Unicode collation algorithm
The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Standard #10, which defines a customizable method to compare two strings. These comparisons can then be used to collate or sort text in any writing system and language that can be represented with Unicode.
Unicode Technical Standard #10 also specifies the Default Unicode Collation Element Table (DUCET). This datafile specifies the default collation ordering. The DUCET is customizable for different languages. Some such customisations can be found in Common Locale Data Repository (CLDR).
An important open source implementation of UCA is included with the International Components for Unicode, ICU. ICU also supports tailoring and the collation tailorings from CLDR are included in ICU. You can see the effects of tailoring and a large number of language specific tailorings in the on-line ICU Locale Explorer.
Voiceless palatal lateral fricative
The voiceless palatal lateral fricative is a type of consonantal sound, used in a few spoken languages.
This sound is somewhat rare; Dahalo has both a palatal lateral fricative and an affricate; Hadza has a series of affricates. In Bura, it is the realization of palatalized /ɬʲ/ and contrasts with [ʎ].
The IPA has no dedicated symbol for this sound. The devoicing and raising diacritics may be used to transcribe it: ⟨ʎ̝̊⟩. However, the "belt" on the existing symbol for a voiceless lateral fricative, ⟨ɬ⟩, forms the basis for other lateral fricatives used in the extIPA, including the palatal.
SIL International has added this symbol to the Private Use Areas of their Gentium, Charis, and Doulos fonts, as U+F267.
If distinction is necessary, the voiceless alveolo-palatal lateral fricative may be transcribed as ⟨ɬ̠ʲ⟩ (retracted and palatalized ⟨ɬ⟩) or ⟨ʎ̝̊˖⟩ (devoiced, advanced and raised ⟨ʎ⟩); these are essentially equivalent, since the contact includes both the blade and body (but not the tip) of the tongue. The equivalent X-SAMPA symbols are K_-_j or K_-' and L_0_+_r, respectively. A non-IPA letter ⟨ ⟩ can also be used, and so can the non-IPA ⟨ȴ̊˔⟩ (devoiced and raised ⟨ȴ⟩, which is an ordinary "l", plus the curl found in the symbols for alveolo-palatal sibilant fricatives ⟨ɕ, ʑ⟩).
Voiceless velar lateral affricate
The voiceless velar lateral affricate is an uncommon speech sound found as a phoneme in the Caucasus and as an allophone in several languages of eastern and southern Africa.
Archi, a Northeast Caucasian language of Dagestan, has two such affricates, plain [k͡ʟ̝̊] and labialized [k͡ʟ̝̊ʷ], though they are further forward than velars in most languages, and might better be called prevelar. Archi also has ejective variants of its lateral affricates, several voiceless lateral fricatives, and a voiced lateral fricative at the same place of articulation, but no alveolar lateral fricatives or affricates.
Zulu and Xhosa have a voiceless lateral affricate as an allophone of their voiceless velar affricate. Hadza has an ejective velar lateral affricate as an allophone of its velar ejective affricate. Indeed, in Hadza this [k͡ʟ̝̊ʼ] contrasts with a palatal lateral ejective affricate, [c͡ʎ̝̊ʼ]. ǁXegwi is reported to have contrasted velar /k͡ʟ̝̊/ from alveolar /t͜ɬ/.
Laghuu, a Loloish language of Vietnam, contrasts four velar lateral affricates, /k͡ʟ̝̊ʰ, k͡ʟ̝̊, ɡ͡ʟ̝, ᵑɡ͡ʟ̝/.
The IPA has no separate symbol for the fricative element of these sounds, but SIL International has added a symbol, ⟨⟩, to the Private Use Areas of their Gentium, Charis and Doulos fonts, at U+F268. Thus the fricatives can be written ⟨k͡⟩.
Voiceless velar lateral fricative
The voiceless velar lateral fricative is a very rare speech sound. As one element of an affricate, it is found for example in Zulu and Xhosa (see velar lateral ejective affricate). However, a simple fricative has only been reported from a few languages in the Caucasus and New Guinea.
Archi, a Northeast Caucasian language of Dagestan, has four voiceless velar lateral fricatives: plain [ʟ̝̊], labialized [ʟ̝̊ʷ], fortis [ʟ̝̊ː], and labialized fortis [ʟ̝̊ːʷ]. Although clearly fricatives, these are further forward than velars in most languages, and might better be called prevelar. Archi also has a voiced fricative, as well as a voiceless and several ejective lateral velar affricates, but no alveolar lateral fricatives or affricates.
In New Guinea, some of the Chimbu–Wahgi languages such as Melpa, Middle Wahgi, and Nii, have a voiceless velar lateral fricative, which they write with a double-bar el (Ⱡ, ⱡ). This sound also appears in syllable coda position as an allophone of the voiced velar lateral fricative in Kuman.
The IPA has no separate symbol for these sounds, but they can be transcribed as a devoiced raised velar lateral approximant, ⟨ʟ̝̊⟩ (here the devoicing ring diacritic is placed above the letter to avoid clashing with the raising diacritic). By analogy with existing IPA laterals, a small capital Ɬ is used in the extIPA.
SIL International has added these symbols to the Private Use Areas of their Gentium, Charis and Doulos fonts, at U+F268.
Z-variant
In Unicode, two glyphs are said to be Z-variants (often spelled zVariants) if they share the same etymology but have slightly different appearances and different Unicode code points. For example, the Unicode characters U+8AAA 說 and U+8AAC 説 are Z-variants. The notion of Z-variance is only applicable to the “CJKV scripts” (Chinese, Japanese, Korean and Vietnamese) and is a subtopic of Han unification.
Zero-width joiner
The zero-width joiner (ZWJ) is a non-printing character used in the computerized typesetting of some complex scripts such as the Arabic script or any Indic script. When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms.
In some cases, such as the second Devanagari example below, the ZWJ follows the second rather than the first character.
When a ZWJ is placed between two emoji characters, it can also result in a new form being shown, such as the family emoji, made up of two adult emoji and one or two child emoji.
The character's code point is U+200D ZERO WIDTH JOINER (HTML &#8205; or &zwj;). In the InScript keyboard layout for Indian languages, it is typed by the key combination Ctrl+Shift+1. However, many layouts use the ']' key for this character.
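The emoji case can be sketched in a few lines of Python: a family sequence is simply the individual emoji joined by U+200D. The helper name `join_emoji` is an assumption for illustration, and whether the result renders as a single glyph depends on font and platform support.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def join_emoji(*parts: str) -> str:
    """Join emoji with ZWJ so that supporting renderers may show one glyph."""
    return ZWJ.join(parts)


# MAN (U+1F468), WOMAN (U+1F469), GIRL (U+1F467) joined into one sequence.
family = join_emoji("\U0001F468", "\U0001F469", "\U0001F467")
print([f"U+{ord(ch):04X}" for ch in family])
```

The resulting string still contains five code points; the joiner only requests a combined presentation and does not create a new character.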
|Unicode: Private Use Areas|
|Definition by character property: |
|Range||Plane||Block name||Number of code points||Note|
|U+E000..U+F8FF||BMP (0)||Private Use Area||6,400|| |
|U+F0000..U+FFFFD||PUP (15)||Supplementary Private Use Area-A||65,534||In UTF-16, characters in planes 15 and 16 are encoded as surrogate pairs whose high surrogate comes from the block High Private Use Surrogates (U+DB80..U+DBFF) in the BMP.|
|U+100000..U+10FFFD||PUP (16)||Supplementary Private Use Area-B||65,534|| |
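The connection between planes 15 and 16 and the High Private Use Surrogates block follows from the UTF-16 surrogate arithmetic, sketched below. The function name `utf16_surrogates` is an assumption for illustration; the arithmetic itself is the standard UTF-16 encoding of supplementary code points.

```python
from typing import Tuple


def utf16_surrogates(code_point: int) -> Tuple[int, int]:
    """Return the (high, low) UTF-16 surrogates for a supplementary code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogate pairs")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# First code point of plane 15 and last private-use code point of plane 16.
for cp in (0xF0000, 0x10FFFD):
    high, low = utf16_surrogates(cp)
    print(f"U+{cp:X} -> {high:04X} {low:04X}")
```

The high surrogates for the two supplementary Private Use Areas span exactly U+DB80 through U+DBFF, which is why that block carries the "Private Use" name.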