Uralic Phonetic Alphabet

The Uralic Phonetic Alphabet (UPA) or Finno-Ugric transcription system is a phonetic transcription or notational system used predominantly for the transcription and reconstruction of Uralic languages. It was first published in 1901 by Eemil Nestor Setälä, a Finnish linguist.

Unlike the International Phonetic Alphabet (IPA) notational standard, which concentrates on accurately and uniquely transcribing the phonemes of a language, the UPA is also used to denote the functional categories of a language as well as their phonetic quality. For this reason, it is not possible to automatically convert a UPA transcription into an IPA one.

The basic UPA characters are based on the Finnish alphabet where possible, with extensions taken from Cyrillic and Greek orthographies. Small-capital letters and some novel diacritics are also used.


Unlike the IPA, which is usually written in upright characters, the UPA is usually written in italics. Although many of its characters are also used in standard Latin, Greek, or Cyrillic orthographies or in the IPA, and are found in the corresponding Unicode blocks, many are not. These have been encoded in the Phonetic Extensions and Phonetic Extensions Supplement blocks. Font support for these extended characters is very rare; Code2000 and Fixedsys Excelsior are two fonts that do support them. Andron Mega is a professional font that contains them; it supports UPA characters in regular and italic styles.
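As a rough illustration of the block layout described above, the following Python sketch reports which of the two blocks a character belongs to. The ranges U+1D00–U+1D7F and U+1D80–U+1DBF are the ranges of the two Unicode blocks; the function name `upa_block` and the sample characters are chosen here purely for illustration.

```python
# Code point ranges of the two Unicode blocks named in the text.
PHONETIC_BLOCKS = {
    "Phonetic Extensions": (0x1D00, 0x1D7F),
    "Phonetic Extensions Supplement": (0x1D80, 0x1DBF),
}


def upa_block(char):
    """Return the name of the phonetic block containing char, or None.

    Hypothetical helper for illustration; it only knows the two blocks above.
    """
    code_point = ord(char)
    for name, (first, last) in PHONETIC_BLOCKS.items():
        if first <= code_point <= last:
            return name
    return None


# U+1D15 and U+1D1E are UPA letters; U+0161 (s with caron) lives in an
# ordinary Latin block, so the helper returns None for it.
for ch in ["\u1D15", "\u1D1E", "\u0161"]:
    print(f"U+{ord(ch):04X} -> {upa_block(ch)}")
```

A character for which the helper returns None is not necessarily unused in the UPA; it is simply encoded in one of the ordinary Latin, Greek, Cyrillic or IPA blocks.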


A vowel to the left of a dot is illabial (unrounded); one to the right is labial (rounded).

  Palatal Central Velar
i • ü
e • ö

Other vowels are denoted using diacritics.

The UPA also uses three characters to denote a vowel of uncertain quality:

  • ɜ denotes a vowel of uncertain quality;
  • ᴕ denotes a back vowel of uncertain quality;
  • ᴕ̈ denotes a front vowel of uncertain quality.

If a distinction between close-mid vowels and open-mid vowels is needed, the IPA symbols for the open-mid basic front illabial and back labial vowels, ⟨ɛ⟩ and ⟨ɔ⟩, can be used. However, in keeping with the principles of the UPA, the open-mid front labial and back illabial vowels are still transcribed with the addition of diacritics, as ⟨ɔ̈⟩ and ⟨ɛ̮⟩.


The following table describes the consonants of the UPA. Note that the UPA does not distinguish voiced fricatives from approximants, and does not contain many characters of the IPA such as [ɹ].

UPA consonants
  Stop Fricative Lateral Trill Nasal Click
Bilabial p ʙ b φ β ψ m p˿ b˿
Labiodental ʙ͔ f v ᴍ͔
Dental ϑ δ
Alveolar t d s z š ž ʟ l ʀ r ɴ n t˿ d˿
Dentipalatal (palatalised) ť ᴅ́ ď ś ᴢ́ ź š́ ž́ ʟ́ ĺ ʀ́ ŕ ɴ́ ń  
Prepalatal (palatalised or anterior) ɢ́ ǵ χ́ j ᴎ́ ŋ́
Velar k ɢ g χ γ ŋ k˿ g˿
Postvelar ɢ͔ χ͔ γ͔ ᴎ͔ ŋ͔
Uvular ρ

When there are two or more consonants in a column, the rightmost one is voiced; when there are three, the centre one is partially devoiced.

ˀ denotes a voiced velar spirant.

ᴤ denotes a voiced laryngeal spirant.


UPA modifier characters
Character  Unicode  Description                        Use
ä          U+0308   umlaut above                       Palatal (fully front) vowel
ạ          U+0323   dot below                          Palatal (fronted) variant of vowel
a̮          U+032E   breve below                        Velar (fully back or backed) vowel or variant of vowel
ā          U+0304   macron                             Long form of a vowel; also by duplication
a͔          U+0354   left arrowhead below               Retracted form of a vowel or consonant
a͕          U+0355   right arrowhead below              Advanced form of a vowel or consonant
a̭          U+032D   circumflex below                   Raised variant of a vowel
a̬          U+032C   caron below                        Lowered variant of a vowel
ă          U+0306   breve                              Shorter or reduced vowel
a̯          U+032F   inverted breve below               Non-syllabic, glide or semi-vowel
ʀ          U+0280   small capital                      Unvoiced or partially voiced version of voiced sound
-          -        superscripted character            Very short sound
-          -        subscripted character              Coarticulation due to surrounding sounds
ᴞ          U+1D1E   rotated (180°) or sideways (−90°)  Reduced form of sound
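The combining marks in the table can be attached to any base letter by appending the mark's code point to it. The sketch below, which uses Python's standard `unicodedata` module, shows this for the letter a; the dictionary `UPA_COMBINING_MARKS` and the helper `apply_mark` are names introduced here for illustration, and the descriptions are copied from the table.

```python
import unicodedata

# Combining marks listed in the modifier table, keyed by code point.
UPA_COMBINING_MARKS = {
    "\u0308": "palatal (fully front) vowel",
    "\u0323": "palatal (fronted) variant of vowel",
    "\u032E": "velar (fully back or backed) vowel or variant of vowel",
    "\u0304": "long form of a vowel",
    "\u0354": "retracted form of a vowel or consonant",
    "\u0355": "advanced form of a vowel or consonant",
    "\u032D": "raised variant of a vowel",
    "\u032C": "lowered variant of a vowel",
    "\u0306": "shorter or reduced vowel",
    "\u032F": "non-syllabic, glide or semi-vowel",
}


def apply_mark(base, mark):
    """Attach a combining mark to a base letter and return the NFC form."""
    return unicodedata.normalize("NFC", base + mark)


# Print each mark's code point, its Unicode name, and the marked letter a.
for mark, use in UPA_COMBINING_MARKS.items():
    name = unicodedata.name(mark)
    print(f"U+{ord(mark):04X} {name}: {apply_mark('a', mark)} ({use})")
```

Normalisation to NFC yields a single precomposed character where Unicode has one (as for a with umlaut or macron) and otherwise leaves the base letter followed by the combining mark, which is one reason why rendering depends on font support.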

For diphthongs, triphthongs and prosody, the Uralic Phonetic Alphabet uses several forms of the tie or double breve:[1][2]

  • The triple inverted breve or triple breve below indicates a triphthong
  • The double inverted breve, also known as the ligature tie, marks a diphthong
  • The double inverted breve below indicates a syllable boundary between vowels
  • The undertie is used for prosody
  • The inverted undertie is used for prosody.

Differences from IPA

A major difference is that IPA notation distinguishes phonetic from phonemic transcription by enclosing the transcription in brackets, [aɪ pʰiː eɪ], or in slashes, /ai pi e/, respectively. The UPA instead used italics for the former and semi-bold type for the latter.[3]

For phonetic transcription, numerous small differences from the IPA become relevant:

Sound UPA IPA
Close-mid back rounded vowel [o]
Mid back rounded vowel o [o̞] or [ɔ̝]
Open-mid back rounded vowel or å̭  [ɔ]
Voiced dental fricative δ [ð]
Alveolar tap ð [ɾ]
Voiceless alveolar lateral approximant ʟ [l̥]
Velar lateral approximant л [ʟ]
Voiceless alveolar nasal ɴ [n̥]
Uvular nasal ŋ͔ [ɴ]
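The rows of the table that give both a UPA and an IPA symbol can be written down as a simple lookup, sketched below in Python. This is only a symbol-level illustration under that assumption, not a converter: as noted earlier, a UPA transcription cannot be converted to IPA automatically in general. The names `UPA_TO_IPA` and `lookup` are introduced here for illustration.

```python
# UPA symbol -> IPA symbol, taken only from table rows that list both.
UPA_TO_IPA = {
    "o": "o\u031E",            # mid back rounded vowel (table also allows ɔ̝)
    "\u03B4": "\u00F0",        # δ -> ð, voiced dental fricative
    "\u00F0": "\u027E",        # ð -> ɾ, alveolar tap
    "\u029F": "l\u0325",       # ʟ -> l̥, voiceless alveolar lateral approximant
    "\u043B": "\u029F",        # л -> ʟ, velar lateral approximant
    "\u0274": "n\u0325",       # ɴ -> n̥, voiceless alveolar nasal
    "\u014B\u0354": "\u0274",  # ŋ͔ -> ɴ, uvular nasal
}


def lookup(upa_symbol):
    """Return the IPA counterpart of a single UPA symbol, or None."""
    return UPA_TO_IPA.get(upa_symbol)


# The same glyph can mean different sounds in the two systems:
# ʟ is a voiceless alveolar lateral in UPA but a velar lateral in IPA.
print(lookup("\u029F"), lookup("\u043B"))
```

The lookup makes the main pitfall visible: ð, ʟ and ɴ exist in both alphabets with different values, so a transcription has to be labelled as UPA or IPA before any symbol in it is interpreted.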


This section contains some sample words from both Uralic languages and English (using Australian English) along with comparisons to the IPA transcription.

Sample UPA words
Language UPA IPA Meaning
English šᴉp [ʃɪp] 'ship'
English rän [ɹæn] 'ran'
English ʙo̭o̭d [b̥oːd] 'bored'
Moksha və̂ďän [vɤ̈dʲæn] 'I sow'
Udmurt miśkᴉ̑nᴉ̑ [misʲkɪ̈nɪ̈] 'to wash'
Forest Nenets ŋàrŋū̬"ᴲ [ŋɑˑrŋu̞ːʔə̥] 'nostril'
Hill Mari pᴞ·ń(ᴅ́ᴢ̌́ö̭ [ˈpʏnʲd̥͡ʑ̥ø] 'pine'
Skolt Sami pŭə̆ī̮ᵈt̄ėi [pŭə̆ɨːd̆tːəi] 'ermine'


  • Setälä, E. N. (1901). "Über transskription der finnisch-ugrischen sprachen". Finnisch-ugrische Forschungen (in German). Helsingfors, Leipzig (1): 15–52.
  • Sovijärvi, Antti; Peltola, Reino (1970). "Suomalais-ugrilainen tarkekirjoitus" (PDF). Helsingin yliopiston fonetiikan laitoksen julkaisuja (in Finnish). University of Helsinki (9). hdl:10224/4089.
  • Posti, Lauri; Itkonen, Terho (1973). "FU-transkription yksinkertaistaminen. Az FU-átírás egyszerüsítése. Zur Vereinfachung der FU-Transkription. On Simplifying of the FU-transcription". Castrenianumin toimitteita. University of Helsinki (7). ISBN 951-45-0282-5. ISSN 0355-0141.
  • Ruppel, Klaas; Aalto, Tero; Everson, Michael (2009). "L2/09-028: Proposal to encode additional characters for the Uralic Phonetic Alphabet" (PDF).


  1. ^ Uralic Phonetic Alphabet characters for the UCS, 2002-03-20.
  2. ^ Proposal to encode additional characters for the Uralic Phonetic Alphabet, Klaas Ruppel, Tero Aalto, Michael Everson, 2009-01-27.
  3. ^ Setälä, E. N. (1901). Über transskription der finnisch-ugrischen sprachen (in German). Helsingfors, Leipzig. p. 47.

Combining Diacritical Marks Supplement

Combining Diacritical Marks Supplement is a Unicode block containing combining characters for the Uralic Phonetic Alphabet, Medievalist notations, and German dialectology (Teuthonista). It is an extension of the diacritic characters found in the Combining Diacritical Marks block.


K (named kay) is the eleventh letter of the modern English alphabet and the ISO basic Latin alphabet. In English, the letter K usually represents the voiceless velar plosive.


L (named el) is the twelfth letter of the modern English alphabet and the ISO basic Latin alphabet, used in words such as lagoon, lantern, and less.

Latin Extended-C

Latin Extended-C is a Unicode block containing Latin characters for Uighur New Script, the Uralic Phonetic Alphabet, Shona, and Claudian Latin.

Latin epsilon

Latin epsilon or open e (majuscule: Ɛ, minuscule: ɛ) is a letter of the extended Latin alphabet, based on the lowercase of the Greek letter epsilon (ε). It occurs in the orthographies of many Niger–Congo languages, such as Ewe, Akan, and Lingala, and is included in the African reference alphabet.

In the Berber Latin alphabet currently used in Algerian Berber school books, and before that proposed by the French institute INALCO, it represents a voiced pharyngeal fricative [ʕ]. Some authors use ƹayin ⟨ƹ⟩ instead; both letters are similar in shape with the Arabic ʿayn ⟨ع⟩.

The International Phonetic Alphabet (IPA) uses various forms of the Latin epsilon:

U+025B ɛ LATIN SMALL LETTER OPEN E represents the open-mid front unrounded vowel

U+025D ɝ LATIN SMALL LETTER REVERSED OPEN E WITH HOOK represents the rhotacized open-mid central vowel

U+025E ɞ LATIN SMALL LETTER CLOSED REVERSED OPEN E represents the open-mid central rounded vowel (shown as U+029A ʚ LATIN SMALL LETTER CLOSED OPEN E on the 1993 IPA chart)

The Uralic Phonetic Alphabet also uses various forms of the Latin epsilon.





M (named em) is the thirteenth letter of the modern English alphabet and the ISO basic Latin alphabet.


N (named en) is the fourteenth letter in the modern English alphabet and the ISO basic Latin alphabet.

Ou (ligature)

Ou (Majuscule: Ȣ, Minuscule: ȣ) is a ligature of the Greek letters ο and υ which was frequently used in Byzantine manuscripts. This ligature is still seen today on icon artwork in Greek Orthodox churches, and sometimes in graffiti or other forms of informal or decorative writing.

The ligature is now mostly used in the context of the Latin alphabet, interpreted as a ligature of Latin o and u: for example, in the orthographies of the Wyandot language and of the Algonquian language Western Abenaki to represent /ɔ̃/, and in Algonquin to represent /w/, /o/ or /oː/. Today, "ô" is preferred in Western Abenaki and "w" in Algonquin.

An ou ligature much different in form (with the two letters side-by-side as in most ligatures, as opposed to one on top of the other) was used in the Initial Teaching Alphabet.

The ligature, in both majuscule and minuscule forms, is occasionally used to represent the minuscule of "У" in the Romanian Transitional Alphabet, as the glyph for the monograph Uk (ꙋ) is rarely available in font sets.

The same ligature was also used in the context of Cyrillic; see Uk (Cyrillic).

The Uralic Phonetic Alphabet uses U+1D15 ᴕ LATIN LETTER SMALL CAPITAL OU and U+1D3D ᴽ MODIFIER LETTER CAPITAL OU to indicate a back vowel of unknown quality.


P (named pee) is the 16th letter of the modern English alphabet and the ISO basic Latin alphabet.

Phonetic Extensions

Phonetic Extensions is a Unicode block containing phonetic characters used in the Uralic Phonetic Alphabet, Old Irish phonetic notation, the Oxford English dictionary and American dictionaries, and Americanist and Russianist phonetic notations. Its character set is continued in the following Unicode block, Phonetic Extensions Supplement.


T (named tee) is the 20th letter in the modern English alphabet and the ISO basic Latin alphabet. It is derived from the Semitic letter taw via the Greek letter tau. In English, it is most commonly used to represent the voiceless alveolar plosive, a sound it also denotes in the International Phonetic Alphabet. It is the most commonly used consonant and the second most common letter in English-language texts.


ẗ is a modified letter of the Latin alphabet, derived from the letter T with a diaeresis on it. It is used in the ISO 233 transliteration of Arabic to represent tāʼ marbūṭa (ﺓ, ﺔ), and also in the Uralic Phonetic Alphabet to represent a tenuis interdental stop [t̪͆].

Only the minuscule form exists in Unicode as a distinct character. The majuscule must be formed with a combination of T and a combining diacritic (T̈), and because of this may not display correctly when using some fonts or systems.
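The difference between the two cases can be checked with Unicode normalisation. The following Python sketch composes each letter with U+0308 COMBINING DIAERESIS and normalises the result to NFC; the helper name `with_diaeresis` is introduced here for illustration.

```python
import unicodedata


def with_diaeresis(base):
    """Append a combining diaeresis to base and return the NFC form."""
    return unicodedata.normalize("NFC", base + "\u0308")


lower = with_diaeresis("t")
upper = with_diaeresis("T")

# The minuscule collapses to the single precomposed character U+1E97;
# the majuscule stays a two-code-point sequence (T followed by U+0308).
print([f"U+{ord(c):04X}" for c in lower])
print([f"U+{ord(c):04X}" for c in upper])
```

Because the majuscule remains a sequence of base letter plus combining mark, its appearance depends on how well the font and rendering system position the diaeresis.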

Tie (typography)

The tie is a symbol in the shape of an arc similar to a large breve, used in Greek, phonetic alphabets, and Z notation. It can be used between two characters with spacing as punctuation, or non-spacing as a diacritic. It can be above or below, and reversed. Its forms are called tie, double breve, enotikon or papyrological hyphen, ligature tie, and undertie.

Turned A

Turned A (capital: Ɐ, lowercase: ɐ, math symbol ∀) is a symbol based upon the letter A.

Lowercase ɐ (in two-story form) is used in the International Phonetic Alphabet to identify the near-open central vowel. This is not to be confused with the turned alpha or turned script a, ɒ, which is used in the IPA for the open back rounded vowel.

It was used in the 18th century by Edward Lhuyd and William Pryce as a phonetic character for the Cornish language. In their books, both Ɐ and ɐ have been used. It was used in the 19th century by Charles Sanders Peirce as a logical symbol for 'un-American' ("unamerican").

The symbol ∀ has the same shape as a sans-serif capital turned A. It is used to represent universal quantification in predicate logic. When it appears in a formula together with a predicate variable, they are referred to as a universal quantifier. In traffic engineering it is used to represent flow, the number of units (vehicles) passing a point in a unit of time.

U+1D44 ᵄ MODIFIER LETTER SMALL TURNED A is used in the Uralic Phonetic Alphabet.


Uralic is an adjective which refers to a group of peoples and their culture. It relates to the Ural region of Russia.

Eskimo–Uralic languages

Indo-Uralic languages

Proto-Uralic homeland hypotheses

Proto-Uralic language

Uralic languages

Uralic mythologies

Uralic neopaganism

Uralic peoples

Uralic Phonetic Alphabet

Uralic–Yukaghir languages

Uvular lateral approximant

The uvular lateral approximant is a type of consonantal sound used in some spoken languages. The symbol in the International Phonetic Alphabet that represents this sound is ⟨ʟ̠⟩, and the equivalent X-SAMPA symbol is L\_-. ⟨ʟ̠⟩ may also represent the pharyngeal or epiglottal lateral approximant, a physically possible sound that is not attested in any language. The letter for a back-velar in the Uralic Phonetic Alphabet, ⟨ᴫ⟩, may also be used.


Æ (minuscule: æ) is a grapheme named æsc or ash, formed from the letters a and e, originally a ligature representing the Latin diphthong ae. It has been promoted to the full status of a letter in the alphabets of some languages, including Danish, Norwegian, Icelandic, and Faroese. As a letter of the Old English Latin alphabet, it was called æsc ("ash tree") after the Anglo-Saxon futhorc rune ᚫ which it transliterated; its traditional name in English is still ash. It was also used in Old Swedish before being changed to ä. In recent times, it is also used to represent a short "a" sound (as in "cat"). Variants include Ǣ ǣ Ǽ ǽ æ̀.


The grapheme Š, š (S with caron) is used in various contexts to represent the sh sound, usually the voiceless postalveolar fricative /ʃ/ or the similar voiceless retroflex fricative /ʂ/. In the International Phonetic Alphabet these sounds are denoted by ʃ and ʂ, but the lowercase š is used in the Americanist phonetic notation, as well as in the Uralic Phonetic Alphabet. It represents the same sound as the Turkic letter Ş and the Romanian letter Ș (S-comma).

For use in computer systems, Š and š are at Unicode codepoints U+0160 and U+0161 (Alt 0138 and Alt 0154 for input), respectively. In HTML code, the entities &Scaron; and &scaron; can also be used to represent the characters.
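These values can be cross-checked with a short Python sketch using the standard `html` module. It decodes the named HTML entities `&Scaron;` and `&scaron;` and prints each character's code point. It also prints the byte each character has in the Windows-1252 code page, on the assumption (not stated in the text above) that the Alt input codes refer to that code page.

```python
import html

# Decode the two named HTML entities into the characters they stand for.
upper = html.unescape("&Scaron;")
lower = html.unescape("&scaron;")

# Report each character's Unicode code point and its Windows-1252 byte
# value (assumed here to be the number typed after Alt 0).
for ch in (upper, lower):
    cp1252_byte = ch.encode("cp1252")[0]
    print(f"{ch}  U+{ord(ch):04X}  Alt 0{cp1252_byte}")
```

Under that assumption the byte values come out as 138 and 154, matching the Alt codes given above.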


Ǯ (minuscule: ǯ) is a modified letter of the Latin alphabet, formed from ezh (ʒ) with the addition of a caron.

In the Uralic Phonetic Alphabet, it represents the sound [d͡ʒ].

Following its UPA usage, it was adopted in the Skolt Sami alphabet for the same value. It typically appears doubled, where it represents a geminate /dd͡ʒ/, e.g. viǯǯâd "to fetch". The letter is also used in Laz, where it represents [t͡sʼ]. Until 2007 it was also used in the Olonets Karelian language.
