Romanization of Persian

Romanization of Persian or Latinization of Persian is the representation of the Persian language (Farsi, Dari and Tajik) with the Latin script. Several different romanization schemes exist, each with its own set of rules driven by its own set of ideological goals.

Romanization paradigms

Because the Perso-Arabic script is an abjad writing system (with a consonant-heavy inventory of letters), many distinct words in standard Persian can have identical spellings, with widely varying pronunciations that differ in their (unwritten) vowel sounds. Thus a romanization paradigm can follow either transliteration (which mirrors spelling and orthography) or transcription (which mirrors pronunciation and phonology).

In Iran the Latin script serves as a second script, as a look at city and street signs or at Internet addresses shows. At the same time, efforts to teach reading and writing in Persian to the millions of young Iranians living abroad have mostly been unsuccessful, because they lack daily contact with the Persian script. One way out of this dilemma appears to be the use of the Latin script in parallel with the Persian script.


Transliteration (in the strict sense) attempts to be a complete representation of the original writing, so that an informed reader can reconstruct the original spelling of unknown transliterated words. Transliterations of Persian are used to represent individual Persian words or short quotations in scholarly texts written in English or other languages that do not use the Arabic alphabet.

A transliteration will still have separate representations for different consonants of the Persian alphabet that are pronounced identically in Persian. Therefore, transliterations of Persian are often based on transliterations of Arabic.[1] The representation of the vowels of the Perso-Arabic alphabet is also complex, and transliterations are based on the written form.
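The difference between the two paradigms can be made concrete with a few letters from the comparison table that share a pronunciation in Persian. The following is a minimal Python sketch, not an implementation of any particular scheme: the dictionary and function names are hypothetical, the transliteration values use the dot-below variants listed in the table, and the transcription values are the corresponding IPA consonants. It checks that the transliteration mapping can be inverted letter by letter, while the transcription mapping cannot.

```python
# Hypothetical sample of letter mappings, for illustration only.
# Keys are Perso-Arabic letters given as code points from the comparison table.
TRANSLITERATION = {
    "\u0633": "s", "\u0635": "ṣ",  # sin, sad
    "\u062a": "t", "\u0637": "ṭ",  # te, ta
    "\u0632": "z", "\u0638": "ẓ",  # ze, za
    "\u0647": "h", "\u062d": "ḥ",  # he, he-ye jimi
}

# Transcription records only the sound, so each pair collapses to one symbol.
TRANSCRIPTION = {
    "\u0633": "s", "\u0635": "s",
    "\u062a": "t", "\u0637": "t",
    "\u0632": "z", "\u0638": "z",
    "\u0647": "h", "\u062d": "h",
}


def is_reversible(mapping):
    """Return True if no two source letters share a Latin representation."""
    return len(set(mapping.values())) == len(mapping)


def invert(mapping):
    """Build the Latin-to-Persian mapping; fail if the mapping is ambiguous."""
    if not is_reversible(mapping):
        raise ValueError("mapping is not reversible")
    return {latin: letter for letter, latin in mapping.items()}


print(is_reversible(TRANSLITERATION))  # True
print(is_reversible(TRANSCRIPTION))    # False
```

A real scheme also has to deal with vowels, digraphs and positional rules, which this sketch deliberately leaves out.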

Transliterations commonly used in the English-speaking world include BGN/PCGN romanization and ALA-LC Romanization.

Non-academic English-language quotation of Persian words usually uses a simplification of one of the strict transliteration schemes (typically omitting diacritical marks) and/or unsystematic choices of spellings meant to guide English speakers using English spelling rules towards an approximation of the Persian sounds.


Transcriptions of Persian attempt to straightforwardly represent Persian phonology in the Latin script, without requiring a close or reversible correspondence with the Perso-Arabic script, and also without requiring a close correspondence to English phonetic values of Roman letters.

Main romanization schemes

Comparison table

Unicode Persian IPA DMG (1969) ALA-LC (1997) BGN/PCGN (1958) EI (1960) EI (2012) UN (1967) UN (2012)
U+0627 ا ʔ, ∅[a] ʾ, —[b] ’, —[b] ʾ
U+0628 ب b b
U+067E پ p p
U+062A ت t t
U+062B ث s t͟h s
U+062C ج ǧ j j d͟j j j
U+0686 چ č ch ch č č ch č
U+062D ح h ḩ/ḥ[c] h
U+062E خ x kh kh k͟h kh x
U+062F د d d
U+0630 ذ z d͟h z
U+0631 ر r r
U+0632 ز z z
U+0698 ژ ʒ ž zh zh z͟h ž zh ž
U+0633 س s s
U+0634 ش ʃ š sh sh s͟h š sh š
U+0635 ص s ş/ṣ[c] ş s
U+0636 ض z ż ż z
U+0637 ط t ţ/ṭ[c] ţ t
U+0638 ظ z z̧/ẓ[c] z
U+0639 ع ʿ [b] ʿ ʿ
U+063A غ ɢ~ɣ ġ gh gh g͟h gh q
U+0641 ف f f
U+0642 ق ɢ~ɣ q q
U+06A9 ک k k
U+06AF گ ɡ g
U+0644 ل l l
U+0645 م m m
U+0646 ن n n
U+0648 و v~w[a][d] v v, w[e] v
U+0647 ه h[a] h h h[f] h h h[f] h[f]
U+0629 ة ∅, t h[g] t[h] h[g]
U+06CC ی j[a] y
U+0621 ء ʔ, ∅ ʾ ʾ
U+0624 ؤ ʔ, ∅ ʾ ʾ
U+0626 ئ ʔ, ∅ ʾ ʾ
Unicode Final Medial Initial Isolated IPA DMG (1969) ALA-LC (1997) BGN/PCGN (1958) EI (2012) UN (1967) UN (2012)
U+064E ◌َ ◌َ اَ ◌َ æ a a a a a a
U+064F ◌ُ ◌ُ اُ ◌ُ o o o o u o o
U+0648 U+064F ◌ﻮَ ◌ﻮَ ◌وَ o[j] o o o u o o
U+0650 ◌ِ ◌ِ اِ ◌ِ e e i e e e e
U+064E U+0627 ◌َا ◌َا أ ◌َا ɑː~ɒː ā ā ā ā ā ā
U+0622 ◌ﺂ ◌ﺂ آ ◌آ ɑː~ɒː ā, ʾā[k] ā, ’ā[k] ā ā ā ā
U+064E U+06CC ◌َﯽ ◌َی ɑː~ɒː ā á á ā á ā
U+06CC U+0670 ◌ﯽٰ ◌یٰ ɑː~ɒː ā á á ā ā ā
U+064F U+0648 ◌ُﻮ ◌ُﻮ اُو ◌ُو uː, oː[e] ū ū ū u, ō[e] ū u
U+0650 U+06CC ◌ِﯽ ◌ِﯿ اِﯾ ◌ِی iː, eː[e] ī ī ī i, ē[e] ī i
U+064E U+0648 ◌َﻮ ◌َﻮ اَو ◌َو ow~aw[e] au aw ow ow, aw[e] ow ow
U+064E U+06CC ◌َﯽ ◌َﯿ اَﯾ ◌َی ej~aj[e] ai ay ey ey, ay[e] ey ey
U+064E U+06CC ◌ﯽ ◌ی –e, –je –e, –ye –i, –yi –e, –ye –e, –ye –e, –ye –e, –ye
U+06C0 ◌ﮥ ◌ﮤ –je –ye –’i –ye –ye –ye –ye


  a. Used as a vowel as well.
  b. Hamza and ayn are not transliterated at the beginning of words.
  c. The dot below may be used instead of cedilla.
  d. At the beginning of words the combination ⟨خو⟩ was pronounced /xw/ or /xʷ/ in Classical Persian. In modern varieties the glide /ʷ/ has been lost, though the spelling has not been changed. It may still be heard in Dari as a relict pronunciation. The combination /xʷa/ was changed to /xo/ (see below).
  e. In Dari.
  f. Not transliterated at the end of words.
  g. In the combination ⟨یة⟩ at the end of words.
  h. When used instead of ⟨ت⟩ at the end of words.
  i. Diacritical signs (harakat) are rarely written.
  j. After ⟨خ⟩ from the earlier /xʷa/. Often transliterated as xwa or xva. For example, خور /xor/ "sun" was /xʷar/ in Classical Persian.
  k. After vowels.

Pre-Islamic period

In the pre-Islamic period, Old and Middle Persian employed various scripts, including Old Persian cuneiform and the Pahlavi and Avestan scripts. For each period there are established transcriptions and transliterations by prominent linguists.[5][9][10][11][12]

IPA Old Persian[i][ii] Middle Persian
p p
f f
b b
β~ʋ~w β β/w
t t t, t̰
θ θ/ϑ
d d
ð (δ) δ
θr ç/ϑʳ θʳ/ϑʳ
s s
z z
ʃ š š, š́, ṣ̌
ʒ ž
c~tʃ c/č
ɟ~dʒ j/ǰ
k k
x x x, x́
g g g, ġ
ɣ ɣ/γ
h h
m m m, m̨
ŋ ŋ, ŋʷ
ŋʲ ŋ́
n n n, ń, ṇ
r r
l l
w~ʋ~v v w v
j y y, ẏ
a a
ã ą, ą̇
ə ə
e (e) e
i i
o (o) o
u u
ɑː~ɒː å/ā̊
ə ə̄
əː ē


  i. Slash signifies equal variants.
  ii. There exist some differences in transcription of Old Persian preferred by different scholars:
    • ā = â
    • ī, ū = i, u
    • x = kh, ḵ, ḥ, ḫ
    • c/č = ǩ
    • j/ǰ = ǧ
    • θ = ϑ, þ, th, ṯ, ṭ
    • ç = tr, θʳ, ϑʳ, ṙ, s͜s, s̀
    • f = p̱
    • y, v = j, w.

Other romanization schemes

Bahá'í Persian romanization

Bahá'ís use a system standardized by Shoghi Effendi, which he initiated in a general letter on March 12, 1923.[13] The Bahá'í transliteration scheme was based on a standard adopted by the Tenth International Congress of Orientalists which took place in Geneva in September 1894. Shoghi Effendi changed some details of the Congress's system, most notably in the use of digraphs in certain cases (e.g. sh instead of š), and in incorporating the solar letters when writing the definite article al- (Arabic: ال) according to pronunciation (e.g. ar-Rahim, as-Saddiq, instead of al-Rahim, al-Saddiq).

A detailed introduction to the Bahá'í Persian romanization can usually be found at the back of a Bahá'í scripture.
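The treatment of the definite article described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `with_article` is hypothetical, the input is assumed to be an already romanized Arabic word with diacritics omitted, and the set of assimilating ("solar") consonants is the standard Arabic one rather than anything quoted from Shoghi Effendi's letter.

```python
# Assumption: standard Arabic solar consonants in a digraph-based romanization,
# with diacritics already stripped from the input word.
SOLAR_DIGRAPHS = ("th", "dh", "sh")
SOLAR_SINGLE = set("tdrzsln")


def with_article(word, assimilate=True):
    """Prefix the Arabic definite article to a romanized word.

    With assimilate=True the l of al- is replaced by the word's initial
    solar consonant (written according to pronunciation); otherwise the
    article is always written al- (written according to spelling).
    """
    lower = word.lower()
    if assimilate:
        # Check digraphs first so that "sh" is not treated as plain "s".
        for digraph in SOLAR_DIGRAPHS:
            if lower.startswith(digraph):
                return f"a{digraph}-{word}"
        if lower[:1] in SOLAR_SINGLE:
            return f"a{lower[0]}-{word}"
    return f"al-{word}"


print(with_article("Rahim"))                    # ar-Rahim
print(with_article("Rahim", assimilate=False))  # al-Rahim
```

The two calls reproduce the contrast between ar-Rahim and al-Rahim given in the text.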

ASCII Internet romanizations

It is common to write Persian using only the Latin alphabet (as opposed to the Persian alphabet), especially in online chat, social networks, emails and SMS. The practice developed and spread because software formerly lacked support for the Persian alphabet, or because users did not know about the software that was available. Although Persian writing is supported in recent operating systems, there are still many cases where the Persian alphabet is unavailable and an alternative way to write Persian with the basic Latin alphabet is needed. This way of writing is sometimes called Fingilish or Pingilish (a portmanteau of Farsi or Persian and English). In most cases it is an ad hoc simplification of the scientific systems listed above (such as ALA-LC or BGN/PCGN) that ignores any special letters or diacritical signs. ع may be written using the numeral "3", as in the Arabic chat alphabet.
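The simplification described above, dropping diacritical signs and special letters from a scholarly romanization, can be approximated in a few lines of Python. This is a rough sketch, not a description of how any particular user or tool actually produces Fingilish: the function name `to_ascii_romanization` and its `ayn_as_digit` parameter are hypothetical, the sample strings are illustrative, and real informal spellings vary far more than this.

```python
import unicodedata

AYN = "\u02bf"               # ʿ, used for ع in several schemes
HAMZA_LIKE = "\u02be\u2019"  # ʾ and ’, used for hamza


def to_ascii_romanization(text, ayn_as_digit=False):
    """Reduce a diacritic-based romanization to basic Latin letters."""
    # Optionally write ayn as the numeral 3, as in the Arabic chat alphabet.
    text = text.replace(AYN, "3" if ayn_as_digit else "'")
    for mark in HAMZA_LIKE:
        text = text.replace(mark, "'")
    # Decompose letters such as ā into a base letter plus a combining mark,
    # then drop every combining mark (macron, dot below, cedilla, ...).
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is silently discarded.
    return base_only.encode("ascii", "ignore").decode("ascii")


print(to_ascii_romanization("Tehr\u0101n"))  # Tehran
```

Schemes that already use digraphs such as sh, kh and zh lose little under this reduction, whereas schemes that rely on letters such as š or ž collapse them to plain s and z.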

Tajik Latin alphabet

The Tajik language or Tajik Persian is a variety of the Persian language. It was written in the Tajik SSR in a standardized Latin script from 1926 until the late 1930s, when the script was officially changed to Cyrillic. However, Tajik phonology differs slightly from that of Persian in Iran. As a result of these two factors, romanization schemes of the Tajik Cyrillic script follow rather different principles.[14]

The Tajik alphabet in Latin (1928-1940)[15]
A a | B ʙ | C c | Ç ç | D d | E e | F f | G g | Ƣ ƣ | H h | I i | Ī ī
/a/ | /b/ | /tʃ/ | /dʒ/ | /d/ | /e/ | /f/ | /ɡ/ | /ʁ/ | /h/ | /i/ | /ˈi/
J j | K k | L l | M m | N n | O o | P p | Q q | R r | S s | Ş ş | T t
/j/ | /k/ | /l/ | /m/ | /n/ | /o/ | /p/ | /q/ | /ɾ/ | /s/ | /ʃ/ | /t/
U u | Ū ū | V v | X x | Z z | Ƶ ƶ | ʼ
/u/ | /ɵ/ | /v/ | /χ/ | /z/ | /ʒ/ | /ʔ/
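Because each letter of this alphabet is paired with a single phonemic value, the table can be read as a lookup. The Python sketch below encodes the pairs exactly as listed above and applies them letter by letter; the names `TAJIK_LATIN_TO_IPA` and `tajik_latin_to_ipa` are hypothetical, the sample input is only an illustration, and stress or other context-dependent details are not handled.

```python
import unicodedata

# Letter-to-IPA pairs copied from the table above (lowercase forms).
# The plain "b" is added as an assumption so that str.lower() of "B" is covered.
TAJIK_LATIN_TO_IPA = {
    "a": "a", "\u0299": "b", "b": "b", "c": "tʃ", "\u00e7": "dʒ", "d": "d",
    "e": "e", "f": "f", "g": "ɡ", "\u01a3": "ʁ", "h": "h", "i": "i",
    "\u012b": "ˈi", "j": "j", "k": "k", "l": "l", "m": "m", "n": "n",
    "o": "o", "p": "p", "q": "q", "r": "ɾ", "s": "s", "\u015f": "ʃ",
    "t": "t", "u": "u", "\u016b": "ɵ", "v": "v", "x": "χ", "z": "z",
    "\u01b6": "ʒ", "\u02bc": "ʔ",
}


def tajik_latin_to_ipa(text):
    """Map each letter to its IPA value; unknown characters pass through."""
    # NFC ensures letters such as ç arrive as single precomposed characters.
    normalized = unicodedata.normalize("NFC", text).lower()
    return "".join(TAJIK_LATIN_TO_IPA.get(ch, ch) for ch in normalized)


print(tajik_latin_to_ipa("to\u00e7ik"))
```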

References


  1. Joachim, Martin D. (1993). Languages of the world: cataloging issues and problems. New York: Haworth Press. p. 137. ISBN 1560245204.
  2. Pedersen, Thomas T. "Persian (Farsi)" (PDF). Transliteration of Non-Roman Scripts.
  3. "Persian" (PDF). The Library of Congress.
  4. "Romanization system for Persian (Dari and Farsi). BGN/PCGN 1958 System" (PDF).
  5. "Transliteration". Encyclopædia Iranica.
  6. "Persian" (PDF). UNGEGN.
  7. Toponymic Guidelines for map and other editors – Revised edition 1998. Working Paper No. 41. Submitted by the Islamic Republic of Iran. UNGEGN, 20th session. New York, 17–28 January 2000.
  8. New Persian Romanization System. E/CONF.101/118/Rev.1*. Tenth United Nations Conference on the Standardization of Geographical Names. New York, 31 July – 9 August 2012.
  9. Bartholomae, Christian (1904). Altiranisches Wörterbuch. Strassburg. p. XXIII.
  10. Kent, Roland G. (1950). Old Persian. New Haven, Connecticut. pp. 12–13.
  11. MacKenzie, D. N. (1971). "Transcription". A Concise Pahlavi Dictionary. London.
  12. Hoffmann, Karl; Forssman, Bernhard (1996). Avestische Laut- und Flexionslehre. Innsbruck. pp. 41–44. ISBN 3-85124-652-7.
  13. Effendi, Shoghi (1974). Bahá'í Administration. Wilmette, Illinois, USA: Bahá'í Publishing Trust. p. 43. ISBN 0-87743-166-3.
  14. Pedersen, Thomas T. "Tajik" (PDF). Transliteration of Non-Roman Scripts.
  15. Perry, John R. (2005). A Tajik Persian Reference Grammar. Brill. pp. 34–35.

External links

ALA-LC romanization

ALA-LC (American Library Association - Library of Congress) is a set of standards for romanization, the representation of text in other writing systems using the Latin script.

BGN/PCGN romanization

BGN/PCGN romanization refers to the systems for romanization (transliteration into the Latin script) and Roman-script spelling conventions adopted by the United States Board on Geographic Names (BGN) and the Permanent Committee on Geographical Names for British Official Use (PCGN).

The systems have been approved by the BGN and the PCGN for application to geographic names, but they have also been used for personal names and text in the US and the UK.

Details of all the jointly approved systems are outlined in the National Geospatial-Intelligence Agency publication Romanization Systems and Policies (2012), which superseded the BGN 1994 publication Romanization Systems and Roman-Script Spelling Conventions. Romanization systems and spelling conventions for different languages have been gradually introduced over the course of several years. An incomplete list of BGN/PCGN systems and agreements covering the following languages is given below (the date of adoption is given in the parentheses).

Ey Iran

"Ey Irân" (Persian: ای ایران‎; "O Iran") is a popular patriotic song in Iran, which many Iranians consider the unofficial de facto national anthem of Iran.

Finglish (disambiguation)

Finglish is the Finnish language mixed with English.

Finglish may also refer to:

Fingilish, the casual romanization of Persian alphabet

Finglish, informal Fijian English language also known as Fijian Creole

Finglish (also Fingilish, Pinglish), Persian written with English letters; see Romanization of Persian

Persian language

Persian, also known by its endonym Farsi (فارسی fārsi [fɒːɾˈsiː]), is one of the Western Iranian languages within the Indo-Iranian branch of the Indo-European language family. It is primarily spoken in Iran, Afghanistan (where it has been officially known as Dari since 1958), Tajikistan (where it has been officially known as Tajiki since the Soviet era), Uzbekistan and some other regions that historically were Persianate societies and considered part of Greater Iran. It is written right to left in the Persian alphabet, a modified variant of the Arabic script.

The Persian language is classified as a continuation of Middle Persian, the official religious and literary language of the Sasanian Empire, itself a continuation of Old Persian, the language of the Achaemenid Empire. Its grammar is similar to that of many contemporary European languages. A Persian-speaking person may be referred to as Persophone.

There are approximately 110 million Persian speakers worldwide, with the language holding official status in Iran, Afghanistan, and Tajikistan. For centuries, Persian has also been a prestigious cultural language in other regions of Western Asia, Central Asia, and South Asia by the various empires based in the regions.

Persian has had a considerable (mainly lexical) influence on neighboring languages, particularly the Turkic languages in Central Asia, Caucasus, and Anatolia, neighboring Iranian languages, as well as Armenian, Georgian, and Indo-Aryan languages, especially Urdu (a register of Hindustani). It also exerted some influence on Arabic, particularly Bahrani Arabic, while borrowing much vocabulary from it after the Arab conquest of Iran.

With a long history of literature in the form of Middle Persian before Islam, Persian was the first language in the Muslim world to break through Arabic's monopoly on writing, and the writing of poetry in Persian was established as a court tradition in many eastern courts. Some of the famous works of Persian literature are the Shahnameh of Ferdowsi, the works of Rumi, the Rubaiyat of Omar Khayyam, the Panj Ganj of Nizami Ganjavi, the Divān of Hafez and the two miscellanea of prose and verse by Saadi Shirazi, the Gulistan and the Bustan.

Persian metres

Persian metres are patterns of long and short syllables in Persian poetry.

Over the past 1000 years the Persian language has enjoyed a rich literature, especially of poetry. Until the advent of free verse in the 20th century, this poetry was always quantitative; that is, the lines were composed in various patterns of long and short syllables. The different patterns are known as metres (US: meters). A knowledge of metre is essential for reciting Persian poetry correctly and often also, since short vowels are not written in Persian script, for conveying the correct meaning in cases of ambiguity. It is also helpful for those who memorise the verse.

Metres in Persian have traditionally been analysed in terms of Arabic metres, from which they are supposed to have been adapted. However, in recent years it has been recognised that for the most part Persian metres developed independently from those in Arabic, and there has been a movement to analyse them on their own terms.

An unusual feature of Persian poetry that is not found in Arabic, Latin, or Greek verse, is that instead of two lengths of syllables (short and long), there are three lengths (short, long, and overlong). Overlong syllables can be used instead of a long syllable plus a short one, or at the end of a verse in place of a long syllable.

Persian metres were used not only in classical Persian poetry, but were also imitated in Turkish poetry of the Ottoman period, and in Urdu poetry under the Mughal emperors. That the poets of Turkey and India copied Persian metres, not Arabic ones, is clear from the fact that, just as with Persian verse, the most commonly used metres of Arabic poetry (the ṭawīl, kāmil, wāfir and basīṭ) are avoided, while those metres used most frequently in Persian lyric poetry are exactly those most frequent in Turkish and Urdu.


Pinglish may refer to:

Palestine English

Paklish, Pakistani English

Fingilish or Pinglish, a mix of the English and Persian languages

Persian English, English input of Persian on mobile phones as Romanization of Persian

Chinglish, when influenced by pinyin

Pinglish (Poland), poor language spoken by Polish native speakers

Pinglish, Tral, a village in the Pulwama district of Jammu and Kashmir

Filipino English (disambiguation)

Romanisation of Sindhi

Sindhi romanisation or Latinization of Sindhi is a system for representing the Sindhi language using the Latin script.

In Sindh, Pakistan, the Sindhi language is written in a modified Perso-Arabic script, and in India it is written in the Devanagari (Hindi) script.

Sindhis living in Pakistan and Sindhis living in India are able to speak to and understand each other; however, they cannot write to each other because of the two different scripts.

The Indus Roman Sindhi script would allow Sindhis all over the world to communicate with each other through one common script.


Transliteration is a type of conversion of a text from one script to another that involves swapping letters (thus trans- + liter-) in predictable ways (such as α → a, д → d, χ → ch, ն → n or æ → e).

For instance, for the Modern Greek term "Ελληνική Δημοκρατία", which is usually translated as "Hellenic Republic", the usual transliteration to Latin script is "Ellēnikḗ Dēmokratía", and the name for Russia in Cyrillic script, "Россия", is usually transliterated as "Rossiya".

Transliteration is not primarily concerned with representing the sounds of the original but rather with representing the characters, ideally accurately and unambiguously. Thus, in the above example, λλ is transliterated as 'll', but pronounced /l/; Δ is transliterated as 'D', but pronounced /ð/; and η is transliterated as 'ē', though it is pronounced /i/ (exactly like ι) and is not long.

Conversely, transcription notes the sounds but not necessarily the spelling. So "Ελληνική Δημοκρατία" could be transcribed as "elinikí ðimokratía", which does not specify which of the /i/ sounds are written as η and which as ι.


This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.