A cedilla (/sɪˈdɪlə/ si-DIL-ə; from Spanish), also known as cedilha (from Portuguese) or cédille (from French), is a hook or tail ( ¸ ) added under certain letters as a diacritical mark to modify their pronunciation. In Catalan, French, and Portuguese, it is used only under the c, and the entire letter is called respectively c trencada (i.e. "broken C"), c cédille, and c cedilhado (or c cedilha, colloquially).

Origin of the cedilla from the Visigothic z

The tail originated in Spain as the bottom half of a miniature cursive z. The word "cedilla" is the diminutive of the Old Spanish name for this letter, ceda (zeta).[1] Modern Spanish and Galician no longer use this diacritic, although it is used in Portuguese,[2] Catalan, Occitan, and French, which gives English the alternative spellings of cedille, from French "cédille", and the Portuguese form cedilha. An obsolete spelling of cedilla is cerilla.[2] The earliest use in English cited by the Oxford English Dictionary[2] is a 1599 Spanish-English dictionary and grammar.[3] Chambers’ Cyclopædia[4] is cited for the printer-trade variant ceceril in use in 1738.[2] The main use in English is not universal and applies to loan words from French and Portuguese such as "façade", "limaçon" and "cachaça" (often typed "facade", "limacon" and "cachaca" because of lack of ç keys on Anglophone keyboards).

With the advent of modernism, the calligraphic nature of the cedilla was thought somewhat jarring on sans-serif typefaces, and so some designers instead substituted a comma design, which could be made bolder and more compatible with the style of the text.[a] This can add to confusion as the use of commas as opposed to cedillas varies by language.


A conventional "ç" and 'modernist' cedilla "c̦" (right), intended for French and Swiss use.

The most frequent character with cedilla is "ç" ("c" with cedilla, as in façade). It was first used for the sound of the voiceless alveolar affricate /ts/ in old Spanish and stems from the Visigothic form of the letter "z" (ꝣ), whose upper loop was lengthened and reinterpreted as a "c", whereas its lower loop became the diminished appendage, the cedilla.

It represents the "soft" sound /s/, the voiceless alveolar sibilant, where a "c" would normally represent the "hard" sound /k/ (before "a", "o", "u", or at the end of a word) in English and in certain Romance languages such as Catalan, Galician, French (where ç appears in the name of the language itself, français), Ligurian, Occitan, and Portuguese. In Occitan, Friulian and Catalan ç can also be found at the beginning of a word (Çubran, ço) or at the end (braç).

It represents the voiceless postalveolar affricate /tʃ/ (as in English "church") in Albanian, Azerbaijani, Crimean Tatar, Friulian, Kurdish, Tatar, Turkish (as in çiçek, çam, çekirdek, Çorum), and Turkmen. It is also sometimes used this way in Manx, to distinguish it from the velar fricative.

In the International Phonetic Alphabet, ⟨ç⟩ represents the voiceless palatal fricative.


The character "ş" represents the voiceless postalveolar fricative /ʃ/ (as in "show") in several languages, many of them Turkic, where it is included as a separate letter of the alphabet.


Comparatively, some consider the diacritics on the Latvian consonants "ģ", "ķ", "ļ", "ņ", and formerly "ŗ" to be cedillas. Although their Adobe glyph names are commas, their names in the Unicode Standard are "g", "k", "l", "n", and "r" with a cedilla. The letters were introduced to the Unicode standard before 1992, and their names cannot be altered. The uppercase equivalent "Ģ" sometimes has a regular cedilla.
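The formal Unicode names mentioned above can be inspected programmatically. The following is a minimal sketch using Python's standard unicodedata module; the helper name unicode_names is hypothetical and chosen only for illustration.

```python
import unicodedata

# The Latvian consonants discussed above, including the former letter "ŗ",
# written as escapes so the precomposed code points are unambiguous.
LATVIAN_LETTERS = "\u0123\u0137\u013c\u0146\u0157"  # ģ ķ ļ ņ ŗ


def unicode_names(letters: str) -> dict[str, str]:
    """Map each character to its formal Unicode character name."""
    return {ch: unicodedata.name(ch) for ch in letters}


for ch, name in unicode_names(LATVIAN_LETTERS).items():
    # Each name ends in "WITH CEDILLA", even though Latvian fonts
    # usually draw the mark as a comma below.
    print(f"U+{ord(ch):04X} {ch} {name}")
```

The names reported are those fixed in the Unicode character database, which is why they still say "cedilla" regardless of how a font draws the mark.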


Four letters in Marshallese have cedillas: <ļ m̧ ņ o̧>. In standard printed text they are always cedillas, and their omission or the substitution of comma below and dot below diacritics is nonstandard.

As of 2011, many font rendering engines do not display any of these properly, for two reasons:

  • "ļ" and "ņ" usually do not display properly at all, because of the use of the cedilla in Latvian. Unicode has precombined glyphs for these letters, but most quality fonts display them with comma below diacritics to accommodate the expectations of Latvian orthography. This is considered nonstandard in Marshallese. The use of a zero-width non-joiner between the letter and the diacritic can alleviate this problem: "l‌̧" and "n‌̧" may display properly, but may not; see below.
  • "m̧" and "o̧" do not currently exist in Unicode as precombined glyphs, and must be encoded as the plain Latin letters "m" and "o" with the combining cedilla diacritic. Most Unicode fonts issued with Windows, such as Tahoma and Times New Roman, do not display combining diacritics properly, showing them too far to the right of the letter. This mostly affects "m̧", and may or may not affect "o̧". Some common Unicode fonts, such as Arial Unicode MS, Cambria and Lucida Sans Unicode, do not have this problem. When "m̧" is properly displayed, the cedilla is either underneath the center of the letter or underneath its right-most leg, but it is always directly underneath the letter wherever it is positioned.

Because of these font display issues, it is not uncommon to find nonstandard ad hoc substitutes for these letters. The online version of the Marshallese-English Dictionary (the only complete Marshallese dictionary in existence) displays the letters with dot below diacritics, all of which do exist as precombined glyphs in Unicode: "ḷ", "ṃ", "ṇ" and "ọ". The first three exist in the International Alphabet of Sanskrit Transliteration, and "ọ" exists in the Vietnamese alphabet, and both of these systems are supported by the most recent versions of common fonts like Arial, Courier New, Tahoma and Times New Roman. This sidesteps most of the Marshallese text display issues associated with the cedilla, but is still inappropriate for polished standard text.
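The encoding difference between the four Marshallese letters can be illustrated with Unicode normalization. This is a rough sketch using Python's standard unicodedata module; the function names compose and with_zwnj are hypothetical. It shows only how the text is encoded, not how any particular font will draw it.

```python
import unicodedata

COMBINING_CEDILLA = "\u0327"
ZWNJ = "\u200c"  # zero-width non-joiner


def compose(base: str) -> str:
    """Attach a combining cedilla and apply NFC normalization."""
    return unicodedata.normalize("NFC", base + COMBINING_CEDILLA)


def with_zwnj(base: str) -> str:
    """Separate letter and cedilla with a zero-width non-joiner."""
    return base + ZWNJ + COMBINING_CEDILLA


for base in "lmno":
    composed = compose(base)
    # "l" and "n" collapse to one precomposed code point;
    # "m" and "o" remain base letter plus combining mark.
    print(base, [f"U+{ord(c):04X}" for c in composed])

# With the non-joiner in between, NFC leaves the sequence decomposed,
# so a font cannot substitute its precomposed (comma-style) glyph.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", with_zwnj("l"))])
```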

Other diacritics

Languages such as Romanian add a comma (virgula) to some letters, such as ș, which looks like a cedilla, but is more precisely a diacritical comma. This is particularly confusing with letters which can take either diacritic: for example, the consonant /ʃ/ is written as "ş" in Turkish but "ș" in Romanian, and Romanian writers will sometimes use the former instead of the latter because of insufficient font or character-set support.

The Polish letters "ą" and "ę" and Lithuanian letters "ą", "ę", "į", and "ų" are not made with the cedilla either, but with the unrelated ogonek diacritic.


In 1868, Ambroise Firmin-Didot suggested in his book Observations sur l'orthographe, ou ortografie, française (Observations on French Spelling) that French phonetics could be better regularized by adding a cedilla beneath the letter "t" in some words. For example, in the suffix -tion this letter is usually not pronounced as (or close to) /t/ in either French or English; the suffix is pronounced /sjɔ̃/ and /ʃən/, respectively. It has to be learned separately that in words such as French diplomatie (but not diplomatique) and English action the t is pronounced /s/ and /ʃ/, respectively (but not in active in either language). A similar effect occurs elsewhere in French and English words, such as partial, where t is pronounced /s/ and /ʃ/ respectively. Firmin-Didot surmised that a new character could be added to French orthography. A similar letter, the t-comma, does exist in Romanian, but it has a comma accent, not a cedilla.


The Unicode characters for Ţ (T with cedilla) and Ş (S with cedilla) were wrongly implemented in Windows Romanian. In Windows 7, Microsoft corrected the error by replacing T-cedilla with T-comma (Ț) and S-cedilla with S-comma (Ș).


Gagauz is one of the few languages to use Ţ (T with cedilla); it also uses Ş (S with cedilla). Besides being present in some Gagauz orthographies, T with cedilla exists as part of the General Alphabet of Cameroon Languages, in the Kabyle dialect of the Berber language, and possibly elsewhere.


Unicode provides precomposed characters for some Latin letters with cedillas. Others can be formed using the cedilla combining character.

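Unicode and HTML codes for cedillas
Description Letter Unicode HTML
Cedilla (spacing) ¸ U+00B8 &cedil; or &#184;
Combining cedilla ◌̧ U+0327 &#807;
C with cedilla Ç U+00C7 &Ccedil; or &#199;
C with cedilla ç U+00E7 &ccedil; or &#231;
C with cedilla and acute accent Ḉ U+1E08 &#7688;
C with cedilla and acute accent ḉ U+1E09 &#7689;
Combining small c with cedilla (medieval superscript diacritic)[10] ◌ᷗ U+1DD7 &#7639;
D with cedilla Ḑ U+1E10 &#7696;
D with cedilla ḑ U+1E11 &#7697;
E with cedilla Ȩ U+0228 &#552;
E with cedilla ȩ U+0229 &#553;
E with cedilla and breve Ḝ U+1E1C &#7708;
E with cedilla and breve ḝ U+1E1D &#7709;
G with cedilla Ģ U+0122 &#290;
G with cedilla ģ U+0123 &#291;
H with cedilla Ḩ U+1E28 &#7720;
H with cedilla ḩ U+1E29 &#7721;
K with cedilla Ķ U+0136 &#310;
K with cedilla ķ U+0137 &#311;
L with cedilla Ļ U+013B &#315;
L with cedilla ļ U+013C &#316;
N with cedilla Ņ U+0145 &#325;
N with cedilla ņ U+0146 &#326;
R with cedilla Ŗ U+0156 &#342;
R with cedilla ŗ U+0157 &#343;
S with cedilla Ş U+015E &#350;
S with cedilla ş U+015F &#351;
T with cedilla Ţ U+0162 &#354;
T with cedilla ţ U+0163 &#355;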
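The relationship between a character, its code point and its HTML references can be checked mechanically. The sketch below uses Python's standard html module; the helper names code_point and numeric_reference are hypothetical.

```python
import html


def code_point(ch: str) -> str:
    """Format a single character as a U+XXXX code point label."""
    return f"U+{ord(ch):04X}"


def numeric_reference(ch: str) -> str:
    """Build the decimal HTML numeric character reference for a character."""
    return f"&#{ord(ch)};"


# Named and numeric references resolve to the same character.
c_cedilla = html.unescape("&ccedil;")
print(code_point(c_cedilla), numeric_reference(c_cedilla))

# The combining cedilla has no named entity in the table above,
# so only the numeric form is used for it here.
print(code_point(html.unescape("&#807;")))
```

Numeric references work for any code point, which makes them the fallback for the letters that have no named entity.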



Che with descender

Che with descender (Ҷ ҷ; italics: Ҷ ҷ) is a letter of the Cyrillic script. Its form is derived from the Cyrillic letter Che (Ч ч Ч ч). In the ISO 9 system of romanization, Che with descender is transliterated using the Latin letter C-cedilla (Ç ç).

Che with descender is used in the alphabets of the following languages:

Che with descender corresponds in other Cyrillic alphabets to the digraphs ⟨дж⟩ or ⟨чж⟩, or to the letters Che with vertical stroke (Ҹ ҹ), Dzhe (Џ џ), Khakassian Che (Ӌ ӌ), Zhe with breve (Ӂ ӂ), Zhe with diaeresis (Ӝ ӝ), or Zhje (Җ җ).

Ge with cedilla

Ge with cedilla (Г̧ г̧; italics: Г̧ г̧) is a letter of the Cyrillic script.

Ge with cedilla was used in the Karelian language in the 1820s.

General Alphabet of Cameroon Languages

The General Alphabet of Cameroon Languages is an orthographic system created in the late 1970s for all Cameroonian languages. Consonant and vowel letters are not to contain diacritics, though ⟨ẅ⟩ is a temporary exception. The alphabet is not used sufficiently for the one unique letter, for a bilabial trill, to have been added to Unicode.

Maurice Tadadjeu and Etienne Sadembouo were central to this effort.

** Like ⟨ɓ⟩, but with the top hook turned to the left.

Aspirated consonants are written ph, th, kh etc. Palatalized and labialized consonants are py, ty, ky and pw, tw, kw etc. Retroflex consonants are written either Cr or with a cedilla: tr, sr or ţ, ş, etc. Prenasalized consonants are mb, nd, ŋg etc. Preglottalized consonants are 'b, 'd, 'm etc. Geminate consonants are written double.

Long vowels are written double. Nasal vowels may be written with a cedilla: a̧ etc. or with a single following nasal consonant: aŋ etc. (presumably assimilating to any following consonant), in which case VN would be written with a double nasal: aŋŋ etc. Harmonic vowels are written with a sub-dot, as ⟨bibị⟩ for [bib-y].

Tone is written as in the IPA, with the addition of a vertical mark for mid-low tone: ⟨á ā a̍ à, â ǎ⟩ etc. Where rising and falling tones only occur on long vowels, they are decomposed: ⟨áà, àá⟩ etc. The high tone mark is used for contrastive stress in languages that do not have tone.

ISO/IEC 6937

ISO/IEC 6937:2001, Information technology — Coded graphic character set for text communication — Latin alphabet, is a multibyte extension of ASCII, or rather of ISO/IEC 646-IRV. It was developed in common with ITU-T (then CCITT) for telematic services under the name of T.51, and first became an ISO standard in 1983. Certain byte codes are used as lead bytes for letters with diacritics (accents). The value of the lead byte often indicates which diacritic the letter has, and the follow byte then has the ASCII value for the letter that the diacritic is on. Only certain combinations of lead byte and follow byte are allowed, and there are some exceptions to the lead byte interpretation for some follow bytes. However, no combining characters at all are encoded in ISO/IEC 6937. Some free-standing diacritics can nevertheless be represented, often by letting the follow byte have the code for ASCII space.
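The lead-byte and follow-byte scheme can be sketched as a small decoder. Everything below is an illustration under stated assumptions rather than a conforming implementation: the lead-byte values in LEAD_BYTES are a partial table assumed from the commonly published ISO/IEC 6937 code chart, the function name decode_6937_sketch is hypothetical, and the "allowed combination" rule is approximated by asking whether Unicode has a precomposed character for the pair. The exceptions for particular follow bytes and the free-standing case (follow byte = space) are not handled.

```python
import unicodedata

# Assumed, partial lead-byte table: lead byte -> Unicode combining mark.
LEAD_BYTES = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
    0xCB: "\u0327",  # cedilla
    0xCF: "\u030c",  # caron
}


def decode_6937_sketch(data: bytes) -> str:
    """Decode ASCII plus lead-byte/follow-byte pairs into Unicode text."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in LEAD_BYTES:
            if i + 1 >= len(data):
                raise ValueError("lead byte without a follow byte")
            follow = data[i + 1]
            if follow >= 0x80:
                raise ValueError("follow byte must be an ASCII letter")
            # The diacritic comes first in the byte stream, but Unicode
            # puts the combining mark after the base letter.
            composed = unicodedata.normalize(
                "NFC", chr(follow) + LEAD_BYTES[byte]
            )
            if len(composed) != 1:
                # Approximation of "only certain combinations are allowed".
                raise ValueError("combination not supported by this sketch")
            out.append(composed)
            i += 2
        elif byte < 0x80:
            out.append(chr(byte))
            i += 1
        else:
            raise ValueError(f"byte 0x{byte:02X} not covered by this sketch")
    return "".join(out)


print(decode_6937_sketch(b"fa\xcbcade"))  # cedilla lead byte followed by "c"
```

Because the output is always a precomposed character, the sketch mirrors the point made above that the standard itself encodes no combining characters.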

ISO/IEC 6937's architects were Hugh McGregor Ross, Peter Fenwick, Bernard Marti and Loek Zeckendorf.

ISO6937/2 defines 327 characters found in modern European languages using the Latin alphabet. Non-Latin European characters, such as Cyrillic and Greek, are not included in the standard. Also, some diacritics used with the Latin alphabet like the Romanian comma are not included, using cedilla instead as no distinction between cedilla and comma below was made at the time.

IANA has registered the charset names ISO_6937-2-25 and ISO_6937-2-add for two (older) versions of this standard (plus control codes). But in practice this character encoding is unused on the Internet.

The ISO/IEC 2022 escape sequence to specify the right-hand side of the ISO/IEC 6937 character set is ESC - R (hex 1B 2D 52).

ISO/IEC 8859-2

ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as "Latin-2". It is generally intended for Central or "Eastern European" languages that are written in the Latin script. Note that ISO/IEC 8859-2 is very different from code page 852 (MS-DOS Latin 2, PC Latin 2) which is also referred to as "Latin-2" in Czech and Slovak regions. Code page 912 is an extension.

ISO-8859-2 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. As of December 2018, 0.1% of all web pages used ISO 8859-2. Microsoft has assigned code page 28592, a.k.a. Windows-28592, to ISO-8859-2 in Windows. IBM assigned code page 1111 to ISO 8859-2.

Windows-1250 is similar to ISO-8859-2 and has all the printable characters it has and more. However, a few of them are rearranged (unlike Windows-1252, which keeps all printable characters from ISO-8859-1 in the same place).
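The rearrangement can be observed with the codecs that ship with Python, which names these encodings iso8859_2 and cp1250. The sketch below is only an illustration; the helper names encode_both and rearranged_characters are hypothetical.

```python
def encode_both(text: str) -> tuple[bytes, bytes]:
    """Encode the same text in ISO-8859-2 and in Windows-1250."""
    return text.encode("iso8859_2"), text.encode("cp1250")


def rearranged_characters() -> list[str]:
    """List printable ISO-8859-2 characters that sit at a different
    byte value in Windows-1250."""
    moved = []
    for byte in range(0xA0, 0x100):
        ch = bytes([byte]).decode("iso8859_2")
        if ch.encode("cp1250") != bytes([byte]):
            moved.append(ch)
    return moved


# "ç" keeps its position, while "š" is one of the characters that moved.
for ch in ("\u00e7", "\u0161"):
    iso, win = encode_both(ch)
    print(ch, iso.hex(), win.hex())

print("".join(rearranged_characters()))
```

Text saved in one encoding and read as the other is therefore mostly legible, with only the moved characters corrupted.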

These code values can be used for the following languages:

  • German (fully compatible with ISO/IEC 8859-1 for German texts)
  • Serbian Latin
  • Upper Sorbian
  • Lower Sorbian
  • Turkmen

It can also be used for Romanian, but it is not well suited for that language, because it lacks the letters s and t with commas below, although it provides s and t with similar-looking cedillas. These letters were unified in the first versions of the Unicode standard, meaning that the appearance with cedilla or with a comma was treated as a glyph choice rather than as separate characters; fonts intended for use with Romanian should therefore, in theory, have characters with a comma below at those code points.

Microsoft did not really provide such fonts for computers sold in Romania. Still, ISO/IEC 8859-2 and Windows-1250 (with the same problem) have been heavily used for Romanian. Unicode subsequently disunified the comma variants from the cedilla variants, and has since taken the lead for web pages, which however often have s and t with cedilla anyway. Unicode notes as of 2014 that disunifying the letters with comma below was a mistake, causing corruptions of Romanian data: pre-existing data and input methods would still contain the older cedilla codepoints, complicating text searching.
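One common way to cope with mixed data is to fold the cedilla code points to their comma-below counterparts before searching or comparing Romanian text. The following is a minimal sketch of that idea, not a method prescribed by Unicode; the function name to_comma_below is hypothetical, and the mapping must not be applied to Turkish or other text where the cedilla forms are correct.

```python
# Cedilla code points (legacy Romanian data) -> comma-below code points.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # Ş -> Ș
    "\u015f": "\u0219",  # ş -> ș
    "\u0162": "\u021a",  # Ţ -> Ț
    "\u0163": "\u021b",  # ţ -> ț
})


def to_comma_below(text: str) -> str:
    """Replace s/t with cedilla by s/t with comma below (Romanian only)."""
    return text.translate(CEDILLA_TO_COMMA)


legacy = "\u015ftiin\u0163\u0103"  # "ştiinţă" typed with cedillas
modern = to_comma_below(legacy)
print(modern, [f"U+{ord(c):04X}" for c in modern])

# The comma-below letters have no byte in ISO-8859-2, which is why
# legacy data in that encoding can only contain the cedilla forms.
try:
    modern.encode("iso8859_2")
except UnicodeEncodeError as exc:
    print("not encodable in ISO-8859-2:", exc.reason)
```

Applying the same fold to both the search term and the stored text makes the two spellings match without rewriting the stored data.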


The ISO international standard ISO 9 establishes a system for the transliteration into Latin characters of Cyrillic characters constituting the alphabets of many Slavic and non-Slavic languages. Published on February 23, 1995, ISO 9 has one major advantage over competing systems: its univocal system of one-character-for-one-character equivalents (by the use of diacritics), which faithfully represents the original spelling and allows for reverse transliteration, even if the language is unknown.

Earlier versions of the standard, ISO/R 9:1954, ISO/R 9:1968 and ISO 9:1986, were more closely based on the international scholarly system for linguistics (scientific transliteration); the standard has since diverged from it in favour of unambiguous transliteration over phonemic representation.

The edition of 1995 supersedes the edition of 1986.
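The one-to-one property is what makes reverse transliteration mechanical. The sketch below illustrates it with only the two pairs named in this article (Che with descender to C-cedilla, and Қ to Ķ); the lowercase pairs are assumed by analogy, the table is far from the full ISO 9 mapping, and the function names are hypothetical.

```python
# Tiny excerpt of a one-to-one Cyrillic -> Latin table.
CYRILLIC_TO_LATIN = {
    "\u04b6": "\u00c7",  # Ҷ -> Ç
    "\u04b7": "\u00e7",  # ҷ -> ç (lowercase assumed by analogy)
    "\u049a": "\u0136",  # Қ -> Ķ
    "\u049b": "\u0137",  # қ -> ķ (lowercase assumed by analogy)
}

# Because every source letter has exactly one target letter,
# the table can simply be inverted.
LATIN_TO_CYRILLIC = {lat: cyr for cyr, lat in CYRILLIC_TO_LATIN.items()}


def transliterate(text: str) -> str:
    """Cyrillic -> Latin; characters outside the excerpt pass through."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)


def reverse_transliterate(text: str) -> str:
    """Latin -> Cyrillic, using the inverted table."""
    return "".join(LATIN_TO_CYRILLIC.get(ch, ch) for ch in text)


sample = "\u04b6\u049b"
assert reverse_transliterate(transliterate(sample)) == sample
print(transliterate(sample))
```

No knowledge of the source language is needed for the round trip, which is the property the standard is designed around.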

Latin Extended-A

Latin Extended-A is a Unicode block and is the third block of the Unicode standard. It encodes Latin letters from the Latin ISO character sets other than Latin-1 (which is already encoded in the Latin-1 Supplement block) and also legacy characters from the ISO 6937 standard.

The Latin Extended-A block has been in the Unicode Standard since version 1.0, with its entire character repertoire, except for the Latin Small Letter Long S, which was added during unification with ISO 10646 in version 1.1.
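As a rough illustration, the block can be scanned for its cedilla letters with Python's standard unicodedata module. The range U+0100 to U+017F used below is not stated in this article and is taken from the published Unicode block list; the helper names are hypothetical.

```python
import unicodedata

# Latin Extended-A, assumed here to span U+0100..U+017F.
LATIN_EXTENDED_A = range(0x0100, 0x0180)


def in_latin_extended_a(ch: str) -> bool:
    """Return True if the character's code point lies in the block."""
    return ord(ch) in LATIN_EXTENDED_A


def cedilla_letters() -> list[str]:
    """Collect characters in the block whose Unicode name mentions a cedilla."""
    return [
        chr(cp)
        for cp in LATIN_EXTENDED_A
        if "CEDILLA" in unicodedata.name(chr(cp), "")
    ]


print(" ".join(cedilla_letters()))

# "ç" belongs to the Latin-1 Supplement, not to this block,
# while the long s is the block's last code point.
print(in_latin_extended_a("\u00e7"), in_latin_extended_a("\u017f"))
```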

List of Latin-script letters

This is a list of letters of the Latin script. The definition of a Latin-script letter for this list is a character encoded in the Unicode Standard that has a script property of 'Latin' and the general category of 'Letter'. An overview of the distribution of Latin-script letters in Unicode is given in Latin script in Unicode.


The ogonek (Polish: [ɔˈɡɔnɛk], "little tail", the diminutive of ogon; Lithuanian: nosinė, "nasal") is a diacritic hook placed under the lower right corner of a vowel in the Latin alphabet used in several European languages, and directly under a vowel in several Native American languages. It is also placed on the lower right corner of consonants in some Latin transcriptions of various indigenous languages of the Caucasus mountains. An ogonek can also be attached to the top of a vowel in Old Norse-Icelandic to show length or vowel affection. For example, in Old Norse, o᷎ represents the Old Norwegian vowel [ɔ], which in Old Icelandic merges with ø ‹ö›.

Romanian alphabet

The Romanian alphabet is a variant of the Latin alphabet used by the Romanian language. It is a modification of the classical Latin alphabet and consists of 31 letters, five of which (Ă, Â, Î, Ș, and Ț) have been modified from their Latin originals for the phonetic requirements of the language:

The letters Q (chiu), W (dublu v), and Y (igrec or i grec) were formally introduced in the Romanian alphabet in 1982, although they had been used earlier. They occur only in foreign words and their Romanian derivatives, such as quasar, watt, and yacht. The letter K, although relatively older, is also rarely used and appears only in proper names and international neologisms such as kilogram, broker, karate. These four letters are still perceived as foreign, which explains their usage for stylistic purposes in words such as nomenklatură (normally nomenclatură, meaning "nomenclature", but sometimes spelled with k instead of c if referring to members of the Communist leadership in the Soviet Union and the Eastern Bloc countries, as Nomenklatura is used in English). In cases where the word is a direct borrowing having diacritical marks not present in the above alphabet, official spelling tends to favor their use (München, Angoulême etc., as opposed to the use of Istanbul over İstanbul).


S-comma (majuscule: Ș, minuscule: ș) is a letter which is part of the Romanian alphabet, used to represent the sound /ʃ/, the voiceless postalveolar fricative (like sh in shoe).


T-comma (majuscule: Ț, minuscule: ț) is a letter which is part of the Romanian alphabet, used to represent the Romanian language sound /t͡s/, the voiceless alveolar affricate (like ts in bolts). It is written as the letter T with a small comma below and it has both the lower-case (U+021B) and the upper-case variants (U+021A).

The letter was proposed in the Buda Lexicon, a book published in 1825, which included two texts by Petru Maior, Orthographia romana sive Latino-valachica una cum clavi and Dialogu pentru inceputul linbei române, introducing ș for /ʃ/ and ț for /t͡s/.

The (Cyrillic)

The (Ҫ ҫ; italics: Ҫ ҫ) is a letter of the Cyrillic script. The name the is pronounced [θɛ], like the pronunciation of ⟨the⟩ in "theft". In Unicode, this letter is called "Es with descender". In Chuvashia, it looks identical to the Latin letter C with cedilla (Ç ç Ç ç). Occasionally it also has the hook diacritic curved rightward, as in the SVG image shown in the sidebar. In many fonts, the character hooks to the left. The is used in the alphabets of the Bashkir and Chuvash languages.

In Bashkir it represents the voiceless dental fricative /θ/.

In Chuvash it represents the voiceless alveolo-palatal fricative /ɕ/. It is usually romanized as 'ś', 'θ', or 'þ'.


Ç or ç (c-cedilla) is a Latin script letter, used in the Albanian, Azerbaijani, Manx, Tatar, Turkish, Turkmen, Kurdish and Zazaki alphabets. Romance languages that use this letter include French, Friulian, Ligurian, Occitan, Portuguese and Catalan as a variant of the letter C. It is also occasionally used in Crimean Tatar, and in Tajik when written in the Latin script to represent the /d͡ʒ/ sound. It is often retained in the spelling of loanwords from any of these languages in English, Dutch, Spanish, Basque, and other Latin script spelled languages.

It was first used for the sound of the voiceless alveolar affricate /t͡s/ in Old Spanish and stems from the Visigothic form of the letter z (Ꝣ). The phoneme originated in Vulgar Latin from the palatalization of the plosives /t/ and /k/ in some conditions. Later, /t͡s/ changed into /s/ in many Romance languages and dialects. Spanish has not used the symbol since an orthographic reform in the 18th century (which replaced ç with the now-devoiced z), but it was adopted for writing other languages.

In the International Phonetic Alphabet, /ç/ represents the voiceless palatal fricative.


Ķ, ķ (k-cedilla) is the 17th letter of the Latvian alphabet.

In ISO 9, Ķ is the official Latin transliteration of the Cyrillic letter Қ.


Ş, ş (S-cedilla) is a letter of the Azerbaijani, Gagauz, Turkish, and Turkmen alphabets. It is also used in the Roman alphabets of Tatar, Crimean Tatar, Bashkir, Kazakh, Chechen, and Kurdish. It commonly represents /ʃ/, the voiceless postalveolar fricative.


Ţ, ţ - t-cedilla.

Ḑ (minuscule ḑ) or D-cedilla is a letter of the Latin alphabet, consisting of the letter "D" with a cedilla under it. The letter stands for the voiced palatal plosive [ɟ] in the Livonian alphabet. The cedilla traditionally looks like a comma below in Livonian use. In other uses, such as UNGEGN romanizations, it looks like a regular cedilla.

Ꞥ (lowercase ꞥ) is a letter derived from the combination of the Latin letter N and a stroke diacritic. It was used in Latvian orthography until 1921, when it was replaced by Ņ (N with a cedilla).

It is represented in Unicode by:



