A diacritic – also diacritical mark, diacritical point, diacritical sign, or accent – is a glyph added to a letter, or basic glyph. The term derives from the Ancient Greek διακριτικός (diakritikós, "distinguishing"), from διακρίνω (diakrī́nō, "to distinguish"). Diacritic is primarily an adjective, though sometimes used as a noun, whereas diacritical is only ever an adjective. Some diacritical marks, such as the acute ( ´ ) and grave ( ` ), are often called accents. Diacritical marks may appear above or below a letter, or in some other position such as within the letter or between two letters.
The main use of diacritical marks in the Latin script is to change the sound-values of the letters to which they are added. Examples are the diaereses in the borrowed French words naïve and Noël, which show that the vowel with the diaeresis mark is pronounced separately from the preceding vowel; the acute and grave accents, which can indicate that a final vowel is to be pronounced, as in saké and poetic breathèd; and the cedilla under the "c" in the borrowed French word façade, which shows it is pronounced /s/ rather than /k/. In other Latin-script alphabets, they may distinguish between homonyms, such as the French là ("there") versus la ("the") that are both pronounced /la/. In Gaelic type, a dot over a consonant indicates lenition of the consonant in question.
In orthography and collation, a letter modified by a diacritic may be treated either as a new, distinct letter or as a letter–diacritic combination. This varies from language to language, and may vary from case to case within a language. English is the only major modern European language requiring no diacritics for native words (although a diaeresis may be used in words such as "coöperation").
In some cases, letters are used as "in-line diacritics", with the same function as ancillary glyphs, in that they modify the sound of the letter preceding them, as in the case of the "h" in the English pronunciation of "sh" and "th".
The tilde, dot, comma, titlo, apostrophe, bar, and colon are sometimes diacritical marks, but also have other uses.
Not all diacritics occur adjacent to the letter they modify. In the Wali language of Ghana, for example, an apostrophe indicates a change of vowel quality, but occurs at the beginning of the word, as in the dialects ’Bulengee and ’Dolimi. Because of vowel harmony, all vowels in a word are affected, so the scope of the diacritic is the entire word. In abugida scripts, like those used to write Hindi and Thai, diacritics indicate vowels, and may occur above, below, before, after, or around the consonant letter they modify.
The tittle (dot) on the letter i or the letter j of the Latin alphabet originated as a diacritic to clearly distinguish i from the minims (downstrokes) of adjacent letters. It first appeared in the 11th century in the sequence ii (as in ingeníí), then spread to i adjacent to m, n, u, and finally to all lowercase i's. The j, originally a variant of i, inherited the tittle. The shape of the diacritic developed from initially resembling today's acute accent to a long flourish by the 15th century. With the advent of Roman type it was reduced to the round dot we have today.
(ــًــٍــٌـ) tanwīn (تنوين) symbols: Serve a grammatical role in Arabic. The sign ـً is most commonly written in combination with alif, e.g. ـًا.
(ــّـ) shadda: Gemination (doubling) of consonants.
(ٱ) waṣla: Comes most commonly at the beginning of a word. Indicates a type of hamza that is pronounced only when the letter is read at the beginning of an utterance.
(آ) madda: A written replacement for a hamza that is followed by an alif, i.e. (ءا). Read as a glottal stop followed by a long /aː/, e.g. ءاداب، ءاية، قرءان، مرءاة are written out respectively as آداب، آية، قرآن، مرآة. This writing rule does not apply when the alif that follows a hamza is not a part of the stem of the word, e.g. نتوءات is not written out as نتوآت as the stem نتوء does not have an alif that follows its hamza.
(ــٰـ) superscript alif (also "short" or "dagger alif"): A replacement for an original alif that is dropped in the writing out of some rare words, e.g. لاكن is not written out with the original alif found in the word pronunciation; instead it is written out as لٰكن.
ḥarakāt (In Arabic: حركات also called تشكيل tashkīl):
(ــَـ) fatḥa (a)
(ــِـ) kasra (i)
(ــُـ) ḍamma (u)
(ــْـ) sukūn (no vowel)
The ḥarakāt or vowel points serve two purposes:
They serve as a phonetic guide. They indicate the presence of short vowels (fatḥa, kasra, or ḍamma) or their absence (sukūn).
At the last letter of a word, the vowel point reflects the inflection case or conjugation mood.
For nouns, the ḍamma is for the nominative, fatḥa for the accusative, and kasra for the genitive.
For verbs, the ḍamma is for the imperfective, fatḥa for the perfective, and the sukūn is for verbs in the imperative or jussive moods.
Vowel points or tashkīl should not be confused with consonant points or iʿjam (إعجام) – one, two or three dots written above or below a consonant to distinguish between letters of the same or similar form.
These diacritics are used in addition to the acute, grave, and circumflex accents and the diaeresis:
The diacritics 〮 and 〯, known as Bangjeom (방점; 傍點), were used to mark pitch accents in Hangul for Middle Korean. They were written to the left of a syllable in vertical writing and above a syllable in horizontal writing.
The compound letters of the Devanagari script (from the Brahmic family), which are vowels combined with consonants, have diacritics. Here क is shown with vowel diacritics.
A dot above and a dot below a letter represent [a], transliterated as a or ă,
Two diagonally-placed dots above a letter represent [ɑ], transliterated as ā or â or å,
Two horizontally-placed dots below a letter represent [ɛ], transliterated as e or ĕ; often pronounced [ɪ] and transliterated as i in the East Syriac dialect,
Two diagonally-placed dots below a letter represent [e], transliterated as ē,
A dot underneath the Beth represents a soft [v] sound, transliterated as v,
A tilde (~) placed under Gamel represents a [dʒ] sound, transliterated as j,
The letter Waw with a dot below it represents [u], transliterated as ū or u,
The letter Waw with a dot above it represents [o], transliterated as ō or o,
The letter Yōḏ with a dot beneath it represents [i], transliterated as ī or i,
A tilde (~) under Kaph represents a [t͡ʃ] sound, transliterated as ch or č,
A semicircle under Peh represents an [f] sound, transliterated as f or ph.
In addition to the above vowel marks, transliteration of Syriac sometimes includes ə, e̊ or superscript e (or often nothing at all) to represent an original Aramaic schwa that became lost later on at some point in the development of Syriac. Some transliteration schemes find its inclusion necessary for showing spirantization or for historical reasons.
Some non-alphabetic scripts also employ symbols that function essentially as diacritics.
Non-pure abjads (such as Hebrew and Arabic script) and abugidas use diacritics for denoting vowels. Hebrew and Arabic also indicate consonant doubling and change with diacritics; Hebrew and Devanagari use them for foreign sounds. Devanagari and related abugidas also use a diacritical mark called a virama to mark the absence of a vowel. In addition, Devanagari uses the moon-dot chandrabindu ( ँ ).
Unified Canadian Aboriginal Syllabics use several types of diacritics, including the diacritics with alphabetic properties known as Medials and Finals. Although long vowels originally were indicated with a negative line through the Syllabic glyphs, making the glyph appear broken, in the modern forms, a dot above is used to indicate vowel length. In some of the styles, a ring above indicates a long vowel with a [j] off-glide. Another diacritic, the "inner ring", is placed at the glyph's head to modify [p] to [f] and [t] to [θ]. Medials such as the "w-dot" placed next to the Syllabics glyph indicate a [w] being placed between the syllable onset consonant and the nucleus vowel. Finals indicate the syllable coda consonant; some of the syllable coda consonants in word medial positions, such as with the "h-tick", indicate the fortification of the consonant in the syllable following it.
Different languages use different rules to put diacritic characters in alphabetical order. French treats letters with diacritical marks the same as the underlying letter for purposes of ordering and dictionaries.
The Scandinavian languages, by contrast, treat the characters with diacritics ä, ö and å as new and separate letters of the alphabet, and sort them after z. Usually ä is sorted as equal to æ (ash) and ö is sorted as equal to ø (o-slash). Also, aa, when used as an alternative spelling to å, is sorted as such. Other letters modified by diacritics are treated as variants of the underlying letter, with the exception that ü is frequently sorted as y.
Languages that treat accented letters as variants of the underlying letter usually alphabetize words with such symbols immediately after similar unmarked words. For instance, in German where two words differ only by an umlaut, the word without it is sorted first in German dictionaries (e.g. schon and then schön, or fallen and then fällen). However, when names are concerned (e.g. in phone books or in author catalogues in libraries), umlauts are often treated as combinations of the vowel with a suffixed e; Austrian phone books now treat characters with umlauts as separate letters (immediately following the underlying vowel).
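The two German conventions described above can be sketched as sort keys. This is a minimal illustration rather than a full collation algorithm: the helper names `strip_marks`, `dictionary_key`, and `phonebook_key` are hypothetical, and only the three lowercase umlauts are handled in the phone-book variant.

```python
import unicodedata

def strip_marks(text):
    """Remove combining marks so that 'schön' compares like 'schon'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def dictionary_key(word):
    # Primary key: the word without diacritics.
    # Tie-breaker: the original spelling, so the unmarked word comes first.
    return (strip_marks(word).lower(), word)

# Assumption for illustration: umlaut treated as vowel + suffixed e.
PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})

def phonebook_key(name):
    return name.lower().translate(PHONEBOOK)

words = ["schön", "schon", "fällen", "fallen"]
print(sorted(words, key=dictionary_key))

names = ["Müller", "Mueller", "Muller"]
print(sorted(names, key=phonebook_key))
```

With the phone-book key, Müller and Mueller compare as equal and both sort before Muller; production software would normally rely on a locale-aware collation library instead of hand-written keys.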
In Spanish, the grapheme ñ is considered a new letter different from n and collated between n and o, as it denotes a different sound from that of a plain n. But the accented vowels á, é, í, ó, ú are not separated from the unaccented vowels a, e, i, o, u, as the acute accent in Spanish only modifies stress within the word or denotes a distinction between homonyms, and does not modify the sound of a letter.
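The Spanish rule can be sketched the same way: ñ gets its own rank between n and o, while accented vowels are folded onto their base letters. The names `ALPHABET`, `fold`, and `spanish_key` are hypothetical, and the sketch ignores finer tie-breaking between accented and unaccented spellings.

```python
import unicodedata

# Assumed alphabet order for illustration: ñ sits between n and o.
ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
RANK = {letter: index for index, letter in enumerate(ALPHABET)}

def fold(ch):
    """Drop accents from a character, but keep ñ as a letter of its own."""
    if ch == "ñ":
        return ch
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def spanish_key(word):
    # Compose first so that n + combining tilde is seen as the single letter ñ.
    word = unicodedata.normalize("NFC", word).lower()
    folded = "".join(fold(ch) for ch in word)
    # Characters outside the alphabet sort after all letters.
    return [RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in folded]

print(sorted(["ñu", "nube", "oso", "nácar", "nada"], key=spanish_key))
# ['nácar', 'nada', 'nube', 'ñu', 'oso']
```

Note that a plain code-point sort would place ñu after oso, because ñ has a higher code point than every unaccented Latin letter.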
For a comprehensive list of the collating orders in various languages, see Collating sequence.
Generation with computers
Modern computer technology was developed mostly in English-speaking countries, so data formats, keyboard layouts, etc. were developed with a bias favoring English, a language with an alphabet without diacritical marks. This has led some to theorize that the marks and accents may be made obsolete to facilitate the worldwide exchange of data. Efforts have been made to create internationalized domain names that further extend the English alphabet (e.g., "pokémon.com").
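As a rough illustration of how a domain name containing a diacritic is carried over ASCII-only infrastructure, the sketch below uses Python's built-in `idna` codec. That codec implements the older IDNA 2003 rules, so it is shown here only as an example of the general idea, not as a statement of what any particular registry or browser does.

```python
# The domain from the text, written with an escape for the precomposed é.
name = "pok\u00e9mon.com"

# Encode to the ASCII-compatible form used on the wire.
ascii_form = name.encode("idna")

# Decoding recovers the original spelling with the diacritic.
round_trip = ascii_form.decode("idna")

print(ascii_form)
print(round_trip)
```

The encoded label starts with the `xn--` prefix, which marks it as an encoded internationalized label rather than an ordinary ASCII name.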
Depending on the keyboard layout, which differs amongst countries, it is more or less easy to enter letters with diacritics on computers and typewriters. Some have their own keys; some are created by first pressing the key with the diacritic mark followed by the letter to place it on. Such a key is sometimes referred to as a dead key, as it produces no output of its own but modifies the output of the key pressed after it.
In modern Microsoft Windows and Linux operating systems, the keyboard layouts US International and UK International feature dead keys that allow one to type Latin letters with the acute, grave, circumflex, diaeresis, tilde, and cedilla found in Western European languages (specifically, those combinations found in the ISO Latin-1 character set) directly: ¨ + e gives ë, ~ + o gives õ, etc. On Apple Macintosh computers, there are keyboard shortcuts for the most common diacritics; Option-e followed by a vowel places an acute accent, Option-u followed by a vowel gives an umlaut, Option-c gives a cedilla, etc. Diacritics can be composed in most X Window System keyboard layouts, as well as other operating systems, such as Microsoft Windows, using additional software.
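The dead-key behaviour described above can be modelled as a small state machine. The following is a simplified sketch, not a description of how any real keyboard driver is implemented; the `DEAD_KEYS` table and the `type_keys` function are hypothetical names chosen for illustration.

```python
import unicodedata

# Hypothetical dead-key table: symbol on the key -> combining mark.
DEAD_KEYS = {
    "´": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    "¨": "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
}

def type_keys(keys):
    """Turn a sequence of key presses into text, honouring dead keys."""
    output = []
    pending = None  # dead key waiting for the next key press
    for key in keys:
        if pending is None and key in DEAD_KEYS:
            pending = key  # produce nothing yet
            continue
        if pending is not None:
            combined = unicodedata.normalize("NFC", key + DEAD_KEYS[pending])
            # If no precomposed letter exists, emit both symbols as typed.
            output.append(combined if len(combined) == 1 else pending + key)
            pending = None
        else:
            output.append(key)
    return "".join(output)  # a trailing dead key is simply dropped here

print(type_keys(["¨", "e"]))   # ë
print(type_keys(["~", "o"]))   # õ
```

Real layouts differ in how they treat a dead key followed by a letter with no matching combination; the fallback used here is one plausible choice.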
On computers, the availability of code pages determines whether one can use certain diacritics. Unicode solves this problem by assigning every known character its own code; if this code is known, most modern computer systems provide a method to input it. With Unicode, it is also possible to combine diacritical marks with most characters.
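A short sketch of the combining mechanism, using Python's standard `unicodedata` module: the same visible letter can be stored either as one precomposed code point or as a base letter followed by a combining mark, and normalization converts between the two forms.

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))         # 2 1
print(composed == "\u00e9")                   # True
print(unicodedata.name("\u0301"))             # COMBINING ACUTE ACCENT

# NFD goes the other way, splitting the precomposed letter again.
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

Because the two forms render identically but compare as different strings, text is usually normalized to one form before comparison or searching.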
Languages with letters containing diacritics
The following languages have letters that contain diacritics that are considered independent letters distinct from those without diacritics.
Lithuanian. In general usage, where letters appear with the caron (č, š and ž), they are considered as separate letters from c, s or z and collated separately; letters with the ogonek (ą, ę, į and ų), the macron (ū) and the superdot (ė) are considered as separate letters as well, but not given a unique collation order.
Welsh uses the circumflex, diaeresis, acute, and grave on its seven vowels a, e, i, o, u, w, y (â, ê, î, ô, û, ŵ, ŷ, ä, ë, ï, ö, ü, ẅ, ÿ, à, è, ì, ò, ù, ẁ, ỳ, á, é, í, ó, ú, ẃ, ý)
Following spelling reforms since the 1970s, Scottish Gaelic uses graves only, which can be used on any vowel (à, è, ì, ò, ù). Formerly acute accents could be used on á, ó and é, which were used to indicate a specific vowel quality. With the elimination of these accents, the new orthography relies on the reader having prior knowledge of pronunciation of a given word.
Manx uses the single diacritic ç combined with h to give the digraph ⟨çh⟩ (pronounced /tʃ/) to mark the distinction between it and the digraph ⟨ch⟩ (pronounced /h/ or /x/). Other diacritics used in Manx included â, ê, ï, etc. to mark the distinction between two similarly spelled words but with slightly differing pronunciation.
Irish uses only acute accents to mark long vowels, following the 1948 spelling reform.
Breton does not have a single orthography (spelling system), but uses diacritics for a number of purposes. The diaeresis is used to mark that two vowels are pronounced separately and not as a diphthong/digraph. The circumflex is used to mark long vowels, but usually only when the vowel length is not predictable by phonology. Nasalization of vowels may be marked with a tilde, or by following the vowel with the letter ⟨ñ⟩. The plural suffix -où is used as a unified spelling to represent a suffix with a number of pronunciations in different dialects, and to distinguish this suffix from the digraph ⟨ou⟩, which is pronounced as /u:/. An apostrophe is used to distinguish c'h, pronounced /x/ as the digraph ⟨ch⟩ is used in other Celtic languages, from the French-influenced digraph ch, pronounced /ʃ/.
Belarusian, Bulgarian, Russian and Ukrainian have the letter й.
Belarusian and Russian have the letter ё. In Russian, this letter is usually replaced by е in writing, although it has a different pronunciation; the substitution does not change how the word is pronounced. Ё is always used in children's books and in dictionaries. A minimal pair is все (vs'e, "everybody" pl.) and всё (vs'o, "everything" n. sg.). In Belarusian the replacement by е is a mistake; in Russian, it is permissible to use either е or ё for ё, but the former is more common in everyday writing (as opposed to instructional or juvenile writing).
In Bulgarian and Macedonian the possessive pronoun ѝ (ì, "her") is spelled with a grave accent in order to distinguish it from the conjunction и (i, "and").
The acute accent " ́" above any vowel in Cyrillic alphabets is used in dictionaries, books for children and foreign learners to indicate the word stress; it can also be used to disambiguate similarly spelled words with different lexical stresses.
Estonian has a distinct letter õ, which contains a tilde. Estonian "dotted vowels" ä, ö, ü are similar to German, but these are also distinct letters, unlike the German umlauted letters. All four have their own place in the alphabet, between w and x. Carons in š or ž appear only in foreign proper names and loanwords. These are also distinct letters, placed in the alphabet between s and t.
Finnish uses dotted vowels (ä and ö). As in Swedish and Estonian, these are regarded as individual letters, rather than vowel + umlaut combinations (as happens in German). It also uses the characters å, š and ž in foreign names and loanwords. In the Finnish and Swedish alphabets, å, ä and ö collate as separate letters after z, the others as variants of their base letter.
Hungarian uses the umlaut, the acute and double acute accent (unique to Hungarian): (ö, ü), (á, é, í, ó, ú) and (ő, ű). The acute accent indicates the long form of a vowel (in case of i/í, o/ó, u/ú) while the double acute performs the same function for ö and ü. The acute accent can also indicate a different sound (more open, like in case of a/á, e/é). Both long and short forms of the vowels are listed separately in the Hungarian alphabet, but members of the pairs a/á, e/é, i/í, o/ó, ö/ő, u/ú and ü/ű are collated in dictionaries as the same letter.
Livonian has the following letters: ā, ä, ǟ, ḑ, ē, ī, ļ, ņ, ō, ȯ, ȱ, õ, ȭ, ŗ, š, ț, ū, ž.
Faroese uses acutes and other special letters. All are considered separate letters and have their own place in the alphabet: á, í, ó, ú, ý and ø.
Icelandic uses acutes and other special letters. All are considered separate letters, and have their own place in the alphabet: á, é, í, ó, ú, ý, and ö.
Danish and Norwegian use additional characters like the o-slash ø and the a-overring å. These letters come after z and æ in the order ø, å. Historically, the å has developed from a ligature by writing a small superscript a over a lowercase a; if an å character is unavailable, some Scandinavian languages allow the substitution of a doubled a. The Scandinavian languages collate these letters after z, but have different collation standards.
Swedish uses a-diaeresis (ä) and o-diaeresis (ö) in the place of ash (æ) and slashed o (ø) in addition to the a-overring (å). Historically, the diaeresis for the Swedish letters ä and ö, like the German umlaut, developed from a small Gothic e written above the letters. These letters are collated after z, in the order å, ä, ö.
Romanian uses a breve on the letter a (ă) to indicate the sound schwa /ə/, as well as a circumflex over the letters a (â) and i (î) for the sound /ɨ/. Romanian also writes a comma below the letters s (ș) and t (ț) to represent the sounds /ʃ/ and /t͡s/, respectively. These characters are collated after their non-diacritic equivalent.
The Bosnian, Croatian, and Serbian Latin alphabets have the symbols č, ć, đ, š and ž, which are considered separate letters and are listed as such in dictionaries and other contexts in which words are listed according to alphabetical order. They also have one digraph including a diacritic, dž, which is also alphabetized independently, and follows d and precedes đ in the alphabetical order. The Serbian Cyrillic alphabet has no diacritics; instead it has a grapheme (glyph) for every letter of its Latin counterpart (including Latin letters with diacritics and the digraphs dž, lj and nj).
The Czech alphabet uses the acute (á é í ó ú ý), caron (č ď ě ň ř š ť ž), and for one letter (ů) the ring. (Note that in ď and ť the caron is modified to look rather like an apostrophe.)
Polish has the following letters: ą ć ę ł ń ó ś ź ż. These are considered to be separate letters: each of them is placed in the alphabet immediately after its Latin counterpart (e.g. ą between a and b), while ź and ż are placed after z, in that order.
The Slovak alphabet uses the acute (á é í ó ú ý ĺ ŕ), caron (č ď ľ ň š ť ž), umlaut (ä) and circumflex accent (ô).
The basic Slovenian alphabet has the symbols č, š, and ž, which are considered separate letters and are listed as such in dictionaries and other contexts in which words are listed according to alphabetical order. Letters with a caron are placed right after the letters as written without the diacritic. The letter đ may be used in non-transliterated foreign words, particularly names, and is placed after č and before d.
Crimean Tatar includes the distinct Turkish alphabet letters Ç, Ğ, I, İ, Ö, Ş and Ü. Unlike Standard Turkish (but like Cypriot Turkish), Crimean Tatar also has the letter Ñ.
Gagauz includes the distinct Turkish alphabet letters Ç, Ğ, I, İ, Ö and Ü. Unlike Turkish, Gagauz also has the letters Ä, Ê, Ș and Ț. Ș and Ț are derived from the Romanian alphabet for the same sounds. Sometimes the Turkish Ş may be used instead of Ș.
Turkish uses a G with a breve (Ğ), two letters with an umlaut (Ö and Ü, representing two rounded front vowels), two letters with a cedilla (Ç and Ş, representing the affricate /tʃ/ and the fricative /ʃ/), and also possesses a dotted capital İ (and a dotless lowercase ı representing a high unrounded back vowel). In Turkish each of these is a separate letter, rather than a version of another letter; dotted capital İ and lowercase i are the same letter, as are dotless capital I and lowercase ı. Typographically, Ç and Ş are often rendered with a subdot, as in Ṣ; when a hook is used, it tends to have more of a comma shape than the usual cedilla. The new Azerbaijani, Crimean Tatar, and Gagauz alphabets are based on the Turkish alphabet and its same diacriticized letters, with some additions.
Turkmen includes the distinct Turkish alphabet letters Ç, Ö, Ş and Ü. In addition, Turkmen uses A with diaeresis (Ä) to represent /æ/, N with caron (Ň) to represent the velar nasal /ŋ/, Y with acute (Ý) to represent the palatal approximant /j/, and Z with caron (Ž) to represent /ʒ/.
Albanian has two special letters, Ç and Ë, in upper and lower case. They are placed next to the most similar letters in the alphabet, c and e respectively.
Esperanto has the symbols ŭ, ĉ, ĝ, ĥ, ĵ and ŝ, which are included in the alphabet, and considered separate letters.
Hawaiian uses the kahakō (macron) over vowels, although there is some disagreement over considering them as individual letters. The kahakō over a vowel can completely change the meaning of a word that is spelled the same but without the kahakō.
Kurdish uses the symbols Ç, Ê, Î, Ş and Û together with the other 26 standard Latin alphabet symbols.
The Lakota alphabet uses the caron for the letters č, ȟ, ǧ, š, and ž. It also uses the acute accent for stressed vowels á, é, í, ó, ú, áŋ, íŋ, úŋ.
Classical Malay uses some diacritics such as â, ā, é, ḥ, ñ, ô, ṣ, û. The use of diacritics continued until the 19th century.
Maltese uses a C, G, and Z with a dot over them (Ċ, Ġ, Ż), and also has an H with an extra horizontal bar. For uppercase H, the extra bar is written slightly above the usual bar. For lowercase H, the extra bar is written crossing the vertical, like a t, and not touching the lower part (Ħ, ħ). The above characters are considered separate letters. The letter 'c' without a dot has fallen out of use due to redundancy. 'Ċ' is pronounced like the English 'ch' and 'k' is used as a hard c as in 'cat'. 'Ż' is pronounced just like the English 'Z' as in 'Zebra', while 'Z' is used to make the sound of 'ts' in English (like 'tsunami' or 'maths'). 'Ġ' is used as a soft 'G' like in 'geometry', while the 'G' sounds like a hard 'G' like in 'log'. The digraph 'għ' (called għajn after the Arabic letter name ʻayn for غ) is considered separate, and sometimes ordered after 'g', whilst in other volumes it is placed between 'n' and 'o' (the Latin letter 'o' originally evolved from the shape of Phoenician ʻayin, which was traditionally collated after Phoenician nūn).
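The Turkish pairing of dotted İ/i and dotless I/ı mentioned in the list above is a common source of software bugs, because default case conversion assumes the English pairing I/i. The sketch below shows one way to handle it in Python, whose `str.upper()` and `str.lower()` are not locale-aware; the helper names `turkish_upper` and `turkish_lower` are hypothetical.

```python
# Remap the four letters whose case pairs differ from the default rules,
# then let the ordinary conversion handle everything else.
TO_UPPER = str.maketrans({"i": "İ", "ı": "I"})
TO_LOWER = str.maketrans({"I": "ı", "İ": "i"})

def turkish_upper(text):
    return text.translate(TO_UPPER).upper()

def turkish_lower(text):
    return text.translate(TO_LOWER).lower()

print("i".upper())                 # I  (default pairing, wrong for Turkish)
print(turkish_upper("ılık ilk"))   # ILIK İLK
print(turkish_lower("ILIK İLK"))   # ılık ilk
```

The same remapping applies to the alphabets that are based on the Turkish one, as long as they keep the dotted and dotless pairs.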
English is one of the few European languages that does not have many words that contain diacritical marks. Exceptions are unassimilated foreign loanwords, including borrowings from French and, increasingly, Spanish; however, the diacritic is also sometimes omitted from such words. Loanwords that frequently appear with the diacritic in English include café, résumé or resumé (a usage that helps distinguish it from the verb resume), soufflé, and naïveté (see English terms with diacritical marks). In older practice (and even among some orthographically conservative modern writers) one may see examples such as élite, mêlée and rôle.
English speakers and writers once used the diaeresis more often than now in words such as coöperation (from Fr. coopération), zoölogy (from Grk. zoologia), and seeër (now more commonly see-er or simply seer) as a way of indicating that adjacent vowels belonged to separate syllables, but this practice has become far less common. The New Yorker magazine is a major publication that continues to use the diaeresis in place of a dash for clarity and economy of space.
A few English words, out of context, can only be distinguished from others by a diacritic or modified letter, including exposé, lamé, maté, öre, øre, pâté, and rosé. The same is true of résumé, alternately resumé, but nevertheless it is regularly spelled resume. In a few words, diacritics that did not exist in the original have been added for disambiguation, as in maté (from Sp. and Port. mate), saké (the standard Romanization of the Japanese has no accent mark), and Malé (from Dhivehi މާލެ), to clearly distinguish them from the English words "mate", "sake", and "male".
The acute and grave accents are occasionally used in poetry and lyrics: the acute to indicate stress overtly where it might be ambiguous (rébel vs. rebél) or nonstandard for metrical reasons (caléndar), the grave to indicate that an ordinarily silent or elided syllable is pronounced (warnèd, parlìament).
In certain personal names such as Renée and Zoë, often two spellings exist, and the preference will be known only to those close to the person themselves. Even when the name of a person is spelled with a diacritic, like Charlotte Brontë, this may be dropped in English language articles and even official documents such as passports either due to carelessness, the typist not knowing how to enter letters with diacritical marks, or for technical reasons - California, for example, does not allow names with diacritics as the computer system cannot process such characters. They also appear in some worldwide company names and/or trademarks such as Nestlé or Citroën.
The following languages have letter-diacritic combinations that are not considered independent letters.
Afrikaans uses a diaeresis to mark vowels that are pronounced separately and not as one would expect where they occur together, for example voel (to feel) as opposed to voël (bird). The circumflex is used in ê, î, ô and û generally to indicate long close-mid, as opposed to open-mid vowels, for example in the words wêreld (world) and môre (morning, tomorrow). The acute accent is used to add emphasis in the same way as underlining or writing in bold or italics in English, for example Dit is jóú boek (It is your book). The grave accent is used to distinguish between words that are different only in placement of the stress, for example appel (apple) and appèl (appeal) and in a few cases where it makes no difference to the pronunciation but distinguishes between homophones. The two most usual cases of the latter are in the sayings òf... òf (either... or) and nòg... nòg (neither... nor) to distinguish them from of (or) and nog (again, still).
Aymara uses a diacritical horn over p, q, t, k, ch.
Catalan has the following composite characters: à, ç, é, è, í, ï, ó, ò, ú, ü, l·l. The acute and the grave indicate stress and vowel height, the cedilla marks the result of a historical palatalization, the diaeresis indicates either a hiatus, or that the letter u is pronounced when the graphemes gü, qü are followed by e or i, the interpunct (·) distinguishes the different values of ll/l·l.
Dutch uses the diaeresis. For example, in ruïne it means that the u and the i are separately pronounced in their usual way, and not in the way that the combination ui is normally pronounced. Thus it works as a separation sign and not as an indication for an alternative version of the i. Diacritics can be used for emphasis (érg koud for very cold) or for disambiguation between a number of words that are spelled the same when context doesn't indicate the correct meaning (één appel = one apple, een appel = an apple; vóórkomen = to occur, voorkómen = to prevent). Grave and acute accents are used on a very small number of words, mostly loanwords. The ç also appears in some loanwords.
Faroese. Non-Faroese accented letters are not added to the Faroese alphabet. These include é, ö, ü, å and recently also letters like š, ł, and ć.
Filipino has the following composite characters: á, à, â, é, è, ê, í, ì, î, ó, ò, ô, ú, ù, û. The actual use of diacritics for Filipino is, however, uncommon, meant only to distinguish between homonyms with different stresses and meanings that either occur near each other in a text or to aid the reader in ascertaining its otherwise ambiguous meaning. The letter eñe is due to the Spanish alphabet and, too, is considered a separate letter. The diacritics appear in Spanish loanwords and names if Spanish orthography is observed.
Finnish. Carons in š and ž appear only in foreign proper names and loanwords, but may be substituted with sh or zh if and only if it is technically impossible to produce accented letters in the medium. Contrary to Estonian, š and ž are not considered distinct letters in Finnish.
French uses five diacritics. The grave (accent grave) marks the sound /ɛ/ when over an e, as in père ("father") or is used to distinguish words that are otherwise homographs such as a/à ("has"/"to") or ou/où ("or"/"where"). The acute (accent aigu) is only used in "é", modifying the "e" to make the sound /e/, as in étoile ("star"). The circumflex (accent circonflexe) generally denotes that an S once followed the vowel in Old French or Latin, as in fête ("party"), the Old French being feste and the Latin being festum. Whether the circumflex modifies the vowel's pronunciation depends on the dialect and the vowel. The cedilla (cédille) indicates that a normally hard "c" (before the vowels "a", "o", and "u") is to be pronounced /s/, as in ça ("that"). The diaeresis (tréma) indicates that two adjacent vowels that would normally be pronounced as one are to be pronounced separately, as in Noël ("Christmas").
Galician vowels can bear an acute (á, é, í, ó, ú) to indicate stress or difference between two otherwise same written words (é, 'is' vs. e, 'and'), but the diaeresis (trema) is only used with ï and ü to show two separate vowel sounds in pronunciation. Only in foreign words may Galician use other diacritics such as ç (common during the Middle Ages), ê, or à.
German uses the three umlauted characters ä, ö and ü. These diacritics indicate vowel changes. For instance, the word Ofen [ˈoːfən] "oven" has the plural Öfen [ˈøːfən]. The mark originated as a superscript e; a handwritten blackletter e resembles two parallel vertical lines, like a diaeresis. Due to this history, "ä", "ö" and "ü" can be written as "ae", "oe" and "ue" respectively, if the umlaut letters are not available.
Hebrew has various diacritic marks known as niqqud that are used above and below script to represent vowels. These must be distinguished from cantillation, which are keys to pronunciation and syntax.
Irish uses the acute to indicate that a vowel is long: á, é, í, ó, ú. It is known as síneadh fada "long sign" or simply fada "long" in Irish. In the older Gaelic type, overdots are used to indicate lenition of a consonant: ḃ, ċ, ḋ, ḟ, ġ, ṁ, ṗ, ṡ, ṫ.
Italian mainly has the acute and the grave (à, è/é, ì, ò/ó, ù), typically to indicate a stressed syllable that would not be stressed under the normal rules of pronunciation but sometimes also to distinguish between words that are otherwise spelled the same way (e.g. "e", and; "è", is). Despite its rare use, Italian orthography allows the circumflex (î) too, in two cases: it can be found in old literary context (roughly up to 19th century) to signal a syncope (fêro→fecero, they did), or in modern Italian to signal the contraction of ″-ii″ due to the plural ending -i whereas the root ends with another -i; e.g., s. demonio, p. demonii→demonî; in this case the circumflex also signals that the word intended is not demoni, plural of "demone" by shifting the accent (demònî, "devils"; dèmoni, "demons").
Occitan has the following composite characters: á, à, ç, é, è, í, ï, ó, ò, ú, ü, n·h, s·h. The acute and the grave indicate stress and vowel height, the cedilla marks the result of a historical palatalization, the diaeresis indicates either a hiatus, or that the letter u is pronounced when the graphemes gü, qü are followed by e or i, and the interpunct (·) distinguishes the different values of nh/n·h and sh/s·h (i.e., that the letters are supposed to be pronounced separately, not combined into "ny" and "sh").
Portuguese has the following composite characters: à, á, â, ã, ç, é, ê, í, ó, ô, õ, ú. The acute and the circumflex indicate stress and vowel height, the grave indicates crasis, the tilde represents nasalization, and the cedilla marks the result of a historical palatalization.
Acutes are also used in Slavic language dictionaries and textbooks to indicate lexical stress, placed over the vowel of the stressed syllable. This can also serve to disambiguate meaning: in Russian, писа́ть (pisáť) means "to write" but пи́сать (písať) means "to piss", and бо́льшая часть ("the biggest part") contrasts with больша́я часть ("the big part").
Spanish uses the acute and the diaeresis. The acute is used on a vowel in a stressed syllable in words with irregular stress patterns. It can also be used to "break up" a diphthong, as in tío (pronounced [ˈti.o], rather than [ˈtjo] as it would be without the accent). Moreover, the acute can be used to distinguish words that are otherwise spelled alike, such as si ("if") and sí ("yes"), and to distinguish interrogative and exclamatory pronouns from homophones with a different grammatical function, such as donde/¿dónde? ("where"/"where?") or como/¿cómo? ("as"/"how?"). The acute may also be used to avoid typographical ambiguity, as in 1 ó 2 ("1 or 2"; without the acute this might be read as "1 0 2"). The diaeresis is used only over u (ü), so that it is pronounced [w] in the combinations gue and gui, where u is normally silent, for example in ambigüedad. In poetry, the diaeresis may be used on i and u as a way to force a hiatus. As foreshadowed above, in the nasal ñ the tilde (squiggle) is not considered a diacritic sign at all, but a composite part of a distinct glyph, with its own chapter in the dictionary: a glyph that denotes the 15th letter of the Spanish alphabet.
Swedish uses the acute to show non-standard stress, for example in kafé (café) and resumé (résumé). This occasionally helps resolve ambiguities, such as ide (hibernation) versus idé (idea). In these words, the acute is not optional. Some proper names use non-standard diacritics, such as Carolina Klüft and Staël von Holstein. For foreign loanwords the original accents are strongly recommended, unless the word has been infused into the language, in which case they are optional. Hence crème fraîche but ampere. Swedish also has the letters å, ä, and ö, but these are considered distinct letters, not a and o with diacritics.
Tamil does not have any diacritics in itself, but uses the Arabic numerals 2, 3 and 4 as diacritics to represent aspirated, voiced, and voiced-aspirated consonants when Tamil script is used to write long passages in Sanskrit.
Vietnamese uses the acute (dấu sắc), the grave (dấu huyền), the tilde (dấu ngã), the underdot (dấu nặng) and the hook above (dấu hỏi) on vowels as tone indicators.
Welsh uses the circumflex, diaeresis, acute, and grave on its seven vowels a, e, i, o, u, w, y. The most common is the circumflex (which it calls to bach, meaning "little roof", or acen grom "crooked accent", or hirnod "long sign") to denote a long vowel, usually to disambiguate it from a similar word with a short vowel. The rarer grave accent has the opposite effect, shortening vowel sounds that would usually be pronounced long. The acute accent and diaeresis are also occasionally used, to denote stress and vowel separation respectively. The w-circumflex and the y-circumflex are among the most commonly accented characters in Welsh, but unusual in languages generally, and were until recently very hard to obtain in word-processed and HTML documents.
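The German fallback spelling mentioned above ("ae", "oe" and "ue" when the umlaut letters are not available) can be illustrated with a minimal Python sketch. The function name umlaut_fallback and the capitalised forms "Ae", "Oe", "Ue" for upper-case letters are assumptions made for this illustration; they are not prescribed by the text.

```python
import unicodedata

# Fallback spellings for the three German umlaut letters.
# The upper-case mappings ("Ae", "Oe", "Ue") are an assumption of this sketch.
UMLAUT_FALLBACK = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",   # ä ö ü
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",   # Ä Ö Ü
})


def umlaut_fallback(text: str) -> str:
    """Replace umlaut letters with their two-letter fallback spellings."""
    # Normalise to NFC first so that "o" + COMBINING DIAERESIS is also caught.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(UMLAUT_FALLBACK)


print(umlaut_fallback("\u00d6fen"))
```

The mapping is not reversible in general, since "ue" or "oe" can also occur in words that never had an umlaut.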
Several languages that are not written with the Roman alphabet are transliterated, or romanized, using diacritics. Examples:
Arabic has several romanisations, depending on the type of application, region, intended audience, country, etc. Many of them use diacritics extensively; e.g., some methods use an underdot for rendering emphatic consonants (ṣ, ṭ, ḍ, ẓ, ḥ). The macron is often used to render long vowels. š is often used for /ʃ/, ġ for /ɣ/.
Sanskrit, as well as many of its descendants, like Hindi and Bengali, uses a lossless romanization system. This includes several letters with diacritical markings, such as the macron (ā, ī, ū), over- and underdots (ṛ, ḥ, ṃ, ṇ, ṣ, ṭ, ḍ) as well as a few others (ś, ñ).
Possibly the greatest number of combining diacritics required to compose a valid character in any script encoded in Unicode is 8, for the "well-known grapheme cluster in Tibetan and Ranjana scripts", ཧྐྵྨླྺྼྻྂ, or HAKṢHMALAWARAYAṀ.
It is U+0F67 U+0F90 U+0FB5 U+0FA8 U+0FB3 U+0FBA U+0FBC U+0FBB U+0F82, or:
TIBETAN LETTER HA + TIBETAN SUBJOINED LETTER KA + TIBETAN SUBJOINED LETTER SSA + TIBETAN SUBJOINED LETTER MA + TIBETAN SUBJOINED LETTER LA + TIBETAN SUBJOINED LETTER FIXED-FORM WA + TIBETAN SUBJOINED LETTER FIXED-FORM RA + TIBETAN SUBJOINED LETTER FIXED-FORM YA + TIBETAN SIGN NYI ZLA NAA DA.
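The code point sequence above can be inspected with Python's standard unicodedata module. The following is a minimal sketch; the helper names describe and count_marks are chosen for this illustration only.

```python
import unicodedata

# The nine code points listed above: one base letter followed by eight marks.
CLUSTER = "\u0f67\u0f90\u0fb5\u0fa8\u0fb3\u0fba\u0fbc\u0fbb\u0f82"


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, character name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


def count_marks(text: str) -> int:
    """Count characters whose general category is a mark (Mn, Mc or Me)."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("M"))


for code_point, name, category in describe(CLUSTER):
    print(code_point, category, name)
print("combining marks:", count_marks(CLUSTER))
```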
Some users have explored the limits of rendering in web browsers and other software by "decorating" words with multiple nonsensical diacritics per character. The result is called "Zalgo text". The composed bogus characters and words can be copied and pasted normally via the system clipboard.
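Software that has to display such text sometimes limits the number of marks stacked on one base character. The sketch below shows one possible approach, not a standard algorithm: the function name limit_marks and the default cap of two marks are assumptions of this example.

```python
import unicodedata


def limit_marks(text: str, max_marks: int = 2) -> str:
    """Keep at most max_marks combining marks after each base character."""
    kept = []
    run = 0  # number of marks seen since the last non-mark character
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > max_marks:
                continue  # drop marks beyond the cap
        else:
            run = 0
        kept.append(ch)
    # Recompose so that ordinary accented letters come back in NFC form.
    return unicodedata.normalize("NFC", "".join(kept))


zalgo = "a" + "\u0301\u0302\u0303\u0304\u0305"  # one letter, five stacked marks
cleaned = limit_marks(zalgo)
print(len(zalgo), len(unicodedata.normalize("NFD", cleaned)))
```

Ordinary words such as naïve pass through unchanged, because each letter carries at most one mark.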
The Arabic script has numerous diacritics, including i'jam ⟨إِعْجَام⟩ - i‘jām, consonant pointing and tashkil ⟨تَشْكِيل⟩ - tashkīl, supplementary diacritics. The latter include the ḥarakāt ⟨حَرَكَات⟩ vowel marks - singular: ḥarakah ⟨حَرَكَة⟩.
The Arabic script is an impure abjad, where short consonants and long vowels are represented by letters but short vowels and consonant length are not generally indicated in writing. Tashkīl is optional and represents the missing vowels and consonant length. Modern Arabic is always written with the i‘jām (consonant pointing), but only religious texts, children's books and works for learners are written with the full tashkīl (vowel guides and consonant length).
A bar or stroke is a modification consisting of a line drawn through a grapheme. It may be used as a diacritic to derive new letters from old ones, or simply as an addition to make a grapheme more distinct from others. It can take the form of a vertical bar, slash, or crossbar.
A stroke is sometimes drawn through the numbers 7 (horizontal overbar) and 0 (overstruck foreslash), to make them more distinguishable from the number 1 and the letter O, respectively.
For the specific usages of various letters with bars and strokes, see their individual articles.
In Unicode, there are bars at U+0335 ◌̵ COMBINING SHORT STROKE OVERLAY, U+0336 ◌̶ COMBINING LONG STROKE OVERLAY, U+0337 ◌̷ COMBINING SHORT SOLIDUS OVERLAY, and U+0338 ◌̸ COMBINING LONG SOLIDUS OVERLAY.
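As an illustration, the four combining overlays listed above can be appended to ordinary letters in Python. This is a minimal sketch; the dictionary keys and the function name overlay are hypothetical names chosen for the example, and how the result looks depends on the font and renderer.

```python
import unicodedata

# The four combining overlay characters named above.
OVERLAYS = {
    "short_stroke": "\u0335",
    "long_stroke": "\u0336",
    "short_solidus": "\u0337",
    "long_solidus": "\u0338",
}


def overlay(text: str, mark: str = OVERLAYS["long_stroke"]) -> str:
    """Append a combining overlay to every non-space character."""
    return "".join(ch if ch.isspace() else ch + mark for ch in text)


for key, mark in OVERLAYS.items():
    print(key, f"U+{ord(mark):04X}", unicodedata.name(mark))
print(overlay("struck out"))
```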
The Batak script, natively known as surat Batak, surat na sampulu sia (the nineteen letters), or si-sia-sia, is a writing system used to write the Austronesian Batak languages spoken by several million people on the Indonesian island of Sumatra. The script may be derived from the Kawi and Pallava script, ultimately derived from the Brahmi script of India, or from the hypothetical Proto-Sumatran script influenced by Pallava.
The comma ( , ) is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe ( ' ) or single closing quotation mark in many typefaces, but it differs from them in being placed on the baseline of the text. Some typefaces render it as a small line, slightly curved or straight but inclined from the vertical, or with the appearance of a small, filled-in figure 9.
The comma is used in many contexts and languages, mainly for separating parts of a sentence such as clauses, and items in lists, particularly when there are three or more items listed. The word comma comes from the Greek κόμμα (kómma), which originally meant a cut-off piece; specifically, in grammar, a short clause.
A comma-shaped mark is used as a diacritic in several writing systems, and is considered distinct from the cedilla. The rough and smooth breathings (ἁ, ἀ) appear above the letter in Ancient Greek, and the comma diacritic appears below the letter in Latvian, Romanian, and Livonian.
The diaeresis (dy-ERR-i-sis; plural: diaereses; also spelled diæresis or dieresis and also known as the tréma or trema) and the umlaut are two homoglyphic diacritical marks that consist of two dots ( ¨ ) placed over a letter, usually a vowel. When that letter is an i or a j, the diacritic replaces the tittle: ï.
The diaeresis and the umlaut are diacritics marking two distinct phonological phenomena. The diaeresis represents the phenomenon also known as diaeresis or hiatus, in which a vowel letter is pronounced separately from an adjacent vowel and not as part of a digraph or diphthong. The umlaut, in contrast, indicates a sound shift.
These two diacritics originated separately; the diaeresis is considerably older.
Nevertheless, in modern computer systems using Unicode, the umlaut and diaeresis diacritics are identically encoded, e.g. U+00E4 ä LATIN SMALL LETTER A WITH DIAERESIS (HTML &#228; · &auml;) represents both a-umlaut and a-diaeresis (much like the hyphen-minus code point represents both a hyphen and often a minus sign).
The same symbol is also used as a diacritic in other cases, distinct from both diaeresis and umlaut. For example, in Albanian and Tagalog ë represents a schwa.
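The shared encoding of umlaut and diaeresis can be checked with Python's standard library. The sketch below decomposes U+00E4 and resolves the numeric and named HTML character references for it; the helper name decompose is an assumption of this example.

```python
import html
import unicodedata

A_WITH_DIAERESIS = "\u00e4"  # used for both a-umlaut and a-diaeresis


def decompose(ch: str) -> list[str]:
    """Return the character names of the canonical decomposition (NFD)."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


print(unicodedata.name(A_WITH_DIAERESIS))
print(decompose(A_WITH_DIAERESIS))
# Both HTML character references resolve to the same single code point.
print(html.unescape("&#228;") == html.unescape("&auml;") == A_WITH_DIAERESIS)
```

Unicode names the mark a diaeresis in both cases, so the distinction between umlaut and diaeresis has to come from the language of the text, not from the code point.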
In typesetting, the hook or tail is a diacritic mark attached to letters in many alphabets. In shape it looks like a hook, and it can be attached below as a descender, on top as an ascender and sometimes to the side. The orientation of the hook can change its meaning: when it is below and curls to the left it can be interpreted as a palatal hook, and when it curls to the right it is called a hook tail or tail and can be interpreted as a retroflex hook. It should not be confused with the hook above, a diacritical mark used in Vietnamese, or the rhotic hook, used in the International Phonetic Alphabet.
The International Phonetic Alphabet (IPA) is an alphabetic system of phonetic notation based primarily on the Latin alphabet. It was devised by the International Phonetic Association in the late 19th century as a standardized representation of the sounds of spoken language. The IPA is used by lexicographers, foreign language students and teachers, linguists, speech-language pathologists, singers, actors, constructed language creators and translators.
The IPA is designed to represent only those qualities of speech that are part of oral language: phones, phonemes, intonation and the separation of words and syllables. To represent additional qualities of speech, such as tooth gnashing, lisping, and sounds made with a cleft lip and cleft palate, an extended set of symbols, the extensions to the International Phonetic Alphabet, may be used.
IPA symbols are composed of one or more elements of two basic types, letters and diacritics. For example, the sound of the English letter ⟨t⟩ may be transcribed in IPA with a single letter, [t], or with a letter plus diacritics, [t̺ʰ], depending on how precise one wishes to be. Often, slashes are used to signal broad or phonemic transcription; thus, /t/ is less specific than, and could refer to, either [t̺ʰ] or [t], depending on the context and language.
Occasionally letters or diacritics are added, removed or modified by the International Phonetic Association. As of the most recent change in 2005, there are 107 letters, 52 diacritics and four prosodic marks in the IPA. These are shown in the current IPA chart, also posted below in this article and at the website of the IPA.
Inverted breve or arch is a diacritical mark, shaped like the top half of a circle ( ̑ ), that is, like an upside-down breve (˘). It looks similar to the circumflex (ˆ), but the circumflex has a sharp tip; the inverted breve is rounded: compare Â â Ê ê Î î Ô ô Û û (circumflex) versus Ȃ ȃ Ȇ ȇ Ȋ ȋ Ȏ ȏ Ȗ ȗ (inverted breve).
The inverted breve can occur above or below the letter. It is not used in any natural language alphabet, but only as a phonetic indicator, though it is identical in form to the Ancient Greek circumflex.
The Khmer script (Khmer: អក្សរខ្មែរ; IPA: [ʔaʔsɑː kʰmaːe]) is an abugida (alphasyllabary) script used to write the Khmer language (the official language of Cambodia). It is also used to write Pali in the Buddhist liturgy of Cambodia and Thailand.
The Khmer script was adapted from the Pallava script, which ultimately descended from the Brahmi script, which was used in southern India and South East Asia during the 5th and 6th centuries AD. The oldest dated inscription in Khmer was found at Angkor Borei District in Takéo Province south of Phnom Penh and dates from 611. The modern Khmer script differs somewhat from the earlier forms seen on the inscriptions of the ruins of Angkor. The Thai and Lao scripts are descendants of an older form of the Khmer script.
Khmer is written from left to right. Words within the same sentence or phrase are generally run together with no spaces between them. Consonant clusters within a word are "stacked", with the second (and occasionally third) consonant being written in reduced form under the main consonant. Originally there were 35 consonant characters, but modern Khmer uses only 33. Each character represents a consonant sound together with an inherent vowel, either â or ô; in many cases, in the absence of another vowel mark, the inherent vowel is to be pronounced after the consonant.
There are some independent vowel characters, but vowel sounds are more commonly represented as dependent vowels, additional marks accompanying a consonant character, and indicating what vowel sound is to be pronounced after that consonant (or consonant cluster). Most dependent vowels have two different pronunciations, depending in most cases on the inherent vowel of the consonant to which they are added. There are also a number of diacritics used to indicate further modifications in pronunciation. The script also includes its own numerals and punctuation marks.
Linguolabials or apicolabials are consonants articulated by placing the tongue tip or blade against the upper lip, which is drawn downward to meet the tongue. They represent one extreme of a coronal articulatory continuum which extends from linguolabial to subapical palatal places of articulation. Cross-linguistically, linguolabial consonants are very rare, but they do not represent a particularly exotic combination of articulatory configurations, unlike click consonants or ejectives. They are found in a cluster of languages in Vanuatu, in the Kajoko dialect of Bijago in Guinea-Bissau, and in Umotína (a recently extinct Bororoan language of Brazil), Hawaiian Creole English and as paralinguistic sounds elsewhere. They are also relatively common in disordered speech, and the diacritic is specifically provided for in the extensions to the IPA.
Linguolabial consonants are transcribed in the International Phonetic Alphabet by adding the "seagull" diacritic, U+033C ̼ COMBINING SEAGULL BELOW, to the corresponding alveolar consonant, or with the apical diacritic, U+033A ̺ COMBINING INVERTED BRIDGE BELOW, on the corresponding bilabial consonant.
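The two transcription conventions just described amount to appending one combining character to a base letter. A minimal Python sketch follows; the function names are hypothetical, and the choice of example letters (t, d, n for alveolars; p, b, m for bilabials) is an assumption of this illustration.

```python
import unicodedata

SEAGULL_BELOW = "\u033c"          # COMBINING SEAGULL BELOW
INVERTED_BRIDGE_BELOW = "\u033a"  # COMBINING INVERTED BRIDGE BELOW


def linguolabial_from_alveolar(letter: str) -> str:
    """Mark an alveolar consonant letter as linguolabial with the seagull."""
    return letter + SEAGULL_BELOW


def linguolabial_from_bilabial(letter: str) -> str:
    """Mark a bilabial consonant letter as apical with the inverted bridge."""
    return letter + INVERTED_BRIDGE_BELOW


print([linguolabial_from_alveolar(c) for c in "tdn"])
print([linguolabial_from_bilabial(c) for c in "pbm"])
print(unicodedata.name(SEAGULL_BELOW), "/", unicodedata.name(INVERTED_BRIDGE_BELOW))
```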
A macron is a diacritical mark: it is a straight bar (¯) placed above a letter, usually a vowel. Its name derives from Greek μακρόν (makrón), meaning 'long', since it was originally used to mark long or heavy syllables in Greco-Roman metrics. It now more often marks a long vowel. In the International Phonetic Alphabet, the macron is used to indicate a mid tone; the sign for a long vowel is instead a modified triangular colon ⟨ː⟩.
The opposite is the breve ⟨˘⟩, which marks a short or light syllable or a short vowel.
The open-mid central unrounded vowel, or low-mid central unrounded vowel, is a type of vowel sound, used in some spoken languages. The symbol in the International Phonetic Alphabet that represents this sound is ⟨ɜ⟩. The IPA symbol is not the digit ⟨3⟩ or the Cyrillic small letter Ze (з). The symbol is instead a reversed Latinized variant of the lowercase epsilon, ɛ. The value was specified only in 1993; until then, it had been transcribed ⟨ɛ̈⟩.
The ⟨ɜ⟩ letter may be used with a raising diacritic ⟨ɜ̝⟩, to denote the mid central unrounded vowel. It may also be used with a lowering diacritic ⟨ɜ̞⟩, to denote the near-open central unrounded vowel.
Conversely, ⟨ə⟩, the symbol for the mid central vowel may be used with a lowering diacritic ⟨ə̞⟩ to denote the open-mid central unrounded vowel, although that is more accurately written with an additional unrounding diacritic ⟨ə̞͑⟩ to explicitly denote the lack of rounding (the canonical value of IPA ⟨ə⟩ is undefined for rounding).
Similarly, the symbol for the near-open central vowel with a raising diacritic ⟨ɐ̝⟩ may be used instead of ⟨ɜ⟩. Again, an additional unrounding diacritic ⟨ɐ̝͑⟩ may be used to explicitly denote the unroundedness, as the canonical value of IPA ⟨ɐ⟩ is also not defined for rounding.
The tilde (˜ or ~) is a grapheme with several uses. The name of the character came into English from Spanish and from Portuguese, which in turn came from the Latin titulus, meaning "title" or "superscription". The reason for the name was that it was originally written over a letter as a scribal abbreviation, as a "mark of suspension", shown as a straight line when used with capitals. Thus the commonly used words Anno Domini were frequently abbreviated to Ao Dñi, with an elevated terminal "o" and a suspension mark placed over the "n". Such a mark could denote the omission of one letter or several letters. This saved on the expense of the scribe's labour and the cost of vellum and ink. Medieval European charters written in Latin are largely made up of such abbreviated words with suspension marks and other abbreviations; only uncommon words were given in full. The tilde has since been applied to a number of other uses as a diacritic mark or a character in its own right. These are encoded in Unicode at U+0303 ◌̃ COMBINING TILDE and U+007E ~ TILDE (as a spacing character), and there are additional similar characters for different roles. In lexicography, the latter kind of tilde and the swung dash (⁓) are used in dictionaries to indicate the omission of the entry word.
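The difference between the combining tilde and the spacing tilde can be seen with Python's unicodedata module. This is a minimal sketch; the constant names and the helper add_tilde are assumptions of this example.

```python
import unicodedata

COMBINING_TILDE = "\u0303"  # attaches to the preceding letter
SPACING_TILDE = "\u007e"    # stands on its own as a character


def add_tilde(letter: str) -> str:
    """Attach a combining tilde and compose to a single code point if one exists."""
    return unicodedata.normalize("NFC", letter + COMBINING_TILDE)


n_tilde = add_tilde("n")
print(f"U+{ord(n_tilde):04X}", unicodedata.name(n_tilde))
# A non-zero combining class marks a character that attaches to a base letter.
print(unicodedata.combining(COMBINING_TILDE), unicodedata.combining(SPACING_TILDE))
```

Letters without a precomposed form keep the combining tilde as a separate code point after normalization.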
A tittle or superscript dot is a small distinguishing mark, such as a diacritic or the dot on a lowercase i or j. The tittle is an integral part of the glyph of i and j, but diacritic dots can appear over other letters in various languages. In most languages, the tittle of i or j is omitted when a diacritic is placed in the tittle's usual position (as í or ĵ), but not when the diacritic appears elsewhere (as į, ɉ).
Í, í (i-acute) is a letter in the Faroese, Hungarian, Icelandic, Czech, Slovak, and Tatar languages, where it often indicates a long /i/ vowel. This form also appears in Catalan, Irish, Italian, Occitan, Portuguese, Spanish, Galician, Leonese, Navajo, and Vietnamese as a variant of the letter "i". In Latin, the long i ⟨ꟾ⟩ is used instead of ⟨í⟩ for a long i-vowel.
Ü, or ü, is a character that typically represents a close front rounded vowel [y]. It is classified as a separate letter in several extended Latin alphabets (including Azeri, Estonian, Hungarian and Turkish), but as the letter U with an umlaut/diaeresis in others such as Catalan, French, Galician, German, Occitan and Spanish. Although not part of the alphabet, it also appears in languages such as Swedish when retained in foreign names and words; in native words Swedish writes the sound solely as y. A small number of Dutch words also use this character, with the two dots functioning as a diaeresis.