Mutual intelligibility

In linguistics, mutual intelligibility is a relationship between languages or dialects in which speakers of different but related varieties can readily understand each other without prior familiarity or special effort. It is sometimes used as an important criterion for distinguishing languages from dialects, although sociolinguistic factors are often also used.

Intelligibility between languages can be asymmetric, with speakers of one understanding more of the other than speakers of the other understanding the first. When it is relatively symmetric, it is characterized as "mutual". It exists in differing degrees among many related or geographically proximate languages of the world, often in the context of a dialect continuum.

Linguistic distance is a measure of how different languages are from one another. The greater the linguistic distance, the lower the mutual intelligibility. One commonly used metric is the Levenshtein distance.
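As a rough illustration of how an edit-distance metric can be turned into a distance score, the following Python sketch computes the Levenshtein distance between word pairs and averages it over a small list. The normalization by the longer word's length, the helper names, and the two example word pairs are assumptions made for illustration only; they are not the procedure behind any figure cited in this article.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_distance(pairs) -> float:
    """Average normalized distance over a list of word pairs."""
    return sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical two-item word list, for illustration only.
toy_pairs = [("water", "wasser"), ("huis", "haus")]
print(round(mean_distance(toy_pairs), 3))
```

A realistic measurement would use a much larger, standardized word list and often phonetic transcriptions rather than spellings.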


Achieving moderate proficiency or understanding in a language (called L2) other than one's first language (L1) typically requires considerable time and effort through study or practical application.[1] Advanced speakers of a second language typically aim for intelligibility, especially in situations where they work in their second language and the need to be understood is high.[2] However, many groups of languages are partly mutually intelligible; that is, most speakers of one language find it relatively easy to achieve some degree of understanding in the related language or languages. Often the languages are genetically related, and they are likely to be similar to each other in grammar, vocabulary, pronunciation, or other features.

Intelligibility among languages can vary between individuals or groups within a language population according to their knowledge of various registers and vocabulary in their own language, their exposure to additional related languages, their interest in or familiarity with other cultures, the domain of discussion, psycho-cognitive traits, the mode of language used (written vs. oral), and other factors.

Mutually intelligible languages or varieties of one language

There is no formal distinction between two distinct languages and two varieties of a single language, but some linguists use mutual intelligibility as one of the primary factors in deciding between the two cases.[3][4]

Some linguists[5] claim that mutual intelligibility is, ideally at least, the primary criterion separating languages from dialects. On the other hand, speakers of closely related languages can often communicate with each other; thus there are varying degrees of mutual intelligibility, and often other criteria are also used. As an example, in the case of a linear dialect continuum that shades gradually between varieties, where speakers near the center can understand the varieties at both ends, but speakers at one end cannot understand the speakers at the other end, the entire chain is often considered a single language. If the central varieties then die out and only the varieties at both ends survive, they may then be reclassified as two languages, even though no actual language change has occurred.
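The dialect-continuum argument above can be sketched as a small graph problem: treat each variety as a node, link two varieties when their speakers understand each other, and call everything connected by a chain of links one "language". The Python sketch below is only an illustration under that assumption; the variety names A to E, the links, and the function name `groups` are hypothetical.

```python
from collections import deque


def groups(varieties, intelligible_pairs):
    """Group varieties that are connected by a chain of pairwise
    mutual intelligibility (connected components of the graph)."""
    neighbours = {v: set() for v in varieties}
    for a, b in intelligible_pairs:
        # Ignore links that involve a variety no longer spoken.
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, result = set(), []
    for start in varieties:
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first search from this variety
            v = queue.popleft()
            component.append(v)
            for n in sorted(neighbours[v]):
                if n not in seen:
                    seen.add(n)
                    queue.append(n)
        result.append(sorted(component))
    return result


# Hypothetical linear continuum: only adjacent varieties understand each other.
chain = ["A", "B", "C", "D", "E"]
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

print(groups(chain, links))       # the whole chain forms one group
print(groups(["A", "E"], links))  # central varieties gone: two groups
```

The second call uses the same links but only the two end varieties, which mirrors the point that a reclassification into two languages can follow from the loss of intermediate varieties rather than from any change in the surviving ones.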

In addition, political and social conventions often override considerations of mutual intelligibility. For example, the varieties of Chinese are often considered a single language even though there is usually no mutual intelligibility between geographically separated varieties. Another similar example would be varieties of Arabic. In contrast, there is often significant intelligibility between different Scandinavian languages, but as each of them has its own standard form, they are classified as separate languages.[6] There is also significant intelligibility between Thai languages of different regions of Thailand.

To deal with the conflict in cases such as Arabic, Chinese and German, the term Dachsprache (a sociolinguistic "umbrella language") is sometimes seen: Chinese and German are languages in the sociolinguistic sense even though some speakers cannot understand each other without recourse to a standard or prestige form.

Asymmetric intelligibility

Asymmetric intelligibility refers to two languages that are considered partially mutually intelligible, but where one group of speakers has more difficulty understanding the other language than the other way around. There can be various reasons for this. If, for example, one language is related to another but has simplified its grammar, speakers of the original language may understand the simplified language more easily than the reverse. For example, Dutch speakers tend to find it easier to understand Afrikaans than vice versa as a result of Afrikaans's simplified grammar.[7]

Perhaps the most common reason for apparent asymmetric intelligibility is that speakers of one variety have more exposure to the other than vice versa. For example, speakers of Scottish English have frequent exposure to standard American English through movies and TV programs, whereas speakers of American English have little exposure to Scottish English; hence, American English speakers often find it difficult to understand Scottish English or, especially, Scots (which differs significantly from standard Scottish English), whereas Scots tend to have few problems understanding standard American English.

The North Germanic languages spoken in Scandinavia form a dialect continuum in which the two most distant dialects have almost no mutual intelligibility. As such, spoken Danish and Swedish normally have low mutual intelligibility,[7] but Swedes in the Öresund region (including Malmö and Helsingborg), across a strait from the Danish capital Copenhagen, understand Danish somewhat better, largely due to the proximity of the region to Danish-speaking areas (see Mutual intelligibility in North Germanic languages). The Bokmål written standard of Norwegian originates from Dano-Norwegian, a koiné language that evolved among the urban elite in Norwegian cities during the later years of the union with Denmark, when Norway was under Danish rule. Additionally, Norwegian assimilated a considerable amount of Danish vocabulary as well as traditional Danish expressions.[7] As a consequence, spoken mutual intelligibility is not reciprocal.[7]

Similarly, in Germany and Italy, standard German or Italian speakers may have great difficulty understanding the "dialects" from regions other than their own, but virtually all "dialect" speakers learn the standard languages in school and from the media.

List of mutually intelligible languages

Below is an incomplete list of fully and partially mutually intelligible varieties sometimes considered languages.

Written and spoken forms

Spoken forms mainly

Written forms mainly

  • Icelandic: Faroese[54]
  • French: With some Romance languages.[55]
  • German: Dutch. Standard Dutch and Standard German show a limited degree of mutual intelligibility when written. One study of the written languages found that Dutch speakers could correctly translate 50.2% of the German words provided, while the German test subjects could correctly translate 41.9% of the Dutch equivalents. In terms of orthography, 22% of the vocabulary of Dutch and German is identical or nearly identical (including most commonly used vocabulary). The Levenshtein distance between written Dutch and German is 50.4%, as opposed to 61.7% between English and Dutch.[56][57] The spoken languages are much more difficult for both groups to understand, with studies showing that Dutch speakers have slightly less difficulty understanding German speakers than vice versa. It remains unclear whether this asymmetry is due to prior knowledge of the language (Dutch people being more exposed to German than vice versa), better knowledge of another related language (English), or other non-linguistic reasons.[56][58]
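The translation scores quoted for written Dutch and German can be used to illustrate how asymmetry might be summarized numerically. The Python sketch below stores one score per direction and reports the difference and the average; the data layout and the function names are assumptions for illustration, and only the two percentages come from the study cited above.

```python
# Percentage of words correctly translated, keyed by (reader, language read).
# The two figures are those quoted in the list entry above.
scores = {
    ("Dutch", "German"): 50.2,  # Dutch speakers translating German words
    ("German", "Dutch"): 41.9,  # German speakers translating Dutch words
}


def asymmetry(scores, a, b):
    """Positive when speakers of a understand b better than the reverse."""
    return scores[(a, b)] - scores[(b, a)]


def mean_score(scores, a, b):
    """Single symmetric summary: the average of the two directions."""
    return (scores[(a, b)] + scores[(b, a)]) / 2


print(round(asymmetry(scores, "Dutch", "German"), 1))   # 8.3 percentage points
print(round(mean_score(scores, "Dutch", "German"), 2))
```

A difference of this size says nothing by itself about its cause, which, as noted above, may be exposure or knowledge of a third language rather than anything structural.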

List of mutually intelligible varieties

Dialects or registers of one language sometimes considered separate languages

  • Assyrian Neo-Aramaic: Chaldean Neo-Aramaic,[61] Lishana Deni,[62] Hértevin,[63] Bohtan Neo-Aramaic,[64] and Senaya[65][66] – the standard forms are structurally the same language and thus mutually intelligible to a significant degree. As such, these varieties are occasionally considered dialects of Assyrian Neo-Aramaic. They are only considered separate languages for geographical, political and religious reasons.[46]
  • Catalan: Valencian – the standard forms are structurally the same language, and hence mutually intelligible. They are considered separate languages only for political reasons.[67]
  • Hindustani: Hindi and Urdu[68] – the standard forms are separate registers of structurally the same language (called Hindustani or Hindi-Urdu), with Hindi written in Devanagari and Urdu mainly in a Perso-Arabic script, and with Hindi drawing its vocabulary mainly from Sanskrit and Urdu drawing it mainly from Persian and Arabic.
  • Malay: Indonesian (the normative register regulated by Indonesia)[69] and Standard Malay/Malaysian (the normative register used in Malaysia, Brunei and Singapore). Both varieties are based on the same material basis and hence are generally mutually intelligible, despite the numerous lexical differences.[70]
  • Serbo-Croatian: Bosnian, Croatian, Montenegrin, and Serbian – the national varieties are structurally the same language, all constituting normative registers of the Shtokavian dialect, and hence mutually intelligible,[4] spoken and written (if the Latin alphabet is used).[71] For political reasons, they are sometimes considered distinct languages.[72]
    • The non-standard vernacular dialects of Serbo-Croatian (Kajkavian, Chakavian and Torlakian) are considered by some linguists to be separate, albeit closely related, languages rather than dialects of Serbo-Croatian (Shtokavian), as Shtokavian has its own set of subdialects. Their mutual intelligibility varies greatly, both among the dialects themselves and with other languages. Kajkavian has higher mutual intelligibility with Slovenian than with the national varieties of Shtokavian, while Chakavian has low mutual intelligibility with either, in part due to a large number of loanwords from Venetian. Torlakian (considered a subdialect of Serbian Old Shtokavian by some) has a significant level of mutual intelligibility with Macedonian and Bulgarian.[73] All South Slavic languages in effect form a large dialect continuum of gradually mutually intelligible varieties, depending on the distance between the areas where they are spoken.
  • Tagalog: Filipino[74] – the national language of the Philippines, Filipino, is based almost entirely on the Luzon dialects of Tagalog
  • Romanian: Moldovan – the standard forms are structurally the same language, and hence mutually intelligible. They are considered separate languages only for political reasons.[75] Moldovan does, however, have more loanwords from Russian and Ukrainian due to historical East Slavic influence on the region, but not to an extent that would affect mutual intelligibility.

Dialect continua


Because of the difficulty of imposing boundaries on a continuum, various counts of the Romance languages are given; in The Linguasphere register of the world’s languages and speech communities David Dalby lists 23 based on mutual intelligibility:[76]

See also


Bangba language

Bangba (Abangba) is a minor Ubangian language of the Democratic Republic of the Congo. It is not close enough to other Eastern Ngbaka languages for mutual intelligibility.

Banka language

Banka, or Bankagooma, is a minor Mande language of Mali. There is a reasonable degree of mutual intelligibility with Duun.

Cakfem-Mushere language

Cakfem-Mushere is an Afro-Asiatic language cluster spoken in Plateau State, Nigeria. Dialects are Kadim-Kaban and Jajura. Mutual intelligibility with Mwaghavul is high. Mushere is very close to Mwaghavul. Cakfem has two varieties, namely Outer Cakfem and Inner Cakfem. Outer Cakfem is very similar to Mwaghavul, but Inner Cakfem is more divergent, as Mwaghavul speakers have trouble understanding Inner Cakfem.


The term dialect (from Latin dialectus, dialectos, from the Ancient Greek word διάλεκτος, diálektos, "discourse", from διά, diá, "through" and λέγω, légō, "I speak") is used in two distinct ways to refer to two different types of linguistic phenomena:

One usage refers to a variety of a language that is a characteristic of a particular group of the language's speakers. Under this definition, the dialects or varieties of a particular language are closely related and, despite their differences, are most often largely mutually intelligible, especially if close to one another on the dialect continuum. The term is applied most often to regional speech patterns, but a dialect may also be defined by other factors, such as social class or ethnicity. A dialect that is associated with a particular social class can be termed a sociolect, a dialect that is associated with a particular ethnic group can be termed an ethnolect, and a geographical/regional dialect may be termed a regiolect (alternative terms include 'regionalect', 'geolect', and 'topolect'). According to this definition, any variety of a given language constitutes "a dialect", including any standard varieties. In this case, the distinction between the "standard language" (i.e. the "standard" dialect of a particular language) and the "nonstandard" dialects of the same language is often arbitrary and based on social, political, cultural, or historical considerations. In a similar way, the definitions of the terms "language" and "dialect" may overlap and are often subject to debate, with the differentiation between the two classifications often grounded in arbitrary and/or sociopolitical motives.

The other usage of the term "dialect", often deployed in colloquial settings, refers (often somewhat pejoratively) to a language that is socially subordinated to a regional or national standard language, often historically cognate or genetically related to the standard language, but not actually derived from the standard language. In other words, it is not an actual variety of the "standard language" or dominant language, but rather a separate, independently evolved but often distantly related language. In this sense, unlike in the first usage, the standard language would not itself be considered a "dialect", as it is the dominant language in a particular state or region, whether in terms of linguistic prestige, social or political status, official status, predominance or prevalence, or all of the above. Meanwhile, under this usage, the "dialects" subordinate to the standard language are generally not variations on the standard language but rather separate (but often loosely related) languages in and of themselves. Thus, these "dialects" are not dialects or varieties of a particular language in the same sense as in the first usage; though they may share roots in the same family or subfamily as the standard language and may even, to varying degrees, share some mutual intelligibility with the standard language, they often did not evolve closely with the standard language or within the same linguistic subgroup or speech community as the standard language and instead may better fit various parties’ criteria for a separate language.

For example, most of the various regional Romance languages of Italy, often colloquially referred to as Italian "dialects", are, in fact, not actually derived from modern standard Italian, but rather evolved from Vulgar Latin separately and individually from one another and independently of standard Italian, long prior to the diffusion of a national standardized language throughout what is now Italy. These various Latin-derived regional languages are, therefore, in a linguistic sense, not truly "dialects" or varieties of the standard Italian language, but are instead better defined as their own separate languages. Conversely, with the spread of standard Italian throughout Italy in the 20th century, regional versions or varieties of standard Italian have developed, generally as a mix of national standard Italian with a substratum of local regional languages and local accents. While "dialect" levelling has increased the number of standard Italian speakers and decreased the number of speakers of other languages native to Italy, Italians in different regions have developed variations of standard Italian particular to their region. These variations on standard Italian, known as regional Italian, would thus more appropriately be called "dialects" in accordance with the first linguistic definition of "dialect", as they are in fact derived partially or mostly from standard Italian.

A dialect is distinguished by its vocabulary, grammar, and pronunciation (phonology, including prosody). Where a distinction can be made only in terms of pronunciation (including prosody, or just prosody itself), the term accent may be preferred over dialect. Other types of speech varieties include jargons, which are characterized by differences in lexicon (vocabulary); slang; patois; pidgins; and argots. The particular speech patterns used by an individual are termed an idiolect.


Dialectology (from Greek διάλεκτος, dialektos, "talk, dialect"; and -λογία, -logia) is the scientific study of linguistic dialect, a sub-field of sociolinguistics. It studies variations in language based primarily on geographic distribution and their associated features. Dialectology treats such topics as divergence of two local dialects from a common ancestor and synchronic variation.

Dialectologists are ultimately concerned with grammatical, lexical and phonological features that correspond to regional areas. Thus they usually deal not only with populations that have lived in certain areas for generations, but also with migrant groups that bring their languages to new areas (see language contact).

Commonly studied concepts in dialectology include the problem of mutual intelligibility in defining languages and dialects; situations of diglossia, where two dialects are used for different functions; dialect continua including a number of partially mutually intelligible dialects; and pluricentrism, where what is essentially a single genetic language exists as two or more standard varieties.

Hans Kurath and William Labov are among the most prominent researchers in this field.

Gbantu language

Gbantu (Gwantu) is a dialect cluster of Plateau languages in Nigeria. Gwantu is the name of the principal dialect; the others are Numana, Janda and Numbu. Nka, spoken by the Aninka, may be another, or perhaps a distinct language, as mutual intelligibility with Numana is low. Nunku is apparently a dialect of Mada, not Gbantu.

Igboid languages

The Igboid languages constitute a branch of the Volta–Niger language family. The branch includes Ekpeye, Ukwuani, and the Igbo languages:


Igbo: Igbo, Ikwerre, Ika, Enuani, Izii–Ikwo–Ezza–Mgbo, and Ogba.

Williamson and Blench conclude that the Igbo languages (Igboid apart from Ekpeye) form a "language cluster" and that they are somewhat mutually intelligible. However, mutual intelligibility is only marginal, even among the Izii–Ikwo–Ezaa–Mgbo languages.

Jiangyin dialect

Jiangyin dialect (江阴话) is a Northern Wu Chinese dialect spoken in the city of Jiangyin in Jiangsu province. Jiangyin dialect is a member of the Wu Chinese Taihu Wu family of dialects, which means the inhabitants speak a dialect similar to that of nearby Wuxi, Changzhou, Suzhou, and Shanghai. Jiangyin dialect itself is of the Piling variety, related to the Changzhou dialect. Jiangyin dialect has the highest degree of mutual intelligibility with the dialects of the closest neighboring cities of Changzhou and Wuxi but also has a fairly large degree of mutual intelligibility with the dialects of nearby Suzhou and Shanghai. As one travels south towards Wuxi, away from the urban center of Jiangyin, Jiangyin dialect gradually sounds closer and closer to the Wuxi dialect.

A book called A collection of Jiangyin dialect has been published.

Komyandaret language

Komyandaret is a poorly documented Papuan language of Indonesia. It is close enough to Tsaukambo that there is some mutual intelligibility.

Lexical similarity

In linguistics, lexical similarity is a measure of the degree to which the word sets of two given languages are similar. A lexical similarity of 1 (or 100%) would mean a total overlap between vocabularies, whereas 0 means there are no common words.

There are different ways to define the lexical similarity and the results vary accordingly. For example, Ethnologue's method of calculation consists in comparing a standardized set of wordlists and counting those forms that show similarity in both form and meaning. Using such a method, English was evaluated to have a lexical similarity of 60% with German and 27% with French.

Lexical similarity can be used to evaluate the degree of genetic relationship between two languages. Percentages higher than 85% usually indicate that the two languages being compared are likely to be related dialects.

Lexical similarity is only one indication of the mutual intelligibility of the two languages, since the latter also depends on the degree of phonetic, morphological, and syntactic similarity. The result also varies with the wordlists chosen. For example, lexical similarity between French and English is considerable in lexical fields relating to culture, whereas their similarity is smaller as far as basic (function) words are concerned. Unlike mutual intelligibility, lexical similarity can only be symmetrical.
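A wordlist comparison of the kind described above can be sketched in a few lines of Python. In this illustration the similarity judgments are supplied by hand as a set of word pairs, standing in for an analyst's decision that two forms match in both form and meaning; the four-concept wordlists, the judgments, and the function name are hypothetical and are not Ethnologue's data or procedure.

```python
def lexical_similarity(wordlist_a, wordlist_b, is_similar):
    """Share (0.0 to 1.0) of concepts present in both wordlists whose
    forms are judged similar in form and meaning."""
    shared = [concept for concept in wordlist_a if concept in wordlist_b]
    if not shared:
        return 0.0
    matches = sum(
        1 for concept in shared
        if is_similar(wordlist_a[concept], wordlist_b[concept])
    )
    return matches / len(shared)


# Toy wordlists keyed by concept (illustrative only).
english = {"water": "water", "house": "house", "hand": "hand", "tree": "tree"}
german = {"water": "Wasser", "house": "Haus", "hand": "Hand", "tree": "Baum"}

# Hand-made similarity judgments, stored as unordered pairs.
judged_similar = {
    frozenset(("water", "Wasser")),
    frozenset(("house", "Haus")),
    frozenset(("hand", "Hand")),
}


def is_similar(form_a, form_b):
    """Look up an unordered pair, so the judgment does not depend on order."""
    return frozenset((form_a, form_b)) in judged_similar


print(lexical_similarity(english, german, is_similar))  # 0.75 for this toy list
print(lexical_similarity(german, english, is_similar))  # same value in reverse
```

Because the judgment is made on unordered pairs, the score is the same in both directions, which is the sense in which lexical similarity is symmetrical while intelligibility need not be.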

Mayogo language

Mayogo (also spelled Mayugo, Majugu, Maigo, Maiko, Mayko and also called Kiyogo) is a Ubangian language spoken by the Day (Angai), Maambi, and Mangbele peoples of the Democratic Republic of the Congo. It is not close enough to Bangba, the most closely related language, for mutual intelligibility.

Momogun language

Kimaragang (Marigang), Tobilung, and Rungus are varieties of a single Austronesian language of Sabah, Malaysia. The three varieties share moderate mutual intelligibility. Children are not learning it well in some areas.

Minokok is an endonym of the Sugut Dusun. Their language may be a dialect of Rungus. Their numbers are not included in the language's population estimate.

Namuyi language

Namuyi (Namuzi; autonym: na˥˦mʑi˥˦) is a poorly attested Tibeto-Burman and more specifically Naic language of Sichuan and Tibet. It has also been classified as Qiangic by Sun Hongkai (2001) and Guillaume Jacques (2011). The eastern and western dialects have low mutual intelligibility. In Sichuan, it is spoken in Muli County and Mianning County. The language is endangered and the number of speakers with fluency is decreasing year by year, as most teenagers do not speak the language, instead speaking the Sichuan dialect of Chinese.

Nyangatom language

Nyangatom (also Inyangatom, Donyiro, Dongiro, Idongiro) is a Nilo-Saharan language (Eastern Sudanic, Nilotic) spoken in Ethiopia by the Nyangatom people. It is an oral language only, having no working orthography at present. Related languages include Toposa and Turkana, both of which are mutually intelligible with it to some degree; Blench (2012) counts it as a dialect of Turkana.

Ogoni languages

The Ogoni languages, or Kegboid languages, are the five languages of the Ogoni people of Rivers State, Nigeria.

They fall into two clusters, East and West, with a limited degree of mutual intelligibility between members of each cluster. The Ogoni think of the cluster members as separate languages, however.

The classification of the Ogoni languages is as follows:

East: Khana and Tẹẹ, with around 500,000 speakers between them, and Gokana, with about 130,000.

West: Eleme, with about 70,000 speakers, and Baan, with around 17,500.

Purum language

Purum (Purum Naga) is a Kuki-Chin language of India. However, speakers consider themselves to be ethnic Naga people, rather than part of the Kuki and Chin ethnic groups. Peterson (2017) classifies Purum as part of the Northwestern branch of Kuki-Chin. According to Ethnologue, Purum shares a high degree of mutual intelligibility with Kharam.

Simte language

Simte is a Kuki-Chin language of India. It is spoken primarily by the Simte people in Northeastern India, who are concentrated in Manipur and adjacent areas of Mizoram and Assam. The dialect spoken in Manipur exhibits partial mutual intelligibility with the other Kukish dialects of the area including Thadou, Hmar, Vaiphei, Paite, Kom and Gangte. It is written in Latin script.

Western Romance languages

The Western Romance languages are one of the two branches in a proposed division of the Romance languages based on the La Spezia–Rimini line. They include the Gallo-Romance and Iberian-Romance branches as well as northern Italian. The subdivision is based mainly on the use of "s" for pluralization, the weakening of some consonants and the pronunciation of "soft C" as /t͡s/ (often later /s/) rather than /t͡ʃ/ as in Italian and Romanian. However, these criteria make the categorization highly problematic, because there is a much higher lexical similarity between all dialects of Italian and French than between French and Spanish. There is also a much higher morphological, orthographic and phonetic similarity between Spanish and Italian dialects than between Italian and French.

Based on mutual intelligibility, Dalby counts a dozen languages: Portuguese, Spanish, Asturian-Leonese, Aragonese, Catalan, Galician, Gascon, Provençal, Gallo-Wallon, French, Franco-Provençal, Romansh, and Ladin. This classification criterion is however problematic, due to the much higher levels of mutual intelligibility between Italic and Iberian languages than between either of these with Gallo-Romance languages.

Some classifications include Italo-Dalmatian; the resulting clade is generally called Italo-Western Romance. Other classifications place Italo-Dalmatian with Eastern Romance.

Sardinian does not fit into either Western or Eastern Romance, and may have split off before either.

Today the four most-widely spoken standardized Western Romance languages are Spanish (c. 410 million native speakers), Portuguese (c. 220 million native, another 45 million or so second-language speakers, mainly in Lusophone Africa), French (c. 80 million native speakers, another 70 million or so second-language speakers, mostly in Francophone Africa), and Catalan (c. 7.2 million native). Many of these languages have large numbers of non-native speakers; this is especially the case for French, in widespread use throughout West Africa as a lingua franca.

Zhanjiang dialect

The Zhanjiang dialect is a dialect mostly spoken in Zhanjiang in Guangdong, China. It is a sub-dialect of Leizhou Min. It is considered to be part of Southern Min though it has little mutual intelligibility with Minnan Proper (Hokkien-Taiwanese) and Teochew.

