Language family

A language family is a group of languages related through descent from a common ancestral language or parental language, called the proto-language of that family. The term "family" reflects the tree model of language origination in historical linguistics, which makes use of a metaphor comparing languages to people in a biological family tree, or in a subsequent modification, to species in a phylogenetic tree of evolutionary taxonomy. Linguists therefore describe the daughter languages within a language family as being genetically related.[1]

According to Ethnologue, the 7,097 living human languages are distributed in 141 different language families.[2] A "living language" is simply one that is used as the primary form of communication of a group of people. There are also many dead and extinct languages, as well as some that are still insufficiently studied to be classified, or are even unknown outside their respective speech communities.

Membership of languages in a language family is established by comparative linguistics. Sister languages are said to have a "genetic" or "genealogical" relationship. The latter term is older.[3] Speakers of a language family belong to a common speech community. The divergence of a proto-language into daughter languages typically occurs through geographical separation, with the original speech community gradually evolving into distinct linguistic units. Individuals belonging to other speech communities may also adopt languages from a different language family through the language shift process.[4]

Genealogically related languages present shared retentions; that is, features of the proto-language (or reflexes of such features) that cannot be explained by chance or borrowing (convergence). Membership in a branch or group within a language family is established by shared innovations; that is, common features of those languages that are not found in the common ancestor of the entire family. For example, Germanic languages are "Germanic" in that they share vocabulary and grammatical features that are not believed to have been present in the Proto-Indo-European language. These features are believed to be innovations that took place in Proto-Germanic, a descendant of Proto-Indo-European that was the source of all Germanic languages.
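
The distinction between shared retentions and shared innovations can be sketched as simple set logic. The following is only an illustrative sketch: the language names and feature labels are hypothetical placeholders, and real subgrouping also requires ruling out chance and borrowing, which this code does not attempt.

```python
# Hypothetical feature inventory of the proto-language (labels are placeholders).
PROTO_FEATURES = {"feature_a", "feature_b", "feature_c"}

# Hypothetical daughter languages and the features each one shows.
LANGUAGES = {
    "lang_1": {"feature_a", "feature_b", "feature_x", "feature_y"},
    "lang_2": {"feature_a", "feature_x", "feature_y", "feature_z"},
    "lang_3": {"feature_a", "feature_b", "feature_c"},
}


def shared_features(names):
    """Return the features present in every one of the named languages."""
    return set.intersection(*(LANGUAGES[name] for name in names))


def shared_retentions(names):
    """Shared features that were already present in the proto-language."""
    return shared_features(names) & PROTO_FEATURES


def shared_innovations(names):
    """Shared features that are absent from the proto-language."""
    return shared_features(names) - PROTO_FEATURES


# All three languages retain feature_a, which supports family membership.
print(shared_retentions(["lang_1", "lang_2", "lang_3"]))
# lang_1 and lang_2 share features the proto-language lacked,
# which is the kind of evidence used to group them into one branch.
print(shared_innovations(["lang_1", "lang_2"]))
```

In this toy data, `lang_1` and `lang_2` would be candidates for a common branch, while `lang_3` shares only retentions with them.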

[Map: contemporary distribution (2005 map) of the world's major language families (in some cases geographic groups of families). For greater detail, see Distribution of languages on Earth.]

Structure of a family

Language families can be divided into smaller phylogenetic units, conventionally referred to as branches of the family because the history of a language family is often represented as a tree diagram. A family is a monophyletic unit; all its members derive from a common ancestor, and all attested descendants of that ancestor are included in the family. (Thus, the term family is analogous to the biological term clade.)

Some taxonomists restrict the term family to a certain level, but there is little consensus on how to do so. Those who affix such labels also subdivide branches into groups, and groups into complexes. A top-level (i.e., the largest) family is often called a phylum or stock. The closer two branches are to each other in the tree, the more closely related their languages are. For example, if a language sits four branches down from a proto-language and has a sister language at that same fourth branch, the two sister languages are more closely related to each other than to that common ancestral proto-language.
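
The tree model described above can be sketched with a table of parent links. This is a deliberately simplified illustration, not a real classification: it uses only language and proto-language names that appear in this article, and it omits all intermediate stages between each proto-language and its modern descendants. The function names are assumptions made for this sketch.

```python
# Simplified parent links: each entry maps a language to its immediate ancestor.
# Intermediate stages are deliberately left out of this toy tree.
PARENT = {
    "Proto-Germanic": "Proto-Indo-European",
    "Proto-Celtic": "Proto-Indo-European",
    "English": "Proto-Germanic",
    "German": "Proto-Germanic",
    "Welsh": "Proto-Celtic",
    "Irish": "Proto-Celtic",
}


def lineage(language):
    """Return the chain from a language up to the root of the tree."""
    chain = [language]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_ancestor(a, b):
    """Return the most recent node shared by the lineages of a and b."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    return None  # the two languages are not in the same tree


def depth(language):
    """Number of steps between a language and the root of its tree."""
    return len(lineage(language)) - 1


# Sister languages meet at a deeper (more recent) ancestor than
# languages from different branches, so they are more closely related.
print(common_ancestor("English", "German"))
print(common_ancestor("English", "Welsh"))
```

Comparing the depth of the common ancestor gives a rough measure of closeness within the tree: the deeper the shared node, the closer the relationship.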

The Celtic, Germanic, Slavic, Italic, and Indo-Iranian language families, for example, are branches of a larger Indo-European language family. By contrast, the term macrofamily or superfamily is sometimes applied to proposed groupings of language families whose status as phylogenetic units is generally considered to be unsubstantiated by accepted historical linguistic methods. The linguistic tree and the genetic tree of human ancestry show a remarkably similar pattern,[5] which has been verified statistically.[6] Languages interpreted in terms of the putative phylogenetic tree of human languages are transmitted to a great extent vertically (by ancestry) as opposed to horizontally (by spatial diffusion).[7]

Dialect continua

Some closely knit language families, and many branches within larger families, take the form of dialect continua in which there are no clear-cut borders that make it possible to unequivocally identify, define, or count individual languages within the family. However, when the differences between the speech of different regions at the extremes of the continuum are so great that there is no mutual intelligibility between them, as occurs in Arabic, the continuum cannot meaningfully be seen as a single language.

A speech variety may also be considered either a language or a dialect depending on social or political considerations. Thus, different sources, especially over time, can give wildly different numbers of languages within a certain family. Classifications of the Japonic family, for example, range from one language (a language isolate with dialects) to nearly twenty. Until Ryukyuan was classified as a group of separate languages within a Japonic language family rather than as dialects of Japanese, the Japanese language itself was considered a language isolate and therefore the only language in its family.


Isolates

Most of the world's languages are known to be related to others. Those that have no known relatives (or for which family relationships are only tentatively proposed) are called language isolates, essentially language families consisting of a single language. An example is Basque. In general, it is assumed that language isolates have relatives or had relatives at some point in their history but at a time depth too great for linguistic comparison to recover them.

A language isolated in its own branch within a family, such as Albanian and Armenian within Indo-European, is often also called an isolate, but the meaning of the word "isolate" in such cases is usually clarified with a modifier. For instance, Albanian and Armenian may each be referred to as an "Indo-European isolate". By contrast, so far as is known, the Basque language is an absolute isolate: it has not been shown to be related to any other language despite numerous attempts. Another well-known isolate is Mapudungun, the Mapuche language from the Araucanian language family in Chile. A language may be said to be an isolate currently but not historically if related but now extinct relatives are attested. The Aquitanian language, spoken in Roman times, may have been an ancestor of Basque, but it could also have been a sister language to the ancestor of Basque. In the latter case, Basque and Aquitanian would form a small family together. (Ancestors are not considered to be distinct members of a family.)


Proto-languages

A proto-language can be thought of as a mother language (not to be confused with a mother tongue, which is one that a specific person has been exposed to from birth[8]), being the root from which all languages in the family stem. The common ancestor of a language family is seldom known directly since most languages have a relatively short recorded history. However, it is possible to recover many features of a proto-language by applying the comparative method, a reconstructive procedure worked out by the 19th-century linguist August Schleicher. This can demonstrate the validity of many of the proposed families in the list of language families. For example, the reconstructible common ancestor of the Indo-European language family is called Proto-Indo-European. Proto-Indo-European is not attested by written records and so is conjectured to have been spoken before the invention of writing.

Other classifications of languages


Shared innovations acquired by borrowing or other means are not considered genetic and have no bearing on the language family concept. It has been asserted, for example, that many of the more striking features shared by Italic languages (Latin, Oscan, Umbrian, etc.) might well be "areal features". However, very similar-looking alterations in the systems of long vowels in the West Germanic languages greatly postdate any possible notion of a proto-language innovation (and cannot readily be regarded as "areal", either, since English and continental West Germanic were not a linguistic area). In a similar vein, there are many similar unique innovations in Germanic, Baltic and Slavic that are far more likely to be areal features than traceable to a common proto-language. But legitimate uncertainty about whether shared innovations are areal features, coincidence, or inheritance from a common ancestor leads to disagreement over the proper subdivisions of any large language family.

A sprachbund is a geographic area having several languages that feature common linguistic structures. The similarities between those languages are caused by language contact, not by chance or common origin, and are not recognized as criteria that define a language family. An example of a sprachbund would be the Indian subcontinent.

Contact languages

The concept of language families is based on the historical observation that languages develop dialects, which over time may diverge into distinct languages. However, linguistic ancestry is less clear-cut than familiar biological ancestry, in which species do not crossbreed.[9] It is more like the evolution of microbes, with extensive lateral gene transfer: quite distantly related languages may affect each other through language contact, which in extreme cases may lead to languages with no single ancestor, whether they be creoles or mixed languages. In addition, a number of sign languages have developed in isolation and appear to have no relatives at all. Nonetheless, such cases are relatively rare and most well-attested languages can be unambiguously classified as belonging to one language family or another, even if this family's relation to other families is not known.

Afroasiatic languages

Afroasiatic (Afro-Asiatic), also known as Afrasian and in older sources as Hamito-Semitic (Chamito-Semitic) or Semito-Hamitic, is a large language family of about 300 languages. It includes languages spoken predominantly in West Asia, North Africa, the Horn of Africa and parts of the Sahel.

Afroasiatic languages have over 495 million native speakers, the fourth largest number of any language family (after Indo-European, Sino-Tibetan and Niger–Congo). The phylum has six branches: Berber, Chadic, Cushitic, Egyptian, Omotic and Semitic.

By far the most widely spoken Afroasiatic language or dialect continuum is Arabic. A de facto group of distinct language varieties within the Semitic branch, the languages that evolved from Proto-Arabic have around 313 million native speakers, concentrated primarily in West Asia and North Africa. Other widely spoken Afroasiatic languages include:

Hausa (Chadic), the dominant language of northern Nigeria and southern Niger, spoken as a first language by over 40 million people and used as a lingua franca by another 20 million across West Africa and the Sahel.

Oromo (Cushitic), spoken in Ethiopia and Kenya by around 34 million people.

Amharic (Semitic), spoken in Ethiopia, with over 25 million native speakers in addition to millions of other Ethiopians speaking it as a second language.

Somali (Cushitic), spoken by 15 million people in Somalia, Djibouti, eastern Ethiopia and northeastern Kenya.

Afar (Cushitic), spoken by around 7.5 million people in Ethiopia, Djibouti, and Eritrea.

Shilha (Berber), spoken by around 7 million people in Morocco.

Tigrinya (Semitic), spoken by around 6.9 million people in Eritrea and Ethiopia.

Kabyle (Berber), spoken by around 5.6 million people in Algeria.

Hebrew (Semitic), spoken by around 5 million native speakers and 4 million second-language speakers in Israel and worldwide; also the liturgical language of Jewish people.

Central Atlas Tamazight (Berber), spoken by around 4.6 million people in Morocco.

Riffian (Berber), spoken by around 4.2 million people in Morocco.

Gurage languages (Semitic), a group of languages spoken by more than 2 million people in Ethiopia.

In addition to languages spoken today, Afroasiatic includes several important ancient languages, such as Ancient Egyptian, which forms a distinct branch of the family, and Akkadian, Biblical Hebrew and Old Aramaic, all of which are from the Semitic branch.

The original homeland of the Afroasiatic family, and when the parent language (i.e. Proto-Afroasiatic) was spoken, are yet to be agreed upon by historical linguists. Proposed locations include North Africa, the Horn of Africa, the Eastern Sahara and the Levant.

Algonquian languages

The Algonquian languages (also Algonkian) are a subfamily of Native American languages which includes most of the languages in the Algic language family. The name of the Algonquian language family is distinguished from the orthographically similar Algonquin dialect of the indigenous Ojibwe language (Chippewa), which is a senior member of the Algonquian language family. The term "Algonquin" has been suggested to derive from the Maliseet word elakómkwik (pronounced [ɛlæˈɡomoɡwik]), "they are our relatives/allies". A number of Algonquian languages, like many other Native American languages, are now extinct.

Speakers of Algonquian languages stretch from the east coast of North America to the Rocky Mountains. The proto-language from which all of the languages of the family descend, Proto-Algonquian, was spoken around 2,500 to 3,000 years ago. There is no scholarly consensus about where this language was spoken.

Austroasiatic languages

The Austroasiatic languages, formerly known as Mon–Khmer, are a large language family of Mainland Southeast Asia, also scattered throughout India, Bangladesh, Nepal and the southern border of China, with around 117 million speakers. The name Austroasiatic comes from a combination of the Latin words for "South" and "Asia", hence "South Asia". Of these languages, only Vietnamese, Khmer, and Mon have a long-established recorded history, and only Vietnamese and Khmer have official status as modern national languages (in Vietnam and Cambodia, respectively). In Myanmar, the Wa language is the de facto official language of Wa State. Santali is recognized as a regional language of India. The rest of the languages are spoken by minority groups and have no official status.

Ethnologue identifies 168 Austroasiatic languages. These form thirteen established families (plus perhaps Shompen, which is poorly attested, as a fourteenth), which have traditionally been grouped into two, as Mon–Khmer and Munda. However, one recent classification posits three groups (Munda, Nuclear Mon-Khmer and Khasi–Khmuic) while another has abandoned Mon–Khmer as a taxon altogether, making it synonymous with the larger family.

Austroasiatic languages have a disjunct distribution across India, Bangladesh, Nepal and Southeast Asia, separated by regions where other languages are spoken. They appear to be the extant autochthonous languages of Southeast Asia (if the Andaman Islands are not included), with the neighboring Indo-Aryan, Kra–Dai, Hmong-Mien, Dravidian, Austronesian, and Sino-Tibetan languages being the result of later migrations.

Austronesian languages

The Austronesian languages are a language family that is widely dispersed throughout Maritime Southeast Asia, Madagascar and the islands of the Pacific Ocean, with a few members in continental Asia. Austronesian languages are spoken by about 386 million people (4.9% of the world population), making it the fifth-largest language family by number of speakers. Major Austronesian languages with the highest number of speakers are Malay (Indonesian and Malaysian), Javanese, and Filipino (Tagalog). The family contains 1,257 languages, which is the second most of any language family.

Similarities between the languages spoken in the Malay Archipelago and the Pacific Ocean were first observed in 1706 by the Dutch scholar Adriaan Reland. In the 19th century, researchers (e.g. Wilhelm von Humboldt, Herman van der Tuuk) started to apply the comparative method to the Austronesian languages, but the first comprehensive and extensive study on the phonological history of the Austronesian language family, including a reconstruction of the Proto-Austronesian lexicon, was made by the German linguist Otto Dempwolff. The term Austronesian itself was coined by Wilhelm Schmidt (German austronesisch, based on Latin auster "south wind" and Greek νῆσος "island"). The family is aptly named, as the vast majority of Austronesian languages are spoken on islands: only a few languages, such as Malay and the Chamic languages, are indigenous to mainland Asia. Many Austronesian languages have very few speakers, but the major Austronesian languages are spoken by tens of millions of people and one Austronesian language, Malay (including both Indonesian and Malaysian variants), is spoken by 250 million people, making it the 8th most spoken language in the world. Approximately twenty Austronesian languages are official in their respective countries (see the list of major and official Austronesian languages).

Different sources count languages differently, but Austronesian and Niger–Congo are the two largest language families in the world by the number of languages they contain, each having roughly one-fifth of the total languages counted in the world. The geographical span of Austronesian was the largest of any language family before the spread of Indo-European in the colonial period, ranging from Madagascar off the southeastern coast of Africa to Easter Island in the eastern Pacific. Hawaiian, Rapa Nui, and Malagasy (spoken on Madagascar) are the geographic outliers of the Austronesian family.

According to Robert Blust (1999), Austronesian is divided into several primary branches, all but one of which are found exclusively on Taiwan. The Formosan languages of Taiwan are grouped into as many as nine first-order subgroups of Austronesian. All Austronesian languages spoken outside Taiwan (including its offshore Yami language) belong to the Malayo-Polynesian branch, sometimes called Extra-Formosan.

Most Austronesian languages lack a long history of written attestation, making the feat of reconstructing earlier stages, up to distant Proto-Austronesian, all the more remarkable. The oldest inscription in the Cham language, the Đông Yên Châu inscription, is dated to the mid-6th century AD at the latest and shows the influence of Indo-European languages; it is also the first attestation of any Austronesian language.


Celtic languages

The Celtic languages are a group of related languages descended from Proto-Celtic. They form a branch of the Indo-European language family. The term "Celtic" was first used to describe this language group by Edward Lhuyd in 1707, following Paul-Yves Pezron, who made the explicit link between the Celts described by classical writers and the Welsh and Breton languages.

During the 1st millennium BC, Celtic languages were spoken across much of Europe and in Asia Minor. Today, they are restricted to the northwestern fringe of Europe and a few diaspora communities. There are four living languages: Welsh, Breton, Irish and Scottish Gaelic. All are minority languages in their respective countries, though there are continuing efforts at revitalisation. Welsh is an official language in Wales and Irish is an official language of Ireland and of the European Union. Welsh is the only Celtic language not classified as endangered by UNESCO. The Cornish and Manx languages have gone extinct in modern times.

Irish and Scottish Gaelic form the Goidelic languages, while Welsh and Breton are Brittonic. Beyond that there is no agreement on the subdivisions of the Celtic language family. They may be divided into a Continental group and an Insular group, or else into P-Celtic and Q-Celtic. All the living languages are Insular, since Breton, the only Celtic language spoken in continental Europe, is descended from the language of settlers from Britain. The Continental Celtic languages, such as Celtiberian, Galatian and Gaulish, are all extinct.

The Celtic languages have a rich literary tradition. The earliest specimens of written Celtic are Lepontic inscriptions from the 6th century BC in the Alps. Early Continental inscriptions used Italic and Paleohispanic scripts. Between the 4th and 8th centuries, Irish and Pictish were occasionally written in an original script, Ogham, but the Latin alphabet came to be used for all Celtic languages. Welsh has had a continuous literary tradition from the 6th century AD.

Chinese language

Chinese (simplified Chinese: 汉语; traditional Chinese: 漢語; pinyin: Hànyǔ; literally: "Han language"; or Chinese: 中文; pinyin: Zhōngwén; literally: "Chinese writing") is a group of related, but in many cases not mutually intelligible, language varieties, forming the Sinitic branch of the Sino-Tibetan language family. Chinese is spoken by the Han majority and many minority ethnic groups in China. About 1.2 billion people (around 16% of the world's population) speak some form of Chinese as their first language.

The varieties of Chinese are usually described by native speakers as dialects of a single Chinese language, but linguists note that they are as diverse as a language family. The internal diversity of Chinese has been likened to that of the Romance languages, but may be even more varied. There are between 7 and 13 main regional groups of Chinese (depending on classification scheme), of which the most spoken by far is Mandarin (about 960 million, e.g. Southwestern Mandarin), followed by Wu (80 million, e.g. Shanghainese), Min (70 million, e.g. Southern Min), Yue (60 million, e.g. Cantonese), etc. Most of these groups are mutually unintelligible, and even dialect groups within Min Chinese may not be mutually intelligible. Some, however, like Xiang and certain Southwest Mandarin dialects, may share common terms and a certain degree of intelligibility. All varieties of Chinese are tonal and analytic.

Standard Chinese (Pǔtōnghuà/Guóyǔ/Huáyǔ) is a standardized form of spoken Chinese based on the Beijing dialect of Mandarin. It is the official language of China and Taiwan, as well as one of the four official languages of Singapore. It is one of the six official languages of the United Nations. The written form of the standard language (中文; Zhōngwén), based on the logograms known as Chinese characters (汉字/漢字; Hànzì), is shared by literate speakers of otherwise unintelligible dialects.

The earliest Chinese written records are Shang dynasty-era oracle inscriptions, which can be traced back to 1250 BCE. The phonetic categories of Archaic Chinese can be reconstructed from the rhymes of ancient poetry. During the Northern and Southern dynasties period, Middle Chinese went through several sound changes and split into several varieties following prolonged geographic and political separation. Qieyun, a rime dictionary, recorded a compromise between the pronunciations of different regions. The royal courts of the Ming and early Qing dynasties operated using a koiné language (Guanhua) based on the Nanjing dialect of Lower Yangtze Mandarin. Standard Chinese was adopted in the 1930s, and is now the official language of both the People's Republic of China and Taiwan.

Dravidian languages

The Dravidian languages are a language family spoken mainly in southern India and parts of eastern and central India, as well as in Sri Lanka, with small pockets in southwestern Pakistan, southern Afghanistan, Nepal, Bangladesh and Bhutan, and overseas in other countries such as Malaysia, the Philippines, Indonesia and Singapore. The Dravidian languages with the most speakers are Telugu, Tamil, Kannada and Malayalam. There are also small groups of Dravidian-speaking scheduled tribes, who live outside Dravidian-speaking areas, such as the Kurukh in Eastern India and Gondi in Central India. The Dravidian languages are spoken by more than 215 million people in India, Pakistan, and Sri Lanka.

Though some scholars have argued that the Dravidian languages may have been brought to India by migrations in the fourth or third millennium BCE or even earlier, the Dravidian languages cannot easily be connected to any other language family, and they could well be indigenous to India.

Epigraphically, the Dravidian languages have been attested since the 2nd century BCE as Tamil-Brahmi script on the cave walls discovered in the Madurai and Tirunelveli districts of Tamil Nadu. Only two Dravidian languages are spoken exclusively outside the post-1947 state of India: Brahui in the Balochistan region of Pakistan and Afghanistan; and Dhangar, a dialect of Kurukh, in parts of Nepal and Bhutan. Dravidian place names along the Arabian Sea coasts and Dravidian grammatical influence such as clusivity in the Indo-Aryan languages, namely Marathi, Konkani, Gujarati, Marwari, and Sindhi, suggest that Dravidian languages were once spoken more widely across the Indian subcontinent.


French Sign Language family

The French Sign Language (LSF) or Francosign family is a language family of sign languages which includes French Sign Language and American Sign Language.

The LSF family descends from Old French Sign Language, which developed among the deaf community in Paris. The earliest mention of Old French Sign Language is by the abbé Charles-Michel de l'Épée in the late 18th century, but it could have existed for centuries prior. Several European sign languages, such as Russian Sign Language, derive from it, as does American Sign Language, established when French educator Laurent Clerc taught his language at the American School for the Deaf. Others, such as Spanish Sign Language, are thought to be related to French Sign Language even if they are not directly descended from it.

Germanic languages

The Germanic languages are a branch of the Indo-European language family spoken natively by a population of about 515 million people mainly in Europe, North America, Oceania, and Southern Africa.

The West Germanic languages include the three most widely spoken Germanic languages: English, with around 360–400 million native speakers; German, with over 100 million native speakers; and Dutch, with 24 million native speakers. Other West Germanic languages include Afrikaans, an offshoot of Dutch, with over 7.1 million native speakers; Low German, considered a separate collection of unstandardized dialects, with roughly 0.3 million native speakers and probably 6.7–10 million people who can understand it (at least 5 million in Germany and 1.7 million in the Netherlands); Yiddish, once used by approximately 13 million Jews in pre-World War II Europe, and Scots, both with 1.5 million native speakers; Limburgish varieties, with roughly 1.3 million speakers along the Dutch–Belgian–German border; and the Frisian languages, with over 0.5 million native speakers in the Netherlands and Germany.

The main North Germanic languages are Danish, Faroese, Icelandic, Norwegian and Swedish, which have a combined total of about 20 million speakers.

The East Germanic branch included Gothic, Burgundian, and Vandalic, all of which are now extinct. The last to die off was Crimean Gothic, spoken until the late 18th century in some isolated areas of Crimea.

The SIL Ethnologue lists 48 different living Germanic languages, 41 of which belong to the Western branch and six to the Northern branch; it places Riograndenser Hunsrückisch German in neither of the categories, but it is often considered a German dialect by linguists. The total number of Germanic languages throughout history is unknown as some of them, especially the East Germanic languages, disappeared during or after the Migration Period. Some of the West Germanic languages also did not survive past the Migration Period, including Lombardic. As a result of World War II, the German language suffered a significant loss of Sprachraum, as well as moribundness and extinction of several of its dialects. In the 21st century, its dialects are dying out in any case as Standard German gains primacy.

The common ancestor of all of the languages in this branch is called Proto-Germanic, also known as Common Germanic, which was spoken in about the middle of the 1st millennium BC in Iron Age Scandinavia. Proto-Germanic, along with all of its descendants, is characterised by a number of unique linguistic features, most famously the consonant change known as Grimm's law. Early varieties of Germanic entered history with the Germanic tribes moving south from Scandinavia in the 2nd century BC, to settle in the area of today's northern Germany and southern Denmark.

Goidelic languages

The Goidelic or Gaelic languages (Irish: teangacha Gaelacha; Scottish Gaelic: cànanan Goidhealach; Manx: çhengaghyn Gaelgagh) form one of the two groups of Insular Celtic languages, the other being the Brittonic languages.

Goidelic languages historically formed a dialect continuum stretching from Ireland through the Isle of Man to Scotland. There are three modern Goidelic languages: Irish (Gaeilge), Scottish Gaelic (Gàidhlig) and Manx (Gaelg), the last of which died out in the 20th century but has since been revived to some degree.

Indo-European languages

The Indo-European languages are a language family of several hundred related languages and dialects. There are about 445 living Indo-European languages, according to the estimate by Ethnologue, with over two thirds (313) of them belonging to the Indo-Iranian branch. The Indo-European languages with the greatest numbers of native speakers are Spanish, English, Hindustani (Hindi-Urdu), Portuguese, Bengali, Punjabi, and Russian, each with over 100 million speakers, with German, French, Marathi, Italian, and Persian also having more than 50 million. Today, nearly 42% of the human population (3.2 billion) speaks an Indo-European language as a first language, by far the highest of any language family.

The Indo-European family includes most of the modern languages of Europe; notable exceptions include Hungarian, Turkish, Finnish, Estonian, Basque, Maltese, and Sami. The Indo-European family is also represented in Asia with the exception of East and Southeast Asia. It was predominant in ancient Anatolia (present-day Turkey), the ancient Tarim Basin (present-day Northwest China) and most of Central Asia until the medieval Turkic and Mongol invasions. Outside Eurasia, Indo-European languages are dominant in the Americas and much of Oceania and Africa, having reached there during the Age of Discovery and later periods. Indo-European languages are also most commonly present as minority languages or second languages in countries where other families are dominant.

With written evidence appearing since the Bronze Age in the form of the Anatolian languages and Mycenaean Greek, the Indo-European family is significant to the field of historical linguistics as possessing the second-longest recorded history, after the Afroasiatic family, although certain language isolates, such as Sumerian, Elamite, Hurrian, Hattian, and Kassite are recorded earlier.

All Indo-European languages are descendants of a single prehistoric language, reconstructed as Proto-Indo-European, spoken sometime in the Neolithic era. Although no written records remain, aspects of the culture and religion of the Proto-Indo-Europeans can also be reconstructed from the related cultures of ancient and modern Indo-European speakers who continue to live in areas to which the Proto-Indo-Europeans migrated from their original homeland. Several disputed proposals link Indo-European to other major language families. Although the Kültepe texts are written in Semitic Old Assyrian, the Hittite loanwords and names found in them are the oldest record of any Indo-European language.

During the nineteenth century, the linguistic concept of Indo-European languages was frequently used interchangeably with the racial concepts of Aryan and Japhetite.

Indo-Iranian languages

The Indo-Iranian languages, Indo-Iranic languages, or Aryan languages constitute the largest and southeasternmost extant branch of the Indo-European language family. It has more than 1.5 billion speakers, stretching from Europe (Romani), Turkey (Kurdish and Zaza–Gorani) and the Caucasus (Ossetian) eastward to Xinjiang (Sarikoli) and Assam (Assamese), and south to Sri Lanka (Sinhalese) and the Maldives (Maldivian). Furthermore, there are large communities of Indo-Iranian speakers in northwestern Europe (the United Kingdom), North America and Australia.

The common ancestor of all of the languages in this family is called Proto-Indo-Iranian—also known as Common Aryan—which was spoken in approximately the late 3rd millennium BC. The three branches of the modern Indo-Iranian languages are Indo-Aryan, Iranian, and Nuristani. Additionally, sometimes a fourth independent branch, Dardic, is posited, but recent scholarship in general places Dardic languages as archaic members of the Indo-Aryan branch.

Kra–Dai languages

The Kra–Dai languages (also known as Tai–Kadai, Daic and Kadai) are a language family of tonal languages found in southern China, Northeast India and Southeast Asia. They include Thai and Lao, the national languages of Thailand and Laos respectively. Around 93 million people speak Kra–Dai languages, 60% of whom speak Thai. Ethnologue lists 95 languages in the family, with 62 of these being in the Tai branch. The high diversity of Kra–Dai languages in southern China points to the origin of the Kra–Dai language family in southern China. The Tai branch moved south into Southeast Asia only around 1000 AD.

Genetic and linguistic analyses show great homogeneity among Kra–Dai-speaking people in Thailand.

Languages of India

Languages spoken in India belong to several language families, the major ones being the Indo-Aryan languages spoken by 78.05% of Indians and the Dravidian languages spoken by 19.64% of Indians. Languages spoken by the remaining 2.31% of the population belong to the Austroasiatic, Sino-Tibetan, Tai-Kadai, and a few other minor language families and isolates. India (780) has the world's second highest number of languages, after Papua New Guinea (839).

Article 343 of the Indian constitution stated that the official language of the Union should become Hindi in Devanagari script instead of the extant English. But this was thought to be a violation of the constitution's guarantee of federalism. Later, a constitutional amendment, the Official Languages Act, 1963, allowed for the continuation of English in the Indian government indefinitely until legislation decides to change it. The form of numerals to be used for the official purposes of the Union was supposed to be the international form of Indian numerals, distinct from the numerals used in most English-speaking countries. Despite a common misconception, Hindi is not the national language of India. The Constitution of India does not give any language the status of national language.

The Eighth Schedule of the Indian Constitution lists 22 languages, which have been referred to as scheduled languages and given recognition, status and official encouragement. In addition, the Government of India has awarded the distinction of classical language to Kannada, Malayalam, Odia, Sanskrit, Tamil and Telugu. Classical language status is given to languages which have a rich heritage and independent nature.

According to the Census of India of 2001, India has 122 major languages and 1,599 other languages. However, figures from other sources vary, primarily due to differences in definition of the terms "language" and "dialect". The 2001 Census recorded 30 languages which were spoken by more than a million native speakers and 122 which were spoken by more than 10,000 people. Two contact languages have played an important role in the history of India: Persian and English. Persian was the court language during the Mughal period in India. It reigned as an administrative language for several centuries until the era of British colonisation. English continues to be an important language in India. It is used in higher education and in some areas of the Indian government. Hindi, the most commonly spoken language in India today, serves as the lingua franca across much of North and Central India. However, there have been anti-Hindi agitations in South India, most notably in the states of Tamil Nadu and Karnataka. Maharashtra, West Bengal, Assam, Punjab and other non-Hindi regions have also started to voice concerns about Hindi.

Niger–Congo languages

The Niger–Congo languages constitute one of the world's major language families and Africa's largest in terms of geographical area, number of speakers, and number of distinct languages. It is generally considered to be the world's largest language family in terms of distinct languages, ahead of Austronesian, although this is complicated by the ambiguity about what constitutes a distinct language; the number of named Niger–Congo languages listed by Ethnologue is 1,540. It is the third-largest language family in the world by number of native speakers, comprising around 700 million people as of 2015. Within Niger–Congo, the Bantu languages alone account for 350 million people (2015), or half the total Niger–Congo speaking population.

One of the characteristics common to most Niger–Congo languages (the Atlantic–Congo languages) is the use of a noun class system. The most widely spoken Niger–Congo languages by number of native speakers are Yoruba, Igbo, Fula and Shona. The most widely spoken by total number of speakers is Swahili.

While the ultimate genetic unity of Niger–Congo is widely accepted (aside from Dogon, Mande and a few other languages), the internal cladistic structure of Niger–Congo is not well established. Its primary branches are Dogon, Mande, Ijo, Katla, Rashad and Atlantic–Congo.

Turkic languages

The Turkic languages are a language family of at least thirty-five documented languages, spoken by the Turkic peoples of Eurasia from Eastern Europe, the Caucasus, Central Asia, and West Asia all the way to North Asia (particularly in Siberia) and East Asia. The Turkic languages originated in a region of East Asia spanning Western China to Mongolia, where Proto-Turkic is thought to have been spoken, according to one estimate, around 2,500 years ago, from where they expanded to Central Asia and farther west during the first millennium.

Turkic languages are spoken as a native language by some 170 million people, and the total number of Turkic speakers, including second language speakers, is over 200 million. The Turkic language with the greatest number of speakers is Turkish, spoken mainly in Anatolia and the Balkans; its native speakers account for about 40% of all Turkic speakers.

Characteristic features of Turkish, such as vowel harmony, agglutination, and lack of grammatical gender, are universal within the Turkic family. There is also a high degree of mutual intelligibility among the various Oghuz languages, which include Turkish, Azerbaijani, Turkmen, Qashqai, Gagauz, Balkan Gagauz Turkish, and Oghuz-influenced Crimean Tatar. Although methods of classification vary, the Turkic languages are usually considered to be divided into two branches: Oghur, the only surviving member of which is Chuvash, and Common Turkic, which includes all other Turkic languages including the Oghuz subbranch.

Turkic languages show some similarities with the Mongolic, Tungusic, Koreanic, and Japonic languages. These similarities led some linguists to propose an Altaic language family, though this proposal is not widely accepted. Apparent similarities with the Uralic language family even caused these families to be regarded as one for a long time under the Ural-Altaic hypothesis. However, there has not been sufficient evidence to establish the existence of either of these macrofamilies, and the shared characteristics between the languages are presently attributed to extensive prehistoric language contact.

Uralic languages

The Uralic languages (sometimes called Uralian languages) form a language family of 38 languages spoken by approximately 25 million people, predominantly in Northern Eurasia and in the European Union. The Uralic languages with the most native speakers are Hungarian, Finnish, and Estonian, which are official languages in Hungary, Finland, and Estonia, respectively. Other Uralic languages with significant numbers of speakers are Erzya, Moksha, Mari, Udmurt, and Komi, which are officially recognized languages in various regions of Russia.

The name "Uralic" derives from the fact that the areas where the languages are spoken are found on both sides of the Ural Mountains. Also, the original homeland (Urheimat) is commonly hypothesized to be in the vicinity of the Urals.

Finno-Ugric is sometimes used as a synonym for Uralic, though Finno-Ugric is widely understood to exclude the Samoyedic languages. Scholars who do not accept the traditional notion that Samoyedic split first from the rest of the Uralic family may treat the terms as synonymous.

