A vocabulary is a set of familiar words within a person's language. A vocabulary, usually developed with age, serves as a useful and fundamental tool for communication and acquiring knowledge. Acquiring an extensive vocabulary is one of the largest challenges in learning a second language.

Definition and usage

Vocabulary is commonly defined as "all the words known and used by a particular person".[1] Knowing a word, however, is not as simple as merely being able to recognize or use it. Word knowledge has several aspects, each of which can be measured.

Productive and receptive knowledge

The first major distinction that must be made when evaluating word knowledge is whether the knowledge is productive (also called active) or receptive (also called passive); even within those opposing categories, there is often no clear distinction. Words that are generally understood when heard, read, or seen constitute a person's receptive vocabulary. These words may range from well known to barely known (see degree of knowledge below). A person's receptive vocabulary is usually the larger of the two. For example, although a young child may not yet be able to speak, write, or sign, the child may be able to follow simple commands and appear to understand a good portion of the language to which they are exposed. In this case, the child's receptive vocabulary is likely tens, if not hundreds, of words, but the child's active vocabulary is zero. When that child learns to speak or sign, however, the active vocabulary begins to increase. It is also possible for the productive vocabulary to be larger than the receptive vocabulary, for example in a second-language learner who has learned words through study rather than exposure and can produce them but has difficulty recognizing them in conversation.

Productive vocabulary, therefore, generally refers to words that can be produced within an appropriate context and match the intended meaning of the speaker or signer. As with receptive vocabulary, however, there are many degrees at which a particular word may be considered part of an active vocabulary. Knowing how to pronounce, sign, or write a word does not necessarily mean that the word has been used correctly or accurately reflects the intended message, but it does reflect a minimal amount of productive knowledge.

Degree of knowledge

Within the receptive–productive distinction lies a range of abilities that are often referred to as degree of knowledge. This simply indicates that a word gradually enters a person's vocabulary over a period of time as more aspects of word knowledge are learnt. Roughly, these stages could be described as:

  1. Never encountered the word.
  2. Heard the word, but cannot define it.
  3. Recognize the word due to context or tone of voice.
  4. Able to use the word and understand the general and/or intended meaning, but cannot clearly explain it.
  5. Fluent with the word – its use and definition.

Depth of knowledge

The differing degrees of word knowledge imply a greater depth of knowledge, but the process is more complex than that. There are many facets to knowing a word, some of which are not hierarchical so their acquisition does not necessarily follow a linear progression suggested by degree of knowledge. Several frameworks of word knowledge have been proposed to better operationalise this concept. One such framework includes nine facets:

  1. orthography – written form
  2. phonology – spoken form
  3. reference – meaning
  4. semantics – concept and reference
  5. register – appropriacy of use
  6. collocation – lexical neighbours
  7. word associations
  8. syntax – grammatical function
  9. morphology – word parts

Definition of word

Words can be defined in various ways, and estimates of vocabulary size differ depending on the definition used. The most common definition is that of a lemma (the uninflected or dictionary form; this includes walk, but not walks, walked or walking). Most of the time lemmas do not include proper nouns (names of people, places, companies, etc.). Another definition often used in research on vocabulary size is that of a word family: all the words that can be derived from a base word (e.g., effortless, effortlessly, effortful and effortfully are all part of the word family effort). Estimates of vocabulary size range from as low as 10 thousand to as high as 200 thousand, depending on the definition used.[2]
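To illustrate how the choice of counting unit changes the estimate, the following is a minimal sketch that counts the same set of known word forms as forms, as lemmas, and as word families. The lookup tables and the function name vocabulary_sizes are hypothetical and hand-built for this example; a real study would rely on a lemmatizer and a curated word-family list.

```python
# Toy lookup tables (assumptions for illustration only).
# Inflected forms map to their lemma; derived words are lemmas in their own right.
LEMMA_OF = {
    "walk": "walk", "walks": "walk", "walked": "walk", "walking": "walk",
    "effort": "effort", "effortless": "effortless",
    "effortlessly": "effortlessly", "effortful": "effortful",
    "effortfully": "effortfully",
}

# Each lemma maps to the base word of its word family.
FAMILY_OF = {
    "walk": "walk",
    "effort": "effort", "effortless": "effort", "effortlessly": "effort",
    "effortful": "effort", "effortfully": "effort",
}


def vocabulary_sizes(known_forms):
    """Return (word forms, lemmas, word families) for the known forms."""
    forms = set(known_forms)
    lemmas = {LEMMA_OF[form] for form in forms}
    families = {FAMILY_OF[lemma] for lemma in lemmas}
    return len(forms), len(lemmas), len(families)


known = [
    "walk", "walks", "walked", "walking",
    "effort", "effortless", "effortlessly", "effortful", "effortfully",
]
print(vocabulary_sizes(known))  # (9, 6, 2)
```

The same nine word forms count as six lemmas but only two word families, which is one reason published estimates differ so widely.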

Types of vocabulary

Listed in order of most ample to most limited:[3][4]

Reading vocabulary

A person's reading vocabulary is all the words they can recognize when reading. This is generally the largest type of vocabulary, simply because a reader tends to be exposed to more words by reading than by listening.

Listening vocabulary

A person's listening vocabulary is all the words they can recognize when listening to speech. People may still understand words they were not exposed to before using cues such as tone, gestures, the topic of discussion and the social context of the conversation.

Speaking vocabulary

A person's speaking vocabulary is all the words they use in speech. It is likely to be a subset of the listening vocabulary. Due to the spontaneous nature of speech, words are often misused. This misuse, though slight and unintentional, may be compensated by facial expressions and tone of voice.

Writing vocabulary

Words are used in various forms of writing from formal essays to social media feeds. Many written words do not commonly appear in speech. Writers generally use a limited set of words when communicating. For example, if there are a number of synonyms, a writer may have a preference as to which of them to use, and they are unlikely to use technical vocabulary relating to a subject in which they have no knowledge or interest.

Focal vocabulary

Focal vocabulary is a specialized set of terms and distinctions that is particularly important to a certain group: those with a particular focus of experience or activity. A lexicon, or vocabulary, is a language's dictionary: its set of names for things, events, and ideas. Some linguists believe that the lexicon influences people's perception of things, an idea known as the Sapir–Whorf hypothesis. For example, the Nuer of Sudan have an elaborate vocabulary to describe cattle, with dozens of names that reflect the cattle's particular histories, economies, and environments. This kind of comparison has elicited some linguistic controversy, as with the number of "Eskimo words for snow". English speakers with relevant specialised knowledge can also display elaborate and precise vocabularies for snow and cattle when the need arises.[5][6]

Vocabulary growth

In infancy, a child instinctively builds a vocabulary. Infants imitate words that they hear and then associate those words with objects and actions. This is the listening vocabulary. The speaking vocabulary follows, as a child's thoughts become more reliant on their ability to self-express without relying on gestures or babbling. Once the reading and writing vocabularies start to develop, through questions and education, the child starts to discover the anomalies and irregularities of language.

In first grade, a child who can read learns about twice as many words as one who cannot. Generally, this gap does not narrow later. This results in a wide range of vocabulary by age five or six, when an English-speaking child will have learned about 1500 words.[7]

Vocabulary grows throughout life. Between the ages of 20 and 60, people learn some 6,000 more lemmas, or roughly one every other day.[8] An average 20-year-old knows 42,000 lemmas coming from 11,100 word families; an average 60-year-old knows 48,200 lemmas coming from 13,400 word families.[8] People expand their vocabularies by, for example, reading, playing word games, and participating in vocabulary-related programs. Exposure to traditional print media teaches correct spelling and vocabulary, while exposure to text messaging leads to more relaxed word acceptability constraints.[9]
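The "one every other day" figure can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the numbers quoted above; the 365.25-day year and the variable names are assumptions of the example.

```python
DAYS_PER_YEAR = 365.25          # assumption: average year length including leap years

years = 60 - 20                 # span between the two age groups
lemmas_learned = 6_000          # rounded gain quoted in the text

# Average number of days needed to learn one new lemma.
days_per_lemma = years * DAYS_PER_YEAR / lemmas_learned

# Gains implied by the two averages quoted for 20- and 60-year-olds.
lemma_gain = 48_200 - 42_000
family_gain = 13_400 - 11_100

print(round(days_per_lemma, 1))  # 2.4
print(lemma_gain, family_gain)   # 6200 2300
```

About 2.4 days per lemma is consistent with the rounded description of one new lemma every other day.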

Importance

  • An extensive vocabulary aids expression and communication.
  • Vocabulary size has been directly linked to reading comprehension.[10]
  • Linguistic vocabulary is synonymous with thinking vocabulary.[10]
  • A person may be judged by others based on his or her vocabulary.
  • Wilkins (1972) once said, "Without grammar, very little can be conveyed; without vocabulary, nothing can be conveyed."[11]

Vocabulary size

Native-language vocabulary

Estimating average vocabulary size poses various difficulties and limitations because of the different definitions and methods employed: what counts as a word, what it means to know a word, which sample dictionaries were used, how tests were conducted, and so on.[8][12][13][14] Native speakers' vocabularies also vary widely within a language and depend on the speaker's level of education.

As a result, estimates vary from as few as 10,000 to more than 50,000 for young adult native speakers of English.[8][12][13][15]

A 2016 study shows that 20-year-old native English speakers recognize on average 42,000 lemmas, ranging from 27,100 for the lowest 5% of the population to 51,700 for the highest 5%. These lemmas come from 6,100 word families in the lowest 5% of the population and 14,900 word families in the highest 5%. On average, 60-year-olds know 6,000 more lemmas.[8]

According to an earlier 1995 study, junior-high students can recognize the meanings of about 10,000–12,000 words, college students about 12,000–17,000, and elderly adults about 17,000 or more.[16]

For native speakers of German, average absolute vocabulary sizes range from 5,900 lemmas in first grade to 73,000 for adults.[17]

Foreign-language vocabulary

The effects of vocabulary size on language comprehension

Knowledge of the 3,000 most frequent English word families, or the 5,000 most frequent words, provides 95% vocabulary coverage of spoken discourse.[18] For minimal reading comprehension a threshold of 3,000 word families (5,000 lexical items) was suggested,[19][20] and for reading for pleasure 5,000 word families (8,000 lexical items) are required.[21] An "optimal" threshold of 8,000 word families yields coverage of 98% (including proper nouns).[20]
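Vocabulary coverage here means the share of running words in a text that a reader already knows. The sketch below shows one simple way such a figure could be computed; the whitespace tokenization, the toy sentence, and the function names coverage and top_n_words are assumptions of the example, not the procedure used in the cited studies.

```python
from collections import Counter


def coverage(tokens, known_words):
    """Fraction of running words (tokens) that belong to known_words."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


def top_n_words(tokens, n):
    """The n most frequent distinct words in the token list."""
    return {word for word, _ in Counter(tokens).most_common(n)}


# Toy text; a real estimate would use a large corpus and word families.
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.split()

frequent = top_n_words(tokens, 3)   # the three most frequent words
print(sorted(frequent))             # ['on', 'sat', 'the']
print(coverage(tokens, frequent))   # 8 of 13 tokens, about 0.615
```

Even in this tiny example, three of the eight distinct words account for most of the running text, which mirrors why a few thousand frequent word families can cover 95% of discourse.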

Second language vocabulary acquisition

Learning vocabulary is one of the first steps in learning a second language, but a learner never finishes vocabulary acquisition. Whether in one's native language or a second language, the acquisition of new vocabulary is an ongoing process. There are many techniques that help one acquire new vocabulary.

Memorization

Although memorization can be seen as tedious or boring, associating one word in the native language with the corresponding word in the second language until memorized is considered one of the best methods of vocabulary acquisition. By the time students reach adulthood, they generally have gathered a number of personalized memorization methods. Although many argue that memorization does not typically require the complex cognitive processing that increases retention (Sagarra and Alba, 2006),[22] it does typically require a large amount of repetition. Spaced repetition with flashcards is an established method for memorization, particularly used for vocabulary acquisition in computer-assisted language learning. Other methods typically take more time to apply and longer to recall.
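As an illustration of spaced repetition with flashcards, here is a minimal sketch of a Leitner-style box scheduler, one common scheme among several. The text does not name a particular algorithm, so the scheme, the review intervals, and the names Card, review, and due_cards are all assumptions made for this example.

```python
from dataclasses import dataclass

# Days to wait before the next review, per box (illustrative values only).
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    prompt: str        # word in the native language
    answer: str        # corresponding word in the second language
    box: int = 0       # index into INTERVALS; higher means better known
    due_day: int = 0   # day number on which the card is next reviewed


def review(card, correct, today):
    """Promote the card one box on success; send it back to box 0 on failure."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due_day = today + INTERVALS[card.box]
    return card


def due_cards(cards, today):
    """Cards whose scheduled review day has arrived."""
    return [card for card in cards if card.due_day <= today]


card = Card("house", "huis")
review(card, correct=True, today=0)    # box 1, due on day 2
review(card, correct=True, today=2)    # box 2, due on day 6
review(card, correct=False, today=6)   # back to box 0, due on day 7
print(card.box, card.due_day)          # 0 7
```

Well-known words are reviewed at growing intervals while forgotten words return to daily review, so repetition is concentrated where it is needed.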

Some words cannot be easily linked through association or other methods. When a word in the second language is phonologically or visually similar to a word in the native language, one often assumes they also share similar meanings. Though this is frequently the case, it is not always true. When faced with a false friend, memorization and repetition are the keys to mastery. If a second language learner relies solely on word associations to learn new vocabulary, that person will have a very difficult time mastering false friends. When large amounts of vocabulary must be acquired in a limited amount of time, when the learner needs to recall information quickly, when words represent abstract concepts or are difficult to picture in a mental image, or when discriminating between false friends, rote memorization is the method to use. A neural network model of novel word learning across orthographies, accounting for L1-specific memorization abilities of L2 learners, has also been introduced (Hadzibeganovic and Cannas, 2009).[23]

The Keyword Method

One useful method of building vocabulary in a second language is the keyword method. If time is available or one wants to emphasize a few key words, one can create mnemonic devices or word associations. Although these strategies tend to take longer to implement and may take longer in recollection, they create new or unusual connections that can increase retention. The keyword method requires deeper cognitive processing, thus increasing the likelihood of retention (Sagarra and Alba, 2006).[22] This method fits within Paivio's (1986)[24] dual coding theory because it uses both verbal and image memory systems. However, this method is best for words that represent concrete and imageable things. Abstract concepts or words that do not bring a distinct image to mind are difficult to associate. In addition, studies have shown that associative vocabulary learning is more successful with younger students (Sagarra and Alba, 2006).[22] Older students tend to rely less on creating word associations to remember vocabulary.

Word lists

Several word lists have been developed to provide people with a limited vocabulary either for the purpose of rapid language proficiency or for effective communication. These include Basic English (850 words), Special English (1,500 words), General Service List (2,000 words), and Academic Word List. Some learner's dictionaries have developed defining vocabularies which contain only the most common and basic words. As a result, word definitions in such dictionaries can be understood even by learners with a limited vocabulary.[25][26][27] Some publishers produce dictionaries based on word frequency[28] or thematic groups.[29][30][31]

The Swadesh list was made for investigation in linguistics.

Comparison of American and British English

The English language was first introduced to the Americas by British colonization, beginning in the late 16th and early 17th centuries. The language also spread to numerous other parts of the world as a result of British trade and colonisation and the spread of the former British Empire, which, by 1921, included about 470–570 million people, about a quarter of the world's population.

Over the past 400 years, the forms of the language used in the Americas, especially in the United States, and that used in the United Kingdom have diverged in a few minor ways, leading to the versions now often referred to as American English and British English. Differences between the two include pronunciation, grammar, vocabulary (lexis), spelling, punctuation, idioms, and formatting of dates and numbers. However, the differences in written and most spoken grammar structure tend to be much less than in other aspects of the language in terms of mutual intelligibility. A few words have completely different meanings in the two versions or are even unknown or not used in one of the versions. One particular contribution towards formalizing these differences came from Noah Webster, who wrote the first American dictionary (published 1828) with the intention of showing that people in the United States spoke a different dialect from that spoken in Britain, much like a regional accent.

This divergence between American English and British English has provided opportunities for humorous comment: e.g. in fiction George Bernard Shaw says that the United States and United Kingdom are "two countries divided by a common language"; and Oscar Wilde says that "We have really everything in common with America nowadays, except, of course, the language" (The Canterville Ghost, 1888). Henry Sweet incorrectly predicted in 1877 that within a century American English, Australian English and British English would be mutually unintelligible (A Handbook of Phonetics). Perhaps increased worldwide communication through radio, television, the Internet and globalization has tended to reduce regional variation. This can lead to some variations becoming extinct (for instance the wireless being progressively superseded by the radio) or the acceptance of wide variations as "perfectly good English" everywhere.

Although spoken American and British English are generally mutually intelligible, there are occasional differences which might cause embarrassment—for example, in American English a rubber is usually interpreted as a condom rather than an eraser; and a British fanny refers to the female pubic area, while the American fanny refers to an ass (US) or an arse (UK).

Controlled vocabulary

Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other forms of knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, authorised terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction.
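The contrast with natural language can be sketched in a few lines: free-text terms are mapped onto a fixed list of authorised terms, and anything outside the scheme is rejected. The two authorised headings, the entry terms, and the function name index_term are hypothetical examples, not taken from any real indexing scheme.

```python
# Authorised terms preselected by the (hypothetical) scheme designers.
AUTHORISED_TERMS = {"Automobiles", "Motion pictures"}

# Entry vocabulary: natural-language variants mapped to an authorised term.
ENTRY_TERMS = {
    "automobiles": "Automobiles",
    "cars": "Automobiles",
    "motion pictures": "Motion pictures",
    "films": "Motion pictures",
    "movies": "Motion pictures",
}


def index_term(free_text):
    """Return the authorised term for free_text, or None if the scheme has none."""
    key = free_text.strip().lower()
    return ENTRY_TERMS.get(key)


# Every mapping target must itself be an authorised term.
assert set(ENTRY_TERMS.values()) <= AUTHORISED_TERMS

print(index_term("Movies"))      # Motion pictures
print(index_term("cars"))        # Automobiles
print(index_term("spaceships"))  # None
```

Because every variant resolves to a single authorised heading, documents indexed under "films" and "movies" are retrieved together.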

Dutch language

Dutch (Nederlands) is a West Germanic language spoken by around 23 million people as a first language and 5 million people as a second language, constituting the majority of people in the Netherlands (where it is the sole official language) and Belgium (as one of three official languages). It is the third most widely spoken Germanic language, after its close relatives English and German.

Outside the Low Countries, it is the native language of the majority of the population of Suriname where it also holds an official status, as it does in Aruba, Curaçao and Sint Maarten, which are constituent countries of the Kingdom of the Netherlands located in the Caribbean. Historical linguistic minorities on the verge of extinction remain in parts of France and Germany, and in Indonesia, while up to half a million native speakers may reside in the United States, Canada and Australia combined. The Cape Dutch dialects of Southern Africa have evolved into Afrikaans, a mutually intelligible daughter language which is spoken to some degree by at least 16 million people, mainly in South Africa and Namibia.

Dutch is one of the closest relatives of both German and English and is colloquially said to be "roughly in between" them. Dutch, like English, has not undergone the High German consonant shift, does not use Germanic umlaut as a grammatical marker, has largely abandoned the use of the subjunctive, and has levelled much of its morphology, including most of its case system. Features shared with German include the survival of two to three grammatical genders (albeit with few grammatical consequences), as well as the use of modal particles, final-obstruent devoicing, and a similar word order. Dutch vocabulary is mostly Germanic and incorporates slightly more Romance loans than German but far fewer than English. As with German, the vocabulary of Dutch also has strong similarities with the continental Scandinavian languages, but is not mutually intelligible in text or speech with any of them.

English language

English is a West Germanic language that was first spoken in early medieval England and eventually became a global lingua franca. It is named after the Angles, one of the Germanic tribes that migrated to the area of Great Britain that later took their name, as England. Both names derive from Anglia, a peninsula in the Baltic Sea. The language is closely related to Frisian and Low Saxon, and its vocabulary has been significantly influenced by other Germanic languages, particularly Norse (a North Germanic language), and to a greater extent by Latin and French.

English has developed over the course of more than 1,400 years. The earliest forms of English, a group of West Germanic (Ingvaeonic) dialects brought to Great Britain by Anglo-Saxon settlers in the 5th century, are collectively called Old English. Middle English began in the late 11th century with the Norman conquest of England; this was a period in which the language was influenced by French. Early Modern English began in the late 15th century with the introduction of the printing press to London, the printing of the King James Bible and the start of the Great Vowel Shift.

Through the worldwide influence of the British Empire, and later the United States, Modern English has been spreading around the world since the 17th century. Through all types of printed and electronic media, and spurred by the emergence of the United States as a global superpower, English has become the leading language of international discourse and the lingua franca in many regions and professional contexts such as science, navigation and law.

English is the third most-spoken native language in the world, after Standard Chinese and Spanish. It is the most widely learned second language and is either the official language or one of the official languages in almost 60 sovereign states. There are more people who have learned it as a second language than there are native speakers. English is the most commonly spoken language in the United Kingdom, the United States, Canada, Australia, Ireland and New Zealand, and it is widely spoken in some areas of the Caribbean, Africa and South Asia. It is a co-official language of the United Nations, the European Union and many other world and regional international organisations. It is the most widely spoken Germanic language, accounting for at least 70% of speakers of this Indo-European branch. English has a vast vocabulary, though counting how many words any language has is impossible. English speakers are called "Anglophones".

Modern English grammar is the result of a gradual change from a typical Indo-European dependent marking pattern with a rich inflectional morphology and relatively free word order to a mostly analytic pattern with little inflection, a fairly fixed SVO word order and a complex syntax. Modern English relies more on auxiliary verbs and word order for the expression of complex tenses, aspect and mood, as well as passive constructions, interrogatives and some negation. Despite noticeable variation among the accents and dialects of English used in different countries and regions—in terms of phonetics and phonology, and sometimes also vocabulary, grammar and spelling—English-speakers from around the world are able to communicate with one another with relative ease.

FYI

FYI is a common abbreviation of "For Your Information".

"FYI" is commonly used in e mail, instant messaging or memo and messages, typically in the message subject, to flag the message as an informational message, with the intent to communicate to the receiver that he/she may be interested in the topic, but is not required to perform any action. It is also commonly used in informal and business spoken conversations.

The usage of "FYI" dates back at least as far as 1941. A notable early usage occurs in an episode in 1959 of The Twilight Zone, "One for the Angels."

Prince Charming used the term in Shrek 2 (2004).

Among Internet Standards, FYIs are a subset of the Request for Comments (RFC) series.

The FYI series of notes is designed to provide Internet users with a central repository of information about any topics which relate to the Internet. FYI topics may range from historical memos on "Why it was done that way" to answers to commonly asked operational questions.

Gene ontology

Gene ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to: 1) maintain and develop its controlled vocabulary of gene and gene product attributes; 2) annotate genes and gene products, and assimilate and disseminate annotation data; and 3) provide tools for easy access to all aspects of the data provided by the project, and to enable functional interpretation of experimental data using the GO, for example via enrichment analysis. GO is part of a larger classification effort, the Open Biomedical Ontologies (OBO).

Although gene nomenclature itself aims to maintain and develop controlled vocabulary of gene and gene products, the Gene Ontology extends the effort by using markup language to make the data (not only of the genes and their products but also of all their attributes) machine readable, and to do so in a way that is unified across all species (whereas gene nomenclature conventions vary by biologic taxon).

Glossary of comics terminology

Comics has developed specialized terminology. Several attempts have been made to formalise and define the terminology of comics by authors such as Will Eisner, Scott McCloud, R. C. Harvey and Dylan Horrocks. Much of the terminology in English is under dispute, so this page will list and describe the most common terms used in comics.

Japanese language

Japanese (日本語, Nihongo, [ɲihoŋɡo] or [ɲihoŋŋo]) is an East Asian language spoken by about 128 million people, primarily in Japan, where it is the national language. It is a member of the Japonic (or Japanese-Ryukyuan) language family, and its relation to other languages, such as Korean, is debated. Japanese has been grouped with language families such as Ainu, Austroasiatic, and the now-discredited Altaic, but none of these proposals has gained widespread acceptance.

Little is known of the language's prehistory, or when it first appeared in Japan. Chinese documents from the 3rd century recorded a few Japanese words, but substantial texts did not appear until the 8th century. During the Heian period (794–1185), Chinese had considerable influence on the vocabulary and phonology of Old Japanese. Late Middle Japanese (1185–1600) included changes in features that brought it closer to the modern language, and the first appearance of European loanwords. The standard dialect moved from the Kansai region to the Edo (modern Tokyo) region in the Early Modern Japanese period (early 17th century–mid-19th century). Following the end in 1853 of Japan's self-imposed isolation, the flow of loanwords from European languages increased significantly. English loanwords, in particular, have become frequent, and Japanese words from English roots have proliferated.

Japanese is an agglutinative, mora-timed language with simple phonotactics, a pure vowel system, phonemic vowel and consonant length, and a lexically significant pitch-accent. Word order is normally subject–object–verb with particles marking the grammatical function of words, and sentence structure is topic–comment. Sentence-final particles are used to add emotional or emphatic impact, or make questions. Nouns have no grammatical number or gender, and there are no articles. Verbs are conjugated, primarily for tense and voice, but not person. Japanese equivalents of adjectives are also conjugated. Japanese has a complex system of honorifics with verb forms and vocabulary to indicate the relative status of the speaker, the listener, and persons mentioned.

Japanese has no genetic relationship with Chinese, but it makes extensive use of Chinese characters, or kanji (漢字), in its writing system, and a large portion of its vocabulary is borrowed from Chinese. Along with kanji, the Japanese writing system primarily uses two syllabic (or moraic) scripts, hiragana (ひらがな or 平仮名) and katakana (カタカナ or 片仮名). Latin script is used in a limited fashion, such as for imported acronyms, and the numeral system uses mostly Arabic numerals alongside traditional Chinese numerals.

Korean language

The Korean language (South Korean: 한국어/韓國語 Hangugeo; North Korean: 조선말/朝鮮말 Chosŏnmal) is an East Asian language spoken by about 80 million people. It is a member of the Koreanic language family and is the official and national language of both Koreas: North Korea and South Korea, with different standardized official forms used in each territory. It is also one of the two official languages in the Yanbian Korean Autonomous Prefecture and Changbai Korean Autonomous County of Jilin province, China. Historical and modern linguists classify Korean as a language isolate; however, it does have a few extinct relatives, which together with Korean itself and the Jeju language (spoken in the Jeju Province and considered somewhat distinct) form the Koreanic language family. This implies that Korean is not an isolate, but a member of a micro-family. The idea that Korean belongs to the controversial Altaic language family is discredited in academic research. Korean is agglutinative in its morphology and SOV in its syntax.

Lexicon

A lexicon, word-hoard, wordbook, or word-stock is the vocabulary of a person, language, or branch of knowledge (such as nautical or medical). In linguistics, a lexicon is a language's inventory of lexemes. The word "lexicon" derives from the Greek λεξικόν (lexikon), neuter of λεξικός (lexikos) meaning "of or for words."

Linguistic theories generally regard human languages as consisting of two parts: a lexicon, essentially a catalogue of a language's words (its wordstock); and a grammar, a system of rules which allow for the combination of those words into meaningful sentences. The lexicon is also thought to include bound morphemes, which cannot stand alone as words (such as most affixes). In some analyses, compound words and certain classes of idiomatic expressions and other collocations are also considered to be part of the lexicon. Dictionaries represent attempts at listing, in alphabetical order, the lexicon of a given language; usually, however, bound morphemes are not included.

Madras Bashai

Madras Bashai (Tamil: மெட்ராஸ் பாஷ, translit. Meṭrās pāṣa, lit. 'Madras slang') is a pidgin language or a dialect of the Tamil language, heavily influenced by Indian English, Urdu and Hindi, that is spoken in the city of Chennai (previously known as Madras) in the Indian state of Tamil Nadu; it is not mutually intelligible with any of those except for Tamil, to a certain extent. The word bashai derives from the Sanskrit word bhasha, meaning "language", which is Mozhi (மொழி) in Tamil.

Madras Bashai evolved largely during the past three centuries. It grew in parallel with the growth of cosmopolitan Madras. After Madras Bashai became somewhat common in Madras, it became a source of satire for early Tamil films from the 1950s, in the form of puns and double entendres. Subsequent generations in Chennai identified with it and absorbed English constructs into the dialect, making it what it is today.

Maltese language

Maltese (Maltese: Malti) is the national language of Malta and a co-official language of the country alongside English, while also serving as an official language of the European Union, the only Semitic language so distinguished. Maltese is descended from Siculo-Arabic, the extinct variety of Arabic that developed in Sicily and was later introduced to Malta, between the end of the ninth century and the end of the twelfth century.

Maltese has evolved independently of Classical Arabic and its varieties into a standardized language over the past 800 years in a gradual process of Latinisation. Maltese is therefore considered an exceptional descendant of Arabic that has no diglossic relationship with Classical or Modern Standard Arabic, and is classified separately from the Arabic macrolanguage. Maltese is also unique among Semitic languages since its morphology has been deeply influenced by Romance languages, namely Italian and Sicilian.

The original Semitic base, Siculo-Arabic, comprises around one-third of the Maltese vocabulary, especially words that denote basic ideas and the function words, but about half of the vocabulary is derived from standard Italian and Sicilian; and English words make up between 6% and 20% of the vocabulary. A recent study shows that, in terms of basic everyday language, speakers of Maltese are able to understand less than a third of what is said to them in Tunisian Arabic, which is related to Siculo-Arabic, whereas speakers of Tunisian are able to understand about 40% of what is said to them in Maltese. This reported level of asymmetric intelligibility is considerably lower than the mutual intelligibility found between Arabic dialects.

Maltese has always been written in the Latin script, the earliest surviving example dating from the late Middle Ages. It remains the only standardized Semitic language written in the Latin script.


Newspeak is the language of Oceania, a fictional totalitarian state and the setting of the novel Nineteen Eighty-Four (1949), by George Orwell. The ruling Party of Oceania created the Newspeak language to meet the ideological requirements of English Socialism (Ingsoc). Newspeak is a controlled language, of restricted grammar and limited vocabulary, meant to limit the freedom of thought (personal identity, self-expression, free will) that ideologically threatens the régime of Big Brother and the Party, who thus criminalized such concepts as thoughtcrime, contradictions of "Ingsoc orthodoxy".

In "The Principles of Newspeak", the appendix to the novel, George Orwell explains that Newspeak usage follows most of English grammar, yet is a language characterised by a continually diminishing vocabulary; complete thoughts are reduced to simple terms of simplistic meaning. Linguistically, the contractions of Newspeak, such as Ingsoc (English Socialism) and Minitrue (Ministry of Truth), derive from the syllabic abbreviations of Russian, which identify the government and social institutions of the Soviet Union, such as politburo (Politburo of the Central Committee of the Communist Party of the Soviet Union), Comintern (Communist International), kolkhoz (collective farm), and Komsomol (Young Communists' League). The long-term political purpose of the new language is for every member of the Party and society, except the Proles (the working class of Oceania), to communicate exclusively in Newspeak by the year A.D. 2050; during that 66-year transition, the usage of Oldspeak (Standard English) shall remain interspersed among Newspeak conversations.

Newspeak is also a constructed language, of planned phonology, grammar, and vocabulary, like Basic English, which Orwell promoted (1942–44) during the Second World War (1939–45), and later rejected in the essay "Politics and the English Language" (1946), wherein he criticises the bad usage of English in his day: dying metaphors, pretentious diction, and high-flown rhetoric, which produce the meaningless words of doublespeak, the product of unclear reasoning. Orwell's conclusion thematically reiterates linguistic decline: "I said earlier that the decadence of our language is probably curable. Those who deny this may argue that language merely reflects existing social conditions, and that we cannot influence its development, by any direct tinkering with words or constructions."

Persian language

Persian, also known by its endonym Farsi (فارسی fārsi [fɒːɾˈsiː]), is one of the Western Iranian languages within the Indo-Iranian branch of the Indo-European language family. It is primarily spoken in Iran, Afghanistan (where it has officially been known as Dari since 1958), Tajikistan (where it has officially been known as Tajiki since the Soviet era), Uzbekistan, and some other regions that historically were Persianate societies and considered part of Greater Iran. It is written right to left in the Persian alphabet, a modified variant of the Arabic script.

The Persian language is classified as a continuation of Middle Persian, the official religious and literary language of the Sasanian Empire, itself a continuation of Old Persian, the language of the Achaemenid Empire. Its grammar is similar to that of many contemporary European languages. A Persian-speaking person may be referred to as Persophone.

There are approximately 110 million Persian speakers worldwide, with the language holding official status in Iran, Afghanistan, and Tajikistan. For centuries, Persian has also been a prestigious cultural language in other regions of Western Asia, Central Asia, and South Asia, used by the various empires based in those regions.

Persian has had a considerable (mainly lexical) influence on neighboring languages, particularly the Turkic languages in Central Asia, the Caucasus, and Anatolia, neighboring Iranian languages, as well as Armenian, Georgian, and Indo-Aryan languages, especially Urdu (a register of Hindustani). It also exerted some influence on Arabic, particularly Bahrani Arabic, while borrowing much vocabulary from it after the Arab conquest of Iran.

With a long history of literature in the form of Middle Persian before Islam, Persian was the first language in the Muslim world to break through Arabic's monopoly on writing, and the writing of poetry in Persian was established as a court tradition in many eastern courts. Some of the famous works of Persian literature are the Shahnameh of Ferdowsi, the works of Rumi, the Rubaiyat of Omar Khayyam, the Panj Ganj of Nizami Ganjavi, the Divān of Hafez and the two miscellanea of prose and verse by Saadi Shirazi, the Gulistan and the Bustan.


A pidgin, or pidgin language, is a grammatically simplified means of communication that develops between two or more groups that do not have a language in common: typically, its vocabulary and grammar are limited and often drawn from several languages. It is most commonly employed in situations such as trade, or where both groups speak languages different from the language of the country in which they reside (but where there is no common language between the groups). Fundamentally, a pidgin is a simplified means of linguistic communication, as it is constructed impromptu, or by convention, between individuals or groups of people. A pidgin is not the native language of any speech community, but is instead learned as a second language.

A pidgin may be built from words, sounds, or body language from a multitude of languages as well as onomatopoeia. As the lexicon of any pidgin will be limited to core vocabulary, words with only a specific meaning in the lexifier language may acquire a completely new (or additional) meaning in the pidgin.

Pidgins have historically been considered a form of patois, unsophisticated simplified versions of their lexifiers, and as such usually have low prestige with respect to other languages. However, not all simplified or "unsophisticated" forms of a language are pidgins. Each pidgin has its own norms of usage which must be learned for proficiency in the pidgin.

A pidgin differs from a creole, which is the first language of a speech community of native speakers that at one point arose from a pidgin. Unlike pidgins, creoles have fully developed vocabulary and patterned grammar. Most linguists believe that a creole develops through a process of nativization of a pidgin when children of acquired pidgin-speakers learn and use it as their native language.

Sino-Korean vocabulary

Sino-Korean vocabulary or Hanja-eo (Hangul: 한자어; Hanja: 漢字語) refers to Korean words of Chinese origin. Sino-Korean vocabulary includes words borrowed directly from Chinese, new Korean words created from Chinese characters, and words borrowed from Sino-Japanese vocabulary. About 60 percent of Korean words are of Chinese origin.

Union List of Artist Names

The Union List of Artist Names (ULAN) is a free online database of the Getty Research Institute using a controlled vocabulary, which by 2018 contained over 300,000 artists and over 720,000 names for them, as well as other information about artists. Names in ULAN may include given names, pseudonyms, variant spellings, names in multiple languages, and names that have changed over time (e.g., married names). Among these names, one is flagged as the preferred name.

Although it is displayed as a list, ULAN is structured as a thesaurus, compliant with ISO and NISO standards for thesaurus construction; it contains hierarchical, equivalence, and associative relationships.

The focus of each ULAN record is an artist. In the database, each artist record (also called a subject) is identified by a unique numeric ID. The artist's nationality is given, as are places and dates of birth and death (if known). Linked to each artist record are names, related artists, sources for the data, and notes. The temporal coverage of the ULAN ranges from Antiquity to the present and the scope is global.

Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture. Some performance artists are included (but typically not actors, dancers, or other performing artists). Repositories and some donors are included as well.
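
As an illustration only, the record structure described above (a unique numeric ID, nationality, life dates where known, and several names of which one is flagged as preferred) could be modelled roughly as follows. This is a minimal sketch, not the Getty's actual schema: the class names, field names, and the sample artist are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArtistName:
    """One name form for an artist (hypothetical structure)."""
    text: str
    preferred: bool = False  # exactly one name per record should be flagged


@dataclass
class ArtistRecord:
    """A simplified, hypothetical stand-in for a ULAN-style artist record."""
    record_id: int                      # unique numeric ID
    nationality: str
    kind: str = "person"                # "person" or "corporate body"
    birth_year: Optional[int] = None    # None when unknown
    death_year: Optional[int] = None
    names: List[ArtistName] = field(default_factory=list)

    def preferred_name(self) -> Optional[str]:
        """Return the name flagged as preferred, if any."""
        for name in self.names:
            if name.preferred:
                return name.text
        return None


def find_by_name(records: List[ArtistRecord], query: str) -> List[ArtistRecord]:
    """Return records in which any name form matches the query, ignoring case."""
    wanted = query.casefold()
    return [r for r in records
            if any(n.text.casefold() == wanted for n in r.names)]


# Made-up sample data, not taken from ULAN.
sample = ArtistRecord(
    record_id=1,
    nationality="Exampleland",
    birth_year=1900,
    names=[ArtistName("Example, Alex", preferred=True),
           ArtistName("A. Example"),
           ArtistName("Alex Sample")],
)
```

Looking up a variant such as `find_by_name([sample], "alex sample")` returns the same record as the preferred form, which is the point of storing every name form against a single ID.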


Urdu (Urdu: اُردُو, ALA-LC: Urdū [ˈʊrduː]), also known as Lashkari (locally written لشکری) or, more precisely, Modern Standard Urdu, is a Persianised standard register of the Hindustani language. It is the official national language and lingua franca of Pakistan. In India, it is one of the 22 official languages recognized in the Constitution of India, having official status in the six states of Jammu and Kashmir, Telangana, Uttar Pradesh, Bihar, Jharkhand and West Bengal, as well as the national capital territory of Delhi. It is a registered regional language of Nepal.

Apart from specialized vocabulary, spoken Urdu is mutually intelligible with Standard Hindi, another recognized register of Hindustani. The Urdu variant of Hindustani received recognition and patronage under British rule, when the British replaced the local official languages of North and Northwestern India with English and Hindustani written in Nastaʿlīq script. Religious, social, and political factors pushed for a distinction between Urdu and Hindi in India, leading to the Hindi–Urdu controversy.

According to Nationalencyklopedin's 2010 estimates, Urdu is the 21st most spoken first language in the world, with approximately 66 million speakers. According to Ethnologue's 2017 estimates, Urdu, along with standard Hindi and the languages of the Hindi belt (as Hindustani), is the 3rd most spoken language in the world, with approximately 329.1 million native speakers, and 697.4 million total speakers.


Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. XML is defined by the W3C's XML 1.0 Specification and several other related specifications, all of them free open standards.

The design goals of XML emphasize simplicity, generality, and usability across the Internet. It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services.

Several schema systems exist to aid in the definition of XML-based languages, while programmers have developed many application programming interfaces (APIs) to aid the processing of XML data.
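
To make the "human-readable and machine-readable" point concrete, here is a minimal sketch using one such API, Python's standard `xml.etree.ElementTree` module. The document itself (a `vocabulary` element holding `word` elements with a `lang` attribute) is an invented example, not a standard XML vocabulary; the helper name `words_by_language` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document. The element and attribute names are
# illustrative assumptions, and the text content shows Unicode support.
DOC = """
<vocabulary>
  <word lang="mt">Malti</word>
  <word lang="fa">فارسی</word>
  <word lang="ko">한자어</word>
</vocabulary>
"""


def words_by_language(xml_text: str) -> dict:
    """Parse the document and map each word's lang attribute to its text."""
    root = ET.fromstring(xml_text.strip())
    return {word.get("lang"): word.text for word in root.findall("word")}


def add_word(xml_text: str, lang: str, text: str) -> str:
    """Return a new XML string with one more <word> element appended."""
    root = ET.fromstring(xml_text.strip())
    new_word = ET.SubElement(root, "word", {"lang": lang})
    new_word.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(words_by_language(DOC))
    print(add_word(DOC, "en", "vocabulary"))
```

The same text can be read directly by a person and traversed as a tree by a program, and serialising the tree again produces XML that any other conforming parser can consume.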

