In traditional grammar, a part of speech (abbreviated form: PoS or POS) is a category of words (or, more generally, of lexical items) which have similar grammatical properties. Words that are assigned to the same part of speech generally display similar behavior in terms of syntax—they play similar roles within the grammatical structure of sentences—and sometimes in terms of morphology, in that they undergo inflection for similar properties.
Commonly listed English parts of speech are noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, and sometimes numeral, article, or determiner. Other Indo-European languages also have essentially all these word classes; one exception to this generalization is that the Slavic languages as well as Latin and Sanskrit do not have articles. Beyond the Indo-European family, such other European languages as Hungarian and Finnish, both of which belong to the Uralic family, completely lack prepositions or have only very few of them; rather, they have postpositions.
Other terms than part of speech—particularly in modern linguistic classifications, which often make more precise distinctions than the traditional scheme does—include word class, lexical class, and lexical category. Some authors restrict the term lexical category to refer only to a particular type of syntactic category; for them the term excludes those parts of speech that are considered to be functional, such as pronouns. The term form class is also used, although this has various conflicting definitions. Word classes may be classified as open or closed: open classes (like nouns, verbs and adjectives) acquire new members constantly, while closed classes (such as pronouns and conjunctions) acquire new members infrequently, if at all.
Almost all languages have the word classes noun and verb, but beyond these two there are significant variations among different languages.
Because of such variation in the number of categories and their identifying properties, analysis of parts of speech must be done for each individual language. Nevertheless, the labels for each category are assigned on the basis of universal criteria.
These four were grouped into two larger classes: inflectable (nouns and verbs) and uninflectable (pre-verbs and particles).
The ancient work on the grammar of the Tamil language, Tolkāppiyam, argued to have been written around 2,500 years ago, classifies Tamil words as peyar (பெயர்; noun), vinai (வினை; verb), idai (part of speech which modifies the relationships between verbs and nouns), and uri (word that further qualifies a noun or verb).
A century or two after the work of Nirukta, the Greek scholar Plato wrote in his Cratylus dialog that "... sentences are, I conceive, a combination of verbs [rhêma] and nouns [ónoma]". Aristotle added another class, "conjunction" [sýndesmos], which included not only the words known today as conjunctions, but also other parts (the interpretations differ; in one interpretation it is pronouns, prepositions, and the article).
The Latin grammarian Priscian (fl. 500 AD) modified the above eightfold system, excluding "article" (since the Latin language, unlike Greek, does not have articles), but adding "interjection".
The Latin names for the parts of speech, from which the corresponding modern English terms derive, were nomen, verbum, participium, pronomen, praepositio, adverbium, conjunctio and interjectio. The category nomen included substantives (nomen substantivum, corresponding to what are today called nouns in English), adjectives (nomen adjectivum) and numerals (nomen numerale). This is reflected in the older English terminology noun substantive, noun adjective and noun numeral. Later the adjective became a separate class, as often did the numerals, and the English word noun came to be applied to substantives only.
Works of English grammar generally follow the pattern of the European tradition as described above, except that participles are now usually regarded as forms of verbs rather than as a separate part of speech, and numerals are often conflated with other parts of speech: nouns (cardinal numerals, e.g., "one", and collective numerals, e.g., "dozen"), adjectives (ordinal numerals, e.g., "first", and multiplier numerals, e.g., "single") and adverbs (multiplicative numerals, e.g., "once", and distributive numerals, e.g., "singly"). Eight or nine parts of speech are commonly listed:
Some modern classifications define further classes in addition to these. For discussion see the sections below.
The classification below, or slight expansions of it, is still followed in most dictionaries:
English words are not generally marked as belonging to one part of speech or another; this contrasts with many other European languages, which use inflection more extensively, meaning that a given word form can often be identified as belonging to a particular part of speech and having certain additional grammatical properties. In English, most words are uninflected, while the inflected endings that exist are mostly ambiguous: -ed may mark a verbal past tense, a participle or a fully adjectival form; -s may mark a plural noun or a present-tense verb form; -ing may mark a participle, gerund, or pure adjective or noun. Although -ly is a frequent adverb marker, some adverbs (e.g. tomorrow, fast, very) do not have that ending, while some adjectives do have that ending (e.g. friendly, ugly, lovely).
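The ambiguity of English endings can be made concrete with a small sketch. The table below only restates the readings listed in the paragraph above; the names `SUFFIX_CANDIDATES` and `candidate_classes` are hypothetical, and a bare suffix check is a deliberately naive heuristic rather than a real morphological analysis.

```python
# Toy illustration: an English suffix alone does not determine part of speech.
# The candidate readings restate the ambiguities described in the text;
# the names and data layout are assumptions made for this sketch.
SUFFIX_CANDIDATES = {
    "ing": ["participle", "gerund", "adjective", "noun"],
    "ed": ["past-tense verb", "participle", "adjective"],
    "ly": ["adverb", "adjective"],
    "s": ["plural noun", "present-tense verb"],
}


def candidate_classes(word):
    """Return the readings a suffix permits, or [] if no listed suffix matches."""
    # Try longer suffixes first so that "-ing" is checked before "-s".
    for suffix in sorted(SUFFIX_CANDIDATES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return SUFFIX_CANDIDATES[suffix]
    return []


for word in ["walked", "friendly", "runs", "fast"]:
    print(word, candidate_classes(word))
```

Every matching word comes back with more than one candidate, and an adverb such as fast matches no suffix at all, which is why the surrounding sentence, not the word form, usually has to settle the part of speech.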
Many English words can belong to more than one part of speech. Words like neigh, break, outlaw, laser, microwave, and telephone might all be either verbs or nouns. In certain circumstances, even words with primarily grammatical functions can be used as verbs or nouns, as in, "We must look to the hows and not just the whys." The process whereby a word comes to be used as a different part of speech is called conversion or zero derivation.
Linguists recognize that the above list of eight or nine word classes is drastically simplified. For example, "adverb" is to some extent a catch-all class that includes words with many different functions. Some have even argued that the most basic of category distinctions, that of nouns and verbs, is unfounded, or not applicable to certain languages. Modern linguists have proposed many different schemes whereby the words of English or other languages are placed into more specific categories and subcategories based on a more precise understanding of their grammatical functions.
Common lexical categories defined by function may include the following (not all of them will necessarily be applicable in a given language):
Within a given category, subgroups of words may be identified based on more precise grammatical properties. For example, verbs may be specified according to the number and type of objects or other complements which they take. This is called subcategorization.
Many modern descriptions of grammar include not only lexical categories or word classes, but also phrasal categories, used to classify phrases, in the sense of groups of words that form units having specific grammatical functions. Phrasal categories may include noun phrases (NP), verb phrases (VP) and so on. Lexical and phrasal categories together are called syntactic categories.
Word classes may be either open or closed. An open class is one that commonly accepts the addition of new words, while a closed class is one to which new items are very rarely added. Open classes normally contain large numbers of words, while closed classes are much smaller. Typical open classes found in English and many other languages are nouns, verbs (excluding auxiliary verbs, if these are regarded as a separate class), adjectives, adverbs and interjections. Ideophones are often an open class, though less familiar to English speakers,[a] and are often open to nonce words. Typical closed classes are prepositions (or postpositions), determiners, conjunctions, and pronouns.
The open–closed distinction is related to the distinction between lexical and functional categories, and to that between content words and function words. Some authors consider these distinctions identical, but the connection is not strict. Open classes are generally lexical categories in the stricter sense, containing words with greater semantic content, while closed classes are normally functional categories, consisting of words that perform essentially grammatical functions. This is not universal: in many languages verbs and adjectives are closed classes, usually consisting of few members, and in Japanese the formation of new pronouns from existing nouns is relatively common, though to what extent these form a distinct word class is debated.
Words are added to open classes through such processes as compounding, derivation, coining, and borrowing. When a new word is added through some such process, it can subsequently be used grammatically in sentences in the same ways as other words in its class. A closed class may obtain new items through these same processes, but such changes are much rarer and take much more time. A closed class is normally seen as part of the core language and is not expected to change. In English, for example, new nouns, verbs, etc. are being added to the language constantly (including by the common process of verbing and other types of conversion, where an existing word comes to be used in a different part of speech). However, it is very unusual for a new pronoun, for example, to become accepted in the language, even in cases where there may be felt to be a need for one, as in the case of gender-neutral pronouns.
The open or closed status of word classes varies between languages, even assuming that corresponding word classes exist. Most conspicuously, in many languages verbs and adjectives form closed classes of content words. An extreme example is found in Jingulu, which has only three verbs, while even the modern Indo-European Persian has no more than a few hundred simple verbs, many of which are archaic. (Some twenty Persian verbs are used as light verbs to form compounds; this lack of lexical verbs is shared with other Iranian languages.) Japanese is similar, having few lexical verbs. Basque verbs are also a closed class, with the vast majority of verbal senses instead expressed periphrastically.
In Japanese, verbs and adjectives are closed classes, though these are quite large, with about 700 adjectives, and verbs have opened slightly in recent years. Japanese adjectives are closely related to verbs (they can predicate a sentence, for instance). New verbal meanings are nearly always expressed periphrastically by appending suru (する, to do) to a noun, as in undō suru (運動する, to (do) exercise), and new adjectival meanings are nearly always expressed by adjectival nouns, using the suffix -na (〜な) when an adjectival noun modifies a noun phrase, as in hen-na ojisan (変なおじさん, strange man). The closedness of verbs has weakened in recent years, and in a few cases new verbs are created by appending -ru (〜る) to a noun or using it to replace the end of a word. This is mostly in casual speech for borrowed words, with the most well-established example being sabo-ru (サボる, cut class; play hooky), from sabotāju (サボタージュ, sabotage). This recent innovation aside, the huge contribution of Sino-Japanese vocabulary was almost entirely borrowed as nouns (often verbal nouns or adjectival nouns). Other languages where adjectives are closed class include Swahili, Bemba, and Luganda.
By contrast, Japanese pronouns are an open class, if they can even be considered a class, and nouns come to be used as pronouns with some frequency; a recent example is jibun (自分, self), now used by some young men as a first-person pronoun. The status of Japanese pronouns as a distinct class is disputed, however, with some considering it only a use of nouns, not a distinct class. The case is similar in languages of Southeast Asia, including Thai and Lao, in which, like Japanese, pronouns and terms of address vary significantly based on relative social standing and respect.
Some word classes are universally closed, however, including demonstratives and interrogative words.
...the school tradition about parts of speech is so desperately impoverished
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing and coreference resolution. These tasks are usually required to build more advanced text processing services.
Article (grammar)
An article (with the linguistic glossing abbreviation ART) is a word that is used with a noun (as a standalone word or a prefix or suffix) to specify grammatical definiteness of the noun, and in some languages extending to volume or numerical scope.
The articles in English grammar are the and a/an, and in certain contexts some. "An" and "a" are modern forms of the Old English "an", which in Anglian dialects was the number "one" (compare "on" in Saxon dialects) and survived into Modern Scots as the number "owan". Both "on" (respelled "one" by the Normans) and "an" survived into Modern English, with "one" used as the number and "an" ("a", before nouns that begin with a consonant sound) as an indefinite article.
In many languages, articles are a special part of speech which cannot be easily combined with other parts of speech. In English grammar, articles are frequently considered part of a broader category called determiners, which contains articles, demonstratives (such as "this" and "that"), possessive determiners (such as "my" and "his"), and quantifiers (such as "all" and "few"). Articles and other determiners are also sometimes counted as a type of adjective, since they describe the words that they precede.
In languages that employ articles, every common noun, with some exceptions, is expressed with a certain definiteness, definite or indefinite, as an attribute (similar to the way many languages express every noun with a certain grammatical number, singular or plural, or a grammatical gender). Articles are among the most common words in many languages; in English, for example, the most frequent word is the.
Articles are usually categorized as either definite or indefinite. A few languages with well-developed systems of articles may distinguish additional subtypes. Within each type, languages may have various forms of each article, due to conforming to grammatical attributes such as gender, number, or case. Articles may also be modified as influenced by adjacent sounds or words as in elision (e.g., French "le" becoming "l'" before a vowel), epenthesis (e.g., English "a" becoming "an" before a vowel), or contraction (e.g. Irish "i + na" becoming "sna").
Cardinal number (linguistics)
In linguistics, more precisely in traditional grammar, a cardinal number or cardinal numeral (or just cardinal) is a part of speech used to count, such as the English words one, two, three, but also compounds, e.g. three hundred and forty-two (Commonwealth English) or three hundred forty-two (American English). Cardinal numbers are classified as definite numerals and are related to ordinal numbers, such as first, second, third, etc.
Complementizer
In linguistics (especially generative grammar), complementizer or complementiser (glossing abbreviation: comp) is a lexical category (part of speech) that includes those words that can be used to turn a clause into the subject or object of a sentence. For example, the word that may be called a complementizer in English sentences like Mary believes that it is raining. The concept of complementizers is specific to certain modern grammatical theories; in traditional grammar, such words are normally considered conjunctions.
The standard abbreviation for complementizer is C. The complementizer is often held to be the syntactic head of a full clause, which is therefore often represented by the abbreviation CP (for complementizer phrase). Evidence that the complementizer functions as the head of its clause includes that it is commonly the last element in a clause in head-final languages like Korean or Japanese, in which other heads follow their complements, whereas it appears at the start of a clause in head-initial languages such as English, where heads normally precede their complements.
Conjunction (grammar)
In grammar, a conjunction (abbreviated CONJ or CNJ) is a part of speech that connects words, phrases, or clauses that are called the conjuncts of the conjunctions. The term discourse marker is mostly used for conjunctions joining sentences. This definition may overlap with that of other parts of speech, so what constitutes a "conjunction" must be defined for each language. In English a given word may have several senses, being either a preposition or a conjunction depending on the syntax of the sentence (for example, "after" being a preposition in "he left after the fight" versus it being a conjunction in "he left after they fought"). In general, a conjunction is an invariable (noninflected) grammatical particle and it may or may not stand between the items conjoined.
The definition may also be extended to idiomatic phrases that behave as a unit with the same function, e.g. "as well as", "provided that".
A simple literary example of a conjunction: "the truth of nature, and the power of giving interest". (Samuel Taylor Coleridge's Biographia Literaria)
Conjunctions may be placed at the beginning of sentences: "But some superstition about the practice persists".
Grammatical particle
In grammar the term particle (abbreviated PTCL) has a traditional meaning, as a part of speech that cannot be inflected, and a modern meaning, as a function word associated with another word or phrase to impart meaning.
Interjection
An interjection is a word or expression that occurs as an utterance on its own and expresses a spontaneous feeling or reaction. The category is quite heterogeneous, and includes such things as exclamations (ouch!, wow!), curses (damn!), greetings (hey, bye), response particles (okay, oh!, m-hm, huh?), hesitation markers (uh, er, um) and other words (stop, cool). Due to its heterogeneous nature, the category of interjections partly overlaps with categories like profanities, discourse markers and fillers. The use and linguistic discussion of interjections can be traced historically through the Greek and Latin Modistae over many centuries.
Kiten (program)
Kiten is a Japanese Kanji learning tool and reference for the KDE Software Compilation, specifically, in the kdeedu package. It also works as a Japanese-to-English and English-to-Japanese dictionary. The user can input words into a search box, and all related Kanji are returned with their meaning and part of speech. Kanji can be filtered by rarity and part of speech. A list of Kanji is also available which sorts characters by grade level and stroke number. Selecting one shows its Onyomi, Kunyomi, and meanings. Users can also add Kanji to their "learn list" and get simple flashcard quizzes where the Kanji is displayed along with possible meanings to choose from.
The program was available only for Linux operating systems, but with the beta release of KDE for Windows, it is now available on Microsoft Windows.
Lemmatisation
Lemmatisation (or lemmatization) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form.
In computational linguistics, lemmatisation is the algorithmic process of determining the lemma of a word based on its intended meaning. Unlike stemming, lemmatisation depends on correctly identifying the intended part of speech and meaning of a word in a sentence, as well as within the larger context surrounding that sentence, such as neighboring sentences or even an entire document. As a result, developing efficient lemmatisation algorithms is an open area of research.
Nominative case
The nominative case (abbreviated NOM), subjective case, straight case or upright case is one of the grammatical cases of a noun or other part of speech, which generally marks the subject of a verb or the predicate noun or predicate adjective, as opposed to its object or other verb arguments. Generally, the noun "that is doing something" is in the nominative, and the nominative is often the form listed in dictionaries.
Noun
A noun (from Latin nōmen, literally meaning "name") is a word that functions as the name of some specific thing or set of things, such as living creatures, objects, places, actions, qualities, states of existence, or ideas. Linguistically, a noun is a member of a large, open part of speech whose members can occur as the main word in the subject of a clause, the object of a verb, or the object of a preposition.
Lexical categories (parts of speech) are defined in terms of the ways in which their members combine with other kinds of expressions. The syntactic rules for nouns differ from language to language. In English, nouns are those words which can occur with articles and attributive adjectives and can function as the head of a noun phrase.
Ordinal number (linguistics)
In linguistics, ordinal numbers (or ordinal numerals) are words representing position or rank in a sequential order; the order may be of size, importance, chronology, and so on (e.g., "third", "tertiary"). They differ from cardinal numerals, which represent quantity (e.g., "three") and other types of numerals. In traditional grammar, all numerals, including ordinal numerals, are grouped into a separate part of speech (Latin: nomen numerale, hence, "noun numeral" in older English grammar books); however, in modern interpretations of English grammar, ordinal numerals are usually conflated with adjectives.
Ordinal numbers may be written in English with numerals and letter suffixes: 1st, 2nd or 2d, 3rd or 3d, 4th, 11th, 21st, 101st, 477th, etc., with the suffix acting as an ordinal indicator. Written dates often omit the suffix, although it is nevertheless pronounced. For example: 5 November 1605 (pronounced "the fifth of November ... "); November 5, 1605, ("November Fifth ..."). When written out in full with "of", however, the suffix is retained: the 5th of November. In other languages, different ordinal indicators are used to write ordinal numbers.
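The suffix rule for written English ordinals can be sketched in a few lines. The function name `ordinal` is hypothetical, and the sketch produces only the st/nd/rd/th forms, not the shorter 2d and 3d variants mentioned above.

```python
def ordinal(n):
    """Format a non-negative integer with its English ordinal suffix."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    # Numbers ending in 11, 12 or 13 take "th" even though they end in 1, 2 or 3.
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


print([ordinal(n) for n in (1, 2, 3, 4, 11, 21, 101, 477)])
# ['1st', '2nd', '3rd', '4th', '11th', '21st', '101st', '477th']
```

The check on the last two digits comes first because the teens are the only exception to the last-digit rule.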
In American Sign Language, the ordinal numbers first through ninth are formed with handshapes similar to those for the corresponding cardinal numbers with the addition of a small twist of the wrist.
Part-of-speech tagging
In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context—i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph.
A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.
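Tagging a word from both its definition and its context can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the lexicon, the tag names (`DET`, `NOUN`, `VERB`, `PRON`), the function `tag`, and the single contextual rule. It is not the method of any particular tagger.

```python
# Toy lexicon: each word maps to the parts of speech it may have.
# The words and tag names are illustrative assumptions, not a real tag set.
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "we": ["PRON"],
    "dogs": ["NOUN"],
    "telephone": ["NOUN", "VERB"],
    "rings": ["VERB", "NOUN"],
}


def tag(tokens):
    """Tag tokens using the lexicon (definition) plus one contextual rule."""
    tags = []
    for token in tokens:
        # Unknown words default to NOUN, a common fallback in toy taggers.
        options = LEXICON.get(token.lower(), ["NOUN"])
        choice = options[0]
        if len(options) > 1 and tags:
            # Context: prefer NOUN after a determiner,
            # and VERB after a pronoun or noun.
            if tags[-1] == "DET" and "NOUN" in options:
                choice = "NOUN"
            elif tags[-1] in ("PRON", "NOUN") and "VERB" in options:
                choice = "VERB"
        tags.append(choice)
    return list(zip(tokens, tags))


print(tag(["The", "telephone", "rings"]))
print(tag(["We", "telephone", "the", "dogs"]))
```

The word telephone is tagged as a noun in the first sentence and as a verb in the second, purely because of the tag assigned to the word before it.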
Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, in accordance with a set of descriptive tags. POS-tagging algorithms fall into two distinctive groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS-taggers, employs rule-based algorithms.
Quranic Arabic Corpus
The Quranic Arabic Corpus is an annotated linguistic resource consisting of 77,430 words of Quranic Arabic. The project aims to provide morphological and syntactic annotations for researchers wanting to study the language of the Quran.
Synonym
A synonym is a word or phrase that means exactly or nearly the same as another lexeme (word or phrase) in the same language. Words that are synonyms are said to be synonymous, and the state of being a synonym is called synonymy. For example, the words begin, start, commence, and initiate are all synonyms of one another. Words are typically synonymous in one particular sense: for example, long and extended in the context long time or extended time are synonymous, but long cannot be used in the phrase extended family. Synonyms with the exact same meaning share a seme or denotational sememe, whereas those with inexactly similar meanings share a broader denotational or connotational sememe and thus overlap within a semantic field. The former are sometimes called cognitive synonyms and the latter, near-synonyms, plesionyms or poecilonyms.
Text corpus
In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts (nowadays usually electronically stored and processed). In corpus linguistics, they are used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory.
Verb
A verb, from the Latin verbum meaning word, is a word (part of speech) that in syntax conveys an action (bring, read, walk, run, learn), an occurrence (happen, become), or a state of being (be, exist, stand). In the usual description of English, the basic form, with or without the particle to, is the infinitive. In many languages, verbs are inflected (modified in form) to encode tense, aspect, mood, and voice. A verb may also agree with the person, gender or number of some of its arguments, such as its subject or object. Verbs have tenses: present, to indicate that an action is being carried out; past, to indicate that an action has been done; future, to indicate that an action will be done.
Wiktionary
Wiktionary is a multilingual, web-based project to create a free content dictionary of all words in all languages. It is collaboratively edited via a wiki, and its name is a portmanteau of the words wiki and dictionary. It is available in 171 languages and in Simple English. Like its sister project Wikipedia, Wiktionary is run by the Wikimedia Foundation, and is written collaboratively by volunteers, dubbed "Wiktionarians". Its wiki software, MediaWiki, allows almost anyone with access to the website to create and edit entries.
Because Wiktionary is not limited by print space considerations, most of Wiktionary's language editions provide definitions and translations of words from many languages, and some editions offer additional information typically found in thesauri and lexicons. The English Wiktionary includes a thesaurus (formerly known as Wikisaurus) of synonyms of various words.
Wiktionary data are frequently used in various natural language processing tasks.
Lexical categories and their features