Word count

The word count is the number of words in a document or passage of text. Word counting may be needed when a text is required to stay within a certain number of words. This may particularly be the case in academia, legal proceedings, journalism and advertising. Word count is commonly used by translators to determine the price of a translation job. Word counts may also be used to calculate measures of readability and to measure typing and reading speeds (usually in words per minute). When converting character counts to words, a measure of 5 or 6 characters to a word is generally used for English.[1]
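As a rough illustration of the character-to-word conversion, the sketch below divides a character count by an assumed characters-per-word ratio (5 or 6 for English, per the rule of thumb above). The function name `estimate_words` is hypothetical, and the result is only an estimate.

```python
def estimate_words(char_count: int, chars_per_word: float = 6.0) -> int:
    """Estimate a word count from a character count.

    chars_per_word is the assumed average (5 or 6 for English);
    the result is rounded to the nearest whole word.
    """
    if chars_per_word <= 0:
        raise ValueError("chars_per_word must be positive")
    return round(char_count / chars_per_word)


# A 3,000-character passage is roughly 500 to 600 words.
print(estimate_words(3000))     # 500 (6 characters per word)
print(estimate_words(3000, 5))  # 600 (5 characters per word)
```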

Details and variations of definition

Variations in the operational definitions of how to count the words can occur (namely, what "counts as" a word, and which words "don't count" toward the total). However, especially since the advent of widespread word processing, there is a broad consensus on these operational definitions (and hence the bottom-line integer result). The consensus is to accept the text segmentation rules generally found in most word processing software (including how word boundaries are determined, which depends on how word dividers are defined). The first trait of that definition is that a space (any of various whitespace characters, such as a "regular" word space, an em space, or a tab character) is a word divider. Usually a hyphen or a slash is, too. Different word counting programs may give varying results, depending on the text segmentation rule details, and on whether words outside the main text (such as footnotes, endnotes, or hidden text) are counted. But the behavior of most major word processing applications is broadly similar.
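The effect of the divider rules can be sketched with a regular expression. This is a minimal illustration, assuming that any whitespace (including an em space or a tab) always divides words and that hyphens and slashes divide words only when the option is enabled; the function name `count_words` is hypothetical, and real word processors apply more detailed segmentation rules.

```python
import re

# Assumed dividers: any whitespace, plus (optionally) hyphens and slashes.
_WITH_HYPHEN_SLASH = re.compile(r"[\s\-/]+")
_WHITESPACE_ONLY = re.compile(r"\s+")


def count_words(text: str, split_on_hyphen_and_slash: bool = True) -> int:
    """Count words by splitting on a chosen set of word dividers."""
    pattern = _WITH_HYPHEN_SLASH if split_on_hyphen_and_slash else _WHITESPACE_ONLY
    # Drop empty strings produced by leading or trailing dividers.
    return len([token for token in pattern.split(text) if token])


# Contains a tab and an em space (U+2003), both treated as whitespace.
sample = "A long-term follow-up\tand/or\u2003review"
print(count_words(sample))                                   # 8
print(count_words(sample, split_on_hyphen_and_slash=False))  # 5
```

The two results show how the same passage can yield different totals depending on whether hyphens and slashes count as dividers.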

However, during the era when school assignments were done in handwriting or with typewriters, the rules for these definitions often differed from today's consensus. Most importantly, many students were drilled on the rule that "certain words don't count", usually articles (namely, "a", "an", "the"), but sometimes also others, such as conjunctions (for example, "and", "or", "but") and some prepositions (usually "to", "of"). Hyphenated permanent compounds such as "follow-up" (noun) or "long-term" (adjective) were counted as one word. To save the time and effort of counting word-by-word, often a rule of thumb for the average number of words per line was used, such as 10 words per line. These "rules" have fallen by the wayside in the word processing era; the "word count" feature of such software (which follows the text segmentation rules mentioned earlier) is now the standard arbiter, because it is largely consistent (across documents and applications) and because it is fast, effortless, and costless (already included with the application).
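For comparison, the older classroom conventions can be sketched as well. This is a hypothetical illustration, assuming that only whitespace divides words (so hyphenated compounds count as one), that the excluded set defaults to the articles, and that the line-based estimate uses 10 words per line; the function names are invented for the example.

```python
# Articles were the words most commonly drilled as "not counting".
DEFAULT_EXCLUDED = frozenset({"a", "an", "the"})


def legacy_count(text: str, excluded=DEFAULT_EXCLUDED) -> int:
    """Count whitespace-separated words, skipping the "words that don't count"."""
    # Strip surrounding punctuation so "The" or "the," still match "the".
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return sum(1 for w in words if w and w not in excluded)


def estimate_from_lines(line_count: int, words_per_line: int = 10) -> int:
    """Rule-of-thumb estimate: lines written times an assumed words per line."""
    return line_count * words_per_line


print(legacy_count("The follow-up is a long-term plan."))  # 4
print(estimate_from_lines(25))                             # 250
```

Passing a larger `excluded` set (for example, adding "and" or "of") reproduces the stricter variants of the rule.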

As for which sections of a document "count" toward the total (such as footnotes, endnotes, abstracts, reference lists and bibliographies, tables, figure captions, hidden text), the person in charge (teacher, client) can define their choice, and users (students, workers) can simply select (or exclude) the elements accordingly, and watch the word count automatically update.

Software

Modern web browsers support word counting via extensions, via a JavaScript bookmarklet, or via a script hosted on a website. Most word processors can also count words. Unix-like systems include a program, wc, specifically for word counting. There is a wide variety of word counting tools available online.

As explained earlier, different word counting programs may give varying results, depending on the text segmentation rule details. Because the exact number of words is often not a strict requirement, this variation is usually acceptable.

In fiction

Novelist Jane Smiley suggests that length is an important quality of the novel.[2] However, novels can vary tremendously in length; Smiley lists novels as typically being between 100,000 and 175,000 words,[3] while National Novel Writing Month requires its novels to be at least 50,000 words. There are no firm rules: for example, the boundary between a novella and a novel is arbitrary and a literary work may be difficult to categorise.[4] But while the length of a novel is to a large extent up to its writer,[5] lengths may also vary by subgenre; many chapter books for children start at a length of about 16,000 words,[6] and a typical mystery novel might be in the 60,000 to 80,000 word range while a thriller could be well over 100,000 words.[7]

The Science Fiction and Fantasy Writers of America specifies word counts for each of its Nebula Award categories:[8]

Classification    Word count
Novel             40,000 words or more
Novella           17,500 to 39,999 words
Novelette         7,500 to 17,499 words
Short story       under 7,500 words
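The category boundaries in the table can be encoded as a simple lookup. This sketch uses only the thresholds listed in the table; the function name `nebula_category` is hypothetical, and SFWA's own rules may include conditions beyond word count.

```python
def nebula_category(word_count: int) -> str:
    """Map a word count to the Nebula category thresholds from the table."""
    if word_count < 0:
        raise ValueError("word_count cannot be negative")
    if word_count >= 40_000:
        return "Novel"
    if word_count >= 17_500:
        return "Novella"
    if word_count >= 7_500:
        return "Novelette"
    return "Short story"


# Boundary values show which side of each threshold a count falls on.
for n in (5_000, 7_500, 17_499, 17_500, 40_000):
    print(n, nebula_category(n))
```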

In non-fiction

The acceptable length of an academic dissertation varies greatly, depending mainly on the subject. Numerous American universities limit Ph.D. dissertations to 100,000 words, barring special permission to exceed this limit.[9]

References

  1. The Science Fiction and Fantasy Writers of America suggests 6 characters to a word.
  2. Smiley, Jane (2005). Thirteen Ways of Looking at the Novel. NY: Alfred A. Knopf, p. 14.
  3. Smiley, 2005, p. 15.
  4. Edge, Tom (November 2, 2006). "Does Size Matter?" The Guardian (UK), Booksblog. http://www.guardian.co.uk/books/booksblog/2006/nov/02/doessizematter
  5. Quindlen, Anna (September 23, 2002). "Writers on Writing: The Eye of the Reporter, the Heart of the Novelist". New York Times. "A novelist doesn't write to space, of course; 80,000 words, 100,000, it is up to the writer to say when the story is done."
  6. Lamb, Nancy. Crafting Stories for Children. Cincinnati: Writer's Digest Books, p. 24.
  7. Thurston, Carol (August 3, 1997). "Agents give writers the book on what's hot and what's not". Austin American-Statesman. "No one wants more than 60-80,000 words in a mystery, 110,000 for a thriller."
  8. SFWA Awards FAQ. Science Fiction and Fantasy Writers of America.
  9. Dunleavy, Patrick (2003). Authoring a PhD. Palgrave Macmillan, p. 46. ISBN 978-1-4039-1191-9.

Apache Hive

Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives a SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like queries (HiveQL) into the underlying Java without the need to implement queries in the low-level Java API. Since most data warehousing applications work with SQL-based querying languages, Hive aids portability of SQL-based applications to Hadoop. While initially developed by Facebook, Apache Hive is used and developed by other companies such as Netflix and the Financial Industry Regulatory Authority (FINRA). Amazon maintains a software fork of Apache Hive included in Amazon Elastic MapReduce on Amazon Web Services.

Apache Pig

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for relational database management systems. Pig Latin can be extended using user-defined functions (UDFs) which the user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language.

BlueGriffon

BlueGriffon is a WYSIWYG content editor for the World Wide Web. It is based on the discontinued Nvu editor, which in turn is based on the Composer component of the Mozilla Application Suite. Powered by Gecko, the rendering engine of Firefox, it can edit Web pages in conformance with Web standards. It runs on Microsoft Windows, macOS and Linux.

BlueGriffon complies with the W3C's web standards. It can create and edit pages in accordance with HTML 4, XHTML 1.1, HTML 5 and XHTML 5. It supports CSS 2.1 and all parts of CSS 3 already implemented by Gecko. BlueGriffon also includes SVG-edit, an XUL-based editor for SVG that was originally distributed as an add-on to Firefox and was adapted to BlueGriffon.

A version without the CSS Stylesheet editor is free to download and is available on Microsoft Windows, macOS and Linux.

Many enhancements are available via add-ons. Most add-ons, such as 'Project Manager', 'CSS Stylesheet editor', 'MathML Editor', 'Word Count' and 'FullScreen view/edit', must be paid for; only two ('FireFTP' and 'Dictionaries') are free to download.

Comte

Comte is the French, Catalan and Occitan form of the word 'count' (Latin: comes); comté is the Gallo-Romance form of the word 'county' (Latin: comitatus).

Comte or Comté may refer to:

a count in French, from Latin comes

A county in France, that is, the territory ruled by a count

La Comté, a commune in the Pas-de-Calais département of France

Comté cheese, a French cheese from Franche-Comté

Constitution of Monaco

The Constitution of Monaco, first adopted in 1911 after the Monégasque Revolution and heavily revised by Prince Rainier III on 17 December 1962, outlines three branches of government, including several administrative offices and a number of councils, which share advisory and legislative power with the Prince.

The constitution also defines the line of succession to the Monegasque throne; this section was modified on 2 April 2002.

By word count, it is the shortest constitution in the world currently in force.

Flash fiction

Flash fiction is a fictional work of extreme brevity that still offers character and plot development. Identified varieties, many of them defined by word count, include the six-word story; the 280-character story (also known as "twitterature"); the "dribble" (also known as the "minisaga," 50 words); the "drabble" (also known as "microfiction," 100 words); "sudden fiction" (750 words); flash fiction (1,000 words); and "micro-story". Some commentators have suggested that flash fiction possesses a unique literary quality in its ability to hint at or imply a larger story.

Gwrite

gwrite is an open-source styled text word processor for Linux. It uses a GTK+ interface and saves files in HTML5+CSS format. Images can be embedded using base64 encoding. It is available for installation in the Ubuntu and Debian repositories.

Features include:

Word count

Save to Microsoft Word container format, HTML5 format, or plain text format without formatting

Search and Replace

I Am a Dalek

I am a Dalek is a BBC Books original novella written by Gareth Roberts and based on the long-running British science fiction television series Doctor Who. It features the Tenth Doctor and Rose. This paperback is part of the Quick Reads Initiative sponsored by the UK government, to encourage literacy. It has a similar look to BBC Books' other new series adventures, except for its much shorter word count, being a paperback and not being numbered as part of the same series. To date it is one of only five novels based upon the revived series that have not been published in hardcover. The others are: Made of Steel, published in March 2007, Revenge of the Judoon (March 2008), The Sontaran Games (February 2009) and Code of the Krillitanes (March 2010). These four books are also part of the Quick Reads Initiative.

Keyword density

Keyword density is the percentage of times a keyword or phrase appears on a web page compared to the total number of words on the page. In the context of search engine optimization, keyword density can be used to determine whether a web page is relevant to a specified keyword or keyword phrase.

In the late 1990s, the early days of search engines, keyword density was an important factor in page ranking. However, as webmasters discovered how to implement optimum keyword density, search engines began giving priority to other factors beyond the direct control of webmasters. Today, the overuse of keywords, a practice called keyword stuffing, will cause a web page to be penalized.

Many SEO experts consider the optimum keyword density to be 1 to 3 percent; more could be considered search spam. The formula to calculate your keyword density on a web page for SEO purposes is Density = ( Nkr / Tkn ) * 100, where Nkr is how many times you repeated a specific keyword, and Tkn the total words in the analyzed text. The result is a keyword density value. When calculating keyword density, ignore HTML tags and other embedded tags which will not appear in the text of the page once published.
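The single-keyword formula can be computed directly from a passage of text. This is a minimal sketch, assuming the text has already been stripped of HTML tags and that a "word" is a run of letters, digits, apostrophes or hyphens (an illustrative tokenization choice); the function name `keyword_density` is hypothetical.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Return (Nkr / Tkn) * 100 for a single-word keyword."""
    # Illustrative tokenizer: lowercase, keep letters, digits, apostrophes, hyphens.
    words = re.findall(r"[\w'-]+", text.lower())
    t_kn = len(words)                    # Tkn: total words in the text
    if t_kn == 0:
        return 0.0
    n_kr = words.count(keyword.lower())  # Nkr: times the keyword appears
    return 100 * n_kr / t_kn


text = "Shoes for running. Running shoes, walking shoes, and more shoes."
print(keyword_density(text, "shoes"))  # 40.0 (4 of 10 words)
```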

When calculating the density of a keyword phrase, the formula would be Density = ( Nkr * Nwp / Tkn ) * 100, where Nwp is the number of words in the phrase. So, for example, for a 400-word page about search engine optimization where "search engine optimization" is used four times, the keyword phrase density is (4*3/400)*100, or 3 percent.
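The keyphrase formula and its worked example can be checked with a short function. The function name and parameter names are hypothetical; they mirror Nkr, Nwp and Tkn from the text.

```python
def phrase_density(n_kr: int, n_wp: int, t_kn: int) -> float:
    """Keyphrase density = (Nkr * Nwp / Tkn) * 100.

    n_kr: occurrences of the phrase, n_wp: words in the phrase,
    t_kn: total words in the analyzed text.
    """
    if t_kn <= 0:
        raise ValueError("t_kn must be positive")
    # Multiply before dividing so the worked example comes out exactly.
    return 100 * n_kr * n_wp / t_kn


# "search engine optimization" (3 words) used 4 times on a 400-word page.
print(phrase_density(n_kr=4, n_wp=3, t_kn=400))  # 3.0
```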

From a mathematical viewpoint, the original concept of keyword density refers to the frequency (Nkr) with which a keyword appears in a dissertation. A "keyword" consisting of multiple terms, e.g. "blue suede shoes," is an entity in itself. The frequency of the phrase "blue suede shoes" within a dissertation drives the key(phrase) density. It is "more" mathematically correct for a "keyphrase" to be calculated just like the original calculation, but counting each occurrence of the word group "blue suede shoes" as a single appearance, not three:

Density = ( Nkr / Tkn ) * 100.

'Keywords' (kr) that consist of several words artificially inflate the total word count of the dissertation. The purest mathematical representation should adjust the total word count (Tkn) lower by removing the excess key(phrase) word counts from the total:

Density = ( Nkr / ( Tkn - ( Nkr * ( Nwp - 1 ) ) ) ) * 100, where Nwp = the number of terms in the keyphrase.

When the keyphrase is a single term (Nwp = 1), the correction term Nkr * ( Nwp - 1 ) is zero, so the total word count is unchanged and the general formula reduces to the original one.
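The adjusted formula can be sketched the same way. The function name is hypothetical; the arithmetic follows the formula in the text, counting each phrase occurrence once and removing the extra Nwp - 1 words per occurrence from the total.

```python
def adjusted_density(n_kr: int, n_wp: int, t_kn: int) -> float:
    """Density = (Nkr / (Tkn - Nkr * (Nwp - 1))) * 100."""
    denominator = t_kn - n_kr * (n_wp - 1)
    if denominator <= 0:
        raise ValueError("phrase occurrences exceed the total word count")
    return 100 * n_kr / denominator


# 4 uses of a 3-word phrase on a 400-word page: 4 / (400 - 8) * 100.
print(round(adjusted_density(4, 3, 400), 2))  # 1.02
# With a single-term keyword (Nwp = 1) it matches (Nkr / Tkn) * 100.
print(adjusted_density(4, 1, 400))            # 1.0
```

Compared with the (Nkr * Nwp / Tkn) * 100 variant, this version treats the phrase as one unit, which is why the same page scores about 1.02 percent rather than 3 percent.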

Beyond the formulas, keyword density can be measured at the push of a button with online tools that count the number of times a keyphrase has been mentioned.

However, the Hummingbird update changed how Google evaluates content. Instead of looking for exact-match keywords, Google now attempts to understand the intent behind a user’s query and finds pages that match that intent. For example, rather than looking for instances of “ice cream parlor” on pages online, Google looks for pages that demonstrate qualities an ice cream parlor would have, speaking contextually about ice cream parlors in natural, conversational language.

This implies that including exact keywords is far less important than simply writing about the right subjects and relying on natural language to take care of the rest.

List of longest novels

This is a list of the longest novels, those of over 500,000 words, published through a mainstream publisher. The longest novel is Artamène ou le Grand Cyrus, originally published (1649–54) in ten parts, each part in three volumes. Artamène is generally attributed to Madeleine de Scudéry. Compiling a list of longest novels yields different results depending on whether pages, words or characters are counted. The length of a book is typically associated with its physical size, specifically its page count, leading many to assume that the largest and thickest book is the longest. Word count is a direct way to measure the length of a novel that is unaffected by variations in format and page size; however, different languages have words of different average lengths.

Made of Steel (novella)

Made of Steel is a BBC Books original novella written by Terrance Dicks and based on the long-running British science fiction television series Doctor Who. It features the Tenth Doctor and Martha. This paperback is part of the Quick Reads Initiative sponsored by the UK government, to encourage literacy. It has a similar look to BBC Books' other new series adventures, except for its much shorter word count, being a paperback and not being numbered as part of the same series. To date it is one of only five novels based upon the revived series that have not been published in hardcover: the first, I am a Dalek, was published in May 2006; the third, Revenge of the Judoon, was published in March 2008; the fourth, The Sontaran Games, was published in February 2009; and the fifth, Code of the Krillitanes, was published in March 2010. These four books are also part of the Quick Reads Initiative.

Novella

A novella is a text of written, fictional, narrative prose normally longer than a short story but shorter than a novel, somewhere between 17,500 and 40,000 words.

The English word "novella" derives from the Italian novella, feminine of novello, which means "new". The novella is a common literary genre in several European languages.

Official Tournament and Club Word List

Official Tournament and Club Word List or Tournament Word List, referred to as OTCWL, OWL, or TWL, is the official word authority for tournament Scrabble in the USA, Canada and Thailand. It is based on the Official Scrabble Players Dictionary (OSPD) with modifications to make it more suitable for tournament play. Its British counterpart is Collins Scrabble Words, and the combination of the two word lists is known as SOWPODS.

Paul Harland Prize

The Paul Harland Prize is the oldest annual award for original Dutch short science fiction, fantasy or horror stories. It is named after Dutch science fiction author Paul Harland, who died in 2003.

This award is for short stories and novelettes with a word count of up to 10,000 words.

Pocket PC 2002

Pocket PC 2002, originally codenamed "Merlin", was released in October 2001. Like Pocket PC 2000, it was powered by Windows CE 3.0. Although targeted mainly at 240×320 (QVGA) Pocket PC devices, Pocket PC 2002 was also used for Pocket PC phones and, for the first time, Smartphones. These Pocket PC 2002 Smartphones were mainly GSM devices. With future releases, the Pocket PC and Smartphone lines would increasingly converge as the licensing terms were relaxed, allowing OEMs to take advantage of more innovative, individual design ideas. Aesthetically, Pocket PC 2002 was meant to be similar in design to the then newly released Windows XP. Newly added or updated programs include Windows Media Player 8 with streaming capability, MSN Messenger, and Microsoft Reader 2 with digital rights management support. Upgrades to the bundled version of Office Mobile include a spell checker and word count tool in Pocket Word and an improved Pocket Outlook. Connectivity was improved with file beaming to non-Microsoft devices such as Palm OS devices, the inclusion of Terminal Services and Virtual Private Networking support, and the ability to synchronize folders. Other upgrades include an enhanced UI with theme support, and savable downloads and WAP support in Pocket Internet Explorer.

Revenge of the Judoon

Revenge of the Judoon is a BBC Books original novella written by Terrance Dicks and based on the long-running British science fiction television series Doctor Who. It features the Tenth Doctor and his companion Martha Jones. This paperback is part of the Quick Reads Initiative sponsored by the UK government, to encourage literacy. It has a similar look to BBC Books' other new series adventures, except for its much shorter word count, being a paperback and not being numbered as part of the same series. To date it is one of only five novels based upon the revived series that have not been published in hardcover: the first, I am a Dalek, was published in May 2006; the second, Made of Steel, was published in March 2007; the fourth, The Sontaran Games, was published in February 2009; and the fifth, Code of the Krillitanes, was published in March 2010. These four books are also part of the Quick Reads Initiative.

The book sees the return of the Judoon, who were first seen in the third series episode "Smith and Jones", and again in "The Stolen Earth". The presence of Martha Jones as the Doctor's companion indicates it takes place prior to the trilogy of episodes that ended the third season of the revived Doctor Who the previous year, in which Martha Jones departed as the Doctor's companion.

Tahir Hemphill

Tahir Hemphill (born May 14, 1972) is an African-American multimedia artist, scientist and designer of the Hip Hop Word Count database.

The Sontaran Games

The Sontaran Games is a BBC Books original novella written by Jacqueline Rayner and based on the long-running British science fiction television series Doctor Who. It features the Tenth Doctor as played by David Tennant. This paperback is part of the Quick Reads Initiative sponsored by the UK government, to encourage literacy. It has a similar look to BBC Books' other new series adventures, except for its much shorter word count, being a paperback and not being numbered as part of the same series. To date it is one of only five novels based upon the revived series that have not been published in hardcover.

Wc (Unix)

wc (short for word count) is a command in Unix and Unix-like operating systems.

The program reads either standard input or a list of files and generates one or more of the following statistics: newline count, word count, and byte count. If a list of files is provided, both individual file and total statistics follow.
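The three statistics can be approximated in a few lines. This is a simplified sketch, not wc's actual implementation: it assumes words are maximal runs of non-whitespace bytes (close to wc's behavior in the C locale), uses its own column widths, and the names `wc_counts` and `main` are hypothetical. As with wc, a total line is printed only when more than one file is given.

```python
import sys


def wc_counts(data: bytes) -> tuple[int, int, int]:
    """Return (newlines, words, bytes) for a block of input."""
    # bytes.split() with no argument splits on runs of ASCII whitespace.
    return data.count(b"\n"), len(data.split()), len(data)


def main(paths: list[str]) -> None:
    """Print counts for each file, or for standard input if no paths are given."""
    if not paths:
        lines, words, size = wc_counts(sys.stdin.buffer.read())
        print(f"{lines:7} {words:7} {size:7}")
        return
    totals = [0, 0, 0]
    for path in paths:
        with open(path, "rb") as handle:
            counts = wc_counts(handle.read())
        totals = [t + c for t, c in zip(totals, counts)]
        print(f"{counts[0]:7} {counts[1]:7} {counts[2]:7} {path}")
    if len(paths) > 1:
        print(f"{totals[0]:7} {totals[1]:7} {totals[2]:7} total")
```

A script would call `main(sys.argv[1:])` from its entry point; reading whole files into memory keeps the sketch short but is less efficient than wc's streaming approach for very large inputs.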

This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.