reCAPTCHA is a CAPTCHA-like system designed to establish that a computer user is human (normally in order to protect websites from bots) and, at the same time, assist in the digitization of books. reCAPTCHA was originally developed by Luis von Ahn, David Abraham, Manuel Blum, Michael Crawford, Ben Maurer, Colin McMillen, and Edison Tan at Carnegie Mellon University's main Pittsburgh campus.[1] It was acquired by Google in September 2009.[2]

As of 2011, reCAPTCHA had completely digitized the archives of The New York Times and books from Google Books.[3] The archive can be searched from the New York Times Article Archive, where more than 13 million articles dating from 1851 to the present day have been archived.[4] As of 2015, reCAPTCHA was using mass collaboration to help digitize books that are too illegible for computers to scan, as well as to translate books into different languages.[5]

The system has been reported to display over 100 million CAPTCHAs every day,[6] on sites such as Facebook, Ticketmaster, Twitter, 4chan, StumbleUpon,[7] Craigslist (since June 2008),[8] and the website of the U.S. National Telecommunications and Information Administration's digital TV converter box coupon program (part of the US DTV transition).[9]

reCAPTCHA's slogan was "Stop Spam, Read Books."[10] After a new version of the reCAPTCHA plugin was introduced in 2014, the slogan became "Easy on Humans, Hard on Bots."[11] The new version also introduced image verification. In this system, users are asked simply to click a checkbox, and the system checks whether the user is human using clues such as already-known cookies or mouse movements within the reCAPTCHA frame; if that check fails, the user must select one or more images from a set.[12] In 2018, Google began beta testing a completely invisible reCAPTCHA system that presents no visible human verification. Instead, it monitors user actions across the entire site and returns a score indicating how likely the user is to be a human rather than a bot.[13]
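The score-based check described above leaves the final decision to the website. The sketch below assumes the server has already received and parsed a verification response carrying the fields `success`, `score` (values near 1.0 suggesting a human, values near 0.0 a bot) and, optionally, `action`. The function name `looks_human` and the default threshold of 0.5 are illustrative assumptions, not values taken from this article.

```python
from typing import Any, Mapping, Optional

DEFAULT_THRESHOLD = 0.5  # assumed starting point; each site would tune its own


def looks_human(result: Mapping[str, Any],
                threshold: float = DEFAULT_THRESHOLD,
                expected_action: Optional[str] = None) -> bool:
    """Return True if a parsed score-based verification result passes the checks."""
    if not result.get("success", False):
        return False  # verification failed, e.g. an invalid or expired token
    if expected_action is not None and result.get("action") != expected_action:
        return False  # token was issued for a different action on the site
    score = result.get("score")
    if not isinstance(score, (int, float)):
        return False  # no usable score in the response
    return score >= threshold  # near 1.0: likely human; near 0.0: likely bot
```

A site might, for example, let high-scoring requests through and send low-scoring ones to an extra verification step rather than rejecting them outright; where to draw that line is a per-site choice.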

Initial release: May 27, 2007
Type: CAPTCHA (classic version); checkbox (new version)


Distributed Proofreaders was the first project to use volunteers' time to decipher scanned text that OCR could not read. It works with Project Gutenberg to digitize public domain material and uses methods quite different from reCAPTCHA's.

The reCAPTCHA program originated with Guatemalan computer scientist Luis von Ahn,[14] and was aided by a MacArthur Fellowship. An early CAPTCHA developer, he realized "he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles".[15][16]


An example of how a reCAPTCHA challenge looked in 2007,[17] containing the words "following finding". The waviness and horizontal stroke were added to increase the difficulty of breaking the CAPTCHA with a computer program.

Scanned text is analyzed by two different optical character recognition (OCR) programs. Their outputs are aligned with each other using standard string-matching algorithms and compared both to each other and to an English dictionary. Any word that the two OCR programs decipher differently, or that is not in the English dictionary, is marked as "suspicious" and converted into a CAPTCHA. The suspicious word is displayed out of context, sometimes alongside a control word whose answer is already known. If the human types the control word correctly, the response to the questionable word is accepted as probably valid. However, if enough users typed the control word correctly but mistyped the second word, which OCR had failed to recognize, the digital version of a document could end up containing the incorrect word.

The identification made by each OCR program is given a value of 0.5 points, and each human interpretation is given a full point. Once a given identification reaches 2.5 points, the word is considered valid. Words that human judges consistently give a single identity are later recycled as control words.[18] If the first three guesses match each other but do not match either OCR reading, they are considered a correct answer, and the word becomes a control word.[19] When six users reject a word before any correct spelling is chosen, the word is discarded as unreadable.[19]
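The scoring rules above can be expressed as a small vote tally. The following is a minimal sketch, not reCAPTCHA's actual code: the class name `SuspiciousWord`, the case-insensitive comparison, and modelling a user's rejection as an answer of `None` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Optional

OCR_WEIGHT = 0.5    # each OCR program's reading is worth half a point
HUMAN_WEIGHT = 1.0  # each human answer is worth a full point
ACCEPT_AT = 2.5     # a reading is accepted once its total reaches 2.5 points
REJECT_LIMIT = 6    # six rejections before any acceptance discard the word


def normalize(text: str) -> str:
    # Assumption: answers are compared case-insensitively, ignoring outer spaces.
    return text.strip().lower()


class SuspiciousWord:
    """Hypothetical vote tally for one word the OCR programs could not settle."""

    def __init__(self, ocr_a: str, ocr_b: str) -> None:
        self.scores: DefaultDict[str, float] = defaultdict(float)
        for reading in (ocr_a, ocr_b):
            self.scores[normalize(reading)] += OCR_WEIGHT
        self.rejections = 0
        self.status = "pending"  # becomes "accepted" or "discarded"
        self.result: Optional[str] = None

    def add_answer(self, answer: Optional[str]) -> str:
        """Record one human response; None models a user rejecting the word."""
        if self.status != "pending":
            return self.status  # decision already made; later votes are ignored
        if answer is None:
            self.rejections += 1
            if self.rejections >= REJECT_LIMIT:
                self.status = "discarded"
            return self.status
        key = normalize(answer)
        self.scores[key] += HUMAN_WEIGHT
        if self.scores[key] >= ACCEPT_AT:
            self.status = "accepted"
            self.result = key
        return self.status
```

For example, if the two OCR programs read a word as "fife" and "file", two human answers of "fife" bring that reading to 0.5 + 1.0 + 1.0 = 2.5 points and it is accepted. Under these weights, three identical human answers that match neither OCR reading total 3.0 points, so the "first three guesses agree" case also clears the 2.5-point threshold in this sketch.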

The original reCAPTCHA method was designed to show the questionable words in isolation, as out-of-context corrections, rather than in use, for example within a five-word phrase from the original document.[20] The control word could also supply misleading context for the second word: a request of "/metal/ /fife/" might be entered as "metal file", because filing with a metal tool is a more common association than the musical instrument "fife".

In 2012, reCAPTCHA began using photographs of house numbers taken from Google's Street View project, in addition to scanned words.[21]

Image identification CAPTCHA

In 2014, reCAPTCHA implemented another system in which users are asked to select one or more images from a selection of nine images.[12]

In 2017, reCAPTCHA was improved to require no interaction for most users.[22]



In 2014, reCAPTCHA began using behavioral analysis of the browser's interactions with the CAPTCHA to predict whether the user was a human or a bot before displaying a challenge, and to present a "considerably more difficult" CAPTCHA when it had reason to suspect a bot. By the end of 2014, this mechanism had begun rolling out to most public Google services.[23] Because No CAPTCHA reCAPTCHA relies on Google cookies that are at least a few weeks old, it has become nearly impossible to complete for people who frequently clear their cookies.

In 2017, Google improved this mechanism, calling it an "invisible reCAPTCHA". According to former Google "click fraud czar" Shuman Ghosemajumder, this capability "creates a new sort of challenge that very advanced bots can still get around, but introduces a lot less friction to the legitimate human."[24]


The reCAPTCHA tests are displayed from the central site of the reCAPTCHA project, which supplies the words to be deciphered. This is done through a JavaScript API with the server making a callback to reCAPTCHA after the request has been submitted. The reCAPTCHA project provides libraries for various programming languages and applications to make this process easier. reCAPTCHA is a free-of-charge service provided to websites for assistance with the decipherment,[25] but the reCAPTCHA software is not open-source.[26]
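The server-side half of that exchange can be sketched with Python's standard library. This assumes the current v2/v3 verification endpoint (`https://www.google.com/recaptcha/api/siteverify`) and its `secret`, `response`, and optional `remoteip` form fields, which differ from the interface the original 2007 service used; the helper names `build_verify_body` and `verify_token` are hypothetical. The `token` argument is the value the widget submits with the user's form, typically in the `g-recaptcha-response` field.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

# Assumption: the current v2/v3 endpoint, not the one the original service used.
SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_body(secret: str, token: str, remote_ip: Optional[str] = None) -> bytes:
    """Form-encode the fields the server sends when it checks a user's token."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional: the end user's IP address
    return urllib.parse.urlencode(fields).encode("ascii")


def verify_token(secret: str, token: str, remote_ip: Optional[str] = None,
                 timeout: float = 10.0) -> Dict[str, Any]:
    """POST the token to the verification endpoint and return the parsed JSON reply."""
    request = urllib.request.Request(
        SITEVERIFY_URL,
        data=build_verify_body(secret, token, remote_ip),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping the request body in its own function makes it testable without network access; `verify_token` returns the parsed JSON reply, whose `success` field the caller should check before trusting the form submission.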

Also, reCAPTCHA offers plugins for several web-application platforms including ASP.NET, Ruby, and PHP, to ease the implementation of the service.[27]


Some have criticized Google for using reCAPTCHA as a source of unpaid labor.[28] Critics say Google unfairly uses people around the world, without any compensation, to help it transcribe books, addresses, and newspapers and to label image data for its driverless car effort.[29] In response to this criticism, competitors have emerged that promise to compensate the website host or end user for the work they do.[30]

The use of reCAPTCHA has been labelled "a serious barrier to internet use" for people with sight problems or disabilities such as dyslexia by BBC journalist Stephanie Hegarty.[31]

reCAPTCHA is also a barrier to Internet use in parts of the world where Internet censorship is heavy and the sites that reCAPTCHA depends on are blocked.

Software engineer Andrew Munsell, in his article "Captchas Are Becoming Ridiculous", writes: "A couple of years ago, I don’t remember being truly baffled by a captcha. In fact, reCAPTCHA was one of the better systems I’d seen. It wasn’t difficult to solve, and it seemed to work when I used it on my own websites."[32] After encountering a series of unintelligible images that kept appearing despite refreshing, Munsell goes on: "Again, and again, and again. The captchas were not only difficult for a computer to read, but impossible for a human." He then provides numerous examples.


An example of how reCAPTCHA challenges were presented in 2010,[33] containing the words "and chisels"

The main purpose of a CAPTCHA system is to prevent automated access to a system by computer programs or "bots". On December 14, 2009, Jonathan Wilkins released a paper describing weaknesses in reCAPTCHA that allowed a solve rate of 18%.[34][35][36]

On August 1, 2010, Chad Houck gave a presentation at the DEF CON 18 Hacking Conference detailing a method to reverse the distortion added to images, which allowed a computer program to determine a valid response 10% of the time.[37][38] The reCAPTCHA system was modified on July 21, 2010, before Houck was due to speak. Houck adapted his method to the modified CAPTCHA, which he described as "easier", and determined a valid response 31.8% of the time. Houck also described security defenses in the system, including a high-security lockout if an invalid response is given 32 times in a row.[39]
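A consecutive-failure lockout of the kind Houck reported can be sketched as a per-client counter that resets on a correct answer. reCAPTCHA's real lockout logic is not public; the class `FailureTracker`, the client identifier, and the lack of any expiry are assumptions for illustration, with only the limit of 32 taken from the text above.

```python
from collections import defaultdict
from typing import DefaultDict

LOCKOUT_AFTER = 32  # limit reported by Houck; the real mechanism is not public


class FailureTracker:
    """Hypothetical per-client counter of consecutive invalid responses."""

    def __init__(self, limit: int = LOCKOUT_AFTER) -> None:
        self.limit = limit
        self.streak: DefaultDict[str, int] = defaultdict(int)

    def is_locked(self, client_id: str) -> bool:
        return self.streak[client_id] >= self.limit

    def record(self, client_id: str, valid: bool) -> bool:
        """Record one response; return True if the client is now locked out."""
        if self.is_locked(client_id):
            return True  # stays locked; this sketch has no expiry
        if valid:
            self.streak[client_id] = 0  # a correct answer breaks the streak
        else:
            self.streak[client_id] += 1
        return self.is_locked(client_id)
```

Counting consecutive rather than total failures means an occasional typo by a real user never accumulates toward the limit, while a program guessing blindly reaches it quickly.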

On May 26, 2012, Adam, C-P and Jeffball of DC949 gave a presentation at the LayerOne hacker conference detailing how they had achieved an automated solution with an accuracy rate of 99.1%.[40] Their tactic was to apply techniques from machine learning, a subfield of artificial intelligence, to the audio version of reCAPTCHA, which is provided for the visually impaired. Google released a new version of reCAPTCHA just hours before their talk, making major changes to both the audio and visual versions of the service. In this release, the audio version was lengthened from 8 seconds to 30 seconds and became much more difficult to understand, for humans and bots alike. In response to this update and the following one, DC949 released two more versions of their solver, Stiltwalker, which beat reCAPTCHA with accuracies of 60.95% and 59.4% respectively. After each successive break, Google updated reCAPTCHA within a few days. According to DC949, Google often reverted to features that had already been broken.

On June 27, 2012, Claudia Cruz, Fernando Uceda, and Leobardo Reyes (a group of students from Mexico) published a paper showing a system that solved reCAPTCHA images with an accuracy of 82%.[41] The authors have not said whether their system can solve recent reCAPTCHA images, although they describe their work as intelligent OCR that is robust to some, if not all, changes in the image database.

In an August 2012 presentation given at BsidesLV 2012, DC949 called the latest version "unfathomably impossible for humans" – they were not able to solve them manually either.[40] The web accessibility organization WebAIM reported in May 2012, "Over 90% of respondents [screen reader users] find CAPTCHA to be very or somewhat difficult."[42]

Because reCAPTCHA modifies its system frequently, spammers must repeatedly update their decoding methods, which may frustrate potential abusers.

Only words that both OCR programs failed to recognize are used as control words. Thus, any program that can recognize these words with non-negligible probability would represent an improvement over state-of-the-art OCR programs.[19]

Derivative projects

The reCAPTCHA project also created Mailhide, which protected email addresses on web pages from being harvested by spammers.[43] By default, an email address was converted into a format that did not allow a crawler to see the full address; for example, "" would have been converted to "". The visitor would then click on the "..." and solve the CAPTCHA to obtain the full email address. The pop-up code could also be edited so that none of the address was visible. Mailhide was discontinued in 2018 because it relied on reCAPTCHA v1.[44]

Automated solvers

Because reCAPTCHA is difficult for users with disabilities and other users alike, automated solvers such as Buster have been created to solve the challenge on the user's behalf. Buster solves the audio version of reCAPTCHA instead of selecting visual elements, and can be installed as a browser add-on.


  1. ^ "reCAPTCHA: About Us". Archived from the original on 2010-06-11. Retrieved 2018-08-14.
  2. ^ "Teaching computers to read: Google acquires reCAPTCHA". Retrieved 2009-09-16.
  3. ^ "Deciphering Old Texts, One Woozy, Curvy Word at a Time". The New York Times. March 28, 2011. Retrieved November 20, 2017.
  4. ^ "New York Times Article Archive". The New York Times. September 25, 2007. ISSN 0362-4331. Retrieved 2017-11-21.
  5. ^ "Massive-scale online collaboration". Retrieved 2015-10-24.
  6. ^ "reCAPTCHA FAQ". Retrieved 2011-06-12.
  7. ^ Rubens, Paul (October 2, 2007). "Spam weapon helps preserve books". BBC.
  8. ^ "Fight Spam, Digitize Books". Craigslist Blog. June 2008.
  9. ^ "TV Converter Box Program". Archived from the original on November 4, 2009.
  10. ^ "reCAPTCHA: Stop Spam, Read Books". Retrieved 2013-07-10.
  11. ^ "reCAPTCHA: Easy on Humans, Hard on Bots". Retrieved 2018-02-01.
  12. ^ a b Greenberg, Andy (December 3, 2014). "Google Can Now Tell You're Not a Robot with Just One Click". Wired. Retrieved October 1, 2015.
  13. ^ "Google no Captcha + INVISIBLE reCaptcha – First Experience Results Review". Retrieved 2019-01-11.
  14. ^ "Full Interview: Luis von Ahn on Duolingo". Spark. Canadian Broadcasting Corporation. November 30, 2011. Retrieved 2013-07-10.
  15. ^ Hutchinson, Alex (March 2009). "Human Resources: The job you didn't even know you had". The Walrus. pp. 15–16.
  16. ^ Hutchinson, Alex (2009-03-12). "Human Resources: The job you didn't even know you had". The Walrus. Retrieved December 7, 2015.
  17. ^ "reCAPTCHA: Using Captchas To Digitize Books". TechCrunch.
  18. ^ Timmer, John (August 14, 2008). "CAPTCHAs work? for digitizing old, damaged texts, manuscripts". Ars Technica. Retrieved 2008-12-09.
  19. ^ a b c von Ahn, Luis; Maurer, Ben; McMillen, Colin; Abraham, David; Blum, Manuel (2008). "reCAPTCHA: Human-Based Character Recognition via Web Security Measures" (PDF). Science. 321 (5895): 1465–1468. doi:10.1126/science.1160379. PMID 18703711.
  20. ^ "questionable validity of results if words are presented out of context". Google Groups. August 29, 2008. Retrieved 2013-07-10.
  21. ^ "Google Now Using ReCAPTCHA To Decode Street View Addresses". TechCrunch. March 29, 2012. Retrieved 2013-07-10.
  22. ^ "Digital Certification: The Digital Rating For Websites". Digital Certification | Blog. March 14, 2017. Retrieved 2017-03-14.
  23. ^ "Are you a robot? Introducing "No CAPTCHA reCAPTCHA"". December 3, 2014. Retrieved 2015-04-14.
  24. ^ "Google just made the internet a tiny bit less annoying". Popular Science. March 10, 2017. Retrieved 2017-04-05.
  25. ^ "FAQ". Archived from the original on July 16, 2012.
  26. ^ "reCAPTCHA: Stop Spam, Read Books". Retrieved 2014-01-14.
  27. ^ "Developer's Guide – reCAPTCHA — Google Developers". Retrieved 2014-01-14.
  28. ^ "Massachusetts woman's lawsuit accuses Google of using free labor to transcribe books, newspapers". Boston Business Journal.
  29. ^ "Google is Using You". Hacker Noon. 2018-08-20.
  30. ^ "hCaptcha". hCaptcha.
  31. ^ Hegarty, Stephanie (2012-06-20). "BBC News – The evolution of those annoying online security tests". BBC News. Retrieved 2014-09-22.
  32. ^ "Captchas Are Becoming Ridiculous | Andrew Munsell". Retrieved 2014-09-22.
  33. ^ The Firewall. "Those Scrambled Word Tests For Stopping Spambots Are Tough For Humans Too".
  34. ^ "Strong CAPTCHA Guidelines" (PDF).
  35. ^ "Google's reCAPTCHA busted by new attack".
  36. ^ "Google's reCAPTCHA dented".
  37. ^ "Def Con 18 Speakers".
  38. ^ "Decoding reCAPTCHA Paper". Chad Houck. Archived from the original on August 19, 2010.
  39. ^ "Decoding reCAPTCHA Power Point". Chad Houck. Archived from the original on October 24, 2010.
  40. ^ a b "Project Stiltwalker".
  41. ^ Claudia Cruz-Perez; Oleg Starostenko; Fernando Uceda-Ponga; Vicente Alarcon-Aquino; Leobardo Reyes-Cabrera (June 30, 2012). "Breaking reCAPTCHAs with Unpredictable Collapse: Heuristic Character Segmentation and Recognition". In Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Francisco; Olvera López, José Arturo; Boyer, Kim L. Pattern Recognition. Lecture Notes in Computer Science. 7329. México. pp. 155–165. doi:10.1007/978-3-642-31149-9_16. ISBN 978-3-642-31148-2.
  42. ^ "Screen Reader User Survey #4 Results".
  43. ^ "Mailhide: Free Spam Protection".
  44. ^ "Mailhide: Service discontinued".


Adobe Muse

Adobe Muse is a website builder that allows designers to create fixed, fluid, and adaptive websites without having to write any code. Muse generates static websites giving users the freedom to host their sites with any hosting provider. Users can add more advanced functionality such as blogging and eCommerce to their website with plugins created by third-party developers. This application is available through Adobe's Creative Cloud subscription. Muse will be discontinued as of March 2020, with the last feature improvements having appeared in March 2018.

Android Q

Android "Q" is the upcoming tenth major release and the 17th version of the Android mobile operating system. The first beta of Android Q was released on March 13, 2019 for all Google Pixel phones. The final release is scheduled for the third quarter of 2019.


BigQuery

BigQuery is a RESTful web service that enables interactive analysis of massively large datasets working in conjunction with Google Storage. It is a serverless Platform as a Service (PaaS) that may be used alongside MapReduce.


CAPTCHA

A CAPTCHA (an acronym for "completely automated public Turing test to tell computers and humans apart") is a type of challenge–response test used in computing to determine whether or not the user is human. The term was coined in 2003 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford. The most common type of CAPTCHA was first invented in 1997 by two groups working in parallel. This form of CAPTCHA requires the user to type the letters of a distorted image, sometimes with the addition of an obscured sequence of letters or digits that appears on the screen. Because the test is administered by a computer, in contrast to the standard Turing test, which is administered by a human, a CAPTCHA is sometimes described as a reverse Turing test.

This user identification procedure has been widely criticized, especially by people with disabilities, but also by others who feel that distorted words that are difficult to read slow down their everyday work. It takes the average person approximately 10 seconds to solve a typical CAPTCHA.


Chromebit

The Chromebit is a dongle running Google's Chrome OS operating system. When placed in the HDMI port of a television or a monitor, this device turns that display into a personal computer. Chromebit allows adding a keyboard or mouse over Bluetooth or Wi-Fi. The device was announced in April 2015 and began shipping that November.

G Suite Marketplace

G Suite Marketplace (formerly Google Apps Marketplace) is a product of Google Inc. It is an online store for web applications that work with Google Apps (Gmail, Google Docs, Google Sites, Google Calendar, Google Contacts, etc.) and with third party software. Some Apps are free. Apps are based on Google APIs or on Google Apps Script.

Google Behind the Screen

"Google: Behind the Screen" (Dutch: "Google: achter het scherm") is a 51-minute episode of the documentary television series Backlight about Google. The episode was first broadcast on 7 May 2006 by VPRO on Nederland 3. It was directed by IJsbrand van Veelen, produced by Nicoline Tania, and edited by Doke Romeijn and Frank Wiering.

Google Dataset Search

Google Dataset Search is a search engine from Google that helps researchers locate online data that is freely available for use. The company launched the service on September 5, 2018, and stated that the product was targeted at scientists and data journalists.

Google Dataset Search complements Google Scholar, the company's search engine for academic studies and reports.

Google Finance

Google Finance is a website focusing on business news and financial information hosted by Google.

Google Fit

Google Fit is a health-tracking platform developed by Google for the Android operating system and Wear OS. It is a single set of APIs that blends data from multiple apps and devices. Google Fit uses sensors in a user's activity tracker or mobile device to record physical fitness activities (such as walking or cycling), which are measured against the user's fitness goals to provide a comprehensive view of their fitness.

Google Forms

Google Forms is a survey administration app that is included in the Google Drive office suite along with Google Docs, Google Sheets, and Google Slides.

Forms features all of the collaboration and sharing features found in Docs, Sheets, and Slides.

Google Guice

Google Guice (pronounced "juice") is an open-source software framework for the Java platform released by Google under the Apache License. It provides support for dependency injection using annotations to configure Java objects. Dependency injection is a design pattern whose core principle is to separate behavior from dependency resolution.

Guice allows implementation classes to be bound programmatically to an interface, then injected into constructors, methods or fields using an @Inject annotation. When more than one implementation of the same interface is needed, the user can create custom annotations that identify an implementation, then use that annotation when injecting it.

In 2008, as the first generic framework for dependency injection using Java annotations, Guice won the 18th Jolt Award for best Library, Framework, or Component.

Google The Thinking Factory

Google: The Thinking Factory is a 2008 documentary film about Google Inc., written and directed by Gilles Cayatte.

Human presence detection

Human presence detection is a range of technologies and methods for detecting the presence of a human body in an area of interest (AOI), or for verifying that a computer, smartphone, or other software-controlled device is operated by a human.

Both software and hardware technologies are used for human presence detection. Unlike human sensing, which deals only with the human body, human presence detection technologies are used to verify, for safety, security, or other reasons, that a human person rather than some other object has been identified. These methods can be used for Internet security authentication.

Software technologies:

reCAPTCHA

Hardware technologies:

Radar technology

Image recognition of human shapes

Security switch

Fingerprint sensors

Infrared detectors

Acoustic sensors

Vibration sensors

Luis von Ahn

Luis von Ahn (Spanish: [ˈlwis fon ˈan]; born 19 August 1978) is a Guatemalan entrepreneur and a Consulting Professor in the Computer Science Department at Carnegie Mellon University in Pittsburgh, Pennsylvania. He is known as one of the pioneers of crowdsourcing. He is the founder of the company reCAPTCHA, which was sold to Google in 2009, and the co-founder and CEO of Duolingo, a popular language-learning platform.


Omegle

Omegle is a free online chat website that allows users to socialize with others without the need to register. The service randomly pairs users in one-on-one chat sessions where they chat anonymously using the names "You" and "Stranger", or "Stranger 1" and "Stranger 2" in Spy mode. The site was created by 18-year-old Leif K-Brooks of Brattleboro, Vermont, and was launched on March 25, 2009. Less than a month after launch, Omegle was receiving around 150,000 page views a day, and in March 2010 the site introduced a video conferencing feature. The site now provides a mobile application that lets users chat with strangers from mobile devices.

Comparisons have been made to early-1990s AOL. Similar services include Tinychat and Whisper.

Page Hunt

Page Hunt is a game developed by Bing for investigating human research behavior. It is a so-called "game with a purpose", as it pursues additional goals: not only to provide entertainment but also to harness human computation for some specific research task.

The term "games with a purpose" was coined by Luis von Ahn, inventor of CAPTCHA, co-organizer of the reCAPTCHA project, and inventor of the well-known ESP game.

Project Sunroof

Project Sunroof is a solar power initiative started by Google engineer Carl Elkin. The initiative's stated purpose is "mapping the planet's solar potential, one roof at a time."

Rajen Sheth

Rajen Sheth is an executive at Google, where he runs product management for the cloud AI and machine learning team. He pitched the idea of an enterprise version of Google's email service Gmail to CEO Eric Schmidt in a 2004 meeting. Schmidt initially rejected the proposal, arguing that the division should focus on web search, but the suggestion was later accepted. Sheth is known as the "father of Google Apps" and is responsible for the development of Chrome and Chrome OS for Business.


This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.