Googlebot

Googlebot is the web crawler software used by Google, which collects documents from the web to build a searchable index for the Google Search engine. The name actually refers to two different types of web crawlers: a desktop crawler (to simulate desktop users) and a mobile crawler (to simulate mobile users).[1]


Your website will probably be crawled by both Googlebot Desktop and Googlebot Mobile. You can identify the subtype of Googlebot by looking at the user agent string in the request. However, both crawler types obey the same product token (user agent token) in robots.txt, so you cannot selectively target either Googlebot Mobile or Googlebot Desktop using robots.txt.
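As a rough illustration of reading the subtype from the user agent string, the sketch below classifies a request as desktop or mobile. The function name googlebot_subtype is hypothetical, the two sample strings are illustrative examples, and the rule that the mobile crawler's string contains a "Mobile" token is an assumption, not a documented guarantee.

```python
from typing import Optional


def googlebot_subtype(user_agent: str) -> Optional[str]:
    """Return "mobile", "desktop", or None if the string does not mention Googlebot."""
    ua = user_agent.lower()
    if "googlebot" not in ua:
        return None
    # Assumption: the mobile crawler presents itself as a mobile browser,
    # so its user agent string carries a "Mobile" token.
    return "mobile" if "mobile" in ua else "desktop"


# Illustrative user agent strings (assumed examples, not an exhaustive list).
desktop_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
mobile_ua = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

print(googlebot_subtype(desktop_ua))
print(googlebot_subtype(mobile_ua))
```

A user agent string can be forged by any client, so a check like this only describes what the request claims to be.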

If a webmaster wishes to restrict the information on their site available to Googlebot, or another well-behaved spider, they can do so with the appropriate directives in a robots.txt file,[2] or by adding the meta tag <meta name="Googlebot" content="nofollow" /> to the web page.[3] Googlebot requests to web servers are identifiable by a user-agent string containing "Googlebot" and a host address containing "googlebot.com".[4]
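The two mechanisms in the paragraph above can be sketched with Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs, and the helper looks_like_googlebot are assumptions made for illustration; the hostname is expected to come from a reverse DNS lookup that the caller performs separately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def looks_like_googlebot(user_agent: str, hostname: str) -> bool:
    """Check the two signals named above: the user-agent token and the host address.

    Assumption: hostname is the reverse DNS name of the client IP, resolved by the caller.
    """
    return "Googlebot" in user_agent and hostname.lower().endswith(".googlebot.com")


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("Googlebot", "https://example.com/private/report.html"))
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))
```

Matching on the hostname suffix rather than a plain substring avoids accepting a name such as googlebot.com.example.net.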

Currently, Googlebot follows HREF links and SRC links.[2] There is increasing evidence that Googlebot can also execute JavaScript and parse content generated by Ajax calls.[5][6] There are many theories about how advanced Googlebot's ability to process JavaScript is, with some holding that it has only minimal ability derived from custom interpreters.[7][8][9] Currently, Googlebot uses a web rendering service (WRS) that is based on Chrome 41 (M41).[10] Googlebot discovers pages by harvesting all the links on every page it finds and then following those links to other web pages. New web pages must either be linked to from other known pages on the web or be manually submitted by the webmaster in order to be crawled and indexed.
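The link-harvesting step described above can be illustrated with a small sketch that collects HREF and SRC targets from a page. This is not Googlebot's implementation; the names LinkCollector and extract_links and the example.com URLs are hypothetical, and only Python's standard html.parser and urllib.parse modules are used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the targets of href and src attributes, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Relative links are resolved against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every HREF and SRC link found in the HTML, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


page = '<a href="/about">About</a><img src="logo.png"><a href="https://example.org/">Ext</a>'
for link in extract_links(page, "https://example.com/docs/"):
    print(link)
```

A crawler would add each extracted link to a queue of pages to visit, skipping URLs it has already seen.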

A problem that webmasters have often noted with Googlebot is that it takes up an enormous amount of bandwidth. This can cause websites to exceed their bandwidth limit and be taken down temporarily. This is especially troublesome for mirror sites, which host many gigabytes of data. Google provides Search Console, which allows website owners to throttle the crawl rate.[11]

How often Googlebot will crawl a site depends on the crawl budget. Crawl budget is an estimation of how often a website is updated. A site's crawl budget is determined by how many incoming links it has and how frequently the site is updated.

Technically, Googlebot's development team (the Crawling and Indexing team) uses several defined terms internally to cover what "crawl budget" stands for.[12]

Original author(s): Google
Type: Web crawler
Website: Googlebot FAQ

References

  1. ^ "Googlebot". Google. 2019-03-11. Retrieved 2019-03-11.
  2. ^ a b "Google Search Console". Google.com.
  3. ^ "Google Search Console". search.google.com. Retrieved 2019-03-11.
  4. ^ Exact Googlebot client info can be found in Google-cached copies of pages which display such data to visitors. For example, see [1]
  5. ^ "Googlebot makes POST requests via AJAX".
  6. ^ "Google, the Jig is Up! Googlebot is actually a browser..."
  7. ^ "Googlebot's Javascript Interpreter: A Diagnostic".
  8. ^ "Googlebot is Chrome".
  9. ^ "How Googlebot crawls JavaScript".
  10. ^ "Understand rendering on Google Search | Search". Google Developers. Retrieved 2019-03-11.
  11. ^ "Google - Webmasters". Google.com. Retrieved 2012-12-15.
  12. ^ "What Crawl Budget Means for Googlebot". Official Google Webmaster Central Blog. Retrieved 2018-07-04.

Android Q

Android "Q" is the upcoming tenth major release and the 17th version of the Android mobile operating system. The first beta of Android Q was released on March 13, 2019 for all Google Pixel phones. The final release of Android Q is scheduled for the third quarter of 2019.

BigQuery

BigQuery is a RESTful web service that enables interactive analysis of massively large datasets working in conjunction with Google Storage. It is a serverless Platform as a Service (PaaS) that may be used complementarily with MapReduce.

Bingbot

Bingbot is a web-crawling robot (a type of internet bot), deployed by Microsoft in October 2010 to supply Bing. It collects documents from the web to build a searchable index for the Bing search engine. It performs the same function as Google's Googlebot.

A typical user agent string for Bingbot is "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)". This appears in the web server logs to tell the webmaster who is requesting a file. Each webmaster is able to use the included agent identifier, "bingbot", to disallow or allow access to their site (by default access is allowed). If they don't want to grant access they can use the Robots Exclusion Standard to block it (relying on the assumed good behaviour of bingbot), or use other server specific means (relying on the web server to do the blocking).
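To show how the agent identifier surfaces in web server logs, the sketch below counts requests whose user agent mentions "bingbot". It assumes the logs use the common "combined" format, in which the user agent is the last quoted field on each line; the function names and the sample log lines are hypothetical.

```python
import re

# Assumption: "combined" log format, where the user agent is the last quoted field.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def user_agent_of(log_line):
    """Return the user agent field of a log line, or None if it cannot be found."""
    match = USER_AGENT_FIELD.search(log_line)
    return match.group(1) if match else None


def count_bingbot_requests(log_lines):
    """Count log lines whose user agent contains the "bingbot" identifier."""
    return sum(
        1 for line in log_lines
        if "bingbot" in (user_agent_of(line) or "").lower()
    )


# Made-up sample lines using documentation IP ranges.
sample_log = [
    '203.0.113.7 - - [10/Oct/2019:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '198.51.100.4 - - [10/Oct/2019:13:56:01 +0000] "GET /about HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

print(count_bingbot_requests(sample_log))
```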

Chromebit

The Chromebit is a dongle running Google's Chrome OS operating system. When placed in the HDMI port of a television or a monitor, this device turns that display into a personal computer. Chromebit allows adding a keyboard or mouse over Bluetooth or Wi-Fi. The device was announced in April 2015 and began shipping that November.

Field v. Google, Inc.

Field v. Google, Inc., 412 F.Supp. 2d 1106 (D. Nev. 2006) is a case where Google Inc. successfully defended a lawsuit for copyright infringement. Field argued that Google infringed his exclusive right to reproduce his copyrighted works when it "cached" his website and made a copy of it available on its search engine. Google raised multiple defenses: fair use, implied license, estoppel, and Digital Millennium Copyright Act safe harbor protection. The court granted Google's motion for summary judgment and denied Field's motion for summary judgment.

G Suite Marketplace

G Suite Marketplace (formerly Google Apps Marketplace) is a product of Google Inc. It is an online store for web applications that work with Google Apps (Gmail, Google Docs, Google Sites, Google Calendar, Google Contacts, etc.) and with third party software. Some Apps are free. Apps are based on Google APIs or on Google Apps Script.

Google Behind the Screen

"Google: Behind the Screen" (Dutch: "Google: achter het scherm") is a 51-minute episode of the documentary television series Backlight about Google. The episode was first broadcast on 7 May 2006 by VPRO on Nederland 3. It was directed by IJsbrand van Veelen, produced by Nicoline Tania, and edited by Doke Romeijn and Frank Wiering.

Google Blog Search

Google Blog Search was a specialized Google service for searching blogs. It was discontinued in May 2011. Released in 2005, Blog Search was "the first major search engine to offer full-blown blog and feed search capabilities". Its bots appeared to be faster than the standard Googlebot, because updates to blogs often became available within hours instead of the weeks taken by Googlebot by default. Searches worked the same way as in Google Search: users typed their search terms into the search field and saw the most relevant results for the topic. Blog Search covered various blogging services such as Blogger, LiveJournal, and Weblog. For some time it was possible to force Google to access and search the Blog Search database by manually formatting the URL in the browser's address bar, but in March 2016 Google removed this access as well.

Google Dataset Search

Google Dataset Search is a search engine from Google that helps researchers locate online data that is freely available for use. The company launched the service on September 5, 2018, and stated that the product was targeted at scientists and data journalists.

Google Dataset Search complements Google Scholar, the company's search engine for academic studies and reports.

Google Finance

Google Finance is a website focusing on business news and financial information hosted by Google.

Google Fit

Google Fit is a health-tracking platform developed by Google for the Android operating system and Wear OS. It is a single set of APIs that blends data from multiple apps and devices. Google Fit uses sensors in a user's activity tracker or mobile device to record physical fitness activities (such as walking or cycling), which are measured against the user's fitness goals to provide a comprehensive view of their fitness.

Google Forms

Google Forms is a survey administration app that is included in the Google Drive office suite along with Google Docs, Google Sheets, and Google Slides.

Forms features all of the collaboration and sharing features found in Docs, Sheets, and Slides.

Google Guice

Google Guice (pronounced "juice") is an open-source software framework for the Java platform released by Google under the Apache License. It provides support for dependency injection using annotations to configure Java objects. Dependency injection is a design pattern whose core principle is to separate behavior from dependency resolution.

Guice allows implementation classes to be bound programmatically to an interface, then injected into constructors, methods or fields using an @Inject annotation. When more than one implementation of the same interface is needed, the user can create custom annotations that identify an implementation, then use that annotation when injecting it.

Being the first generic framework for dependency injection using Java annotations in 2008, Guice won the 18th Jolt Award for best Library, Framework, or Component.

Google Search Console

Google Search Console (previously Google Webmaster Tools) is a no-charge web service by Google for webmasters. It allows webmasters to check indexing status and optimize visibility of their websites.

As of May 20, 2015, Google rebranded Google Webmaster Tools as Google Search Console. In January 2018, Google introduced a new version of the Search Console, with a refreshed user interface and improvements.

Google The Thinking Factory

Google: The Thinking Factory is a 2008 documentary film about Google Inc., written and directed by Gilles Cayatte.

Noindex

The noindex value of an HTML robots meta tag requests that automated Internet bots avoid indexing a web page. Reasons why one might want to use this meta tag include advising robots not to index a very large database, webpages that are very transitory, pages that one wishes to keep slightly more private, or the printer and mobile-friendly versions of pages. Since the burden of honoring a website's noindex tag lies with the author of the search robot, sometimes these tags are ignored. Also the interpretation of the noindex tag is sometimes slightly different from one search engine company to the next.
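A well-behaved robot has to look for this value itself before indexing a page. The sketch below shows one way a crawler might do that with Python's standard html.parser module; the class RobotsMetaParser and the function requests_noindex are hypothetical names, and real search engines may interpret the tag differently, as noted above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives listed in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list such as "noindex, follow".
        for token in attributes.get("content", "").split(","):
            if token.strip():
                self.directives.add(token.strip().lower())


def requests_noindex(html):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(requests_noindex('<html><head><meta name="robots" content="noindex, follow"></head></html>'))
print(requests_noindex('<html><head><meta name="description" content="noindex"></head></html>'))
```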

Project Sunroof

Project Sunroof is a solar power initiative started by Google engineer Carl Elkin. The initiative's stated purpose is "mapping the planet's solar potential, one roof at a time."

Rajen Sheth

Rajen Sheth is an executive at Google, where he currently runs product management for the cloud AI and machine learning team. The idea of an enterprise version of Google's email service Gmail was pitched by Sheth in a meeting with CEO Eric Schmidt in 2004. Schmidt initially rejected the proposal, arguing that the division should focus on web search, but the suggestion was later accepted. Sheth is known as the "father of Google Apps", and is responsible for development of Chrome and Chrome OS for Business.

Robots exclusion standard

The robots exclusion standard, also known as the robots exclusion protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned. Robots are often used by search engines to categorize websites. Not all robots cooperate with the standard; email harvesters, spambots, malware and robots that scan for security vulnerabilities may even start with the portions of the website where they have been told to stay out. The standard is different from, but can be used in conjunction with, Sitemaps, a robot inclusion standard for websites.

This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.