WebCrawler

WebCrawler is a web search engine, and is the oldest surviving search engine on the web today. For many years, it operated as a metasearch engine. WebCrawler was the first web search engine to provide full text search.[2]

Type of site: Web search engine
Available in: English
Owner: System1
Created by: Brian Pinkerton
Website: www.webcrawler.com
Alexa rank: 2,601 (May 2019, increasing)[1]
Commercial: No
Registration: None
Launched: April 20, 1994
Current status: Active

History

[Image: screenshot of the WebCrawler homepage, September 1995]

Brian Pinkerton first started working on WebCrawler, which was originally a desktop application, on January 27, 1994 at the University of Washington.[3] On March 15, 1994, he generated a list of the top 25 websites.[2]

WebCrawler launched on April 20, 1994, with more than 4,000 different websites in its database.[3] On November 14, 1994, it served its 1 millionth search query,[3] which was for "nuclear weapons design and research".[4]

On December 1, 1994, WebCrawler acquired two sponsors, DealerNet and Starwave, which provided money to keep WebCrawler operating.[3] Starting on October 3, 1995, WebCrawler was fully supported by advertising, but separated the adverts from search results.[3]

On June 1, 1995, America Online (AOL) acquired WebCrawler.[3] Under AOL ownership, the website introduced its mascot "Spidey" on September 1, 1995.[3]

Starting in April 1996,[3] WebCrawler also included the human-edited internet guide GNN Select, which was also under AOL ownership.[5][6]

On April 1, 1997, Excite acquired WebCrawler from AOL for $12.3 million.[3][7]

WebCrawler received a facelift on June 16, 1997, adding WebCrawler Shortcuts, which suggested alternative links to material related to a search topic.[8]

WebCrawler was maintained by Excite as a separate search engine with its own database until 2001, when it started using Excite's own database, effectively putting an end to WebCrawler as an independent search engine.[9] Later that year, Excite (then called Excite@Home) went bankrupt and WebCrawler was bought by InfoSpace.[3]

[Image: WebCrawler's homepage, June 2010]

Pinkerton, WebCrawler's creator, led the Amazon A9.com search division as of 2012.[10][11]

In July 2016, Blucora announced the sale of its InfoSpace business to OpenMail for $45 million, putting WebCrawler under the ownership of OpenMail.[12] OpenMail was later renamed System1.[13]

In 2018, WebCrawler received another facelift and the logo of the search engine was changed.[14][15]

Traffic

WebCrawler was highly successful early on[16] and at one point, it was unusable during peak times due to server overload.[17] It was the second most visited website on the internet as of February 1996, but it quickly dropped below rival search engines and directories such as Yahoo!, Infoseek, Lycos, and Excite by 1997.[18]

References

  1. "Webcrawler.com Site Info". Alexa Internet. Retrieved 2019-05-14.
  2. "Short History of Early Search Engines – The History of SEO". www.thehistoryofseo.com. Retrieved 2019-02-03.
  3. "WebCrawler's History". www.thinkpink.com. Archived from the original on 2000-12-18. Retrieved 2019-01-09.
  4. Lammle, Rob (2012-03-16). "'90s Tech Icons: Where Are They Now?". Mashable. Archived from the original on 2012-03-17. Retrieved 2019-02-18.
  5. "Se-En". searchenginearchive.com. Retrieved 2019-01-25.
  6. "WebCrawler Select: Review Categories". WebCrawler. 1996-10-24. Retrieved 2019-02-03.
  7. Keogh, Garret. "Excite buys WebCrawler from AOL". ZDNet. Retrieved 2019-01-15.
  8. Sullivan, Danny (1997-06-16). "The Search Engine Update, June 17, 1997, Number 7". Search Engine Watch. Archived from the original on 2016-04-14. Retrieved 2019-02-02.
  9. Notess, Greg R. (2002). "On the Net: Dead Search Engines". InfoToday. Archived from the original on 2002-05-25. Retrieved 2019-01-16.
  10. Parnell, Brid-Aine (December 18, 2012). "Search engines we have known ... before Google crushed them". The Register. Retrieved November 17, 2016.
  11. "Leading Leaders". A9 Management web page. Archived from the original on November 14, 2016. Retrieved November 15, 2016.
  12. "Blucora to sell InfoSpace business for $45 million". Seattle Times. July 5, 2016.
  13. "System1 raises $270 million for 'consumer intent' advertising". L.A. Biz. Retrieved 2017-12-01.
  14. "WebCrawler Search". WebCrawler. 2018-05-31. Retrieved 2019-02-02.
  15. "WebCrawler Search". WebCrawler. 2018-11-30. Retrieved 2019-02-02.
  16. McGuigan, Brendan (2007). "What was the First Search Engine?". WiseGeek. Archived from the original on 2007-04-27. Retrieved 2019-02-18.
  17. "Search Engine History.com". www.searchenginehistory.com. Retrieved 2019-01-25.
  18. "Infographic: Top 20 Most Popular Websites (1996-2013)". TechCo. 2014-12-26. Retrieved 2019-01-15.

A9.com

A9.com is a subsidiary of Amazon that develops search engine and search advertising technology. A9 is based in Palo Alto, California, with teams in Bangalore, Beijing, Dublin, Iași, Munich and Tokyo. A9 has development efforts in areas of product search, cloud search, visual search, augmented reality, advertising technology and community question answering.

Agora (web browser)

Agora was a World Wide Web email browser, built as a proof of concept to help people use the full internet. It was designed for non-graphic terminals and for people without full access to the internet, such as those in developing countries or without a permanent internet connection. Similar to W3Gate, Agora was a server application that fetched HTML documents through e-mail rather than HTTP.

Aliweb

ALIWEB (Archie Like Indexing for the WEB) is considered the first Web search engine, as its predecessors were either built with different purposes (the Wanderer, Gopher) or were literally just indexers (Archie, Veronica and Jughead).

First announced in November 1993 by developer Martijn Koster while working at Nexor, and presented in May 1994 at the First International Conference on the World Wide Web at CERN in Geneva, ALIWEB preceded WebCrawler by several months.

ALIWEB allowed users to submit the locations of index files on their sites, which enabled the search engine to include webpages and add user-written page descriptions and keywords. This empowered webmasters to define the terms that would lead users to their pages, and also avoided relying on bots (e.g. the Wanderer, JumpStation), which used up bandwidth. As relatively few people submitted their sites, ALIWEB was not very widely used.

Martijn Koster, who was also instrumental in the creation of the Robots Exclusion Standard, detailed the background and objectives of ALIWEB with an overview of its functions and framework in the paper he presented at CERN.

Koster is not associated with a commercial website which uses the aliweb name.

Blucora

Blucora (formerly InfoSpace, Inc.) is a provider of Internet-related services, mostly search engines. InfoSpace changed its name to Blucora and its NASDAQ symbol from INSP to BCOR on June 7, 2012. The change reflected the company's position as the owner of two online businesses, after its acquisition of TaxACT in January 2012, and distinguishes the parent company from its search business operating unit, which is called InfoSpace.

Blucora's InfoSpace business provides metasearch and private-label Internet search services for consumers, and online search and monetization solutions to a network of more than 100 partners worldwide. InfoSpace's main metasearch site is Dogpile; its other brands are WebCrawler and MetaCrawler.

Blucora's TaxACT subsidiary offers online tax preparation services. TaxACT was founded in 1998 and made by 2nd Story Software. In the 2005 tax season, it became the first to offer free federal tax software and free e-file to all U.S. taxpayers.

Common Crawl

Common Crawl is a nonprofit 501(c)(3) organization that crawls the web and freely provides its archives and datasets to the public. Common Crawl's web archive consists of petabytes of data collected since 2011. It generally completes a crawl every month.

Common Crawl was founded by Gil Elbaz. Advisors to the non-profit include Peter Norvig and Joi Ito. The organization's crawlers respect nofollow and robots.txt policies. Open source code for processing Common Crawl's data set is publicly available.

DoGreatGood

Do Great Good was a search engine that allowed users to help support charitable causes by conducting online searches.

Do Great Good was founded in May 2009 by InfoSpace, a well-known search company which also owns Dogpile, WebCrawler, MetaCrawler and Nation. The site closed in August 2010 and redirected visitors to Dogpile.

Dogpile

Dogpile is a metasearch engine for information on the World Wide Web that fetches results from Google, Yahoo!, Yandex, Bing and other popular search engines, including results from audio and video content providers such as Yahoo!.
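As a rough illustration of what a metasearch engine does with the results it fetches, the sketch below merges several ranked result lists into one. It is not Dogpile's actual method, which the text does not describe: the scoring rule (each URL earns 1/rank per list it appears in), the function name `merge_results`, and the hard-coded sample lists are all assumptions made for this example.

```python
from collections import defaultdict


def merge_results(result_lists):
    """Merge ranked URL lists from several engines into one ranked list.

    Assumed scoring rule: a URL earns 1/rank for every list it appears in
    (rank starts at 1), so URLs that several engines return, or that rank
    highly, move toward the top.
    """
    scores = defaultdict(float)
    for urls in result_lists.values():
        seen = set()
        for rank, url in enumerate(urls, start=1):
            if url in seen:  # ignore duplicates within one engine's list
                continue
            seen.add(url)
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical, hard-coded lists standing in for real engine responses.
sample = {
    "engine_a": [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/c",
    ],
    "engine_b": [
        "https://example.com/b",
        "https://example.com/d",
    ],
}
merged = merge_results(sample)
```

Here `https://example.com/b` comes first because both sample lists contain it. A real metasearch engine would also have to normalize URLs and handle engines that fail or time out, which this sketch leaves out.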

Excite

Excite (stylized as excite) is an internet portal launched in 1995 that provides a variety of content including news and weather, a metasearch engine, web-based email, instant messaging, stock quotes, and a customizable user homepage. It is currently operated by IAC Applications (formerly Mindspark) of IAC, and Excite Networks. In the U.S., the main Excite site has long been a personal start page called My Excite. Excite also operates an e-mail service, although it is no longer open for new customers.

The original Excite company was founded in 1994 and went public two years later. Excite was one of the most recognized brands on the Internet that decade, with the main portal site Excite.com being the sixth most visited website in 1997 and fourth by 2000. The company merged with broadband provider @Home Network but together went bankrupt in 2001. Excite's portal and services were acquired by iWon.com and then by Ask Jeeves, but the website went into a steep decline in popularity afterwards. As of January 2019, Excite.com ranks 3616th in the U.S. according to the Alexa rankings. The most popular Excite site is the local Japanese one, which ranks 240th in Japan.

JumpStation

JumpStation was the first WWW search engine that behaved, and appeared to the user, the way current web search engines do. It started indexing on 12 December 1993 and was announced on the Mosaic "What's New" webpage on 21 December 1993. It was hosted at the University of Stirling in Scotland.

It was written by Jonathon Fletcher, from Scarborough, England, who graduated from the University with a first class honours degree in Computing Science in the summer of 1992 and has subsequently been named "father of the search engine". He was subsequently employed there as a systems administrator. Development of JumpStation was discontinued when he left the University in late 1994, having failed to get any investors, including the University of Stirling, to financially back his idea. At this point the database had 275,000 entries spanning 1,500 servers.

JumpStation used document titles and headings to index the web pages found using a simple linear search, and did not provide any ranking of results. However, JumpStation had the same basic shape as Google search in that it used an index solely built by a web robot, searched this index using keyword queries entered by the user on a web form whose location was well-known, and presented its results in the form of a list of URLs that matched those keywords.
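The approach described above (an index of titles and headings, scanned linearly, with unranked results) can be sketched in a few lines. This is only an illustration, not JumpStation's code: the `Entry` record, the rule that every keyword must appear, and the case-insensitive substring matching are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One indexed page: only its title and headings are kept."""
    url: str
    title: str
    headings: list = field(default_factory=list)


def linear_search(index, query):
    """Scan every entry in order and return the URLs that match.

    Assumption: a page matches when every keyword occurs somewhere in its
    title or headings. Results keep index order; there is no ranking.
    """
    keywords = query.lower().split()
    if not keywords:
        return []
    matches = []
    for entry in index:
        text = " ".join([entry.title, *entry.headings]).lower()
        if all(keyword in text for keyword in keywords):
            matches.append(entry.url)
    return matches


# Hypothetical index entries for illustration.
index = [
    Entry("http://a.test/", "Physics Department", ["Research", "Staff"]),
    Entry("http://b.test/", "Library Catalogue", ["Opening hours"]),
    Entry("http://c.test/", "Physics Research Group", []),
]
hits = linear_search(index, "physics research")
```

Because every query walks the whole list, the cost grows with the number of entries, which is one reason later engines moved to precomputed lookup structures.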

Kinja

Kinja is a free online news aggregator, launched in April 2004. It is operated by Gizmodo Media Group, which was purchased by Univision Communications during Gawker Media's bankruptcy.

LeapFish

LeapFish.com was a search aggregator that retrieved results from other portals and search engines, including Google, Bing and Yahoo!, and also search engines of blogs, videos etc. It was a registered trademark of Dotnext Inc, launched on 3 November 2008.

List of search engines

This is a list of search engines, including web search engines, selection-based search engines, metasearch engines, desktop search tools, and web portals and vertical market websites that have a search facility for online databases. For a list of search engine software, see List of enterprise search vendors.

LookSmart

LookSmart is a search advertising, content management, online media, and technology company. It provides search, machine learning and chatbot technologies as well as pay-per-click and contextual advertising services.

LookSmart also licenses and manages search ad networks as white-label products. It abides by the click measurement guidelines of the Interactive Advertising Bureau.

LookSmart also owns several subsidiaries, including Clickable Inc., LookSmart AdCenter, Novatech.io, ShopWiki and Syncapse.

The current CEO of LookSmart is Michael Onghai and the company is headquartered in Henderson, Nevada.

MetaCrawler

MetaCrawler is a web search engine program, and a registered trademark of InfoSpace, Inc.

It was originally a metasearch engine, as its name suggests. Throughout its lifetime it combined web search results from sources including Google, Yahoo!, Bing (formerly Live Search), Ask.com, About.com, MIVA, LookSmart and other search engine programs. MetaCrawler also provided users the option to search for images, video, news, business and personal telephone directories, and for a while even audio.

Pogo.com

Pogo.com is a free online gaming website that offers over 100 casual games from brands like Hasbro and PopCap Games. It offers a variety of card and board games like First Class Solitaire and Monopoly to puzzle, sports and word games like Scrabble. It is owned by Electronic Arts and is based in Redwood Shores, CA.

The website is free due to advertising sponsorships but during a game, it produces commercials that can last up to 20 seconds. Players are strongly encouraged to sign up for Club Pogo, a subscription service. The enticement to do so is the offer of premium benefits and the omission of the advertisements that would otherwise interrupt the games. Games are played in a browser with the Java-plugin, Flash and more recently HTML5. Games load in a "room" allowing other players to join and chat.

Players can win jackpot prizes and tokens from playing the games on Pogo.com. Tokens are no longer used in sweepstakes drawings as of December 2010. Players can place bets of tokens on some games, such as Texas hold 'em poker and High Stakes poker. Cash (in the form of pre-paid credit cards) and merchandise prizes are available to U.S. and Canadian residents, excluding Quebec.

Pogo also offers downloadable games, often "deluxe" or "to go" versions of already-released games, which can be bought and played while offline. Some of these downloadable games include chat and tokens, similar to the original games. Since 2006, Pogo.com has consistently been a top-10 Internet site for U.S. visitors when measured by time spent online.

Robots exclusion standard

The robots exclusion standard, also known as the robots exclusion protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned. Robots are often used by search engines to categorize websites. Not all robots cooperate with the standard; email harvesters, spambots, malware and robots that scan for security vulnerabilities may even start with the portions of the website where they have been told to stay out. The standard can be used in conjunction with Sitemaps, a robot inclusion standard for websites.
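A cooperating robot typically checks a site's robots.txt rules before requesting a page. The sketch below uses `urllib.robotparser` from the Python standard library to do that check offline. The rules, the robot names `ExampleBot` and `BadBot`, and the `example.com` URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: BadBot is shut out entirely, every other robot is
# asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# A polite crawler asks before each fetch.
may_fetch_index = rules.can_fetch("ExampleBot", "https://example.com/index.html")
may_fetch_private = rules.can_fetch("ExampleBot", "https://example.com/private/a.html")
may_fetch_as_badbot = rules.can_fetch("BadBot", "https://example.com/index.html")
```

With these rules the first check is allowed and the other two are refused. Nothing enforces the answer: as noted above, a robot that ignores the standard can simply skip the check.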

Scrapy

Scrapy (SKRAY-pee) is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler. It is currently maintained by Scrapinghub Ltd., a web-scraping development and services company.

The Scrapy project architecture is built around "spiders", which are self-contained crawlers that are given a set of instructions. Following the spirit of other don't repeat yourself frameworks, such as Django, it makes it easier to build and scale large crawling projects by allowing developers to reuse their code. Scrapy also provides a web-crawling shell, which can be used by developers to test their assumptions on a site’s behavior.

Some well-known companies and products using Scrapy are Lyst, Parse.ly, Sayone Technologies, Sciences Po Medialab, and Data.gov.uk’s World Government Data site.

Web crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering).

Web search engines and some other sites use Web crawling or spidering software to update their web content or indices of other sites' web content. Web crawlers copy pages for processing by a search engine, which indexes the downloaded pages so users can search more efficiently.
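The core loop of a crawler can be sketched as a breadth-first traversal: take a URL from a queue, copy the page, extract its links, and queue any that have not been seen. The example below is a minimal sketch under stated assumptions, not how any particular search engine works. The `fetch` function is injected so the example runs without network access, and `FAKE_WEB` with its `site.test` URLs is invented test data.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at seed.

    fetch(url) must return the page's HTML, or None if it is unavailable.
    Returns a dict mapping each downloaded URL to its HTML.
    """
    queue = deque([seed])
    seen = {seed}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html  # copy kept for later indexing
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop #fragments before comparing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Invented in-memory "web" so the sketch needs no network access.
FAKE_WEB = {
    "http://site.test/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://site.test/a.html": '<a href="/">home</a> <a href="/c.html#top">C</a>',
    "http://site.test/b.html": "no links here",
    "http://site.test/c.html": '<a href="/missing.html">gone</a>',
}
pages = crawl("http://site.test/", FAKE_WEB.get)
```

The `seen` set keeps the crawler from revisiting pages that link back to each other, and `max_pages` bounds the work. A production crawler would add request delays, robots.txt checks, and error handling on top of this loop.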

Crawlers consume resources on visited systems and often visit sites without approval. Issues of schedule, load, and "politeness" come into play when large collections of pages are accessed. Mechanisms exist for public sites not wishing to be crawled to make this known to the crawling agent. For example, including a robots.txt file can request bots to index only parts of a website, or nothing at all.

The number of Internet pages is extremely large; even the largest crawlers fall short of making a complete index. For this reason, search engines struggled to give relevant search results in the early years of the World Wide Web, before 2000. Today, relevant results are given almost instantly.

Crawlers can validate hyperlinks and HTML code. They can also be used for web scraping (see also data-driven programming).

Web search engine

A web search engine or Internet search engine is a software system that is designed to carry out web search (Internet search), which means to search the World Wide Web in a systematic way for particular information specified in a web search query. The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). The information may be a mix of web pages, images, videos, infographics, articles, research papers and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
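The description above does not say how a query is matched against crawled pages. One common approach, assumed here purely for illustration, is an inverted index that maps each word to the set of pages containing it. The tokenizer, the function names, and the sample documents below are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs containing every query word, sorted alphabetically."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical page texts standing in for crawled documents.
docs = {
    "http://a.test/": "Early web search engines and directories",
    "http://b.test/": "A directory of cooking recipes",
    "http://c.test/": "How search engines rank web pages",
}
index = build_index(docs)
results = search(index, "web search")
```

Looking up a word is a single dictionary access, so query time depends on how many pages match rather than on how many pages were crawled. Ranking the matches, which real engines do, is outside the scope of this sketch.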

Internet content that is not capable of being searched by a web search engine is generally described as the deep web.

This page is based on a Wikipedia article written by its authors.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.