
Web crawler

Index Web crawler

A Web crawler, sometimes called a spider, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering). [1]

129 relations: Ajax (programming), Algorithm, Apache Hadoop, Apache License, Apache Nutch, Apache Solr, ASCII, Ask.com, Automatic indexing, Backlink, Bandwidth (computing), Bing (search engine), Bingbot, Breadth-first search, BSD licenses, C (programming language), CiteSeerX, Combination, Command-line interface, Crawl frontier, Data breach, Data scraping, Data-driven programming, Deep web, Duplicate content, Edward G. Coffman Jr., Elasticsearch, Enterprise search, Filippo Menczer, FOAF (ontology), Focused crawler, Frontera (web crawling), GNU Affero General Public License, GNU General Public License, Gnutella crawler, Google Scholar, Google Search, Googlebot, Grep, Grub (search engine), Heritrix, Ht-//Dig, HTML, HTTrack, Hyperlink, Hypertext Transfer Protocol, IBM WebFountain, Internet Archive, Internet bot, Java (programming language), John Wiley & Sons, Larry Page, Lee Giles, Machine learning, Media type, Metadata, Microsoft, Microsoft Academic Search, Microsoft Word, Middleware, MnoGoSearch, Mod oai, Msnbot, MySQL, National Center for Supercomputing Applications, Norconex, Offline reader, OpenSearchServer, OWASP, PageRank, Panos Ipeirotis, Parallel computing, PDF, PHP, PHP-Crawler, PostScript, Python (programming language), Query string, Recursion, Regular expression, Rewrite engine, Robots exclusion standard, Scrapy, Search engine indexing, Search engine scraping, Seeks, Sergey Brin, Sitemaps, Software agent, Spambot, Spamdexing, Sphinx (search engine), Spider trap, Steve Lawrence (computer scientist), Storm (event processor), StormCrawler, Swiftype, Thumbnail, TkWWW, Top-level domain, Torsten Suel, Unintended consequences, Unix, URL, URL normalization, User agent, Vertical search, Web application security, Web archiving, Web content, Web indexing, Web page, Web scraping, Web search engine, Web server, WebCrawler, Webgraph, Website, Wget, Wikia Search, Wired (magazine), World Wide Web, World-Wide Web Worm, Xapian, Xenon (program), YaCy, Yahoo! Search, Zip (file format), .NET Framework.

Ajax (programming)

Ajax (also AJAX; short for "Asynchronous JavaScript And XML") is a set of Web development techniques using many Web technologies on the client side to create asynchronous Web applications.

New!!: Web crawler and Ajax (programming) · See more »

Algorithm

In mathematics and computer science, an algorithm is an unambiguous specification of how to solve a class of problems.

New!!: Web crawler and Algorithm · See more »

Apache Hadoop

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation.

New!!: Web crawler and Apache Hadoop · See more »

Apache License

The Apache License is a permissive free software license written by the Apache Software Foundation (ASF).

New!!: Web crawler and Apache License · See more »

Apache Nutch

Apache Nutch is a highly extensible and scalable open source web crawler software project.

New!!: Web crawler and Apache Nutch · See more »

Apache Solr

Solr (pronounced "solar") is an open source enterprise search platform, written in Java, from the Apache Lucene project.

New!!: Web crawler and Apache Solr · See more »

ASCII

ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication.

New!!: Web crawler and ASCII · See more »

Ask.com

Ask.com (originally known as Ask Jeeves) is a question answering-focused e-business and web search engine founded in 1996 by Garrett Gruener and David Warthen in Berkeley, California.

New!!: Web crawler and Ask.com · See more »

Automatic indexing

Automatic indexing is the ability for a computer to scan large volumes of documents against a controlled vocabulary, taxonomy, thesaurus or ontology and use those controlled terms to quickly and effectively index large document depositories.

New!!: Web crawler and Automatic indexing · See more »

Backlink

A backlink for a given web resource is a link from some other website (the referrer) to that web resource (the referent).

New!!: Web crawler and Backlink · See more »

Bandwidth (computing)

In computing, bandwidth is the maximum rate of data transfer across a given path.

New!!: Web crawler and Bandwidth (computing) · See more »

Bing (search engine)

Bing is a web search engine owned and operated by Microsoft.

New!!: Web crawler and Bing (search engine) · See more »

Bingbot

Bingbot is a web-crawling robot (a type of internet bot), deployed by Microsoft in October 2010 to supply Bing.

New!!: Web crawler and Bingbot · See more »

Breadth-first search

Breadth-first search (BFS) is an algorithm for traversing or searching tree or graph data structures.
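
Breadth-first ordering is a traversal strategy commonly used by crawlers: all pages one link away from the seeds are visited before any page two links away. A minimal illustrative sketch in Python, over a made-up in-memory link graph rather than live pages:

    from collections import deque

    # Hypothetical link graph: page -> pages it links to.
    graph = {
        "a.html": ["b.html", "c.html"],
        "b.html": ["d.html"],
        "c.html": ["d.html", "a.html"],
        "d.html": [],
    }

    def bfs(start):
        """Visit pages level by level, returning them in breadth-first order."""
        seen = {start}
        queue = deque([start])
        order = []
        while queue:
            page = queue.popleft()
            order.append(page)
            for link in graph.get(page, []):
                if link not in seen:   # enqueue each page only once
                    seen.add(link)
                    queue.append(link)
        return order

    print(bfs("a.html"))  # ['a.html', 'b.html', 'c.html', 'd.html']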

New!!: Web crawler and Breadth-first search · See more »

BSD licenses

BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and redistribution of covered software.

New!!: Web crawler and BSD licenses · See more »

C (programming language)

C (as in the letter "c") is a general-purpose, imperative computer programming language, supporting structured programming, lexical variable scope and recursion, while a static type system prevents many unintended operations.

New!!: Web crawler and C (programming language) · See more »

CiteSeerX

CiteSeerX (originally called CiteSeer) is a public search engine and digital library for scientific and academic papers, primarily in the fields of computer and information science.

New!!: Web crawler and CiteSeerX · See more »

Combination

In mathematics, a combination is a selection of items from a collection, such that (unlike permutations) the order of selection does not matter.

New!!: Web crawler and Combination · See more »

Command-line interface

A command-line interface or command language interpreter (CLI), also known as command-line user interface, console user interface and character user interface (CUI), is a means of interacting with a computer program where the user (or client) issues commands to the program in the form of successive lines of text (command lines).

New!!: Web crawler and Command-line interface · See more »

Crawl frontier

A crawl frontier is a data structure used for storage of URLs eligible for crawling and supporting such operations as adding URLs and selecting for crawl.
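
As a rough illustration (not any particular crawler's implementation), a frontier can be as little as a FIFO queue of pending URLs plus a set of URLs already scheduled, so that each URL is crawled at most once:

    from collections import deque

    class CrawlFrontier:
        """Toy crawl frontier: FIFO queue of pending URLs plus a 'seen' set."""

        def __init__(self, seeds):
            self._queue = deque()
            self._seen = set()
            for url in seeds:
                self.add(url)

        def add(self, url):
            # Schedule a URL only if it has never been scheduled before.
            if url not in self._seen:
                self._seen.add(url)
                self._queue.append(url)

        def next_url(self):
            # Next URL to crawl, or None when the frontier is exhausted.
            return self._queue.popleft() if self._queue else None

    frontier = CrawlFrontier(["https://example.com/"])
    frontier.add("https://example.com/about")
    print(frontier.next_url())  # https://example.com/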

New!!: Web crawler and Crawl frontier · See more »

Data breach

A data breach is the intentional or unintentional release of secure or private/confidential information to an untrusted environment.

New!!: Web crawler and Data breach · See more »

Data scraping

Data scraping is a technique in which a computer program extracts data from human-readable output coming from another program.

New!!: Web crawler and Data scraping · See more »

Data-driven programming

In computer programming, data-driven programming is a programming paradigm in which the program statements describe the data to be matched and the processing required rather than defining a sequence of steps to be taken.

New!!: Web crawler and Data-driven programming · See more »

Deep web

The deep web, invisible web, or hidden web are parts of the World Wide Web whose contents are not indexed by standard web search engines for any reason.

New!!: Web crawler and Deep web · See more »

Duplicate content

Duplicate content is a term used in the field of search engine optimization to describe content that appears on more than one web page.

New!!: Web crawler and Duplicate content · See more »

Edward G. Coffman Jr.

Edward Grady "Ed" Coffman Jr. is a computer scientist.

New!!: Web crawler and Edward G. Coffman Jr. · See more »

Elasticsearch

Elasticsearch is a search engine based on Lucene.

New!!: Web crawler and Elasticsearch · See more »

Enterprise search

Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience.

New!!: Web crawler and Enterprise search · See more »

Filippo Menczer

Filippo Menczer is an American and Italian professor of informatics and computer science who is the director at the Center for Complex Networks and Systems Research, a research unit of the Indiana University School of Informatics and Computing and a member lab of the Web Science Trust Network.

New!!: Web crawler and Filippo Menczer · See more »

FOAF (ontology)

FOAF (an acronym of friend of a friend) is a machine-readable ontology describing persons, their activities and their relations to other people and objects.

New!!: Web crawler and FOAF (ontology) · See more »

Focused crawler

A focused crawler is a web crawler that collects Web pages that satisfy some specific property, by carefully prioritizing the crawl frontier and managing the hyperlink exploration process.

New!!: Web crawler and Focused crawler · See more »

Frontera (web crawling)

Frontera is an open-source web crawling framework implementing the crawl frontier component and providing scalability primitives for web crawler applications.

New!!: Web crawler and Frontera (web crawling) · See more »

GNU Affero General Public License

The GNU Affero General Public License is a free, copyleft license published by the Free Software Foundation in November 2007, and based on the GNU General Public License, version 3 and the Affero General Public License.

New!!: Web crawler and GNU Affero General Public License · See more »

GNU General Public License

The GNU General Public License (GNU GPL or GPL) is a widely used free software license, which guarantees end users the freedom to run, study, share and modify the software.

New!!: Web crawler and GNU General Public License · See more »

Gnutella crawler

A Gnutella crawler is a software program used to gather statistical information on the gnutella file sharing network, such as the number of users, the market share of different clients and the geographical distribution of the user base.

New!!: Web crawler and Gnutella crawler · See more »

Google Scholar

Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.

New!!: Web crawler and Google Scholar · See more »

Google Search

Google Search, commonly referred to as Google Web Search or simply Google, is a web search engine developed by Google.

New!!: Web crawler and Google Search · See more »

Googlebot

Googlebot is the search bot software used by Google, which collects documents from the web to build a searchable index for the Google Search engine.

New!!: Web crawler and Googlebot · See more »

Grep

grep is a command-line utility for searching plain-text data sets for lines that match a regular expression.

New!!: Web crawler and Grep · See more »

Grub (search engine)

Grub is an open source distributed search crawler platform.

New!!: Web crawler and Grub (search engine) · See more »

Heritrix

Heritrix is a web crawler designed for web archiving.

New!!: Web crawler and Heritrix · See more »

Ht-//Dig

ht://Dig is a free software indexing and searching system created in 1995 by Andrew Scherpbier while he was employed at San Diego State University.

New!!: Web crawler and Ht-//Dig · See more »

HTML

Hypertext Markup Language (HTML) is the standard markup language for creating web pages and web applications.
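
Crawlers parse fetched HTML mainly to discover new hyperlinks for the frontier. A small sketch using Python's standard-library html.parser (the sample markup is made up):

    from html.parser import HTMLParser

    class LinkExtractor(HTMLParser):
        """Collect the href attribute of every <a> tag."""

        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    page = '<p>See <a href="/docs">the docs</a> and <a href="https://example.com">example</a>.</p>'
    parser = LinkExtractor()
    parser.feed(page)
    print(parser.links)  # ['/docs', 'https://example.com']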

New!!: Web crawler and HTML · See more »

HTTrack

HTTrack is a free and open source Web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3.

New!!: Web crawler and HTTrack · See more »

Hyperlink

In computing, a hyperlink, or simply a link, is a reference to data that the reader can directly follow either by clicking, tapping, or hovering.

New!!: Web crawler and Hyperlink · See more »

Hypertext Transfer Protocol

The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, and hypermedia information systems.

New!!: Web crawler and Hypertext Transfer Protocol · See more »

IBM WebFountain

WebFountain is an Internet analytical engine implemented by IBM for the study of unstructured data on the World Wide Web.

New!!: Web crawler and IBM WebFountain · See more »

Internet Archive

The Internet Archive is a San Francisco–based nonprofit digital library with the stated mission of "universal access to all knowledge." It provides free public access to collections of digitized materials, including websites, software applications/games, music, movies/videos, moving images, and nearly three million public-domain books.

New!!: Web crawler and Internet Archive · See more »

Internet bot

An Internet bot, also known as a web robot, WWW robot or simply bot, is a software application that runs automated tasks (scripts) over the Internet.

New!!: Web crawler and Internet bot · See more »

Java (programming language)

Java is a general-purpose computer-programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible.

New!!: Web crawler and Java (programming language) · See more »

John Wiley & Sons

John Wiley & Sons, Inc., also referred to as Wiley, is a global publishing company that specializes in academic publishing.

New!!: Web crawler and John Wiley & Sons · See more »

Larry Page

Lawrence Edward Page (born March 26, 1973) is an American computer scientist and Internet entrepreneur who co-founded Google with Sergey Brin.

New!!: Web crawler and Larry Page · See more »

Lee Giles

Clyde Lee Giles is an American computer scientist and the David Reese Professor at the College of Information Sciences and Technology at the Pennsylvania State University.

New!!: Web crawler and Lee Giles · See more »

Machine learning

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.

New!!: Web crawler and Machine learning · See more »

Media type

A media type (formerly known as MIME type) is a two-part identifier for file formats and format contents transmitted on the Internet.

New!!: Web crawler and Media type · See more »

Metadata

Metadata is "data that provides information about other data".

New!!: Web crawler and Metadata · See more »

Microsoft

Microsoft Corporation (abbreviated as MS) is an American multinational technology company with headquarters in Redmond, Washington.

New!!: Web crawler and Microsoft · See more »

Microsoft Academic Search

Microsoft Academic Search was a research project and academic search engine retired in 2012.

New!!: Web crawler and Microsoft Academic Search · See more »

Microsoft Word

Microsoft Word (or simply Word) is a word processor developed by Microsoft.

New!!: Web crawler and Microsoft Word · See more »

Middleware

Middleware is computer software that provides services to software applications beyond those available from the operating system.

New!!: Web crawler and Middleware · See more »

MnoGoSearch

mnoGoSearch is an open source search engine for Unix-like computer systems written in C. It is distributed under the GNU General Public License and designed to organize search within a website, group of websites, intranet or local system.

New!!: Web crawler and MnoGoSearch · See more »

Mod oai

mod_oai is an Apache module that allows web crawlers to efficiently discover new, modified, and deleted web resources from a web server by using OAI-PMH, a protocol which is widely used in the digital libraries community.

New!!: Web crawler and Mod oai · See more »

Msnbot

msnbot was a web-crawling robot (type of internet bot), deployed by Microsoft to collect documents from the web to build a searchable index for the MSN Search engine.

New!!: Web crawler and Msnbot · See more »

MySQL

MySQL ("My S-Q-L") is an open-source relational database management system (RDBMS).

New!!: Web crawler and MySQL · See more »

National Center for Supercomputing Applications

The National Center for Supercomputing Applications (NCSA) is a state-federal partnership in the United States that develops and deploys national-scale cyberinfrastructure to advance research, science and engineering.

New!!: Web crawler and National Center for Supercomputing Applications · See more »

Norconex

Norconex is a North American information technology company specialising in Enterprise Search professional services and software development (both commercial and open-source).

New!!: Web crawler and Norconex · See more »

Offline reader

An offline reader (sometimes called an offline browser or offline navigator) is computer software that downloads e-mail, newsgroup posts or web pages, making them available when the computer is offline: not connected to the Internet.

New!!: Web crawler and Offline reader · See more »

OpenSearchServer

OpenSearchServer is an open-source application server allowing development of index-based applications such as search engines.

New!!: Web crawler and OpenSearchServer · See more »

OWASP

The Open Web Application Security Project (OWASP), an online community, produces freely-available articles, methodologies, documentation, tools, and technologies in the field of web application security.

New!!: Web crawler and OWASP · See more »

PageRank

PageRank (PR) is an algorithm used by Google Search to rank websites in their search engine results.
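
As a rough illustration of the underlying idea (the four-page graph and the 0.85 damping factor are conventional example values, not from the article), PageRank can be approximated by power iteration over the link graph:

    def pagerank(links, damping=0.85, iterations=50):
        """Power-iteration PageRank over a dict: page -> list of outgoing links."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / n for p in pages}
            for page, outgoing in links.items():
                if outgoing:
                    share = damping * rank[page] / len(outgoing)
                    for target in outgoing:
                        new_rank[target] += share
                else:
                    # Dangling page: spread its rank evenly over all pages.
                    for target in pages:
                        new_rank[target] += damping * rank[page] / n
            rank = new_rank
        return rank

    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(links).items()):
        print(page, round(score, 3))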

New!!: Web crawler and PageRank · See more »

Panos Ipeirotis

Panagiotis G. Ipeirotis (born 1976 in Serres, Greece) is a Professor and George A. Kellner Faculty Fellow at the Department of Information, Operations, and Management Sciences at Leonard N. Stern School of Business of New York University.

New!!: Web crawler and Panos Ipeirotis · See more »

Parallel computing

Parallel computing is a type of computation in which many calculations or the execution of processes are carried out concurrently.

New!!: Web crawler and Parallel computing · See more »

PDF

The Portable Document Format (PDF) is a file format developed in the 1990s to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.

New!!: Web crawler and PDF · See more »

PHP

PHP: Hypertext Preprocessor (or simply PHP) is a server-side scripting language designed for Web development, but also used as a general-purpose programming language.

New!!: Web crawler and PHP · See more »

PHP-Crawler

PHP-Crawler is an open-source crawling script based on PHP and MySQL.

New!!: Web crawler and PHP-Crawler · See more »

PostScript

PostScript (PS) is a page description language in the electronic publishing and desktop publishing business.

New!!: Web crawler and PostScript · See more »

Python (programming language)

Python is an interpreted high-level programming language for general-purpose programming.

New!!: Web crawler and Python (programming language) · See more »

Query string

On the World Wide Web, a query string is the part of a uniform resource locator (URL) containing data that does not fit conveniently into a hierarchical path structure.
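
Crawlers frequently have to decide whether URLs differing only in their query strings point to distinct content. A small sketch of splitting a query string into parameters with Python's urllib.parse (the URL is made up):

    from urllib.parse import urlparse, parse_qs

    url = "https://example.com/search?q=web+crawler&page=2&page=3"
    query = urlparse(url).query   # 'q=web+crawler&page=2&page=3'
    params = parse_qs(query)      # repeated keys are collected into lists
    print(params)                 # {'q': ['web crawler'], 'page': ['2', '3']}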

New!!: Web crawler and Query string · See more »

Recursion

Recursion occurs when a thing is defined in terms of itself or of its type.

New!!: Web crawler and Recursion · See more »

Regular expression

A regular expression, regex or regexp (sometimes called a rational expression) is, in theoretical computer science and formal language theory, a sequence of characters that define a search pattern.
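
Crawlers and scrapers sometimes fall back on regular expressions to pull URLs out of plain text. A deliberately simplified sketch (robust URL matching needs a more careful pattern):

    import re

    text = "Docs: https://example.com/guide (see also http://mirror.example.org/guide)"
    url_pattern = re.compile(r"https?://[^\s)\"'<>]+")
    print(url_pattern.findall(text))
    # ['https://example.com/guide', 'http://mirror.example.org/guide']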

New!!: Web crawler and Regular expression · See more »

Rewrite engine

A rewrite engine is a software component that performs rewriting on Uniform Resource Locators, modifying their appearance.

New!!: Web crawler and Rewrite engine · See more »

Robots exclusion standard

The robots exclusion standard, also known as the robots exclusion protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots.
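
Python's standard library ships a parser for this standard; a minimal sketch of checking whether a crawler (here with the made-up user-agent name "ExampleBot") may fetch a URL:

    from urllib.robotparser import RobotFileParser

    # Rules as they would appear in a site's robots.txt file.
    rules = RobotFileParser()
    rules.parse([
        "User-agent: *",
        "Disallow: /private/",
        "Crawl-delay: 5",
    ])

    print(rules.can_fetch("ExampleBot", "https://example.com/index.html"))  # True
    print(rules.can_fetch("ExampleBot", "https://example.com/private/x"))   # False
    print(rules.crawl_delay("ExampleBot"))                                  # 5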

New!!: Web crawler and Robots exclusion standard · See more »

Scrapy

Scrapy is a free and open source web crawling framework, written in Python.
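
A minimal spider in Scrapy looks roughly like this (the spider name, start URL and extracted fields are placeholders, not taken from the article); it can be run with, for example, "scrapy runspider example_spider.py -o pages.json":

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"
        start_urls = ["https://example.com/"]

        def parse(self, response):
            # Emit one item per page with its URL and <title>.
            yield {
                "url": response.url,
                "title": response.css("title::text").get(),
            }
            # Follow every hyperlink on the page and parse it the same way.
            for href in response.css("a::attr(href)").getall():
                yield response.follow(href, callback=self.parse)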

New!!: Web crawler and Scrapy · See more »

Search engine indexing

Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval.
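
At its core this means building an inverted index mapping each term to the documents that contain it. A toy sketch over two made-up pages:

    from collections import defaultdict

    docs = {
        "page1": "web crawlers browse the web",
        "page2": "search engines index the web",
    }

    # Inverted index: term -> set of documents containing the term.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)

    print(sorted(index["web"]))    # ['page1', 'page2']
    print(sorted(index["index"]))  # ['page2']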

New!!: Web crawler and Search engine indexing · See more »

Search engine scraping

Search engine scraping is the process of harvesting URLs, descriptions, or other information from search engines such as Google, Bing or Yahoo.

New!!: Web crawler and Search engine scraping · See more »

Seeks

Seeks is a free and open-source project licensed under the Affero General Public License version 3 (AGPLv3).

New!!: Web crawler and Seeks · See more »

Sergey Brin

Sergey Mikhaylovich Brin (Серге́й Миха́йлович Брин; born August 21, 1973) is a Russian-born American computer scientist and internet entrepreneur.

New!!: Web crawler and Sergey Brin · See more »

Sitemaps

The Sitemaps protocol allows a webmaster to inform search engines about URLs on a website that are available for crawling.
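
Crawlers can read a site's sitemap to seed the frontier with URLs the webmaster wants crawled. A minimal sketch of parsing sitemap XML with Python's standard library (the sample document is made up; the namespace URI is the one defined by the protocol):

    import xml.etree.ElementTree as ET

    sitemap_xml = """
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/</loc><lastmod>2018-01-01</lastmod></url>
      <url><loc>https://example.com/about</loc></url>
    </urlset>
    """

    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    for url in root.findall("sm:url", ns):
        print(url.findtext("sm:loc", namespaces=ns))
    # https://example.com/
    # https://example.com/about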

New!!: Web crawler and Sitemaps · See more »

Software agent

In computer science, a software agent is a computer program that acts for a user or other program in a relationship of agency, which derives from the Latin agere (to do): an agreement to act on one's behalf.

New!!: Web crawler and Software agent · See more »

Spambot

A spambot is a computer program designed to assist in the sending of spam.

New!!: Web crawler and Spambot · See more »

Spamdexing

In digital marketing and online advertising, spamdexing (also known as search engine spam, search engine poisoning, black-hat SEO, search spam or web spam) is the deliberate manipulation of search engine indexes.

New!!: Web crawler and Spamdexing · See more »

Sphinx (search engine)

Sphinx is a fulltext F/OSS search engine that provides text search functionality to client applications.

New!!: Web crawler and Sphinx (search engine) · See more »

Spider trap

A spider trap (or crawler trap) is a set of web pages that may intentionally or unintentionally be used to cause a web crawler or search bot to make an infinite number of requests or cause a poorly constructed crawler to crash.
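
A common defence is to cap how deep and how many pages the crawler will follow within one site, so that an endless chain of generated links (for example a calendar that always links to "next month") cannot run forever. A rough sketch of such a guard (the limits are arbitrary example values):

    MAX_DEPTH = 10            # do not follow links more than 10 hops from a seed
    MAX_PAGES_PER_HOST = 5000

    pages_per_host = {}

    def should_crawl(depth, host):
        """Return False once the depth or per-host page budget is exhausted."""
        if depth > MAX_DEPTH:
            return False
        if pages_per_host.get(host, 0) >= MAX_PAGES_PER_HOST:
            return False
        pages_per_host[host] = pages_per_host.get(host, 0) + 1
        return True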

New!!: Web crawler and Spider trap · See more »

Steve Lawrence (computer scientist)

Steve Lawrence is an Australian computer scientist.

New!!: Web crawler and Steve Lawrence (computer scientist) · See more »

Storm (event processor)

Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language.

New!!: Web crawler and Storm (event processor) · See more »

StormCrawler

StormCrawler is an open-source collection of resources for building low-latency, scalable web crawlers on Apache Storm.

New!!: Web crawler and StormCrawler · See more »

Swiftype

Swiftype is a search and index company based in San Francisco, CA, that provides search software for organizations, websites, and computer programs.

New!!: Web crawler and Swiftype · See more »

Thumbnail

Thumbnails are reduced-size versions of pictures or videos, used to help in recognizing and organizing them, serving the same role for images as a normal text index does for words.

New!!: Web crawler and Thumbnail · See more »

TkWWW

tkWWW is an early, now discontinued web browser and WYSIWYG HTML editor written by Joseph Wang at MIT as part of Project Athena and the Globewide Network Academy project.

New!!: Web crawler and TkWWW · See more »

Top-level domain

A top-level domain (TLD) is one of the domains at the highest level in the hierarchical Domain Name System of the Internet.

New!!: Web crawler and Top-level domain · See more »

Torsten Suel

Torsten Suel is a professor in the Department of Computer Science and Engineering at the New York University Tandon School of Engineering.

New!!: Web crawler and Torsten Suel · See more »

Unintended consequences

In the social sciences, unintended consequences (sometimes unanticipated consequences or unforeseen consequences) are outcomes that are not the ones foreseen and intended by a purposeful action.

New!!: Web crawler and Unintended consequences · See more »

Unix

Unix (trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development began in the 1970s at the Bell Labs research center under Ken Thompson, Dennis Ritchie, and others.

New!!: Web crawler and Unix · See more »

URL

A Uniform Resource Locator (URL), colloquially termed a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it.

New!!: Web crawler and URL · See more »

URL normalization

URL normalization is the process by which URLs are modified and standardized in a consistent manner.
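
Normalization lets a crawler recognise syntactic variants of the same address and fetch the page only once. A simplified sketch using urllib.parse (it covers only a few common transformations; a full normalizer does more):

    from urllib.parse import urlsplit, urlunsplit

    def normalize(url):
        """Lower-case scheme and host, drop default ports, fragments and empty paths."""
        parts = urlsplit(url)
        host = parts.hostname or ""
        port = parts.port
        # Keep the port only when it is not the default for the scheme.
        if port and not ((parts.scheme == "http" and port == 80) or
                         (parts.scheme == "https" and port == 443)):
            host = f"{host}:{port}"
        path = parts.path or "/"
        return urlunsplit((parts.scheme, host, path, parts.query, ""))

    print(normalize("HTTP://Example.COM:80/a/b#section"))
    # http://example.com/a/b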

New!!: Web crawler and URL normalization · See more »

User agent

In computing, a user agent is software (a software agent) that is acting on behalf of a user.
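
Well-behaved crawlers identify themselves through the User-Agent request header so that site operators can recognise them and regulate them via robots.txt. A small sketch with urllib (the bot name and contact URL are placeholders):

    import urllib.request

    request = urllib.request.Request(
        "https://example.com/",
        headers={"User-Agent": "ExampleBot/1.0 (+https://example.com/bot-info)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        print(response.status, response.headers.get("Content-Type"))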

New!!: Web crawler and User agent · See more »

Vertical search

A vertical search engine is distinct from a general web search engine, in that it focuses on a specific segment of online content.

New!!: Web crawler and Vertical search · See more »

Web application security

Web application security is a branch of Information Security that deals specifically with security of websites, web applications and web services.

New!!: Web crawler and Web application security · See more »

Web archiving

Web archiving is the process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public.

New!!: Web crawler and Web archiving · See more »

Web content

Web content is the textual, visual, or aural content that is encountered as part of the user experience on websites.

New!!: Web crawler and Web content · See more »

Web indexing

Web indexing (or Internet indexing) refers to various methods for indexing the contents of a website or of the Internet as a whole.

New!!: Web crawler and Web indexing · See more »

Web page

A web page (also written as webpage) is a document that is suitable for the World Wide Web and web browsers.

New!!: Web crawler and Web page · See more »

Web scraping

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites.

New!!: Web crawler and Web scraping · See more »

Web search engine

A web search engine is a software system that is designed to search for information on the World Wide Web.

New!!: Web crawler and Web search engine · See more »

Web server

Web server refers to server software, or hardware dedicated to running said software, that can serve contents to the World Wide Web.

New!!: Web crawler and Web server · See more »

WebCrawler

WebCrawler was a metasearch engine that blended the top search results from Google Search and Yahoo! Search.

New!!: Web crawler and WebCrawler · See more »

Webgraph

The webgraph describes the directed links between pages of the World Wide Web.

New!!: Web crawler and Webgraph · See more »

Website

A website is a collection of related web pages, including multimedia content, typically identified with a common domain name, and published on at least one web server.

New!!: Web crawler and Website · See more »

Wget

GNU Wget (or just Wget, formerly Geturl, also written as its package name, wget) is a computer program that retrieves content from web servers.

New!!: Web crawler and Wget · See more »

Wikia Search

Wikia Search was a short-lived free and open-source web search engine launched by Wikia, a for-profit wiki-hosting company founded in late 2004 by Jimmy Wales and Angela Beesley.

New!!: Web crawler and Wikia Search · See more »

Wired (magazine)

Wired is a monthly American magazine, published in print and online editions, that focuses on how emerging technologies affect culture, the economy, and politics.

New!!: Web crawler and Wired (magazine) · See more »

World Wide Web

The World Wide Web (abbreviated WWW or the Web) is an information space where documents and other web resources are identified by Uniform Resource Locators (URLs), interlinked by hypertext links, and accessible via the Internet.

New!!: Web crawler and World Wide Web · See more »

World-Wide Web Worm

The World-Wide Web Worm (WWWW) is claimed to be the first search engine for the World-Wide Web, though it was not released until March 1994, by which time a number of other search engines had been made publicly available.

New!!: Web crawler and World-Wide Web Worm · See more »

Xapian

Xapian is a free and open source probabilistic information retrieval library, released under the GNU General Public License (GPL).

New!!: Web crawler and Xapian · See more »

Xenon (program)

Xenon is software for performing covert Internet searches and surveillance, presently used by tax authorities in at least six nations to investigate possible tax evasion by revenue-producing web sites (online shops, gambling sites, or pornography sites) and by clients selling goods on online auction sites.

New!!: Web crawler and Xenon (program) · See more »

YaCy

YaCy (pronounced "ya see") is a free distributed search engine, built on principles of peer-to-peer (P2P) networks.

New!!: Web crawler and YaCy · See more »

Yahoo! Search

Yahoo! Search is a web search engine owned by Yahoo, headquartered in Sunnyvale, California.

New!!: Web crawler and Yahoo! Search · See more »

Zip (file format)

ZIP is an archive file format that supports lossless data compression.

New!!: Web crawler and Zip (file format) · See more »

.NET Framework

.NET Framework (pronounced dot net) is a software framework developed by Microsoft that runs primarily on Microsoft Windows.

New!!: Web crawler and .NET Framework · See more »

Redirects here:

Automated agent, Automatic indexer, Crawl site, FAST Crawler, RBSE, Scutter, Search bot, Search engine robot, Search engine robots, Search engine spider, Search engine spiders, Search robot, Searchbot, Site crawler, Spider bot, Spider operating system, Spiderable, Spiderbot, Spidering, Web Crawler, Web crawlers, Web crawling, Web scutter, Web spider, Web-crawler, Webcrawler, Webcrawlers.

References

[1] https://en.wikipedia.org/wiki/Web_crawler
