
Top 50 open source web crawlers for data mining


http://www.bigdata-madesimple.com/top-50-open-source-web-crawlers-for-data-mining/

A web crawler (also known as an ant, automatic indexer, bot, web spider, web robot, or web scutter) is an automated program or script that methodically scans, or "crawls", through web pages to build an index of the data it is set to look for. This process is called web crawling or spidering.

Web crawlers have many uses, but essentially a web crawler collects or mines data from the Internet. Most search engines use crawlers to keep their data up to date and to find what's new on the Internet. Analytics companies and market researchers use web crawlers to determine customer and market trends in a given geography. In this article, we present the top 50 open source web crawlers available on the web for data mining.
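To make the crawl-and-index loop described above concrete, here is a minimal sketch in Python using only the standard library. It is not taken from any of the crawlers listed in this post. The `crawl` function, the `fetch` callable, and the `PAGES` dictionary are hypothetical names introduced for illustration. The sketch runs against a toy in-memory "web" instead of the network, and it leaves out things a real crawler needs, such as robots.txt handling, rate limiting, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl that returns {url: [absolute outgoing links]}.

    `fetch` is an assumed callable mapping a URL to an HTML string,
    or to None when the page cannot be retrieved.
    """
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        # Resolve relative links against the current page and drop fragments.
        links = [urldefrag(urljoin(url, href)).url for href in parser.links]
        index[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Toy in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

if __name__ == "__main__":
    for page, links in crawl("http://example.test/", PAGES.get).items():
        print(page, "->", links)
```

Passing `fetch` as a parameter keeps the traversal logic separate from the transport, so the same loop could be pointed at a real HTTP client later. The `seen` set is what stops the crawler from revisiting pages that link back to each other.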

     

 

Asked Feb 2, 2015 in Android人的问与答 by forlong401 (7,050 points)

1 answer


| Name | Language | Platform |
| --- | --- | --- |
| Heritrix | Java | Linux |
| Nutch | Java | Cross-platform |
| Scrapy | Python | Cross-platform |
| DataparkSearch | C++ | Cross-platform |
| GNU Wget | C | Linux |
| GRUB | C#, C, Python, Perl | Cross-platform |
| ht://Dig | C++ | Unix |
| HTTrack | C/C++ | Cross-platform |
| ICDL Crawler | C++ | Cross-platform |
| mnoGoSearch | C | Windows |
| Norconex HTTP Collector | Java | Cross-platform |
| Open Search Server | C/C++, Java, PHP | Cross-platform |
| PHP-Crawler | PHP | Cross-platform |
| YaCy | Java | Cross-platform |
| WebSPHINX | Java | Cross-platform |
| WebLech | Java | Cross-platform |

 

   

 

Answered Feb 2, 2015 by forlong401 (7,050 points)

| Name | Language | Platform |
| --- | --- | --- |
| Arale | Java | Cross-platform |
| JSpider | Java | Cross-platform |
| HyperSpider | Java | Cross-platform |
| Arachnid | Java | Cross-platform |
| Spindle | Java | Cross-platform |
| Spider | Java | Cross-platform |
| LARM | Java | Cross-platform |
| Metis | Java | Cross-platform |
| SimpleSpider | Java | Cross-platform |
| Grunk | Java | Cross-platform |
| CAPEK | Java | Cross-platform |
| Aperture | Java | Cross-platform |
| Smart and Simple Web Crawler | Java | Cross-platform |
| Web Harvest | Java | Cross-platform |
| Aspseek | C++ | Linux |
| Bixo | Java | Cross-platform |
| crawler4j | Java | Cross-platform |
| Ebot | Erlang | Linux |
| Hounder | Java | Cross-platform |
| Hyper Estraier | C/C++ | Cross-platform |
| OpenWebSpider | C#, PHP | Cross-platform |
| Pavuk | C | Linux |
| Sphider | PHP | Cross-platform |
| Xapian | C++ | Cross-platform |
| Arachnode.net | C# | Windows |
| Crawwwler | C++ | Java |
| Distributed Web Crawler | C, Java, Python | Cross-platform |
| iCrawler | Java | Cross-platform |
| pycreep | Java | Cross-platform |
| Opese | C++ | Linux |
| Andjing | Java | |
| Ccrawler | C# | Windows |
| WebEater | Java | Cross-platform |
| JoBo | Java | Cross-platform |
...