ants-go
open source, restful, distributed crawler engine
coming up
- Persistence
- Dynamic Master
design of ants-go
ants
I wrote a crawler engine named ants in Python, based on scrapy. But a dynamic language can sometimes get chaotic,
so I started rewriting it in a compiled language.
scrapy
I designed the crawler framework by imitating scrapy,
such as the downloader, the scraper, and the way users write custom spiders,
but in a compiled language.
elasticsearch
I designed the distributed architecture by imitating elasticsearch.
It inspired me to build an engine for distributed crawling.
requirements
go get github.com/PuerkitoBio/goquery
go get github.com/go-sql-driver/mysql
install
go get github.com/wcong/ants-go
go install github.com/wcong/ants-go
run
cd bin
./ants-go
check cluster status
curl 'http://localhost:8200/cluster'
get all spiders
curl 'http://localhost:8200/spiders'
start a spider
curl 'http://localhost:8200/crawl?spider=spiderName'
cluster on one computer
To test a cluster on one computer, you can run nodes on different ports in different terminals.
One node uses the default ports: tcp 8300, http 8200
cd bin
./ants-go
The other node sets its own tcp port and http port
cd bin
./ants-go -tcp 9300 -http 9200
flags
There are some flags you can set; check the help message
./ants-go -h
./ants-go -help
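For readers curious how the -tcp and -http flags with their defaults might be handled, here is a minimal sketch using Go's standard flag package. It is an illustration only, not the ants-go source: the `nodeConfig` type and `parseNodeFlags` function are hypothetical names, and only the two flags and the defaults 8300 and 8200 come from the text above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// nodeConfig holds the two ports a node listens on.
// (Hypothetical type, not taken from the ants-go source.)
type nodeConfig struct {
	TCPPort  int
	HTTPPort int
}

// parseNodeFlags parses -tcp and -http from args, falling back to the
// defaults 8300 and 8200 when a flag is not given.
func parseNodeFlags(args []string) (nodeConfig, error) {
	var cfg nodeConfig
	fs := flag.NewFlagSet("ants-go", flag.ContinueOnError)
	fs.IntVar(&cfg.TCPPort, "tcp", 8300, "tcp port")
	fs.IntVar(&cfg.HTTPPort, "http", 8200, "http port")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseNodeFlags(os.Args[1:])
	if err != nil {
		// The flag package has already printed the error or the help text.
		return
	}
	fmt.Printf("tcp=%d http=%d\n", cfg.TCPPort, cfg.HTTPPort)
}
```

With the standard flag package, both -h and -help print the usage message, which matches the two commands above.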
Customize spider
- go to spiders
- write your spider, following the example deap_loop_spider.go, or go to the spider page
- add your spider to spiderMap, following the example in LoadAllSpiders in load_all_spider.go
- install again
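The registration step above can be pictured with the following self-contained sketch. It is not the real ants-go API: the `Spider` type, its fields, and `makeExampleSpider` are hypothetical stand-ins, and only the idea of a spiderMap filled by a load-all function comes from the steps above. Check deap_loop_spider.go and load_all_spider.go for the actual types and signatures.

```go
package main

import (
	"fmt"
	"sort"
)

// Spider is a hypothetical stand-in for the framework's spider type.
type Spider struct {
	Name      string
	StartURLs []string
}

// makeExampleSpider plays the role of the constructor you would write
// in your own spider file. (Hypothetical name and fields.)
func makeExampleSpider() *Spider {
	return &Spider{
		Name:      "example_spider",
		StartURLs: []string{"http://example.com/"},
	}
}

// loadAllSpiders mirrors the registration step: build each spider and
// store it in the map under its name.
func loadAllSpiders() map[string]*Spider {
	spiderMap := make(map[string]*Spider)
	for _, s := range []*Spider{makeExampleSpider()} {
		spiderMap[s.Name] = s
	}
	return spiderMap
}

func main() {
	spiderMap := loadAllSpiders()
	names := make([]string, 0, len(spiderMap))
	for name := range spiderMap {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	fmt.Println(names)
}
```

Keying the map by name is an assumption in this sketch; it would let a request such as /crawl?spider=example_spider look the spider up directly.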