scrapyd-go
A drop-in replacement for scrapyd that is easier to scale and distribute across any number of commodity machines with no hassle. Each scrapyd-go instance is a stateless microservice. All instances must be connected to the same redis server, which is used as a centralized registry for all instances, so each instance sees what the others see.
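The shared-registry idea can be illustrated with a minimal sketch. Everything named here (`Registry`, `memRegistry`, `Instance`) is hypothetical and not taken from the scrapyd-go source; an in-memory map stands in for redis only so the example runs without a server.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a hypothetical interface standing in for the shared redis server.
type Registry interface {
	AddJob(id, spider string)
	Jobs() map[string]string
}

// memRegistry is an in-memory stand-in, used only so the sketch runs without redis.
type memRegistry struct {
	mu   sync.Mutex
	jobs map[string]string
}

func newMemRegistry() *memRegistry {
	return &memRegistry{jobs: map[string]string{}}
}

func (r *memRegistry) AddJob(id, spider string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.jobs[id] = spider
}

// Jobs returns a copy so callers cannot mutate the registry's internal map.
func (r *memRegistry) Jobs() map[string]string {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make(map[string]string, len(r.jobs))
	for k, v := range r.jobs {
		out[k] = v
	}
	return out
}

// Instance keeps no job state of its own; every read and write goes
// through the shared registry, which is what makes it stateless.
type Instance struct {
	Name string
	reg  Registry
}

func (i Instance) Schedule(id, spider string)  { i.reg.AddJob(id, spider) }
func (i Instance) ListJobs() map[string]string { return i.reg.Jobs() }

func main() {
	shared := newMemRegistry()
	a := Instance{Name: "a", reg: shared}
	b := Instance{Name: "b", reg: shared}

	// A job scheduled through instance a is visible through instance b.
	a.Schedule("job-1", "myspider")
	fmt.Println(b.ListJobs()["job-1"])
}
```

Because neither instance holds state, any instance can be stopped or added without losing jobs, as long as the registry itself stays available.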
Why
scrapyd isn't bad, but it is very stateful, so it isn't easy to deploy in a distributed environment like k8s. I also wanted to add more features, so I started this project as a drop-in replacement for scrapyd, written with a modern, scalable stack: go for the RESTful server and redis as the centralized registry.
TODOs
- schedule.json
- cancel.json
- addversion.json
- listprojects.json
- listversions.json
- listspiders.json
- delproject.json
- delversion.json
- listjobs.json
- daemonstatus.json
- logs/{jobid}, new: realtime output of the job log
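Since these endpoints are meant to mirror scrapyd, a client call to schedule.json might look like the sketch below. It assumes the scrapyd convention of a form-encoded POST with `project` and `spider` fields and a JSON reply containing `status` and `jobid`; whether scrapyd-go answers identically is an assumption. The `schedule` helper, the `scheduleResponse` type, and the fake server are hypothetical and exist only so the example runs without a live instance.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// scheduleResponse models the reply shape documented by scrapyd (assumed here).
type scheduleResponse struct {
	Status string `json:"status"`
	JobID  string `json:"jobid"`
}

// schedule posts a form-encoded request to <baseURL>/schedule.json.
func schedule(baseURL, project, spider string) (scheduleResponse, error) {
	var out scheduleResponse
	form := url.Values{"project": {project}, "spider": {spider}}
	resp, err := http.PostForm(baseURL+"/schedule.json", form)
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return out, fmt.Errorf("unexpected status %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

// newFakeServer returns a local stand-in for a scrapyd-compatible server.
// The job id it produces is made up for the example.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/schedule.json" {
			http.NotFound(w, r)
			return
		}
		jobID := r.FormValue("project") + "-" + r.FormValue("spider") + "-1"
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(scheduleResponse{Status: "ok", JobID: jobID})
	}))
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	res, err := schedule(srv.URL, "myproject", "myspider")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.Status, res.JobID)
}
```

Against a real instance, `srv.URL` would be replaced by the address given to `-listen`, for example `http://localhost:6800`.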
Configurations
scrapyd-go is configured entirely through simple command line flags:
-dir string
    the directory to use for local caching (default ".scrapyd-go")
-listen string
    the address to bind to (default ":6800")
-max2keep int
    the maximum jobs/logs to keep in memory (default 1000000)
-poll int
    time in milliseconds between each poll operation from queue(s) (default 10)
-python string
    the python binary to use (default "python3")
-redis string
    the redis server address (default "redis://:somepass@localhost:6379/1")
-sync int
    time in seconds between each sync operation (default 15)
-workers int
    the maximum workers count (default cpu-cores-count)
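For readers unfamiliar with Go's flag handling, the flags above could be declared roughly as follows with the standard `flag` package. This is a sketch, not the project's actual code: the `Config` struct and `parseConfig` function are hypothetical, and using `runtime.NumCPU()` for the `-workers` default is an assumption based on the "cpu-cores-count" wording. Only the flag names, defaults, and descriptions come from the list above.

```go
package main

import (
	"flag"
	"fmt"
	"runtime"
)

// Config is a hypothetical container for the documented flags.
type Config struct {
	Dir      string
	Listen   string
	Max2Keep int
	Poll     int // milliseconds
	Python   string
	Redis    string
	Sync     int // seconds
	Workers  int
}

// parseConfig parses args (without the program name) into a Config.
func parseConfig(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("scrapyd-go", flag.ContinueOnError)
	fs.StringVar(&c.Dir, "dir", ".scrapyd-go", "the directory to use for local caching")
	fs.StringVar(&c.Listen, "listen", ":6800", "the address to bind to")
	fs.IntVar(&c.Max2Keep, "max2keep", 1000000, "the maximum jobs/logs to keep in memory")
	fs.IntVar(&c.Poll, "poll", 10, "time in milliseconds between each poll operation from queue(s)")
	fs.StringVar(&c.Python, "python", "python3", "the python binary to use")
	fs.StringVar(&c.Redis, "redis", "redis://:somepass@localhost:6379/1", "the redis server address")
	fs.IntVar(&c.Sync, "sync", 15, "time in seconds between each sync operation")
	// Assumption: "cpu-cores-count" corresponds to runtime.NumCPU().
	fs.IntVar(&c.Workers, "workers", runtime.NumCPU(), "the maximum workers count")
	err := fs.Parse(args)
	return c, err
}

func main() {
	// Flags that are not passed keep their documented defaults.
	c, err := parseConfig([]string{"-redis", "redis://localhost:6379/1", "-workers", "4"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("listen=%s redis=%s workers=%d poll=%dms\n", c.Listen, c.Redis, c.Workers, c.Poll)
}
```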
Installation
- binary: go to the releases page and download the release for your OS
- docker:
$ docker pull alash3al/scrapyd-go
- source:
$ go get github.com/alash3al/scrapyd-go
Running
- binary:
$ ./scrapyd_bin_file -redis redis://localhost:6379/1
- docker:
$ docker run --link SomeRedisServerContainer -p 6800:6800 alash3al/scrapyd-go -redis redis://SomeRedisServerContainer:6379/1
- source:
$ scrapyd-go -redis redis://localhost:6379/1
Contributing
- Fork the repo
- Create a feature branch
- Push your changes
- Create a pull request
License
Apache License v2.0
Author