Documentation ¶
Index ¶
- Variables
- type Crawler
- type Fetcher
- type FetcherOptions
- type Options
- type Queue
- type RedisQueue
- func (q *RedisQueue) Cleanup()
- func (q *RedisQueue) Dequeue() (url string, err error)
- func (q *RedisQueue) DoneURL(url string) error
- func (q *RedisQueue) Enqueue(urls ...string) error
- func (q *RedisQueue) FailedURLs() []string
- func (q *RedisQueue) Repaire() error
- func (q *RedisQueue) RetryURL(url string) error
- type Spider
- type SpiderFunc
- type SpiderMiddleware
- type URLFetcher
- type Wrapper
Constants ¶
This section is empty.
Variables ¶
var (
ErrRunning = errors.New("already running")
)
ErrRunning is an error value with the message "already running".
Functions ¶
This section is empty.
Types ¶
type Crawler ¶
Crawler struct
func NewCrawler ¶
NewCrawler creates a new instance of Crawler.
func (*Crawler) RegisterSpider ¶
func (c *Crawler) RegisterSpider(spider Spider, ms ...SpiderMiddleware)
RegisterSpider registers spider and its middlewares.
type Fetcher ¶
Fetcher interface
func NewFetcher ¶
func NewFetcher(opts FetcherOptions) Fetcher
NewFetcher creates a new Fetcher instance.
type FetcherOptions ¶
FetcherOptions struct
type Queue ¶
type Queue interface {
	Enqueue(urls ...string) error
	Dequeue() (string, error)
	Repaire() error
	DoneURL(url string) error
	RetryURL(url string) error
	FailedURLs() []string
	Cleanup()
}
Queue interface
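Because Queue is an interface, a backend other than Redis can be substituted, for example in tests. The following is a minimal in-memory sketch, not the package's implementation. The names memQueue, newMemQueue, errEmpty, and maxRetries are hypothetical, and the state transitions (Dequeue moves a URL to pending, Repaire returns pending URLs to ready, RetryURL gives up after a fixed number of attempts) are assumptions, since the documentation does not describe them.

```go
package main

import (
	"errors"
	"fmt"
)

// Queue repeats the interface documented above so the sketch compiles alone.
type Queue interface {
	Enqueue(urls ...string) error
	Dequeue() (string, error)
	Repaire() error
	DoneURL(url string) error
	RetryURL(url string) error
	FailedURLs() []string
	Cleanup()
}

// errEmpty and maxRetries are hypothetical; the package does not document them.
var errEmpty = errors.New("queue is empty")

const maxRetries = 2

// memQueue is a hypothetical in-memory Queue. It is not safe for concurrent use.
type memQueue struct {
	ready   []string
	pending map[string]bool
	done    map[string]bool
	failed  []string
	retries map[string]int
}

func newMemQueue() *memQueue {
	return &memQueue{
		pending: map[string]bool{},
		done:    map[string]bool{},
		retries: map[string]int{},
	}
}

func (q *memQueue) Enqueue(urls ...string) error {
	q.ready = append(q.ready, urls...)
	return nil
}

// Dequeue pops the oldest ready URL and marks it pending (assumed behavior).
func (q *memQueue) Dequeue() (string, error) {
	if len(q.ready) == 0 {
		return "", errEmpty
	}
	url := q.ready[0]
	q.ready = q.ready[1:]
	q.pending[url] = true
	return url, nil
}

// Repaire (spelled as in the interface) moves pending URLs back to ready (assumed).
func (q *memQueue) Repaire() error {
	for url := range q.pending {
		q.ready = append(q.ready, url)
		delete(q.pending, url)
	}
	return nil
}

func (q *memQueue) DoneURL(url string) error {
	delete(q.pending, url)
	q.done[url] = true
	return nil
}

// RetryURL requeues url until maxRetries is exceeded, then records it as failed (assumed).
func (q *memQueue) RetryURL(url string) error {
	delete(q.pending, url)
	q.retries[url]++
	if q.retries[url] > maxRetries {
		q.failed = append(q.failed, url)
		return nil
	}
	q.ready = append(q.ready, url)
	return nil
}

// FailedURLs returns a copy so callers cannot mutate internal state.
func (q *memQueue) FailedURLs() []string {
	return append([]string(nil), q.failed...)
}

func (q *memQueue) Cleanup() { *q = *newMemQueue() }

func main() {
	var q Queue = newMemQueue()
	if err := q.Enqueue("https://example.com/a", "https://example.com/b"); err != nil {
		fmt.Println("enqueue:", err)
		return
	}
	for {
		url, err := q.Dequeue()
		if err != nil {
			break // queue drained
		}
		if url == "https://example.com/b" {
			q.RetryURL(url) // simulate a URL that always fails
			continue
		}
		q.DoneURL(url)
	}
	fmt.Println("failed:", q.FailedURLs())
}
```

In this sketch a URL that is retried more than maxRetries times ends up in FailedURLs; the real RedisQueue may apply a different policy.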
type RedisQueue ¶
type RedisQueue struct {
	QueueReady   string
	QueuePending string
	QueueDone    string
	QueueFailed  string
	// contains filtered or unexported fields
}
RedisQueue is a Redis-based implementation of the Queue interface.
func NewRedisQueue ¶
func NewRedisQueue(url, password, prefix string) *RedisQueue
NewRedisQueue creates a new RedisQueue instance.
func (*RedisQueue) Cleanup ¶
func (q *RedisQueue) Cleanup()
func (*RedisQueue) Dequeue ¶
func (q *RedisQueue) Dequeue() (url string, err error)
func (*RedisQueue) DoneURL ¶
func (q *RedisQueue) DoneURL(url string) error
func (*RedisQueue) Enqueue ¶
func (q *RedisQueue) Enqueue(urls ...string) error
Enqueue adds urls to the ready queue.
func (*RedisQueue) FailedURLs ¶
func (q *RedisQueue) FailedURLs() []string
func (*RedisQueue) Repaire ¶
func (q *RedisQueue) Repaire() error
func (*RedisQueue) RetryURL ¶
func (q *RedisQueue) RetryURL(url string) error
type Spider ¶
type Spider interface {
Parse(crawler *Crawler, url string, r io.Reader, err error) ([]string, error)
}
Spider interface.
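As an illustration of the Parse signature, here is a minimal sketch of a Spider that extracts absolute links from a fetched page. It assumes that the err argument carries the fetch error and that the returned slice holds newly discovered URLs; neither is stated in the documentation. The Crawler stub, linkSpider, hrefPattern, parseString, and errDemoFetch are hypothetical names introduced so the example compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"regexp"
	"strings"
)

// Crawler is a stand-in for the package's Crawler type.
type Crawler struct{}

// Spider repeats the interface documented above.
type Spider interface {
	Parse(crawler *Crawler, url string, r io.Reader, err error) ([]string, error)
}

// hrefPattern is a deliberately simple matcher for double-quoted absolute links.
// A real spider would more likely use an HTML parser.
var hrefPattern = regexp.MustCompile(`href="(https?://[^"]+)"`)

// errDemoFetch is a hypothetical error used to exercise the failure path.
var errDemoFetch = errors.New("demo fetch failure")

// linkSpider is a hypothetical Spider that returns every absolute link in the body.
type linkSpider struct{}

func (linkSpider) Parse(_ *Crawler, url string, r io.Reader, err error) ([]string, error) {
	// Assumption: err is the fetch error, so there is nothing to parse when it is set.
	if err != nil {
		return nil, fmt.Errorf("fetch %s: %w", url, err)
	}
	body, readErr := io.ReadAll(r)
	if readErr != nil {
		return nil, readErr
	}
	var links []string
	for _, m := range hrefPattern.FindAllStringSubmatch(string(body), -1) {
		links = append(links, m[1])
	}
	return links, nil
}

// parseString is a small helper that feeds an in-memory page to a Spider.
func parseString(s Spider, url, html string, fetchErr error) ([]string, error) {
	return s.Parse(&Crawler{}, url, strings.NewReader(html), fetchErr)
}

func main() {
	page := `<a href="https://example.com/a">A</a> <a href="/relative">skipped</a>`
	links, err := parseString(linkSpider{}, "https://example.com", page, nil)
	fmt.Println(links, err)

	_, err = parseString(linkSpider{}, "https://example.com", "", errDemoFetch)
	fmt.Println(err)
}
```

Relative links are skipped here for brevity; resolving them against the page URL with net/url would be a natural extension.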
func ReduceSpideMiddlewares ¶
func ReduceSpideMiddlewares(spider Spider, ms ...SpiderMiddleware) Spider
ReduceSpideMiddlewares merges multiple SpiderMiddlewares and a spider into a single new Spider.
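The definitions of SpiderMiddleware and SpiderFunc are not shown on this page, so the following is only a sketch of how such a reduction could work. It assumes that SpiderMiddleware has the shape func(Spider) Spider and that SpiderFunc is a function adapter in the style of http.HandlerFunc. The function reduceMiddlewares, the helpers tag and baseSpider, and the choice to make the first middleware the outermost one are all hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Crawler is a stand-in for the package's Crawler type.
type Crawler struct{}

// Spider repeats the interface documented above.
type Spider interface {
	Parse(crawler *Crawler, url string, r io.Reader, err error) ([]string, error)
}

// SpiderFunc adapts a plain function to Spider (assumed definition).
type SpiderFunc func(crawler *Crawler, url string, r io.Reader, err error) ([]string, error)

func (f SpiderFunc) Parse(c *Crawler, url string, r io.Reader, err error) ([]string, error) {
	return f(c, url, r, err)
}

// SpiderMiddleware wraps a Spider with extra behavior (assumed definition).
type SpiderMiddleware func(Spider) Spider

// reduceMiddlewares wraps spider from the last middleware to the first,
// so ms[0] runs first when Parse is called (an assumed ordering).
func reduceMiddlewares(spider Spider, ms ...SpiderMiddleware) Spider {
	for i := len(ms) - 1; i >= 0; i-- {
		spider = ms[i](spider)
	}
	return spider
}

// tag returns a middleware that records its name before delegating.
func tag(name string, trace *[]string) SpiderMiddleware {
	return func(next Spider) Spider {
		return SpiderFunc(func(c *Crawler, url string, r io.Reader, err error) ([]string, error) {
			*trace = append(*trace, name)
			return next.Parse(c, url, r, err)
		})
	}
}

// baseSpider records its own call and returns one derived URL.
func baseSpider(trace *[]string) Spider {
	return SpiderFunc(func(_ *Crawler, url string, _ io.Reader, _ error) ([]string, error) {
		*trace = append(*trace, "spider")
		return []string{url + "/next"}, nil
	})
}

func main() {
	var trace []string
	s := reduceMiddlewares(baseSpider(&trace), tag("first", &trace), tag("second", &trace))
	links, err := s.Parse(&Crawler{}, "https://example.com", strings.NewReader(""), nil)
	fmt.Println(links, err, trace)
	// prints: [https://example.com/next] <nil> [first second spider]
}
```

With this ordering, middlewares execute in the order they are passed, which is presumably the same order in which they would be given to RegisterSpider.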
type SpiderFunc ¶
SpiderFunc type Spider.
type URLFetcher ¶
type URLFetcher struct {
// contains filtered or unexported fields
}
URLFetcher struct