Documentation ¶
Constants ¶
This section is empty.
Variables ¶
var DefaultOptions = &Options{
	client:          http.DefaultClient,
	limitDuration:   5 * time.Second,
	timeoutDuration: 1 * time.Minute,
	burst:           1,
}
Default options to be used with a `Fetcher` instance.
Functions ¶
This section is empty.
Types ¶
type Fetchable ¶
type Fetchable interface {
	// Unique identifier for this fetchable item. This is useful in logging.
	Id() string

	// Build a request.
	Request() (*http.Request, error)

	// Validate the request before doing the actual fetch. This is useful for
	// example, to check if the store has already fetched the data recently.
	Validate() error

	// Callback to handle the http response corresponding to the request. This
	// can be used for example, to store data into the store, or to parse the
	// results in some way
	HandleResponse(*http.Response) error
}
Interface that defines what can be fetched. The request to fetch is returned by the `Request()` method. Before the actual fetch is performed, the `Validate()` method is called, and fetching proceeds only if it returns a `nil` error. Finally, `HandleResponse()` is the callback invoked with the response when the fetch succeeds.
type Fetcher ¶
type Fetcher struct {
// contains filtered or unexported fields
}
Fetcher is the struct used to download `Fetchable` items.
func NewFetcherWithOptions ¶
Returns a `Fetcher` with the specified options. If any field of the options is equal to its zero value, the corresponding value from `DefaultOptions` is used instead. This allows a caller to specify only the options that differ from the defaults.
func (*Fetcher) Fetch ¶
Performs the actual fetch of a given `Fetchable`. The steps it follows are:
- Build the request by calling `Request()`
- Validate the request by calling `Validate()`
- Wait until the rate limit allows the domain to be crawled, or until `options.timeoutDuration` is exceeded
- Make the HTTP request with the supplied client and call `HandleResponse()` on the response
func (*Fetcher) FetchConcurrentlyWait ¶
Starts `concurrency` goroutines to fetch content from `urlChannel` in parallel. The goroutines end when the `urlChannel` is closed. This method waits until all the launched goroutines are complete.
Note: Make sure you call `close()` on the `urlChannel`; otherwise this method never returns.