Documentation
Index
- Variables
- func NewStoreWrappedFetchable(fetchable Fetchable, store Store) *storeWrappedFetchable
- func ReaderToStringFetchable(ctx context.Context, reader io.Reader, channelBufferSize int) <-chan Fetchable
- type FetchI
- type Fetchable
- type Fetcher
- type Options
- type PebbleStore
- type Store
- type StoreBackedFetcher
- type StoringFetchable
- type StringFetchable
Constants
This section is empty.
Variables
var DefaultOptions = &Options{
	client:          http.DefaultClient,
	limitDuration:   5 * time.Second,
	timeoutDuration: 1 * time.Minute,
	burst:           1,
}
DefaultOptions holds the default options used by a `Fetcher` instance.
Functions
func NewStoreWrappedFetchable (added in v0.1.1)
func NewStoreWrappedFetchable(fetchable Fetchable, store Store) *storeWrappedFetchable
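The documentation does not describe what `NewStoreWrappedFetchable` does. Going by its name and by the `HandleResponseBody` comment on `Fetchable`, which mentions storing data into the store, one plausible shape is a decorator that saves each response body and then delegates to the wrapped item. The sketch below assumes that behaviour and assumes the body is keyed by `Url()`. `storeWrapped`, `newStoreWrapped`, `bodySetter`, `mapStore` and `urlItem` are hypothetical names, and the real function returns an unexported `*storeWrappedFetchable` rather than this value type.

```go
package main

import (
	"fmt"
	"net/http"
)

// Fetchable mirrors the documented interface so this sketch compiles alone.
type Fetchable interface {
	Id() string
	Url() string
	Request() (*http.Request, error)
	HandleResponseBody([]byte) error
}

// bodySetter is the subset of Store that the wrapper needs.
type bodySetter interface {
	Set(key string, body []byte) error
}

// storeWrapped is a hypothetical decorator. Embedding Fetchable forwards
// Id, Url and Request unchanged; only HandleResponseBody is overridden.
type storeWrapped struct {
	Fetchable
	store bodySetter
}

// HandleResponseBody stores the body under the item's URL (an assumed key
// scheme), then hands it to the wrapped item's own handler.
func (w storeWrapped) HandleResponseBody(body []byte) error {
	if err := w.store.Set(w.Url(), body); err != nil {
		return fmt.Errorf("%s: storing body: %w", w.Id(), err)
	}
	return w.Fetchable.HandleResponseBody(body)
}

// newStoreWrapped plays the role of NewStoreWrappedFetchable in this sketch.
func newStoreWrapped(f Fetchable, store bodySetter) Fetchable {
	return storeWrapped{Fetchable: f, store: store}
}

// mapStore is a minimal example store backed by a map.
type mapStore map[string][]byte

func (m mapStore) Set(key string, body []byte) error {
	m[key] = body
	return nil
}

// urlItem is a minimal example Fetchable wrapping a URL string.
type urlItem string

func (u urlItem) Id() string  { return string(u) }
func (u urlItem) Url() string { return string(u) }
func (u urlItem) Request() (*http.Request, error) {
	return http.NewRequest(http.MethodGet, string(u), nil)
}
func (u urlItem) HandleResponseBody([]byte) error { return nil }
```

Embedding the interface means only the overridden method has to be written; the other three methods are promoted from the wrapped value.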
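func ReaderToStringFetchable
func ReaderToStringFetchable(ctx context.Context, reader io.Reader, channelBufferSize int) <-chan Fetchable
Types
type Fetchable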
type Fetchable interface {
	// Unique identifier for this fetchable item. This is useful in logging.
	Id() string
	// URL that this item fetches. It can also be derived from Request();
	// it is kept here to avoid repetition in the codebase.
	Url() string
	// Build a request.
	Request() (*http.Request, error)
	// Callback that handles the HTTP response body for the request. It can be
	// used, for example, to store data in the store or to parse the results.
	HandleResponseBody([]byte) error
}
Fetchable is the interface that defines what can be fetched.
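A minimal sketch of a type that satisfies `Fetchable`, assuming the interface exactly as listed above (it is redeclared so the example compiles on its own). `titleFetchable` is hypothetical: it issues a plain GET and uses `HandleResponseBody` to pull the page title out of the body, the kind of parsing the interface comment mentions.

```go
package main

import (
	"net/http"
	"strings"
)

// Fetchable mirrors the documented interface.
type Fetchable interface {
	Id() string
	Url() string
	Request() (*http.Request, error)
	HandleResponseBody([]byte) error
}

// titleFetchable is a hypothetical Fetchable that keeps the page's <title>.
type titleFetchable struct {
	url   string
	Title string
}

func (t *titleFetchable) Id() string  { return "title:" + t.url }
func (t *titleFetchable) Url() string { return t.url }

// Request builds a plain GET request for the URL.
func (t *titleFetchable) Request() (*http.Request, error) {
	return http.NewRequest(http.MethodGet, t.url, nil)
}

// HandleResponseBody stores the text between <title> and </title>, if any.
func (t *titleFetchable) HandleResponseBody(body []byte) error {
	s := string(body)
	start := strings.Index(s, "<title>")
	if start < 0 {
		return nil
	}
	start += len("<title>")
	end := strings.Index(s[start:], "</title>")
	if end < 0 {
		return nil
	}
	t.Title = s[start : start+end]
	return nil
}

// Compile-time check that *titleFetchable satisfies Fetchable.
var _ Fetchable = (*titleFetchable)(nil)
```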
type Fetcher
type Fetcher struct {
// contains filtered or unexported fields
}
Fetcher downloads `Fetchable` items over HTTP, applying per-domain rate limiting.
func NewFetcherWithOptions
Returns a `Fetcher` configured with the specified options. Any option field left at its zero value falls back to the corresponding value in `DefaultOptions`, so a caller only needs to set the options they want to change.
func (*Fetcher) Fetch
Fetch performs the fetch of a given `Fetchable` in these steps:
- Build the request by calling `Request()`
- Validate the request by calling `Validate()`
- Wait until the rate limit allows the domain to be crawled, or until the options' `timeoutDuration` is exceeded
- Make the HTTP request with the configured client and pass the response body to `HandleResponseBody()`
func (*Fetcher) FetchConcurrentlyWait (added in v0.1.2)
Starts `concurrency` goroutines that fetch items from `urlChannel` in parallel. The goroutines exit when `urlChannel` is closed, and this method blocks until all of them have finished.
Note: the caller must `close()` `urlChannel`; otherwise this method never returns.
type PebbleStore
type PebbleStore struct {
// contains filtered or unexported fields
}
func NewPebbleStore
func NewPebbleStore(dirname string) (*PebbleStore, error)
func (*PebbleStore) Close
func (s *PebbleStore) Close()
func (*PebbleStore) Get
func (s *PebbleStore) Get(key string) (*crawled_url.CrawledUrl, io.Closer, error)
type Store
type Store interface {
	Get(key string) (*crawled_url.CrawledUrl, io.Closer, error)
	Set(key string, body []byte) error
}
type StoreBackedFetcher
type StoreBackedFetcher struct {
// contains filtered or unexported fields
}
func NewStoreBackedFetcher
func NewStoreBackedFetcher(store Store, fetcher *Fetcher, minInterval time.Duration) *StoreBackedFetcher
func (*StoreBackedFetcher) Fetch
func (sbf *StoreBackedFetcher) Fetch(furl StoringFetchable) error
func (*StoreBackedFetcher) FetchConcurrentlyWait (added in v0.1.2)
func (sbf *StoreBackedFetcher) FetchConcurrentlyWait(urlChannel <-chan StoringFetchable, concurrency int)
Starts `concurrency` goroutines that fetch items from `urlChannel` in parallel. The goroutines exit when `urlChannel` is closed, and this method blocks until all of them have finished.
Note: the caller must `close()` `urlChannel`; otherwise this method never returns.
type StoringFetchable
type StringFetchable
type StringFetchable string
StringFetchable wraps a `string` holding a URL so that it can be used as a `Fetchable`.
func (StringFetchable) HandleResponseBody
func (sf StringFetchable) HandleResponseBody(body []byte) error
Always returns nil.