Documentation ¶
Index ¶
- Constants
- Variables
- func DefaultRoundTripper() http.RoundTripper
- func DefaultTemplateFuncMap(cache ski.Cache) template.FuncMap
- func DoByte(fetch ski.Fetch, req *http.Request) ([]byte, error)
- func DoString(fetch ski.Fetch, req *http.Request) (string, error)
- func NewFetch(opt Options) ski.Fetch
- func NewRequest(method, u string, body any, headers map[string]string) (*http.Request, error)
- func NewTemplateRequest(tpl *template.Template, arg any) (*http.Request, error)
- func ProxyFromRequest(req *http.Request) (*url.URL, error)
- func ReadRequest(request string) (req *http.Request, err error)
- func WithRoundRobinProxy(ctx context.Context, proxy ...string) context.Context
- type CacheTransport
- type Options
- type Policy
Constants ¶
const (
	// DefaultMaxBodySize fetch.Response default max body size
	DefaultMaxBodySize int64 = 1024 * 1024 * 1024
	// DefaultRetryTimes fetch.RequestConfig retry times
	DefaultRetryTimes = 3
	// DefaultTimeout fetch.RequestConfig timeout
	DefaultTimeout = time.Minute
)
Variables ¶
var (
	// DefaultRetryHTTPCodes retry fetch.RequestConfig error status code
	DefaultRetryHTTPCodes = []int{http.StatusInternalServerError, http.StatusBadGateway, http.StatusServiceUnavailable,
		http.StatusGatewayTimeout, http.StatusRequestTimeout}
	// DefaultHeaders defaults http headers
	DefaultHeaders = http.Header{
		"Accept":          {"*/*"},
		"Accept-Language": {"en-US,en;"},
		"User-Agent":      {"ski"},
	}
)
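As a rough illustration of how a retry count and a list of retryable status codes can work together, the following sketch uses only the standard library. It is not the package's implementation: `retryCodes`, `shouldRetry`, `getWithRetry`, and `demoRetry` are hypothetical names, and the sketch retries immediately without any backoff.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// retryCodes mirrors the values listed in DefaultRetryHTTPCodes (copied here
// so the sketch stays self-contained).
var retryCodes = []int{
	http.StatusInternalServerError, http.StatusBadGateway,
	http.StatusServiceUnavailable, http.StatusGatewayTimeout,
	http.StatusRequestTimeout,
}

// shouldRetry reports whether code is one of the retryable status codes.
func shouldRetry(code int) bool {
	for _, c := range retryCodes {
		if c == code {
			return true
		}
	}
	return false
}

// getWithRetry issues a GET and repeats it up to retryTimes additional times
// while the response status is retryable. It returns the final status code
// and the number of attempts made.
func getWithRetry(client *http.Client, url string, retryTimes int) (int, int, error) {
	for attempt := 1; ; attempt++ {
		resp, err := client.Get(url)
		if err != nil {
			return 0, attempt, err
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		resp.Body.Close()
		if !shouldRetry(resp.StatusCode) || attempt > retryTimes {
			return resp.StatusCode, attempt, nil
		}
	}
}

// demoRetry runs getWithRetry against a test server that answers 503 twice
// and then 200.
func demoRetry() (int, int) {
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) <= 2 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	status, attempts, err := getWithRetry(srv.Client(), srv.URL, 3)
	if err != nil {
		panic(err)
	}
	return status, attempts
}

func main() {
	status, attempts := demoRetry()
	fmt.Println(status, attempts)
}
```

A production retry loop would normally also wait between attempts and honor the request context's deadline.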
var ErrNoDateHeader = errors.New("no Date header")
ErrNoDateHeader indicates that the HTTP headers contained no Date header.
Functions ¶
func DefaultRoundTripper ¶
func DefaultRoundTripper() http.RoundTripper
DefaultRoundTripper returns the default RoundTripper used by fetch.
func DefaultTemplateFuncMap ¶
DefaultTemplateFuncMap returns the default template function map.
func NewRequest ¶
NewRequest returns a new *http.Request given a method, URL, optional body, and optional headers. Supported body types: slice, map, struct, string, []byte, io.Reader, fmt.Stringer.
func NewTemplateRequest ¶
NewTemplateRequest returns a new *http.Request built by executing an HTTP request template with the given argument.
func ProxyFromRequest ¶
ProxyFromRequest returns the proxy URL carried in the request's context.
func ReadRequest ¶
ReadRequest returns a new *http.Request parsed from the given request string.
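To illustrate what parsing a request from a string can look like, here is a minimal sketch built on the standard library's `http.ReadRequest`. It is an assumption that the package's ReadRequest accepts a similar wire-format string; `parseRawRequest` is a hypothetical helper, and the `http` scheme it fills in is an arbitrary choice for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// parseRawRequest parses an HTTP/1.x request in wire format and adjusts it so
// that it can be sent with an http.Client.
func parseRawRequest(raw string) (*http.Request, error) {
	req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
	if err != nil {
		return nil, err
	}
	// http.ReadRequest is meant for servers: it sets RequestURI and leaves
	// the URL relative. Clients require an empty RequestURI and an absolute URL.
	req.RequestURI = ""
	req.URL.Scheme = "http" // assumed scheme for this sketch
	req.URL.Host = req.Host
	return req, nil
}

func main() {
	raw := "GET /search?q=go HTTP/1.1\r\n" +
		"Host: example.com\r\n" +
		"Accept: */*\r\n" +
		"\r\n"
	req, err := parseRawRequest(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("Accept"))
}
```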
Types ¶
type CacheTransport ¶
type CacheTransport struct {
	Policy Policy
	// The RoundTripper interface actually used to make requests
	// If nil, http.DefaultTransport is used
	Transport http.RoundTripper
	Cache     ski.Cache
	// If true, responses returned from the cache will be given an extra header, X-From-Cache
	MarkCachedResponses bool
}
CacheTransport is an implementation of http.RoundTripper that returns values from a cache where possible (avoiding a network request). It also adds validators (etag/if-modified-since) to repeated requests, allowing servers to respond with 304 Not Modified.
func NewCacheTransport ¶
func NewCacheTransport(c ski.Cache) *CacheTransport
NewCacheTransport returns a new CacheTransport with the provided Cache implementation and MarkCachedResponses set to true.
func (*CacheTransport) RoundTrip ¶
RoundTrip is a wrapper for caching requests. If there is a fresh Response already in cache, then it will be returned without connecting to the server.
func (*CacheTransport) RoundTripDummy ¶
RoundTripDummy has no awareness of any HTTP Cache-Control directives. Every request and its corresponding response are cached. When the same request is seen again, the response is returned without transferring anything from the Internet.
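The replay behaviour described above can be sketched with a small standard-library RoundTripper. This is not the package's code: `dummyCache` and `fetchTwice` are hypothetical names, the cache is a plain in-memory map rather than a ski.Cache, and the cache key (method plus URL) is an assumption.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"sync"
	"sync/atomic"
)

// dummyCache stores every response it sees and replays it for identical
// requests, ignoring all Cache-Control directives.
type dummyCache struct {
	mu    sync.Mutex
	store map[string][]byte // serialized responses keyed by method and URL
	next  http.RoundTripper
}

func (d *dummyCache) RoundTrip(req *http.Request) (*http.Response, error) {
	key := req.Method + " " + req.URL.String()

	d.mu.Lock()
	raw, ok := d.store[key]
	d.mu.Unlock()
	if ok {
		// Cache hit: rebuild the response from its stored wire form.
		return http.ReadResponse(bufio.NewReader(bytes.NewReader(raw)), req)
	}

	resp, err := d.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	// DumpResponse reads the body and replaces it with a copy, so the
	// caller can still read resp.Body afterwards.
	raw, err = httputil.DumpResponse(resp, true)
	if err != nil {
		resp.Body.Close()
		return nil, err
	}
	d.mu.Lock()
	d.store[key] = raw
	d.mu.Unlock()
	return resp, nil
}

// fetchTwice requests the same URL twice through dummyCache and reports how
// many requests reached the server, plus both response bodies.
func fetchTwice() (int, string, string) {
	var hits int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt32(&hits, 1)
		fmt.Fprint(w, "hello")
	}))
	defer srv.Close()

	client := &http.Client{Transport: &dummyCache{
		store: map[string][]byte{},
		next:  http.DefaultTransport,
	}}
	get := func() string {
		resp, err := client.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		defer resp.Body.Close()
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			panic(err)
		}
		return string(body)
	}
	first := get()
	second := get()
	return int(atomic.LoadInt32(&hits)), first, second
}

func main() {
	hits, first, second := fetchTwice()
	fmt.Println(hits, first, second)
}
```

The second request is answered entirely from the map, which is what makes this policy suitable for offline replays and unsuitable for data that changes.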
func (*CacheTransport) RoundTripRFC2616 ¶
RoundTripRFC2616 provides an RFC 2616-compliant HTTP cache, i.e. one that honors HTTP Cache-Control directives. It is aimed at production and continuous runs, where it avoids downloading unmodified data (saving bandwidth and speeding up crawls).
If there is a stale Response, then any validators it contains will be set on the new request to give the server a chance to respond with NotModified. If this happens, then the cached Response will be returned.
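The revalidation step described above can be sketched with the standard library alone. The example below is a simplified illustration, not the package's implementation: it handles only the ETag validator, and `revalidate` is a hypothetical helper that plays both the first fetch and the conditional follow-up against a test server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// revalidate fetches a resource, remembers its ETag, and then repeats the
// request with If-None-Match. It returns the status codes of both responses.
func revalidate() (int, int) {
	const etag = `"v1"`
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			// The client's copy is still current: send no body.
			w.WriteHeader(http.StatusNotModified)
			return
		}
		fmt.Fprint(w, "payload")
	}))
	defer srv.Close()

	// First request: full response, whose validator a cache would store.
	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	cachedETag := resp.Header.Get("ETag")
	io.Copy(io.Discard, resp.Body)
	resp.Body.Close()

	// Second request: present the stored validator to the server.
	req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("If-None-Match", cachedETag)
	resp2, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	resp2.Body.Close()

	return resp.StatusCode, resp2.StatusCode
}

func main() {
	first, second := revalidate()
	fmt.Println(first, second)
}
```

On a 304 response a caching transport would hand the stored response back to the caller instead of the empty 304.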
type Options ¶
type Options struct {
	CharsetAutoDetect bool              `yaml:"charset-auto-detect"`
	MaxBodySize       int64             `yaml:"max-body-size"`
	RetryTimes        int               `yaml:"retry-times"` // greater than or equal 0
	RetryHTTPCodes    []int             `yaml:"retry-http-codes"`
	Timeout           time.Duration     `yaml:"timeout"`
	Headers           http.Header       `yaml:"headers"`
	RoundTripper      http.RoundTripper `yaml:"-"`
	Jar               http.CookieJar    `yaml:"-"`
}
Options holds the options for a fetchImpl instance.
type Policy ¶
type Policy string
Policy selects the caching policy used by CacheTransport.
const (
	// Dummy policy is useful for testing spiders faster (without having to wait for downloads every time)
	// and for trying your spider offline, when an Internet connection is not available.
	// The goal is to be able to “replay” a spider run exactly as it ran before.
	Dummy Policy = "dummy"
	// RFC2616 This policy provides a RFC2616 compliant HTTP cache, i.e. with HTTP Cache-Control awareness,
	// aimed at production and used in continuous runs to avoid downloading unmodified data
	// (to save bandwidth and speed up crawls).
	RFC2616 Policy = "rfc2616"

	// XFromCache is the header added to responses that are returned from the cache
	XFromCache = "X-From-Cache"
)