fetch

package module
v0.0.0-...-6820e3e Latest
Published: Feb 9, 2025 License: MIT Imports: 26 Imported by: 0

README

Fetch

Fetch is the ski.fetch implementation for fetching resources from the network.
It supports:

  • TLS fingerprinting resistance
  • HTTP2 fingerprinting resistance

License

Distributed under the MIT license.

Documentation

Constants

const (
	// DefaultMaxBodySize is the default maximum fetch.Response body size (1 GiB).
	DefaultMaxBodySize int64 = 1024 * 1024 * 1024
	// DefaultRetryTimes is the default fetch.RequestConfig retry count.
	DefaultRetryTimes = 3
	// DefaultTimeout is the default fetch.RequestConfig timeout.
	DefaultTimeout = time.Minute
)

Variables

var (
	// DefaultRetryHTTPCodes lists the HTTP status codes that trigger a fetch.RequestConfig retry.
	DefaultRetryHTTPCodes = []int{http.StatusInternalServerError, http.StatusBadGateway, http.StatusServiceUnavailable,
		http.StatusGatewayTimeout, http.StatusRequestTimeout}
	// DefaultHeaders holds the default HTTP headers.
	DefaultHeaders = http.Header{
		"Accept":          {"*/*"},
		"Accept-Language": {"en-US,en;"},
		"User-Agent":      {"ski"},
	}
)
var ErrNoDateHeader = errors.New("no Date header")

ErrNoDateHeader indicates that the HTTP headers contained no Date header.

Functions

func DecodeResponse

func DecodeResponse(res *http.Response) (*http.Response, error)

DecodeResponse decodes the response body according to the Content-Encoding HTTP header (gzip, deflate, br).
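The following is a minimal sketch of Content-Encoding decoding using only the standard library, not the package's actual implementation. The helpers `decodeBody`, `gzipResponse`, and `bodyString` are hypothetical names. Brotli (`br`) is left out because the standard library has no decoder for it, and the sketch assumes `deflate` bodies use the zlib framing.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"compress/zlib"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// decodeBody is a hypothetical stand-in for DecodeResponse. It replaces
// res.Body with a decompressing reader chosen from Content-Encoding.
func decodeBody(res *http.Response) (*http.Response, error) {
	var (
		r   io.ReadCloser
		err error
	)
	switch strings.ToLower(res.Header.Get("Content-Encoding")) {
	case "gzip":
		r, err = gzip.NewReader(res.Body)
	case "deflate":
		// Assumption: the body uses zlib framing.
		r, err = zlib.NewReader(res.Body)
	default:
		return res, nil // unknown or absent encoding: leave untouched
	}
	if err != nil {
		return nil, err
	}
	// Note: closing r does not close the original body in this sketch.
	res.Body = r
	// The decoded body no longer matches the advertised encoding or length.
	res.Header.Del("Content-Encoding")
	res.Header.Del("Content-Length")
	res.ContentLength = -1
	return res, nil
}

// gzipResponse builds an in-memory gzip-encoded response for the demo.
func gzipResponse(text string) *http.Response {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(text)); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return &http.Response{
		Header: http.Header{"Content-Encoding": {"gzip"}},
		Body:   io.NopCloser(&buf),
	}
}

// bodyString reads the whole response body as a string.
func bodyString(res *http.Response) string {
	b, err := io.ReadAll(res.Body)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	res, err := decodeBody(gzipResponse("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println(bodyString(res))
}
```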

func DefaultRoundTripper

func DefaultRoundTripper() http.RoundTripper

DefaultRoundTripper returns the default RoundTripper used by fetch.

func NewRequest

func NewRequest(method, u string, body any, headers map[string]string) (*http.Request, error)

NewRequest returns a new *http.Request given a method, URL, optional body, and optional headers. Supported body types: slice, map, struct, string, []byte, io.Reader, fmt.Stringer.
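One way the listed body types could be turned into a request body is sketched below. This is an illustration, not the package's code: `toReader`, `newRequest`, and `readBody` are hypothetical helpers, and encoding slices, maps, and structs as JSON is an assumption that the documentation does not confirm.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// toReader converts a supported body value into an io.Reader.
func toReader(body any) (io.Reader, error) {
	switch b := body.(type) {
	case nil:
		return nil, nil
	case io.Reader:
		return b, nil
	case string:
		return strings.NewReader(b), nil
	case []byte:
		return bytes.NewReader(b), nil
	case fmt.Stringer:
		return strings.NewReader(b.String()), nil
	default:
		// Assumption: slices, maps, and structs are sent as JSON.
		data, err := json.Marshal(b)
		if err != nil {
			return nil, err
		}
		return bytes.NewReader(data), nil
	}
}

// newRequest mirrors the shape of NewRequest using the standard library.
func newRequest(method, u string, body any, headers map[string]string) (*http.Request, error) {
	r, err := toReader(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(method, u, r)
	if err != nil {
		return nil, err
	}
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	return req, nil
}

// readBody drains the request body; it returns "" when there is no body.
func readBody(req *http.Request) string {
	if req.Body == nil {
		return ""
	}
	b, err := io.ReadAll(req.Body)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	req, err := newRequest(http.MethodPost, "http://example.com/api",
		map[string]int{"a": 1}, map[string]string{"X-Test": "1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, readBody(req))
}
```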

func NewTemplateRequest

func NewTemplateRequest(tpl *template.Template, arg any) (*http.Request, error)

NewTemplateRequest returns a new Request given an HTTP template and its argument.

func ProxyFromRequest

func ProxyFromRequest(req *http.Request) (*url.URL, error)

ProxyFromRequest returns the proxy URL stored in the request's context.

func ReadRequest

func ReadRequest(request string) (req *http.Request, err error)

ReadRequest returns a new *http.Request parsed from the given raw HTTP request string.

func WithRoundRobinProxy

func WithRoundRobinProxy(ctx context.Context, proxy ...string) context.Context

WithRoundRobinProxy returns a copy of the parent context with the given proxies associated with it.
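The pairing of WithRoundRobinProxy and ProxyFromRequest can be pictured with the sketch below. It is a guess at the general technique, not the package's implementation: `proxyKey`, `roundRobin`, `withRoundRobinProxy`, `proxyFromRequest`, and `nextProxies` are hypothetical names, and the proxy addresses are placeholders. The function shape `func(*http.Request) (*url.URL, error)` is the same shape that the standard `http.Transport.Proxy` field accepts.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
)

// proxyKey is the private context key for the proxy rotation state.
type proxyKey struct{}

// roundRobin holds the proxy list and a counter shared by all requests
// that carry the same context.
type roundRobin struct {
	proxies []string
	next    atomic.Uint64
}

// withRoundRobinProxy attaches the proxy list to a copy of ctx.
func withRoundRobinProxy(ctx context.Context, proxy ...string) context.Context {
	if len(proxy) == 0 {
		return ctx
	}
	return context.WithValue(ctx, proxyKey{}, &roundRobin{proxies: proxy})
}

// proxyFromRequest returns the next proxy in rotation, or nil when the
// request context carries no proxies.
func proxyFromRequest(req *http.Request) (*url.URL, error) {
	rr, ok := req.Context().Value(proxyKey{}).(*roundRobin)
	if !ok {
		return nil, nil
	}
	i := rr.next.Add(1) - 1
	return url.Parse(rr.proxies[i%uint64(len(rr.proxies))])
}

// nextProxies asks for a proxy n times and records each answer.
// It expects at least one proxy.
func nextProxies(n int, proxies ...string) []string {
	ctx := withRoundRobinProxy(context.Background(), proxies...)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://example.com/", nil)
	if err != nil {
		panic(err)
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		u, err := proxyFromRequest(req)
		if err != nil {
			panic(err)
		}
		out = append(out, u.String())
	}
	return out
}

func main() {
	fmt.Println(nextProxies(3, "http://127.0.0.1:8080", "http://127.0.0.1:8081"))
}
```

Storing a pointer in the context means every request derived from that context shares one counter, so the rotation advances across requests rather than restarting each time.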

Types

type Cache

type Cache interface {
	Get(ctx context.Context, key string) ([]byte, error)
	Set(ctx context.Context, key string, value []byte, timeout time.Duration) error
	Del(ctx context.Context, key string) error
}

Cache is an interface for storing byte values under string keys, each with a timeout.

type CacheTransport

type CacheTransport struct {
	Policy Policy
	// The RoundTripper interface actually used to make requests
	// If nil, http.DefaultTransport is used
	Transport http.RoundTripper
	Cache     Cache
	// If true, responses returned from the cache will be given an extra header, X-From-Cache
	MarkCachedResponses bool
}

CacheTransport is an implementation of http.RoundTripper that returns values from a cache where possible (avoiding a network request) and additionally adds validators (ETag/If-Modified-Since) to repeated requests, allowing servers to respond with 304 Not Modified.

func NewCacheTransport

func NewCacheTransport(c Cache) *CacheTransport

NewCacheTransport returns a new CacheTransport with the provided Cache implementation and MarkCachedResponses set to true.

func (*CacheTransport) RoundTrip

func (t *CacheTransport) RoundTrip(req *http.Request) (resp *http.Response, err error)

RoundTrip is a wrapper for caching requests. If there is a fresh Response already in cache, then it will be returned without connecting to the server.

func (*CacheTransport) RoundTripDummy

func (t *CacheTransport) RoundTripDummy(req *http.Request) (resp *http.Response, err error)

RoundTripDummy has no awareness of any HTTP Cache-Control directives. Every request and its corresponding response are cached. When the same request is seen again, the response is returned without transferring anything from the Internet.
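A cache-everything round tripper of this kind could be sketched as follows, using only the standard library. This is not the package's code: `dummyTransport`, `countingTransport`, and `fetchTwice` are hypothetical, and keying entries by method plus URL is an assumption. Responses are serialized with `httputil.DumpResponse` and replayed with `http.ReadResponse`.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httputil"
	"strings"
	"sync"
)

// dummyTransport stores every response and replays it for repeated requests,
// ignoring Cache-Control entirely.
type dummyTransport struct {
	next  http.RoundTripper
	mu    sync.Mutex
	store map[string][]byte
}

func (t *dummyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	key := req.Method + " " + req.URL.String() // assumed cache key
	t.mu.Lock()
	raw, ok := t.store[key]
	t.mu.Unlock()
	if ok {
		// Replay the stored wire-format response without touching the network.
		return http.ReadResponse(bufio.NewReader(bytes.NewReader(raw)), req)
	}
	res, err := t.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	// DumpResponse reads the body and puts back a fresh copy on res.Body.
	raw, err = httputil.DumpResponse(res, true)
	if err != nil {
		return nil, err
	}
	t.mu.Lock()
	t.store[key] = raw
	t.mu.Unlock()
	return res, nil
}

// countingTransport is a fake origin that counts how often it is hit.
type countingTransport struct{ calls int }

func (c *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	c.calls++
	return &http.Response{
		Status: "200 OK", StatusCode: http.StatusOK,
		Proto: "HTTP/1.1", ProtoMajor: 1, ProtoMinor: 1,
		Header:        http.Header{"Content-Type": {"text/plain"}},
		Body:          io.NopCloser(strings.NewReader("hello")),
		ContentLength: 5,
		Request:       req,
	}, nil
}

// fetchTwice sends the same request twice through the dummy cache.
func fetchTwice() (first, second string, originCalls int) {
	origin := &countingTransport{}
	client := &http.Client{Transport: &dummyTransport{next: origin, store: map[string][]byte{}}}
	read := func() string {
		res, err := client.Get("http://example.com/")
		if err != nil {
			panic(err)
		}
		defer res.Body.Close()
		b, err := io.ReadAll(res.Body)
		if err != nil {
			panic(err)
		}
		return string(b)
	}
	first = read()
	second = read()
	return first, second, origin.calls
}

func main() {
	fmt.Println(fetchTwice())
}
```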

func (*CacheTransport) RoundTripRFC2616

func (t *CacheTransport) RoundTripRFC2616(req *http.Request) (resp *http.Response, err error)

RoundTripRFC2616 provides an RFC 2616-compliant HTTP cache, i.e. with HTTP Cache-Control awareness, aimed at production and used in continuous runs to avoid downloading unmodified data (to save bandwidth and speed up crawls).

If there is a stale Response, any validators it contains are set on the new request to give the server a chance to respond with 304 Not Modified. If this happens, the cached Response is returned.
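The revalidation step described above can be sketched with standard-library types. This is a simplified illustration rather than the package's implementation: `addValidators`, `revalidate`, `roundTripFunc`, and `demoRevalidate` are hypothetical names, and freshness calculation and cache storage are left out.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// roundTripFunc adapts a function to http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(req *http.Request) (*http.Response, error) { return f(req) }

// addValidators copies the cached response's validators onto a clone of req
// as conditional request headers.
func addValidators(req *http.Request, cached http.Header) *http.Request {
	out := req.Clone(req.Context())
	if etag := cached.Get("ETag"); etag != "" {
		out.Header.Set("If-None-Match", etag)
	}
	if lm := cached.Get("Last-Modified"); lm != "" {
		out.Header.Set("If-Modified-Since", lm)
	}
	return out
}

// revalidate sends a conditional request and falls back to the cached
// response when the server answers 304 Not Modified.
func revalidate(next http.RoundTripper, req *http.Request, cached *http.Response) (*http.Response, error) {
	res, err := next.RoundTrip(addValidators(req, cached.Header))
	if err != nil {
		return nil, err
	}
	if res.StatusCode == http.StatusNotModified {
		res.Body.Close()
		return cached, nil
	}
	return res, nil
}

// demoRevalidate runs revalidate against a fake origin that always
// answers 304 and records the validator it received.
func demoRevalidate() (validator, body string) {
	cached := &http.Response{
		StatusCode: http.StatusOK,
		Header:     http.Header{},
		Body:       io.NopCloser(strings.NewReader("cached body")),
	}
	cached.Header.Set("ETag", `"v1"`)

	origin := roundTripFunc(func(req *http.Request) (*http.Response, error) {
		validator = req.Header.Get("If-None-Match")
		return &http.Response{
			StatusCode: http.StatusNotModified,
			Header:     http.Header{},
			Body:       http.NoBody,
		}, nil
	})

	req, err := http.NewRequest(http.MethodGet, "http://example.com/", nil)
	if err != nil {
		panic(err)
	}
	res, err := revalidate(origin, req, cached)
	if err != nil {
		panic(err)
	}
	b, err := io.ReadAll(res.Body)
	if err != nil {
		panic(err)
	}
	return validator, string(b)
}

func main() {
	fmt.Println(demoRevalidate())
}
```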

type Decoder

type Decoder http.Transport

Decoder decodes the response body according to the Content-Encoding HTTP header (gzip, deflate, br).

func (*Decoder) RoundTrip

func (t *Decoder) RoundTrip(req *http.Request) (*http.Response, error)

type Fetch

type Fetch struct {
	*http.Client
	// contains filtered or unexported fields
}

func NewFetch

func NewFetch(opt Options) *Fetch

NewFetch returns a new ski.Fetch instance.

func (*Fetch) Do

func (f *Fetch) Do(req *http.Request) (res *http.Response, err error)

Do sends an HTTP request and returns an HTTP response, following policy (such as redirects, cookies, auth) as configured on the client.

type Options

type Options struct {
	CharsetAutoDetect bool              `yaml:"charset-auto-detect"`
	MaxBodySize       int64             `yaml:"max-body-size"`
	RetryTimes        int               `yaml:"retry-times"` // must be greater than or equal to 0
	RetryHTTPCodes    []int             `yaml:"retry-http-codes"`
	Timeout           time.Duration     `yaml:"timeout"`
	Headers           http.Header       `yaml:"headers"`
	RoundTripper      http.RoundTripper `yaml:"-"`
	Jar               http.CookieJar    `yaml:"-"`
}

Options holds the Fetch instance options.
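To make the RetryTimes and RetryHTTPCodes options concrete, here is a rough sketch of a status-code-driven retry loop. It is not the package's retry logic: `doWithRetry`, `contains`, `flakyTransport`, and `demoRetry` are hypothetical, the sketch assumes RetryTimes counts retries after the first attempt, and it only handles requests without a body (a body would need to be rewound before each retry).

```go
package main

import (
	"fmt"
	"net/http"
)

// contains reports whether code is one of the retryable status codes.
func contains(codes []int, code int) bool {
	for _, c := range codes {
		if c == code {
			return true
		}
	}
	return false
}

// doWithRetry repeats the request while it fails or returns a retryable
// status, up to retryTimes additional attempts.
func doWithRetry(client *http.Client, req *http.Request, retryTimes int, retryCodes []int) (*http.Response, error) {
	for attempt := 0; ; attempt++ {
		res, err := client.Do(req)
		retryable := err != nil || contains(retryCodes, res.StatusCode)
		if !retryable || attempt >= retryTimes {
			return res, err
		}
		if res != nil {
			res.Body.Close() // release the connection before retrying
		}
	}
}

// flakyTransport is a fake origin that fails a fixed number of times
// with 503 before answering 200.
type flakyTransport struct{ calls, failures int }

func (t *flakyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	t.calls++
	status := http.StatusOK
	if t.calls <= t.failures {
		status = http.StatusServiceUnavailable
	}
	return &http.Response{
		StatusCode: status,
		Header:     http.Header{},
		Body:       http.NoBody,
		Request:    req,
	}, nil
}

// demoRetry reports the final status and how many attempts were made.
func demoRetry() (status, attempts int) {
	rt := &flakyTransport{failures: 2}
	client := &http.Client{Transport: rt}
	req, err := http.NewRequest(http.MethodGet, "http://example.com/", nil)
	if err != nil {
		panic(err)
	}
	res, err := doWithRetry(client, req, 3, []int{http.StatusServiceUnavailable})
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	return res.StatusCode, rt.calls
}

func main() {
	fmt.Println(demoRetry())
}
```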

type Policy

type Policy string

Policy selects the caching behavior used by CacheTransport.

const (

	// Dummy policy is useful for testing spiders faster (without having to wait for downloads every time)
	// and for trying your spider offline, when an Internet connection is not available.
	// The goal is to be able to “replay” a spider run exactly as it ran before.
	Dummy Policy = "dummy"

	// RFC2616 provides an RFC 2616-compliant HTTP cache, i.e. with HTTP Cache-Control awareness,
	// aimed at production and used in continuous runs to avoid downloading unmodified data
	// (to save bandwidth and speed up crawls).
	RFC2616 Policy = "rfc2616"

	// XFromCache is the header added to responses that are returned from the cache
	XFromCache = "X-From-Cache"
)

Directories

Path Synopsis
Package http2 implements the HTTP/2 protocol.