package client

v1.1.6 (Latest)
Warning: This package is not in the latest version of its module.
Published: Jul 2, 2024 | License: MPL-2.0 | Imports: 26 | Imported by: 0

Documentation

Constants

const (
	DefaultUserAgent        = "Geziyor 2.0"
	DefaultMaxBody    int64 = 1024 * 1024 * 1024 // 1GB
	DefaultRetryTimes       = 2
)

Default values for the client.

Variables

var (
	DefaultRetryHTTPCodes = []int{500, 502, 503, 504, 522, 524, 408}
)
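The retry defaults above can be read as a simple policy. The sketch below is a minimal illustration, assuming a response is retried only when its status code is listed in the retry codes and the retry budget (DefaultRetryTimes) is not yet used up; the helper name shouldRetry is hypothetical and not part of this package.

```go
package main

import "fmt"

// defaultRetryHTTPCodes mirrors DefaultRetryHTTPCodes above.
var defaultRetryHTTPCodes = []int{500, 502, 503, 504, 522, 524, 408}

// shouldRetry is a hypothetical helper (assumed policy, not the library's code):
// it reports whether a response with the given status should be retried,
// given how many retries were already made and the maximum allowed.
func shouldRetry(status, retriesSoFar, maxRetries int, codes []int) bool {
	if retriesSoFar >= maxRetries {
		return false // retry budget exhausted
	}
	for _, c := range codes {
		if c == status {
			return true
		}
	}
	return false
}

func main() {
	const defaultRetryTimes = 2 // mirrors DefaultRetryTimes
	fmt.Println(shouldRetry(503, 0, defaultRetryTimes, defaultRetryHTTPCodes)) // true
	fmt.Println(shouldRetry(503, 2, defaultRetryTimes, defaultRetryHTTPCodes)) // false: retries exhausted
	fmt.Println(shouldRetry(404, 0, defaultRetryTimes, defaultRetryHTTPCodes)) // false: 404 not listed
}
```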
var (
	// ErrNoCookieJar is the error returned when no cookie jar is available
	ErrNoCookieJar = errors.New("cookie jar is not available")
)

Functions

func Chrome126HelloSpec added in v1.1.6

func Chrome126HelloSpec() (utls.ClientHelloSpec, error)

func ConvertHeaderToMap

func ConvertHeaderToMap(header http.Header) map[string]interface{}

ConvertHeaderToMap converts an http.Header to a map[string]interface{}.

func ConvertMapToHeader

func ConvertMapToHeader(m map[string]interface{}) http.Header

ConvertMapToHeader converts a map[string]interface{} to an http.Header.

func NewRedirectionHandler

func NewRedirectionHandler(maxRedirect int) func(req *http.Request, via []*http.Request) error

NewRedirectionHandler returns a redirect-policy function (with the signature of http.Client.CheckRedirect) that limits the number of followed redirects to maxRedirect.

func RoundRobinProxy

func RoundRobinProxy(proxyURLs ...string) func(*http.Request) (*url.URL, error)

RoundRobinProxy creates a proxy switcher function that rotates through proxyURLs on every request. The proxy type is determined by the URL scheme: "http", "https" and "socks5" are supported. If the scheme is empty, "http" is assumed.
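The returned function has the signature of http.Transport.Proxy. The sketch below shows one way to implement the rotation described above, assuming an atomic counter for concurrent safety and per-call URL parsing; roundRobinProxy is a hypothetical stand-in, not the package's code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
	"sync/atomic"
)

// roundRobinProxy is a hypothetical sketch of RoundRobinProxy. Each call picks
// the next proxy URL in order; a missing scheme defaults to "http".
func roundRobinProxy(proxyURLs ...string) func(*http.Request) (*url.URL, error) {
	var next uint64
	return func(*http.Request) (*url.URL, error) {
		if len(proxyURLs) == 0 {
			return nil, nil // nil URL, nil error: connect directly
		}
		i := atomic.AddUint64(&next, 1) - 1 // safe under concurrent requests
		raw := proxyURLs[i%uint64(len(proxyURLs))]
		if !strings.Contains(raw, "://") {
			raw = "http://" + raw
		}
		u, err := url.Parse(raw)
		if err != nil {
			return nil, err
		}
		switch u.Scheme {
		case "http", "https", "socks5":
			return u, nil
		default:
			return nil, fmt.Errorf("unsupported proxy scheme %q", u.Scheme)
		}
	}
}

func main() {
	proxy := roundRobinProxy("127.0.0.1:8080", "socks5://127.0.0.1:1080")
	req, _ := http.NewRequest(http.MethodGet, "http://example.com/", nil)
	for i := 0; i < 3; i++ {
		u, err := proxy(req)
		fmt.Println(u, err)
	}
	// Typical use: &http.Transport{Proxy: roundRobinProxy("host:port", ...)}
}
```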

func SetDefaultHeader

func SetDefaultHeader(header http.Header, key string, value string) http.Header

SetDefaultHeader sets the header key to value only if it is not already set, and returns the header.

Types

type Client

type Client struct {
	*req.Client

	Histogram cmap.ConcurrentMap[string, int]
	// contains filtered or unexported fields
}

Client is a small wrapper around *req.Client that provides additional methods.

func NewClient

func NewClient(opt *Options) *Client

NewClient creates a Client configured with values suited to a typical web scraper.

func (*Client) Cookies

func (c *Client) Cookies(URL string) []*http.Cookie

Cookies returns the cookies to send in a request for the given URL.

func (*Client) DoRequest

func (c *Client) DoRequest(req *Request) (resp *Response, err error)

DoRequest selects the appropriate request handler (the HTTP client or Chrome) and performs the request.

func (*Client) SetClientCookies added in v1.0.9

func (c *Client) SetClientCookies(URL string, cookies []*http.Cookie) error

SetClientCookies handles the receipt of the cookies in a reply for the given URL.

type ClientRequestMiddleware

type ClientRequestMiddleware interface {
	BeforeRequest(req *http.Request)
}

type MyClient added in v1.1.4

type MyClient struct {
	*req.Client
}

func NewMyClient added in v1.1.4

func NewMyClient() *MyClient

func (*MyClient) ImpersonateChrome120 added in v1.1.4

func (c *MyClient) ImpersonateChrome120() *MyClient

func (*MyClient) SetTLSFingerprintFromSpec added in v1.1.6

func (c *MyClient) SetTLSFingerprintFromSpec(clientHelloSpec utls.ClientHelloSpec) *MyClient

type Options

type Options struct {
	MaxBodySize           int64
	CharsetDetectDisabled bool
	RetryTimes            int
	RetryHTTPCodes        []int
	RemoteAllocatorURL    string
	AllocatorOptions      []chromedp.ExecAllocatorOption
	ProxyFunc             func(*http.Request) (*url.URL, error)
	// Changing this will override the existing default PreActions for Rendered requests.
	// The Geziyor Response will be nearly empty, because there is no way to extract the response without the default pre-actions.
	// So, if you set this, you should handle all navigation, header setting, and response handling yourself.
	// See the defaultPreActions variable for the existing defaults.
	PreActions        []chromedp.Action
	RequestMiddleware []ClientRequestMiddleware
}

Options holds custom options for the HTTP client.

type ProxyURLKey

type ProxyURLKey int

type Request

type Request struct {
	*http.Request

	// Meta contains arbitrary data.
	// Use this Meta map to store contextual data between your requests
	Meta map[string]interface{}

	// If true, requests will be synchronized
	Synchronized bool

	// If true, the request will be opened in Chrome and the
	// fully rendered HTML DOM will be returned as the response
	Rendered bool

	// Optional response body encoding. Leave empty for automatic detection.
	// If you're having issues with auto detection, set this.
	Encoding string

	// Set this to true to cancel the request. Should be used in middlewares.
	Cancelled bool

	// Chrome actions to be run if the request is Rendered
	Actions []chromedp.Action
	// contains filtered or unexported fields
}

Request is a small wrapper around *http.Request that adds metadata and a rendering option.

func NewRequest

func NewRequest(method, url string, body io.Reader) (*Request, error)

NewRequest returns a new Request given a method, URL, and optional body.

func (*Request) Cancel

func (r *Request) Cancel()

Cancel cancels the request.

type Response

type Response struct {
	*http.Response

	// Response body
	Body []byte

	// goquery Document object. If the response IsHTML, it is non-nil.
	HTMLDoc *goquery.Document

	Request *Request
}

Response wraps http.Response. It contains parsed response data and Geziyor functions.

func (*Response) IsHTML

func (r *Response) IsHTML() bool

IsHTML reports whether the response content is HTML by looking at the Content-Type header.

func (*Response) JoinURL

func (r *Response) JoinURL(relativeURL string) string

JoinURL joins the base response URL and the provided relative URL.

Deprecated: Use response.Request.URL.Parse(relativeURL) instead.
