hybrid

package
v0.0.0-...-24f6000
Warning

This package is not in the latest version of its module.

Published: Aug 20, 2023 | License: MIT | Imports: 28 | Imported by: 0

Documentation

Overview

Package hybrid implements the functionality for a hybrid-headless crawler. It uses both a headless browser and net/http for making requests, and goquery for processing raw and DOM-rendered web page HTML.

Constants

This section is empty.

Variables

This section is empty.

Functions

func FetchContinueRequest

func FetchContinueRequest(page *rod.Page, e *proto.FetchRequestPaused) error

FetchContinueRequest continues the paused request.

func FetchGetResponseBody

func FetchGetResponseBody(page *rod.Page, e *proto.FetchRequestPaused) ([]byte, error)

FetchGetResponseBody gets the response body of the paused request.

Types

type Crawler

type Crawler struct {
	*common.Shared
	// contains filtered or unexported fields
}

Crawler is a hybrid crawler instance.

func New

func New(options *types.CrawlerOptions) (*Crawler, error)

New returns a new hybrid crawler instance.

func (*Crawler) Close

func (c *Crawler) Close() error

Close closes the crawler process

func (*Crawler) Crawl

func (c *Crawler) Crawl(rootURL string) error

Crawl crawls a URL with the specified options

type Hijack

type Hijack struct {
	// contains filtered or unexported fields
}

Hijack is a hijack handler

func NewHijack

func NewHijack(page *rod.Page) *Hijack

NewHijack creates a Hijack from a page.

func (*Hijack) SetPattern

func (h *Hijack) SetPattern(pattern *proto.FetchRequestPattern)

SetPattern sets the request pattern directly.

func (*Hijack) Start

func (h *Hijack) Start(handler HijackHandler) func() error

Start starts hijacking requests with the given handler.

func (*Hijack) Stop

func (h *Hijack) Stop() error

Stop stops the hijack.

type HijackHandler

type HijackHandler = func(e *proto.FetchRequestPaused) error

HijackHandler is the handler function type called with a paused request event.
