scraper

package
v1.2.3
Published: Nov 30, 2024 License: Apache-2.0 Imports: 15 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CollyScraper

type CollyScraper struct {
	Collector             *colly.Collector
	Transport             *http.Transport
	Response              *http.Response
	TimeoutSeconds        int
	LoadingTimeoutSeconds int
	UserAgent             string

	Silently bool
	// contains filtered or unexported fields
}

func (*CollyScraper) CanRenderPage

func (s *CollyScraper) CanRenderPage() bool

func (*CollyScraper) EvalJS

func (s *CollyScraper) EvalJS(jsProp string) (*string, error)

EvalJS is not supported by CollyScraper: Colly fetches pages without executing JavaScript.

func (*CollyScraper) Init

func (s *CollyScraper) Init() error

func (*CollyScraper) Scrape

func (s *CollyScraper) Scrape(paramURL string) (*ScrapedData, error)

func (*CollyScraper) SetDepth

func (s *CollyScraper) SetDepth(depth int)

type GoWapTransport

type GoWapTransport struct {
	*http.Transport
	// contains filtered or unexported fields
}

func NewGoWapTransport

func NewGoWapTransport(t *http.Transport, f func(resp *http.Response)) *GoWapTransport

func (*GoWapTransport) RoundTrip

func (gt *GoWapTransport) RoundTrip(req *http.Request) (*http.Response, error)
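
The signature of NewGoWapTransport suggests a wrapper around *http.Transport that hands each response to a callback. The package's internals are not shown on this page, so the following is only a minimal sketch of that pattern. The names hookTransport, newHookTransport, fetchStatus and demo are hypothetical, and the assumption that the callback runs after every successful round trip is mine, not documented behavior.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// hookTransport is a hypothetical stand-in for GoWapTransport: it embeds
// *http.Transport and calls onResponse after each successful round trip.
type hookTransport struct {
	*http.Transport
	onResponse func(resp *http.Response)
}

// newHookTransport mirrors the shape of NewGoWapTransport (assumed behavior).
func newHookTransport(t *http.Transport, f func(resp *http.Response)) *hookTransport {
	return &hookTransport{Transport: t, onResponse: f}
}

// RoundTrip delegates to the embedded transport, then invokes the callback.
func (ht *hookTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := ht.Transport.RoundTrip(req)
	if err == nil && ht.onResponse != nil {
		ht.onResponse(resp)
	}
	return resp, err
}

// fetchStatus performs a GET through a hookTransport and returns the
// status code that the callback observed.
func fetchStatus(url string) (int, error) {
	seen := 0
	client := &http.Client{
		Transport: newHookTransport(&http.Transport{}, func(resp *http.Response) {
			seen = resp.StatusCode
		}),
	}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return seen, nil
}

// demo runs fetchStatus against a local test server, so no real network
// access is needed.
func demo() (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusTeapot)
	}))
	defer srv.Close()
	return fetchStatus(srv.URL)
}

func main() {
	status, err := demo()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("callback saw status:", status)
}
```

Embedding *http.Transport keeps the transport's other methods and fields available on the wrapper, while the explicit RoundTrip method takes precedence over the promoted one.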

type RodScraper

type RodScraper struct {
	Browser               *rod.Browser
	Page                  *rod.Page
	TimeoutSeconds        int
	LoadingTimeoutSeconds int
	UserAgent             string

	Silently bool
	// contains filtered or unexported fields
}

func (*RodScraper) CanRenderPage

func (s *RodScraper) CanRenderPage() bool

func (*RodScraper) EvalJS

func (s *RodScraper) EvalJS(jsProp string) (*string, error)

func (*RodScraper) Init

func (s *RodScraper) Init() error

func (*RodScraper) Scrape

func (s *RodScraper) Scrape(paramURL string) (*ScrapedData, error)

func (*RodScraper) SetDepth

func (s *RodScraper) SetDepth(depth int)

type ScrapedData

type ScrapedData struct {
	URLs       ScrapedURL
	HTML       string
	Headers    map[string][]string
	Scripts    []string
	Cookies    map[string]string
	Meta       map[string][]string
	DNS        map[string][]string
	CertIssuer []string
}

type ScrapedURL

type ScrapedURL struct {
	URL    string `json:"url,omitempty"`
	Status int    `json:"status,omitempty"`
}
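
The json tags on ScrapedURL use omitempty, so zero-valued fields are left out of the encoded output. A small sketch of the effect, using a local copy of the struct so the example is self-contained (the encode helper is hypothetical):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScrapedURL is copied from the type definition above.
type ScrapedURL struct {
	URL    string `json:"url,omitempty"`
	Status int    `json:"status,omitempty"`
}

// encode marshals a ScrapedURL and returns the JSON as a string.
func encode(u ScrapedURL) (string, error) {
	b, err := json.Marshal(u)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	full, _ := encode(ScrapedURL{URL: "https://example.com", Status: 200})
	fmt.Println(full) // {"url":"https://example.com","status":200}

	// Empty URL and zero Status are both omitted.
	empty, _ := encode(ScrapedURL{})
	fmt.Println(empty) // {}
}
```

One consequence is that a Status of 0 and a missing status are indistinguishable after encoding.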

type Scraper

type Scraper interface {
	Init() error
	CanRenderPage() bool
	Scrape(paramURL string) (*ScrapedData, error)
	EvalJS(jsProp string) (*string, error)
	SetDepth(depth int)
}

Scraper is the interface implemented by the different scraping backends (colly, rod).
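
Because both backends satisfy the same interface, calling code can be written against Scraper and check CanRenderPage before relying on JavaScript evaluation. The sketch below redeclares the interface and a trimmed ScrapedData locally so it compiles on its own. stubScraper and jsProperty are hypothetical, and treating CanRenderPage as a guard for EvalJS is an assumption about intended usage rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// ScrapedData is a trimmed local copy of the documented struct.
type ScrapedData struct {
	HTML string
}

// Scraper is copied from the interface definition above.
type Scraper interface {
	Init() error
	CanRenderPage() bool
	Scrape(paramURL string) (*ScrapedData, error)
	EvalJS(jsProp string) (*string, error)
	SetDepth(depth int)
}

// stubScraper is a hypothetical backend that does not render pages.
type stubScraper struct{ depth int }

func (s *stubScraper) Init() error         { return nil }
func (s *stubScraper) CanRenderPage() bool { return false }
func (s *stubScraper) SetDepth(depth int)  { s.depth = depth }

func (s *stubScraper) Scrape(paramURL string) (*ScrapedData, error) {
	return &ScrapedData{HTML: "<html><!-- " + paramURL + " --></html>"}, nil
}

func (s *stubScraper) EvalJS(jsProp string) (*string, error) {
	return nil, errors.New("stubScraper cannot evaluate JavaScript")
}

// jsProperty evaluates jsProp only when the scraper can render pages.
// The boolean result reports whether a value was obtained.
func jsProperty(s Scraper, jsProp string) (string, bool) {
	if !s.CanRenderPage() {
		return "", false
	}
	v, err := s.EvalJS(jsProp)
	if err != nil || v == nil {
		return "", false
	}
	return *v, true
}

func main() {
	var s Scraper = &stubScraper{}
	if err := s.Init(); err != nil {
		fmt.Println("init failed:", err)
		return
	}
	s.SetDepth(1)

	data, err := s.Scrape("https://example.com")
	if err != nil {
		fmt.Println("scrape failed:", err)
		return
	}
	fmt.Println("html length:", len(data.HTML))

	if _, ok := jsProperty(s, "jQuery.fn.jquery"); !ok {
		fmt.Println("JavaScript evaluation unavailable for this backend")
	}
}
```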
