remilia

v0.5.5
Published: Jun 30, 2024 License: MIT Imports: 20 Imported by: 0

README

Remilia

Remilia is a high-performance web scraping framework designed for efficiency. It enables users to concentrate on extracting and utilizing web content, delegating the complexity of web scraping processes to the framework.

Features

  • Clean API & elegant mental model
  • Concurrency support
  • Configurable backoff retry algorithm
  • Pre-request & post-response hook support

Example

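package main

import (
    "fmt"

    "github.com/PuerkitoBio/goquery"
    "github.com/ShroXd/remilia"
)

func main() {
    // titleParser prints the text of every <h1> element on the page.
    titleParser := func(in *goquery.Document, put remilia.Put[string]) {
        in.Find("h1").Each(func(i int, s *goquery.Selection) {
            fmt.Println(s.Text())
        })
    }

    rem, err := remilia.New()
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    // URLProvider and AddLayer are the v0.5.5 method names listed in the index.
    err = rem.Do(rem.URLProvider("https://go.dev/"), rem.AddLayer(titleParser))
    if err != nil {
        fmt.Println("Error:", err)
    }
}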

Install

go get -u github.com/ShroXd/remilia

License

This project is licensed under the MIT License. See the LICENSE file for details.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func WithWorkLinearAttempt added in v0.5.4

func WithWorkLinearAttempt(a uint8) exponentialBackoffOptionFunc

func WithWorkMaxAttempt added in v0.5.4

func WithWorkMaxAttempt(a uint8) exponentialBackoffOptionFunc

func WithWorkMaxDelay added in v0.5.4

func WithWorkMaxDelay(d time.Duration) exponentialBackoffOptionFunc

func WithWorkMinDelay added in v0.5.4

func WithWorkMinDelay(d time.Duration) exponentialBackoffOptionFunc

func WithWorkMultiplier added in v0.5.4

func WithWorkMultiplier(m float64) exponentialBackoffOptionFunc
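
The five options above take a minimum delay, a maximum delay, a multiplier, a linear attempt count, and a maximum attempt count. The index does not document how they combine, so the following is only a sketch of one plausible schedule. It assumes the first linearAttempt retries wait a flat minDelay and later retries grow by the multiplier until they reach maxDelay. The backoffConfig type and the delayFor function are hypothetical and are not part of remilia.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// backoffConfig mirrors the five knobs named by the With* options.
// The field names and the formula below are assumptions for illustration.
type backoffConfig struct {
	minDelay      time.Duration
	maxDelay      time.Duration
	multiplier    float64
	linearAttempt uint8
	maxAttempt    uint8
}

// delayFor returns the wait before retry number attempt (1-based).
// The second result is false once attempt exceeds maxAttempt.
func delayFor(c backoffConfig, attempt uint8) (time.Duration, bool) {
	if attempt == 0 || attempt > c.maxAttempt {
		return 0, false
	}
	if attempt <= c.linearAttempt {
		return c.minDelay, true // flat phase (assumed behaviour)
	}
	exp := float64(attempt - c.linearAttempt)
	f := float64(c.minDelay) * math.Pow(c.multiplier, exp)
	// Compare as floats first so a huge value never overflows time.Duration.
	if f > float64(c.maxDelay) {
		return c.maxDelay, true
	}
	return time.Duration(f), true
}

func main() {
	c := backoffConfig{
		minDelay:      100 * time.Millisecond,
		maxDelay:      time.Second,
		multiplier:    2,
		linearAttempt: 2,
		maxAttempt:    6,
	}
	for a := uint8(1); a <= 7; a++ {
		d, ok := delayFor(c, a)
		fmt.Println(a, d, ok)
	}
}
```

With these sample values the delays run 100ms, 100ms, 200ms, 400ms, 800ms, then cap at 1s, and the seventh attempt is rejected.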

Types

type Bucket added in v0.5.5

type Bucket struct {
	// contains filtered or unexported fields
}

func NewBucket added in v0.5.5

func NewBucket(clock Clock, capacity int64, fillInterval time.Duration, fillQuantum int64, availableTokens int64) *Bucket

TODO: revisit these parameter types; time-based types should be used instead of int64

func (*Bucket) Take added in v0.5.5

func (b *Bucket) Take(count int64) time.Duration

TODO: use tick instead of time

func (*Bucket) Wrap added in v0.5.5

func (b *Bucket) Wrap(op func() error) ExecutableFunc
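
The NewBucket parameters (capacity, fillInterval, fillQuantum, availableTokens) and a Take that returns a time.Duration suggest a token bucket in which callers reserve tokens and are told how long to wait. The sketch below illustrates that general technique under simplifying assumptions: it is single-goroutine, it takes the current time as an argument instead of using a Clock, and it lets the balance go negative to represent borrowed tokens. The tokenBucket type and its take method are hypothetical and are not remilia's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket is a simplified token bucket for illustration only.
type tokenBucket struct {
	capacity     int64
	fillInterval time.Duration
	fillQuantum  int64
	available    int64 // may go negative when tokens are borrowed
	lastFill     time.Time
}

// take reserves count tokens at time now and returns how long the caller
// should wait before the reserved tokens are actually available.
func (b *tokenBucket) take(now time.Time, count int64) time.Duration {
	// Credit the tokens produced by whole intervals elapsed since lastFill.
	ticks := int64(now.Sub(b.lastFill) / b.fillInterval)
	if ticks > 0 {
		b.available += ticks * b.fillQuantum
		if b.available > b.capacity {
			b.available = b.capacity
		}
		b.lastFill = b.lastFill.Add(time.Duration(ticks) * b.fillInterval)
	}
	b.available -= count
	if b.available >= 0 {
		return 0
	}
	// Round the deficit up to whole refill intervals.
	deficit := -b.available
	needTicks := (deficit + b.fillQuantum - 1) / b.fillQuantum
	return b.lastFill.Add(time.Duration(needTicks) * b.fillInterval).Sub(now)
}

func main() {
	start := time.Unix(0, 0)
	b := &tokenBucket{
		capacity:     2,
		fillInterval: time.Second,
		fillQuantum:  1,
		available:    2,
		lastFill:     start,
	}
	fmt.Println(b.take(start, 2)) // the bucket holds enough tokens
	fmt.Println(b.take(start, 1)) // one token short: wait one interval
}
```

Passing the time in explicitly keeps the sketch deterministic; a production version would read it from an injected clock and guard the state with a mutex.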

type Client added in v0.2.0

type Client struct {
	// contains filtered or unexported fields
}

type ClientOptionFunc added in v0.3.0

type ClientOptionFunc optionFunc[*Client]

TODO: is it good practice to mix options for network requests with options for custom functionality?

func WithBaseURL added in v0.3.0

func WithBaseURL(url string) ClientOptionFunc

func WithHeaders added in v0.3.0

func WithHeaders(headers map[string]string) ClientOptionFunc

func WithLinearAttempt added in v0.3.0

func WithLinearAttempt(a uint8) ClientOptionFunc

func WithMaxAttempt added in v0.3.0

func WithMaxAttempt(a uint8) ClientOptionFunc

func WithMaxDelay added in v0.3.0

func WithMaxDelay(d time.Duration) ClientOptionFunc

func WithMinDelay added in v0.3.0

func WithMinDelay(d time.Duration) ClientOptionFunc

func WithMultiplier added in v0.3.0

func WithMultiplier(m float64) ClientOptionFunc

func WithPostResponseHooks added in v0.3.0

func WithPostResponseHooks(hooks ...ResponseHook) ClientOptionFunc

func WithPreRequestHooks added in v0.3.0

func WithPreRequestHooks(hooks ...RequestHook) ClientOptionFunc

func WithTimeout added in v0.3.0

func WithTimeout(timeout time.Duration) ClientOptionFunc

func WithTransformer added in v0.5.0

func WithTransformer(transformer transform.Transformer) ClientOptionFunc

func WithUserAgentGenerator added in v0.5.2

func WithUserAgentGenerator(fn func() string) ClientOptionFunc
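
The With* constructors above follow the functional options pattern: each returns a function that adjusts a value during construction. The definition of the unexported optionFunc type is not shown in this index, so the sketch below assumes an option signature that returns an error. The client type, its fields, the default timeout, and every lowercase name are hypothetical stand-ins rather than remilia's own code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// client is a hypothetical stand-in for a type with unexported fields.
type client struct {
	baseURL string
	timeout time.Duration
	headers map[string]string
}

// option adjusts a client during construction (assumed signature).
type option func(*client) error

func withBaseURL(url string) option {
	return func(c *client) error {
		if url == "" {
			return errors.New("base URL must not be empty")
		}
		c.baseURL = url
		return nil
	}
}

func withTimeout(d time.Duration) option {
	return func(c *client) error {
		if d <= 0 {
			return errors.New("timeout must be positive")
		}
		c.timeout = d
		return nil
	}
}

func withHeaders(h map[string]string) option {
	return func(c *client) error {
		for k, v := range h {
			c.headers[k] = v // copy so the caller's map is never shared
		}
		return nil
	}
}

// newClient starts from defaults and applies the options in order.
func newClient(opts ...option) (*client, error) {
	c := &client{timeout: 30 * time.Second, headers: map[string]string{}}
	for _, opt := range opts {
		if err := opt(c); err != nil {
			return nil, err
		}
	}
	return c, nil
}

func main() {
	c, err := newClient(
		withBaseURL("https://go.dev"),
		withTimeout(5*time.Second),
		withHeaders(map[string]string{"Accept": "text/html"}),
	)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(c.baseURL, c.timeout, c.headers["Accept"])
}
```

Because options are applied in order, a later option overrides an earlier one that sets the same field.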

type Clock added in v0.5.5

type Clock interface {
	Now() time.Time
	Sleep(d time.Duration)
}
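
An interface with Now and Sleep is commonly used so that time-dependent code can be tested without real waiting. The sketch below redeclares the Clock interface locally and shows two hypothetical implementations: realClock, which delegates to the time package, and fakeClock, whose Sleep advances a stored time instead of blocking. Neither type nor the elapsed helper is part of remilia.

```go
package main

import (
	"fmt"
	"time"
)

// Clock is redeclared here with the same method set as the interface above.
type Clock interface {
	Now() time.Time
	Sleep(d time.Duration)
}

// realClock delegates to the time package.
type realClock struct{}

func (realClock) Now() time.Time        { return time.Now() }
func (realClock) Sleep(d time.Duration) { time.Sleep(d) }

// fakeClock is a test double: Sleep moves the clock forward without blocking.
type fakeClock struct{ now time.Time }

func (c *fakeClock) Now() time.Time        { return c.now }
func (c *fakeClock) Sleep(d time.Duration) { c.now = c.now.Add(d) }

// elapsed sleeps n times for d and reports how much clock time passed.
func elapsed(c Clock, d time.Duration, n int) time.Duration {
	start := c.Now()
	for i := 0; i < n; i++ {
		c.Sleep(d)
	}
	return c.Now().Sub(start)
}

func main() {
	var _ Clock = realClock{} // compile-time check that realClock satisfies Clock

	fc := &fakeClock{now: time.Unix(0, 0)}
	fmt.Println(elapsed(fc, time.Minute, 3)) // prints 3m0s without waiting
}
```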

type ExecutableFunc added in v0.5.5

type ExecutableFunc func() error

type Get added in v0.2.0

type Get[T any] func() (T, bool)

type LayerFunc added in v0.5.5

type LayerFunc func(in *goquery.Document, put Put[string])

type Limiter added in v0.5.5

type Limiter interface {
	Take() (bool, time.Duration)
}

type Logger added in v0.2.0

type Logger interface {
	Debug(msg string, context ...logContext)
	Info(msg string, context ...logContext)
	Warn(msg string, context ...logContext)
	Error(msg string, context ...logContext)
	Panic(msg string, context ...logContext)
}

type Put added in v0.2.0

type Put[T any] func(T)
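
Get and Put are plain generic function types, so any closure with a matching signature can serve as one. The sketch below redeclares both types locally and backs them with a shared FIFO slice. The newQueue helper is hypothetical, is not safe for concurrent use, and says nothing about how remilia wires these functions internally.

```go
package main

import "fmt"

// Get and Put are redeclared here with the same shapes as the types above.
type Get[T any] func() (T, bool)
type Put[T any] func(T)

// newQueue returns a Put/Get pair that share one FIFO slice.
// Get reports false once the queue is empty.
func newQueue[T any]() (Put[T], Get[T]) {
	var items []T
	put := func(v T) { items = append(items, v) }
	get := func() (T, bool) {
		var zero T
		if len(items) == 0 {
			return zero, false
		}
		v := items[0]
		items = items[1:]
		return v, true
	}
	return put, get
}

func main() {
	put, get := newQueue[string]()
	put("https://go.dev/doc")
	put("https://go.dev/blog")
	for {
		u, ok := get()
		if !ok {
			break
		}
		fmt.Println(u)
	}
}
```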

type Remilia

type Remilia struct {
	ID   string
	Name string
	// contains filtered or unexported fields
}

func New

func New(opts ...RemiliaOptionFunc) (*Remilia, error)

func (*Remilia) AddLayer added in v0.5.5

func (r *Remilia) AddLayer(fn LayerFunc, opts ...StageOptionFunc) actionLayerDef[*Request]

func (*Remilia) Do added in v0.2.0

func (r *Remilia) Do(pd providerDef[*Request], stageDefs ...actionLayerDef[*Request]) error

func (*Remilia) URLProvider added in v0.5.5

func (r *Remilia) URLProvider(urlStr string) providerDef[*Request]

type RemiliaOptionFunc added in v0.5.5

type RemiliaOptionFunc func(*Remilia)

func WithClientOptions added in v0.5.0

func WithClientOptions(opts ...ClientOptionFunc) RemiliaOptionFunc

func WithLayerOptions added in v0.5.5

func WithLayerOptions(opts ...StageOptionFunc) RemiliaOptionFunc

func WithLogger added in v0.2.0

func WithLogger(logger Logger) RemiliaOptionFunc

type Request added in v0.2.0

type Request struct {
	Method      []byte
	URL         []byte
	Headers     *fasthttp.Args
	Body        []byte
	QueryParams *fasthttp.Args
}

type RequestHook added in v0.2.0

type RequestHook func(*Request) error

type Response added in v0.2.0

type Response struct {
	// contains filtered or unexported fields
}

type ResponseHook added in v0.2.0

type ResponseHook func(*Response) error
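
RequestHook and ResponseHook share the same shape: a function that receives a pointer and returns an error. The index does not say in what order hooks run or what happens when one fails, so the sketch below assumes they run in registration order and stop at the first error. The request type is a simplified stand-in that uses strings and a map in place of the []byte and fasthttp fields, and runHooks, setUserAgent, and requireHTTPS are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// request is a simplified stand-in for illustration.
type request struct {
	URL     string
	Headers map[string]string
}

// requestHook has the same shape as RequestHook, but over the local type.
type requestHook func(*request) error

// runHooks applies hooks in order and stops at the first error (assumed policy).
func runHooks(req *request, hooks ...requestHook) error {
	for i, h := range hooks {
		if err := h(req); err != nil {
			return fmt.Errorf("hook %d: %w", i, err)
		}
	}
	return nil
}

// setUserAgent returns a hook that sets the User-Agent header.
func setUserAgent(ua string) requestHook {
	return func(req *request) error {
		req.Headers["User-Agent"] = ua
		return nil
	}
}

// requireHTTPS rejects requests that do not use the https scheme.
func requireHTTPS(req *request) error {
	if !strings.HasPrefix(req.URL, "https://") {
		return errors.New("only https URLs are allowed")
	}
	return nil
}

func main() {
	req := &request{URL: "https://go.dev/", Headers: map[string]string{}}
	if err := runHooks(req, requireHTTPS, setUserAgent("example-bot/1.0")); err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(req.Headers["User-Agent"])
}
```

Putting validation hooks before mutating hooks means a rejected request is left unmodified.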

type StageOptionFunc added in v0.5.5

type StageOptionFunc optionFunc[*stageOptions]

func WithConcurrency added in v0.5.0

func WithConcurrency(concurrency uint) StageOptionFunc

func WithInputBufferSize added in v0.5.0

func WithInputBufferSize(size uint) StageOptionFunc
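
WithConcurrency and WithInputBufferSize name two settings that typically describe a worker pool: how many goroutines process items and how many items may queue in front of them. The sketch below shows that general pattern with a buffered channel and a sync.WaitGroup. It is an assumption about what the settings mean, and runStage is a hypothetical function that does not reflect remilia's internals.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// runStage processes inputs with concurrency workers that read from a
// channel buffered to bufferSize. Results are sorted for stable output.
func runStage(inputs []string, concurrency, bufferSize uint, work func(string) string) []string {
	if concurrency == 0 {
		concurrency = 1 // at least one worker, otherwise the producer would block forever
	}
	in := make(chan string, bufferSize)
	out := make(chan string, len(inputs)) // large enough that workers never block on send

	var wg sync.WaitGroup
	for i := uint(0); i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for item := range in {
				out <- work(item)
			}
		}()
	}

	for _, item := range inputs {
		in <- item // blocks while the buffer is full and every worker is busy
	}
	close(in)
	wg.Wait()
	close(out)

	results := make([]string, 0, len(inputs))
	for r := range out {
		results = append(results, r)
	}
	sort.Strings(results)
	return results
}

func main() {
	results := runStage([]string{"go", "dev"}, 2, 1, strings.ToUpper)
	fmt.Println(results) // prints [DEV GO]
}
```

A larger buffer lets the producer run ahead of slow workers, while more workers raise throughput only as long as the work function is the bottleneck.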

Directories

Path Synopsis
cmd
dev
