Documentation ¶
Index ¶
- Constants
- func DisableImages() func(*jsOptions)
- func Headfull() func(*jsOptions)
- func WithCache(cacheType, cachePath string) func(*Config) error
- func WithConcurrency(concurrency int) func(*Config) error
- func WithExitOnInactivity(duration time.Duration) func(*Config) error
- func WithInitJob(job scrapezilla.IJob) func(*Config) error
- func WithJS(opts ...func(*jsOptions)) func(*Config) error
- func WithProvider(provider scrapezilla.JobProvider) func(*Config) error
- func WithProxies(proxies []string) func(*Config) error
- func WithStealth(browser string) func(*Config) error
- type Config
- type ScrapezillaApp
Constants ¶
const (
	DefaultConcurrency = 1
	DefaultProvider    = "memory"
)
Variables ¶
This section is empty.
Functions ¶
func DisableImages ¶
func DisableImages() func(*jsOptions)
func Headfull ¶
func Headfull() func(*jsOptions)
Headfull is a helper that runs the browser in headful (non-headless) mode. Pass it as an option to WithJS.
func WithConcurrency ¶
func WithConcurrency(concurrency int) func(*Config) error
WithConcurrency sets the number of scrapers the app runs concurrently.
func WithExitOnInactivity ¶
func WithExitOnInactivity(duration time.Duration) func(*Config) error
WithExitOnInactivity sets the duration after which the app exits if there are no more jobs to run.
func WithInitJob ¶
func WithInitJob(job scrapezilla.IJob) func(*Config) error
WithInitJob sets the initial job of the app.
func WithProvider ¶
func WithProvider(provider scrapezilla.JobProvider) func(*Config) error
WithProvider sets the provider of the app.
func WithProxies ¶
func WithProxies(proxies []string) func(*Config) error
WithProxies sets the proxies of the app.
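func WithStealth ¶
func WithStealth(browser string) func(*Config) error
WithStealth enables stealth mode with the given browser (see UseStealth and StealthBrowser in Config).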
Types ¶
type Config ¶
type Config struct {
	// Concurrency is the number of concurrent scrapers to run.
	// If not set, it defaults to 1.
	Concurrency int `validate:"required,gte=1"`
	// Cache is the cache to use for storing scraped data.
	// If left empty then no caching will be used.
	// Otherwise the CacheType must be one of file or leveldb.
	CacheType string `validate:"omitempty,oneof=file leveldb"`
	// CachePath is the path to the cache file or directory.
	// It is required to be a valid path if CacheType is set.
	CachePath string `validate:"required_with=CacheType"`
	// UseJS is whether to use JavaScript to render the page.
	UseJS bool `validate:"omitempty"`
	// UseStealth is whether to use stealth mode to scrape the page.
	// uses a special http client to scrape the page.
	UseStealth bool `validate:"omitempty"`
	// StealthBrowser is the browser to use for stealth mode.
	StealthBrowser string `validate:"omitempty"`
	// JSOpts are the options for the JavaScript renderer.
	JSOpts jsOptions
	// ProviderType is the type of provider to use.
	// It is required to be a valid type if Provider is set.
	// If not set the memory provider will be used.
	Provider scrapezilla.JobProvider
	// Writers are the writers to use for writing the results.
	// At least one writer must be provided.
	Writers []scrapezilla.ResultWriter `validate:"required,gt=0"`
	// InitJob is the job to initialize the app with.
	InitJob scrapezilla.IJob
	// ExitOnInactivityDuration is whether to exit the app when there are no more jobs to run.
	ExitOnInactivityDuration time.Duration
	// Proxies are the proxies to use for the app.
	Proxies []string
}
func NewConfig ¶
func NewConfig(writers []scrapezilla.ResultWriter, options ...func(*Config) error) (*Config, error)
NewConfig creates a new config with default values.
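The With* helpers all return func(*Config) error, which is the functional options pattern. The sketch below shows how such options could be applied. It is not the package's implementation. Config here is a trimmed, hypothetical stand-in, the writers argument is omitted, and the checks inside the options are an assumption modelled on the validate tags of the real struct. The package may well run validation elsewhere.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a trimmed, hypothetical stand-in for the package's Config type.
type Config struct {
	Concurrency int
	CacheType   string
	CachePath   string
}

// Option has the same shape as the values returned by the With* helpers.
type Option func(*Config) error

// WithConcurrency is a stand-in; the n >= 1 check mirrors the gte=1 tag.
func WithConcurrency(n int) Option {
	return func(c *Config) error {
		if n < 1 {
			return errors.New("concurrency must be at least 1")
		}
		c.Concurrency = n
		return nil
	}
}

// WithCache is a stand-in; the checks mirror the "oneof=file leveldb"
// and "required_with=CacheType" tags.
func WithCache(cacheType, cachePath string) Option {
	return func(c *Config) error {
		if cacheType != "file" && cacheType != "leveldb" {
			return fmt.Errorf("unsupported cache type %q", cacheType)
		}
		if cachePath == "" {
			return errors.New("cache path is required when a cache type is set")
		}
		c.CacheType, c.CachePath = cacheType, cachePath
		return nil
	}
}

// NewConfig starts from the default concurrency and applies each option
// in order, stopping at the first error.
func NewConfig(options ...Option) (*Config, error) {
	cfg := &Config{Concurrency: 1} // same value as DefaultConcurrency
	for _, opt := range options {
		if err := opt(cfg); err != nil {
			return nil, err
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := NewConfig(WithConcurrency(4), WithCache("leveldb", "./cache"))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", *cfg)
}
```

Because every option returns an error, a bad value surfaces when the config is built rather than later at run time.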
type ScrapezillaApp ¶ added in v0.0.4
type ScrapezillaApp struct {
// contains filtered or unexported fields
}
func NewScrapezillaApp ¶ added in v0.0.3
func NewScrapezillaApp(cfg *Config) (*ScrapezillaApp, error)
NewScrapezillaApp creates a new ScrapezillaApp.
func (*ScrapezillaApp) Close ¶ added in v0.0.4
func (app *ScrapezillaApp) Close() error
Close closes the app.
func (*ScrapezillaApp) Start ¶ added in v0.0.4
func (app *ScrapezillaApp) Start(ctx context.Context, seedJobs ...scrapezilla.IJob) error
Start starts the app.