Documentation
Index
- Constants
- func DisableImages() func(*jsOptions)
- func Headfull() func(*jsOptions)
- func WithCache(cacheType, cachePath string) func(*Config) error
- func WithConcurrency(concurrency int) func(*Config) error
- func WithExitOnInactivity(duration time.Duration) func(*Config) error
- func WithInitJob(job scrapemate.IJob) func(*Config) error
- func WithJS(opts ...func(*jsOptions)) func(*Config) error
- func WithProvider(provider scrapemate.JobProvider) func(*Config) error
- func WithProxies(proxies []string) func(*Config) error
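- type Config
  - func NewConfig(writers []scrapemate.ResultWriter, options ...func(*Config) error) (*Config, error)
- type ScrapemateApp
  - func NewScrapeMateApp(cfg *Config) (*ScrapemateApp, error)
  - func (app *ScrapemateApp) Start(ctx context.Context, seedJobs ...scrapemate.IJob) error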
Constants
const (
	DefaultConcurrency = 1
	DefaultProvider    = "memory"
)
Variables
This section is empty.
Functions
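func DisableImages added in v0.6.0
func DisableImages() func(*jsOptions)
DisableImages returns an option for the JavaScript renderer. Use it as a parameter to WithJS.
func Headfull added in v0.2.2
func Headfull() func(*jsOptions)
Headfull is a helper function that makes the JavaScript renderer run the browser in headful (non-headless) mode, with a visible window. Use it as a parameter to WithJS.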
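Headfull and DisableImages return func(*jsOptions) values that WithJS consumes: a second, nested layer of the functional-options pattern. The self-contained sketch below shows how such helpers compose. The jsOptions fields (headful, imagesDisabled), the trimmed config type, and the withJS body (including the assumption that it sets UseJS) are hypothetical stand-ins, since the real jsOptions type is unexported and undocumented here.

```go
package main

import "fmt"

// jsOptions is a hypothetical stand-in for the unexported options type;
// the real field names are not documented.
type jsOptions struct {
	headful        bool
	imagesDisabled bool
}

// config is a hypothetical, trimmed-down Config with only the JS fields.
type config struct {
	UseJS  bool
	JSOpts jsOptions
}

// headfull mirrors Headfull: it returns an option that turns off headless mode.
func headfull() func(*jsOptions) {
	return func(o *jsOptions) { o.headful = true }
}

// disableImages mirrors DisableImages (assumed behavior: skip image loading).
func disableImages() func(*jsOptions) {
	return func(o *jsOptions) { o.imagesDisabled = true }
}

// withJS mirrors WithJS under the assumption that it enables JS rendering
// and then applies each jsOptions helper in order.
func withJS(opts ...func(*jsOptions)) func(*config) error {
	return func(c *config) error {
		c.UseJS = true
		for _, opt := range opts {
			opt(&c.JSOpts)
		}
		return nil
	}
}

func main() {
	var c config
	if err := withJS(headfull(), disableImages())(&c); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c)
}
```

Keeping the browser-level options in their own type means WithJS stays a single entry point, while new browser settings can be added as further helpers without changing its signature.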
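func WithCache
func WithCache(cacheType, cachePath string) func(*Config) error
WithCache sets the cache type and cache path of the app. The cache type must be file or leveldb.
func WithConcurrency
func WithConcurrency(concurrency int) func(*Config) error
WithConcurrency sets the concurrency of the app: the number of concurrent scrapers to run.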
func WithExitOnInactivity added in v0.5.0
func WithExitOnInactivity(duration time.Duration) func(*Config) error
WithExitOnInactivity sets the duration after which the app exits if there are no more jobs to run.
func WithInitJob added in v0.4.3
func WithInitJob(job scrapemate.IJob) func(*Config) error
WithInitJob sets the initial job of the app.
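func WithJS
func WithJS(opts ...func(*jsOptions)) func(*Config) error
WithJS enables JavaScript rendering of pages and applies the given options, such as Headfull or DisableImages.
func WithProvider
func WithProvider(provider scrapemate.JobProvider) func(*Config) error
WithProvider sets the job provider of the app. If no provider is set, the memory provider is used.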
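The default provider is "memory" (DefaultProvider). As a rough illustration of what an in-memory job source involves, the sketch below is a hypothetical FIFO queue that is safe for concurrent use. It is not the scrapemate.JobProvider interface, whose methods are not listed on this page; memoryProvider, Push, Pop, and the job type are stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// job is a hypothetical stand-in for scrapemate.IJob.
type job struct{ URL string }

// memoryProvider is a hypothetical FIFO job queue guarded by a mutex.
type memoryProvider struct {
	mu    sync.Mutex
	queue []job
}

// Push appends a job to the back of the queue.
func (p *memoryProvider) Push(j job) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.queue = append(p.queue, j)
}

// Pop removes and returns the front job; ok is false when the queue is empty.
func (p *memoryProvider) Pop() (j job, ok bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.queue) == 0 {
		return job{}, false
	}
	j = p.queue[0]
	p.queue = p.queue[1:]
	return j, true
}

func main() {
	p := &memoryProvider{}
	p.Push(job{URL: "https://example.com"}) // e.g. an initial job
	p.Push(job{URL: "https://example.com/page/2"})
	for j, ok := p.Pop(); ok; j, ok = p.Pop() {
		fmt.Println(j.URL)
	}
}
```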
func WithProxies added in v0.7.0
func WithProxies(proxies []string) func(*Config) error
WithProxies sets the list of proxies the app uses.
Types
type Config added in v0.3.0
type Config struct {
	// Concurrency is the number of concurrent scrapers to run.
	// If not set, it defaults to 1.
	Concurrency int `validate:"required,gte=1"`
	// CacheType is the type of cache to use for storing scraped data.
	// If left empty, no caching is used.
	// Otherwise, CacheType must be either file or leveldb.
	CacheType string `validate:"omitempty,oneof=file leveldb"`
	// CachePath is the path to the cache file or directory.
	// It must be a valid path if CacheType is set.
	CachePath string `validate:"required_with=CacheType"`
	// UseJS is whether to use JavaScript to render the page.
	UseJS bool `validate:"omitempty"`
	// JSOpts are the options for the JavaScript renderer.
	JSOpts jsOptions
	// Provider is the job provider to use.
	// If not set, the memory provider is used.
	Provider scrapemate.JobProvider
	// Writers are the writers to use for writing the results.
	// At least one writer must be provided.
	Writers []scrapemate.ResultWriter `validate:"required,gt=0"`
	// InitJob is the job to initialize the app with.
	InitJob scrapemate.IJob
	// ExitOnInactivityDuration is the duration after which the app exits
	// if there are no more jobs to run.
	ExitOnInactivityDuration time.Duration
	// Proxies are the proxies to use for the app.
	Proxies []string
}
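The validate struct tags encode the rules a Config must satisfy. Their syntax matches the go-playground/validator package, although this page does not say which validator the package calls. To spell the rules out without that dependency, here is a hypothetical hand-written checker on a trimmed-down config; validate and config are illustrative stand-ins, and Writers is reduced to strings because only its length matters.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical, trimmed-down Config.
type config struct {
	Concurrency int
	CacheType   string
	CachePath   string
	Writers     []string
}

// validate applies the same rules as the struct tags and reports every
// violation at once via errors.Join (nil when there are none).
func validate(c config) error {
	var errs []error
	// Concurrency: validate:"required,gte=1"
	if c.Concurrency < 1 {
		errs = append(errs, fmt.Errorf("concurrency must be >= 1, got %d", c.Concurrency))
	}
	// CacheType: validate:"omitempty,oneof=file leveldb"
	if c.CacheType != "" && c.CacheType != "file" && c.CacheType != "leveldb" {
		errs = append(errs, fmt.Errorf("cache type must be file or leveldb, got %q", c.CacheType))
	}
	// CachePath: validate:"required_with=CacheType"
	if c.CacheType != "" && c.CachePath == "" {
		errs = append(errs, errors.New("cache path is required when cache type is set"))
	}
	// Writers: validate:"required,gt=0"
	if len(c.Writers) == 0 {
		errs = append(errs, errors.New("at least one writer is required"))
	}
	return errors.Join(errs...)
}

func main() {
	fmt.Println(validate(config{Concurrency: 1, Writers: []string{"csv"}}))
	fmt.Println(validate(config{Concurrency: 0, CacheType: "redis"}))
}
```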
func NewConfig
func NewConfig(writers []scrapemate.ResultWriter, options ...func(*Config) error) (*Config, error)
NewConfig creates a new Config with default values and the given writers, then applies each of the given options.
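NewConfig's signature suggests the usual functional-options flow: start from defaults, apply each option in order, and stop at the first error. Below is a hypothetical sketch of that flow on a trimmed-down config. newConfig and withConcurrency are stand-ins, and the real NewConfig may also validate the result, which this page does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultConcurrency mirrors the package constant DefaultConcurrency.
const defaultConcurrency = 1

// config is a hypothetical, trimmed-down Config.
type config struct {
	Concurrency int
	Writers     []string
}

// newConfig sketches NewConfig: defaults first, then options in order.
// The first option that fails aborts construction.
func newConfig(writers []string, options ...func(*config) error) (*config, error) {
	cfg := &config{
		Concurrency: defaultConcurrency,
		Writers:     writers,
	}
	for _, opt := range options {
		if err := opt(cfg); err != nil {
			return nil, err
		}
	}
	return cfg, nil
}

// withConcurrency is a hypothetical option used to exercise newConfig.
func withConcurrency(n int) func(*config) error {
	return func(c *config) error {
		if n < 1 {
			return errors.New("concurrency must be at least 1")
		}
		c.Concurrency = n
		return nil
	}
}

func main() {
	cfg, err := newConfig([]string{"csv"}, withConcurrency(4))
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Concurrency, cfg.Writers)
}
```

Because options run after the defaults are set, an option only needs to touch the fields it cares about.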
type ScrapemateApp
type ScrapemateApp struct {
// contains filtered or unexported fields
}
func NewScrapeMateApp
func NewScrapeMateApp(cfg *Config) (*ScrapemateApp, error)
NewScrapeMateApp creates a new ScrapemateApp from cfg.
func (*ScrapemateApp) Start
func (app *ScrapemateApp) Start(ctx context.Context, seedJobs ...scrapemate.IJob) error
Start starts the app with the given context and any seed jobs passed in seedJobs.