Documentation ¶
Index ¶
- Constants
- Variables
- type Cache
- func (c Cache) GetScraper(scraperID string) *models.Scraper
- func (c Cache) ListScrapers(tys []models.ScrapeContentType) []*models.Scraper
- func (c *Cache) ReloadScrapers() error
- func (c Cache) ScrapeFragment(ctx context.Context, id string, input Input) (models.ScrapedContent, error)
- func (c Cache) ScrapeID(ctx context.Context, scraperID string, id int, ty models.ScrapeContentType) (models.ScrapedContent, error)
- func (c Cache) ScrapeName(ctx context.Context, id, query string, ty models.ScrapeContentType) ([]models.ScrapedContent, error)
- func (c Cache) ScrapeURL(ctx context.Context, url string, ty models.ScrapeContentType) (models.ScrapedContent, error)
- type GlobalConfig
- type Input
- type QueryType
Constants ¶
const FreeonesScraperID = "builtin_freeones"
FreeonesScraperID is the scraper ID for the built-in Freeones scraper
Variables ¶
var (
	// ErrMaxRedirects is returned if the maximum number of HTTP redirects is reached.
	ErrMaxRedirects = errors.New("maximum number of HTTP redirects reached")

	// ErrNotFound is returned when an entity isn't found.
	ErrNotFound = errors.New("scraper not found")

	// ErrNotSupported is returned when a given invocation isn't supported, and there
	// is a guard function which should be able to guard against it.
	ErrNotSupported = errors.New("scraper operation not supported")
)
var ErrScraperScript = errors.New("scraper script error")
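Because these are sentinel errors, callers can tell them apart with errors.Is, even when an error has been wrapped with extra context. The following is a minimal sketch, not the package's own code: it redeclares two of the sentinel values locally so that it compiles on its own, and lookup and classify are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins mirroring two of the package's sentinel errors, redeclared
// here only so the sketch compiles without the real package.
var (
	ErrNotFound     = errors.New("scraper not found")
	ErrNotSupported = errors.New("scraper operation not supported")
)

// lookup is a hypothetical helper that wraps a sentinel error with context.
func lookup(id string) error {
	if id == "" {
		return fmt.Errorf("lookup %q: %w", id, ErrNotFound)
	}
	return nil
}

// classify maps an error to a short label using errors.Is, which sees
// through wrapping added with %w.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrNotFound):
		return "not found"
	case errors.Is(err, ErrNotSupported):
		return "not supported"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(classify(lookup("")))
	fmt.Println(classify(lookup("builtin_freeones")))
}
```

Comparing with == would miss the wrapped case, which is why the sketch prefers errors.Is.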
Functions ¶
This section is empty.
Types ¶
type Cache ¶ added in v0.3.0
type Cache struct {
// contains filtered or unexported fields
}
Cache stores the database of scrapers
func NewCache ¶ added in v0.3.0
func NewCache(globalConfig GlobalConfig, txnManager models.TransactionManager) (*Cache, error)
NewCache returns a new Cache, loading scraper configurations from the scraper path provided in the global config object. It returns the new instance, or an error if the scraper directory could not be loaded.
Scraper configurations are loaded from yml files in the provided scrapers directory and any subdirectories.
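The recursive discovery described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the package's loader: findConfigs and isConfigFile are hypothetical names, and matching on the exact ".yml" extension is assumed.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// isConfigFile reports whether a path looks like a scraper configuration.
// Matching on the exact ".yml" extension is an assumption of this sketch.
func isConfigFile(path string) bool {
	return filepath.Ext(path) == ".yml"
}

// findConfigs walks root and all of its subdirectories and returns the path
// of every yml file it encounters.
func findConfigs(root string) ([]string, error) {
	var found []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // abort on the first unreadable entry
		}
		if !d.IsDir() && isConfigFile(path) {
			found = append(found, path)
		}
		return nil
	})
	return found, err
}

func main() {
	root, err := os.MkdirTemp("", "scrapers")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	// Lay out one config at the top level, one in a subdirectory, and one
	// unrelated file that should be ignored.
	sub := filepath.Join(root, "nested")
	if err := os.Mkdir(sub, 0o755); err != nil {
		panic(err)
	}
	files := []string{
		filepath.Join(root, "a.yml"),
		filepath.Join(sub, "b.yml"),
		filepath.Join(root, "notes.txt"),
	}
	for _, p := range files {
		if err := os.WriteFile(p, nil, 0o644); err != nil {
			panic(err)
		}
	}

	configs, err := findConfigs(root)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(configs), "config files found")
}
```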
func (Cache) GetScraper ¶ added in v0.11.0
func (c Cache) GetScraper(scraperID string) *models.Scraper
GetScraper returns the scraper matching the provided id.
func (Cache) ListScrapers ¶ added in v0.12.0
func (c Cache) ListScrapers(tys []models.ScrapeContentType) []*models.Scraper
ListScrapers lists scrapers matching one of the given types. Returns a list of scrapers, sorted by their ID.
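The filter-then-sort behaviour described for ListScrapers can be sketched as follows. The contentType and scraper types are simplified stand-ins for models.ScrapeContentType and models.Scraper, and listScrapers is a hypothetical function that returns IDs rather than scraper pointers.

```go
package main

import (
	"fmt"
	"sort"
)

// contentType and scraper are simplified stand-ins for
// models.ScrapeContentType and models.Scraper.
type contentType string

type scraper struct {
	ID        string
	Supported []contentType
}

// supportsAny reports whether s handles at least one of the wanted types.
func supportsAny(s scraper, wanted []contentType) bool {
	for _, have := range s.Supported {
		for _, want := range wanted {
			if have == want {
				return true
			}
		}
	}
	return false
}

// listScrapers returns the IDs of all scrapers matching one of the wanted
// types, sorted in ascending order.
func listScrapers(all []scraper, wanted []contentType) []string {
	var ids []string
	for _, s := range all {
		if supportsAny(s, wanted) {
			ids = append(ids, s.ID)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	all := []scraper{
		{ID: "zeta", Supported: []contentType{"scene"}},
		{ID: "alpha", Supported: []contentType{"scene", "performer"}},
		{ID: "mid", Supported: []contentType{"gallery"}},
	}
	fmt.Println(listScrapers(all, []contentType{"scene"}))
}
```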
func (*Cache) ReloadScrapers ¶ added in v0.3.0
func (c *Cache) ReloadScrapers() error
ReloadScrapers clears the scraper cache and reloads from the scraper path. In the event of an error during loading, the cache will be left empty.
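The clear-then-load ordering explains why a failed reload leaves the cache empty. A minimal sketch of that ordering, assuming a stand-in cache type that tracks only scraper IDs and a hypothetical load callback in place of reading the scraper path:

```go
package main

import (
	"errors"
	"fmt"
)

// cache is a simplified stand-in for Cache that only tracks scraper IDs.
type cache struct {
	scrapers []string
}

// errLoad is a hypothetical failure used to demonstrate the error path.
var errLoad = errors.New("load failed")

// reload clears the cache before loading, so a failed load leaves it empty
// rather than holding stale entries. It needs a pointer receiver because it
// replaces the receiver's contents.
func (c *cache) reload(load func() ([]string, error)) error {
	c.scrapers = nil
	loaded, err := load()
	if err != nil {
		return err
	}
	c.scrapers = loaded
	return nil
}

// failingLoad and workingLoad are hypothetical loaders for the two paths.
func failingLoad() ([]string, error) { return nil, errLoad }
func workingLoad() ([]string, error) { return []string{"builtin_freeones"}, nil }

func main() {
	c := &cache{scrapers: []string{"stale"}}
	err := c.reload(failingLoad)
	fmt.Println(err, len(c.scrapers))
}
```

A caller that wants to keep serving the previous scrapers after a failed reload would have to retain its own copy before calling the method.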
func (Cache) ScrapeFragment ¶ added in v0.12.0
func (c Cache) ScrapeFragment(ctx context.Context, id string, input Input) (models.ScrapedContent, error)
ScrapeFragment uses the given fragment input to scrape.
func (Cache) ScrapeID ¶ added in v0.12.0
func (c Cache) ScrapeID(ctx context.Context, scraperID string, id int, ty models.ScrapeContentType) (models.ScrapedContent, error)
func (Cache) ScrapeName ¶ added in v0.12.0
func (c Cache) ScrapeName(ctx context.Context, id, query string, ty models.ScrapeContentType) ([]models.ScrapedContent, error)
func (Cache) ScrapeURL ¶ added in v0.12.0
func (c Cache) ScrapeURL(ctx context.Context, url string, ty models.ScrapeContentType) (models.ScrapedContent, error)
ScrapeURL scrapes a given url for the given content. Searches the scraper cache and picks the first scraper capable of scraping the given url into the desired content. Returns the scraped content or an error if the scrape fails.
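The "first capable scraper wins" selection can be illustrated with the sketch below. It is not the package's implementation: urlScraper is a stand-in type, matching a URL by substring is an assumption, and what the real method does when no scraper matches is not stated here, so the sketch simply reports that case with a boolean.

```go
package main

import (
	"fmt"
	"strings"
)

// urlScraper is a simplified stand-in for a cached scraper. Matching a URL
// by substring is an assumption of this sketch.
type urlScraper struct {
	ID       string
	Patterns []string
}

// canScrape reports whether any of the scraper's patterns occurs in url.
func (s urlScraper) canScrape(url string) bool {
	for _, p := range s.Patterns {
		if strings.Contains(url, p) {
			return true
		}
	}
	return false
}

// firstCapable returns the ID of the first scraper, in slice order, that can
// handle url. The boolean is false when no scraper matches.
func firstCapable(scrapers []urlScraper, url string) (string, bool) {
	for _, s := range scrapers {
		if s.canScrape(url) {
			return s.ID, true
		}
	}
	return "", false
}

func main() {
	scrapers := []urlScraper{
		{ID: "alpha", Patterns: []string{"alpha.example"}},
		{ID: "generic", Patterns: []string{"example"}},
	}
	id, ok := firstCapable(scrapers, "https://alpha.example/scene/1")
	fmt.Println(id, ok)
}
```

Since the first match is used, the order in which scrapers are considered matters whenever more than one can handle the same URL.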
type GlobalConfig ¶ added in v0.3.0
type GlobalConfig interface {
	GetScraperUserAgent() string
	GetScrapersPath() string
	GetScraperCDPPath() string
	GetScraperCertCheck() bool
	GetPythonPath() string
}
GlobalConfig contains the global scraper options.
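Because GlobalConfig is an interface, any type with these five methods can be passed to NewCache, which is convenient in tests. The sketch below redeclares the interface locally so that it compiles on its own; staticConfig and the values used in main are hypothetical.

```go
package main

import "fmt"

// GlobalConfig is redeclared here, matching the documented interface, so the
// sketch compiles without the real package.
type GlobalConfig interface {
	GetScraperUserAgent() string
	GetScrapersPath() string
	GetScraperCDPPath() string
	GetScraperCertCheck() bool
	GetPythonPath() string
}

// staticConfig is a hypothetical implementation backed by fixed values.
type staticConfig struct {
	userAgent, scrapersPath, cdpPath, pythonPath string
	certCheck                                    bool
}

func (c staticConfig) GetScraperUserAgent() string { return c.userAgent }
func (c staticConfig) GetScrapersPath() string     { return c.scrapersPath }
func (c staticConfig) GetScraperCDPPath() string   { return c.cdpPath }
func (c staticConfig) GetScraperCertCheck() bool   { return c.certCheck }
func (c staticConfig) GetPythonPath() string       { return c.pythonPath }

// Compile-time check that staticConfig satisfies GlobalConfig.
var _ GlobalConfig = staticConfig{}

func main() {
	var cfg GlobalConfig = staticConfig{scrapersPath: "./scrapers", certCheck: true}
	fmt.Println(cfg.GetScrapersPath(), cfg.GetScraperCertCheck())
}
```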
type Input ¶ added in v0.12.0
type Input struct {
	Performer *models.ScrapedPerformerInput
	Scene     *models.ScrapedSceneInput
	Gallery   *models.ScrapedGalleryInput
}
Input coalesces inputs of different types into a single structure. The system expects exactly one of these fields to be set and the others to be nil.
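The one-of expectation can be checked before an Input is handed to a scraper. The sketch below is an illustration only: the input struct and its field types are simplified stand-ins for the models types, and kind is a hypothetical helper that the package is not known to provide.

```go
package main

import "fmt"

// Simplified stand-ins for the models.Scraped*Input types.
type performerInput struct{ Name string }
type sceneInput struct{ Title string }
type galleryInput struct{ Title string }

// input mirrors the shape of Input: three pointers, of which exactly one is
// expected to be non-nil.
type input struct {
	Performer *performerInput
	Scene     *sceneInput
	Gallery   *galleryInput
}

// kind returns which field is set, or an error unless exactly one is set.
func (i input) kind() (string, error) {
	set, kind := 0, ""
	if i.Performer != nil {
		set, kind = set+1, "performer"
	}
	if i.Scene != nil {
		set, kind = set+1, "scene"
	}
	if i.Gallery != nil {
		set, kind = set+1, "gallery"
	}
	if set != 1 {
		return "", fmt.Errorf("expected exactly one input to be set, got %d", set)
	}
	return kind, nil
}

func main() {
	in := input{Scene: &sceneInput{Title: "example"}}
	k, err := in.kind()
	fmt.Println(k, err)
}
```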