Documentation ¶
Index ¶
- Variables
- func ExtractAuthor(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- func ExtractBody(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- func ExtractDate(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- func FindContentPage(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- func FindNextPage(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- func GoToNextPage(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- func LoadConfiguration(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- func ReplaceStrings(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- func StripTags(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- type Config
- type ConfigFolder
- type ConfigFolderList
- type FilterTest
Constants ¶
This section is empty.
Variables ¶
var DefaultConfigurationFolders = ConfigFolderList{
{siteConfigFS("custom"), "custom"},
{siteConfigFS("standard"), "standard"},
}
DefaultConfigurationFolders is a list of default locations with configuration files.
Functions ¶
func ExtractAuthor ¶
ExtractAuthor applies the "author" directives to find an author.
func ExtractBody ¶
ExtractBody tries to find a body as defined by the "body" directives in the configuration file.
func ExtractDate ¶
ExtractDate applies the "date" directives to find a date. If a date is found we try to parse it.
func FindContentPage ¶
FindContentPage searches for SinglePageLinkSelectors in the page and, if it finds one, resets the process to its beginning with the newly found URL.
func FindNextPage ¶
FindNextPage looks for NextPageLinkSelectors. If it finds a URL, the URL is added to the message so that GoToNextPage can process it later.
func GoToNextPage ¶
GoToNextPage checks if there is a "next_page" value in the process message. It then creates a new drop with the URL.
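The handoff between FindNextPage and GoToNextPage can be sketched with local stand-in types. Everything below is an assumption made for illustration: `message` and `processor` only imitate the shape suggested by the function signatures in the index, the hard-coded URL replaces the real selector lookup, and returning `next` unchanged is a guess at how a step passes control along.

```go
package main

import "fmt"

// message and processor are local stand-ins for extract.ProcessMessage and
// extract.Processor, whose real definitions are not shown in this document.
type message struct {
	values map[string]string // per-run values such as "next_page"
	drops  []string          // URLs queued for extraction
}

type processor func(m *message, next processor) processor

// findNextPage records a next-page URL on the message. The URL is hard-coded
// here; the real step would derive it from NextPageLinkSelectors.
func findNextPage(m *message, next processor) processor {
	m.values["next_page"] = "https://example.org/article?page=2"
	return next
}

// goToNextPage queues a new drop when a "next_page" value is present and
// does nothing otherwise.
func goToNextPage(m *message, next processor) processor {
	u, ok := m.values["next_page"]
	if !ok || u == "" {
		return next
	}
	m.drops = append(m.drops, u)
	return next
}

func main() {
	m := &message{values: map[string]string{}}
	findNextPage(m, nil)
	goToNextPage(m, nil)
	fmt.Println(m.drops)
}
```

Keeping the two steps separate means other processors can run between discovering the next page and actually following it.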
func LoadConfiguration ¶
LoadConfiguration will try to find a matching fftr configuration for the first Drop (the extraction starting point).
If a configuration is found, it will be added to the context.
If the configuration indicates custom HTTP headers, they'll be added to the client.
func ReplaceStrings ¶
ReplaceStrings applies all the replace_string directives in the fftr configuration file to the received body.
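Given that Config stores replacements as `[][2]string`, applying them can be sketched as a loop of plain substring replacements. `applyReplacements` is a hypothetical helper, and the assumption that pairs are applied in order as literal strings is not confirmed by this document.

```go
package main

import (
	"fmt"
	"strings"
)

// applyReplacements is a hypothetical helper. It applies each
// [find, replace] pair to body in order, using literal matching.
func applyReplacements(body string, pairs [][2]string) string {
	for _, p := range pairs {
		body = strings.ReplaceAll(body, p[0], p[1])
	}
	return body
}

func main() {
	// Example pairs, invented for illustration.
	pairs := [][2]string{
		{"<br>", "<br/>"},
		{"<p></p>", ""},
	}
	fmt.Println(applyReplacements("<p></p>Hello<br>world", pairs))
	// prints: Hello<br/>world
}
```

Because the pairs run in sequence, a later pair sees the output of the earlier ones, so their order can matter.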
Types ¶
type Config ¶
type Config struct {
	Files                   []string          `json:"-"`
	TitleSelectors          []string          `json:"title_selectors"`
	BodySelectors           []string          `json:"body_selectors"`
	DateSelectors           []string          `json:"date_selectors"`
	AuthorSelectors         []string          `json:"author_selectors"`
	StripSelectors          []string          `json:"strip_selectors"`
	StripIDOrClass          []string          `json:"strip_id_or_class"`
	StripImageSrc           []string          `json:"strip_image_src"`
	NativeAdSelectors       []string          `json:"native_ad_selectors"`
	Tidy                    bool              `json:"tidy"`
	Prune                   bool              `json:"prune"`
	AutoDetectOnFailure     bool              `json:"autodetect_on_failure"`
	SinglePageLinkSelectors []string          `json:"single_page_link_selectors"`
	NextPageLinkSelectors   []string          `json:"next_page_link_selectors"`
	ReplaceStrings          [][2]string       `json:"replace_strings"`
	HTTPHeaders             map[string]string `json:"http_headers"`
	Tests                   []FilterTest      `json:"tests"`
}
Config holds the fivefilters configuration.
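The JSON tags on Config suggest how a configuration maps onto the struct when encoded as JSON. The sketch below decodes a small document into a local type, `siteConfig`, which copies only a few fields and their tags from Config. The sample values are invented, and whether the package reads its configuration files in this JSON form is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// siteConfig mirrors a subset of Config, reusing the same JSON tags.
type siteConfig struct {
	TitleSelectors []string          `json:"title_selectors"`
	BodySelectors  []string          `json:"body_selectors"`
	ReplaceStrings [][2]string       `json:"replace_strings"`
	HTTPHeaders    map[string]string `json:"http_headers"`
	Prune          bool              `json:"prune"`
}

// sample is an invented configuration used only for this example.
const sample = `{
  "title_selectors": ["//h1[@class='headline']"],
  "body_selectors": ["//div[@id='story']"],
  "replace_strings": [["<br>", "<br/>"]],
  "http_headers": {"User-Agent": "example-bot/1.0"},
  "prune": true
}`

// parseSiteConfig is a hypothetical helper that decodes JSON into siteConfig.
func parseSiteConfig(data []byte) (*siteConfig, error) {
	var c siteConfig
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	c, err := parseSiteConfig([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("body selectors:", c.BodySelectors)
	fmt.Println("replacements:", c.ReplaceStrings)
	fmt.Println("prune:", c.Prune)
}
```

Each entry of `replace_strings` is a two-element JSON array, which `encoding/json` decodes directly into the fixed-size `[2]string`.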
func NewConfigForURL ¶
func NewConfigForURL(src *url.URL, folders ConfigFolderList) (*Config, error)
NewConfigForURL loads the site configuration file(s) for a given URL.
type ConfigFolder ¶
ConfigFolder is an http.FileSystem with a name.
type ConfigFolderList ¶
type ConfigFolderList []*ConfigFolder
ConfigFolderList is a list of configuration folders.
type FilterTest ¶
FilterTest holds the values for a filter's test.