Documentation ¶
Overview ¶
Package htmlquerier provides a standardized declarative API for creating a querier that fetches a string from a selector, applies filters to it, and passes the result on for processing in the core.
Index ¶
- type Querier
- func (q *Querier) AddComplexFilter(fn func([]string) []string) *Querier
- func (q *Querier) AddFilter(fn func(string) string) *Querier
- func (q *Querier) AddSplitter(fn func(string) []string) *Querier
- func (q *Querier) After(sep string) *Querier
- func (q *Querier) Before(sep string) *Querier
- func (q *Querier) BeforeSelector(selector string) *Querier
- func (q *Querier) CutWrapper(prefix, suffix string) *Querier
- func (q *Querier) DeleteFrom(s string) *Querier
- func (q *Querier) DeleteUntil(s string) *Querier
- func (q *Querier) Execute(n *html.Node) (a []string, err error)
- func (q *Querier) FilterArtist(exp string, i int) *Querier
- func (q *Querier) FilterTitle(exp string, i int) *Querier
- func (q *Querier) HalfWidth() *Querier
- func (q *Querier) Join(sep string) *Querier
- func (q *Querier) KeepIndex(i int) *Querier
- func (q *Querier) Prefix(p string) *Querier
- func (q *Querier) ReplaceAll(old, new string) *Querier
- func (q *Querier) ReplaceAllRegex(exp, new string) *Querier
- func (q *Querier) Split(sep string) *Querier
- func (q *Querier) SplitIgnoreWithin(sep string, l, r rune) *Querier
- func (q *Querier) SplitIndex(sep string, i int) *Querier
- func (q *Querier) SplitRegex(exp string) *Querier
- func (q *Querier) SplitRegexIndex(exp string, i int) *Querier
- func (q *Querier) Trim() *Querier
- func (q *Querier) TrimPrefix(prefix string) *Querier
- func (q *Querier) TrimSuffix(suffix string) *Querier
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Querier ¶
type Querier struct {
	// Initialized specifies whether the querier has been initialized or not.
	Initialized bool
	// contains filtered or unexported fields
}
Querier contains details telling livefetcher where to fetch a piece of information, and how to process the text fetched to get it ready for use in livefetcher.
It is recommended not to initialize the struct directly. Instead, use Q or QAll.
func Q ¶
Q creates a pointer to a Querier struct. A Querier struct initialized using Q will only select the first match and get the string from that.
func QAll ¶
QAll creates a pointer to a Querier struct. A Querier struct initialized using QAll will fetch all instances of the selector, get the string within each, and assign them all to arr.
Any basic filters specified will be applied individually to each match.
func (*Querier) AddComplexFilter ¶
AddComplexFilter adds a filter that takes the full slice of strings, and returns a new slice. This should only be used if you need the full context of the array, or if you want to be able to entirely remove entries.
Make sure not to return an empty slice. At minimum, return a slice containing a single empty string.
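As an illustration of the function shape AddComplexFilter accepts, here is a minimal sketch of a filter that removes empty entries while following the advice above. The name dropEmpty is hypothetical and not part of the package.

```go
package main

import "fmt"

// dropEmpty is a hypothetical complex filter with the signature
// func([]string) []string. It removes empty entries from the slice.
func dropEmpty(in []string) []string {
	out := make([]string, 0, len(in))
	for _, s := range in {
		if s != "" {
			out = append(out, s)
		}
	}
	if len(out) == 0 {
		// Never return an empty slice: fall back to one empty string.
		return []string{""}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", dropEmpty([]string{"a", "", "b"})) // ["a" "b"]
	fmt.Printf("%q\n", dropEmpty([]string{"", ""}))       // [""]
}
```

A function like this could then be passed as the fn argument of AddComplexFilter.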
func (*Querier) AddFilter ¶
AddFilter adds a simple filter to the Querier struct. A simple filter runs once on each entry in the slice, replacing each entry with the filtered version.
func (*Querier) AddSplitter ¶
AddSplitter adds a splitter filter to the Querier. A splitter runs on each entry in the slice and may or may not turn that entry into multiple entries.
func (*Querier) After ¶
After adds a filter that removes any text before and including the first instance of the given separator sep.
func (*Querier) Before ¶
Before adds a filter that removes any text after and including the first instance of the given separator sep.
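The string handling described for After and Before can be sketched with strings.Cut (Go 1.18 or later). The helpers after and before are hypothetical, and leaving the string unchanged when sep is absent is an assumption, not documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// after keeps the text following the first occurrence of sep.
// Assumption: the string is returned unchanged when sep is absent.
func after(s, sep string) string {
	if _, rest, found := strings.Cut(s, sep); found {
		return rest
	}
	return s
}

// before keeps the text preceding the first occurrence of sep.
// strings.Cut returns the whole string as the head when sep is absent.
func before(s, sep string) string {
	head, _, _ := strings.Cut(s, sep)
	return head
}

func main() {
	s := "OPEN 18:00 / START 19:00"
	fmt.Println(after(s, " / "))  // START 19:00
	fmt.Println(before(s, " / ")) // OPEN 18:00
}
```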
func (*Querier) BeforeSelector ¶
BeforeSelector sets an endSelector, and will ensure that only text before the selector specified is selected.
func (*Querier) CutWrapper ¶
CutWrapper adds a filter to the querier that removes a wrapping prefix and suffix only if both are present.
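The "only if both are present" rule can be sketched as follows. The helper cutWrapper is hypothetical and only mirrors the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// cutWrapper strips prefix and suffix only when both are present.
// The length check keeps prefix and suffix from overlapping.
func cutWrapper(s, prefix, suffix string) string {
	if len(s) >= len(prefix)+len(suffix) &&
		strings.HasPrefix(s, prefix) && strings.HasSuffix(s, suffix) {
		return s[len(prefix) : len(s)-len(suffix)]
	}
	return s
}

func main() {
	fmt.Println(cutWrapper("(Band Name)", "(", ")")) // Band Name
	fmt.Println(cutWrapper("(solo", "(", ")"))       // (solo
}
```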
func (*Querier) DeleteFrom ¶
DeleteFrom adds a complex filter that deletes every item starting at the first item with the specified value s.
func (*Querier) DeleteUntil ¶
DeleteUntil adds a complex filter that deletes every item up to and including the first item with the specified value s.
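A sketch of the slice operations that DeleteFrom and DeleteUntil describe. The helpers are hypothetical. Matching on the first occurrence, leaving the slice unchanged when the value is absent, and falling back to a single empty string are all assumptions made for this illustration.

```go
package main

import "fmt"

// nonEmpty follows the advice for complex filters: never return an empty slice.
func nonEmpty(items []string) []string {
	if len(items) == 0 {
		return []string{""}
	}
	return items
}

// deleteFrom drops the first item equal to v and everything after it.
func deleteFrom(items []string, v string) []string {
	for i, it := range items {
		if it == v {
			return nonEmpty(items[:i])
		}
	}
	return items // assumption: unchanged when v is absent
}

// deleteUntil drops everything up to and including the first item equal to v.
func deleteUntil(items []string, v string) []string {
	for i, it := range items {
		if it == v {
			return nonEmpty(items[i+1:])
		}
	}
	return items // assumption: unchanged when v is absent
}

func main() {
	items := []string{"A", "B", "GUEST", "C"}
	fmt.Printf("%q\n", deleteFrom(items, "GUEST"))  // ["A" "B"]
	fmt.Printf("%q\n", deleteUntil(items, "GUEST")) // ["C"]
}
```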
func (*Querier) Execute ¶
Execute executes the query. It is only used internally in the core. Please do not call it in connectors.
func (*Querier) FilterArtist ¶
FilterArtist is meant to be run on a querier that has fetched title and artist, without knowing which. It will then try to return only the artist to the best of its ability.
exp is the expected separator regex (FilterArtist will NOT split; you must do that separately afterwards).
i is the most common index for the artist to have (used as a fallback).
func (*Querier) FilterTitle ¶
FilterTitle is meant to be run on a querier that has fetched title and artist, without knowing which. It will then try to return only the title to the best of its ability.
exp is the expected separator regex for artists.
i is the most common index for the title to have (used as a fallback).
func (*Querier) HalfWidth ¶
HalfWidth adds a filter that forces fullwidth alphanumeric characters to halfwidth characters. This is typically useful for sites that use fullwidth numbers for dates.
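The conversion can be sketched by shifting code points: the fullwidth forms U+FF10 to U+FF19, U+FF21 to U+FF3A, and U+FF41 to U+FF5A sit exactly 0xFEE0 above their ASCII counterparts. The helper halfWidth is hypothetical, and the package may be implemented differently.

```go
package main

import (
	"fmt"
	"strings"
)

// halfWidth maps fullwidth ASCII digits and letters to their halfwidth
// forms and leaves every other rune untouched.
func halfWidth(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= '\uFF10' && r <= '\uFF19', // fullwidth 0-9
			r >= '\uFF21' && r <= '\uFF3A', // fullwidth A-Z
			r >= '\uFF41' && r <= '\uFF5A': // fullwidth a-z
			return r - 0xFEE0
		}
		return r
	}, s)
}

func main() {
	fmt.Println(halfWidth("２０２４年１月５日")) // 2024年1月5日
}
```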
func (*Querier) Join ¶
Join concatenates all the strings in the slice into one, using the separator sep.
func (*Querier) KeepIndex ¶
KeepIndex keeps only the element at the specified index, or an empty string if it does not exist. A negative index counts backwards from the last index.
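A sketch of the index lookup, assuming that -1 refers to the last element, -2 to the one before it, and so on. The helper keepIndex is hypothetical and returns a plain string for brevity.

```go
package main

import "fmt"

// keepIndex returns the element at index i, or "" if it does not exist.
// Assumption: negative indexes count from the end, so -1 is the last element.
func keepIndex(items []string, i int) string {
	if i < 0 {
		i += len(items)
	}
	if i < 0 || i >= len(items) {
		return ""
	}
	return items[i]
}

func main() {
	items := []string{"a", "b", "c"}
	fmt.Printf("%q\n", keepIndex(items, 1))  // "b"
	fmt.Printf("%q\n", keepIndex(items, -1)) // "c"
	fmt.Printf("%q\n", keepIndex(items, 5))  // ""
}
```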
func (*Querier) ReplaceAll ¶
ReplaceAll adds a filter that replaces all instances of a string old with string new.
func (*Querier) ReplaceAllRegex ¶
ReplaceAllRegex adds a filter that replaces all matches of a regular expression exp with string new. ReplaceAllRegex uses the regexp package's ReplaceAllString method under the hood, so use $1, $2, etc. for groups.
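Group references follow the rules of the standard regexp package, which can be tried out directly. The pattern and input below are made up for illustration. Note that regexp reads $1x as the group named "1x", so write ${1} when a letter, digit, or underscore follows the reference.

```go
package main

import (
	"fmt"
	"regexp"
)

// swapRe is an illustrative pattern that captures two " - " separated parts.
var swapRe = regexp.MustCompile(`^(.+) - (.+)$`)

func main() {
	// Swap the two captured groups.
	fmt.Println(swapRe.ReplaceAllString("Foo - Bar", "$2 / $1")) // Bar / Foo
	// Braces delimit the group number when a word character follows it.
	fmt.Println(swapRe.ReplaceAllString("Foo - Bar", "${1}_x")) // Foo_x
}
```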
func (*Querier) SplitIgnoreWithin ¶
SplitIgnoreWithin adds a splitter that splits using a given separator, while ignoring that separator if it is within a set of left and right brackets.
For instance, often, the slash character "/" is used as a separator between artists on websites. However, slash may also appear often in parentheses on individual artists to denote things like features etc.
In this case, we can use SplitIgnoreWithin to separate on "/", while ensuring splitting does not occur within the parentheses used by the site.
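The idea can be sketched with a bracket depth counter. The helper splitIgnoreWithin below is a hypothetical reimplementation for illustration, not the package's code. It assumes a non-empty sep and distinct left and right runes.

```go
package main

import (
	"fmt"
	"strings"
)

// splitIgnoreWithin splits s on sep, skipping separators that appear
// between the bracket runes l and r.
func splitIgnoreWithin(s, sep string, l, r rune) []string {
	var parts []string
	depth, start := 0, 0
	for i, c := range s {
		switch {
		case i < start:
			// Still inside a separator that was already consumed.
		case c == l:
			depth++
		case c == r && depth > 0:
			depth--
		case depth == 0 && strings.HasPrefix(s[i:], sep):
			parts = append(parts, s[start:i])
			start = i + len(sep)
		}
	}
	return append(parts, s[start:])
}

func main() {
	fmt.Printf("%q\n", splitIgnoreWithin("A (feat. B/C)/D", "/", '(', ')'))
	// ["A (feat. B/C)" "D"]
}
```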
func (*Querier) SplitIndex ¶
SplitIndex adds a splitter that splits using a string, but only returns the entry at index i, or an empty string if index i does not exist.
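A sketch of the split-then-pick behaviour using strings.Split. The helper splitIndex is hypothetical. Whether the package accepts negative indexes here is not stated, so this sketch simply treats them as missing.

```go
package main

import (
	"fmt"
	"strings"
)

// splitIndex splits s on sep and returns the part at index i,
// or "" when that index does not exist.
func splitIndex(s, sep string, i int) string {
	parts := strings.Split(s, sep)
	if i < 0 || i >= len(parts) {
		return "" // assumption: negative indexes are treated as missing
	}
	return parts[i]
}

func main() {
	fmt.Printf("%q\n", splitIndex("2024-01-05 Fri", " ", 0)) // "2024-01-05"
	fmt.Printf("%q\n", splitIndex("2024-01-05 Fri", " ", 5)) // ""
}
```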
func (*Querier) SplitRegex ¶
SplitRegex adds a splitter that splits using a regular expression exp.
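Splitting on a regular expression can be tried with the standard regexp package. The pattern below is made up for illustration, and whether SplitRegex uses the Split method of regexp.Regexp internally is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// sepRe is an illustrative pattern: a slash or comma with optional
// surrounding whitespace.
var sepRe = regexp.MustCompile(`\s*[/,]\s*`)

func main() {
	// A negative count returns all substrings.
	fmt.Printf("%q\n", sepRe.Split("A / B, C", -1)) // ["A" "B" "C"]
}
```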
func (*Querier) SplitRegexIndex ¶
SplitRegexIndex works like SplitIndex, except using regex.
func (*Querier) Trim ¶
Trim adds a filter to the querier that removes any leading and trailing whitespace.
func (*Querier) TrimPrefix ¶
TrimPrefix adds a filter to the querier that removes a specific prefix from the string.
func (*Querier) TrimSuffix ¶
TrimSuffix adds a filter to the querier that removes a specific suffix from the string.