htmlquerier

package
v0.0.0-...-06ea526 Latest
Published: Jul 17, 2024 License: AGPL-3.0 Imports: 6 Imported by: 0

Documentation

Overview

Package htmlquerier provides a standardized, declarative API for creating a querier that fetches a string from a selector, applies filters to it, and sends the result on for processing in the core.

Types

type Querier

type Querier struct {
	// Initialized specifies whether the querier has been initialized or not.
	Initialized bool
	// contains filtered or unexported fields
}

Querier contains details telling livefetcher where to fetch a piece of information, and how to process the fetched text to get it ready for use in livefetcher.

It is recommended not to initialize the struct directly. Instead, use Q or QAll.

func Q

func Q(selector string) *Querier

Q creates a pointer to a Querier struct. A Querier struct initialized using Q will only select the first match and get the string from that.

func QAll

func QAll(selector string) *Querier

QAll creates a pointer to a Querier struct. A Querier struct initialized using QAll will fetch every match of the selector, get the string within each, and collect them all in a slice.

Any basic filters specified will be applied individually to each match.

func (*Querier) AddComplexFilter

func (q *Querier) AddComplexFilter(fn func([]string) []string) *Querier

AddComplexFilter adds a filter that takes the full slice of strings and returns a new slice. This should only be used if you need the full context of the slice, or if you want to be able to remove entries entirely.

Make sure not to return an empty slice; at minimum, return a slice containing a single entry with an empty string.
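
As a rough illustration of the contract described above, the sketch below defines a function with the signature AddComplexFilter expects. The name dropEmpty is hypothetical and not part of the package; it removes empty entries and falls back to a single empty string instead of returning an empty slice.

```go
package main

import "fmt"

// dropEmpty is a hypothetical complex filter (not part of htmlquerier).
// It removes empty entries from the slice. If nothing is left, it returns
// a slice holding one empty string, because the documentation asks
// complex filters never to return an empty slice.
func dropEmpty(in []string) []string {
	out := make([]string, 0, len(in))
	for _, s := range in {
		if s != "" {
			out = append(out, s)
		}
	}
	if len(out) == 0 {
		return []string{""}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", dropEmpty([]string{"a", "", "b"})) // ["a" "b"]
	fmt.Printf("%q\n", dropEmpty([]string{"", ""}))       // [""]
}
```

Given the signature above, such a function could presumably be registered with q.AddComplexFilter(dropEmpty).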

func (*Querier) AddFilter

func (q *Querier) AddFilter(fn func(string) string) *Querier

AddFilter adds a simple filter to the Querier struct. A simple filter runs once on each entry in the slice, replacing each entry with the filtered version.
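
The per-entry behavior described above can be modelled with a few lines of standard-library Go. The helper applyFilter below is a hypothetical stand-in for what the core does with a simple filter; it is a sketch of the described semantics, not the package's code.

```go
package main

import (
	"fmt"
	"strings"
)

// applyFilter is a hypothetical model of a simple filter: fn runs once
// per entry, and each entry is replaced by its filtered version.
func applyFilter(entries []string, fn func(string) string) []string {
	out := make([]string, len(entries))
	for i, s := range entries {
		out[i] = fn(s)
	}
	return out
}

func main() {
	// strings.ToLower already has the func(string) string shape.
	fmt.Printf("%q\n", applyFilter([]string{"Foo", "BAR"}, strings.ToLower))
}
```

Any function of type func(string) string, such as strings.ToLower or strings.TrimSpace, matches the parameter type of AddFilter.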

func (*Querier) AddSplitter

func (q *Querier) AddSplitter(fn func(string) []string) *Querier

AddSplitter adds a splitter filter for the Querier, which iterates over the slice, and may or may not turn the entry into multiple entries.
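
A splitter differs from a simple filter in that one entry may become several. The sketch below models that flattening step; applySplitter is a hypothetical helper written for this example and is an assumption about the behavior, not the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// applySplitter is a hypothetical model of a splitter: fn is called on
// every entry, and all returned pieces are appended to one flat slice.
func applySplitter(entries []string, fn func(string) []string) []string {
	var out []string
	for _, s := range entries {
		out = append(out, fn(s)...)
	}
	return out
}

func main() {
	// A splitter with the func(string) []string shape AddSplitter expects.
	onAmpersand := func(s string) []string { return strings.Split(s, " & ") }
	fmt.Printf("%q\n", applySplitter([]string{"A & B", "C"}, onAmpersand))
}
```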

func (*Querier) After

func (q *Querier) After(sep string) *Querier

After adds a filter that removes any text before and including the first instance of given separator sep.

func (*Querier) Before

func (q *Querier) Before(sep string) *Querier

Before adds a filter that removes any text after and including the first instance of given separator sep.
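
The effect of After and Before on a single string can be approximated with strings.Cut. The helpers after and before below are hypothetical; in particular, leaving the string unchanged when sep is absent is an assumption, since the documentation does not say what happens in that case.

```go
package main

import (
	"fmt"
	"strings"
)

// after drops everything up to and including the first occurrence of sep.
// Assumption: the string is returned unchanged when sep does not occur.
func after(s, sep string) string {
	if _, rest, found := strings.Cut(s, sep); found {
		return rest
	}
	return s
}

// before drops the first occurrence of sep and everything following it.
// strings.Cut returns the whole string as the head when sep is absent.
func before(s, sep string) string {
	head, _, _ := strings.Cut(s, sep)
	return head
}

func main() {
	fmt.Println(after("Artist - Title - Live", " - "))  // Title - Live
	fmt.Println(before("Artist - Title - Live", " - ")) // Artist
}
```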

func (*Querier) BeforeSelector

func (q *Querier) BeforeSelector(selector string) *Querier

BeforeSelector sets an endSelector, and will ensure that only text before the selector specified is selected.

func (*Querier) CutWrapper

func (q *Querier) CutWrapper(prefix, suffix string) *Querier

CutWrapper adds a filter to the querier that removes a wrapping prefix and suffix only if both are present.
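
The "only if both are present" condition is the key detail here. A minimal sketch of that check, using a hypothetical helper cutWrapper built on the standard strings package:

```go
package main

import (
	"fmt"
	"strings"
)

// cutWrapper is a hypothetical model of the CutWrapper filter: prefix and
// suffix are stripped only when the string has both. The length check
// keeps prefix and suffix from overlapping on very short strings.
func cutWrapper(s, prefix, suffix string) string {
	if len(s) >= len(prefix)+len(suffix) &&
		strings.HasPrefix(s, prefix) &&
		strings.HasSuffix(s, suffix) {
		return s[len(prefix) : len(s)-len(suffix)]
	}
	return s
}

func main() {
	fmt.Println(cutWrapper("[Title]", "[", "]")) // Title
	fmt.Println(cutWrapper("[Title", "[", "]"))  // [Title
}
```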

func (*Querier) DeleteFrom

func (q *Querier) DeleteFrom(s string) *Querier

DeleteFrom adds a complex filter that deletes every item starting at the first item with the specified value s.

func (*Querier) DeleteUntil

func (q *Querier) DeleteUntil(s string) *Querier

DeleteUntil adds a complex filter that deletes every item up to and including the first item with the specified value s.
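
One way to picture DeleteFrom and DeleteUntil is as slice truncation around the first matching item. The helpers below are hypothetical. They assume that DeleteFrom also removes the matching item, that the slice is left unchanged when no item matches, and that an emptied result is replaced by a single empty string, in line with the advice given for complex filters.

```go
package main

import "fmt"

// nonEmpty applies the "never return an empty slice" advice.
func nonEmpty(items []string) []string {
	if len(items) == 0 {
		return []string{""}
	}
	return items
}

// deleteFrom drops the first item equal to s and everything after it.
func deleteFrom(items []string, s string) []string {
	for i, item := range items {
		if item == s {
			return nonEmpty(items[:i])
		}
	}
	return items // assumption: no match leaves the slice unchanged
}

// deleteUntil drops everything up to and including the first item equal to s.
func deleteUntil(items []string, s string) []string {
	for i, item := range items {
		if item == s {
			return nonEmpty(items[i+1:])
		}
	}
	return items // assumption: no match leaves the slice unchanged
}

func main() {
	items := []string{"intro", "SETLIST", "song 1", "song 2"}
	fmt.Printf("%q\n", deleteFrom(items, "SETLIST"))  // ["intro"]
	fmt.Printf("%q\n", deleteUntil(items, "SETLIST")) // ["song 1" "song 2"]
}
```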

func (*Querier) Execute

func (q *Querier) Execute(n *html.Node) (a []string, err error)

Execute executes the query. This is only used internally in the core; please do not call it in connectors.

func (*Querier) FilterArtist

func (q *Querier) FilterArtist(exp string, i int) *Querier

FilterArtist is meant to be run on a querier that has fetched both title and artist, without knowing which is which. It will then try to return only the artist to the best of its ability.

exp is the expected separator regex (FilterArtist will NOT split; you must do that separately afterwards).

i is the index the artist most commonly has (used as a fallback).

func (*Querier) FilterTitle

func (q *Querier) FilterTitle(exp string, i int) *Querier

FilterTitle is meant to be run on a querier that has fetched both title and artist, without knowing which is which. It will then try to return only the title to the best of its ability.

exp is the expected separator regex for artists.

i is the index the title most commonly has (used as a fallback).

func (*Querier) HalfWidth

func (q *Querier) HalfWidth() *Querier

HalfWidth adds a filter that forces fullwidth alphanumeric characters to halfwidth characters. This is typically useful for sites that use fullwidth numbers for dates.
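
Fullwidth ASCII variants sit at a fixed offset of 0xFEE0 from their halfwidth counterparts in Unicode, so the conversion can be sketched with strings.Map. The helper halfWidth below is hypothetical and only covers digits and Latin letters, which is one reading of "alphanumeric"; the package may handle more or fewer characters.

```go
package main

import (
	"fmt"
	"strings"
)

// halfWidth is a hypothetical model of the HalfWidth filter. It maps
// fullwidth digits (U+FF10..U+FF19) and fullwidth Latin letters
// (U+FF21..U+FF3A, U+FF41..U+FF5A) to ASCII by subtracting 0xFEE0.
// All other runes are left untouched.
func halfWidth(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= '\uFF10' && r <= '\uFF19',
			r >= '\uFF21' && r <= '\uFF3A',
			r >= '\uFF41' && r <= '\uFF5A':
			return r - 0xFEE0
		}
		return r
	}, s)
}

func main() {
	fmt.Println(halfWidth("２０２４年７月１７日")) // 2024年7月17日
}
```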

func (*Querier) Join

func (q *Querier) Join(sep string) *Querier

Join concatenates all the strings in the slice into one, using the separator sep.

func (*Querier) KeepIndex

func (q *Querier) KeepIndex(i int) *Querier

KeepIndex keeps only the element at the specified index i, or an empty string if that index does not exist. A negative index counts backwards from the last element.
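
The index handling can be sketched as follows. The helper keepIndex is hypothetical, and it assumes that -1 refers to the last element, -2 to the one before it, and so on; the documentation does not spell out the exact convention.

```go
package main

import "fmt"

// keepIndex is a hypothetical model of the KeepIndex filter. A negative
// index is resolved relative to the end of the slice (assumption: -1 is
// the last element). An out-of-range index yields a single empty string.
func keepIndex(items []string, i int) []string {
	if i < 0 {
		i += len(items)
	}
	if i < 0 || i >= len(items) {
		return []string{""}
	}
	return []string{items[i]}
}

func main() {
	items := []string{"a", "b", "c"}
	fmt.Printf("%q\n", keepIndex(items, 0))  // ["a"]
	fmt.Printf("%q\n", keepIndex(items, -1)) // ["c"]
	fmt.Printf("%q\n", keepIndex(items, 5))  // [""]
}
```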

func (*Querier) Prefix

func (q *Querier) Prefix(p string) *Querier

Prefix adds a filter that adds a prefix p in front of the string.

func (*Querier) ReplaceAll

func (q *Querier) ReplaceAll(old, new string) *Querier

ReplaceAll adds a filter that replaces all instances of the string old with the string new.

func (*Querier) ReplaceAllRegex

func (q *Querier) ReplaceAllRegex(exp, new string) *Querier

ReplaceAllRegex adds a filter that replaces all matches of a regular expression exp with the string new. ReplaceAllRegex uses (*regexp.Regexp).ReplaceAllString under the hood, so use $1, $2, etc. for groups.
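
Since group references follow the rules of the standard regexp package, a small standalone example of ReplaceAllString may help. The wrapper replaceAllRegex is a hypothetical helper for this example, not the package's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceAllRegex is a hypothetical helper that compiles exp and replaces
// every match in s with repl, expanding $1, $2, ... to capture groups.
func replaceAllRegex(s, exp, repl string) string {
	return regexp.MustCompile(exp).ReplaceAllString(s, repl)
}

func main() {
	// Reorder a slash-separated date into a dash-separated one.
	fmt.Println(replaceAllRegex("2024/07/17", `(\d{4})/(\d{2})/(\d{2})`, "$1-$2-$3"))
	// Prints: 2024-07-17
}
```

In the regexp package, a reference such as $1x is read as the group named "1x", so write ${1}x when a group is followed directly by a letter, digit, or underscore.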

func (*Querier) Split

func (q *Querier) Split(sep string) *Querier

Split adds a splitter that splits on a given separator string sep.

func (*Querier) SplitIgnoreWithin

func (q *Querier) SplitIgnoreWithin(sep string, l, r rune) *Querier

SplitIgnoreWithin adds a splitter that splits using a given separator sep, while ignoring that separator if it is within a pair of left and right brackets, l and r.

For instance, the slash character "/" is often used as a separator between artists on websites. However, a slash may also appear inside parentheses on individual artists to denote things like features.

In this case, we can use SplitIgnoreWithin to separate on "/", while ensuring splitting does not occur within the parentheses used by the site.
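
The artist example above can be sketched by tracking bracket depth while scanning the string. The helper splitIgnoreWithin below is hypothetical; it assumes that brackets may nest, that l and r are different runes, and that an empty separator leaves the string unsplit. The package may treat these cases differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// splitIgnoreWithin is a hypothetical model of the splitter: it splits s
// on sep, but only where the current bracket depth is zero.
func splitIgnoreWithin(s, sep string, l, r rune) []string {
	if sep == "" {
		return []string{s} // assumption: nothing to split on
	}
	var out []string
	depth, start := 0, 0
	for i := 0; i < len(s); {
		c, size := utf8.DecodeRuneInString(s[i:])
		switch {
		case c == l:
			depth++
		case c == r && depth > 0:
			depth--
		case depth == 0 && strings.HasPrefix(s[i:], sep):
			out = append(out, s[start:i])
			i += len(sep)
			start = i
			continue
		}
		i += size
	}
	return append(out, s[start:])
}

func main() {
	fmt.Printf("%q\n", splitIgnoreWithin("A (feat. B/C)/D", "/", '(', ')'))
	// Prints: ["A (feat. B/C)" "D"]
}
```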

func (*Querier) SplitIndex

func (q *Querier) SplitIndex(sep string, i int) *Querier

SplitIndex adds a splitter that splits using a separator string sep, but only returns the entry at index i, or an empty string if index i does not exist.
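
For a single string, the behavior described above amounts to a split followed by a bounds-checked lookup. The helper splitIndex is hypothetical, and treating a negative index as out of range is an assumption, because the documentation only mentions negative indexes for KeepIndex.

```go
package main

import (
	"fmt"
	"strings"
)

// splitIndex is a hypothetical model of the SplitIndex splitter: split s
// on sep and keep only the piece at index i, or "" if it does not exist.
func splitIndex(s, sep string, i int) string {
	parts := strings.Split(s, sep)
	if i < 0 || i >= len(parts) {
		return ""
	}
	return parts[i]
}

func main() {
	fmt.Println(splitIndex("Artist - Title", " - ", 1))       // Title
	fmt.Printf("%q\n", splitIndex("Artist - Title", " - ", 2)) // ""
}
```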

func (*Querier) SplitRegex

func (q *Querier) SplitRegex(exp string) *Querier

SplitRegex adds a splitter that splits using a regular expression exp.

func (*Querier) SplitRegexIndex

func (q *Querier) SplitRegexIndex(exp string, i int) *Querier

SplitRegexIndex works like SplitIndex, except using regex.
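
The regex variant can be pictured the same way, using (*regexp.Regexp).Split from the standard library. The helper splitRegexIndex below is a hypothetical sketch under the same assumptions as for SplitIndex: an out-of-range or negative index yields an empty string.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitRegexIndex is a hypothetical model of the SplitRegexIndex
// splitter: split s on every match of exp and keep only the piece at
// index i, or "" if that index does not exist.
func splitRegexIndex(s, exp string, i int) string {
	parts := regexp.MustCompile(exp).Split(s, -1)
	if i < 0 || i >= len(parts) {
		return ""
	}
	return parts[i]
}

func main() {
	// Accept either " - " or " / " (with any amount of spacing) as separator.
	fmt.Println(splitRegexIndex("Artist  /  Title", `\s+[-/]\s+`, 1)) // Title
}
```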

func (*Querier) Trim

func (q *Querier) Trim() *Querier

Trim adds a filter to the querier that removes any leading and trailing whitespace.

func (*Querier) TrimPrefix

func (q *Querier) TrimPrefix(prefix string) *Querier

TrimPrefix adds a filter to the querier that removes a specific prefix from the string.

func (*Querier) TrimSuffix

func (q *Querier) TrimSuffix(suffix string) *Querier

TrimSuffix adds a filter to the querier that removes a specific suffix from the string.
