Documentation ¶
Index ¶
- Variables
- func FilterValidNodes(list []*html.Node, filters []FilterErrFunc) ([]*html.Node, error)
- func GetDirectChildren(node *html.Node) []*html.Node
- type ActionType
- type Context
- func (c *Context) Close()
- func (c *Context) Context() context.Context
- func (c *Context) GetLastPage(url string, wait WaitFunc, f ExtractFunc[int]) (int, error)
- func (c *Context) GetNodes(sel any, opt func(*chromedp.Selector)) ([]*cdp.Node, error)
- func (c *Context) NewSubContext() *Context
- func (c *Context) ParseHTML(url string, loadedSignal ...chromedp.Action) (*html.Node, error)
- func (c *Context) RunTasks(tasks chromedp.Tasks) error
- type ErrNoDataNodeFound
- type ErrNoNodesFound
- type ErrNoTextNodeFound
- type ExtractFunc
- type FilterErrFunc
- type GTEFunc
- type HtmlTree
- func (t *HtmlTree) ExtractContentFromDocument(matchFun slext.PredicateFilter[*html.Node]) *html.Node
- func (t *HtmlTree) ExtractNodes(criterias ...slext.PredicateFilter[*html.Node]) []*html.Node
- func (t *HtmlTree) ExtractSpecificNode(matchFun slext.PredicateFilter[*html.Node]) []*html.Node
- func (t *HtmlTree) MatchNodes(matchFun slext.PredicateFilter[*html.Node]) []*html.Node
- type NodeListParser
- type SearchCriteria
- type WaitFunc
Constants ¶
This section is empty.
Variables ¶
var (
	// FilterNilFEFuncs is a predicate filter that filters out nil FilterErrFuncs.
	FilterNilFEFuncs us.PredicateFilter[FilterErrFunc]
)
var (
	// GetChildrenFunc is a function that returns the children of an HTML node.
	GetChildrenFunc tlt.NextsFunc[*html.Node] = func(elem *html.Node, info uc.Copier) ([]*html.Node, error) {
		if elem == nil {
			return nil, ers.NewErrNilValue()
		}
		children := make([]*html.Node, 0)
		for c := elem.FirstChild; c != nil; c = c.NextSibling {
			children = append(children, c)
		}
		return children, nil
	}
)
var IsTextNodeSearch slext.PredicateFilter[*html.Node] = NewSearchCriteria(html.TextNode).Build()
IsTextNodeSearch is a search criteria that matches text nodes.
Functions ¶
func FilterValidNodes ¶ added in v0.3.15
FilterValidNodes filters the valid nodes from a list.
Parameters:
- list: The list of nodes to filter.
- filters: The functions to check if a node is valid.
Returns:
- []*html.Node: The list of valid nodes.
- error: An error if no valid nodes are found.
Behaviors:
- If no valid nodes are found, the function returns the first error encountered.
- If list is empty or filters is empty, the function returns list, nil.
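The behaviors listed above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: `node`, `filterErrFunc`, `filterValid`, `check`, and `isDiv` are hypothetical names, `node` stands in for `*html.Node`, and the rule that a node must pass every filter to count as valid is an assumption the documentation does not confirm.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for *html.Node, so the sketch needs only the standard library.
type node struct {
	Data string
}

// filterErrFunc mirrors the documented shape of FilterErrFunc: a nil error means the node passes.
type filterErrFunc func(n *node) error

// check runs every filter against n and returns the first failure, if any.
// Assumption: a node is valid only when all filters accept it.
func check(n *node, filters []filterErrFunc) error {
	for _, f := range filters {
		if err := f(n); err != nil {
			return err
		}
	}
	return nil
}

// filterValid keeps the valid nodes. With an empty list or no filters it returns
// the list unchanged; when nothing is valid it returns the first error seen.
func filterValid(list []*node, filters []filterErrFunc) ([]*node, error) {
	if len(list) == 0 || len(filters) == 0 {
		return list, nil
	}
	var valid []*node
	var firstErr error
	for _, n := range list {
		if err := check(n, filters); err == nil {
			valid = append(valid, n)
		} else if firstErr == nil {
			firstErr = err
		}
	}
	if len(valid) == 0 {
		return nil, firstErr
	}
	return valid, nil
}

// isDiv is an example filter that rejects anything other than a "div".
func isDiv(n *node) error {
	if n.Data != "div" {
		return errors.New("not a div")
	}
	return nil
}

func main() {
	list := []*node{{Data: "p"}, {Data: "div"}}
	valid, err := filterValid(list, []filterErrFunc{isDiv})
	fmt.Println(len(valid), err)
}
```

Because an error is returned only when every node fails, a caller can treat a nil error as a guarantee that the result is either the untouched input or a non-empty list.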
Types ¶
type ActionType ¶ added in v0.3.15
type ActionType int8
ActionType is an enumeration of the different actions that can be performed on a node.
const (
	// OnlyDirectChildren is an action that extracts only the direct children of a node.
	OnlyDirectChildren ActionType = iota
	// DFSOne is an action that extracts only one node using depth-first search.
	DFSOne
	// BFSMany is an action that extracts multiple nodes using breadth-first search.
	BFSMany
)
type Context ¶ added in v0.3.13
type Context struct {
// contains filtered or unexported fields
}
Context is the context of the session.
func InitializeContext ¶ added in v0.3.13
func InitializeContext() *Context
InitializeContext initializes a new context.
Returns:
- *Context: The new context.
func (*Context) Context ¶ added in v0.3.13
Context returns the context of the session.
Returns:
- context.Context: The context of the session.
func (*Context) GetLastPage ¶ added in v0.3.13
GetLastPage loads the page at the given URL and extracts its last page number.
Parameters:
- url: The URL of the page.
- wait: The function that waits for the page to load.
- f: The function to extract the last page from the HTML.
Returns:
- int: The last page of the URL.
- error: The error that occurred while getting the last page.
func (*Context) GetNodes ¶ added in v0.3.13
GetNodes gets the nodes that match the selector.
Parameters:
- sel: The selector of the nodes.
- opt: The options of the selector.
Returns:
- []*cdp.Node: The nodes that match the selector.
- error: The error that occurred while getting the nodes.
func (*Context) NewSubContext ¶ added in v0.3.13
NewSubContext creates a new sub context.
Returns:
- *Context: The new sub context.
type ErrNoDataNodeFound ¶ added in v0.3.15
type ErrNoDataNodeFound struct {
	// Data is the data that was not found.
	Data string
}
ErrNoDataNodeFound is an error that is returned when no data nodes are found.
func NewErrNoDataNodeFound ¶ added in v0.3.15
func NewErrNoDataNodeFound(data string) *ErrNoDataNodeFound
NewErrNoDataNodeFound creates a new ErrNoDataNodeFound error.
Parameters:
- data: The data that was not found.
Returns:
- *ErrNoDataNodeFound: The new error.
func (*ErrNoDataNodeFound) Error ¶ added in v0.3.15
func (e *ErrNoDataNodeFound) Error() string
Error implements the error interface.
It returns the error message: "no <data> tags found".
type ErrNoNodesFound ¶ added in v0.3.15
type ErrNoNodesFound struct{}
ErrNoNodesFound is an error that is returned when no nodes are found.
func NewErrNoNodesFound ¶ added in v0.3.15
func NewErrNoNodesFound() *ErrNoNodesFound
NewErrNoNodesFound creates a new ErrNoNodesFound error.
Returns:
- *ErrNoNodesFound: The new error.
func (*ErrNoNodesFound) Error ¶ added in v0.3.15
func (e *ErrNoNodesFound) Error() string
Error implements the error interface.
It returns the error message: "no nodes found".
type ErrNoTextNodeFound ¶ added in v0.3.15
type ErrNoTextNodeFound struct {
IsFirstChild bool
}
ErrNoTextNodeFound is an error that is returned when no text nodes are found.
func NewErrNoTextNodeFound ¶ added in v0.3.15
func NewErrNoTextNodeFound(isFirstChild bool) *ErrNoTextNodeFound
NewErrNoTextNodeFound creates a new ErrNoTextNodeFound error.
Parameters:
- isFirstChild: Whether the first child is not a text node.
Returns:
- *ErrNoTextNodeFound: The new error.
func (*ErrNoTextNodeFound) Error ¶ added in v0.3.15
func (e *ErrNoTextNodeFound) Error() string
Error implements the error interface.
It returns the error message: "node is not a text node". However, if IsFirstChild is true, it returns the error message: "first child is not a text node".
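The two messages described above, and one way a caller might branch on the flag, can be sketched as follows. `errNoTextNode` and `describe` are hypothetical local names that mirror the documented type rather than import it; only `errors.As` and `fmt.Errorf` from the standard library are real API here.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoTextNode is a local sketch of the documented ErrNoTextNodeFound type.
type errNoTextNode struct {
	IsFirstChild bool
}

// Error returns one of the two documented messages, depending on IsFirstChild.
func (e *errNoTextNode) Error() string {
	if e.IsFirstChild {
		return "first child is not a text node"
	}
	return "node is not a text node"
}

// describe shows how a caller might recover the typed error from a wrapped chain.
func describe(err error) string {
	var target *errNoTextNode
	if errors.As(err, &target) {
		if target.IsFirstChild {
			return "inspect the first child"
		}
		return "inspect the node itself"
	}
	return "unrelated error"
}

func main() {
	err := fmt.Errorf("parsing title: %w", &errNoTextNode{IsFirstChild: true})
	fmt.Println(err)
	fmt.Println(describe(err))
}
```

Returning a pointer to a struct, as the documented constructors do, is what lets `errors.As` match the error even after it has been wrapped.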
type ExtractFunc ¶ added in v0.3.13
ExtractFunc is a function that extracts data from the HTML.
Parameters:
- doc: The HTML node of the page.
Returns:
- T: The data extracted from the HTML.
- error: The error that occurred while extracting the data.
type FilterErrFunc ¶ added in v0.3.15
FilterErrFunc is a function that returns an error if a condition is not met.
Parameters:
- node: The node to check.
Returns:
- error: An error if the condition is not met.
func FilterDataNode ¶ added in v0.3.15
func FilterDataNode(data string) FilterErrFunc
FilterDataNode returns a FilterErrFunc that checks if the node has the specified data.
Parameters:
- data: The data to check for.
Returns:
- FilterErrFunc: The FilterErrFunc that checks if the node has the specified data.
func FilterTextNode ¶ added in v0.3.15
func FilterTextNode(checkFirstChild bool) FilterErrFunc
FilterTextNode returns a FilterErrFunc that checks if the node is a text node.
Parameters:
- checkFirstChild: If true, the function checks if the first child is a text node. Otherwise, it checks if the node itself is a text node.
Returns:
- FilterErrFunc: The FilterErrFunc that checks if the node is a text node.
type GTEFunc ¶ added in v0.3.15
GTEFunc is a function that extracts nodes from a tree.
Parameters:
- tree: The tree to extract nodes from.
Returns:
- []*html.Node: The list of nodes extracted from the tree.
func GenericTreeExtraction ¶ added in v0.3.15
func GenericTreeExtraction(search *SearchCriteria, action ActionType) GTEFunc
GenericTreeExtraction creates a GTEFunc from the given parameters.
Parameters:
- search: The search criteria to use.
- action: The action to perform on the node.
Returns:
- GTEFunc: The created GTEFunc.
Behaviors:
- If search is nil, the function uses a nil filter.
- If action is not recognized, the function returns a GTEFunc that returns nil.
type HtmlTree ¶
type HtmlTree struct {
// contains filtered or unexported fields
}
HtmlTree is a struct that represents an HTML tree.
func NewHtmlTree ¶
NewHtmlTree constructs a tree from an HTML node.
Parameters:
- root: The root HTML node.
Returns:
- *HtmlTree: The tree constructed from the HTML node.
- error: An error if the tree construction fails.
Errors:
- *ers.ErrNilValue: If any html.Node is nil.
func (*HtmlTree) ExtractContentFromDocument ¶
func (t *HtmlTree) ExtractContentFromDocument(matchFun slext.PredicateFilter[*html.Node]) *html.Node
ExtractContentFromDocument performs a depth-first search on an HTML document, finding the first node that matches the provided search criteria.
Parameters:
- matchFun: The search criteria to apply to each node.
Returns:
- *html.Node: The first node that matches the search criteria, nil if no matching node is found.
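A depth-first search that stops at the first match can be sketched as below. The sketch assumes pre-order traversal (a node is tested before its children), which the documentation does not state. `node`, `newNode`, and `findFirst` are hypothetical names, and `node` keeps only the `FirstChild`/`NextSibling` links that `*html.Node` also exposes.

```go
package main

import "fmt"

// node is a hypothetical stand-in for *html.Node; only the fields the search needs are kept.
type node struct {
	Data, ID    string
	FirstChild  *node
	NextSibling *node
}

// newNode builds a node and links its children through FirstChild/NextSibling.
func newNode(data, id string, children ...*node) *node {
	n := &node{Data: data, ID: id}
	var prev *node
	for _, c := range children {
		if prev == nil {
			n.FirstChild = c
		} else {
			prev.NextSibling = c
		}
		prev = c
	}
	return n
}

// findFirst performs a pre-order depth-first search and returns the first
// node accepted by match, or nil when no node matches.
func findFirst(root *node, match func(*node) bool) *node {
	if root == nil {
		return nil
	}
	if match(root) {
		return root
	}
	for c := root.FirstChild; c != nil; c = c.NextSibling {
		if found := findFirst(c, match); found != nil {
			return found
		}
	}
	return nil
}

func main() {
	doc := newNode("body", "root",
		newNode("div", "box", newNode("p", "deep")),
		newNode("p", "shallow"),
	)
	isP := func(n *node) bool { return n.Data == "p" }
	if found := findFirst(doc, isP); found != nil {
		fmt.Println(found.ID)
	}
}
```

In this tree the nested paragraph is reached before its shallower sibling, which is the observable difference between depth-first and breadth-first order.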
func (*HtmlTree) ExtractNodes ¶
ExtractNodes performs a breadth-first search on an HTML section, returning a slice of nodes that match the provided search criteria.
Parameters:
- criterias: A list of search criteria to apply to each node.
Returns:
- []*html.Node: A slice containing all nodes that match the search criteria.
Behavior:
- If no criteria are provided, then any node will match.
func (*HtmlTree) ExtractSpecificNode ¶
ExtractSpecificNode finds all nodes that match the given search criteria and that are direct children of the provided node.
Parameters:
- matchFun: The search criteria to apply to each node.
Returns:
- []*html.Node: A slice containing all nodes that match the search criteria.
Behavior:
- If no criteria are provided, then any node will match.
- If the node is nil, then a nil slice is returned.
func (*HtmlTree) MatchNodes ¶
MatchNodes performs a breadth-first search on an HTML section, returning a slice of nodes that match the provided search criteria.
Parameters:
- matchFun: The search criteria to apply to each node.
Returns:
- []*html.Node: A slice containing all nodes that match the search criteria.
Behavior:
- It does not search the children of the nodes that match the criteria.
- If no criteria are provided, then the first node will match.
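The rule that matched nodes are not searched further can be illustrated with a small queue-based traversal. This is a sketch under assumptions, not the package's code: `node`, `newNode`, and `matchNodes` are hypothetical names, and `node` is a reduced stand-in for `*html.Node`.

```go
package main

import "fmt"

// node is a hypothetical stand-in for *html.Node.
type node struct {
	Data, ID    string
	FirstChild  *node
	NextSibling *node
}

// newNode builds a node and links its children through FirstChild/NextSibling.
func newNode(data, id string, children ...*node) *node {
	n := &node{Data: data, ID: id}
	var prev *node
	for _, c := range children {
		if prev == nil {
			n.FirstChild = c
		} else {
			prev.NextSibling = c
		}
		prev = c
	}
	return n
}

// matchNodes walks the tree breadth-first and collects every matching node.
// The children of a matching node are never enqueued, so nested matches are skipped.
func matchNodes(root *node, match func(*node) bool) []*node {
	if root == nil {
		return nil
	}
	var out []*node
	queue := []*node{root}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if match(n) {
			out = append(out, n)
			continue // do not descend below a match
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			queue = append(queue, c)
		}
	}
	return out
}

func main() {
	doc := newNode("body", "root",
		newNode("div", "outer", newNode("div", "inner")),
		newNode("span", "wrap", newNode("div", "late")),
	)
	for _, n := range matchNodes(doc, func(n *node) bool { return n.Data == "div" }) {
		fmt.Println(n.ID)
	}
}
```

Here the `inner` div is never reported because its parent already matched, while the `late` div is found because its parent, a span, did not.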
type NodeListParser ¶ added in v0.3.15
NodeListParser is a function that parses a list of nodes.
Parameters:
- list: The list of nodes to parse.
Returns:
- T: The parsed value.
- error: An error if the parsing fails.
func CEWithSearch ¶ added in v0.3.15
func CEWithSearch[T any](search *SearchCriteria, action ActionType, parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T]
CEWithSearch creates a NodeListParser from the given parameters.
Parameters:
- search: The search criteria to use.
- action: The action to perform on the node.
- parse: The function that parses the list of nodes.
- filters: The functions that filter the list of nodes.
Returns:
- NodeListParser: The created NodeListParser.
Behaviors:
- If parse is nil, the function returns a NodeListParser that returns the error *errors.ErrInvalidParameter.
- Nil functions in filters are ignored.
- Uses a Stack to traverse the tree and GTEFunc to extract nodes.
- It terminates as soon as a valid result is found.
func CreateExtractor ¶ added in v0.3.15
func CreateExtractor[T any](parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T]
CreateExtractor creates a NodeListParser from the given parameters.
Parameters:
- parse: The function that parses the list of nodes.
- filters: The functions that filter the list of nodes.
Returns:
- NodeListParser: The created NodeListParser.
Behaviors:
- If parse is nil, the function returns a NodeListParser that returns the error *errors.ErrInvalidParameter.
- Nil functions in filters are ignored.
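The composition of filters and a parser described above might look roughly like this. Everything here is a hypothetical stand-in: `node` replaces `*html.Node`, the plain `errors.New` value replaces the library's invalid-parameter error, and the choice to return the first filter error when no node survives is an assumption borrowed from the FilterValidNodes description.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for *html.Node.
type node struct{ Data string }

type filterErrFunc func(n *node) error

type nodeListParser[T any] func(list []*node) (T, error)

// applyAll returns the first error produced by any filter, or nil if all accept n.
func applyAll(n *node, filters []filterErrFunc) error {
	for _, f := range filters {
		if err := f(n); err != nil {
			return err
		}
	}
	return nil
}

// createExtractor wraps parse so that it only sees nodes accepted by the filters.
func createExtractor[T any](parse nodeListParser[T], filters ...filterErrFunc) nodeListParser[T] {
	if parse == nil {
		// Stand-in for the documented invalid-parameter error.
		return func([]*node) (T, error) {
			var zero T
			return zero, errors.New("parse must not be nil")
		}
	}
	// Nil filters are ignored, as documented.
	active := make([]filterErrFunc, 0, len(filters))
	for _, f := range filters {
		if f != nil {
			active = append(active, f)
		}
	}
	return func(list []*node) (T, error) {
		var zero T
		var kept []*node
		var firstErr error
		for _, n := range list {
			if err := applyAll(n, active); err == nil {
				kept = append(kept, n)
			} else if firstErr == nil {
				firstErr = err
			}
		}
		if len(kept) == 0 && firstErr != nil {
			return zero, firstErr
		}
		return parse(kept)
	}
}

// joinData is an example parser that concatenates the Data of each node.
func joinData(list []*node) (string, error) {
	parts := make([]string, 0, len(list))
	for _, n := range list {
		parts = append(parts, n.Data)
	}
	return strings.Join(parts, ","), nil
}

// nonEmpty is an example filter that rejects nodes without data.
func nonEmpty(n *node) error {
	if n.Data == "" {
		return errors.New("empty data")
	}
	return nil
}

func main() {
	extract := createExtractor[string](joinData, nil, nonEmpty)
	out, err := extract([]*node{{Data: "a"}, {Data: ""}, {Data: "b"}})
	fmt.Println(out, err)
}
```

Returning an always-failing parser for a nil `parse`, instead of panicking at construction time, defers the error to the call site where the caller already handles errors.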
type SearchCriteria ¶
type SearchCriteria struct {
	// NodeType specifies the type of the HTML node to search for.
	NodeType html.NodeType
	// Data represents the data contained within the node.
	Data *string
	// Attrs is a slice of attribute key-value pairs to match.
	Attrs []*cdp.Pair[string, slext.PredicateFilter[string]]
}
SearchCriteria is a struct that encapsulates the parameters for searching within an HTML node.
func NewSearchCriteria ¶
func NewSearchCriteria(node_type html.NodeType) *SearchCriteria
NewSearchCriteria constructs a new SearchCriteria instance using the provided parameters.
Parameters:
- node_type: The type of the HTML node to search for.
Returns:
- *SearchCriteria: A new SearchCriteria instance.
func (*SearchCriteria) AppendAttr ¶
func (sc *SearchCriteria) AppendAttr(key string, val slext.PredicateFilter[string]) *SearchCriteria
AppendAttr appends an attribute key and value predicate to the SearchCriteria instance.
Parameters:
- key: The attribute key to match.
- val: The predicate that the attribute value must satisfy.
Returns:
- *SearchCriteria: The SearchCriteria instance with the attribute key-value pair appended.
func (*SearchCriteria) Build ¶
func (sc *SearchCriteria) Build() slext.PredicateFilter[*html.Node]
Build constructs a slext.PredicateFilter function from the search criteria.
Returns:
- slext.PredicateFilter: A function that matches the search criteria.
func (*SearchCriteria) SetData ¶
func (sc *SearchCriteria) SetData(data string) *SearchCriteria
SetData sets the data field of the SearchCriteria instance.
Parameters:
- data: The data to set in the SearchCriteria instance.
Returns:
- *SearchCriteria: The SearchCriteria instance with the data field set.
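Since SetData and AppendAttr both return the receiver, the type supports a chained builder style that ends in Build. The sketch below imitates that pattern with local types. All names are hypothetical, `node` and the node-type constants stand in for `golang.org/x/net/html`, and the matching rules (type must be equal, data is compared only when set, every attribute rule must be satisfied) are assumptions rather than documented behavior.

```go
package main

import "fmt"

// Stand-ins for html.NodeType values; the real constants live in golang.org/x/net/html.
const (
	textNode = iota
	elementNode
)

type attr struct{ Key, Val string }

// node is a hypothetical stand-in for *html.Node.
type node struct {
	Type int
	Data string
	Attr []attr
}

// attrRule pairs an attribute key with a predicate on its value.
type attrRule struct {
	key string
	ok  func(string) bool
}

type criteria struct {
	nodeType int
	data     *string // nil means "any data"
	attrs    []attrRule
}

func newCriteria(nodeType int) *criteria { return &criteria{nodeType: nodeType} }

func (c *criteria) setData(data string) *criteria {
	c.data = &data
	return c
}

func (c *criteria) appendAttr(key string, ok func(string) bool) *criteria {
	c.attrs = append(c.attrs, attrRule{key: key, ok: ok})
	return c
}

// satisfied reports whether n carries an attribute named rule.key whose value passes rule.ok.
func satisfied(n *node, rule attrRule) bool {
	for _, a := range n.Attr {
		if a.Key == rule.key && rule.ok(a.Val) {
			return true
		}
	}
	return false
}

// build turns the criteria into a predicate over nodes.
func (c *criteria) build() func(*node) bool {
	return func(n *node) bool {
		if n == nil || n.Type != c.nodeType {
			return false
		}
		if c.data != nil && n.Data != *c.data {
			return false
		}
		for _, rule := range c.attrs {
			if !satisfied(n, rule) {
				return false
			}
		}
		return true
	}
}

func main() {
	isNextLink := newCriteria(elementNode).
		setData("a").
		appendAttr("class", func(v string) bool { return v == "next" }).
		build()
	link := &node{Type: elementNode, Data: "a", Attr: []attr{{Key: "class", Val: "next"}}}
	fmt.Println(isNextLink(link))
}
```

Keeping the data field as a pointer, as the documented struct does, distinguishes "no data constraint" from "data must be the empty string".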