Documentation ¶
Overview ¶
Code generated by go generate; EDIT THIS FILE DIRECTLY
Index ¶
- Variables
- func FilterValidNodes(list []*html.Node, filters []FilterErrFunc) ([]*html.Node, error)
- func GetDirectChildren(node *html.Node) []*html.Node
- type ActionType
- type AttrPair
- type Context
- func (c *Context) Close()
- func (c *Context) Context() context.Context
- func (c *Context) GetLastPage(url string, wait WaitFunc, f ExtractFunc[int]) (int, error)
- func (c *Context) GetNodes(sel any, opt func(*chromedp.Selector)) ([]*cdp.Node, error)
- func (c *Context) NewSubContext() *Context
- func (c *Context) ParseHTML(url string, loadedSignal ...chromedp.Action) (*html.Node, error)
- func (c *Context) RunTasks(tasks chromedp.Tasks) error
- type ErrNoDataNodeFound
- type ErrNoNodesFound
- type ErrNoTextNodeFound
- type ExtractFunc
- type FilterErrFunc
- type GTEFunc
- type HtmlTree
- func (t *HtmlTree) ExtractContentFromDocument(matchFun slext.PredicateFilter[*html.Node]) (*html.Node, error)
- func (t *HtmlTree) ExtractNodes(criterias ...slext.PredicateFilter[*html.Node]) ([]*html.Node, error)
- func (t *HtmlTree) ExtractSpecificNode(matchFun slext.PredicateFilter[*html.Node]) ([]*html.Node, error)
- func (t *HtmlTree) MatchNodes(matchFun slext.PredicateFilter[*html.Node]) ([]*html.Node, error)
- type NodeListParser
- type SearchCriteria
- type TreeNode
- func (tn *TreeNode) AddChild(target tree.Noder)
- func (tn *TreeNode) AddChildren(children []*TreeNode)
- func (tn *TreeNode) Cleanup() []tree.Noder
- func (tn *TreeNode) Copy() common.Copier
- func (tn *TreeNode) DeleteChild(target tree.Noder) []tree.Noder
- func (tn *TreeNode) GetChildren() []*TreeNode
- func (tn *TreeNode) GetFirstChild() tree.Noder
- func (tn *TreeNode) GetFirstSibling() *TreeNode
- func (tn *TreeNode) GetLastSibling() *TreeNode
- func (tn *TreeNode) GetParent() tree.Noder
- func (tn *TreeNode) HasChild(target *TreeNode) bool
- func (tn *TreeNode) IsChildOf(target *TreeNode) bool
- func (tn *TreeNode) IsLeaf() bool
- func (tn *TreeNode) IsSingleton() bool
- func (tn *TreeNode) Iterator() common.Iterater[tree.Noder]
- func (tn *TreeNode) LinkChildren(children []tree.Noder)
- func (tn *TreeNode) RemoveNode() []tree.Noder
- func (tn *TreeNode) String() string
- type TreeNodeIterator
- type WaitFunc
Constants ¶
This section is empty.
Variables ¶
var (
	// IsTextNodeSearch is a search criterion that matches text nodes.
	IsTextNodeSearch slext.PredicateFilter[*html.Node]

	// GetChildrenFunc is a function that returns the children of an HTML node.
	GetChildrenFunc tr.NextsFunc[*TreeNode]
)
var (
	// FilterNilFEFuncs is a predicate filter that filters out nil FilterErrFuncs.
	FilterNilFEFuncs us.PredicateFilter[FilterErrFunc]
)
Functions ¶
func FilterValidNodes ¶
FilterValidNodes filters the valid nodes from a list.
Parameters:
- list: The list of nodes to filter.
- filters: The functions to check if a node is valid.
Returns:
- []*html.Node: The list of valid nodes.
- error: An error if no valid nodes are found.
Behaviors:
- If no valid nodes are found, the function returns the first error encountered.
- If list is empty or filters is empty, the function returns list, nil.
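The documented behavior of FilterValidNodes can be sketched as follows. This is a hypothetical reimplementation, not the package's source: Node stands in for *html.Node, isDiv is an illustrative filter, and the sketch assumes a node is valid only when every filter returns nil for it.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical stand-in for *html.Node; only Data is needed here.
type Node struct {
	Data string
}

// FilterErrFunc mirrors the documented shape (assumed): it returns a non-nil
// error when the node fails the check.
type FilterErrFunc func(node *Node) error

// filterValidNodes is a sketch of the documented FilterValidNodes behavior,
// assuming a node is valid only when every filter returns nil for it.
func filterValidNodes(list []*Node, filters []FilterErrFunc) ([]*Node, error) {
	if len(list) == 0 || len(filters) == 0 {
		return list, nil
	}

	var valid []*Node
	var firstErr error

	for _, node := range list {
		ok := true
		for _, filter := range filters {
			if err := filter(node); err != nil {
				if firstErr == nil {
					firstErr = err // keep only the first failure
				}
				ok = false
				break
			}
		}
		if ok {
			valid = append(valid, node)
		}
	}

	if len(valid) == 0 {
		return nil, firstErr
	}
	return valid, nil
}

// isDiv is an illustrative filter that accepts only "div" nodes.
func isDiv(n *Node) error {
	if n.Data != "div" {
		return errors.New("not a div: " + n.Data)
	}
	return nil
}

func main() {
	valid, err := filterValidNodes([]*Node{{Data: "span"}, {Data: "div"}}, []FilterErrFunc{isDiv})
	fmt.Println(len(valid), err) // 1 <nil>

	_, err = filterValidNodes([]*Node{{Data: "p"}, {Data: "em"}}, []FilterErrFunc{isDiv})
	fmt.Println(err) // not a div: p
}
```

Because the first error is kept, a caller that gets no valid nodes can still report why the first candidate was rejected.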
Types ¶
type ActionType ¶
type ActionType int8
ActionType is an enumeration of the different actions that can be performed on a node.
const (
	// OnlyDirectChildren is an action that extracts only the direct children of a node.
	OnlyDirectChildren ActionType = iota

	// DFSOne is an action that extracts only one node using depth-first search.
	DFSOne

	// BFSMany is an action that extracts multiple nodes using breadth-first search.
	BFSMany
)
type AttrPair ¶
type AttrPair struct {
	// Attr is the attribute key to match.
	Attr string

	// FilterFunc is the filter function to apply to the attribute value.
	FilterFunc slext.PredicateFilter[string]
}
AttrPair is a struct that pairs an attribute key with a filter function applied to that attribute's value.
func NewAttrPair ¶
func NewAttrPair(attr string, filter_func slext.PredicateFilter[string]) *AttrPair
NewAttrPair constructs a new AttrPair instance using the provided parameters.
Parameters:
- attr: The attribute key to match.
- filter_func: The filter function to apply to the attribute value.
Returns:
- *AttrPair: A new AttrPair instance, or nil if filter_func is nil.
func (*AttrPair) Match ¶
Match reports whether the provided attribute matches the AttrPair's attribute key and filter function.
Parameters:
- attr: The attribute key-value pair to match against.
Returns:
- bool: True if the attribute key-value pair matches the provided attribute, false otherwise.
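A minimal sketch of how AttrPair and Match could fit together. It assumes slext.PredicateFilter[T] is a func(T) bool and that Match compares the attribute key with Attr before applying FilterFunc to the value; Attribute is a stand-in for html.Attribute and hasClass is a hypothetical filter.

```go
package main

import (
	"fmt"
	"strings"
)

// PredicateFilter is an assumed stand-in for slext.PredicateFilter[T],
// taken here to be a func(T) bool.
type PredicateFilter[T any] func(T) bool

// Attribute is a hypothetical stand-in for html.Attribute.
type Attribute struct {
	Key, Val string
}

// AttrPair mirrors the documented struct.
type AttrPair struct {
	Attr       string
	FilterFunc PredicateFilter[string]
}

// NewAttrPair returns nil when the filter is nil, as documented.
func NewAttrPair(attr string, filterFunc PredicateFilter[string]) *AttrPair {
	if filterFunc == nil {
		return nil
	}
	return &AttrPair{Attr: attr, FilterFunc: filterFunc}
}

// Match is a sketch: the key must equal Attr and the value must pass FilterFunc.
func (p *AttrPair) Match(attr Attribute) bool {
	return attr.Key == p.Attr && p.FilterFunc(attr.Val)
}

// hasClass is an illustrative filter for space-separated class lists.
func hasClass(name string) PredicateFilter[string] {
	return func(val string) bool {
		for _, c := range strings.Fields(val) {
			if c == name {
				return true
			}
		}
		return false
	}
}

func main() {
	pair := NewAttrPair("class", hasClass("next"))
	fmt.Println(pair.Match(Attribute{Key: "class", Val: "btn next"})) // true
	fmt.Println(pair.Match(Attribute{Key: "id", Val: "next"}))        // false
	fmt.Println(NewAttrPair("class", nil) == nil)                     // true
}
```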
type Context ¶
type Context struct {
// contains filtered or unexported fields
}
Context is the context of the session.
func InitializeContext ¶
func InitializeContext() *Context
InitializeContext initializes a new context.
Returns:
- *Context: The new context.
func (*Context) Context ¶
Context returns the context of the session.
Returns:
- context.Context: The context of the session.
func (*Context) GetLastPage ¶
GetLastPage returns the last page number for the given URL.
Parameters:
- url: The URL of the page.
- wait: The function to wait for the page to load.
- f: The function that extracts the last page number from the HTML.
Returns:
- int: The last page number.
- error: The error that occurred while getting the last page.
func (*Context) GetNodes ¶
GetNodes gets the nodes that match the selector.
Parameters:
- sel: The selector of the nodes.
- opt: The options of the selector.
Returns:
- []*cdp.Node: The nodes that match the selector.
- error: The error that occurred while getting the nodes.
func (*Context) NewSubContext ¶
NewSubContext creates a new sub-context.
Returns:
- *Context: The new sub-context.
type ErrNoDataNodeFound ¶
type ErrNoDataNodeFound struct {
	// Data is the data that was not found.
	Data string
}
ErrNoDataNodeFound is an error that is returned when no data nodes are found.
func NewErrNoDataNodeFound ¶
func NewErrNoDataNodeFound(data string) *ErrNoDataNodeFound
NewErrNoDataNodeFound creates a new ErrNoDataNodeFound error.
Parameters:
- data: The data that was not found.
Returns:
- *ErrNoDataNodeFound: The new error.
func (*ErrNoDataNodeFound) Error ¶
func (e *ErrNoDataNodeFound) Error() string
Error implements the error interface.
It returns the error message "no <data> tags found", where <data> is the value of the Data field.
type ErrNoNodesFound ¶
type ErrNoNodesFound struct{}
ErrNoNodesFound is an error that is returned when no nodes are found.
func NewErrNoNodesFound ¶
func NewErrNoNodesFound() *ErrNoNodesFound
NewErrNoNodesFound creates a new ErrNoNodesFound error.
Returns:
- *ErrNoNodesFound: The new error.
func (*ErrNoNodesFound) Error ¶
func (e *ErrNoNodesFound) Error() string
Error implements the error interface.
It returns the error message: "no nodes found".
type ErrNoTextNodeFound ¶
type ErrNoTextNodeFound struct {
	// IsFirstChild is whether the first child is not a text node.
	IsFirstChild bool
}
ErrNoTextNodeFound is an error that is returned when no text nodes are found.
func NewErrNoTextNodeFound ¶
func NewErrNoTextNodeFound(isFirstChild bool) *ErrNoTextNodeFound
NewErrNoTextNodeFound creates a new ErrNoTextNodeFound error.
Parameters:
- isFirstChild: Whether the first child is not a text node.
Returns:
- *ErrNoTextNodeFound: The new error.
func (*ErrNoTextNodeFound) Error ¶
func (e *ErrNoTextNodeFound) Error() string
Error implements the error interface.
It returns the error message "first child is not a text node" if IsFirstChild is true, and "node is not a text node" otherwise.
type ExtractFunc ¶
ExtractFunc is a function that extracts data from the HTML.
Parameters:
- doc: The HTML node of the page.
Returns:
- T: The data extracted from the HTML.
- error: The error that occurred while extracting the data.
type FilterErrFunc ¶
FilterErrFunc is a function that returns an error if a condition is not met.
Parameters:
- node: The node to check.
Returns:
- error: An error if the condition is not met.
func FilterDataNode ¶
func FilterDataNode(data string) FilterErrFunc
FilterDataNode returns a FilterErrFunc that checks if the node has the specified data.
Parameters:
- data: The data to check for.
Returns:
- FilterErrFunc: The FilterErrFunc that checks if the node has the specified data.
func FilterTextNode ¶
func FilterTextNode(checkFirstChild bool) FilterErrFunc
FilterTextNode returns a FilterErrFunc that checks if the node is a text node.
Parameters:
- checkFirstChild: If true, the function checks if the first child is a text node. Otherwise, it checks if the node itself is a text node.
Returns:
- FilterErrFunc: The FilterErrFunc that checks if the node is a text node.
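FilterTextNode and FilterDataNode return closures over their parameter. The sketch below uses a stand-in Node type in place of *html.Node and assumes that "has the specified data" means the node's Data field equals data; the errors are plain error values rather than the package's error types.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeType and Node are hypothetical stand-ins for html.NodeType and *html.Node.
type NodeType int

const (
	ElementNode NodeType = iota
	TextNode
)

type Node struct {
	Type       NodeType
	Data       string
	FirstChild *Node
}

type FilterErrFunc func(node *Node) error

// filterTextNode sketches FilterTextNode: checkFirstChild selects whether
// the node itself or its first child must be a text node.
func filterTextNode(checkFirstChild bool) FilterErrFunc {
	return func(node *Node) error {
		target := node
		if checkFirstChild {
			target = node.FirstChild
		}
		if target != nil && target.Type == TextNode {
			return nil
		}
		if checkFirstChild {
			return errors.New("first child is not a text node")
		}
		return errors.New("node is not a text node")
	}
}

// filterDataNode sketches FilterDataNode, assuming the node's Data must equal data.
func filterDataNode(data string) FilterErrFunc {
	return func(node *Node) error {
		if node.Data != data {
			return fmt.Errorf("no %s tags found", data)
		}
		return nil
	}
}

func main() {
	title := &Node{Type: ElementNode, Data: "h1", FirstChild: &Node{Type: TextNode, Data: "Hello"}}

	fmt.Println(filterDataNode("h1")(title))  // <nil>
	fmt.Println(filterTextNode(true)(title))  // <nil>
	fmt.Println(filterTextNode(false)(title)) // node is not a text node
}
```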
type GTEFunc ¶
GTEFunc is a function that extracts nodes from a tree.
Parameters:
- tree: The tree to extract nodes from.
Returns:
- []*html.Node: The list of nodes extracted from the tree.
- error: An error if the extraction fails.
func GenericTreeExtraction ¶
func GenericTreeExtraction(search *SearchCriteria, action ActionType) GTEFunc
GenericTreeExtraction creates a GTEFunc from the given parameters.
Parameters:
- search: The search criteria to use.
- action: The action to perform on the node.
Returns:
- GTEFunc: The created GTEFunc.
Behaviors:
- If search is nil, the function uses a nil filter.
- If action is not recognized, the function returns a GTEFunc that returns nil.
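The dispatch performed by GenericTreeExtraction can be sketched as a switch over ActionType that returns one closure per action. This sketch is hypothetical: it works on a stand-in Node type, takes a plain predicate instead of *SearchCriteria, treats a nil predicate as matching every node, and returns an error when DFSOne finds nothing; none of these details is confirmed by the documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical stand-in for *html.Node.
type Node struct {
	Data                    string
	FirstChild, NextSibling *Node
}

type ActionType int8
const (
	OnlyDirectChildren ActionType = iota
	DFSOne
	BFSMany
)

// GTEFunc is simplified here to take a root node instead of an *HtmlTree.
type GTEFunc func(root *Node) ([]*Node, error)

// dfsFirst returns the first matching node in depth-first (pre-order) order.
func dfsFirst(n *Node, match func(*Node) bool) *Node {
	if match(n) {
		return n
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if found := dfsFirst(c, match); found != nil {
			return found
		}
	}
	return nil
}

// genericTreeExtraction sketches the dispatch on ActionType. A nil match is
// treated as "match every node" (an assumption about the nil filter).
func genericTreeExtraction(match func(*Node) bool, action ActionType) GTEFunc {
	if match == nil {
		match = func(*Node) bool { return true }
	}
	switch action {
	case OnlyDirectChildren:
		return func(root *Node) ([]*Node, error) {
			var out []*Node
			for c := root.FirstChild; c != nil; c = c.NextSibling {
				if match(c) {
					out = append(out, c)
				}
			}
			return out, nil
		}
	case DFSOne:
		return func(root *Node) ([]*Node, error) {
			if n := dfsFirst(root, match); n != nil {
				return []*Node{n}, nil
			}
			return nil, errors.New("no nodes found")
		}
	case BFSMany:
		return func(root *Node) ([]*Node, error) {
			var out []*Node
			queue := []*Node{root}
			for len(queue) > 0 {
				n := queue[0]
				queue = queue[1:]
				if match(n) {
					out = append(out, n)
				}
				for c := n.FirstChild; c != nil; c = c.NextSibling {
					queue = append(queue, c)
				}
			}
			return out, nil
		}
	default:
		return func(*Node) ([]*Node, error) { return nil, nil } // unrecognized action
	}
}

// el builds a node and links its children in order (illustrative helper).
func el(data string, kids ...*Node) *Node {
	n := &Node{Data: data}
	for i := len(kids) - 1; i >= 0; i-- {
		kids[i].NextSibling = n.FirstChild
		n.FirstChild = kids[i]
	}
	return n
}

func main() {
	root := el("ul", el("li", el("a")), el("li", el("b")))
	isLi := func(n *Node) bool { return n.Data == "li" }
	direct, _ := genericTreeExtraction(isLi, OnlyDirectChildren)(root)
	first, _ := genericTreeExtraction(isLi, DFSOne)(root)
	all, _ := genericTreeExtraction(nil, BFSMany)(root)
	fmt.Println(len(direct), first[0].FirstChild.Data, len(all)) // 2 a 5
}
```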
type HtmlTree ¶
type HtmlTree struct {
// contains filtered or unexported fields
}
HtmlTree is a struct that represents an HTML tree.
func NewHtmlTree ¶
NewHtmlTree constructs a tree from an HTML node.
Parameters:
- root: The root HTML node.
Returns:
- *HtmlTree: The tree constructed from the HTML node.
- error: An error if the tree construction fails.
Errors:
- *uc.ErrNilValue: If any html.Node is nil.
func (*HtmlTree) ExtractContentFromDocument ¶
func (t *HtmlTree) ExtractContentFromDocument(matchFun slext.PredicateFilter[*html.Node]) (*html.Node, error)
ExtractContentFromDocument performs a depth-first search on an HTML document, finding the first node that matches the provided search criteria.
Parameters:
- matchFun: The search criteria to apply to each node.
Returns:
- *html.Node: The first node that matches the search criteria, nil if no matching node is found.
func (*HtmlTree) ExtractNodes ¶
func (t *HtmlTree) ExtractNodes(criterias ...slext.PredicateFilter[*html.Node]) ([]*html.Node, error)
ExtractNodes performs a breadth-first search on an HTML section, returning a slice of nodes that match the provided search criteria.
Parameters:
- criterias: A list of search criteria to apply to each node.
Returns:
- []*html.Node: A slice containing all nodes that match the search criteria.
Behavior:
- If no criteria are provided, every node matches.
func (*HtmlTree) ExtractSpecificNode ¶
func (t *HtmlTree) ExtractSpecificNode(matchFun slext.PredicateFilter[*html.Node]) ([]*html.Node, error)
ExtractSpecificNode finds all nodes that match the given search criteria and that are direct children of the provided node.
Parameters:
- matchFun: The search criteria to apply to each node.
Returns:
- []*html.Node: A slice containing all nodes that match the search criteria.
- error: An error if the search fails.
Behavior:
- If no criterion is provided, every direct child matches.
func (*HtmlTree) MatchNodes ¶
MatchNodes performs a breadth-first search on an HTML section, returning a slice of nodes that match the provided search criteria.
Parameters:
- matchFun: The search criteria to apply to each node.
Returns:
- []*html.Node: A slice containing all nodes that match the search criteria.
Behavior:
- It does not search the children of the nodes that match the criteria.
- If no criterion is provided, only the first node (the root) is returned, because it matches and its children are not searched.
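MatchNodes differs from a plain breadth-first search in that it stops descending once a node matches. A minimal sketch, assuming a stand-in Node type for *html.Node and treating a nil predicate as matching every node; el is an illustrative helper.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node.
type Node struct {
	Data                    string
	FirstChild, NextSibling *Node
}

// matchNodes sketches MatchNodes: a breadth-first search that records a
// matching node but does not enqueue its children. A nil match is treated
// as "match every node" (assumption), so only the root is returned.
func matchNodes(root *Node, match func(*Node) bool) []*Node {
	if match == nil {
		match = func(*Node) bool { return true }
	}
	var out []*Node
	queue := []*Node{root}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if match(n) {
			out = append(out, n)
			continue // children of a match are not searched
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			queue = append(queue, c)
		}
	}
	return out
}

// el builds a node and links its children in order (illustrative helper).
func el(data string, kids ...*Node) *Node {
	n := &Node{Data: data}
	for i := len(kids) - 1; i >= 0; i-- {
		kids[i].NextSibling = n.FirstChild
		n.FirstChild = kids[i]
	}
	return n
}

func main() {
	// <body><div><div></div></div><p></p></body>
	root := el("body", el("div", el("div")), el("p"))
	isDiv := func(n *Node) bool { return n.Data == "div" }

	fmt.Println(len(matchNodes(root, isDiv)))  // 1: the inner div is skipped
	fmt.Println(matchNodes(root, nil)[0].Data) // body
}
```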
type NodeListParser ¶
NodeListParser is a function that parses a list of nodes.
Parameters:
- list: The list of nodes to parse.
Returns:
- T: The parsed value.
- error: An error if the parsing fails.
func CEWithSearch ¶
func CEWithSearch[T any](search *SearchCriteria, action ActionType, parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T]
CEWithSearch creates a NodeListParser from the given parameters.
Parameters:
- search: The search criteria to use.
- action: The action to perform on the node.
- parse: The function that parses the list of nodes.
- filters: The functions that filter the list of nodes.
Returns:
- NodeListParser: The created NodeListParser.
Behaviors:
- If parse is nil, the function returns a NodeListParser that returns the error *errors.ErrInvalidParameter.
- Nil functions in filters are ignored.
- It uses a stack to traverse the tree and a GTEFunc to extract nodes.
- It terminates as soon as a valid result is found.
func CreateExtractor ¶
func CreateExtractor[T any](parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T]
CreateExtractor creates a NodeListParser from the given parameters.
Parameters:
- parse: The function that parses the list of nodes.
- filters: The functions that filter the list of nodes.
Returns:
- NodeListParser: The created NodeListParser.
Behaviors:
- If parse is nil, the function returns a NodeListParser that returns the error *errors.ErrInvalidParameter.
- Nil functions in filters are ignored.
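A sketch of the documented CreateExtractor behavior, assuming that the non-nil filters are applied to each node and that only the nodes passing all of them reach parse. Node, errInvalidParameter (standing in for *errors.ErrInvalidParameter), firstInt, and notEmpty are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Node is a hypothetical stand-in for *html.Node.
type Node struct{ Data string }

type FilterErrFunc func(node *Node) error

// NodeListParser mirrors the documented shape: it parses a list of nodes into a T.
type NodeListParser[T any] func(list []*Node) (T, error)

// errInvalidParameter stands in for *errors.ErrInvalidParameter (hypothetical).
var errInvalidParameter = errors.New("invalid parameter: parse is nil")

// passesAll reports whether every filter accepts n.
func passesAll(n *Node, filters []FilterErrFunc) bool {
	for _, f := range filters {
		if f(n) != nil {
			return false
		}
	}
	return true
}

// createExtractor sketches CreateExtractor: nil filters are dropped, the
// remaining filters are applied to each node, and the survivors are parsed.
func createExtractor[T any](parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T] {
	if parse == nil {
		return func([]*Node) (T, error) {
			var zero T
			return zero, errInvalidParameter
		}
	}

	var active []FilterErrFunc
	for _, f := range filters {
		if f != nil {
			active = append(active, f)
		}
	}

	return func(list []*Node) (T, error) {
		var kept []*Node
		for _, n := range list {
			if passesAll(n, active) {
				kept = append(kept, n)
			}
		}
		return parse(kept)
	}
}

// firstInt is an illustrative parser that reads the first node's Data as an int.
func firstInt(list []*Node) (int, error) {
	if len(list) == 0 {
		return 0, errors.New("no nodes found")
	}
	return strconv.Atoi(list[0].Data)
}

// notEmpty is an illustrative filter that rejects nodes with empty Data.
func notEmpty(n *Node) error {
	if n.Data == "" {
		return errors.New("empty node")
	}
	return nil
}

func main() {
	extract := createExtractor[int](firstInt, nil, notEmpty) // the nil filter is ignored
	n, err := extract([]*Node{{Data: ""}, {Data: "42"}})
	fmt.Println(n, err) // 42 <nil>

	_, err = createExtractor[int](nil)(nil)
	fmt.Println(err) // invalid parameter: parse is nil
}
```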
type SearchCriteria ¶
type SearchCriteria struct {
	// NodeType specifies the type of the HTML node to search for.
	NodeType html.NodeType

	// Data represents the data contained within the node.
	Data *string

	// Attrs is a slice of attribute key-value pairs to match.
	Attrs []*AttrPair
}
SearchCriteria is a struct that encapsulates the parameters for searching within an HTML node.
func NewSearchCriteria ¶
func NewSearchCriteria(node_type html.NodeType) *SearchCriteria
NewSearchCriteria constructs a new SearchCriteria instance using the provided parameters.
Parameters:
- node_type: The type of the HTML node to search for.
Returns:
- *SearchCriteria: A new SearchCriteria instance.
func (*SearchCriteria) AppendAttr ¶
func (sc *SearchCriteria) AppendAttr(key string, val slext.PredicateFilter[string]) *SearchCriteria
AppendAttr appends an attribute key and a value filter to the SearchCriteria instance.
Parameters:
- key: The attribute key to match.
- val: The filter function to apply to the attribute value.
Returns:
- *SearchCriteria: The same SearchCriteria instance, with the attribute pair appended, so calls can be chained.
func (*SearchCriteria) Build ¶
func (sc *SearchCriteria) Build() slext.PredicateFilter[*html.Node]
Build constructs a slext.PredicateFilter from the search criteria.
Returns:
- slext.PredicateFilter: A function that reports whether a node matches the search criteria.
func (*SearchCriteria) SetData ¶
func (sc *SearchCriteria) SetData(data string) *SearchCriteria
SetData sets the data field of the SearchCriteria instance.
Parameters:
- data: The data to set in the SearchCriteria instance.
Returns:
- *SearchCriteria: The SearchCriteria instance with the data field set.
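The builder methods return the same *SearchCriteria, so calls can be chained before Build. The sketch below assumes Build requires the node type to match, Data to match when it has been set, and every attribute pair to be satisfied by some attribute of the node; Node, Attribute, and attrPair are simplified stand-ins for the package and html types.

```go
package main

import "fmt"

// NodeType, Attribute, and Node are hypothetical stand-ins for the html package types.
type NodeType int

const (
	TextNode NodeType = iota + 1
	ElementNode
)

type Attribute struct{ Key, Val string }

type Node struct {
	Type NodeType
	Data string
	Attr []Attribute
}

// attrPair is a simplified AttrPair whose filter is a plain func(string) bool.
type attrPair struct {
	key    string
	filter func(string) bool
}

// SearchCriteria mirrors the documented fields in simplified form.
type SearchCriteria struct {
	NodeType NodeType
	Data     *string
	Attrs    []attrPair
}

func NewSearchCriteria(nodeType NodeType) *SearchCriteria {
	return &SearchCriteria{NodeType: nodeType}
}

// SetData and AppendAttr return the receiver so calls can be chained.
func (sc *SearchCriteria) SetData(data string) *SearchCriteria {
	sc.Data = &data
	return sc
}

func (sc *SearchCriteria) AppendAttr(key string, val func(string) bool) *SearchCriteria {
	sc.Attrs = append(sc.Attrs, attrPair{key: key, filter: val})
	return sc
}

// Build returns a predicate: the type must match, Data must match when set,
// and each attribute pair must be satisfied by at least one node attribute.
func (sc *SearchCriteria) Build() func(*Node) bool {
	return func(n *Node) bool {
		if n == nil || n.Type != sc.NodeType {
			return false
		}
		if sc.Data != nil && n.Data != *sc.Data {
			return false
		}
		for _, p := range sc.Attrs {
			if !hasMatchingAttr(n, p) {
				return false
			}
		}
		return true
	}
}

// hasMatchingAttr reports whether some attribute of n satisfies p.
func hasMatchingAttr(n *Node, p attrPair) bool {
	for _, a := range n.Attr {
		if a.Key == p.key && p.filter(a.Val) {
			return true
		}
	}
	return false
}

func main() {
	isNextLink := NewSearchCriteria(ElementNode).
		SetData("a").
		AppendAttr("rel", func(v string) bool { return v == "next" }).
		Build()

	link := &Node{Type: ElementNode, Data: "a", Attr: []Attribute{{Key: "rel", Val: "next"}}}
	fmt.Println(isNextLink(link))                                // true
	fmt.Println(isNextLink(&Node{Type: ElementNode, Data: "a"})) // false
}
```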
type TreeNode ¶
type TreeNode struct {
Parent, FirstChild, NextSibling, LastChild, PrevSibling *TreeNode
Data *html.Node
}
TreeNode is a node in a tree. Its Data field holds the wrapped *html.Node.
func NewTreeNode ¶
NewTreeNode creates a new node with the given data.
Parameters:
- Data: The *html.Node to store in the node.
Returns:
- *TreeNode: A pointer to the newly created node. It is never nil.
func (*TreeNode) AddChildren ¶
AddChildren is a convenience function that adds multiple children to the node at once. It is more efficient than adding them one by one; otherwise, it behaves the same as TreeNode.AddChild.
Parameters:
- children: The children to add.
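One way to link several children in a single pass is to chain them to each other first and then splice the chain after the current LastChild. This is a sketch, not the package's implementation: Data is a string instead of *html.Node, and skipping nil children is an assumption.

```go
package main

import "fmt"

// TreeNode mirrors the documented link fields; Data is a string here
// instead of *html.Node to keep the sketch self-contained.
type TreeNode struct {
	Parent, FirstChild, NextSibling, LastChild, PrevSibling *TreeNode
	Data                                                    string
}

// addChildren chains the new children together, then splices the chain
// after the current LastChild. Nil entries are skipped (assumption).
func (tn *TreeNode) addChildren(children []*TreeNode) {
	var first, last *TreeNode
	for _, c := range children {
		if c == nil {
			continue
		}
		c.Parent = tn
		c.PrevSibling = last
		c.NextSibling = nil
		if last != nil {
			last.NextSibling = c
		} else {
			first = c
		}
		last = c
	}
	if first == nil {
		return
	}
	if tn.LastChild == nil {
		tn.FirstChild = first
	} else {
		tn.LastChild.NextSibling = first
		first.PrevSibling = tn.LastChild
	}
	tn.LastChild = last
}

// childData lists the Data of the direct children in order.
func (tn *TreeNode) childData() []string {
	var out []string
	for c := tn.FirstChild; c != nil; c = c.NextSibling {
		out = append(out, c.Data)
	}
	return out
}

func main() {
	root := &TreeNode{Data: "ul"}
	root.addChildren([]*TreeNode{{Data: "li1"}, nil, {Data: "li2"}})
	root.addChildren([]*TreeNode{{Data: "li3"}})
	fmt.Println(root.childData())                // [li1 li2 li3]
	fmt.Println(root.LastChild.PrevSibling.Data) // li2
}
```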
func (*TreeNode) Copy ¶
Copy implements the tree.Noder interface.
This function never returns nil, but the copy does not include the parent or sibling pointers.
func (*TreeNode) DeleteChild ¶
DeleteChild implements the tree.Noder interface.
func (*TreeNode) GetChildren ¶
GetChildren returns the immediate children of the node.
The returned nodes are never nil and are not copied. Thus, modifying the returned nodes will modify the tree.
Returns:
- []*TreeNode: A slice of pointers to the children of the node.
func (*TreeNode) GetFirstChild ¶
GetFirstChild implements the tree.Noder interface.
func (*TreeNode) GetFirstSibling ¶
GetFirstSibling returns the first sibling of the node. If the node has a parent, it returns the parent's first child. Otherwise, it follows the PrevSibling links back to the first node.
As an edge case, if the node has no parent and no previous sibling, it returns the node itself. Thus, this function never returns nil.
Returns:
- *TreeNode: A pointer to the first sibling.
func (*TreeNode) GetLastSibling ¶
GetLastSibling returns the last sibling of the node. If the node has a parent, it returns the parent's last child. Otherwise, it follows the NextSibling links forward to the last node.
As an edge case, if the node has no parent and no next sibling, it returns the node itself. Thus, this function never returns nil.
Returns:
- *TreeNode: A pointer to the last sibling.
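The sibling lookups can be sketched as follows: use the parent's FirstChild or LastChild when a parent exists, and otherwise walk the PrevSibling or NextSibling links. Data is a string here instead of *html.Node to keep the example self-contained; the lowercase method names mark this as a sketch.

```go
package main

import "fmt"

// TreeNode mirrors the documented link fields with a string payload.
type TreeNode struct {
	Parent, FirstChild, NextSibling, LastChild, PrevSibling *TreeNode
	Data                                                    string
}

// firstSibling never returns nil: a lone node is its own first sibling.
func (tn *TreeNode) firstSibling() *TreeNode {
	if tn.Parent != nil {
		return tn.Parent.FirstChild
	}
	n := tn
	for n.PrevSibling != nil {
		n = n.PrevSibling
	}
	return n
}

// lastSibling never returns nil: a lone node is its own last sibling.
func (tn *TreeNode) lastSibling() *TreeNode {
	if tn.Parent != nil {
		return tn.Parent.LastChild
	}
	n := tn
	for n.NextSibling != nil {
		n = n.NextSibling
	}
	return n
}

func main() {
	// Three parentless siblings: a <-> b <-> c.
	a, b, c := &TreeNode{Data: "a"}, &TreeNode{Data: "b"}, &TreeNode{Data: "c"}
	a.NextSibling, b.PrevSibling = b, a
	b.NextSibling, c.PrevSibling = c, b

	fmt.Println(b.firstSibling().Data, b.lastSibling().Data) // a c

	lone := &TreeNode{Data: "lone"}
	fmt.Println(lone.firstSibling() == lone) // true
}
```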
func (*TreeNode) HasChild ¶
HasChild returns true if the node has the given child.
Because children of a node cannot be nil, a nil target will always return false.
Parameters:
- target: The child to check for.
Returns:
- bool: True if the node has the child, false otherwise.
func (*TreeNode) IsChildOf ¶
IsChildOf returns true if the node is a child of target. If target is nil, it returns false.
Parameters:
- target: The candidate parent.
Returns:
- bool: True if the node is a child of target, false otherwise.
func (*TreeNode) IsSingleton ¶
IsSingleton implements the tree.Noder interface.
func (*TreeNode) Iterator ¶
Iterator implements the tree.Noder interface.
This function returns an iterator over the direct children of the node. The iterator is pull-based; this function never returns nil, and every value it yields is a non-nil *TreeNode.
func (*TreeNode) LinkChildren ¶
LinkChildren implements the tree.Noder interface.
func (*TreeNode) RemoveNode ¶
RemoveNode implements the tree.Noder interface.
type TreeNodeIterator ¶
type TreeNodeIterator struct {
// contains filtered or unexported fields
}
TreeNodeIterator is a pull-based iterator that iterates over the children of a TreeNode.
func (*TreeNodeIterator) Consume ¶
func (iter *TreeNodeIterator) Consume() (tree.Noder, error)
Consume implements the common.Iterater interface.
The only error type that this function can return is *common.ErrExhaustedIter.
Unless the iterator has reached the end of the branch, the returned value is always a non-nil *TreeNode.
func (*TreeNodeIterator) Restart ¶
func (iter *TreeNodeIterator) Restart()
Restart implements the common.Iterater interface.
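A pull-based iterator over direct children only needs to remember the next child to return. In this sketch, errExhaustedIter stands in for *common.ErrExhaustedIter and Consume returns *TreeNode rather than tree.Noder; both are simplifications, and the lowercase type name marks it as hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// TreeNode is a minimal stand-in with only the links the iterator needs.
type TreeNode struct {
	FirstChild, NextSibling *TreeNode
	Data                    string
}

// errExhaustedIter stands in for *common.ErrExhaustedIter (hypothetical).
var errExhaustedIter = errors.New("iterator is exhausted")

// treeNodeIterator sketches a pull-based iterator over direct children.
type treeNodeIterator struct {
	parent  *TreeNode
	current *TreeNode
}

func newTreeNodeIterator(parent *TreeNode) *treeNodeIterator {
	return &treeNodeIterator{parent: parent, current: parent.FirstChild}
}

// Consume returns the next child, or errExhaustedIter when none remain.
func (it *treeNodeIterator) Consume() (*TreeNode, error) {
	if it.current == nil {
		return nil, errExhaustedIter
	}
	n := it.current
	it.current = n.NextSibling
	return n, nil
}

// Restart rewinds the iterator to the first child.
func (it *treeNodeIterator) Restart() {
	it.current = it.parent.FirstChild
}

func main() {
	c2 := &TreeNode{Data: "b"}
	c1 := &TreeNode{Data: "a", NextSibling: c2}
	root := &TreeNode{Data: "root", FirstChild: c1}

	it := newTreeNodeIterator(root)
	for {
		n, err := it.Consume()
		if errors.Is(err, errExhaustedIter) {
			break
		}
		fmt.Println(n.Data) // prints "a", then "b"
	}

	it.Restart()
	n, _ := it.Consume()
	fmt.Println(n.Data) // a
}
```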