SiteNavigator

package
v0.1.2 Latest
Warning

This package is not in the latest version of its module.

Published: Jul 30, 2024 License: MIT Imports: 12 Imported by: 0

Documentation

Overview

Code generated by go generate; EDIT THIS FILE DIRECTLY

Index

Constants

This section is empty.

Variables

var (
	// IsTextNodeSearch is a search criteria that matches text nodes.
	IsTextNodeSearch slext.PredicateFilter[*html.Node]

	// GetChildrenFunc is a function that returns the children of an HTML node.
	GetChildrenFunc tr.NextsFunc[*TreeNode]
)
var (
	// FilterNilFEFuncs is a predicate filter that filters out nil FilterErrFuncs.
	FilterNilFEFuncs us.PredicateFilter[FilterErrFunc]
)

Functions

func FilterValidNodes

func FilterValidNodes(list []*html.Node, filters []FilterErrFunc) ([]*html.Node, error)

FilterValidNodes filters the valid nodes from a list.

Parameters:

  • list: The list of nodes to filter.
  • filters: The functions to check if a node is valid.

Returns:

  • []*html.Node: The list of valid nodes.
  • error: An error if no valid nodes are found.

Behaviors:

  • If no valid nodes are found, the function returns the first error encountered.
  • If list is empty or filters is empty, the function returns list, nil.
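The behaviors listed above can be sketched without any external modules. This is only an illustration, not the package's implementation: `Node` is a hypothetical stand-in for `*html.Node`, `isDiv` is an invented filter, and the sketch assumes that a node counts as valid when every filter returns nil for it.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical stand-in for *html.Node.
type Node struct {
	Data string
}

// FilterErrFunc mirrors the documented type: a nil result means the node passes.
type FilterErrFunc func(node *Node) error

// filterValidNodes keeps the nodes accepted by every filter (assumed semantics).
func filterValidNodes(list []*Node, filters []FilterErrFunc) ([]*Node, error) {
	if len(list) == 0 || len(filters) == 0 {
		return list, nil // documented shortcut: nothing to filter
	}

	var valid []*Node
	var firstErr error

	for _, node := range list {
		err := firstFailure(node, filters)
		if err == nil {
			valid = append(valid, node)
		} else if firstErr == nil {
			firstErr = err // remember only the first error encountered
		}
	}

	if len(valid) == 0 {
		return nil, firstErr
	}
	return valid, nil
}

// firstFailure returns the first error reported by any filter, or nil.
func firstFailure(node *Node, filters []FilterErrFunc) error {
	for _, f := range filters {
		if err := f(node); err != nil {
			return err
		}
	}
	return nil
}

// isDiv is an invented filter that accepts only nodes whose Data is "div".
func isDiv(node *Node) error {
	if node.Data != "div" {
		return errors.New("not a div: " + node.Data)
	}
	return nil
}

func main() {
	nodes := []*Node{{Data: "span"}, {Data: "div"}}
	valid, err := filterValidNodes(nodes, []FilterErrFunc{isDiv})
	fmt.Println(len(valid), err)
}
```

The sketch calls every filter directly, so a nil entry in `filters` would panic here; the documented `FilterNilFEFuncs` variable suggests the package strips nil filters before this point.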

func GetDirectChildren

func GetDirectChildren(node *html.Node) []*html.Node

GetDirectChildren returns a slice of the direct children of the provided node.

Parameters:

  • node: The HTML node to extract the children from.

Returns:

  • []*html.Node: A slice containing the direct children of the node.

Types

type ActionType

type ActionType int8

ActionType is an enumeration of the different actions that can be performed on a node.

const (
	// OnlyDirectChildren is an action that extracts only the direct children of a node.
	OnlyDirectChildren ActionType = iota

	// DFSOne is an action that extracts only one node using depth-first search.
	DFSOne

	// BFSMany is an action that extracts multiple nodes using breadth-first search.
	BFSMany
)
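As a quick illustration of how the enumeration might be consumed, the sketch below redeclares the constants locally and dispatches on them. The `describe` helper is hypothetical and not part of the package; only the constant names and their `iota` ordering come from the declaration above.

```go
package main

import "fmt"

// ActionType and its constants are redeclared here so the sketch compiles on its own.
type ActionType int8

const (
	OnlyDirectChildren ActionType = iota // 0
	DFSOne                               // 1
	BFSMany                              // 2
)

// describe is a hypothetical helper that maps each action to a short label.
func describe(a ActionType) string {
	switch a {
	case OnlyDirectChildren:
		return "direct children only"
	case DFSOne:
		return "first match, depth-first"
	case BFSMany:
		return "all matches, breadth-first"
	default:
		return "unrecognized action"
	}
}

func main() {
	for _, a := range []ActionType{OnlyDirectChildren, DFSOne, BFSMany, ActionType(42)} {
		fmt.Printf("%d: %s\n", a, describe(a))
	}
}
```

Handling the `default` case explicitly matters because any `int8` value can be converted to `ActionType`.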

type AttrPair

type AttrPair struct {
	// Attr is the attribute key to match.
	Attr string

	// FilterFunc is the filter function to apply to the attribute value.
	FilterFunc slext.PredicateFilter[string]
}

AttrPair is a struct that pairs an attribute key with a filter function applied to that attribute's value.

func NewAttrPair

func NewAttrPair(attr string, filter_func slext.PredicateFilter[string]) *AttrPair

NewAttrPair constructs a new AttrPair instance using the provided parameters.

Parameters:

  • attr: The attribute key to match.
  • filter_func: The filter function to apply to the attribute value.

Returns:

  • *AttrPair: A new AttrPair instance. Nil if the filter function is nil.

func (*AttrPair) Match

func (ap *AttrPair) Match(attr []html.Attribute) bool

Match is a method of the AttrPair type that checks if the attribute key-value pair matches the provided attribute.

Parameters:

  • attr: The list of attributes to match against.

Returns:

  • bool: True if the attribute key-value pair matches the provided attribute, false otherwise.
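A minimal sketch of how such a match could work is shown below. It is not the package's code: `Attribute` stands in for `html.Attribute`, `PredicateFilter` is assumed to behave like `func(string) bool`, and the matching rule (some attribute has the key and its value passes the filter) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Attribute is a stand-in for html.Attribute (which also carries a Namespace).
type Attribute struct {
	Key, Val string
}

// PredicateFilter is assumed to have the shape of slext.PredicateFilter[string].
type PredicateFilter func(value string) bool

// AttrPair mirrors the documented struct.
type AttrPair struct {
	Attr       string
	FilterFunc PredicateFilter
}

// NewAttrPair returns nil when the filter is nil, as the documentation states.
func NewAttrPair(attr string, filter PredicateFilter) *AttrPair {
	if filter == nil {
		return nil
	}
	return &AttrPair{Attr: attr, FilterFunc: filter}
}

// Match reports whether some attribute has the key and a value accepted by the filter.
// This rule is an assumption made for the sketch.
func (ap *AttrPair) Match(attrs []Attribute) bool {
	for _, a := range attrs {
		if a.Key == ap.Attr && ap.FilterFunc(a.Val) {
			return true
		}
	}
	return false
}

func main() {
	pair := NewAttrPair("class", func(v string) bool {
		return strings.Contains(v, "pagination")
	})
	attrs := []Attribute{{Key: "id", Val: "nav"}, {Key: "class", Val: "pagination top"}}
	fmt.Println(pair.Match(attrs))
}
```

Because the constructor can return nil, callers should check the result before calling `Match`.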

type Context

type Context struct {
	// contains filtered or unexported fields
}

Context is the context of the session.

func InitializeContext

func InitializeContext() *Context

InitializeContext initializes a new context.

Returns:

  • *Context: The new context.

func (*Context) Close

func (c *Context) Close()

Close closes the context.

func (*Context) Context

func (c *Context) Context() context.Context

Context returns the context of the session.

Returns:

  • context.Context: The context of the session.

func (*Context) GetLastPage

func (c *Context) GetLastPage(url string, wait WaitFunc, f ExtractFunc[int]) (int, error)

GetLastPage gets the last page of the URL.

Parameters:

  • url: The URL of the page.
  • wait: The function that builds the tasks to wait for the page to load.
  • f: The function to extract the last page from the HTML.

Returns:

  • int: The last page of the URL.
  • error: The error that occurred while getting the last page.

func (*Context) GetNodes

func (c *Context) GetNodes(sel any, opt func(*chromedp.Selector)) ([]*cdp.Node, error)

GetNodes gets the nodes that match the selector.

Parameters:

  • sel: The selector of the nodes.
  • opt: The options of the selector.

Returns:

  • []*cdp.Node: The nodes that match the selector.
  • error: The error that occurred while getting the nodes.

func (*Context) NewSubContext

func (c *Context) NewSubContext() *Context

NewSubContext creates a new sub context.

Returns:

  • *Context: The new sub context.

func (*Context) ParseHTML

func (c *Context) ParseHTML(url string, loadedSignal ...chromedp.Action) (*html.Node, error)

ParseHTML parses the HTML of the URL.

Parameters:

  • url: The URL of the HTML.
  • loadedSignal: The signal that the page has loaded.

Returns:

  • *html.Node: The HTML node of the URL.
  • error: The error that occurred while parsing the HTML.

func (*Context) RunTasks

func (c *Context) RunTasks(tasks chromedp.Tasks) error

RunTasks runs the tasks on the session.

Parameters:

  • tasks: The tasks to run.

Returns:

  • error: The error that occurred while running the tasks.

type ErrNoDataNodeFound

type ErrNoDataNodeFound struct {
	// Data is the data that was not found.
	Data string
}

ErrNoDataNodeFound is an error that is returned when no data nodes are found.

func NewErrNoDataNodeFound

func NewErrNoDataNodeFound(data string) *ErrNoDataNodeFound

NewErrNoDataNodeFound creates a new ErrNoDataNodeFound error.

Parameters:

  • data: The data that was not found.

Returns:

  • *ErrNoDataNodeFound: The new error.

func (*ErrNoDataNodeFound) Error

func (e *ErrNoDataNodeFound) Error() string

Error implements the error interface.

It returns the error message: "no <data> tags found".

type ErrNoNodesFound

type ErrNoNodesFound struct{}

ErrNoNodesFound is an error that is returned when no nodes are found.

func NewErrNoNodesFound

func NewErrNoNodesFound() *ErrNoNodesFound

NewErrNoNodesFound creates a new ErrNoNodesFound error.

Returns:

  • *ErrNoNodesFound: The new error.

func (*ErrNoNodesFound) Error

func (e *ErrNoNodesFound) Error() string

Error implements the error interface.

It returns the error message: "no nodes found".

type ErrNoTextNodeFound

type ErrNoTextNodeFound struct {
	// IsFirstChild is true when it is the first child, rather than the node itself, that is not a text node.
	IsFirstChild bool
}

ErrNoTextNodeFound is an error that is returned when no text nodes are found.

func NewErrNoTextNodeFound

func NewErrNoTextNodeFound(isFirstChild bool) *ErrNoTextNodeFound

NewErrNoTextNodeFound creates a new ErrNoTextNodeFound error.

Parameters:

  • isFirstChild: True when it is the first child, rather than the node itself, that is not a text node.

Returns:

  • *ErrNoTextNodeFound: The new error.

func (*ErrNoTextNodeFound) Error

func (e *ErrNoTextNodeFound) Error() string

Error implements the error interface.

It returns the error message: "node is not a text node". However, if IsFirstChild is true, it returns the error message: "first child is not a text node".
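The two messages can be reproduced with a small local copy of the type, and callers can recover the flag with `errors.As`. In this sketch the struct and its messages follow the documentation above, while `checkNode` is a hypothetical function added only to produce a wrapped error.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoTextNodeFound is redeclared locally so the sketch is self-contained.
type ErrNoTextNodeFound struct {
	IsFirstChild bool
}

// Error returns one of the two documented messages.
func (e *ErrNoTextNodeFound) Error() string {
	if e.IsFirstChild {
		return "first child is not a text node"
	}
	return "node is not a text node"
}

// checkNode is a hypothetical caller that wraps the error with extra context.
func checkNode(isText bool) error {
	if !isText {
		return fmt.Errorf("checking node: %w", &ErrNoTextNodeFound{IsFirstChild: true})
	}
	return nil
}

func main() {
	err := checkNode(false)

	// errors.As unwraps the chain until it finds an *ErrNoTextNodeFound.
	var target *ErrNoTextNodeFound
	if errors.As(err, &target) {
		fmt.Println(target.IsFirstChild, target.Error())
	}
}
```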

type ExtractFunc

type ExtractFunc[T any] func(doc *html.Node) (T, error)

ExtractFunc is a function that extracts data from the HTML.

Parameters:

  • doc: The HTML node of the page.

Returns:

  • T: The data extracted from the HTML.
  • error: The error that occurred while extracting the data.

type FilterErrFunc

type FilterErrFunc func(node *html.Node) error

FilterErrFunc is a function that returns an error if a condition is not met.

Parameters:

  • node: The node to check.

Returns:

  • error: An error if the condition is not met.

func FilterDataNode

func FilterDataNode(data string) FilterErrFunc

FilterDataNode returns a FilterErrFunc that checks if the node has the specified data.

Parameters:

  • data: The data to check for.

Returns:

  • FilterErrFunc: The FilterErrFunc that checks if the node has the specified data.

func FilterTextNode

func FilterTextNode(checkFirstChild bool) FilterErrFunc

FilterTextNode returns a FilterErrFunc that checks if the node is a text node.

Parameters:

  • checkFirstChild: If true, the function checks if the first child is a text node. Otherwise, it checks if the node itself is a text node.

Returns:

  • FilterErrFunc: The FilterErrFunc that checks if the node is a text node.

type GTEFunc

type GTEFunc func(tree *HtmlTree) ([]*html.Node, error)

GTEFunc is a function that extracts nodes from a tree.

Parameters:

  • tree: The tree to extract nodes from.

Returns:

  • []*html.Node: The list of nodes extracted from the tree.
  • error: An error if the extraction fails.

func GenericTreeExtraction

func GenericTreeExtraction(search *SearchCriteria, action ActionType) GTEFunc

GenericTreeExtraction creates a GTEFunc from the given parameters.

Parameters:

  • search: The search criteria to use.
  • action: The action to perform on the node.

Returns:

  • GTEFunc: The created GTEFunc.

Behaviors:

  • If search is nil, the function uses a nil filter.
  • If action is not recognized, the function returns a GTEFunc that returns nil.

type HtmlTree

type HtmlTree struct {
	// contains filtered or unexported fields
}

HtmlTree is a struct that represents an HTML tree.

func NewHtmlTree

func NewHtmlTree(root *html.Node) (*HtmlTree, error)

NewHtmlTree constructs a tree from an HTML node.

Parameters:

  • root: The root HTML node.

Returns:

  • *HtmlTree: The tree constructed from the HTML node.
  • error: An error if the tree construction fails.

Errors:

  • *uc.ErrNilValue: If any html.Node is nil.

func (*HtmlTree) ExtractContentFromDocument

func (t *HtmlTree) ExtractContentFromDocument(matchFun slext.PredicateFilter[*html.Node]) (*html.Node, error)

ExtractContentFromDocument performs a depth-first search on an HTML document, finding the first node that matches the provided search criteria.

Parameters:

  • matchFun: The search criteria to apply to each node.

Returns:

  • *html.Node: The first node that matches the search criteria, nil if no matching node is found.
  • error: An error if the search fails.
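To make the depth-first, first-match behavior concrete, here is a small sketch over a hand-built tree. It is an assumption-laden illustration: `Node` is a stand-in for `*html.Node` with an invented `ID` field, and `findFirst` is a hypothetical recursive pre-order search, not the method's actual implementation.

```go
package main

import "fmt"

// Node is a stand-in for *html.Node; ID exists only to tell nodes apart in the demo.
type Node struct {
	Data, ID                string
	FirstChild, NextSibling *Node
}

// findFirst does a pre-order depth-first search and stops at the first match.
func findFirst(n *Node, match func(*Node) bool) *Node {
	if n == nil {
		return nil
	}
	if match(n) {
		return n
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if found := findFirst(c, match); found != nil {
			return found
		}
	}
	return nil
}

// buildTree creates: body -> [ div -> [ p(deep) ], p(shallow) ].
func buildTree() *Node {
	deep := &Node{Data: "p", ID: "deep"}
	shallow := &Node{Data: "p", ID: "shallow"}
	div := &Node{Data: "div", FirstChild: deep, NextSibling: shallow}
	return &Node{Data: "body", FirstChild: div}
}

func main() {
	isP := func(n *Node) bool { return n.Data == "p" }
	found := findFirst(buildTree(), isP)
	fmt.Println(found.ID) // the nested <p> is reached before its shallower cousin
}
```

A breadth-first search over the same tree would return the shallow node instead, which is the practical difference between this method and the breadth-first ones.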

func (*HtmlTree) ExtractNodes

func (t *HtmlTree) ExtractNodes(criterias ...slext.PredicateFilter[*html.Node]) ([]*html.Node, error)

ExtractNodes performs a breadth-first search on an HTML section returning a slice of nodes that match the provided search criteria.

Parameters:

  • criterias: A list of search criteria to apply to each node.

Returns:

  • []*html.Node: A slice containing all nodes that match the search criteria.
  • error: An error if the search fails.

Behavior:

  • If no criteria are provided, then any node will match.

func (*HtmlTree) ExtractSpecificNode

func (t *HtmlTree) ExtractSpecificNode(matchFun slext.PredicateFilter[*html.Node]) ([]*html.Node, error)

ExtractSpecificNode finds all nodes that match the given search criteria and that are direct children of the provided node.

Parameters:

  • matchFun: The search criteria to apply to each node.

Returns:

  • []*html.Node: A slice containing all nodes that match the search criteria.
  • error: An error if the search fails.

Behavior:

  • If no criteria are provided, then any node will match.

func (*HtmlTree) MatchNodes

func (t *HtmlTree) MatchNodes(matchFun slext.PredicateFilter[*html.Node]) ([]*html.Node, error)

MatchNodes performs a breadth-first search on an HTML section returning a slice of nodes that match the provided search criteria.

Parameters:

  • matchFun: The search criteria to apply to each node.

Returns:

  • []*html.Node: A slice containing all nodes that match the search criteria.
  • error: An error if the search fails.

Behavior:

  • It does not search the children of the nodes that match the criteria.
  • If no criteria are provided, then the first node will match.
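The two behaviors above (matched nodes are not descended into, and a missing criterion matches the first node) can be sketched as follows. `Node` and `matchNodes` are hypothetical stand-ins written for this illustration; the real method works on an HtmlTree and also returns an error.

```go
package main

import "fmt"

// Node is a stand-in for *html.Node; ID exists only to tell nodes apart in the demo.
type Node struct {
	Data, ID                string
	FirstChild, NextSibling *Node
}

// matchNodes walks the tree breadth-first and does not descend into matches.
func matchNodes(root *Node, match func(*Node) bool) []*Node {
	if root == nil {
		return nil
	}
	if match == nil {
		return []*Node{root} // no criterion: the first node matches
	}

	var found []*Node
	queue := []*Node{root}

	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]

		if match(n) {
			found = append(found, n)
			continue // children of a match are skipped
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			queue = append(queue, c)
		}
	}
	return found
}

// buildTree creates: body -> [ ul(outer) -> [ li -> [ ul(inner) ] ], ul(second) ].
func buildTree() *Node {
	inner := &Node{Data: "ul", ID: "inner"}
	li := &Node{Data: "li", FirstChild: inner}
	second := &Node{Data: "ul", ID: "second"}
	outer := &Node{Data: "ul", ID: "outer", FirstChild: li, NextSibling: second}
	return &Node{Data: "body", FirstChild: outer}
}

func main() {
	isUL := func(n *Node) bool { return n.Data == "ul" }
	for _, n := range matchNodes(buildTree(), isUL) {
		fmt.Println(n.ID)
	}
}
```

In this tree the nested list is never reported, because it sits below a node that already matched.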

type NodeListParser

type NodeListParser[T any] func(list []*html.Node) (T, error)

NodeListParser is a function that parses a list of nodes.

Parameters:

  • list: The list of nodes to parse.

Returns:

  • T: The parsed value.
  • error: An error if the parsing fails.

func CEWithSearch

func CEWithSearch[T any](search *SearchCriteria, action ActionType, parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T]

CEWithSearch creates a NodeListParser from the given parameters.

Parameters:

  • search: The search criteria to use.
  • action: The action to perform on the node.
  • parse: The function that parses the list of nodes.
  • filters: The functions that filter the list of nodes.

Returns:

  • NodeListParser: The created NodeListParser.

Behaviors:

  • If parse is nil, the function returns a NodeListParser that returns the error *errors.ErrInvalidParameter.
  • Nil functions in filters are ignored.
  • Uses a Stack to traverse the tree and GTEFunc to extract nodes.
  • It terminates as soon as a valid result is found.

func CreateExtractor

func CreateExtractor[T any](parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T]

CreateExtractor creates a NodeListParser from the given parameters.

Parameters:

  • parse: The function that parses the list of nodes.
  • filters: The functions that filter the list of nodes.

Returns:

  • NodeListParser: The created NodeListParser.

Behaviors:

  • If parse is nil, the function returns a NodeListParser that returns the error *errors.ErrInvalidParameter.
  • Nil functions in filters are ignored.
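One way to picture the composition is the generic sketch below. Everything in it is an assumption layered on the documented signature: `Node` replaces `*html.Node`, a plain `errors.New` value replaces `*errors.ErrInvalidParameter`, and the returned parser is assumed to drop nodes rejected by any filter before handing the rest to `parse`. The helpers `isNumeric` and `lastNumber` are invented for the demo.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Node is a hypothetical stand-in for *html.Node.
type Node struct{ Data string }

type FilterErrFunc func(node *Node) error

type NodeListParser[T any] func(list []*Node) (T, error)

// createExtractor filters the list first, then parses what is left (assumed order).
func createExtractor[T any](parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T] {
	if parse == nil {
		return func(_ []*Node) (T, error) {
			var zero T
			return zero, errors.New("parse must not be nil") // stand-in error
		}
	}

	var active []FilterErrFunc // nil filters are ignored
	for _, f := range filters {
		if f != nil {
			active = append(active, f)
		}
	}

	return func(list []*Node) (T, error) {
		var kept []*Node
		for _, n := range list {
			ok := true
			for _, f := range active {
				if f(n) != nil {
					ok = false
					break
				}
			}
			if ok {
				kept = append(kept, n)
			}
		}
		return parse(kept)
	}
}

// isNumeric rejects nodes whose Data is not an integer.
func isNumeric(n *Node) error {
	_, err := strconv.Atoi(n.Data)
	return err
}

// lastNumber parses the Data of the last remaining node.
func lastNumber(list []*Node) (int, error) {
	if len(list) == 0 {
		return 0, errors.New("no nodes left")
	}
	return strconv.Atoi(list[len(list)-1].Data)
}

func main() {
	extract := createExtractor[int](lastNumber, nil, isNumeric)
	page, err := extract([]*Node{{Data: "1"}, {Data: "2"}, {Data: "17"}, {Data: "Next"}})
	fmt.Println(page, err)
}
```

Returning a parser that always fails, rather than returning nil, keeps call sites free of nil checks, which matches the documented behavior for a nil `parse`.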

type SearchCriteria

type SearchCriteria struct {
	// NodeType specifies the type of the HTML node to search for.
	NodeType html.NodeType

	// Data represents the data contained within the node.
	Data *string

	// Attrs is a slice of attribute key-value pairs to match.
	Attrs []*AttrPair
}

SearchCriteria is a struct that encapsulates the parameters for searching within an HTML node.

func NewSearchCriteria

func NewSearchCriteria(node_type html.NodeType) *SearchCriteria

NewSearchCriteria constructs a new SearchCriteria instance using the provided parameters.

Parameters:

  • node_type: The type of the HTML node to search for.

Returns:

  • *SearchCriteria: A new SearchCriteria instance.

func (*SearchCriteria) AppendAttr

func (sc *SearchCriteria) AppendAttr(key string, val slext.PredicateFilter[string]) *SearchCriteria

AppendAttr is a method of the SearchCriteria type that appends an attribute key-value pair to the SearchCriteria instance.

Parameters:

  • key: The attribute key to match.
  • val: The attribute value to match.

Returns:

  • *SearchCriteria: The SearchCriteria instance with the attribute key-value pair appended.

func (*SearchCriteria) Build

Build is a method of the SearchCriteria type that constructs a slext.PredicateFilter function using the search criteria.

Returns:

  • slext.PredicateFilter: A function that matches the search criteria.

func (*SearchCriteria) SetData

func (sc *SearchCriteria) SetData(data string) *SearchCriteria

SetData sets the data field of the SearchCriteria instance.

Parameters:

  • data: The data to set in the SearchCriteria instance.

Returns:

  • *SearchCriteria: The SearchCriteria instance with the data field set.
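Since each setter returns the receiver, criteria can be assembled in a single chained expression. The sketch below mimics that pattern with local types: `NodeType`, `Node`, `Attribute`, and `attrPair` are stand-ins, the constant values are illustrative, and the rule used by `Build` (node type, data, and every attribute filter must all hold) is an assumption rather than the package's documented logic.

```go
package main

import "fmt"

// NodeType stands in for html.NodeType; the values here are illustrative only.
type NodeType int

const (
	TextNode NodeType = iota
	ElementNode
)

type Attribute struct{ Key, Val string }

// Node is a stand-in for *html.Node.
type Node struct {
	Type NodeType
	Data string
	Attr []Attribute
}

type attrPair struct {
	key    string
	filter func(string) bool
}

type SearchCriteria struct {
	NodeType NodeType
	Data     *string
	Attrs    []attrPair
}

func NewSearchCriteria(t NodeType) *SearchCriteria { return &SearchCriteria{NodeType: t} }

// SetData and AppendAttr return the receiver so that calls can be chained.
func (sc *SearchCriteria) SetData(data string) *SearchCriteria {
	sc.Data = &data
	return sc
}

func (sc *SearchCriteria) AppendAttr(key string, val func(string) bool) *SearchCriteria {
	if val != nil {
		sc.Attrs = append(sc.Attrs, attrPair{key: key, filter: val})
	}
	return sc
}

// Build returns a predicate; every configured condition must hold (assumed).
func (sc *SearchCriteria) Build() func(*Node) bool {
	return func(n *Node) bool {
		if n == nil || n.Type != sc.NodeType {
			return false
		}
		if sc.Data != nil && n.Data != *sc.Data {
			return false
		}
		for _, p := range sc.Attrs {
			if !hasAttr(n, p) {
				return false
			}
		}
		return true
	}
}

// hasAttr reports whether the node carries the key with an accepted value.
func hasAttr(n *Node, p attrPair) bool {
	for _, a := range n.Attr {
		if a.Key == p.key && p.filter(a.Val) {
			return true
		}
	}
	return false
}

func main() {
	isPageLink := NewSearchCriteria(ElementNode).
		SetData("a").
		AppendAttr("class", func(v string) bool { return v == "page-link" }).
		Build()

	link := &Node{Type: ElementNode, Data: "a", Attr: []Attribute{{Key: "class", Val: "page-link"}}}
	fmt.Println(isPageLink(link), isPageLink(&Node{Type: TextNode, Data: "a"}))
}
```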

type TreeNode

type TreeNode struct {
	Parent, FirstChild, NextSibling, LastChild, PrevSibling *TreeNode
	Data                                                    *html.Node
}

TreeNode is a node in a tree.

func NewTreeNode

func NewTreeNode(data *html.Node) *TreeNode

NewTreeNode creates a new node with the given data.

Parameters:

  • data: The data of the node.

Returns:

  • *TreeNode: A pointer to the newly created node. It is never nil.

func (*TreeNode) AddChild

func (tn *TreeNode) AddChild(target tree.Noder)

AddChild implements the tree.Noder interface.

func (*TreeNode) AddChildren

func (tn *TreeNode) AddChildren(children []*TreeNode)

AddChildren is a convenience function that adds multiple children to the node at once, which is more efficient than adding them one by one. Apart from that, it behaves the same as the TreeNode.AddChild function.

Parameters:

  • children: The children to add.

func (*TreeNode) Cleanup

func (tn *TreeNode) Cleanup() []tree.Noder

Cleanup implements the tree.Noder interface.

func (*TreeNode) Copy

func (tn *TreeNode) Copy() common.Copier

Copy implements the tree.Noder interface.

Although this function never returns nil, it does not copy the parent nor the sibling pointers.

func (*TreeNode) DeleteChild

func (tn *TreeNode) DeleteChild(target tree.Noder) []tree.Noder

DeleteChild implements the tree.Noder interface.

func (*TreeNode) GetChildren

func (tn *TreeNode) GetChildren() []*TreeNode

GetChildren returns the immediate children of the node.

The returned nodes are never nil and are not copied. Thus, modifying the returned nodes will modify the tree.

Returns:

  • []*TreeNode: A slice of pointers to the children of the node.

func (*TreeNode) GetFirstChild

func (tn *TreeNode) GetFirstChild() tree.Noder

GetFirstChild implements the tree.Noder interface.

func (*TreeNode) GetFirstSibling

func (tn *TreeNode) GetFirstSibling() *TreeNode

GetFirstSibling returns the first sibling of the node. If it has a parent, it returns the first child of the parent. Otherwise, it returns the first node in the node's own chain of siblings.

As an edge case, if the node has no parent and no previous sibling, it returns the node itself. Thus, this function never returns nil.

Returns:

  • *TreeNode: A pointer to the first sibling.

func (*TreeNode) GetLastSibling

func (tn *TreeNode) GetLastSibling() *TreeNode

GetLastSibling returns the last sibling of the node. If it has a parent, it returns the last child of the parent. Otherwise, it returns the last node in the node's own chain of siblings.

As an edge case, if the node has no parent and no next sibling, it returns the node itself. Thus, this function never returns nil.
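The sibling lookups described above can be sketched on a simplified node type. This is an illustration under assumptions: `Data` is a string instead of `*html.Node`, `link` is a hypothetical helper for wiring pointers, and walking the `PrevSibling` and `NextSibling` pointers in the parentless case is a guess at how the edge case is handled.

```go
package main

import "fmt"

// TreeNode mirrors the documented pointer layout; Data is simplified to a string.
type TreeNode struct {
	Parent, FirstChild, NextSibling, LastChild, PrevSibling *TreeNode
	Data                                                    string
}

// GetFirstSibling uses the parent when there is one, otherwise walks backwards.
func (tn *TreeNode) GetFirstSibling() *TreeNode {
	if tn.Parent != nil {
		return tn.Parent.FirstChild
	}
	first := tn
	for first.PrevSibling != nil {
		first = first.PrevSibling
	}
	return first
}

// GetLastSibling uses the parent when there is one, otherwise walks forwards.
func (tn *TreeNode) GetLastSibling() *TreeNode {
	if tn.Parent != nil {
		return tn.Parent.LastChild
	}
	last := tn
	for last.NextSibling != nil {
		last = last.NextSibling
	}
	return last
}

// link is a hypothetical helper that wires siblings together under an optional parent.
func link(parent *TreeNode, children ...*TreeNode) {
	for i, c := range children {
		c.Parent = parent
		if i > 0 {
			c.PrevSibling = children[i-1]
			children[i-1].NextSibling = c
		}
	}
	if parent != nil && len(children) > 0 {
		parent.FirstChild = children[0]
		parent.LastChild = children[len(children)-1]
	}
}

func main() {
	a, b, c := &TreeNode{Data: "a"}, &TreeNode{Data: "b"}, &TreeNode{Data: "c"}
	link(nil, a, b, c) // a parentless chain of siblings

	fmt.Println(c.GetFirstSibling().Data, a.GetLastSibling().Data)

	lonely := &TreeNode{Data: "lonely"}
	fmt.Println(lonely.GetFirstSibling() == lonely) // the documented edge case
}
```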

Returns:

  • *TreeNode: A pointer to the last sibling.

func (*TreeNode) GetParent

func (tn *TreeNode) GetParent() tree.Noder

GetParent implements the tree.Noder interface.

func (*TreeNode) HasChild

func (tn *TreeNode) HasChild(target *TreeNode) bool

HasChild returns true if the node has the given child.

Because children of a node cannot be nil, a nil target will always return false.

Parameters:

  • target: The child to check for.

Returns:

  • bool: True if the node has the child, false otherwise.

func (*TreeNode) IsChildOf

func (tn *TreeNode) IsChildOf(target *TreeNode) bool

IsChildOf returns true if the node is a child of the parent. If target is nil, it returns false.

Parameters:

  • target: The target parent to check for.

Returns:

  • bool: True if the node is a child of the parent, false otherwise.

func (*TreeNode) IsLeaf

func (tn *TreeNode) IsLeaf() bool

IsLeaf implements the tree.Noder interface.

func (*TreeNode) IsSingleton

func (tn *TreeNode) IsSingleton() bool

IsSingleton implements the tree.Noder interface.

func (*TreeNode) Iterator

func (tn *TreeNode) Iterator() common.Iterater[tree.Noder]

Iterator implements the tree.Noder interface.

This function returns an iterator that iterates over the direct children of the node. Implemented as a pull-based iterator, this function never returns nil, and every value it yields is guaranteed to be a non-nil node of type *TreeNode.

func (*TreeNode) LinkChildren

func (tn *TreeNode) LinkChildren(children []tree.Noder)

LinkChildren implements the tree.Noder interface.

func (*TreeNode) RemoveNode

func (tn *TreeNode) RemoveNode() []tree.Noder

RemoveNode implements the tree.Noder interface.

func (*TreeNode) String

func (tn *TreeNode) String() string

String implements the tree.Noder interface.

type TreeNodeIterator

type TreeNodeIterator struct {
	// contains filtered or unexported fields
}

TreeNodeIterator is a pull-based iterator that iterates over the children of a TreeNode.

func (*TreeNodeIterator) Consume

func (iter *TreeNodeIterator) Consume() (tree.Noder, error)

Consume implements the common.Iterater interface.

The only error type that can be returned by this function is the *common.ErrExhaustedIter type.

Moreover, the return value is always of type *TreeNode and is never nil, unless the iterator has reached the end of the branch.

func (*TreeNodeIterator) Restart

func (iter *TreeNodeIterator) Restart()

Restart implements the common.Iterater interface.
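A pull-based iterator of this kind hands out one child per call and signals exhaustion through an error. The sketch below shows the general pattern only: `childIterator`, `newChildIterator`, and `errExhausted` are hypothetical names, with `errExhausted` standing in for `*common.ErrExhaustedIter`, and `Consume` returns `*TreeNode` directly rather than `tree.Noder`.

```go
package main

import (
	"errors"
	"fmt"
)

// errExhausted stands in for *common.ErrExhaustedIter.
var errExhausted = errors.New("iterator is exhausted")

// TreeNode is reduced to the fields the iterator needs.
type TreeNode struct {
	FirstChild, NextSibling *TreeNode
	Data                    string
}

// childIterator walks the direct children of parent, one per Consume call.
type childIterator struct {
	parent, current *TreeNode
}

func newChildIterator(parent *TreeNode) *childIterator {
	return &childIterator{parent: parent, current: parent.FirstChild}
}

// Consume returns the next child, or errExhausted once there are none left.
func (it *childIterator) Consume() (*TreeNode, error) {
	if it.current == nil {
		return nil, errExhausted
	}
	n := it.current
	it.current = n.NextSibling
	return n, nil
}

// Restart rewinds the iterator to the first child.
func (it *childIterator) Restart() {
	it.current = it.parent.FirstChild
}

func main() {
	b := &TreeNode{Data: "b"}
	a := &TreeNode{Data: "a", NextSibling: b}
	root := &TreeNode{Data: "root", FirstChild: a}

	it := newChildIterator(root)
	for {
		child, err := it.Consume()
		if errors.Is(err, errExhausted) {
			break
		}
		fmt.Println(child.Data)
	}

	it.Restart()
	child, _ := it.Consume()
	fmt.Println("after restart:", child.Data)
}
```

The caller drives the loop, so it can stop early or restart without the iterator holding any goroutine or callback state.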

type WaitFunc

type WaitFunc func(url string) chromedp.Tasks

WaitFunc is a function that waits for a page to load.

Parameters:

  • url: The URL of the page to wait for.

Returns:

  • chromedp.Tasks: The tasks to wait for the page to load.
