SiteNavigator

package
v0.3.26
Warning

This package is not in the latest version of its module.

Published: Jun 15, 2024 License: MIT Imports: 14 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var (
	// FilterNilFEFuncs is a predicate filter that filters out nil FilterErrFuncs.
	FilterNilFEFuncs us.PredicateFilter[FilterErrFunc]
)
var (
	// GetChildrenFunc is a function that returns the children of an HTML node.
	GetChildrenFunc tlt.NextsFunc[*html.Node] = func(elem *html.Node, info uc.Copier) ([]*html.Node, error) {
		if elem == nil {
			return nil, ers.NewErrNilValue()
		}

		children := make([]*html.Node, 0)

		for c := elem.FirstChild; c != nil; c = c.NextSibling {
			children = append(children, c)
		}

		return children, nil
	}
)

IsTextNodeSearch is a search criterion that matches text nodes.

Functions

func FilterValidNodes added in v0.3.15

func FilterValidNodes(list []*html.Node, filters []FilterErrFunc) ([]*html.Node, error)

FilterValidNodes filters the valid nodes from a list.

Parameters:

  • list: The list of nodes to filter.
  • filters: The functions to check if a node is valid.

Returns:

  • []*html.Node: The list of valid nodes.
  • error: An error if no valid nodes are found.

Behaviors:

  • If no valid nodes are found, the function returns the first error encountered.
  • If list is empty or filters is empty, the function returns list, nil.
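
The behaviors above can be sketched as follows. This is a minimal illustration, not the package's implementation: Node is a hypothetical stand-in for html.Node, dataIs is a hypothetical filter, and the sketch assumes that a node is valid only when every non-nil filter accepts it.

```go
package main

import "errors"

// Node is a hypothetical stand-in for html.Node so the sketch needs no
// external module.
type Node struct {
	Data string
}

// FilterErrFunc mirrors the documented type: a nil error means the node passes.
type FilterErrFunc func(node *Node) error

// firstFailure returns the first error any non-nil filter reports for node.
func firstFailure(node *Node, filters []FilterErrFunc) error {
	for _, filter := range filters {
		if filter == nil {
			continue
		}
		if err := filter(node); err != nil {
			return err
		}
	}
	return nil
}

// filterValidNodes keeps the nodes accepted by every filter (an assumption
// about what "valid" means). If nothing survives, it returns the first error.
func filterValidNodes(list []*Node, filters []FilterErrFunc) ([]*Node, error) {
	if len(list) == 0 || len(filters) == 0 {
		return list, nil
	}

	var valid []*Node
	var firstErr error

	for _, node := range list {
		err := firstFailure(node, filters)
		if err == nil {
			valid = append(valid, node)
		} else if firstErr == nil {
			firstErr = err
		}
	}

	if len(valid) == 0 {
		return nil, firstErr
	}
	return valid, nil
}

// dataIs is a hypothetical filter that accepts nodes whose Data equals want.
func dataIs(want string) FilterErrFunc {
	return func(node *Node) error {
		if node == nil || node.Data != want {
			return errors.New("node data is not " + want)
		}
		return nil
	}
}
```

With this sketch, filtering a list of "div", "a", "a" nodes through dataIs("a") keeps the two "a" nodes, while filtering through dataIs("span") returns the error produced for the first node.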

func GetDirectChildren

func GetDirectChildren(node *html.Node) []*html.Node

GetDirectChildren returns a slice of the direct children of the provided node.

Parameters:

  • node: The HTML node to extract the children from.

Returns:

  • []*html.Node: A slice containing the direct children of the node.

Types

type ActionType added in v0.3.15

type ActionType int8

ActionType is an enumeration of the different actions that can be performed on a node.

const (
	// OnlyDirectChildren is an action that extracts only the direct children of a node.
	OnlyDirectChildren ActionType = iota

	// DFSOne is an action that extracts only one node using depth-first search.
	DFSOne

	// BFSMany is an action that extracts multiple nodes using breadth-first search.
	BFSMany
)

type Context added in v0.3.13

type Context struct {
	// contains filtered or unexported fields
}

Context is the context of the session.

func InitializeContext added in v0.3.13

func InitializeContext() *Context

InitializeContext initializes a new context.

Returns:

  • *Context: The new context.

func (*Context) Close added in v0.3.13

func (c *Context) Close()

Close closes the context.

func (*Context) Context added in v0.3.13

func (c *Context) Context() context.Context

Context returns the context of the session.

Returns:

  • context.Context: The context of the session.

func (*Context) GetLastPage added in v0.3.13

func (c *Context) GetLastPage(url string, wait WaitFunc, f ExtractFunc[int]) (int, error)

GetLastPage gets the last page of the URL.

Parameters:

  • url: The URL of the page.
  • wait: The task to wait for the page to load.
  • f: The function to extract the last page from the HTML.

Returns:

  • int: The last page of the URL.
  • error: The error that occurred while getting the last page.

func (*Context) GetNodes added in v0.3.13

func (c *Context) GetNodes(sel any, opt func(*chromedp.Selector)) ([]*cdp.Node, error)

GetNodes gets the nodes that match the selector.

Parameters:

  • sel: The selector of the nodes.
  • opt: The options of the selector.

Returns:

  • []*cdp.Node: The nodes that match the selector.
  • error: The error that occurred while getting the nodes.

func (*Context) NewSubContext added in v0.3.13

func (c *Context) NewSubContext() *Context

NewSubContext creates a new sub context.

Returns:

  • *Context: The new sub context.

func (*Context) ParseHTML added in v0.3.13

func (c *Context) ParseHTML(url string, loadedSignal ...chromedp.Action) (*html.Node, error)

ParseHTML parses the HTML of the URL.

Parameters:

  • url: The URL of the HTML.
  • loadedSignal: The signal that the page has loaded.

Returns:

  • *html.Node: The HTML node of the URL.
  • error: The error that occurred while parsing the HTML.

func (*Context) RunTasks added in v0.3.13

func (c *Context) RunTasks(tasks chromedp.Tasks) error

RunTasks runs the tasks on the session.

Parameters:

  • tasks: The tasks to run.

Returns:

  • error: The error that occurred while running the tasks.

type ErrNoDataNodeFound added in v0.3.15

type ErrNoDataNodeFound struct {
	// Data is the data that was not found.
	Data string
}

ErrNoDataNodeFound is an error that is returned when no data nodes are found.

func NewErrNoDataNodeFound added in v0.3.15

func NewErrNoDataNodeFound(data string) *ErrNoDataNodeFound

NewErrNoDataNodeFound creates a new ErrNoDataNodeFound error.

Parameters:

  • data: The data that was not found.

Returns:

  • *ErrNoDataNodeFound: The new error.

func (*ErrNoDataNodeFound) Error added in v0.3.15

func (e *ErrNoDataNodeFound) Error() string

Error implements the error interface.

It returns the error message: "no <data> tags found".

type ErrNoNodesFound added in v0.3.15

type ErrNoNodesFound struct{}

ErrNoNodesFound is an error that is returned when no nodes are found.

func NewErrNoNodesFound added in v0.3.15

func NewErrNoNodesFound() *ErrNoNodesFound

NewErrNoNodesFound creates a new ErrNoNodesFound error.

Returns:

  • *ErrNoNodesFound: The new error.

func (*ErrNoNodesFound) Error added in v0.3.15

func (e *ErrNoNodesFound) Error() string

Error implements the error interface.

It returns the error message: "no nodes found".

type ErrNoTextNodeFound added in v0.3.15

type ErrNoTextNodeFound struct {
	IsFirstChild bool
}

ErrNoTextNodeFound is an error that is returned when no text nodes are found.

func NewErrNoTextNodeFound added in v0.3.15

func NewErrNoTextNodeFound(isFirstChild bool) *ErrNoTextNodeFound

NewErrNoTextNodeFound creates a new ErrNoTextNodeFound error.

Parameters:

  • isFirstChild: Whether the first child is not a text node.

Returns:

  • *ErrNoTextNodeFound: The new error.

func (*ErrNoTextNodeFound) Error added in v0.3.15

func (e *ErrNoTextNodeFound) Error() string

Error implements the error interface.

It returns the error message: "node is not a text node". However, if IsFirstChild is true, it returns the error message: "first child is not a text node".
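
Because the error is a pointer to a struct, callers can recover it from a wrapped error with errors.As. The sketch below redeclares the type locally with the two documented messages so that it compiles on its own; isFirstChildFailure and failingExtraction are hypothetical helpers, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoTextNodeFound is a local copy of the documented struct.
type ErrNoTextNodeFound struct {
	IsFirstChild bool
}

// Error returns one of the two messages described in the documentation.
func (e *ErrNoTextNodeFound) Error() string {
	if e.IsFirstChild {
		return "first child is not a text node"
	}
	return "node is not a text node"
}

// isFirstChildFailure reports whether err, or any error it wraps, is an
// *ErrNoTextNodeFound raised for a first child.
func isFirstChildFailure(err error) bool {
	var target *ErrNoTextNodeFound
	if errors.As(err, &target) {
		return target.IsFirstChild
	}
	return false
}

// failingExtraction is a hypothetical caller that wraps the error with %w.
func failingExtraction() error {
	return fmt.Errorf("extracting title: %w", &ErrNoTextNodeFound{IsFirstChild: true})
}
```

Checking the IsFirstChild field this way avoids comparing error strings, which would break if the wording of the messages changed.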

type ExtractFunc added in v0.3.13

type ExtractFunc[T any] func(doc *html.Node) (T, error)

ExtractFunc is a function that extracts data from the HTML.

Parameters:

  • doc: The HTML node of the page.

Returns:

  • T: The data extracted from the HTML.
  • error: The error that occurred while extracting the data.

type FilterErrFunc added in v0.3.15

type FilterErrFunc func(node *html.Node) error

FilterErrFunc is a function that returns an error if a condition is not met.

Parameters:

  • node: The node to check.

Returns:

  • error: An error if the condition is not met.

func FilterDataNode added in v0.3.15

func FilterDataNode(data string) FilterErrFunc

FilterDataNode returns a FilterErrFunc that checks if the node has the specified data.

Parameters:

  • data: The data to check for.

Returns:

  • FilterErrFunc: The FilterErrFunc that checks if the node has the specified data.

func FilterTextNode added in v0.3.15

func FilterTextNode(checkFirstChild bool) FilterErrFunc

FilterTextNode returns a FilterErrFunc that checks if the node is a text node.

Parameters:

  • checkFirstChild: If true, the function checks if the first child is a text node. Otherwise, it checks if the node itself is a text node.

Returns:

  • FilterErrFunc: The FilterErrFunc that checks if the node is a text node.
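
The checkFirstChild switch can be pictured with a small closure. This is a sketch under assumptions: Node and NodeType are hypothetical stand-ins for the html package types, plain errors.New values replace the package's error types, and a nil node is treated as a failure.

```go
package main

import "errors"

// NodeType and Node are hypothetical stand-ins for html.NodeType and html.Node.
type NodeType int

const (
	TextNode NodeType = iota + 1
	ElementNode
)

type Node struct {
	Type       NodeType
	Data       string
	FirstChild *Node
}

// FilterErrFunc mirrors the documented type.
type FilterErrFunc func(node *Node) error

// filterTextNode returns a filter that inspects either the node itself or,
// when checkFirstChild is true, the node's first child.
func filterTextNode(checkFirstChild bool) FilterErrFunc {
	return func(node *Node) error {
		target := node
		if checkFirstChild && node != nil {
			target = node.FirstChild
		}

		if target != nil && target.Type == TextNode {
			return nil
		}

		if checkFirstChild {
			return errors.New("first child is not a text node")
		}
		return errors.New("node is not a text node")
	}
}
```

For an element such as a paragraph whose first child is text, filterTextNode(true) accepts the element and filterTextNode(false) rejects it.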

type GTEFunc added in v0.3.15

type GTEFunc func(tree *HtmlTree) []*html.Node

GTEFunc is a function that extracts nodes from a tree.

Parameters:

  • tree: The tree to extract nodes from.

Returns:

  • []*html.Node: The list of nodes extracted from the tree.

func GenericTreeExtraction added in v0.3.15

func GenericTreeExtraction(search *SearchCriteria, action ActionType) GTEFunc

GenericTreeExtraction creates a GTEFunc from the given parameters.

Parameters:

  • search: The search criteria to use.
  • action: The action to perform on the node.

Returns:

  • GTEFunc: The created GTEFunc.

Behaviors:

  • If search is nil, the function uses a nil filter.
  • If action is not recognized, the function returns a GTEFunc that returns nil.

type HtmlTree

type HtmlTree struct {
	// contains filtered or unexported fields
}

HtmlTree is a struct that represents an HTML tree.

func NewHtmlTree

func NewHtmlTree(root *html.Node) (*HtmlTree, error)

NewHtmlTree constructs a tree from an HTML node.

Parameters:

  • root: The root HTML node.

Returns:

  • *HtmlTree: The tree constructed from the HTML node.
  • error: An error if the tree construction fails.

Errors:

  • *ers.ErrNilValue: If any html.Node is nil.

func (*HtmlTree) ExtractContentFromDocument

func (t *HtmlTree) ExtractContentFromDocument(matchFun slext.PredicateFilter[*html.Node]) *html.Node

ExtractContentFromDocument performs a depth-first search on an HTML document, finding the first node that matches the provided search criteria.

Parameters:

  • matchFun: The search criteria to apply to each node.

Returns:

  • *html.Node: The first node that matches the search criteria, nil if no matching node is found.
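
A depth-first search for the first match can be sketched with an explicit stack. The code below is illustrative only: Node is a hypothetical stand-in for html.Node, newNode is a helper for building test trees, and returning nil for a nil predicate is a choice made for this sketch rather than documented behavior.

```go
package main

// Node is a hypothetical stand-in for html.Node with only the links needed here.
type Node struct {
	Data        string
	FirstChild  *Node
	NextSibling *Node
}

// newNode builds a node and links the given children as siblings.
func newNode(data string, children ...*Node) *Node {
	n := &Node{Data: data}
	var prev *Node
	for _, c := range children {
		if prev == nil {
			n.FirstChild = c
		} else {
			prev.NextSibling = c
		}
		prev = c
	}
	return n
}

// findFirst visits nodes in pre-order (parent, then children left to right)
// and returns the first node accepted by match, or nil if there is none.
func findFirst(root *Node, match func(*Node) bool) *Node {
	if root == nil || match == nil {
		return nil
	}

	stack := []*Node{root}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]

		if match(n) {
			return n
		}

		// Collect the children, then push them in reverse so that the
		// leftmost child is popped (and therefore visited) first.
		var kids []*Node
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			kids = append(kids, c)
		}
		for i := len(kids) - 1; i >= 0; i-- {
			stack = append(stack, kids[i])
		}
	}
	return nil
}
```

On a tree where one "p" node sits inside a "div" and another "p" follows that "div" as a sibling, this search returns the nested one, because the "div" subtree is exhausted before its siblings are visited.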

func (*HtmlTree) ExtractNodes

func (t *HtmlTree) ExtractNodes(criterias ...slext.PredicateFilter[*html.Node]) []*html.Node

ExtractNodes performs a breadth-first search on an HTML section returning a slice of nodes that match the provided search criteria.

Parameters:

  • criterias: A list of search criteria to apply to each node.

Returns:

  • []*html.Node: A slice containing all nodes that match the search criteria.

Behavior:

  • If no criteria are provided, then any node will match.

func (*HtmlTree) ExtractSpecificNode

func (t *HtmlTree) ExtractSpecificNode(matchFun slext.PredicateFilter[*html.Node]) []*html.Node

ExtractSpecificNode finds all nodes that match the given search criteria and that are direct children of the provided node.

Parameters:

  • matchFun: The search criteria to apply to each node.

Returns:

  • []*html.Node: A slice containing all nodes that match the search criteria.

Behavior:

  • If no criteria are provided, then any node will match.
  • If the node is nil, then a nil slice is returned.

func (*HtmlTree) MatchNodes

func (t *HtmlTree) MatchNodes(matchFun slext.PredicateFilter[*html.Node]) []*html.Node

MatchNodes performs a breadth-first search on an HTML section returning a slice of nodes that match the provided search criteria.

Parameters:

  • matchFun: The search criteria to apply to each node.

Returns:

  • []*html.Node: A slice containing all nodes that match the search criteria.

Behavior:

  • It does not search the children of the nodes that match the criteria.
  • If no criteria are provided, then the first node will match.
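
The two behaviors above follow naturally from a breadth-first search that stops descending once a node matches. The sketch below illustrates that idea; Node and newNode are hypothetical stand-ins, and treating a nil predicate as "matches everything" is an assumption chosen because it reproduces the documented result.

```go
package main

// Node is a hypothetical stand-in for html.Node with only the links needed here.
type Node struct {
	Data        string
	FirstChild  *Node
	NextSibling *Node
}

// newNode builds a node and links the given children as siblings.
func newNode(data string, children ...*Node) *Node {
	n := &Node{Data: data}
	var prev *Node
	for _, c := range children {
		if prev == nil {
			n.FirstChild = c
		} else {
			prev.NextSibling = c
		}
		prev = c
	}
	return n
}

// matchNodes walks the tree level by level. A matching node is collected and
// its children are never queued, so nested matches are skipped.
func matchNodes(root *Node, match func(*Node) bool) []*Node {
	if root == nil {
		return nil
	}

	var found []*Node
	queue := []*Node{root}

	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]

		if match == nil || match(n) {
			found = append(found, n)
			continue // do not search below a match
		}

		for c := n.FirstChild; c != nil; c = c.NextSibling {
			queue = append(queue, c)
		}
	}
	return found
}
```

With a nil predicate the root matches immediately and nothing below it is visited, so the result holds only the root. With a predicate for "ul", a list nested inside another list is not reported, because the outer list already matched.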

type NodeListParser added in v0.3.15

type NodeListParser[T any] func(list []*html.Node) (T, error)

NodeListParser is a function that parses a list of nodes.

Parameters:

  • list: The list of nodes to parse.

Returns:

  • T: The parsed value.
  • error: An error if the parsing fails.

func CEWithSearch added in v0.3.15

func CEWithSearch[T any](search *SearchCriteria, action ActionType, parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T]

CEWithSearch creates a NodeListParser from the given parameters.

Parameters:

  • search: The search criteria to use.
  • action: The action to perform on the node.
  • parse: The function that parses the list of nodes.
  • filters: The functions that filter the list of nodes.

Returns:

  • NodeListParser: The created NodeListParser.

Behaviors:

  • If parse is nil, the function returns a NodeListParser that returns the error *errors.ErrInvalidParameter.
  • Nil functions in filters are ignored.
  • Uses a Stack to traverse the tree and GTEFunc to extract nodes.
  • It terminates as soon as a valid result is found.

func CreateExtractor added in v0.3.15

func CreateExtractor[T any](parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T]

CreateExtractor creates a NodeListParser from the given parameters.

Parameters:

  • parse: The function that parses the list of nodes.
  • filters: The functions that filter the list of nodes.

Returns:

  • NodeListParser: The created NodeListParser.

Behaviors:

  • If parse is nil, the function returns a NodeListParser that returns the error *errors.ErrInvalidParameter.
  • Nil functions in filters are ignored.
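
A rough picture of how such an extractor might compose filtering and parsing is given below. Everything here is a sketch under assumptions: Node stands in for html.Node, errInvalidParse stands in for the package's invalid-parameter error, countNodes and dataIs are hypothetical helpers, and the filtering step assumes a node must pass every filter.

```go
package main

import "errors"

// Node is a hypothetical stand-in for html.Node.
type Node struct {
	Data string
}

// FilterErrFunc and NodeListParser mirror the documented types.
type FilterErrFunc func(node *Node) error

type NodeListParser[T any] func(list []*Node) (T, error)

// errInvalidParse stands in for the package's invalid-parameter error.
var errInvalidParse = errors.New("parse must not be nil")

// runFilters returns the first error reported by any filter for node.
func runFilters(node *Node, filters []FilterErrFunc) error {
	for _, filter := range filters {
		if err := filter(node); err != nil {
			return err
		}
	}
	return nil
}

// createExtractor filters the list first, then hands the survivors to parse.
func createExtractor[T any](parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T] {
	if parse == nil {
		return func([]*Node) (T, error) {
			var zero T
			return zero, errInvalidParse
		}
	}

	// Drop nil filters once, up front.
	active := make([]FilterErrFunc, 0, len(filters))
	for _, filter := range filters {
		if filter != nil {
			active = append(active, filter)
		}
	}

	return func(list []*Node) (T, error) {
		var kept []*Node
		var firstErr error

		for _, node := range list {
			if err := runFilters(node, active); err != nil {
				if firstErr == nil {
					firstErr = err
				}
				continue
			}
			kept = append(kept, node)
		}

		if len(kept) == 0 && firstErr != nil {
			var zero T
			return zero, firstErr
		}
		return parse(kept)
	}
}

// countNodes is a hypothetical parser that returns the number of nodes.
func countNodes(list []*Node) (int, error) {
	return len(list), nil
}

// dataIs is a hypothetical filter that accepts nodes whose Data equals want.
func dataIs(want string) FilterErrFunc {
	return func(node *Node) error {
		if node == nil || node.Data != want {
			return errors.New("node data is not " + want)
		}
		return nil
	}
}
```

Returning a parser that always fails, instead of panicking when parse is nil, lets the mistake surface through the normal error path the first time the extractor is used.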

type SearchCriteria

type SearchCriteria struct {
	// NodeType specifies the type of the HTML node to search for.
	NodeType html.NodeType

	// Data represents the data contained within the node.
	Data *string

	// Attrs is a slice of attribute key-value pairs to match.
	Attrs []*uc.Pair[string, slext.PredicateFilter[string]]
}

SearchCriteria is a struct that encapsulates the parameters for searching within an HTML node.

func NewSearchCriteria

func NewSearchCriteria(node_type html.NodeType) *SearchCriteria

NewSearchCriteria constructs a new SearchCriteria instance using the provided parameters.

Parameters:

  • node_type: The type of the HTML node to search for.

Returns:

  • *SearchCriteria: A new SearchCriteria instance.

func (*SearchCriteria) AppendAttr

func (sc *SearchCriteria) AppendAttr(key string, val slext.PredicateFilter[string]) *SearchCriteria

AppendAttr appends an attribute key-value pair to the SearchCriteria instance.

Parameters:

  • key: The attribute key to match.
  • val: The predicate that the attribute value must satisfy.

Returns:

  • *SearchCriteria: The SearchCriteria instance with the attribute key-value pair appended.

func (*SearchCriteria) Build

Build constructs a slext.PredicateFilter function from the search criteria.

Returns:

  • slext.PredicateFilter: A function that matches the search criteria.

func (*SearchCriteria) SetData

func (sc *SearchCriteria) SetData(data string) *SearchCriteria

SetData sets the data field of the SearchCriteria instance.

Parameters:

  • data: The data to set in the SearchCriteria instance.

Returns:

  • *SearchCriteria: The SearchCriteria instance with the data field set.
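
Since the setters return the receiver, criteria can be built as a chain and then compiled into a predicate. The sketch below mimics that builder shape with local, lower-case names to make clear it is not the package's code: Node, Attribute and NodeType are stand-ins for the html package types, plain func(string) bool values replace slext.PredicateFilter, hasClass is a hypothetical helper, and treating a nil value predicate as "key must be present" is an assumption.

```go
package main

import "strings"

// NodeType, Attribute and Node are hypothetical stand-ins for the html types.
type NodeType int

const (
	TextNode NodeType = iota + 1
	ElementNode
)

type Attribute struct {
	Key, Val string
}

type Node struct {
	Type NodeType
	Data string
	Attr []Attribute
}

// attrRule pairs an attribute key with a predicate on its value.
type attrRule struct {
	key string
	ok  func(string) bool
}

// searchCriteria is a local sketch of the documented SearchCriteria struct.
type searchCriteria struct {
	nodeType NodeType
	data     *string
	attrs    []attrRule
}

func newSearchCriteria(nodeType NodeType) *searchCriteria {
	return &searchCriteria{nodeType: nodeType}
}

func (sc *searchCriteria) setData(data string) *searchCriteria {
	sc.data = &data
	return sc
}

func (sc *searchCriteria) appendAttr(key string, ok func(string) bool) *searchCriteria {
	sc.attrs = append(sc.attrs, attrRule{key: key, ok: ok})
	return sc
}

// build returns a predicate that requires the node type, the data (if set),
// and every attribute rule to match.
func (sc *searchCriteria) build() func(*Node) bool {
	return func(n *Node) bool {
		if n == nil || n.Type != sc.nodeType {
			return false
		}
		if sc.data != nil && n.Data != *sc.data {
			return false
		}
		for _, rule := range sc.attrs {
			if !satisfies(n, rule) {
				return false
			}
		}
		return true
	}
}

// satisfies reports whether n carries rule.key with an accepted value.
func satisfies(n *Node, rule attrRule) bool {
	for _, attr := range n.Attr {
		if attr.Key == rule.key && (rule.ok == nil || rule.ok(attr.Val)) {
			return true
		}
	}
	return false
}

// hasClass is a hypothetical value predicate for space-separated class lists.
func hasClass(name string) func(string) bool {
	return func(val string) bool {
		for _, field := range strings.Fields(val) {
			if field == name {
				return true
			}
		}
		return false
	}
}
```

A chain such as newSearchCriteria(ElementNode).setData("a").appendAttr("class", hasClass("next")).build() then yields a predicate that accepts only anchor elements whose class list contains "next".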

type WaitFunc added in v0.3.13

type WaitFunc func(url string) chromedp.Tasks

WaitFunc is a function that waits for a page to load.

Parameters:

  • url: The URL of the page to wait for.

Returns:

  • chromedp.Tasks: The tasks to wait for the page to load.
