SiteNavigator

package
v0.3.14
Warning: This package is not in the latest version of its module.
Published: Jun 4, 2024 License: MIT Imports: 14 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var (
	// GetChildrenFunc is a function that returns the children of an HTML node.
	GetChildrenFunc tlt.NextsFunc[*html.Node] = func(elem *html.Node, info uc.Copier) ([]*html.Node, error) {
		if elem == nil {
			return nil, ers.NewErrNilValue()
		}

		children := make([]*html.Node, 0)

		for c := elem.FirstChild; c != nil; c = c.NextSibling {
			children = append(children, c)
		}

		return children, nil
	}
)

IsTextNodeSearch is a search criterion that matches text nodes.

Functions

func GetDirectChildren

func GetDirectChildren(node *html.Node) []*html.Node

GetDirectChildren returns a slice of the direct children of the provided node.

Parameters:

  • node: The HTML node to extract the children from.

Returns:

  • []*html.Node: A slice containing the direct children of the node.
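The sibling walk that GetDirectChildren describes can be sketched with the standard library alone. This is a minimal sketch, assuming the FirstChild/NextSibling links of html.Node (presumably golang.org/x/net/html); the Node type and the appendChild and getDirectChildren helpers are hypothetical stand-ins, not this package's code.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node; only the fields this
// sketch needs are modeled.
type Node struct {
	Data        string
	FirstChild  *Node
	NextSibling *Node
}

// appendChild links child as the last child of parent.
func appendChild(parent, child *Node) {
	if parent.FirstChild == nil {
		parent.FirstChild = child
		return
	}
	last := parent.FirstChild
	for last.NextSibling != nil {
		last = last.NextSibling
	}
	last.NextSibling = child
}

// getDirectChildren walks the FirstChild/NextSibling chain and collects
// only the immediate children. Returning nil for a nil node is a choice
// of this sketch; the documented function does not specify it.
func getDirectChildren(node *Node) []*Node {
	if node == nil {
		return nil
	}
	var children []*Node
	for c := node.FirstChild; c != nil; c = c.NextSibling {
		children = append(children, c)
	}
	return children
}

func main() {
	ul := &Node{Data: "ul"}
	li := &Node{Data: "li"}
	appendChild(ul, li)
	appendChild(ul, &Node{Data: "li"})
	appendChild(li, &Node{Data: "a"}) // grandchild: not returned

	for _, c := range getDirectChildren(ul) {
		fmt.Println(c.Data)
	}
}
```

Grandchildren are never visited, because the loop only follows NextSibling links from the first child.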

Types

type Context added in v0.3.13

type Context struct {
	// contains filtered or unexported fields
}

Context holds the context of a browser session driven by chromedp.

func InitializeContext added in v0.3.13

func InitializeContext() *Context

InitializeContext initializes a new context.

Returns:

  • *Context: The new context.

func (*Context) Close added in v0.3.13

func (c *Context) Close()

Close closes the context.

func (*Context) Context added in v0.3.13

func (c *Context) Context() context.Context

Context returns the context of the session.

Returns:

  • context.Context: The context of the session.

func (*Context) GetLastPage added in v0.3.13

func (c *Context) GetLastPage(url string, wait WaitFunc, f ExtractFunc[int]) (int, error)

GetLastPage loads the page at url and returns its last page number, as extracted by f.

Parameters:

  • url: The URL of the page.
  • wait: The function that returns the tasks to wait for the page to load.
  • f: The function that extracts the last page number from the parsed HTML.

Returns:

  • int: The last page number.
  • error: The error that occurred while loading the page or extracting the last page.

func (*Context) GetNodes added in v0.3.13

func (c *Context) GetNodes(sel any, opt func(*chromedp.Selector)) ([]*cdp.Node, error)

GetNodes gets the nodes that match the selector.

Parameters:

  • sel: The selector of the nodes.
  • opt: A query option applied to the selector (for example, chromedp.ByQuery).

Returns:

  • []*cdp.Node: The nodes that match the selector.
  • error: The error that occurred while getting the nodes.

func (*Context) NewSubContext added in v0.3.13

func (c *Context) NewSubContext() *Context

NewSubContext creates a new sub context.

Returns:

  • *Context: The new sub context.

func (*Context) ParseHTML added in v0.3.13

func (c *Context) ParseHTML(url string, loadedSignal ...chromedp.Action) (*html.Node, error)

ParseHTML loads the page at url and parses its HTML.

Parameters:

  • url: The URL of the page.
  • loadedSignal: Optional chromedp actions that signal the page has finished loading.

Returns:

  • *html.Node: The root node of the parsed HTML document.
  • error: The error that occurred while parsing the HTML.

func (*Context) RunTasks added in v0.3.13

func (c *Context) RunTasks(tasks chromedp.Tasks) error

RunTasks runs the tasks on the session.

Parameters:

  • tasks: The tasks to run.

Returns:

  • error: The error that occurred while running the tasks.

type ExtractFunc added in v0.3.13

type ExtractFunc[T any] func(doc *html.Node) (T, error)

ExtractFunc is a function that extracts data from the HTML.

Parameters:

  • doc: The HTML node of the page.

Returns:

  • T: The data extracted from the HTML.
  • error: The error that occurred while extracting the data.
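An ExtractFunc[int], such as the f argument of GetLastPage, only needs to turn a parsed document into a page number. The following sketch assumes a stand-in Node type in place of *html.Node and a hypothetical heuristic (the largest integer found in any text node); lastPageFromText, elem, and text are illustrative names, and a real pagination bar may need a more specific rule.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Node is a hypothetical stand-in for *html.Node.
type Node struct {
	Data        string // tag name for elements, text content for text nodes
	IsText      bool
	FirstChild  *Node
	NextSibling *Node
}

// ExtractFunc mirrors the documented generic signature, using *Node in
// place of *html.Node.
type ExtractFunc[T any] func(doc *Node) (T, error)

// elem builds an element node and links the children in order.
func elem(tag string, children ...*Node) *Node {
	n := &Node{Data: tag}
	for i := len(children) - 1; i >= 0; i-- {
		children[i].NextSibling = n.FirstChild
		n.FirstChild = children[i]
	}
	return n
}

// text builds a text node.
func text(s string) *Node { return &Node{Data: s, IsText: true} }

// lastPageFromText returns the largest integer found in any text node.
var lastPageFromText ExtractFunc[int] = func(doc *Node) (int, error) {
	best, found := 0, false
	var walk func(n *Node)
	walk = func(n *Node) {
		if n == nil {
			return
		}
		if n.IsText {
			v, err := strconv.Atoi(strings.TrimSpace(n.Data))
			if err == nil && (!found || v > best) {
				best, found = v, true
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(doc)
	if !found {
		return 0, errors.New("no page number found")
	}
	return best, nil
}

func main() {
	nav := elem("nav",
		elem("a", text("1")),
		elem("a", text(" 2 ")),
		elem("a", text("Next")),
		elem("a", text("17")),
	)
	page, err := lastPageFromText(nav)
	fmt.Println(page, err) // 17 <nil>
}
```

Non-numeric text such as "Next" is skipped because strconv.Atoi rejects it.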

type HtmlTree

type HtmlTree struct {
	// contains filtered or unexported fields
}

HtmlTree is a struct that represents an HTML tree.

func NewHtmlTree

func NewHtmlTree(root *html.Node) (*HtmlTree, error)

NewHtmlTree constructs a tree from an HTML node.

Parameters:

  • root: The root HTML node.

Returns:

  • *HtmlTree: The tree constructed from the HTML node.
  • error: An error if the tree construction fails.

Errors:

  • *ers.ErrNilValue: If any html.Node is nil.

func (*HtmlTree) ExtractContentFromDocument

func (t *HtmlTree) ExtractContentFromDocument(matchFun slext.PredicateFilter[*html.Node]) *html.Node

ExtractContentFromDocument performs a depth-first search on an HTML document, finding the first node that matches the provided search criteria.

Parameters:

  • matchFun: The search criteria to apply to each node.

Returns:

  • *html.Node: The first node that matches the search criteria, or nil if no matching node is found.
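A first-match depth-first search like ExtractContentFromDocument can be sketched with an explicit stack. This assumes slext.PredicateFilter has the shape func(T) bool and uses a stand-in Node type; firstMatchDFS, elem, and the ID field are hypothetical. Children are pushed right to left so nodes are visited in document (pre-order) order.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node.
type Node struct {
	Data        string // tag name
	ID          string // illustrative label used only to identify nodes
	FirstChild  *Node
	NextSibling *Node
}

// PredicateFilter is assumed to have the shape func(T) bool.
type PredicateFilter[T any] func(T) bool

// elem builds a node and links the given children in order.
func elem(tag, id string, children ...*Node) *Node {
	n := &Node{Data: tag, ID: id}
	for i := len(children) - 1; i >= 0; i-- {
		children[i].NextSibling = n.FirstChild
		n.FirstChild = children[i]
	}
	return n
}

// firstMatchDFS performs a pre-order depth-first search and returns the
// first node that satisfies match, or nil if none does.
func firstMatchDFS(root *Node, match PredicateFilter[*Node]) *Node {
	if root == nil {
		return nil
	}
	stack := []*Node{root}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if match(n) {
			return n
		}
		// Push children right to left so the leftmost child is popped first.
		var kids []*Node
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			kids = append(kids, c)
		}
		for i := len(kids) - 1; i >= 0; i-- {
			stack = append(stack, kids[i])
		}
	}
	return nil
}

func main() {
	doc := elem("div", "root",
		elem("section", "s", elem("span", "deep")),
		elem("span", "shallow"),
	)
	isSpan := func(n *Node) bool { return n.Data == "span" }
	fmt.Println(firstMatchDFS(doc, isSpan).ID) // deep
}
```

A breadth-first search over the same tree would return "shallow" instead, which is the practical difference between this method and the breadth-first ones below.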

func (*HtmlTree) ExtractNodes

func (t *HtmlTree) ExtractNodes(criterias ...slext.PredicateFilter[*html.Node]) []*html.Node

ExtractNodes performs a breadth-first search on an HTML section, returning a slice of nodes that match the provided search criteria.

Parameters:

  • criterias: A list of search criteria to apply to each node.

Returns:

  • []*html.Node: A slice containing all nodes that match the search criteria.

Behavior:

  • If no criteria are provided, every node matches.
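ExtractNodes can be sketched as a queue-based breadth-first search. The sketch treats the criteria as a conjunction (a node must satisfy all of them); that is an assumption, chosen because it agrees with the documented rule that an empty list matches every node. The Node type, PredicateFilter shape (func(T) bool), and the helper names are hypothetical.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node.
type Node struct {
	Data        string
	ID          string // illustrative label used only to identify nodes
	FirstChild  *Node
	NextSibling *Node
}

// PredicateFilter is assumed to have the shape func(T) bool.
type PredicateFilter[T any] func(T) bool

func elem(tag, id string, children ...*Node) *Node {
	n := &Node{Data: tag, ID: id}
	for i := len(children) - 1; i >= 0; i-- {
		children[i].NextSibling = n.FirstChild
		n.FirstChild = children[i]
	}
	return n
}

// extractNodesBFS visits nodes level by level and keeps every node that
// satisfies all criteria.
func extractNodesBFS(root *Node, criteria ...PredicateFilter[*Node]) []*Node {
	if root == nil {
		return nil
	}
	var out []*Node
	queue := []*Node{root}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if matchesAll(n, criteria) {
			out = append(out, n)
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			queue = append(queue, c)
		}
	}
	return out
}

// matchesAll reports whether n satisfies every criterion; with no
// criteria it returns true, so every node matches.
func matchesAll(n *Node, criteria []PredicateFilter[*Node]) bool {
	for _, ok := range criteria {
		if !ok(n) {
			return false
		}
	}
	return true
}

// ids lists node labels for printing.
func ids(nodes []*Node) []string {
	out := make([]string, len(nodes))
	for i, n := range nodes {
		out[i] = n.ID
	}
	return out
}

func main() {
	doc := elem("div", "root",
		elem("section", "s", elem("span", "deep")),
		elem("span", "shallow"),
	)
	isSpan := func(n *Node) bool { return n.Data == "span" }
	fmt.Println(ids(extractNodesBFS(doc, isSpan))) // [shallow deep]
	fmt.Println(len(extractNodesBFS(doc)))         // 4
}
```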

func (*HtmlTree) ExtractSpecificNode

func (t *HtmlTree) ExtractSpecificNode(matchFun slext.PredicateFilter[*html.Node]) []*html.Node

ExtractSpecificNode finds all nodes that match the given search criteria and that are direct children of the tree's root node.

Parameters:

  • matchFun: The search criteria to apply to each node.

Returns:

  • []*html.Node: A slice containing all nodes that match the search criteria.

Behavior:

  • If no criteria are provided, then any node will match.
  • If the root node is nil, then a nil slice is returned.
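Filtering only the direct children of the root, as ExtractSpecificNode describes, needs no queue at all. This sketch assumes a stand-in Node type, a PredicateFilter of shape func(T) bool, and that "no criteria" means a nil matchFun; matchDirectChildren and its helpers are hypothetical names.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node.
type Node struct {
	Data        string
	ID          string // illustrative label used only to identify nodes
	FirstChild  *Node
	NextSibling *Node
}

// PredicateFilter is assumed to have the shape func(T) bool.
type PredicateFilter[T any] func(T) bool

func elem(tag, id string, children ...*Node) *Node {
	n := &Node{Data: tag, ID: id}
	for i := len(children) - 1; i >= 0; i-- {
		children[i].NextSibling = n.FirstChild
		n.FirstChild = children[i]
	}
	return n
}

// matchDirectChildren returns the root's immediate children that satisfy
// match; a nil match accepts every child, and a nil root yields nil.
func matchDirectChildren(root *Node, match PredicateFilter[*Node]) []*Node {
	if root == nil {
		return nil
	}
	var out []*Node
	for c := root.FirstChild; c != nil; c = c.NextSibling {
		if match == nil || match(c) {
			out = append(out, c)
		}
	}
	return out
}

func ids(nodes []*Node) []string {
	out := make([]string, len(nodes))
	for i, n := range nodes {
		out[i] = n.ID
	}
	return out
}

func main() {
	list := elem("ul", "list",
		elem("li", "a"),
		elem("p", "note"),
		elem("li", "b", elem("li", "nested")), // nested li is not a direct child
	)
	isLi := func(n *Node) bool { return n.Data == "li" }
	fmt.Println(ids(matchDirectChildren(list, isLi))) // [a b]
	fmt.Println(len(matchDirectChildren(list, nil)))  // 3
}
```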

func (*HtmlTree) MatchNodes

func (t *HtmlTree) MatchNodes(matchFun slext.PredicateFilter[*html.Node]) []*html.Node

MatchNodes performs a breadth-first search on an HTML section, returning a slice of nodes that match the provided search criteria.

Parameters:

  • matchFun: The search criteria to apply to each node.

Returns:

  • []*html.Node: A slice containing all nodes that match the search criteria.

Behavior:

  • It does not search the children of nodes that match the criteria.
  • If no criteria are provided, only the first node visited (the root) matches.
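The distinguishing rule of MatchNodes, that a matching node's subtree is not searched, amounts to skipping the enqueue step for matches. In this sketch a nil matchFun is assumed to accept every node, which makes the root the only result and so agrees with the documented behavior. The Node type, PredicateFilter shape (func(T) bool), and helper names are hypothetical.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node.
type Node struct {
	Data        string
	ID          string // illustrative label used only to identify nodes
	FirstChild  *Node
	NextSibling *Node
}

// PredicateFilter is assumed to have the shape func(T) bool.
type PredicateFilter[T any] func(T) bool

func elem(tag, id string, children ...*Node) *Node {
	n := &Node{Data: tag, ID: id}
	for i := len(children) - 1; i >= 0; i-- {
		children[i].NextSibling = n.FirstChild
		n.FirstChild = children[i]
	}
	return n
}

// matchNodes runs a breadth-first search but does not descend into nodes
// that match, so nested matches below an outer match are not returned.
func matchNodes(root *Node, match PredicateFilter[*Node]) []*Node {
	if root == nil {
		return nil
	}
	var out []*Node
	queue := []*Node{root}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if match == nil || match(n) {
			out = append(out, n)
			continue // skip this node's children
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			queue = append(queue, c)
		}
	}
	return out
}

func ids(nodes []*Node) []string {
	out := make([]string, len(nodes))
	for i, n := range nodes {
		out[i] = n.ID
	}
	return out
}

func main() {
	doc := elem("div", "root",
		elem("ul", "outer", elem("li", "item", elem("ul", "inner"))),
		elem("ul", "second"),
	)
	isUl := func(n *Node) bool { return n.Data == "ul" }
	fmt.Println(ids(matchNodes(doc, isUl))) // [outer second]
	fmt.Println(ids(matchNodes(doc, nil)))  // [root]
}
```

This is useful when the outermost container is wanted (for example, a top-level list) and anything nested inside it should be treated as part of that match.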

type SearchCriteria

type SearchCriteria struct {
	// NodeType specifies the type of the HTML node to search for.
	NodeType html.NodeType

	// Data represents the data contained within the node.
	Data *string

	// Attrs is a slice of attribute keys, each paired with a predicate that the attribute's value must satisfy.
	Attrs []*cdp.Pair[string, slext.PredicateFilter[string]]
}

SearchCriteria is a struct that encapsulates the parameters for searching within an HTML node.

func NewSearchCriteria

func NewSearchCriteria(node_type html.NodeType) *SearchCriteria

NewSearchCriteria constructs a new SearchCriteria instance for the given node type.

Parameters:

  • node_type: The type of the HTML node to search for.

Returns:

  • *SearchCriteria: A new SearchCriteria instance.

func (*SearchCriteria) AppendAttr

func (sc *SearchCriteria) AppendAttr(key string, val slext.PredicateFilter[string]) *SearchCriteria

AppendAttr appends an attribute condition (a key and a predicate on its value) to the SearchCriteria instance.

Parameters:

  • key: The attribute key to match.
  • val: The predicate that the attribute's value must satisfy.

Returns:

  • *SearchCriteria: The SearchCriteria instance with the attribute condition appended.

func (*SearchCriteria) Build

Build constructs a slext.PredicateFilter from the search criteria.

Returns:

  • slext.PredicateFilter: A predicate that reports whether a node matches the search criteria.

func (*SearchCriteria) SetData

func (sc *SearchCriteria) SetData(data string) *SearchCriteria

SetData sets the data field of the SearchCriteria instance.

Parameters:

  • data: The data to set in the SearchCriteria instance.

Returns:

  • *SearchCriteria: The SearchCriteria instance with the data field set.

type WaitFunc added in v0.3.13

type WaitFunc func(url string) chromedp.Tasks

WaitFunc is a function that returns the chromedp tasks that wait for a page to load.

Parameters:

  • url: The URL of the page to wait for.

Returns:

  • chromedp.Tasks: The tasks to wait for the page to load.
