SiteNavigator

package

v0.3.26 (latest). Published: Jun 15, 2024. License: MIT. Imports: 14. Imported by: 0.

Details: valid go.mod file, redistributable license, tagged version.

Repository

github.com/PlayerR9/MyGoLib

Documentation

Index

Variables
func FilterValidNodes(list []*html.Node, filters []FilterErrFunc) ([]*html.Node, error)
func GetDirectChildren(node *html.Node) []*html.Node
type ActionType
type Context
- func InitializeContext() *Context
- func (c *Context) Close()
- func (c *Context) Context() context.Context
- func (c *Context) GetLastPage(url string, wait WaitFunc, f ExtractFunc[int]) (int, error)
- func (c *Context) GetNodes(sel any, opt func(*chromedp.Selector)) ([]*cdp.Node, error)
- func (c *Context) NewSubContext() *Context
- func (c *Context) ParseHTML(url string, loadedSignal ...chromedp.Action) (*html.Node, error)
- func (c *Context) RunTasks(tasks chromedp.Tasks) error
type ErrNoDataNodeFound
- func NewErrNoDataNodeFound(data string) *ErrNoDataNodeFound
- func (e *ErrNoDataNodeFound) Error() string
type ErrNoNodesFound
- func NewErrNoNodesFound() *ErrNoNodesFound
- func (e *ErrNoNodesFound) Error() string
type ErrNoTextNodeFound
- func NewErrNoTextNodeFound(isFirstChild bool) *ErrNoTextNodeFound
- func (e *ErrNoTextNodeFound) Error() string
type ExtractFunc
type FilterErrFunc
- func FilterDataNode(data string) FilterErrFunc
- func FilterTextNode(checkFirstChild bool) FilterErrFunc
type GTEFunc
- func GenericTreeExtraction(search *SearchCriteria, action ActionType) GTEFunc
type HtmlTree
- func NewHtmlTree(root *html.Node) (*HtmlTree, error)
- func (t *HtmlTree) ExtractContentFromDocument(matchFun slext.PredicateFilter[*html.Node]) *html.Node
- func (t *HtmlTree) ExtractNodes(criterias ...slext.PredicateFilter[*html.Node]) []*html.Node
- func (t *HtmlTree) ExtractSpecificNode(matchFun slext.PredicateFilter[*html.Node]) []*html.Node
- func (t *HtmlTree) MatchNodes(matchFun slext.PredicateFilter[*html.Node]) []*html.Node
type NodeListParser
- func CEWithSearch[T any](search *SearchCriteria, action ActionType, parse NodeListParser[T], ...) NodeListParser[T]
- func CreateExtractor[T any](parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T]
type SearchCriteria
- func NewSearchCriteria(node_type html.NodeType) *SearchCriteria
- func (sc *SearchCriteria) AppendAttr(key string, val slext.PredicateFilter[string]) *SearchCriteria
- func (sc *SearchCriteria) Build() slext.PredicateFilter[*html.Node]
- func (sc *SearchCriteria) SetData(data string) *SearchCriteria
type WaitFunc

Constants

This section is empty.

Variables

var (
	// FilterNilFEFuncs is a predicate filter that filters out nil FilterErrFuncs.
	FilterNilFEFuncs us.PredicateFilter[FilterErrFunc]
)

var (
	// GetChildrenFunc is a function that returns the children of an HTML node.
	GetChildrenFunc tlt.NextsFunc[*html.Node] = func(elem *html.Node, info uc.Copier) ([]*html.Node, error) {
		if elem == nil {
			return nil, ers.NewErrNilValue()
		}

		children := make([]*html.Node, 0)

		for c := elem.FirstChild; c != nil; c = c.NextSibling {
			children = append(children, c)
		}

		return children, nil
	}
)

var IsTextNodeSearch slext.PredicateFilter[*html.Node] = NewSearchCriteria(html.TextNode).Build()

IsTextNodeSearch is a predicate filter that matches text nodes.

Functions

func FilterValidNodes (added in v0.3.15)

func FilterValidNodes(list []*html.Node, filters []FilterErrFunc) ([]*html.Node, error)

FilterValidNodes returns the nodes in list that the filters accept as valid.

Parameters:

list: The list of nodes to filter.
filters: The functions to check if a node is valid.

Returns:

[]*html.Node: The list of valid nodes.
error: An error if no valid nodes are found.

Behaviors:

If no valid nodes are found, the function returns the first error encountered.
If list is empty or filters is empty, the function returns list, nil.

func GetDirectChildren

func GetDirectChildren(node *html.Node) []*html.Node

GetDirectChildren returns a slice of the direct children of the provided node.

Parameters:

node: The HTML node to extract the children from.

Returns:

[]*html.Node: A slice containing the direct children of the node.

Types

type ActionType (added in v0.3.15)

type ActionType int8

ActionType is an enumeration of the different actions that can be performed on a node.

const (
	// OnlyDirectChildren is an action that extracts only the direct children of a node.
	OnlyDirectChildren ActionType = iota

	// DFSOne is an action that extracts only one node using depth-first search.
	DFSOne

	// BFSMany is an action that extracts multiple nodes using breadth-first search.
	BFSMany
)

type Context (added in v0.3.13)

type Context struct {
	// contains filtered or unexported fields
}

Context is the context of a chromedp browser session.

func InitializeContext (added in v0.3.13)

func InitializeContext() *Context

InitializeContext initializes a new context.

Returns:

*Context: The new context.

func (*Context) Close (added in v0.3.13)

func (c *Context) Close()

Close closes the context.

func (*Context) Context (added in v0.3.13)

func (c *Context) Context() context.Context

Context returns the context of the session.

Returns:

context.Context: The context of the session.

func (*Context) GetLastPage (added in v0.3.13)

func (c *Context) GetLastPage(url string, wait WaitFunc, f ExtractFunc[int]) (int, error)

GetLastPage loads the page at url and uses f to extract the number of its last page.

Parameters:

url: The URL of the page.
wait: The function that returns the tasks to wait for the page to load.
f: The function that extracts the last page number from the HTML.

Returns:

int: The number of the last page.
error: The error that occurred while getting the last page.

func (*Context) GetNodes (added in v0.3.13)

func (c *Context) GetNodes(sel any, opt func(*chromedp.Selector)) ([]*cdp.Node, error)

GetNodes gets the nodes that match the selector.

Parameters:

sel: The selector of the nodes.
opt: The options of the selector.

Returns:

[]*cdp.Node: The nodes that match the selector.
error: The error that occurred while getting the nodes.

func (*Context) NewSubContext (added in v0.3.13)

func (c *Context) NewSubContext() *Context

NewSubContext creates a new sub-context.

Returns:

*Context: The new sub-context.

func (*Context) ParseHTML (added in v0.3.13)

func (c *Context) ParseHTML(url string, loadedSignal ...chromedp.Action) (*html.Node, error)

ParseHTML loads the page at url and parses its HTML.

Parameters:

url: The URL of the page.
loadedSignal: Optional actions that signal when the page has finished loading.

Returns:

*html.Node: The root node of the parsed HTML.
error: The error that occurred while parsing the HTML.

func (*Context) RunTasks (added in v0.3.13)

func (c *Context) RunTasks(tasks chromedp.Tasks) error

RunTasks runs the tasks on the session.

Parameters:

tasks: The tasks to run.

Returns:

error: The error that occurred while running the tasks.

type ErrNoDataNodeFound (added in v0.3.15)

type ErrNoDataNodeFound struct {
	// Data is the data that was not found.
	Data string
}

ErrNoDataNodeFound is an error that is returned when no node with the specified data is found.

func NewErrNoDataNodeFound (added in v0.3.15)

func NewErrNoDataNodeFound(data string) *ErrNoDataNodeFound

NewErrNoDataNodeFound creates a new ErrNoDataNodeFound error.

Parameters:

data: The data that was not found.

Returns:

*ErrNoDataNodeFound: The new error.

func (*ErrNoDataNodeFound) Error (added in v0.3.15)

func (e *ErrNoDataNodeFound) Error() string

Error implements the error interface.

It returns the error message: "no <data> tags found".

type ErrNoNodesFound (added in v0.3.15)

type ErrNoNodesFound struct{}

ErrNoNodesFound is an error that is returned when no nodes are found.

func NewErrNoNodesFound (added in v0.3.15)

func NewErrNoNodesFound() *ErrNoNodesFound

NewErrNoNodesFound creates a new ErrNoNodesFound error.

Returns:

*ErrNoNodesFound: The new error.

func (*ErrNoNodesFound) Error (added in v0.3.15)

func (e *ErrNoNodesFound) Error() string

Error implements the error interface.

It returns the error message: "no nodes found".

type ErrNoTextNodeFound (added in v0.3.15)

type ErrNoTextNodeFound struct {
	IsFirstChild bool
}

ErrNoTextNodeFound is an error that is returned when no text nodes are found.

func NewErrNoTextNodeFound (added in v0.3.15)

func NewErrNoTextNodeFound(isFirstChild bool) *ErrNoTextNodeFound

NewErrNoTextNodeFound creates a new ErrNoTextNodeFound error.

Parameters:

isFirstChild: Whether the check was made on the node's first child rather than on the node itself.

Returns:

*ErrNoTextNodeFound: The new error.

func (*ErrNoTextNodeFound) Error (added in v0.3.15)

func (e *ErrNoTextNodeFound) Error() string

Error implements the error interface.

It returns the error message: "node is not a text node". However, if IsFirstChild is true, it returns the error message: "first child is not a text node".

type ExtractFunc (added in v0.3.13)

type ExtractFunc[T any] func(doc *html.Node) (T, error)

ExtractFunc is a function that extracts data from the HTML.

Parameters:

doc: The HTML node of the page.

Returns:

T: The data extracted from the HTML.
error: The error that occurred while extracting the data.
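To illustrate the ExtractFunc shape, the sketch below defines an ExtractFunc[int] of the kind GetLastPage expects: it returns the largest integer found in any text node, as in a pagination bar. The Node type and the el and text helpers are hypothetical stand-ins for golang.org/x/net/html, and the "largest number wins" rule is an assumption about how a site lays out its page links.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical stand-ins for html.NodeType and *html.Node.
type NodeType int

const (
	TextNode NodeType = iota
	ElementNode
)

type Node struct {
	Type                    NodeType
	Data                    string
	FirstChild, NextSibling *Node
}

// ExtractFunc mirrors the documented generic shape over the stand-in Node.
type ExtractFunc[T any] func(doc *Node) (T, error)

// el builds an element node and links the given children in order.
func el(data string, children ...*Node) *Node {
	n := &Node{Type: ElementNode, Data: data}
	for i := len(children) - 1; i >= 0; i-- {
		children[i].NextSibling = n.FirstChild
		n.FirstChild = children[i]
	}
	return n
}

// text builds a text node.
func text(s string) *Node { return &Node{Type: TextNode, Data: s} }

// lastPage is a hypothetical ExtractFunc[int]: it returns the largest integer
// found in any text node under doc, such as the links of a pagination bar.
var lastPage ExtractFunc[int] = func(doc *Node) (int, error) {
	best, found := 0, false

	// walk visits n, its descendants, and its following siblings.
	var walk func(n *Node)
	walk = func(n *Node) {
		for ; n != nil; n = n.NextSibling {
			if n.Type == TextNode {
				v, err := strconv.Atoi(strings.TrimSpace(n.Data))
				if err == nil && (!found || v > best) {
					best, found = v, true
				}
			}
			walk(n.FirstChild)
		}
	}
	walk(doc)

	if !found {
		return 0, errors.New("no page number found")
	}
	return best, nil
}

func main() {
	nav := el("nav", el("a", text("1")), el("a", text("2")), el("a", text("17")), el("a", text("Next")))
	n, err := lastPage(nav)
	fmt.Println(n, err) // 17 <nil>
}
```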

type FilterErrFunc (added in v0.3.15)

type FilterErrFunc func(node *html.Node) error

FilterErrFunc is a function that returns an error if a condition is not met.

Parameters:

node: The node to check.

Returns:

error: An error if the condition is not met.

func FilterDataNode (added in v0.3.15)

func FilterDataNode(data string) FilterErrFunc

FilterDataNode returns a FilterErrFunc that checks if the node has the specified data.

Parameters:

data: The data to check for.

Returns:

FilterErrFunc: The FilterErrFunc that checks if the node has the specified data.

func FilterTextNode (added in v0.3.15)

func FilterTextNode(checkFirstChild bool) FilterErrFunc

FilterTextNode returns a FilterErrFunc that checks if the node is a text node.

Parameters:

checkFirstChild: If true, the function checks if the first child is a text node. Otherwise, it checks if the node itself is a text node.

Returns:

FilterErrFunc: The FilterErrFunc that checks if the node is a text node.
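Both constructors return closures over their argument. The following sketch shows that pattern with a hypothetical stand-in Node type. It reuses the error messages documented for ErrNoDataNodeFound and ErrNoTextNodeFound but returns plain errors, whereas the real functions presumably return those typed errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for html.NodeType and *html.Node.
type NodeType int

const (
	TextNode NodeType = iota
	ElementNode
)

type Node struct {
	Type       NodeType
	Data       string
	FirstChild *Node
}

type FilterErrFunc func(n *Node) error

// filterDataNode returns a closure that rejects nodes whose Data differs from data.
func filterDataNode(data string) FilterErrFunc {
	return func(n *Node) error {
		if n == nil || n.Data != data {
			return fmt.Errorf("no %s tags found", data)
		}
		return nil
	}
}

// filterTextNode returns a closure that checks either n itself or its first
// child, depending on checkFirstChild.
func filterTextNode(checkFirstChild bool) FilterErrFunc {
	return func(n *Node) error {
		target := n
		if checkFirstChild && n != nil {
			target = n.FirstChild
		}
		if target != nil && target.Type == TextNode {
			return nil
		}
		if checkFirstChild {
			return errors.New("first child is not a text node")
		}
		return errors.New("node is not a text node")
	}
}

func main() {
	// <h1>Title</h1>
	h1 := &Node{Type: ElementNode, Data: "h1", FirstChild: &Node{Type: TextNode, Data: "Title"}}

	fmt.Println(filterDataNode("h1")(h1))  // <nil>
	fmt.Println(filterDataNode("h2")(h1))  // no h2 tags found
	fmt.Println(filterTextNode(true)(h1))  // <nil>
	fmt.Println(filterTextNode(false)(h1)) // node is not a text node
}
```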

type GTEFunc (added in v0.3.15)

type GTEFunc func(tree *HtmlTree) []*html.Node

GTEFunc is a function that extracts nodes from a tree.

Parameters:

tree: The tree to extract nodes from.

Returns:

[]*html.Node: The list of nodes extracted from the tree.

func GenericTreeExtraction (added in v0.3.15)

func GenericTreeExtraction(search *SearchCriteria, action ActionType) GTEFunc

GenericTreeExtraction creates a GTEFunc from the given parameters.

Parameters:

search: The search criteria to use.
action: The action to perform on the node.

Returns:

GTEFunc: The created GTEFunc.

Behaviors:

If search is nil, the function uses a nil filter.
If action is not recognized, the function returns a GTEFunc that returns nil.

type HtmlTree

type HtmlTree struct {
	// contains filtered or unexported fields
}

HtmlTree is a struct that represents an HTML tree.

func NewHtmlTree

func NewHtmlTree(root *html.Node) (*HtmlTree, error)

NewHtmlTree constructs a tree from an HTML node.

Parameters:

root: The root HTML node.

Returns:

*HtmlTree: The tree constructed from the HTML node.
error: An error if the tree construction fails.

Errors:

*ers.ErrNilValue: If any html.Node is nil.

func (*HtmlTree) ExtractContentFromDocument

func (t *HtmlTree) ExtractContentFromDocument(matchFun slext.PredicateFilter[*html.Node]) *html.Node

ExtractContentFromDocument performs a depth-first search on an HTML document, finding the first node that matches the provided search criteria.

Parameters:

matchFun: The search criteria to apply to each node.

Returns:

*html.Node: The first node that matches the search criteria, nil if no matching node is found.
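A depth-first search that stops at the first match can be written with an explicit stack. This sketch uses a hypothetical stand-in Node (with an ID field added only to tell results apart) and a plain func(*Node) bool in place of slext.PredicateFilter; it is not the package's implementation.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node; ID exists only so the
// example can tell otherwise identical nodes apart.
type Node struct {
	Data                    string
	ID                      string
	FirstChild, NextSibling *Node
}

// firstMatchDFS returns the first node in depth-first pre-order for which
// match returns true, or nil if none does. An explicit stack replaces recursion.
func firstMatchDFS(root *Node, match func(*Node) bool) *Node {
	if root == nil {
		return nil
	}

	stack := []*Node{root}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]

		if match(n) {
			return n
		}

		// Push children in reverse so the leftmost child is popped first.
		var kids []*Node
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			kids = append(kids, c)
		}
		for i := len(kids) - 1; i >= 0; i-- {
			stack = append(stack, kids[i])
		}
	}
	return nil
}

func main() {
	// <body><div><span id=inner></span></div><span id=outer></span></body>
	inner := &Node{Data: "span", ID: "inner"}
	outer := &Node{Data: "span", ID: "outer"}
	body := &Node{Data: "body", FirstChild: &Node{Data: "div", FirstChild: inner, NextSibling: outer}}

	found := firstMatchDFS(body, func(n *Node) bool { return n.Data == "span" })
	fmt.Println(found.ID) // inner
}
```

Pushing children in reverse keeps the traversal in document order, so the span nested inside the div is found before its later sibling.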

func (*HtmlTree) ExtractNodes

func (t *HtmlTree) ExtractNodes(criterias ...slext.PredicateFilter[*html.Node]) []*html.Node

ExtractNodes performs a breadth-first search on the tree and returns a slice of the nodes that match the provided search criteria.

Parameters:

criterias: The search criteria to apply to each node.

Returns:

[]*html.Node: A slice containing all nodes that match the search criteria.

Behavior:

If no criteria are provided, every node matches.
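For contrast with the depth-first search, a breadth-first variant visits nodes level by level, so shallower matches come first. In this sketch the stand-in Node and Predicate types are hypothetical, and combining several criteria with AND is an assumption; the documentation only says each criterion is applied to each node.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node; ID only labels results.
type Node struct {
	Data                    string
	ID                      string
	FirstChild, NextSibling *Node
}

// Predicate stands in for slext.PredicateFilter[*html.Node].
type Predicate func(*Node) bool

// extractNodesBFS visits nodes level by level and collects every node that
// satisfies all criteria (assumption: criteria are combined with AND).
// With no criteria, every node matches.
func extractNodesBFS(root *Node, criteria ...Predicate) []*Node {
	if root == nil {
		return nil
	}

	var out []*Node
	queue := []*Node{root}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]

		ok := true
		for _, c := range criteria {
			if !c(n) {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, n)
		}

		for c := n.FirstChild; c != nil; c = c.NextSibling {
			queue = append(queue, c)
		}
	}
	return out
}

func main() {
	// <body><div><span id=inner></span></div><span id=outer></span></body>
	inner := &Node{Data: "span", ID: "inner"}
	outer := &Node{Data: "span", ID: "outer"}
	body := &Node{Data: "body", FirstChild: &Node{Data: "div", FirstChild: inner, NextSibling: outer}}

	// BFS order: "outer" (depth 1) is printed before "inner" (depth 2).
	for _, n := range extractNodesBFS(body, func(n *Node) bool { return n.Data == "span" }) {
		fmt.Println(n.ID)
	}
	fmt.Println(len(extractNodesBFS(body))) // 4
}
```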

func (*HtmlTree) ExtractSpecificNode

func (t *HtmlTree) ExtractSpecificNode(matchFun slext.PredicateFilter[*html.Node]) []*html.Node

ExtractSpecificNode finds all nodes that match the given search criteria and are direct children of the tree's root node.

Parameters:

matchFun: The search criteria to apply to each node.

Returns:

[]*html.Node: A slice containing all nodes that match the search criteria.

Behavior:

If matchFun is nil, every direct child matches.
If the root node is nil, a nil slice is returned.

func (*HtmlTree) MatchNodes

func (t *HtmlTree) MatchNodes(matchFun slext.PredicateFilter[*html.Node]) []*html.Node

MatchNodes performs a breadth-first search on the tree and returns a slice of the nodes that match the provided search criteria.

Parameters:

matchFun: The search criteria to apply to each node.

Returns:

[]*html.Node: A slice containing all nodes that match the search criteria.

Behavior:

It does not search the children of nodes that match the criteria.
If matchFun is nil, only the first node matches.
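The pruning rule is what distinguishes MatchNodes from a plain breadth-first collection: once a node matches, its subtree is skipped. The sketch below illustrates it with hypothetical stand-in types; treating a nil matcher as matching only the root is an assumption based on the "first node" wording above.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node; ID only labels results.
type Node struct {
	Data                    string
	ID                      string
	FirstChild, NextSibling *Node
}

// matchNodesBFS collects matching nodes breadth-first but never descends
// into a node that matched, so matches nested inside a match are skipped.
func matchNodesBFS(root *Node, match func(*Node) bool) []*Node {
	if root == nil {
		return nil
	}
	if match == nil {
		// Assumption: with no matcher, only the first (root) node matches.
		return []*Node{root}
	}

	var out []*Node
	queue := []*Node{root}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]

		if match(n) {
			out = append(out, n)
			continue // prune: the children of a match are not searched
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			queue = append(queue, c)
		}
	}
	return out
}

func main() {
	// <body><div id=outer><div id=inner></div></div><div id=side></div></body>
	inner := &Node{Data: "div", ID: "inner"}
	side := &Node{Data: "div", ID: "side"}
	outer := &Node{Data: "div", ID: "outer", FirstChild: inner, NextSibling: side}
	body := &Node{Data: "body", FirstChild: outer}

	// Prints "outer", then "side"; "inner" is pruned.
	for _, n := range matchNodesBFS(body, func(n *Node) bool { return n.Data == "div" }) {
		fmt.Println(n.ID)
	}
}
```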

type NodeListParser (added in v0.3.15)

type NodeListParser[T any] func(list []*html.Node) (T, error)

NodeListParser is a function that parses a list of nodes.

Parameters:

list: The list of nodes to parse.

Returns:

T: The parsed value.
error: An error if the parsing fails.

func CEWithSearch (added in v0.3.15)

func CEWithSearch[T any](search *SearchCriteria, action ActionType, parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T]

CEWithSearch creates a NodeListParser from the given parameters.

Parameters:

search: The search criteria to use.
action: The action to perform on the node.
parse: The function that parses the list of nodes.
filters: The functions that filter the list of nodes.

Returns:

NodeListParser: The created NodeListParser.

Behaviors:

If parse is nil, the function returns a NodeListParser that returns the error *errors.ErrInvalidParameter.
Nil functions in filters are ignored.
It uses a stack to traverse the tree and a GTEFunc to extract nodes.
It terminates as soon as a valid result is found.

func CreateExtractor (added in v0.3.15)

func CreateExtractor[T any](parse NodeListParser[T], filters ...FilterErrFunc) NodeListParser[T]

CreateExtractor creates a NodeListParser from the given parameters.

Parameters:

parse: The function that parses the list of nodes.
filters: The functions that filter the list of nodes.

Returns:

NodeListParser: The created NodeListParser.

Behaviors:

If parse is nil, the function returns a NodeListParser that returns the error *errors.ErrInvalidParameter.
Nil functions in filters are ignored.

type SearchCriteria

type SearchCriteria struct {
	// NodeType specifies the type of the HTML node to search for.
	NodeType html.NodeType

	// Data represents the data contained within the node.
	Data *string

	// Attrs is a slice of attribute key-value pairs to match.
	Attrs []*uc.Pair[string, slext.PredicateFilter[string]]
}

SearchCriteria is a struct that encapsulates the parameters for searching within an HTML node.

func NewSearchCriteria

func NewSearchCriteria(node_type html.NodeType) *SearchCriteria

NewSearchCriteria constructs a new SearchCriteria instance using the provided parameters.

Parameters:

node_type: The type of the HTML node to search for.

Returns:

*SearchCriteria: A new SearchCriteria instance.

func (*SearchCriteria) AppendAttr

func (sc *SearchCriteria) AppendAttr(key string, val slext.PredicateFilter[string]) *SearchCriteria

AppendAttr appends an attribute key and its value predicate to the SearchCriteria instance.

Parameters:

key: The attribute key to match.
val: The predicate that the attribute's value must satisfy.

Returns:

*SearchCriteria: The SearchCriteria instance with the attribute key-value pair appended.

func (*SearchCriteria) Build

func (sc *SearchCriteria) Build() slext.PredicateFilter[*html.Node]

Build constructs a slext.PredicateFilter that reports whether a node matches the search criteria.

Returns:

slext.PredicateFilter: A function that matches the search criteria.
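To show how the builder methods and Build fit together, the following sketch re-implements a simplified, lowercase searchCriteria over hypothetical stand-in types. It assumes all conditions must hold (node type, optional data, and every attribute predicate) and that Build snapshots the criteria at call time; neither detail is confirmed above.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for html.NodeType, html.Attribute, and *html.Node.
type NodeType int

const (
	TextNode NodeType = iota
	ElementNode
)

type Attribute struct{ Key, Val string }

type Node struct {
	Type NodeType
	Data string
	Attr []Attribute
}

// attrCheck pairs an attribute key with a predicate on its value.
type attrCheck struct {
	key  string
	pred func(string) bool
}

// searchCriteria is a simplified, hypothetical re-implementation for illustration.
type searchCriteria struct {
	nodeType NodeType
	data     *string
	attrs    []attrCheck
}

func newSearchCriteria(t NodeType) *searchCriteria { return &searchCriteria{nodeType: t} }

// SetData records the required Data value and returns sc for chaining.
func (sc *searchCriteria) SetData(data string) *searchCriteria {
	sc.data = &data
	return sc
}

// AppendAttr adds an attribute condition and returns sc for chaining.
func (sc *searchCriteria) AppendAttr(key string, pred func(string) bool) *searchCriteria {
	sc.attrs = append(sc.attrs, attrCheck{key: key, pred: pred})
	return sc
}

// Build snapshots the criteria into a predicate. All conditions must hold;
// an attribute condition holds if some attribute has the key and its value
// satisfies pred (a nil pred only requires the key to exist).
func (sc *searchCriteria) Build() func(*Node) bool {
	typ, data := sc.nodeType, sc.data
	attrs := append([]attrCheck(nil), sc.attrs...)

	return func(n *Node) bool {
		if n == nil || n.Type != typ {
			return false
		}
		if data != nil && n.Data != *data {
			return false
		}
		for _, ac := range attrs {
			found := false
			for _, a := range n.Attr {
				if a.Key == ac.key && (ac.pred == nil || ac.pred(a.Val)) {
					found = true
					break
				}
			}
			if !found {
				return false
			}
		}
		return true
	}
}

func main() {
	isNavLink := newSearchCriteria(ElementNode).
		SetData("a").
		AppendAttr("class", func(v string) bool { return strings.Contains(v, "nav") }).
		Build()

	link := &Node{Type: ElementNode, Data: "a", Attr: []Attribute{{Key: "class", Val: "nav-item"}}}
	plain := &Node{Type: ElementNode, Data: "a"}

	fmt.Println(isNavLink(link), isNavLink(plain)) // true false
}
```

Snapshotting in Build means later calls to SetData or AppendAttr do not change a predicate that was already built; the real type may behave differently.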

func (*SearchCriteria) SetData

func (sc *SearchCriteria) SetData(data string) *SearchCriteria

SetData sets the data field of the SearchCriteria instance.

Parameters:

data: The data to set in the SearchCriteria instance.

Returns:

*SearchCriteria: The SearchCriteria instance with the data field set.

type WaitFunc (added in v0.3.13)

type WaitFunc func(url string) chromedp.Tasks

WaitFunc returns the chromedp tasks that wait for the page at url to load.

Parameters:

url: The URL of the page to wait for.

Returns:

chromedp.Tasks: The tasks to wait for the page to load.
