htmlquery

Version: v1.2.6
Published: Sep 7, 2023 License: AGPL-3.0 Imports: 13 Imported by: 1

Documentation

Overview

Package htmlquery extracts data from HTML documents using XPath expressions.

Index

Constants

This section is empty.

Variables

var DisableSelectorCache = false

DisableSelectorCache disables caching of query selectors when set to true.

var Exports = map[string]interface{}{
	"LoadHTMLDocument": func(htmlText interface{}) (*html.Node, error) {
		return Parse(strings.NewReader(utils.InterfaceToString(htmlText)))
	},
	"Find":                 Find,
	"FindOne":              FindOne,
	"QueryAll":             QueryAll,
	"Query":                Query,
	"InnerText":            InnerText,
	"SelectAttr":           SelectAttr,
	"ExistedAttr":          ExistsAttr,
	"CreateXPathNavigator": CreateXPathNavigator,

	"OutputHTML": func(doc *html.Node) string {
		return OutputHTML(doc, false)
	},
	"OutputHTMLSelf": func(doc *html.Node) string {
		return OutputHTML(doc, true)
	},
}
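
Exports stores its functions as interface{} values keyed by name, so a caller has to look an entry up and check its type before calling it. Note that ExistsAttr is registered under the key "ExistedAttr", so lookups must use that spelling. The following is a minimal sketch of such a lookup, using only the standard library; the `exports` table, its entries, and the `callString` helper are hypothetical stand-ins and not part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// exports is a hypothetical table shaped like Exports: string keys mapped to
// functions stored as interface{} values. The entries are stand-ins.
var exports = map[string]interface{}{
	"Upper": strings.ToUpper,
	"Repeat": func(s string) string {
		return strings.Repeat(s, 2)
	},
}

// callString looks up name, checks that the stored value has the expected
// function type, and calls it with arg.
func callString(name, arg string) (string, error) {
	v, ok := exports[name]
	if !ok {
		return "", fmt.Errorf("no export named %q", name)
	}
	fn, ok := v.(func(string) string)
	if !ok {
		return "", fmt.Errorf("export %q has type %T, want func(string) string", name, v)
	}
	return fn(arg), nil
}

func main() {
	out, err := callString("Upper", "div")
	fmt.Println(out, err)

	// A key that is not registered yields an error instead of a panic.
	_, err = callString("Missing", "div")
	fmt.Println(err)
}
```

The checked type assertion (`v.(T)` with the `ok` result) avoids a panic when an entry has a different signature than the caller expects.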
var SelectorCacheMaxEntries = 50

SelectorCacheMaxEntries sets how many selector objects can be cached. The default is 50. Caching is disabled if SelectorCacheMaxEntries <= 0.
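
The two cache settings can be read as a single gate in front of expression compilation. The sketch below illustrates that gating with the standard library only. It is not this package's implementation: `disableCache`, `cacheMaxEntries`, `selector`, `compile`, and `getSelector` are hypothetical names, the eviction policy is an assumption, and the code is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical stand-ins for DisableSelectorCache and SelectorCacheMaxEntries.
var (
	disableCache    = false
	cacheMaxEntries = 50
)

// selector stands in for a compiled XPath expression.
type selector struct{ expr string }

var (
	cache        = map[string]*selector{}
	compileCalls int // counts how often compile actually ran
)

// compile is a placeholder for real expression compilation.
func compile(expr string) (*selector, error) {
	compileCalls++
	if strings.TrimSpace(expr) == "" {
		return nil, errors.New("empty expression")
	}
	return &selector{expr: expr}, nil
}

// getSelector returns a cached selector when caching is enabled and
// compiles a fresh one otherwise.
func getSelector(expr string) (*selector, error) {
	if disableCache || cacheMaxEntries <= 0 {
		return compile(expr) // caching is off: always compile
	}
	if s, ok := cache[expr]; ok {
		return s, nil
	}
	s, err := compile(expr)
	if err != nil {
		return nil, err
	}
	if len(cache) >= cacheMaxEntries {
		// Assumption: evict one arbitrary entry to stay within the limit.
		for k := range cache {
			delete(cache, k)
			break
		}
	}
	cache[expr] = s
	return s, nil
}

func main() {
	getSelector("//a")
	getSelector("//a") // served from the cache, no second compile
	fmt.Println("compiles with cache:", compileCalls)

	disableCache = true
	getSelector("//a") // cache bypassed, compiles again
	fmt.Println("compiles after disabling:", compileCalls)
}
```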

Functions

func ExistsAttr

func ExistsAttr(n *html.Node, name string) bool

ExistsAttr reports whether an attribute with the specified name exists.

func Find

func Find(top *html.Node, expr string) []*html.Node

Find is like QueryAll but panics if the expression `expr` cannot be parsed.

See `QueryAll()` function.

func FindOne

func FindOne(top *html.Node, expr string) *html.Node

FindOne is like Query but panics if the expression `expr` cannot be parsed. See `Query()` function.
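
Because Find and FindOne panic on an unparsable expression, code that accepts expressions from untrusted input may prefer Query or QueryAll, which return an error instead. Where a panicking call cannot be avoided, the panic can be converted into an error with recover. The sketch below shows that pattern using the standard library only; `mustFind` is a hypothetical stand-in for a function that panics like Find, and `safely` is an illustrative helper, not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// mustFind is a hypothetical stand-in for a function that, like Find,
// panics when the expression is invalid.
func mustFind(expr string) []string {
	if expr == "" {
		panic(errors.New("invalid expression"))
	}
	return []string{"match for " + expr}
}

// safely runs fn and converts any panic it raises into an error.
func safely(fn func() []string) (res []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			res = nil
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return fn(), nil
}

func main() {
	res, err := safely(func() []string { return mustFind("//a") })
	fmt.Println(res, err)

	// The invalid expression panics inside mustFind; safely reports it as an error.
	_, err = safely(func() []string { return mustFind("") })
	fmt.Println(err)
}
```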

func InnerText

func InnerText(n *html.Node) string

InnerText returns the text between the start and end tags of the object.

func LoadDoc

func LoadDoc(path string) (*html.Node, error)

LoadDoc loads the HTML document from the specified file path.

func LoadURL

func LoadURL(url string) (*html.Node, error)

LoadURL loads the HTML document from the specified URL.

func OutputHTML

func OutputHTML(n *html.Node, self bool) string

OutputHTML returns the text of the node, including tag names.

func Parse

func Parse(r io.Reader) (*html.Node, error)

Parse returns the parse tree for the HTML from the given Reader.

func Query

func Query(top *html.Node, expr string) (*html.Node, error)

Query runs the given XPath expression against the given html.Node and returns the first matching html.Node, or nil if no matches are found.

Returns an error if the expression `expr` cannot be parsed.

func QueryAll

func QueryAll(top *html.Node, expr string) ([]*html.Node, error)

QueryAll returns all html.Nodes that match the specified XPath expression `expr`. It returns an error if the expression `expr` cannot be parsed.

func QuerySelector

func QuerySelector(top *html.Node, selector *xpath.Expr) *html.Node

QuerySelector returns the first matched html.Node by the specified XPath selector.

func QuerySelectorAll

func QuerySelectorAll(top *html.Node, selector *xpath.Expr) []*html.Node

QuerySelectorAll returns all html.Nodes that match the specified XPath selector.

func SelectAttr

func SelectAttr(n *html.Node, name string) (val string)

SelectAttr returns the attribute value with the specified name.

Types

type NodeNavigator

type NodeNavigator struct {
	// contains filtered or unexported fields
}

func CreateXPathNavigator

func CreateXPathNavigator(top *html.Node) *NodeNavigator

CreateXPathNavigator creates a new xpath.NodeNavigator for the specified html.Node.

func (*NodeNavigator) Copy

func (h *NodeNavigator) Copy() xpath.NodeNavigator

func (*NodeNavigator) Current

func (h *NodeNavigator) Current() *html.Node

func (*NodeNavigator) LocalName

func (h *NodeNavigator) LocalName() string

func (*NodeNavigator) MoveTo

func (h *NodeNavigator) MoveTo(other xpath.NodeNavigator) bool

func (*NodeNavigator) MoveToChild

func (h *NodeNavigator) MoveToChild() bool

func (*NodeNavigator) MoveToFirst

func (h *NodeNavigator) MoveToFirst() bool

func (*NodeNavigator) MoveToNext

func (h *NodeNavigator) MoveToNext() bool

func (*NodeNavigator) MoveToNextAttribute

func (h *NodeNavigator) MoveToNextAttribute() bool

func (*NodeNavigator) MoveToParent

func (h *NodeNavigator) MoveToParent() bool

func (*NodeNavigator) MoveToPrevious

func (h *NodeNavigator) MoveToPrevious() bool

func (*NodeNavigator) MoveToRoot

func (h *NodeNavigator) MoveToRoot()

func (*NodeNavigator) NodeType

func (h *NodeNavigator) NodeType() xpath.NodeType

func (*NodeNavigator) Prefix

func (*NodeNavigator) Prefix() string

func (*NodeNavigator) String

func (h *NodeNavigator) String() string

func (*NodeNavigator) Value

func (h *NodeNavigator) Value() string
