htree

Version: v2.0.0
Warning

This package is not in the latest version of its module.

Published: Nov 2, 2024 License: MIT Imports: 6 Imported by: 1

README

Htree - Go package for working with html.Node trees

This is htree, a Go package that helps traverse, navigate, filter, and otherwise process trees of html.Node objects.

Usage

root, err := html.Parse(input)
if err != nil { ... }

body := htree.FindEl(root, func(n *html.Node) bool {
  return n.DataAtom == atom.Body
})

content := htree.FindEl(body, func(n *html.Node) bool {
  return n.DataAtom == atom.Div && htree.ElClassContains(n, "content")
})

...etc...

Documentation

Overview

Package htree is a collection of tools for working with trees of html.Nodes.

Functions

func ElAttr

func ElAttr(node *html.Node, key string) string

ElAttr returns `node`'s value for the attribute `key`.

func ElClassContains

func ElClassContains(node *html.Node, probe string) bool

ElClassContains tells whether `node` has a `class` attribute containing the class name `probe`.
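The description above speaks of a class name rather than a substring. Assuming that reading, the check can be sketched with the standard library alone. The helper `classContains` is hypothetical and is not part of htree; it only illustrates whole-token matching against a whitespace-separated `class` attribute value.

```go
package main

import (
	"fmt"
	"strings"
)

// classContains is a hypothetical helper, not htree's implementation.
// It reports whether the whitespace-separated class list in classAttr
// includes probe as a whole token.
func classContains(classAttr, probe string) bool {
	for _, c := range strings.Fields(classAttr) {
		if c == probe {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(classContains("main content", "content")) // true
	fmt.Println(classContains("contents", "content"))     // false
}
```

Under this assumption, a probe of "content" matches `class="main content"` but not `class="contents"`.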

func Find

func Find(tree *html.Node, pred func(*html.Node) bool) *html.Node

Find finds the first node, in a depth-first search of the given tree, satisfying the given predicate.

func FindEl

func FindEl(tree *html.Node, pred func(*html.Node) bool) *html.Node

FindEl finds the first `ElementNode`-typed node, in a depth-first search of the tree, satisfying the given predicate.

func Prune

func Prune(node *html.Node, pred func(*html.Node) bool) *html.Node

Prune returns a copy of `node` and its children, minus any subnodes that cause the supplied predicate to return true. If `node` itself is pruned, the return value is nil.
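The copy-and-filter behavior described above can be illustrated on a toy tree. This is a sketch only: `node` is a hypothetical stand-in for html.Node so that the example runs with the standard library alone, and `prune` is not htree's implementation.

```go
package main

import "fmt"

// node is a hypothetical, minimal stand-in for html.Node.
type node struct {
	name     string
	children []*node
}

// prune returns a deep copy of n without any node for which pred
// returns true. It returns nil if n itself is pruned. The input tree
// is left unmodified.
func prune(n *node, pred func(*node) bool) *node {
	if pred(n) {
		return nil
	}
	cp := &node{name: n.name}
	for _, c := range n.children {
		if pc := prune(c, pred); pc != nil {
			cp.children = append(cp.children, pc)
		}
	}
	return cp
}

func main() {
	root := &node{name: "body", children: []*node{
		{name: "script"},
		{name: "div", children: []*node{{name: "style"}, {name: "p"}}},
	}}
	isNoise := func(n *node) bool {
		return n.name == "script" || n.name == "style"
	}

	clean := prune(root, isNoise)
	// The copy has lost "script"; the original still has both children.
	fmt.Println(len(root.children), len(clean.children)) // 2 1
}
```

Because the result is a copy, the caller can keep using the original tree after pruning.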

func Text

func Text(node *html.Node) (string, error)

Text returns the content of the tree rooted at `node` as plain text. HTML entities are decoded, and <br> nodes are turned into newlines.

func WriteText

func WriteText(w io.Writer, node *html.Node) error

WriteText converts the content of the tree rooted at `node` into plain text and writes it to `w`. HTML entities are decoded, <script> and <style> nodes are pruned, and <br> nodes are turned into newlines.

Types

type Seq

type Seq = iter.Seq[*html.Node]

Seq is the type of an iterator over HTML-tree nodes.

func FindAll

func FindAll(tree *html.Node, pred func(*html.Node) bool) Seq

FindAll produces an iterator over the nodes in the tree that satisfy the given predicate. When a node satisfies the predicate, the iterator yields it and does not descend into its children.

To continue walking the subtree of a node `n` that passes `pred`, call `FindAllChildren(n, pred)`. Example:

for n := range FindAll(tree, pred) {
  doSomething(n, pred)
}

And elsewhere:

func doSomething(n *html.Node, pred func(*html.Node) bool) {
  // ...do something with n...
  for child := range FindAllChildren(n, pred) {
    doSomething(child, pred)
  }
}

func FindAllChildEls

func FindAllChildEls(node *html.Node, pred func(*html.Node) bool) Seq

FindAllChildEls is the same as FindAllEls but operates only on the children of `node`, not `node` itself.

As with FindAll, the children of a node that passes `pred` are skipped. To continue walking the subtree of a node `n` that passes `pred`, call `FindAllChildEls(n, pred)`.

func FindAllChildren

func FindAllChildren(node *html.Node, pred func(*html.Node) bool) Seq

FindAllChildren is the same as FindAll but operates only on the children of `node`, not `node` itself.

As with FindAll, the children of a node that passes `pred` are skipped. To continue walking the subtree of a node `n` that passes `pred`, call `FindAllChildren(n, pred)`.

func FindAllEls

func FindAllEls(node *html.Node, pred func(*html.Node) bool) Seq

FindAllEls is like FindAll but calls `pred` only for nodes with type `ElementNode`.

As with FindAll, the children of a node that passes `pred` are skipped. To continue walking the subtree of a node `n` that passes `pred`, call `FindAllChildEls(n, pred)`.

func Walk

func Walk(tree *html.Node) Seq

Walk produces an iterator over the nodes in the tree in a recursive, preorder, depth-first walk.
