htmlnode


Version: v0.0.0-...-2915b32 (latest) | Published: Sep 26, 2015 | License: GPL-3.0 | Imports: 5 | Imported by: 0

README

Htmlnode

Package htmlnode provides functions for searching, traversing and printing parsed HTML. It is based on the html.Node data type from the golang.org/x/net/html package.

Documentation at https://godoc.org/xi2.org/x/htmlnode.

Download and install with go get xi2.org/x/htmlnode.

Documentation

Overview

Note: The API is presently experimental and may change.

Example

The most useful function in the package is probably Find, which is best illustrated with an example.

Suppose we have a parsed HTML tree as follows:

    R
      E html
        E head
        E body
          E div id="lowframe" style="position: fixed;" ...
          C  #lowframe
          E div id="topbar"
            E div class="container"
              E div class="top-heading" id="heading-wide"
                E a href="/"
                  T The Go Programming Language
              E div class="top-heading" id="heading-narrow"
                E a href="/"
(1)               T Go
              E a href="#" id="menu-button"
                E span id="menu-button-arrow"
                  T ▽
              E form method="GET" action="/search"
                E div id="menu"
(2)               E a href="/doc/"
                    T Documents
(3)               E a href="/pkg/"
                    T Packages
(4)               E a href="/project/"
                    T The Project
(5)               E a href="/help/"
                    T Help
(6)               E a href="/blog/"
                    T Blog
(7)               E a href="http://play.golang.org/" ...
                    T Play
                  E input type="text" id="search" name="q" ...

This is a section of the golang.org front page, as printed by this package's Print function. Some of the nodes are numbered and referred to below.

The following call to Find will return the nodes numbered (2) to (7) in a slice:

Find(root, `<form><div><a>`)

This is because the path from the root down to each of these nodes ends with the three element nodes <form>, <div> and <a>. It does not matter that the <div> in the fragment is missing the id="menu" attribute; all that matters is that its attributes (none) are a subset of those on the corresponding <div> in the tree. If, however, we were to use:

Find(root, `<form><div id="someotherid"><a>`)

we would get no results, since id="someotherid" does not match the id="menu" attribute of that <div> in the tree.

As another example, calling:

Find(root, `<a href=/>Go`)

returns node (1), so you can pick out non-element nodes too.

A note on fragments

The fragment passed to Find has to parse in the context of having a generic element node as its parent. So it is fine to call:

Find(root, `<table><tr><td>`)

on some document, but if you were to instead use:

Find(root, `<tr><td>`)

you would get an empty slice, since the fragment `<tr><td>` is not valid directly under an arbitrary element node and will not parse. However, it should always be possible to specify more parent nodes in the fragment, even if they are not within the tree being searched.

To illustrate this, suppose you have a subtree that looks like this:

E tr
  E td
  E td
  E td
  E td

and you want to call Find to get the <td> elements. You cannot use

Find(subtree, `<td>`)

but it is still OK to use

Find(subtree, `<table><tr><td>`)

even though there is no <table> in subtree. The matcher will look in subtree's ancestors.

Index

func Attr(n *html.Node, key string) (string, bool)
func AttrNS(n *html.Node, namespace, key string) (string, bool)
func Compare(n1, n2 *html.Node) bool
func Find(root *html.Node, fragment string) []*html.Node
func Flatten(root *html.Node) string
func Leaf(fragment string) *html.Node
func Match(n1 *html.Node, n2 *html.Node) bool
func Next(n *html.Node, root *html.Node) (*html.Node, int)
func NextSibElt(n *html.Node) *html.Node
func Prev(n *html.Node, root *html.Node) (*html.Node, int)
func PrevSibElt(n *html.Node) *html.Node
func Print(root *html.Node) error
func PrintTree(w io.Writer, root *html.Node, colour bool) error
func String(n *html.Node, colour bool) string

Functions

func Attr

func Attr(n *html.Node, key string) (string, bool)

Attr returns the Val field of the first attribute in n.Attr whose Key field equals key. The second return value indicates whether the node has such an attribute; if it does not, Attr returns ("", false). Note that the Namespace fields of n.Attr are not compared.

func AttrNS

func AttrNS(n *html.Node, namespace, key string) (string, bool)

AttrNS is like Attr but additionally compares the Namespace fields in the attributes.
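
A minimal sketch of the two lookups follows. It does not import golang.org/x/net/html: the attribute and node types are hypothetical stand-ins mirroring html.Attribute and the Attr field of html.Node, and attr and attrNS are illustrative helpers, not the package's own source.

```go
package main

import "fmt"

// attribute is a hypothetical stand-in for html.Attribute.
type attribute struct {
	Namespace, Key, Val string
}

// node is a hypothetical stand-in keeping only the Attr field of html.Node.
type node struct {
	Attr []attribute
}

// attr returns the Val of the first attribute whose Key equals key.
// Namespace is ignored, as described for Attr.
func attr(n *node, key string) (string, bool) {
	for _, a := range n.Attr {
		if a.Key == key {
			return a.Val, true
		}
	}
	return "", false
}

// attrNS is like attr but also requires the Namespace fields to match.
func attrNS(n *node, namespace, key string) (string, bool) {
	for _, a := range n.Attr {
		if a.Namespace == namespace && a.Key == key {
			return a.Val, true
		}
	}
	return "", false
}

func main() {
	n := &node{Attr: []attribute{
		{Key: "href", Val: "/doc/"},
		{Namespace: "xlink", Key: "href", Val: "#target"},
	}}
	fmt.Println(attr(n, "href"))            // /doc/ true
	fmt.Println(attrNS(n, "xlink", "href")) // #target true
	v, ok := attr(n, "id")
	fmt.Printf("%q %v\n", v, ok) // "" false
}
```

Because both helpers return the first match, a node carrying the same Key in two namespaces answers attr with whichever attribute comes first in n.Attr.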

func Compare

func Compare(n1, n2 *html.Node) bool

Compare returns true if node n1 has the same Type, Data and Namespace fields as n2, and if the attributes of n2 are equal to or are a subset of the attributes of n1.
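
The subset rule can be sketched as follows. The sketch assumes that two attributes are equal only when Namespace, Key and Val all match, which the documentation does not spell out. The types and the compare function are hypothetical stand-ins, not the package's code.

```go
package main

import "fmt"

// nodeType is a hypothetical stand-in for html.NodeType.
type nodeType int

const (
	textNode nodeType = iota
	elementNode
)

// attribute and node are hypothetical stand-ins for html.Attribute and
// html.Node, keeping only the fields Compare looks at.
type attribute struct{ Namespace, Key, Val string }

type node struct {
	Type      nodeType
	Data      string
	Namespace string
	Attr      []attribute
}

// hasAttr reports whether attrs contains an attribute equal to a.
func hasAttr(attrs []attribute, a attribute) bool {
	for _, b := range attrs {
		if a == b {
			return true
		}
	}
	return false
}

// compare reports whether n1 and n2 have the same Type, Data and Namespace,
// and every attribute of n2 also appears on n1.
func compare(n1, n2 *node) bool {
	if n1.Type != n2.Type || n1.Data != n2.Data || n1.Namespace != n2.Namespace {
		return false
	}
	for _, a := range n2.Attr {
		if !hasAttr(n1.Attr, a) {
			return false
		}
	}
	return true
}

func main() {
	menu := &node{Type: elementNode, Data: "div", Attr: []attribute{{Key: "id", Val: "menu"}}}
	bare := &node{Type: elementNode, Data: "div"}
	other := &node{Type: elementNode, Data: "div", Attr: []attribute{{Key: "id", Val: "someotherid"}}}
	fmt.Println(compare(menu, bare))  // true: no attributes is a subset
	fmt.Println(compare(menu, other)) // false: id values differ
	fmt.Println(compare(bare, menu))  // false: the subset test is one-directional
}
```

The first two calls in main correspond to the <div> and <div id="someotherid"> cases in the Find example in the overview.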

func Find

func Find(root *html.Node, fragment string) []*html.Node

Find locates nodes matching fragment within root. It first converts fragment into a leaf node (call it n2) using the Leaf function. It then does a depth-first search of root and returns a slice of all nodes n in root that satisfy Match(n, n2). If there are no such nodes, it returns an empty slice.

Please note that fragment must parse in the context of having a generic element node as its parent, since it is passed to Leaf. See "A note on fragments" in the introduction for more details.
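
The search step of Find can be sketched as a pre-order walk that keeps every node passing a predicate. This is a hypothetical illustration. The node type, elt and findFunc are stand-ins, and the predicate here tests tag names only, whereas Find uses Match against the leaf produced by Leaf.

```go
package main

import "fmt"

// node is a hypothetical stand-in for html.Node with only the links and
// field this sketch needs.
type node struct {
	FirstChild, NextSibling *node
	Data                    string
}

// elt builds a node with the given children linked as siblings.
func elt(data string, children ...*node) *node {
	n := &node{Data: data}
	for i := len(children) - 1; i >= 0; i-- {
		children[i].NextSibling = n.FirstChild
		n.FirstChild = children[i]
	}
	return n
}

// findFunc walks root depth-first (a node before its children, children
// left to right) and collects every node for which pred is true. In Find
// itself the predicate is Match(n, n2) with n2 = Leaf(fragment).
func findFunc(root *node, pred func(*node) bool) []*node {
	result := []*node{}
	var walk func(*node)
	walk = func(n *node) {
		if pred(n) {
			result = append(result, n)
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(root)
	return result
}

func main() {
	root := elt("div", elt("a"), elt("span", elt("a")), elt("a"))
	links := findFunc(root, func(n *node) bool { return n.Data == "a" })
	fmt.Println(len(links)) // 3
}
```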

func Flatten

func Flatten(root *html.Node) string

Flatten walks the tree under root finding all html.TextNodes and returns the string resulting from appending all their Data fields.
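
A minimal sketch of this walk follows. It uses hypothetical stand-in types instead of html.Node, and flatten is an illustrative helper rather than the package's code.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeType and node are hypothetical stand-ins for html.NodeType/html.Node.
type nodeType int

const (
	textNode nodeType = iota
	elementNode
)

type node struct {
	FirstChild, NextSibling *node
	Type                    nodeType
	Data                    string
}

// flatten concatenates the Data of every text node under root, in
// document order.
func flatten(root *node) string {
	var b strings.Builder
	var walk func(*node)
	walk = func(n *node) {
		if n.Type == textNode {
			b.WriteString(n.Data)
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(root)
	return b.String()
}

func main() {
	// Tree for <p>Hello, <b>world</b>!</p>
	world := &node{Type: textNode, Data: "world"}
	bang := &node{Type: textNode, Data: "!"}
	bold := &node{Type: elementNode, Data: "b", FirstChild: world, NextSibling: bang}
	hello := &node{Type: textNode, Data: "Hello, ", NextSibling: bold}
	p := &node{Type: elementNode, Data: "p", FirstChild: hello}
	fmt.Println(flatten(p)) // Hello, world!
}
```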

func Leaf

func Leaf(fragment string) *html.Node

Leaf converts an HTML fragment into a parse tree (without html/head/body ElementNodes or DoctypeNode), and then from the root of this tree repeatedly follows FirstChild until it finds a leaf node. This leaf node is returned as its result. In order to parse fragment, Leaf calls html.ParseFragment with a context of html.Node{Type: html.ElementNode}. If there is an error parsing fragment or no nodes are returned then Leaf returns a node of type html.ErrorNode. The return value of Leaf is intended to be passed to Match as its second argument.
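
Parsing requires html.ParseFragment from golang.org/x/net/html. The sketch below therefore starts from the top-level nodes such a parse would return and shows only the descent along FirstChild and the error case. It is a hypothetical illustration rather than the package's code, and the node type and leafOf are stand-ins.

```go
package main

import "fmt"

// nodeType and node are hypothetical stand-ins for html.NodeType/html.Node.
type nodeType int

const (
	errorNode nodeType = iota
	textNode
	elementNode
)

type node struct {
	FirstChild *node
	Type       nodeType
	Data       string
}

// leafOf takes the top-level nodes of an already parsed fragment and
// follows FirstChild from the first of them until it reaches a node with
// no children. With no nodes it returns an error-type node.
func leafOf(nodes []*node) *node {
	if len(nodes) == 0 {
		return &node{Type: errorNode}
	}
	n := nodes[0]
	for n.FirstChild != nil {
		n = n.FirstChild
	}
	return n
}

func main() {
	// Parsed form of the fragment `<a href=/>Go`: an <a> element with a
	// single text child (attributes omitted in this stand-in).
	a := &node{Type: elementNode, Data: "a", FirstChild: &node{Type: textNode, Data: "Go"}}
	fmt.Println(leafOf([]*node{a}).Data)       // Go
	fmt.Println(leafOf(nil).Type == errorNode) // true
}
```

For the fragment <a href=/>Go used in the overview, the descent ends at the text node Go, which is why Find can return non-element nodes.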

func Match

func Match(n1 *html.Node, n2 *html.Node) bool

Match compares the slice of nodes obtained by tracing n1's root node down to n1 with the equivalent slice obtained by tracing n2's root down to n2. Call these slices ns1 and ns2. If the tail of ns1 matches ns2 with respect to Compare then Match returns true.
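
The tail comparison can be sketched with a simplified node type that has only a Parent link and a tag name. This is a hypothetical illustration. rootPath, match, sameData and chain are stand-ins, and sameData compares tag names only, whereas Match uses Compare.

```go
package main

import "fmt"

// node is a hypothetical stand-in for html.Node: just a Parent link and Data.
type node struct {
	Parent *node
	Data   string
}

// rootPath returns the nodes on the path from n's root down to n.
func rootPath(n *node) []*node {
	var p []*node
	for ; n != nil; n = n.Parent {
		p = append(p, n)
	}
	// Reverse so the root comes first.
	for i, j := 0, len(p)-1; i < j; i, j = i+1, j-1 {
		p[i], p[j] = p[j], p[i]
	}
	return p
}

// match reports whether the last len(ns2) nodes of n1's path match n2's
// whole path position by position, using cmp in place of Compare.
func match(n1, n2 *node, cmp func(a, b *node) bool) bool {
	ns1, ns2 := rootPath(n1), rootPath(n2)
	if len(ns2) > len(ns1) {
		return false
	}
	tail := ns1[len(ns1)-len(ns2):]
	for i := range ns2 {
		if !cmp(tail[i], ns2[i]) {
			return false
		}
	}
	return true
}

// sameData is a deliberately simplified comparison on tag names only.
func sameData(a, b *node) bool { return a.Data == b.Data }

// chain builds a parent-linked chain of nodes and returns the deepest one.
func chain(names ...string) *node {
	var n *node
	for _, name := range names {
		n = &node{Parent: n, Data: name}
	}
	return n
}

func main() {
	a := chain("html", "body", "div", "form", "div", "a")
	fmt.Println(match(a, chain("form", "div", "a"), sameData)) // true
	fmt.Println(match(a, chain("table", "a"), sameData))       // false
}
```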

func Next

func Next(n *html.Node, root *html.Node) (*html.Node, int)

Next returns the next node in a depth-first traversal of the tree at root (where the current node is n), together with a delta indicating how far the traversal has descended or ascended the tree (descending being positive). When there are no more nodes it returns nil. If nil is supplied for root, the root is assumed to be the first node encountered with Parent == nil.
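
One way to implement such a traversal is sketched below with a hypothetical stand-in node type. The names elt and next are illustrative. Returning (nil, 0) at the end, and climbing up from n to find the root when root is nil, are assumptions about details the documentation leaves open. Prev would mirror this logic and is not shown.

```go
package main

import "fmt"

// node is a hypothetical stand-in for html.Node with the links the
// traversal needs.
type node struct {
	Parent, FirstChild, NextSibling *node
	Data                            string
}

// elt builds a node and links the given children under it in order.
func elt(data string, children ...*node) *node {
	n := &node{Data: data}
	var prev *node
	for _, c := range children {
		c.Parent = n
		if prev == nil {
			n.FirstChild = c
		} else {
			prev.NextSibling = c
		}
		prev = c
	}
	return n
}

// next returns the node after n in a pre-order depth-first walk of root,
// plus the change in depth. A nil root means "climb from n to the first
// node with no Parent". At the end it returns (nil, 0).
func next(n, root *node) (*node, int) {
	if root == nil {
		root = n
		for root.Parent != nil {
			root = root.Parent
		}
	}
	if n.FirstChild != nil {
		return n.FirstChild, 1
	}
	delta := 0
	for n != nil && n != root {
		if n.NextSibling != nil {
			return n.NextSibling, delta
		}
		n = n.Parent
		delta--
	}
	return nil, 0
}

func main() {
	root := elt("R", elt("a", elt("b")), elt("c"))
	depth := 0
	for n := root; n != nil; {
		fmt.Printf("%*s%s\n", 2*depth, "", n.Data)
		var d int
		n, d = next(n, root)
		depth += d
	}
}
```

Accumulating the returned delta is enough to recover each node's depth, so the loop in main prints R, a, b and c with indentation matching their nesting, without any recursion.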

func NextSibElt

func NextSibElt(n *html.Node) *html.Node

NextSibElt returns the next sibling of node n with type html.ElementNode (or nil if no such sibling).

func Prev

func Prev(n *html.Node, root *html.Node) (*html.Node, int)

Prev behaves like Next, but returns the previous node instead.

func PrevSibElt

func PrevSibElt(n *html.Node) *html.Node

PrevSibElt behaves like NextSibElt, but returns the previous sibling html.ElementNode instead.
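
Both functions amount to skipping over non-element siblings, such as the whitespace text nodes that often sit between table cells. A sketch with hypothetical stand-in types follows. nextSibElt, prevSibElt and link are illustrative names, not the package's code.

```go
package main

import "fmt"

// nodeType and node are hypothetical stand-ins for html.NodeType/html.Node.
type nodeType int

const (
	textNode nodeType = iota
	elementNode
)

type node struct {
	PrevSibling, NextSibling *node
	Type                     nodeType
	Data                     string
}

// nextSibElt returns the first following sibling that is an element, or nil.
func nextSibElt(n *node) *node {
	for s := n.NextSibling; s != nil; s = s.NextSibling {
		if s.Type == elementNode {
			return s
		}
	}
	return nil
}

// prevSibElt returns the first preceding sibling that is an element, or nil.
func prevSibElt(n *node) *node {
	for s := n.PrevSibling; s != nil; s = s.PrevSibling {
		if s.Type == elementNode {
			return s
		}
	}
	return nil
}

// link joins nodes into a doubly linked sibling list.
func link(ns ...*node) {
	for i := 1; i < len(ns); i++ {
		ns[i-1].NextSibling = ns[i]
		ns[i].PrevSibling = ns[i-1]
	}
}

func main() {
	td1 := &node{Type: elementNode, Data: "td"}
	ws := &node{Type: textNode, Data: "\n  "}
	td2 := &node{Type: elementNode, Data: "td"}
	link(td1, ws, td2)
	fmt.Println(nextSibElt(td1) == td2) // true
	fmt.Println(prevSibElt(td2) == td1) // true
	fmt.Println(nextSibElt(td2) == nil) // true
}
```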

func Print

func Print(root *html.Node) error

Print calls PrintTree, using os.Stdout as the io.Writer and with colour set to true.

func PrintTree

func PrintTree(w io.Writer, root *html.Node, colour bool) error

PrintTree prints the tree at root to the supplied io.Writer, using String to print each node and indentation to convey the document structure. Like String, it can optionally colourize the output. It skips whitespace-only nodes of type html.TextNode.

PrintTree returns any error it gets when calling fmt.Fprintf.

func String

func String(n *html.Node, colour bool) string

String returns a human-readable representation of the single node n, with optional terminal colouring using ANSI escape codes. The representation begins with a capital letter indicating the html.NodeType, one of: X (ErrorNode), T (TextNode), R (DocumentNode), E (ElementNode), C (CommentNode), D (DoctypeNode).
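
The type letters can be sketched as a simple switch. The describe function is a hypothetical approximation of the layout seen in the sample tree in the overview (letter, Data, then key="val" pairs). It omits colour and the truncation marked by ... in that sample, and it is not the package's implementation. The types and the "?" fallback for unknown types are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeType mirrors the order of the html.NodeType constants it names.
type nodeType int

const (
	errorNode nodeType = iota
	textNode
	documentNode
	elementNode
	commentNode
	doctypeNode
)

type attribute struct{ Key, Val string }

type node struct {
	Type nodeType
	Data string
	Attr []attribute
}

// typeLetter maps a node type to the letter String uses for it.
func typeLetter(t nodeType) string {
	switch t {
	case errorNode:
		return "X"
	case textNode:
		return "T"
	case documentNode:
		return "R"
	case elementNode:
		return "E"
	case commentNode:
		return "C"
	case doctypeNode:
		return "D"
	default:
		return "?" // hypothetical fallback for types not listed in the docs
	}
}

// describe renders the letter, then Data (if any), then key="val" pairs.
func describe(n *node) string {
	parts := []string{typeLetter(n.Type)}
	if n.Data != "" {
		parts = append(parts, n.Data)
	}
	for _, a := range n.Attr {
		parts = append(parts, fmt.Sprintf("%s=%q", a.Key, a.Val))
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(describe(&node{Type: elementNode, Data: "a", Attr: []attribute{{Key: "href", Val: "/"}}})) // E a href="/"
	fmt.Println(describe(&node{Type: textNode, Data: "Go"}))                                          // T Go
	fmt.Println(describe(&node{Type: documentNode}))                                                  // R
}
```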

Source Files

htmlnode.go
