html

package
v0.27.0
Warning

This package is not in the latest version of its module.

Published: Jun 5, 2021 | License: BSD-3-Clause | Imports: 3 | Imported by: 1

Documentation

Overview

Package html extends golang.org/x/net/html by providing simplified methods on Node.

The x/net/html package currently provides only bare functionality for iterating the tree: there is no check for empty nodes, and no function to get an attribute by name without looping over the attributes manually.

This package extends the parent package by adding methods to get a node's attribute by name, get the first non-empty child, get the next non-empty sibling, and iterate the tree.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Node

type Node struct {
	*html.Node
}

Node extends html.Node.

func NewNode

func NewNode(el *html.Node) *Node

NewNode creates a new Node by embedding the html.Node "el".

func (*Node) GetAttrValue

func (node *Node) GetAttrValue(key string) string

GetAttrValue returns the value of the node's attribute with the given key, or an empty string if the key is not found.
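The lookup described above can be sketched without the real packages. This is a minimal illustration, not the package's implementation: the `attribute` type is a stand-in for html.Attribute from golang.org/x/net/html, and `attrValue` is a hypothetical helper that shows the manual loop GetAttrValue is meant to replace.

```go
package main

import "fmt"

// attribute is a stand-in for html.Attribute from golang.org/x/net/html,
// reduced to the two fields this sketch needs.
type attribute struct {
	Key string
	Val string
}

// attrValue is a hypothetical helper, not the package's code. It returns
// the value of the first attribute whose key matches, or "" if none does.
func attrValue(attrs []attribute, key string) string {
	for _, attr := range attrs {
		if attr.Key == key {
			return attr.Val
		}
	}
	return ""
}

func main() {
	attrs := []attribute{
		{Key: "href", Val: "/index.html"},
		{Key: "class", Val: "nav"},
	}

	fmt.Printf("%q\n", attrValue(attrs, "class"))
	fmt.Printf("%q\n", attrValue(attrs, "id"))
}
```

Because the result is a plain string, a caller of a function with this signature cannot tell a missing attribute apart from one that is present with an empty value.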

func (*Node) GetFirstChild

func (node *Node) GetFirstChild() *Node

GetFirstChild returns the first non-empty child of the node, or nil if there is none.

func (*Node) GetNextSibling

func (node *Node) GetNextSibling() *Node

GetNextSibling returns the next non-empty sibling of the node, or nil if no sibling is left.
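The documentation does not define "non-empty". The sketch below assumes it means "not a text node that contains only whitespace", which is consistent with the example output further down, where the indentation between tags never appears. The `node` type and the helpers `skipEmpty`, `firstChild`, `nextSibling`, and `sampleList` are hypothetical stand-ins, not the package's code.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a stand-in for html.Node, keeping only the fields used here.
// The real html.Node uses a Type field rather than a boolean.
type node struct {
	IsText      bool
	Data        string
	FirstChild  *node
	NextSibling *node
}

// skipEmpty walks the sibling chain starting at n until it reaches a node
// that is not a whitespace-only text node. It returns nil when the chain
// is exhausted.
func skipEmpty(n *node) *node {
	for n != nil && n.IsText && strings.TrimSpace(n.Data) == "" {
		n = n.NextSibling
	}
	return n
}

// firstChild and nextSibling apply skipEmpty to the raw tree pointers.
func firstChild(n *node) *node  { return skipEmpty(n.FirstChild) }
func nextSibling(n *node) *node { return skipEmpty(n.NextSibling) }

// sampleList models an <li> holding <b> and <span>, with whitespace text
// nodes before each of them, as an HTML parser would produce for
// indented markup.
func sampleList() (li, b, span *node) {
	span = &node{Data: "span"}
	gap2 := &node{IsText: true, Data: "\n\t", NextSibling: span}
	b = &node{Data: "b", NextSibling: gap2}
	gap1 := &node{IsText: true, Data: "\n\t", NextSibling: b}
	li = &node{Data: "li", FirstChild: gap1}
	return li, b, span
}

func main() {
	li, b, span := sampleList()

	fmt.Println(firstChild(li).Data)
	fmt.Println(nextSibling(b).Data)
	fmt.Println(nextSibling(span) == nil)
}
```

Under that assumption, the whitespace nodes are skipped transparently, and the nil result after the last sibling is what lets a caller stop a loop.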

func (*Node) IsElement

func (node *Node) IsElement() bool

IsElement returns true if the node type is html.ElementNode.

type NodeIterator

type NodeIterator struct {
	// contains filtered or unexported fields
}

NodeIterator simplifies iterating over each node from top to bottom.

func Parse

func Parse(r io.Reader) (iter *NodeIterator, err error)

Parse returns a NodeIterator for iterating through the HTML tree.

Example
rawHTML := `
<ul>
	<li>
		<b>item</b>
		<span>one</span>
	</li>
</ul>
`

r := strings.NewReader(rawHTML)

iter, err := Parse(r)
if err != nil {
	log.Fatal(err)
}

for node := iter.Next(); node != nil; node = iter.Next() {
	if node.IsElement() {
		fmt.Printf("%s\n", node.Data)
	} else {
		fmt.Printf("\t%s\n", node.Data)
	}
}
Output:

html
head
body
ul
li
b
	item
b
span
	one
span
li
ul
body
html

func (*NodeIterator) Next

func (iter *NodeIterator) Next() *Node

Next returns the first child or the next sibling of the current node. If there are no more nodes in the tree, it returns nil.
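The output of the Parse example lists every element that has children twice (for instance b, item, b), while head, which has no children, appears once. That suggests an element is returned again after its subtree has been visited. The sketch below reproduces that order under this assumption, using recursion instead of an iterator. The `node` type, `visitOrder`, and `sampleTree` are hypothetical and are not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a stand-in for html.Node, keeping only the fields used here.
type node struct {
	Data        string
	FirstChild  *node
	NextSibling *node
}

// visitOrder returns the Data of each visited node in order: the node
// itself, then every child subtree, then the node again if it had at
// least one child.
func visitOrder(n *node) []string {
	order := []string{n.Data}
	if n.FirstChild == nil {
		return order
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		order = append(order, visitOrder(c)...)
	}
	return append(order, n.Data)
}

// sampleTree models <ul><li><b>item</b><span>one</span></li></ul>
// with whitespace text nodes already left out.
func sampleTree() *node {
	span := &node{Data: "span", FirstChild: &node{Data: "one"}}
	b := &node{Data: "b", FirstChild: &node{Data: "item"}, NextSibling: span}
	li := &node{Data: "li", FirstChild: b}
	return &node{Data: "ul", FirstChild: li}
}

func main() {
	// Prints: ul li b item b span one span li ul
	fmt.Println(strings.Join(visitOrder(sampleTree()), " "))
}
```

The second visit of an element behaves like a closing tag, so a caller that needs to distinguish entering from leaving an element has to track that itself.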

func (*NodeIterator) SetNext

func (iter *NodeIterator) SetNext(el *Node)

SetNext sets the node for iteration to Node "el" only if it is not nil.

Example
rawHTML := `
<ul>
	<li>
		<b>item</b>
		<span>one</span>
	</li>
</ul>
<h2>Jump here</h2>
`

r := strings.NewReader(rawHTML)

iter, err := Parse(r)
if err != nil {
	log.Fatal(err)
}

for node := iter.Next(); node != nil; node = iter.Next() {
	if node.IsElement() {
		if node.Data == "ul" {
			// Skip iterating the "ul" element.
			iter.SetNext(node.GetNextSibling())
			continue
		}
		fmt.Printf("%s\n", node.Data)
	} else {
		fmt.Printf("\t%s\n", node.Data)
	}
}
Output:

html
head
body
h2
	Jump here
h2
body
html

func (*NodeIterator) SetNextNode

func (iter *NodeIterator) SetNextNode(el *html.Node)

SetNextNode sets the next iteration node to html.Node "el" only if it is not nil.
