goDOM

package module
v0.0.2
Published: May 5, 2024 License: BSD-2-Clause Imports: 6 Imported by: 0

README

goDOM

goDOM parses HTML documents and makes data extraction easy.

What does goDOM do?

goDOM provides methods for working with HTML documents, similar to the JavaScript Document interface.

After parsing a document many of the well known DOM methods can be used to find and filter nodes inside the document.

How do I use goDOM?

Install
go get -u github.com/...
Example 1

Get all the URLs from an HTML document.

package main

import (
	"fmt"
	"net/http"

	"github.com/richi0/goDOM"
)

func main() {
	// fetch data from a website
	url := "https://en.wikipedia.org/wiki/Go_(programming_language)"
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// parse website content
	dom, err := goDOM.New(resp.Body)
	if err != nil {
		panic(err)
	}
	links := dom.GetElementsByTagName("a")

	// print urls
	for _, link := range links {
		fmt.Println(link.Attributes()["href"])
	}

	// Print output:
	// /wiki/Main_Page
	// /wiki/Wikipedia:Contents
	// /wiki/Portal:Current_events
	// /wiki/Special:Random
	// /wiki/Wikipedia:About
	// ...
}
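
The hrefs printed above are relative to the site root. A minimal sketch of turning such a value into an absolute URL with the standard library's net/url package follows; resolveHref is a hypothetical helper for illustration and not part of goDOM.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveHref resolves href (possibly relative) against the page URL base.
// Hypothetical helper, not part of goDOM.
func resolveHref(base, href string) (string, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return baseURL.ResolveReference(ref).String(), nil
}

func main() {
	page := "https://en.wikipedia.org/wiki/Go_(programming_language)"
	abs, err := resolveHref(page, "/wiki/Main_Page")
	if err != nil {
		panic(err)
	}
	fmt.Println(abs) // https://en.wikipedia.org/wiki/Main_Page
}
```

In the loop from Example 1, the second argument would be link.Attributes()["href"].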

Example 2

Print the title of an HTML document.

package main

import (
	"fmt"
	"net/http"

	"github.com/richi0/goDOM"
)

func main() {
	// fetch data from a website
	url := "https://en.wikipedia.org/wiki/Go_(programming_language)"
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// parse website content
	dom, err := goDOM.New(resp.Body)
	if err != nil {
		panic(err)
	}
	titleElement := dom.GetElementsByTagName("title")[0]

	// print title
	fmt.Println(titleElement.Text(false))

	// Print output:
	// Go (programming language) - Wikipedia
}
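
Indexing the result with [0] panics if the document has no matching element, because GetElementsByTagName returns a slice. One way to guard against that is sketched below; first is a hypothetical generic helper (Go 1.18 or later), not part of goDOM, and the slice in main is stand-in data.

```go
package main

import "fmt"

// first returns the first item of a slice and true, or the zero value and
// false when the slice is empty. Hypothetical helper, not part of goDOM.
func first[T any](items []T) (T, bool) {
	var zero T
	if len(items) == 0 {
		return zero, false
	}
	return items[0], true
}

func main() {
	// Stand-in data; with goDOM the slice would come from
	// dom.GetElementsByTagName("title").
	titles := []string{}
	if _, ok := first(titles); !ok {
		fmt.Println("no title element found")
	}
}
```

A plain len(...) == 0 check before indexing works just as well when no helper is wanted.
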
Example 3

Print the text of a specific node in an HTML document.

package main

import (
	"fmt"
	"net/http"

	"github.com/richi0/goDOM"
)

func main() {
	// fetch data from a website
	url := "https://en.wikipedia.org/wiki/Go_(programming_language)"
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// parse website content
	dom, err := goDOM.New(resp.Body)
	if err != nil {
		panic(err)
	}
	historyElement := dom.GetElementById("History")
	paragraph := historyElement.Parent().NextElementSibling()

	// print paragraph
	fmt.Println(paragraph.Text(true))

	// Print output:
	// Go was designed at Google in 2007 to improve programming productivity in...
}

Documentation

Find the full documentation of the package here: https://pkg.go.dev/github.com/richi0/goDOM

Documentation

Overview

Package goDOM provides methods similar to the JavaScript Document interface.

It can be used to extract data from HTML documents. See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Document

Types

type DOM

type DOM struct {
	// contains filtered or unexported fields
}

A DOM represents a parsed HTML document. It implements methods to extract data from the document.

func New

func New(r io.Reader) (*DOM, error)

New returns the parsed tree for the HTML from the given Reader as a DOM object.

func (*DOM) Attributes

func (d *DOM) Attributes() map[string]string

Attributes returns a map of all attributes set on the current node, keyed by attribute name.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/attributes
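
Because Attributes returns a plain map[string]string, indexing a missing key yields the empty string, which cannot be told apart from an attribute that is present but empty. A small sketch using Go's comma-ok lookup is shown below; attrOr is a hypothetical helper, and the map literal stands in for a value returned by Attributes.

```go
package main

import "fmt"

// attrOr looks up key in an attribute map of the shape returned by
// Attributes and falls back to def when the attribute is absent.
// Hypothetical helper, not part of goDOM.
func attrOr(attrs map[string]string, key, def string) string {
	if v, ok := attrs[key]; ok {
		return v
	}
	return def
}

func main() {
	// Stand-in for the map an <a href="/wiki/Main_Page"> element might yield.
	attrs := map[string]string{"href": "/wiki/Main_Page"}
	fmt.Println(attrOr(attrs, "href", "(none)"))  // /wiki/Main_Page
	fmt.Println(attrOr(attrs, "title", "(none)")) // (none)
}
```

When only presence matters, HasAttribute answers the same question without building the map lookup by hand.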

func (*DOM) ChildElementCount

func (d *DOM) ChildElementCount() int

ChildElementCount returns the number of child elements of this element.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/childElementCount

func (*DOM) Children

func (d *DOM) Children() []*DOM

Children returns a slice which contains all of the child elements of the element upon which it was called.

The Children slice includes only element nodes. Other node types like text or comment are ignored.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/children

func (*DOM) ClassList

func (d *DOM) ClassList() []string

ClassList returns a slice containing all the classes of the current element.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/classList

func (*DOM) ClassName

func (d *DOM) ClassName() string

ClassName returns a string representing the class or space-separated classes of the current element.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/className
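
ClassName yields one space-separated string and ClassList yields a slice. The sketch below shows how the two forms relate, assuming the browser DOM convention of splitting on whitespace (goDOM's exact splitting is not stated here), and how membership can be checked on the slice. splitClasses and hasClass are hypothetical helpers, not part of goDOM.

```go
package main

import (
	"fmt"
	"strings"
)

// splitClasses splits a class attribute value on runs of whitespace.
// Hypothetical helper, not part of goDOM.
func splitClasses(className string) []string {
	return strings.Fields(className)
}

// hasClass reports whether want appears in classes, for example in a
// slice of the shape returned by ClassList.
func hasClass(classes []string, want string) bool {
	for _, c := range classes {
		if c == want {
			return true
		}
	}
	return false
}

func main() {
	classes := splitClasses("nav  nav-item active")
	fmt.Println(len(classes))                 // 3
	fmt.Println(hasClass(classes, "active"))  // true
	fmt.Println(hasClass(classes, "nav-ite")) // false
}
```

Comparing whole class names avoids the false positives that a substring search on the ClassName string would give.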

func (*DOM) FirstElementChild

func (d *DOM) FirstElementChild() *DOM

FirstElementChild returns the node's first child element, or nil if there are no child elements.

FirstElementChild includes only element nodes. Other node types like text or comment are ignored.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Document/firstElementChild

func (*DOM) GetElementById

func (d *DOM) GetElementById(id string) *DOM

GetElementById returns a DOM object representing the element whose id property matches the specified string.

Since element IDs are required to be unique if specified, they're a useful way to get access to a specific element quickly.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Document/getElementById

func (*DOM) GetElementsByClassName

func (d *DOM) GetElementsByClassName(class string) []*DOM

GetElementsByClassName returns a slice of elements with the given class name.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/getElementsByClassName

func (*DOM) GetElementsByTagName

func (d *DOM) GetElementsByTagName(tag string) []*DOM

GetElementsByTagName returns a slice of elements with the given tag name.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/getElementsByTagName

func (*DOM) HasAttribute

func (d *DOM) HasAttribute(key string) bool

HasAttribute reports whether the current element has the specified attribute.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/hasAttribute

func (*DOM) HasAttributes

func (d *DOM) HasAttributes() bool

HasAttributes reports whether the current element has any attributes.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/hasAttributes

func (*DOM) Id

func (d *DOM) Id() string

Id returns a string representing the id of the current element.

If the id value is not the empty string, it must be unique in a document.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/id

func (*DOM) LastElementChild

func (d *DOM) LastElementChild() *DOM

LastElementChild returns the node's last child element, or nil if there are no child elements.

LastElementChild includes only element nodes. Other node types like text or comment are ignored.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Document/lastElementChild

func (*DOM) NextElementSibling

func (d *DOM) NextElementSibling() *DOM

NextElementSibling returns the element immediately following the current one in its parent's list of children, or nil if the current element is the last one in the list.

NextElementSibling includes only element nodes. Other node types like text or comment are ignored.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/nextElementSibling

func (*DOM) Parent

func (d *DOM) Parent() *DOM

Parent returns the parent of the specified node in the DOM tree.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Node/parentNode

func (*DOM) PreviousElementSibling

func (d *DOM) PreviousElementSibling() *DOM

PreviousElementSibling returns the element immediately preceding the current one in its parent's list of children, or nil if the current element is the first one in the list.

PreviousElementSibling includes only element nodes. Other node types like text or comment are ignored.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Element/previousElementSibling

func (*DOM) Render

func (d *DOM) Render() (string, error)

Render returns a string representation of the DOM.

func (*DOM) TagName

func (d *DOM) TagName() string

TagName returns a string representation of the node's tag.

func (*DOM) Text

func (d *DOM) Text(full bool) string

Text returns the text content of the node.

If full is set to true, Text returns the text content of the node and its descendants.

See JavaScript equivalent: https://developer.mozilla.org/en-US/docs/Web/API/Node/textContent
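
Text taken from HTML source often carries the newlines and indentation of the markup. Whether Text trims or collapses that whitespace is not stated here, so a caller may want to normalize the result. A minimal sketch using only the standard library follows; collapseSpace is a hypothetical helper, and the raw string stands in for a value returned by Text.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseSpace trims s and replaces each run of whitespace with one space.
// Hypothetical helper, not part of goDOM.
func collapseSpace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	// Stand-in for text extracted from an indented HTML paragraph.
	raw := "\n  Go was designed\n\tat Google  in 2007 \n"
	fmt.Printf("%q\n", collapseSpace(raw)) // "Go was designed at Google in 2007"
}
```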
