unhtml

package module

v0.0.0-...-c4becdc (latest) | Published: Apr 20, 2014 | License: BSD-3-Clause | Imports: 7 | Imported by: 1

Repository

github.com/Wessie/unhtml

README ¶

unhtml

See the documentation for API documentation

See the examples for a few simple examples

Documentation ¶

Overview ¶

unhtml is a package for parsing HTML in the style of marshalling. It takes an approach similar to that of encoding/xml and encoding/json.

Directions to the unmarshaller are given as xpath expressions. unhtml currently uses http://godoc.org/gopkg.in/xmlpath.v1 for its xpath needs; refer to the xmlpath documentation for the supported xpath features.

Index ¶

func Unmarshal(r io.Reader, result interface{}, rootpath string) error
type Decoder
- func NewDecoder(r io.Reader) (*Decoder, error)
- func (d *Decoder) Unmarshal(result interface{}) error
- func (d *Decoder) UnmarshalRelative(path string, res interface{}) error
type InvalidUnmarshalError
- func (e *InvalidUnmarshalError) Error() string
type NoNodesAvailable
- func (e NoNodesAvailable) Error() string
type UnmarshalTypeError
- func (e *UnmarshalTypeError) Error() string
type Unmarshaler

Functions ¶

func Unmarshal ¶

func Unmarshal(r io.Reader, result interface{}, rootpath string) error

Unmarshal parses the HTML read from r and extracts data from it. The results are stored in the value pointed to by result.

rootpath is an xpath that moves the root node before unmarshalling; pass an empty string to leave the root node where it is.

Unmarshal can store values into the following types:

string, []byte, []rune
Any size unsigned integer and signed integer
float32, float64
An Unmarshaler
An encoding.TextUnmarshaler
Structs (Only considers filling fields with an `unhtml` tag)
Slices and arrays containing any of the above types
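
The struct rule above can be illustrated without the package itself. The sketch below uses only the standard library's reflect package to list which fields of a hypothetical Commit struct carry an `unhtml` tag, on the assumption that the tag value is an xpath evaluated relative to the current node. The struct, its field names, and the xpath strings are invented for illustration, and taggedFields is not part of unhtml; it only mimics the "untagged fields are skipped" behaviour described above.

```go
package main

import (
	"fmt"
	"reflect"
)

// Commit is a hypothetical result type. The xpath strings in the tags are
// illustrative only and are not taken from the unhtml documentation.
type Commit struct {
	Title  string `unhtml:"p/a"`
	Author string `unhtml:"div/span"`
	Seen   bool   // no unhtml tag, so it would not be considered
}

// taggedFields returns "Field=xpath" for every field of the struct value v
// that carries an unhtml tag. It is a stand-in, not unhtml's implementation,
// and it expects a struct value rather than a pointer.
func taggedFields(v interface{}) []string {
	t := reflect.TypeOf(v)
	var out []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if path, ok := f.Tag.Lookup("unhtml"); ok {
			out = append(out, f.Name+"="+path)
		}
	}
	return out
}

func main() {
	for _, s := range taggedFields(Commit{}) {
		fmt.Println(s)
	}
}
```

With the real package, a pointer to such a struct would presumably be passed as the result argument of Unmarshal, with an empty rootpath if the tags are written relative to the document root.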

Types ¶

type Decoder ¶

type Decoder struct {
	// contains filtered or unexported fields
}

func NewDecoder ¶

func NewDecoder(r io.Reader) (*Decoder, error)

NewDecoder returns a new Decoder by using the contents of the io.Reader as HTML input. The io.Reader is consumed whole and contents parsed before this function returns.

An error return means something went wrong parsing the HTML.

func (*Decoder) Unmarshal ¶

func (d *Decoder) Unmarshal(result interface{}) error

Unmarshal tries to fill the value given with the input previously given to the Decoder.

Unmarshal only accepts a struct as the result type; use UnmarshalRelative for other types.

func (*Decoder) UnmarshalRelative ¶

func (d *Decoder) UnmarshalRelative(path string, res interface{}) error

UnmarshalRelative unmarshals starting from the node identified by the given path. This allows you to move the root node before unmarshalling.

UnmarshalRelative can return the following errors:

- any unhtml error
- errors from compiling the xmlpath path
- errors returned by an encoding.TextUnmarshaler
- errors returned by an unhtml.Unmarshaler

type InvalidUnmarshalError ¶

type InvalidUnmarshalError struct {
	Type reflect.Type
}

InvalidUnmarshalError is the error returned if invalid input was given.

func (*InvalidUnmarshalError) Error ¶

func (e *InvalidUnmarshalError) Error() string

type NoNodesAvailable ¶

type NoNodesAvailable string

NoNodesAvailable is returned when the xpath passed to a *Relative function does not match any nodes.

func (NoNodesAvailable) Error ¶

func (e NoNodesAvailable) Error() string

type UnmarshalTypeError ¶

type UnmarshalTypeError struct {
	Value string
	Type  reflect.Type
}

UnmarshalTypeError is the error returned if there was an issue with type compatibility.

func (*UnmarshalTypeError) Error ¶

func (e *UnmarshalTypeError) Error() string

type Unmarshaler ¶

type Unmarshaler interface {
	UnmarshalHTML([]byte) error
}

Unmarshaler is an interface that a type can implement to unmarshal the raw contents of a node itself.

UnmarshalHTML receives a []byte containing all text nodes found in the current node, concatenated together.
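
As a minimal sketch of implementing this interface, the hypothetical Count type below parses node text such as " 1,234" into an integer. The interface is redeclared locally, copied from the declaration above, so that the example compiles on its own; real code would rely on unhtml.Unmarshaler instead. The type name and the comma-stripping rule are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Unmarshaler is a local copy of the documented interface so the sketch
// compiles without importing unhtml.
type Unmarshaler interface {
	UnmarshalHTML([]byte) error
}

// Count is a hypothetical type for numbers rendered with thousands
// separators, for example "1,234".
type Count int

// UnmarshalHTML trims surrounding whitespace, removes commas, and parses
// what remains as a decimal integer.
func (c *Count) UnmarshalHTML(text []byte) error {
	s := strings.ReplaceAll(strings.TrimSpace(string(text)), ",", "")
	n, err := strconv.Atoi(s)
	if err != nil {
		return fmt.Errorf("parse count %q: %w", s, err)
	}
	*c = Count(n)
	return nil
}

// Compile-time check that *Count satisfies the interface.
var _ Unmarshaler = (*Count)(nil)

func main() {
	var c Count
	if err := c.UnmarshalHTML([]byte(" 1,234\n")); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c)
}
```

Because the method has a pointer receiver, it is the pointer type *Count that satisfies the interface, so a struct field of this kind would need to be addressable when it is filled.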

Source Files ¶

decoder.go

Directories ¶

Path	Synopsis
examples	A small example of how to use unhtml to parse the GitHub commits page. This is for example purposes only; use the GitHub API for actual programmatic access to GitHub.