Documentation
Overview
Package restify provides interrelated functions for retrieving HTML content, selecting a subset of its HTML nodes, and converting those nodes to a JSON representation.
The main function in the cmd package serves as an example of how to use these functions.
Index
- Constants
- func ConvertHtmlToJson(nodes []*html.Node) ([]byte, error)
- func FindSubsetByAttributeName(root *html.Node, attribute string) []*html.Node
- func FindSubsetByAttributeNameValue(root *html.Node, attribute string, value string) []*html.Node
- func FindSubsetByClass(root *html.Node, className string) []*html.Node
- func FindSubsetById(root *html.Node, id string) (n *html.Node, ok bool)
- func FindSubsetByTagName(root *html.Node, tagName string) []*html.Node
- func LoadBuffer(buffer []byte) (*html.Node, error)
- func LoadContent(url *url.URL, userAgent string, configs ...RequestConfig) (*html.Node, error)
- func LoadFile(url *url.URL, userAgent string, configs ...RequestConfig) (*html.Node, error)
- func LoadReader(reader *io.Reader) (*html.Node, error)
- type JsonNode
- type RequestConfig
  - func WithHeaders(headers map[string]string) RequestConfig
Constants
const HttpRequestTimeout = time.Second * 60
Functions
func ConvertHtmlToJson
func ConvertHtmlToJson(nodes []*html.Node) ([]byte, error)
ConvertHtmlToJson converts the given HTML nodes into JSON content in which each HTML node is represented by the JsonNode structure.
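A minimal standard-library sketch of the conversion ConvertHtmlToJson describes. It assumes a hypothetical, simplified node type in place of html.Node (presumably from golang.org/x/net/html), a copy of the documented JsonNode struct, and a hypothetical convert function. It also assumes the output is a JSON array and that Text holds only an element's direct text children; restify's actual traversal and text handling may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// node is a hypothetical, simplified stand-in for html.Node (not restify's type).
type node struct {
	Tag      string            // element name; empty for a text node
	Text     string            // text content when Tag is empty
	Attrs    map[string]string // element attributes
	Children []*node
}

// JsonNode copies the documented restify.JsonNode structure.
type JsonNode struct {
	Name       string            `json:"name,omitempty"`
	Attributes map[string]string `json:"attributes,omitempty"`
	Class      string            `json:"class,omitempty"`
	Id         string            `json:"id,omitempty"`
	Href       string            `json:"href,omitempty"`
	Text       string            `json:"text,omitempty"`
	Elements   []JsonNode        `json:"elements,omitempty"`
}

// toJsonNode applies the documented field mapping: id, class and href get their
// own fields and every other attribute goes into Attributes. Assumption: direct
// text children are joined into Text and element children become Elements.
func toJsonNode(n *node) JsonNode {
	jn := JsonNode{Name: n.Tag}
	for k, v := range n.Attrs {
		switch k {
		case "id":
			jn.Id = v
		case "class":
			jn.Class = v
		case "href":
			jn.Href = v
		default:
			if jn.Attributes == nil {
				jn.Attributes = map[string]string{}
			}
			jn.Attributes[k] = v
		}
	}
	var text []string
	for _, c := range n.Children {
		if c.Tag == "" {
			if t := strings.TrimSpace(c.Text); t != "" {
				text = append(text, t)
			}
			continue
		}
		jn.Elements = append(jn.Elements, toJsonNode(c))
	}
	jn.Text = strings.Join(text, " ")
	return jn
}

// convert is a hypothetical analogue of ConvertHtmlToJson that emits a JSON array.
func convert(nodes []*node) ([]byte, error) {
	out := make([]JsonNode, 0, len(nodes))
	for _, n := range nodes {
		out = append(out, toJsonNode(n))
	}
	return json.Marshal(out)
}

func main() {
	link := &node{
		Tag:      "a",
		Attrs:    map[string]string{"href": "/docs", "class": "nav", "target": "_blank"},
		Children: []*node{{Text: "Docs"}},
	}
	b, err := convert([]*node{link})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// [{"name":"a","attributes":{"target":"_blank"},"class":"nav","href":"/docs","text":"Docs"}]
}
```

Because every JsonNode field uses omitempty, empty fields such as Id above are left out of the JSON entirely.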
func FindSubsetByAttributeName
func FindSubsetByAttributeName(root *html.Node, attribute string) []*html.Node
FindSubsetByAttributeName retrieves the HTML nodes that have the requested attribute, regardless of their values.
func FindSubsetByAttributeNameValue
func FindSubsetByAttributeNameValue(root *html.Node, attribute string, value string) []*html.Node
FindSubsetByAttributeNameValue retrieves the HTML nodes that have the requested attribute with a specific value.
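The attribute-based finders can be pictured as a depth-first walk that collects every node matching a predicate. This sketch assumes a hypothetical, simplified node type (attributes kept in a map rather than html.Node's Attr slice) and hypothetical helpers collect, byAttributeName, and byAttributeNameValue. Whether restify includes root itself in the results is not documented.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for html.Node (not restify's type).
type node struct {
	Tag      string
	Attrs    map[string]string
	Children []*node
}

// collect walks the tree rooted at root in depth-first pre-order and returns
// every node for which match reports true, root included.
func collect(root *node, match func(*node) bool) []*node {
	var out []*node
	var walk func(n *node)
	walk = func(n *node) {
		if match(n) {
			out = append(out, n)
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
	return out
}

// byAttributeName keeps nodes that carry the attribute, whatever its value.
func byAttributeName(root *node, attribute string) []*node {
	return collect(root, func(n *node) bool {
		_, ok := n.Attrs[attribute] // reading a nil map is safe and yields ok == false
		return ok
	})
}

// byAttributeNameValue keeps nodes whose attribute equals value exactly.
func byAttributeNameValue(root *node, attribute, value string) []*node {
	return collect(root, func(n *node) bool {
		v, ok := n.Attrs[attribute]
		return ok && v == value
	})
}

func main() {
	root := &node{Tag: "form", Children: []*node{
		{Tag: "input", Attrs: map[string]string{"type": "text", "required": ""}},
		{Tag: "input", Attrs: map[string]string{"type": "checkbox"}},
		{Tag: "input", Attrs: map[string]string{"type": "text"}},
	}}
	fmt.Println(len(byAttributeName(root, "required")))          // 1
	fmt.Println(len(byAttributeNameValue(root, "type", "text"))) // 2
}
```

The empty-valued required attribute still matches byAttributeName, which is the "regardless of their values" behavior described above.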
func FindSubsetByClass
func FindSubsetByClass(root *html.Node, className string) []*html.Node
FindSubsetByClass locates the HTML nodes within the given root that have the given className.
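The documentation does not say whether FindSubsetByClass compares className with the whole class attribute or with individual class names. Because an HTML class attribute holds a whitespace-separated list, a token match such as the hypothetical hasClass helper below is one plausible approach; treat it as an assumption, not restify's confirmed behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// hasClass reports whether className is one of the whitespace-separated tokens
// in an HTML class attribute value. Token matching here is an assumption.
func hasClass(classAttr, className string) bool {
	for _, token := range strings.Fields(classAttr) {
		if token == className {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasClass("btn btn-primary", "btn"))            // true
	fmt.Println(hasClass("btn btn-primary", "primary"))        // false
	fmt.Println(hasClass("btn  btn-primary\tlarge", "large")) // true
}
```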
func FindSubsetById
func FindSubsetById(root *html.Node, id string) (n *html.Node, ok bool)
FindSubsetById locates the HTML node within the given root whose id attribute has the given value. If no such node is found, ok is false.
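FindSubsetById returns a single node plus an ok flag, which suggests a search that stops at the first match. A sketch of that shape, assuming a hypothetical, simplified node type and a hypothetical findByID function; the traversal order restify actually uses is not documented.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for html.Node (not restify's type).
type node struct {
	Tag      string
	Attrs    map[string]string
	Children []*node
}

// findByID returns the first node, in depth-first pre-order, whose id attribute
// equals id. ok is false when no node matches. The search stops at the first hit.
func findByID(root *node, id string) (n *node, ok bool) {
	if root == nil {
		return nil, false
	}
	if v, has := root.Attrs["id"]; has && v == id {
		return root, true
	}
	for _, c := range root.Children {
		if found, hit := findByID(c, id); hit {
			return found, true
		}
	}
	return nil, false
}

func main() {
	root := &node{Tag: "body", Children: []*node{
		{Tag: "div", Attrs: map[string]string{"id": "header"}},
		{Tag: "div", Children: []*node{
			{Tag: "p", Attrs: map[string]string{"id": "intro"}},
		}},
	}}
	if n, ok := findByID(root, "intro"); ok {
		fmt.Println(n.Tag) // p
	}
	if _, ok := findByID(root, "missing"); !ok {
		fmt.Println("not found")
	}
}
```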
func FindSubsetByTagName
func FindSubsetByTagName(root *html.Node, tagName string) []*html.Node
FindSubsetByTagName retrieves the HTML nodes with the given tagName.
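func LoadBuffer
func LoadBuffer(buffer []byte) (*html.Node, error)
LoadBuffer parses the HTML content in the given buffer into a tree of HTML nodes.
func LoadContent
func LoadContent(url *url.URL, userAgent string, configs ...RequestConfig) (*html.Node, error)
LoadContent retrieves the HTML content from the given url. The userAgent is optional, but if provided, it should conform to the User-Agent header syntax described at https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent
func LoadFile
func LoadFile(url *url.URL, userAgent string, configs ...RequestConfig) (*html.Node, error)
LoadFile retrieves the HTML content from the given file URL.
func LoadReader
func LoadReader(reader *io.Reader) (*html.Node, error)
LoadReader parses the HTML content read from the given reader into a tree of HTML nodes.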
Types
type JsonNode
JsonNode is a JSON-ready representation of an HTML node.
type JsonNode struct {
	// Name is the name/tag of the element
	Name string `json:"name,omitempty"`
	// Attributes contains the attributes of the element other than id, class, and href
	Attributes map[string]string `json:"attributes,omitempty"`
	// Class contains the class attribute of the element
	Class string `json:"class,omitempty"`
	// Id contains the id attribute of the element
	Id string `json:"id,omitempty"`
	// Href contains the href attribute of the element
	Href string `json:"href,omitempty"`
	// Text contains the inner text of the element
	Text string `json:"text,omitempty"`
	// Elements contains the child elements of the element
	Elements []JsonNode `json:"elements,omitempty"`
}
type RequestConfig
func WithHeaders
func WithHeaders(headers map[string]string) RequestConfig
WithHeaders configures additional headers to be sent with the request made by LoadContent.
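RequestConfig's definition is not shown here, but a WithHeaders constructor returning a RequestConfig that LoadContent accepts variadically matches the functional-options pattern. The sketch below assumes, hypothetically, that a config is a function applied to the outgoing *http.Request; requestConfig, withHeaders, and newRequest are illustrative names, not restify's API.

```go
package main

import (
	"fmt"
	"net/http"
)

// requestConfig is a hypothetical model of restify's RequestConfig: a function
// that adjusts the outgoing request before it is sent.
type requestConfig func(req *http.Request)

// withHeaders returns a requestConfig that sets each given header on the request.
func withHeaders(headers map[string]string) requestConfig {
	return func(req *http.Request) {
		for k, v := range headers {
			req.Header.Set(k, v)
		}
	}
}

// newRequest is a hypothetical helper showing how a loader could apply the
// variadic configs, in order, after building its request.
func newRequest(rawURL string, configs ...requestConfig) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	for _, cfg := range configs {
		cfg(req)
	}
	return req, nil
}

func main() {
	req, err := newRequest("https://example.com/",
		withHeaders(map[string]string{"Accept-Language": "en-US", "X-Trace": "abc"}))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Accept-Language")) // en-US
	fmt.Println(req.Header.Get("X-Trace"))         // abc
}
```

Because configs run in order and use Header.Set, a later config overrides an earlier one for the same header name in this sketch; restify may use Header.Add or a different precedence.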