domutil

package
v0.0.0-...-25b8d04
Warning

This package is not in the latest version of its module.
Published: Sep 26, 2024 License: MIT Imports: 7 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func CloneAndProcessList

func CloneAndProcessList(outputNodes []*html.Node, pageURL *nurl.URL) *html.Node

CloneAndProcessList clones and processes a list of relevant nodes for output.

func CloneAndProcessTree

func CloneAndProcessTree(root *html.Node, pageURL *nurl.URL) *html.Node

CloneAndProcessTree clones and processes a given node tree or subtree. The original dom-distiller ignores hidden elements at this step. Unfortunately that is not possible here, so hidden elements are included as well. NEED-COMPUTE-CSS.

func Contains

func Contains(node, child *html.Node) bool

Contains checks if child is inside node.
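One way to implement such a containment check is to follow parent pointers upward from the child. The sketch below is illustrative only: the `node` type is a hypothetical stand-in for `html.Node`, and treating a node as not containing itself is an assumption, not something the documentation states.

```go
package main

import "fmt"

// node is a hypothetical, minimal stand-in for html.Node; only the
// Parent pointer matters for this sketch.
type node struct {
	Name   string
	Parent *node
}

// contains reports whether child sits somewhere below n by following
// Parent pointers upward from child. A node is not treated as containing
// itself, which is an assumption of this sketch.
func contains(n, child *node) bool {
	if n == nil || child == nil {
		return false
	}
	for p := child.Parent; p != nil; p = p.Parent {
		if p == n {
			return true
		}
	}
	return false
}

func main() {
	body := &node{Name: "body"}
	div := &node{Name: "div", Parent: body}
	span := &node{Name: "span", Parent: div}

	fmt.Println(contains(body, span)) // true
	fmt.Println(contains(span, body)) // false
}
```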

func GetAllSrcSetURLs

func GetAllSrcSetURLs(root *html.Node) []string

func GetAncestors

func GetAncestors(nodes ...*html.Node) (map[*html.Node]int, *html.Node)

GetAncestors returns all ancestors of `nodes` together with their nearest common ancestor.

func GetArea

func GetArea(node *html.Node) int

GetArea returns the area of a node. The original code computes it by multiplying offsetWidth and offsetHeight. Since that is not possible in Go, this function simply returns 0. NEED-COMPUTE-CSS

func GetDisplayStyle

func GetDisplayStyle(node *html.Node) string

GetDisplayStyle returns the default value of the "display" style property for the specified node.

func GetFirstElementByTagName

func GetFirstElementByTagName(root *html.Node, tagName string) *html.Node

GetFirstElementByTagName returns the first element with `tagName` in the tree rooted at `root`. It returns nil if none is found.

func GetFirstElementByTagNameInc

func GetFirstElementByTagNameInc(root *html.Node, tagName string) *html.Node

GetFirstElementByTagNameInc returns the first element with `tagName` in the tree rooted at `root`, including root itself. It returns nil if none is found.

func GetNearestCommonAncestor

func GetNearestCommonAncestor(nodes ...*html.Node) *html.Node

GetNearestCommonAncestor returns the nearest common ancestor of nodes.
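A common way to find a nearest common ancestor is to count how many of the input nodes pass through each ancestor, then pick the closest ancestor reached by all of them. The sketch below assumes that approach; it is not taken from the package source. The `node` type is a hypothetical stand-in for `html.Node`, and only strict ancestors are considered, so an input node is never returned as its own ancestor.

```go
package main

import "fmt"

// node is a hypothetical, minimal stand-in for html.Node.
type node struct {
	Name   string
	Parent *node
}

// nearestCommonAncestor counts, for every ancestor, how many of the input
// nodes have it on their parent chain. The first ancestor of nodes[0] that
// is shared by all inputs is the nearest common one. Inputs are assumed to
// be non-nil.
func nearestCommonAncestor(nodes ...*node) *node {
	if len(nodes) == 0 {
		return nil
	}
	counts := make(map[*node]int)
	for _, n := range nodes {
		for p := n.Parent; p != nil; p = p.Parent {
			counts[p]++
		}
	}
	for p := nodes[0].Parent; p != nil; p = p.Parent {
		if counts[p] == len(nodes) {
			return p
		}
	}
	return nil
}

func main() {
	body := &node{Name: "body"}
	div := &node{Name: "div", Parent: body}
	p := &node{Name: "p", Parent: div}
	span := &node{Name: "span", Parent: div}
	footer := &node{Name: "footer", Parent: body}

	fmt.Println(nearestCommonAncestor(p, span).Name)   // div
	fmt.Println(nearestCommonAncestor(p, footer).Name) // body
}
```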

func GetNodeDepth

func GetNodeDepth(node *html.Node) int

GetNodeDepth returns the depth of the given node in the DOM tree.

func GetOutputNodes

func GetOutputNodes(root *html.Node) []*html.Node

GetOutputNodes returns the list of relevant nodes for output from a subtree.

func GetParentElement

func GetParentElement(node *html.Node) *html.Node

GetParentElement returns the nearest element parent.

func GetParentNodes

func GetParentNodes(node *html.Node) []*html.Node

GetParentNodes returns a list of all the parents of this node, starting with the node itself.
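The ordering described here (the node first, then each successive parent) can be illustrated with a short loop. This is a sketch under assumptions: `node` is a hypothetical stand-in for `html.Node`, and the function name is illustrative.

```go
package main

import "fmt"

// node is a hypothetical, minimal stand-in for html.Node.
type node struct {
	Name   string
	Parent *node
}

// parentNodes collects n followed by each of its ancestors, walking
// Parent pointers until the root is reached.
func parentNodes(n *node) []*node {
	var out []*node
	for cur := n; cur != nil; cur = cur.Parent {
		out = append(out, cur)
	}
	return out
}

func main() {
	body := &node{Name: "body"}
	div := &node{Name: "div", Parent: body}
	span := &node{Name: "span", Parent: div}

	for _, n := range parentNodes(span) {
		fmt.Println(n.Name) // span, then div, then body
	}
}
```

In this sketch the number of ancestors above a node is `len(parentNodes(n)) - 1`, which is one possible way to derive a depth value.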

func GetSrcSetURLs

func GetSrcSetURLs(node *html.Node) []string

func HasAncestor

func HasAncestor(node *html.Node, ancestorTagNames ...string) bool

HasAncestor checks if node has an ancestor with the specified tag names.

func HasRootDomain

func HasRootDomain(url string, root string) bool

HasRootDomain checks if a provided URL has the specified root domain (e.g. http://a.b.c/foo/bar has a root domain of b.c).
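A root-domain check of this kind can be approximated with a host suffix comparison. The sketch below uses only the standard library and is an assumption about one reasonable approach, not the package's actual implementation. The function name is illustrative, and URLs without a scheme are treated as having no host.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hasRootDomain reports whether the host of rawURL equals root or ends
// with "."+root. The dot prefix prevents "ab.c" from matching root "b.c".
// A URL without a scheme has an empty host after parsing and yields false.
func hasRootDomain(rawURL, root string) bool {
	if rawURL == "" || root == "" {
		return false
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	root = strings.ToLower(root)
	return host == root || strings.HasSuffix(host, "."+root)
}

func main() {
	fmt.Println(hasRootDomain("http://a.b.c/foo/bar", "b.c")) // true
	fmt.Println(hasRootDomain("http://b.c/", "b.c"))          // true
	fmt.Println(hasRootDomain("http://ab.c/", "b.c"))         // false
}
```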

func InnerText

func InnerText(node *html.Node) string

InnerText in JS and GWT is used to capture text from an element while excluding text from hidden children. A child is hidden if its computed width is 0, whether because of its CSS (e.g. `display: none`, `visibility: hidden`, etc.) or because the child has the `hidden` attribute. Since we can't compute stylesheets, we only look at the `hidden` attribute and the inline style.

Besides excluding text from hidden children, this function differs from `dom.TextContent` in that the latter skips <br> tags, while this function preserves <br> as whitespace. NEED-COMPUTE-CSS
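The behaviour described above can be sketched on a simplified tree. Everything here is hypothetical: the `node` type stands in for `html.Node`, the hidden check only looks at the `hidden` attribute and at two inline style declarations, and emitting a newline for <br> is an assumed choice of whitespace.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, simplified stand-in for html.Node.
// Text nodes have an empty Tag and carry their content in Text.
type node struct {
	Tag      string
	Text     string
	Attrs    map[string]string
	Children []*node
}

// isHidden checks only the `hidden` attribute and the inline style,
// since no stylesheet is available to compute the real visibility.
func isHidden(n *node) bool {
	if _, ok := n.Attrs["hidden"]; ok {
		return true
	}
	style := strings.ToLower(strings.ReplaceAll(n.Attrs["style"], " ", ""))
	return strings.Contains(style, "display:none") ||
		strings.Contains(style, "visibility:hidden")
}

// innerText concatenates the text of visible descendants and turns
// each <br> into a newline (an assumption of this sketch).
func innerText(n *node) string {
	var sb strings.Builder
	var walk func(*node)
	walk = func(cur *node) {
		switch {
		case cur.Tag == "":
			sb.WriteString(cur.Text)
		case cur.Tag == "br":
			sb.WriteString("\n")
		case isHidden(cur):
			return
		default:
			for _, c := range cur.Children {
				walk(c)
			}
		}
	}
	walk(n)
	return sb.String()
}

func main() {
	p := &node{Tag: "p", Children: []*node{
		{Text: "Hello"},
		{Tag: "br"},
		{Tag: "span", Attrs: map[string]string{"hidden": ""},
			Children: []*node{{Text: "secret"}}},
		{Text: "world"},
	}}
	fmt.Printf("%q\n", innerText(p)) // "Hello\nworld"
}
```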

func IsProbablyVisible

func IsProbablyVisible(node *html.Node) bool

IsProbablyVisible determines if a node is visible.

func MakeAllLinksAbsolute

func MakeAllLinksAbsolute(root *html.Node, pageURL *nurl.URL)

MakeAllLinksAbsolute makes all anchors and video posters absolute.
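Making a single link absolute typically comes down to resolving it against the page URL. The sketch below shows that step with the standard library's `URL.ResolveReference`; the helper names `toAbsolute` and `mustParse` are illustrative, and the traversal that finds the anchors and posters in the tree is left out.

```go
package main

import (
	"fmt"
	"net/url"
)

// mustParse is a small helper for the example; it panics on bad input.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

// toAbsolute resolves ref against pageURL. A reference that cannot be
// parsed is returned unchanged, which is an assumption of this sketch.
func toAbsolute(ref string, pageURL *url.URL) string {
	parsed, err := url.Parse(ref)
	if err != nil {
		return ref
	}
	return pageURL.ResolveReference(parsed).String()
}

func main() {
	page := mustParse("https://example.com/articles/post.html")

	fmt.Println(toAbsolute("../img/a.png", page))        // https://example.com/img/a.png
	fmt.Println(toAbsolute("/about", page))              // https://example.com/about
	fmt.Println(toAbsolute("https://other.org/x", page)) // https://other.org/x
}
```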

func MakeAllSrcAttributesAbsolute

func MakeAllSrcAttributesAbsolute(root *html.Node, pageURL *nurl.URL)

MakeAllSrcAttributesAbsolute makes all "img", "source", "track", and "video" tags have an absolute "src" attribute.

func MakeAllSrcSetAbsolute

func MakeAllSrcSetAbsolute(root *html.Node, pageURL *nurl.URL)

MakeAllSrcSetAbsolute makes all `srcset` within root absolute.
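A `srcset` value is a comma-separated list of candidates, each made of a URL and an optional descriptor such as `2x` or `480w`. The sketch below resolves every candidate URL against the page URL. It is a simplification and not the package's implementation: splitting on commas breaks for URLs that themselves contain commas, and the names `absoluteSrcSet` and `mustParse` are illustrative.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// mustParse is a small helper for the example; it panics on bad input.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

// absoluteSrcSet rewrites each candidate URL in srcset so that it is
// absolute, keeping descriptors such as "1x" or "480w" untouched.
func absoluteSrcSet(srcset string, pageURL *url.URL) string {
	var out []string
	for _, candidate := range strings.Split(srcset, ",") {
		fields := strings.Fields(candidate)
		if len(fields) == 0 {
			continue // skip empty candidates, e.g. after a trailing comma
		}
		if ref, err := url.Parse(fields[0]); err == nil {
			fields[0] = pageURL.ResolveReference(ref).String()
		}
		out = append(out, strings.Join(fields, " "))
	}
	return strings.Join(out, ", ")
}

func main() {
	page := mustParse("https://example.com/blog/post")
	fmt.Println(absoluteSrcSet("small.jpg 1x, /img/large.jpg 2x", page))
	// https://example.com/blog/small.jpg 1x, https://example.com/img/large.jpg 2x
}
```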

func NodeName

func NodeName(node *html.Node) string

NodeName returns the name of the current node as a string. See https://developer.mozilla.org/en-US/docs/Web/API/Node/nodeName

func SomeNode

func SomeNode(nodeList []*html.Node, fn func(*html.Node) bool) bool

SomeNode iterates over a NodeList and returns true if any call to the provided function returns true, false otherwise.
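The semantics match JavaScript's `Array.prototype.some`. A minimal sketch, assuming a hypothetical `node` type in place of `html.Node` and stopping at the first match:

```go
package main

import "fmt"

// node is a hypothetical, minimal stand-in for html.Node.
type node struct {
	Name string
}

// someNode returns true as soon as fn returns true for one of the
// nodes, and false if no node satisfies fn (including an empty list).
func someNode(list []*node, fn func(*node) bool) bool {
	for _, n := range list {
		if fn(n) {
			return true
		}
	}
	return false
}

func main() {
	list := []*node{{Name: "p"}, {Name: "img"}, {Name: "div"}}
	isImage := func(n *node) bool { return n.Name == "img" }
	isTable := func(n *node) bool { return n.Name == "table" }

	fmt.Println(someNode(list, isImage)) // true
	fmt.Println(someNode(list, isTable)) // false
}
```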

func StripAttributes

func StripAttributes(node *html.Node)

func TreeClone

func TreeClone(nodes []*html.Node) *html.Node

TreeClone takes a list of nodes and returns a clone of the minimum tree in the DOM that contains all of them. This is done by going through each node, cloning its parent and adding children to that parent until the next node is not contained in that parent (originally). The list cannot contain a parent of any of the other nodes. Children of the nodes in the provided list are excluded.

This implementation doesn't come from the original dom-distiller code. Instead I created it from scratch to make it simpler and more idiomatic Go.

func WalkNodes

func WalkNodes(root *html.Node, fnVisit func(*html.Node) bool, fnExit func(*html.Node))

WalkNodes walks the subtree of the DOM rooted at a particular root. It takes two function parameters, fnVisit and fnExit:

  • fnVisit is called when we reach a node during the walk. If it returns false, children of the node will be skipped and fnExit won't be called for this node.
  • fnExit is called when exiting a node, after visiting all of its children.
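The visit and exit contract described above can be sketched as a recursive walk. The `node` type is a hypothetical stand-in for `html.Node`, and accepting a nil exit callback is an assumption of this sketch rather than documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, minimal stand-in for html.Node.
type node struct {
	Name     string
	Children []*node
}

// walk calls visit when a node is reached. If visit returns false, the
// node's children are skipped and exit is not called for that node.
// Otherwise exit runs after all children have been walked.
func walk(root *node, visit func(*node) bool, exit func(*node)) {
	if root == nil {
		return
	}
	if !visit(root) {
		return
	}
	for _, child := range root.Children {
		walk(child, visit, exit)
	}
	if exit != nil {
		exit(root)
	}
}

func main() {
	tree := &node{Name: "body", Children: []*node{
		{Name: "div", Children: []*node{{Name: "span"}}},
		{Name: "script", Children: []*node{{Name: "code"}}},
	}}

	var trace []string
	walk(tree,
		func(n *node) bool {
			trace = append(trace, ">"+n.Name)
			return n.Name != "script" // do not descend into <script>
		},
		func(n *node) {
			trace = append(trace, "<"+n.Name)
		},
	)
	fmt.Println(strings.Join(trace, " "))
	// >body >div >span <span <div >script <body
}
```

Note that `script` is visited but never exited, and its `code` child is never reached.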

Types

This section is empty.
