htmlindex

package
v1.5.0
Warning

This package is not in the latest version of its module.

Published: Dec 10, 2024 License: MIT Imports: 6 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

var Nodes = map[atom.Atom]Node{
	atom.A: {
		Attributes: []string{href},
	},
	atom.Area: {
		Attributes: []string{href},
	},
	atom.Base: {
		Attributes: []string{href},
	},
	atom.Audio: {
		Attributes: []string{src},
	},
	atom.Body: {
		Attributes: []string{background},
	},
	atom.Embed: {
		Attributes: []string{src},
	},
	atom.Iframe: {
		Attributes: []string{src},
	},
	atom.Img: {
		Attributes: []string{src, dataSrc, srcSet, dataSrcSet},
		// contains filtered or unexported fields
	},
	atom.Input: {
		Attributes: []string{src},
	},
	atom.Link: {
		Attributes: []string{href},
	},
	atom.Object: {
		Attributes: []string{data},
	},
	atom.Script: {
		Attributes: []string{src},
	},
	atom.Source: {
		Attributes: []string{src},
	},
	atom.Video: {
		Attributes: []string{poster},
	},
}

Nodes describes the HTML tags and, for each tag, the attributes that can contain a URL. See https://html.spec.whatwg.org/multipage/indices.html#attributes-3 and https://html.spec.whatwg.org/multipage/indices.html#elements-3. Not yet covered: the style attribute, which can contain CSS links.
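The lookup that this table supports can be sketched with the standard library alone. The sketch below is not the package's code: it keys the map by tag name (a string) rather than atom.Atom, reproduces only three entries, and assumes that the unexported constants href, src, dataSrc, srcSet and dataSrcSet hold the obvious attribute names. The helper urlAttributes is hypothetical.

```go
package main

import "fmt"

// Node mirrors the exported part of the package's Node type.
type Node struct {
	Attributes []string
}

// nodes is a stand-in for the package's Nodes variable. It is keyed by tag
// name instead of atom.Atom, and the attribute strings are assumed values
// for the package's unexported constants.
var nodes = map[string]Node{
	"a":     {Attributes: []string{"href"}},
	"img":   {Attributes: []string{"src", "data-src", "srcset", "data-srcset"}},
	"video": {Attributes: []string{"poster"}},
}

// urlAttributes (hypothetical helper) returns the attributes of tag that may
// hold a URL, or nil when the tag is not in the table.
func urlAttributes(tag string) []string {
	node, ok := nodes[tag]
	if !ok {
		return nil
	}
	return node.Attributes
}

func main() {
	for _, tag := range []string{"a", "img", "video", "p"} {
		fmt.Printf("%s: %v\n", tag, urlAttributes(tag))
	}
}
```

A tag that is missing from the table, such as p here, simply yields no attributes, so a scraper walking a document can skip it without a special case.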

var SrcSetAttributes = map[string]struct{}{
	// contains filtered or unexported fields
}

SrcSetAttributes contains the attributes that contain srcset values.
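A srcset value differs from a plain URL attribute because it lists several image candidates separated by commas, each a URL optionally followed by a width or density descriptor. The following is a simplified sketch of how such a value might be split into URLs. The set contents and both helper names are assumptions, since the real entries of SrcSetAttributes are not shown, and splitting on commas does not handle URLs that themselves contain commas (for example data: URIs).

```go
package main

import (
	"fmt"
	"strings"
)

// srcSetAttributes is a stand-in for the package's SrcSetAttributes set.
// The two keys are assumed; the real entries are unexported.
var srcSetAttributes = map[string]struct{}{
	"srcset":      {},
	"data-srcset": {},
}

// isSrcSet (hypothetical helper) reports whether attr holds a srcset value
// rather than a single URL.
func isSrcSet(attr string) bool {
	_, ok := srcSetAttributes[attr]
	return ok
}

// parseSrcSet (hypothetical helper) extracts the URL of every image candidate
// in a srcset value, dropping the descriptor that may follow each URL.
func parseSrcSet(value string) []string {
	var urls []string
	for _, candidate := range strings.Split(value, ",") {
		fields := strings.Fields(candidate)
		if len(fields) == 0 {
			continue // empty candidate, e.g. a trailing comma
		}
		urls = append(urls, fields[0])
	}
	return urls
}

func main() {
	attr, value := "srcset", "small.jpg 480w, large.jpg 1080w"
	if isSrcSet(attr) {
		fmt.Println(parseSrcSet(value))
	}
}
```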

Functions

This section is empty.

Types

type Index

type Index struct {
	// contains filtered or unexported fields
}

Index provides an index for all HTML tags of relevance for scraping.

func New

func New() *Index

New returns a new index.

func (*Index) Index

func (h *Index) Index(baseURL *url.URL, node *html.Node)

Index indexes the given HTML document.
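The documentation does not say how baseURL is used. A plausible reading, assumed here, is that relative references found in the document are resolved against it. The sketch below shows that resolution step with net/url only; resolve and resolveString are hypothetical helpers, not part of the package.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve (hypothetical helper) turns a reference found in an attribute into
// an absolute URL by resolving it against base.
func resolve(base *url.URL, ref string) (*url.URL, error) {
	u, err := url.Parse(ref)
	if err != nil {
		return nil, fmt.Errorf("parsing reference %q: %w", ref, err)
	}
	return base.ResolveReference(u), nil
}

// resolveString is a convenience wrapper that works on plain strings.
func resolveString(base, ref string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parsing base %q: %w", base, err)
	}
	u, err := resolve(b, ref)
	if err != nil {
		return "", err
	}
	return u.String(), nil
}

func main() {
	base := "https://example.com/dir/page.html"
	for _, ref := range []string{"../img/a.png", "/style.css", "https://cdn.example.org/x.js"} {
		abs, err := resolveString(base, ref)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```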

func (*Index) Nodes

func (h *Index) Nodes(tag atom.Atom) map[string][]*html.Node

Nodes returns a map from each URL found for the given tag to the HTML nodes that reference it.

func (*Index) URLs

func (h *Index) URLs(tag atom.Atom) (Refs, error)

URLs returns all URLs of the references found for a specific tag.
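The error return suggests that the URLs are stored as strings and parsed on demand, although the documentation does not confirm this. Under that assumption, the conversion from a URL-keyed map to Refs could look like the sketch below. The function refsFromKeys is hypothetical, the map values use []string as a placeholder for []*html.Node, and sorting the keys is a choice made here to get a deterministic order.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// Refs matches the package's Refs type.
type Refs []*url.URL

// refsFromKeys (hypothetical helper) parses the keys of a URL-to-nodes map
// into Refs, ordered by URL string. The []string values are a placeholder
// for the []*html.Node values the real index holds.
func refsFromKeys(byURL map[string][]string) (Refs, error) {
	keys := make([]string, 0, len(byURL))
	for key := range byURL {
		keys = append(keys, key)
	}
	sort.Strings(keys) // map iteration order is random, so sort first

	refs := make(Refs, 0, len(keys))
	for _, key := range keys {
		u, err := url.Parse(key)
		if err != nil {
			return nil, fmt.Errorf("parsing URL %q: %w", key, err)
		}
		refs = append(refs, u)
	}
	return refs, nil
}

func main() {
	refs, err := refsFromKeys(map[string][]string{
		"https://example.com/b.png": {"img"},
		"https://example.com/a.png": {"img", "img"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ref := range refs {
		fmt.Println(ref)
	}
}
```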

type Node

type Node struct {
	Attributes []string
	// contains filtered or unexported fields
}

type Refs

type Refs []*url.URL
