Documentation ¶
Index ¶
- Constants
- Variables
- func Clean(c *Config, fragment string) string
- func CleanNode(c *Config, n *html.Node) *html.Node
- func CleanNodes(c *Config, nodes []*html.Node) []*html.Node
- func Parse(fragment string) []*html.Node
- func ParseDepth(fragment string, maxDepth int) []*html.Node
- func Preprocess(config *Config, fragment string) string
- func Render(nodes ...*html.Node) string
- func SafeURLScheme(u *url.URL) bool
- type Config
- func (c *Config) Elem(names ...string) *Config
- func (c *Config) ElemAtom(elem ...atom.Atom) *Config
- func (c *Config) ElemAttr(elem string, attr ...string) *Config
- func (c *Config) ElemAttrAtom(elem atom.Atom, attr ...atom.Atom) *Config
- func (c *Config) ElemAttrAtomMatch(elem, attr atom.Atom, match *regexp.Regexp) *Config
- func (c *Config) ElemAttrMatch(elem, attr string, match *regexp.Regexp) *Config
- func (c *Config) GlobalAttr(names ...string) *Config
- func (c *Config) GlobalAttrAtom(a atom.Atom) *Config
- func (c *Config) WrapTextInside(names ...string) *Config
- func (c *Config) WrapTextInsideAtom(elem ...atom.Atom) *Config
Constants ¶
const DefaultMaxDepth = 100
DefaultMaxDepth is the default maximum depth of the node trees returned by Parse.
Variables ¶
var DefaultConfig = (&Config{
	ValidateURL: SafeURLScheme,
}).GlobalAttrAtom(atom.Title).
	ElemAttrAtom(atom.A, atom.Href).
	ElemAttrAtom(atom.Img, atom.Src, atom.Alt).
	ElemAttrAtom(atom.Video, atom.Src, atom.Poster, atom.Controls).
	ElemAttrAtom(atom.Audio, atom.Src, atom.Controls).
	ElemAtom(atom.B, atom.I, atom.U, atom.S).
	ElemAtom(atom.Em, atom.Strong, atom.Strike).
	ElemAtom(atom.Big, atom.Small, atom.Sup, atom.Sub).
	ElemAtom(atom.Ins, atom.Del).
	ElemAtom(atom.Abbr, atom.Address, atom.Cite, atom.Q).
	ElemAtom(atom.P, atom.Blockquote, atom.Pre).
	ElemAtom(atom.Code, atom.Kbd, atom.Tt).
	ElemAttrAtom(atom.Details, atom.Open).
	ElemAtom(atom.Summary)
DefaultConfig holds the default settings for htmlcleaner. Clean uses it when passed a nil Config.
Functions ¶
func Clean ¶
Clean sanitizes a fragment of HTML using the specified Config, or DefaultConfig if c is nil.
Example ¶
package main

import (
	"fmt"
	"net/url"
	"regexp"

	"golang.org/x/net/html/atom"

	"github.com/BenLubar/htmlcleaner"
)

func main() {
	config := (&htmlcleaner.Config{
		ValidateURL: func(u *url.URL) bool {
			return u.Scheme == "https"
		},
	}).ElemAttrAtomMatch(atom.Span, atom.Class, regexp.MustCompile(`\Afa-spin\z`)).ElemAttrAtom(atom.A, atom.Href)

	fmt.Println(htmlcleaner.Clean(config, htmlcleaner.Preprocess(config, `<span class="fa-spin">[whee]</span> <span class="hello">[aww]</span> <a href="https://www.google.com">Google</a> <a href="http://www.google.com">Google</a> <some tag that doesn't exist>`)))
}
Output: <span class="fa-spin">[whee]</span> <span>[aww]</span> <a href="https://www.google.com">Google</a> <a>Google</a> &lt;some tag that doesn&#39;t exist&gt;
func CleanNode ¶
CleanNode cleans an HTML node using the specified config. Text nodes are returned as-is. Element nodes are recursively checked for legality and have their attributes checked for legality as well. Elements with illegal attributes are copied and the problematic attributes are removed. Elements that are not in the set of legal elements are replaced with a textual version of their source code.
Example ¶
package main

import (
	"fmt"

	"github.com/BenLubar/htmlcleaner"
)

func main() {
	var config *htmlcleaner.Config = nil

	nodes := htmlcleaner.Parse(`<a href="http://golang.org/" onclick="malicious()" title="Go">hello</a> <script>malicious()</script>`)

	for i, n := range nodes {
		nodes[i] = htmlcleaner.CleanNode(config, n)
	}

	fmt.Println(htmlcleaner.Render(nodes...))
}
Output: <a href="http://golang.org/" title="Go">hello</a> &lt;script&gt;malicious()&lt;/script&gt;
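What "a textual version of their source code" looks like once rendered can be illustrated with the standard library alone. The sketch below is not htmlcleaner's implementation: it only uses html.EscapeString from Go's html package to show markup being turned into literal text. The helper name asText is hypothetical.

```go
package main

import (
	"fmt"
	"html"
)

// asText is a hypothetical helper: it escapes markup so that a browser
// displays it as literal text instead of interpreting it as an element.
func asText(markup string) string {
	return html.EscapeString(markup)
}

func main() {
	fmt.Println(asText(`<script>malicious()</script>`))
}
```

Because the angle brackets are escaped, the script source is shown to the reader rather than executed.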
func CleanNodes ¶
CleanNodes calls CleanNode on each node, and additionally wraps inline elements in <p> tags and wraps dangling <li> tags in <ul> tags.
func ParseDepth ¶
ParseDepth is a convenience function that wraps html.ParseFragment, taking a string instead of an io.Reader and omitting trees deeper than maxDepth.
func Preprocess ¶ added in v1.1.0
Preprocess escapes disallowed tags in a cleaner way, but does not fix nesting problems. Pass its output to Clean rather than using it on its own.
func Render ¶
Render is a convenience function that wraps html.Render and renders to a string instead of an io.Writer.
func SafeURLScheme ¶
SafeURLScheme returns true if u.Scheme is http, https, mailto, data, or an empty string.
Types ¶
type Config ¶
type Config struct {
	// A custom URL validation function. If it is set and returns false,
	// the attribute will be removed. Called for attributes such as src
	// and href.
	ValidateURL func(*url.URL) bool

	// If true, HTML comments are turned into text.
	EscapeComments bool

	// Wrap text nodes in at least one tag.
	WrapText bool
	// contains filtered or unexported fields
}
Config holds the settings for htmlcleaner.
func (*Config) Elem ¶
Elem ensures an element name is allowed. The receiver is returned to allow call chaining.
func (*Config) ElemAtom ¶
ElemAtom ensures an element name is allowed. The receiver is returned to allow call chaining.
func (*Config) ElemAttr ¶
ElemAttr allows an attribute name on the specified element. The receiver is returned to allow call chaining.
func (*Config) ElemAttrAtom ¶
ElemAttrAtom allows an attribute name on the specified element. The receiver is returned to allow call chaining.
func (*Config) ElemAttrAtomMatch ¶
ElemAttrAtomMatch allows an attribute name on the specified element, but only if the value matches a regular expression. The receiver is returned to allow call chaining.
func (*Config) ElemAttrMatch ¶
ElemAttrMatch allows an attribute name on the specified element, but only if the value matches a regular expression. The receiver is returned to allow call chaining.
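The Clean example anchors its pattern with \A and \z. Whether htmlcleaner itself requires a full match is not stated here, so a pattern that carries its own anchors is the safer choice. The sketch below uses only the standard regexp package to show the difference; the sample attribute values are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// anchored matches only when the whole value is exactly "fa-spin".
	anchored = regexp.MustCompile(`\Afa-spin\z`)
	// unanchored matches any value that contains "fa-spin" somewhere.
	unanchored = regexp.MustCompile(`fa-spin`)
)

func main() {
	for _, value := range []string{"fa-spin", "fa-spin hidden", "xfa-spinx"} {
		fmt.Printf("%q anchored=%v unanchored=%v\n",
			value, anchored.MatchString(value), unanchored.MatchString(value))
	}
}
```

With the unanchored pattern, a class value carrying extra tokens would still match, which is usually not what an allow-list intends.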
func (*Config) GlobalAttr ¶
GlobalAttr allows an attribute name on all allowed elements. The receiver is returned to allow call chaining.
func (*Config) GlobalAttrAtom ¶
GlobalAttrAtom allows an attribute name on all allowed elements. The receiver is returned to allow call chaining.
func (*Config) WrapTextInside ¶
WrapTextInside makes an element's children behave as if they are root nodes in the context of WrapText. The receiver is returned to allow call chaining.