strip

package module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 19, 2023 License: BSD-3-Clause Imports: 14 Imported by: 293

README

HTML StripTags for Go

Used By Build Status Go Report Card Docs License

This is a Go package containing an extracted version of the unexported stripTags function in html/template/html.go.

⚠ This package does not protect against untrusted input. Please use bluemonday if you have untrusted data ⚠

Background

  • The stripTags function in html/template/html.go is very useful, however, it is not exported.
  • Requests were made on GitHub without success.
  • This package is a repo for work done by Christopher Hesse provided in this Gist.

Installation

$ go get github.com/grokify/html-strip-tags-go

Usage

import(
    "github.com/grokify/html-strip-tags-go" // => strip
)

func main() {
    original := "<h1>Hello World</h1>"
    stripped := strip.StripTags(original) // => "Hello World"
}

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func HTMLEscape

func HTMLEscape(w io.Writer, b []byte)

HTMLEscape writes to w the escaped HTML equivalent of the plain text data b.

func HTMLEscapeString

func HTMLEscapeString(s string) string

HTMLEscapeString returns the escaped HTML equivalent of the plain text data s.

func HTMLEscaper

func HTMLEscaper(args ...interface{}) string

HTMLEscaper returns the escaped HTML equivalent of the textual representation of its arguments.

func JSEscape

func JSEscape(w io.Writer, b []byte)

JSEscape writes to w the escaped JavaScript equivalent of the plain text data b.

func JSEscapeString

func JSEscapeString(s string) string

JSEscapeString returns the escaped JavaScript equivalent of the plain text data s.

func JSEscaper

func JSEscaper(args ...interface{}) string

JSEscaper returns the escaped JavaScript equivalent of the textual representation of its arguments.

func StripTags

func StripTags(html string) string

stripTags takes a snippet of HTML and returns only the text content. For example, `<b>&iexcl;Hi!</b> <script>...</script>` -> `&iexcl;Hi! `.

func URLQueryEscaper

func URLQueryEscaper(args ...interface{}) string

URLQueryEscaper returns the escaped value of the textual representation of its arguments in a form suitable for embedding in a URL query.

Types

type CSS

type CSS string

CSS encapsulates known safe content that matches any of:

  1. The CSS3 stylesheet production, such as `p { color: purple }`.
  2. The CSS3 rule production, such as `a[href=~"https:"].foo#bar`.
  3. CSS3 declaration productions, such as `color: red; margin: 2px`.
  4. The CSS3 value production, such as `rgba(0, 0, 255, 127)`.

See http://www.w3.org/TR/css3-syntax/#parsing and https://web.archive.org/web/20090211114933/http://w3.org/TR/css3-syntax#style

type Error

type Error struct {
	// ErrorCode describes the kind of error.
	ErrorCode ErrorCode
	// Name is the name of the template in which the error was encountered.
	Name string
	// Line is the line number of the error in the template source or 0.
	Line int
	// Description is a human-readable description of the problem.
	Description string
}

Error describes a problem encountered during template Escaping.

func (*Error) Error

func (e *Error) Error() string

type ErrorCode

type ErrorCode int

ErrorCode is a code for a kind of error.

const (
	// OK indicates the lack of an error.
	OK ErrorCode = iota

	// ErrAmbigContext: "... appears in an ambiguous URL context"
	// Example:
	//   <a href="
	//      {{if .C}}
	//        /path/
	//      {{else}}
	//        /search?q=
	//      {{end}}
	//      {{.X}}
	//   ">
	// Discussion:
	//   {{.X}} is in an ambiguous URL context since, depending on {{.C}},
	//  it may be either a URL suffix or a query parameter.
	//   Moving {{.X}} into the condition removes the ambiguity:
	//   <a href="{{if .C}}/path/{{.X}}{{else}}/search?q={{.X}}">
	ErrAmbigContext

	// ErrBadHTML: "expected space, attr name, or end of tag, but got ...",
	//   "... in unquoted attr", "... in attribute name"
	// Example:
	//   <a href = /search?q=foo>
	//   <href=foo>
	//   <form na<e=...>
	//   <option selected<
	// Discussion:
	//   This is often due to a typo in an HTML element, but some runes
	//   are banned in tag names, attribute names, and unquoted attribute
	//   values because they can tickle parser ambiguities.
	//   Quoting all attributes is the best policy.
	ErrBadHTML

	// ErrBranchEnd: "{{if}} branches end in different contexts"
	// Example:
	//   {{if .C}}<a href="{{end}}{{.X}}
	// Discussion:
	//   Package html/template statically examines each path through an
	//   {{if}}, {{range}}, or {{with}} to escape any following pipelines.
	//   The example is ambiguous since {{.X}} might be an HTML text node,
	//   or a URL prefix in an HTML attribute. The context of {{.X}} is
	//   used to figure out how to escape it, but that context depends on
	//   the run-time value of {{.C}} which is not statically known.
	//
	//   The problem is usually something like missing quotes or angle
	//   brackets, or can be avoided by refactoring to put the two contexts
	//   into different branches of an if, range or with. If the problem
	//   is in a {{range}} over a collection that should never be empty,
	//   adding a dummy {{else}} can help.
	ErrBranchEnd

	// ErrEndContext: "... ends in a non-text context: ..."
	// Examples:
	//   <div
	//   <div title="no close quote>
	//   <script>f()
	// Discussion:
	//   Executed templates should produce a DocumentFragment of HTML.
	//   Templates that end without closing tags will trigger this error.
	//   Templates that should not be used in an HTML context or that
	//   produce incomplete Fragments should not be executed directly.
	//
	//   {{define "main"}} <script>{{template "helper"}}</script> {{end}}
	//   {{define "helper"}} document.write(' <div title=" ') {{end}}
	//
	//   "helper" does not produce a valid document fragment, so should
	//   not be Executed directly.
	ErrEndContext

	// ErrNoSuchTemplate: "no such template ..."
	// Examples:
	//   {{define "main"}}<div {{template "attrs"}}>{{end}}
	//   {{define "attrs"}}href="{{.URL}}"{{end}}
	// Discussion:
	//   Package html/template looks through template calls to compute the
	//   context.
	//   Here the {{.URL}} in "attrs" must be treated as a URL when called
	//   from "main", but you will get this error if "attrs" is not defined
	//   when "main" is parsed.
	ErrNoSuchTemplate

	// ErrOutputContext: "cannot compute output context for template ..."
	// Examples:
	//   {{define "t"}}{{if .T}}{{template "t" .T}}{{end}}{{.H}}",{{end}}
	// Discussion:
	//   A recursive template does not end in the same context in which it
	//   starts, and a reliable output context cannot be computed.
	//   Look for typos in the named template.
	//   If the template should not be called in the named start context,
	//   look for calls to that template in unexpected contexts.
	//   Maybe refactor recursive templates to not be recursive.
	ErrOutputContext

	// ErrPartialCharset: "unfinished JS regexp charset in ..."
	// Example:
	//     <script>var pattern = /foo[{{.Chars}}]/</script>
	// Discussion:
	//   Package html/template does not support interpolation into regular
	//   expression literal character sets.
	ErrPartialCharset

	// ErrPartialEscape: "unfinished escape sequence in ..."
	// Example:
	//   <script>alert("\{{.X}}")</script>
	// Discussion:
	//   Package html/template does not support actions following a
	//   backslash.
	//   This is usually an error and there are better solutions; for
	//   example
	//     <script>alert("{{.X}}")</script>
	//   should work, and if {{.X}} is a partial escape sequence such as
	//   "xA0", mark the whole sequence as safe content: JSStr(`\xA0`)
	ErrPartialEscape

	// ErrRangeLoopReentry: "on range loop re-entry: ..."
	// Example:
	//   <script>var x = [{{range .}}'{{.}},{{end}}]</script>
	// Discussion:
	//   If an iteration through a range would cause it to end in a
	//   different context than an earlier pass, there is no single context.
	//   In the example, there is missing a quote, so it is not clear
	//   whether {{.}} is meant to be inside a JS string or in a JS value
	//   context.  The second iteration would produce something like
	//
	//     <script>var x = ['firstValue,'secondValue]</script>
	ErrRangeLoopReentry

	// ErrSlashAmbig: '/' could start a division or regexp.
	// Example:
	//   <script>
	//     {{if .C}}var x = 1{{end}}
	//     /-{{.N}}/i.test(x) ? doThis : doThat();
	//   </script>
	// Discussion:
	//   The example above could produce `var x = 1/-2/i.test(s)...`
	//   in which the first '/' is a mathematical division operator or it
	//   could produce `/-2/i.test(s)` in which the first '/' starts a
	//   regexp literal.
	//   Look for missing semicolons inside branches, and maybe add
	//   parentheses to make it clear which interpretation you intend.
	ErrSlashAmbig
)

We define codes for each error that manifests while escaping templates, but escaped templates may also fail at runtime.

Output: "ZgotmplZ" Example:

<img src="{{.X}}">
where {{.X}} evaluates to `javascript:...`

Discussion:

"ZgotmplZ" is a special value that indicates that unsafe content reached a
CSS or URL context at runtime. The output of the example will be
  <img src="#ZgotmplZ">
If the data comes from a trusted source, use content types to exempt it
from filtering: URL(`javascript:...`).

type FuncMap

type FuncMap map[string]interface{}

FuncMap is the type of the map defining the mapping from names to functions. Each function must have either a single return value, or two return values of which the second has type error. In that case, if the second (error) argument evaluates to non-nil during execution, execution terminates and Execute returns that error. FuncMap has the same base type as FuncMap in "text/template", copied here so clients need not import "text/template".

type HTML

type HTML string

HTML encapsulates a known safe HTML document fragment. It should not be used for HTML from a third-party, or HTML with unclosed tags or comments. The outputs of a sound HTML sanitizer and a template escaped by this package are fine for use with HTML.

type HTMLAttr

type HTMLAttr string

HTMLAttr encapsulates an HTML attribute from a trusted source, for example, ` dir="ltr"`.

type JS

type JS string

JS encapsulates a known safe EcmaScript5 Expression, for example, `(x + y * z())`. Template authors are responsible for ensuring that typed expressions do not break the intended precedence and that there is no statement/expression ambiguity as when passing an expression like "{ foo: bar() }\n['foo']()", which is both a valid Expression and a valid Program with a very different meaning.

type JSStr

type JSStr string

JSStr encapsulates a sequence of characters meant to be embedded between quotes in a JavaScript expression. The string must match a series of StringCharacters:

StringCharacter :: SourceCharacter but not `\` or LineTerminator
                 | EscapeSequence

Note that LineContinuations are not allowed. JSStr("foo\\nbar") is fine, but JSStr("foo\\\nbar") is not.

type Template

type Template struct {

	// The underlying template's parse tree, updated to be HTML-safe.
	Tree *parse.Tree
	// contains filtered or unexported fields
}

Template is a specialized Template from "text/template" that produces a safe HTML document fragment.

func Must

func Must(t *Template, err error) *Template

Must is a helper that wraps a call to a function returning (*Template, error) and panics if the error is non-nil. It is intended for use in variable initializations such as

var t = template.Must(template.New("name").Parse("html"))

func New

func New(name string) *Template

New allocates a new HTML template with the given name.

func ParseFiles

func ParseFiles(filenames ...string) (*Template, error)

ParseFiles creates a new Template and parses the template definitions from the named files. The returned template's name will have the (base) name and (parsed) contents of the first file. There must be at least one file. If an error occurs, parsing stops and the returned *Template is nil.

func ParseGlob

func ParseGlob(pattern string) (*Template, error)

ParseGlob creates a new Template and parses the template definitions from the files identified by the pattern, which must match at least one file. The returned template will have the (base) name and (parsed) contents of the first file matched by the pattern. ParseGlob is equivalent to calling ParseFiles with the list of files matched by the pattern.

func (*Template) AddParseTree

func (t *Template) AddParseTree(name string, tree *parse.Tree) (*Template, error)

AddParseTree creates a new template with the name and parse tree and associates it with t.

It returns an error if t has already been executed.

func (*Template) Clone

func (t *Template) Clone() (*Template, error)

Clone returns a duplicate of the template, including all associated templates. The actual representation is not copied, but the name space of associated templates is, so further calls to Parse in the copy will add templates to the copy but not to the original. Clone can be used to prepare common templates and use them with variant definitions for other templates by adding the variants after the clone is made.

It returns an error if t has already been executed.

func (*Template) Delims

func (t *Template) Delims(left, right string) *Template

Delims sets the action delimiters to the specified strings, to be used in subsequent calls to Parse, ParseFiles, or ParseGlob. Nested template definitions will inherit the settings. An empty delimiter stands for the corresponding default: {{ or }}. The return value is the template, so calls can be chained.

func (*Template) Execute

func (t *Template) Execute(wr io.Writer, data interface{}) error

Execute applies a parsed template to the specified data object, writing the output to wr. If an error occurs executing the template or writing its output, execution stops, but partial results may already have been written to the output writer. A template may be executed safely in parallel.

func (*Template) ExecuteTemplate

func (t *Template) ExecuteTemplate(wr io.Writer, name string, data interface{}) error

ExecuteTemplate applies the template associated with t that has the given name to the specified data object and writes the output to wr. If an error occurs executing the template or writing its output, execution stops, but partial results may already have been written to the output writer. A template may be executed safely in parallel.

func (*Template) Funcs

func (t *Template) Funcs(funcMap FuncMap) *Template

Funcs adds the elements of the argument map to the template's function map. It panics if a value in the map is not a function with appropriate return type. However, it is legal to overwrite elements of the map. The return value is the template, so calls can be chained.

func (*Template) Lookup

func (t *Template) Lookup(name string) *Template

Lookup returns the template with the given name that is associated with t, or nil if there is no such template.

func (*Template) Name

func (t *Template) Name() string

Name returns the name of the template.

func (*Template) New

func (t *Template) New(name string) *Template

New allocates a new HTML template associated with the given one and with the same delimiters. The association, which is transitive, allows one template to invoke another with a {{template}} action.

func (*Template) Parse

func (t *Template) Parse(src string) (*Template, error)

Parse parses a string into a template. Nested template definitions will be associated with the top-level template t. Parse may be called multiple times to parse definitions of templates to associate with t. It is an error if a resulting template is non-empty (contains content other than template definitions) and would replace a non-empty template with the same name. (In multiple calls to Parse with the same receiver template, only one call can contain text other than space, comments, and template definitions.)

func (*Template) ParseFiles

func (t *Template) ParseFiles(filenames ...string) (*Template, error)

ParseFiles parses the named files and associates the resulting templates with t. If an error occurs, parsing stops and the returned template is nil; otherwise it is t. There must be at least one file.

func (*Template) ParseGlob

func (t *Template) ParseGlob(pattern string) (*Template, error)

ParseGlob parses the template definitions in the files identified by the pattern and associates the resulting templates with t. The pattern is processed by filepath.Glob and must match at least one file. ParseGlob is equivalent to calling t.ParseFiles with the list of files matched by the pattern.

func (*Template) Templates

func (t *Template) Templates() []*Template

Templates returns a slice of the templates associated with t, including t itself.

type URL

type URL string

URL encapsulates a known safe URL or URL substring (see RFC 3986). A URL like `javascript:checkThatFormNotEditedBeforeLeavingPage()` from a trusted source should go in the page, but by default dynamic `javascript:` URLs are filtered out since they are a frequently exploited injection vector.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL