htmltags

Version: v1.0.0
Published: Jul 28, 2018 License: Apache-2.0 Imports: 3 Imported by: 3

README

HTML Strip tags

This is a Go package that strips HTML tags from a string. You can also pass a slice of allowableTags that are kept rather than stripped. It is useful when you work with web crawlers, or whenever you want to remove all tags, or only specific ones, from a string.

nodes, err := htmltags.Strip(content, allowableTags, stripInlineAttributes) // returns (Nodes, error)
nodes.Elements   // HTML node structure of type *html.Node
nodes.ToString() // returns the stripped HTML string

Installation

$ go get github.com/darkoatanasovski/htmltags

Parameters

content                 - string   // the HTML string to strip
allowableTags           - []string // tags to keep, e.g. []string{"p", "span"}
stripInlineAttributes   - bool     // true removes attributes (style, class, id ...) from the kept tags; false keeps them
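
To make the interplay of these parameters concrete, here is a rough, standard-library-only sketch of the idea. It is not how htmltags is implemented (the package parses with golang.org/x/net/html); stripTags and its naive scanner are hypothetical and assume simple input with no comments, scripts, or ">" characters inside attribute values.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTags is a hypothetical, simplified stand-in for the idea behind Strip:
// tags not listed in allowableTags are dropped while their text is kept, and
// attributes on kept tags are removed when stripInlineAttributes is true.
func stripTags(content string, allowableTags []string, stripInlineAttributes bool) string {
	allowed := make(map[string]bool, len(allowableTags))
	for _, t := range allowableTags {
		allowed[strings.ToLower(t)] = true
	}

	var out strings.Builder
	for len(content) > 0 {
		open := strings.IndexByte(content, '<')
		if open < 0 {
			out.WriteString(content) // no more tags, only text
			break
		}
		out.WriteString(content[:open])
		end := strings.IndexByte(content[open:], '>')
		if end < 0 {
			out.WriteString(content[open:]) // unterminated tag: keep as text
			break
		}
		tag := content[open+1 : open+end] // text between '<' and '>'
		content = content[open+end+1:]

		closing := strings.HasPrefix(tag, "/")
		name := strings.TrimPrefix(tag, "/")
		if i := strings.IndexAny(name, " \t\n/"); i >= 0 {
			name = name[:i]
		}
		name = strings.ToLower(name)
		if !allowed[name] {
			continue // drop the tag itself, keep the surrounding text
		}
		switch {
		case closing:
			out.WriteString("</" + name + ">")
		case stripInlineAttributes:
			out.WriteString("<" + name + ">")
		default:
			out.WriteString("<" + tag + ">")
		}
	}
	return out.String()
}

func main() {
	in := `<h1>Header text with <span style="color:red">color</span></h1>`
	fmt.Println(stripTags(in, []string{"span"}, false)) // Header text with <span style="color:red">color</span>
	fmt.Println(stripTags(in, []string{"span"}, true))  // Header text with <span>color</span>
	fmt.Println(stripTags(in, []string{}, false))       // Header text with color
}
```

An empty allowableTags slice leaves only text, which mirrors the "strip everything" case; the real package additionally repairs malformed markup, which this sketch does not attempt.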

Return values

Strip returns a Nodes structure and an error. Call nodes.ToString() to get the stripped string. If stripping fails, the returned error holds the first error encountered.

Usage

If you want to keep the inline attributes of the allowed tags, set the third parameter to false:

stripped, err := htmltags.Strip("<h1>Header text with <span style=\"color:red\">color</span></h1>", []string{"span"}, false)

If you want to strip all tags from the string and get plain text, pass an empty slice as the second parameter:

stripped, err := htmltags.Strip("<h1>Header text with <span style=\"color:red\">color</span></h1>", []string{}, false)

A working example

package main

import (
    "fmt"
    "log"

    "github.com/darkoatanasovski/htmltags"
)

func main() {
    original := "<div>This is <strong style=\"font-size:50px\">complex</strong> text with <span>children <i>nodes</i></span></div>"
    allowableTags := []string{"strong", "i"}
    removeInlineAttributes := true // true drops style, class, id ... from the kept tags
    stripped, err := htmltags.Strip(original, allowableTags, removeInlineAttributes)
    if err != nil {
        log.Fatal(err) // Strip reports the first error it encounters
    }

    fmt.Println(stripped)            // output: Node structure
    fmt.Println(stripped.ToString()) // output string: This is <strong>complex</strong> text with children <i>nodes</i>
}

Development

If you have cloned this repo you will probably need the dependency:

$ go get golang.org/x/net/html

Notes

Broken or partial HTML is repaired during parsing. If your input HTML string is <p>Content <i>italic, the fixed string will be <p>Content <i>italic</i></p>
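
The repair itself is done by the golang.org/x/net/html parser, which follows the much more involved HTML5 parsing rules. As a rough illustration of the simplest case only (elements still open at the end of the input), here is a standard-library-only sketch. closeOpenTags is a hypothetical helper, not part of htmltags, and it assumes no void elements such as <br> and no ">" inside attribute values.

```go
package main

import (
	"fmt"
	"strings"
)

// closeOpenTags is a hypothetical helper that appends closing tags for every
// element still open when the input ends, innermost element first.
func closeOpenTags(content string) string {
	var stack []string // names of currently open elements
	rest := content
	for {
		open := strings.IndexByte(rest, '<')
		if open < 0 {
			break
		}
		end := strings.IndexByte(rest[open:], '>')
		if end < 0 {
			break
		}
		tag := rest[open+1 : open+end]
		rest = rest[open+end+1:]

		if strings.HasPrefix(tag, "/") {
			// Closing tag: pop it if it matches the innermost open element.
			name := strings.ToLower(strings.TrimSpace(tag[1:]))
			if n := len(stack); n > 0 && stack[n-1] == name {
				stack = stack[:n-1]
			}
			continue
		}
		if strings.HasPrefix(tag, "!") || strings.HasSuffix(tag, "/") {
			continue // doctype, comment, or self-closing tag: nothing to track
		}
		name := tag
		if i := strings.IndexAny(name, " \t\n"); i >= 0 {
			name = name[:i]
		}
		if name != "" {
			stack = append(stack, strings.ToLower(name))
		}
	}

	var out strings.Builder
	out.WriteString(content)
	for i := len(stack) - 1; i >= 0; i-- {
		out.WriteString("</" + stack[i] + ">")
	}
	return out.String()
}

func main() {
	fmt.Println(closeOpenTags("<p>Content <i>italic")) // <p>Content <i>italic</i></p>
}
```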

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Nodes

type Nodes struct {
	Elements *html.Node
}

Nodes structure with html.Node elements

func Strip

func Strip(content string, allowableTags []string, stripInlineAttributes bool) (Nodes, error)

Strip removes HTML tags from a string. You can provide a slice of allowable tags, which are kept rather than removed. You can also strip the attributes (e.g. style, class, id ...) from the tags that are kept.

func (*Nodes) ToString

func (nodes *Nodes) ToString() string

ToString is a Nodes method. It converts Nodes.Elements to an HTML string.

Directories

Path Synopsis
examples
