html2text

package module
v1.2.1 Latest
Warning

This package is not in the latest version of its module.

Published: May 10, 2023 License: MIT Imports: 4 Imported by: 126

README


html2text

A simple Golang package to convert HTML to plain text (without non-standard dependencies).

It converts HTML tags to text and decodes HTML entities into the characters they represent. The <head> section of the HTML document and most other tags are stripped out, but links are converted to the value of their href attribute.

It can be used for converting HTML emails into text.

Some tests are included as well. The package uses semantic versioning, and no breaking changes are planned.

Feel free to open a pull request if you have suggestions for improvement, but please note that the library can now be considered feature-complete and API-stable. If you need more than this basic conversion, please use an alternative mentioned at the bottom.

Install

go get github.com/k3a/html2text

Usage

package main

import (
	"fmt"
	"github.com/k3a/html2text"
)

func main() {
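	// The <head> section is stripped; only the body text is kept.
	html := `<html><head><title>Good</title></head><body><strong>clean</strong> text</body>`

	// Convert using the default settings.
	plain := html2text.HTML2Text(html)

	fmt.Println(plain)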
}

/*	Outputs:

	clean text
*/

To see all features, please look into html2text_test.go.

Alternatives

License

MIT

Documentation

Index

Constants

const (
	WIN_LBR  = "\r\n"
	UNIX_LBR = "\n"
)

Line break constants.

Deprecated: Please use HTML2TextWithOptions(text, WithUnixLineBreaks()).

Variables

This section is empty.

Functions

func HTML2Text

func HTML2Text(html string) string

HTML2Text converts html into a text form

func HTML2TextWithOptions added in v1.0.9

func HTML2TextWithOptions(html string, reqOpts ...Option) string

HTML2TextWithOptions converts html into a text form with additional options

func HTMLEntitiesToText

func HTMLEntitiesToText(htmlEntsText string) string

HTMLEntitiesToText decodes HTML entities inside a provided string and returns decoded text
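
To illustrate what decoding HTML entities means, here is a minimal sketch that uses the standard library's html.UnescapeString instead of this package. The helper name decodeEntities is hypothetical, and HTMLEntitiesToText may recognize a different set of entities than the standard library does.

```go
package main

import (
	"fmt"
	"html"
)

// decodeEntities is a hypothetical helper that wraps the standard library.
// It is a stand-in for illustration, not the html2text implementation.
func decodeEntities(s string) string {
	return html.UnescapeString(s)
}

func main() {
	// Named and numeric entities are replaced by the characters they represent.
	fmt.Println(decodeEntities("Tom &amp; Jerry &lt;3 &#39;cartoons&#39;"))
}
```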

func SetUnixLbr

func SetUnixLbr(b bool)

SetUnixLbr with argument true sets Unix-style line breaks in the output ("\n"); with argument false it sets Windows-style line breaks ("\r\n", the default).

Deprecated: Please use HTML2TextWithOptions(text, WithUnixLineBreaks()).

Types

type Option added in v1.0.9

type Option func(*options)

Option is a functional option
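
For readers unfamiliar with the functional options pattern behind a type such as func(*options), the following is a minimal, self-contained sketch. It does not reproduce the package's source: the options fields, the lower-case option constructors, and newOptions are all hypothetical names chosen for illustration. Only the "\r\n" default mirrors what the documentation states.

```go
package main

import "fmt"

// options holds converter settings. The field names are hypothetical and
// are not taken from the html2text source.
type options struct {
	lineBreak  string
	listPrefix string
}

// Option mutates an options value, the same shape as the documented type.
type Option func(*options)

// withUnixLineBreaks is a hypothetical option that selects "\n".
func withUnixLineBreaks() Option {
	return func(o *options) { o.lineBreak = "\n" }
}

// withListPrefix is a hypothetical option that sets a list item prefix.
func withListPrefix(prefix string) Option {
	return func(o *options) { o.listPrefix = prefix }
}

// newOptions starts from the defaults and applies each option in order.
func newOptions(opts ...Option) options {
	o := options{lineBreak: "\r\n"} // Windows-style default, as documented
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := newOptions(withUnixLineBreaks(), withListPrefix(" * "))
	fmt.Printf("%q %q\n", o.lineBreak, o.listPrefix)
}
```

The appeal of this design is that new settings can be added as new option functions without changing the signature of the function that accepts them, which fits a package that promises no breaking changes.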

func WithLinksInnerText added in v1.0.9

func WithLinksInnerText() Option

WithLinksInnerText instructs the converter to retain the inner text of link tags and append the href URLs in angle brackets after the text.

Example: click news <http://bit.ly/2n4wXRs>

func WithListSupport added in v1.2.0

func WithListSupport() Option

WithListSupport formats <ul> and <li> lists with " - " prefix

func WithListSupportPrefix added in v1.2.1

func WithListSupportPrefix(prefix string) Option

WithListSupportPrefix formats <ul> and <li> lists with the specified prefix

func WithUnixLineBreaks added in v1.0.9

func WithUnixLineBreaks() Option

WithUnixLineBreaks instructs the converter to use Unix line breaks ("\n") instead of the default Windows line breaks ("\r\n").
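
The difference between the two line break styles can be seen in a small standard-library sketch. It assumes you already hold text that uses the default "\r\n" breaks and want "\n" instead; the helper name toUnixLineBreaks is hypothetical and is not part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// toUnixLineBreaks is a hypothetical helper that rewrites Windows-style
// line breaks ("\r\n") as Unix-style ones ("\n").
func toUnixLineBreaks(s string) string {
	return strings.ReplaceAll(s, "\r\n", "\n")
}

func main() {
	// %q prints the escape sequences so the line breaks are visible.
	fmt.Printf("%q\n", toUnixLineBreaks("first\r\nsecond\r\n"))
}
```

When the conversion itself is under your control, passing WithUnixLineBreaks() to HTML2TextWithOptions avoids the need for any post-processing of this kind.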
