htmltomarkdown

package module
v2.0.2-alpha
Published: Aug 31, 2024 License: MIT Imports: 3 Imported by: 0

README

html-to-markdown

[!WARNING] This is an early experimental version of the library.

We encourage testing and bug reporting. However, please note:

  • Not production-ready
    • Default options are well-tested, but custom configurations have limited coverage
  • Functionality is currently restricted
    • Focus is on stabilization and core features
  • No compatibility guarantee
    • Only use htmltomarkdown.ConvertString() and htmltomarkdown.ConvertNode() from the root package. They are unlikely to change.
    • Other functions and nested packages are very likely to change.

Golang Library

package main

import (
	"fmt"
	"log"

	htmltomarkdown "github.com/JohannesKaufmann/html-to-markdown/v2"
)

func main() {
	input := `<strong>Bold Text</strong>`

	markdown, err := htmltomarkdown.ConvertString(input)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(markdown)
	// Output: **Bold Text**
}

The function htmltomarkdown.ConvertString() is just a small wrapper around converter.NewConverter() and commonmark.NewCommonmarkPlugin(). If you want more control, use the following:

package main

import (
	"fmt"
	"log"

	"github.com/JohannesKaufmann/html-to-markdown/v2/converter"
	"github.com/JohannesKaufmann/html-to-markdown/v2/plugin/commonmark"
)

func main() {
	input := `<strong>Bold Text</strong>`

	conv := converter.NewConverter(
		converter.WithPlugins(
			commonmark.NewCommonmarkPlugin(
				commonmark.WithStrongDelimiter("__"),
				// ...additional configurations for the plugin
			),
		),
	)

	markdown, err := conv.ConvertString(input)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(markdown)
	// Output: __Bold Text__
}

[!NOTE]
If you use NewConverter directly, make sure to also register the commonmark plugin.



CLI - Using it on the command line

Using the Golang library provides the most customization, while the CLI is the simplest way to get started.

Installation

Download the pre-compiled binaries from the releases page and copy them to the desired location.

html2markdown --version

[!NOTE]
Make sure that --version prints 2.X.X as there is a different CLI for V2 of the converter.
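
The version check can also be automated, for example in a setup script. The following is a minimal sketch, not part of the library: the helper `isV2` is a hypothetical name, and the exact text that `--version` prints is an assumption, so the code only looks for the first dotted version number in the output and tests its major component.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"regexp"
)

// versionPattern matches the first dotted version number, such as "2.0.2".
var versionPattern = regexp.MustCompile(`(\d+)\.\d+\.\d+`)

// isV2 reports whether the first version number found in out has major
// version 2. The output format of --version is an assumption, which is
// why the surrounding text is ignored.
func isV2(out string) bool {
	m := versionPattern.FindStringSubmatch(out)
	return m != nil && m[1] == "2"
}

func main() {
	// Runs the installed binary; this fails if html2markdown is not on PATH.
	out, err := exec.Command("html2markdown", "--version").CombinedOutput()
	if err != nil {
		log.Fatalf("running html2markdown --version: %v", err)
	}
	if !isV2(string(out)) {
		log.Fatalf("expected a 2.X.X release, got %q", out)
	}
	fmt.Println("html2markdown V2 is installed")
}
```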

Usage

$ echo "<strong>important</strong>" | html2markdown

**important**

$ curl --no-progress-meter http://example.com | html2markdown

# Example Domain

This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

[More information...](https://www.iana.org/domains/example)

(The CLI does not support every option yet. More customization will be added over time.)
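
If a program should call the installed binary instead of importing the library, the `echo ... | html2markdown` pipeline above can be reproduced with `os/exec`. This is only a sketch under that assumption: `pipeThrough` is a hypothetical helper, not something the project provides, and it works with any program that reads standard input and writes standard output.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"strings"
)

// pipeThrough runs the named program, feeds input to its standard input,
// and returns everything it writes to standard output.
func pipeThrough(input, name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin = strings.NewReader(input)

	out, err := cmd.Output()
	if err != nil {
		return "", fmt.Errorf("running %s: %w", name, err)
	}
	return string(out), nil
}

func main() {
	// Equivalent to: echo "<strong>important</strong>" | html2markdown
	markdown, err := pipeThrough("<strong>important</strong>", "html2markdown")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(markdown)
}
```

For most Go programs, calling htmltomarkdown.ConvertString() directly is simpler, since it avoids the dependency on an external binary.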

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ConvertNode

func ConvertNode(doc *html.Node) ([]byte, error)

ConvertNode converts a `*html.Node` to a markdown byte slice.

If you have already parsed an HTML page using the `html.Parse()` function from the "golang.org/x/net/html" package then you can pass this node directly to the converter.

Example
package main

import (
	"fmt"
	"log"
	"strings"

	htmltomarkdown "github.com/JohannesKaufmann/html-to-markdown/v2"
	"golang.org/x/net/html"
)

func main() {
	input := `<strong>Bold Text</strong>`

	doc, err := html.Parse(strings.NewReader(input))
	if err != nil {
		log.Fatal(err)
	}

	markdown, err := htmltomarkdown.ConvertNode(doc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(markdown))
}
Output:

**Bold Text**

func ConvertString

func ConvertString(htmlInput string) (string, error)

ConvertString converts an HTML string to a Markdown string.

Under the hood, `html.Parse()` is used to parse the HTML.

Example
package main

import (
	"fmt"
	"log"

	htmltomarkdown "github.com/JohannesKaufmann/html-to-markdown/v2"
)

func main() {
	input := `<strong>Bold Text</strong>`

	markdown, err := htmltomarkdown.ConvertString(input)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(markdown)
}
Output:

**Bold Text**

Types

This section is empty.

Directories

Path Synopsis
cli
cmd
collapse can collapse whitespace in html elements.
examples
internal
plugin
