blackfriday

package
v0.8.4
Warning

This package is not in the latest version of its module.

Published: Feb 12, 2019 License: Apache-2.0, BSD-2-Clause Imports: 9 Imported by: 0

README

Blackfriday

Blackfriday is a Markdown processor implemented in Go. It is paranoid about its input (so you can safely feed it user-supplied data without crashing it), it is fast, it supports common extensions (tables, smart punctuation substitutions, etc.), and it is safe for all UTF-8 (Unicode) input.

HTML output is currently supported, along with Smartypants extensions.

It started as a translation of Sundown from C.

Installation

Blackfriday is compatible with any modern Go release. With Go 1.7 and git installed:

go get gopkg.in/russross/blackfriday.v2

will download, compile, and install the package into your $GOPATH directory hierarchy. Alternatively, you can achieve the same if you import it into a project:

import "gopkg.in/russross/blackfriday.v2"

and run go get without parameters.

Versions

The currently maintained and recommended version of Blackfriday is v2. It is being developed on its own branch: https://github.com/russross/blackfriday/v2. You should install and import it via gopkg.in at gopkg.in/russross/blackfriday.v2.

Version 2 offers a number of improvements over v1:

  • Cleaned up API
  • A separate call to Parse, which produces an abstract syntax tree for the document
  • Latest bug fixes
  • Flexibility to easily add your own rendering extensions

Potential drawbacks:

  • Our benchmarks show v2 to be slightly slower than v1, currently by around 15%.
  • API breakage. If you can't afford to modify your code to adhere to the new API and don't care too much about the new features, v2 is probably not for you.
  • Several bug fixes are trailing behind and still need to be forward-ported to v2. See issue #348 for tracking.

Usage

For the most sensible markdown processing, it is as simple as getting your input into a byte slice and calling:

output := blackfriday.Run(input)

Your input will be parsed and the output rendered with a set of the most popular extensions enabled. If you want the most basic feature set, corresponding to the bare Markdown specification, use:

output := blackfriday.Run(input, blackfriday.WithNoExtensions())

Sanitize untrusted content

Blackfriday itself does nothing to protect against malicious content. If you are dealing with user-supplied markdown, we recommend running Blackfriday's output through an HTML sanitizer such as Bluemonday.

Here's an example of simple usage of Blackfriday together with Bluemonday:

import (
    "github.com/microcosm-cc/bluemonday"
    "gopkg.in/russross/blackfriday.v2"
)

// ...
// Render the untrusted Markdown first, then sanitize the resulting HTML.
unsafe := blackfriday.Run(input)
html := bluemonday.UGCPolicy().SanitizeBytes(unsafe)

Custom options

If you want to customize the set of options, use blackfriday.WithExtensions, blackfriday.WithRenderer and blackfriday.WithRefOverride.

You can also check out blackfriday-tool for a more complete example of how to use it. Download and install it using:

go get github.com/russross/blackfriday-tool

This is a simple command-line tool that allows you to process a markdown file using a standalone program. You can also browse the source directly on GitHub if you are just looking for some example code: https://github.com/russross/blackfriday-tool

Note that if you have not already done so, installing blackfriday-tool will be sufficient to download and install blackfriday in addition to the tool itself. The tool binary will be installed in $GOPATH/bin. This is a statically-linked binary that can be copied to wherever you need it without worrying about dependencies and library versions.

Features

All features of Sundown are supported, including:

  • Compatibility. The Markdown v1.0.3 test suite passes with the --tidy option. Without --tidy, the differences are mostly in whitespace and entity escaping, where blackfriday is more consistent and cleaner.

  • Common extensions, including table support, fenced code blocks, autolinks, strikethroughs, non-strict emphasis, etc.

  • Safety. Blackfriday is paranoid when parsing, making it safe to feed untrusted user input without fear of bad things happening. The test suite stress tests this and there are no known inputs that make it crash. If you find one, please let me know and send me the input that does it.

    NOTE: "safety" in this context means runtime safety only. To protect yourself against JavaScript injection in untrusted content, run the output through an HTML sanitizer, as described in the "Sanitize untrusted content" section above.

  • Fast processing. It is fast enough to render on-demand in most web applications without having to cache the output.

  • Thread safety. You can run multiple parsers in different goroutines without ill effect. There is no dependence on global shared state.

  • Minimal dependencies. Blackfriday only depends on standard library packages in Go. The source code is pretty self-contained, so it is easy to add to any project, including Google App Engine projects.

  • Standards compliant. Output successfully validates using the W3C validation tool for HTML 4.01 and XHTML 1.0 Transitional.

Extensions

In addition to the standard markdown syntax, this package implements the following extensions:

  • Intra-word emphasis suppression. The _ character is commonly used inside words when discussing code, so having markdown interpret it as an emphasis command is usually the wrong thing. Blackfriday lets you treat all emphasis markers as normal characters when they occur inside a word.

  • Tables. Tables can be created by drawing them in the input using a simple syntax:

    Name    | Age
    --------|------
    Bob     | 27
    Alice   | 23
    
  • Fenced code blocks. In addition to the normal 4-space indentation to mark code blocks, you can explicitly mark them and supply a language (to make syntax highlighting simple). Just mark it like this:

    ```go
    func getTrue() bool {
        return true
    }
    ```
    

    You can use 3 or more backticks to mark the beginning of the block, and the same number to mark the end of the block.

  • Definition lists. A simple definition list is made of a single-line term followed by a colon and the definition for that term.

    Cat
    : Fluffy animal everyone likes
    
    Internet
    : Vector of transmission for pictures of cats
    

    Terms must be separated from the previous definition by a blank line.

  • Footnotes. A marker in the text that will become a superscript number; a footnote definition that will be placed in a list of footnotes at the end of the document. A footnote looks like this:

    This is a footnote.[^1]
    
    [^1]: the footnote text.
    
  • Autolinking. Blackfriday can find URLs that have not been explicitly marked as links and turn them into links.

  • Strikethrough. Use two tildes (~~) to mark text that should be crossed out.

  • Hard line breaks. With this extension enabled, newlines in the input translate into line breaks in the output. This extension is off by default.

  • Smart quotes. Smartypants-style punctuation substitution is supported, turning normal double- and single-quote marks into curly quotes, etc.

  • LaTeX-style dash parsing is an additional option, where -- is translated into –, and --- is translated into —. This differs from most smartypants processors, which turn a single hyphen into an ndash and a double hyphen into an mdash.

  • Smart fractions, where anything that looks like a fraction is translated into suitable HTML (instead of just a few special cases like most smartypants processors). For example, 4/5 becomes <sup>4</sup>&frasl;<sub>5</sub>, which renders as ⁴⁄₅.
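The LaTeX-style dash and smart-fraction rules above are simple enough to approximate with the standard library. This is a rough sketch for illustration only: latexDashes and smartFractions are hypothetical helpers, and Blackfriday's own Smartypants renderer may apply additional conditions that this sketch does not reproduce.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// latexDashes maps "---" to an em dash and "--" to an en dash. NewReplacer
// compares old strings in argument order, so "---" wins over "--".
var latexDashes = strings.NewReplacer("---", "—", "--", "–")

// fraction matches digits, a slash, and digits, bounded by ASCII word boundaries.
var fraction = regexp.MustCompile(`\b(\d+)/(\d+)\b`)

// smartFractions rewrites n/d as <sup>n</sup>&frasl;<sub>d</sub>.
func smartFractions(s string) string {
	return fraction.ReplaceAllString(s, "<sup>$1</sup>&frasl;<sub>$2</sub>")
}

func main() {
	fmt.Println(latexDashes.Replace("1990--1995 --- a decade"))
	fmt.Println(smartFractions("add 4/5 cup")) // add <sup>4</sup>&frasl;<sub>5</sub> cup
}
```

Replacing "---" before "--" is what keeps an em dash from being split into an en dash plus a stray hyphen.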

Other renderers

Blackfriday is structured to allow alternative rendering engines. Here are a few of note:

  • github_flavored_markdown: provides a GitHub Flavored Markdown renderer with fenced code block highlighting and clickable heading anchor links.

    It's not customizable, and its goal is to produce HTML output equivalent to the GitHub Markdown API endpoint, except the rendering is performed locally.

  • markdownfmt: like gofmt, but for markdown.

  • LaTeX output: renders output as LaTeX.

Todo

  • More unit testing
  • Improve unicode support. It does not understand all unicode rules (about what constitutes a letter, a punctuation symbol, etc.), so it may fail to detect word boundaries correctly in some instances. It is safe on all utf-8 input.

License

Blackfriday is distributed under the Simplified BSD License.

Documentation

Overview

Package blackfriday is a markdown processor.

It translates plain text with simple formatting rules into an AST, which can then be further processed to HTML (provided by Blackfriday itself) or other formats (provided by the community).

The simplest way to invoke Blackfriday is to call the Run function. It takes text input and produces output in HTML (or another format).

A slightly more sophisticated way to use Blackfriday is to create a Markdown processor and to call Parse, which returns a syntax tree for the input document. You can leverage Blackfriday's parsing for content extraction from markdown documents. You can assign a custom renderer and set various options to the Markdown processor.

If you're interested in calling Blackfriday from command line, see https://github.com/russross/blackfriday-tool.


Constants

const (
	NoExtensions           Extensions = 0
	NoIntraEmphasis        Extensions = 1 << iota // Ignore emphasis markers inside words
	Tables                                        // Render tables
	FencedCode                                    // Render fenced code blocks
	Autolink                                      // Detect embedded URLs that are not explicitly marked
	Strikethrough                                 // Strikethrough text using ~~test~~
	LaxHTMLBlocks                                 // Loosen up HTML block parsing rules
	SpaceHeadings                                 // Be strict about prefix heading rules
	HardLineBreak                                 // Translate newlines into line breaks
	TabSizeEight                                  // Expand tabs to eight spaces instead of four
	Footnotes                                     // Pandoc-style footnotes
	NoEmptyLineBeforeBlock                        // No need to insert an empty line to start a (code, quote, ordered list, unordered list) block
	HeadingIDs                                    // specify heading IDs  with {#id}
	Titleblock                                    // Titleblock ala pandoc
	AutoHeadingIDs                                // Create the heading ID from the text
	BackslashLineBreak                            // Translate trailing backslashes into line breaks
	DefinitionLists                               // Render definition lists

	CommonHTMLFlags HTMLFlags = UseXHTML | Smartypants |
		SmartypantsFractions | SmartypantsDashes | SmartypantsLatexDashes

	CommonExtensions Extensions = NoIntraEmphasis | Tables | FencedCode |
		Autolink | Strikethrough | SpaceHeadings | HeadingIDs |
		BackslashLineBreak | DefinitionLists
)

These are the supported markdown parsing extensions. OR these values together to select multiple extensions.
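Because each extension occupies its own bit, a set is built with |, tested with &, and a single extension can be dropped with Go's AND NOT operator &^. The sketch below declares a local, trimmed-down copy of a few constants purely for illustration; it does not import blackfriday, and its numeric values are not guaranteed to match the package's.

```go
package main

import "fmt"

// Extensions mirrors blackfriday's bit-flag type; these constants are a
// local, illustrative subset, not the package's actual values.
type Extensions int

const (
	NoExtensions    Extensions = 0
	NoIntraEmphasis Extensions = 1 << iota // each following constant takes the next bit
	Tables
	FencedCode
	Autolink
	Footnotes
)

// has reports whether every bit in want is set in exts.
func has(exts, want Extensions) bool {
	return exts&want == want
}

func main() {
	// OR flags together to enable several extensions at once.
	exts := NoIntraEmphasis | Tables | FencedCode | Autolink

	exts |= Footnotes // add one more extension
	exts &^= Autolink // drop one with AND NOT

	fmt.Println(has(exts, Tables|FencedCode)) // true
	fmt.Println(has(exts, Autolink))          // false
}
```

With the real package, the same expressions apply to values such as blackfriday.CommonExtensions before passing them to WithExtensions.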

const (
	TabSizeDefault = 4
	TabSizeDouble  = 8
)

The size of a tab stop.
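Expanding to a tab stop usually means advancing to the next column that is a multiple of the tab size, rather than inserting a fixed number of spaces. The following is a minimal column-aware sketch of that idea; expandTabs is a hypothetical helper (not Blackfriday's own function), it assumes single-line input, and it counts one column per rune.

```go
package main

import (
	"fmt"
	"strings"
)

// Tab stop sizes, mirroring TabSizeDefault and TabSizeDouble above.
const (
	TabSizeDefault = 4
	TabSizeDouble  = 8
)

// expandTabs replaces each tab with enough spaces to reach the next tab stop.
// Columns are counted in runes, so multi-byte UTF-8 characters take one column.
func expandTabs(line string, tabSize int) string {
	var b strings.Builder
	column := 0
	for _, r := range line {
		if r == '\t' {
			spaces := tabSize - column%tabSize
			b.WriteString(strings.Repeat(" ", spaces))
			column += spaces
			continue
		}
		b.WriteRune(r)
		column++
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", expandTabs("a\tb", TabSizeDefault))  // "a   b"
	fmt.Printf("%q\n", expandTabs("\tcode", TabSizeDouble)) // "        code"
}
```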

const Version = "2.0"

Version string of the package. Appears in the rendered document when the CompletePage flag is on.

Variables

This section is empty.

Functions

func Run

func Run(input []byte, opts ...Option) []byte

Run is the main entry point to Blackfriday. It parses and renders a block of markdown-encoded text.

The simplest invocation of Run takes one argument, input:

output := Run(input)

This will parse the input with CommonExtensions enabled and render it with the default HTMLRenderer (with CommonHTMLFlags).

Variadic arguments opts can customize the default behavior. Since the Markdown type does not contain exported fields, you cannot use it directly. Instead, use the With* functions. For example, this will call the most basic functionality, with no extensions:

output := Run(input, WithNoExtensions())

You can use any number of With* arguments, even contradicting ones. They will be applied in order of appearance and the latter will override the former:

output := Run(input, WithNoExtensions(), WithExtensions(exts),
    WithRenderer(yourRenderer))
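The "later overrides earlier" rule follows from the functional-options pattern: each Option is a function that mutates the processor, and the options run left to right. The sketch below reproduces that pattern with a local, simplified Markdown type and a hypothetical default extension set; it does not call into Blackfriday, whose real Markdown type has unexported fields and more options.

```go
package main

import "fmt"

// Extensions is a local stand-in for blackfriday's bit-flag type.
type Extensions int

const (
	NoExtensions Extensions = 0
	Tables       Extensions = 1 << 1
	FencedCode   Extensions = 1 << 2
)

// Markdown is a simplified, hypothetical processor holding only an extension set.
type Markdown struct {
	extensions Extensions
}

// Option mutates a Markdown processor, as in the documented API.
type Option func(*Markdown)

func WithExtensions(e Extensions) Option {
	return func(m *Markdown) { m.extensions = e }
}

func WithNoExtensions() Option {
	return func(m *Markdown) { m.extensions = NoExtensions }
}

// New starts from a default configuration and applies opts in order,
// so a later option overrides an earlier one.
func New(opts ...Option) *Markdown {
	m := &Markdown{extensions: Tables | FencedCode} // assumed default for this sketch
	for _, opt := range opts {
		opt(m)
	}
	return m
}

func main() {
	fmt.Println(New().extensions == Tables|FencedCode)                                // true
	fmt.Println(New(WithNoExtensions(), WithExtensions(Tables)).extensions == Tables) // true
	fmt.Println(New(WithExtensions(Tables), WithNoExtensions()).extensions == 0)      // true
}
```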

Types

type CellAlignFlags

type CellAlignFlags int

CellAlignFlags holds a type of alignment in a table cell.

const (
	TableAlignmentLeft CellAlignFlags = 1 << iota
	TableAlignmentRight
	TableAlignmentCenter = (TableAlignmentLeft | TableAlignmentRight)
)

These are the possible flag values for the table cell renderer. Only a single one of these values will be used; they are not ORed together. These are mostly of interest if you are writing a new output format.
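Because TableAlignmentCenter is defined as Left | Right, an alignment value can be built one bit at a time. The sketch below assumes the GitHub-style delimiter-row convention (:--- left, ---: right, :---: center), which the table example earlier does not show; alignmentOf is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// CellAlignFlags mirrors the documented type and constant layout.
type CellAlignFlags int

const (
	TableAlignmentLeft CellAlignFlags = 1 << iota
	TableAlignmentRight
	TableAlignmentCenter = (TableAlignmentLeft | TableAlignmentRight)
)

// alignmentOf derives alignment from one delimiter-row cell such as
// ":---", "---:", ":---:" or "---". Zero means no explicit alignment.
func alignmentOf(cell string) CellAlignFlags {
	cell = strings.TrimSpace(cell)
	var align CellAlignFlags
	if strings.HasPrefix(cell, ":") {
		align |= TableAlignmentLeft
	}
	if strings.HasSuffix(cell, ":") {
		align |= TableAlignmentRight
	}
	return align
}

func main() {
	for _, cell := range strings.Split(":---|---:|:---:|---", "|") {
		fmt.Println(cell, alignmentOf(cell))
	}
}
```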

type CodeBlockData

type CodeBlockData struct {
	IsFenced    bool   // Specifies whether it's a fenced code block or an indented one
	Info        []byte // This holds the info string
	FenceChar   byte
	FenceLength int
	FenceOffset int
}

CodeBlockData contains fields relevant to a CodeBlock node type.
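FenceChar, FenceLength and Info correspond to the parts of an opening fence line such as ```go. Here is a hedged sketch of splitting such a line into those parts and matching the closing fence with the "same number of markers" rule from the Fenced code blocks section. parseFence and closesFence are hypothetical helpers; the sketch assumes both backtick and tilde fences and ignores the indentation that FenceOffset records.

```go
package main

import (
	"fmt"
	"strings"
)

// fence describes an opening fence line, loosely mirroring CodeBlockData.
type fence struct {
	Char   byte   // '`' or '~'
	Length int    // number of fence characters (at least 3)
	Info   string // info string after the fence, e.g. "go"
}

// parseFence reports whether line opens a fenced code block and, if so,
// returns its fence character, length and info string.
func parseFence(line string) (fence, bool) {
	line = strings.TrimLeft(line, " ")
	if line == "" || (line[0] != '`' && line[0] != '~') {
		return fence{}, false
	}
	ch := line[0]
	n := 0
	for n < len(line) && line[n] == ch {
		n++
	}
	if n < 3 {
		return fence{}, false
	}
	return fence{Char: ch, Length: n, Info: strings.TrimSpace(line[n:])}, true
}

// closesFence reports whether line closes the block opened by f: the same
// character repeated exactly f.Length times, surrounded only by spaces.
func closesFence(line string, f fence) bool {
	line = strings.Trim(line, " ")
	return line == strings.Repeat(string(f.Char), f.Length)
}

func main() {
	f, ok := parseFence("```go")
	fmt.Println(ok, string(f.Char), f.Length, f.Info) // true ` 3 go
	fmt.Println(closesFence("```", f))                // true
	fmt.Println(closesFence("````", f))               // false
}
```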

type Extensions

type Extensions int

Extensions is a bitwise OR'ed collection of Blackfriday's enabled extensions.

type HTMLFlags

type HTMLFlags int

HTMLFlags control optional behavior of the HTML renderer.

const (
	HTMLFlagsNone           HTMLFlags = 0
	SkipHTML                HTMLFlags = 1 << iota // Skip preformatted HTML blocks
	SkipImages                                    // Skip embedded images
	SkipLinks                                     // Skip all links
	Safelink                                      // Only link to trusted protocols
	NofollowLinks                                 // Only link with rel="nofollow"
	NoreferrerLinks                               // Only link with rel="noreferrer"
	HrefTargetBlank                               // Add a blank target
	CompletePage                                  // Generate a complete HTML page
	UseXHTML                                      // Generate XHTML output instead of HTML
	FootnoteReturnLinks                           // Generate a link at the end of a footnote to return to the source
	Smartypants                                   // Enable smart punctuation substitutions
	SmartypantsFractions                          // Enable smart fractions (with Smartypants)
	SmartypantsDashes                             // Enable smart dashes (with Smartypants)
	SmartypantsLatexDashes                        // Enable LaTeX-style dashes (with Smartypants)
	SmartypantsAngledQuotes                       // Enable angled double quotes (with Smartypants) for double quotes rendering
	SmartypantsQuotesNBSP                         // Enable « French guillemets » (with Smartypants)
	TOC                                           // Generate a table of contents
)

HTML renderer configuration options.

type HTMLRenderer

type HTMLRenderer struct {
	HTMLRendererParameters
	// contains filtered or unexported fields
}

HTMLRenderer is a type that implements the Renderer interface for HTML output.

Do not create this directly, instead use the NewHTMLRenderer function.

func NewHTMLRenderer

func NewHTMLRenderer(params HTMLRendererParameters) *HTMLRenderer

NewHTMLRenderer creates and configures an HTMLRenderer object, which satisfies the Renderer interface.

func (*HTMLRenderer) RenderFooter

func (r *HTMLRenderer) RenderFooter(w io.Writer, ast *Node)

RenderFooter writes the HTML document footer.

func (*HTMLRenderer) RenderHeader

func (r *HTMLRenderer) RenderHeader(w io.Writer, ast *Node)

RenderHeader writes the HTML document preamble and, if requested, the table of contents (TOC).

func (*HTMLRenderer) RenderNode

func (r *HTMLRenderer) RenderNode(w io.Writer, node *Node, entering bool) WalkStatus

RenderNode is the default renderer of a single node of a syntax tree. For container (non-leaf) nodes it is called twice: first with entering=true, then with entering=false, so that it knows whether it is writing an opening or a closing tag. Leaf nodes are rendered in a single call with entering=true. It writes the result to w.

The return value is a way to tell the calling walker to adjust its walk pattern: e.g. it can terminate the traversal by returning Terminate. Or it can ask the walker to skip a subtree of this node by returning SkipChildren. The typical behavior is to return GoToNext, which asks for the usual traversal to the next node.

type HTMLRendererParameters

type HTMLRendererParameters struct {
	// Prepend this text to each relative URL.
	AbsolutePrefix string
	// Add this text to each footnote anchor, to ensure uniqueness.
	FootnoteAnchorPrefix string
	// Show this text inside the <a> tag for a footnote return link, if the
	// FootnoteReturnLinks flag is enabled. If blank, the string
	// <sup>[return]</sup> is used.
	FootnoteReturnLinkContents string
	// If set, add this text to the front of each Heading ID, to ensure
	// uniqueness.
	HeadingIDPrefix string
	// If set, add this text to the back of each Heading ID, to ensure uniqueness.
	HeadingIDSuffix string

	Title string // Document title (used if CompletePage is set)
	CSS   string // Optional CSS file URL (used if CompletePage is set)
	Icon  string // Optional icon file URL (used if CompletePage is set)

	Flags HTMLFlags // Flags allow customizing this renderer's behavior
}

HTMLRendererParameters is a collection of supplementary parameters tweaking the behavior of various parts of the HTML renderer.
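As the struct comments describe, HeadingIDPrefix and HeadingIDSuffix are added to the front and back of each heading ID. The sketch below combines that with an assumed slug rule for auto-generated IDs (letters and digits lowercased, every run of other characters collapsed into one '-'). Blackfriday's exact slug algorithm may differ, and slug and headingID are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slug lowercases letters and digits and collapses every run of other
// characters into a single '-', with no leading or trailing dash.
// This is an assumed rule for illustration only.
func slug(text string) string {
	var b strings.Builder
	pendingDash := false
	for _, r := range text {
		if unicode.IsLetter(r) || unicode.IsNumber(r) {
			if pendingDash && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingDash = false
			b.WriteRune(unicode.ToLower(r))
		} else {
			pendingDash = true
		}
	}
	return b.String()
}

// headingID decorates an ID the way HeadingIDPrefix/HeadingIDSuffix are described.
func headingID(id, prefix, suffix string) string {
	return prefix + id + suffix
}

func main() {
	fmt.Println(headingID(slug("Getting Started: v2!"), "doc-", "")) // doc-getting-started-v2
}
```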

type HeadingData

type HeadingData struct {
	Level        int    // This holds the heading level number
	HeadingID    string // This might hold heading ID, if present
	IsTitleblock bool   // Specifies whether it's a title block
}

HeadingData contains fields relevant to a Heading node type.

type LinkData

type LinkData struct {
	Destination []byte // Destination is what goes into a href
	Title       []byte // Title is the tooltip thing that goes in a title attribute
	NoteID      int    // NoteID contains a serial number of a footnote, zero if it's not a footnote
	Footnote    *Node  // If it's a footnote, this is a direct link to the footnote Node. Otherwise nil.
}

LinkData contains fields relevant to a Link node type.

type ListData

type ListData struct {
	ListFlags       ListType
	Tight           bool   // Skip <p>s around list item data if true
	BulletChar      byte   // '*', '+' or '-' in bullet lists
	Delimiter       byte   // '.' or ')' after the number in ordered lists
	RefLink         []byte // If not nil, turns this list item into a footnote item and triggers different rendering
	IsFootnotesList bool   // This is a list of footnotes
}

ListData contains fields relevant to a List and Item node type.

type ListType

type ListType int

ListType contains bitwise or'ed flags for list and list item objects.

const (
	ListTypeOrdered ListType = 1 << iota
	ListTypeDefinition
	ListTypeTerm

	ListItemContainsBlock
	ListItemBeginningOfList // TODO: figure out if this is of any use now
	ListItemEndOfList
)

These are the possible flag values for the ListItem renderer. Multiple flag values may be ORed together. These are mostly of interest if you are writing a new output format.

type Markdown

type Markdown struct {
	// contains filtered or unexported fields
}

Markdown is a type that holds the extensions, the renderer, and the runtime state used by Parse. You cannot use it directly; construct it with New.

func New

func New(opts ...Option) *Markdown

New constructs a Markdown processor. You can use the same With* functions as for Run() to customize the parser's behavior and the renderer.

func (*Markdown) Parse

func (p *Markdown) Parse(input []byte) *Node

Parse is an entry point to the parsing part of Blackfriday. It takes an input markdown document and produces a syntax tree for its contents. This tree can then be rendered with a default or custom renderer, or analyzed/transformed by the caller to whatever non-standard needs they have. The return value is the root node of the syntax tree.

type Node

type Node struct {
	Type       NodeType // Determines the type of the node
	Parent     *Node    // Points to the parent
	FirstChild *Node    // Points to the first child, if any
	LastChild  *Node    // Points to the last child, if any
	Prev       *Node    // Previous sibling; nil if it's the first child
	Next       *Node    // Next sibling; nil if it's the last child

	Literal []byte // Text contents of the leaf nodes

	HeadingData   // Populated if Type is Heading
	ListData      // Populated if Type is List
	CodeBlockData // Populated if Type is CodeBlock
	LinkData      // Populated if Type is Link
	TableCellData // Populated if Type is TableCell
	// contains filtered or unexported fields
}

Node is a single element in the abstract syntax tree of the parsed document. It holds connections to the structurally neighboring nodes and, for certain types of nodes, additional information that might be needed when rendering.

func NewNode

func NewNode(typ NodeType) *Node

NewNode allocates a node of a specified type.

func (*Node) AppendChild

func (n *Node) AppendChild(child *Node)

AppendChild adds a node 'child' as a child of 'n'. It panics if either node is nil.

func (*Node) InsertBefore

func (n *Node) InsertBefore(sibling *Node)

InsertBefore inserts 'sibling' immediately before 'n'. It panics if either node is nil.

func (*Node) String

func (n *Node) String() string

func (*Node) Unlink

func (n *Node) Unlink()

Unlink removes node 'n' from the tree. It panics if the node is nil.

func (*Node) Walk

func (n *Node) Walk(visitor NodeVisitor)

Walk is a convenience method that instantiates a walker and starts a traversal of the subtree rooted at n.

type NodeType

type NodeType int

NodeType specifies a type of a single node of a syntax tree. Usually one node (and its type) corresponds to a single markdown feature, e.g. emphasis or code block.

const (
	Document NodeType = iota
	BlockQuote
	List
	Item
	Paragraph
	Heading
	HorizontalRule
	Emph
	Strong
	Del
	Link
	Image
	Text
	HTMLBlock
	CodeBlock
	Softbreak
	Hardbreak
	Code
	HTMLSpan
	Table
	TableCell
	TableHead
	TableBody
	TableRow
)

Constants for identifying different types of nodes. See NodeType.

func (NodeType) String

func (t NodeType) String() string

type NodeVisitor

type NodeVisitor func(node *Node, entering bool) WalkStatus

NodeVisitor is a callback to be called when traversing the syntax tree. It is called twice for every container (non-leaf) node: once with entering=true when the branch is first visited, then with entering=false after all the children are done. Leaf nodes are visited once, with entering=true.

type Option

type Option func(*Markdown)

Option customizes the Markdown processor's default behavior.

func WithExtensions

func WithExtensions(e Extensions) Option

WithExtensions allows you to pick some of the many extensions provided by Blackfriday. You can bitwise OR them.

func WithNoExtensions

func WithNoExtensions() Option

WithNoExtensions turns off all extensions and custom behavior.

func WithRefOverride

func WithRefOverride(o ReferenceOverrideFunc) Option

WithRefOverride sets an optional function callback that is called every time a reference is resolved.

In Markdown, the link reference syntax can be made to resolve a link to a reference instead of an inline URL, in one of the following ways:

  • [link text][refid]
  • [refid][]

Usually, the refid is defined at the bottom of the Markdown document. If this override function is provided, the refid is passed to the override function first, before consulting the defined refids at the bottom. If the override function indicates an override did not occur, the refids at the bottom will be used to fill in the link details.
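The lookup order described above can be sketched as follows: consult the override first, and fall back to the document's own definitions only when it reports overridden == false. Reference and ReferenceOverrideFunc copy the documented shapes; resolve, the refids and the URLs are hypothetical. The sketch lowercases document refids because Markdown reference IDs are case-insensitive, and it treats an override that returns (nil, true) as unresolved.

```go
package main

import (
	"fmt"
	"strings"
)

// Reference and ReferenceOverrideFunc copy the documented shapes.
type Reference struct {
	Link  string
	Title string
	Text  string
}

type ReferenceOverrideFunc func(reference string) (ref *Reference, overridden bool)

// resolve asks the override first and only falls back to the document's
// refs (keys stored lowercased) when no override occurred.
func resolve(refid string, override ReferenceOverrideFunc, docRefs map[string]*Reference) (*Reference, bool) {
	if override != nil {
		if ref, overridden := override(refid); overridden {
			return ref, ref != nil
		}
	}
	ref, ok := docRefs[strings.ToLower(refid)]
	return ref, ok
}

func main() {
	docRefs := map[string]*Reference{
		"home": {Link: "https://example.com/", Title: "Home"},
		"docs": {Link: "https://example.com/docs"},
	}
	// Hypothetical override: redirect "docs", leave every other refid alone.
	override := func(refid string) (*Reference, bool) {
		if refid == "docs" {
			return &Reference{Link: "https://docs.example.org/"}, true
		}
		return nil, false
	}
	for _, id := range []string{"docs", "Home", "missing"} {
		if ref, ok := resolve(id, override, docRefs); ok {
			fmt.Println(id, "->", ref.Link)
		} else {
			fmt.Println(id, "-> unresolved")
		}
	}
}
```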

func WithRenderer

func WithRenderer(r Renderer) Option

WithRenderer allows you to override the default renderer.

type Reference

type Reference struct {
	// Link is usually the URL the reference points to.
	Link string
	// Title is the alternate text describing the link in more detail.
	Title string
	// Text is the optional text to override the ref with if the syntax used was
	// [refid][]
	Text string
}

Reference represents the details of a link. See the documentation for WithRefOverride for more details on the use case.

type ReferenceOverrideFunc

type ReferenceOverrideFunc func(reference string) (ref *Reference, overridden bool)

ReferenceOverrideFunc is expected to be called with a reference string and to return either a valid Reference that the reference string maps to, or nil. If overridden is false, the default reference logic will be executed. See the documentation for WithRefOverride for more details on the use case.

type Renderer

type Renderer interface {
	// RenderNode is the main rendering method. It will be called once for
	// every leaf node and twice for every non-leaf node (first with
	// entering=true, then with entering=false). The method should write its
	// rendition of the node to the supplied writer w.
	RenderNode(w io.Writer, node *Node, entering bool) WalkStatus

	// RenderHeader is a method that allows the renderer to produce some
	// content preceding the main body of the output document. The header is
	// understood in the broad sense here. For example, the default HTML
	// renderer will write not only the HTML document preamble, but also the
	// table of contents if it was requested.
	//
	// The method will be passed an entire document tree, in case a particular
	// implementation needs to inspect it to produce output.
	//
	// The output should be written to the supplied writer w. If your
	// implementation has no header to write, supply an empty implementation.
	RenderHeader(w io.Writer, ast *Node)

	// RenderFooter is a symmetric counterpart of RenderHeader.
	RenderFooter(w io.Writer, ast *Node)
}

Renderer is the rendering interface. This is mostly of interest if you are implementing a new rendering format.

Only an HTML implementation is provided in this repository, see the README for external implementations.

type SPRenderer

type SPRenderer struct {
	// contains filtered or unexported fields
}

SPRenderer is a struct containing state of a Smartypants renderer.

func NewSmartypantsRenderer

func NewSmartypantsRenderer(flags HTMLFlags) *SPRenderer

NewSmartypantsRenderer constructs a Smartypants renderer object.

func (*SPRenderer) Process

func (r *SPRenderer) Process(w io.Writer, text []byte)

Process is the entry point of the Smartypants renderer.

type TableCellData

type TableCellData struct {
	IsHeader bool           // This tells if it's under the header row
	Align    CellAlignFlags // This holds the value for align attribute
}

TableCellData contains fields relevant to a TableCell node type.

type WalkStatus

type WalkStatus int

WalkStatus allows NodeVisitor to have some control over the tree traversal. It is returned from NodeVisitor and different values allow Node.Walk to decide which node to go to next.

const (
	// GoToNext is the default traversal of every node.
	GoToNext WalkStatus = iota
	// SkipChildren tells walker to skip all children of current node.
	SkipChildren
	// Terminate tells walker to terminate the traversal.
	Terminate
)
