Documentation ¶
Index ¶
- Constants
- Variables
- func GetTokenizers() map[string]Tokenizer
- func Register(name string, t Tokenizer)
- type Emitter
- type Filter
- type FilterFunc
- type Filters
- type IncludeRule
- type Lexer
- func (l Lexer) AcceptsFilename(name string) (bool, error)
- func (l Lexer) AcceptsMediaType(media string) (bool, error)
- func (l Lexer) Format(r *bufio.Reader, emit func(Token) error) error
- func (l Lexer) ListFilenames() []string
- func (l Lexer) ListMediaTypes() []string
- func (l Lexer) Tokenize(br *bufio.Reader, emit func(Token) error) error
- func (l Lexer) TokenizeString(s string) ([]Token, error)
- type RegexpRule
- type Rule
- type RuleSpec
- type Stack
- type State
- type StateMap
- type States
- type StatesSpec
- type Token
- type TokenType
- type Tokenizer
Constants ¶
const (
	// Error, emitted when unexpected token was encountered.
	Error TokenType = "error"

	// Comment e.g. `// this should never happen`
	Comment = "comment"

	// Number - e.g. `2716057` in `"serial": 2716057` or `serial = 2716057;`
	Number = "number"

	// String - e.g. `Fry` in `"name": "Fry"` or `var name = "Fry";`
	String = "string"

	// Text - e.g. `Fry` in `<p>Fry</p>`
	Text = "text"

	// Attribute - e.g. `name` in `"name": "Fry"`, or `font-size` in
	// `font-size: 1.2rem;`
	Attribute = "attribute"

	// Assignment - e.g. `=` in `int x = y;` or `:` in `font-size: 1.2rem;`
	Assignment = "assignment"

	// Operator - e.g. `+`/`-` in `int x = a + b - c;`
	Operator = "operator"

	// Punctuation - e.g. semi/colons in `int x, j;`
	Punctuation = "punctuation"

	// Literal - e.g. `true`/`false`/`null`.
	Literal = "literal"

	// Tag - e.g. `html`/`div`/`b`
	Tag = "tag"

	// Whitespace - e.g. \n, \t
	Whitespace = "whitespace"
)
Variables ¶
var EndToken = Token{}
var MergeTokensFilter = FilterFunc(
	func(out func(Token) error) func(Token) error {
		curr := Token{}
		return func(t Token) error {
			if t.Type == "" {
				out(curr)
				return io.EOF
			} else if t.Type == curr.Type {
				curr.Value += t.Value
				return nil
			} else if curr.Value != "" {
				out(curr)
			}
			curr = Token{
				Value: t.Value,
				Type:  t.Type,
				State: t.State,
			}
			return nil
		}
	})
MergeTokensFilter combines Tokens if they have the same type.
var PassthroughFilter = FilterFunc(
	func(out func(Token) error) func(Token) error {
		return func(t Token) error {
			return out(t)
		}
	})
PassthroughFilter simply emits each token to the output without modification.
var RemoveEmptiesFilter = FilterFunc(
	func(out func(Token) error) func(Token) error {
		return func(t Token) error {
			if t == EndToken || t.Value != "" {
				return out(t)
			}
			return nil
		}
	})
RemoveEmptiesFilter removes empty (zero-length) tokens from the output.
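As a sketch of how the built-in filters are typically combined (assuming the package is imported under the name `lexer`; the import path is not shown on this page), they are plain values and can be placed into a Filters chain, for example on a Lexer's Filters field:

	// Drop zero-length tokens first, then coalesce adjacent tokens of the
	// same type. The `lexer` package name is an assumption.
	filters := lexer.Filters{
		lexer.RemoveEmptiesFilter,
		lexer.MergeTokensFilter,
	}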
Functions ¶
func GetTokenizers ¶
GetTokenizers returns the map of known Tokenizers.
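A short sketch of walking the registry (again assuming the package is imported as `lexer`); tokenizers added via Register appear in this map under the name they were registered with:

	for name, t := range lexer.GetTokenizers() {
		// Print each registered tokenizer and the filename patterns it advertises.
		fmt.Printf("%s: %v\n", name, t.ListFilenames())
	}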
Types ¶
type Filter ¶
type Filter interface {
	// Filter reads tokens from `in` and outputs tokens to `out`, typically
	// modifying or filtering them along the way. The function should return
	// as soon as the input is exhausted (i.e. the channel is closed), or an
	// error is encountered.
	Filter(out func(Token) error) func(Token) error
}
Filter describes a type that is capable of filtering/processing tokens.
type FilterFunc ¶
FilterFunc is a helper type allowing filter functions to be used as filters.
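For illustration, a custom filter can be written as an ordinary function and wrapped in FilterFunc, following the same shape as the built-in filters above. A hedged sketch (the `lexer` package name and the strings import are assumptions):

	// UppercaseFilter is a hypothetical filter that upper-cases every
	// token's value before passing it on to the next stage.
	var UppercaseFilter = lexer.FilterFunc(
		func(out func(lexer.Token) error) func(lexer.Token) error {
			return func(t lexer.Token) error {
				t.Value = strings.ToUpper(t.Value)
				return out(t)
			}
		})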
type Filters ¶
type Filters []Filter
func (Filters) Filter ¶
Filter runs the input through each filter in series, emitting the final result to `out`. This function will return as soon as the last token has been processed, or if an error is encountered by one of the filters.
It is safe to close the output channel as soon as this function returns.
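A sketch of driving a chain directly, assuming Filters.Filter has the same shape as the Filter interface above: wrap the final output function, then feed tokens into the function it returns.

	// Build the chain's entry point around a final output function.
	sink := lexer.Filters{
		lexer.RemoveEmptiesFilter,
		lexer.MergeTokensFilter,
	}.Filter(func(t lexer.Token) error {
		fmt.Printf("%s %q\n", t.Type, t.Value)
		return nil
	})

	// Feed tokens into the chain. Sending EndToken last appears to flush any
	// buffered output (an inference from MergeTokensFilter above).
	sink(lexer.Token{Type: lexer.String, Value: "Fr"})
	sink(lexer.Token{Type: lexer.String, Value: "y"})
	sink(lexer.EndToken)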
type IncludeRule ¶
IncludeRule allows the states of another Rule to be referenced.
func (IncludeRule) Stack ¶
func (r IncludeRule) Stack() []string
type Lexer ¶
type Lexer struct {
	Name      string
	States    States
	Filters   Filters
	Formatter Filter
	Filenames []string
	MimeTypes []string
}
Lexer defines a simple state-based lexer.
func (Lexer) AcceptsFilename ¶
AcceptsFilename returns true if this Lexer thinks it is suitable for the given filename. An error will be returned iff an invalid filename pattern is registered by the Lexer.
func (Lexer) AcceptsMediaType ¶
AcceptsMediaType returns true if this Lexer thinks it is suitable for the given media (MIME) type. An error will be returned iff the given mime type is invalid.
func (Lexer) ListFilenames ¶
ListFilenames lists the filename patterns this Lexer supports, e.g. ["*.json"]
func (Lexer) ListMediaTypes ¶
ListMediaTypes lists the media types this Lexer supports, e.g. ["application/json"]
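A sketch of typical use, assuming `l` is an already-configured Lexer (its States, Filters and Filenames populated elsewhere) and the `lexer` package name as above:

	ok, err := l.AcceptsFilename("futurama.json")
	if err != nil {
		log.Fatal(err)
	}
	if ok {
		// TokenizeString collects all tokens for a small in-memory input.
		tokens, err := l.TokenizeString(`{"name": "Fry"}`)
		if err != nil {
			log.Fatal(err)
		}
		for _, t := range tokens {
			fmt.Printf("%s %q\n", t.Type, t.Value)
		}
	}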
type RegexpRule ¶
type RegexpRule struct {
	Regexp     *regexp.Regexp
	Type       TokenType
	SubTypes   []TokenType
	NextStates []string
}
RegexpRule matches a state if the subject matches a regular expression.
func NewRegexpRule ¶
func NewRegexpRule(re string, t TokenType, subTypes []TokenType, next []string) RegexpRule
NewRegexpRule creates a new regular expression Rule.
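As a hedged sketch (the exact anchoring behaviour of the expressions is not described on this page), a rule that recognises a run of digits as a Number and stays in the current state might look like:

	// Hypothetical rule: digits become a Number token; nil SubTypes because
	// the expression has no groups, and nil next states to stay put.
	digits := lexer.NewRegexpRule(`[0-9]+`, lexer.Number, nil, nil)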
func (RegexpRule) Find ¶
func (r RegexpRule) Find(subject string) (int, Rule)
Find returns the first position in subject where this Rule will match, or -1 if no match was found.
func (RegexpRule) Match ¶
Match attempts to match against the beginning of the given search string. Returns the number of characters matched, and an array of tokens.
If the regular expression contains groups, they will be matched with the corresponding token type in `Rule.SubTypes`. Any text in between groups will be returned using the token type defined by `Rule.Type`.
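For example, a rule for a JSON-style `"key": "value"` pair could give each group its own type, with the text between the groups falling back to the rule's own Type. A sketch, with names assumed as above:

	// Two groups: the key becomes an Attribute token and the value a String
	// token; the `: ` between them is emitted using the rule's Type.
	pair := lexer.NewRegexpRule(
		`("[^"]*"): ("[^"]*")`,
		lexer.Punctuation,
		[]lexer.TokenType{lexer.Attribute, lexer.String},
		nil)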
func (RegexpRule) Stack ¶
func (r RegexpRule) Stack() []string
type RuleSpec ¶
type RuleSpec struct {
	// Regexp is the regular expression this rule should match against.
	Regexp string

	// Type is the token type for strings that match this rule.
	Type TokenType

	// SubTypes contains an ordered array of token types matching the order
	// of groups in the Regexp expression.
	SubTypes []TokenType

	// State indicates the next state to migrate to if this rule is
	// triggered.
	State string

	// Include specifies a state to run
	Include string
}
RuleSpec describes the conditions required to match some subject text.
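A minimal sketch of a specification entry (the "string" state name is hypothetical): an opening double quote emits a String token and moves the lexer into a dedicated string state.

	open := lexer.RuleSpec{
		Regexp: `"`,
		Type:   lexer.String,
		State:  "string",
	}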
type Stack ¶
type Stack []string
Stack is a simple stack of string values.
type State ¶
type State []Rule
State is a list of matching Rules.
func (State) Find ¶
Find examines the provided string, looking for a match within the current state. It returns the position `n` at which a rule match was found, and the rule itself.
-1 will be returned if no rule could be matched, in which case the caller should disregard the string entirely (emit it as an error), and continue on to the next line of input.
0 will be returned if a rule matches at the start of the string.
Otherwise, this function will return a number of characters to skip before reaching the first matched rule. The caller should emit those first `n` characters as an error, and emit the remaining characters according to the rule.
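A hypothetical illustration of the three cases, for a state whose only rule matches a run of digits:

	// state.Find("123;")  ->  0, rule   // rule matches at the start
	// state.Find("ab12")  ->  2, rule   // emit "ab" as an error, then apply the rule
	// state.Find("abcd")  -> -1, nil    // no rule matches anywhere in the string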
func (State) Match ¶
Match tests the subject text against all rules within the State. If a match is found, it returns the number of characters consumed, a series of tokens consumed from the subject text, and the specific Rule that was successfully matched against.
If the start of the subject text can not be matched against any known rule, it will return a position of -1 and a nil Rule.
type StatesSpec ¶
StatesSpec is a container for Lexer rule specifications, and can be compiled into a full state machine.
func (StatesSpec) Compile ¶
func (m StatesSpec) Compile() (States, error)
Compile compiles the specified states into a complete State machine, returning an error if any state fails to compile for any reason.
func (StatesSpec) Get ¶
func (m StatesSpec) Get(name string) State
func (StatesSpec) MustCompile ¶
func (m StatesSpec) MustCompile() States
MustCompile is a helper method that compiles the State specification, panicking on error.
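A sketch of both entry points, assuming `spec` is a populated StatesSpec (its literal form is not shown on this page) and the `lexer` package name as above:

	// Compile when a failure should be handled at runtime.
	states, err := spec.Compile()
	if err != nil {
		log.Fatal(err)
	}
	l := lexer.Lexer{Name: "example", States: states}

	// MustCompile when the specification is static and a failure is a
	// programming error, e.g. while initialising a package-level Lexer.
	l = lexer.Lexer{Name: "example", States: spec.MustCompile()}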
type Token ¶
Token represents one item of parsed output, containing the parsed value and its detected type.
type Tokenizer ¶
type Tokenizer interface {
	// Tokenize reads from the given input and emits tokens to the output
	// channel. Will end on any error from the reader, including io.EOF to
	// signify the end of input.
	Tokenize(*bufio.Reader, func(Token) error) error

	// Format behaves exactly as Tokenize, except it also formats the output.
	Format(*bufio.Reader, func(Token) error) error

	// AcceptsFilename returns true if this Lexer thinks it is suitable for
	// the given filename. An error will be returned iff an invalid filename
	// pattern is registered by the Lexer.
	AcceptsFilename(name string) (bool, error)

	// AcceptsMediaType returns true if this Lexer thinks it is suitable for
	// the given media (MIME) type. An error will be returned iff the given
	// mime type is invalid.
	AcceptsMediaType(name string) (bool, error)

	// ListMediaTypes lists the media types this Tokenizer advertises support
	// for, e.g. ["application/json"]
	ListMediaTypes() []string

	// ListFilenames lists the filename patterns this Tokenizer advertises
	// support for, e.g. ["*.json"]
	ListFilenames() []string
}
Tokenizer represents a type capable of tokenizing data from an input source.
func GetTokenizer ¶
GetTokenizer returns the Tokenizer of the given name.
func GetTokenizerForContentType ¶
GetTokenizerForContentType returns a Tokenizer for the given content type (e.g. "text/html" or "application/json"), or nil if one is not found.
func GetTokenizerForFilename ¶
GetTokenizerForFilename returns a Tokenizer for the given filename (e.g. "index.html" or "jasons.json"), or nil if one is not found.
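Putting the lookups together, a sketch that picks a Tokenizer by filename and streams tokens from a file (the `lexer` package name, the file handling, and the bufio/fmt/io/log/os imports are assumptions):

	t := lexer.GetTokenizerForFilename("jasons.json")
	if t == nil {
		log.Fatal("no tokenizer registered for this filename")
	}

	f, err := os.Open("jasons.json")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Stream tokens as they are produced; the callback is invoked once per token.
	err = t.Tokenize(bufio.NewReader(f), func(tok lexer.Token) error {
		fmt.Printf("%-12s %q\n", tok.Type, tok.Value)
		return nil
	})
	if err != nil && err != io.EOF {
		log.Fatal(err)
	}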