README

JS

This package is a JS lexer (ECMA-262, edition 6.0) written in Go. It follows the ECMAScript Language Specification. The lexer takes an io.Reader and converts it into tokens until EOF.
Installation

Run the following command:

	go get github.com/tdewolff/parse/js

or add the following import and run the project with go get:

	import "github.com/tdewolff/parse/js"
Lexer
Usage
The following initializes a new Lexer with io.Reader r:

	l := js.NewLexer(r)

To tokenize until EOF or an error, use:
	for {
		tt, text := l.Next()
		switch tt {
		case js.ErrorToken:
			// error or EOF set in l.Err()
			return
		// ...
		}
	}
All tokens (see the ECMAScript Language Specification):

	ErrorToken          TokenType = iota // extra token when errors occur
	UnknownToken                         // extra token when no token can be matched
	WhitespaceToken                      // space \t \v \f
	LineTerminatorToken                  // \r \n \r\n
	CommentToken
	IdentifierToken                      // also: null true false
	PunctuatorToken                      /* { } ( ) [ ] . ; , < > <= >= == != === !== + - * % ++ -- << >> >>> & | ^ ! ~ && || ? : = += -= *= %= <<= >>= >>>= &= |= ^= / /= => */
	NumericToken
	StringToken
	RegexpToken
	TemplateToken
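To make the categories above concrete, here is a toy stand-alone classifier over pre-split lexemes. It is purely illustrative: the real lexer operates on an io.Reader byte stream and is not implemented this way.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	numRE   = regexp.MustCompile(`^\d+(\.\d+)?$`)
	identRE = regexp.MustCompile(`^[$_a-zA-Z][$_a-zA-Z0-9]*$`)
)

// classify maps a single lexeme to the name of its token type. A toy
// illustration of the token categories listed above, not the package's code.
func classify(lexeme string) string {
	switch {
	case lexeme == "":
		return "ErrorToken"
	case strings.TrimLeft(lexeme, " \t\v\f") == "":
		return "WhitespaceToken"
	case strings.TrimLeft(lexeme, "\r\n") == "":
		return "LineTerminatorToken"
	case numRE.MatchString(lexeme):
		return "NumericToken"
	case identRE.MatchString(lexeme):
		return "IdentifierToken" // also matches null, true, false
	case strings.HasPrefix(lexeme, "'") || strings.HasPrefix(lexeme, `"`):
		return "StringToken"
	case strings.HasPrefix(lexeme, "`"):
		return "TemplateToken"
	default:
		return "PunctuatorToken"
	}
}

func main() {
	for _, lx := range []string{"var", "x", "=", "5", "'str'"} {
		fmt.Printf("%q %s\n", lx, classify(lx))
	}
}
```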
Quirks
Because the ECMAScript specification for PunctuatorToken
(of which the /
and /=
symbols) and RegexpToken
depends on a parser state to differentiate between the two, the lexer (to remain modular) uses different rules. It aims to correctly disambiguate contexts and returns RegexpToken
or PunctuatorToken
where appropriate with only few exceptions which don't make much sense in runtime and so don't happen in a real-world code: function literal division (x = function y(){} / z
) and object literal division (x = {y:1} / z
).
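The disambiguation can be approximated by looking at the previous significant token: after something that ends an expression, a / is division; otherwise it starts a regular expression. The sketch below is a simplification for illustration, not the package's implementation, but it shows exactly where the two documented exceptions come from.

```go
package main

import "fmt"

// regexpKeywords lists keywords after which a '/' must start a regular
// expression, since they cannot end an expression. Note that yield is
// listed unconditionally, matching the lexer's simplification described
// in the next paragraph.
var regexpKeywords = map[string]bool{
	"return": true, "typeof": true, "case": true, "in": true,
	"instanceof": true, "new": true, "delete": true, "void": true,
	"throw": true, "do": true, "else": true, "yield": true,
}

// slashIsRegexp reports whether a '/' following the given token starts a
// regular expression. Simplified sketch, not the package's implementation.
func slashIsRegexp(prev string) bool {
	if prev == "" || regexpKeywords[prev] {
		return true // start of input or after a keyword: regexp
	}
	switch prev[len(prev)-1] {
	case '}':
		// Assume '}' closes a block, so '/' starts a regexp. This is where
		// the two exceptions come from: in `x = function y(){} / z` and
		// `x = {y:1} / z` the '}' actually ends an expression and the '/'
		// should have been division.
		return true
	case ')', ']', '\'', '"':
		return false // an expression just ended: '/' is division
	}
	last := prev[len(prev)-1]
	if last >= '0' && last <= '9' || last >= 'a' && last <= 'z' ||
		last >= 'A' && last <= 'Z' || last == '_' || last == '$' {
		return false // identifier or numeric literal: '/' is division
	}
	return true // after any other punctuator (=, +, (, ...): regexp
}

func main() {
	fmt.Println(slashIsRegexp("x")) // false: x / y is division
	fmt.Println(slashIsRegexp("=")) // true: x = /re/ is a regexp
}
```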
Another interesting case, introduced by ES2015, is the yield operator in generator functions versus yield as an identifier in regular functions. This distinction exists for backward compatibility, but it is very hard to disambiguate correctly at the lexer level without essentially implementing the entire parsing spec as a state machine, hurting performance, code readability, and maintainability. Instead, yield is always assumed to be an operator. In combination with the above paragraph, this means that, for example, yield /x/i will always be lexed as yield-ing a regular expression, not as the identifier yield divided by x and then i. There is no evidence, though, that this pattern occurs in any popular library.
Examples
package main
import (
"os"
"github.com/tdewolff/parse/js"
)
// Tokenize JS from stdin.
func main() {
l := js.NewLexer(os.Stdin)
for {
tt, text := l.Next()
switch tt {
case js.ErrorToken:
if l.Err() != io.EOF {
fmt.Println("Error on line", l.Line(), ":", l.Err())
}
return
case js.IdentifierToken:
fmt.Println("Identifier", string(text))
case js.NumericToken:
fmt.Println("Numeric", string(text))
// ...
}
}
}
License
Released under the MIT license.
Documentation

Overview
Package js is an ECMAScript 5.1 lexer following the specification at http://www.ecma-international.org/ecma-262/5.1/.
Types

type Hash

	type Hash uint32

Hash defines perfect hashes for a predefined list of strings.
	const (
		Break      Hash = 0x5    // break
		Case       Hash = 0x3404 // case
		Catch      Hash = 0xba05 // catch
		Class      Hash = 0x505  // class
		Const      Hash = 0x2c05 // const
		Continue   Hash = 0x3e08 // continue
		Debugger   Hash = 0x8408 // debugger
		Default    Hash = 0xab07 // default
		Delete     Hash = 0xcd06 // delete
		Do         Hash = 0x4c02 // do
		Else       Hash = 0x3704 // else
		Enum       Hash = 0x3a04 // enum
		Export     Hash = 0x1806 // export
		Extends    Hash = 0x4507 // extends
		False      Hash = 0x5a05 // false
		Finally    Hash = 0x7a07 // finally
		For        Hash = 0xc403 // for
		Function   Hash = 0x4e08 // function
		If         Hash = 0x5902 // if
		Implements Hash = 0x5f0a // implements
		Import     Hash = 0x6906 // import
		In         Hash = 0x4202 // in
		Instanceof Hash = 0x710a // instanceof
		Interface  Hash = 0x8c09 // interface
		Let        Hash = 0xcf03 // let
		New        Hash = 0x1203 // new
		Null       Hash = 0x5504 // null
		Package    Hash = 0x9507 // package
		Private    Hash = 0x9c07 // private
		Protected  Hash = 0xa309 // protected
		Public     Hash = 0xb506 // public
		Return     Hash = 0xd06  // return
		Static     Hash = 0x2f06 // static
		Super      Hash = 0x905  // super
		Switch     Hash = 0x2606 // switch
		This       Hash = 0x2304 // this
		Throw      Hash = 0x1d05 // throw
		True       Hash = 0xb104 // true
		Try        Hash = 0x6e03 // try
		Typeof     Hash = 0xbf06 // typeof
		Var        Hash = 0xc703 // var
		Void       Hash = 0xca04 // void
		While      Hash = 0x1405 // while
		With       Hash = 0x2104 // with
		Yield      Hash = 0x8005 // yield
	)
Unique hash definitions to be used instead of strings.
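The values in the table are consistent with a scheme that packs a byte offset (high bits) and a length (low byte) into one shared, overlapping keyword string: Break = 0x0005 (offset 0, length 5), Class = 0x0505 (offset 5), and Super = 0x0905, whose slice reuses the final s of "class". The blob and String method below are a hypothetical reconstruction from just these three constants; the package's real string table is unexported.

```go
package main

import "fmt"

type Hash uint32

// Constants copied from the table above; each appears to pack an offset
// (high bits) and a length (low byte) into a shared keyword string.
const (
	Break Hash = 0x5   // offset 0, length 5
	Class Hash = 0x505 // offset 5, length 5
	Super Hash = 0x905 // offset 9, length 5 (overlaps "class")
)

// blob is a hypothetical fragment of the overlapping keyword string.
const blob = "breakclassuper"

// String recovers the keyword text from a Hash by slicing the blob.
func (h Hash) String() string {
	offset, length := int(h>>8), int(h&0xff)
	if offset+length > len(blob) {
		return ""
	}
	return blob[offset : offset+length]
}

func main() {
	fmt.Println(Break, Class, Super) // break class super
}
```

Overlapping slices are what makes these "perfect" hashes compact: comparing a parsed identifier against a Hash needs only one integer comparison instead of a string comparison.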
type Lexer

	type Lexer struct {
		// contains filtered or unexported fields
	}

Lexer is the state for the lexer.
func NewLexer

	func NewLexer(r io.Reader) *Lexer

NewLexer returns a new Lexer for a given io.Reader.
func (*Lexer) Err

	func (l *Lexer) Err() error

Err returns the error encountered during lexing; this is often io.EOF, but other errors can be returned as well.
type ParsingContext

	type ParsingContext uint32

ParsingContext determines the context in which the following token should be parsed. This affects the parsing of regular expressions and template literals.
	const (
		GlobalContext ParsingContext = iota
		StmtParensContext
		ExprParensContext
		BracesContext
		TemplateContext
	)

ParsingContext values.
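Why such a context stack matters is easiest to see with template literals: inside `a${ ... }b` a closing brace does not close a block but resumes the template, so the lexer must remember whether it is inside a ${ } substitution or ordinary braces. The sketch below mirrors that idea with two of the contexts above; it is an illustrative toy, not the package's implementation.

```go
package main

import "fmt"

// context records what a '}' will mean when it is reached, mirroring the
// roles of BracesContext and TemplateContext above.
type context int

const (
	bracesContext   context = iota // entered via '{'
	templateContext                // entered via '${' inside a template
)

// classifyBraces walks a pre-split token stream and reports what each '}'
// means. Illustrative sketch only.
func classifyBraces(tokens []string) []string {
	var stack []context
	var out []string
	for _, t := range tokens {
		switch t {
		case "{":
			stack = append(stack, bracesContext)
		case "${":
			stack = append(stack, templateContext)
		case "}":
			if len(stack) == 0 {
				out = append(out, "unbalanced")
				continue
			}
			top := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if top == templateContext {
				out = append(out, "resumes template")
			} else {
				out = append(out, "closes block")
			}
		}
	}
	return out
}

func main() {
	// `a${ {b:1} }c`: the inner '}' closes an object literal, the outer one
	// resumes the template literal.
	fmt.Println(classifyBraces([]string{"`a", "${", "{", "}", "}", "c`"}))
}
```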
type TokenState

	type TokenState uint32

TokenState determines the state in which the next token should be read.

	const (
		ExprState TokenState = iota
		StmtParensState
		SubscriptState
		PropNameState
	)

TokenState values.
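The state changes what the same text means: after a . (SubscriptState) or before a : in an object literal (PropNameState), reserved words act as plain identifiers, so a.break and {break: 1} are valid JS. A toy illustration of that dependence, using hypothetical helper and state names rather than the package's API:

```go
package main

import "fmt"

// reserved is a small sample of ECMAScript reserved words.
var reserved = map[string]bool{"break": true, "class": true, "return": true}

// isKeywordHere sketches why the lexer tracks a TokenState: the same text
// is a keyword or a plain identifier depending on where it appears.
// Illustrative helper only, not part of the package's API.
func isKeywordHere(word, state string) bool {
	// After '.' (SubscriptState) or before ':' in an object literal
	// (PropNameState), reserved words are ordinary property names.
	if state == "SubscriptState" || state == "PropNameState" {
		return false
	}
	return reserved[word]
}

func main() {
	fmt.Println(isKeywordHere("break", "ExprState"))      // true: the keyword
	fmt.Println(isKeywordHere("break", "SubscriptState")) // false: a.break
}
```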
type TokenType

	type TokenType uint32

TokenType determines the type of token, e.g. a number or a semicolon.

	const (
		ErrorToken TokenType = iota // extra token when errors occur
		UnknownToken // extra token when no token can be matched
		WhitespaceToken // space \t \v \f
		LineTerminatorToken // \r \n \r\n
		CommentToken
		IdentifierToken
		PunctuatorToken /* { } ( ) [ ] . ; , < > <= >= == != === !== + - * % ++ -- << >> >>> & | ^ ! ~ && || ? : = += -= *= %= <<= >>= >>>= &= |= ^= / /= => */
		NumericToken
		StringToken
		RegexpToken
		TemplateToken
	)
TokenType values.