README

JS

This package is a JS lexer (ECMA-262, edition 6.0) written in Go. It follows the ECMAScript Language Specification. The lexer takes an io.Reader and converts it into tokens until EOF.
Installation

Run the following command:

	go get github.com/tdewolff/parse/js

or add the following import and run the project with go get:

	import "github.com/tdewolff/parse/js"
Lexer
Usage
The following initializes a new Lexer with io.Reader r:

	l := js.NewLexer(r)

To tokenize until EOF or an error, use:
	for {
		tt, text := l.Next()
		switch tt {
		case js.ErrorToken:
			// error or EOF set in l.Err()
			return
		// ...
		}
	}
All tokens (see the ECMAScript Language Specification):

	ErrorToken          TokenType = iota // extra token when errors occur
	UnknownToken                         // extra token when no token can be matched
	WhitespaceToken                      // space \t \v \f
	LineTerminatorToken                  // \r \n \r\n
	CommentToken
	IdentifierToken                      // also: null true false
	PunctuatorToken                      /* { } ( ) [ ] . ; , < > <= >= == != === !== + - * % ++ -- << >> >>> & | ^ ! ~ && || ? : = += -= *= %= <<= >>= >>>= &= |= ^= / /= => */
	NumericToken
	StringToken
	RegexpToken
	TemplateToken
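To make the categories above concrete, here is a toy stand-alone classifier over pre-split lexemes. It is purely illustrative: the real lexer operates on an io.Reader byte stream and is not implemented this way.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	numRE   = regexp.MustCompile(`^\d+(\.\d+)?$`)
	identRE = regexp.MustCompile(`^[$_a-zA-Z][$_a-zA-Z0-9]*$`)
)

// classify maps a single lexeme to the name of its token type. A toy
// illustration of the token categories listed above, not the package's code.
func classify(lexeme string) string {
	switch {
	case lexeme == "":
		return "ErrorToken"
	case strings.TrimLeft(lexeme, " \t\v\f") == "":
		return "WhitespaceToken"
	case strings.TrimLeft(lexeme, "\r\n") == "":
		return "LineTerminatorToken"
	case numRE.MatchString(lexeme):
		return "NumericToken"
	case identRE.MatchString(lexeme):
		return "IdentifierToken" // also matches null, true, false
	case strings.HasPrefix(lexeme, "'") || strings.HasPrefix(lexeme, `"`):
		return "StringToken"
	case strings.HasPrefix(lexeme, "`"):
		return "TemplateToken"
	default:
		return "PunctuatorToken"
	}
}

func main() {
	for _, lx := range []string{"var", "x", "=", "5", "'str'"} {
		fmt.Printf("%q %s\n", lx, classify(lx))
	}
}
```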
Quirks
Because the ECMAScript specification for PunctuatorToken
(of which the /
and /=
symbols) and RegexpToken
depends on a parser state to differentiate between the two, the lexer (to remain modular) uses different rules. It aims to correctly disambiguate contexts and returns RegexpToken
or PunctuatorToken
where appropriate with only few exceptions which don't make much sense in runtime and so don't happen in a real-world code: function literal division (x = function y(){} / z
) and object literal division (x = {y:1} / z
).
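The disambiguation can be approximated by looking at the previous significant token: after something that ends an expression, a / is division; otherwise it starts a regular expression. The sketch below is a simplification for illustration, not the package's implementation, but it shows exactly where the two documented exceptions come from.

```go
package main

import "fmt"

// regexpKeywords lists keywords after which a '/' must start a regular
// expression, since they cannot end an expression. Note that yield is
// listed unconditionally, matching the lexer's simplification described
// in the next paragraph.
var regexpKeywords = map[string]bool{
	"return": true, "typeof": true, "case": true, "in": true,
	"instanceof": true, "new": true, "delete": true, "void": true,
	"throw": true, "do": true, "else": true, "yield": true,
}

// slashIsRegexp reports whether a '/' following the given token starts a
// regular expression. Simplified sketch, not the package's implementation.
func slashIsRegexp(prev string) bool {
	if prev == "" || regexpKeywords[prev] {
		return true // start of input or after a keyword: regexp
	}
	switch prev[len(prev)-1] {
	case '}':
		// Assume '}' closes a block, so '/' starts a regexp. This is where
		// the two exceptions come from: in `x = function y(){} / z` and
		// `x = {y:1} / z` the '}' actually ends an expression and the '/'
		// should have been division.
		return true
	case ')', ']', '\'', '"':
		return false // an expression just ended: '/' is division
	}
	last := prev[len(prev)-1]
	if last >= '0' && last <= '9' || last >= 'a' && last <= 'z' ||
		last >= 'A' && last <= 'Z' || last == '_' || last == '$' {
		return false // identifier or numeric literal: '/' is division
	}
	return true // after any other punctuator (=, +, (, ...): regexp
}

func main() {
	fmt.Println(slashIsRegexp("x")) // false: x / y is division
	fmt.Println(slashIsRegexp("=")) // true: x = /re/ is a regexp
}
```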
Another interesting case, introduced by ES2015, is the yield operator in generator functions versus yield as an identifier in regular functions. This distinction exists for backward compatibility, but it is very hard to disambiguate correctly at the lexer level without essentially implementing the entire parsing spec as a state machine, hurting performance, code readability, and maintainability. Instead, yield is always assumed to be an operator. In combination with the above paragraph, this means that, for example, yield /x/i will always be lexed as yield-ing a regular expression, not as the identifier yield divided by x and then i. There is no evidence, though, that this pattern occurs in any popular library.
Examples
package main
import (
"os"
"github.com/tdewolff/parse/js"
)
// Tokenize JS from stdin.
func main() {
l := js.NewLexer(os.Stdin)
for {
tt, text := l.Next()
switch tt {
case js.ErrorToken:
if l.Err() != io.EOF {
fmt.Println("Error on line", l.Line(), ":", l.Err())
}
return
case js.IdentifierToken:
fmt.Println("Identifier", string(text))
case js.NumericToken:
fmt.Println("Numeric", string(text))
// ...
}
}
}
License
Released under the MIT license.
Documentation

Overview
Package js is an ECMAScript 5.1 lexer following the specification at http://www.ecma-international.org/ecma-262/5.1/.
Types

type Hash

	type Hash uint32

Hash defines perfect hashes for a predefined list of strings.
	const (
		Break      Hash = 0x5    // break
		Case       Hash = 0x3404 // case
		Catch      Hash = 0xba05 // catch
		Class      Hash = 0x505  // class
		Const      Hash = 0x2c05 // const
		Continue   Hash = 0x3e08 // continue
		Debugger   Hash = 0x8408 // debugger
		Default    Hash = 0xab07 // default
		Delete     Hash = 0xcd06 // delete
		Do         Hash = 0x4c02 // do
		Else       Hash = 0x3704 // else
		Enum       Hash = 0x3a04 // enum
		Export     Hash = 0x1806 // export
		Extends    Hash = 0x4507 // extends
		False      Hash = 0x5a05 // false
		Finally    Hash = 0x7a07 // finally
		For        Hash = 0xc403 // for
		Function   Hash = 0x4e08 // function
		If         Hash = 0x5902 // if
		Implements Hash = 0x5f0a // implements
		Import     Hash = 0x6906 // import
		In         Hash = 0x4202 // in
		Instanceof Hash = 0x710a // instanceof
		Interface  Hash = 0x8c09 // interface
		Let        Hash = 0xcf03 // let
		New        Hash = 0x1203 // new
		Null       Hash = 0x5504 // null
		Package    Hash = 0x9507 // package
		Private    Hash = 0x9c07 // private
		Protected  Hash = 0xa309 // protected
		Public     Hash = 0xb506 // public
		Return     Hash = 0xd06  // return
		Static     Hash = 0x2f06 // static
		Super      Hash = 0x905  // super
		Switch     Hash = 0x2606 // switch
		This       Hash = 0x2304 // this
		Throw      Hash = 0x1d05 // throw
		True       Hash = 0xb104 // true
		Try        Hash = 0x6e03 // try
		Typeof     Hash = 0xbf06 // typeof
		Var        Hash = 0xc703 // var
		Void       Hash = 0xca04 // void
		While      Hash = 0x1405 // while
		With       Hash = 0x2104 // with
		Yield      Hash = 0x8005 // yield
	)
Unique hash definitions to be used instead of strings.
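The values in the table are consistent with a scheme that packs a byte offset (high bits) and a length (low byte) into one shared, overlapping keyword string: Break = 0x0005 (offset 0, length 5), Class = 0x0505 (offset 5), and Super = 0x0905, whose slice reuses the final s of "class". The blob and String method below are a hypothetical reconstruction from just these three constants; the package's real string table is unexported.

```go
package main

import "fmt"

type Hash uint32

// Constants copied from the table above; each appears to pack an offset
// (high bits) and a length (low byte) into a shared keyword string.
const (
	Break Hash = 0x5   // offset 0, length 5
	Class Hash = 0x505 // offset 5, length 5
	Super Hash = 0x905 // offset 9, length 5 (overlaps "class")
)

// blob is a hypothetical fragment of the overlapping keyword string.
const blob = "breakclassuper"

// String recovers the keyword text from a Hash by slicing the blob.
func (h Hash) String() string {
	offset, length := int(h>>8), int(h&0xff)
	if offset+length > len(blob) {
		return ""
	}
	return blob[offset : offset+length]
}

func main() {
	fmt.Println(Break, Class, Super) // break class super
}
```

Overlapping slices are what makes these "perfect" hashes compact: comparing a parsed identifier against a Hash needs only one integer comparison instead of a string comparison.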
type Lexer

	type Lexer struct {
		// contains filtered or unexported fields
	}

Lexer is the state for the lexer.
func NewLexer

	func NewLexer(r io.Reader) *Lexer

NewLexer returns a new Lexer for a given io.Reader.
func (*Lexer) Err

	func (l *Lexer) Err() error

Err returns the error encountered during lexing; this is often io.EOF, but other errors can be returned as well.
type ParsingContext

	type ParsingContext uint32

ParsingContext determines the context in which the following token should be parsed. This affects the parsing of regular expressions and template literals.
	const (
		GlobalContext ParsingContext = iota
		StmtParensContext
		ExprParensContext
		BracesContext
		TemplateContext
	)

ParsingContext values.
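Why such a context stack matters is easiest to see with template literals: inside `a${ ... }b` a closing brace does not close a block but resumes the template, so the lexer must remember whether it is inside a ${ } substitution or ordinary braces. The sketch below mirrors that idea with two of the contexts above; it is an illustrative toy, not the package's implementation.

```go
package main

import "fmt"

// context records what a '}' will mean when it is reached, mirroring the
// roles of BracesContext and TemplateContext above.
type context int

const (
	bracesContext   context = iota // entered via '{'
	templateContext                // entered via '${' inside a template
)

// classifyBraces walks a pre-split token stream and reports what each '}'
// means. Illustrative sketch only.
func classifyBraces(tokens []string) []string {
	var stack []context
	var out []string
	for _, t := range tokens {
		switch t {
		case "{":
			stack = append(stack, bracesContext)
		case "${":
			stack = append(stack, templateContext)
		case "}":
			if len(stack) == 0 {
				out = append(out, "unbalanced")
				continue
			}
			top := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if top == templateContext {
				out = append(out, "resumes template")
			} else {
				out = append(out, "closes block")
			}
		}
	}
	return out
}

func main() {
	// `a${ {b:1} }c`: the inner '}' closes an object literal, the outer one
	// resumes the template literal.
	fmt.Println(classifyBraces([]string{"`a", "${", "{", "}", "}", "c`"}))
}
```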
type TokenState

	type TokenState uint32

TokenState determines the state in which the next token should be read.

	const (
		ExprState TokenState = iota
		StmtParensState
		SubscriptState
		PropNameState
	)

TokenState values.
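The state changes what the same text means: after a . (SubscriptState) or before a : in an object literal (PropNameState), reserved words act as plain identifiers, so a.break and {break: 1} are valid JS. A toy illustration of that dependence, using hypothetical helper and state names rather than the package's API:

```go
package main

import "fmt"

// reserved is a small sample of ECMAScript reserved words.
var reserved = map[string]bool{"break": true, "class": true, "return": true}

// isKeywordHere sketches why the lexer tracks a TokenState: the same text
// is a keyword or a plain identifier depending on where it appears.
// Illustrative helper only, not part of the package's API.
func isKeywordHere(word, state string) bool {
	// After '.' (SubscriptState) or before ':' in an object literal
	// (PropNameState), reserved words are ordinary property names.
	if state == "SubscriptState" || state == "PropNameState" {
		return false
	}
	return reserved[word]
}

func main() {
	fmt.Println(isKeywordHere("break", "ExprState"))      // true: the keyword
	fmt.Println(isKeywordHere("break", "SubscriptState")) // false: a.break
}
```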
type TokenType

	type TokenType uint32

TokenType determines the type of token, e.g. a number or a semicolon.

	const (
		ErrorToken TokenType = iota // extra token when errors occur
		UnknownToken // extra token when no token can be matched
		WhitespaceToken // space \t \v \f
		LineTerminatorToken // \r \n \r\n
		CommentToken
		IdentifierToken
		PunctuatorToken /* { } ( ) [ ] . ; , < > <= >= == != === !== + - * % ++ -- << >> >>> & | ^ ! ~ && || ? : = += -= *= %= <<= >>= >>>= &= |= ^= / /= => */
		NumericToken
		StringToken
		RegexpToken
		TemplateToken
	)
TokenType values.