README
JS

This package is a JS lexer (ECMA-262, edition 6.0) written in Go. It follows the ECMAScript Language Specification. The lexer takes an io.Reader and converts it into tokens until EOF.
Installation
Run the following command:
go get github.com/tdewolff/parse/js
or add the following import and run the project with go get:
import "github.com/tdewolff/parse/js"
Lexer
Usage
The following initializes a new Lexer with io.Reader r:
l := js.NewLexer(r)
To tokenize until EOF or an error, use:
for {
	tt, text := l.Next()
	switch tt {
	case js.ErrorToken:
		// error or EOF set in l.Err()
		return
	// ...
	}
}
All tokens (see ECMAScript Language Specification):
ErrorToken TokenType = iota // extra token when errors occur
UnknownToken                // extra token when no token can be matched
WhitespaceToken             // space \t \v \f
LineTerminatorToken         // \r \n \r\n
CommentToken
IdentifierToken             // also: null true false
PunctuatorToken /* { } ( ) [ ] . ; , < > <= >= == != === !== + - * % ++ -- << >> >>> & | ^ ! ~ && || ? : = += -= *= %= <<= >>= >>>= &= |= ^= / /= => */
NumericToken
StringToken
RegexpToken
TemplateToken
Quirks
Because the ECMAScript specification needs parser state to differentiate between a PunctuatorToken (which includes the / and /= symbols) and a RegexpToken, the lexer (to remain modular) uses a simpler rule: whenever / is encountered and the previous token is one of (,=:[!&|?{};, it returns a RegexpToken; otherwise it returns a PunctuatorToken. This is the same rule JSLint appears to use.
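The rule above can be sketched as a small self-contained function (the function name and string-based token representation here are illustrative, not the package's actual API):

```go
package main

import (
	"fmt"
	"strings"
)

// startsRegexp reports whether a '/' should begin a regular expression
// literal rather than a division operator, following the rule described
// above: a regexp is assumed when the previous token is one of
// ( , = : [ ! & | ? { } ; or when there is no previous token at all.
func startsRegexp(prevToken string) bool {
	if prevToken == "" {
		return true // '/' at the start of input begins a regexp
	}
	return strings.ContainsAny(prevToken, "(,=:[!&|?{};")
}

func main() {
	fmt.Println(startsRegexp("=")) // a = /re/  -> regexp
	fmt.Println(startsRegexp("b")) // a / b     -> division
}
```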
Examples
package main
import (
	"fmt"
	"io"
	"os"

	"github.com/tdewolff/parse/js"
)
// Tokenize JS from stdin.
func main() {
	l := js.NewLexer(os.Stdin)
	for {
		tt, text := l.Next()
		switch tt {
		case js.ErrorToken:
			if l.Err() != io.EOF {
				fmt.Println("Error on line", l.Line(), ":", l.Err())
			}
			return
		case js.IdentifierToken:
			fmt.Println("Identifier", string(text))
		case js.NumericToken:
			fmt.Println("Numeric", string(text))
		// ...
		}
	}
}
License
Released under the MIT license.
Documentation
Overview
Package js is an ECMAScript 5.1 lexer following the specification at http://www.ecma-international.org/ecma-262/5.1/.
Index
Types
type Hash
type Hash uint32
Hash values are generated with github.com/tdewolff/hasher.
const (
	Break      Hash = 0x5
	Case       Hash = 0x3404
	Catch      Hash = 0xba05
	Class      Hash = 0x505
	Const      Hash = 0x2c05
	Continue   Hash = 0x3e08
	Debugger   Hash = 0x8408
	Default    Hash = 0xab07
	Delete     Hash = 0xcd06
	Do         Hash = 0x4c02
	Else       Hash = 0x3704
	Enum       Hash = 0x3a04
	Export     Hash = 0x1806
	Extends    Hash = 0x4507
	False      Hash = 0x5a05
	Finally    Hash = 0x7a07
	For        Hash = 0xc403
	Function   Hash = 0x4e08
	If         Hash = 0x5902
	Implements Hash = 0x5f0a
	Import     Hash = 0x6906
	In         Hash = 0x4202
	Instanceof Hash = 0x710a
	Interface  Hash = 0x8c09
	Let        Hash = 0xcf03
	New        Hash = 0x1203
	Null       Hash = 0x5504
	Package    Hash = 0x9507
	Private    Hash = 0x9c07
	Protected  Hash = 0xa309
	Public     Hash = 0xb506
	Return     Hash = 0xd06
	Static     Hash = 0x2f06
	Super      Hash = 0x905
	Switch     Hash = 0x2606
	This       Hash = 0x2304
	Throw      Hash = 0x1d05
	True       Hash = 0xb104
	Try        Hash = 0x6e03
	Typeof     Hash = 0xbf06
	Var        Hash = 0xc703
	Void       Hash = 0xca04
	While      Hash = 0x1405
	With       Hash = 0x2104
	Yield      Hash = 0x8005
)
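The Hash constants give each reserved word a small numeric value so keyword checks can compare integers instead of strings. A minimal sketch of the idea, using a plain map as a stand-in for the generated perfect hash (the ToHash function and the constant values here are illustrative, not the package's generated ones):

```go
package main

import "fmt"

// Hash is a numeric stand-in for a keyword, mirroring the idea of the
// package's Hash type (real values come from a perfect hasher).
type Hash uint32

const (
	None Hash = iota
	Break
	Function
	Return
	Var
)

// keywords maps identifier text to its Hash; the generated code avoids
// this map by hashing the identifier's bytes directly.
var keywords = map[string]Hash{
	"break":    Break,
	"function": Function,
	"return":   Return,
	"var":      Var,
}

// ToHash returns the Hash for an identifier, or None if it is not a
// reserved word.
func ToHash(ident string) Hash {
	return keywords[ident]
}

func main() {
	fmt.Println(ToHash("function") == Function) // true
	fmt.Println(ToHash("foo") == None)          // true
}
```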
type Lexer
type Lexer struct {
// contains filtered or unexported fields
}
Lexer is the state for the lexer.
func NewLexer
NewLexer returns a new Lexer for a given io.Reader.
func (*Lexer) Err
Err returns the error encountered during lexing; this is often io.EOF, but other errors can also be returned.
type TokenType
type TokenType uint32
TokenType determines the type of token, e.g. a number or a semicolon.
const (
	ErrorToken TokenType = iota // extra token when errors occur
	UnknownToken                // extra token when no token can be matched
	WhitespaceToken             // space \t \v \f
	LineTerminatorToken         // \r \n \r\n
	CommentToken
	IdentifierToken
	PunctuatorToken /* { } ( ) [ ] . ; , < > <= >= == != === !== + - * % ++ -- << >> >>> & | ^ ! ~ && || ? : = += -= *= %= <<= >>= >>>= &= |= ^= / /= => */
	NumericToken
	StringToken
	RegexpToken
	TemplateToken
)
TokenType values.