Documentation ¶
Overview ¶
Package tokenizer tokenizes CSS based on part four of the CSS Syntax Module Level 3 (W3C Candidate Recommendation Draft), 24 December 2021.
The main elements of this package are the New function, which returns a new Tokenizer, and the Tokenizer.Next method.
This package also exposes several low-level "Consume" functions, which implement specific algorithms in the CSS specification. Note that all "Consume" functions may panic on I/O error. The Tokenizer.Next method catches these panics. Also note that all "Consume" functions operate on a stream of filtered code points (see https://www.w3.org/TR/css-syntax-3/#input-preprocessing), not raw input. This filtering is implemented by css/tokenizer/filter.Transform and is handled automatically by a Tokenizer returned by New.
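As a hedged illustration of the notes above, the sketch below filters raw input before handing it to a low-level "Consume" function. The wiring is partly assumed: the runeio import path and constructor, and the exact way filter.Transform plugs into x/text/transform, should be checked against those packages before use. Most callers should simply use New and Tokenizer.Next.

package main

import (
    "fmt"
    "strings"

    "golang.org/x/text/transform"

    "github.com/tawesoft/golib/v2/css/tokenizer"
    "github.com/tawesoft/golib/v2/css/tokenizer/filter"
    "github.com/tawesoft/golib/v2/text/runeio" // assumed import path
)

func main() {
    // Apply the code point filtering preprocessing step first: the
    // Consume functions expect filtered input, not raw input. This
    // assumes filter.Transform is a transform.Transformer value.
    filtered := transform.NewReader(strings.NewReader("12.5em"), filter.Transform)

    // Assumed constructor; consult the runeio package for the actual
    // way to construct a *runeio.Reader.
    rdr := runeio.NewReader(filtered)

    // Remember that Consume functions may panic on I/O error;
    // Tokenizer.Next normally catches those panics for you.
    tok := tokenizer.ConsumeNumericToken(rdr)
    fmt.Println(tok) // e.g. a <dimension-token> with value 12.5 and unit "em"
}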
Disclaimer: although this software runs against a thorough and diverse set of test cases, no claims are made about this software's conformance to the W3C specification itself (there is no official W3C test suite for the tokenization step alone).
This software includes material derived from CSS Syntax Module Level 3, W3C Candidate Recommendation Draft, 24 December 2021. Copyright © 2021 W3C® (MIT, ERCIM, Keio, Beihang). See LICENSE-PARTS.txt and TRADEMARKS.md.
Index ¶
- Variables
- func ConsumeBadUrl(rdr *runeio.Reader)
- func ConsumeComments(rdr *runeio.Reader) error
- func ConsumeEscapedCodepoint(rdr *runeio.Reader) rune
- func ConsumeIdentLikeToken(rdr *runeio.Reader) (token.Token, error)
- func ConsumeIdentSequence(rdr *runeio.Reader) string
- func ConsumeNumber(rdr *runeio.Reader) (nt token.NumberType, repr string, value float64)
- func ConsumeNumericToken(rdr *runeio.Reader) token.Token
- func ConsumeString(rdr *runeio.Reader, endpoint rune) (t token.Token, err error)
- func ConsumeUrlToken(rdr *runeio.Reader) (token.Token, error)
- func ConsumeWhitespace(rdr *runeio.Reader) token.Token
- func StringToNumber(x string) float64
- type Tokenizer
Examples ¶
Constants ¶
This section is empty.
Variables ¶
Functions ¶
func ConsumeBadUrl ¶
ConsumeBadUrl consumes the remnants of a bad url from a stream of code points, "cleaning up" after the tokenizer realizes that it’s in the middle of a <bad-url-token> rather than a <url-token>. It returns nothing; its sole use is to consume enough of the input stream to reach a recovery point where normal tokenizing can resume.
func ConsumeComments ¶
ConsumeComments consumes zero or more CSS comments.
func ConsumeEscapedCodepoint ¶
ConsumeEscapedCodepoint consumes an escaped code point. It assumes that the U+005C REVERSE SOLIDUS (\) has already been consumed and that the next input code point has already been verified to be part of a valid escape.
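A sketch (using the assumed runeio setup from the Overview): after the backslash of the escape "\41 " has been consumed, the function decodes the hex escape and consumes one trailing whitespace code point.

    // Remaining input after the consumed backslash: "41 rest".
    // "41" is hex for U+0041 ('A'); the single space after a hex
    // escape is consumed as part of the escape.
    rdr := runeio.NewReader(strings.NewReader("41 rest")) // assumed constructor; input must be pre-filtered
    r := tokenizer.ConsumeEscapedCodepoint(rdr)
    fmt.Printf("%c\n", r) // prints: A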
func ConsumeIdentLikeToken ¶
ConsumeIdentLikeToken consumes an ident-like token from a stream of code points. It returns an <ident-token>, <function-token>, <url-token>, or <bad-url-token>.
func ConsumeIdentSequence ¶
ConsumeIdentSequence consumes an ident sequence from a stream of code points. It returns a string containing the largest name that can be formed from adjacent code points in the stream, starting from the first.
Note: This algorithm does not do the verification of the first few code points that are necessary to ensure the returned code points would constitute an <ident-token>. If that is the intended use, ensure that the stream starts with an ident sequence before calling this algorithm.
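For illustration (reader construction assumed, as in the Overview sketch), the function reads the longest run of name code points, here stopping at the colon:

    rdr := runeio.NewReader(strings.NewReader("background-color: red")) // assumed constructor
    name := tokenizer.ConsumeIdentSequence(rdr)
    fmt.Println(name) // prints: background-color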
func ConsumeNumber ¶
ConsumeNumber consumes a number from a stream of code points. It returns a numeric type (either "integer" or "number"), a string representation, and a numeric value.
The representation is the token lexeme as it appears in the input stream. This preserves details such as whether the value 0.009 was written as ".009" or as "9e-3".
Note: This algorithm does not do the verification of the first few code points that are necessary to ensure a number can be obtained from the stream. Ensure that the stream starts with a number before calling this algorithm.
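A sketch of the repr/value distinction (reader construction assumed, as in the Overview sketch):

    // "9e-3" and ".009" denote the same value but keep different
    // representations.
    rdr := runeio.NewReader(strings.NewReader("9e-3")) // assumed constructor
    nt, repr, value := tokenizer.ConsumeNumber(rdr)
    fmt.Println(nt, repr, value) // type "number" (not "integer"), repr "9e-3", value 0.009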
func ConsumeNumericToken ¶
ConsumeNumericToken consumes a numeric token from a stream of code points. It returns a <number-token>, <percentage-token>, or <dimension-token>.
func ConsumeString ¶
ConsumeString consumes a string token. It assumes that the code point that opens a string (if any) has already been consumed. It returns either a <string-token> or a <bad-string-token>. The endpoint argument specifies the code point that terminates the string (e.g. a double or single quotation mark).
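For example (reader construction assumed, as in the Overview sketch), with the opening quotation mark already consumed:

    // Remaining input after the opening `"` has been consumed.
    rdr := runeio.NewReader(strings.NewReader(`external"]`)) // assumed constructor
    t, err := tokenizer.ConsumeString(rdr, '"') // endpoint: the double quotation mark
    if err == nil {
        fmt.Println(t) // a <string-token> with value "external"
    }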
func ConsumeUrlToken ¶
ConsumeUrlToken consumes a url token from a stream of code points. It returns either a <url-token> or a <bad-url-token>.
Note: This algorithm assumes that the initial "url(" has already been consumed. This algorithm also assumes that it’s being called to consume an "unquoted" value, like url(foo). A quoted value, like url("foo"), is parsed as a <function-token>. ConsumeIdentLikeToken automatically handles this distinction; this algorithm shouldn’t be called directly otherwise.
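A sketch (reader construction assumed, as in the Overview sketch), with the initial "url(" already consumed:

    rdr := runeio.NewReader(strings.NewReader("foo.png) body")) // assumed constructor
    tok, err := tokenizer.ConsumeUrlToken(rdr)
    if err == nil {
        fmt.Println(tok) // a <url-token> for "foo.png"
    }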
func ConsumeWhitespace ¶
ConsumeWhitespace consumes as much whitespace as possible and returns a <whitespace-token>.
func StringToNumber ¶
StringToNumber converts a string to a number according to the CSS specification.
Note: This algorithm does not do any verification to ensure that the string contains only a number. Ensure that the string contains only a valid CSS number before calling this algorithm.
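For example, two spellings of the same number convert to the same value:

    fmt.Println(tokenizer.StringToNumber(".009")) // prints: 0.009
    fmt.Println(tokenizer.StringToNumber("9e-3")) // prints: 0.009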
Types ¶
type Tokenizer ¶
type Tokenizer struct {
// contains filtered or unexported fields
}
Example ¶
package main

import (
    "fmt"
    "strings"

    "github.com/tawesoft/golib/v2/css/tokenizer"
    "github.com/tawesoft/golib/v2/css/tokenizer/token"
)

func main() {
    str := `/* example */ #something[rel~="external"] { background-color: rgb(128, 64, 64); }`
    t := tokenizer.New(strings.NewReader(str))
    for {
        tok := t.NextExcept(token.TypeWhitespace)
        if tok.Is(token.TypeEOF) {
            break
        }
        fmt.Println(tok)
    }
    if len(t.Errors()) > 0 {
        fmt.Printf("%v\n", t.Errors())
    }
}

Output:

<hash-token>{type: "id", value: "something"}
<[-token>
<ident-token>{value: "rel"}
<delim-token>{delim: '~'}
<delim-token>{delim: '='}
<string-token>{value: "external"}
<]-token>
<{-token>
<ident-token>{value: "background-color"}
<colon-token>
<function-token>{value: "rgb"}
<number-token>{type: "integer", value: 128.000000, repr: "128"}
<comma-token>
<number-token>{type: "integer", value: 64.000000, repr: "64"}
<comma-token>
<number-token>{type: "integer", value: 64.000000, repr: "64"}
<)-token>
<semicolon-token>
<}-token>
func (*Tokenizer) Next ¶
Next returns the next token from the input stream. Once the stream has ended, it returns token.EOF().
To detect parse errors, check z.Errors() once the stream has ended, or at any point if you want to fail fast without recovering.
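A minimal read loop (the package Example above shows the NextExcept variant):

    t := tokenizer.New(strings.NewReader(`a { color: red }`))
    for {
        tok := t.Next()
        if tok.Is(token.TypeEOF) {
            break
        }
        fmt.Println(tok)
    }
    // Once the stream has ended, check for parse errors.
    if errs := t.Errors(); len(errs) > 0 {
        fmt.Printf("%v\n", errs)
    }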
func (*Tokenizer) NextExcept ¶
NextExcept is like Tokenizer.Next, except that any tokens matching the given types are suppressed. For example, it is common to ignore whitespace. token.EOF() is never ignored.
Directories ¶
Path | Synopsis
---|---
filter | Package filter implements a [transform.Transformer] that performs the Unicode code point filtering preprocessing step defined in [CSS Syntax Module Level 3, section 3.3].
token | Package token defines CSS tokens produced by a tokenizer.