Documentation ¶
Index ¶
- Variables
- func StringifyTokenSequence(seq []Token) string
- func TokenAssemble(na datamodel.NodeAssembler, tr TokenReader, budget int64) error
- func TokenWalk(n datamodel.Node, visitFn func(tk *Token) error) error
- type ErrMalformedTokenSequence
- type NodeTokenizer
- type Token
- type TokenAssembler
- type TokenKind
- type TokenReader
Constants ¶
This section is empty.
Variables ¶
var TokenWalkSkip = errors.New("token walk: skip")
Functions ¶
func StringifyTokenSequence ¶
func StringifyTokenSequence(seq []Token) string
StringifyTokenSequence is a utility function that is often handy for testing. (Doing a diff on strings of tokens gives very good reports for minimal effort.)
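The following minimal sketch shows why string-rendered token sequences diff well. It does not import go-ipld-prime. `TokenKind`, `Token`, and `stringifyTokens` are hypothetical local stand-ins that mirror a subset of the fields documented under type Token. The one-token-per-line output format is also an assumption, not necessarily what StringifyTokenSequence actually prints.

```go
package main

import (
	"strconv"
	"strings"
)

// TokenKind and Token are hypothetical stand-ins for the codectools types.
type TokenKind uint8

const (
	TokenKind_MapOpen   TokenKind = '{'
	TokenKind_MapClose  TokenKind = '}'
	TokenKind_ListOpen  TokenKind = '['
	TokenKind_ListClose TokenKind = ']'
	TokenKind_Bool      TokenKind = 'b'
	TokenKind_Int       TokenKind = 'i'
	TokenKind_String    TokenKind = 's'
)

type Token struct {
	Kind   TokenKind
	Length int64
	Bool   bool
	Int    int64
	Str    string
}

// stringifyTokens renders one token per line in an assumed format:
// the kind character, then the value for kinds that carry one.
func stringifyTokens(seq []Token) string {
	var sb strings.Builder
	for _, tk := range seq {
		sb.WriteByte(byte(tk.Kind))
		switch tk.Kind {
		case TokenKind_MapOpen, TokenKind_ListOpen:
			sb.WriteString(":len=" + strconv.FormatInt(tk.Length, 10))
		case TokenKind_Bool:
			sb.WriteString(":" + strconv.FormatBool(tk.Bool))
		case TokenKind_Int:
			sb.WriteString(":" + strconv.FormatInt(tk.Int, 10))
		case TokenKind_String:
			sb.WriteString(":" + strconv.Quote(tk.Str))
		}
		sb.WriteByte('\n')
	}
	return sb.String()
}
```

Each token lands on its own line, so a line-based diff of expected versus actual output points straight at the first token that differs.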
func TokenAssemble ¶
func TokenAssemble(na datamodel.NodeAssembler, tr TokenReader, budget int64) error
TokenAssemble takes a datamodel.NodeAssembler and a TokenReader. It repeatedly pumps the TokenReader for tokens and feeds their data into the datamodel.NodeAssembler until it finishes a complete value.
To compare and contrast with other token-oriented tools: TokenAssemble transfers information in the same direction as the TokenAssembler gadget, but TokenAssemble moves through a complete value in a single call, whereas the TokenAssembler accepts tokens pumped into it one at a time.
TokenAssemble does not enforce the Data Model's "map keys must be strings" rule. It will even handle recursive structures in map keys, which means it can be used when handling schema values such as maps with complex keys.
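The pull loop that TokenAssemble performs can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code. `pumpOneValue` and `sliceReader` are hypothetical, and the `process` callback stands in for the NodeAssembler. The reader shape `func(budget *int64) (*Token, error)` is assumed from the budget description under TokenReader. The key idea is depth tracking: a top-level scalar is a complete value by itself, while a map or list is complete once its matching close token arrives.

```go
package main

import (
	"errors"
	"io"
)

// TokenKind and Token are trimmed-down hypothetical stand-ins.
type TokenKind uint8

const (
	TokenKind_MapOpen   TokenKind = '{'
	TokenKind_MapClose  TokenKind = '}'
	TokenKind_ListOpen  TokenKind = '['
	TokenKind_ListClose TokenKind = ']'
	TokenKind_String    TokenKind = 's'
)

type Token struct {
	Kind TokenKind
	Str  string
}

// pumpOneValue pulls tokens from read and hands each to process until
// exactly one complete value has passed through.
func pumpOneValue(process func(*Token) error, read func(budget *int64) (*Token, error), budget int64) error {
	depth := 0
	for {
		tk, err := read(&budget)
		if errors.Is(err, io.EOF) {
			return io.ErrUnexpectedEOF // stream ended mid-value
		}
		if err != nil {
			return err
		}
		// Caller-side budget accounting for data the reader surfaced.
		if tk.Kind == TokenKind_String {
			budget -= int64(len(tk.Str))
			if budget < 0 {
				return errors.New("budget exceeded")
			}
		}
		switch tk.Kind {
		case TokenKind_MapOpen, TokenKind_ListOpen:
			depth++
		case TokenKind_MapClose, TokenKind_ListClose:
			depth--
			if depth < 0 {
				return errors.New("close token without matching open")
			}
		}
		if err := process(tk); err != nil {
			return err
		}
		if depth == 0 {
			return nil // a scalar, or the close of the outermost map/list
		}
	}
}

// sliceReader yields tokens from memory and reuses one Token. As an
// in-memory reader, it ignores the budget.
func sliceReader(toks []Token) func(budget *int64) (*Token, error) {
	var cur Token
	i := 0
	return func(budget *int64) (*Token, error) {
		if i >= len(toks) {
			return nil, io.EOF
		}
		cur = toks[i]
		i++
		return &cur, nil
	}
}
```

The loop stops after one value and leaves any remaining tokens unread, in line with "until it finishes a complete value".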
func TokenWalk ¶
func TokenWalk(n datamodel.Node, visitFn func(tk *Token) error) error
TokenWalk walks an IPLD Node and calls the visitFn once for every "token" yielded by the walk. Each map and list yields one token at its beginning and another when it's finished. Each scalar value (strings, bools, bytes, ints, etc.) yields a single token.
The token pointer given to the visitFn will be identical on every call, but the data it contains will vary. The token may contain invalid data that is leftover from previous calls in some of its union fields; correct behavior requires looking at the token's Kind field before handling any of its other fields.
If the visitFn returns any error, the walk halts and TokenWalk returns that error. The exception is the value TokenWalkSkip returned when visitFn was called with a MapOpen or ListOpen token. In that case the walk skips forward over that entire map or list and continues, so the next token is the close token that complements the open token. Returning TokenWalkSkip when the token is any of the scalar kinds (i.e., anything other than a MapOpen or a ListOpen) has no effect.
TokenAssembler is the rough dual of TokenWalk.
Types ¶
type ErrMalformedTokenSequence ¶
type ErrMalformedTokenSequence struct {
Detail string
}
func (ErrMalformedTokenSequence) Error ¶
func (e ErrMalformedTokenSequence) Error() string
type NodeTokenizer ¶
type NodeTokenizer struct {
// contains filtered or unexported fields
}
func (*NodeTokenizer) Initialize ¶
func (nt *NodeTokenizer) Initialize(n datamodel.Node)
func (*NodeTokenizer) ReadToken ¶
func (nt *NodeTokenizer) ReadToken() (next *Token, err error)
ReadToken fits the TokenReader functional interface, and so may be used anywhere a TokenReader is required.
type Token ¶
type Token struct {
Kind TokenKind
Length int64 // Present for MapOpen or ListOpen. May be -1 for "unknown" (e.g. a json tokenizer will yield this).
Bool bool // Value. Union: only has meaning if Kind is TokenKind_Bool.
Int int64 // Value. Union: only has meaning if Kind is TokenKind_Int.
Float float64 // Value. Union: only has meaning if Kind is TokenKind_Float.
Str string // Value. Union: only has meaning if Kind is TokenKind_String. ('Str' rather than 'String' to avoid collision with method.)
Bytes []byte // Value. Union: only has meaning if Kind is TokenKind_Bytes.
Link datamodel.Link // Value. Union: only has meaning if Kind is TokenKind_Link.
Node datamodel.Node // Direct pointer to the original data, if this token is used to communicate data during a walk of existing in-memory data. Absent when token is being used during deserialization.
// contains filtered or unexported fields
}
func (*Token) Normalize ¶
func (tk *Token) Normalize()
Normalize sets any value field in the token to its zero value if that field is not applicable to the token's kind. For example, if the token kind is string, the float, bytes, and other value fields are all zeroed. Path and offset progress information is left unmodified. This is sometimes helpful when writing test fixtures and equality assertions.
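The following sketch shows what the documented zeroing amounts to. It uses a stand-in `Token` with the value fields listed above and leaves out `Link` and `Node` to stay dependency-free. `normalize` and the `offset` field are hypothetical. `offset` only represents the progress information that Normalize leaves alone. The sketch zeroes fields one at a time instead of overwriting the whole struct, which keeps such fields intact.

```go
package main

// Hypothetical stand-ins for the codectools kinds and Token.
type TokenKind uint8

const (
	TokenKind_MapOpen  TokenKind = '{'
	TokenKind_ListOpen TokenKind = '['
	TokenKind_Bool     TokenKind = 'b'
	TokenKind_Int      TokenKind = 'i'
	TokenKind_Float    TokenKind = 'f'
	TokenKind_String   TokenKind = 's'
	TokenKind_Bytes    TokenKind = 'x'
)

type Token struct {
	Kind   TokenKind
	Length int64
	Bool   bool
	Int    int64
	Float  float64
	Str    string
	Bytes  []byte
	offset int64 // hypothetical progress info; normalize must not touch it
}

// normalize zeroes each value field that has no meaning for tk.Kind.
func normalize(tk *Token) {
	if tk.Kind != TokenKind_MapOpen && tk.Kind != TokenKind_ListOpen {
		tk.Length = 0
	}
	if tk.Kind != TokenKind_Bool {
		tk.Bool = false
	}
	if tk.Kind != TokenKind_Int {
		tk.Int = 0
	}
	if tk.Kind != TokenKind_Float {
		tk.Float = 0
	}
	if tk.Kind != TokenKind_String {
		tk.Str = ""
	}
	if tk.Kind != TokenKind_Bytes {
		tk.Bytes = nil
	}
}
```

After normalizing, two tokens that carry the same logical value compare equal field by field, even if they previously held different leftovers from reuse.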
type TokenAssembler ¶
type TokenAssembler struct {
// contains filtered or unexported fields
}
func (*TokenAssembler) Initialize ¶
func (ta *TokenAssembler) Initialize(na datamodel.NodeAssembler, budget int64)
func (*TokenAssembler) Process ¶
func (ta *TokenAssembler) Process(tk *Token) (err error)
Process takes a Token pointer as an argument. (Notice how this function happens to match the definition of the visitFn that's usable as an argument to TokenWalk.) The token argument can be understood to be "borrowed" for the duration of the Process call, but will not be mutated. The use of a pointer here is so that a single Token can be reused by multiple calls, avoiding unnecessary allocations.
Note that Process does very little sanity checking of token sequences itself. It mostly hands information directly to the NodeAssemblers, which presumably will reject the data if it is out of line. The NodeAssembler this TokenAssembler is wrapping should already be enforcing the relevant logical rules, so it is not useful for TokenAssembler.Process to attempt to duplicate those checks. TokenAssembler.Process will also return any errors from the NodeAssembler without attempting to enforce a pattern on those errors. In particular, TokenAssembler.Process does not check if every MapOpen is paired with a MapClose; it does not check if every ListOpen is paired with a ListClose; and it does not check if the token stream is continuing after all open recursives have been closed. TODO: review this documentation; more of these checks have turned out to be necessary than originally expected.
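The push direction can be sketched with a hypothetical `valueAssembler` that builds plain Go values (`map[string]any`, `[]any`, scalars) instead of driving a NodeAssembler. It has no NodeAssembler to lean on, so it performs a few checks that TokenAssembler.Process leaves to its NodeAssembler, such as rejecting a stray close token or a token after a complete value. It also shows a consequence of the token being borrowed: byte slices are copied before they are kept.

```go
package main

import "errors"

// Hypothetical stand-ins for the codectools kinds and Token.
type TokenKind uint8

const (
	TokenKind_MapOpen   TokenKind = '{'
	TokenKind_MapClose  TokenKind = '}'
	TokenKind_ListOpen  TokenKind = '['
	TokenKind_ListClose TokenKind = ']'
	TokenKind_Int       TokenKind = 'i'
	TokenKind_String    TokenKind = 's'
	TokenKind_Bytes     TokenKind = 'x'
)

type Token struct {
	Kind  TokenKind
	Int   int64
	Str   string
	Bytes []byte
}

// frame is one map or list that is still open.
type frame struct {
	isMap  bool
	m      map[string]any
	l      []any
	key    string
	hasKey bool
}

// valueAssembler is a hypothetical push-style builder of plain Go values.
type valueAssembler struct {
	stack  []*frame
	result any
	done   bool
}

// Process consumes one borrowed token. Its signature matches a TokenWalk
// visitFn, so a walker could feed it directly.
func (a *valueAssembler) Process(tk *Token) error {
	switch tk.Kind {
	case TokenKind_MapOpen:
		a.stack = append(a.stack, &frame{isMap: true, m: map[string]any{}})
		return nil
	case TokenKind_ListOpen:
		a.stack = append(a.stack, &frame{l: []any{}})
		return nil
	case TokenKind_MapClose, TokenKind_ListClose:
		if len(a.stack) == 0 {
			return errors.New("close token with nothing open")
		}
		top := a.stack[len(a.stack)-1]
		a.stack = a.stack[:len(a.stack)-1]
		if top.isMap {
			return a.put(top.m)
		}
		return a.put(top.l)
	case TokenKind_Int:
		return a.put(tk.Int)
	case TokenKind_String:
		return a.put(tk.Str)
	case TokenKind_Bytes:
		// The token is only borrowed, so copy the slice before keeping it.
		return a.put(append([]byte(nil), tk.Bytes...))
	}
	return errors.New("unsupported token kind")
}

// put stores a finished value in the innermost open container, or as the
// result when nothing is open.
func (a *valueAssembler) put(v any) error {
	if len(a.stack) == 0 {
		if a.done {
			return errors.New("token after a complete value")
		}
		a.result, a.done = v, true
		return nil
	}
	top := a.stack[len(a.stack)-1]
	switch {
	case !top.isMap:
		top.l = append(top.l, v)
	case !top.hasKey:
		k, ok := v.(string)
		if !ok {
			return errors.New("this sketch only supports string map keys")
		}
		top.key, top.hasKey = k, true
	default:
		top.m[top.key] = v
		top.hasKey = false
	}
	return nil
}
```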
type TokenKind ¶
type TokenKind uint8
const (
TokenKind_MapOpen TokenKind = '{'
TokenKind_MapClose TokenKind = '}'
TokenKind_ListOpen TokenKind = '['
TokenKind_ListClose TokenKind = ']'
TokenKind_Null TokenKind = '0'
TokenKind_Bool TokenKind = 'b'
TokenKind_Int TokenKind = 'i'
TokenKind_Float TokenKind = 'f'
TokenKind_String TokenKind = 's'
TokenKind_Bytes TokenKind = 'x'
TokenKind_Link TokenKind = '/'
)
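Each TokenKind is a printable ASCII byte, so a kind can double as a readable one-character label. The sketch below redeclares the constants locally. The `String` method and the `closerFor` and `isScalar` helpers are hypothetical conveniences, not part of the documented API.

```go
package main

// TokenKind and its constants are redeclared locally for this sketch.
type TokenKind uint8

const (
	TokenKind_MapOpen   TokenKind = '{'
	TokenKind_MapClose  TokenKind = '}'
	TokenKind_ListOpen  TokenKind = '['
	TokenKind_ListClose TokenKind = ']'
	TokenKind_Null      TokenKind = '0'
	TokenKind_Bool      TokenKind = 'b'
	TokenKind_Int       TokenKind = 'i'
	TokenKind_Float     TokenKind = 'f'
	TokenKind_String    TokenKind = 's'
	TokenKind_Bytes     TokenKind = 'x'
	TokenKind_Link      TokenKind = '/'
)

// String renders a kind as its one-character mnemonic.
func (k TokenKind) String() string { return string(rune(k)) }

// closerFor maps an open kind to the kind that closes it.
func closerFor(k TokenKind) (TokenKind, bool) {
	switch k {
	case TokenKind_MapOpen:
		return TokenKind_MapClose, true
	case TokenKind_ListOpen:
		return TokenKind_ListClose, true
	}
	return 0, false
}

// isScalar reports whether a kind is a complete value on its own.
func isScalar(k TokenKind) bool {
	switch k {
	case TokenKind_Null, TokenKind_Bool, TokenKind_Int, TokenKind_Float,
		TokenKind_String, TokenKind_Bytes, TokenKind_Link:
		return true
	}
	return false
}
```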
type TokenReader ¶
A TokenReader can be produced from any datamodel.Node using NodeTokenizer. TokenReaders are also commonly implemented by codec packages, where they're created over a serial data stream and tokenize that stream when pumped.
TokenReader implementations are encouraged to yield the same token pointer repeatedly, just varying the contents of the value, in order to avoid unnecessary allocations.
A 'budget' parameter must be provided to a TokenReader as a pointer to an integer. The TokenReader should limit how much memory it uses according to the remaining budget. (The budget is roughly in units of bytes, but can be treated as an approximation.) The caller of the TokenReader should do most of the budget management: for example, after the TokenReader returns a 20-byte string, the caller should decrement the budget by 20. A TokenReader may also make its own decrements to the budget if some operations are particularly costly and it wants this cost to be accounted for. The budget may be ignored if the TokenReader is just yielding access to information already in memory; the main intent of the budget is to avoid resource exhaustion when bringing new data into program memory.
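The division of budget responsibilities described above can be sketched as follows. The `tokenReader` type is an assumed shape, because this page does not show the TokenReader declaration. `newStringReader` and `drain` are hypothetical. The flat per-token charge inside the reader only stands in for a reader accounting for its own costly work.

```go
package main

import (
	"errors"
	"io"
)

// Hypothetical stand-ins for the codectools kinds and Token.
type TokenKind uint8

const (
	TokenKind_String TokenKind = 's'
	TokenKind_Bytes  TokenKind = 'x'
)

type Token struct {
	Kind  TokenKind
	Str   string
	Bytes []byte
}

// tokenReader is an assumed shape: the budget arrives as a pointer so both
// the caller and the reader can decrement it.
type tokenReader func(budget *int64) (*Token, error)

var errBudgetExhausted = errors.New("budget exhausted")

// newStringReader yields string tokens from memory, reusing one Token. It
// charges a flat per-token cost as an example of reader-side accounting.
func newStringReader(items []string) tokenReader {
	var tk Token
	i := 0
	return func(budget *int64) (*Token, error) {
		if i >= len(items) {
			return nil, io.EOF
		}
		const perTokenCost = 1
		*budget -= perTokenCost
		tk = Token{Kind: TokenKind_String, Str: items[i]}
		i++
		return &tk, nil
	}
}

// drain is the caller side: after each token it subtracts the payload
// size and stops once the budget goes negative.
func drain(r tokenReader, budget int64) (int, error) {
	count := 0
	for {
		tk, err := r(&budget)
		if err == io.EOF {
			return count, nil
		}
		if err != nil {
			return count, err
		}
		switch tk.Kind {
		case TokenKind_String:
			budget -= int64(len(tk.Str))
		case TokenKind_Bytes:
			budget -= int64(len(tk.Bytes))
		}
		if budget < 0 {
			return count, errBudgetExhausted
		}
		count++
	}
}
```

With a budget of 10 and the strings "hello" and "world", the first token costs 1 (reader) plus 5 (caller), leaving 4. The second token then drives the budget negative, so `drain` stops after one token.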