codectools

package
v0.12.2
Published: Sep 8, 2021 License: MIT Imports: 6 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var TokenWalkSkip = errors.New("token walk: skip")

Functions

func StringifyTokenSequence

func StringifyTokenSequence(seq []Token) string

StringifyTokenSequence is a utility function that is often handy for testing. (Diffing strings of tokens gives very good reports for minimal effort.)

func TokenAssemble

func TokenAssemble(na datamodel.NodeAssembler, tr TokenReader, budget int64) error

TokenAssemble takes a datamodel.NodeAssembler and a TokenReader, and repeatedly pumps the TokenReader for tokens and feeds their data into the datamodel.NodeAssembler until it finishes a complete value.

To compare and contrast with other token-oriented tools: TokenAssemble transfers information in the same direction as the TokenAssembler gadget, but TokenAssemble moves completely through a value in one step, whereas the TokenAssembler accepts tokens pumped into it one at a time.

TokenAssemble does not enforce the "map keys must be strings" rule which is present in the Data Model; it will also accept recursive structures in map keys, so it can be used when handling schema values such as maps with complex keys.
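The "one step" behavior described above can be illustrated with a minimal sketch. It does not use this package: `token`, `reader`, `fromSlice`, and `assemble` are hypothetical local stand-ins, the reader omits the budget parameter, and the output is plain Go values rather than a datamodel.NodeAssembler. Only strings and lists are handled.

```go
package main

import (
	"errors"
	"fmt"
)

// token is a local stand-in for Token, reduced to two fields.
type token struct {
	Kind byte   // '[' list open, ']' list close, 's' string
	Str  string // only meaningful when Kind == 's'
}

// reader is a simplified stand-in for TokenReader (no budget parameter).
type reader func() (*token, error)

// fromSlice returns a reader that yields the given tokens in order.
func fromSlice(tokens []token) reader {
	i := 0
	return func() (*token, error) {
		if i >= len(tokens) {
			return nil, errors.New("unexpected end of tokens")
		}
		tk := &tokens[i]
		i++
		return tk, nil
	}
}

// assemble pumps r until exactly one complete value has been built.
func assemble(r reader) (any, error) {
	tk, err := r()
	if err != nil {
		return nil, err
	}
	return assembleFrom(tk, r)
}

// assembleFrom builds the value that starts at tk, reading more tokens as needed.
func assembleFrom(tk *token, r reader) (any, error) {
	switch tk.Kind {
	case 's':
		return tk.Str, nil
	case '[':
		list := []any{}
		for {
			next, err := r()
			if err != nil {
				return nil, err
			}
			if next.Kind == ']' {
				return list, nil
			}
			child, err := assembleFrom(next, r)
			if err != nil {
				return nil, err
			}
			list = append(list, child)
		}
	default:
		return nil, fmt.Errorf("unexpected token kind %q", tk.Kind)
	}
}

func main() {
	r := fromSlice([]token{{Kind: '['}, {Kind: 's', Str: "a"}, {Kind: '['}, {Kind: ']'}, {Kind: ']'}})
	fmt.Println(assemble(r))
}
```

A step-at-a-time consumer in the style of TokenAssembler would instead keep the nesting state between calls, so that the caller controls when each token is supplied.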

func TokenWalk

func TokenWalk(n datamodel.Node, visitFn func(tk *Token) error) error

TokenWalk walks an IPLD Node and calls visitFn once for every "token" yielded by the walk. Every map and list is yielded as one token at its beginning and another token when it is finished; every scalar value (strings, bools, bytes, ints, etc.) is yielded as a single token.

The token pointer given to the visitFn will be identical on every call, but the data it contains will vary. The token may contain invalid data that is leftover from previous calls in some of its union fields; correct behavior requires looking at the token's Kind field before handling any of its other fields.

If any error is returned by the visitFn, it will cause the walk to halt, and TokenWalk will return that error. However, if the error is the value TokenWalkSkip, and it was returned when visitFn was called with a MapOpen or ListOpen token, the walk will skip forward over that entire map or list, and continue (with the next token being the close token that complements the open token). Returning TokenWalkSkip when the token was any of the scalar kinds (i.e. anything other than a MapOpen or a ListOpen) has no effect.
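The skip and token-reuse rules above can be sketched without this package. Everything in the block is a hypothetical local stand-in: `token` for Token, `errSkip` for TokenWalkSkip, and `walk` for TokenWalk, operating on plain Go strings and `[]any` lists instead of datamodel.Node values.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// token is a local stand-in for Token, reduced to two fields.
type token struct {
	Kind byte   // '[' list open, ']' list close, 's' string
	Str  string // only meaningful when Kind == 's'; may hold stale data otherwise
}

// errSkip plays the role of TokenWalkSkip in this sketch.
var errSkip = errors.New("walk: skip")

// walk visits v (a string, or a []any of such values), reusing tk for every call.
func walk(v any, tk *token, visit func(*token) error) error {
	switch x := v.(type) {
	case string:
		tk.Kind, tk.Str = 's', x
		// A skip returned for a scalar has no effect.
		if err := visit(tk); err != nil && !errors.Is(err, errSkip) {
			return err
		}
		return nil
	case []any:
		tk.Kind = '[' // tk.Str is left untouched and may be stale here
		err := visit(tk)
		if err != nil && !errors.Is(err, errSkip) {
			return err
		}
		if err == nil { // descend only when the open token was not skipped
			for _, child := range x {
				if err := walk(child, tk, visit); err != nil {
					return err
				}
			}
		}
		tk.Kind = ']' // the close token is yielded even after a skip
		if err := visit(tk); err != nil && !errors.Is(err, errSkip) {
			return err
		}
		return nil
	default:
		return fmt.Errorf("unsupported value %T", v)
	}
}

// trace renders the walk as a string and skips the contents of nested lists.
func trace(v any) (string, error) {
	var sb strings.Builder
	depth := 0
	err := walk(v, &token{}, func(tk *token) error {
		switch tk.Kind { // check Kind before reading any other field
		case '[':
			sb.WriteByte('[')
			depth++
			if depth > 1 {
				return errSkip
			}
		case ']':
			depth--
			sb.WriteByte(']')
		case 's':
			sb.WriteString(tk.Str)
		}
		return nil
	})
	return sb.String(), err
}

func main() {
	out, err := trace([]any{"a", []any{"b", "c"}, "d"})
	fmt.Println(out, err) // prints "[a[]d] <nil>"
}
```

The inner list still contributes its open and close tokens, but "b" and "c" are never visited.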

TokenAssembler is the rough dual of TokenWalk.

Types

type ErrMalformedTokenSequence

type ErrMalformedTokenSequence struct {
	Detail string
}

func (ErrMalformedTokenSequence) Error

type NodeTokenizer

type NodeTokenizer struct {
	// contains filtered or unexported fields
}

func (*NodeTokenizer) Initialize

func (nt *NodeTokenizer) Initialize(n datamodel.Node)

func (*NodeTokenizer) ReadToken

func (nt *NodeTokenizer) ReadToken() (next *Token, err error)

ReadToken fits the TokenReader functional interface, and so may be used anywhere a TokenReader is required.

type Token

type Token struct {
	Kind TokenKind

	Length int64          // Present for MapOpen or ListOpen.  May be -1 for "unknown" (e.g. a json tokenizer will yield this).
	Bool   bool           // Value.  Union: only has meaning if Kind is TokenKind_Bool.
	Int    int64          // Value.  Union: only has meaning if Kind is TokenKind_Int.
	Float  float64        // Value.  Union: only has meaning if Kind is TokenKind_Float.
	Str    string         // Value.  Union: only has meaning if Kind is TokenKind_String.  ('Str' rather than 'String' to avoid collision with method.)
	Bytes  []byte         // Value.  Union: only has meaning if Kind is TokenKind_Bytes.
	Link   datamodel.Link // Value.  Union: only has meaning if Kind is TokenKind_Link.

	Node datamodel.Node // Direct pointer to the original data, if this token is used to communicate data during a walk of existing in-memory data.  Absent when token is being used during deserialization.
	// contains filtered or unexported fields
}

func (*Token) Normalize

func (tk *Token) Normalize()

Normalize sets any value in the token to its zero value if it's not applicable for the token's kind. E.g., if the token kind is string, the float, bytes, and other value fields are all zeroed. Path and offset progress information is left unmodified. This is sometimes helpful in writing test fixtures and equality assertions.
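To show why this helps equality assertions, here is a minimal sketch using a hypothetical local `token` type with three union fields and a `normalize` method written for illustration. It is not the package's implementation and omits the remaining fields.

```go
package main

import "fmt"

// token is a local stand-in for Token with a reduced set of union fields.
type token struct {
	Kind byte // 'b' bool, 'i' int, 's' string
	Bool bool
	Int  int64
	Str  string
}

// normalize zeroes every value field that does not apply to tk.Kind.
func (tk *token) normalize() {
	switch tk.Kind {
	case 'b':
		tk.Int, tk.Str = 0, ""
	case 'i':
		tk.Bool, tk.Str = false, ""
	case 's':
		tk.Bool, tk.Int = false, 0
	default:
		tk.Bool, tk.Int, tk.Str = false, 0, ""
	}
}

func main() {
	// got simulates a reused token: Str is left over from an earlier string token.
	got := token{Kind: 'i', Int: 7, Str: "stale"}
	want := token{Kind: 'i', Int: 7}

	fmt.Println(got == want) // false: the stale field breaks naive equality
	got.normalize()
	fmt.Println(got == want) // true
}
```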

func (Token) String

func (tk Token) String() string

type TokenAssembler

type TokenAssembler struct {
	// contains filtered or unexported fields
}

func (*TokenAssembler) Initialize

func (ta *TokenAssembler) Initialize(na datamodel.NodeAssembler, budget int64)

func (*TokenAssembler) Process

func (ta *TokenAssembler) Process(tk *Token) (err error)

Process takes a Token pointer as an argument. (Notice how this function happens to match the definition of the visitFn that's usable as an argument to TokenWalk.) The token argument can be understood to be "borrowed" for the duration of the Process call, but will not be mutated. The use of a pointer here is so that a single Token can be reused by multiple calls, avoiding unnecessary allocations.

Note that Process does very little sanity checking of token sequences itself, mostly handing information to the NodeAssemblers directly, which presumably will reject the data if it is out of line. The NodeAssembler this TokenAssembler is wrapping should already be enforcing the relevant logical rules, so it is not useful for TokenAssembler.Process to attempt to duplicate those checks; TokenAssembler.Process will also return any errors from the NodeAssembler without attempting to enforce a pattern on those errors. In particular, TokenAssembler.Process does not check if every MapOpen is paired with a MapClose; it does not check if every ListOpen is paired with a ListClose; and it does not check if the token stream is continuing after all open recursives have been closed. TODO: review this documentation; more of these checks turn out necessary anyway than originally expected.
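Callers that need the pairing checks listed above could validate the sequence themselves. The following is a minimal sketch under assumptions: `checkSequence` and `malformed` are hypothetical names (the latter modeled on ErrMalformedTokenSequence), and token kinds are passed as a string of the same characters used by the TokenKind constants. It checks open/close pairing and trailing tokens only, not map key/value alternation.

```go
package main

import "fmt"

// malformed is a local stand-in modeled on ErrMalformedTokenSequence.
type malformed struct{ Detail string }

func (e malformed) Error() string { return "malformed token sequence: " + e.Detail }

// checkSequence verifies that kinds describes exactly one complete value.
func checkSequence(kinds string) error {
	if len(kinds) == 0 {
		return malformed{"no tokens"}
	}
	var stack []byte // open tokens still awaiting their close
	for i := 0; i < len(kinds); i++ {
		k := kinds[i]
		// An empty stack after the first token means a value was already completed.
		if i > 0 && len(stack) == 0 {
			return malformed{fmt.Sprintf("token %q at index %d follows a complete value", k, i)}
		}
		switch k {
		case '{', '[':
			stack = append(stack, k)
		case '}', ']':
			want := byte('{')
			if k == ']' {
				want = '['
			}
			if len(stack) == 0 || stack[len(stack)-1] != want {
				return malformed{fmt.Sprintf("unexpected close token %q at index %d", k, i)}
			}
			stack = stack[:len(stack)-1]
		}
	}
	if len(stack) > 0 {
		return malformed{fmt.Sprintf("%d open token(s) never closed", len(stack))}
	}
	return nil
}

func main() {
	for _, seq := range []string{"[s{si}]", "[s", "[}", "[]s"} {
		fmt.Printf("%q: %v\n", seq, checkSequence(seq))
	}
}
```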

type TokenKind

type TokenKind uint8
const (
	TokenKind_MapOpen   TokenKind = '{'
	TokenKind_MapClose  TokenKind = '}'
	TokenKind_ListOpen  TokenKind = '['
	TokenKind_ListClose TokenKind = ']'
	TokenKind_Null      TokenKind = '0'
	TokenKind_Bool      TokenKind = 'b'
	TokenKind_Int       TokenKind = 'i'
	TokenKind_Float     TokenKind = 'f'
	TokenKind_String    TokenKind = 's'
	TokenKind_Bytes     TokenKind = 'x'
	TokenKind_Link      TokenKind = '/'
)

type TokenReader

type TokenReader func(budget *int64) (next *Token, err error)

A TokenReader can be produced from any datamodel.Node using NodeTokenizer. TokenReaders are also commonly implemented by codec packages, wherein they're created over a serial data stream and tokenize that stream when pumped.

TokenReader implementations are encouraged to yield the same token pointer repeatedly, just varying the contents of the value, in order to avoid unnecessary allocations.

A 'budget' parameter must be provided to a TokenReader as a pointer to an integer. The TokenReader should limit how much memory it uses according to the budget remaining. (The budget is considered to be roughly in units of bytes, but can be treated as an approximation.) The budget should primarily be managed by the caller of the TokenReader (e.g., after the TokenReader returns a 20 byte string, the caller should decrement the budget by 20), but a TokenReader may also do its own decrements to the budget if some operations are particularly costly and the TokenReader wants this to be accounted for. The budget may be ignored if the TokenReader is just yielding access to already in-memory information; the main intent of the budget is to avoid resource exhaustion when bringing new data into program memory.
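The caller-side accounting described above might look like the following sketch. It uses hypothetical local stand-ins rather than this package: `token` and `reader` mirror the shapes of Token and TokenReader, `errBudgetExceeded` is an invented error, and the use of io.EOF to signal the end of input is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// token is a local stand-in for Token, reduced to two fields.
type token struct {
	Kind byte   // 's' string
	Str  string // only meaningful when Kind == 's'
}

// reader mirrors the shape of TokenReader using the local token type.
type reader func(budget *int64) (*token, error)

// errBudgetExceeded is a hypothetical error for this sketch.
var errBudgetExceeded = errors.New("budget exceeded")

// sliceReader yields each string as a token, reusing one token to avoid allocations.
// It ignores the budget because the data is already in memory.
func sliceReader(values []string) reader {
	tk := &token{}
	i := 0
	return func(budget *int64) (*token, error) {
		if i >= len(values) {
			return nil, io.EOF
		}
		tk.Kind, tk.Str = 's', values[i]
		i++
		return tk, nil
	}
}

// drain pumps r until io.EOF, charging each string's length against the budget.
// It returns the number of tokens accepted before stopping.
func drain(r reader, budget int64) (int, error) {
	n := 0
	for {
		tk, err := r(&budget)
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		if tk.Kind == 's' {
			budget -= int64(len(tk.Str)) // the caller does the accounting
		}
		if budget < 0 {
			return n, errBudgetExceeded
		}
		n++
	}
}

func main() {
	fmt.Println(drain(sliceReader([]string{"hello", "world"}), 10)) // prints "2 <nil>"
	fmt.Println(drain(sliceReader([]string{"hello", "world"}), 7))  // prints "1 budget exceeded"
}
```

Passing the budget by pointer lets a reader that decodes from a stream subtract its own costs from the same counter the caller is tracking.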
