tell

package module
v0.9.2
Published: Jul 20, 2024 License: MIT Imports: 12 Imported by: 1

README

Tell

A yaml-like text format with json-ish values.

Tell: "A yaml-like text format."

# Does this look suspiciously like the yaml overview?
# I have no idea how that could have happened.
What It Is: "A way of describing data containing string, number, and boolean values, 
   as well as collections of those values. As in yaml, collections can be 
   both key-value mappings, and sequences."

What It Is Not: "A subset of yaml."

Can Contain: 
    - "Some javascript-ish values"
    - # with c, go, python, etc. style escape codes.
      [ 5, 2.3, 1e-3, 0x20, "\n", "🐈", "\U0001f408" ]

Related Projects:
  - "YAML"       # https://yaml.org/
  - "JSON"       # http://json.org/
  - "NestedText" # https://nestedtext.org/

Some differences from yaml:

  • Supports a single unified UTF-8 document.
  • String literals must be quoted ( as in json. )
  • Boolean values are only true or false ( as in json. )
  • Multiline string blocks use a custom heredoc syntax.
  • No flow style ( although inline arrays are supported. )
  • No anchors or references.
  • Comments can be captured during decoding, and returned as part of the data.

It isn't intended to be a subset of yaml, but it tries to be close enough™ to leverage existing yaml syntax highlighting and validation.

Status

The go implementation successfully reads and writes well-formed documents.

Missing features

  • serialization of structs not supported.
  • arrays should (probably) support nested arrays.
  • arrays should (probably) handle trailing comments.
  • error reporting could use improvement.

see also the issues page.

Usage


// Read a tell document.
func ExampleUnmarshal() {
	var out any
	const msg = `- Hello: "\U0001F30F"`
	if e := tell.Unmarshal([]byte(msg), &out); e != nil {
		panic(e)
	} else {
		fmt.Printf("%#v", out)
	}
	// Output:
	// []interface {}{map[string]interface {}{"Hello:":"🌏"}}
}

// Write a tell document.
func ExampleMarshal() {
	m := map[string]any{
		"Tell":           "A yaml-like text format.",
		"What It Is":     "A way of describing data...",
		"What It Is Not": "A subset of yaml.",
	}
	if out, e := tell.Marshal(m); e != nil {
		panic(e)
	} else {
		fmt.Println(string(out))
	}
	// Output:
	// Tell: "A yaml-like text format."
	// What It Is: "A way of describing data..."
	// What It Is Not: "A subset of yaml."
}

// slightly lower level usage:
func ExampleDocument() {
	str := `true` // some tell document
	// maps/imap contains a slice based ordered map implementation.
	// maps/stdmap generates standard (unordered) go maps.
	// maps/orderedmap uses Ian Coleman's ordered map.
	// ( https://github.com/iancoleman/orderedmap ) 
	doc := decode.NewDocument(imap.Make, notes.DiscardComments())
	// ReadDoc takes a string reader
	if res, e := doc.ReadDoc(strings.NewReader(str)); e != nil {
		panic(e)
	} else {
		fmt.Println(res)
	}
	// Output: true
}

Description

Tell consists of collections of values, along with optional comments. These types are described below.

Collections

  • Document: a collection containing a single value.
  • Sequences: aka lists: an ordered series of one or more values.
  • Mappings: aka ordered dictionaries: relates keys to values.

The individual elements of a sequence, and the pairs of key-values in a mapping, are called the "terms" of the collection.

Documents

Documents are most often text files. UTF8, no byte order marks.

Whitespace is restricted to the ascii space ( 0x20 ) and the ascii linefeed ( 0xa ). The exception is quoted strings which additionally allow horizontal tabs ( ascii 0x9. ) All other control codes are disallowed ( and, so cr/lf is considered an error. )
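
The whitespace rule can be sketched as a small predicate. This is only an illustration of the paragraph above, not tell's actual code: the name allowedRune is hypothetical, and "control codes" is assumed to mean the ascii range below 0x20.

```go
package main

import "fmt"

// allowedRune is a hypothetical helper: it reports whether r may appear
// in a document, given whether the reader is currently inside a quoted string.
// Assumption: "control codes" means ascii values below 0x20.
func allowedRune(r rune, inQuotes bool) bool {
	switch {
	case r == ' ' || r == '\n':
		return true // the only whitespace allowed everywhere
	case r == '\t':
		return inQuotes // horizontal tabs are only allowed in quoted strings
	case r < 0x20:
		return false // every other control code, including carriage return
	default:
		return true
	}
}

func main() {
	fmt.Println("tab outside quotes:", allowedRune('\t', false))
	fmt.Println("tab inside quotes: ", allowedRune('\t', true))
	fmt.Println("carriage return:   ", allowedRune('\r', true))
}
```

Because a carriage return is rejected everywhere, a file saved with cr/lf line endings fails on its first line break.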

TBD: should comments allow horizontal tabs?

Values

Any scalar, array, sequence, mapping, or heredoc.

Scalars

  • bool: true, or false.
  • raw string ( backtick ): `Preserves *all* whitespace. Backslashes are backslashes.`
  • trimmed string ( single quotes ): 'Treats newlines as semantic: folding lines together by injecting a single space. Eats all indentation while still preserving trailing whitespace. Backslashes are backslashes.'
  • interpreted string ( double quotes ): "Treats newlines as semantic: folding lines together by injecting a single space. Eats all indentation while still preserving trailing whitespace. Backslashes indicate escaped characters."
  • number: 64-bit int or float numbers optionally starting with +/-; floats can have exponents [e|E][|+/-]...; hex values can be specified with 0x notation. As per json, Inf and NaN are not supported. ( TBD: may expand to support https://go.dev/ref/spec#Integer_literals )
  • null: There is no null keyword. instead, null is implicit where no explicit value was provided.
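
As a rough illustration of the number rules, here is a sketch built on package strconv. The helper parseNumber is hypothetical and is not how tell reads numbers; strconv is also a little more permissive than the grammar described above, and returning uint64 for hex values is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseNumber is a hypothetical helper, not part of tell.
// It returns uint64 for 0x values, int64 for integers, and float64 otherwise.
func parseNumber(s string) (any, error) {
	if strings.HasPrefix(s, "0x") {
		u, err := strconv.ParseUint(s[2:], 16, 64)
		if err != nil {
			return nil, err
		}
		return u, nil
	}
	// ParseInt accepts an optional leading + or -
	if i, err := strconv.ParseInt(s, 10, 64); err == nil {
		return i, nil
	}
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return nil, err
	}
	// strconv understands "Inf" and "NaN"; as per json, reject them.
	if math.IsInf(f, 0) || math.IsNaN(f) {
		return nil, fmt.Errorf("unsupported number %q", s)
	}
	return f, nil
}

func main() {
	for _, s := range []string{"5", "-2.3", "1e-3", "0x20", "NaN"} {
		v, err := parseNumber(s)
		fmt.Printf("%-5s -> %v (%T) err=%v\n", s, v, v, err)
	}
}
```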

Scalar strings act like their yaml counterparts. They can span lines, and the "trimmed" and "interpreted" strings use semantic newlines. This means linefeeds in the text are treated as a single space. Only a fully blank line is treated as having a newline. As per yaml: all indentation at the start of line is ignored, and ( although i do not like it ) all trailing space is kept. Also like yaml, a single backslash at the end of a line eliminates any space, joining the following line seamlessly.
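
A minimal sketch of semantic newlines, assuming the text between the quotes has already been extracted. The helper foldLines is hypothetical and only mirrors the rules listed above; it assumes each blank line contributes one newline, and that a line holding only spaces counts as blank.

```go
package main

import (
	"fmt"
	"strings"
)

// foldLines is a hypothetical helper illustrating semantic newlines;
// it is not tell's implementation.
func foldLines(text string) string {
	var out strings.Builder
	blanks := 0     // blank lines seen since the last line of text
	joined := false // true when the previous line ended with a backslash
	first := true
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimLeft(line, " ") // leading indentation is ignored
		if line == "" {
			blanks++
			continue
		}
		switch {
		case first:
			// nothing precedes the first line of text
		case blanks > 0:
			out.WriteString(strings.Repeat("\n", blanks))
		case !joined:
			out.WriteByte(' ') // a linefeed folds into a single space
		}
		joined = strings.HasSuffix(line, `\`)
		out.WriteString(strings.TrimSuffix(line, `\`)) // trailing spaces are kept
		first, blanks = false, 0
	}
	return out.String()
}

func main() {
	fmt.Printf("%q\n", foldLines("hello\n    world"))
	fmt.Printf("%q\n", foldLines("first paragraph\n\nsecond paragraph"))
	fmt.Printf("%q\n", foldLines("joined, \\\n  seamlessly"))
}
```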

The "raw string" type does not exist in yaml. It acts like the Go raw string. It preserves all whitespace in between the opening tick and the closing tick exactly as is.

Escaping: Backslashes in interpreted strings can precede certain characters to provide special values: a (alert - 0x7), b (backspace - 0x8), f (formfeed - 0xc), n (linefeed - 0xa), r (return - 0xd), t (htab - 0x9), v (vtab - 0xb), \ (backslash - 0x5c), " (doublequote - 0x22), and linefeeds (for joining lines.) For describing explicit unicode points, tell uses the same rules as Go, namely: \x escapes any unprintable ascii chars (bytes less than 128), \u any unprintable code points of less than 3 bytes, and \U for (four?) the rest.
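
Since the escapes follow Go's rules, Go's own strconv.Unquote gives a feel for how single-line interpreted strings decode. This is only an approximation: Unquote rejects raw linefeeds inside the quotes and knows nothing about the backslash-linefeed join, both of which tell allows. The wrapper decodeInterpreted is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeInterpreted is a hypothetical stand-in for decoding a single-line
// interpreted string; lit includes its surrounding double quotes.
func decodeInterpreted(lit string) (string, error) {
	return strconv.Unquote(lit)
}

func main() {
	for _, lit := range []string{`"\n"`, `"\x41"`, `"\u00e9"`, `"\U0001f408"`, `"say \"hi\""`} {
		s, err := decodeInterpreted(lit)
		if err != nil {
			fmt.Println(lit, "error:", err)
			continue
		}
		fmt.Printf("%s -> %q\n", lit, s)
	}
}
```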

TBD: tell could support css hex colors ( ex. #ffffff ) because comments are defined as "hash followed by a space". still thinking about this one....

Arrays

Arrays use a syntax similar to javascript (ex. [1, 2, ,3] ) except that a comma with no explicit value indicates a null value. Arrays cannot contain collections; heredocs in arrays are discouraged. ( TODO: arrays cannot currently contain other arrays, nor can they contain comments. )

Sequences

Sequences define an ordered list of values. Entries in a sequence start with a dash and whitespace separates the value. Additional entries in the same sequence start on the next line with the same indentation as the previous entry.

  - true
  - false

As in yaml, whitespace after a dash can include newlines. And that rule means nested sequences can start inline. For example, - - 5 is equivalent to the json [[5]].
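
In Go terms, and assuming native slices are used for sequences, the nested sequence has the shape below. The sketch builds the value by hand instead of decoding it, and uses encoding/json only to show the json equivalent; asJSON is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// asJSON is a hypothetical helper that renders a value as json text.
func asJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return err.Error()
	}
	return string(b)
}

func main() {
	// the shape a decoder producing native slices might return for: - - 5
	nested := []any{[]any{5}}
	fmt.Println(asJSON(nested))
}
```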

Unlike yaml, if a value is specified on a line following its dash, the value must start on a column at least two spaces to the right of the dash. ( ie. while newlines and spaces are both whitespace, indentation still matters. ) This rule keeps values aligned.

  - "here"
  - 
    "there"

Mappings

Mappings relate keys to values in an ordered fashion.

Keys for mappings are defined using signatures: a series of one or more words, separated by colons, ending with a colon and whitespace. For example: Hello:there: . The first character of each word must be a (unicode) letter; subsequent characters can include letters, digits, and underscores ( TBD: this is somewhat arbitrary; what does yaml do? )

For the same reason that nested sequences can appear inline, mappings can. However, yaml doesn't allow this and it's probably bad style. For example: Key: Nested: "some value" is equivalent to the json {"Key:": {"Nested:": "some value"}}. Like sequences, if the value of a mapping appears on a following line, two spaces of indentation are required.

Note: Tapestry wants those trailing colons. In this implementation the interpretation of key: is therefore "key:" not "key". This feels like an implementation detail, and could be exposed as an option.
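
Until such an option exists, a caller who wants plain keys could strip the colons after decoding. A minimal sketch, assuming the decoded data uses native maps and slices; stripColons is a hypothetical helper, not part of tell.

```go
package main

import (
	"fmt"
	"strings"
)

// stripColons is a hypothetical helper: it returns a copy of v in which
// every mapping key has lost its single trailing colon.
func stripColons(v any) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, el := range t {
			out[strings.TrimSuffix(k, ":")] = stripColons(el)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, el := range t {
			out[i] = stripColons(el)
		}
		return out
	default:
		return v // scalars are returned unchanged
	}
}

func main() {
	// the same shape as the output of the Unmarshal example
	decoded := []any{map[string]any{"Hello:": "🌏"}}
	fmt.Printf("%#v\n", stripColons(decoded))
}
```

Only the final colon is removed, so a multi-word signature such as Hello:there: becomes Hello:there.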

Heredocs

Heredocs exist both to capture newlines, and to control the leading indentation of strings. They can appear anywhere a scalar string can, except not within inline arrays. Unlike the scalar strings: newlines are interpreted as actual newlines. Indentation is controlled by the indentation of the closing quotes ( or closing tag. )

There are three heredoc types, one for each scalar string type:

  1. raw (```) using triple backticks. Backslashes are backslashes; the final newline of the heredoc is preserved.
  2. trimmed (''') using triple single quotes. Backslashes are backslashes; the final newline gets eaten.
  3. interpreted (""") using triple double quotes. Backslashes follow the same rules as interpreted scalar strings. Quotes don't need to be escaped in heredocs (\") but can be. The final newline is preserved by default, but a backslash on the end of the final line can eat it.

The position of the closing heredoc tag controls the overall indentation. Any text to the left of the closing tag is an error. All three kinds can define a custom tag.

  - """
        i am an interpreted heredoc.
          this line has two extra spaces in front.
        lines are not automatically folded together.
        but this line ends with a backslash, \
        so it folds seamless into this line.
        the newline following this line is preserved.
        """

  - ```<<<END
    i am a raw heredoc with a custom closing tag.
    all three heredoc types support custom closing tags.
    raw strings preserve whitespace, including the newline after this line.
    END
    
  - '''
    this here is a trimmed doc. 
    backslashes \ are backslashes.
    trimmed heredocs eat the final newline.
    '''
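
The indentation rule can be sketched as follows. The helper dedent is hypothetical: it takes the heredoc's lines with the closing tag as the last one, always keeps the final newline (as the raw and interpreted types do), and treats blank lines as empty. It does not handle backslashes.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// dedent is a hypothetical helper, not tell's implementation.
// The last element of lines is the closing tag; its column sets the indentation.
func dedent(lines []string) (string, error) {
	if len(lines) == 0 {
		return "", errors.New("missing closing tag")
	}
	closing := lines[len(lines)-1]
	indent := len(closing) - len(strings.TrimLeft(closing, " "))
	var out strings.Builder
	for _, line := range lines[:len(lines)-1] {
		if strings.TrimSpace(line) == "" {
			out.WriteByte('\n') // blank lines have no indentation to check
			continue
		}
		if len(line) < indent || strings.Trim(line[:indent], " ") != "" {
			return "", fmt.Errorf("text to the left of the closing tag: %q", line)
		}
		out.WriteString(line[indent:] + "\n")
	}
	return out.String(), nil
}

func main() {
	text, err := dedent([]string{
		`    i am a heredoc.`,
		`      this line has two extra spaces in front.`,
		`    """`,
	})
	fmt.Printf("%q %v\n", text, err)
}
```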

Custom end tags

I like the way markdown allows syntax coloring for block quotes if there's a filetype after the quotes. ( for example: ```go ) Many implementations also nicely ignore any text after the filetype, and so any opening like ```go something something, even if not technically legal, still works okay.

With that in mind, tell uses a redirection marker (<<<) to define a custom end tag. ( Triple to match the quotes. ) The redirection allows an author to still include a filetype, or not. For example: ```go <<<END. Or, if a filetype isn't desired, just: ```<<<END.
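
A sketch of how an opening line might be split into its parts. The names opening, quotes, and parseOpening are hypothetical, and the sketch assumes everything between the quotes and the redirection marker is the filetype text.

```go
package main

import (
	"fmt"
	"strings"
)

// opening describes the first line of a heredoc ( hypothetical type. )
type opening struct {
	Quote, Filetype, EndTag string
}

// the three heredoc quote styles: raw, trimmed, interpreted.
var quotes = []string{strings.Repeat("`", 3), "'''", `"""`}

// parseOpening is a hypothetical helper, not tell's implementation.
func parseOpening(line string) (opening, bool) {
	for _, q := range quotes {
		if !strings.HasPrefix(line, q) {
			continue
		}
		filetype, tag, custom := strings.Cut(line[len(q):], "<<<")
		if !custom {
			tag = q // no redirection: the heredoc closes with its own quotes
		}
		return opening{q, strings.TrimSpace(filetype), strings.TrimSpace(tag)}, true
	}
	return opening{}, false
}

func main() {
	for _, line := range []string{quotes[0] + "go <<<END", quotes[0] + "<<<END", `"""`} {
		o, ok := parseOpening(line)
		fmt.Printf("%-14s filetype=%q end=%q ok=%v\n", line, o.Filetype, o.EndTag, ok)
	}
}
```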

Maybe in some far off distant age, tell-aware syntax coloring could display the heredoc with fancy colors.

Yaml compatibility

Because tell relies on existing yaml syntax validation ( and color schemes ), there is one additional heredoc type provided for compatibility. It opens with the yaml pipe (|), but ends with one of the tell triple quotes.

  - |
        i am a heredoc starting with a pipe (|) for compatibility.
        if i end with double quotes, then backslashes are interpreted \
        otherwise, they are not. raw and trimmed strings are also still the same.
        and, indentation is controlled by the position of the closing quotes.
        ( just like all heredocs. )
        """

It looks a bit odd, but it allows yaml validation to succeed. ( Until tell conquers the world or something, and has validators and syntax highlighting in all the best editors. Then the pipe syntax can be deprecated. )

None of the "chomping" indicators are allowed here ( the triple quote styles subsume that functionality. ) And, neither is the "folded style" ( since that's equivalent to the scalar string functionality. ) Custom tags aren't allowed either, unfortunately, because nothing is allowed to follow the pipe.

Why pipe? In yaml, the pipe preserves newlines, and its default chomping also preserves the final newline. That's enough context so that -- if this was yaml -- the resulting string could still be evaluated to produce a tell-like result.

Comments

Hate me forever, comments are significant, they must follow the indentation rules of the document, and -- in this implementation -- they can be accessed directly as part of the data.

Similar to yaml, tell comments begin with the # hash, but must be followed by a space. They continue to the end of their line. Comments cannot appear within a scalar.

When comments are preserved, collections are one-indexed. This means no special types are needed to store tell data: only native go maps and slices. Different implementations could handle this in other ways. The basic point is that comments are both well-defined and easily accessible.

Rationale: Comments are a good mechanism for communicating human intent. In Tapestry, story files can be edited by hand, visually edited using blockly, or even extracted for documentation. Therefore, it's important to preserve an author's comments across different transformations. ( This was one of the motivations for creating tell. )

The readme in package note gets into all the specifics.

Version History

0.9.1 -> 0.9.2: improved error reporting for package decoder.

0.9.0 -> 0.9.1: improved error reporting for package charm.

0.8.1 -> 0.9.0: changes string folding; adds new string types.

  • scalar strings can now span lines. they follow the same rules as yaml's strings.
  • heredocs no longer fold lines since that's the role of the scalar strings.
  • for yaml compatibility, heredocs can optionally start with a pipe (|).
  • bug fix for the charm utility function ParseEof() ( affected tapestry, but not tell. )
  • fixes for various staticcheck warnings

0.8.0 -> 0.8.1:

  • catch tabs in whitespace
  • bug fix: report better errors when unable to decode a mistyped boolean literal (ex. truex )

0.7.0 -> 0.8.0:

  • Changes the encoder's interface to support customizing the comment style of mappings and sequences independently.
  • bug fix: when specifying map values: allow sequences to start at the same indentation as the key and allow a new map term to start after the sequence ends. ( previously, it generated an error, and an additional indentation was required. ) For example:
  - First:  # the value of First is a sequence containing "yes"
    - "yes" 
    Second: # Second is an additional entry in the same map as First
    - "okay" 
  • bug fix: for all other values, an indentation greater than the key is required. For example:
  First:
  "this is an error."

0.6 -> 0.7.0:

  • replace comment raw string buffer usage with an opaque object ( to make any future changes more friendly )

0.5 -> 0.6:

  • bug fixes, and re-encoding of comments

0.4 -> 0.5:

  • simplify comment handling

0.3 -> 0.4:

  • adopt the golang (package strconv) rules for escaping strings.
  • simplify the attribution of comments in the space between a key (or dash) and its value.
  • change the decoder api to support custom sequences, mirroring custom maps; package 'maps' is now more generically package 'collect'.
  • encoding/decoding heredocs for multiline strings
  • encoding/decoding of arrays; ( encoding will write empty collections as arrays; future: a heuristic to determine what should be encoded as an array, vs. sequence. )
  • the original idea for arrays was to use a bare comma full-stop format. switched to square brackets because they are easier to decode, they can support nesting, and are going to be more familiar to most users. ( plus, full stop (.) is tiny and easy to miss when looking at documents. )

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Marshal

func Marshal(v any) (ret []byte, err error)

Marshal returns a tell document representing the passed value.

It traverses the passed type recursively to produce tell data. If a value implements encode.Mapper or encode.Sequencer, Marshal will use their iterators to serialize their contents.

Otherwise, Marshal() uses the following rules:

Boolean values are encoded as either 'true' or 'false'.

Integer and floating point values are encoded as per go's strconv.FormatInt, strconv.FormatUint, strconv.FormatFloat except int16 and uint16 are encoded as hex values starting with '0x'. NaN, infinities, and complex numbers will return an error.

Strings are encoded as per strconv.Quote

Arrays and slice values are encoded as tell sequences. []byte is not handled in any special way. ( fix? )

Maps with string keys are encoded as tell mappings; sorted by string. other key types return an error.

Pointers and interface values are encoded in place as the value they represent. Cyclic data is not handled and will never return. ( fix? )

Any other types will error ( ie. functions, channels, and structs )

All documents end with a newline.
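
Two of these rules, sorted keys and strconv.Quote for strings, can be sketched for a flat map of strings. The helper encodeFlat is hypothetical and far simpler than Marshal: it handles no nesting, no comments, and no multiline strings.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// encodeFlat is a hypothetical helper, not tell's encoder: it writes one
// mapping term per line, sorted by key, with each string value quoted.
func encodeFlat(m map[string]string) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var out strings.Builder
	for _, k := range keys {
		out.WriteString(k + ": " + strconv.Quote(m[k]) + "\n")
	}
	return out.String()
}

func main() {
	fmt.Print(encodeFlat(map[string]string{
		"What It Is Not": "A subset of yaml.",
		"Tell":           "A yaml-like text format.",
	}))
}
```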

Example

Write a tell document.

package main

import (
	"fmt"

	"github.com/ionous/tell"
)

func main() {
	m := map[string]any{
		"Tell":           "A yaml-like text format.",
		"What It Is":     "A way of describing data...",
		"What It Is Not": "A subset of yaml.",
	}
	if out, e := tell.Marshal(m); e != nil {
		panic(e)
	} else {
		fmt.Println(string(out))
	}
}

Output:

Tell: "A yaml-like text format."
What It Is: "A way of describing data..."
What It Is Not: "A subset of yaml."

func Unmarshal

func Unmarshal(in []byte, pv any) (err error)

Unmarshal from a tell formatted document and store the result into the value pointed to by pv.

Permissible values include: bool, floating point, signed and unsigned integers, maps and slices.

For more flexibility, see package decode

Example

Read a tell document.

package main

import (
	"fmt"

	"github.com/ionous/tell"
)

func main() {
	var out any
	const msg = `- Hello: "\U0001F30F"`
	if e := tell.Unmarshal([]byte(msg), &out); e != nil {
		panic(e)
	} else {
		fmt.Printf("%#v", out)
	}
}

Output:

[]interface {}{map[string]interface {}{"Hello:":"🌏"}}

Types

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder - follows the pattern of encoding/json

func NewDecoder

func NewDecoder(src io.Reader) *Decoder

NewDecoder -

func (*Decoder) Decode

func (dec *Decoder) Decode(pv any) (err error)

read a tell document from the stream configured in NewDecoder, and store the result at the value pointed to by pv.

func (*Decoder) SetMapper

func (d *Decoder) SetMapper(maps collect.MapFactory)

control the creation of mappings for the upcoming Decode. the default is to create native maps ( via stdmap.Make )

func (*Decoder) SetSequencer

func (d *Decoder) SetSequencer(seq collect.SequenceFactory)

control the creation of sequences for the upcoming Decode. the default is to create native slices ( via stdseq.Make )

func (*Decoder) UseFloats

func (d *Decoder) UseFloats()

configure the upcoming Decode to produce only floating point numbers. otherwise it will produce int for integers, and uint for hex specifications.

func (*Decoder) UseNotes

func (d *Decoder) UseNotes(b *note.Book)

pass a valid target for collecting document level comments during an upcoming call to Decode. the default behavior is to discard comments. ( passing nil will also discard them. )

type Encoder

type Encoder encode.Encoder

Encoder - follows the pattern of encoding/json

func NewEncoder

func NewEncoder(w io.Writer) *Encoder

NewEncoder -

func (*Encoder) Encode

func (enc *Encoder) Encode(v any) (err error)

Encode - serializes the passed document to the encoder's stream followed by a newline character. tell doesn't support multiple documents in the same file, but this interface doesn't stop callers from trying.

func (*Encoder) SetMapper

func (enc *Encoder) SetMapper(n encode.StartCollection, c encode.Commenting) *Encoder

configure how mappings are encoded; returns self for chaining.

func (*Encoder) SetSequencer

func (enc *Encoder) SetSequencer(n encode.StartCollection, c encode.Commenting) *Encoder

configure how sequences are encoded; returns self for chaining.

type InvalidUnmarshalError

type InvalidUnmarshalError struct {
	Type r.Type
}

As per package encoding/json, describes an invalid argument passed to Unmarshal or Decode. Arguments must be non-nil pointers

func (*InvalidUnmarshalError) Error

func (e *InvalidUnmarshalError) Error() (ret string)
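
Since the type mirrors encoding/json, the pattern it follows can be shown with the standard library alone. This sketch uses json.Unmarshal and json.InvalidUnmarshalError; that tell's error can be matched with errors.As in the same way is an assumption. The helper rejectsTarget is hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// rejectsTarget is a hypothetical helper: it reports whether encoding/json
// refuses target as a destination for unmarshaling.
func rejectsTarget(target any) bool {
	err := json.Unmarshal([]byte(`true`), target)
	var invalid *json.InvalidUnmarshalError
	return errors.As(err, &invalid)
}

func main() {
	var flag bool
	fmt.Println("non-pointer rejected:", rejectsTarget(flag))
	fmt.Println("nil rejected:        ", rejectsTarget(nil))
	fmt.Println("pointer rejected:    ", rejectsTarget(&flag))
}
```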

Directories

Path Synopsis
Package charm provides common utilities for parsing documents using hand-rolled hierarchical state machines.
Package charmed provides common useful states for document parsing
orderedmap
package orderedmap implements tell maps interface for ian coleman's ordered map implementation https://github.com/iancoleman/orderedmap
