tell

package module
v0.8.1 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 18, 2024 License: MIT Imports: 12 Imported by: 1

README

Tell

A yaml-like text format with json-ish values.

Tell: "A yaml-like text format."

# Does this look suspiciously like the yaml overview?
# I have no idea how that could have happened.
What It Is: """
   A way of describing data containing string, number, and boolean values, 
   and collections of those values. As in yaml, collections can be 
   both key-value mappings, and sequences of values.
   """

What It Is Not: "A subset of yaml."

Can Contain: 
    - "Some javascript-ish values"
    - [ 5, 2.3, 1e-3, 0x20, "\n", "🐈", "\U0001f408" ]
      # supports c,go,python,etc. style escape codes

Related Projects:
  - "YAML"       # https://yaml.org/
  - "JSON"       # http://json.org/
  - "NestedText" # https://nestedtext.org/

Some differences from yaml:

  • Documents hold a single value.
  • String literals must be quoted ( as in json. )
  • Multiline strings use a custom heredoc syntax.
  • No flow style ( although there is an array syntax. )
  • No anchors or references.
  • Comments can be captured during decoding, and returned as part of the data.

It isn't intended to be a subset of yaml, but it tries to be close enough to leverage some syntax highlighting in markdown, editors, etc.

Status

The go implementation successfully reads and writes well-formed documents.

Missing features

  • serialization of structs is not supported ( only maps, slices, and primitive values. )
  • arrays should (probably) support nested arrays.
  • arrays should (probably) support comments.
  • error reporting could use improvement.

See also the issues page.

Usage


// Read a tell document.
func ExampleUnmarshal() {
	var out any
	const msg = `- Hello: "\U0001F30F"`
	if e := tell.Unmarshal([]byte(msg), &out); e != nil {
		panic(e)
	} else {
		fmt.Printf("%#v", out)
	}
	// Output:
	// []interface {}{map[string]interface {}{"Hello:":"🌏"}}
}

// Write a tell document.
func ExampleMarshal() {
	m := map[string]any{
		"Tell":           "A yaml-like text format.",
		"What It Is":     "A way of describing data...",
		"What It Is Not": "A subset of yaml.",
	}
	if out, e := tell.Marshal(m); e != nil {
		panic(e)
	} else {
		fmt.Println(string(out))
	}
	// Output:
	// Tell: "A yaml-like text format."
	// What It Is: "A way of describing data..."
	// What It Is Not: "A subset of yaml."
}

// slightly lower level usage:
func ExampleDocument() {
	str := `true` // some tell document
	// maps/imap contains a slice based ordered map implementation.
	// maps/stdmap generates standard (unordered) go maps.
	// maps/orderedmap uses Ian Coleman's ordered map implementation.
	doc := decode.NewDocument(imap.Make, notes.DiscardComments())
	// ReadDoc takes a string reader
	if res, e := doc.ReadDoc(strings.NewReader(str)); e != nil {
		panic(e)
	} else {
		fmt.Println(res)
	}
	// Output: true
}

Description

Tell consists of collections of values, along with optional comments. These types are described below.

Collections

  • Document: a collection containing a single value.
  • Sequences: aka lists: an ordered series of one or more values.
  • Mappings: aka ordered dictionaries: relates keys to values.

The individual elements of a sequence, and pairs of key-values in a mapping are called the "terms" of the collection.

Documents

Documents are most often text files: UTF-8 encoded, with no byte order mark.

"Structural whitespace" in documents is restricted to the ASCII space and the ASCII linefeed. Quoted strings can contain horizontal tabs, but single-line strings, for perhaps obvious reasons, can't contain linefeeds. All other Unicode control codes are disallowed ( so cr/lf line endings are considered an error. )
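
The encoding rules above can be approximated by a quick pre-check. The Go sketch below is illustrative only: checkEncoding is a hypothetical helper, not part of package tell. It rejects invalid UTF-8, a leading byte order mark, carriage returns, and control codes other than tab and linefeed. It deliberately skips the harder check that tabs only appear inside quoted strings, which needs a real scanner.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// checkEncoding is a hypothetical pre-check (not part of package tell) for the
// document-level rules: UTF-8 without a byte order mark, and no control codes
// other than horizontal tab and linefeed. It does NOT verify that tabs appear
// only inside quoted strings.
func checkEncoding(doc []byte) error {
	if !utf8.Valid(doc) {
		return fmt.Errorf("document is not valid UTF-8")
	}
	if strings.HasPrefix(string(doc), "\uFEFF") {
		return fmt.Errorf("byte order marks are not allowed")
	}
	line := 1
	for _, r := range string(doc) {
		switch {
		case r == '\n':
			line++
		case r == '\t':
			// allowed, but only inside quoted strings (not checked here)
		case r == '\r':
			return fmt.Errorf("line %d: carriage return (cr/lf line endings are an error)", line)
		case unicode.IsControl(r):
			return fmt.Errorf("line %d: control code %U is not allowed", line, r)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkEncoding([]byte("- true\n- false\n")))
	fmt.Println(checkEncoding([]byte("- true\r\n")))
}
```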

Values

Any scalar, array, sequence, mapping, or heredoc.

Scalars

  • bool: true, or false.
  • raw string ( backtick ): `backslashes are backslashes.`
  • interpreted string ( double quotes ): "backslashes indicate escaped characters."
  • number: 64-bit int or float numbers, optionally starting with +/-. Floats can have exponents ( [e|E][|+/-]... ), and hex values can be specified with 0x notation. Like json, but unlike yaml, Inf and NaN are not supported. ( This may expand to support https://go.dev/ref/spec#Integer_literals, etc. as needed. )
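
A minimal Go sketch of these number rules follows. parseNumber is hypothetical and is not the tell decoder; its result types (int64, uint64, float64) and its rejection of signed hex values are assumptions. It relies on the standard strconv parsers, and it filters the input first because strconv.ParseFloat would otherwise accept "Inf" and "NaN".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNumber is a hypothetical sketch (not package tell's decoder) of the
// number rules: optional sign, decimal ints, floats with optional exponents,
// and 0x hex values. Result types are an assumption of this sketch.
func parseNumber(s string) (any, error) {
	body := s
	if strings.HasPrefix(body, "+") || strings.HasPrefix(body, "-") {
		body = body[1:]
	}
	if strings.HasPrefix(body, "0x") {
		if body != s {
			return nil, fmt.Errorf("signed hex %q: not accepted by this sketch", s)
		}
		u, err := strconv.ParseUint(body[2:], 16, 64)
		if err != nil {
			return nil, err
		}
		return u, nil
	}
	// Only digits, '.', exponent markers, and signs may remain. This also
	// rules out "Inf" and "NaN", which strconv.ParseFloat would accept.
	if body == "" || strings.Trim(body, "0123456789.eE+-") != "" {
		return nil, fmt.Errorf("invalid number %q", s)
	}
	if strings.ContainsAny(body, ".eE") {
		f, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, err
		}
		return f, nil
	}
	i, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return nil, err
	}
	return i, nil
}

func main() {
	for _, s := range []string{"5", "+5", "0x20", "1e-3", "NaN"} {
		v, err := parseNumber(s)
		fmt.Printf("%s -> %#v (err: %v)\n", s, v, err)
	}
}
```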

A scalar value always appears on a single line. There is no null keyword; null is implicit wherever no explicit value was provided. Only heredocs support multi-line strings. ( Comments are defined as a hash followed by a space in order to maybe support css style hex colors, ie. #ffffff. Still thinking about this one. )

Escaping: The individually escaped characters are \a, \b, \f, \n, \r, \t, \v, \\, and \". For explicit Unicode code points, tell uses the same rules as Go: \x escapes for unprintable ASCII characters (bytes less than 128), \u followed by four hex digits for unprintable code points below U+10000, and \U followed by eight hex digits for the rest.

Arrays

Arrays use a syntax similar to javascript (ex. [1, 2, ,3] ) except that a comma with no explicit value indicates a null value. Arrays cannot contain collections; heredocs in arrays are discouraged. ( fix: Currently, arrays cannot contain other arrays, nor can they contain comments. )
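
To make the null-slot rule concrete, here is a hypothetical Go helper, splitArray, that splits a single-line array into raw element text. It is a sketch, not tell's parser: elements stay unparsed strings, commas inside double-quoted or backtick strings are respected, nested arrays and comments are not handled (matching the current limitations), and producing a trailing nil for a trailing comma is this sketch's own choice.

```go
package main

import (
	"fmt"
	"strings"
)

// splitArray is a hypothetical helper (not package tell's parser). It splits a
// single-line array into raw element text; an empty slot becomes nil.
func splitArray(src string) ([]any, error) {
	src = strings.TrimSpace(src)
	if len(src) < 2 || src[0] != '[' || src[len(src)-1] != ']' {
		return nil, fmt.Errorf("not an array: %q", src)
	}
	body := src[1 : len(src)-1]
	var out []any
	var cur strings.Builder
	flush := func() {
		if tok := strings.TrimSpace(cur.String()); tok != "" {
			out = append(out, tok)
		} else {
			out = append(out, nil) // a comma with no explicit value
		}
		cur.Reset()
	}
	var quote rune // 0 outside strings; '"' or '`' inside one
	escaped := false
	for _, r := range body {
		switch {
		case escaped:
			escaped = false
		case quote == '"' && r == '\\':
			escaped = true // only interpreted strings have escapes
		case quote == 0 && (r == '"' || r == '`'):
			quote = r
		case quote != 0 && r == quote:
			quote = 0
		case quote == 0 && r == ',':
			flush()
			continue
		}
		cur.WriteRune(r)
	}
	if quote != 0 {
		return nil, fmt.Errorf("unterminated string in %q", src)
	}
	if strings.TrimSpace(body) != "" {
		flush() // the final element; a trailing comma yields a trailing nil here
	}
	return out, nil
}

func main() {
	vals, err := splitArray(`[1, 2, ,3]`)
	fmt.Println(vals, err)
}
```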

Sequences

Sequences define an ordered list of values. Each entry in a sequence starts with a dash, and whitespace separates the dash from the value. Additional entries in the same sequence start on the next line, with the same indentation as the previous entry.

  - true
  - false

As in yaml, whitespace after the dash can include newlines. And, the lack of differentiation between newline and space implies that nested sequences can be declared on one line. For example, - - 5 is equivalent to the json [[5]].

Unlike yaml, if a value is specified on a line following its dash, the value must start on a column two spaces to the right of the dash. ( ie. while newlines and spaces are both whitespace, indentation still matters. ) This rule keeps values aligned.

  - "here"
  - 
    "there"
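
The alignment rules can be illustrated with a deliberately tiny reader. readSequence below is hypothetical and handles only a flat sequence of single-line scalars, kept as raw text; it does not handle nested sequences such as - - 5, mappings, or comments. It checks that entries share one indentation and that a value placed on the line after its dash starts two columns to the right of that dash.

```go
package main

import (
	"fmt"
	"strings"
)

// readSequence is a hypothetical toy reader (not package tell's decoder) for a
// flat sequence of single-line scalars, returned as raw text.
func readSequence(doc string) ([]any, error) {
	var out []any
	entryCol, dashCol := -1, -1 // dashCol >= 0 while a dash awaits its value
	for i, line := range strings.Split(doc, "\n") {
		text := strings.TrimLeft(line, " ")
		col := len(line) - len(text)
		text = strings.TrimRight(text, " ")
		if text == "" {
			continue // blank lines are whitespace
		}
		if dashCol >= 0 {
			// The value follows its dash on a later line: two columns right.
			if col != dashCol+2 {
				return nil, fmt.Errorf("line %d: value at column %d, want %d", i+1, col, dashCol+2)
			}
			out = append(out, text)
			dashCol = -1
			continue
		}
		if text != "-" && !strings.HasPrefix(text, "- ") {
			return nil, fmt.Errorf("line %d: expected a dash", i+1)
		}
		// Every entry in the sequence shares the first entry's indentation.
		if entryCol < 0 {
			entryCol = col
		} else if col != entryCol {
			return nil, fmt.Errorf("line %d: entry at column %d, want %d", i+1, col, entryCol)
		}
		if value := strings.TrimSpace(text[1:]); value != "" {
			out = append(out, value)
		} else {
			dashCol = col
		}
	}
	if dashCol >= 0 {
		out = append(out, nil) // no explicit value: null
	}
	return out, nil
}

func main() {
	fmt.Println(readSequence("  - \"here\"\n  - \n    \"there\"\n"))
}
```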

Mappings

Mappings relate keys to values in an ordered fashion.

Keys for mappings are defined using signatures: a series of one or more words, separated by colons, ending with a colon and whitespace. For example: Hello:there: . The first character of each word must be a (unicode) letter; subsequent characters can include letters, digits, and underscores ( TBD: this is somewhat arbitrary; what does yaml do? )
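
A sketch of the signature rule as a Go validator follows. splitSignature is hypothetical. Per the text, each word must start with a Unicode letter and may continue with letters, digits, and underscores; the sketch additionally accepts interior spaces, an assumption made because this README's own keys (What It Is:) contain them.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitSignature is a hypothetical validator (not part of package tell) for a
// mapping key such as "Hello:there:". Assumption: interior spaces are allowed.
func splitSignature(sig string) ([]string, error) {
	if !strings.HasSuffix(sig, ":") {
		return nil, fmt.Errorf("signature %q must end with a colon", sig)
	}
	words := strings.Split(strings.TrimSuffix(sig, ":"), ":")
	for _, w := range words {
		if w == "" {
			return nil, fmt.Errorf("signature %q has an empty word", sig)
		}
		for i, r := range w {
			ok := unicode.IsLetter(r)
			if i > 0 { // i is a byte offset, so i > 0 means "not the first rune"
				ok = ok || unicode.IsDigit(r) || r == '_' || r == ' '
			}
			if !ok {
				return nil, fmt.Errorf("signature %q: unexpected %q in word %q", sig, r, w)
			}
		}
	}
	return words, nil
}

func main() {
	fmt.Println(splitSignature("Hello:there:"))
	fmt.Println(splitSignature("9lives:"))
}
```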

For the same reason that nested sequences can appear inline, mappings can too, although yaml doesn't allow this and it's probably bad style. For example: Key: Nested: "some value" is equivalent to the json {"Key:": {"Nested:": "some value"}}. As with sequences, if the value of a mapping appears on a following line, two spaces of indentation are required.

Note: Tapestry wants those colons. In this implementation, key: is therefore interpreted as "key:", not "key". This feels like an implementation detail, and could be exposed as an option.
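
If plain keys are preferred, a post-processing pass is one way to get them. trimKeys below is a hypothetical helper, not a tell API: it copies native maps and slices, dropping the trailing colon from each key, so "Hello:" becomes "Hello" and "Hello:there:" becomes "Hello:there". The blank key used for comment blocks is left unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// trimKeys is a hypothetical post-processing step (not an option package tell
// provides). It returns a copy of native maps and slices with the trailing
// colon removed from every mapping key; other values are returned as-is.
func trimKeys(v any) any {
	switch x := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(x))
		for k, el := range x {
			out[strings.TrimSuffix(k, ":")] = trimKeys(el)
		}
		return out
	case []any:
		out := make([]any, len(x))
		for i, el := range x {
			out[i] = trimKeys(el)
		}
		return out
	default:
		return v
	}
}

func main() {
	// The shape tell.Unmarshal produces for `- Hello: "\U0001F30F"` in the Usage examples.
	decoded := []any{map[string]any{"Hello:": "🌏"}}
	fmt.Printf("%#v\n", trimKeys(decoded))
}
```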

Heredocs

Heredocs provide multi-line strings wherever a scalar string is permitted ( but not in an array, dear god. )

There are two types, one for each string type:

  1. raw, triple backticks: newlines are structure; backslashes are backslashes.
  2. interpreted, triple quotes: newlines act as word separators; backslashes are special; double newlines provide structure; single quotes don't need to be escaped ( but can be. )

Whitespace in both string types is influenced by the position of the closing heredoc marker; therefore, any text to the left of the closing marker is an error. Both string types can define a custom tag to end the heredoc ( even if, unfortunately, that breaks yaml syntax highlighting. )

(TBD: whether documents should be trimmed of trailing whitespace. Many editing programs are likely to do this by default; however, that would make intentional trailing whitespace in raw heredocs impossible.)

  - """
    i am a heredoc interpreted string.
    these lines are run together
    each separated by a single space.
     this sentence has an extra space in front.

    a blank line ^ becomes a single newline.
    trailing spaces in that line, or any line, are eaten.
    """

  - """
    this interpreted string starts with
1234 spaces. ( due to the position of the closing triple-quotes. )
"""

  - ```<<<END
    i am a heredoc raw string using a custom closing tag.
     this line has a single leading space.

    a blank line ^ is a blank line
    because raw strings preserve any and all whitespace, except:
    the starting and ending markers don't introduce newlines.
    ( so this line doesn't end with a newline. )
    END

Note that the interpreted heredoc differs from some more common heredoc implementations: a single newline exists to format the tell document, not the string.

"""
hello
doc
"""

yields: hello doc

while:

"""
hello

line
"""

yields:

hello
line
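
The whitespace rules for both heredoc types can be sketched as follows. foldHeredoc is hypothetical, not tell's implementation: it takes the body lines between the markers plus the closing marker's column, reports text left of that column as an error, and then either keeps newlines (raw) or folds single newlines into spaces, turning blank lines into newlines and eating trailing spaces (interpreted). Escape processing for interpreted heredocs is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// foldHeredoc is a hypothetical sketch (not package tell's implementation).
// lines is the body between the markers; indent is the closing marker's column.
func foldHeredoc(lines []string, indent int, raw bool) (string, error) {
	var b strings.Builder
	joinWithSpace := false // interpreted: the next text line continues a paragraph
	for i, line := range lines {
		// Text to the left of the closing marker is an error.
		if strings.TrimSpace(line) != "" &&
			(len(line) < indent || strings.TrimLeft(line[:indent], " ") != "") {
			return "", fmt.Errorf("line %d: text left of the closing marker", i+1)
		}
		if len(line) >= indent {
			line = line[indent:]
		} else {
			line = "" // a blank line shorter than the indent
		}
		if raw {
			// Raw: newlines are structure; the markers add no newlines of their own.
			if i > 0 {
				b.WriteByte('\n')
			}
			b.WriteString(line)
			continue
		}
		line = strings.TrimRight(line, " ") // trailing spaces are eaten
		if line == "" {
			b.WriteByte('\n') // a blank line becomes a single newline
			joinWithSpace = false
			continue
		}
		if joinWithSpace {
			b.WriteByte(' ') // a single newline only separates words
		}
		b.WriteString(line)
		joinWithSpace = true
	}
	return b.String(), nil
}

func main() {
	s, err := foldHeredoc([]string{"hello", "", "line"}, 0, false)
	fmt.Printf("%q %v\n", s, err)
}
```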

Custom end tags

I quite like the way some markdown implementations provide syntax coloring of triple-quoted strings when there's a filetype after the quotes ( for example: ```C++ ). Many of them also nicely ignore any text after the filetype, so lines like ```C++ something something, even if maybe not technically legal, still get good syntax highlighting.

With that in mind, Tell uses a triple less-than redirection marker (<<<) to define a custom end tag. ( Triple to match the quotes. ) The redirection marker allows an author to have a filetype, or not. For example: ```C++ <<<END, or if no filetype is desired: ```<<<END.
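
Parsing the opening marker might look like the following. heredocOpener and parseOpener are hypothetical names. The sketch treats everything between the quotes and <<< as the file type and everything after <<< as the end tag, and it defaults the end tag to the opening quotes when no <<< is present.

```go
package main

import (
	"fmt"
	"strings"
)

// heredocOpener describes the line that opens a heredoc. This type and
// parseOpener are hypothetical; package tell may structure this differently.
type heredocOpener struct {
	Raw      bool   // ``` opens a raw heredoc, """ an interpreted one
	FileType string // optional hint for syntax highlighting, e.g. "C++"
	EndTag   string // the closing marker
}

func parseOpener(line string) (heredocOpener, error) {
	var h heredocOpener
	switch {
	case strings.HasPrefix(line, "```"):
		h.Raw, h.EndTag = true, "```"
	case strings.HasPrefix(line, `"""`):
		h.EndTag = `"""`
	default:
		return h, fmt.Errorf("not a heredoc opener: %q", line)
	}
	rest := line[3:]
	if i := strings.Index(rest, "<<<"); i >= 0 {
		// Before <<<: the optional file type. After it: the custom end tag.
		h.FileType = strings.TrimSpace(rest[:i])
		h.EndTag = strings.TrimSpace(rest[i+3:])
		if h.EndTag == "" {
			return h, fmt.Errorf("missing custom end tag in %q", line)
		}
	} else {
		h.FileType = strings.TrimSpace(rest)
	}
	return h, nil
}

func main() {
	for _, line := range []string{"```C++ <<<END", "```<<<END", `"""`} {
		h, err := parseOpener(line)
		fmt.Printf("%+v %v\n", h, err)
	}
}
```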

Maybe in some far off distant age, tell-aware syntax coloring could display the heredoc with fancy colors.

Comments

Hate me forever, comments are preserved, are significant, and introduce their own indentation rules.

Rationale: Comments are a good mechanism for communicating human intent. In Tapestry, story files can be edited by hand, visually edited using blockly, or even extracted to present documentation; therefore, it's important to preserve an author's comments across different transformations. ( This was one of the motivations for creating tell. )

Similar to yaml, tell comments begin with the # hash, followed by a space, and continue to the end of a line. Comments cannot appear within a scalar ( TBD: comma separated arrays split across lines might be an exception. )
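
A minimal Go sketch of the comment rule: splitComment (hypothetical, not tell's scanner) finds the first hash followed by a space outside of any quoted or backtick string, so the #ffffff case mentioned under Scalars is left alone. Whether a hash at the very end of a line starts a comment isn't specified here; this sketch says no.

```go
package main

import (
	"fmt"
	"strings"
)

// splitComment is a hypothetical helper (not package tell's scanner). It
// separates a line's content from a trailing comment, where a comment is a
// hash followed by a space outside of any quoted or backtick string.
func splitComment(line string) (content, comment string) {
	var quote rune // 0 outside strings
	escaped := false
	for i, r := range line {
		switch {
		case escaped:
			escaped = false
		case quote == '"' && r == '\\':
			escaped = true
		case quote == 0 && (r == '"' || r == '`'):
			quote = r
		case quote != 0 && r == quote:
			quote = 0
		case quote == 0 && r == '#' && strings.HasPrefix(line[i+1:], " "):
			return strings.TrimRight(line[:i], " "), line[i:]
		}
	}
	return line, ""
}

func main() {
	fmt.Println(splitComment(`Key: "a # b" # note`))
}
```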

This implementation stores the comments for a collection in a string called a "comment block". Each collection has its own comment block, stored in the zeroth element of a sequence, under the blank key of a mapping, or in the comment field of a document.

When comments are preserved, collections are one-indexed. On the bright side, this means that no special types are needed to store tell data: just native go maps and slices.

The readme in package note gets into all the specifics.
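
With comments preserved, reading the actual values means skipping the comment block. The hypothetical helpers below, splitSeq and splitMap, separate the block from the values in the layout described above. They type the block as any rather than assuming its concrete type, and the data in main is hand-built rather than decoded.

```go
package main

import "fmt"

// splitSeq is a hypothetical helper (not part of package tell): with comments
// preserved, element 0 holds the comment block and values start at index 1.
func splitSeq(seq []any) (block any, values []any) {
	if len(seq) == 0 {
		return nil, nil
	}
	return seq[0], seq[1:]
}

// splitMap is a hypothetical helper: the comment block sits under the blank key.
func splitMap(m map[string]any) (block any, values map[string]any) {
	values = make(map[string]any, len(m))
	for k, v := range m {
		if k == "" {
			block = v
			continue
		}
		values[k] = v
	}
	return block, values
}

func main() {
	// Hand-built data in the documented layout; the comment text is a placeholder.
	block, vals := splitSeq([]any{"# two flags", true, false})
	fmt.Println(block, vals)
}
```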

Changes

0.8.0 -> 0.8.1:

  • catch tabs in whitespace
  • bug fix: report better errors when unable to decode a mistyped boolean literal (ex. truex )

0.7.0 -> 0.8.0:

  • Changes the encoder's interface to support customizing the comment style of mappings and sequences independently.
  • bug fix: when specifying map values: allow sequences to start at the same indentation as the key and allow a new map term to start after the sequence ends. ( previously, it generated an error, and an additional indentation was required. ) For example:
  - First:  # the value of First is a sequence containing "yes"
    - "yes" 
    Second: # Second is an additional entry in the same map as First
    - "okay" 
  • bug fix: for all other values, an indentation greater than the key is required. For example:
  First:
  "this is an error."

0.6 -> 0.7.0:

  • replace comment raw string buffer usage with an opaque object ( to make any future changes more friendly )

0.5 -> 0.6:

  • bug fixes, and re-encoding of comments

0.4 -> 0.5:

  • simplify comment handling

0.3 -> 0.4:

  • adopt the golang (package strconv) rules for escaping strings.
  • simplify the attribution of comments in the space between a key (or dash) and its value.
  • change the decoder api to support custom sequences, mirroring custom maps; package 'maps' is now more generically package 'collect'.
  • encoding/decoding heredocs for multiline strings
  • encoding/decoding of arrays; ( encoding will write empty collections as arrays; future: a heuristic to determine what should be encoded as an array, vs. sequence. )
  • the original idea for arrays was to use a bare comma full-stop format. switched to square brackets because they are easier to decode, they can support nesting, and are going to be more familiar to most users. ( plus, full stop (.) is tiny and easy to miss when looking at documents. )

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Marshal

func Marshal(v any) (ret []byte, err error)

Marshal returns a tell document representing the passed value.

It traverses the passed type recursively to produce tell data. If a value implements encode.Mapper or encode.Sequencer, Marshal will use their iterators to serialize their contents.

Otherwise, Marshal() uses the following rules:

Boolean values are encoded as either 'true' or 'false'.

Integer and floating point values are encoded as per Go's strconv.FormatInt, strconv.FormatUint, and strconv.FormatFloat, except that int16 and uint16 are encoded as hex values starting with '0x'. NaN, infinities, and complex numbers return an error.

Strings are encoded as per strconv.Quote.

Arrays and slice values are encoded as tell sequences. []byte is not handled in any special way. ( fix? )

Maps with string keys are encoded as tell mappings, sorted by key. Other key types return an error.

Pointers and interface values are encoded in place as the value they represent. Cyclic data is not handled, and encoding it will never return. ( fix? )

Any other types ( ie. functions, channels, and structs ) return an error.

All documents end with a newline.

Example

Write a tell document.

package main

import (
	"fmt"

	"github.com/ionous/tell"
)

func main() {
	m := map[string]any{
		"Tell":           "A yaml-like text format.",
		"What It Is":     "A way of describing data...",
		"What It Is Not": "A subset of yaml.",
	}
	if out, e := tell.Marshal(m); e != nil {
		panic(e)
	} else {
		fmt.Println(string(out))
	}
}
Output:

Tell: "A yaml-like text format."
What It Is: "A way of describing data..."
What It Is Not: "A subset of yaml."

func Unmarshal

func Unmarshal(in []byte, pv any) (err error)

Unmarshal parses a tell formatted document and stores the result in the value pointed to by pv.

Permissible values include: bool, floating point, signed and unsigned integers, maps and slices.

For more flexibility, see package decode.

Example

Read a tell document.

package main

import (
	"fmt"

	"github.com/ionous/tell"
)

func main() {
	var out any
	const msg = `- Hello: "\U0001F30F"`
	if e := tell.Unmarshal([]byte(msg), &out); e != nil {
		panic(e)
	} else {
		fmt.Printf("%#v", out)
	}
}
Output:

[]interface {}{map[string]interface {}{"Hello:":"🌏"}}

Types

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder follows the pattern of encoding/json.

func NewDecoder

func NewDecoder(src io.Reader) *Decoder

NewDecoder returns a Decoder that reads a tell document from src.

func (*Decoder) Decode

func (dec *Decoder) Decode(pv any) (err error)

Decode reads a tell document from the stream configured in NewDecoder and stores the result in the value pointed to by pv.

func (*Decoder) SetMapper

func (d *Decoder) SetMapper(maps collect.MapFactory)

SetMapper controls the creation of mappings for the upcoming Decode. The default is to create native maps ( via stdmap.Make ).

func (*Decoder) SetSequencer

func (d *Decoder) SetSequencer(seq collect.SequenceFactory)

SetSequencer controls the creation of sequences for the upcoming Decode. The default is to create native slices ( via stdseq.Make ).

func (*Decoder) UseFloats

func (d *Decoder) UseFloats()

UseFloats configures the upcoming Decode to produce only floating point numbers. Otherwise, it produces int for integers and uint for hex specifications.

func (*Decoder) UseNotes

func (d *Decoder) UseNotes(b *note.Book)

UseNotes passes a valid target for collecting document-level comments during an upcoming call to Decode. The default behavior is to discard comments ( passing nil also discards them. )

type Encoder

type Encoder encode.Encoder

Encoder follows the pattern of encoding/json.

func NewEncoder

func NewEncoder(w io.Writer) *Encoder

NewEncoder returns an Encoder that writes tell documents to w.

func (*Encoder) Encode

func (enc *Encoder) Encode(v any) (err error)

Encode serializes the passed document to the encoder's stream, followed by a newline character. Tell doesn't support multiple documents in the same file, but this interface doesn't stop callers from trying.

func (*Encoder) SetMapper

func (enc *Encoder) SetMapper(n encode.StartCollection, c encode.Commenting) *Encoder

SetMapper configures how mappings are encoded. It returns the Encoder for chaining.

func (*Encoder) SetSequencer

func (enc *Encoder) SetSequencer(n encode.StartCollection, c encode.Commenting) *Encoder

SetSequencer configures how sequences are encoded. It returns the Encoder for chaining.

type InvalidUnmarshalError

type InvalidUnmarshalError struct {
	Type r.Type
}

As in package encoding/json, InvalidUnmarshalError describes an invalid argument passed to Unmarshal or Decode. Arguments must be non-nil pointers.

func (*InvalidUnmarshalError) Error

func (e *InvalidUnmarshalError) Error() (ret string)
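
The non-nil pointer requirement can be checked with reflect, much as encoding/json does. This standalone sketch re-declares an InvalidUnmarshalError with the documented Type field; checkTarget and the message text are hypothetical, not tell's own.

```go
package main

import (
	"fmt"
	"reflect"
)

// InvalidUnmarshalError mirrors the documented shape (a single Type field).
// This standalone copy and its messages are illustrative, not tell's own.
type InvalidUnmarshalError struct {
	Type reflect.Type
}

func (e *InvalidUnmarshalError) Error() string {
	switch {
	case e.Type == nil:
		return "unmarshal(nil)"
	case e.Type.Kind() != reflect.Pointer:
		return "unmarshal(non-pointer " + e.Type.String() + ")"
	default:
		return "unmarshal(nil " + e.Type.String() + ")"
	}
}

// checkTarget is a hypothetical version of the argument check: like
// encoding/json, it requires a non-nil pointer.
func checkTarget(pv any) error {
	rv := reflect.ValueOf(pv)
	if rv.Kind() != reflect.Pointer || rv.IsNil() {
		return &InvalidUnmarshalError{Type: reflect.TypeOf(pv)}
	}
	return nil
}

func main() {
	var n int
	var p *int
	fmt.Println(checkTarget(nil), checkTarget(n), checkTarget(p), checkTarget(&n))
}
```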

Directories

Path Synopsis
Package charm provides common utilities for parsing documents using hand-rolled hierarchical state machines.
Package charmed provides common useful states for document parsing
orderedmap
package orderedmap implements tell maps interface for ian coleman's ordered map implementation https://github.com/iancoleman/orderedmap
