jsontext

package

Version: v0.0.0-...-3d76ae0 | Published: Jan 24, 2025 | License: BSD-3-Clause | Imports: 15 | Imported by: 66

Documentation ¶

Overview ¶

Package jsontext implements syntactic processing of JSON as specified in RFC 4627, RFC 7159, RFC 7493, RFC 8259, and RFC 8785. JSON is a simple data interchange format that can represent primitive data types such as booleans, strings, and numbers, in addition to structured data types such as objects and arrays.

The Encoder and Decoder types are used to encode or decode a stream of JSON tokens or values.

Tokens and Values ¶

A JSON token refers to the basic structural elements of JSON:

a JSON literal (i.e., null, true, or false)
a JSON string (e.g., "hello, world!")
a JSON number (e.g., 123.456)
a start or end delimiter for a JSON object (i.e., '{' or '}')
a start or end delimiter for a JSON array (i.e., '[' or ']')

A JSON token is represented by the Token type in Go. Technically, there are two additional structural characters (i.e., ':' and ','), but there is no Token representation for them since their presence can be inferred by the structure of the JSON grammar itself. For example, there must always be an implicit colon between the name and value of a JSON object member.
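The absence of tokens for ':' and ',' can be observed with any token-level reader. The sketch below deliberately uses the standard library's v1 encoding/json Decoder.Token method rather than this package, so that it runs without extra dependencies; the helper name tokens is made up for illustration, and v1 represents tokens differently from the Token type described here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// tokens collects a printable form of every token that the v1
// encoding/json Decoder reports for input.
func tokens(input string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var out []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, fmt.Sprint(tok))
	}
}

func main() {
	toks, err := tokens(`{"name":"value","list":[1,true,null]}`)
	if err != nil {
		panic(err)
	}
	// Ten tokens are reported; none of them is ":" or ",".
	fmt.Println(len(toks), toks)
}
```

The colon and comma separators are consumed by the decoder but never surface as tokens, which matches the reasoning above that their presence is implied by the grammar.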

A JSON value refers to a complete unit of JSON data:

a JSON literal, string, or number
a JSON object (e.g., `{"name":"value"}`)
a JSON array (e.g., `[1,2,3]`)

A JSON value is represented by the Value type in Go and is a []byte containing the raw textual representation of the value. There is some overlap between tokens and values as both contain literals, strings, and numbers. However, only a value can represent the entirety of a JSON object or array.

The Encoder and Decoder types contain methods to read or write the next Token or Value in a sequence. They maintain a state machine to validate whether the sequence of JSON tokens and/or values produces valid JSON. Options may be passed to the NewEncoder or NewDecoder constructors to configure the syntactic behavior of encoding and decoding.

Terminology ¶

The terms "encode" and "decode" are used for syntactic functionality that is concerned with processing JSON based on its grammar, and the terms "marshal" and "unmarshal" are used for semantic functionality that determines the meaning of JSON values as Go values and vice-versa. This package (i.e., jsontext) deals with JSON at a syntactic layer, while encoding/json/v2 deals with JSON at a semantic layer. The goal is to provide a clear distinction between functionality that is purely concerned with encoding versus that of marshaling. For example, one can directly encode a stream of JSON tokens without needing to marshal a concrete Go value representing them. Similarly, one can decode a stream of JSON tokens without needing to unmarshal them into a concrete Go value.

This package uses JSON terminology when discussing JSON, which may differ from related concepts in Go or elsewhere in computing literature.

a JSON "object" refers to an unordered collection of name/value members.
a JSON "array" refers to an ordered sequence of elements.
a JSON "value" refers to either a literal (i.e., null, false, or true), string, number, object, or array.

See RFC 8259 for more information.

Specifications ¶

Relevant specifications include RFC 4627, RFC 7159, RFC 7493, RFC 8259, and RFC 8785. Each RFC is generally a stricter subset of another RFC. In increasing order of strictness:

RFC 4627 and RFC 7159 do not require (but recommend) the use of UTF-8 and also do not require (but recommend) that object names be unique.
RFC 8259 requires the use of UTF-8, but does not require (but recommends) that object names be unique.
RFC 7493 requires the use of UTF-8 and also requires that object names be unique.
RFC 8785 defines a canonical representation. It requires the use of UTF-8 and also requires that object names be unique and in a specific ordering. It specifies exactly how strings and numbers must be formatted.

The primary difference between RFC 4627 and RFC 7159 is that the former restricted top-level values to only JSON objects and arrays, while RFC 7159 and subsequent RFCs permit top-level values to additionally be JSON nulls, booleans, strings, or numbers.

By default, this package operates on RFC 7493, but can be configured to operate according to the other RFC specifications. RFC 7493 is a stricter subset of RFC 8259 and fully compliant with it. In particular, it makes specific choices about behavior that RFC 8259 leaves as undefined in order to ensure greater interoperability.

Example (StringReplace) ¶

This example demonstrates the use of the Encoder and Decoder to parse and modify JSON without unmarshaling it into a concrete Go type.

package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"

	"github.com/go-json-experiment/json/jsontext"
)

func main() {
	// Example input with non-idiomatic use of "Golang" instead of "Go".
	const input = `{
		"title": "Golang version 1 is released",
		"author": "Andrew Gerrand",
		"date": "2012-03-28",
		"text": "Today marks a major milestone in the development of the Golang programming language.",
		"otherArticles": [
			"Twelve Years of Golang",
			"The Laws of Reflection",
			"Learn Golang from your browser"
		]
	}`

	// Using a Decoder and Encoder, we can parse through every token,
	// check and modify the token if necessary, and
	// write the token to the output.
	var replacements []jsontext.Pointer
	in := strings.NewReader(input)
	dec := jsontext.NewDecoder(in)
	out := new(bytes.Buffer)
	enc := jsontext.NewEncoder(out, jsontext.Multiline(true)) // expand for readability
	for {
		// Read a token from the input.
		tok, err := dec.ReadToken()
		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}

		// Check whether the token contains the string "Golang" and
		// replace each occurrence with "Go" instead.
		if tok.Kind() == '"' && strings.Contains(tok.String(), "Golang") {
			replacements = append(replacements, dec.StackPointer())
			tok = jsontext.String(strings.ReplaceAll(tok.String(), "Golang", "Go"))
		}

		// Write the (possibly modified) token to the output.
		if err := enc.WriteToken(tok); err != nil {
			log.Fatal(err)
		}
	}

	// Print the list of replacements and the adjusted JSON output.
	if len(replacements) > 0 {
		fmt.Println(`Replaced "Golang" with "Go" in:`)
		for _, where := range replacements {
			fmt.Println("\t" + where)
		}
		fmt.Println()
	}
	fmt.Println("Result:", out.String())

}

Output:

Replaced "Golang" with "Go" in:
	/title
	/text
	/otherArticles/0
	/otherArticles/2

Result: {
	"title": "Go version 1 is released",
	"author": "Andrew Gerrand",
	"date": "2012-03-28",
	"text": "Today marks a major milestone in the development of the Go programming language.",
	"otherArticles": [
		"Twelve Years of Go",
		"The Laws of Reflection",
		"Learn Go from your browser"
	]
}

Index ¶

Variables
func AppendFormat(dst, src []byte, opts ...Options) ([]byte, error)
func AppendQuote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)
func AppendUnquote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)
type Decoder
- func NewDecoder(r io.Reader, opts ...Options) *Decoder
- func (d *Decoder) InputOffset() int64
- func (d *Decoder) PeekKind() Kind
- func (d *Decoder) ReadToken() (Token, error)
- func (d *Decoder) ReadValue() (Value, error)
- func (d *Decoder) Reset(r io.Reader, opts ...Options)
- func (d *Decoder) SkipValue() error
- func (d *Decoder) StackDepth() int
- func (d *Decoder) StackIndex(i int) (Kind, int64)
- func (d *Decoder) StackPointer() Pointer
- func (d *Decoder) UnreadBuffer() []byte
type Encoder
- func NewEncoder(w io.Writer, opts ...Options) *Encoder
- func (e *Encoder) OutputOffset() int64
- func (e *Encoder) Reset(w io.Writer, opts ...Options)
- func (e *Encoder) StackDepth() int
- func (e *Encoder) StackIndex(i int) (Kind, int64)
- func (e *Encoder) StackPointer() Pointer
- func (e *Encoder) UnusedBuffer() []byte
- func (e *Encoder) WriteToken(t Token) error
- func (e *Encoder) WriteValue(v Value) error
type Kind
- func (k Kind) String() string
type Options
- func AllowDuplicateNames(v bool) Options
- func AllowInvalidUTF8(v bool) Options
- func CanonicalizeRawFloats(v bool) Options
- func CanonicalizeRawInts(v bool) Options
- func EscapeForHTML(v bool) Options
- func EscapeForJS(v bool) Options
- func Multiline(v bool) Options
- func PreserveRawStrings(v bool) Options
- func ReorderRawObjects(v bool) Options
- func SpaceAfterColon(v bool) Options
- func SpaceAfterComma(v bool) Options
- func WithIndent(indent string) Options
- func WithIndentPrefix(prefix string) Options
type Pointer
- func (p Pointer) AppendToken(tok string) Pointer
- func (p1 Pointer) Contains(p2 Pointer) bool
- func (p Pointer) IsValid() bool
- func (p Pointer) LastToken() string
- func (p Pointer) Parent() Pointer
- func (p Pointer) Tokens() iter.Seq[string]
type SyntacticError
- func (e *SyntacticError) Error() string
- func (e *SyntacticError) Unwrap() error
type Token
- func Bool(b bool) Token
- func Float(n float64) Token
- func Int(n int64) Token
- func String(s string) Token
- func Uint(n uint64) Token
- func (t Token) Bool() bool
- func (t Token) Clone() Token
- func (t Token) Float() float64
- func (t Token) Int() int64
- func (t Token) Kind() Kind
- func (t Token) String() string
- func (t Token) Uint() uint64
type Value
- func (v *Value) Canonicalize(opts ...Options) error
- func (v Value) Clone() Value
- func (v *Value) Compact(opts ...Options) error
- func (v *Value) Format(opts ...Options) error
- func (v *Value) Indent(opts ...Options) error
- func (v Value) IsValid(opts ...Options) bool
- func (v Value) Kind() Kind
- func (v Value) MarshalJSON() ([]byte, error)
- func (v Value) String() string
- func (v *Value) UnmarshalJSON(b []byte) error

Constants ¶

This section is empty.

Variables ¶

var (
	// ErrDuplicateName indicates that a JSON token could not be
	// encoded or decoded because it results in a duplicate JSON object name.
	// This error is directly wrapped within a [SyntacticError] when produced.
	//
	// The name of a duplicate JSON object member can be extracted as:
	//
	//	err := ...
	//	var serr *jsontext.SyntacticError
	//	if errors.As(err, &serr) && serr.Err == jsontext.ErrDuplicateName {
	//		ptr := serr.JSONPointer // JSON pointer to duplicate name
	//		name := ptr.LastToken() // duplicate name itself
	//		...
	//	}
	//
	// This error is only returned if [AllowDuplicateNames] is false.
	ErrDuplicateName = errors.New("duplicate object member name")

	// ErrNonStringName indicates that a JSON token could not be
	// encoded or decoded because it is not a string,
	// as required for JSON object names according to RFC 8259, section 4.
	// This error is directly wrapped within a [SyntacticError] when produced.
	ErrNonStringName = errors.New("object member name must be a string")
)

var Internal exporter

Internal is for internal use only. This is exempt from the Go compatibility agreement.

Functions ¶

func AppendFormat ¶

func AppendFormat(dst, src []byte, opts ...Options) ([]byte, error)

AppendFormat formats the JSON value in src and appends it to dst according to the specified options. See Value.Format for more details about the formatting behavior.

The dst and src may overlap. If an error is reported, then the entirety of src is appended to dst.

func AppendQuote ¶

func AppendQuote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)

AppendQuote appends a double-quoted JSON string literal representing src to dst and returns the extended buffer. It uses the minimal string representation per RFC 8785, section 3.2.2.2. Invalid UTF-8 bytes are replaced with the Unicode replacement character and an error is returned at the end indicating the presence of invalid UTF-8. The dst must not overlap with the src.

func AppendUnquote ¶

func AppendUnquote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)

AppendUnquote appends the decoded interpretation of src as a double-quoted JSON string literal to dst and returns the extended buffer. The input src must be a JSON string without any surrounding whitespace. Invalid UTF-8 bytes are replaced with the Unicode replacement character and an error is returned at the end indicating the presence of invalid UTF-8. Any trailing bytes after the JSON string literal result in an error. The dst must not overlap with the src.

Types ¶

type Decoder ¶

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder is a streaming decoder for raw JSON tokens and values. It is used to read a stream of top-level JSON values, each separated by optional whitespace characters.

Decoder.ReadToken and Decoder.ReadValue calls may be interleaved. For example, the following JSON value:

{"name":"value","array":[null,false,true,3.14159],"object":{"k":"v"}}

can be parsed with the following calls (ignoring errors for brevity):

d.ReadToken() // {
d.ReadToken() // "name"
d.ReadToken() // "value"
d.ReadValue() // "array"
d.ReadToken() // [
d.ReadToken() // null
d.ReadToken() // false
d.ReadValue() // true
d.ReadToken() // 3.14159
d.ReadToken() // ]
d.ReadValue() // "object"
d.ReadValue() // {"k":"v"}
d.ReadToken() // }

The above is one of many possible sequences of calls and may not represent the most sensible method to call for any given token/value. For example, it is probably more common to call Decoder.ReadToken to obtain a string token for object names.
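The same interleaving idea (delimiters read as tokens, contents read as whole raw values) can be sketched with the standard library alone. The example below uses the v1 encoding/json Decoder, not this package: Token stands in for ReadToken, and Decode into a json.RawMessage stands in for ReadValue. The helper name rawElements is hypothetical, and it assumes the input is a single top-level JSON array.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// rawElements reads the delimiters of a top-level JSON array as tokens
// and each element in between as a raw, undecoded value.
func rawElements(input string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	if _, err := dec.Token(); err != nil { // consume '['
		return nil, err
	}
	var elems []string
	for dec.More() {
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			return nil, err
		}
		elems = append(elems, string(raw))
	}
	if _, err := dec.Token(); err != nil { // consume ']'
		return nil, err
	}
	return elems, nil
}

func main() {
	elems, err := rawElements(`[1,{"k":"v"},"x"]`)
	if err != nil {
		panic(err)
	}
	for _, e := range elems {
		fmt.Println(e)
	}
}
```

Reading nested objects as raw values avoids walking their tokens one by one when only the outer structure matters.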

func NewDecoder ¶

func NewDecoder(r io.Reader, opts ...Options) *Decoder

NewDecoder constructs a new streaming decoder reading from r.

If r is a bytes.Buffer, then the decoder parses directly from the buffer without first copying the contents to an intermediate buffer. Additional writes to the buffer must not occur while the decoder is in use.

func (*Decoder) InputOffset ¶

func (d *Decoder) InputOffset() int64

InputOffset returns the current input byte offset. It gives the location of the next byte immediately after the most recently returned token or value. The number of bytes actually read from the underlying io.Reader may be more than this offset due to internal buffering effects.

func (*Decoder) PeekKind ¶

func (d *Decoder) PeekKind() Kind

PeekKind retrieves the next token kind, but does not advance the read offset. It returns 0 if there are no more tokens.

func (*Decoder) ReadToken ¶

func (d *Decoder) ReadToken() (Token, error)

ReadToken reads the next Token, advancing the read offset. The returned token is only valid until the next Peek, Read, or Skip call. It returns io.EOF if there are no more tokens.

func (*Decoder) ReadValue ¶

func (d *Decoder) ReadValue() (Value, error)

ReadValue returns the next raw JSON value, advancing the read offset. The value is stripped of any leading or trailing whitespace and contains the exact bytes of the input, which may contain invalid UTF-8 if AllowInvalidUTF8 is specified.

The returned value is only valid until the next Peek, Read, or Skip call and may not be mutated while the Decoder remains in use. If the decoder is currently at the end token for an object or array, then it reports a SyntacticError and the internal state remains unchanged. It returns io.EOF if there are no more values.

func (*Decoder) Reset ¶

func (d *Decoder) Reset(r io.Reader, opts ...Options)

Reset resets a decoder such that it is reading afresh from r and configured with the provided options. Reset must not be called on a Decoder passed to the encoding/json/v2.UnmarshalerFrom.UnmarshalJSONFrom method or the encoding/json/v2.UnmarshalFromFunc function.

func (*Decoder) SkipValue ¶

func (d *Decoder) SkipValue() error

SkipValue is semantically equivalent to calling Decoder.ReadValue and discarding the result except that memory is not wasted trying to hold the entire result.

func (*Decoder) StackDepth ¶

func (d *Decoder) StackDepth() int

StackDepth returns the depth of the state machine for read JSON data. Each level on the stack represents a nested JSON object or array. It is incremented whenever an ObjectStart or ArrayStart token is encountered and decremented whenever an ObjectEnd or ArrayEnd token is encountered. The depth is zero-indexed, where zero represents the top-level JSON value.

func (*Decoder) StackIndex ¶

func (d *Decoder) StackIndex(i int) (Kind, int64)

StackIndex returns information about the specified stack level. It must be a number between 0 and Decoder.StackDepth, inclusive. For each level, it reports the kind:

0 for a level of zero,
'{' for a level representing a JSON object, and
'[' for a level representing a JSON array.

It also reports the length of that JSON object or array. Each name and value in a JSON object is counted separately, so the effective number of members would be half the length. A complete JSON object must have an even length.

func (*Decoder) StackPointer ¶

func (d *Decoder) StackPointer() Pointer

StackPointer returns a JSON Pointer (RFC 6901) to the most recently read value. Object names are only present if AllowDuplicateNames is false, otherwise object members are represented using their index within the object.

func (*Decoder) UnreadBuffer ¶

func (d *Decoder) UnreadBuffer() []byte

UnreadBuffer returns the data remaining in the unread buffer, which may contain zero or more bytes. The returned buffer must not be mutated while Decoder continues to be used. The buffer contents are valid until the next Peek, Read, or Skip call.

type Encoder ¶

type Encoder struct {
	// contains filtered or unexported fields
}

Encoder is a streaming encoder from raw JSON tokens and values. It is used to write a stream of top-level JSON values, each terminated with a newline character.

Encoder.WriteToken and Encoder.WriteValue calls may be interleaved. For example, the following JSON value:

{"name":"value","array":[null,false,true,3.14159],"object":{"k":"v"}}

can be composed with the following calls (ignoring errors for brevity):

e.WriteToken(ObjectStart)        // {
e.WriteToken(String("name"))     // "name"
e.WriteToken(String("value"))    // "value"
e.WriteValue(Value(`"array"`))   // "array"
e.WriteToken(ArrayStart)         // [
e.WriteToken(Null)               // null
e.WriteToken(False)              // false
e.WriteValue(Value("true"))      // true
e.WriteToken(Float(3.14159))     // 3.14159
e.WriteToken(ArrayEnd)           // ]
e.WriteValue(Value(`"object"`))  // "object"
e.WriteValue(Value(`{"k":"v"}`)) // {"k":"v"}
e.WriteToken(ObjectEnd)          // }

The above is one of many possible sequences of calls and may not represent the most sensible method to call for any given token/value. For example, it is probably more common to call Encoder.WriteToken with a string for object names.

func NewEncoder ¶

func NewEncoder(w io.Writer, opts ...Options) *Encoder

NewEncoder constructs a new streaming encoder writing to w configured with the provided options. It flushes the internal buffer when the buffer is sufficiently full or when a top-level value has been written.

If w is a bytes.Buffer, then the encoder appends directly into the buffer without copying the contents from an intermediate buffer.

func (*Encoder) OutputOffset ¶

func (e *Encoder) OutputOffset() int64

OutputOffset returns the current output byte offset. It gives the location of the next byte immediately after the most recently written token or value. The number of bytes actually written to the underlying io.Writer may be less than this offset due to internal buffering effects.

func (*Encoder) Reset ¶

func (e *Encoder) Reset(w io.Writer, opts ...Options)

Reset resets an encoder such that it is writing afresh to w and configured with the provided options. Reset must not be called on an Encoder passed to the encoding/json/v2.MarshalerTo.MarshalJSONTo method or the encoding/json/v2.MarshalToFunc function.

func (*Encoder) StackDepth ¶

func (e *Encoder) StackDepth() int

StackDepth returns the depth of the state machine for written JSON data. Each level on the stack represents a nested JSON object or array. It is incremented whenever an ObjectStart or ArrayStart token is encountered and decremented whenever an ObjectEnd or ArrayEnd token is encountered. The depth is zero-indexed, where zero represents the top-level JSON value.

func (*Encoder) StackIndex ¶

func (e *Encoder) StackIndex(i int) (Kind, int64)

StackIndex returns information about the specified stack level. It must be a number between 0 and Encoder.StackDepth, inclusive. For each level, it reports the kind:

0 for a level of zero,
'{' for a level representing a JSON object, and
'[' for a level representing a JSON array.

It also reports the length of that JSON object or array. Each name and value in a JSON object is counted separately, so the effective number of members would be half the length. A complete JSON object must have an even length.

func (*Encoder) StackPointer ¶

func (e *Encoder) StackPointer() Pointer

StackPointer returns a JSON Pointer (RFC 6901) to the most recently written value. Object names are only present if AllowDuplicateNames is false, otherwise object members are represented using their index within the object.

func (*Encoder) UnusedBuffer ¶

func (e *Encoder) UnusedBuffer() []byte

UnusedBuffer returns a zero-length buffer with a possible non-zero capacity. This buffer is intended to be used to populate a Value being passed to an immediately succeeding Encoder.WriteValue call.

Example usage:

b := e.UnusedBuffer()
b = append(b, '"')
b = appendString(b, v) // append the string formatting of v
b = append(b, '"')
... := e.WriteValue(b)

It is the user's responsibility to ensure that the value is valid JSON.

func (*Encoder) WriteToken ¶

func (e *Encoder) WriteToken(t Token) error

WriteToken writes the next token and advances the internal write offset.

The provided token kind must be consistent with the JSON grammar. For example, it is an error to provide a number when the encoder is expecting an object name (which is always a string), or to provide an end object delimiter when the encoder is finishing an array. If the provided token is invalid, then it reports a SyntacticError and the internal state remains unchanged. The offset reported in SyntacticError will be relative to the Encoder.OutputOffset.

func (*Encoder) WriteValue ¶

func (e *Encoder) WriteValue(v Value) error

WriteValue writes the next raw value and advances the internal write offset. The Encoder does not simply copy the provided value verbatim, but parses it to ensure that it is syntactically valid and reformats it according to how the Encoder is configured to format whitespace and strings. If AllowInvalidUTF8 is specified, then any invalid UTF-8 is mangled as the Unicode replacement character, U+FFFD.

The provided value kind must be consistent with the JSON grammar (see examples on Encoder.WriteToken). If the provided value is invalid, then it reports a SyntacticError and the internal state remains unchanged. The offset reported in SyntacticError will be relative to the Encoder.OutputOffset plus the offset into v of any encountered syntax error.

type Kind ¶

type Kind byte

Kind represents each possible JSON token kind with a single byte, which is conveniently the first byte of that kind's grammar with the restriction that numbers always be represented with '0':

'n': null
'f': false
't': true
'"': string
'0': number
'{': object start
'}': object end
'[': array start
']': array end

An invalid kind is usually represented using 0, but may be non-zero due to invalid JSON data.
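As a rough illustration of the "first byte of the grammar" convention, the mapping can be written as a small function. This is a sketch, not the package's implementation; kindOf is a hypothetical helper that returns a plain byte rather than a Kind, and it treats '-' and every digit as the start of a number.

```go
package main

import (
	"fmt"
	"strings"
)

// kindOf maps the first byte of a JSON token to its kind byte.
// It is a hypothetical helper written for illustration only.
func kindOf(first byte) byte {
	switch {
	case first == '-' || ('0' <= first && first <= '9'):
		return '0' // every number shares the single kind '0'
	case strings.IndexByte(`nft"{}[]`, first) >= 0:
		return first // the first byte is already the kind
	default:
		return 0 // not the start of any JSON token
	}
}

func main() {
	for _, s := range []string{`null`, `"hi"`, `-12.5`, `7`, `[`, `?`} {
		fmt.Printf("%-6s -> %q\n", s, kindOf(s[0]))
	}
}
```

Collapsing all numbers onto '0' is what lets a single byte comparison (for example, against '"' for strings) identify the kind of the next token.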

func (Kind) String ¶

func (k Kind) String() string

String prints the kind in a human-readable fashion.

type Options ¶

type Options = jsonopts.Options

Options configures NewEncoder, Encoder.Reset, NewDecoder, and Decoder.Reset with specific features. Each function takes in a variadic list of options, where properties set in later options override the value of previously set properties.

The Options type is identical to encoding/json.Options and encoding/json/v2.Options. Options from the other packages may be passed to functionality in this package, but are ignored. Options from this package may be used with the other packages.

func AllowDuplicateNames ¶

func AllowDuplicateNames(v bool) Options

AllowDuplicateNames specifies that JSON objects may contain duplicate member names. Disabling the duplicate name check may provide performance benefits, but breaks compliance with RFC 7493, section 2.3. The input or output will still be compliant with RFC 8259, which leaves the handling of duplicate names as unspecified behavior.

This affects either encoding or decoding.
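To see why unspecified handling of duplicate names is an interoperability hazard, consider how the standard library's v1 encoding/json package treats them. The sketch below does not use this package; lastWins is a hypothetical helper that unmarshals into a map to show that v1 accepts the duplicate and keeps only the last member.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lastWins unmarshals a JSON object with the v1 encoding/json package,
// which accepts duplicate names without reporting an error.
func lastWins(input string) (map[string]int, error) {
	var m map[string]int
	err := json.Unmarshal([]byte(input), &m)
	return m, err
}

func main() {
	m, err := lastWins(`{"a":1,"a":2}`)
	fmt.Println(m, err) // the earlier "a":1 member is silently dropped
}
```

Another parser could just as legitimately keep the first member, so two RFC 8259 compliant programs may disagree about the same input. Rejecting duplicates by default, as described above, removes that ambiguity.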

func AllowInvalidUTF8 ¶

func AllowInvalidUTF8(v bool) Options

AllowInvalidUTF8 specifies that JSON strings may contain invalid UTF-8, which will be mangled as the Unicode replacement character, U+FFFD. This causes the encoder or decoder to break compliance with RFC 7493, section 2.1, and RFC 8259, section 8.1.

This affects either encoding or decoding.

func CanonicalizeRawFloats ¶

func CanonicalizeRawFloats(v bool) Options

CanonicalizeRawFloats specifies that when encoding a raw JSON floating-point number (i.e., a number with a fraction or exponent) in a Token or Value, the number is canonicalized according to RFC 8785, section 3.2.2.3. As a special case, the number -0 is canonicalized as 0.

JSON numbers are treated as IEEE 754 double precision numbers. It is safe to canonicalize a serialized single precision number and parse it back as a single precision number and expect the same value. If a number exceeds ±1.7976931348623157e+308, which is the maximum finite number, then it is saturated at that value and formatted as such.

This only affects encoding and is ignored when decoding.
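The single precision claim can be checked with the standard library. The sketch below is only an approximation of canonicalization: it reformats through float64 using strconv's shortest representation, which is assumed here to stand in for the RFC 8785 number formatting, and survivesFloat64 is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// survivesFloat64 reports whether x is unchanged after being serialized
// as a float32, reformatted via float64, and parsed back as a float32.
func survivesFloat64(x float32) bool {
	// Shortest text that identifies x as a single precision number.
	text := strconv.FormatFloat(float64(x), 'g', -1, 32)

	// Treat the text as a double precision number and reformat it.
	f64, err := strconv.ParseFloat(text, 64)
	if err != nil {
		return false
	}
	canonical := strconv.FormatFloat(f64, 'g', -1, 64)

	// Parse the reformatted text back as single precision.
	back, err := strconv.ParseFloat(canonical, 32)
	if err != nil {
		return false
	}
	return float32(back) == x
}

func main() {
	for _, x := range []float32{0.1, 3.1415927, 16777216} {
		fmt.Println(x, survivesFloat64(x))
	}
}
```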

func CanonicalizeRawInts ¶

func CanonicalizeRawInts(v bool) Options

CanonicalizeRawInts specifies that when encoding a raw JSON integer number (i.e., a number without a fraction and exponent) in a Token or Value, the number is canonicalized according to RFC 8785, section 3.2.2.3. As a special case, the number -0 is canonicalized as 0.

JSON numbers are treated as IEEE 754 double precision numbers. Any numbers with precision beyond what is representable by that form will lose their precision when canonicalized. For example, integer values beyond ±2⁵³ will lose their precision: 1234567890123456789 is formatted as 1234567890123456800.

This only affects encoding and is ignored when decoding.
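The precision loss comes purely from double precision arithmetic and can be reproduced with strconv. The sketch below is not the package's canonicalizer; viaFloat64 is a hypothetical helper that parses an integer literal as a float64 and prints the shortest decimal that identifies the result, without applying the rest of the RFC 8785 formatting rules.

```go
package main

import (
	"fmt"
	"strconv"
)

// viaFloat64 parses a JSON integer literal as a float64 and formats the
// result using the shortest decimal text that round-trips.
func viaFloat64(raw string) (string, error) {
	f, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return "", err
	}
	return strconv.FormatFloat(f, 'f', -1, 64), nil
}

func main() {
	for _, raw := range []string{"42", "9007199254740992", "1234567890123456789"} {
		out, err := viaFloat64(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", raw, out)
	}
}
```

Integers up to 2⁵³ come back unchanged, while the 19-digit input loses its low-order digits. Identifiers or counters of that size should therefore be left uncanonicalized or carried as JSON strings.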

func EscapeForHTML ¶

func EscapeForHTML(v bool) Options

EscapeForHTML specifies that '<', '>', and '&' characters within JSON strings should be escaped as a hexadecimal Unicode codepoint (e.g., \u003c) so that the output is safe to embed within HTML.

This only affects encoding and is ignored when decoding.

Example ¶

Directly embedding JSON within HTML requires special handling for safety. Escape certain runes to prevent JSON directly treated as HTML from being able to perform <script> injection.

This example shows how to obtain equivalent behavior provided by the v1 encoding/json package that is no longer directly supported by this package. Newly written code that intermixes JSON and HTML should instead be using the github.com/google/safehtml module for safety purposes.

package main

import (
	"fmt"
	"log"

	"github.com/go-json-experiment/json"
	"github.com/go-json-experiment/json/jsontext"
)

func main() {
	page := struct {
		Title string
		Body  string
	}{
		Title: "Example Embedded Javascript",
		Body:  `<script> console.log("Hello, world!"); </script>`,
	}

	b, err := json.Marshal(&page,
		// Escape certain runes within a JSON string so that
		// JSON will be safe to directly embed inside HTML.
		jsontext.EscapeForHTML(true),
		jsontext.EscapeForJS(true),
		jsontext.Multiline(true)) // expand for readability
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(b))

}

Output:

{
	"Title": "Example Embedded Javascript",
	"Body": "\u003cscript\u003e console.log(\"Hello, world!\"); \u003c/script\u003e"
}
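For comparison, the v1 behavior that the example above reproduces can be observed directly in the standard library, where json.Marshal escapes '<', '>', and '&' by default. The sketch below uses only v1 encoding/json; v1Marshal is a hypothetical wrapper added for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// v1Marshal encodes s as a JSON string using the v1 encoding/json
// package, which escapes HTML-sensitive characters by default.
func v1Marshal(s string) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	out, err := v1Marshal("<&>")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints "\u003c\u0026\u003e" including the quotes
}
```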

func EscapeForJS ¶

func EscapeForJS(v bool) Options

EscapeForJS specifies that U+2028 and U+2029 characters within JSON strings should be escaped as a hexadecimal Unicode codepoint (e.g., \u2028) so that the output is valid to embed within JavaScript. See RFC 8259, section 12.

This only affects encoding and is ignored when decoding.

func Multiline ¶

func Multiline(v bool) Options

Multiline specifies that the JSON output should expand to multiple lines, where every JSON object member or JSON array element appears on a new, indented line according to the nesting depth.

If SpaceAfterColon is not specified, then the default is true. If SpaceAfterComma is not specified, then the default is false. If WithIndent is not specified, then the default is "\t".

If set to false, then the output is a single line, where the only whitespace emitted is determined by the current values of SpaceAfterColon and SpaceAfterComma.

This only affects encoding and is ignored when decoding.

func PreserveRawStrings ¶

func PreserveRawStrings(v bool) Options

PreserveRawStrings specifies that when encoding a raw JSON string in a Token or Value, pre-escaped sequences in a JSON string are preserved to the output. However, raw strings still respect EscapeForHTML and EscapeForJS such that the relevant characters are escaped. If AllowInvalidUTF8 is enabled, bytes of invalid UTF-8 are preserved to the output.

This only affects encoding and is ignored when decoding.

func ReorderRawObjects ¶

func ReorderRawObjects(v bool) Options

ReorderRawObjects specifies that when encoding a raw JSON object in a Value, the object members are reordered according to RFC 8785, section 3.2.3.

This only affects encoding and is ignored when decoding.

func SpaceAfterColon ¶

func SpaceAfterColon(v bool) Options

SpaceAfterColon specifies that the JSON output should emit a space character after each colon separator following a JSON object name. If false, then no space character appears after the colon separator.

This only affects encoding and is ignored when decoding.

func SpaceAfterComma ¶

func SpaceAfterComma(v bool) Options

SpaceAfterComma specifies that the JSON output should emit a space character after each comma separator following a JSON object value or array element. If false, then no space character appears after the comma separator.

This only affects encoding and is ignored when decoding.

func WithIndent ¶

func WithIndent(indent string) Options

WithIndent specifies that the encoder should emit multiline output where each element in a JSON object or array begins on a new, indented line beginning with the indent prefix (see WithIndentPrefix) followed by one or more copies of indent according to the nesting depth. The indent must only be composed of space or tab characters.

If the intent is to emit indented output without a preference for the particular indent string, then use Multiline instead.

This only affects encoding and is ignored when decoding. Use of this option implies Multiline being set to true.

func WithIndentPrefix ¶

func WithIndentPrefix(prefix string) Options

WithIndentPrefix specifies that the encoder should emit multiline output where each element in a JSON object or array begins on a new, indented line beginning with the indent prefix followed by one or more copies of indent (see WithIndent) according to the nesting depth. The prefix must only be composed of space or tab characters.

This only affects encoding and is ignored when decoding. Use of this option implies Multiline being set to true.

type Pointer ¶

type Pointer string

Pointer is a JSON Pointer (RFC 6901) that references a particular JSON value relative to the root of the top-level JSON value.

A Pointer is a slash-separated list of tokens, where each token is either a JSON object name or an index to a JSON array element encoded as a base-10 integer value. It is impossible to distinguish between an array index and an object name (that happens to be a base-10 encoded integer) without also knowing the structure of the top-level JSON value that the pointer refers to.

There is exactly one representation of a pointer to a particular value, so comparability of Pointer values is equivalent to checking whether they both point to the exact same value.
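
To make the token structure concrete, here is a minimal stdlib-only sketch that splits a pointer string into its reference tokens using the RFC 6901 escapes ("~1" for "/" and "~0" for "~"). The function splitPointer is a hypothetical helper written for illustration; it is not part of this package and makes no claim about how Pointer is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// unescaper reverses the RFC 6901 escapes: "~1" stands for "/" and "~0" for "~".
var unescaper = strings.NewReplacer("~1", "/", "~0", "~")

// splitPointer splits an RFC 6901 pointer into unescaped reference tokens.
// The empty pointer refers to the root value and has no tokens.
func splitPointer(p string) ([]string, error) {
	if p == "" {
		return nil, nil
	}
	if !strings.HasPrefix(p, "/") {
		return nil, fmt.Errorf("pointer %q must be empty or start with '/'", p)
	}
	raw := strings.Split(p[1:], "/")
	toks := make([]string, len(raw))
	for i, t := range raw {
		toks[i] = unescaper.Replace(t)
	}
	return toks, nil
}

func main() {
	toks, err := splitPointer("/users/0/first~1last")
	fmt.Printf("%q %v\n", toks, err)
}
```

The token "0" in the example could equally name an object member or index an array element; only the document being addressed settles which.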

func (Pointer) AppendToken ¶

func (p Pointer) AppendToken(tok string) Pointer

AppendToken appends a token to the end of p and returns the full pointer.

func (Pointer) Contains ¶

func (p1 Pointer) Contains(p2 Pointer) bool

Contains reports whether the JSON value that p1 points to is equal to or contains the JSON value that p2 points to.
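
One way to picture this relation is as a token-wise prefix test on the pointer text. The sketch below assumes that reading; contains is a hypothetical stdlib-only helper, not the method's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// contains reports whether p2 equals p1 or addresses a value nested under p1.
// Appending "/" before the prefix test keeps "/a/b" from matching "/a/bc".
func contains(p1, p2 string) bool {
	return p2 == p1 || strings.HasPrefix(p2, p1+"/")
}

func main() {
	fmt.Println(contains("/a/b", "/a/b/c"))
	fmt.Println(contains("/a/b", "/a/bc"))
	fmt.Println(contains("", "/anything"))
}
```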

func (Pointer) IsValid ¶

func (p Pointer) IsValid() bool

IsValid reports whether p is a valid JSON Pointer according to RFC 6901. Note that the concatenation of two valid pointers produces a valid pointer.
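
The RFC 6901 grammar is small enough to check by hand: a pointer is either empty or starts with "/", and every "~" must be followed by "0" or "1". A minimal sketch of that check follows; validPointer is a hypothetical helper, and it deliberately skips any UTF-8 validation that a real implementation might add.

```go
package main

import "fmt"

// validPointer checks the RFC 6901 syntax rules: the pointer is empty or
// begins with '/', and each '~' starts one of the escapes "~0" or "~1".
func validPointer(p string) bool {
	if p != "" && p[0] != '/' {
		return false
	}
	for i := 0; i < len(p); i++ {
		if p[i] != '~' {
			continue
		}
		if i+1 >= len(p) || (p[i+1] != '0' && p[i+1] != '1') {
			return false
		}
	}
	return true
}

func main() {
	a, b := "/a~0", "/b~1c"
	// Both rules are local to each token, so joining valid pointers stays valid.
	fmt.Println(validPointer(a), validPointer(b), validPointer(a+b))
}
```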

func (Pointer) LastToken ¶

func (p Pointer) LastToken() string

LastToken returns the last token in the pointer. The last token of an empty p is an empty string.

func (Pointer) Parent ¶

func (p Pointer) Parent() Pointer

Parent strips off the last token and returns the remaining pointer. The parent of an empty p is an empty string.
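
Because tokens are separated by "/", both LastToken and Parent amount to cutting the pointer at its final slash. The sketch below illustrates that idea with hypothetical stdlib-only helpers; unlike a full implementation it returns the last token in its raw, still-escaped form.

```go
package main

import (
	"fmt"
	"strings"
)

// parent drops the final token; the empty pointer is its own parent.
func parent(p string) string {
	i := strings.LastIndexByte(p, '/')
	if i < 0 {
		return ""
	}
	return p[:i]
}

// lastToken returns the raw text after the final slash, or "" if p is empty.
func lastToken(p string) string {
	i := strings.LastIndexByte(p, '/')
	if i < 0 {
		return ""
	}
	return p[i+1:]
}

func main() {
	// Walk from a nested value up to the root.
	for p := "/store/books/3"; p != ""; p = parent(p) {
		fmt.Printf("%q has last token %q\n", p, lastToken(p))
	}
}
```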

func (Pointer) Tokens ¶

func (p Pointer) Tokens() iter.Seq[string]

Tokens returns an iterator over the reference tokens in the JSON pointer, starting from the first token until the last token (unless stopped early).

type SyntacticError ¶

type SyntacticError struct {

	// ByteOffset indicates that an error occurred after this byte offset.
	ByteOffset int64
	// JSONPointer indicates that an error occurred within this JSON value
	// as indicated using the JSON Pointer notation (see RFC 6901).
	JSONPointer Pointer

	// Err is the underlying error.
	Err error
	// contains filtered or unexported fields
}

SyntacticError is a description of a syntactic error that occurred when encoding or decoding JSON according to the grammar.

The contents of this error as produced by this package may change over time.

func (*SyntacticError) Error ¶

func (e *SyntacticError) Error() string

func (*SyntacticError) Unwrap ¶

func (e *SyntacticError) Unwrap() error
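
Since the error exposes Unwrap and exported position fields, callers can typically recover it from a wrapped error chain with errors.As. The sketch below shows that pattern with a local stand-in type, syntacticError, that merely mirrors the exported fields listed above; the message format and the helpers locate and sample are invented for illustration and are not the package's own.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// syntacticError is a local stand-in mirroring the exported fields of
// SyntacticError; it is NOT the jsontext type.
type syntacticError struct {
	ByteOffset  int64
	JSONPointer string
	Err         error
}

// Error uses an invented message format.
func (e *syntacticError) Error() string {
	return fmt.Sprintf("syntactic error within %q after offset %d: %v",
		e.JSONPointer, e.ByteOffset, e.Err)
}

func (e *syntacticError) Unwrap() error { return e.Err }

// locate pulls the position out of any error chain that holds a *syntacticError.
func locate(err error) (pointer string, offset int64, ok bool) {
	var se *syntacticError
	if errors.As(err, &se) {
		return se.JSONPointer, se.ByteOffset, true
	}
	return "", 0, false
}

// sample builds a wrapped error the way calling code might receive one.
func sample() error {
	inner := &syntacticError{ByteOffset: 17, JSONPointer: "/items/2", Err: io.ErrUnexpectedEOF}
	return fmt.Errorf("reading config: %w", inner)
}

func main() {
	err := sample()
	ptr, off, ok := locate(err)
	fmt.Println(ptr, off, ok)
	// Unwrap lets errors.Is see through to the underlying cause.
	fmt.Println(errors.Is(err, io.ErrUnexpectedEOF))
}
```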

type Token ¶

type Token struct {
	// contains filtered or unexported fields
}

Token represents a lexical JSON token, which may be one of the following:

a JSON literal (i.e., null, true, or false)
a JSON string (e.g., "hello, world!")
a JSON number (e.g., 123.456)
a start or end delimiter for a JSON object (i.e., '{' or '}')
a start or end delimiter for a JSON array (i.e., '[' or ']')

A Token cannot represent entire array or object values, while a Value can. There is no Token to represent commas and colons since these structural tokens can be inferred from the surrounding context.

var (
	Null  Token = rawToken("null")
	False Token = rawToken("false")
	True  Token = rawToken("true")

	ObjectStart Token = rawToken("{")
	ObjectEnd   Token = rawToken("}")
	ArrayStart  Token = rawToken("[")
	ArrayEnd    Token = rawToken("]")
)

func Bool ¶

func Bool(b bool) Token

Bool constructs a Token representing a JSON boolean.

func Float ¶

func Float(n float64) Token

Float constructs a Token representing a JSON number. The values NaN, +Inf, and -Inf will be represented as a JSON string with the values "NaN", "Infinity", and "-Infinity".
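
The special-value mapping can be sketched as follows. floatJSON is a hypothetical stdlib-only helper; the strconv formatting used for finite values is an assumption chosen for brevity and may differ from the number formatting this package actually emits.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// floatJSON renders f as JSON text. NaN and the infinities have no JSON
// number form, so they become the JSON strings described above.
func floatJSON(f float64) string {
	switch {
	case math.IsNaN(f):
		return `"NaN"`
	case math.IsInf(f, +1):
		return `"Infinity"`
	case math.IsInf(f, -1):
		return `"-Infinity"`
	}
	// Assumption: shortest round-trip formatting stands in for the real encoder.
	return strconv.FormatFloat(f, 'g', -1, 64)
}

func main() {
	for _, f := range []float64{1.5, math.NaN(), math.Inf(+1), math.Inf(-1)} {
		fmt.Println(floatJSON(f))
	}
}
```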

func Int ¶

func Int(n int64) Token

Int constructs a Token representing a JSON number from an int64.

func String ¶

func String(s string) Token

String constructs a Token representing a JSON string. The provided string should contain valid UTF-8, otherwise invalid characters may be mangled as the Unicode replacement character.
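
To see what such mangling looks like, the sketch below replaces invalid bytes with U+FFFD using the standard library. sanitize is a hypothetical helper; strings.ToValidUTF8 collapses each run of invalid bytes into a single replacement, which may not match how this package treats longer runs.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize substitutes the Unicode replacement character for invalid UTF-8.
func sanitize(s string) string {
	return strings.ToValidUTF8(s, "\uFFFD")
}

func main() {
	in := "caf\xff" // the trailing byte 0xff is not valid UTF-8
	fmt.Printf("%q -> %q\n", in, sanitize(in))
}
```

Validating input up front (for example with utf8.ValidString) avoids the lossy substitution altogether.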

func Uint ¶

func Uint(n uint64) Token

Uint constructs a Token representing a JSON number from a uint64.

func (Token) Bool ¶

func (t Token) Bool() bool

Bool returns the value for a JSON boolean. It panics if the token kind is not a JSON boolean.

func (Token) Clone ¶

func (t Token) Clone() Token

Clone makes a copy of the Token such that its value remains valid even after a subsequent Decoder.Read call.

func (Token) Float ¶

func (t Token) Float() float64

Float returns the floating-point value for a JSON number. It returns a NaN, +Inf, or -Inf value for any JSON string with the values "NaN", "Infinity", or "-Infinity". It panics for all other cases.

func (Token) Int ¶

func (t Token) Int() int64

Int returns the signed integer value for a JSON number. The fractional component of any number is ignored (truncation toward zero). Any number beyond the representation of an int64 will be saturated to the closest representable value. It panics if the token kind is not a JSON number.
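
The truncate-then-saturate rule can be illustrated on a float64 input. This is only a sketch of the rule: saturateInt64 is a hypothetical helper, and a real implementation working from the number's text need not pass through float64 at all (the NaN case here is an extra guard, not something the text above describes).

```go
package main

import (
	"fmt"
	"math"
)

// saturateInt64 drops the fractional part of f (truncation toward zero) and
// clamps out-of-range values to the nearest representable int64.
func saturateInt64(f float64) int64 {
	const two63 = float64(1 << 63) // first value above the int64 range
	switch {
	case math.IsNaN(f):
		return 0 // guard for this sketch only
	case f >= two63:
		return math.MaxInt64
	case f <= -two63:
		return math.MinInt64
	}
	return int64(f) // Go's conversion discards the fraction
}

func main() {
	for _, f := range []float64{1.9, -1.9, 1e30, -1e30} {
		fmt.Println(f, "->", saturateInt64(f))
	}
}
```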

func (Token) Kind ¶

func (t Token) Kind() Kind

Kind returns the token kind.

func (Token) String ¶

func (t Token) String() string

String returns the unescaped string value for a JSON string. For other JSON kinds, this returns the raw JSON representation.

func (Token) Uint ¶

func (t Token) Uint() uint64

Uint returns the unsigned integer value for a JSON number. The fractional component of any number is ignored (truncation toward zero). Any number beyond the representation of a uint64 will be saturated to the closest representable value. It panics if the token kind is not a JSON number.

type Value ¶

type Value []byte

Value represents a single raw JSON value, which may be one of the following:

a JSON literal (i.e., null, true, or false)
a JSON string (e.g., "hello, world!")
a JSON number (e.g., 123.456)
an entire JSON object (e.g., {"fizz":"buzz"})
an entire JSON array (e.g., [1,2,3])

Value can represent entire array or object values, while Token cannot. Value may contain leading and/or trailing whitespace.

func (*Value) Canonicalize ¶

func (v *Value) Canonicalize(opts ...Options) error

Canonicalize canonicalizes the raw JSON value according to the JSON Canonicalization Scheme (JCS) as defined by RFC 8785 where it produces a stable representation of a JSON value.

JSON strings are formatted to use their minimal representation, and JSON numbers are formatted as double precision numbers according to some stable serialization algorithm. JSON object members are sorted in ascending order by name. All whitespace is removed.

The output stability is dependent on the stability of the application data (see RFC 8785, Appendix E). It cannot produce stable output from fundamentally unstable input. For example, if the JSON value contains ephemeral data (e.g., a frequently changing timestamp), then the value is still unstable regardless of whether this is called.

Canonicalize is equivalent to calling Value.Format with the following options:

CanonicalizeRawInts(true)
CanonicalizeRawFloats(true)
ReorderRawObjects(true)

Any options specified by the caller are applied after the initial set and may deliberately override prior options.

Note that JCS treats all JSON numbers as IEEE 754 double precision numbers. Any numbers with precision beyond what is representable by that form will lose their precision when canonicalized. For example, integer values beyond ±2⁵³ will lose their precision. To preserve the original representation of JSON integers, additionally set CanonicalizeRawInts to false:

v.Canonicalize(jsontext.CanonicalizeRawInts(false))
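
The precision loss is a property of float64 itself and can be reproduced without this package. The sketch below round-trips a number's text through float64; throughFloat64 is a hypothetical helper, and its 'f' formatting is not the JCS serialization, so it demonstrates only the loss of precision.

```go
package main

import (
	"fmt"
	"strconv"
)

// throughFloat64 parses num as a float64 and formats it back as plain
// decimal text, exposing any digits that float64 cannot hold.
func throughFloat64(num string) (string, error) {
	f, err := strconv.ParseFloat(num, 64)
	if err != nil {
		return "", err
	}
	return strconv.FormatFloat(f, 'f', -1, 64), nil
}

func main() {
	for _, in := range []string{"9007199254740992", "9007199254740993"} {
		out, err := throughFloat64(in)
		if err != nil {
			panic(err)
		}
		// 2⁵³ survives the round trip; 2⁵³+1 does not.
		fmt.Println(in, "->", out)
	}
}
```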

func (Value) Clone ¶

func (v Value) Clone() Value

Clone returns a copy of v.

func (*Value) Compact ¶

func (v *Value) Compact(opts ...Options) error

Compact removes all whitespace from the raw JSON value.

It does not reformat JSON strings or numbers to use any other representation. To maximize the set of JSON values that can be formatted, this permits values with duplicate names and invalid UTF-8.

Compact is equivalent to calling Value.Format with the following options:

AllowDuplicateNames(true)
AllowInvalidUTF8(true)
PreserveRawStrings(true)

Any options specified by the caller are applied after the initial set and may deliberately override prior options.
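
For a feel of the behavior, the standard library's encoding/json.Compact performs a similar whitespace-only transformation. The sketch below uses it as a stand-in; compact is a hypothetical wrapper and this is not a claim that the two functions agree on every input.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// compact strips insignificant whitespace with encoding/json.Compact
// (a stand-in for jsontext); strings, numbers, and names are left as written.
func compact(src string) (string, error) {
	var buf bytes.Buffer
	if err := json.Compact(&buf, []byte(src)); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Duplicate names and escaped strings pass through untouched.
	out, err := compact(`{ "a" : 1 , "a" : [ "\u0041" , 2 ] }`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```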

func (*Value) Format ¶

func (v *Value) Format(opts ...Options) error

Format formats the raw JSON value in place.

By default (if no options are specified), it validates according to RFC 7493 and produces the minimal JSON representation, where all whitespace is elided and JSON strings use the shortest encoding.

Relevant options include:

AllowDuplicateNames
AllowInvalidUTF8
EscapeForHTML
EscapeForJS
PreserveRawStrings
CanonicalizeRawInts
CanonicalizeRawFloats
ReorderRawObjects
SpaceAfterColon
SpaceAfterComma
Multiline
WithIndent
WithIndentPrefix

All other options are ignored.

It is guaranteed to succeed if the value is valid according to the same options. If the value is already formatted, then the buffer is not mutated.

func (*Value) Indent ¶

func (v *Value) Indent(opts ...Options) error

Indent reformats the whitespace in the raw JSON value so that each element in a JSON object or array begins on an indented line according to the nesting.

It does not reformat JSON strings or numbers to use any other representation. To maximize the set of JSON values that can be formatted, this permits values with duplicate names and invalid UTF-8.

Indent is equivalent to calling Value.Format with the following options:

AllowDuplicateNames(true)
AllowInvalidUTF8(true)
PreserveRawStrings(true)
Multiline(true)

Any options specified by the caller are applied after the initial set and may deliberately override prior options.

func (Value) IsValid ¶

func (v Value) IsValid(opts ...Options) bool

IsValid reports whether the raw JSON value is syntactically valid according to the specified options.

By default (if no options are specified), it validates according to RFC 7493. It verifies whether the input is properly encoded as UTF-8, that escape sequences within strings decode to valid Unicode codepoints, and that all names in each object are unique. It does not verify whether numbers are representable within the limits of any common numeric type (e.g., float64, int64, or uint64).

Relevant options include:

AllowDuplicateNames
AllowInvalidUTF8

All other options are ignored.
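
Of the default checks, name uniqueness is the one that the standard library's json.Valid does not perform. The sketch below shows one way to detect duplicate names by walking the token stream of encoding/json; hasDuplicateNames and frame are hypothetical names, and the approach is an illustration rather than this package's algorithm.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// frame tracks one open container while walking the token stream.
type frame struct {
	names      map[string]bool // names seen so far; nil for arrays
	expectName bool            // true when the next token in an object is a name
}

// hasDuplicateNames reports whether any object in src repeats a member name.
func hasDuplicateNames(src string) (bool, error) {
	dec := json.NewDecoder(strings.NewReader(src))
	var stack []*frame
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			if len(stack) > 0 {
				return false, io.ErrUnexpectedEOF // input ended inside a container
			}
			return false, nil
		}
		if err != nil {
			return false, err
		}
		delim, isDelim := tok.(json.Delim)
		if isDelim && (delim == '}' || delim == ']') {
			stack = stack[:len(stack)-1] // container closed
			continue
		}
		if len(stack) > 0 {
			if top := stack[len(stack)-1]; top.names != nil {
				if top.expectName {
					name, _ := tok.(string) // the decoder only yields strings here
					if top.names[name] {
						return true, nil
					}
					top.names[name] = true
					top.expectName = false
					continue
				}
				top.expectName = true // this token starts the member's value
			}
		}
		if isDelim { // '{' or '[' opens a new container
			f := &frame{}
			if delim == '{' {
				f.names = map[string]bool{}
				f.expectName = true
			}
			stack = append(stack, f)
		}
	}
}

func main() {
	fmt.Println(hasDuplicateNames(`{"a":1,"b":{"a":2}}`))
	fmt.Println(hasDuplicateNames(`{"a":1,"a":2}`))
}
```

Names are compared after unescaping, so `"a"` and `"\u0061"` count as the same name in this sketch.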

func (Value) Kind ¶

func (v Value) Kind() Kind

Kind returns the starting token kind. For a valid value, this will never include '}' or ']'.

func (Value) MarshalJSON ¶

func (v Value) MarshalJSON() ([]byte, error)

MarshalJSON returns v as the JSON encoding of v. It returns the stored value as the raw JSON output without any validation. If v is nil, then this returns a JSON null.

func (Value) String ¶

func (v Value) String() string

String returns the string formatting of v.

func (*Value) UnmarshalJSON ¶

func (v *Value) UnmarshalJSON(b []byte) error

UnmarshalJSON sets v as the JSON encoding of b. It stores a copy of the provided raw JSON input without any validation.
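
The two behaviors described for MarshalJSON and UnmarshalJSON (pass-through without validation, a nil value marshaling as null, and copying on unmarshal) can be sketched with a local byte-slice type. rawValue below is a hypothetical stand-in written for illustration, not the Value type itself.

```go
package main

import "fmt"

// rawValue is a local stand-in for a raw JSON value held as bytes.
type rawValue []byte

// MarshalJSON returns the stored bytes as-is; a nil value becomes JSON null.
// No validation is performed.
func (v rawValue) MarshalJSON() ([]byte, error) {
	if v == nil {
		return []byte("null"), nil
	}
	return v, nil
}

// UnmarshalJSON stores a copy of b so later changes to b do not affect v.
// No validation is performed.
func (v *rawValue) UnmarshalJSON(b []byte) error {
	*v = append((*v)[:0], b...)
	return nil
}

func main() {
	input := []byte(`{"a":1}`)
	var v rawValue
	if err := v.UnmarshalJSON(input); err != nil {
		panic(err)
	}
	input[2] = 'z' // mutate the caller's buffer after the call
	fmt.Println(string(v))

	out, _ := rawValue(nil).MarshalJSON()
	fmt.Println(string(out))
}
```

Because neither method validates, callers that need a guarantee of well-formed JSON would check the value separately, for example with IsValid.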
