c14n

package
v0.63.1
Warning

This package is not in the latest version of its module.

Published: Nov 24, 2023 License: Apache-2.0 Imports: 9 Imported by: 0

README

GOBL Canonicalization

Introduction

One of the hardest problems around digital signatures is generating a digest or hash of the source data consistently, despite any structural changes that may have happened in transit. The objective of canonicalization is to ensure that the data is logically equivalent at the source and destination, so that the digest can be calculated reliably on both sides and thus be used in digital signatures.
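To make the problem concrete, the following minimal sketch uses only the Go standard library to compare two documents that carry the same data but differ in whitespace and key order. The helper names digest and sameData are hypothetical and exist only for this illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"reflect"
)

// digest returns the hex-encoded SHA-256 hash of the raw document bytes.
func digest(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return fmt.Sprintf("%x", sum)
}

// sameData reports whether two JSON documents decode to equal Go values.
func sameData(a, b string) bool {
	var va, vb interface{}
	if err := json.Unmarshal([]byte(a), &va); err != nil {
		return false
	}
	if err := json.Unmarshal([]byte(b), &vb); err != nil {
		return false
	}
	return reflect.DeepEqual(va, vb)
}

func main() {
	sent := `{"a":56,"foo":"bar"}`
	received := `{ "foo": "bar", "a": 56 }`
	fmt.Println("same data:  ", sameData(sent, received))
	fmt.Println("same digest:", digest(sent) == digest(received))
}
```

The two documents decode to the same data, yet their digests differ because the hash is computed over the raw bytes. Canonicalization removes that difference before hashing.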

In the world of XML, the Canonical XML Version 1.1 W3C recommendation sets out rules for creating consistent documents. However, anyone who has worked with XML signatures (XML-DSIG) knows that, despite the best intentions and libraries, it can still be difficult to get the expected results, especially when using different languages at the source and destination.

JSON, conversely, lacks a clearly defined or active industry standard for canonicalization, despite having a much simpler syntax. Indeed, the JSON Web Token specification gets around pesky canonicalization issues by including the actual signed payload data as a Base64-encoded string alongside the signature.

One of the objectives of GOBL is to create a document that could potentially be stored in any key-value format other than JSON, such as YAML, Protobuf, or maybe even XML. Perhaps GOBL documents need to be persisted to a document database like CouchDB or a JSONB field in PostgreSQL. It should not matter what the underlying format or persistence engine is, as long as the logical contents are exactly the same. Thus, when signing documents, it's essential that we have a reliable canonical version of JSON, even if the data is stored somewhere else.

This c14n package, inspired by the works of others, thus aims to define a simple standardized approach to canonical JSON that could potentially be implemented easily in other languages. More than just a definition, the code here is a reference implementation from which libraries can be made in languages other than Go.

GOBL JSON C14n

GOBL considers the following JSON values as explicit types:

  • a string
  • a number, which extends the JSON spec and is split into:
    • an integer
    • a float
  • an object
  • an array
  • a boolean
  • null

JSON in canonical form:

  1. MUST be encoded in VALID UTF-8. A document with invalid character encoding will be rejected.
  2. MUST NOT include insignificant whitespace.
  3. MUST order the attributes of objects lexicographically by the UCS (Unicode Character Set) code points of their names.
  4. MUST remove attributes from objects whose value is null.
  5. MUST NOT remove null values from arrays.
  6. MUST represent integer numbers, those with a zero-valued fractional part, WITHOUT:
    1. a leading minus sign when the value is zero,
    2. a decimal point,
    3. an exponent, thus limiting numbers to 64 bits, and
    4. insignificant leading zeroes, as already required by JSON.
  7. MUST represent floating point numbers in exponential notation, INCLUDING:
    1. a nonzero single-digit part to the left of the decimal point,
    2. a nonempty fractional part to the right of the decimal point,
    3. no trailing zeroes to the right of the decimal point except to comply with the previous point,
    4. a capital E for the exponent indicator,
    5. no plus sign in either the mantissa or the exponent, and
    6. no insignificant leading zeroes in the exponent.
  8. MUST represent all strings, including object attribute keys, in their minimal length UTF-8 encoding:
    1. using two-character escape sequences where possible for characters that require escaping, specifically:
      • \" U+0022 Quotation Mark
      • \\ U+005C Reverse Solidus (backslash)
      • \b U+0008 Backspace
      • \t U+0009 Character Tabulation (tab)
      • \n U+000A Line Feed (newline)
      • \f U+000C Form Feed
      • \r U+000D Carriage Return
    2. using six-character \u00XX uppercase hexadecimal escape sequences for control characters that require escaping but lack a two-character sequence described previously, and
    3. rejecting any string containing invalid encoding.
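The string rules in point 8 can be sketched with the standard library alone. This is an illustrative sketch, not the package's own code: the helper quoteCanonical and the shortEscapes table are hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// shortEscapes lists the characters that have a two-character escape.
var shortEscapes = map[rune]string{
	'"':  `\"`,
	'\\': `\\`,
	'\b': `\b`,
	'\t': `\t`,
	'\n': `\n`,
	'\f': `\f`,
	'\r': `\r`,
}

// quoteCanonical wraps s in quotes, applying the shortest permitted escape
// to each character and rejecting input that is not valid UTF-8.
func quoteCanonical(s string) (string, error) {
	if !utf8.ValidString(s) {
		return "", errors.New("string contains invalid UTF-8")
	}
	var sb strings.Builder
	sb.WriteByte('"')
	for _, r := range s {
		if esc, ok := shortEscapes[r]; ok {
			sb.WriteString(esc)
		} else if r < 0x20 {
			// Remaining control characters use the six-character form
			// with uppercase hexadecimal digits.
			fmt.Fprintf(&sb, `\u%04X`, r)
		} else {
			// Everything else is written as plain UTF-8.
			sb.WriteRune(r)
		}
	}
	sb.WriteByte('"')
	return sb.String(), nil
}

func main() {
	out, err := quoteCanonical("tab\there \"quoted\" \x01")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Characters outside the control range, including non-ASCII ones, pass through unescaped, which keeps the encoded string at its minimal length.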

The GOBL JSON c14n package has been designed to operate using any raw JSON source and uses the Go encoding/json library's streaming methods to parse and recreate a document in memory. A simplified object model is used to map JSON structures ready to be converted into canonical JSON.

Usage Example

package main

import (
  "fmt"
  "strings"

  "github.com/invopop/gobl/c14n"
)

func main() {
  d := `{ "foo":"bar", "c": 123.4, "a": 56, "b": 0.0, "y":null}`
  r := strings.NewReader(d)
  res, err := c14n.CanonicalJSON(r)
  if err != nil {
    panic(err.Error())
  }
  fmt.Printf("Result: %v\n", string(res))
  // Prints: Result: {"a":56,"b":0.0E0,"c":1.234E2,"foo":"bar"}
}

Prior Art

This specification and implementation are based on the gibson042 canonicaljson specification, with simplifications concerning invalid UTF-8 characters and null values in objects, and with a more explicit reference implementation that should be easier to recreate in other programming languages.

The gibson042 specification is in turn based on the now expired JSON Canonical Form internet draft which lacks clarity on the handling of integer numbers, is missing details on escape sequences, and doesn't consider invalid UTF-8 characters.

Canonical representation of floats is consistent with XML Schema Part 2: Datatypes, section 3.2.4.2, and expects integer numbers without an exponential component as defined in RFC 7638 - JSON Web Key Thumbprint.

Documentation

Overview

Package c14n provides canonical JSON encoding and decoding.

Functions

func CanonicalJSON

func CanonicalJSON(src io.Reader) ([]byte, error)

CanonicalJSON reads JSON from src and returns its canonical form, performing the unmarshal and marshal steps in one call.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/invopop/gobl/c14n"
)

func main() {
	d := `{ "foo":"bar", "c": 123.4, "a": 56, "b": 0.0, "y":null}`
	r := strings.NewReader(d)
	res, err := c14n.CanonicalJSON(r)
	if err != nil {
		panic(err.Error())
	}
	fmt.Printf("%v\n", string(res))
}
Output:

{"a":56,"b":0.0E0,"c":1.234E2,"foo":"bar"}

Types

type Array

type Array struct {
	Values []Canonicalable
}

Array contains a list of canonicalable values, as opposed to an object's key-value pairs.

func (*Array) MarshalJSON

func (a *Array) MarshalJSON() ([]byte, error)

MarshalJSON recursively marshals all of the array's items and joins the results together to form a JSON byte array.

type Attribute

type Attribute struct {
	Key   string
	Value Canonicalable
}

Attribute represents a key-value pair used in objects. Using an array guarantees ordering of keys, which is one of the fundamental requirements for canonicalization.

func (*Attribute) MarshalJSON

func (a *Attribute) MarshalJSON() ([]byte, error)

MarshalJSON creates a key-value pair in JSON format. A null value in an attribute will return an empty byte array.

type Bool

type Bool bool

Bool handles binary true or false.

func (Bool) MarshalJSON

func (b Bool) MarshalJSON() ([]byte, error)

MarshalJSON provides the JSON standard true or false response.

type Canonicalable

type Canonicalable interface {
	MarshalJSON() ([]byte, error)
}

Canonicalable defines what we expect from objects that need to be converted into our standardized JSON. All structures that comply with this interface are expected to contain data that was already sourced from a JSON document, so we don't need to worry too much about the conversion process.

func UnmarshalJSON

func UnmarshalJSON(src io.Reader) (Canonicalable, error)

UnmarshalJSON expects an io.Reader whose data will be parsed using a streaming JSON decoder and converted into a "Canonicalable" set of structures. The resulting objects can then be re-encoded back into canonical JSON suitable for sending to a hashing algorithm.

type Float

type Float float64

Float numbers must be represented by a 64-bit signed integer and an exponent that reflects the position of the decimal point. Numbers whose significant digits do not fit inside an int64 are not supported; for big numbers, use an alternative method of serialization such as Base64.

func (Float) MarshalJSON

func (f Float) MarshalJSON() ([]byte, error)

MarshalJSON for floats uses the strconv library with some output hacks to ensure we're in a valid JSON format. This is also what the fmt package and its Printf-related functions do to get around all the complexities of float conversion.

type Integer

type Integer int64

Integer numbers have no decimal places and are limited to 64 bits.

func (Integer) MarshalJSON

func (i Integer) MarshalJSON() ([]byte, error)

MarshalJSON provides the string representation of the integer.
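The integer rules (no minus sign on zero, no exponent, a 64-bit limit) fall out naturally from a round trip through strconv, as this minimal sketch suggests. The helper canonicalInteger is hypothetical, and its input is assumed to be number text that is already valid JSON.

```go
package main

import (
	"fmt"
	"strconv"
)

// canonicalInteger parses base-10 integer text into an int64 and writes it
// back out. Values that do not fit in 64 bits are rejected with an error.
func canonicalInteger(text string) (string, error) {
	n, err := strconv.ParseInt(text, 10, 64)
	if err != nil {
		return "", err
	}
	// FormatInt never emits "-0", a decimal point, or an exponent.
	return strconv.FormatInt(n, 10), nil
}

func main() {
	for _, text := range []string{"56", "-0", "9223372036854775808"} {
		out, err := canonicalInteger(text)
		if err != nil {
			fmt.Println(text, "rejected:", err)
			continue
		}
		fmt.Println(text, "becomes", out)
	}
}
```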

type Null

type Null struct{}

Null wraps around a null value.

func (Null) MarshalJSON

func (n Null) MarshalJSON() ([]byte, error)

MarshalJSON provides the null string.

type Object

type Object struct {
	Attributes []*Attribute
}

Object contains a simple list of items, which are in essence key-value pairs. The item array means that attributes can be ordered by their key.

func (*Object) MarshalJSON

func (o *Object) MarshalJSON() ([]byte, error)

MarshalJSON combines all the object's elements into an ordered key-value list of marshalled attributes.

func (*Object) Sort

func (o *Object) Sort()

Sort ensures all the object's attributes are ordered according to the key.
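The effect of sorting and of dropping null attributes can be sketched independently of the package types. In the sketch below, member and encodeObject are hypothetical names, values are assumed to be canonical JSON text already, and keys are assumed to need no escaping.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// member is a simplified key-value pair used only in this sketch.
type member struct {
	Key   string // assumed to contain no characters that need escaping
	Value string // canonical JSON text of the value
}

// encodeObject drops members whose value is null, orders the rest by key,
// and joins them into a JSON object without insignificant whitespace.
func encodeObject(members []member) string {
	kept := make([]member, 0, len(members))
	for _, m := range members {
		if m.Value != "null" {
			kept = append(kept, m)
		}
	}
	// Go compares strings byte by byte; for valid UTF-8 this gives the
	// same order as comparing Unicode code points.
	sort.Slice(kept, func(i, j int) bool { return kept[i].Key < kept[j].Key })
	parts := make([]string, len(kept))
	for i, m := range kept {
		parts[i] = `"` + m.Key + `":` + m.Value
	}
	return "{" + strings.Join(parts, ",") + "}"
}

func main() {
	fmt.Println(encodeObject([]member{
		{"foo", `"bar"`},
		{"c", "1.234E2"},
		{"a", "56"},
		{"b", "0.0E0"},
		{"y", "null"},
	}))
}
```

Because ordering follows code points, uppercase ASCII keys sort before lowercase ones: "B" comes before "a".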

type String

type String string

String is our representation of a regular string, prepared for JSON marshalling.

func (String) MarshalJSON

func (o String) MarshalJSON() ([]byte, error)

MarshalJSON provides a byte array of the string inside quotes.
