revisor

package module
v0.3.1
Warning

This package is not in the latest version of its module.

Published: May 30, 2023 License: MIT Imports: 13 Imported by: 3

README

Revisor

Revisor allows you to define specifications for NewsDoc contents as a series of declarations and pattern matching extensions to existing declarations.

Local testing

For running the actual tests and benchmarks, see the section on Testing.

The easiest way to test specifications against documents is by running the "revisor" command like so:

$ revisor document ./testdata/article-borked.json

That will validate the document using only the specifications in "./constraints/core.json".

Try running the same validation against a document with organisation specific content:

$ revisor document ./testdata/example-article.json
meta block 2 (tt/slugline): undeclared block type or rel
attribute "type" of meta block 2 (tt/slugline): undeclared block attribute
attribute "value" of meta block 2 (tt/slugline): undeclared block attribute
content block 2 (tt/visual): undeclared block type or rel
attribute "type" of content block 2 (tt/visual): undeclared block attribute
data attribute "caption" of content block 2 (tt/visual): unknown attribute
link 1 self(tt/picture) of content block 2 (tt/visual): undeclared block type or rel
attribute "type" of link 1 self(tt/picture) of content block 2 (tt/visual): undeclared block attribute
attribute "uri" of link 1 self(tt/picture) of content block 2 (tt/visual): undeclared block attribute
attribute "url" of link 1 self(tt/picture) of content block 2 (tt/visual): undeclared block attribute
attribute "rel" of link 1 self(tt/picture) of content block 2 (tt/visual): undeclared block attribute
data attribute "credit" of link 1 self(tt/picture) of content block 2 (tt/visual): unknown attribute
data attribute "height" of link 1 self(tt/picture) of content block 2 (tt/visual): unknown attribute
data attribute "hiresScale" of link 1 self(tt/picture) of content block 2 (tt/visual): unknown attribute
data attribute "width" of link 1 self(tt/picture) of content block 2 (tt/visual): unknown attribute
content block 3 (tt/dateline): undeclared block type or rel
attribute "type" of content block 3 (tt/dateline): undeclared block attribute
data attribute "text" of content block 3 (tt/dateline): unknown attribute
documents had validation errors

Use the flag -spec ./constraints/tt.json to load the organisation specific constraints for TT.

Running a revisor server

It's also possible to run revisor as a service with the serve command. It takes the same --spec and --core-spec flags as the document command, and adds --addr to control the address to listen on.

Start the server in one shell:

$ revisor serve

...and post the example article to it in another using curl:

$ curl --data @testdata/example-article.json localhost:8000

You should get the same validation errors as in the previous example, but in JSON format. An empty array is returned for valid documents.
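
The curl call above could also be made from Go. The following is a minimal sketch, not part of revisor: it assumes that the server answers with a JSON array shaped like the ValidationResult type documented further down, and that it does not require a specific content type. postDocument, result, and newStandInServer are hypothetical names, and the stand-in server only exists so that the example runs without a real revisor instance.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// result mirrors the assumed JSON shape of a validation result. The
// entity references are kept as raw JSON, only the message is used.
type result struct {
	Entity json.RawMessage `json:"entity,omitempty"`
	Error  string          `json:"error,omitempty"`
}

// postDocument (hypothetical helper) posts a document to a revisor
// server and decodes the returned array of validation results.
func postDocument(url string, doc []byte) ([]result, error) {
	// Assumption: any content type is accepted by the server.
	resp, err := http.Post(url, "application/json", bytes.NewReader(doc))
	if err != nil {
		return nil, fmt.Errorf("post document: %w", err)
	}
	defer resp.Body.Close()

	var results []result
	if err := json.NewDecoder(resp.Body).Decode(&results); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}

	return results, nil
}

// newStandInServer is a stand-in for "revisor serve" that always
// returns one canned error, so the sketch runs on its own.
func newStandInServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, _ *http.Request) {
			fmt.Fprint(w, `[{"error":"undeclared block type or rel"}]`)
		}))
}

func main() {
	srv := newStandInServer()
	defer srv.Close()

	// Against a real server the URL would be e.g. http://localhost:8000.
	results, err := postDocument(srv.URL, []byte(`{}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	for _, r := range results {
		fmt.Println(r.Error)
	}
}
```

An empty slice after decoding would correspond to the empty array that is returned for valid documents.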

Writing specifications

The main entities in a specification are documents, blocks, and properties. Documents are declared by type, blocks by type and/or rel, and properties by name. An entity is not valid if we don't have a matching declaration for it, regardless of whether somebody has pattern-matched against it.

Both pattern matching and much of the validation are done through key-value pairs of a name and a string constraint. Say that we want to match all links that have a rel of "subject", "channel", or "section" and allow "broader" links to be added to them. The specification would then look like this:

{
  "name": "Associated with and broader links",
  "description": "Extends subject, channel, and section links with broader links",
  "match": {"rel": {
    "enum": ["subject", "channel", "section"]
  }},
  "links": [
    {
      "declares": {"rel":"broader"},
      "attributes": {
        "type": {},
        "title": {}
      }
    }
  ]
}

Here we declare that links with rel "broader" are valid for all blocks that match our expression; see "Block attributes" for a list of attributes that can be used in pattern matching. We also define that the attributes type and title must be present. The {"enum":...} object and the empty objects ({}) for type and title are all examples of string constraints.
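
To make the matching step concrete, here is a minimal sketch of how a match expression like the one above could be evaluated against a block's attributes. It is an illustration only and not revisor's implementation: enumConstraint and matches are hypothetical names, only "enum" is supported, and the "core/topic" type in the usage is made up for the example.

```go
package main

import "fmt"

// enumConstraint is a simplified stand-in for a string constraint
// that only supports "enum".
type enumConstraint struct {
	Enum []string
}

// accepts reports whether v is one of the listed values.
func (c enumConstraint) accepts(v string) bool {
	for _, e := range c.Enum {
		if e == v {
			return true
		}
	}

	return false
}

// matches reports whether every attribute named in the match
// expression satisfies its constraint.
func matches(match map[string]enumConstraint, attrs map[string]string) bool {
	for name, c := range match {
		if !c.accepts(attrs[name]) {
			return false
		}
	}

	return true
}

func main() {
	match := map[string]enumConstraint{
		"rel": {Enum: []string{"subject", "channel", "section"}},
	}

	subject := map[string]string{"rel": "subject", "type": "core/topic"}
	author := map[string]string{"rel": "author"}

	fmt.Println(matches(match, subject)) // true
	fmt.Println(matches(match, author))  // false
}
```

Only blocks for which the expression matches would have the extra "broader" link declaration applied to them.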

String constraints

Name        Use
optional    Set to true if the value doesn't have to be present
allowEmpty  Set to true if an empty value is ok
const       A specific "value" that must match
enum        A list ["of", "values"] where one must match
pattern     A regular expression that the value must match
glob        A list of glob patterns ["http://**", "https://**"] where one must match
format      A named format that the value must follow
time        A time format specification

The distinction between optional and allowEmpty is only relevant for data attributes. The document and block attributes defined in the NewsDoc schema always exist, so optional and allowEmpty will be treated as equivalent.
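
As a rough illustration of how these constraints interact for a data attribute, the following sketch checks a value against optional, allowEmpty, const, enum, and pattern. It is written under assumptions and is not the library's Validate method: the constraint type and check function are hypothetical, glob, format, and time are left out, and rejecting empty values unless allowEmpty is set is an interpretation of the table above.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// constraint is a simplified stand-in for a string constraint.
type constraint struct {
	Optional   bool
	AllowEmpty bool
	Const      *string
	Enum       []string
	Pattern    *regexp.Regexp
}

// check validates a value; exists tells whether the attribute was
// present at all, which is what separates optional from allowEmpty.
func check(c constraint, value string, exists bool) error {
	if !exists {
		if c.Optional {
			return nil
		}
		return errors.New("required value is missing")
	}

	if value == "" {
		if c.AllowEmpty {
			return nil
		}
		return errors.New("value must not be empty")
	}

	if c.Const != nil && value != *c.Const {
		return fmt.Errorf("must be %q", *c.Const)
	}

	if len(c.Enum) > 0 {
		found := false
		for _, e := range c.Enum {
			if e == value {
				found = true
				break
			}
		}
		if !found {
			return fmt.Errorf("must be one of %q", c.Enum)
		}
	}

	if c.Pattern != nil && !c.Pattern.MatchString(value) {
		return fmt.Errorf("must match %q", c.Pattern.String())
	}

	return nil
}

func main() {
	granularity := constraint{Enum: []string{"date", "datetime"}}
	slug := constraint{AllowEmpty: true, Pattern: regexp.MustCompile(`^[a-z-]+$`)}

	fmt.Println(check(granularity, "date", true))
	fmt.Println(check(granularity, "week", true))
	fmt.Println(check(slug, "", true))
	fmt.Println(check(constraint{Optional: true}, "", false))
}
```

Passing exists separately from the value is what lets the sketch tell a missing attribute apart from one that is present but empty.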

Formats

The following formats are available:

  • RFC3339: an RFC3339 timestamp ("2022-05-11T14:10:32Z")
  • int: an integer ("1234")
  • float: a floating point number ("12.34")
  • bool: a boolean ("true" or "false")
  • html: validate the contents as HTML
  • uuid: validate the string as a UUID

When using the format "html" it's also possible to use htmlPolicy to use a specific HTML policy. See the section on HTML policies.

The document and block uuid attributes are always validated as UUIDs and need no additional "uuid" format specified.
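
The simpler formats map closely onto the Go standard library. The sketch below shows one way such checks could be written; checkFormat is a hypothetical helper and not revisor's code. The "html" and "uuid" formats are left out because the standard library has no ready-made validators for them.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// checkFormat validates value against a subset of the named formats.
func checkFormat(format, value string) error {
	var err error

	switch format {
	case "RFC3339":
		_, err = time.Parse(time.RFC3339, value)
	case "int":
		_, err = strconv.Atoi(value)
	case "float":
		// Note: ParseFloat also accepts values such as "NaN" and "Inf".
		_, err = strconv.ParseFloat(value, 64)
	case "bool":
		// Compared literally, because strconv.ParseBool would also
		// accept "1", "t", "TRUE" and similar spellings.
		if value != "true" && value != "false" {
			err = fmt.Errorf("%q is not \"true\" or \"false\"", value)
		}
	default:
		err = fmt.Errorf("format %q is not covered by this sketch", format)
	}

	return err
}

func main() {
	fmt.Println(checkFormat("RFC3339", "2022-05-11T14:10:32Z"))
	fmt.Println(checkFormat("int", "1234"))
	fmt.Println(checkFormat("float", "12.34"))
	fmt.Println(checkFormat("bool", "yes"))
}
```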

Time formats

A Go time parsing layout (see the time package for documentation) that should be used to validate the timestamp.

Globs

Glob matching uses https://github.com/gobwas/glob for matching, and the glob patterns are compiled with "/" and "+" as separators.

Writing a document specification

A specification for a document contains:

  • documentation attributes name and description
  • a declaration (declares) or pattern matching rule (match)
  • attribute constraints (attributes)
  • meta, links, and content block specifications

{
  "name": "Planning item",
  "description": "Planned news coverage",
  "declares": "core/newscoverage",
  "meta": [
    {
      "name": "Main metadata block",
      "declares": {"type":"core/newscoverage"},
      "count": 1,
      "data": {
        "dateGranularity": {"enum":["date", "datetime"]},
        "description": {"allowEmpty":true},
        "start": {"format":"RFC3339"},
        "end": {"format":"RFC3339"},
        "priority": {},
        "publicDescription":{"allowEmpty":true},
        "slug": {"allowEmpty":true}
      }
    }
  ],
  "links": [
    {
      "declares": {"type": "x-im/assignment"},
      "links": [
        {
          "declares": {
            "rel":"assignment", "type": "x-im/assignment"
          },
          "attributes": {
            "uuid": {}
          }
        }
      ]
    }
  ]
}
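
To show how a specification like this maps onto Go values, here is a sketch that decodes a shortened version of it with encoding/json. The struct types are local, simplified stand-ins that reuse the JSON field names from the DocumentConstraint and BlockConstraint types documented below; they are not the library's types, and the real types have their own UnmarshalJSON methods whose behaviour is not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockSignature is a stand-in for a block declaration.
type blockSignature struct {
	Type string `json:"type,omitempty"`
	Rel  string `json:"rel,omitempty"`
}

// blockSpec is a stand-in covering a subset of a block specification.
// Data constraints are kept as raw JSON to keep the sketch short.
type blockSpec struct {
	Name     string                     `json:"name,omitempty"`
	Declares *blockSignature            `json:"declares,omitempty"`
	Count    *int                       `json:"count,omitempty"`
	Data     map[string]json.RawMessage `json:"data,omitempty"`
	Links    []*blockSpec               `json:"links,omitempty"`
}

// documentSpec is a stand-in covering a subset of a document
// specification.
type documentSpec struct {
	Name     string       `json:"name,omitempty"`
	Declares string       `json:"declares,omitempty"`
	Meta     []*blockSpec `json:"meta,omitempty"`
	Links    []*blockSpec `json:"links,omitempty"`
}

// spec is a shortened version of the planning item example.
const spec = `{
  "name": "Planning item",
  "declares": "core/newscoverage",
  "meta": [
    {
      "declares": {"type":"core/newscoverage"},
      "count": 1,
      "data": {"start": {"format":"RFC3339"}, "priority": {}}
    }
  ]
}`

// parseSpec decodes a document specification.
func parseSpec(data []byte) (documentSpec, error) {
	var ds documentSpec

	err := json.Unmarshal(data, &ds)

	return ds, err
}

func main() {
	ds, err := parseSpec([]byte(spec))
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	meta := ds.Meta[0]
	fmt.Println(ds.Declares, meta.Declares.Type, *meta.Count, len(meta.Data))
}
```

Count is a pointer so that an absent "count" can be told apart from an explicit zero, which matches the pointer fields in the BlockConstraint type.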

Writing a block specification

A block specification can contain:

  • documentation attributes name and description
  • a declaration (declares) or pattern matching rule (match)
  • attribute constraints (attributes)
  • data constraints
  • meta, links, and content block specifications
  • count, minCount and maxCount to control how many times a block can occur in the list of blocks it's in
  • blocksFrom directives that borrow the allowed blocks from a declared document type.

{
  "declares": {"type": "core/socialembed"},
  "links": [
    {
      "declares": {"rel":"self", "type":"core/tweet"},
      "maxCount": 1,
      "attributes": {
        "uri": {"glob":["core://tweet/*"]},
        "url": {"glob":["https://twitter.com/narendramodi/status/*"]}
      }
    },
    {
      "declares": {"rel":"alternate", "type":"text/html"},
      "maxCount": 1,
      "attributes": {
        "url": {"glob":["https://**"]},
        "title": {}
      },
      "data": {
        "context": {},
        "provider": {}
      }
    }
  ]
}
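
The count, minCount, and maxCount settings used in block specifications can be thought of as a check on the number of blocks that matched a specification. The sketch below is an assumption about how such a check might look, not the library's logic; counts, checkCount, and intPtr are hypothetical names.

```go
package main

import "fmt"

// counts holds the optional occurrence limits of a block
// specification. A nil pointer means that the limit is not set.
type counts struct {
	Count    *int
	MinCount *int
	MaxCount *int
}

// checkCount verifies that n matching blocks is within the limits.
func checkCount(c counts, n int) error {
	switch {
	case c.Count != nil && n != *c.Count:
		return fmt.Errorf("expected exactly %d, got %d", *c.Count, n)
	case c.MinCount != nil && n < *c.MinCount:
		return fmt.Errorf("expected at least %d, got %d", *c.MinCount, n)
	case c.MaxCount != nil && n > *c.MaxCount:
		return fmt.Errorf("expected at most %d, got %d", *c.MaxCount, n)
	}

	return nil
}

// intPtr returns a pointer to v.
func intPtr(v int) *int { return &v }

func main() {
	// Corresponds to "maxCount": 1 in the example above.
	limit := counts{MaxCount: intPtr(1)}

	fmt.Println(checkCount(limit, 1))
	fmt.Println(checkCount(limit, 2))
}
```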

HTML policies

HTML policies are used to restrict what elements and attributes can be used in strings with the format "html". Attributes are defined as string constraints on elements. The default policy could look like this:

  "htmlPolicies": [
    {
      "name": "default",
      "elements": {
        "strong": {
          "attributes": {
            "id": {"optional":true}
          }
        },
        "a": {
          "attributes": {
            "id": {"optional":true},
            "href": {}
          }
        }
      }
    },
    {
      "name": "table",
      "uses": "default",
      "elements": {
        "tr": {
          "attributes": {
            "id": {"optional":true}
          }
        },
        "td": {
          "attributes": {
            "id": {"optional":true}
          }
        },
        "th": {
          "attributes": {
            "id": {"optional":true}
          }
        }
      }
    }
  ]

All "html" strings that use the default policy would then be able to use <strong> and <a>, and the "href" attribute would be required for <a>. An "html" string that uses the "table" policy would be able to use everything from the default policy and <tr>, <td>, and <th>.

A customer can extend HTML policies using the "extends" attribute:

  "htmlPolicies": [
    {
      "extends": "default",
      "elements": {
        "personTag": {
          "attributes": {
            "id": {}
          }
        }
      }
    }
  ]

This would add support for "<persontag>" (HTML is case insensitive) to the default policy, and to any policies that use it. Only one level of "extends" and "uses" is allowed; chaining policies further will result in an error.
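
The relationship between "uses" and "extends" can be sketched as a small resolution step. This is an illustration under assumptions, not the behaviour of HTMLPolicySet.Resolve: the policy type and resolve function are hypothetical, attribute constraints are omitted, element names are lower-cased as an interpretation of the case insensitivity mentioned above, and the only chaining that is rejected here is a policy using a policy that itself uses another.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// policy is a simplified stand-in that only tracks element names.
type policy struct {
	Name     string
	Uses     string
	Extends  string
	Elements []string
}

// resolve returns the set of allowed elements for each named policy.
func resolve(policies []policy) (map[string]map[string]bool, error) {
	named := map[string]policy{}
	allowed := map[string]map[string]bool{}

	// Collect the elements that each named policy declares itself.
	for _, p := range policies {
		if p.Extends != "" {
			continue
		}
		set := map[string]bool{}
		for _, e := range p.Elements {
			set[strings.ToLower(e)] = true
		}
		named[p.Name] = p
		allowed[p.Name] = set
	}

	// Apply extensions first, so that they are visible to policies
	// that use the extended policy.
	for _, p := range policies {
		if p.Extends == "" {
			continue
		}
		if _, ok := named[p.Extends]; !ok {
			return nil, fmt.Errorf("cannot extend unknown policy %q", p.Extends)
		}
		for _, e := range p.Elements {
			allowed[p.Extends][strings.ToLower(e)] = true
		}
	}

	// Copy the elements of the base policy into each policy using it.
	for name, p := range named {
		if p.Uses == "" {
			continue
		}
		base, ok := named[p.Uses]
		if !ok {
			return nil, fmt.Errorf("policy %q uses unknown policy %q", name, p.Uses)
		}
		if base.Uses != "" {
			return nil, fmt.Errorf("policy %q uses %q, which uses another policy", name, p.Uses)
		}
		for e := range allowed[p.Uses] {
			allowed[name][e] = true
		}
	}

	return allowed, nil
}

func main() {
	allowed, err := resolve([]policy{
		{Name: "default", Elements: []string{"strong", "a"}},
		{Name: "table", Uses: "default", Elements: []string{"tr", "td", "th"}},
		{Extends: "default", Elements: []string{"personTag"}},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	names := make([]string, 0, len(allowed["table"]))
	for e := range allowed["table"] {
		names = append(names, e)
	}
	sort.Strings(names)

	fmt.Println(names) // [a persontag strong td th tr]
}
```

Applying extensions before usages is what makes the extension reach the "table" policy as well as "default".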

Attribute reference

Document attributes

A list of available document attributes, and whether they can be used in pattern matching.

Name      Description                                Match
uuid      The document uuid                          No
uri       The URI that identifies the document       No
url       A web-browsable location for the document  No
type      The type of the document                   Yes
language  The document language                      No
title     The document title                         No

Block attributes

A list of available block attributes, and whether they can be used in pattern matching.

Name         Description                                                Match
uuid         The UUID of the document the block represents              No
type         The type of the block                                      Yes
uri          Identifies a resource in URI form                          Yes
url          A web-browsable location for the block                     Yes
title        Human readable title of the block                          No
rel          The relationship the block describes                       Yes
name         A name that identifies the block                           Yes
value        A generic value for the block                              Yes
contenttype  The content type of the resource that the block describes  Yes
role         The role that the block or resource has                    Yes

Testing

Revisor implements a file-driven test in TestValidateDocument that checks that all the "testdata/results/*.json" files match the validation results for the corresponding document under "testdata/". Result files with the prefix "base-" will be validated against "constraints/naviga.json". For result files with the prefix "example-", the "constraints/example.json" constraints will be used as well.

If the constraints have been updated, or new example documents have been added, the result files can be regenerated using ./update-test-results.sh.

Benchmarks

The benchmark BenchmarkValidateDocument tests the performance of validating "testdata/example-article.json" against the naviga and example organisation constraint sets.

To run the benchmark execute:

$ go test -bench . -benchmem -cpu 1

Add the flags -memprofile memprofile.out -cpuprofile profile.out to collect CPU and memory profiles. Run go tool pprof -web profile.out for the respective profile files to open a profile graph in your web browser.

Comparing benchmarks

Install benchstat: go install golang.org/x/perf/cmd/benchstat@latest.

Run the benchmark on the unchanged code (stash your changes or check out main):

$ go test -bench . -benchmem -count 5 -cpu 1 | tee old.txt

Then run the benchmarks on the new code:

$ go test -bench . -benchmem -count 5 -cpu 1 | tee new.txt

Finally, run benchstat to get a summary of the change:

$ benchstat old.txt new.txt
name              old time/op    new time/op    delta
ValidateDocument     203µs ± 7%      99µs ± 3%  -51.03%  (p=0.008 n=5+5)

name              old alloc/op   new alloc/op   delta
ValidateDocument     134kB ± 0%      35kB ± 0%  -73.74%  (p=0.008 n=5+5)

name              old allocs/op  new allocs/op  delta
ValidateDocument     1.05k ± 0%     0.59k ± 0%  -43.48%  (p=0.008 n=5+5)

Fuzz tests

There are two fuzz targets in the project. FuzzValidationWide allows fuzzing of the document and two constraint sets. It will load the core constraints, the example organisation constraints, and all documents in "./testdata/" and add them as fuzzing seeds. FuzzValidationConstraints adds all constraint sets from "./constraints/" as fuzzing seeds. The fuzzing operation is then done against all documents in "./testdata/".

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ValidateEntity

func ValidateEntity(data []byte) (int, error)

ValidateEntity checks if the entity that starts at the beginning of the byte slice is valid and returns its length in bytes.

Types

type BlockConstraint

type BlockConstraint struct {
	Declares    *BlockSignature    `json:"declares,omitempty"`
	Name        string             `json:"name,omitempty"`
	Description string             `json:"description,omitempty"`
	Match       ConstraintMap      `json:"match,omitempty"`
	Count       *int               `json:"count,omitempty"`
	MaxCount    *int               `json:"maxCount,omitempty"`
	MinCount    *int               `json:"minCount,omitempty"`
	Links       []*BlockConstraint `json:"links,omitempty"`
	Meta        []*BlockConstraint `json:"meta,omitempty"`
	Content     []*BlockConstraint `json:"content,omitempty"`
	Attributes  ConstraintMap      `json:"attributes,omitempty"`
	Data        ConstraintMap      `json:"data,omitempty"`
	BlocksFrom  []BlocksFrom       `json:"blocksFrom,omitempty"`
}

BlockConstraint is a specification for a block.

func (BlockConstraint) BlockConstraints

func (bc BlockConstraint) BlockConstraints(kind BlockKind) []*BlockConstraint

BlockConstraints implements the BlockConstraintSet interface.

func (BlockConstraint) DescribeCountConstraint

func (bc BlockConstraint) DescribeCountConstraint(kind BlockKind) string

DescribeCountConstraint returns a human readable (English) description of the count constraint for the block constraint.

func (BlockConstraint) Matches

func (bc BlockConstraint) Matches(b *newsdoc.Block) (Match, []string)

Matches checks if the given block matches the constraint and returns the names of the attributes that matched.

func (*BlockConstraint) UnmarshalJSON

func (bc *BlockConstraint) UnmarshalJSON(data []byte) error

type BlockConstraintSet

type BlockConstraintSet interface {
	// BlockConstraints returns the constraints of the specified kind.
	BlockConstraints(kind BlockKind) []*BlockConstraint
}

type BlockKind

type BlockKind string

BlockKind describes the different kinds of blocks that are available.

const (
	BlockKindLink    BlockKind = "link"
	BlockKindMeta    BlockKind = "meta"
	BlockKindContent BlockKind = "content"
)

The different kinds of blocks that a block source can have.

func (BlockKind) Description

func (bk BlockKind) Description(n int) string

Description returns the pluralised name of the block kind.

type BlockSignature

type BlockSignature struct {
	Type string `json:"type,omitempty"`
	Rel  string `json:"rel,omitempty"`
}

BlockSignature is the signature of a block declaration.

type BlockSource

type BlockSource interface {
	// GetBlocks returns the child blocks of the specified type.
	GetBlocks(kind BlockKind) []newsdoc.Block
}

BlockSource acts as an intermediary to allow us to write code that can treat both documents and blocks as a source of blocks.

type BlocksFrom

type BlocksFrom struct {
	DocType string    `json:"docType,omitempty"`
	Global  bool      `json:"global,omitempty"`
	Kind    BlockKind `json:"kind"`
}

BlocksFrom allows a block to borrow definitions for its child blocks from a document type.

type BorrowedBlocks

type BorrowedBlocks struct {
	Kind   BlockKind
	Source BlockConstraintSet
}

BorrowedBlocks wraps a block constraint set that has been borrowed.

func (BorrowedBlocks) BlockConstraints

func (bb BorrowedBlocks) BlockConstraints(kind BlockKind) []*BlockConstraint

BlockConstraints implements the BlockConstraintSet interface.

type ConstraintMap

type ConstraintMap map[string]StringConstraint

func (ConstraintMap) Requirements

func (cm ConstraintMap) Requirements() string

type ConstraintSet

type ConstraintSet struct {
	Version      int                  `json:"version,omitempty"`
	Schema       string               `json:"$schema,omitempty"`
	Name         string               `json:"name"`
	Documents    []DocumentConstraint `json:"documents,omitempty"`
	Links        []*BlockConstraint   `json:"links,omitempty"`
	Meta         []*BlockConstraint   `json:"meta,omitempty"`
	Content      []*BlockConstraint   `json:"content,omitempty"`
	Attributes   ConstraintMap        `json:"attributes,omitempty"`
	HTMLPolicies []HTMLPolicy         `json:"htmlPolicies,omitempty"`
}

func (ConstraintSet) BlockConstraints

func (cs ConstraintSet) BlockConstraints(kind BlockKind) []*BlockConstraint

func (ConstraintSet) Validate

func (cs ConstraintSet) Validate() error

type DocumentBlocks

type DocumentBlocks struct {
	// contains filtered or unexported fields
}

func NewDocumentBlocks

func NewDocumentBlocks(document *newsdoc.Document) DocumentBlocks

func (DocumentBlocks) GetBlocks

func (db DocumentBlocks) GetBlocks(kind BlockKind) []newsdoc.Block

type DocumentConstraint

type DocumentConstraint struct {
	Name        string `json:"name,omitempty"`
	Description string `json:"description,omitempty"`
	// Declares is used to declare a document type.
	Declares string `json:"declares,omitempty"`
	// Match is used to extend other document declarations.
	Match      ConstraintMap      `json:"match,omitempty"`
	Links      []*BlockConstraint `json:"links,omitempty"`
	Meta       []*BlockConstraint `json:"meta,omitempty"`
	Content    []*BlockConstraint `json:"content,omitempty"`
	Attributes ConstraintMap      `json:"attributes,omitempty"`
}

DocumentConstraint describes a set of constraints for a document. Either by declaring a document type, or matching against a document that has been declared somewhere else.

func (DocumentConstraint) BlockConstraints

func (dc DocumentConstraint) BlockConstraints(kind BlockKind) []*BlockConstraint

BlockConstraints implements the BlockConstraintSet interface.

func (DocumentConstraint) Matches

func (dc DocumentConstraint) Matches(
	d *newsdoc.Document, vCtx *ValidationContext,
) Match

Matches checks if the given document matches the constraint.

func (*DocumentConstraint) UnmarshalJSON

func (dc *DocumentConstraint) UnmarshalJSON(data []byte) error

type EntityRef

type EntityRef struct {
	RefType   RefType   `json:"refType"`
	BlockKind BlockKind `json:"kind,omitempty"`
	Index     int       `json:"index,omitempty"`
	Name      string    `json:"name,omitempty"`
	Type      string    `json:"type,omitempty"`
	Rel       string    `json:"rel,omitempty"`
}

func (EntityRef) String

func (er EntityRef) String() string

type Glob

type Glob struct {
	// contains filtered or unexported fields
}

Glob is used to represent a compiled glob pattern that can be used with JSON marshalling and unmarshalling.

func CompileGlob

func CompileGlob(pattern string) (*Glob, error)

CompileGlob compiles a glob pattern.

func (*Glob) MarshalJSON

func (g *Glob) MarshalJSON() ([]byte, error)

func (*Glob) Match

func (g *Glob) Match(s string) bool

Match checks if the string matches the pattern.

func (*Glob) UnmarshalJSON

func (g *Glob) UnmarshalJSON(data []byte) error

type GlobList

type GlobList []*Glob

GlobList is a Glob slice with some convenience functions.

func (GlobList) MatchOrEmpty

func (gl GlobList) MatchOrEmpty(v string) bool

MatchOrEmpty returns true if the value matches any of the glob patterns, or if the list is nil or empty.

func (GlobList) String

func (gl GlobList) String() string

String returns a human readable (English) description of the glob constraint.

type HTMLElement

type HTMLElement struct {
	Attributes ConstraintMap `json:"attributes,omitempty"`
}

HTMLElement describes the constraints for an HTML element.

type HTMLPolicy

type HTMLPolicy struct {
	Name        string `json:"name,omitempty"`
	Description string `json:"description,omitempty"`

	// Uses will base the policy on another policy.
	Uses string `json:"uses,omitempty"`
	// Extends will add the declared elements to another policy.
	Extends string `json:"extends,omitempty"`

	Elements map[string]HTMLElement `json:"elements"`
	// contains filtered or unexported fields
}

HTMLPolicy is used to declare supported elements, and what attributes they can have.

func (*HTMLPolicy) Check

func (hp *HTMLPolicy) Check(v string) error

Check that the given value follows the constraints of the policy.

type HTMLPolicySet

type HTMLPolicySet struct {
	// contains filtered or unexported fields
}

HTMLPolicySet is a set of declared HTML policies.

func NewHTMLPolicySet

func NewHTMLPolicySet() *HTMLPolicySet

func (*HTMLPolicySet) Add

func (s *HTMLPolicySet) Add(source string, policies ...HTMLPolicy) error

Add policies to the set.

func (*HTMLPolicySet) Resolve

func (s *HTMLPolicySet) Resolve() (map[string]*HTMLPolicy, error)

Resolve all extensions and usages and return the finished policies.

type Match

type Match int

Match describes if and how a block constraint matches a block.

const (
	NoMatch Match = iota
	Matches
	MatchDeclaration
)

Match constants for no match / match / matched declaration.

type NestedBlocks

type NestedBlocks struct {
	// contains filtered or unexported fields
}

func NewNestedBlocks

func NewNestedBlocks(block *newsdoc.Block) NestedBlocks

func (NestedBlocks) GetBlocks

func (nb NestedBlocks) GetBlocks(kind BlockKind) []newsdoc.Block

type RefType

type RefType string

const (
	RefTypeBlock     RefType = "block"
	RefTypeAttribute RefType = "attribute"
	RefTypeData      RefType = "data attribute"
)

func (RefType) String

func (rt RefType) String() string

type Regexp

type Regexp struct {
	// contains filtered or unexported fields
}

func (*Regexp) MarshalJSON

func (r *Regexp) MarshalJSON() ([]byte, error)

func (*Regexp) Match

func (r *Regexp) Match(v string) bool

func (*Regexp) String

func (r *Regexp) String() string

func (*Regexp) UnmarshalJSON

func (r *Regexp) UnmarshalJSON(data []byte) error

type StringConstraint

type StringConstraint struct {
	Name        string       `json:"name,omitempty"`
	Description string       `json:"description,omitempty"`
	Optional    bool         `json:"optional,omitempty"`
	AllowEmpty  bool         `json:"allowEmpty,omitempty"`
	Const       *string      `json:"const,omitempty"`
	Enum        []string     `json:"enum,omitempty"`
	Pattern     *Regexp      `json:"pattern,omitempty"`
	Glob        GlobList     `json:"glob,omitempty"`
	Format      StringFormat `json:"format,omitempty"`
	Time        string       `json:"time,omitempty"`
	HTMLPolicy  string       `json:"htmlPolicy,omitempty"`
}

func (*StringConstraint) Requirement

func (sc *StringConstraint) Requirement() string

func (*StringConstraint) Validate

func (sc *StringConstraint) Validate(
	value string, exists bool, vCtx *ValidationContext,
) error

type StringFormat

type StringFormat string

const (
	StringFormatNone    StringFormat = ""
	StringFormatRFC3339 StringFormat = "RFC3339"
	StringFormatInt     StringFormat = "int"
	StringFormatFloat   StringFormat = "float"
	StringFormatBoolean StringFormat = "bool"
	StringFormatHTML    StringFormat = "html"
	StringFormatUUID    StringFormat = "uuid"
)

func (StringFormat) Describe

func (f StringFormat) Describe() string

type ValidationContext

type ValidationContext struct {
	ValidateHTML func(policyName, value string) error
	// contains filtered or unexported fields
}

type ValidationOptionFunc

type ValidationOptionFunc func(vc *ValidationContext)

func WithValueCollector

func WithValueCollector(
	collector ValueCollector,
) ValidationOptionFunc

type ValidationResult

type ValidationResult struct {
	Entity []EntityRef `json:"entity,omitempty"`
	Error  string      `json:"error,omitempty"`
}

func (ValidationResult) String

func (vr ValidationResult) String() string

type Validator

type Validator struct {
	// contains filtered or unexported fields
}

func NewValidator

func NewValidator(
	constraints ...ConstraintSet,
) (*Validator, error)

func (*Validator) ValidateDocument

func (v *Validator) ValidateDocument(
	document *newsdoc.Document, opts ...ValidationOptionFunc,
) []ValidationResult

type ValueAnnotation

type ValueAnnotation struct {
	Ref        []EntityRef      `json:"ref"`
	Constraint StringConstraint `json:"constraint"`
	Value      string           `json:"value"`
}

type ValueCollector

type ValueCollector interface {
	CollectValue(a ValueAnnotation)
	With(ref EntityRef) ValueCollector
}

type ValueDiscarder

type ValueDiscarder struct{}

func (ValueDiscarder) CollectValue

func (ValueDiscarder) CollectValue(_ ValueAnnotation)

CollectValue implements ValueCollector.

func (ValueDiscarder) With

func (ValueDiscarder) With(_ EntityRef) ValueCollector

With implements ValueCollector.

Directories

Path Synopsis
cmd
