newsdoc

package module
v0.5.0
Published: Apr 8, 2024 License: MIT Imports: 5 Imported by: 9

README

NewsDoc

This package provides type declarations for NewsDoc as Go types, protobuf messages, and a JSON schema. Protobuf and JSON schemas are generated from the Go type declarations.

NewsDoc was created to be a convenient and type-safe document format for editorial data, such as articles and concept metadata, that minimises the need to evolve the schema to accommodate new types of data. It achieves this by not using the data structure itself to express relationships ({categories:['a', 'b'], seeAlso:['c', 'd']}) or the type/identity of the data ({articleMetadata:{teaserHeadline:"v", teaserText:"w"}, headline:"x", "lead_in":"y", paragraphs:["z"]}). An example of a hypothetical format that does rely on structure in this way:

{
    "categories": [
        "28b94216-77d7-41e9-be08-a6bfbe59f1d5",
        "a23528b7-31af-4ae2-bbca-0c78f1cbc959"
    ],
    "readMore": [
        "6dd826dd-d866-459b-a07e-0da4bad7bce0",
        "043c248f-92ac-4e0b-b0ec-76cc26323634"
    ],
    "articleMetadata": {
        "teaserHeadline": "v",
        "teaserText": "w"
    },
    "headline": "x",
    "lead_in": "y",
    "paragraphs": ["z"],
    "image": "https://example.com/an-image.jpg",
    "image_width": 128,
    "image_height": 128,
    "image_alt_text": "desc"
}

Instead, NewsDoc treats a document as a set of links expressing relationships to other entities, a set of typed metadata blocks, and a list of typed content blocks that represent the actual content of, for example, an article. The article in the hypothetical format above would instead look like this:

{
    "type": "example/article",
    "links": [
        {"rel":"category", "uuid":"28b94216-77d7-41e9-be08-a6bfbe59f1d5"},
        {"rel":"category", "uuid":"a23528b7-31af-4ae2-bbca-0c78f1cbc959"},
        {
            "rel":"see-also", "type":"example/article",
            "uuid":"6dd826dd-d866-459b-a07e-0da4bad7bce0"
        },
        {
            "rel":"see-also", "type":"example/article",
            "uuid":"043c248f-92ac-4e0b-b0ec-76cc26323634"
        }
    ],
    "meta": [
        {
            "type": "example/teaser",
            "title": "v",
            "data": {
                "text": "w"
            }
        }
    ],
    "content": [
        {
            "type": "example/headline",
            "data": {
                "text": "x"
            }
        },
        {
            "type": "example/image",
            "url": "https://example.com/an-image.jpg",
            "data": {
                "width": "128",
                "height": "128",
                "alt": "desc"
            }
        },
        {
            "type": "example/lead-in",
            "data": {
                "text": "y"
            }
        },
        {
            "type": "example/paragraph",
            "data": {
                "text": "z"
            }
        }
    ]
}

This kind of structure allows a system that uses NewsDoc to, for example, recognise that there is a link to another entity, or a content element with text, without knowing about the specific type of relationship or content. On the flip side, it is also easy to ignore, for example, a metadata block with a type that you don't recognise.
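As a rough illustration of this kind of type-agnostic processing, the sketch below decodes a shortened version of the example document and collects linked UUIDs and text content without looking at any "rel" or "type" value. The Block and Document types here are trimmed-down local stand-ins for the package's types so that the example is self-contained, and the names sample, parse, linkedUUIDs, and plainText are hypothetical helpers, not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Block is a local stand-in with only the fields this sketch needs.
type Block struct {
	UUID string            `json:"uuid,omitempty"`
	Type string            `json:"type,omitempty"`
	Rel  string            `json:"rel,omitempty"`
	Data map[string]string `json:"data,omitempty"`
}

// Document is a local stand-in with only the fields this sketch needs.
type Document struct {
	Type    string  `json:"type,omitempty"`
	Links   []Block `json:"links,omitempty"`
	Meta    []Block `json:"meta,omitempty"`
	Content []Block `json:"content,omitempty"`
}

// sample is a shortened version of the example article.
const sample = `{
    "type": "example/article",
    "links": [
        {"rel":"category", "uuid":"28b94216-77d7-41e9-be08-a6bfbe59f1d5"},
        {"rel":"see-also", "type":"example/article",
         "uuid":"6dd826dd-d866-459b-a07e-0da4bad7bce0"}
    ],
    "content": [
        {"type": "example/headline", "data": {"text": "x"}},
        {"type": "example/image", "data": {"width": "128", "height": "128"}},
        {"type": "example/paragraph", "data": {"text": "z"}}
    ]
}`

// parse decodes a JSON document into the local Document type.
func parse(raw string) (Document, error) {
	var doc Document
	err := json.Unmarshal([]byte(raw), &doc)
	return doc, err
}

// linkedUUIDs returns every linked UUID, regardless of rel or type.
func linkedUUIDs(doc Document) []string {
	var ids []string
	for _, l := range doc.Links {
		if l.UUID != "" {
			ids = append(ids, l.UUID)
		}
	}
	return ids
}

// plainText returns the "text" data value of every content block that has
// one, regardless of the block type.
func plainText(doc Document) []string {
	var out []string
	for _, b := range doc.Content {
		if t, ok := b.Data["text"]; ok {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	doc, err := parse(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("links:", linkedUUIDs(doc))
	fmt.Println("text:", plainText(doc))
}
```

The image block is skipped by plainText simply because it carries no "text" value; no knowledge of "example/image" is needed.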

One thing is lost in translation here: the "data" object of a block is a string->string key-value structure, so the width 128 becomes "128". We sacrifice the specific types of some data to be able to have a largely static type system. But the "type contract" between content producers and consumers in a system like this is that "width" and "height" must always be integers. Revisor is our attempt to formalise and enforce these type contracts.
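To make the consumer side of that type contract concrete, here is a minimal sketch of reading an integer back out of a block's string->string data. The helper intFromData is a hypothetical name used for illustration; it is not part of this package or of Revisor, and it only shows the kind of check a consumer would otherwise have to perform by hand.

```go
package main

import (
	"fmt"
	"strconv"
)

// intFromData looks up key in a block's data map and parses the value as an
// integer, returning an error if the key is missing or the value is not a
// valid integer.
func intFromData(data map[string]string, key string) (int, error) {
	raw, ok := data[key]
	if !ok {
		return 0, fmt.Errorf("missing data key %q", key)
	}

	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("data key %q: %q is not an integer: %w", key, raw, err)
	}

	return n, nil
}

func main() {
	data := map[string]string{"width": "128", "height": "128", "alt": "desc"}

	width, err := intFromData(data, "width")
	if err != nil {
		fmt.Println("invalid image block:", err)
		return
	}
	fmt.Println("width:", width)

	// A producer that breaks the contract is caught at parse time.
	_, err = intFromData(map[string]string{"width": "wide"}, "width")
	fmt.Println("error for bad value:", err)
}
```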

A Revisor schema for the above format could look like this:

{"documents":[{
  "name": "News article",
  "description": "A basic news article example",
  "declares": "example/article",
  "links": [
    {
      "name": "Category",
      "description": "A category assigned to the article",
      "declares": {"rel":"category"},
      "attributes": {"uuid": {}}
    },
    {
      "name": "Read more",
      "description": "A link to other articles that are interesting",
      "declares": {"rel":"see-also", "type": "example/article"},
      "attributes": {"uuid": {}}
    }
  ],
  "meta": [
    {
      "name": "Teaser",
      "declares": {"type":"example/teaser"},
      "attributes": {"title": {}},
      "data": {"text": {}},
      "count": 1
    }
  ],
  "content": [
    {
      "name": "Headline",
      "declares": {"type":"example/headline"},
      "data": {"text": {}}
    },
    {
      "name": "Lead-in",
      "declares": {"type":"example/lead-in"},
      "data": {"text": {}}
    },
    {
      "name": "Paragraph",
      "declares": {"type":"example/paragraph"},
      "data": {"text": {}}
    },
    {
      "name": "Image",
      "declares": {"type":"example/image"},
      "attributes": {
        "url": {"glob":"https://**"}
      },
      "data": {
        "width": {"format":"int"},
        "height": {"format":"int"},
        "alt": {}
      }
    }
  ]
}]}

This schema can then be used to validate documents to ensure the data quality of stored documents. It also serves as documentation, and can be used by automated systems, such as a full-text index, as a hint about the correct way to index the data.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func JSONSchema

func JSONSchema() []byte

JSONSchema returns the NewsDoc JSON schema.

Types

type Block

type Block struct {
	// ID is the block ID.
	ID string `json:"id,omitempty" proto:"1"`
	// UUID is used to reference another Document in a block.
	UUID string `json:"uuid,omitempty" jsonschema_extras:"format=uuid" proto:"2"`
	// URI is used to reference another entity in a document.
	URI string `json:"uri,omitempty"  jsonschema_extras:"format=uri" proto:"3"`
	// URL is a browseable URL for the block.
	URL string `json:"url,omitempty" jsonschema_extras:"format=uri" proto:"4"`
	// Type is the type of the block
	Type string `json:"type,omitempty" proto:"5"`
	// Title is the title/headline of the block, typically used in the
	// presentation of the block.
	Title string `json:"title,omitempty" proto:"6"`
	// Data contains block data.
	Data DataMap `json:"data,omitempty" proto:"7"`
	// Rel describes the relationship to the document/parent entity.
	Rel string `json:"rel,omitempty" proto:"8"`
	// Role is used either as an alternative to rel, or for nuancing the
	// relationship.
	Role string `json:"role,omitempty" proto:"9"`
	// Name is a name for the block. An alternative to "rel" when
	// relationship is a term that doesn't fit.
	Name string `json:"name,omitempty" proto:"10"`
	// Value is a value for the block. Useful when we want to store a
	// primitive value.
	Value string `json:"value,omitempty" proto:"11"`
	// Contenttype is used to describe the content type of the block/linked
	// entity if it differs from the type of the block.
	Contenttype string `json:"contenttype,omitempty" proto:"12"`
	// Links are used to link to other resources and documents.
	Links []Block `json:"links,omitempty" proto:"13"`
	// Content is used to embed content blocks.
	Content []Block `json:"content,omitempty" proto:"14"`
	// Meta is used to embed metadata
	Meta []Block `json:"meta,omitempty" proto:"15"`
	// Sensitivity can be used to communicate how the information in a block
	// can be handled. It could, for example, be set to "internal" to show
	// that it contains information that must be removed or transformed
	// before publishing.
	Sensitivity string `json:"sensitivity,omitempty" proto:"16"`
}

Block is the building block for data embedded in documents. It is used for content, links, and metadata alike. Blocks can be nested, but that's nothing to strive for; keep it simple.

type DataMap

type DataMap map[string]string

DataMap is used as key -> (string) value data for blocks.

func (DataMap) MarshalJSON

func (bd DataMap) MarshalJSON() ([]byte, error)

MarshalJSON implements a custom marshaler to make the JSON output of a document deterministic. Maps are unordered.
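One way to obtain deterministic output for a string map is to emit the keys in sorted order. The sketch below shows that technique on a local type; sortedMap and encode are hypothetical names, and this is an illustration of the general idea, not necessarily how DataMap.MarshalJSON is implemented in this package.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// sortedMap is a local stand-in for a string->string data map.
type sortedMap map[string]string

// MarshalJSON writes the map as a JSON object with keys in sorted order.
func (m sortedMap) MarshalJSON() ([]byte, error) {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var buf bytes.Buffer
	buf.WriteByte('{')
	for i, k := range keys {
		if i > 0 {
			buf.WriteByte(',')
		}
		// Delegate string escaping to encoding/json.
		kb, err := json.Marshal(k)
		if err != nil {
			return nil, err
		}
		vb, err := json.Marshal(m[k])
		if err != nil {
			return nil, err
		}
		buf.Write(kb)
		buf.WriteByte(':')
		buf.Write(vb)
	}
	buf.WriteByte('}')

	return buf.Bytes(), nil
}

// encode marshals the map and returns the result as a string.
func encode(m sortedMap) (string, error) {
	out, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := encode(sortedMap{"width": "128", "alt": "desc", "height": "128"})
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(out) // {"alt":"desc","height":"128","width":"128"}
}
```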

type Document

type Document struct {
	// UUID is a unique ID for the document, this can for example be a
	// random v4 UUID, or a URI-derived v5 UUID.
	UUID string `json:"uuid,omitempty" jsonschema_extras:"format=uuid" proto:"1"`
	// Type is the content type of the document.
	Type string `json:"type,omitempty"  proto:"2"`
	// URI identifies the document (in a more human-readable way than the
	// UUID).
	URI string `json:"uri,omitempty" jsonschema_extras:"format=uri" proto:"3"`
	// URL is the browseable location of the document (if any).
	URL string `json:"url,omitempty" jsonschema_extras:"format=uri" proto:"4"`
	// Title is the title of the document, can be used as the document name,
	// or the headline when the document is displayed.
	Title string `json:"title,omitempty" proto:"5"`
	// Content is the content of the document, this is essentially what gets
	// rendered on the page when you view a document.
	Content []Block `json:"content,omitempty" proto:"6"`
	// Meta is the metadata for a document, this could be things like
	// teasers, open graph data, newsvalues.
	Meta []Block `json:"meta,omitempty" proto:"7"`
	// Links are links to other resources and entities. This could be links
	// to topics, categories and subjects for the document, or credited
	// authors.
	Links []Block `json:"links,omitempty" proto:"8"`
	// Language is the language used in the document as an IETF language
	// tag. For example "en", "en-GB", "es", or "sv-SE".
	Language string `json:"language,omitempty" proto:"9"`
}

Document is a NewsDoc document.
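A short sketch of building a document in Go and serialising it to JSON may help show how the struct tags shape the output. To keep the example self-contained it declares local Block and Document types that copy a subset of the fields and JSON tags listed above, rather than importing the package; render is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Block mirrors a subset of the fields of the package's Block type.
type Block struct {
	UUID  string            `json:"uuid,omitempty"`
	Type  string            `json:"type,omitempty"`
	Title string            `json:"title,omitempty"`
	Data  map[string]string `json:"data,omitempty"`
	Rel   string            `json:"rel,omitempty"`
}

// Document mirrors a subset of the fields of the package's Document type.
type Document struct {
	UUID     string  `json:"uuid,omitempty"`
	Type     string  `json:"type,omitempty"`
	Title    string  `json:"title,omitempty"`
	Content  []Block `json:"content,omitempty"`
	Meta     []Block `json:"meta,omitempty"`
	Links    []Block `json:"links,omitempty"`
	Language string  `json:"language,omitempty"`
}

// render serialises a document to compact JSON.
func render(doc Document) (string, error) {
	out, err := json.Marshal(doc)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	doc := Document{
		Type:     "example/article",
		Language: "en",
		Links: []Block{
			{Rel: "category", UUID: "28b94216-77d7-41e9-be08-a6bfbe59f1d5"},
		},
		Content: []Block{
			{Type: "example/paragraph", Data: map[string]string{"text": "z"}},
		},
	}

	out, err := render(doc)
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(out)
}
```

Because every field is tagged omitempty, unset fields such as the title and the empty meta list do not appear in the output.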

Directories

Path Synopsis
cmd
