gparse

package module
v0.0.0-...-0aaf74e Latest
Warning

This package is not in the latest version of its module.

Published: Apr 21, 2023 License: MIT Imports: 4 Imported by: 1

README

gparse

Package gparse is a Go package that processes markup language tokens: initially XML, with support for the other flavors of LwDITA markup (HTML5, Markdown-XP) planned soon.

Files in this directory use Markdown, so run godoc2md on them.

This package makes its own versions of Golang XML structures so that they get sensible new names and handy methods.

Two shortened names (Att for Attribute, Elm for Element) keep code readable.

About XML content, including mixed content

When working with XML we can generally distinguish three types of files:

  • Record-oriented XML data - expressed using XML elements that contain only other elements
  • Mixed content XML documents - also expressed using XML elements, but these can contain text as well as other elements
  • Validation rules - generally expressed as XSD, RNG, or DTD. It is interesting to note that DTDs actually obey the same fundamental XML syntax rules as the other two types (record-oriented, mixed content); the typical DTD file extensions (.dtd .mod) are helpful to humans but are not strictly required as a signal to a parser that fully understands the syntax of all three XML file types.
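The difference between record-oriented and mixed content can be checked mechanically. The following is a minimal sketch using only the Go standard library decoder; the helper name hasMixedContent is hypothetical and is not part of gparse. It treats an element as mixed content when it directly contains both non-whitespace text and at least one child element.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// hasMixedContent reports whether any element in src directly contains both
// non-whitespace text and at least one child element.
// Hypothetical helper for illustration; not part of gparse.
func hasMixedContent(src string) (bool, error) {
	// One frame per open element: did we see text, did we see a child?
	type frame struct{ text, child bool }
	var stack []frame
	dec := xml.NewDecoder(strings.NewReader(src))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return false, nil
		}
		if err != nil {
			return false, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if n := len(stack); n > 0 {
				stack[n-1].child = true
			}
			stack = append(stack, frame{})
		case xml.CharData:
			if n := len(stack); n > 0 && strings.TrimSpace(string(t)) != "" {
				stack[n-1].text = true
			}
		case xml.EndElement:
			top := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if top.text && top.child {
				return true, nil
			}
		}
	}
}

func main() {
	record := `<row><id>7</id><name>Ada</name></row>`
	mixed := `<p>Some <b>bold</b> text.</p>`
	for _, src := range []string{record, mixed} {
		m, err := hasMixedContent(src)
		fmt.Println(m, err)
	}
}
```

Whitespace-only text between elements is ignored, so pretty-printed record data is not misreported as mixed content.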

That being said, this package can superficially digest many directives (e.g. ELEMENT, ATTLIST, ENTITY) but does not yet (at this level) completely parse them, or act on them (by performing transclusion).

Documentation

Overview

Package gparse processes markup language tokens, primarily supporting the three formats of LwDITA: XDITA (XML), HDITA (HTML5), MDITA (Markdown-XP).

Terminology: Instead of the term "parse tree", this package uses the term "CST" (Concrete Syntax Tree), to contrast & compare to AST (Abstract Syntax Tree). See for example this introduction: https://en.wikipedia.org/wiki/Abstract_syntax_tree

This package uses yuin/goldmark https://github.com/yuin/goldmark as its Markdown parser for several reasons, mainly because work on v2 of the Blackfriday https://github.com/russross/blackfriday Markdown parser seems to have mostly stalled. In addition, goldmark creates a CST (which goldmark calls an AST), whereas Blackfriday does not yet. goldmark is also CommonMark-compliant, but note that this does not guarantee compliance with LwDITA MDITA and MDITA-XP.

Technical Approach

This package defines its own new types based on Go stdlib XML structures, so that they get sensible new names and handy methods while remaining convertible back to their Go stdlib equivalents.
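As a rough illustration of this approach, the sketch below defines a local type whose underlying type is encoding/xml's Attr, attaches a method to it, and converts in both directions. The definition of Att and its Echo method here are assumptions made for the example; the package's real declarations may differ.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Att has xml.Attr as its underlying type, so values convert freely in
// both directions. Assumed definition; the real gparse type may differ.
type Att xml.Attr

// Echo renders the attribute as name="value" with the value XML-escaped.
// Illustrative method only; namespaces are ignored.
func (a Att) Echo() string {
	var b strings.Builder
	b.WriteString(a.Name.Local)
	b.WriteString(`="`)
	// EscapeText cannot fail here: strings.Builder never returns a write error.
	_ = xml.EscapeText(&b, []byte(a.Value))
	b.WriteString(`"`)
	return b.String()
}

func main() {
	std := xml.Attr{Name: xml.Name{Local: "id"}, Value: "intro"}
	a := Att(std)       // stdlib value to local type
	back := xml.Attr(a) // local type back to stdlib value
	fmt.Println(a.Echo(), back == std)
}
```

Because the conversion is between types with identical underlying structure, it copies the value but performs no other work at run time.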

Short names (Att for Attribute, Elm for Element, Doc for Document) keep code readable.

This code *should* work with XML namespaces, but this is as yet untested.

Method Naming

  • NewFoo(..) always allocates new memory and returns a pointer.
  • Echo() echoes an object back in source XML form, but normalized.
  • String() outputs a human-friendly form useful for development and debugging but typically indigestible to an XML parser.
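The naming convention can be sketched on a toy type. The Elm struct below is a hypothetical, pared-down stand-in (a name plus text content), not the package's actual element type; it only shows how the three method names divide the work.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Elm is a hypothetical, pared-down element used only for this sketch.
type Elm struct {
	Name string
	Text string
}

// NewElm allocates a new Elm and returns a pointer to it.
// Surrounding whitespace is trimmed as a simple form of normalization.
func NewElm(name, text string) *Elm {
	return &Elm{Name: strings.TrimSpace(name), Text: strings.TrimSpace(text)}
}

// Echo returns the element as normalized source XML.
func (e *Elm) Echo() string {
	var b strings.Builder
	b.WriteString("<" + e.Name + ">")
	// strings.Builder never returns a write error, so the result is ignored.
	_ = xml.EscapeText(&b, []byte(e.Text))
	b.WriteString("</" + e.Name + ">")
	return b.String()
}

// String returns a debugging form that is deliberately not valid XML.
func (e *Elm) String() string {
	return fmt.Sprintf("Elm[%s|%q]", e.Name, e.Text)
}

func main() {
	e := NewElm(" title ", "  Hello ")
	fmt.Println(e.Echo())
	fmt.Println(e) // fmt uses String() because *Elm implements fmt.Stringer
}
```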

About XML content, including mixed content

When working with XML we can generally distinguish three types of files:

  • Record-oriented XML data - expressed using XML elements
  • Natural language XML documents - also expressed using XML elements, and known as "mixed content"
  • Validation rules - generally expressed as XSD, RNG, or DTD. It is interesting to note that DTDs actually obey the same fundamental XML syntax rules as the other two types (record-oriented, mixed content); the typical DTD file extensions (`.dtd .mod`) are helpful to humans but are not strictly required as a signal to a parser that fully understands the syntax of all three XML file types (and all types of XML entities).

That being said, this package can superficially digest many directives (e.g. ELEMENT, ATTLIST, ENTITY) but does not yet (at this level) completely parse them, or act on them (by performing transclusion).
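To show what "superficially digesting" a directive can look like, here is a minimal sketch built on the standard library decoder, which hands back each <!...> construct as an uninterpreted xml.Directive token. The helper directiveKeywords is hypothetical, and whether gparse tokenizes directives this way is an assumption; the sketch only extracts the leading keyword and neither parses the body nor performs any transclusion.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// directiveKeywords returns the leading keyword (ELEMENT, ATTLIST, ENTITY, ...)
// of each <!...> directive in src, without interpreting the directive body.
// Hypothetical helper for illustration; not part of gparse.
func directiveKeywords(src string) ([]string, error) {
	var kws []string
	dec := xml.NewDecoder(strings.NewReader(src))
	for {
		// RawToken skips the decoder's element-nesting checks, which are
		// irrelevant for a file that contains only declarations.
		tok, err := dec.RawToken()
		if err == io.EOF {
			return kws, nil
		}
		if err != nil {
			return nil, err
		}
		if d, ok := tok.(xml.Directive); ok {
			if f := strings.Fields(string(d)); len(f) > 0 {
				kws = append(kws, f[0])
			}
		}
	}
}

func main() {
	dtd := `<!ELEMENT note (#PCDATA)>
<!ATTLIST note id ID #IMPLIED>
<!ENTITY % common "id ID #IMPLIED">
`
	kws, err := directiveKeywords(dtd)
	fmt.Println(kws, err)
}
```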

Constants

This section is empty.

Variables

var DispFmtgTypes = []DispFmtgType{
	"nilerror",
	"ROOT",
	"BLCK",
	"INLN",
	"NONE",
}

DispFmtgTypes is a string-enum specifying how an element fits into layout.

var WalkLevel int

Functions

func XmlCheckForPreambleToken

func XmlCheckForPreambleToken(p []*gtoken.GToken) []*gtoken.GToken

XmlCheckForPreambleToken currently only prints a message. It could instead return a flag, or even insert the standard XML preamble if one is not present.
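The "return a flag" variant mentioned above might look like the following sketch. It works on raw text with the standard library decoder, not on gtoken.GToken values (whose fields are not shown here), and the helper name hasXMLPreamble is hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// hasXMLPreamble reports whether the very first token of src is an XML
// declaration such as <?xml version="1.0"?>.
// Hypothetical helper for illustration; not part of gparse.
func hasXMLPreamble(src string) bool {
	dec := xml.NewDecoder(strings.NewReader(src))
	tok, err := dec.RawToken()
	if err != nil {
		return false
	}
	// The XML declaration is tokenized as a processing instruction
	// whose target is "xml".
	pi, ok := tok.(xml.ProcInst)
	return ok && pi.Target == "xml"
}

func main() {
	fmt.Println(hasXMLPreamble(`<?xml version="1.0"?><doc/>`))
	fmt.Println(hasXMLPreamble(`<doc/>`))
}
```

Leading whitespace before the declaration makes the first token character data, so the function returns false in that case, matching the XML rule that the declaration must come first.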

Types

type DispFmtgType

type DispFmtgType string

DispFmtgType (display formatting type) specifies the "rendering context".

func (DispFmtgType) LongForm

func (DFT DispFmtgType) LongForm() string

LongForm returns a marginally more user-friendly description.
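A string-enum of this kind is usually paired with a membership check and a descriptive lookup. The sketch below redeclares the type and the value list so that it is self-contained; isKnown and describe are hypothetical helpers, and the descriptive strings (as well as the reading of BLCK as block and INLN as inline) are assumptions, since the actual LongForm output is not shown here.

```go
package main

import "fmt"

// DispFmtgType and DispFmtgTypes are redeclared so the sketch stands alone.
type DispFmtgType string

var DispFmtgTypes = []DispFmtgType{"nilerror", "ROOT", "BLCK", "INLN", "NONE"}

// isKnown reports whether dft is one of the listed enum values.
// Hypothetical helper; not part of gparse.
func isKnown(dft DispFmtgType) bool {
	for _, t := range DispFmtgTypes {
		if t == dft {
			return true
		}
	}
	return false
}

// describe is a hypothetical stand-in for LongForm. The strings returned
// here are guesses, not the package's real output.
func describe(dft DispFmtgType) string {
	switch dft {
	case "ROOT":
		return "root element"
	case "BLCK":
		return "block-level element"
	case "INLN":
		return "inline element"
	case "NONE":
		return "not displayed"
	}
	return "unknown"
}

func main() {
	for _, t := range []DispFmtgType{"BLCK", "INLN", "bogus"} {
		fmt.Println(t, isKnown(t), describe(t))
	}
}
```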

type GEnt

type GEnt struct {
	// e.g. "foo"
	NameOnly string
	// including "%|&" and ";" i.e. "&foo;" or "%foo;"
	NameAsRef string
	// true if parameter entity, false if general entity
	TypeIsParm bool
	// "%" if parameter entity, "&" if general entity
	RefChar string
	// External entities only (PUBLIC, SYSTEM)
	IsSystemID bool
	IsPublicID bool
	ID         string
	URI        string
	Fullpath   string

	TheRest string
}

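GEnt is a generic XML Entity. It is either:

  • an Internal Parsed General entity: <!ENTITY foo "bar">
  • an Internal Parsed Parameter entity: <!ENTITY % foo "bar">

In the XML 1.0 grammar (the EntityValue production), the literal "bar" may not contain a bare '%', a bare '&', or the quote character that delimits it; well-formed references ('%Name;', '&Name;', character references) are allowed by that production.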

func (GEnt) String

func (ge GEnt) String() string
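To make the GEnt name fields concrete, here is a minimal sketch that fills them from an internal entity declaration. The local ent type mirrors only four GEnt fields plus a Value field that GEnt does not have, and parseInternalEntity is a hypothetical helper; the package's own parsing logic is not shown here and may work differently. External entities (PUBLIC, SYSTEM) are rejected rather than handled.

```go
package main

import (
	"fmt"
	"strings"
)

// ent mirrors a few GEnt fields so the sketch is self-contained.
// Value is an addition for this example and is not a GEnt field.
type ent struct {
	NameOnly   string
	NameAsRef  string
	TypeIsParm bool
	RefChar    string
	Value      string
}

const xmlSpace = " \t\r\n"

// parseInternalEntity handles only <!ENTITY name "value"> and
// <!ENTITY % name "value">. Hypothetical helper; not part of gparse.
func parseInternalEntity(decl string) (ent, error) {
	s := strings.TrimSpace(decl)
	if !strings.HasPrefix(s, "<!ENTITY") || !strings.HasSuffix(s, ">") {
		return ent{}, fmt.Errorf("not an ENTITY declaration: %q", decl)
	}
	s = s[len("<!ENTITY") : len(s)-1]
	if s == "" || !strings.ContainsAny(s[:1], xmlSpace) {
		return ent{}, fmt.Errorf("missing space after ENTITY: %q", decl)
	}
	s = strings.TrimSpace(s)
	e := ent{RefChar: "&"}
	if strings.HasPrefix(s, "%") {
		e.TypeIsParm, e.RefChar = true, "%"
		s = strings.TrimSpace(s[1:])
	}
	i := strings.IndexAny(s, xmlSpace)
	if i <= 0 {
		return ent{}, fmt.Errorf("missing name or value: %q", decl)
	}
	e.NameOnly = s[:i]
	e.NameAsRef = e.RefChar + e.NameOnly + ";"
	v := strings.TrimSpace(s[i:])
	// The value must be a literal wrapped in matching single or double quotes.
	if len(v) < 2 || (v[0] != '"' && v[0] != '\'') || v[len(v)-1] != v[0] {
		return ent{}, fmt.Errorf("value is not a quoted literal: %q", decl)
	}
	e.Value = v[1 : len(v)-1]
	return e, nil
}

func main() {
	for _, d := range []string{`<!ENTITY foo "bar">`, `<!ENTITY % foo "bar">`} {
		e, err := parseInternalEntity(d)
		fmt.Printf("%+v %v\n", e, err)
	}
}
```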

type MarkupStringer

type MarkupStringer interface {
	// Echo the input markup, (but) in a normalized format
	Echo() string
	EchoTo(io.Writer)
	// For development & debugging - probably not valid markup
	String() string
	DumpTo(io.Writer)
}

MarkupStringer is an interface meant to be a souped-up version of [GoStringer](https://golang.org/pkg/fmt/#GoStringer) for markup fragments in any of the formats we process ("XML", "Markdown", "HTML", others to be specified). A fragment can be stringified in two different ways:

  • `Echo` basically recreates the original input, although in a normalized format; the name is more specific than `String`.
  • `String` returns a format suitable for development and debugging, but (probably) not valid markup (methods: `String() string`, `DumpTo(w)`).
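A minimal sketch of a type satisfying this interface follows. The interface is redeclared so the example is self-contained; the comment type, its output formats, and the render helper are hypothetical. The sketch also assumes that EchoTo writes the Echo form and DumpTo writes the String form, which the interface layout suggests but does not state.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// MarkupStringer is redeclared here so the sketch stands alone.
type MarkupStringer interface {
	Echo() string
	EchoTo(io.Writer)
	String() string
	DumpTo(io.Writer)
}

// comment is a hypothetical markup fragment: an XML comment.
type comment struct{ text string }

// Echo returns valid, normalized markup.
func (c comment) Echo() string { return "<!-- " + strings.TrimSpace(c.text) + " -->" }

// EchoTo writes the Echo form to w (assumed pairing).
func (c comment) EchoTo(w io.Writer) { io.WriteString(w, c.Echo()) }

// String returns a debugging form that is not valid markup.
func (c comment) String() string { return fmt.Sprintf("Comment(%q)", strings.TrimSpace(c.text)) }

// DumpTo writes the String form to w (assumed pairing).
func (c comment) DumpTo(w io.Writer) { io.WriteString(w, c.String()) }

// Compile-time check that comment implements the interface.
var _ MarkupStringer = comment{}

// render collects both writer-based forms of m as strings.
func render(m MarkupStringer) (echo, dump string) {
	var e, d strings.Builder
	m.EchoTo(&e)
	m.DumpTo(&d)
	return e.String(), d.String()
}

func main() {
	echo, dump := render(comment{text: "  note "})
	fmt.Println(echo)
	fmt.Println(dump)
}
```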
