Documentation ¶
Overview ¶
Package gparse processes markup language tokens, primarily supporting the three formats of LwDITA: XDITA (XML), HDITA (HTML5), MDITA (Markdown-XP).
Terminology: Instead of the term "parse tree", this package uses the term "CST" (Concrete Syntax Tree), to contrast it with "AST" (Abstract Syntax Tree). See for example this introduction: https://en.wikipedia.org/wiki/Abstract_syntax_tree
This package uses yuin/goldmark (https://github.com/yuin/goldmark) as its Markdown parser for several reasons, chiefly that work on v2 of the Blackfriday Markdown parser (https://github.com/russross/blackfriday) seems to have mostly stalled. In addition, goldmark creates a CST (which goldmark calls an AST), whereas Blackfriday does not yet. goldmark is also CommonMark-compliant, but note that this does not guarantee compliance with LwDITA MDITA and MDITA-XP.
Technical Approach ¶
This package defines its own new types based on Go stdlib XML structures, so that they get sensible new names and handy methods while remaining convertible back to their Go stdlib equivalents.
Short names (Att for Attribute, Elm for Element, Doc for Document) keep code readable.
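As a minimal sketch of this approach (not the package's actual definitions), the following declares a hypothetical `Att` type whose underlying type is the stdlib `xml.Attr`, adds a method to it, and converts it back. The `Echo` body and the `Attr` helper are assumptions made for illustration only.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Att is a hypothetical stand-in for the package's attribute type:
// a new named type whose underlying type is the stdlib xml.Attr.
type Att xml.Attr

// Echo renders the attribute as name="value". This sketch does not
// escape special characters in the value.
func (a Att) Echo() string {
	return a.Name.Local + `="` + a.Value + `"`
}

// Attr is a hypothetical helper that converts back to the stdlib type.
// A plain conversion is enough because the underlying types are identical.
func (a Att) Attr() xml.Attr {
	return xml.Attr(a)
}

func main() {
	a := Att{Name: xml.Name{Local: "id"}, Value: "intro"}
	fmt.Println(a.Echo())

	// Hand the value to code that expects the stdlib type.
	std := a.Attr()
	fmt.Println(std.Name.Local, std.Value)
}
```

Because the conversion copies no more than a struct value, moving between the named type and the stdlib type is cheap.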
This code *should* work with XML namespaces, but this is as yet untested.
Method Naming ¶
- NewFoo(..) always allocates new memory and returns a pointer.
- Echo() echoes an object back in source XML form, but normalized.
- String() outputs a human-friendly form useful for development and debugging but typically indigestible to an XML parser.
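The naming conventions above can be sketched on a hypothetical `Elm` type. The fields and the output formats here are assumptions chosen for illustration and are not taken from the package source.

```go
package main

import "fmt"

// Elm is a hypothetical element type used only for this illustration.
type Elm struct {
	Name string
	Text string
}

// NewElm allocates a new Elm and returns a pointer to it.
func NewElm(name, text string) *Elm {
	return &Elm{Name: name, Text: text}
}

// Echo returns the element as normalized XML source.
// This sketch does not escape special characters.
func (e *Elm) Echo() string {
	return "<" + e.Name + ">" + e.Text + "</" + e.Name + ">"
}

// String returns a debugging form that is not valid XML.
func (e *Elm) String() string {
	return fmt.Sprintf("Elm{name=%q text=%q}", e.Name, e.Text)
}

func main() {
	e := NewElm("p", "hello")
	fmt.Println(e.Echo()) // XML form
	fmt.Println(e)        // fmt calls String() for the debugging form
}
```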
About XML content, including mixed content ¶
When working with XML we can generally distinguish three types of files:
- Record-oriented XML data - expressed using XML elements
- Natural language XML documents - also expressed using XML elements, and known as "mixed content"
- Validation rules - generally expressed as XSD, RNG, or DTD
It is interesting to note that DTDs actually obey the same fundamental XML syntax rules as the other two types (record-oriented, mixed content); the typical DTD file extensions (`.dtd .mod`) are helpful to humans but are not strictly required as a signal to a parser that fully understands the syntax of all three XML file types (and all types of XML entities).
That being said, this package can superficially digest many directives (e.g. ELEMENT, ATTLIST, ENTITY) but does not yet (at this level) completely parse them, or act on them (by performing transclusion).
Index ¶
Constants ¶
This section is empty.
Variables ¶
var DispFmtgTypes = []DispFmtgType{
"nilerror",
"ROOT",
"BLCK",
"INLN",
"NONE",
}
DispFmtgTypes is a string-enum specifying how an element fits into layout.
var WalkLevel int
Functions ¶
Types ¶
type DispFmtgType ¶
type DispFmtgType string
DispFmtgType (display formatting type) specifies the "rendering context".
func (DispFmtgType) LongForm ¶
func (DFT DispFmtgType) LongForm() string
LongForm returns a marginally more user-friendly description.
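A string-enum of this kind might be used as sketched below. The type and the list of values mirror the declarations above, but the descriptions returned by `LongForm` and the `isKnown` helper are assumptions for illustration; the package's real strings may differ.

```go
package main

import "fmt"

// DispFmtgType mirrors the documented declaration.
type DispFmtgType string

// DispFmtgTypes mirrors the documented list of values.
var DispFmtgTypes = []DispFmtgType{"nilerror", "ROOT", "BLCK", "INLN", "NONE"}

// LongForm returns a longer description. The wording of each
// description is a guess, not the package's actual output.
func (dft DispFmtgType) LongForm() string {
	switch dft {
	case "ROOT":
		return "root element"
	case "BLCK":
		return "block-level element"
	case "INLN":
		return "inline element"
	case "NONE":
		return "not displayed"
	default:
		return "unrecognized display type: " + string(dft)
	}
}

// isKnown is a hypothetical helper that reports whether dft
// appears in DispFmtgTypes.
func isKnown(dft DispFmtgType) bool {
	for _, t := range DispFmtgTypes {
		if t == dft {
			return true
		}
	}
	return false
}

func main() {
	// Skip the leading "nilerror" sentinel when listing values.
	for _, t := range DispFmtgTypes[1:] {
		fmt.Printf("%s: %s\n", t, t.LongForm())
	}
	fmt.Println(isKnown("TABL"))
}
```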
type GEnt ¶
type GEnt struct {
	// e.g. "foo"
	NameOnly string
	// including "%|&" and ";" i.e. "&foo;" or "%foo;"
	NameAsRef string
	// true if parameter entity, false if general entity
	TypeIsParm bool
	// "%" if parameter entity, "&" if general entity
	RefChar string
	// External entities only (PUBLIC, SYSTEM)
	IsSystemID bool
	IsPublicID bool
	ID         string
	URI        string
	Fullpath   string
	TheRest    string
}
GEnt is a generic XML Entity. Either
- an Internal Parsed General entity: <!ENTITY foo "bar"> OR
- an Internal Parsed Parameter entity: <!ENTITY % foo "bar">
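To illustrate how the naming fields relate to one another, here is a minimal sketch that fills them in for the two internal entity kinds. The trimmed-down `gEnt` struct and the `newInternalEnt` constructor are hypothetical and are not part of the package API; only the field names and their documented meanings come from the declaration above.

```go
package main

import "fmt"

// gEnt is a trimmed-down, hypothetical copy of GEnt that keeps
// only the fields describing the entity's name and kind.
type gEnt struct {
	NameOnly   string // e.g. "foo"
	NameAsRef  string // e.g. "&foo;" or "%foo;"
	TypeIsParm bool   // true if parameter entity
	RefChar    string // "%" or "&"
}

// newInternalEnt is a hypothetical constructor that derives the
// reference form of the name from the entity kind.
func newInternalEnt(name string, isParm bool) *gEnt {
	refChar := "&"
	if isParm {
		refChar = "%"
	}
	return &gEnt{
		NameOnly:   name,
		NameAsRef:  refChar + name + ";",
		TypeIsParm: isParm,
		RefChar:    refChar,
	}
}

func main() {
	general := newInternalEnt("foo", false) // <!ENTITY foo "bar">
	param := newInternalEnt("foo", true)    // <!ENTITY % foo "bar">
	fmt.Println(general.NameAsRef, param.NameAsRef)
}
```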
"bar" may not use any of: '&', '%', '"', '%Name;', '&Name;', Unicode char ref. (Sez who ?)
type MarkupStringer ¶
type MarkupStringer interface {
	// Echo the input markup, (but) in a normalized format
	Echo() string
	EchoTo(io.Writer)
	// For development & debugging - probably not valid markup
	String() string
	DumpTo(io.Writer)
}
MarkupStringer is an interface meant to be a souped-up version of [GoStringer](https://golang.org/pkg/fmt/#GoStringer) for markup fragments in any of the formats we process ("XML", "Markdown", "HTML", future TBS) that can be string'ified in two different ways:
- `Echo` basically recreates the original input, although in a normalized format; the name is more specific than `String`. Methods: `Echo() string`, `EchoTo(w)`
- `String` returns a format suitable for development and debugging, but (probably) not valid markup. Methods: `String() string`, `DumpTo(w)`
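A type could satisfy this interface roughly as follows. The `textNode` type, its whitespace-collapsing notion of "normalized", and its debugging format are all assumptions made for this sketch; the interface itself is copied from the declaration above.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// MarkupStringer is copied from the documented declaration.
type MarkupStringer interface {
	Echo() string
	EchoTo(io.Writer)
	String() string
	DumpTo(io.Writer)
}

// textNode is a hypothetical markup fragment holding raw text.
type textNode struct {
	raw string
}

// Echo returns the text with runs of whitespace collapsed to single
// spaces, which is this sketch's stand-in for "normalized".
func (t textNode) Echo() string {
	return strings.Join(strings.Fields(t.raw), " ")
}

// EchoTo writes the Echo form to w. Write errors are ignored here
// because the interface method returns nothing.
func (t textNode) EchoTo(w io.Writer) {
	io.WriteString(w, t.Echo())
}

// String returns a debugging form that is not valid markup.
func (t textNode) String() string {
	return fmt.Sprintf("text[%d bytes]: %q", len(t.raw), t.raw)
}

// DumpTo writes the String form to w.
func (t textNode) DumpTo(w io.Writer) {
	io.WriteString(w, t.String())
}

// Compile-time check that textNode implements MarkupStringer.
var _ MarkupStringer = textNode{}

func main() {
	var m MarkupStringer = textNode{raw: "  hello   world \n"}
	m.EchoTo(os.Stdout)
	fmt.Println()
	m.DumpTo(os.Stdout)
	fmt.Println()
}
```

Implementing `EchoTo` and `DumpTo` in terms of `Echo` and `String` keeps the two output forms consistent whether the caller wants a string or a stream.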