gparse

Version: v0.0.0-...-0aaf74e | Published: Apr 21, 2023 | License: MIT | Imports: 4 | Imported by: 1

Repository

github.com/fbaube/gparse

README

gparse

Package gparse processes markup language tokens in Go: initially XML, with support for the other flavors of LwDITA markup (HTML5, Markdown-XP) coming soon.

Files in this directory use Markdown, so process them with godoc2md.

This package makes its own versions of Go's XML structures so that they get sensible new names and handy methods.

Two shortened names (Att for Attribute, Elm for Element) keep code readable.

About XML content, including mixed content

When working with XML we can generally distinguish three types of files:

- Record-oriented XML data: expressed using XML elements that contain only other elements
- Mixed content XML documents: also expressed using XML elements, which can contain text as well as other elements
- Validation rules: generally expressed as XSD, RNG, or DTD. Note that DTDs obey the same fundamental XML syntax rules as the other two types (record-oriented, mixed content); the typical DTD file extensions (.dtd, .mod) help humans but are not strictly required as a signal to a parser that fully understands the syntax of all three XML file types.

That being said, this package can superficially digest many directives (e.g. ELEMENT, ATTLIST, ENTITY) but does not yet (at this level) completely parse them, or act on them (by performing transclusion).
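The record-oriented vs. mixed-content distinction above can be checked mechanically. What follows is a minimal sketch using only the Go standard library's encoding/xml; it is not gparse code, and the function name IsMixedContent is hypothetical. It counts an element as mixed content only when it holds both non-whitespace text and child elements, so text inside leaf elements (as in typical record data) does not trigger it.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// frame tracks what one currently open element has contained so far.
type frame struct {
	hasText  bool // saw non-whitespace character data
	hasChild bool // saw at least one child element
}

// IsMixedContent reports whether any element in doc holds both
// non-whitespace text and child elements, i.e. XML "mixed content".
// Record-oriented data, with text only in leaf elements, yields false.
func IsMixedContent(doc string) (bool, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var stack []frame
	for {
		tok, err := d.Token()
		if errors.Is(err, io.EOF) {
			return false, nil
		}
		if err != nil {
			return false, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			// The enclosing element (if any) now has a child.
			if len(stack) > 0 {
				stack[len(stack)-1].hasChild = true
			}
			stack = append(stack, frame{})
		case xml.CharData:
			// Whitespace used for indentation does not count as text.
			if len(stack) > 0 && strings.TrimSpace(string(t)) != "" {
				stack[len(stack)-1].hasText = true
			}
		case xml.EndElement:
			// The decoder guarantees tags match, so the stack is non-empty.
			top := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if top.hasText && top.hasChild {
				return true, nil
			}
		}
	}
}

func main() {
	record := `<people><person><name>Ann</name><age>42</age></person></people>`
	mixed := `<p>Hello <b>world</b>!</p>`
	r, _ := IsMixedContent(record)
	m, _ := IsMixedContent(mixed)
	fmt.Println(r, m) // false true
}
```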

Documentation

Overview

Package gparse processes markup language tokens, primarily supporting the three formats of LwDITA: XDITA (XML), HDITA (HTML5), MDITA (Markdown-XP).

Terminology: Instead of the term "parse tree", this package uses the term "CST" (Concrete Syntax Tree), to contrast it with the AST (Abstract Syntax Tree). See for example this introduction: https://en.wikipedia.org/wiki/Abstract_syntax_tree

This package uses yuin/goldmark https://github.com/yuin/goldmark as its Markdown parser for several reasons, mainly because work on v2 of the Blackfriday https://github.com/russross/blackfriday Markdown parser seems to have mostly stalled. In addition, goldmark creates a CST (which goldmark calls an AST), whereas Blackfriday does not yet. goldmark is also CommonMark-compliant, but note that this does not guarantee compliance with LwDITA MDITA and MDITA-XP.

Technical Approach

This package makes its own new types from Go stdlib XML structures, so that they get sensible new names and handy methods while still being convertible back to their Go stdlib equivalents.

Short names (Att for Attribute, Elm for Element, Doc for Document) keep code readable.

This code *should* work with XML namespaces, but this is as yet untested.

Method Naming

- NewFoo(..) always allocates new memory and returns a pointer.
- Echo() echoes an object back in source XML form, but normalized.
- String() outputs a human-friendly form useful for development and debugging, but typically indigestible to an XML parser.

About XML content, including mixed content

When working with XML we can generally distinguish three types of files:
- Record-oriented XML data: expressed using XML elements
- Natural language XML documents: also expressed using XML elements, and known as "mixed content"
- Validation rules: generally expressed as XSD, RNG, or DTD. Note that DTDs obey the same fundamental XML syntax rules as the other two types (record-oriented, mixed content); the typical DTD file extensions (`.dtd .mod`) help humans but are not strictly required as a signal to a parser that fully understands the syntax of all three XML file types (and all types of XML entities).

That being said, this package can superficially digest many directives (e.g. ELEMENT, ATTLIST, ENTITY) but does not yet (at this level) completely parse them, or act on them (by performing transclusion).

Index

Variables
func XmlCheckForPreambleToken(p []*gtoken.GToken) []*gtoken.GToken
type DispFmtgType
- func (DFT DispFmtgType) LongForm() string
type GEnt
- func (ge GEnt) String() string
type MarkupStringer

Variables

var DispFmtgTypes = []DispFmtgType{
	"nilerror",
	"ROOT",
	"BLCK",
	"INLN",
	"NONE",
}

DispFmtgTypes is a string-enum specifying how an element fits into layout.
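Because DispFmtgType is a string type rather than an iota-based integer, any string can be converted to it, valid or not. A membership check against DispFmtgTypes is one simple guard; the sketch below assumes a hypothetical IsValid method that gparse does not document.

```go
package main

import "fmt"

// DispFmtgType and DispFmtgTypes are copied from the documentation above.
type DispFmtgType string

var DispFmtgTypes = []DispFmtgType{
	"nilerror",
	"ROOT",
	"BLCK",
	"INLN",
	"NONE",
}

// IsValid is a hypothetical helper (not part of gparse) reporting whether
// d is one of the values listed in DispFmtgTypes.
func (d DispFmtgType) IsValid() bool {
	for _, v := range DispFmtgTypes {
		if d == v {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(DispFmtgType("BLCK").IsValid())  // true
	fmt.Println(DispFmtgType("block").IsValid()) // false
}
```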

var WalkLevel int

Functions

func XmlCheckForPreambleToken

func XmlCheckForPreambleToken(p []*gtoken.GToken) []*gtoken.GToken

XmlCheckForPreambleToken currently only prints a message. It could instead return a flag, or even insert the standard XML preamble if one is not present.

Types

type DispFmtgType

type DispFmtgType string

DispFmtgType (display formatting type) specifies the "rendering context".

func (DispFmtgType) LongForm

func (DFT DispFmtgType) LongForm() string

LongForm returns a marginally more user-friendly description.
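The documentation does not show what LongForm returns, so the following is only a sketch of how such a method might map the terse codes in DispFmtgTypes to readable text. Every description string below is an assumed expansion of the four-letter codes, not gparse's actual output.

```go
package main

import "fmt"

// DispFmtgType mirrors the documented string type.
type DispFmtgType string

// LongForm is a hypothetical reimplementation. The descriptions are
// assumed expansions of the codes, not the strings gparse returns.
func (d DispFmtgType) LongForm() string {
	switch d {
	case "ROOT":
		return "root element"
	case "BLCK":
		return "block-level element"
	case "INLN":
		return "inline element"
	case "NONE":
		return "no display formatting"
	case "nilerror":
		return "error: formatting type not set"
	default:
		return fmt.Sprintf("unknown (%q)", string(d))
	}
}

func main() {
	for _, d := range []DispFmtgType{"BLCK", "INLN", "XYZ"} {
		fmt.Printf("%s: %s\n", d, d.LongForm())
	}
}
```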

type GEnt

type GEnt struct {
	// e.g. "foo"
	NameOnly string
	// including "%|&" and ";" i.e. "&foo;" or "%foo;"
	NameAsRef string
	// true if parameter entity, false if general entity
	TypeIsParm bool
	// "%" if parameter entity, "&" if general entity
	RefChar string
	// External entities only (PUBLIC, SYSTEM)
	IsSystemID bool
	IsPublicID bool
	ID         string
	URI        string
	Fullpath   string

	TheRest string
}

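GEnt is a generic XML entity, either:
- an internal parsed general entity: <!ENTITY foo "bar">, or
- an internal parsed parameter entity: <!ENTITY % foo "bar">

Per the EntityValue production of the XML 1.0 specification, "bar" may not contain a bare '&' or '%', nor the quote character that delimits it. It may contain general entity references (&Name;) and character references; parameter-entity references (%Name;) are allowed only in the external DTD subset or in external parameter entities, not in declarations in the internal DTD subset.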

func (GEnt) String

func (ge GEnt) String() string

type MarkupStringer

type MarkupStringer interface {
	// Echo the input markup, (but) in a normalized format
	Echo() string
	EchoTo(io.Writer)
	// For development & debugging - probably not valid markup
	String() string
	DumpTo(io.Writer)
}

MarkupStringer is an interface meant to be a souped-up version of [GoStringer](https://golang.org/pkg/fmt/#GoStringer) for markup fragments in any of the formats we process ("XML", "Markdown", "HTML", future TBS). Such fragments can be stringified in two different ways:
- `Echo` (and `EchoTo(w)`) basically recreates the original input, although in a normalized format; the name is more specific than `String`.
- `String` (and `DumpTo(w)`) returns a format suitable for development and debugging, but (probably) not valid markup.
