Documentation ¶
Overview ¶
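Package gtoken defines GToken, a token type that unifies tokenisation across LwDITA's three supported input formats: XDITA XML, HDITA HTML5, and MDITA-XP Markdown.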
Index ¶
- func DumpTo(rGTkns []*GToken, w io.Writer)
- func HasDoctype(GTs []*GToken) (bool, string)
- type GToken
- func DeleteNils(inGTzn []*GToken) (outGTzn []*GToken)
- func DoGTokens_html(pCPR *PU.ParserResults_html) ([]*GToken, error)
- func DoGTokens_mkdn(pCPR *PU.ParserResults_mkdn) ([]*GToken, error)
- func DoGTokens_xml(pCPR *XU.ParserResults_xml) ([]*GToken, error)
- func GetAllByTag(gTkzn []*GToken, s string) []*GToken
- func GetFirstByTag(gTkzn []*GToken, s string) *GToken
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DumpTo ¶
DumpTo writes out the `GToken`s to the `io.Writer`, one per line, and each line is prefixed with the token type. The output should parse the same as the input file, except perhaps for the treatment of all-whitespace CDATA.
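The one-token-per-line format described above can be sketched as follows. This is only an illustration under stated assumptions: `tok`, its `Kind` and `Text` fields, the `Kind: Text` line layout, and the skipping of nil entries are all hypothetical stand-ins, not the package's actual types or output format.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// tok is a hypothetical stand-in for gtoken.GToken; the real struct
// carries many more fields.
type tok struct {
	Kind string // made-up token-type label, e.g. "StartElm"
	Text string // tag name or text content
}

// dumpTo writes one token per line, each line prefixed with the token type.
// Skipping nil entries is an assumption of this sketch.
func dumpTo(toks []*tok, w io.Writer) {
	for _, t := range toks {
		if t == nil {
			continue
		}
		fmt.Fprintf(w, "%s: %s\n", t.Kind, t.Text)
	}
}

// dumpString is a convenience wrapper that collects the dump in a string.
func dumpString(toks []*tok) string {
	var sb strings.Builder
	dumpTo(toks, &sb)
	return sb.String()
}

func main() {
	toks := []*tok{{"StartElm", "p"}, nil, {"CharData", "hello"}, {"EndElm", "p"}}
	dumpTo(toks, os.Stdout)
}
```

Taking an `io.Writer` rather than a file name keeps the dump usable with files, buffers, and standard output alike.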
func HasDoctype ¶
Types ¶
type GToken ¶
type GToken struct {
    // ==========================================
    // CToken has all the info about the original
    // source token, when considered in isolation.
    // ==========================================
    // Fields:
    //  - CT.SourceToken interface{}: "source code" token
    //  - SU.MarkupType: one of SU.MU_type_(XML/HTML/MKDN/BIN)
    //  - CT.FilePosition: char position, and line nr & column nr
    //  - CT.TDType: type of [xml.Token] or subtype of [xml.Directive]
    //  - CT.CName: alias of [xml.Name], only for elements
    //  - CT.CAtts: alias of slice of [xml.Attr], only for start-elm
    //  - Text string: CDATA / PI Instr / DOCTYPE root elm decl
    //  - ControlStrings []string: XML PI Target / XML Drctv subtype
    CT.CToken
    // Depth is the level of nesting of the source tag.
    Depth int
    // IsBlock and IsInline are
    // dupes of TagalogEntry ?
    IsBlock, IsInline bool
    NodeLevel int
    // Key stuff
    *lwdx.TagalogEntry
    // DitaTag and HtmlTag are
    // dupes of TagalogEntry ?
    NodeKind, DitaTag, HtmlTag, NodeText string
}
GToken is meant to simplify & unify tokenisation across LwDITA's three supported input formats: XDITA XML, HDITA HTML5, and MDITA-XP Markdown. It also serves to represent all the various kinds of XML Directives, including DTDs(!).
To do this, the tokens produced by each parsing API are reduced to their essentials:
- tag/token type (defined by the enumeration [GTagTokType], named TT_type_*, values are strings)
- tag name (iff a markup element; is stored in a [GName], incl. NS)
- token text (non-tag text content)
- tag attributes
- whatever additional stuff is available for Markdown tokens (to include Pandoc-style attributes)
NOTE that XML Directives are later "normalized", but that's another story.
func DeleteNils ¶
func DoGTokens_html ¶
func DoGTokens_html(pCPR *PU.ParserResults_html) ([]*GToken, error)
DoGTokens_html turns every html.Node (from golang.org/x/net/html) into a GToken. It's pretty simple because no tree building is done yet. Basically it just copies in the Node type and the Node's data, and sets the [TTType] field. For reference, html.Node is declared as:
type Node struct {
    Parent, FirstChild, LastChild, PrevSibling, NextSibling *Node
    Type      NodeType
    DataAtom  atom.Atom
    Data      string
    Namespace string
    Attr      []Attribute
}
Data is unescaped, so that it looks like "a<b" rather than "a&lt;b". For element nodes, DataAtom is the atom for Data, or zero if Data is not a known tag name.
type Attribute struct { Namespace, Key, Val string }
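To make the remark about unescaped Data concrete, here is a minimal sketch using the standard library's `html.UnescapeString`. It only illustrates the difference between the escaped and unescaped forms; the helper name `unescaped` is hypothetical, and nothing here claims to be what the HTML parser calls internally.

```go
package main

import (
	"fmt"
	"html"
)

// unescaped resolves character entities, giving the form in which
// Node.Data is described as being stored.
func unescaped(s string) string {
	return html.UnescapeString(s)
}

func main() {
	fmt.Println(unescaped("a&lt;b")) // prints a<b
}
```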
func DoGTokens_mkdn ¶
func DoGTokens_mkdn(pCPR *PU.ParserResults_mkdn) ([]*GToken, error)
DoGTokens_mkdn turns every Goldmark ast.Node Markdown token into a GToken. It's pretty simple, because no tree building is done yet. However, it does merge text tokens into their preceding tokens, which leaves some nils in the list of tokens.
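The merge-then-clean-up pattern described above can be sketched like this. Everything in the sketch is assumed for illustration: `tok` and its fields are hypothetical stand-ins for GToken, and `mergeText` and `deleteNils` are made-up helpers that only mirror what the descriptions of DoGTokens_mkdn and DeleteNils suggest.

```go
package main

import "fmt"

// tok is a hypothetical stand-in for gtoken.GToken.
type tok struct {
	Kind string // made-up labels: "Elm" or "Text"
	Tag  string // tag name, for "Elm" tokens
	Text string // accumulated text content
}

// mergeText folds each "Text" token into the nearest preceding surviving
// token and leaves nil in the slot the text token occupied.
func mergeText(toks []*tok) {
	last := -1 // index of the most recent surviving token
	for i, t := range toks {
		if t == nil {
			continue
		}
		if t.Kind == "Text" && last >= 0 {
			toks[last].Text += t.Text
			toks[i] = nil
			continue
		}
		last = i
	}
}

// deleteNils returns a new slice holding only the non-nil tokens.
func deleteNils(in []*tok) (out []*tok) {
	for _, t := range in {
		if t != nil {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	toks := []*tok{
		{Kind: "Elm", Tag: "p"},
		{Kind: "Text", Text: "hello "},
		{Kind: "Text", Text: "world"},
		{Kind: "Elm", Tag: "li"},
	}
	mergeText(toks)
	for _, t := range deleteNils(toks) {
		fmt.Printf("%s %q\n", t.Tag, t.Text)
	}
}
```

Leaving nils in place during the merge keeps indices stable while iterating; compacting the slice afterwards is a separate, cheap pass.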
func DoGTokens_xml ¶
func DoGTokens_xml(pCPR *XU.ParserResults_xml) ([]*GToken, error)
DoGTokens_xml turns every xml.Token (from stdlib) into a GToken. It's pretty simple because no tree building is done yet. Basically it just copies in the token type and the token's data, and sets the [TDType] field.
xml.Token is an "any" interface holding one of the token types: StartElement, EndElement, CharData, Comment, ProcInst, Directive. Note that gtoken.TDType is a superset of these types.
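As a rough illustration of reducing each xml.Token variant to a flat token, the sketch below type-switches over the tokens returned by the standard library's `xml.Decoder`. The `simpleTok` type, its fields, and the `tokenize` function are hypothetical and far smaller than the real GToken; the string labels stand in for whatever TDType values the package actually uses.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// simpleTok is a hypothetical, much-reduced stand-in for gtoken.GToken.
type simpleTok struct {
	Kind string // name of the xml.Token variant
	Name string // local tag name or PI target
	Text string // character data, comment, PI or directive text
}

// tokenize reads src and records one simpleTok per xml.Token.
func tokenize(src string) ([]*simpleTok, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var out []*simpleTok
	for {
		t, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		// string(v) copies the bytes, which the decoder may otherwise reuse.
		switch v := t.(type) {
		case xml.StartElement:
			out = append(out, &simpleTok{Kind: "StartElement", Name: v.Name.Local})
		case xml.EndElement:
			out = append(out, &simpleTok{Kind: "EndElement", Name: v.Name.Local})
		case xml.CharData:
			out = append(out, &simpleTok{Kind: "CharData", Text: string(v)})
		case xml.Comment:
			out = append(out, &simpleTok{Kind: "Comment", Text: string(v)})
		case xml.ProcInst:
			out = append(out, &simpleTok{Kind: "ProcInst", Name: v.Target, Text: string(v.Inst)})
		case xml.Directive:
			out = append(out, &simpleTok{Kind: "Directive", Text: string(v)})
		}
	}
}

func main() {
	toks, err := tokenize(`<!DOCTYPE topic><topic id="t1">Hi<!--c--></topic>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range toks {
		fmt.Printf("%-12s name=%q text=%q\n", t.Kind, t.Name, t.Text)
	}
}
```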
func GetAllByTag ¶
GetAllByTag returns a new GTokenization (a []*GToken). It checks the basic tag only, not any namespace.
func GetFirstByTag ¶
GetFirstByTag checks the basic tag only, not any namespace.
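Matching on the basic tag while ignoring the namespace might look like the sketch below. It assumes, purely for illustration, that a token stores its name as an `xml.Name` and that nil entries are skipped; `tok`, `newTok`, `getAllByTag`, and `getFirstByTag` are hypothetical names, and returning nil when nothing matches is a choice of this sketch rather than documented behaviour.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// tok is a hypothetical stand-in for gtoken.GToken, keeping only the name.
type tok struct {
	Name xml.Name
}

// newTok builds a token from a namespace and a local tag name.
func newTok(space, local string) *tok {
	return &tok{Name: xml.Name{Space: space, Local: local}}
}

// getAllByTag returns a new slice of the tokens whose local name is tag.
// Name.Space is deliberately not consulted.
func getAllByTag(toks []*tok, tag string) []*tok {
	var out []*tok
	for _, t := range toks {
		if t != nil && t.Name.Local == tag {
			out = append(out, t)
		}
	}
	return out
}

// getFirstByTag returns the first token whose local name is tag, or nil.
func getFirstByTag(toks []*tok, tag string) *tok {
	for _, t := range toks {
		if t != nil && t.Name.Local == tag {
			return t
		}
	}
	return nil
}

func main() {
	toks := []*tok{newTok("", "p"), nil, newTok("svg", "p"), newTok("", "li")}
	fmt.Println(len(getAllByTag(toks, "p")))
	fmt.Println(getFirstByTag(toks, "li").Name.Local)
}
```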
func (*GToken) SourceTokenType ¶
SourceTokenType returns `XML`, `MKDN`, `HTML`, or future stuff TBD.