Documentation ¶
Index ¶
Constants ¶
const (
	TD_type_ERROR TDType = "ERR" // ERROR
	TD_type_DOCMT        = "Docmt"
	TD_type_ELMNT        = "Elmnt"
	TD_type_ENDLM        = "endlm"
	TD_type_VOIDD        = "Voidd" // A void tag is one that needs/takes no closing tag
	TD_type_CDATA        = "CData"
	TD_type_PINST        = "PInst"
	TD_type_COMNT        = "Comnt"
	TD_type_DRCTV        = "Drctv"
	// The following are actually DIRECTIVE SUBTYPES, but they
	// are put in this list so that they can be assigned freely.
	TD_type_Doctype  = "Doctype"
	TD_type_Element  = "Element"
	TD_type_Attlist  = "Attlist"
	TD_type_Entity   = "Entitty"
	TD_type_Notation = "Notat:n"
	// The following are TBD/experimental.
	TD_type_ID    = "ID"
	TD_type_IDREF = "IDREF"
	TD_type_Enum  = "ENUM"
)
Variables ¶
var NS_XML = "http://www.w3.org/XML/1998/namespace"
NS_XML is the XML namespace.
Functions ¶
Types ¶
type CAtt ¶
Alias the standard library's XML type (for simplicity and convenience) to
- attach methods to it (e.g. interface [stringutils.Stringser]), and
- use it for other markups too (like Markdown)
type xml.Attr struct { Name xml.Name; Value string }
type CName ¶
Alias the standard library's XML type (for simplicity and convenience) to
- attach methods to it (e.g. interface [stringutils.Stringser]), and
- use it for other markups too (like Markdown)
type CToken ¶
type CToken struct {
	// ==================================
	//  The original ("source code") token,
	//  and other information about it
	// ==================================

	// SourceToken is the original token.
	// Keep it around "just in case".
	// TODO: Make this an Echoer !
	// Types:
	//  - XML: [xml.Token] from [xml.Decoder]
	//  - HTML: TBS
	//  - Markdown: TBS
	// Note that an XML Token is transitory,
	// so every Token has to be cloned, by
	// calling [xml.CopyToken].
	SourceToken interface{}
	// Raw_type of the original token; the value is
	// one of MU_type_(XML/HTML/MKDN/BIN/SQL/DIRLIKE).
	// It is particularly helpful to have this info at
	// the token level when we consider that, for example,
	// we can embed HTML tags in Markdown. Note that in
	// the future, each value could actually be the
	// appropriate namespace declaration.
	SU.Raw_type
	// FilePosition is char position, and line nr & column nr.
	FilePosition
	// TDType comprises (a) the types of [xml.Token]
	// (they are all different structs, actually),
	// plus (b) the (sub)types of [xml.Directive].
	// Note that [TD_type_ENDLM] ("EndElement") is
	// superfluous when token depth info is available.
	TDType
	// CName is ONLY for elements
	// (i.e. [TD_type_ELMNT] and [TD_type_ENDLM]).
	CName
	// CAtts is ONLY for [TD_type_ELMNT].
	CAtts
	// Text holds CDATA, and a PI's Instruction,
	// and a DOCTYPE's root element declaration,
	// and
	Text string
	// ControlStrings is typically XML PI & Directive stuff.
	// When it is used, its length is 1 or 2.
	//  - XML PI: the Target field
	//  - XML directive: the directive subtype
	// But this field is also available for other data that
	// is not classifiable as source text.
	ControlStrings []string
}
CToken is the lowest common denominator of tokens parsed from XML mixed content and other content-oriented markup. It has [stringutils.MarkupType].
CToken:
- Common Token
- Content Token
- Combined Token
- Canonical Token
- Consolidated Token
- ConMuchoGusto Token :-P
A CToken contains all that can be parsed from a token that is considered in isolation, as-is, without the context of surrounding markup. It should record/reflect/reproduce any XML (or HTML) token faithfully, and also accommodate any token from Markdown or (in the future) related markup such as DocBook, AsciiDoc, or RST (reStructuredText).
The use of an XML-like data structure as the lingua franca is also meant to make XML-style automated processing simpler.
The use of a single unified token representation is intended most of all to simplify & unify tokenisation across LwDITA's three supported input formats: XDITA XML, HDITA HTML5, and MDITA-XP Markdown. It also serves to represent all the various kinds of XML directives, including DTDs(!).
Creating a new CToken from an encoding/xml.Token is by design very straightforward, but creating one from other types of token, such as HTML or Markdown, must be done in their own packages in order to prevent circular dependencies.
For convenience & simplicity, some items in the struct are simply aliases for Go's XML structs, but these must then also be adaptable for Markdown, for example when Pandoc-style attributes are used.
CToken implements interface [stringutils.Stringser].
func GetAllCTokensByTag ¶
GetAllCTokensByTag checks the basic tag only, not any namespace.
func GetFirstCTokenByTag ¶
GetFirstCTokenByTag checks only the local name of a start-element's tag, not any namespace. If there is no match, it returns nil. This func returns only a naked CToken, taken from a slice and probably without context, so it is meant only for processing XML catalog files. General XML processing should use the GToken version, which returns a CToken in the context of a tree structure.
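The lookup described above can be sketched as a linear scan over a token slice. The signatures of the real functions are not shown on this page, so the `tok` type, its `IsStart` field, and the helper names below are hypothetical stand-ins; restricting the "all" variant to start-elements is likewise an assumption.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// tok is a hypothetical, cut-down stand-in for CToken.
type tok struct {
	IsStart bool     // true for a start-element token
	Name    xml.Name // element name (namespace + local part)
}

// firstByTag returns the first start-element whose local name equals tag.
// The namespace (Name.Space) is deliberately ignored. It returns nil if
// nothing matches.
func firstByTag(toks []*tok, tag string) *tok {
	for _, t := range toks {
		if t != nil && t.IsStart && t.Name.Local == tag {
			return t
		}
	}
	return nil
}

// allByTag returns every start-element whose local name equals tag.
func allByTag(toks []*tok, tag string) []*tok {
	var out []*tok
	for _, t := range toks {
		if t != nil && t.IsStart && t.Name.Local == tag {
			out = append(out, t)
		}
	}
	return out
}

// sampleTokens builds a small slice resembling an XML catalog file.
func sampleTokens() []*tok {
	return []*tok{
		{IsStart: true, Name: xml.Name{Local: "catalog"}},
		{IsStart: true, Name: xml.Name{Space: "urn:example", Local: "public"}},
		{IsStart: false, Name: xml.Name{Local: "public"}},
		{IsStart: true, Name: xml.Name{Local: "public"}},
		{IsStart: false, Name: xml.Name{Local: "catalog"}},
	}
}

func main() {
	toks := sampleTokens()
	fmt.Println(firstByTag(toks, "public") != nil)
	fmt.Println(len(allByTag(toks, "public")))
	fmt.Println(firstByTag(toks, "system") == nil)
}
```

Because the match ignores the namespace, two elements with the same local name in different namespaces are indistinguishable here, which is acceptable for catalog files but not for general XML processing.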
func NewCTokenFromXmlToken ¶
NewCTokenFromXmlToken returns a single token type that replaces the unwieldy multi-typed mess of the standard library.
It returns a nil pointer for an ignorable, skippable token, such as an all-whitespace one.
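As a rough illustration of the idea, the sketch below collapses the several concrete types behind `xml.Token` into one struct, clones each token with `xml.CopyToken`, and returns nil for all-whitespace character data. The `simpleToken` type and the functions `fromXMLToken` and `tokenize` are hypothetical and much smaller than the real CToken; the kind strings are borrowed from the constants listed on this page.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// simpleToken is a hypothetical, cut-down stand-in for CToken.
type simpleToken struct {
	Kind  string     // e.g. "Elmnt", "endlm", "CData"
	Name  xml.Name   // set for start- and end-elements only
	Attrs []xml.Attr // set for start-elements only
	Text  string     // char data, comment, PI instruction, or directive body
	Pos   int64      // decoder byte offset after reading the token
}

// fromXMLToken converts one xml.Token into a simpleToken.
// It returns nil for a skippable token (all-whitespace character data).
func fromXMLToken(t xml.Token, pos int64) *simpleToken {
	switch v := t.(type) {
	case xml.StartElement:
		return &simpleToken{Kind: "Elmnt", Name: v.Name, Attrs: v.Attr, Pos: pos}
	case xml.EndElement:
		return &simpleToken{Kind: "endlm", Name: v.Name, Pos: pos}
	case xml.CharData:
		s := string(v)
		if strings.TrimSpace(s) == "" {
			return nil
		}
		return &simpleToken{Kind: "CData", Text: s, Pos: pos}
	case xml.Comment:
		return &simpleToken{Kind: "Comnt", Text: string(v), Pos: pos}
	case xml.ProcInst:
		return &simpleToken{Kind: "PInst", Text: string(v.Inst), Pos: pos}
	case xml.Directive:
		return &simpleToken{Kind: "Drctv", Text: string(v), Pos: pos}
	}
	return nil
}

// tokenize reads src to the end and returns the non-skippable tokens.
func tokenize(src string) ([]*simpleToken, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var out []*simpleToken
	for {
		t, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		// The decoder may reuse the bytes behind a token, so clone it first.
		if st := fromXMLToken(xml.CopyToken(t), dec.InputOffset()); st != nil {
			out = append(out, st)
		}
	}
}

func main() {
	toks, err := tokenize(`<a href="x"> <b>hi</b> </a>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range toks {
		fmt.Printf("%-5s name=%q text=%q pos=%d\n", t.Kind, t.Name.Local, t.Text, t.Pos)
	}
}
```

Callers of a constructor like this need a nil check before using the result, since whitespace between elements is common in hand-formatted XML.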
func (CToken) GetCAttVal ¶
GetCAttVal returns the attribute's string value, or "" if not found.
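A minimal sketch of such a lookup over a slice of `xml.Attr` follows. The helper name `attVal` is hypothetical, and matching on the local name only (ignoring the namespace) is an assumption, since the method's signature is not shown here.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// attVal returns the value of the first attribute whose local name
// equals local, or "" if there is no such attribute.
func attVal(atts []xml.Attr, local string) string {
	for _, a := range atts {
		if a.Name.Local == local {
			return a.Value
		}
	}
	return ""
}

// demoAtts is sample data for the usage example below.
var demoAtts = []xml.Attr{
	{Name: xml.Name{Local: "id"}, Value: "t1"},
	{Name: xml.Name{Local: "class"}, Value: ""},
}

func main() {
	fmt.Printf("%q\n", attVal(demoAtts, "id"))
	fmt.Printf("%q\n", attVal(demoAtts, "class"))
	fmt.Printf("%q\n", attVal(demoAtts, "missing"))
}
```

With a plain string return, an attribute that is present but empty cannot be told apart from one that is absent; a caller that needs the distinction would have to inspect the attribute list directly.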
func (CToken) IsNonElement ¶
type FilePosition ¶
type FilePosition struct {
	// Pos is the byte position in file,
	// e.g. from xml.Decoder.InputOffset()
	Pos int
	// Lnr & Col are line nr & column nr
	Lnr, Col int
}
FilePosition is a character position (e.g. from xml.Decoder) plus line number & column number (when they can be calculated).
FilePosition implements interface [stringutils.Stringser].
func NewFilePosition ¶
func NewFilePosition(i int) *FilePosition
NewFilePosition takes & uses only the character position in the file.
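Since the constructor takes only the position, the line and column have to be filled in separately when the source text is at hand. One way to derive them is sketched below; the `pos` type and `lineCol` helper are hypothetical, and the 1-based numbering with columns counted in bytes is an assumption rather than the package's documented convention.

```go
package main

import (
	"fmt"
	"strings"
)

// pos mirrors the three fields of FilePosition for illustration only.
type pos struct {
	Pos, Lnr, Col int
}

// lineCol computes the 1-based line and column (counted in bytes) of
// byte offset off within src. Out-of-range offsets are clamped.
func lineCol(src string, off int) pos {
	if off < 0 {
		off = 0
	}
	if off > len(src) {
		off = len(src)
	}
	before := src[:off]
	lnr := strings.Count(before, "\n") + 1
	// LastIndex returns -1 when there is no newline, giving col = off+1.
	col := off - strings.LastIndex(before, "\n")
	return pos{Pos: off, Lnr: lnr, Col: col}
}

func main() {
	src := "ab\ncd"
	fmt.Printf("%+v\n", lineCol(src, 0))
	fmt.Printf("%+v\n", lineCol(src, 4))
}
```

This rescans the text from the start on every call, which is fine for occasional error reporting; for many lookups a precomputed table of line-start offsets would be cheaper.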
func (FilePosition) Debug ¶
func (fp FilePosition) Debug() string
func (FilePosition) Echo ¶
func (fp FilePosition) Echo() string
func (FilePosition) Info ¶
func (fp FilePosition) Info() string
type FileRange ¶
type FileRange struct {
	Beg FilePosition
	End FilePosition
}
type Span ¶
Span specifies the range of a subset of a string (that is not included in the struct).
Span implements interface [stringutils.Stringser].
FIXME: Make this a ptr to a ContentityNode
func (Span) GetSpanOfString ¶
type TDType ¶
type TDType string
TDType specifies the type of a markup tag (assumed to be XML) or an XML directive. Values are based on the tokens output by the stdlib xml.Decoder, with some additions to accommodate DIRECTIVE subtypes, IDs, and ENUM.
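To illustrate how the directive subtypes could be derived, the sketch below looks at the first keyword of a directive body (the text between `<!` and `>` that `xml.Decoder` delivers as an `xml.Directive`). The function `directiveSubtype` is hypothetical, and the package may classify directives differently. The type and constants are redeclared locally, with values copied from the constants listed on this page, so that the example is self-contained.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// TDType and the constants below are local copies for this sketch.
type TDType string

const (
	TD_type_DRCTV    = "Drctv"
	TD_type_Doctype  = "Doctype"
	TD_type_Element  = "Element"
	TD_type_Attlist  = "Attlist"
	TD_type_Entity   = "Entitty"
	TD_type_Notation = "Notat:n"
)

// directiveSubtype maps the leading keyword of a directive body to a
// subtype. Unknown or empty directives fall back to the generic type.
func directiveSubtype(body string) TDType {
	fields := strings.Fields(body)
	if len(fields) == 0 {
		return TD_type_DRCTV
	}
	switch fields[0] {
	case "DOCTYPE":
		return TD_type_Doctype
	case "ELEMENT":
		return TD_type_Element
	case "ATTLIST":
		return TD_type_Attlist
	case "ENTITY":
		return TD_type_Entity
	case "NOTATION":
		return TD_type_Notation
	}
	return TD_type_DRCTV
}

func main() {
	src := `<!DOCTYPE topic SYSTEM "topic.dtd"><topic/>`
	dec := xml.NewDecoder(strings.NewReader(src))
	for {
		t, err := dec.Token()
		if err != nil {
			break // io.EOF or a syntax error ends the loop
		}
		if d, ok := t.(xml.Directive); ok {
			fmt.Println(directiveSubtype(string(d)))
		}
	}
}
```

Because the constants after the first are untyped strings, they convert implicitly to TDType on return, which matches the note in the constants list that they "can be assigned freely".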
type TypedRaw ¶
TypedRaw includes [stringutils.Raw_type] and can have it set to [Raw_type_DIRLIKE].
func (*TypedRaw) IsDirlike ¶
IsDirlike is IsDir()-like but more general. Dirlike is shorthand for "cannot (is not allowed to!) have own content", but it can be defined as "is/has link(s) to other stuff", i.e. a directory or a symbolic link. In this context (i.e. when embedded in TypedRaw), it means SU.MU_type_DIRLIKE