ctoken

package module
v0.0.0-...-d4f3b42
Warning

This package is not in the latest version of its module.
Published: Sep 18, 2024 License: MIT Imports: 5 Imported by: 8

README

ctoken

TBS

Documentation

Constants

const (
	TD_type_ERROR TDType = "ERR" // ERROR

	TD_type_DOCMT = "Docmt"
	TD_type_ELMNT = "Elmnt"
	TD_type_ENDLM = "endlm"
	TD_type_VOIDD = "Voidd" // A void tag is one that needs/takes no closing tag
	TD_type_CDATA = "CData"
	TD_type_PINST = "PInst"
	TD_type_COMNT = "Comnt"
	TD_type_DRCTV = "Drctv"
	// The following are actually DIRECTIVE SUBTYPES, but they
	// are put in this list so that they can be assigned freely.
	TD_type_Doctype  = "Doctype"
	TD_type_Element  = "Element"
	TD_type_Attlist  = "Attlist"
	TD_type_Entity   = "Entitty"
	TD_type_Notation = "Notat:n"
	// The following are TBD/experimental.
	TD_type_ID    = "ID"
	TD_type_IDREF = "IDREF"
	TD_type_Enum  = "ENUM"
)

Variables

var NS_XML = "http://www.w3.org/XML/1998/namespace"

NS_XML is the XML namespace.

Functions

func GetAttVal

func GetAttVal(se xml.StartElement, att string) string

GetAttVal returns the attribute's string value, or "" if not found.
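The lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: `getAttVal` and `firstStartElement` are hypothetical names, and matching on the attribute's local name only (ignoring any namespace) is an assumption.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// getAttVal is a hypothetical stand-in for GetAttVal: it returns the value
// of the first attribute whose local name equals att, or "" if none matches.
// Ignoring the attribute's namespace is an assumption.
func getAttVal(se xml.StartElement, att string) string {
	for _, a := range se.Attr {
		if a.Name.Local == att {
			return a.Value
		}
	}
	return ""
}

// firstStartElement decodes src until it reaches the first start element.
func firstStartElement(src string) (xml.StartElement, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	for {
		tok, err := dec.Token()
		if err != nil {
			return xml.StartElement{}, err
		}
		if se, ok := tok.(xml.StartElement); ok {
			return se.Copy(), nil // copy, because decoder tokens are transitory
		}
	}
}

func main() {
	se, err := firstStartElement(`<topic id="intro" class="concept"/>`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("id=%q missing=%q\n", getAttVal(se, "id"), getAttVal(se, "nope"))
}
```

Returning "" for a missing attribute means a caller cannot tell "absent" from "present but empty"; code that needs that distinction would have to scan `se.Attr` itself.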

Types

type CAtt

type CAtt xml.Attr

Alias the standard library's XML type (for simplicity and convenience) to

  • attach methods to it (e.g. interface [stringutils.Stringser]), and
  • use it for other markups too (like Markdown)

type xml.Attr struct { Name xml.Name; Value string }

func (CAtt) Debug

func (A CAtt) Debug() string

func (CAtt) Echo

func (A CAtt) Echo() string

func (CAtt) Info

func (A CAtt) Info() string

type CAtts

type CAtts []CAtt

func (CAtts) AsStdLibXml

func (x1 CAtts) AsStdLibXml() []xml.Attr

func (CAtts) Echo

func (AL CAtts) Echo() string

type CName

type CName xml.Name

Alias the standard library's XML type (for simplicity and convenience) to

  • attach methods to it (e.g. interface [stringutils.Stringser]), and
  • use it for other markups too (like Markdown)

func NewCName

func NewCName(ns, local string) *CName

NewCName enforces a colon after a non-empty namespace if it is not there already.
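The colon rule can be sketched as follows. `name` and `newName` are hypothetical stand-ins, and storing the colon inside the `Space` field is an assumption about how the rule is applied.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// name mirrors the documented declaration `type CName xml.Name`.
type name xml.Name

// newName is a hypothetical sketch of the rule described for NewCName:
// a non-empty namespace gets a trailing colon unless it already has one.
func newName(ns, local string) *name {
	if ns != "" && !strings.HasSuffix(ns, ":") {
		ns += ":"
	}
	return &name{Space: ns, Local: local}
}

func main() {
	for _, ns := range []string{"", "xml", "xml:"} {
		n := newName(ns, "lang")
		fmt.Printf("%q -> %q\n", ns, n.Space+n.Local)
	}
}
```

With the colon normalized at construction time, a qualified name can be produced by simple concatenation of `Space` and `Local`, with no special case for the empty namespace.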

func (CName) Debug

func (N CName) Debug() string

func (CName) Echo

func (N CName) Echo() string

func (*CName) Equals

func (p1 *CName) Equals(p2 *CName) bool

CName is an xml.Name.

type xml.Name struct { Space, Local string }

func (*CName) FixNS

func (p *CName) FixNS()

func (CName) Info

func (N CName) Info() string

type CToken

type CToken struct {
	// ==================================
	// The original ("source code") token,
	// and other information about it
	// ==================================
	// SourceToken is the original token.
	// Keep it around "just in case".
	// TODO: Make this an Echoer !
	// Types:
	//  - XML: [xml.Token] from [xml.Decoder]
	//  - HTML: TBS
	//  - Markdown: TBS
	// Note that an XML Token is transitory,
	// so every Token has to be cloned, by
	// calling [xml.CopyToken].
	SourceToken interface{}
	// Raw_type of the original token; the value is
	// one of MU_type_(XML/HTML/MKDN/BIN/SQL/DIRLIKE).
	// It is particularly helpful to have this info at
	// the token level when we consider that for example,
	// we can embed HTML tags in Markdown. Note that in
	// the future, each value could actually be the
	// appropriate namespace declaration.
	SU.Raw_type
	// FilePosition is char position, and line nr & column nr.
	FilePosition

	// TDType comprises (a) the types of [xml.Token]
	// (they are all different struct's, actually),
	// plus (b) the (sub)types of [xml.Directive].
	// Note that [TD_type_ENDLM] ("EndElement") is
	// superfluous when token depth info is available.
	TDType
	// CName is ONLY for elements
	// (i.e. [TD_type_ELMNT] and [TD_type_ENDLM]).
	CName
	// CAtts is ONLY for [TD_type_ELMNT].
	CAtts
	// Text holds CDATA, and a PI's Instruction,
	// and a DOCTYPE's root element declaration,
	// and
	Text string
	// ControlStrings is typically XML PI & Directive stuff.
	// When it is used, its length is 1 or 2.
	//  - XML PI: the Target field
	//  - XML directive: the directive subtype
	// But this field is also available for other data that
	// is not classifiable as source text.
	ControlStrings []string
}

CToken is the lowest common denominator of tokens parsed from XML mixed content and other content-oriented markup. It has [stringutils.MarkupType].

CToken:

  • Common Token
  • Content Token
  • Combined Token
  • Canonical Token
  • Consolidated Token
  • ConMuchoGusto Token :-P

A CToken contains all that can be parsed from a token that is considered in isolation, as-is, without the context of surrounding markup. It should record/reflect/reproduce any XML (or HTML) token faithfully, and also accommodate any token from Markdown or (in the future) related markup such as DocBook or AsciiDoc or RST (reStructuredText).

The use of an XML-like data structure as the lingua franca is also meant to make XML-style automated processing simpler.

The use of a single unified token representation is intended most of all to simplify & unify tokenisation across LwDITA's three supported input formats: XDITA XML, HDITA HTML5, and MDITA-XP Markdown. It also serves to represent all the various kinds of XML directives, including DTDs(!).

Creation of a new CToken from an encoding/xml.Token is by design very straightforward, but creation from other types of token, such as HTML or Markdown, must be done in their other packages in order to prevent circular dependencies.

For convenience & simplicity, some items in the struct are simply aliases for Go's XML structs, but these must also be adaptable for Markdown, for example when Pandoc-style attributes are used.

CToken implements interface [stringutils.Stringser].

func GetAllCTokensByTag

func GetAllCTokensByTag(tkzn []CToken, s string) []CToken

GetAllCTokensByTag checks the basic tag only, not any namespace.

func GetFirstCTokenByTag

func GetFirstCTokenByTag(tkzn []CToken, s string) *CToken

GetFirstCTokenByTag checks a start-element's tag's local name only, not any namespace. If no match, it returns nil. This func returns only a naked CToken, taken from a slice and probably without context, so it is meant only for processing XML catalog files. General XML processing should use the GToken version, which returns a CToken in the context of a tree structure.

func NewCTokenFromXmlToken

func NewCTokenFromXmlToken(XT xml.Token) *CToken

NewCTokenFromXmlToken returns a single token type that replaces the unwieldy multi-typed mess of the standard library.

It returns a nil ptr for an ignorable, skippable token, like all-whitespace.

func (CToken) Debug

func (ct CToken) Debug() string

func (CToken) Echo

func (ct CToken) Echo() string

func (CToken) GetCAttVal

func (ct CToken) GetCAttVal(att string) string

GetCAttVal returns the attribute's string value, or "" if not found.

func (CToken) Info

func (ct CToken) Info() string

func (CToken) IsNonElement

func (ct CToken) IsNonElement() bool

type FilePosition

type FilePosition struct {
	// Pos is the byte position in file,
	// e.g. from xml.Decoder.InputOffset()
	Pos int
	// Lnr & Col are line nr & column nr
	Lnr, Col int
}

FilePosition is a character position (e.g. from xml.Decoder) plus line nr & column nr (when they can be calculated).

FilePosition implements interface [stringutils.Stringser].

func NewFilePosition

func NewFilePosition(i int) *FilePosition

NewFilePosition takes & uses only the character position in the file.

func (FilePosition) Debug

func (fp FilePosition) Debug() string

func (FilePosition) Echo

func (fp FilePosition) Echo() string

func (FilePosition) Info

func (fp FilePosition) Info() string

type FileRange

type FileRange struct {
	Beg FilePosition
	End FilePosition
}

type LAToken

type LAToken struct {
	CToken
	FilePosition
}

LAToken is a location-aware XML token.

type Raw

type Raw string

func (Raw) S

func (s Raw) S() string

type Span

type Span struct {
	TagName string
	Atts    []xml.Attr
	// SliceBounds
	FileRange
}

Span specifies the range of a subset of a string (that is not included in the struct).

Span implements interface [stringutils.Stringser].

FIXME: Make this a ptr to a ContentityNode.
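Because a Span carries only positions, the text it refers to has to be supplied by the caller, which is presumably what GetSpanOfString is for. A minimal sketch of that idea, with hypothetical names (`filePos`, `fileRange`, `spanOf`) and two assumptions: End is treated as exclusive, and an invalid range yields "".

```go
package main

import "fmt"

// filePos and fileRange are reduced stand-ins for FilePosition and FileRange.
type filePos struct{ Pos int }
type fileRange struct{ Beg, End filePos }

// spanOf returns the substring of s covered by r, treating End as
// exclusive. An out-of-bounds or inverted range yields "".
func spanOf(r fileRange, s string) string {
	if r.Beg.Pos < 0 || r.End.Pos > len(s) || r.Beg.Pos > r.End.Pos {
		return ""
	}
	return s[r.Beg.Pos:r.End.Pos]
}

func main() {
	src := "<p><b>hi</b></p>"
	r := fileRange{Beg: filePos{Pos: 3}, End: filePos{Pos: 12}}
	fmt.Println(spanOf(r, src))
}
```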

func (Span) Debug

func (sp Span) Debug() string

func (Span) Echo

func (sp Span) Echo() string

func (Span) GetSpanOfString

func (sp Span) GetSpanOfString(s string) string

func (Span) Info

func (sp Span) Info() string

type TDType

type TDType string

TDType specifies the type of a markup tag (assumed to be XML) or an XML directive. Values are based on the tokens emitted by the stdlib xml.Decoder, with some additions to accommodate DIRECTIVE subtypes, IDs, and ENUM.

func (TDType) LongForm

func (tdt TDType) LongForm() string

func (TDType) S

func (tdt TDType) S() string

type TypedRaw

type TypedRaw struct {
	Raw
	SU.Raw_type
}

TypedRaw includes [stringutils.Raw_type] and can have it set to [Raw_type_DIRLIKE].

func (*TypedRaw) IsDirlike

func (p *TypedRaw) IsDirlike() bool

IsDirlike is IsDir()-like but more general. Dirlike is shorthand for "cannot (is not allowed to!) have own content", but it can be defined as "is/has link(s) to other stuff" - i.e. a directory or a symbolic link. In this context (i.e. when embedded in TypedRaw), it means SU.MU_type_DIRLIKE.

func (*TypedRaw) S

func (p *TypedRaw) S() string
