ctoken

Version v0.0.0-...-d4f3b42 (latest). Published: Sep 18, 2024. License: MIT. Imports: 5. Imported by: 8.

Repository

github.com/fbaube/ctoken

README

ctoken

TBS

Documentation

Index

Constants
Variables
func GetAttVal(se xml.StartElement, att string) string
type CAtt
- func (A CAtt) Debug() string
- func (A CAtt) Echo() string
- func (A CAtt) Info() string
type CAtts
- func (x1 CAtts) AsStdLibXml() []xml.Attr
- func (AL CAtts) Echo() string
type CName
- func NewCName(ns, local string) *CName
- func (N CName) Debug() string
- func (N CName) Echo() string
- func (p1 *CName) Equals(p2 *CName) bool
- func (p *CName) FixNS()
- func (N CName) Info() string
type CToken
- func GetAllCTokensByTag(tkzn []CToken, s string) []CToken
- func GetFirstCTokenByTag(tkzn []CToken, s string) *CToken
- func NewCTokenFromXmlToken(XT xml.Token) *CToken
- func (ct CToken) Debug() string
- func (ct CToken) Echo() string
- func (ct CToken) GetCAttVal(att string) string
- func (ct CToken) Info() string
- func (ct CToken) IsNonElement() bool
type FilePosition
- func NewFilePosition(i int) *FilePosition
- func (fp FilePosition) Debug() string
- func (fp FilePosition) Echo() string
- func (fp FilePosition) Info() string
type FileRange
type LAToken
type Raw
- func (s Raw) S() string
type Span
- func (sp Span) Debug() string
- func (sp Span) Echo() string
- func (sp Span) GetSpanOfString(s string) string
- func (sp Span) Info() string
type TDType
- func (tdt TDType) LongForm() string
- func (tdt TDType) S() string
type TypedRaw
- func (p *TypedRaw) IsDirlike() bool
- func (p *TypedRaw) S() string

Constants

const (
	TD_type_ERROR TDType = "ERR" // ERROR

	TD_type_DOCMT = "Docmt"
	TD_type_ELMNT = "Elmnt"
	TD_type_ENDLM = "endlm"
	TD_type_VOIDD = "Voidd" // A void tag is one that needs/takes no closing tag
	TD_type_CDATA = "CData"
	TD_type_PINST = "PInst"
	TD_type_COMNT = "Comnt"
	TD_type_DRCTV = "Drctv"
	// The following are actually DIRECTIVE SUBTYPES, but they
	// are put in this list so that they can be assigned freely.
	TD_type_Doctype  = "Doctype"
	TD_type_Element  = "Element"
	TD_type_Attlist  = "Attlist"
	TD_type_Entity   = "Entitty"
	TD_type_Notation = "Notat:n"
	// The following are TBD/experimental.
	TD_type_ID    = "ID"
	TD_type_IDREF = "IDREF"
	TD_type_Enum  = "ENUM"
)

Variables

var NS_XML = "http://www.w3.org/XML/1998/namespace"

NS_XML is the XML namespace.
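Here is a hedged illustration of where NS_XML shows up in practice. Go's encoding/xml decoder resolves the reserved xml: prefix to this URL, so an attribute written as xml:lang should arrive with Name.Space equal to NS_XML. In the sketch below, nsXML is a local copy of the variable and xmlLangSpace is a hypothetical helper.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// nsXML is a local copy of the package's NS_XML variable.
var nsXML = "http://www.w3.org/XML/1998/namespace"

// xmlLangSpace (hypothetical) decodes doc and returns the namespace
// of the "lang" attribute on the first start element, or "".
func xmlLangSpace(doc string) string {
	d := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := d.Token()
		if err != nil {
			return ""
		}
		if se, ok := tok.(xml.StartElement); ok {
			for _, a := range se.Attr {
				if a.Name.Local == "lang" {
					return a.Name.Space
				}
			}
			return ""
		}
	}
}

func main() {
	space := xmlLangSpace(`<p xml:lang="en">Hi</p>`)
	fmt.Println(space == nsXML) // the xml: prefix resolves to NS_XML
}
```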

Functions

func GetAttVal

func GetAttVal(se xml.StartElement, att string) string

GetAttVal returns the attribute's string value, or "" if not found.

Types

type CAtt

type CAtt xml.Attr

CAtt is a defined type based on the standard library's XML type (for simplicity and convenience). This lets the package attach methods to it (e.g. to implement interface [stringutils.Stringser]) and use it for other markups too (like Markdown).

For reference, the underlying standard library type is: type xml.Attr struct { Name xml.Name; Value string }

func (CAtt) Debug

func (A CAtt) Debug() string

func (CAtt) Echo

func (A CAtt) Echo() string

func (CAtt) Info

func (A CAtt) Info() string

type CAtts

type CAtts []CAtt

func (CAtts) AsStdLibXml

func (x1 CAtts) AsStdLibXml() []xml.Attr

func (CAtts) Echo

func (AL CAtts) Echo() string

type CName

type CName xml.Name

CName is a defined type based on the standard library's XML type (for simplicity and convenience). This lets the package attach methods to it (e.g. to implement interface [stringutils.Stringser]) and use it for other markups too (like Markdown).

func NewCName

func NewCName(ns, local string) *CName

NewCName enforces a colon after a non-empty namespace if it is not there already.
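Here is a sketch of the colon rule described for NewCName. It assumes the namespace ends up in the Space field of the underlying xml.Name. newCName is hypothetical and may differ from the real constructor in its details.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// CName is a local copy of the package's declaration.
type CName xml.Name

// newCName is a hypothetical sketch of the documented rule: a
// non-empty namespace gets a trailing colon unless it already has one.
func newCName(ns, local string) *CName {
	if ns != "" && !strings.HasSuffix(ns, ":") {
		ns += ":"
	}
	return &CName{Space: ns, Local: local}
}

func main() {
	fmt.Println(newCName("dita", "topic").Space)    // dita:
	fmt.Println(newCName("dita:", "topic").Space)   // dita:
	fmt.Printf("%q\n", newCName("", "topic").Space) // ""
}
```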

func (CName) Debug

func (N CName) Debug() string

func (CName) Echo

func (N CName) Echo() string

func (*CName) Equals

func (p1 *CName) Equals(p2 *CName) bool

CName is an xml.Name, which is defined as: type xml.Name struct { Space, Local string }
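The doc comment does not say what Equals compares. One plausible reading is a comparison of both Space and Local, with nil pointers handled explicitly. The sketch below treats that reading as an assumption. equals is a hypothetical stand-in.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// CName is a local copy of the package's declaration.
type CName xml.Name

// equals is a hypothetical stand-in for (*CName).Equals. ASSUMPTION:
// names are equal when both Space and Local match; two nil pointers
// count as equal, and nil never equals non-nil.
func (p1 *CName) equals(p2 *CName) bool {
	if p1 == nil || p2 == nil {
		return p1 == p2
	}
	return *p1 == *p2 // struct comparison covers Space and Local
}

func main() {
	a := &CName{Space: "dita:", Local: "topic"}
	b := &CName{Space: "dita:", Local: "topic"}
	fmt.Println(a.equals(b)) // true
}
```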

func (*CName) FixNS

func (p *CName) FixNS()

func (CName) Info

func (N CName) Info() string

type CToken

type CToken struct {
	// ==================================
	// The original ("source code") token,
	// and other information about it
	// ==================================
	// SourceToken is the original token.
	// Keep it around "just in case".
	// TODO: Make this an Echoer !
	// Types:
	//  - XML: [xml.Token] from [xml.Decoder]
	//  - HTML: TBS
	//  - Markdown: TBS
	// Note that an XML Token is transitory,
	// so every Token has to be cloned, by
	// calling [xml.CopyToken].
	SourceToken interface{}
	// Raw_type of the original token; the value is
	// one of MU_type_(XML/HTML/MKDN/BIN/SQL/DIRLIKE).
	// It is particularly helpful to have this info at
	// the token level when we consider that for example,
	// we can embed HTML tags in Markdown. Note that in
	// the future, each value could actually be the
	// appropriate namespace declaration.
	SU.Raw_type
	// FilePosition is char position, and line nr & column nr.
	FilePosition

	// TDType comprises (a) the types of [xml.Token]
	// (they are all different struct's, actually),
	// plus (b) the (sub)types of [xml.Directive].
	// Note that [TD_type_ENDLM] ("EndElement") is
	// superfluous when token depth info is available.
	TDType
	// CName is ONLY for elements
	// (i.e. [TD_type_ELMNT] and [TD_type_ENDLM]).
	CName
	// CAtts is ONLY for [TD_type_ELMNT].
	CAtts
	// Text holds CDATA, and a PI's Instruction,
	// and a DOCTYPE's root element declaration,
	// and
	Text string
	// ControlStrings is typically XML PI & Directive stuff.
	// When it is used, its length is 1 or 2.
	//  - XML PI: the Target field
	//  - XML directive: the directive subtype
	// But this field is also available for other data that
	// is not classifiable as source text.
	ControlStrings []string
}

CToken is the lowest common denominator of tokens parsed from XML mixed content and other content-oriented markup. It has [stringutils.MarkupType].

CToken:

Common Token
Content Token
Combined Token
Canonical Token
Consolidated Token
ConMuchoGusto Token :-P

A CToken contains all that can be parsed from a token that is considered in isolation, as-is, without the context of surrounding markup. It should record/reflect/reproduce any XML (or HTML) token faithfully, and also accommodate any token from Markdown or (in the future) related markup such as Docbook or Asciidoc or RST (restructured text).

The use of an XML-like data structure as the lingua franca is also meant to make XML-style automated processing simpler.

The use of a single unified token representation is intended most of all to simplify & unify tokenisation across LwDITA's three supported input formats: XDITA XML, HDITA HTML5, and MDITA-XP Markdown. It also serves to represent all the various kinds of XML directives, including DTDs(!).

Creation of a new CToken from an encoding/xml.Token is by design very straightforward. Creation from other types of token, such as HTML or Markdown, must be done in their own packages, to prevent circular dependencies.

For convenience & simplicity, some items in the struct (such as CName and CAtts) are simply defined types based on Go's XML structs, but these must then also be adaptable for Markdown, for example when Pandoc-style attributes are used.

CToken implements interface [stringutils.Stringser].

func GetAllCTokensByTag

func GetAllCTokensByTag(tkzn []CToken, s string) []CToken

GetAllCTokensByTag checks the basic tag (i.e. the local name) only, not any namespace.

func GetFirstCTokenByTag

func GetFirstCTokenByTag(tkzn []CToken, s string) *CToken

GetFirstCTokenByTag checks a start-element's tag's local name only, not any namespace. If there is no match, it returns nil. This func returns only a bare CToken, taken from a slice and probably without context, so it is meant only for processing XML catalog files. General XML processing should use the GToken version, which returns a CToken in the context of a tree structure.
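The by-tag lookups can be sketched as a linear scan over the token slice. This sketch assumes that both functions match only start elements (TD_type_ELMNT, value "Elmnt") and compare local names only. tok is a hypothetical pared-down stand-in for CToken. Returning &tkzn[i], rather than the address of a loop copy, makes the result point into the caller's slice.

```go
package main

import "fmt"

// tok is a hypothetical, pared-down stand-in for CToken.
type tok struct {
	TDType string // e.g. "Elmnt" (start element) or "endlm" (end element)
	Space  string
	Local  string
}

// getAllByTag returns every start-element token whose local name is s,
// ignoring namespaces.
func getAllByTag(tkzn []tok, s string) []tok {
	var out []tok
	for _, t := range tkzn {
		if t.TDType == "Elmnt" && t.Local == s {
			out = append(out, t)
		}
	}
	return out
}

// getFirstByTag returns a pointer into tkzn for the first matching
// start element, or nil if there is none.
func getFirstByTag(tkzn []tok, s string) *tok {
	for i := range tkzn {
		if tkzn[i].TDType == "Elmnt" && tkzn[i].Local == s {
			return &tkzn[i]
		}
	}
	return nil
}

func main() {
	tkzn := []tok{
		{TDType: "Elmnt", Space: "er:", Local: "uri"},
		{TDType: "endlm", Space: "er:", Local: "uri"},
		{TDType: "Elmnt", Local: "nextCatalog"},
	}
	fmt.Println(len(getAllByTag(tkzn, "uri"))) // 1
	if t := getFirstByTag(tkzn, "group"); t == nil {
		fmt.Println("no match")
	}
}
```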

func NewCTokenFromXmlToken

func NewCTokenFromXmlToken(XT xml.Token) *CToken

NewCTokenFromXmlToken returns a single token type that replaces the unwieldy multi-typed mess of the standard library's tokens.

It returns a nil pointer for an ignorable, skippable token, such as one that is all whitespace.

func (CToken) Debug

func (ct CToken) Debug() string

func (CToken) Echo

func (ct CToken) Echo() string

func (CToken) GetCAttVal

func (ct CToken) GetCAttVal(att string) string

GetCAttVal returns the attribute's string value, or "" if not found.

func (CToken) Info

func (ct CToken) Info() string

func (CToken) IsNonElement

func (ct CToken) IsNonElement() bool

type FilePosition

type FilePosition struct {
	// Pos is the byte position in file,
	// e.g. from xml.Decoder.InputOffset()
	Pos int
	// Lnr & Col are line nr & column nr
	Lnr, Col int
}

FilePosition is a position in the file (e.g. from xml.Decoder) plus line number and column number (when they can be calculated).

FilePosition implements interface [stringutils.Stringser].
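Line and column numbers can be derived from a byte offset by scanning the input, as in the sketch below. filePosition and newFilePositionIn are hypothetical, and columns are counted in bytes rather than runes. On Go 1.19 and later, xml.Decoder.InputPos also reports the current line and column directly.

```go
package main

import (
	"fmt"
	"strings"
)

// filePosition mirrors the documented FilePosition fields.
type filePosition struct {
	Pos      int // byte offset, e.g. from xml.Decoder.InputOffset
	Lnr, Col int // 1-based line and column (column counted in bytes)
}

// newFilePositionIn is a hypothetical helper that fills in Lnr and Col
// by scanning src up to pos.
func newFilePositionIn(src string, pos int) filePosition {
	// Clamp pos so that slicing cannot panic.
	if pos < 0 {
		pos = 0
	}
	if pos > len(src) {
		pos = len(src)
	}
	before := src[:pos]
	lnr := strings.Count(before, "\n") + 1
	// LastIndex returns -1 when there is no newline, so line 1 starts at 0.
	col := pos - (strings.LastIndex(before, "\n") + 1) + 1
	return filePosition{Pos: pos, Lnr: lnr, Col: col}
}

func main() {
	src := "<a>\n  <b/>\n</a>"
	p := newFilePositionIn(src, strings.Index(src, "<b/>"))
	fmt.Println(p.Lnr, p.Col) // 2 3
}
```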

func NewFilePosition

func NewFilePosition(i int) *FilePosition

NewFilePosition takes & uses only the character position in the file.

func (FilePosition) Debug

func (fp FilePosition) Debug() string

func (FilePosition) Echo

func (fp FilePosition) Echo() string

func (FilePosition) Info

func (fp FilePosition) Info() string

type FileRange

type FileRange struct {
	Beg FilePosition
	End FilePosition
}

type LAToken

type LAToken struct {
	CToken
	FilePosition
}

LAToken is a location-aware XML token.

type Raw

type Raw string

func (Raw) S

func (s Raw) S() string

type Span

type Span struct {
	TagName string
	Atts    []xml.Attr
	// SliceBounds
	FileRange
}

Span specifies the range of a substring within a string; the string itself is not stored in the struct.

Span implements interface [stringutils.Stringser].

FIXME: Make this a ptr to a ContentityNode.
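A Span stores only positions, so extracting its text requires passing the original string back in. That is presumably what GetSpanOfString does. This sketch assumes that Beg.Pos and End.Pos are byte offsets and that End is exclusive. span and getSpanOfString are hypothetical pared-down stand-ins.

```go
package main

import "fmt"

// Hypothetical stand-ins for FilePosition, FileRange and Span
// (Span's TagName and Atts fields are omitted).
type filePosition struct{ Pos, Lnr, Col int }
type fileRange struct{ Beg, End filePosition }
type span struct{ fileRange }

// getSpanOfString is a hypothetical sketch of GetSpanOfString.
// ASSUMPTION: Beg.Pos and End.Pos are byte offsets into s, End exclusive;
// out-of-range bounds yield "" instead of panicking.
func (sp span) getSpanOfString(s string) string {
	b, e := sp.Beg.Pos, sp.End.Pos
	if b < 0 || e > len(s) || b > e {
		return ""
	}
	return s[b:e]
}

func main() {
	sp := span{fileRange{Beg: filePosition{Pos: 3}, End: filePosition{Pos: 8}}}
	fmt.Println(sp.getSpanOfString("<p>Hello</p>")) // Hello
}
```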

func (Span) Debug

func (sp Span) Debug() string

func (Span) Echo

func (sp Span) Echo() string

func (Span) GetSpanOfString

func (sp Span) GetSpanOfString(s string) string

func (Span) Info

func (sp Span) Info() string

type TDType

type TDType string

TDType specifies the type of a markup tag (assumed to be XML) or of an XML directive. Values are based on the tokens output by the standard library's xml.Decoder, with some additions to accommodate DIRECTIVE subtypes, IDs, and ENUM.

func (TDType) LongForm

func (tdt TDType) LongForm() string

func (TDType) S

func (tdt TDType) S() string

type TypedRaw

type TypedRaw struct {
	Raw
	SU.Raw_type
}

TypedRaw includes [stringutils.Raw_type] and can have it set to [Raw_type_DIRLIKE].

func (*TypedRaw) IsDirlike

func (p *TypedRaw) IsDirlike() bool

IsDirlike is like IsDir() but more general. "Dirlike" is shorthand for "cannot (is not allowed to!) have its own content", but it can also be defined as "is, or has, link(s) to other stuff", i.e. a directory or a symbolic link. In this context (i.e. when embedded in TypedRaw), it means SU.MU_type_DIRLIKE.

func (*TypedRaw) S

func (p *TypedRaw) S() string
