etree

Version: v1.4.1
Published: Jul 17, 2024 License: BSD-2-Clause Imports: 10 Imported by: 1,659

README

etree

The etree package is a lightweight, pure Go package that represents XML as an element tree. Its design was inspired by the Python ElementTree module.

Some of the package's capabilities and features:

  • Represents XML documents as trees of elements for easy traversal.
  • Imports, serializes, modifies or creates XML documents from scratch.
  • Writes and reads XML to/from files, byte slices, strings and io interfaces.
  • Performs simple or complex searches with lightweight XPath-like query APIs.
  • Auto-indents XML using spaces or tabs for better readability.
  • Implemented in pure Go; depends only on standard Go libraries.
  • Built on top of the Go encoding/xml package.

Creating an XML document

The following example creates an XML document from scratch using the etree package and outputs its indented contents to stdout.

doc := etree.NewDocument()
doc.CreateProcInst("xml", `version="1.0" encoding="UTF-8"`)
doc.CreateProcInst("xml-stylesheet", `type="text/xsl" href="style.xsl"`)

people := doc.CreateElement("People")
people.CreateComment("These are all known people")

jon := people.CreateElement("Person")
jon.CreateAttr("name", "Jon")

sally := people.CreateElement("Person")
sally.CreateAttr("name", "Sally")

doc.Indent(2)
doc.WriteTo(os.Stdout)

Output:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<People>
  <!--These are all known people-->
  <Person name="Jon"/>
  <Person name="Sally"/>
</People>

Reading an XML file

Suppose you have a file on disk called bookstore.xml containing the following data:

<bookstore xmlns:p="urn:schemas-books-com:prices">

  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <p:price>30.00</p:price>
  </book>

  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <p:price>29.99</p:price>
  </book>

  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <p:price>49.99</p:price>
  </book>

  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <p:price>39.95</p:price>
  </book>

</bookstore>

This code reads the file's contents into an etree document.

doc := etree.NewDocument()
if err := doc.ReadFromFile("bookstore.xml"); err != nil {
    panic(err)
}

You can also read XML from a string, a byte slice, or an io.Reader.

Processing elements and attributes

This example illustrates several ways to access elements and attributes using etree selection queries.

root := doc.SelectElement("bookstore")
fmt.Println("ROOT element:", root.Tag)

for _, book := range root.SelectElements("book") {
    fmt.Println("CHILD element:", book.Tag)
    if title := book.SelectElement("title"); title != nil {
        lang := title.SelectAttrValue("lang", "unknown")
        fmt.Printf("  TITLE: %s (%s)\n", title.Text(), lang)
    }
    for _, attr := range book.Attr {
        fmt.Printf("  ATTR: %s=%s\n", attr.Key, attr.Value)
    }
}

Output:

ROOT element: bookstore
CHILD element: book
  TITLE: Everyday Italian (en)
  ATTR: category=COOKING
CHILD element: book
  TITLE: Harry Potter (en)
  ATTR: category=CHILDREN
CHILD element: book
  TITLE: XQuery Kick Start (en)
  ATTR: category=WEB
CHILD element: book
  TITLE: Learning XML (en)
  ATTR: category=WEB

Path queries

This example uses etree's path functions to select all book titles that fall into the category of 'WEB'. The double-slash prefix in the path causes the search for book elements to occur recursively; book elements may appear at any level of the XML hierarchy.

for _, t := range doc.FindElements("//book[@category='WEB']/title") {
    fmt.Println("Title:", t.Text())
}

Output:

Title: XQuery Kick Start
Title: Learning XML

This example finds the first book element under the root bookstore element and outputs the tag and text of each of its child elements.

for _, e := range doc.FindElements("./bookstore/book[1]/*") {
    fmt.Printf("%s: %s\n", e.Tag, e.Text())
}

Output:

title: Everyday Italian
author: Giada De Laurentiis
year: 2005
price: 30.00

This example finds all books with a price of 49.99 and outputs their titles.

path := etree.MustCompilePath("./bookstore/book[p:price='49.99']/title")
for _, e := range doc.FindElementsPath(path) {
    fmt.Println(e.Text())
}

Output:

XQuery Kick Start

Note that this example uses the FindElementsPath function, which takes a precompiled path object as its argument. Use precompiled paths when you plan to search with the same path more than once.

Other features

These are just a few examples of the things the etree package can do. See the documentation for a complete description of its capabilities.

Contributing

This project accepts contributions. Just fork the repo and submit a pull request!

Documentation

Overview

Package etree provides XML services through an Element Tree abstraction.

Constants

const (
	// NoIndent is used with the IndentSettings record to remove all
	// indenting.
	NoIndent = -1
)

Variables

var ErrXML = errors.New("etree: invalid XML format")

ErrXML is returned when XML parsing fails due to incorrect formatting.

Functions

This section is empty.

Types

type Attr

type Attr struct {
	Space, Key string // The attribute's namespace prefix and key
	Value      string // The attribute value string
	// contains filtered or unexported fields
}

An Attr represents a key-value attribute within an XML element.

func (*Attr) Element added in v1.1.0

func (a *Attr) Element() *Element

Element returns a pointer to the element containing this attribute.

func (*Attr) FullKey added in v1.1.0

func (a *Attr) FullKey() string

FullKey returns this attribute's complete key, including namespace prefix if present.

func (*Attr) NamespaceURI added in v1.1.0

func (a *Attr) NamespaceURI() string

NamespaceURI returns the XML namespace URI associated with this attribute. The function returns the empty string if the attribute is unprefixed or if the attribute is part of the XML default namespace.

func (*Attr) WriteTo added in v1.2.0

func (a *Attr) WriteTo(w Writer, s *WriteSettings)

WriteTo serializes the attribute to the writer.

type CharData

type CharData struct {
	Data string // the simple text or CDATA section content
	// contains filtered or unexported fields
}

CharData may be used to represent simple text data or a CDATA section within an XML document. The Data property should never be modified directly; use the SetData function instead.

func NewCData added in v1.1.0

func NewCData(data string) *CharData

NewCData creates an unparented XML character CDATA section with 'data' as its content.

func NewCharData deprecated

func NewCharData(data string) *CharData

NewCharData creates an unparented CharData token containing simple text data.

Deprecated: NewCharData is deprecated. Instead, use NewText, which does the same thing.

func NewText added in v1.1.0

func NewText(text string) *CharData

NewText creates an unparented CharData token containing simple text data.

func (*CharData) Index added in v1.1.0

func (c *CharData) Index() int

Index returns the index of this CharData token within its parent element's list of child tokens. If this CharData token has no parent, then the function returns -1.

func (*CharData) IsCData added in v1.1.0

func (c *CharData) IsCData() bool

IsCData returns true if this CharData token contains a CDATA section. It returns false if the CharData token contains simple text.

func (*CharData) IsWhitespace added in v1.1.0

func (c *CharData) IsWhitespace() bool

IsWhitespace returns true if this CharData token contains only whitespace.

func (*CharData) Parent

func (c *CharData) Parent() *Element

Parent returns this CharData token's parent element, or nil if it has no parent.

func (*CharData) SetData added in v1.1.1

func (c *CharData) SetData(text string)

SetData modifies the content of the CharData token. In the case of a CharData token containing simple text, the simple text is modified. In the case of a CharData token containing a CDATA section, the CDATA section's content is modified.

func (*CharData) WriteTo added in v1.2.0

func (c *CharData) WriteTo(w Writer, s *WriteSettings)

WriteTo serializes character data to the writer.

type Comment

type Comment struct {
	Data string // the comment's text
	// contains filtered or unexported fields
}

A Comment represents an XML comment.

func NewComment

func NewComment(comment string) *Comment

NewComment creates an unparented comment token.

func (*Comment) Index added in v1.1.0

func (c *Comment) Index() int

Index returns the index of this Comment token within its parent element's list of child tokens. If this Comment token has no parent, then the function returns -1.

func (*Comment) Parent

func (c *Comment) Parent() *Element

Parent returns this comment token's parent element, or nil if it has no parent.

func (*Comment) WriteTo added in v1.2.0

func (c *Comment) WriteTo(w Writer, s *WriteSettings)

WriteTo serializes the comment to the writer.

type Directive

type Directive struct {
	Data string // the directive string
	// contains filtered or unexported fields
}

A Directive represents an XML directive.

func NewDirective

func NewDirective(data string) *Directive

NewDirective creates an unparented XML directive token.

func (*Directive) Index added in v1.1.0

func (d *Directive) Index() int

Index returns the index of this Directive token within its parent element's list of child tokens. If this Directive token has no parent, then the function returns -1.

func (*Directive) Parent

func (d *Directive) Parent() *Element

Parent returns this directive token's parent element, or nil if it has no parent.

func (*Directive) WriteTo added in v1.2.0

func (d *Directive) WriteTo(w Writer, s *WriteSettings)

WriteTo serializes the XML directive to the writer.

type Document

type Document struct {
	Element
	ReadSettings  ReadSettings
	WriteSettings WriteSettings
}

A Document is a container holding a complete XML tree.

A document has a single embedded element, which contains zero or more child tokens, one of which is usually the root element. The embedded element may include other children such as processing instruction tokens or character data tokens. The document's embedded element is never directly serialized; only its children are.

A document also contains read and write settings, which influence the way the document is deserialized, serialized, and indented.

Example (Creating)

Create an etree Document, add XML entities to it, and serialize it to stdout.

doc := NewDocument()
doc.CreateProcInst("xml", `version="1.0" encoding="UTF-8"`)
doc.CreateProcInst("xml-stylesheet", `type="text/xsl" href="style.xsl"`)

people := doc.CreateElement("People")
people.CreateComment("These are all known people")

jon := people.CreateElement("Person")
jon.CreateAttr("name", "Jon O'Reilly")

sally := people.CreateElement("Person")
sally.CreateAttr("name", "Sally")

doc.Indent(2)
doc.WriteTo(os.Stdout)

Output:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<People>
  <!--These are all known people-->
  <Person name="Jon O&apos;Reilly"/>
  <Person name="Sally"/>
</People>

Example (Reading)

doc := NewDocument()
if err := doc.ReadFromFile("document.xml"); err != nil {
	panic(err)
}

func NewDocument

func NewDocument() *Document

NewDocument creates an XML document without a root element.

func NewDocumentWithRoot added in v1.1.1

func NewDocumentWithRoot(e *Element) *Document

NewDocumentWithRoot creates an XML document and sets the element 'e' as its root element. If the element 'e' is already part of another document, it is first removed from its existing document.

func (*Document) Copy

func (d *Document) Copy() *Document

Copy returns a recursive, deep copy of the document.

func (*Document) Indent

func (d *Document) Indent(spaces int)

Indent modifies the document's element tree by inserting character data tokens containing newlines and spaces for indentation. The amount of indentation per depth level is given by the 'spaces' parameter. Other than the number of spaces, default IndentSettings are used.

func (*Document) IndentTabs

func (d *Document) IndentTabs()

IndentTabs modifies the document's element tree by inserting CharData tokens containing newlines and tabs for indentation. One tab is used per indentation level. Other than the use of tabs, default IndentSettings are used.

func (*Document) IndentWithSettings added in v1.1.4

func (d *Document) IndentWithSettings(s *IndentSettings)

IndentWithSettings modifies the document's element tree by inserting character data tokens containing newlines and indentation. The behavior of the indentation algorithm is configured by the indent settings.

func (*Document) ReadFrom

func (d *Document) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom reads XML from the reader 'r' into this document. The function returns the number of bytes read and any error encountered.

func (*Document) ReadFromBytes

func (d *Document) ReadFromBytes(b []byte) error

ReadFromBytes reads XML from the byte slice 'b' into this document.

func (*Document) ReadFromFile

func (d *Document) ReadFromFile(filepath string) error

ReadFromFile reads XML from a local file at path 'filepath' into this document.

func (*Document) ReadFromString

func (d *Document) ReadFromString(s string) error

ReadFromString reads XML from the string 's' into this document.

func (*Document) Root

func (d *Document) Root() *Element

Root returns the root element of the document. It returns nil if there is no root element.

func (*Document) SetRoot

func (d *Document) SetRoot(e *Element)

SetRoot replaces the document's root element with the element 'e'. If the document already has a root element when this function is called, then the existing root element is unbound from the document. If the element 'e' is part of another document, then it is unbound from the other document.

func (*Document) Unindent added in v1.1.4

func (d *Document) Unindent()

Unindent modifies the document's element tree by removing character data tokens containing only whitespace. Other than the removal of indentation, default IndentSettings are used.

func (*Document) WriteTo

func (d *Document) WriteTo(w io.Writer) (n int64, err error)

WriteTo serializes the document out to the writer 'w'. The function returns the number of bytes written and any error encountered.

func (*Document) WriteToBytes

func (d *Document) WriteToBytes() (b []byte, err error)

WriteToBytes serializes this document into a slice of bytes.

func (*Document) WriteToFile

func (d *Document) WriteToFile(filepath string) error

WriteToFile serializes the document out to the file at path 'filepath'.

func (*Document) WriteToString

func (d *Document) WriteToString() (s string, err error)

WriteToString serializes this document into a string.

type Element

type Element struct {
	Space, Tag string  // namespace prefix and tag
	Attr       []Attr  // key-value attribute pairs
	Child      []Token // child tokens (elements, comments, etc.)
	// contains filtered or unexported fields
}

An Element represents an XML element, its attributes, and its child tokens.

func NewElement

func NewElement(tag string) *Element

NewElement creates an unparented element with the specified tag (i.e., name). The tag may include a namespace prefix followed by a colon.

func (*Element) AddChild

func (e *Element) AddChild(t Token)

AddChild adds the token 't' as the last child of the element. If token 't' was already the child of another element, it is first removed from its parent element.

func (*Element) ChildElements

func (e *Element) ChildElements() []*Element

ChildElements returns all elements that are children of this element.

func (*Element) Copy

func (e *Element) Copy() *Element

Copy creates a recursive, deep copy of the element and all its attributes and children. The returned element has no parent but can be parented to another element using AddChild, or added to a document with SetRoot or NewDocumentWithRoot.

func (*Element) CreateAttr

func (e *Element) CreateAttr(key, value string) *Attr

CreateAttr creates an attribute with the specified 'key' and 'value' and adds it to this element. If an attribute with the same key already exists on this element, then its value is replaced. The key may include a namespace prefix followed by a colon.

func (*Element) CreateCData added in v1.1.0

func (e *Element) CreateCData(data string) *CharData

CreateCData creates a CharData token containing a CDATA section with 'data' as its content and adds it to the end of this element's list of child tokens.

func (*Element) CreateCharData deprecated

func (e *Element) CreateCharData(data string) *CharData

CreateCharData creates a CharData token containing simple text data and adds it to the end of this element's list of child tokens.

Deprecated: CreateCharData is deprecated. Instead, use CreateText, which does the same thing.

func (*Element) CreateComment

func (e *Element) CreateComment(comment string) *Comment

CreateComment creates a comment token using the specified 'comment' string and adds it as the last child token of this element.

func (*Element) CreateDirective

func (e *Element) CreateDirective(data string) *Directive

CreateDirective creates an XML directive token with the specified 'data' value and adds it as the last child token of this element.

func (*Element) CreateElement

func (e *Element) CreateElement(tag string) *Element

CreateElement creates a new element with the specified tag (i.e., name) and adds it as the last child token of this element. The tag may include a prefix followed by a colon.

func (*Element) CreateProcInst

func (e *Element) CreateProcInst(target, inst string) *ProcInst

CreateProcInst creates an XML processing instruction token with the specified 'target' and instruction 'inst'. It is then added as the last child token of this element.

func (*Element) CreateText added in v1.1.0

func (e *Element) CreateText(text string) *CharData

CreateText creates a CharData token containing simple text data and adds it to the end of this element's list of child tokens.

func (*Element) FindElement

func (e *Element) FindElement(path string) *Element

FindElement returns the first element matched by the XPath-like 'path' string. The function returns nil if no child element is found using the path. It panics if an invalid path string is supplied.

func (*Element) FindElementPath

func (e *Element) FindElementPath(path Path) *Element

FindElementPath returns the first element matched by the 'path' object. The function returns nil if no element is found using the path.

func (*Element) FindElements

func (e *Element) FindElements(path string) []*Element

FindElements returns a slice of elements matched by the XPath-like 'path' string. The function returns nil if no child element is found using the path. It panics if an invalid path string is supplied.

func (*Element) FindElementsPath

func (e *Element) FindElementsPath(path Path) []*Element

FindElementsPath returns a slice of elements matched by the 'path' object.

func (*Element) FullTag added in v1.1.0

func (e *Element) FullTag() string

FullTag returns the element e's complete tag, including namespace prefix if present.

func (*Element) GetPath added in v1.0.1

func (e *Element) GetPath() string

GetPath returns the absolute path of the element. The absolute path is the full path from the document's root.

func (*Element) GetRelativePath added in v1.0.1

func (e *Element) GetRelativePath(source *Element) string

GetRelativePath returns the path of this element relative to the 'source' element. If the two elements are not part of the same element tree, then the function returns the empty string.

func (*Element) IndentWithSettings added in v1.2.0

func (e *Element) IndentWithSettings(s *IndentSettings)

IndentWithSettings modifies the element and its child tree by inserting character data tokens containing newlines and indentation. The behavior of the indentation algorithm is configured by the indent settings. Because this function indents the element as if it were at the root of a document, it is most useful when called just before writing the element as an XML fragment using WriteTo.

func (*Element) Index added in v1.1.0

func (e *Element) Index() int

Index returns the index of this element within its parent element's list of child tokens. If this element has no parent, then the function returns -1.

func (*Element) InsertChild deprecated

func (e *Element) InsertChild(ex Token, t Token)

InsertChild inserts the token 't' into this element's list of children just before the element's existing child token 'ex'. If the existing token 'ex' does not appear in this element's list of child tokens, then 't' is added to the end of this element's list of child tokens. If token 't' is already the child of another element, it is first removed from the other element's list of child tokens.

Deprecated: InsertChild is deprecated. Use InsertChildAt instead.

func (*Element) InsertChildAt added in v1.1.0

func (e *Element) InsertChildAt(index int, t Token)

InsertChildAt inserts the token 't' into this element's list of child tokens just before the requested 'index'. If the index is greater than or equal to the length of the list of child tokens, then the token 't' is added to the end of the list of child tokens.

func (*Element) NamespaceURI added in v1.1.0

func (e *Element) NamespaceURI() string

NamespaceURI returns the XML namespace URI associated with the element. If the element is part of the XML default namespace, NamespaceURI returns the empty string.

func (*Element) NextSibling added in v1.4.0

func (e *Element) NextSibling() *Element

NextSibling returns this element's next sibling element. It returns nil if there is no next sibling element.

func (*Element) NotNil added in v1.4.0

func (e *Element) NotNil() *Element

NotNil returns the receiver element if it isn't nil; otherwise, it returns an unparented element with an empty string tag. This function simplifies the task of writing code to ignore not-found results from element queries. For example, instead of writing this:

if e := doc.SelectElement("enabled"); e != nil {
	e.SetText("true")
}

You could write this:

doc.SelectElement("enabled").NotNil().SetText("true")

func (*Element) Parent

func (e *Element) Parent() *Element

Parent returns this element's parent element. It returns nil if this element has no parent.

func (*Element) PrevSibling added in v1.4.0

func (e *Element) PrevSibling() *Element

PrevSibling returns this element's preceding sibling element. It returns nil if there is no preceding sibling element.

func (*Element) ReindexChildren added in v1.3.0

func (e *Element) ReindexChildren()

ReindexChildren recalculates the index values of the element's child tokens. This is necessary only if you have manually manipulated the element's Child slice.

func (*Element) RemoveAttr

func (e *Element) RemoveAttr(key string) *Attr

RemoveAttr removes the first attribute of this element whose key matches 'key'. It returns a copy of the removed attribute if a match is found. If no match is found, it returns nil. The key may include a namespace prefix followed by a colon.

func (*Element) RemoveChild

func (e *Element) RemoveChild(t Token) Token

RemoveChild attempts to remove the token 't' from this element's list of child tokens. If the token 't' was a child of this element, then it is removed and returned. Otherwise, nil is returned.

func (*Element) RemoveChildAt added in v1.1.0

func (e *Element) RemoveChildAt(index int) Token

RemoveChildAt removes the child token appearing in slot 'index' of this element's list of child tokens. The removed child token is then returned. If the index is out of bounds, no child is removed and nil is returned.

func (*Element) SelectAttr

func (e *Element) SelectAttr(key string) *Attr

SelectAttr finds an element attribute matching the requested 'key' and, if found, returns a pointer to the matching attribute. The function returns nil if no matching attribute is found. The key may include a namespace prefix followed by a colon.

func (*Element) SelectAttrValue

func (e *Element) SelectAttrValue(key, dflt string) string

SelectAttrValue finds an element attribute matching the requested 'key' and returns its value if found. If no matching attribute is found, the function returns the 'dflt' value instead. The key may include a namespace prefix followed by a colon.

func (*Element) SelectElement

func (e *Element) SelectElement(tag string) *Element

SelectElement returns the first child element with the given 'tag' (i.e., name). The function returns nil if no child element matching the tag is found. The tag may include a namespace prefix followed by a colon.

func (*Element) SelectElements

func (e *Element) SelectElements(tag string) []*Element

SelectElements returns a slice of all child elements with the given 'tag' (i.e., name). The tag may include a namespace prefix followed by a colon.

func (*Element) SetCData added in v1.1.0

func (e *Element) SetCData(text string)

SetCData replaces all character data immediately following an element's opening tag with a CDATA section.

func (*Element) SetTail added in v1.1.0

func (e *Element) SetTail(text string)

SetTail replaces all character data immediately following the element's end tag with the requested string.

func (*Element) SetText

func (e *Element) SetText(text string)

SetText replaces all character data immediately following an element's opening tag with the requested string.

func (*Element) SortAttrs added in v1.1.0

func (e *Element) SortAttrs()

SortAttrs sorts this element's attributes lexicographically by key.

func (*Element) Tail added in v1.1.0

func (e *Element) Tail() string

Tail returns all character data immediately following the element's end tag.

func (*Element) Text

func (e *Element) Text() string

Text returns all character data immediately following the element's opening tag.

func (*Element) WriteTo added in v1.2.0

func (e *Element) WriteTo(w Writer, s *WriteSettings)

WriteTo serializes the element to the writer w.

type ErrPath

type ErrPath string

ErrPath is returned by path functions when an invalid etree path is provided.

func (ErrPath) Error

func (err ErrPath) Error() string

Error returns the string describing a path error.

type IndentSettings added in v1.1.4

type IndentSettings struct {
	// Spaces indicates the number of spaces to insert for each level of
	// indentation. Set to etree.NoIndent to remove all indentation. Ignored
	// when UseTabs is true. Default: 4.
	Spaces int

	// UseTabs causes tabs to be used instead of spaces when indenting.
	// Default: false.
	UseTabs bool

	// UseCRLF causes newlines to be written as a carriage return followed by
	// a linefeed ("\r\n"). If false, only a linefeed character is output
	// for a newline ("\n"). Default: false.
	UseCRLF bool

	// PreserveLeafWhitespace causes indent functions to preserve whitespace
	// within XML elements containing only non-CDATA character data. Default:
	// false.
	PreserveLeafWhitespace bool

	// SuppressTrailingWhitespace suppresses the generation of trailing
	// whitespace characters (such as newlines) at the end of the indented
	// document. Default: false.
	SuppressTrailingWhitespace bool
}

IndentSettings determine the behavior of the Document's Indent* functions.

func NewIndentSettings added in v1.1.4

func NewIndentSettings() *IndentSettings

NewIndentSettings creates a default IndentSettings record.

type Path

type Path struct {
	// contains filtered or unexported fields
}

A Path is a string that represents a search path through an etree starting from the document root or an arbitrary element. Paths are used with the Element object's Find* methods to locate and return desired elements.

A Path consists of a series of slash-separated "selectors", each of which may be modified by one or more bracket-enclosed "filters". Selectors are used to traverse the etree from element to element, while filters are used to narrow the list of candidate elements at each node.

Although etree Path strings are structurally and behaviorally similar to XPath strings (https://www.w3.org/TR/1999/REC-xpath-19991116/), they have a more limited set of selectors and filtering options.

The following selectors are supported by etree paths:

.               Select the current element.
..              Select the parent of the current element.
*               Select all child elements of the current element.
/               Select the root element when used at the start of a path.
//              Select all descendants of the current element.
tag             Select all child elements with a name matching the tag.

The following basic filters are supported:

[@attrib]       Keep elements with an attribute named attrib.
[@attrib='val'] Keep elements with an attribute named attrib and value matching val.
[tag]           Keep elements with a child element named tag.
[tag='val']     Keep elements with a child element named tag and text matching val.
[n]             Keep the n-th element, where n is a numeric index starting from 1.

The following function-based filters are supported:

[text()]                    Keep elements with non-empty text.
[text()='val']              Keep elements whose text matches val.
[local-name()='val']        Keep elements whose un-prefixed tag matches val.
[name()='val']              Keep elements whose full tag exactly matches val.
[namespace-prefix()]        Keep elements with non-empty namespace prefixes.
[namespace-prefix()='val']  Keep elements whose namespace prefix matches val.
[namespace-uri()]           Keep elements with non-empty namespace URIs.
[namespace-uri()='val']     Keep elements whose namespace URI matches val.

Below are some examples of etree path strings.

Select the bookstore child element of the root element:

/bookstore

Beginning from the root element, select the title elements of all descendant book elements having a 'category' attribute of 'WEB':

//book[@category='WEB']/title

Beginning from the current element, select the first descendant book element with a title child element containing the text 'Great Expectations':

.//book[title='Great Expectations'][1]

Beginning from the current element, select all child elements of book elements with an attribute 'language' set to 'english':

./book/*[@language='english']

Beginning from the current element, select all child elements of book elements containing the text 'special':

./book/*[text()='special']

Beginning from the current element, select all descendant book elements whose title child element has a 'language' attribute of 'french':

.//book/title[@language='french']/..

Beginning from the current element, select all descendant book elements belonging to the http://www.w3.org/TR/html4/ namespace:

.//book[namespace-uri()='http://www.w3.org/TR/html4/']

Example
xml := `
<bookstore>
	<book>
		<title>Great Expectations</title>
		<author>Charles Dickens</author>
	</book>
	<book>
		<title>Ulysses</title>
		<author>James Joyce</author>
	</book>
</bookstore>`

doc := NewDocument()
if err := doc.ReadFromString(xml); err != nil {
	panic(err)
}
for _, e := range doc.FindElements(".//book[author='Charles Dickens']") {
	// Use a distinct name so the outer doc is not shadowed inside the loop.
	bookDoc := NewDocumentWithRoot(e.Copy())
	bookDoc.Indent(2)
	bookDoc.WriteTo(os.Stdout)
}
Output:

<book>
  <title>Great Expectations</title>
  <author>Charles Dickens</author>
</book>

func CompilePath

func CompilePath(path string) (Path, error)

CompilePath creates an optimized version of an XPath-like string that can be used to query elements in an element tree.

func MustCompilePath

func MustCompilePath(path string) Path

MustCompilePath creates an optimized version of an XPath-like string that can be used to query elements in an element tree. Panics if an error occurs. Use this function to create Paths when you know the path is valid (i.e., if it's hard-coded).

type ProcInst

type ProcInst struct {
	Target string // the processing instruction target
	Inst   string // the processing instruction value
	// contains filtered or unexported fields
}

A ProcInst represents an XML processing instruction.

func NewProcInst

func NewProcInst(target, inst string) *ProcInst

NewProcInst creates an unparented XML processing instruction.

func (*ProcInst) Index added in v1.1.0

func (p *ProcInst) Index() int

Index returns the index of this ProcInst token within its parent element's list of child tokens. If this ProcInst token has no parent, then the function returns -1.

func (*ProcInst) Parent

func (p *ProcInst) Parent() *Element

Parent returns the processing instruction token's parent element, or nil if it has no parent.

func (*ProcInst) WriteTo added in v1.2.0

func (p *ProcInst) WriteTo(w Writer, s *WriteSettings)

WriteTo serializes the processing instruction to the writer.

type ReadSettings

type ReadSettings struct {
	// CharsetReader, if non-nil, defines a function to generate
	// charset-conversion readers, converting from the provided non-UTF-8
	// charset into UTF-8. If nil, the ReadFrom* functions will use a
	// "pass-through" CharsetReader that performs no conversion on the reader's
	// data regardless of the value of the "charset" encoding string. Default:
	// nil.
	CharsetReader func(charset string, input io.Reader) (io.Reader, error)

	// Permissive allows input containing common mistakes such as missing tags
	// or attribute values. Default: false.
	Permissive bool

	// PreserveCData preserves CDATA character data blocks when decoding XML
	// (instead of converting them to normal character text). This entails
	// additional processing and memory usage during ReadFrom* operations.
	// Default: false.
	PreserveCData bool

	// When an element has two or more attributes with the same name,
	// preserve them instead of keeping only one. Default: false.
	PreserveDuplicateAttrs bool

	// ValidateInput forces all ReadFrom* functions to validate that the
	// provided input is composed of "well-formed"(*) XML before processing it.
	// If invalid XML is detected, the ReadFrom* functions return an error.
	// Because this option requires the input to be processed twice, it incurs a
	// significant performance penalty. Default: false.
	//
	// (*) Note that this definition of "well-formed" is in the context of the
	// go standard library's encoding/xml package. Go's encoding/xml package
	// does not, in fact, guarantee well-formed XML as specified by the W3C XML
	// recommendation. See: https://github.com/golang/go/issues/68299
	ValidateInput bool

	// Entity is the entity map to be passed to the standard library's
	// xml.Decoder. Default: nil.
	Entity map[string]string

	// When Permissive is true, AutoClose indicates a set of elements to
	// consider closed immediately after they are opened, regardless of
	// whether an end element is present. Commonly set to xml.HTMLAutoClose.
	// Default: nil.
	AutoClose []string
}

ReadSettings determine the default behavior of the Document's ReadFrom* functions.
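
The CharsetReader field has the same signature as the CharsetReader field of the standard library's xml.Decoder, so a conversion function can be tried against encoding/xml on its own before it is assigned to a document's ReadSettings. The following is a minimal sketch under that assumption. The names latin1CharsetReader and decodeText are hypothetical helpers, not part of etree, and only ISO-8859-1 is handled.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// latin1CharsetReader is a hypothetical helper with the signature expected
// by a CharsetReader field. It converts ISO-8859-1 input to UTF-8 and
// rejects every other charset.
func latin1CharsetReader(charset string, input io.Reader) (io.Reader, error) {
	if !strings.EqualFold(charset, "ISO-8859-1") {
		return nil, fmt.Errorf("unsupported charset %q", charset)
	}
	raw, err := io.ReadAll(input)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	for _, b := range raw {
		// Each ISO-8859-1 byte value equals its Unicode code point.
		buf.WriteRune(rune(b))
	}
	return &buf, nil
}

// decodeText is a hypothetical helper that parses data with encoding/xml
// and returns the concatenated character data of the document.
func decodeText(data []byte) (string, error) {
	dec := xml.NewDecoder(bytes.NewReader(data))
	dec.CharsetReader = latin1CharsetReader
	var text strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return text.String(), nil
		}
		if err != nil {
			return "", err
		}
		if cd, ok := tok.(xml.CharData); ok {
			text.Write(cd)
		}
	}
}

func main() {
	// \xe9 is the ISO-8859-1 encoding of "é".
	input := []byte("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\xe9</name>")
	text, err := decodeText(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```

This sketch reads the whole input into memory before converting it, which keeps the example short; a streaming reader would suit large documents better.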

type Token

type Token interface {
	Parent() *Element
	Index() int
	WriteTo(w Writer, s *WriteSettings)
	// contains filtered or unexported methods
}

A Token is an interface type used to represent XML elements, character data, CDATA sections, XML comments, XML directives, and XML processing instructions.

type WriteSettings

type WriteSettings struct {
	// CanonicalEndTags forces the production of XML end tags, even for
	// elements that have no child elements. Default: false.
	CanonicalEndTags bool

	// CanonicalText forces the production of XML character references for
	// text data characters &, <, and >. If false, XML character references
	// are also produced for " and '. Default: false.
	CanonicalText bool

	// CanonicalAttrVal forces the production of XML character references for
	// attribute value characters &, < and ". If false, XML character
	// references are also produced for > and '. Default: false.
	CanonicalAttrVal bool

	// AttrSingleQuote causes attributes to use single quotes (attr='example')
	// instead of double quotes (attr="example") when set to true. Default:
	// false.
	AttrSingleQuote bool

	// UseCRLF causes the document's Indent* functions to use a carriage return
	// followed by a linefeed ("\r\n") when outputting a newline. If false,
	// only a linefeed is used ("\n"). Default: false.
	//
	// Deprecated: UseCRLF is deprecated. Use IndentSettings.UseCRLF instead.
	UseCRLF bool
}

WriteSettings determine the behavior of the Document's WriteTo* functions.
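
The escaping rules described for CanonicalText and CanonicalAttrVal can be illustrated without calling etree. The sketch below mirrors only the character sets named in the field comments. The functions escapeText and escapeAttrVal are hypothetical, and the entity spellings (&amp;quot; written as `&quot;`, and `&apos;`) are an assumption; etree's actual escaping routine may treat additional characters differently.

```go
package main

import (
	"fmt"
	"strings"
)

var (
	// Canonical text: only &, < and > are replaced.
	canonicalText = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;")
	// Canonical attribute values: only &, < and " are replaced.
	canonicalAttr = strings.NewReplacer("&", "&amp;", "<", "&lt;", `"`, "&quot;")
	// Non-canonical mode: all five characters are replaced.
	allChars = strings.NewReplacer(
		"&", "&amp;", "<", "&lt;", ">", "&gt;", `"`, "&quot;", "'", "&apos;")
)

// escapeText is a hypothetical helper that follows the CanonicalText comment.
func escapeText(s string, canonical bool) string {
	if canonical {
		return canonicalText.Replace(s)
	}
	return allChars.Replace(s)
}

// escapeAttrVal is a hypothetical helper that follows the CanonicalAttrVal
// comment.
func escapeAttrVal(s string, canonical bool) string {
	if canonical {
		return canonicalAttr.Replace(s)
	}
	return allChars.Replace(s)
}

func main() {
	fmt.Println(escapeText(`a < b & "c"`, true))  // a &lt; b &amp; "c"
	fmt.Println(escapeText(`a < b & "c"`, false)) // a &lt; b &amp; &quot;c&quot;
	fmt.Println(escapeAttrVal(`x > 'y'`, true))   // x > 'y'
	fmt.Println(escapeAttrVal(`x > 'y'`, false))  // x &gt; &apos;y&apos;
}
```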

type Writer added in v1.2.0

type Writer interface {
	io.StringWriter
	io.ByteWriter
	io.Writer
}

Writer is the interface that wraps the Write* functions called by each token type's WriteTo function.
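
Several standard library types already provide all three methods. The sketch below declares a local interface named writer that mirrors the method set shown above (an illustrative copy, not the etree type) and checks at compile time that *bytes.Buffer, *strings.Builder and *bufio.Writer satisfy it. The writeProcInst and renderProcInst functions are hypothetical and only show how the three methods might be used together; they are not etree's serializer.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"os"
	"strings"
)

// writer mirrors the method set of the Writer interface shown above.
type writer interface {
	io.StringWriter
	io.ByteWriter
	io.Writer
}

// Compile-time checks: each of these standard types has Write, WriteByte
// and WriteString methods.
var (
	_ writer = (*bytes.Buffer)(nil)
	_ writer = (*strings.Builder)(nil)
	_ writer = (*bufio.Writer)(nil)
)

// writeProcInst is a hypothetical serializer that emits "<?target inst?>".
// Write errors are ignored here for brevity.
func writeProcInst(w writer, target, inst string) {
	w.WriteString("<?")
	w.WriteString(target)
	if inst != "" {
		w.WriteByte(' ')
		w.Write([]byte(inst))
	}
	w.WriteString("?>")
}

// renderProcInst is a hypothetical helper that returns the serialized
// form as a string.
func renderProcInst(target, inst string) string {
	var sb strings.Builder
	writeProcInst(&sb, target, inst)
	return sb.String()
}

func main() {
	// *os.File has no WriteByte method, so it is wrapped in a bufio.Writer.
	out := bufio.NewWriter(os.Stdout)
	defer out.Flush()

	writeProcInst(out, "xml", `version="1.0"`)
	out.WriteByte('\n')
	fmt.Fprintln(out, renderProcInst("xml-stylesheet", `type="text/xsl"`))
}
```

Wrapping os.Stdout in a bufio.Writer is one way to obtain a value with all three methods; remember to flush it before the program exits.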
