etree

Version: v0.0.8
Warning: This package is not in the latest version of its module.
Published: Dec 3, 2024 | License: Apache-2.0, BSD-2-Clause | Imports: 8 | Imported by: 0

README

etree

The etree package is a lightweight, pure Go package that expresses XML in the form of an element tree. Its design was inspired by the Python ElementTree module. Some of the package's features include:

  • Represents XML documents as trees of elements for easy traversal.
  • Creates XML documents from scratch, or imports, modifies, and serializes existing ones.
  • Writes and reads XML to/from files, byte slices, strings and io interfaces.
  • Performs simple or complex searches with lightweight XPath-like query APIs.
  • Auto-indents XML using spaces or tabs for better readability.
  • Implemented in pure Go; depends only on the standard Go libraries.
  • Built on top of the Go encoding/xml package.
Creating an XML document

The following example creates an XML document from scratch using the etree package and outputs its indented contents to stdout.

doc := etree.NewDocument()
doc.CreateProcInst("xml", `version="1.0" encoding="UTF-8"`)
doc.CreateProcInst("xml-stylesheet", `type="text/xsl" href="style.xsl"`)

people := doc.CreateElement("People")
people.CreateComment("These are all known people")

jon := people.CreateElement("Person")
jon.CreateAttr("name", "Jon")

sally := people.CreateElement("Person")
sally.CreateAttr("name", "Sally")

doc.Indent(2)
doc.WriteTo(os.Stdout)

Output:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<People>
  <!--These are all known people-->
  <Person name="Jon"/>
  <Person name="Sally"/>
</People>
Reading an XML file

Suppose you have a file on disk called bookstore.xml containing the following data:

<bookstore xmlns:p="urn:schemas-books-com:prices">

  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <p:price>30.00</p:price>
  </book>

  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <p:price>29.99</p:price>
  </book>

  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <p:price>49.99</p:price>
  </book>

  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <p:price>39.95</p:price>
  </book>

</bookstore>

This code reads the file's contents into an etree document.

doc := etree.NewDocument()
if err := doc.ReadFromFile("bookstore.xml"); err != nil {
    panic(err)
}

You can also read XML from a string, a byte slice, or an io.Reader.

Processing elements and attributes

This example illustrates several ways to access elements and attributes using etree selection queries.

root := doc.SelectElement("bookstore")
fmt.Println("ROOT element:", root.Tag)

for _, book := range root.SelectElements("book") {
    fmt.Println("CHILD element:", book.Tag)
    if title := book.SelectElement("title"); title != nil {
        lang := title.SelectAttrValue("lang", "unknown")
        fmt.Printf("  TITLE: %s (%s)\n", title.Text(), lang)
    }
    for _, attr := range book.Attr {
        fmt.Printf("  ATTR: %s=%s\n", attr.Key, attr.Value)
    }
}

Output:

ROOT element: bookstore
CHILD element: book
  TITLE: Everyday Italian (en)
  ATTR: category=COOKING
CHILD element: book
  TITLE: Harry Potter (en)
  ATTR: category=CHILDREN
CHILD element: book
  TITLE: XQuery Kick Start (en)
  ATTR: category=WEB
CHILD element: book
  TITLE: Learning XML (en)
  ATTR: category=WEB
Path queries

This example uses etree's path functions to select all book titles that fall into the category of 'WEB'. The double-slash prefix in the path causes the search for book elements to occur recursively; book elements may appear at any level of the XML hierarchy.

for _, t := range doc.FindElements("//book[@category='WEB']/title") {
    fmt.Println("Title:", t.Text())
}

Output:

Title: XQuery Kick Start
Title: Learning XML

This example finds the first book element under the root bookstore element and outputs the tag and text of each of its child elements.

for _, e := range doc.FindElements("./bookstore/book[1]/*") {
    fmt.Printf("%s: %s\n", e.Tag, e.Text())
}

Output:

title: Everyday Italian
author: Giada De Laurentiis
year: 2005
price: 30.00

This example finds all books with a price of 49.99 and outputs their titles.

path := etree.MustCompilePath("./bookstore/book[p:price='49.99']/title")
for _, e := range doc.FindElementsPath(path) {
    fmt.Println(e.Text())
}

Output:

XQuery Kick Start

Note that this example uses the FindElementsPath method, which takes a precompiled Path object as its argument. Use precompiled paths when you plan to search with the same path more than once.

Other features

These are just a few examples of the things the etree package can do. See the documentation for a complete description of its capabilities.

Contributing

This project accepts contributions. Just fork the repo and submit a pull request!

Documentation

Overview

Package etree provides XML services through an Element Tree abstraction.

Constants

const (
	// NoIndent is used with Indent to disable all indenting.
	NoIndent = -1
)

Variables

var ErrXML = errors.New("etree: invalid XML format")

ErrXML is returned when XML parsing fails due to incorrect formatting.

Functions

This section is empty.

Types

type Attr

type Attr struct {
	Space, Key string // The attribute's namespace and key
	Value      string // The attribute value string
}

An Attr represents a key-value attribute of an XML element.

type CData

type CData struct {
	Value string
	// contains filtered or unexported fields
}

func (*CData) Parent

func (this *CData) Parent() *Element

type CharData

type CharData struct {
	Data string
	// contains filtered or unexported fields
}

CharData represents character data within XML.

func NewCharData

func NewCharData(data string) *CharData

NewCharData creates a parentless XML character data entity.

func (*CharData) Parent

func (c *CharData) Parent() *Element

Parent returns the character data token's parent element, or nil if it has no parent.

type Comment

type Comment struct {
	Data string
	// contains filtered or unexported fields
}

A Comment represents an XML comment.

func NewComment

func NewComment(comment string) *Comment

NewComment creates a parentless XML comment.

func (*Comment) Parent

func (c *Comment) Parent() *Element

Parent returns the comment token's parent element, or nil if it has no parent.

type Directive

type Directive struct {
	Data string
	// contains filtered or unexported fields
}

A Directive represents an XML directive.

func NewDirective

func NewDirective(data string) *Directive

NewDirective creates a parentless XML directive.

func (*Directive) Parent

func (d *Directive) Parent() *Element

Parent returns the directive token's parent element, or nil if it has no parent.

type Document

type Document struct {
	Element
	WriteSettings WriteSettings
}

A Document is a container holding a complete XML hierarchy. Its embedded element contains zero or more children, one of which is usually the root element. The embedded element may include other children such as processing instructions or BOM CharData tokens.

Example (Creating)

Create an etree Document, add XML entities to it, and serialize it to stdout.

doc := NewDocument()
doc.CreateProcInst("xml", `version="1.0" encoding="UTF-8"`)
doc.CreateProcInst("xml-stylesheet", `type="text/xsl" href="style.xsl"`)

people := doc.CreateElement("People")
people.CreateComment("These are all known people")

jon := people.CreateElement("Person")
jon.CreateAttr("name", "Jon O'Reilly")

sally := people.CreateElement("Person")
sally.CreateAttr("name", "Sally")

doc.Indent(2)
doc.WriteTo(os.Stdout)
Output:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<People>
  <!--These are all known people-->
  <Person name="Jon O&apos;Reilly"/>
  <Person name="Sally"/>
</People>
Example (Reading)
doc := NewDocument()
if err := doc.ReadFromFile("document.xml"); err != nil {
	panic(err)
}

func NewDocument

func NewDocument() *Document

NewDocument creates an XML document without a root element.

func (*Document) Copy

func (d *Document) Copy() *Document

Copy returns a recursive, deep copy of the document.

func (*Document) Indent

func (d *Document) Indent(spaces int)

Indent modifies the document's element tree by inserting CharData entities containing newlines and indentation. The amount of indentation per depth level is given by spaces. Pass etree.NoIndent for spaces if you want no indentation at all.

func (*Document) IndentTabs

func (d *Document) IndentTabs()

IndentTabs modifies the document's element tree by inserting CharData entities containing newlines and tabs for indentation. One tab is used per indentation level.

func (*Document) ReadFrom

func (d *Document) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom reads XML from the reader r into the document d. It returns the number of bytes read and any error encountered.

func (*Document) ReadFromBytes

func (d *Document) ReadFromBytes(b []byte) error

ReadFromBytes reads XML from the byte slice b into the document d.

func (*Document) ReadFromFile

func (d *Document) ReadFromFile(filename string) error

ReadFromFile reads XML from the file named filename into the document d.

func (*Document) ReadFromString

func (d *Document) ReadFromString(s string) error

ReadFromString reads XML from the string s into the document d.

func (*Document) Root

func (d *Document) Root() *Element

Root returns the root element of the document, or nil if there is no root element.

func (*Document) SetRoot

func (d *Document) SetRoot(e *Element)

SetRoot replaces the document's root element with e. If the document already has a root when this function is called, then the document's original root is unbound first. If the element e is bound to another document (or to another element within a document), then it is unbound first.

func (*Document) WriteTo

func (d *Document) WriteTo(w io.Writer) (n int64, err error)

WriteTo serializes an XML document into the writer w. It returns the number of bytes written and any error encountered.

func (*Document) WriteToBytes

func (d *Document) WriteToBytes() (b []byte, err error)

WriteToBytes serializes the XML document into a slice of bytes.

func (*Document) WriteToFile

func (d *Document) WriteToFile(filename string) error

WriteToFile serializes an XML document into the file named filename.

func (*Document) WriteToString

func (d *Document) WriteToString() (s string, err error)

WriteToString serializes the XML document into a string.

type Element

type Element struct {
	Space, Tag string  // namespace and tag
	Attr       []Attr  // key-value attribute pairs
	Child      []Token // child tokens (elements, comments, etc.)
	// contains filtered or unexported fields
}

An Element represents an XML element, its attributes, and its child tokens.

func NewElement

func NewElement(tag string) *Element

NewElement creates an unparented element with the specified tag. The tag may be prefixed by a namespace and a colon.

func (*Element) AddChild

func (e *Element) AddChild(t Token)

AddChild adds the token t as the last child of element e. If token t was already the child of another element, it is first removed from its current parent element.

func (*Element) ChildElements

func (e *Element) ChildElements() []*Element

ChildElements returns all elements that are children of element e.

func (*Element) Copy

func (e *Element) Copy() *Element

Copy creates a recursive, deep copy of the element and all its attributes and children. The returned element has no parent but can be parented to another element using AddChild, or to a document using SetRoot.

func (*Element) CreateAttr

func (e *Element) CreateAttr(key, value string) *Attr

CreateAttr creates an attribute and adds it to element e. The key may be prefixed by a namespace and a colon. If an attribute with the key already exists, its value is replaced.

func (*Element) CreateCharData

func (e *Element) CreateCharData(data string) *CharData

CreateCharData creates an XML character data entity and adds it as a child of element e.

func (*Element) CreateComment

func (e *Element) CreateComment(comment string) *Comment

CreateComment creates an XML comment and adds it as a child of element e.

func (*Element) CreateDirective

func (e *Element) CreateDirective(data string) *Directive

CreateDirective creates an XML directive and adds it as the last child of element e.

func (*Element) CreateElement

func (e *Element) CreateElement(tag string) *Element

CreateElement creates an element with the specified tag and adds it as the last child element of the element e. The tag may be prefixed by a namespace and a colon.

func (*Element) CreateProcInst

func (e *Element) CreateProcInst(target, inst string) *ProcInst

CreateProcInst creates a processing instruction and adds it as a child of element e.

func (*Element) FindElement

func (e *Element) FindElement(path string) *Element

FindElement returns the first element matched by the XPath-like path string. Panics if an invalid path string is supplied.

func (*Element) FindElementPath

func (e *Element) FindElementPath(path Path) *Element

FindElementPath returns the first element matched by the Path object.

func (*Element) FindElements

func (e *Element) FindElements(path string) []*Element

FindElements returns a slice of elements matched by the XPath-like path string. Panics if an invalid path string is supplied.

func (*Element) FindElementsPath

func (e *Element) FindElementsPath(path Path) []*Element

FindElementsPath returns a slice of elements matched by the Path object.

func (*Element) InsertChild

func (e *Element) InsertChild(ex Token, t Token)

InsertChild inserts the token t before e's existing child token ex. If ex is nil (or if ex is not a child of e), then t is added to the end of e's child token list. If token t was already the child of another element, it is first removed from its current parent element.

func (*Element) Parent

func (e *Element) Parent() *Element

Parent returns the element token's parent element, or nil if it has no parent.

func (*Element) RemoveAttr

func (e *Element) RemoveAttr(key string) *Attr

RemoveAttr removes and returns the first attribute of the element whose key matches the given key. The key may be prefixed by a namespace and a colon. If no matching attribute exists, nil is returned.

func (*Element) RemoveChild

func (e *Element) RemoveChild(t Token) Token

RemoveChild attempts to remove the token t from element e's list of children. If the token t is a child of e, then it is returned. Otherwise, nil is returned.

func (*Element) SelectAttr

func (e *Element) SelectAttr(key string) *Attr

SelectAttr finds an element attribute matching the requested key and returns it if found. The key may be prefixed by a namespace and a colon.

func (*Element) SelectAttrValue

func (e *Element) SelectAttrValue(key, dflt string) string

SelectAttrValue finds an element attribute matching the requested key and returns its value if found. The key may be prefixed by a namespace and a colon. If the key is not found, the dflt value is returned instead.

func (*Element) SelectElement

func (e *Element) SelectElement(tag string) *Element

SelectElement returns the first child element with the given tag. The tag may be prefixed by a namespace and a colon.

func (*Element) SelectElements

func (e *Element) SelectElements(tag string) []*Element

SelectElements returns a slice of all child elements with the given tag. The tag may be prefixed by a namespace and a colon.

func (*Element) SetText

func (e *Element) SetText(text string)

SetText replaces an element's subsidiary CharData text with a new string.

func (*Element) Text

func (e *Element) Text() string

Text returns the characters immediately following the element's opening tag.

func (*Element) WriteCData

func (e *Element) WriteCData(text string)

type ErrPath

type ErrPath string

ErrPath is returned by path functions when an invalid etree path is provided.

func (ErrPath) Error

func (err ErrPath) Error() string

Error returns the string describing a path error.

type Path

type Path struct {
	// contains filtered or unexported fields
}

A Path is an object that represents an optimized version of an XPath-like search string. Although path strings are XPath-like, only the following limited syntax is supported:

.               Selects the current element
..              Selects the parent of the current element
*               Selects all child elements
//              Selects all descendants of the current element
tag             Selects all child elements with the given tag
[#]             Selects the element of the given index (1-based,
                  negative starts from the end)
[@attrib]       Selects all elements with the given attribute
[@attrib='val'] Selects all elements with the given attribute set to val
[tag]           Selects all elements with a child element named tag
[tag='val']     Selects all elements with a child element named tag
                  and text equal to val
|               Selects the elements matched by the path on either side
                  of the '|' operator

Examples:

Select the title elements of all descendant book elements having a 'category' attribute of 'WEB':

//book[@category='WEB']/title

Select the first book element with a title child containing the text 'Great Expectations':

.//book[title='Great Expectations'][1]

Starting from the current element, select all children of book elements with an attribute 'language' set to 'english':

./book/*[@language='english']

Select all descendant book elements whose title element has an attribute 'language' set to 'french':

//book/title[@language='french']/..

Select all title and year elements:

//title|//year
Example
xml := `<bookstore><book><title>Great Expectations</title>
      <author>Charles Dickens</author></book><book><title>Ulysses</title>
      <author>James Joyce</author></book></bookstore>`

doc := NewDocument()
if err := doc.ReadFromString(xml); err != nil {
	panic(err)
}
for _, e := range doc.FindElements(".//book[author='Charles Dickens']") {
	// Copy each match into its own document so it can be indented and printed alone.
	out := NewDocument()
	out.SetRoot(e.Copy())
	out.Indent(2)
	out.WriteTo(os.Stdout)
}
Output:

<book>
  <title>Great Expectations</title>
  <author>Charles Dickens</author>
</book>

func CompilePath

func CompilePath(path string) (Path, error)

CompilePath creates an optimized version of an XPath-like string that can be used to query elements in an element tree.

func MustCompilePath

func MustCompilePath(path string) Path

MustCompilePath creates an optimized version of an XPath-like string that can be used to query elements in an element tree. Panics if an error occurs. Use this function to create Paths when you know the path is valid (i.e., if it's hard-coded).

type ProcInst

type ProcInst struct {
	Target string
	Inst   string
	// contains filtered or unexported fields
}

A ProcInst represents an XML processing instruction.

func NewProcInst

func NewProcInst(target, inst string) *ProcInst

NewProcInst creates a parentless XML processing instruction.

func (*ProcInst) Parent

func (p *ProcInst) Parent() *Element

Parent returns the processing instruction token's parent element, or nil if it has no parent.

type Token

type Token interface {
	Parent() *Element
	// contains filtered or unexported methods
}

A Token is an interface that represents an Element, CharData, Comment, Directive, or ProcInst.

type WriteSettings

type WriteSettings struct {
	// CanonicalEndTags forces the production of XML end tags, even for
	// elements that have no child elements. Default: false.
	CanonicalEndTags bool

	// CanonicalText forces the production of XML character references for
	// text data characters &, <, and >. If false, XML character references
	// are also produced for " and '. Default: false.
	CanonicalText bool

	// CanonicalAttrVal forces the production of XML character references for
	// attribute value characters &, < and ". If false, XML character
	// references are also produced for > and '. Default: false.
	CanonicalAttrVal bool
}

WriteSettings allows changing the serialization behavior of the WriteTo* methods.
