x2j

Version: v0.0.0-...-a0352aa
Published: Dec 20, 2013 | License: BSD-3-Clause | Imports: 11 | Imported by: 427

README

x2j.go - Unmarshal dynamic / arbitrary XML docs and extract values (using wildcards, if necessary).

ANNOUNCEMENTS

20 December 2013:

Non-UTF8 character sets supported via the X2jCharsetReader variable.

12 December 2013:

For symmetry, the package j2x has functions that marshal JSON strings and
map[string]interface{} values to XML encoded strings: http://godoc.org/github.com/clbanning/j2x.

Also, ToTree(), ToMap(), ToJson(), ToJsonIndent(), ReaderValuesFromTagPath() and ReaderValuesForTag() use io.Reader instead of string or []byte.

If you want to process a stream of XML messages check out XmlMsgsFromReader().

MOTIVATION

I make extensive use of JSON for messaging and typically unmarshal the messages into
map[string]interface{} variables.  This is easily done using json.Unmarshal from the
standard Go libraries.  Unfortunately, many legacy solutions use structured
XML messages; in those environments the applications would have to be refitted to
interoperate with my components.

The better solution is to provide an alternative HTTP handler that receives
XML doc messages, parses them into map[string]interface{} variables, and then reuses
all the JSON-based code.  The Go xml.Unmarshal() function does not provide the same
option of unmarshaling XML messages into map[string]interface{} variables, so I wrote
a couple of small functions to fill this gap.

Of course, once the XML doc is unmarshaled into a map[string]interface{} variable, it
is just a matter of calling json.Marshal() to provide it as a JSON string.  Hence 'x2j'
rather than just 'x2m'.

USAGE

The package is fairly well self-documented. (http://godoc.org/github.com/clbanning/x2j)  
The one really useful function is:

    - Unmarshal(doc []byte, v interface{}) error  
      where v is a pointer to a variable of type 'map[string]interface{}', 'string', or
      any other type supported by xml.Unmarshal().

To retrieve a value for a specific tag use:

    - DocValue(doc, path string, attrs ...string) (interface{},error) 
    - MapValue(m map[string]interface{}, path string, attr map[string]interface{}, recast ...bool) (interface{}, error)

The 'path' argument is a period-separated tag hierarchy - also known as dot-notation.
It is the program's responsibility to cast the returned value to the proper type; possible 
types are the normal JSON unmarshaling types: string, float64, bool, []interface{}, map[string]interface{}.
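
A minimal sketch of the dot-notation walk and the type assertion it calls for, using only the standard library. lookupPath is a hypothetical helper written for this illustration; it is not the package's MapValue(), and it ignores attributes and lists.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupPath walks m along a period-separated path and returns the value found.
// Hypothetical helper for illustration only; not part of x2j.
func lookupPath(m map[string]interface{}, path string) (interface{}, error) {
	var cur interface{} = m
	for _, key := range strings.Split(path, ".") {
		node, ok := cur.(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("cannot descend into %q: parent is %T", key, cur)
		}
		next, found := node[key]
		if !found {
			return nil, fmt.Errorf("key %q not found", key)
		}
		cur = next
	}
	return cur, nil
}

func main() {
	m := map[string]interface{}{
		"doc": map[string]interface{}{"name": "x2j"},
	}
	v, err := lookupPath(m, "doc.name")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The caller is responsible for asserting the concrete type.
	if name, ok := v.(string); ok {
		fmt.Println(name)
	}
}
```

The comma-ok form of the type assertion avoids a panic when the value at the path turns out to be a map or a list rather than a string.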

To retrieve all values associated with a tag occurring anywhere in the XML document use:

    - ValuesForTag(doc, tag string) ([]interface{}, error)
    - ValuesForKey(m map[string]interface{}, key string) []interface{}

    Demos: http://play.golang.org/p/m8zP-cpk0O
           http://play.golang.org/p/cIteTS1iSg
           http://play.golang.org/p/vd8pMiI21b

Returned values should be one of map[string]interface{}, []interface{}, or string.
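
The idea of collecting every value stored under a key, wherever it occurs, can be sketched as a recursive walk over the map. collectKey and valuesForKey below are hypothetical names for this illustration; the package's own traversal order and handling of nested matches may differ.

```go
package main

import "fmt"

// collectKey appends every value stored under key, at any depth, to out.
// Hypothetical helper for illustration only; not part of x2j.
func collectKey(v interface{}, key string, out *[]interface{}) {
	switch node := v.(type) {
	case map[string]interface{}:
		for k, child := range node {
			if k == key {
				*out = append(*out, child)
			}
			collectKey(child, key, out)
		}
	case []interface{}:
		for _, child := range node {
			collectKey(child, key, out)
		}
	}
}

// valuesForKey returns nil when key does not occur anywhere in m.
func valuesForKey(m map[string]interface{}, key string) []interface{} {
	var out []interface{}
	collectKey(m, key, &out)
	return out
}

func main() {
	m := map[string]interface{}{
		"doc": map[string]interface{}{
			"name": "a",
			"items": []interface{}{
				map[string]interface{}{"name": "b"},
				map[string]interface{}{"other": "c"},
			},
		},
	}
	// Map iteration order is random in Go, so only the count is stable.
	fmt.Println(len(valuesForKey(m, "name")))
}
```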

All the values associated with a tag-path that may include one or more wildcard characters - 
'*' - can also be retrieved using:

    - ValuesFromTagPath(doc, path string, getAttrs ...bool) ([]interface{}, error)
    - ValuesFromKeyPath(map[string]interface{}, path string, getAttrs ...bool) []interface{}

    Demos: http://play.golang.org/p/kUQnZ8VuhS
           http://play.golang.org/p/l1aMHYtz7G

NOTE: care should be taken when using "*" at the end of a path - e.g., "books.book.*".  See
the x2jpath_test.go case on how the wildcard returns all key values and collapses list values;
the same message structure can load a []interface{} or a map[string]interface{} (or an interface{}) 
value for a tag.

See the test cases in "x2jpath_test.go" and the programs in the "example" subdirectory for more.
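
To make the wildcard behaviour concrete, here is a small standard-library sketch of a path walk in which "*" matches any key. valuesAt and valuesAtPath are hypothetical helpers; this sketch flattens lists at every level, which is an assumption of the illustration and may not match the package in every case.

```go
package main

import (
	"fmt"
	"strings"
)

// valuesAt returns every value reachable by following keys; "*" matches any key.
// Lists are flattened, so each list member is matched against the remaining keys.
// Hypothetical helper for illustration only; not part of x2j.
func valuesAt(v interface{}, keys []string) []interface{} {
	if list, ok := v.([]interface{}); ok {
		var out []interface{}
		for _, item := range list {
			out = append(out, valuesAt(item, keys)...)
		}
		return out
	}
	if len(keys) == 0 {
		return []interface{}{v}
	}
	node, ok := v.(map[string]interface{})
	if !ok {
		return nil
	}
	if keys[0] == "*" {
		var out []interface{}
		for _, child := range node {
			out = append(out, valuesAt(child, keys[1:])...)
		}
		return out
	}
	child, found := node[keys[0]]
	if !found {
		return nil
	}
	return valuesAt(child, keys[1:])
}

// valuesAtPath splits a dot-separated path and walks m with it.
func valuesAtPath(m map[string]interface{}, path string) []interface{} {
	return valuesAt(m, strings.Split(path, "."))
}

func main() {
	m := map[string]interface{}{
		"books": map[string]interface{}{
			"book": []interface{}{
				map[string]interface{}{"author": "A", "title": "T1"},
				map[string]interface{}{"author": "B", "title": "T2"},
			},
		},
	}
	fmt.Println(valuesAtPath(m, "books.book.author"))
	// A trailing "*" returns the values of all keys of every book, in no fixed order.
	fmt.Println(len(valuesAtPath(m, "books.book.*")))
}
```

Note how the trailing wildcard mixes authors and titles into one flat list; this is the kind of collapsing the NOTE above warns about.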

XML PARSING CONVENTIONS

   - Attributes are parsed to map[string]interface{} values by prefixing a hyphen, '-',
     to the attribute label.
   - If the element is a simple element and has attributes, the element value
     is given the key '#text' for its map[string]interface{} representation.  (See
     the 'atomFeedString.xml' test data, below.)

BULK PROCESSING OF MESSAGE FILES

Sometimes messages may be logged into files for transmission via FTP (e.g.) and subsequent
processing. You can use the bulk XML message processor to convert files of XML messages into 
map[string]interface{} values with custom processing and error handler functions.  See
the notes and test code for:

   - XmlMsgsFromFile(fname string, phandler func(map[string]interface{}) bool, ehandler func(error) bool, recast ...bool) error

IMPLEMENTATION NOTES

Nothing fancy here, just brute force.

   - Use xml.Decoder to parse the XML doc and build a tree.
   - Walk the tree and load values into a map[string]interface{} variable, 'm', as
     appropriate.
   - Use json.Marshal to convert 'm' to JSON.
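
The steps above can be sketched with the standard library alone. This is a simplified illustration, not the package's code: it skips the intermediate tree and loads the map directly from xml.Decoder tokens, applying the '-' attribute prefix and '#text' key conventions described earlier. decodeElement and xmlToJSON are hypothetical names.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"strings"
)

// decodeElement consumes tokens up to the end of start and returns the element
// value: a string for a simple element, otherwise a map[string]interface{}.
func decodeElement(dec *xml.Decoder, start xml.StartElement) (interface{}, error) {
	m := map[string]interface{}{}
	for _, a := range start.Attr {
		m["-"+a.Name.Local] = a.Value // attribute keys get a '-' prefix
	}
	var text strings.Builder
	for {
		tok, err := dec.Token()
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			child, err := decodeElement(dec, t)
			if err != nil {
				return nil, err
			}
			key := t.Name.Local
			switch prev := m[key].(type) {
			case nil:
				m[key] = child
			case []interface{}:
				m[key] = append(prev, child) // third or later repeat of a tag
			default:
				m[key] = []interface{}{prev, child} // second occurrence starts a list
			}
		case xml.CharData:
			text.Write(t)
		case xml.EndElement:
			s := strings.TrimSpace(text.String())
			if len(m) == 0 {
				return s, nil // simple element without attributes
			}
			if s != "" {
				m["#text"] = s
			}
			return m, nil
		}
	}
}

// xmlToJSON converts the first element of doc to a map and marshals it.
func xmlToJSON(doc string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err != nil {
			return "", err
		}
		start, ok := tok.(xml.StartElement)
		if !ok {
			continue // skip declarations, comments, and whitespace
		}
		v, err := decodeElement(dec, start)
		if err != nil {
			return "", err
		}
		b, err := json.Marshal(map[string]interface{}{start.Name.Local: v})
		if err != nil {
			return "", err
		}
		return string(b), nil
	}
}

func main() {
	s, err := xmlToJSON(`<doc id="1"><name lang="en">x2j</name><tag>a</tag><tag>b</tag></doc>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

json.Marshal writes map keys in sorted order, so the JSON output is deterministic even though Go map iteration is not.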

As for testing:

   - Copy an XML doc into 'x2j_test.xml'.
   - Run "go test" and you'll get a full dump.
     ("pathTestString.xml" and "atomFeedString.xml" are test data from "read_test.go"
     in the encoding/xml directory of the standard package library.)

USES

   - putting an XML API on our message hub middleware (http://jsonhub.net)
   - loading XML data into a NoSQL database, such as MongoDB

PERFORMANCE IMPROVEMENTS WITH GO 1.1 and 1.2

Upgrading to Go 1.1 environment results in performance improvements for XML and JSON
unmarshalling, in general.  The x2j package gets an average performance boost of 40%.

                           ----- Go 1.0.2 -----   ----------- Go 1.1 -----------
                            iterations  ns/op      iterations  ns/op  % improved
Benchmark_UseXml-4            100000    18776        200000    10377     45%
Benchmark_UseX2j-4             50000    55323         50000    33958     39%
Benchmark_UseJson-4          1000000     2257       1000000     1484     34%
Benchmark_UseJsonToMap-4     1000000     2531       1000000     1566     38%
BenchmarkBig_UseXml-4         100000    28918        100000    15876     45%
BenchmarkBig_UseX2j-4          20000    86338         50000    52661     39%
BenchmarkBig_UseJson-4        500000     4448       1000000     2664     40%
BenchmarkBig_UseJsonToMap-4   200000     9076        500000     5753     37%
BenchmarkBig3_UseXml-4         50000    42224        100000    24686     42%
BenchmarkBig3_UseX2j-4         10000   147407         20000    84332     43%
BenchmarkBig3_UseJson-4       500000     5921        500000     3930     34%
BenchmarkBig3_UseJsonToMap-4  200000    13037        200000     8670     33%

The x2j package gets an additional 15-20% performance boost going to Go 1.2.

                           ------ Go 1.1 ------   ----------- Go 1.2 -----------
                            iterations  ns/op      iterations  ns/op  % improved
Benchmark_UseXml-4            200000    10377        200000    11031     -6%
Benchmark_UseX2j-4             50000    33958        100000    29188     14%
Benchmark_UseJson-4          1000000     1484       1000000     1347      9%
Benchmark_UseJsonToMap-4     1000000     1566       1000000     1434      8%
BenchmarkBig_UseXml-4         100000    15876        100000    16585     -4%
BenchmarkBig_UseX2j-4          50000    52661         50000    43452     17%
BenchmarkBig_UseJson-4       1000000     2664       1000000     2523      5%
BenchmarkBig_UseJsonToMap-4   500000     5753        500000     4992     13%
BenchmarkBig3_UseXml-4        100000    24686        100000    24348      1%
BenchmarkBig3_UseX2j-4         20000    84332         50000    66736     21%
BenchmarkBig3_UseJson-4       500000     3930        500000     3733      5%
BenchmarkBig3_UseJsonToMap-4  200000     8670        200000     7810     10%



Documentation

Overview

Unmarshal dynamic / arbitrary XML docs and extract values (using wildcards, if necessary).

Copyright 2012-2013 Charles Banning. All rights reserved. Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.

io.Reader HANDLING

ToTree(), ToMap(), ToJson(), and ToJsonIndent() provide parsing of messages from an io.Reader.
If you want to handle a message stream, look at XmlMsgsFromReader().

NON-UTF8 CHARACTER SETS

Use the X2jCharsetReader variable to assign an io.Reader for alternative character sets.

Variables

var X2jCharsetReader func(charset string, input io.Reader) (io.Reader, error)

If X2jCharsetReader != nil, it will be used to decode the doc or stream if required

import charset "code.google.com/p/go-charset/charset"
...
x2j.X2jCharsetReader = charset.NewReader
s, err := x2j.DocToJson(doc)
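
X2jCharsetReader has the same signature as the CharsetReader field of the standard xml.Decoder, so a hand-written converter could presumably be assigned to it in the same way as charset.NewReader. The sketch below shows such a converter for ISO-8859-1 and exercises it with xml.Decoder only; latin1Reader and decodeName are hypothetical names, and the wiring into x2j itself is not shown.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// latin1Reader converts ISO-8859-1 input to UTF-8. It reads the whole input up
// front, which is fine for a sketch but not for large streams.
func latin1Reader(charset string, input io.Reader) (io.Reader, error) {
	if !strings.EqualFold(charset, "ISO-8859-1") {
		return nil, fmt.Errorf("unsupported charset %q", charset)
	}
	raw, err := io.ReadAll(input)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	for _, b := range raw {
		buf.WriteRune(rune(b)) // Latin-1 bytes map 1:1 to Unicode code points
	}
	return &buf, nil
}

// decodeName parses doc with the custom charset reader and returns <name>.
func decodeName(doc []byte) (string, error) {
	dec := xml.NewDecoder(bytes.NewReader(doc))
	dec.CharsetReader = latin1Reader
	var v struct {
		Name string `xml:"name"`
	}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	return v.Name, nil
}

func main() {
	// \xe9 is the Latin-1 encoding of e-acute; it is not valid UTF-8 on its own.
	doc := []byte("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc><name>caf\xe9</name></doc>")
	name, err := decodeName(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name)
}
```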

Functions

func ByteDocToJson

func ByteDocToJson(doc []byte, recast ...bool) (string, error)

ByteDocToJson - return an XML doc as a JSON string.

If the optional argument 'recast' is 'true', then values will be converted to boolean or float64 if possible.

func ByteDocToMap

func ByteDocToMap(doc []byte, recast ...bool) (map[string]interface{}, error)

ByteDocToMap - convert an XML doc into a map[string]interface{}. (This is analogous to unmarshalling a JSON string to map[string]interface{} using json.Unmarshal().)

If the optional argument 'recast' is 'true', then values will be converted to boolean or float64 if possible.
Note: recasting is only applied to element values, not attribute values.
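
As an illustration of what recasting means, the sketch below tries float64 first, then bool, and otherwise keeps the string. The function name and the exact set of accepted literals are assumptions of this sketch (strconv accepts forms such as "1e3" or "T"); the package's own rules are not spelled out here.

```go
package main

import (
	"fmt"
	"strconv"
)

// recastValue converts s to float64 or bool when it parses as one, else returns s.
// Hypothetical helper for illustration only; not part of x2j.
func recastValue(s string) interface{} {
	if f, err := strconv.ParseFloat(s, 64); err == nil {
		return f
	}
	if b, err := strconv.ParseBool(s); err == nil {
		return b
	}
	return s
}

func main() {
	for _, s := range []string{"3.14", "true", "x2j"} {
		fmt.Printf("%q -> %T\n", s, recastValue(s))
	}
}
```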

func DocToJson

func DocToJson(doc string, recast ...bool) (string, error)

DocToJson - return an XML doc as a JSON string.

If the optional argument 'recast' is 'true', then values will be converted to boolean or float64 if possible.

func DocToJsonIndent

func DocToJsonIndent(doc string, recast ...bool) (string, error)

DocToJsonIndent - return an XML doc as a prettified JSON string.

If the optional argument 'recast' is 'true', then values will be converted to boolean or float64 if possible.
Note: recasting is only applied to element values, not attribute values.

func DocToMap

func DocToMap(doc string, recast ...bool) (map[string]interface{}, error)

DocToMap - convert an XML doc into a map[string]interface{}. (This is analogous to unmarshalling a JSON string to map[string]interface{} using json.Unmarshal().)

If the optional argument 'recast' is 'true', then values will be converted to boolean or float64 if possible.
Note: recasting is only applied to element values, not attribute values.

func DocValue

func DocValue(doc, path string, attrs ...string) (interface{}, error)

DocValue - return a value for a specific tag

'doc' is a valid XML message.
'path' is a hierarchy of XML tags, e.g., "doc.name".
'attrs' is an OPTIONAL list of "name:value" pairs for attributes.
Note: 'recast' is not enabled here. Use DocToMap(), NewAttributeMap(), and MapValue() calls for that.

func MapValue

func MapValue(m map[string]interface{}, path string, attr map[string]interface{}, r ...bool) (interface{}, error)

MapValue - retrieves value based on walking the map, 'm'.

'm' is the map value of interest.
'path' is a period-separated hierarchy of keys in the map.
'attr' is a map of attribute "name:value" pairs from NewAttributeMap().  May be 'nil'.
If the path can't be traversed, an error is returned.
Note: the optional argument 'r' can be used to coerce attribute values, 'attr', if done so for 'm'.

func NewAttributeMap

func NewAttributeMap(kv ...string) (map[string]interface{}, error)

NewAttributeMap() - generate map of attributes=value entries as map["-"+string]string.

'kv' arguments are "name:value" pairs that appear as attributes, name="value".
If len(kv) == 0, the return is (nil, nil).
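
The "name:value" convention can be sketched as follows. attrMap is a hypothetical stand-in written for this illustration; how the real NewAttributeMap() treats malformed pairs is an assumption here.

```go
package main

import (
	"fmt"
	"strings"
)

// attrMap turns "name:value" pairs into a map keyed by "-"+name.
// Hypothetical helper for illustration only; not part of x2j.
func attrMap(kv ...string) (map[string]interface{}, error) {
	if len(kv) == 0 {
		return nil, nil
	}
	m := make(map[string]interface{}, len(kv))
	for _, pair := range kv {
		i := strings.Index(pair, ":")
		if i <= 0 {
			return nil, fmt.Errorf("malformed pair %q, want name:value", pair)
		}
		// Split on the first colon only, so values may contain colons.
		m["-"+pair[:i]] = pair[i+1:]
	}
	return m, nil
}

func main() {
	m, err := attrMap("id:1", "href:http://example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m["-id"], m["-href"])
}
```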

func ReaderValuesForTag

func ReaderValuesForTag(rdr io.Reader, tag string) ([]interface{}, error)

ReaderValuesForTag - io.Reader version of ValuesForTag()

func ReaderValuesFromTagPath

func ReaderValuesFromTagPath(rdr io.Reader, path string, getAttrs ...bool) ([]interface{}, error)

ReaderValuesFromTagPath - io.Reader version of ValuesFromTagPath()

func ToJson

func ToJson(rdr io.Reader, recast ...bool) (string, error)

ToJson() - parse an XML io.Reader to a JSON string

func ToJsonIndent

func ToJsonIndent(rdr io.Reader, recast ...bool) (string, error)

ToJsonIndent - the pretty form of ToJson

func ToMap

func ToMap(rdr io.Reader, recast ...bool) (map[string]interface{}, error)

ToMap() - parse an XML io.Reader to a map[string]interface{}

func Unmarshal

func Unmarshal(doc []byte, v interface{}) error

Unmarshal - wraps xml.Unmarshal with handling of map[string]interface{} and string type variables.

Usage: x2j.Unmarshal(doc,&m) where m of type map[string]interface{}
       x2j.Unmarshal(doc,&s) where s of type string (Overrides xml.Unmarshal().)
       x2j.Unmarshal(doc,&struct) - passed to xml.Unmarshal()
       x2j.Unmarshal(doc,&slice) - passed to xml.Unmarshal()

func ValuesForKey

func ValuesForKey(m map[string]interface{}, key string) []interface{}

ValuesForKey - return all values in map associated with 'key'

Returns nil if the 'key' does not occur in the map

func ValuesForTag

func ValuesForTag(doc, tag string) ([]interface{}, error)

ValuesForTag - return all values in doc associated with 'tag'.

Returns nil if the 'tag' does not occur in the doc.
If there is an error encountered while parsing doc, that is returned.
If you want values 'recast' use DocToMap() and ValuesForKey().

func ValuesFromKeyPath

func ValuesFromKeyPath(m map[string]interface{}, path string, getAttrs ...bool) []interface{}

ValuesFromKeyPath - deliver all values for a path node from a map[string]interface{}. If there are no values for the path, 'nil' is returned.

'm' is the map to be walked
'path' is a dot-separated path of key values
'getAttrs' can be set 'true' to return attribute values for "*"-terminated path
       If a node is '*', then everything beyond is walked.
       E.g., see ValuesFromTagPath documentation.

func ValuesFromTagPath

func ValuesFromTagPath(doc, path string, getAttrs ...bool) ([]interface{}, error)

ValuesFromTagPath - deliver all values for a path node from an XML doc. If there are no values for the path, 'nil' is returned. A return value of (nil, nil) means that there were no values and no errors parsing the doc.

'doc' is the XML document
'path' is a dot-separated path of tag nodes
'getAttrs' can be set 'true' to return attribute values for "*"-terminated path
       If a node is '*', then everything beyond is scanned for values.
       E.g., "doc.books" might return a single value 'book' of type []interface{}, but
             "doc.books.*" could return all the 'book' entries as []map[string]interface{}.
             "doc.books.*.author" might return all the 'author' tag values as []string - or
             "doc.books.*.author.lastname" might be required, depending on the schema.

func WriteMap

func WriteMap(m interface{}, offset ...int) string

WriteMap - dumps the map[string]interface{} for examination.

'offset' is initial indentation count; typically: WriteMap(m).
NOTE: with XML all element types are 'string'.
But the code is written generically for use with map[string]interface{} values from json.Unmarshal().
It can also handle a DocToMap(doc,true) result where values have been recast.

func XmlBufferToJson

func XmlBufferToJson(b *bytes.Buffer, recast ...bool) (string, error)

XmlBufferToJson - process XML message from a bytes.Buffer

'b' is the buffer
Optional argument 'recast' coerces values to float64 or bool where possible.

func XmlBufferToMap

func XmlBufferToMap(b *bytes.Buffer, recast ...bool) (map[string]interface{}, error)

XmlBufferToMap - process XML message from a bytes.Buffer

'b' is the buffer
Optional argument 'recast' coerces map values to float64 or bool where possible.

func XmlMsgsFromFile

func XmlMsgsFromFile(fname string, phandler func(map[string]interface{}) bool, ehandler func(error) bool, recast ...bool) error

XmlMsgsFromFile()

'fname' is name of file
'phandler' is the map processing handler. Return of 'false' stops further processing.
'ehandler' is the parsing error handler. Return of 'false' stops further processing and returns error.
Note: phandler() and ehandler() calls are blocking, so reading and processing of messages is serialized.
      This means that you can stop reading the file on error or after processing a particular message.
      To have reading and handling run concurrently, pass arguments to a goroutine in the handler and return true.
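
The concurrency note can be sketched as follows. forEachMsg is a hypothetical stand-in for the package's blocking read loop, used here so the example runs without any input file; the handler starts a goroutine per message and returns true immediately, and a sync.WaitGroup lets the caller wait for all of them.

```go
package main

import (
	"fmt"
	"sync"
)

// forEachMsg is a hypothetical stand-in for the reading loop: it calls phandler
// for each message in order and stops as soon as the handler returns false.
func forEachMsg(msgs []map[string]interface{}, phandler func(map[string]interface{}) bool) {
	for _, m := range msgs {
		if !phandler(m) {
			return
		}
	}
}

// countKeysConcurrently hands each message to its own goroutine and returns true
// at once, so the loop keeps reading while earlier messages are still processed.
func countKeysConcurrently(msgs []map[string]interface{}) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	forEachMsg(msgs, func(m map[string]interface{}) bool {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock() // total is shared between goroutines
			total += len(m)
			mu.Unlock()
		}()
		return true
	})
	wg.Wait() // do not read total until every goroutine has finished
	return total
}

func main() {
	msgs := []map[string]interface{}{
		{"a": "1", "b": "2"},
		{"c": "3"},
	}
	fmt.Println(countKeysConcurrently(msgs))
}
```

Once handling is asynchronous, returning false from the handler can no longer depend on the outcome of processing a particular message, which is the trade-off the note describes.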

func XmlMsgsFromFileAsJson

func XmlMsgsFromFileAsJson(fname string, phandler func(string) bool, ehandler func(error) bool, recast ...bool) error

XmlMsgsFromFileAsJson()

'fname' is name of file
'phandler' is the JSON string processing handler. Return of 'false' stops further processing.
'ehandler' is the parsing error handler. Return of 'false' stops further processing and returns error.
Note: phandler() and ehandler() calls are blocking, so reading and processing of messages is serialized.
      This means that you can stop reading the file on error or after processing a particular message.
      To have reading and handling run concurrently, pass arguments to a goroutine in the handler and return true.

func XmlMsgsFromReader

func XmlMsgsFromReader(rdr io.Reader, phandler func(map[string]interface{}) bool, ehandler func(error) bool, recast ...bool) error

XmlMsgsFromReader() - io.Reader version of XmlMsgsFromFile

'rdr' is an io.Reader for an XML message (stream)
'phandler' is the map processing handler. Return of 'false' stops further processing.
'ehandler' is the parsing error handler. Return of 'false' stops further processing and returns error.
Note: phandler() and ehandler() calls are blocking, so reading and processing of messages is serialized.
      This means that you can stop reading the stream on error or after processing a particular message.
      To have reading and handling run concurrently, pass arguments to a goroutine in the handler and return true.

func XmlMsgsFromReaderAsJson

func XmlMsgsFromReaderAsJson(rdr io.Reader, phandler func(string) bool, ehandler func(error) bool, recast ...bool) error

XmlMsgsFromReaderAsJson() - io.Reader version of XmlMsgsFromFileAsJson

'rdr' is an io.Reader for an XML message (stream)
'phandler' is the JSON string processing handler. Return of 'false' stops further processing.
'ehandler' is the parsing error handler. Return of 'false' stops further processing and returns error.
Note: phandler() and ehandler() calls are blocking, so reading and processing of messages is serialized.
      This means that you can stop reading the stream on error or after processing a particular message.
      To have reading and handling run concurrently, pass arguments to a goroutine in the handler and return true.

Types

type Node

type Node struct {
	// contains filtered or unexported fields
}

func ByteDocToTree

func ByteDocToTree(doc []byte) (*Node, error)

ByteDocToTree - convert an XML doc into a tree of nodes.

func DocToTree

func DocToTree(doc string) (*Node, error)

DocToTree - convert an XML doc into a tree of nodes.

func ToTree

func ToTree(rdr io.Reader) (*Node, error)

ToTree() - parse an XML io.Reader to a tree of Nodes

func XmlBufferToTree

func XmlBufferToTree(b *bytes.Buffer) (*Node, error)

XmlBufferToTree - derived from DocToTree()

func (*Node) WriteTree

func (n *Node) WriteTree(padding ...int) string

(*Node)WriteTree - convert a tree of nodes into a printable string.

'padding' is the starting indentation; typically: n.WriteTree().

type XmlBuffer

type XmlBuffer struct {
	// contains filtered or unexported fields
}

XmlBuffer - create XML decoder buffer for a string from anywhere, not necessarily a file.

func BytesNewXmlBuffer

func BytesNewXmlBuffer(b []byte) *XmlBuffer

BytesNewXmlBuffer() - creates a bytes.Buffer from b with possibly multiple messages

Use Close() function to release the buffer for garbage collection.

func NewXmlBuffer

func NewXmlBuffer(s string) *XmlBuffer

NewXmlBuffer() - creates a bytes.Buffer from a string with multiple messages

Use Close() function to release the buffer for garbage collection.

func (*XmlBuffer) Close

func (buf *XmlBuffer) Close()

Close() - release the buffer address for garbage collection

func (*XmlBuffer) NextMap

func (buf *XmlBuffer) NextMap(recast ...bool) (map[string]interface{}, error)

NextMap() - retrieve next XML message in buffer as a map[string]interface{} value.

The optional argument 'recast' will try and coerce values to float64 or bool as appropriate.
