zek

v0.1.3
Warning

This package is not in the latest version of its module.

Published: Jul 14, 2018 License: GPL-3.0 Imports: 11 Imported by: 0

README

zek

Zek is a prototype for creating a Go struct from an XML document.


Upsides:

  • it works fine for non-recursive structures,
  • does not need XSD or DTD,
  • it is relatively convenient to access attributes, children and text,
  • generates a single struct, which makes for a fairly compact representation,
  • simple user interface,
  • comments with examples,
  • schema inference across multiple files.

Downsides:

  • experimental, early, buggy, unstable prototype,
  • no support for recursive types (similar to the Russian Doll strategy [1]),
  • no type inference, everything is accessible as string.

Bugs:

Mapping between XML elements and data structures is inherently flawed: an XML element is an order-dependent collection of anonymous values, while a data structure is an order-independent collection of named values.

https://golang.org/pkg/encoding/xml/#pkg-note-BUG
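The ordering problem is easy to reproduce with encoding/xml alone. In this sketch the document and the type R are made up for illustration. Interleaved a and b elements are decoded into two separate slices, so re-encoding the value groups the a elements together and the original order is lost.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// R is a hypothetical struct for a document with interleaved <a> and <b>
// children. Each repeated element goes into its own slice, so the relative
// order between <a> and <b> elements is not recorded anywhere.
type R struct {
	XMLName xml.Name `xml:"r"`
	A       []string `xml:"a"`
	B       []string `xml:"b"`
}

// roundTrip decodes doc into R and encodes the result again.
func roundTrip(doc string) (R, string, error) {
	var r R
	if err := xml.Unmarshal([]byte(doc), &r); err != nil {
		return r, "", err
	}
	out, err := xml.Marshal(r)
	if err != nil {
		return r, "", err
	}
	return r, string(out), nil
}

func main() {
	r, out, err := roundTrip(`<r><a>1</a><b>2</b><a>3</a></r>`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(r.A, r.B) // [1 3] [2]
	fmt.Println(out)      // <r><a>1</a><a>3</a><b>2</b></r>
}
```

A struct generated from a sample document is therefore a lossy view whenever element order carries meaning.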

Related projects:

Install

$ go get github.com/miku/zek/cmd/...

Debian and RPM packages:

Usage

$ zek -h
Usage of zek:
  -F    skip formatting
  -d    debug output
  -e    add comments with example
  -max-examples int
        limit number of examples (default 10)
  -n string
        use a different name for the top-level struct
  -p    write out an example program
  -s    strict parsing and writing
  -t string
        emit struct for tag matching this name
  -version
        show version
  -x int
        max chars for example (default 25)

Examples:

$ cat fixtures/a.xml
<a></a>

$ zek < fixtures/a.xml
type A struct {
    XMLName xml.Name `xml:"a"`
    Text    string   `xml:",chardata"`
}

Debug output dumps the internal tree as JSON to stdout.

$ zek -d < fixtures/a.xml
{"name":{"Space":"","Local":"a"}}

Example program:

package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"log"
	"os"
)

// A was generated 2017-12-05 17:35:21 by tir on apollo.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
}

func main() {
	dec := xml.NewDecoder(os.Stdin)
	var doc A
	if err := dec.Decode(&doc); err != nil {
		log.Fatal(err)
	}
	b, err := json.Marshal(doc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(b))
}

$ zek -p < fixtures/a.xml > sample.go && go run sample.go < fixtures/a.xml | jq . && rm sample.go
{
  "XMLName": {
    "Space": "",
    "Local": "a"
  },
  "Text": ""
}

More complex example:

$ zek < fixtures/d.xml
type Root struct {
	XMLName xml.Name `xml:"root"`
	Text    string   `xml:",chardata"`
	A       []struct {
		Text string `xml:",chardata"`
		B    []struct {
			Text string `xml:",chardata"`
			C    struct {
				Text string `xml:",chardata"`
			} `xml:"c"`
			D struct {
				Text string `xml:",chardata"`
			} `xml:"d"`
		} `xml:"b"`
	} `xml:"a"`
}

$ zek -p < fixtures/d.xml > sample.go && go run sample.go < fixtures/d.xml | jq . && rm sample.go
{
  "XMLName": {
    "Space": "",
    "Local": "root"
  },
  "Text": "\n\n\n\n",
  "A": [
    {
      "Text": "\n  \n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": {
            "Text": "Hi"
          },
          "D": {
            "Text": ""
          }
        },
        {
          "Text": "\n    \n    \n  ",
          "C": {
            "Text": "World"
          },
          "D": {
            "Text": ""
          }
        }
      ]
    },
    {
      "Text": "\n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": {
            "Text": "Hello"
          },
          "D": {
            "Text": ""
          }
        }
      ]
    },
    {
      "Text": "\n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": {
            "Text": ""
          },
          "D": {
            "Text": "World"
          }
        }
      ]
    }
  ]
}

Annotate with comments:

$ zek -e < fixtures/l.xml
type Records struct {
	XMLName xml.Name `xml:"Records"`
	Text    string   `xml:",chardata"` // \n
	Xsi     string   `xml:"xsi,attr"`
	Record  []struct {
		Text   string `xml:",chardata"`
		Header struct {
			Text       string `xml:",chardata"`
			Status     string `xml:"status,attr"`
			Identifier struct {
				Text string `xml:",chardata"` // oai:ojs.localhost:article...
			} `xml:"identifier"`
			Datestamp struct {
				Text string `xml:",chardata"` // 2009-06-24T14:48:23Z, 200...
			} `xml:"datestamp"`
			SetSpec struct {
				Text string `xml:",chardata"` // eppp:ART, eppp:ART, eppp:...
			} `xml:"setSpec"`
		} `xml:"header"`
		Metadata struct {
			Text    string `xml:",chardata"`
			Rfc1807 struct {
				Text           string `xml:",chardata"`
				Xmlns          string `xml:"xmlns,attr"`
				Xsi            string `xml:"xsi,attr"`
				SchemaLocation string `xml:"schemaLocation,attr"`
				BibVersion     struct {
					Text string `xml:",chardata"` // v2, v2, v2, v2, v2, v2, v...
				} `xml:"bib-version"`
				ID struct {
					Text string `xml:",chardata"` // http://journals.zpid.de/i...
				} `xml:"id"`
				Entry struct {
					Text string `xml:",chardata"` // 2009-06-24T14:48:23Z, 200...
				} `xml:"entry"`
				Organization []struct {
					Text string `xml:",chardata"` // Proceedings of the Worksh...
				} `xml:"organization"`
				Title struct {
					Text string `xml:",chardata"` // Introduction and some Ide...
				} `xml:"title"`
				Type struct {
					Text string `xml:",chardata"`
				} `xml:"type"`
				Author []struct {
					Text string `xml:",chardata"` // KRAMPEN, Günter, CARBON,...
				} `xml:"author"`
				Copyright struct {
					Text string `xml:",chardata"` // Das Urheberrecht liegt be...
				} `xml:"copyright"`
				OtherAccess struct {
					Text string `xml:",chardata"` // url:http://journals.zpid....
				} `xml:"other_access"`
				Keyword struct {
					Text string `xml:",chardata"`
				} `xml:"keyword"`
				Period []struct {
					Text string `xml:",chardata"`
				} `xml:"period"`
				Monitoring struct {
					Text string `xml:",chardata"`
				} `xml:"monitoring"`
				Language struct {
					Text string `xml:",chardata"` // en, en, en, en, en, en, e...
				} `xml:"language"`
				Abstract struct {
					Text string `xml:",chardata"` // After a short description...
				} `xml:"abstract"`
				Date struct {
					Text string `xml:",chardata"` // 2009-06-22 12:12:00, 2009...
				} `xml:"date"`
			} `xml:"rfc1807"`
		} `xml:"metadata"`
		About struct {
			Text string `xml:",chardata"`
		} `xml:"about"`
	} `xml:"Record"`
}
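The truncated comments appear to follow a simple rule. The collected examples are joined with ", " and cut after -x characters (25 by default), and "..." is appended. The helper below is a hypothetical sketch of that rule. The name exampleComment is made up, and zek may differ in detail, for example in how it counts multi-byte characters.

```go
package main

import (
	"fmt"
	"strings"
)

// exampleComment is a hypothetical sketch, not zek's implementation. It
// keeps at most maxExamples values, joins them with ", " and truncates the
// result to maxChars runes, marking the cut with "...".
func exampleComment(examples []string, maxExamples, maxChars int) string {
	if len(examples) > maxExamples {
		examples = examples[:maxExamples]
	}
	s := strings.Join(examples, ", ")
	if r := []rune(s); len(r) > maxChars {
		return string(r[:maxChars]) + "..."
	}
	return s
}

func main() {
	versions := []string{"v2", "v2", "v2", "v2", "v2", "v2", "v2", "v2"}
	fmt.Println(exampleComment(versions, 10, 25))       // v2, v2, v2, v2, v2, v2, v...
	fmt.Println(exampleComment([]string{"en"}, 10, 25)) // en
}
```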

Only consider a nested element:

$ zek -t thesis < fixtures/z.xml
type Thesis struct {
	XMLName        xml.Name `xml:"thesis"`
	Text           string   `xml:",chardata"`
	Xmlns          string   `xml:"xmlns,attr"`
	Doc            string   `xml:"doc,attr"`
	Xsi            string   `xml:"xsi,attr"`
	SchemaLocation string   `xml:"schemaLocation,attr"`
	Title          []struct {
		Text string `xml:",chardata"`
	} `xml:"title"`
	Creator []struct {
		Text string `xml:",chardata"`
	} `xml:"creator"`
	Date []struct {
		Text string `xml:",chardata"`
	} `xml:"date"`
	Identifier []struct {
		Text string `xml:",chardata"`
	} `xml:"identifier"`
	Language []struct {
		Text string `xml:",chardata"`
	} `xml:"language"`
	Rights []struct {
		Text string `xml:",chardata"`
	} `xml:"rights"`
	Coverage []struct {
		Text string `xml:",chardata"`
	} `xml:"coverage"`
	Publisher []struct {
		Text string `xml:",chardata"`
	} `xml:"publisher"`
	Contributor []struct {
		Text string `xml:",chardata"`
	} `xml:"contributor"`
	Subject []struct {
		Text string `xml:",chardata"`
	} `xml:"subject"`
	Description []struct {
		Text string `xml:",chardata"`
	} `xml:"description"`
	Source struct {
		Text string `xml:",chardata"`
	} `xml:"source"`
	Type struct {
		Text string `xml:",chardata"`
	} `xml:"type"`
	Relation []struct {
		Text string `xml:",chardata"`
	} `xml:"relation"`
}

Inference across files:

$ zek fixtures/a.xml fixtures/b.xml fixtures/c.xml
// A was generated 2017-12-05 17:40:14 by tir on apollo.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
	B       []struct {
		Text string `xml:",chardata"`
	} `xml:"b"`
}

Misc

As a side effect, zek seems to be useful for debugging. Example:

This record is emitted by a typical OAI server (OJS, a widely used one), yet one can quickly spot the flaw in the structure.

Over 30 different structs were generated manually in the course of a few hours (around five minutes per source): https://git.io/vbTDo.

Current extent leader: a 1532-line struct.

Documentation


Constants

This section is empty.

Variables

var (
	// UppercaseByDefault is used during XML tag name to Go name conversion.
	UppercaseByDefault = []string{"id", "Id", "isbn", "ismn", "json",
		"eissn", "issn", "http", "lccn", "rfc", "rsn", "uri", "url",
		"urn", "xml", "Xml", "zdb"}
	// DefaultTextFieldNames lists struct field names for chardata, most preferred first.
	DefaultTextFieldNames = []string{"Text", "Chardata"}
	// DefaultAttributePrefixes are used if there are name clashes.
	DefaultAttributePrefixes = []string{"Attr", "Attribute"}
)
var Version = "0.1.3"

Functions

func CreateNameFunc added in v0.1.2

func CreateNameFunc(upper []string) func(string) string

CreateNameFunc returns a function that converts a tag into a canonical Go name. Strings in the given list are upper-cased in full.
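The generated structs above suggest how the conversion works. bib-version becomes BibVersion, other_access becomes OtherAccess and id becomes ID, while rfc1807 stays Rfc1807 even though rfc is in UppercaseByDefault. This is consistent with splitting a tag into alphanumeric parts, upper-casing parts that exactly match a list entry and capitalizing the first letter of the rest. The sketch below implements that behaviour with the standard library only. createNameFunc is a hypothetical stand-in, not the package's code, and it does not guard against tags that start with a digit.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// createNameFunc is a hypothetical sketch of the behaviour described for
// CreateNameFunc. Parts of a tag that exactly match an entry in upper are
// upper-cased in full; other parts only get their first letter capitalized.
func createNameFunc(upper []string) func(string) string {
	set := make(map[string]bool, len(upper))
	for _, u := range upper {
		set[u] = true
	}
	return func(tag string) string {
		// Split on characters that cannot appear in a Go identifier.
		parts := strings.FieldsFunc(tag, func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsDigit(r)
		})
		var sb strings.Builder
		for _, p := range parts {
			if set[p] {
				sb.WriteString(strings.ToUpper(p))
				continue
			}
			runes := []rune(p) // FieldsFunc never yields empty parts
			runes[0] = unicode.ToUpper(runes[0])
			sb.WriteString(string(runes))
		}
		return sb.String()
	}
}

func main() {
	nameFunc := createNameFunc([]string{"id", "url", "xml"})
	for _, tag := range []string{"bib-version", "other_access", "id", "setSpec", "rfc1807"} {
		fmt.Println(tag, "->", nameFunc(tag))
	}
	// Prints BibVersion, OtherAccess, ID, SetSpec and Rfc1807 respectively.
}
```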

Types

type Node

type Node struct {
	Name        xml.Name   `json:"name,omitempty"`
	Attr        []xml.Attr `json:"attr,omitempty"`
	Examples    []string   `json:"examples,omitempty"`
	Children    []*Node    `json:"children,omitempty"`
	Freqs       []int      `json:"-"` // Collect number of occurrences of this node within parent.
	MaxExamples int        `json:"-"` // Maximum number of examples to keep, gets passed to children.
	// contains filtered or unexported fields
}

Node represents an element in the XML tree. It keeps track of its name, attributes, child nodes, example chardata and basic statistics, e.g. how often a node has been seen within its parent node.

func (*Node) ByName added in v0.1.2

func (node *Node) ByName(name string) *Node

ByName finds a node in the tree by name. Comparisons start at the current node. First match is returned. If nothing matches, nil is returned.
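One way to satisfy this contract is a depth-first, pre-order search. The documentation does not say whether the real method searches depth-first or breadth-first, so the order here is an assumption, and the node type is a stripped-down stand-in:

```go
package main

import "fmt"

// node is a hypothetical, stripped-down tree node used for illustration.
type node struct {
	name     string
	children []*node
}

// byName compares the current node first, then searches each child subtree
// in order (depth-first, pre-order). It returns the first match, or nil if
// nothing matches.
func (n *node) byName(name string) *node {
	if n == nil {
		return nil
	}
	if n.name == name {
		return n
	}
	for _, c := range n.children {
		if found := c.byName(name); found != nil {
			return found
		}
	}
	return nil
}

func main() {
	tree := &node{name: "records", children: []*node{
		{name: "record", children: []*node{{name: "thesis"}}},
	}}
	fmt.Println(tree.byName("thesis").name)    // thesis
	fmt.Println(tree.byName("missing") == nil) // true
}
```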

func (*Node) CreateOrGetChild

func (node *Node) CreateOrGetChild(name xml.Name, attr []xml.Attr) *Node

CreateOrGetChild creates a child if no child with the same tag name exists, otherwise returns the existing node with that name. We want to collect node and attribute information for a node and not replicate the XML tree.

func (*Node) End

func (node *Node) End()

End signals end of an element.

func (*Node) Height

func (node *Node) Height() int

Height returns the height of the tree. A tree with zero nodes has height zero, a single node tree has height 1.
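The documented convention (zero for an empty tree, one for a single node) corresponds to the usual recursive definition. Here is a sketch on a hypothetical node type, where a nil pointer stands for the empty tree:

```go
package main

import "fmt"

// node is a hypothetical, stripped-down tree node.
type node struct {
	children []*node
}

// height returns 0 for a nil (empty) tree, 1 for a single node, and one
// more than the tallest child subtree otherwise.
func (n *node) height() int {
	if n == nil {
		return 0
	}
	deepest := 0
	for _, c := range n.children {
		if h := c.height(); h > deepest {
			deepest = h
		}
	}
	return deepest + 1
}

func main() {
	var empty *node
	leaf := &node{}
	tree := &node{children: []*node{leaf, {children: []*node{{}}}}}
	fmt.Println(empty.height(), leaf.height(), tree.height()) // 0 1 3
}
```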

func (*Node) IsMultivalued

func (node *Node) IsMultivalued() bool

IsMultivalued returns true if this node appeared more than once.
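The documentation does not spell out how Freqs is filled. One plausible reading, assumed in this sketch, is one entry per parent element holding the number of times the child occurred there. A node is then multivalued if any entry exceeds one. That would explain why the d.xml example uses []struct for a and b but a plain struct for c and d. The childStats type is hypothetical:

```go
package main

import "fmt"

// childStats is a hypothetical stand-in for the Freqs bookkeeping: one
// entry per parent element, holding how often the child occurred there.
type childStats struct {
	freqs []int
}

// isMultivalued reports whether the child ever occurred more than once
// within a single parent. A generator can use this to choose between a
// plain struct field and a slice field.
func (s childStats) isMultivalued() bool {
	for _, f := range s.freqs {
		if f > 1 {
			return true
		}
	}
	return false
}

func main() {
	// <b> appears twice under the first <a> and once under the second.
	b := childStats{freqs: []int{2, 1}}
	// <c> appears exactly once under every <b>.
	c := childStats{freqs: []int{1, 1, 1}}
	fmt.Println(b.isMultivalued(), c.isMultivalued()) // true false
}
```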

func (*Node) ReadFrom

func (node *Node) ReadFrom(r io.Reader) (int64, error)

ReadFrom reads XML from a reader.

func (*Node) ReadFromAll added in v0.1.2

func (node *Node) ReadFromAll(readers []io.Reader) (n int64, err error)

ReadFromAll builds a single node from all readers.

type Stack

type Stack struct {
	sync.Mutex
	// contains filtered or unexported fields
}

Stack is a simple stack for arbitrary types.

func (*Stack) Len

func (s *Stack) Len() int

Len returns the number of items on the stack.

func (*Stack) Peek

func (s *Stack) Peek() interface{}

Peek returns the top element without removing it. It panics if the stack is empty.

func (*Stack) Pop

func (s *Stack) Pop() interface{}

Pop removes and returns the top item. It panics if the stack is empty.

func (*Stack) Put

func (s *Stack) Put(item interface{})

Put pushes an item onto the stack.
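Here is a minimal sketch of such a stack, matching the documented method set and the embedded sync.Mutex. The slice of interface{} values is an assumption, since the real fields are unexported:

```go
package main

import (
	"fmt"
	"sync"
)

// Stack holds arbitrary values in a slice guarded by the embedded mutex.
type Stack struct {
	sync.Mutex
	items []interface{}
}

// Put pushes an item onto the stack.
func (s *Stack) Put(item interface{}) {
	s.Lock()
	defer s.Unlock()
	s.items = append(s.items, item)
}

// Pop removes and returns the top item; it panics if the stack is empty.
func (s *Stack) Pop() interface{} {
	s.Lock()
	defer s.Unlock()
	if len(s.items) == 0 {
		panic("pop from empty stack")
	}
	top := s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return top
}

// Peek returns the top item without removing it; it panics if empty.
func (s *Stack) Peek() interface{} {
	s.Lock()
	defer s.Unlock()
	if len(s.items) == 0 {
		panic("peek at empty stack")
	}
	return s.items[len(s.items)-1]
}

// Len returns the number of items on the stack.
func (s *Stack) Len() int {
	s.Lock()
	defer s.Unlock()
	return len(s.items)
}

func main() {
	var s Stack
	s.Put("a")
	s.Put(2)
	fmt.Println(s.Len())  // 2
	fmt.Println(s.Peek()) // 2
	fmt.Println(s.Pop())  // 2
	fmt.Println(s.Pop())  // a
	fmt.Println(s.Len())  // 0
}
```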

type StructWriter

type StructWriter struct {
	NameFunc          func(string) string // Turns xml tag names into Go names.
	TextFieldNames    []string            // Field name for chardata.
	AttributePrefixes []string            // In case of a name clash, try these prefixes.
	WithComments      bool                // Annotate struct with comments and examples.
	Banner            string              // Autogenerated note.
	ExampleMaxChars   int                 // Max length of example comment.
	Strict            bool                // Whether to ignore implementation holes.
	WithJSONTags      bool                // Include JSON struct tags.
	// contains filtered or unexported fields
}

StructWriter can turn a node into a struct and can be configured.

func NewStructWriter

func NewStructWriter(w io.Writer) *StructWriter

NewStructWriter returns a StructWriter that writes to the given writer and uses the default list of abbreviations to upper-case in full.

func (*StructWriter) WriteNode

func (sw *StructWriter) WriteNode(node *Node) (err error)

WriteNode writes a node to a writer.
