README
LibGuides Tools
A Golang package for working with LibGuides exported XML.
Table of contents
- Introduction
- Installation
- Known issues and limitations
- Getting help
- License
- Authors and history
- Acknowledgments
Introduction
There is a periodic need to work with exported LibGuides XML at Caltech Library. This is a Golang package for working with the exported data. Go provides a robust way of mapping simple data structures to and from XML (or JSON), which makes working with XML easy and consistent. It seemed time to move beyond my usual Bash/sed/Python scripts.
Two programs are currently provided with springytools: lgxml2json, which converts a LibGuides XML export file into JSON, and lglinkreport, which reports the links found in an export and where they occur.
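The XML-to-JSON mapping described above can be sketched with Go's standard encoding/xml and encoding/json packages. This is a minimal illustration, not the package's implementation: the miniGuide and miniExport types, the xmlToJSON helper, the sample data, and the root element name are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// miniGuide is a hypothetical, trimmed-down record used only for illustration.
type miniGuide struct {
	Id   int    `xml:"id" json:"id"`
	Name string `xml:"name" json:"name"`
	Url  string `xml:"url" json:"url"`
}

// miniExport is a hypothetical wrapper. The "guides>guide" tag tells
// encoding/xml to collect every <guide> element nested inside <guides>.
type miniExport struct {
	XMLName xml.Name     `json:"-"`
	Guides  []*miniGuide `xml:"guides>guide" json:"guides"`
}

// xmlToJSON decodes XML into miniExport and re-encodes the result as JSON.
func xmlToJSON(src []byte) ([]byte, error) {
	export := new(miniExport)
	if err := xml.Unmarshal(src, export); err != nil {
		return nil, err
	}
	return json.Marshal(export)
}

func main() {
	// The root element name here is an assumption for the example.
	src := []byte(`<libguides><guides><guide><id>1</id><name>Demo</name><url>https://example.org/demo</url></guide></guides></libguides>`)
	out, err := xmlToJSON(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The same struct carries both the xml and json tags, so one set of type definitions serves both formats.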
Installation
This is a Golang package providing two commands for working with LibGuides' exported XML. To compile you will need Go 1.16 or later, GNU Make, and Stephen Dolan's jq for browsing the JSON output.
Steps to compile from source
- clone the repository
- change into the clone directory
- test
- build the command line tool lgxml2json
- use lgxml2json and test output with jq
- Replace "LibGuides_export_XXXXX.xml" with the file path to your exported LibGuides XML file
- install lgxml2json
Example commands to execute in the shell (e.g. Terminal on macOS, xterm on Linux)
git clone git@github.com:caltechlibrary/springytools
cd springytools
make
make test
make install
By default installation is to your $HOME/bin directory. This directory should be in your shell's "PATH".
You can get a brief description of the commands using the -h option with the command.
lgxml2json -h
lglinkreport -h
Known issues and limitations
This library is currently written to perform the LibGuides link analysis. It only provides the commands I needed to do the data analysis. It will grow as needed.
The exported XML output from LibGuides may not be valid UTF-8. UTF-8 encoding is required to successfully parse the export file. Looking at the raw XML markup in vim, I noticed a number of control code sequences, and these corresponded to the errors on parsing the unsanitized XML file. The problem characters appear as ^A, ^K, ^L, ^S, ^C, ^R. These may be non-UTF-8 characters embedded as UTF-8 when rich text documents were pasted in via the LibGuides edit UI. My hunch is these were pasted in or imported from Word documents. Removing the offending characters allowed the export to parse successfully. These edits are destructive, as some of the codes probably represent UTF-8 characters used in non-English European names or terminology.
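One way to do the kind of cleanup described above is to drop the control characters that XML 1.0 does not permit before parsing. The sketch below is an assumption about how this could be done, not the procedure actually used on the export; stripControlChars is a hypothetical helper. Like the manual edits, it is destructive.

```go
package main

import (
	"fmt"
	"strings"
)

// stripControlChars drops the C0 control characters that XML 1.0 does not
// allow: everything below U+0020 except tab, line feed and carriage return.
// This covers codes such as ^A (0x01), ^K (0x0B) and ^L (0x0C).
func stripControlChars(src string) string {
	return strings.Map(func(r rune) rune {
		if r < 0x20 && r != '\t' && r != '\n' && r != '\r' {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, src)
}

func main() {
	// Hypothetical fragment containing ^A and ^K.
	dirty := "<name>Caf\x01e\x0b Guide</name>"
	fmt.Printf("%q\n", stripControlChars(dirty))
}
```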
Getting help
File an issue on GitHub.
License
Software produced by the Caltech Library is Copyright © 2021 California Institute of Technology. This software is freely distributed under a BSD/MIT type license. Please see the LICENSE file for more information.
Authors and history
- R. S. Doiel, Software Developer, Digital Library Development, Caltech Library
Acknowledgments
This work was funded by the California Institute of Technology Library.
Documentation
Overview
expected.go is a set of testing functions
Author: R. S. Doiel <rsdoiel@caltech.edu>
Copyright (c) 2021, Caltech All rights not granted herein are expressly reserved by Caltech.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
extractors.go provides funcs for processing text and pulling out elements like URL links.
libguides.go implements the data structures for working with LibGuides exported XML.
reports.go provides the functions that take an input filename and an output filename and generate reports or data conversions.
tables.go provides XML, JSON and CSV rendering of the Table data structure.
Index
- Constants
- func ExtractHTTPLinks(src string) ([]string, int)
- func LibGuidesXMLFileToJSONFile(srcName, destName string) error
- func LinkReport(srcName, destName, format string) error
- type Account
- type Asset
- type Box
- type Customer
- type Group
- type Guide
- type LibGuides
- type Owner
- type Page
- type Pane
- type Site
- type Subject
- type TBody
- type THead
- type Table
- func (t *Table) AppendHeadings(cells ...string)
- func (t *Table) AppendRow(cells ...string)
- func (t *Table) SetCaption(caption string)
- func (t *Table) ToCSVFile(destName string, header bool) error
- func (t *Table) ToJSON() ([]byte, error)
- func (t *Table) ToJSONFile(destName string) error
- func (t *Table) ToXML() ([]byte, error)
- func (t *Table) ToXMLFile(destName string) error
- type Tag
- type Vendor
Constants
const Version = "0.0.3"
Functions
func ExtractHTTPLinks (added in v0.0.2)
ExtractHTTPLinks scans a string for URL or href values and returns a list of the URLs found along with a count. If the count is zero, no URLs were found.
NOTE: This only extracts full URLs (i.e. those starting with http:// or https://).
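To illustrate the behavior described above, here is a minimal sketch of full-URL extraction using a regular expression. It is a stand-in for explanation only and may differ from how ExtractHTTPLinks is implemented; findHTTPLinks, httpLinkRE, and the sample text are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// httpLinkRE is an assumed, deliberately simple pattern: "http://" or
// "https://" followed by characters up to whitespace, a quote or an angle bracket.
var httpLinkRE = regexp.MustCompile(`https?://[^\s"'<>]+`)

// findHTTPLinks is a hypothetical stand-in with the same return shape as
// ExtractHTTPLinks: the links found and how many there are.
func findHTTPLinks(src string) ([]string, int) {
	links := httpLinkRE.FindAllString(src, -1)
	return links, len(links)
}

func main() {
	src := `<a href="https://library.example.edu/guide">Guide</a>, /relative/path and http://example.org/x`
	links, count := findHTTPLinks(src)
	fmt.Println(count, links)
}
```

The relative path in the sample is skipped, which matches the note that only full URLs are extracted.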
func LibGuidesXMLFileToJSONFile (added in v0.0.2)
LibGuidesXMLFileToJSONFile reads in a LibGuides XML export file and writes a JSON version of the file. It expects the name of the XML file in srcName and the name of the JSON file in destName. It returns an error if one is encountered.
func LinkReport (added in v0.0.2)
LinkReport reads in a LibGuides XML export and generates a link report in the requested format. It accepts srcName (the LibGuides XML export), destName (the report file to write), and format (csv, json, or xml). It returns an error if one is encountered.
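A link report needs to walk the nested export structure. The sketch below shows one way such a traversal could look, following the guide, page, box, asset nesting of the types documented on this page. The lowercase types are trimmed, hypothetical copies, and collectAssetLinks is an assumed helper; the fields LinkReport actually inspects may differ.

```go
package main

import "fmt"

// Trimmed, hypothetical copies of the nested types, for illustration only.
type asset struct{ Name, Url string }

type box struct {
	Name   string
	Assets []*asset
}

type page struct {
	Name  string
	Boxes []*box
}

type guide struct {
	Name  string
	Pages []*page
}

// collectAssetLinks walks guide -> page -> box -> asset and returns one row
// per asset with a non-empty Url, recording where the link was found.
// It assumes the slices contain no nil pointers.
func collectAssetLinks(guides []*guide) [][]string {
	rows := [][]string{}
	for _, g := range guides {
		for _, p := range g.Pages {
			for _, b := range p.Boxes {
				for _, a := range b.Assets {
					if a.Url != "" {
						rows = append(rows, []string{g.Name, p.Name, b.Name, a.Url})
					}
				}
			}
		}
	}
	return rows
}

func main() {
	guides := []*guide{{
		Name: "Demo Guide",
		Pages: []*page{{
			Name: "Home",
			Boxes: []*box{{
				Name:   "Resources",
				Assets: []*asset{{Name: "Catalog", Url: "https://example.org/catalog"}, {Name: "Note"}},
			}},
		}},
	}}
	for _, row := range collectAssetLinks(guides) {
		fmt.Println(row)
	}
}
```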
Types
type Account
type Account struct {
    Id        int    `xml:"id" json:"id"`
    Email     string `xml:"email" json:"email"`
    FirstName string `xml:"first_name" json:"first_name"`
    LastName  string `xml:"last_name" json:"last_name"`
    Title     string `xml:"title" json:"title"`
    Nickname  string `xml:"nickname" json:"nickname"`
    Signature string `xml:"signature" json:"signature"`
    Image     string `xml:"image" json:"image"`
    Address   string `xml:"address" json:"address"`
    Phone     string `xml:"phone" json:"phone"`
    Skype     string `xml:"skype" json:"skype"`
    Website   string `xml:"website" json:"website"`
    Created   string `xml:"created" json:"created"`
    Updated   string `xml:"updated" json:"updated"`
}
type Asset
type Asset struct {
    Id   int    `xml:"id" json:"id"`
    Name string `xml:"name" json:"name"`
    Type string `xml:"type" json:"type"`
    // Description contains HTML encoded text, double encoding existing encoded text
    Description string `xml:"description" json:"description"`
    Url         string `xml:"url" json:"url"`
    Owner       Owner  `xml:"owner" json:"owner"`
    MapId       string `xml:"map_id" json:"map_id"`
    Position    int    `xml:"position" json:"position"`
    Created     string `xml:"created" json:"created"`
    Updated     string `xml:"updated" json:"updated"`
}
type Box
type Box struct {
    XMLName  xml.Name `xml:"box" json:"box"`
    Id       int      `xml:"id" json:"id"`
    Name     string   `xml:"name" json:"name"`
    Type     string   `xml:"type" json:"type"`
    MapId    string   `xml:"map_id" json:"map_id"`
    Column   int      `xml:"column" json:"column"`
    Position int      `xml:"position" json:"position"`
    Hidden   int      `xml:"hidden" json:"hidden"`
    Created  string   `xml:"created" json:"created"`
    Updated  string   `xml:"updated" json:"updated"`
    Assets   []*Asset `xml:"assets>asset" json:"assets"`
    Panes    []*Pane  `xml:"panes>pane,omitempty" json:"panes,omitempty"`
}
type Customer
type Customer struct {
    XMLName  xml.Name `xml:"customer" json:"-"`
    Id       int      `xml:"id" json:"id"`
    Type     string   `xml:"type" json:"type"`
    Name     string   `xml:"name" json:"name"`
    Url      string   `xml:"url" json:"url"`
    City     string   `xml:"city" json:"city"`
    State    string   `xml:"state" json:"state"`
    Country  string   `xml:"country" json:"country"`
    TimeZone string   `xml:"time_zone" json:"time_zone"`
    Created  string   `xml:"created" json:"created"`
    Updated  string   `xml:"updated" json:"updated"`
}
type Group
type Group struct {
    Id          int    `xml:"id" json:"id"`
    Type        string `xml:"type" json:"type"`
    Name        string `xml:"name" json:"name"`
    Url         string `xml:"url" json:"url"`
    Description string `xml:"description" json:"description"`
    Password    string `xml:"password" json:"password"`
    Created     string `xml:"created" json:"created"`
    Updated     string `xml:"updated" json:"updated"`
}
type Guide
type Guide struct {
    Id          int        `xml:"id" json:"id"`
    Type        string     `xml:"type" json:"type"`
    Name        string     `xml:"name" json:"name"`
    Description string     `xml:"description" json:"description"`
    Url         string     `xml:"url" json:"url"`
    Owner       Owner      `xml:"owner" json:"owner"`
    Group       Group      `xml:"group" json:"group"`
    Redirect    string     `xml:"redirect" json:"redirect"`
    Status      string     `xml:"status" json:"status"`
    Created     string     `xml:"created" json:"created"`
    Updated     string     `xml:"updated" json:"updated"`
    Modified    string     `xml:"modified" json:"modified"`
    Published   string     `xml:"published" json:"published"`
    Subjects    []*Subject `xml:"subjects>subject" json:"subjects"`
    Tags        []*Tag     `xml:"tags>tag" json:"tags"`
    Pages       []*Page    `xml:"pages>page" json:"pages"`
}
type LibGuides
type LibGuides struct {
    XMLName  xml.Name   `json:"-"`
    Customer *Customer  `xml:"customer" json:"customer"`
    Site     *Site      `xml:"site" json:"site"`
    Accounts []*Account `xml:"accounts>account" json:"accounts"`
    Groups   []*Group   `xml:"groups>group" json:"groups"`
    Subjects []*Subject `xml:"subjects>subject" json:"subjects"`
    Tags     []*Tag     `xml:"tags>tag" json:"tags"`
    Vendors  []*Vendor  `xml:"vendors>vendor" json:"vendors"`
    Guides   []*Guide   `xml:"guides>guide" json:"guides"`
}
type Page
type Page struct {
    Id           int    `xml:"id" json:"id"`
    Name         string `xml:"name" json:"name"`
    Description  string `xml:"description" json:"description"`
    Url          string `xml:"url" json:"url"`
    Redirect     string `xml:"redirect" json:"redirect"`
    SourcePageId int    `xml:"source_page_id" json:"source_page_id"`
    ParentPageId int    `xml:"parent_page_id" json:"parent_page_id"`
    Position     int    `xml:"position" json:"position"`
    Hidden       int    `xml:"hidden" json:"hidden"`
    Created      string `xml:"created" json:"created"`
    Updated      string `xml:"updated" json:"updated"`
    Modified     string `xml:"modified" json:"modified"`
    Boxes        []*Box `xml:"boxes>box" json:"boxes"`
}
type Site
type Site struct {
    XMLName xml.Name `xml:"site" json:"-"`
    Id      int      `xml:"id" json:"jd"`
    Type    string   `xml:"type" json:"type"`
    Name    string   `xml:"name" json:"name"`
    Domain  string   `xml:"domain" json:"domain"`
    Admin   string   `xml:"admin" json:"admin"`
    Created string   `xml:"created" json:"created"`
    Updated string   `xml:"updated" json:"updated"`
}
type Table (added in v0.0.2)
type Table struct {
    XMLName xml.Name `xml:"table" json:"-"`
    Caption string   `xml:"caption" json:"caption,omitempty"`
    Head    THead    `xml:"thead" json:"head,omitempty"`
    Body    TBody    `xml:"tbody" json:"body,omitempty"`
}
func (*Table) AppendHeadings (added in v0.0.2)
func (*Table) SetCaption (added in v0.0.2)
func (*Table) ToCSVFile (added in v0.0.2)
ToCSVFile creates a CSV version of Table. It is a destructive write: a file with the same name will be replaced. It accepts the filename and a header boolean. If header is true and the table's header is populated, it renders a header row at the start of the CSV output. It returns an error if one is encountered.
func (*Table) ToJSONFile (added in v0.0.2)
ToJSONFile creates a JSON version of Table. It is a destructive write: a file with the same name will be replaced. It accepts the filename and returns an error if one is encountered.
Directories
Path | Synopsis |
---|---|
cmd/lglinkreport | linkreport.go traverses all the fields that have links and reports where they are found. |
cmd/lgxml2json | lgxml2json.go converts a LibGuides XML export into JSON. |