datatools

v0.0.16
Published: Nov 22, 2017 License: BSD-3-Clause Imports: 10 Imported by: 0

README

datatools

For data

Command line utilities for simplifying work with CSV, JSON, Excel Workbooks and plain text files or content and general purpose shell scripting.

  • csv2json - a tool to take a CSV file and convert it into a JSON blob array or a list of JSON blobs one per line
  • csv2mdtable - a tool to render CSV as a GitHub Flavored Markdown table
  • csv2xlsx - a tool to take a CSV file and add it as a sheet to an Excel Workbook file.
  • csvcols - a tool for formatting command line arguments into a CSV row of columns or filtering CSV rows for specific columns
  • csvfind - a tool for filtering a CSV file by column's value
  • csvjoin - a tool to join two CSV files on common values in designated columns, writes combined CSV rows to stdout
  • csvrows - a tool for formatting command line arguments into CSV columns of rows or filtering CSV columns for specific rows
  • jsoncols - a tool for exploring and extracting JSON values into columns
  • jsonjoin - a tool for joining JSON object documents
  • jsonmunge - a tool to transform JSON documents into something else
  • jsonrange - a tool for iterating over JSON maps and arrays
  • splitstring - splits a string using a delimiting string and returns a JSON array
  • vcard2json - an experimental tool to convert vCards to JSON
  • xlsx2csv - a tool for converting Excel Workbook sheets to CSV file(s)
  • xlsx2json - a tool for converting Excel Workbooks to JSON files

Compiled versions are provided for Linux (amd64), Mac OS X (amd64), Windows 10 (amd64) and Raspbian (ARM7). See https://github.com/caltechlibrary/datatools/releases.

Use each utility's "-help" option for a full list of options.

For scripting

Various utilities for simplifying work on the command line.

  • findfile - find files based on prefix, suffix or contained string
  • finddir - find directories based on prefix, suffix or contained string
  • mergepath - prefix, append, clip path variables
  • range - emit a range of integers (useful for numbered loops in Bash)
  • reldate - display a relative date in YYYY-MM-DD format
  • timefmt - format a time value based on Golang's time format language
  • urlparse - split a URL into parts

Compiled versions are provided for Linux (amd64), Mac OS X (amd64), Windows 10 (amd64) and Raspbian (ARM7). See https://github.com/caltechlibrary/datatools/releases.

Use each utility's "-help" option for a full list of options.

Installation

datatools is go get-able.

    go get github.com/caltechlibrary/datatools/...

Or see INSTALL.md for details on installing compiled versions of the programs.

Documentation

Overview

Package datatools is a collection of Go-based command line tools for working with JSON content.

@Author R. S. Doiel, <rsdoiel@caltech.edu>

Copyright (c) 2017, Caltech
All rights not granted herein are expressly reserved by Caltech.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Constants

const (
	Version = `v0.0.16`

	LicenseText = `` /* 1530-byte string literal not displayed */

	// Constants for datatools functions
	AsDelimited = iota
	AsCSV       = iota
	AsJSON      = iota
)
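
Because iota counts constant specifications from the top of the enclosing const block, the three format constants do not start at zero when they follow Version and LicenseText. A minimal sketch, assuming the block is laid out exactly as displayed above and using a placeholder in place of the real license string, shows the values that result:

```go
package main

import "fmt"

// This const block mirrors the layout shown above. The license text is a
// placeholder, not the package's real 1530-byte string.
const (
	Version = `v0.0.16`

	LicenseText = `placeholder`

	// iota is the index of the constant specification within the block,
	// so it has already reached 2 when AsDelimited is declared.
	AsDelimited = iota
	AsCSV       = iota
	AsJSON      = iota
)

func main() {
	fmt.Println(AsDelimited, AsCSV, AsJSON) // prints "2 3 4"
}
```

Under that assumption the zero value of an int such as Options.Format matches none of the three constants, so callers are better off setting the field with the named constants than with literal integers.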

Variables

This section is empty.

Functions

func ApplyStopWords added in v0.0.7

func ApplyStopWords(fields []string, stopWords []string) []string

ApplyStopWords takes a list of words (array of strings), removes any occurrences of the stop words, and returns the revised list of words.
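
The behavior described above can be sketched as a set lookup. The function below, applyStopWords, is a hypothetical stand-in written for illustration and is not the package's implementation; exact, case-sensitive matching is an assumption.

```go
package main

import "fmt"

// applyStopWords is a hypothetical stand-in for datatools.ApplyStopWords.
// It returns the words from fields that do not appear in stopWords.
// Matching is exact and case-sensitive, which is an assumption.
func applyStopWords(fields []string, stopWords []string) []string {
	// Build a set so each membership test is a single map lookup.
	stop := make(map[string]struct{}, len(stopWords))
	for _, w := range stopWords {
		stop[w] = struct{}{}
	}
	kept := make([]string, 0, len(fields))
	for _, f := range fields {
		if _, found := stop[f]; !found {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	words := []string{"the", "quick", "brown", "fox"}
	fmt.Println(applyStopWords(words, []string{"the", "a"})) // prints "[quick brown fox]"
}
```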

func CSVMarshal added in v0.0.7

func CSVMarshal(fields []string) ([]byte, error)

CSVMarshal takes a list of strings and returns a byte array of CSV formatted output.
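
One plausible way to produce such output is the standard library's encoding/csv writer. The sketch below uses a hypothetical csvMarshal helper with the same signature; whether the real function emits a trailing newline, or uses encoding/csv at all, is not stated on this page.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// csvMarshal is a hypothetical stand-in for datatools.CSVMarshal.
// It encodes one record and returns the bytes, including the newline
// that encoding/csv appends after each record.
func csvMarshal(fields []string) ([]byte, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(fields); err != nil {
		return nil, err
	}
	// Flush pushes buffered data into buf; Error reports any write failure.
	w.Flush()
	if err := w.Error(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	row, err := csvMarshal([]string{"id", "last, first", `say "hi"`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(string(row)) // prints: id,"last, first","say ""hi"""
}
```

Fields containing the delimiter or a double quote are quoted by the writer, which is the main reason to prefer it over joining strings with commas.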

func Filter added in v0.0.7

func Filter(c rune, allowableCharacters string, allowPunctuation bool) bool

Filter reports whether a character passes the filter. By default it allows letters and numbers through, with options to allow punctuation and other specific characters. It returns true if the character matches the filter, false otherwise.
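
A predicate of this shape can be sketched with the unicode package. The keepRune function below is a hypothetical illustration, not the package's code; in particular, treating "letters and numbers" as unicode.IsLetter and unicode.IsNumber is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// keepRune is a hypothetical stand-in for datatools.Filter.
// It reports whether c should be kept.
func keepRune(c rune, allowableCharacters string, allowPunctuation bool) bool {
	// Letters and numbers always pass (assumed Unicode-aware).
	if unicode.IsLetter(c) || unicode.IsNumber(c) {
		return true
	}
	if allowPunctuation && unicode.IsPunct(c) {
		return true
	}
	// Anything else passes only if it is explicitly allowed.
	return strings.ContainsRune(allowableCharacters, c)
}

func main() {
	var b strings.Builder
	for _, c := range "Hello, World! #42" {
		if keepRune(c, " ", false) {
			b.WriteRune(c)
		}
	}
	fmt.Println(b.String()) // prints "Hello World 42"
}
```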

func Levenshtein added in v0.0.7

func Levenshtein(src string, target string, insertCost int, deleteCost int, substituteCost int, caseSensitive bool) int

Levenshtein does a fuzzy match on two strings.
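
The signature suggests a weighted edit distance, where each insertion, deletion and substitution carries its own cost. A minimal sketch of that technique follows; weightedDistance is a hypothetical name, and lowercasing both strings when caseSensitive is false is an assumption about how the flag behaves.

```go
package main

import (
	"fmt"
	"strings"
)

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// weightedDistance is a hypothetical sketch of a weighted Levenshtein
// distance: the cheapest sequence of edits that turns src into target.
func weightedDistance(src, target string, insertCost, deleteCost, substituteCost int, caseSensitive bool) int {
	if !caseSensitive {
		src, target = strings.ToLower(src), strings.ToLower(target)
	}
	s, t := []rune(src), []rune(target)
	// prev[j] is the cost of turning the processed prefix of s into t[:j].
	prev := make([]int, len(t)+1)
	for j := range prev {
		prev[j] = j * insertCost
	}
	for i := 1; i <= len(s); i++ {
		curr := make([]int, len(t)+1)
		curr[0] = i * deleteCost
		for j := 1; j <= len(t); j++ {
			sub := prev[j-1]
			if s[i-1] != t[j-1] {
				sub += substituteCost
			}
			curr[j] = min3(sub, prev[j]+deleteCost, curr[j-1]+insertCost)
		}
		prev = curr
	}
	return prev[len(t)]
}

func main() {
	fmt.Println(weightedDistance("kitten", "sitting", 1, 1, 1, true)) // prints "3"
	fmt.Println(weightedDistance("Flaw", "lawn", 1, 1, 1, false))     // prints "2"
}
```

A smaller result means a closer match, so a fuzzy comparison typically accepts pairs whose distance falls below some threshold chosen by the caller.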

func NormalizeDelimiter added in v0.0.7

func NormalizeDelimiter(s string) string

NormalizeDelimiter handles the messy translation from a format string received as an option in the cli to something useful to pass to Join.

func NormalizeDelimiterRune added in v0.0.11

func NormalizeDelimiterRune(s string) rune

NormalizeDelimiterRune takes a delimiter string and returns a single rune.
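
This page does not say which spellings the two delimiter functions accept. The sketch below assumes the common case where a shell passes a backslash escape such as \t as two literal characters that need to become a real tab; both function names and the set of escapes handled are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// normalizeDelimiter is a hypothetical stand-in for NormalizeDelimiter.
// It turns the two-character sequences \t, \n and \r, as typed on a
// command line, into the control characters they name.
func normalizeDelimiter(s string) string {
	return strings.NewReplacer(`\t`, "\t", `\n`, "\n", `\r`, "\r").Replace(s)
}

// normalizeDelimiterRune is a hypothetical stand-in for
// NormalizeDelimiterRune. It returns the first rune of the normalized
// delimiter, or utf8.RuneError if the string is empty.
func normalizeDelimiterRune(s string) rune {
	r, _ := utf8.DecodeRuneInString(normalizeDelimiter(s))
	return r
}

func main() {
	fmt.Printf("%q\n", normalizeDelimiter(`\t`))    // prints "\t"
	fmt.Printf("%q\n", normalizeDelimiterRune(`|`)) // prints '|'
}
```

A rune is the form that encoding/csv expects for its Comma field, which is a likely reason for having both a string and a rune variant.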

func ParseRange added in v0.0.10

func ParseRange(s string, max int) ([]int, error)

ParseRange takes a range notation string and converts it into a list of integers.
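
The range notation itself is not documented on this page. As an illustration only, the sketch below assumes comma-separated items where each item is either a single integer or a start-end span, that an open-ended span such as "8-" runs to max, and that values above max are rejected. The name parseRange and all of those rules are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRange is a hypothetical sketch of a range-notation parser.
// Assumed notation: "1,3-5" means 1, 3, 4, 5 and "8-" means 8 through max.
func parseRange(s string, max int) ([]int, error) {
	var result []int
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		first, last, isSpan := strings.Cut(part, "-")
		start, err := strconv.Atoi(first)
		if err != nil {
			return nil, fmt.Errorf("bad range item %q: %w", part, err)
		}
		end := start
		if isSpan {
			if last == "" {
				end = max // open-ended span runs to max
			} else if end, err = strconv.Atoi(last); err != nil {
				return nil, fmt.Errorf("bad range item %q: %w", part, err)
			}
		}
		if start > end || end > max {
			return nil, fmt.Errorf("range item %q is reversed or exceeds %d", part, max)
		}
		for i := start; i <= end; i++ {
			result = append(result, i)
		}
	}
	return result, nil
}

func main() {
	list, err := parseRange("1,3-5", 10)
	fmt.Println(list, err) // prints "[1 3 4 5] <nil>"
}
```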

func Text2Fields added in v0.0.7

func Text2Fields(r *bufio.Reader, options *Options) ([]byte, error)

Text2Fields processes input from a bufio.Reader and returns a byte array of fields and an error. Options provides the configuration to apply.

Types

type Options added in v0.0.7

type Options struct {
	AllowCharacters  string
	AllowPunctuation bool
	ToLower          bool
	ToUpper          bool
	StopWords        []string
	Delimiter        string
	Format           int
}

Options is the data structure used to configure the Text2Fields parser.
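
To show how fields like these might work together, here is a self-contained sketch that mirrors the struct locally and applies case folding, character filtering, stop words and a delimiter in that order. The order of operations, the text2Fields helper and its string-based signature are all assumptions made for illustration; the Format field is carried but unused.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// options mirrors the fields of the Options struct shown above.
type options struct {
	AllowCharacters  string
	AllowPunctuation bool
	ToLower          bool
	ToUpper          bool
	StopWords        []string
	Delimiter        string
	Format           int // unused in this sketch
}

// contains reports whether word is present in list.
func contains(list []string, word string) bool {
	for _, w := range list {
		if w == word {
			return true
		}
	}
	return false
}

// text2Fields is a hypothetical sketch, not the package's Text2Fields.
// It splits text into fields according to opts and joins them again.
func text2Fields(text string, opts *options) string {
	if opts.ToLower {
		text = strings.ToLower(text)
	}
	if opts.ToUpper {
		text = strings.ToUpper(text)
	}
	// Split wherever a rune is not allowed through.
	words := strings.FieldsFunc(text, func(c rune) bool {
		if unicode.IsLetter(c) || unicode.IsNumber(c) {
			return false
		}
		if opts.AllowPunctuation && unicode.IsPunct(c) {
			return false
		}
		return !strings.ContainsRune(opts.AllowCharacters, c)
	})
	// Drop stop words, reusing the backing array of words.
	kept := words[:0]
	for _, w := range words {
		if !contains(opts.StopWords, w) {
			kept = append(kept, w)
		}
	}
	return strings.Join(kept, opts.Delimiter)
}

func main() {
	opts := &options{ToLower: true, StopWords: []string{"the"}, Delimiter: "|"}
	fmt.Println(text2Fields("The quick, brown fox!", opts)) // prints "quick|brown|fox"
}
```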

Directories

Path Synopsis
cmds
csv2json
csv2json is a command line tool that takes CSV input from stdin and writes out a JSON expression.
csv2mdtable
csv2mdtable is a command line tool that takes CSV input from stdin and writes out a GitHub Flavored Markdown table.
csv2xlsx
csv2xlsx is a command line utility that will convert a CSV file and insert it into a named sheet in an Excel Workbook.
csvcols
csvcols is a command line tool that takes each argument in order and outputs a line in CSV format.
csvfind
csvfind is a command line tool that takes CSV files and returns the rows that match a column value.
csvjoin
csvjoin is a command line tool that takes two CSV files and joins them by matching a designated column in each.
csvrows
csvrows can filter selected rows, output row ranges, or turn each command line parameter into a CSV row of output.
finddir
finddir.go - a simple directory tree walker that looks for directories by name, basename or extension.
findfile
findfile.go - a simple directory tree walker that looks for files by name, basename or extension.
jsoncols
jsoncols is a command line tool for filtering JSON data from standard input or specified files.
jsonjoin
jsonjoin is a command line tool that takes two JSON documents and combines them into one depending on the options.
jsonmunge
jsonmunge is a command line tool that takes a JSON document and a Go text/template and renders the result.
jsonrange
jsonrange iterates over an array or map, returning either a JSON expression or map keys to stdout.
mergepath
mergepath.go - merge the path variable to avoid duplicates.
range
range.go - emit a list of integers separated by spaces starting from first command line parameter to last command line parameter.
reldate
Generates a date in YYYY-MM-DD format based on a relative time description (e.g.
timefmt
timefmt formats a date based on the formatting options available with Golang's Time.Format.
urlparse
urlparse.go - a URL Parser library for use in Bash scripts.
vcard2json
vcard2json converts a single vCard version 4.0 to a JSON document.
xlsx2csv
xlsx2csv.go is a command line utility that converts individual Excel Workbook Sheets to CSV.
xlsx2json
xlsx2json.go is a command line utility that converts an Excel Workbook Sheet into JSON.
reldate
Package reldate generates a date in YYYY-MM-DD format based on a relative time description (e.g.
timefmt
timefmt provides additional common formats found around the web that are missing from Golang's own time package.
