grate

package module
v0.0.0-...-3f8e65d Latest
Warning

This package is not in the latest version of its module.

Published: Oct 6, 2023 License: MIT Imports: 3 Imported by: 4

README

grate

A Go-native tabular data extraction package. It currently supports the .xls, .xlsx, .csv, and .tsv formats.

Why?

Grate puts speed and stability first, and makes no attempt to parse charts, figures, or other content types that may be embedded in the input files. It tries to perform as few allocations as possible and errs on the side of caution.

There are certainly still some bugs and edge cases, but we have run it successfully on a set of 400k .xls and .xlsx files to catch many bugs and error conditions. Please file an issue with any feedback and additional problem files.

Usage

Grate provides a simple standard interface for all supported filetypes, allowing access to both named worksheets in spreadsheets and single tables in plaintext formats.

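package main

import (
    "fmt"
    "log"
    "os"
    "strings"

    "github.com/pbnjay/grate"
    _ "github.com/pbnjay/grate/simple" // tsv and csv support
    _ "github.com/pbnjay/grate/xls"
    _ "github.com/pbnjay/grate/xlsx"
)

func main() {
    if len(os.Args) < 2 {
        log.Fatal("usage: pass the path to a tabular data file")
    }
    if err := dump(os.Args[1]); err != nil {
        log.Fatal(err)
    }
}

// dump prints every row of every sheet as tab-separated text.
func dump(filename string) error {
    wb, err := grate.Open(filename) // open the file
    if err != nil {
        return err
    }
    defer wb.Close()

    sheets, err := wb.List() // list available sheets
    if err != nil {
        return err
    }
    for _, s := range sheets { // enumerate each sheet name
        sheet, err := wb.Get(s) // open the sheet
        if err != nil {
            return err
        }
        for sheet.Next() { // enumerate each row of data
            row := sheet.Strings() // get the row's content as []string
            fmt.Println(strings.Join(row, "\t"))
        }
        if err := sheet.Err(); err != nil { // report any error hit while iterating
            return err
        }
    }
    return nil
}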

License

All source code is licensed under the MIT License.

Documentation

Overview

Package grate opens tabular data files (such as spreadsheets and delimited plaintext files) and allows programmatic access to the data contents in a consistent interface.

Index

Constants

const (
	// ContinueColumnMerged marks a continuation column within a merged cell.
	ContinueColumnMerged = "→"
	// EndColumnMerged marks the last column of a merged cell.
	EndColumnMerged = "⇥"

	// ContinueRowMerged marks a continuation row within a merged cell.
	ContinueRowMerged = "↓"
	// EndRowMerged marks the last row of a merged cell.
	EndRowMerged = "⤓"
)
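
As an illustration of how these markers might be handled downstream, here is a minimal sketch that blanks them out of a row. It assumes the markers appear as ordinary cell values in the []string for a row, which the documentation above does not state explicitly. The constants are local copies of the documented values so the sketch runs without importing grate, and isMergeMarker and blankMergeMarkers are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the documented marker values. Real code would refer to
// grate.ContinueColumnMerged and friends instead.
const (
	continueColumnMerged = "→"
	endColumnMerged      = "⇥"
	continueRowMerged    = "↓"
	endRowMerged         = "⤓"
)

// isMergeMarker reports whether a cell value is one of the four markers.
func isMergeMarker(cell string) bool {
	switch cell {
	case continueColumnMerged, endColumnMerged, continueRowMerged, endRowMerged:
		return true
	}
	return false
}

// blankMergeMarkers returns a copy of row with marker cells replaced by "".
func blankMergeMarkers(row []string) []string {
	out := make([]string, len(row))
	for i, cell := range row {
		if !isMergeMarker(cell) {
			out[i] = cell
		}
	}
	return out
}

func main() {
	// Hypothetical row: "Q1 totals" merged across three columns.
	row := []string{"Q1 totals", "→", "⇥", "42"}
	fmt.Println(strings.Join(blankMergeMarkers(row), "|")) // prints "Q1 totals|||42"
}
```

A cell whose real content is exactly one of these arrow characters would be blanked as well, so this approach only suits data where that cannot happen.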

Variables

var (

	// Debug should be set to true to expose detailed logging.
	Debug bool = (loglevel == "debug")
)

var ErrInvalidScanType = errors.New("grate: Scan only supports *bool, *int, *float64, *string, *time.Time arguments")

ErrInvalidScanType is returned by Scan for invalid arguments.

var ErrNotInFormat = errors.New("grate: file is not in this format")

ErrNotInFormat is used to auto-detect file types using the defined OpenFunc. It is returned by an OpenFunc when the file is not in the format that the OpenFunc handles.

var ErrUnknownFormat = errors.New("grate: file format is not known/supported")

ErrUnknownFormat is used when grate does not know how to open a file format.

Functions

func Register

func Register(name string, priority int, opener OpenFunc) error

Register the named source as a grate datasource implementation.

func WrapErr

func WrapErr(e ...error) error

WrapErr wraps a set of errors.

Types

type Collection

type Collection interface {
	// Next advances to the next record of content.
	// It MUST be called prior to any Scan().
	Next() bool

	// Strings extracts values from the current record into a list of strings.
	Strings() []string

	// Types extracts the data types from the current record into a list.
	// options: "boolean", "integer", "float", "string", "date",
	// and special cases: "blank", "hyperlink" which are string types
	Types() []string

	// Formats extracts the format codes for the current record into a list.
	Formats() []string

	// Scan extracts values from the current record into the provided arguments
	// Arguments must be pointers to one of 5 supported types:
	//     bool, int64, float64, string, or time.Time
	// If invalid, returns ErrInvalidScanType
	Scan(args ...interface{}) error

	// IsEmpty returns true if there are no data values.
	IsEmpty() bool

	// Err returns the last error that occurred.
	Err() error
}

Collection represents an iterable collection of records.
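
The Next, Scan, and Err contract described in the interface comments can be sketched with a toy in-memory collection. The collection interface below mirrors only the three methods used. The memRows type is a hypothetical stand-in, and its Scan handles just *string and *float64 and requires one destination per column, which is an assumption of this sketch rather than documented grate behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// collection mirrors the subset of grate.Collection used below.
type collection interface {
	Next() bool
	Scan(args ...interface{}) error
	Err() error
}

// memRows is a hypothetical in-memory collection that makes the sketch runnable.
type memRows struct {
	rows [][]string
	pos  int // number of rows consumed; the current row is rows[pos-1]
}

func (m *memRows) Next() bool {
	if m.pos >= len(m.rows) {
		return false
	}
	m.pos++
	return true
}

func (m *memRows) Err() error { return nil }

// Scan supports only *string and *float64 destinations in this toy version.
func (m *memRows) Scan(args ...interface{}) error {
	if m.pos == 0 {
		return errors.New("Scan called before Next")
	}
	row := m.rows[m.pos-1]
	if len(args) != len(row) {
		return fmt.Errorf("expected %d destinations, got %d", len(row), len(args))
	}
	for i, a := range args {
		switch dst := a.(type) {
		case *string:
			*dst = row[i]
		case *float64:
			f, err := strconv.ParseFloat(row[i], 64)
			if err != nil {
				return err
			}
			*dst = f
		default:
			return errors.New("unsupported Scan destination")
		}
	}
	return nil
}

// sumAmounts reads (name, amount) records and totals the amounts.
func sumAmounts(c collection) (float64, error) {
	var total float64
	for c.Next() { // Next must be called before each Scan
		var name string
		var amount float64
		if err := c.Scan(&name, &amount); err != nil {
			return 0, err
		}
		total += amount
	}
	return total, c.Err() // check for an iteration error once the loop ends
}

func main() {
	data := &memRows{rows: [][]string{{"apples", "1.5"}, {"pears", "2.25"}}}
	total, err := sumAmounts(data)
	fmt.Println(total, err) // prints "3.75 <nil>"
}
```

Because sumAmounts depends only on the interface, the same function would accept a Collection obtained from a real Source, provided its Scan accepts these destination types.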

type OpenFunc

type OpenFunc func(filename string) (Source, error)

OpenFunc defines a Source's instantiation function. It should return ErrNotInFormat immediately if filename is not of the correct file type.

type Source

type Source interface {
	// List the individual data tables within this source.
	List() ([]string, error)

	// Get a Collection from the source by name.
	Get(name string) (Collection, error)

	// Close the source and discard memory.
	Close() error
}

Source represents a set of data collections.
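
The List, Get, Close sequence can be sketched as a helper that counts the records in every table of a source. The rows and source interfaces below mirror only the methods used, and fakeSource and fakeRows are hypothetical in-memory implementations that exist to make the example runnable without a real file.

```go
package main

import "fmt"

// rows and source mirror the parts of grate.Collection and grate.Source used here.
type rows interface {
	Next() bool
	Err() error
}

type source interface {
	List() ([]string, error)
	Get(name string) (rows, error)
	Close() error
}

// countRows returns the number of records in each table of src.
func countRows(src source) (map[string]int, error) {
	names, err := src.List()
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int, len(names))
	for _, name := range names {
		c, err := src.Get(name)
		if err != nil {
			return nil, fmt.Errorf("get %q: %w", name, err)
		}
		counts[name] = 0 // record empty tables too
		for c.Next() {
			counts[name]++
		}
		if err := c.Err(); err != nil {
			return nil, fmt.Errorf("read %q: %w", name, err)
		}
	}
	return counts, nil
}

// fakeSource maps table names to row counts; fakeRows yields that many rows.
type fakeSource map[string]int

type fakeRows struct{ left int }

func (r *fakeRows) Next() bool {
	if r.left == 0 {
		return false
	}
	r.left--
	return true
}

func (r *fakeRows) Err() error { return nil }

func (s fakeSource) List() ([]string, error) {
	names := make([]string, 0, len(s))
	for name := range s {
		names = append(names, name)
	}
	return names, nil
}

func (s fakeSource) Get(name string) (rows, error) {
	n, ok := s[name]
	if !ok {
		return nil, fmt.Errorf("no table named %q", name)
	}
	return &fakeRows{left: n}, nil
}

func (s fakeSource) Close() error { return nil }

func main() {
	src := fakeSource{"Sheet1": 3, "Sheet2": 0}
	defer src.Close() // release the source once all tables are read
	counts, err := countRows(src)
	fmt.Println(counts, err)
}
```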

func Open

func Open(filename string) (Source, error)

Open a tabular data file and return a Source for accessing its contents.

Directories

Path           Synopsis
cmd/grate2tsv  Command grate2tsv is a highly parallel tabular data extraction tool.
cmd/grater     Command grater extracts contents of the tabular files to stdout.
xls            Package xls implements the Microsoft Excel Binary File Format (.xls) Structure.
xls/cfb        Package cfb implements the Microsoft Compound File Binary File Format.
xls/crypto     Package crypto implements Excel encryption algorithms from the MS-OFFCRYPTO design specs.
