xlsxreader

v1.0.2
Published: Oct 3, 2024 License: MIT Imports: 13 Imported by: 0

README

xlsxreader: A Go Package for reading data from an xlsx file

Overview

A low-memory high performance library for reading data from an xlsx file.

Suitable for reading .xlsx data and designed to aid with the bulk uploading of data where the key requirement is to parse and read raw data.

The reader will read data out row by row (1->n) and has no concept of headers or data types (this is to be managed by the consumer).
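Because header handling is left to the consumer, one option is to treat the first row as a header row and key later rows by it. The following is a minimal sketch of that idea, not part of the package: the cell and row types are local stand-ins that mirror only the Column, Value, Index and Cells fields documented below, and recordsFromRows is a hypothetical helper name.

```go
package main

import "fmt"

// cell and row are local stand-ins that mirror the fields of the package's
// Cell and Row types used by this sketch; they are not the library's types.
type cell struct {
	Column string
	Value  string
}

type row struct {
	Index int
	Cells []cell
}

// recordsFromRows (hypothetical helper) treats the first row received as the
// header row and returns one map per later row, keyed by header text. Cells
// are matched to headers by column letter, so a gap in a row does not shift
// the remaining values.
func recordsFromRows(rows <-chan row) []map[string]string {
	var headers map[string]string // column letter -> header text
	var records []map[string]string
	for r := range rows {
		if headers == nil {
			headers = make(map[string]string, len(r.Cells))
			for _, c := range r.Cells {
				headers[c.Column] = c.Value
			}
			continue
		}
		rec := make(map[string]string, len(r.Cells))
		for _, c := range r.Cells {
			if name, ok := headers[c.Column]; ok {
				rec[name] = c.Value
			}
		}
		records = append(records, rec)
	}
	return records
}

func main() {
	rows := make(chan row, 3)
	rows <- row{Index: 1, Cells: []cell{{"A", "name"}, {"B", "age"}}}
	rows <- row{Index: 2, Cells: []cell{{"A", "Ada"}, {"B", "36"}}}
	rows <- row{Index: 3, Cells: []cell{{"B", "41"}}} // column A omitted
	close(rows)

	for _, rec := range recordsFromRows(rows) {
		fmt.Println(rec)
	}
}
```

With the real package, the same loop body could be applied directly to the values received from ReadRows.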

The reader does not currently handle some of the more advanced cell data that can be stored in an xlsx file.

Further reading on how this came to be is available on our blog

Install

go get github.com/thedatashed/xlsxreader

Example Usage

Reading from the file system:

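package main

import (
    "fmt"
    "log"

    "github.com/thedatashed/xlsxreader"
)

func main() {
    // Create an instance of the reader by opening a target file
    xl, err := xlsxreader.OpenFile("./test.xlsx")
    if err != nil {
        log.Fatal(err)
    }

    // Ensure the file reader is closed once utilised
    defer xl.Close()

    // Iterate on the rows of data
    for row := range xl.ReadRows(xl.Sheets[0]) {
        if row.Error != nil {
            log.Printf("row %d: %v", row.Index, row.Error)
            return
        }
        // Printing stands in for real row handling
        fmt.Println(row.Index, row.Cells)
    }
}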

Reading from an already in-memory source:

package main

import (
    "fmt"
    "log"
    "os"

    "github.com/thedatashed/xlsxreader"
)

func main() {
    // Preprocessing of file data: load the whole file into memory
    data, err := os.ReadFile("./test/test-small.xlsx")
    if err != nil {
        log.Fatal(err)
    }

    // Create an instance of the reader by providing the file's bytes
    xl, err := xlsxreader.NewReader(data)
    if err != nil {
        log.Fatal(err)
    }

    // Iterate on the rows of data
    for row := range xl.ReadRows(xl.Sheets[0]) {
        if row.Error != nil {
            log.Fatal(row.Error)
        }
        // Printing stands in for real row handling
        fmt.Println(row.Index, row.Cells)
    }
}

Key Concepts

Files

The reader operates on a single file and will read data from the specified file using the OpenFile function.

Data

The reader can also be instantiated from a byte slice by using the NewReader function.

Sheets

An xlsx workbook can contain many worksheets. When reading data, pass the name of the target sheet. To process multiple sheets, either iterate over the slice of sheet names identified by the reader (the Sheets field) or make multiple calls to the ReadRows function with the desired sheet names.
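Iterating over every sheet can be sketched as follows. This is an illustration under stated assumptions rather than the package's own code: countRows and fakeRead are hypothetical names, and fakeRead only imitates the shape of ReadRows so that the example runs without a workbook. Because countRows is generic over the row type, a method value such as xl.ReadRows could be passed in its place, together with xl.Sheets.

```go
package main

import "fmt"

// countRows (hypothetical helper) calls read once per sheet name and counts
// the rows each channel yields. read has the same shape as the documented
// ReadRows method: it takes a sheet name and returns a channel of rows.
func countRows[R any](sheets []string, read func(sheet string) chan R) map[string]int {
	counts := make(map[string]int, len(sheets))
	for _, sheet := range sheets {
		counts[sheet] = 0 // record sheets that turn out to be empty
		for range read(sheet) {
			counts[sheet]++
		}
	}
	return counts
}

// fakeRead is a stand-in for ReadRows used only by this demo. It emits one
// value per character of the sheet name and then closes the channel.
func fakeRead(sheet string) chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := 0; i < len(sheet); i++ {
			ch <- i
		}
	}()
	return ch
}

func main() {
	fmt.Println(countRows([]string{"Q1", "Totals"}, fakeRead))
}
```

Each channel is read to completion before the next sheet is started, so no producer is left blocked on an unread channel.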

Rows

A sheet contains n rows of data. The reader returns an iterator (a channel) that can be ranged over to cycle through each row of data in a worksheet. Each row holds an index and contains n cells that hold the column data.

Cells

A cell represents a row/column value and contains a string representation of that data. Currently, numeric data is returned as found, and dates are converted to ISO 8601 / RFC 3339 format.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Cell

type Cell struct {
	Column string // E.G   A, B, C
	Row    int
	Value  string
	Type   CellType
}

Cell represents the data in a single cell as a consumable format.

func (Cell) ColumnIndex

func (c Cell) ColumnIndex() int

ColumnIndex gives a number, representing the column the cell lies beneath.

type CellType

type CellType string

CellType defines the data type of an Excel cell.

const (
	// TypeString is for text cells
	TypeString CellType = "string"
	// TypeNumerical is for numerical values
	TypeNumerical CellType = "numerical"
	// TypeDateTime is for date values
	TypeDateTime CellType = "datetime"
	// TypeBoolean is for true/false values
	TypeBoolean CellType = "boolean"
)
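Since every cell value arrives as a string, a consumer that needs typed values can switch on the cell type. The sketch below is one way to do that and is not part of the package: convert is a hypothetical helper, the type is passed as a plain string so the example compiles on its own, and it assumes that datetime values are full RFC 3339 timestamps and that boolean values are in a form strconv.ParseBool accepts.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// convert (hypothetical helper) turns a cell's string value into a Go value
// based on its type. cellType is expected to hold one of the CellType strings
// listed above; it is a plain string here so the sketch needs no imports
// beyond the standard library.
func convert(cellType, value string) (any, error) {
	switch cellType {
	case "numerical":
		return strconv.ParseFloat(value, 64)
	case "datetime":
		// Assumption: the value is a full RFC 3339 timestamp.
		return time.Parse(time.RFC3339, value)
	case "boolean":
		// Assumption: the value is a form ParseBool accepts,
		// such as "1", "0", "true" or "false".
		return strconv.ParseBool(value)
	default:
		// Strings and unknown types are returned unchanged.
		return value, nil
	}
}

func main() {
	inputs := [][2]string{
		{"numerical", "12.5"},
		{"datetime", "2024-10-03T09:30:00Z"},
		{"boolean", "1"},
		{"string", "hello"},
	}
	for _, in := range inputs {
		v, err := convert(in[0], in[1])
		fmt.Printf("%-9s %T %v %v\n", in[0], v, v, err)
	}
}
```

With the package's own types, the call would look like convert(string(c.Type), c.Value) for a Cell c.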

type Row

type Row struct {
	Error error
	Index int
	Cells []Cell
}

Row represents a row of data read from an xlsx file, in a consumable format.
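Because each Row carries its own Error field, a consumer would normally check it before using the cells. The following is a minimal sketch of that check, assuming a non-nil Error marks a row that should not be processed. The row type is a local stand-in with only the Error and Index fields, and readUntilError and errDemo are hypothetical names used for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// row is a local stand-in mirroring the Error and Index fields of Row.
type row struct {
	Error error
	Index int
}

// errDemo is a made-up error used only to drive the demo below.
var errDemo = errors.New("demo: malformed row")

// readUntilError (hypothetical helper) counts rows until the channel closes
// or a row carries an error, in which case it stops and returns that error
// wrapped with the row index.
func readUntilError(rows <-chan row) (int, error) {
	n := 0
	for r := range rows {
		if r.Error != nil {
			return n, fmt.Errorf("row %d: %w", r.Index, r.Error)
		}
		n++
	}
	return n, nil
}

func main() {
	rows := make(chan row, 3)
	rows <- row{Index: 1}
	rows <- row{Index: 2}
	rows <- row{Index: 3, Error: errDemo}
	close(rows)

	n, err := readUntilError(rows)
	fmt.Println(n, err, errors.Is(err, errDemo))
}
```

Returning early leaves the channel unread, so with a file-backed reader the Close call described under ReadRows still applies.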

type XlsxFile

type XlsxFile struct {
	Sheets []string
	// contains filtered or unexported fields
}

XlsxFile defines a populated XLSX file struct.

func NewReader

func NewReader(xlsxBytes []byte) (*XlsxFile, error)

NewReader takes the bytes of an xlsx file and returns a populated XlsxFile struct for it. If the file cannot be found, or key parts of the file's contents are missing, an error is returned.

func NewReaderZip

func NewReaderZip(r *zip.Reader) (*XlsxFile, error)

NewReaderZip takes a zip reader for an xlsx file and returns a populated XlsxFile struct for it. If the file cannot be found, or key parts of the file's contents are missing, an error is returned.

func (*XlsxFile) ReadRows

func (x *XlsxFile) ReadRows(sheet string) chan Row

ReadRows provides an interface allowing rows from a specific worksheet to be streamed from an xlsx file. In order to provide a simplistic interface, this method returns a channel that can be range-d over.

If you stop reading before the channel has been fully consumed, ensure that the Close() method is called once you have finished with the file, to stop all active goroutines and prevent potential goroutine leaks.

Notes: Xlsx sheets may omit cells which are empty, meaning a row may not have continuous cell references. This function makes no attempt to fill/pad the missing cells.
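A consumer that needs one value per column can pad the gaps itself. The sketch below shows one way to do so and is not part of the package: cell is a local stand-in with only the Column and Value fields, and columnNumber and padRow are hypothetical helpers. columnNumber uses its own 1-based numbering and makes no claim about what the package's ColumnIndex method returns.

```go
package main

import "fmt"

// cell is a local stand-in mirroring the Column and Value fields of Cell.
type cell struct {
	Column string
	Value  string
}

// columnNumber (hypothetical helper) converts a column reference such as "A"
// or "AB" into a 1-based column number (A=1, Z=26, AA=27). It returns 0 if
// the input is empty or contains anything other than the letters A-Z.
func columnNumber(col string) int {
	n := 0
	for _, ch := range col {
		if ch < 'A' || ch > 'Z' {
			return 0
		}
		n = n*26 + int(ch-'A') + 1
	}
	return n
}

// padRow (hypothetical helper) returns a dense slice with one entry per
// column up to the right-most cell present; omitted cells become "".
func padRow(cells []cell) []string {
	width := 0
	for _, c := range cells {
		if n := columnNumber(c.Column); n > width {
			width = n
		}
	}
	out := make([]string, width)
	for _, c := range cells {
		if n := columnNumber(c.Column); n > 0 {
			out[n-1] = c.Value
		}
	}
	return out
}

func main() {
	// Columns B and C are missing, as they would be for empty cells.
	sparse := []cell{{"A", "id-1"}, {"D", "2024-10-03"}}
	fmt.Printf("%q\n", padRow(sparse))
}
```

The padded width depends on each row's right-most cell, so rows from the same sheet may still differ in length.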

type XlsxFileCloser

type XlsxFileCloser struct {
	XlsxFile
	// contains filtered or unexported fields
}

XlsxFileCloser wraps XlsxFile so that an open file can be closed.

func OpenFile

func OpenFile(filename string) (*XlsxFileCloser, error)

OpenFile takes the name of an XLSX file and returns a populated XlsxFileCloser struct for it. If the file cannot be found, or key parts of the file's contents are missing, an error is returned. Note that the file must be Close()-d when you are finished with it.

func OpenReaderZip

func OpenReaderZip(rc *zip.ReadCloser) (*XlsxFileCloser, error)

OpenReaderZip takes the zip ReadCloser of an XLSX file and returns a populated XlsxFileCloser struct for it. If the file cannot be found, or key parts of the file's contents are missing, an error is returned. Note that the file must be Close()-d when you are finished with it.

func (*XlsxFileCloser) Close

func (xl *XlsxFileCloser) Close() error

Close closes the XlsxFile, rendering it unusable for I/O.

func (*XlsxFileCloser) GetSheetFileForSheetName

func (xl *XlsxFileCloser) GetSheetFileForSheetName(sheetName string) *zip.File

GetSheetFileForSheetName returns the sheet file associated with the sheet name. This is useful when you want to process something in the sheet that this library does not handle. For example, to read the hyperlinks section of a sheet file, you can get the sheet file and read its XML directly.
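Reading the hyperlinks section from the sheet XML could look like the sketch below. It is an illustration rather than something the package provides: hyperlinkRefs, the worksheet and hyperlink structs, and sampleSheet are all hypothetical, and the sample only imitates the usual layout of a worksheet part (a hyperlinks element holding hyperlink entries with a ref attribute). In real use the bytes would come from the returned *zip.File, for example by calling its Open method and reading the result with io.ReadAll.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// worksheet and hyperlink model only the parts of the sheet XML this sketch
// needs; all other elements and attributes are ignored by the decoder.
type worksheet struct {
	Hyperlinks []hyperlink `xml:"hyperlinks>hyperlink"`
}

type hyperlink struct {
	Ref string `xml:"ref,attr"` // cell reference the link is attached to
}

// hyperlinkRefs (hypothetical helper) returns the cell references of every
// hyperlink entry found in the given worksheet XML.
func hyperlinkRefs(sheetXML []byte) ([]string, error) {
	var ws worksheet
	if err := xml.Unmarshal(sheetXML, &ws); err != nil {
		return nil, err
	}
	refs := make([]string, 0, len(ws.Hyperlinks))
	for _, h := range ws.Hyperlinks {
		refs = append(refs, h.Ref)
	}
	return refs, nil
}

// sampleSheet is a hand-written fragment used only for this demo.
const sampleSheet = `<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"><sheetData/><hyperlinks><hyperlink ref="A1" r:id="rId1"/><hyperlink ref="B7" r:id="rId2"/></hyperlinks></worksheet>`

func main() {
	refs, err := hyperlinkRefs([]byte(sampleSheet))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(refs)
}
```

Resolving each link's target would additionally require the sheet's relationships part, which this sketch does not read.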
