parquet

Version: v0.0.0-...-ad71342
Warning

This package is not in the latest version of its module.

Published: Jan 28, 2022 License: Apache-2.0 Imports: 22 Imported by: 0

README

Arrow-Parquet-Go

This is a pure-Go implementation for reading Parquet files into Arrow record batches and for writing Parquet files.

Install

go get github.com/mindhash/arrow-parquet-go

Usage

To read a file:

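    package main

    import (
        "fmt"
        "io"
        "log"
        "os"

        parquet "github.com/mindhash/arrow-parquet-go"
    )

    func main() {
        const parquetFileName = "example.parquet" // path to your file

        // The reader calls readerFunc whenever it needs a section of the file.
        readerFunc := func(offset, length int64) (io.ReadCloser, error) {
            file, err := os.Open(parquetFileName)
            if err != nil {
                return nil, err
            }
            // Seek to the requested offset; a negative offset, if requested,
            // is treated as relative to the end of the file.
            whence := io.SeekStart
            if offset < 0 {
                whence = io.SeekEnd
            }
            if _, err := file.Seek(offset, whence); err != nil {
                file.Close()
                return nil, err
            }
            return file, nil
        }

        reader, err := parquet.NewReader(readerFunc)
        if err != nil {
            log.Fatal(err)
        }
        defer reader.Close()

        for {
            record, err := reader.Read()
            if err == io.EOF {
                break // no more record batches
            }
            if err != nil {
                log.Fatal(err)
            }
            fmt.Println("batch record count:", record.NumRows())
            record.Release() // free the batch's memory
        }
    }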

Attribution

This is a modified version of minio/parquet-go with additional optimizations and support for Arrow record batches.

Documentation

Types

type GetReaderFunc

type GetReaderFunc func(offset, length int64) (io.ReadCloser, error)

GetReaderFunc is the function type the Reader uses to obtain an io.ReadCloser for a requested byte range (offset and length) of the file.

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader reads a Parquet file as a sequence of Arrow record batches.

func NewReader

func NewReader(getReaderFunc GetReaderFunc) (*Reader, error)

NewReader creates a new Parquet reader. The reader calls getReaderFunc whenever it needs a handle on a byte range of the target file.

func (*Reader) Close

func (reader *Reader) Close() (err error)

Close closes the underlying readers.

func (*Reader) Read

func (reader *Reader) Read() (record array.Record, err error)

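Read reads a single record batch. As in the usage example, io.EOF signals that no batches remain, and the caller should call Release on each returned record when done with it.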

type Writer

type Writer struct {
	PageSize        int64
	RowGroupSize    int64
	CompressionType parquet.CompressionCodec
	// contains filtered or unexported fields
}

Writer writes records to a Parquet file.

func NewWriter

func NewWriter(writeCloser io.WriteCloser, schemaTree *schema.Tree, rowGroupCount int) (*Writer, error)

NewWriter creates a new Parquet writer that uses schemaTree as the file schema. Records are written to writeCloser as binary data in groups of rowGroupCount records.

func (*Writer) Close

func (writer *Writer) Close() (err error)

Close writes any pending records, then finalizes and closes the writer.

func (*Writer) Write

func (writer *Writer) Write(record map[string]*data.Column) (err error)

Write writes a record represented as a map of *data.Column values keyed by column name.

func (*Writer) WriteJSON

func (writer *Writer) WriteJSON(recordData []byte) (err error)

WriteJSON writes a record represented as JSON.
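WriteJSON takes the JSON bytes of one record. A minimal sketch of preparing those bytes with encoding/json, assuming each record is a JSON object whose keys match the column names in the writer's schema tree. The Person type and encodeRecords helper are hypothetical, and the WriteJSON call is left as a comment because it needs a configured Writer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person is a hypothetical record type; its JSON field names are assumed
// to match the column names in the writer's schema tree.
type Person struct {
	ID   int64  `json:"id"`
	Name string `json:"name"`
	Age  int32  `json:"age"`
}

// encodeRecords marshals each record into the JSON bytes that would be
// passed, one record per call, to Writer.WriteJSON.
func encodeRecords(people []Person) ([][]byte, error) {
	out := make([][]byte, 0, len(people))
	for _, p := range people {
		b, err := json.Marshal(p)
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}

func main() {
	records, err := encodeRecords([]Person{{ID: 1, Name: "alice", Age: 30}})
	if err != nil {
		panic(err)
	}
	for _, r := range records {
		fmt.Println(string(r)) // {"id":1,"name":"alice","age":30}
		// With a real writer (not shown):
		// if err := writer.WriteJSON(r); err != nil { ... }
	}
}
```

Marshalling a struct keeps field order and types fixed, which makes it easier to keep the JSON in step with the schema than building ad hoc maps.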

Directories

gen-go
