Documentation ¶
Index ¶
- Variables
- func ReadFileMetaData(r io.ReadSeeker) (*parquetformat.FileMetaData, error)
- type Column
- type ColumnChunkReader
- func (cr *ColumnChunkReader) DictionaryPageHeader() *parquetformat.PageHeader
- func (cr *ColumnChunkReader) PageHeader() *parquetformat.PageHeader
- func (cr *ColumnChunkReader) Read(values interface{}, dLevels []uint16, rLevels []uint16) (n int, err error)
- func (cr *ColumnChunkReader) SkipPage() error
- type File
- type Int96
- type Schema
Variables ¶
var (
EndOfChunk = errors.New("EndOfChunk")
)
Functions ¶
func ReadFileMetaData ¶
func ReadFileMetaData(r io.ReadSeeker) (*parquetformat.FileMetaData, error)
ReadFileMetaData reads a parquetformat.FileMetaData object from r, which provides read access to data in parquet format.
Parquet format is described here: https://github.com/apache/parquet-format/blob/master/README.md
Types ¶
type Column ¶
type Column struct {
// contains filtered or unexported fields
}
Column contains information about a single column in a parquet file.
func (Column) Index ¶
Index is a 0-based index of col in its schema.
Column chunks in a row group have the same order as columns in the schema.
func (Column) MaxD ¶
MaxD returns the maximum definition level for col.
A value that has been read is not null when its definition level equals the maximum definition level.
type ColumnChunkReader ¶
type ColumnChunkReader struct {
// contains filtered or unexported fields
}
ColumnChunkReader reads data from a single column chunk of a parquet file.
func (*ColumnChunkReader) DictionaryPageHeader ¶
func (cr *ColumnChunkReader) DictionaryPageHeader() *parquetformat.PageHeader
DictionaryPageHeader returns a DICTIONARY_PAGE page header if the column chunk has one or nil otherwise.
func (*ColumnChunkReader) PageHeader ¶
func (cr *ColumnChunkReader) PageHeader() *parquetformat.PageHeader
PageHeader returns the PageHeader of the page that is about to be read or is currently being read.
If there was an error reading the last page (including EndOfChunk), PageHeader returns nil.
func (*ColumnChunkReader) Read ¶
func (cr *ColumnChunkReader) Read(values interface{}, dLevels []uint16, rLevels []uint16) (n int, err error)
Read reads up to len(dLevels) values into values and the corresponding definition and repetition levels into dLevels and rLevels respectively. It panics if values, dLevels, and rLevels do not all have the same length. It returns the number of values read (including nulls) and any error encountered.
Note that after Read returns, the values slice contains only the non-null values. The number of these values can be less than n.
values must be a slice of interface{} or of a type that corresponds to the column type (such as []int32 for an INT32 column or [][]byte for a BYTE_ARRAY column).
When there are not enough values in the current page to fill dLevels, Read does not advance to the next page and returns the number of values read. If this page was the last page in its column chunk and there is no more data to read, it returns the EndOfChunk error.
func (*ColumnChunkReader) SkipPage ¶
func (cr *ColumnChunkReader) SkipPage() error
SkipPage positions cr at the beginning of the next page, skipping all values in the current page.
It returns EndOfChunk if no more data is available.
type File ¶
type File struct {
MetaData *parquetformat.FileMetaData
Schema   Schema
// contains filtered or unexported fields
}
func FileFromReader ¶
func FileFromReader(r io.ReadSeeker) (*File, error)
FileFromReader creates a parquet.File from an io.ReadSeeker.
type Schema ¶
type Schema struct {
// contains filtered or unexported fields
}
Schema describes the structure of the data stored in a parquet file.
A Schema can be created from a parquetformat.FileMetaData. The information stored in the RowGroups part of FileMetaData is not needed to create the schema.
TODO(ksh): provide a way to read FileMetaData without RowGroups.
Usually FileMetaData should be read from the same file as the data. When data is split across multiple parquet files, the metadata can be stored in a separate file, usually called "_common_metadata".
func MakeSchema ¶
func MakeSchema(meta *parquetformat.FileMetaData) (Schema, error)
MakeSchema creates a Schema from meta.
func (Schema) ColumnByName ¶
ColumnByName returns the Column with the given name (individual path elements are separated with ".").
func (Schema) ColumnByPath ¶
ColumnByPath returns a Column for the given path.
func (Schema) DisplayString ¶
DisplayString returns a string representation of s in a textual format similar to the one described in the Dremel paper and used by the parquet-mr project.