Documentation ¶
Overview ¶
Package parquet is not intended to be used as a general-purpose library. It exists to support the code generated by the 'parquetgen' command, which uses it to read and write parquet files.
Index ¶
- func GetBools(r io.Reader, n int, pageSizes []int) ([]bool, error)
- func OptionalFieldGzip(r *OptionalField)
- func OptionalFieldSnappy(r *OptionalField)
- func OptionalFieldUncompressed(o *OptionalField)
- func PageHeader(r io.Reader) (*sch.PageHeader, error)
- func PageHeaders(footer *sch.FileMetaData, r io.ReadSeeker) ([]sch.PageHeader, error)
- func PageHeadersAtOffset(r io.ReadSeeker, o, n int64) ([]sch.PageHeader, error)
- func ReadMetaData(r io.ReadSeeker) (*sch.FileMetaData, error)
- func RepetitionOptional(se *sch.SchemaElement)
- func RepetitionRepeated(se *sch.SchemaElement)
- func RepetitionRequired(se *sch.SchemaElement)
- func RequiredFieldGzip(r *RequiredField)
- func RequiredFieldSnappy(r *RequiredField)
- func RequiredFieldUncompressed(r *RequiredField)
- type Field
- type FieldFunc
- type MaxLevel
- type Metadata
- func (m *Metadata) Footer(w io.Writer) error
- func (m *Metadata) NextDoc()
- func (m *Metadata) Pages() (map[string][]Page, error)
- func (m *Metadata) ReadFooter(r io.ReadSeeker) error
- func (m *Metadata) RowGroups() []RowGroup
- func (m *Metadata) Rows() int64
- func (m *Metadata) StartRowGroup(fields ...Field)
- func (m *Metadata) WritePageHeader(w io.Writer, pth []string, dataLen, compressedLen, defCount, count int, ...) error
- type OptionalField
- func (f *OptionalField) DoRead(r io.ReadSeeker, pg Page) (io.Reader, []int, error)
- func (f *OptionalField) DoWrite(w io.Writer, meta *Metadata, vals []byte, count int, stats Stats) error
- func (f *OptionalField) Name() string
- func (f *OptionalField) Path() []string
- func (f *OptionalField) Values() int
- type Page
- type RepetitionType
- type RepetitionTypes
- type RequiredField
- type RowGroup
- type Stats
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func OptionalFieldGzip ¶ added in v0.5.0
func OptionalFieldGzip(r *OptionalField)
OptionalFieldGzip sets the compression for a column to gzip. It is an optional arg to NewOptionalField.
func OptionalFieldSnappy ¶ added in v0.1.0
func OptionalFieldSnappy(r *OptionalField)
OptionalFieldSnappy sets the compression for a column to snappy. It is an optional arg to NewOptionalField.
func OptionalFieldUncompressed ¶ added in v0.1.0
func OptionalFieldUncompressed(o *OptionalField)
OptionalFieldUncompressed sets the compression to none. It is an optional arg to NewOptionalField.
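The compression setters above have the shape of functional options: each one receives a pointer to the field and mutates it. The following is a minimal sketch of that pattern, not the package's implementation. The names `field`, `codec`, `newField`, `withSnappy`, and `withGzip` are hypothetical stand-ins, and the uncompressed default is an assumption made only for this example.

```go
package main

import "fmt"

// codec is a hypothetical stand-in for the compression setting.
type codec int

const (
	uncompressed codec = iota
	snappy
	gzip
)

// field is a hypothetical, simplified stand-in for OptionalField.
type field struct {
	path        []string
	compression codec
}

// withSnappy and withGzip mirror the shape of OptionalFieldSnappy and
// OptionalFieldGzip: each takes a pointer to the field and mutates it.
func withSnappy(f *field) { f.compression = snappy }
func withGzip(f *field)   { f.compression = gzip }

// newField applies the options in order, so a later option overrides an
// earlier one. The uncompressed default is an assumption of this sketch.
func newField(path []string, opts ...func(*field)) field {
	f := field{path: path, compression: uncompressed}
	for _, opt := range opts {
		opt(&f)
	}
	return f
}

func main() {
	f := newField([]string{"name"}, withGzip)
	fmt.Println(f.compression == gzip)
}
```

Because options are plain functions, a constructor written this way can gain new settings without changing its signature.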
func PageHeader ¶ added in v0.2.0
func PageHeader(r io.Reader) (*sch.PageHeader, error)
PageHeader reads the page header from a column page
func PageHeaders ¶ added in v0.2.0
func PageHeaders(footer *sch.FileMetaData, r io.ReadSeeker) ([]sch.PageHeader, error)
PageHeaders reads all the page headers without reading the actual data. It is used by parquetgen to print the page headers.
func PageHeadersAtOffset ¶ added in v0.2.0
func PageHeadersAtOffset(r io.ReadSeeker, o, n int64) ([]sch.PageHeader, error)
PageHeadersAtOffset seeks to the given offset, then reads the PageHeader without reading the data.
func ReadMetaData ¶ added in v0.1.0
func ReadMetaData(r io.ReadSeeker) (*sch.FileMetaData, error)
ReadMetaData reads the FileMetaData from the end of a parquet file
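For context, a parquet file ends with the serialized FileMetaData, followed by a 4-byte little-endian length of that metadata, followed by the 4-byte magic "PAR1". The sketch below only locates the footer bytes under that layout; it does not decode the Thrift structure and is not the package's code. `footerRange` and `sampleFile` are hypothetical helpers introduced for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

const magic = "PAR1"

// footerRange returns the offset and length of the serialized footer.
func footerRange(r io.ReadSeeker) (int64, int64, error) {
	// The last 8 bytes hold the footer length followed by the magic.
	end, err := r.Seek(-8, io.SeekEnd)
	if err != nil {
		return 0, 0, err
	}
	var tail [8]byte
	if _, err := io.ReadFull(r, tail[:]); err != nil {
		return 0, 0, err
	}
	if string(tail[4:]) != magic {
		return 0, 0, errors.New("missing PAR1 magic at end of file")
	}
	length := int64(binary.LittleEndian.Uint32(tail[:4]))
	if length > end {
		return 0, 0, errors.New("footer length exceeds file size")
	}
	// The footer sits immediately before the 8-byte tail.
	return end - length, length, nil
}

// sampleFile builds a fake 20-byte file: magic, 3 payload bytes,
// a 5-byte stand-in footer, the footer length, and the magic again.
func sampleFile() *bytes.Reader {
	var buf bytes.Buffer
	buf.WriteString(magic)
	buf.Write([]byte{1, 2, 3})
	buf.WriteString("meta!")
	var n [4]byte
	binary.LittleEndian.PutUint32(n[:], 5)
	buf.Write(n[:])
	buf.WriteString(magic)
	return bytes.NewReader(buf.Bytes())
}

func main() {
	off, n, err := footerRange(sampleFile())
	fmt.Println(off, n, err) // prints: 7 5 <nil>
}
```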
func RepetitionOptional ¶
func RepetitionOptional(se *sch.SchemaElement)
RepetitionOptional sets the repetition type to optional
func RepetitionRepeated ¶ added in v0.3.0
func RepetitionRepeated(se *sch.SchemaElement)
RepetitionRepeated sets the repetition type to repeated
func RepetitionRequired ¶
func RepetitionRequired(se *sch.SchemaElement)
RepetitionRequired sets the repetition type to required
func RequiredFieldGzip ¶ added in v0.5.0
func RequiredFieldGzip(r *RequiredField)
RequiredFieldGzip sets the compression for a column to gzip. It is an optional arg to NewRequiredField.
func RequiredFieldSnappy ¶ added in v0.1.0
func RequiredFieldSnappy(r *RequiredField)
RequiredFieldSnappy sets the compression for a column to snappy. It is an optional arg to NewRequiredField.
func RequiredFieldUncompressed ¶ added in v0.1.0
func RequiredFieldUncompressed(r *RequiredField)
RequiredFieldUncompressed sets the compression to none. It is an optional arg to NewRequiredField.
Types ¶
type FieldFunc ¶
type FieldFunc func(*sch.SchemaElement)
FieldFunc is used to set some of the metadata for each column
type MaxLevel ¶ added in v0.3.0
MaxLevel holds the maximum definition and repetition level for a given field.
type Metadata ¶
type Metadata struct {
// contains filtered or unexported fields
}
Metadata tracks the information needed to write the FileMetaData at the end of the parquet file.
func (*Metadata) NextDoc ¶ added in v0.3.0
func (m *Metadata) NextDoc()
NextDoc keeps track of how many documents have been added to this parquet file. The final value of m.docs is used for the FileMetaData.NumRows.
func (*Metadata) ReadFooter ¶
func (m *Metadata) ReadFooter(r io.ReadSeeker) error
ReadFooter reads the parquet metadata
func (*Metadata) Rows ¶
Rows returns the total number of rows that are being written into a parquet file.
func (*Metadata) StartRowGroup ¶
StartRowGroup is called when starting a new row group
type OptionalField ¶ added in v0.0.6
type OptionalField struct {
	Defs           []uint8
	Reps           []uint8
	MaxLevels      MaxLevel
	RepetitionType FieldFunc
	Types          []int
	// contains filtered or unexported fields
}
OptionalField is any exported field in a struct that is a pointer.
func NewOptionalField ¶ added in v0.0.6
func NewOptionalField(pth []string, types []int, opts ...func(*OptionalField)) OptionalField
NewOptionalField creates an optional field
func (*OptionalField) DoRead ¶ added in v0.0.6
func (f *OptionalField) DoRead(r io.ReadSeeker, pg Page) (io.Reader, []int, error)
DoRead is called by all optional fields. It reads the definition levels and uses them to interpret the raw data.
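To illustrate how definition levels are used to interpret raw data, here is a minimal sketch for a flat optional column. It assumes the usual Parquet convention that a page stores only non-null values and that a definition level equal to the maximum marks a present value. `assemble` is a hypothetical helper, not part of this package, and the sketch ignores repetition levels.

```go
package main

import "fmt"

// assemble pairs definition levels with the non-null values stored in a page.
// A level equal to maxDef means the next stored value belongs to that row;
// any lower level means the row is null and consumes no stored value.
func assemble(defs []uint8, vals []int32, maxDef uint8) ([]*int32, error) {
	out := make([]*int32, 0, len(defs))
	next := 0
	for _, d := range defs {
		if d != maxDef {
			out = append(out, nil)
			continue
		}
		if next >= len(vals) {
			return nil, fmt.Errorf("definition levels expect more than %d values", len(vals))
		}
		v := vals[next]
		out = append(out, &v)
		next++
	}
	return out, nil
}

func main() {
	rows, err := assemble([]uint8{1, 0, 1}, []int32{10, 20}, 1)
	if err != nil {
		panic(err)
	}
	for _, r := range rows {
		if r == nil {
			fmt.Println("null")
		} else {
			fmt.Println(*r)
		}
	}
}
```

With three definition levels and two stored values, the middle row comes back as nil, which is why an optional field maps naturally to a pointer in Go.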
func (*OptionalField) DoWrite ¶ added in v0.0.6
func (f *OptionalField) DoWrite(w io.Writer, meta *Metadata, vals []byte, count int, stats Stats) error
DoWrite is called by all optional field types to write the definition levels and raw data to the io.Writer
func (*OptionalField) Name ¶ added in v0.0.6
func (f *OptionalField) Name() string
Name returns the column name of this field
func (*OptionalField) Path ¶ added in v0.2.0
func (f *OptionalField) Path() []string
Path returns the path of this field
func (*OptionalField) Values ¶ added in v0.0.6
func (f *OptionalField) Values() int
Values reads the definition levels and uses them to return the values from the page data.
type Page ¶ added in v0.1.0
type Page struct {
	// N is the number of values in the ColumnChunk
	N      int
	Size   int
	Offset int64
	Codec  sch.CompressionCodec
}
Page keeps track of metadata for each ColumnChunk
type RepetitionType ¶ added in v0.5.0
type RepetitionType int
RepetitionType is an enum of the possible parquet repetition types
const (
	Unseen   RepetitionType = -1
	Required RepetitionType = 0
	Optional RepetitionType = 1
	Repeated RepetitionType = 2
)
type RepetitionTypes ¶ added in v0.5.0
type RepetitionTypes []RepetitionType
func (RepetitionTypes) MaxDef ¶ added in v0.5.0
func (r RepetitionTypes) MaxDef() uint8
MaxDef returns the largest definition level
func (RepetitionTypes) MaxRep ¶ added in v0.5.0
func (r RepetitionTypes) MaxRep() uint8
MaxRep returns the largest repetition level
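As a rough illustration of what these two levels mean, the sketch below applies the standard Parquet rule to a column path: the maximum definition level counts the path elements that can be absent (optional or repeated), and the maximum repetition level counts the repeated ones. The lowercase names are hypothetical stand-ins; whether MaxDef and MaxRep are implemented exactly this way, including how Unseen is treated, is an assumption.

```go
package main

import "fmt"

// repetitionType is a local stand-in mirroring the RepetitionType values.
type repetitionType int

const (
	required repetitionType = 0
	optional repetitionType = 1
	repeated repetitionType = 2
)

// maxDef counts the path elements that may be absent (optional or repeated).
func maxDef(path []repetitionType) uint8 {
	var n uint8
	for _, rt := range path {
		if rt == optional || rt == repeated {
			n++
		}
	}
	return n
}

// maxRep counts the repeated elements in the path.
func maxRep(path []repetitionType) uint8 {
	var n uint8
	for _, rt := range path {
		if rt == repeated {
			n++
		}
	}
	return n
}

func main() {
	// A column nested as optional -> repeated -> required.
	path := []repetitionType{optional, repeated, required}
	fmt.Println(maxDef(path), maxRep(path)) // prints: 2 1
}
```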
type RequiredField ¶ added in v0.0.6
type RequiredField struct {
// contains filtered or unexported fields
}
RequiredField writes the raw data for required columns
func NewRequiredField ¶ added in v0.0.6
func NewRequiredField(pth []string, opts ...func(*RequiredField)) RequiredField
NewRequiredField creates a required field.
func (*RequiredField) DoRead ¶ added in v0.0.6
func (f *RequiredField) DoRead(r io.ReadSeeker, pg Page) (io.Reader, []int, error)
DoRead reads the actual raw data.
func (*RequiredField) DoWrite ¶ added in v0.0.6
func (f *RequiredField) DoWrite(w io.Writer, meta *Metadata, vals []byte, count int, stats Stats) error
DoWrite writes the actual raw data.
func (*RequiredField) Name ¶ added in v0.0.6
func (f *RequiredField) Name() string
Name returns the column name of this field
func (*RequiredField) Path ¶ added in v0.2.0
func (f *RequiredField) Path() []string
Path returns the path of this field
type RowGroup ¶ added in v0.0.6
type RowGroup struct {
	Rows int64
	// contains filtered or unexported fields
}
RowGroup wraps schema.RowGroup and adds accounting functions that keep track of the number of rows written, the byte size, etc.
func (*RowGroup) Columns ¶ added in v0.0.6
func (r *RowGroup) Columns() []*sch.ColumnChunk
Columns returns the Columns of the row group.