arrow

package
v0.0.8
Published: Jul 21, 2023 License: Apache-2.0, Apache-2.0, BSD-2-Clause, + 9 more Imports: 16 Imported by: 0

README

Apache Arrow for Go

Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and inter-process communication.

Reference Counting

The library uses reference counting to track when memory buffers are no longer in use. This allows Arrow to update resource accounting, pool memory, and track overall memory usage as objects are created and released. Types expose two methods to deal with this pattern: the Retain method increases the reference count by 1, and the Release method reduces it by 1. Once the reference count of an object reaches zero, any associated object is freed. Retain and Release are safe to call from multiple goroutines.

When to call Retain / Release?
  • If you are passed an object and wish to take ownership of it, you must call Retain. You must later pair this with a call to Release when you no longer need the object. "Taking ownership" typically means you wish to access the object outside the scope of the current function call.

  • You own any object you create via functions whose name begins with New or Copy or when receiving an object over a channel. Therefore you must call Release once you no longer need the object.

  • If you send an object over a channel, you must call Retain before sending it as the receiver is assumed to own the object and will later call Release when it no longer needs the object.
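The ownership rules above can be sketched with a small reference-counted type. This is only an illustration built on the standard library: the buffer type, newBuffer and refCount are hypothetical names, not part of the arrow package, and the real types may manage their memory differently.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// buffer is a hypothetical stand-in for a reference-counted Arrow object.
type buffer struct {
	refs int64
	data []byte
}

// newBuffer returns a buffer owned by the caller (reference count 1),
// mirroring the rule that New* functions hand ownership to the caller.
func newBuffer(n int) *buffer {
	return &buffer{refs: 1, data: make([]byte, n)}
}

// Retain increases the reference count by 1.
func (b *buffer) Retain() { atomic.AddInt64(&b.refs, 1) }

// Release decreases the reference count by 1 and drops the data at zero.
func (b *buffer) Release() {
	if atomic.AddInt64(&b.refs, -1) == 0 {
		b.data = nil
	}
}

// refCount reports the current reference count.
func (b *buffer) refCount() int64 { return atomic.LoadInt64(&b.refs) }

func main() {
	buf := newBuffer(16) // created via a New-style function: we own it
	ch := make(chan *buffer, 1)

	buf.Retain() // the receiver is assumed to own what it receives
	ch <- buf

	done := make(chan struct{})
	go func() {
		defer close(done)
		got := <-ch
		defer got.Release() // the receiver releases its own reference
		fmt.Println("receiver sees", len(got.data), "bytes")
	}()
	<-done

	buf.Release() // the creator releases its reference
	fmt.Println("refs after both releases:", buf.refCount())
}
```

Each Retain is paired with exactly one Release, so the count returns to zero only after both the sender and the receiver are done with the object.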

Performance

The arrow package makes extensive use of c2goasm to leverage LLVM's advanced optimizer and generate Plan 9 assembly functions from C/C++ code. The arrow package can be compiled without these optimizations using the noasm build tag. Alternatively, an environment variable can be set to select which architecture optimizations are used at runtime. See the cpu package README for a description of this environment variable.

Example Usage

The following benchmarks demonstrate summing an array of 8192 values using various optimizations.

Disable no architecture optimizations (thus using AVX2):

$ INTEL_DISABLE_EXT=NONE go test -bench=8192 -run=. ./math
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow/go/arrow/math
BenchmarkFloat64Funcs_Sum_8192-8   	 2000000	       687 ns/op	95375.41 MB/s
BenchmarkInt64Funcs_Sum_8192-8     	 2000000	       719 ns/op	91061.06 MB/s
BenchmarkUint64Funcs_Sum_8192-8    	 2000000	       691 ns/op	94797.29 MB/s
PASS
ok  	github.com/apache/arrow/go/arrow/math	6.444s

NOTE: NONE is simply ignored, thus enabling optimizations for AVX2 and SSE4


Disable AVX2 architecture optimizations:

$ INTEL_DISABLE_EXT=AVX2 go test -bench=8192 -run=. ./math
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow/go/arrow/math
BenchmarkFloat64Funcs_Sum_8192-8   	 1000000	      1912 ns/op	34263.63 MB/s
BenchmarkInt64Funcs_Sum_8192-8     	 1000000	      1392 ns/op	47065.57 MB/s
BenchmarkUint64Funcs_Sum_8192-8    	 1000000	      1405 ns/op	46636.41 MB/s
PASS
ok  	github.com/apache/arrow/go/arrow/math	4.786s

Disable ALL architecture optimizations, thus using pure Go implementation:

$ INTEL_DISABLE_EXT=ALL go test -bench=8192 -run=. ./math
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow/go/arrow/math
BenchmarkFloat64Funcs_Sum_8192-8   	  200000	     10285 ns/op	6371.41 MB/s
BenchmarkInt64Funcs_Sum_8192-8     	  500000	      3892 ns/op	16837.37 MB/s
BenchmarkUint64Funcs_Sum_8192-8    	  500000	      3929 ns/op	16680.00 MB/s
PASS
ok  	github.com/apache/arrow/go/arrow/math	6.179s

Documentation

Overview

Package arrow provides an implementation of Apache Arrow.

Basics

The fundamental data structure in Arrow is an Array, which holds a sequence of values of the same type. An array consists of memory holding the data and an additional validity bitmap that indicates if the corresponding entry in the array is valid (not null). If the array has no null entries, it is possible to omit this bitmap.
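As a rough illustration of the validity bitmap, the sketch below checks individual bits in an LSB-first packed byte slice, the layout used by the FromMemory example in this documentation. The isValid helper and the sample data are made up for this sketch and are not part of the package.

```go
package main

import "fmt"

// isValid reports whether element i is marked valid (not null) in an
// LSB-first packed bitmap: bit i lives in byte i/8 at bit position i%8.
func isValid(bitmap []byte, i int) bool {
	return bitmap[i/8]&(1<<uint(i%8)) != 0
}

func main() {
	values := []int64{10, 20, 0, 40}
	validity := []byte{0x0b} // binary 00001011: elements 0, 1 and 3 are valid

	for i, v := range values {
		if isValid(validity, i) {
			fmt.Printf("values[%d] = %d\n", i, v)
		} else {
			fmt.Printf("values[%d] = (null)\n", i)
		}
	}
}
```

The slot of a null element still occupies space in the data buffer; only the bitmap says whether its contents are meaningful.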

Example (FixedSizeListArray)

This example shows how to create a FixedSizeList array. The resulting array should be:

[[0, 1, 2], (null), [3, 4, 5], [6, 7, 8], (null)]
package main

import (
	"fmt"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
)

func main() {
	pool := memory.NewGoAllocator()

	lb := array.NewFixedSizeListBuilder(pool, 3, arrow.PrimitiveTypes.Int64)
	defer lb.Release()

	vb := lb.ValueBuilder().(*array.Int64Builder)
	vb.Reserve(10)

	lb.Append(true)
	vb.Append(0)
	vb.Append(1)
	vb.Append(2)

	lb.AppendNull()
	vb.AppendValues([]int64{-1, -1, -1}, nil)

	lb.Append(true)
	vb.Append(3)
	vb.Append(4)
	vb.Append(5)

	lb.Append(true)
	vb.Append(6)
	vb.Append(7)
	vb.Append(8)

	lb.AppendNull()

	arr := lb.NewArray().(*array.FixedSizeList)
	defer arr.Release()

	fmt.Printf("NullN()   = %d\n", arr.NullN())
	fmt.Printf("Len()     = %d\n", arr.Len())
	fmt.Printf("Type()    = %v\n", arr.DataType())
	fmt.Printf("List      = %v\n", arr)

}
Output:

NullN()   = 2
Len()     = 5
Type()    = fixed_size_list<item: int64, nullable>[3]
List      = [[0 1 2] (null) [3 4 5] [6 7 8] (null)]
Example (Float64Slice)

This example shows how one can slice an array. The initial (float64) array is:

[1, 2, 3, (null), 4, 5]

and the sub-slice is:

[3, (null), 4]
package main

import (
	"fmt"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
)

func main() {
	pool := memory.NewGoAllocator()

	b := array.NewFloat64Builder(pool)
	defer b.Release()

	b.AppendValues(
		[]float64{1, 2, 3, -1, 4, 5},
		[]bool{true, true, true, false, true, true},
	)

	arr := b.NewFloat64Array()
	defer arr.Release()

	fmt.Printf("array = %v\n", arr)

	sli := array.NewSlice(arr, 2, 5).(*array.Float64)
	defer sli.Release()

	fmt.Printf("slice = %v\n", sli)

}
Output:

array = [1 2 3 (null) 4 5]
slice = [3 (null) 4]
Example (Float64Tensor2x5)
package main

import (
	"fmt"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/tensor"
)

func main() {
	pool := memory.NewGoAllocator()

	b := array.NewFloat64Builder(pool)
	defer b.Release()

	raw := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	b.AppendValues(raw, nil)

	arr := b.NewFloat64Array()
	defer arr.Release()

	f64 := tensor.NewFloat64(arr.Data(), []int64{2, 5}, nil, []string{"x", "y"})
	defer f64.Release()

	for _, i := range [][]int64{
		{0, 0},
		{0, 1},
		{0, 2},
		{0, 3},
		{0, 4},
		{1, 0},
		{1, 1},
		{1, 2},
		{1, 3},
		{1, 4},
	} {
		fmt.Printf("arr%v = %v\n", i, f64.Value(i))
	}

}
Output:

arr[0 0] = 1
arr[0 1] = 2
arr[0 2] = 3
arr[0 3] = 4
arr[0 4] = 5
arr[1 0] = 6
arr[1 1] = 7
arr[1 2] = 8
arr[1 3] = 9
arr[1 4] = 10
Example (Float64Tensor2x5ColMajor)
package main

import (
	"fmt"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/tensor"
)

func main() {
	pool := memory.NewGoAllocator()

	b := array.NewFloat64Builder(pool)
	defer b.Release()

	raw := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	b.AppendValues(raw, nil)

	arr := b.NewFloat64Array()
	defer arr.Release()

	f64 := tensor.NewFloat64(arr.Data(), []int64{2, 5}, []int64{8, 16}, []string{"x", "y"})
	defer f64.Release()

	for _, i := range [][]int64{
		{0, 0},
		{0, 1},
		{0, 2},
		{0, 3},
		{0, 4},
		{1, 0},
		{1, 1},
		{1, 2},
		{1, 3},
		{1, 4},
	} {
		fmt.Printf("arr%v = %v\n", i, f64.Value(i))
	}

}
Output:

arr[0 0] = 1
arr[0 1] = 3
arr[0 2] = 5
arr[0 3] = 7
arr[0 4] = 9
arr[1 0] = 2
arr[1 1] = 4
arr[1 2] = 6
arr[1 3] = 8
arr[1 4] = 10
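The difference between the two tensor examples comes down to the strides, which are expressed in bytes. The sketch below reproduces the index arithmetic with plain slices. It assumes that passing nil strides corresponds to row-major byte strides {40, 8} for a 2x5 float64 tensor, which is consistent with the output of the first example. The elementIndex helper is hypothetical.

```go
package main

import "fmt"

// elementIndex converts a multi-dimensional index into a position in the
// flat backing slice, given per-dimension strides in bytes and the element
// size in bytes.
func elementIndex(idx, strides []int64, elemSize int64) int64 {
	var off int64
	for d := range idx {
		off += idx[d] * strides[d]
	}
	return off / elemSize
}

func main() {
	raw := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	const f64Size = 8 // bytes per float64

	rowMajor := []int64{40, 8} // assumed default for shape {2, 5}
	colMajor := []int64{8, 16} // strides used in the column-major example

	idx := []int64{1, 2}
	fmt.Println("row-major arr[1 2] =", raw[elementIndex(idx, rowMajor, f64Size)])
	fmt.Println("col-major arr[1 2] =", raw[elementIndex(idx, colMajor, f64Size)])
}
```

With row-major strides the index {1, 2} lands on flat position 7 (value 8), while the column-major strides put it on flat position 5 (value 6), matching the two outputs above.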
Example (FromMemory)

This example demonstrates creating an array, sourcing the values and null bitmaps directly from byte slices. The null count is set to UnknownNullCount, instructing the array to calculate the null count from the bitmap when NullN is called.

package main

import (
	"fmt"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
)

func main() {
	// create LSB packed bits with the following pattern:
	// 01010011 11000101
	data := memory.NewBufferBytes([]byte{0xca, 0xa3})

	// create LSB packed validity (null) bitmap, where every 4th element is null:
	// 11101110 11101110
	nullBitmap := memory.NewBufferBytes([]byte{0x77, 0x77})

	// Create a boolean array and lazily determine NullN using UnknownNullCount
	bools := array.NewBoolean(16, data, nullBitmap, array.UnknownNullCount)
	defer bools.Release()

	// Show the null count
	fmt.Printf("NullN()  = %d\n", bools.NullN())

	// Enumerate the values.
	n := bools.Len()
	for i := 0; i < n; i++ {
		fmt.Printf("bools[%d] = ", i)
		if bools.IsNull(i) {
			fmt.Println("(null)")
		} else {
			fmt.Printf("%t\n", bools.Value(i))
		}
	}

}
Output:

NullN()  = 4
bools[0] = false
bools[1] = true
bools[2] = false
bools[3] = (null)
bools[4] = false
bools[5] = false
bools[6] = true
bools[7] = (null)
bools[8] = true
bools[9] = true
bools[10] = false
bools[11] = (null)
bools[12] = false
bools[13] = true
bools[14] = false
bools[15] = (null)
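To make the lazy null count concrete, here is a minimal sketch of deriving a null count from an LSB-first validity bitmap by counting set bits. The countNulls helper is hypothetical and only illustrates the idea; it is not the package's implementation.

```go
package main

import (
	"fmt"
	"math/bits"
)

// countNulls returns the number of null entries among the first n elements
// of an LSB-first validity bitmap, where a set bit means "valid".
func countNulls(bitmap []byte, n int) int {
	valid := 0
	full := n / 8
	for _, b := range bitmap[:full] {
		valid += bits.OnesCount8(b)
	}
	if rem := n % 8; rem > 0 {
		// Only the low rem bits of the last byte belong to the array.
		mask := byte(1)<<uint(rem) - 1
		valid += bits.OnesCount8(bitmap[full] & mask)
	}
	return n - valid
}

func main() {
	// Same validity bitmap as the example above: every 4th element is null.
	fmt.Println("nulls =", countNulls([]byte{0x77, 0x77}, 16))
}
```

Each 0x77 byte has six set bits, so 12 of the 16 elements are valid and 4 are null, in line with the NullN() value printed above.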
Example (ListArray)

This example shows how to create a List array. The resulting array should be:

[[0, 1, 2], (null), [3], [4, 5], [6, 7, 8], (null), [9]]
package main

import (
	"fmt"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
)

func main() {
	pool := memory.NewGoAllocator()

	lb := array.NewListBuilder(pool, arrow.PrimitiveTypes.Int64)
	defer lb.Release()

	vb := lb.ValueBuilder().(*array.Int64Builder)
	vb.Reserve(10)

	lb.Append(true)
	vb.Append(0)
	vb.Append(1)
	vb.Append(2)

	lb.AppendNull()

	lb.Append(true)
	vb.Append(3)

	lb.Append(true)
	vb.Append(4)
	vb.Append(5)

	lb.Append(true)
	vb.Append(6)
	vb.Append(7)
	vb.Append(8)

	lb.AppendNull()

	lb.Append(true)
	vb.Append(9)

	arr := lb.NewArray().(*array.List)
	defer arr.Release()

	arr.DataType().(*arrow.ListType).SetElemNullable(false)
	fmt.Printf("NullN()   = %d\n", arr.NullN())
	fmt.Printf("Len()     = %d\n", arr.Len())
	fmt.Printf("Offsets() = %v\n", arr.Offsets())
	fmt.Printf("Type()    = %v\n", arr.DataType())

	offsets := arr.Offsets()[1:]

	varr := arr.ListValues().(*array.Int64)

	pos := 0
	for i := 0; i < arr.Len(); i++ {
		if !arr.IsValid(i) {
			fmt.Printf("List[%d]   = (null)\n", i)
			continue
		}
		fmt.Printf("List[%d]   = [", i)
		for j := pos; j < int(offsets[i]); j++ {
			if j != pos {
				fmt.Printf(", ")
			}
			fmt.Printf("%v", varr.Value(j))
		}
		pos = int(offsets[i])
		fmt.Printf("]\n")
	}
	fmt.Printf("List      = %v\n", arr)

}
Output:

NullN()   = 2
Len()     = 7
Offsets() = [0 3 3 4 6 9 9 10]
Type()    = list<item: int64>
List[0]   = [0, 1, 2]
List[1]   = (null)
List[2]   = [3]
List[3]   = [4, 5]
List[4]   = [6, 7, 8]
List[5]   = (null)
List[6]   = [9]
List      = [[0 1 2] (null) [3] [4 5] [6 7 8] (null) [9]]
Example (MapArray)

This example demonstrates how to create a Map Array. The resulting array should be:

[{["ab" "cd" "ef" "gh"] [1 2 3 4]} (null) {["ab" "cd" "ef" "gh"] [(null) 2 5 1]}]
package main

import (
	"fmt"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
)

func main() {
	pool := memory.NewGoAllocator()
	mb := array.NewMapBuilder(pool, arrow.BinaryTypes.String, arrow.PrimitiveTypes.Int16, false)
	defer mb.Release()

	kb := mb.KeyBuilder().(*array.StringBuilder)
	ib := mb.ItemBuilder().(*array.Int16Builder)

	keys := []string{"ab", "cd", "ef", "gh"}

	mb.Append(true)
	kb.AppendValues(keys, nil)
	ib.AppendValues([]int16{1, 2, 3, 4}, nil)

	mb.AppendNull()

	mb.Append(true)
	kb.AppendValues(keys, nil)
	ib.AppendValues([]int16{-1, 2, 5, 1}, []bool{false, true, true, true})

	arr := mb.NewMapArray()
	defer arr.Release()

	fmt.Printf("NullN() = %d\n", arr.NullN())
	fmt.Printf("Len()   = %d\n", arr.Len())

	offsets := arr.Offsets()
	keyArr := arr.Keys().(*array.String)
	itemArr := arr.Items().(*array.Int16)

	for i := 0; i < arr.Len(); i++ {
		if arr.IsNull(i) {
			fmt.Printf("Map[%d] = (null)\n", i)
			continue
		}

		fmt.Printf("Map[%d] = {", i)
		for j := offsets[i]; j < offsets[i+1]; j++ {
			if j != offsets[i] {
				fmt.Printf(", ")
			}
			fmt.Printf("%v => ", keyArr.Value(int(j)))
			if itemArr.IsValid(int(j)) {
				fmt.Printf("%v", itemArr.Value(int(j)))
			} else {
				fmt.Printf("(null)")
			}
		}
		fmt.Printf("}\n")
	}
	fmt.Printf("Map    = %v\n", arr)

}
Output:

NullN() = 1
Len()   = 3
Map[0] = {ab => 1, cd => 2, ef => 3, gh => 4}
Map[1] = (null)
Map[2] = {ab => (null), cd => 2, ef => 5, gh => 1}
Map    = [{["ab" "cd" "ef" "gh"] [1 2 3 4]} (null) {["ab" "cd" "ef" "gh"] [(null) 2 5 1]}]
Example (Minimal)

This example demonstrates how to build an array of int64 values using a builder and Append.

package main

import (
	"fmt"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
)

func main() {
	// Create an allocator.
	pool := memory.NewGoAllocator()

	// Create an int64 array builder.
	builder := array.NewInt64Builder(pool)
	defer builder.Release()

	builder.Append(1)
	builder.Append(2)
	builder.Append(3)
	builder.AppendNull()
	builder.Append(5)
	builder.Append(6)
	builder.Append(7)
	builder.Append(8)

	// Finish building the int64 array and reset the builder.
	ints := builder.NewInt64Array()
	defer ints.Release()

	// Enumerate the values.
	for i, v := range ints.Int64Values() {
		fmt.Printf("ints[%d] = ", i)
		if ints.IsNull(i) {
			fmt.Println("(null)")
		} else {
			fmt.Println(v)
		}
	}
	fmt.Printf("ints = %v\n", ints)

}
Output:

ints[0] = 1
ints[1] = 2
ints[2] = 3
ints[3] = (null)
ints[4] = 5
ints[5] = 6
ints[6] = 7
ints[7] = 8
ints = [1 2 3 (null) 5 6 7 8]
Example (Record)
package main

import (
	"fmt"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
)

func main() {
	pool := memory.NewGoAllocator()

	schema := arrow.NewSchema(
		[]arrow.Field{
			{Name: "f1-i32", Type: arrow.PrimitiveTypes.Int32},
			{Name: "f2-f64", Type: arrow.PrimitiveTypes.Float64},
		},
		nil,
	)

	b := array.NewRecordBuilder(pool, schema)
	defer b.Release()

	b.Field(0).(*array.Int32Builder).AppendValues([]int32{1, 2, 3, 4, 5, 6}, nil)
	b.Field(0).(*array.Int32Builder).AppendValues([]int32{7, 8, 9, 10}, []bool{true, true, false, true})
	b.Field(1).(*array.Float64Builder).AppendValues([]float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, nil)

	rec := b.NewRecord()
	defer rec.Release()

	for i, col := range rec.Columns() {
		fmt.Printf("column[%d] %q: %v\n", i, rec.ColumnName(i), col)
	}

}
Output:

column[0] "f1-i32": [1 2 3 4 5 6 7 8 (null) 10]
column[1] "f2-f64": [1 2 3 4 5 6 7 8 9 10]
Example (RecordReader)
package main

import (
	"fmt"
	"log"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
)

func main() {
	pool := memory.NewGoAllocator()

	schema := arrow.NewSchema(
		[]arrow.Field{
			{Name: "f1-i32", Type: arrow.PrimitiveTypes.Int32},
			{Name: "f2-f64", Type: arrow.PrimitiveTypes.Float64},
		},
		nil,
	)

	b := array.NewRecordBuilder(pool, schema)
	defer b.Release()

	b.Field(0).(*array.Int32Builder).AppendValues([]int32{1, 2, 3, 4, 5, 6}, nil)
	b.Field(0).(*array.Int32Builder).AppendValues([]int32{7, 8, 9, 10}, []bool{true, true, false, true})
	b.Field(1).(*array.Float64Builder).AppendValues([]float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, nil)

	rec1 := b.NewRecord()
	defer rec1.Release()

	b.Field(0).(*array.Int32Builder).AppendValues([]int32{11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, nil)
	b.Field(1).(*array.Float64Builder).AppendValues([]float64{11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, nil)

	rec2 := b.NewRecord()
	defer rec2.Release()

	itr, err := array.NewRecordReader(schema, []array.Record{rec1, rec2})
	if err != nil {
		log.Fatal(err)
	}
	defer itr.Release()

	n := 0
	for itr.Next() {
		rec := itr.Record()
		for i, col := range rec.Columns() {
			fmt.Printf("rec[%d][%q]: %v\n", n, rec.ColumnName(i), col)
		}
		n++
	}

}
Output:

rec[0]["f1-i32"]: [1 2 3 4 5 6 7 8 (null) 10]
rec[0]["f2-f64"]: [1 2 3 4 5 6 7 8 9 10]
rec[1]["f1-i32"]: [11 12 13 14 15 16 17 18 19 20]
rec[1]["f2-f64"]: [11 12 13 14 15 16 17 18 19 20]
Example (StructArray)

This example shows how to create a Struct array. The resulting array should be:

[{'joe', 1}, {null, 2}, null, {'mark', 4}]
package main

import (
	"fmt"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
)

func main() {
	pool := memory.NewGoAllocator()

	dtype := arrow.StructOf([]arrow.Field{
		{Name: "f1", Type: arrow.ListOf(arrow.PrimitiveTypes.Uint8)},
		{Name: "f2", Type: arrow.PrimitiveTypes.Int32},
	}...)

	sb := array.NewStructBuilder(pool, dtype)
	defer sb.Release()

	f1b := sb.FieldBuilder(0).(*array.ListBuilder)
	f1vb := f1b.ValueBuilder().(*array.Uint8Builder)
	f2b := sb.FieldBuilder(1).(*array.Int32Builder)

	sb.Reserve(4)
	f1vb.Reserve(7)
	f2b.Reserve(3)

	sb.Append(true)
	f1b.Append(true)
	f1vb.AppendValues([]byte("joe"), nil)
	f2b.Append(1)

	sb.Append(true)
	f1b.AppendNull()
	f2b.Append(2)

	sb.AppendNull()

	sb.Append(true)
	f1b.Append(true)
	f1vb.AppendValues([]byte("mark"), nil)
	f2b.Append(4)

	arr := sb.NewArray().(*array.Struct)
	defer arr.Release()

	fmt.Printf("NullN() = %d\n", arr.NullN())
	fmt.Printf("Len()   = %d\n", arr.Len())

	list := arr.Field(0).(*array.List)
	offsets := list.Offsets()

	varr := list.ListValues().(*array.Uint8)
	ints := arr.Field(1).(*array.Int32)

	for i := 0; i < arr.Len(); i++ {
		if !arr.IsValid(i) {
			fmt.Printf("Struct[%d] = (null)\n", i)
			continue
		}
		fmt.Printf("Struct[%d] = [", i)
		pos := int(offsets[i])
		switch {
		case list.IsValid(pos):
			fmt.Printf("[")
			for j := offsets[i]; j < offsets[i+1]; j++ {
				if j != offsets[i] {
					fmt.Printf(", ")
				}
				fmt.Printf("%v", string(varr.Value(int(j))))
			}
			fmt.Printf("], ")
		default:
			fmt.Printf("(null), ")
		}
		fmt.Printf("%d]\n", ints.Value(i))
	}

}
Output:

NullN() = 1
Len()   = 4
Struct[0] = [[j, o, e], 1]
Struct[1] = [[], 2]
Struct[2] = (null)
Struct[3] = [[m, a, r, k], 4]
Example (Table)
package main

import (
	"fmt"

	"github.com/aliyun/aliyun-odps-go-sdk/arrow"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/array"
	"github.com/aliyun/aliyun-odps-go-sdk/arrow/memory"
)

func main() {
	pool := memory.NewGoAllocator()

	schema := arrow.NewSchema(
		[]arrow.Field{
			{Name: "f1-i32", Type: arrow.PrimitiveTypes.Int32},
			{Name: "f2-f64", Type: arrow.PrimitiveTypes.Float64},
		},
		nil,
	)

	b := array.NewRecordBuilder(pool, schema)
	defer b.Release()

	b.Field(0).(*array.Int32Builder).AppendValues([]int32{1, 2, 3, 4, 5, 6}, nil)
	b.Field(0).(*array.Int32Builder).AppendValues([]int32{7, 8, 9, 10}, []bool{true, true, false, true})
	b.Field(1).(*array.Float64Builder).AppendValues([]float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, nil)

	rec1 := b.NewRecord()
	defer rec1.Release()

	b.Field(0).(*array.Int32Builder).AppendValues([]int32{11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, nil)
	b.Field(1).(*array.Float64Builder).AppendValues([]float64{11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, nil)

	rec2 := b.NewRecord()
	defer rec2.Release()

	tbl := array.NewTableFromRecords(schema, []array.Record{rec1, rec2})
	defer tbl.Release()

	tr := array.NewTableReader(tbl, 5)
	defer tr.Release()

	n := 0
	for tr.Next() {
		rec := tr.Record()
		for i, col := range rec.Columns() {
			fmt.Printf("rec[%d][%q]: %v\n", n, rec.ColumnName(i), col)
		}
		n++
	}

}
Output:

rec[0]["f1-i32"]: [1 2 3 4 5]
rec[0]["f2-f64"]: [1 2 3 4 5]
rec[1]["f1-i32"]: [6 7 8 (null) 10]
rec[1]["f2-f64"]: [6 7 8 9 10]
rec[2]["f1-i32"]: [11 12 13 14 15]
rec[2]["f2-f64"]: [11 12 13 14 15]
rec[3]["f1-i32"]: [16 17 18 19 20]
rec[3]["f2-f64"]: [16 17 18 19 20]

Constants

const (
	// Date32SizeBytes specifies the number of bytes required to store a single Date32 in memory
	Date32SizeBytes = int(unsafe.Sizeof(Date32(0)))
)
const (
	// Date64SizeBytes specifies the number of bytes required to store a single Date64 in memory
	Date64SizeBytes = int(unsafe.Sizeof(Date64(0)))
)
const (
	// DayTimeIntervalSizeBytes specifies the number of bytes required to store a single DayTimeInterval in memory
	DayTimeIntervalSizeBytes = int(unsafe.Sizeof(DayTimeInterval{}))
)
const (
	// Decimal128SizeBytes specifies the number of bytes required to store a single decimal128 in memory
	Decimal128SizeBytes = int(unsafe.Sizeof(decimal128.Num{}))
)
const (
	// DurationSizeBytes specifies the number of bytes required to store a single Duration in memory
	DurationSizeBytes = int(unsafe.Sizeof(Duration(0)))
)
const (
	// Float16SizeBytes specifies the number of bytes required to store a single float16 in memory
	Float16SizeBytes = int(unsafe.Sizeof(uint16(0)))
)
const (
	// Float32SizeBytes specifies the number of bytes required to store a single float32 in memory
	Float32SizeBytes = int(unsafe.Sizeof(float32(0)))
)
const (
	// Float64SizeBytes specifies the number of bytes required to store a single float64 in memory
	Float64SizeBytes = int(unsafe.Sizeof(float64(0)))
)
const (
	// Int16SizeBytes specifies the number of bytes required to store a single int16 in memory
	Int16SizeBytes = int(unsafe.Sizeof(int16(0)))
)
const (
	// Int32SizeBytes specifies the number of bytes required to store a single int32 in memory
	Int32SizeBytes = int(unsafe.Sizeof(int32(0)))
)
const (
	// Int64SizeBytes specifies the number of bytes required to store a single int64 in memory
	Int64SizeBytes = int(unsafe.Sizeof(int64(0)))
)
const (
	// Int8SizeBytes specifies the number of bytes required to store a single int8 in memory
	Int8SizeBytes = int(unsafe.Sizeof(int8(0)))
)
const (
	// MonthDayNanoIntervalSizeBytes specifies the number of bytes required to store a single MonthDayNanoInterval in memory
	MonthDayNanoIntervalSizeBytes = int(unsafe.Sizeof(MonthDayNanoInterval{}))
)
const (
	// MonthIntervalSizeBytes specifies the number of bytes required to store a single MonthInterval in memory
	MonthIntervalSizeBytes = int(unsafe.Sizeof(MonthInterval(0)))
)
const (
	// Time32SizeBytes specifies the number of bytes required to store a single Time32 in memory
	Time32SizeBytes = int(unsafe.Sizeof(Time32(0)))
)
const (
	// Time64SizeBytes specifies the number of bytes required to store a single Time64 in memory
	Time64SizeBytes = int(unsafe.Sizeof(Time64(0)))
)
const (
	// TimestampSizeBytes specifies the number of bytes required to store a single Timestamp in memory
	TimestampSizeBytes = int(unsafe.Sizeof(Timestamp(0)))
)
const (
	// Uint16SizeBytes specifies the number of bytes required to store a single uint16 in memory
	Uint16SizeBytes = int(unsafe.Sizeof(uint16(0)))
)
const (
	// Uint32SizeBytes specifies the number of bytes required to store a single uint32 in memory
	Uint32SizeBytes = int(unsafe.Sizeof(uint32(0)))
)
const (
	// Uint64SizeBytes specifies the number of bytes required to store a single uint64 in memory
	Uint64SizeBytes = int(unsafe.Sizeof(uint64(0)))
)
const (
	// Uint8SizeBytes specifies the number of bytes required to store a single uint8 in memory
	Uint8SizeBytes = int(unsafe.Sizeof(uint8(0)))
)
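The SizeBytes constants above all follow the same pattern: the compile-time size of one element, taken with unsafe.Sizeof. The sketch below repeats that pattern on local stand-in types that mirror the definitions shown on this page (Date32, Date64, DayTimeInterval) and uses the result to size a buffer. The bufferBytes helper is hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Local stand-ins mirroring the type definitions shown in this documentation.
type Date32 int32
type Date64 int64
type DayTimeInterval struct {
	Days         int32
	Milliseconds int32
}

const (
	date32SizeBytes          = int(unsafe.Sizeof(Date32(0)))
	date64SizeBytes          = int(unsafe.Sizeof(Date64(0)))
	dayTimeIntervalSizeBytes = int(unsafe.Sizeof(DayTimeInterval{}))
)

// bufferBytes returns how many bytes a data buffer needs to hold n
// fixed-width elements of the given size.
func bufferBytes(n, elemSize int) int { return n * elemSize }

func main() {
	fmt.Println("Date32:", date32SizeBytes, "bytes")
	fmt.Println("Date64:", date64SizeBytes, "bytes")
	fmt.Println("DayTimeInterval:", dayTimeIntervalSizeBytes, "bytes")
	fmt.Println("1000 Date64 values need", bufferBytes(1000, date64SizeBytes), "bytes")
}
```

Because unsafe.Sizeof yields a constant for these types, the sizes can be used anywhere a constant expression is required, for example in array lengths.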

Variables

var (
	MonthIntervalTraits        monthTraits
	DayTimeIntervalTraits      daytimeTraits
	MonthDayNanoIntervalTraits monthDayNanoTraits
)
var (
	Int64Traits     int64Traits
	Uint64Traits    uint64Traits
	Float64Traits   float64Traits
	Int32Traits     int32Traits
	Uint32Traits    uint32Traits
	Float32Traits   float32Traits
	Int16Traits     int16Traits
	Uint16Traits    uint16Traits
	Int8Traits      int8Traits
	Uint8Traits     uint8Traits
	TimestampTraits timestampTraits
	Time32Traits    time32Traits
	Time64Traits    time64Traits
	Date32Traits    date32Traits
	Date64Traits    date64Traits
	DurationTraits  durationTraits
)
var (
	BinaryTypes = struct {
		Binary BinaryDataType
		String BinaryDataType
	}{
		Binary: &BinaryType{},
		String: &StringType{},
	}
)
var BooleanTraits booleanTraits
var Decimal128Traits decimal128Traits

Decimal128 traits

var (
	FixedWidthTypes = struct {
		Boolean              FixedWidthDataType
		Date32               FixedWidthDataType
		Date64               FixedWidthDataType
		DayTimeInterval      FixedWidthDataType
		Duration_s           FixedWidthDataType
		Duration_ms          FixedWidthDataType
		Duration_us          FixedWidthDataType
		Duration_ns          FixedWidthDataType
		Float16              FixedWidthDataType
		MonthInterval        FixedWidthDataType
		Time32s              FixedWidthDataType
		Time32ms             FixedWidthDataType
		Time64us             FixedWidthDataType
		Time64ns             FixedWidthDataType
		Timestamp_s          FixedWidthDataType
		Timestamp_ms         FixedWidthDataType
		Timestamp_us         FixedWidthDataType
		Timestamp_ns         FixedWidthDataType
		MonthDayNanoInterval FixedWidthDataType
	}{
		Boolean:              &BooleanType{},
		Date32:               &Date32Type{},
		Date64:               &Date64Type{},
		DayTimeInterval:      &DayTimeIntervalType{},
		Duration_s:           &DurationType{Unit: Second},
		Duration_ms:          &DurationType{Unit: Millisecond},
		Duration_us:          &DurationType{Unit: Microsecond},
		Duration_ns:          &DurationType{Unit: Nanosecond},
		Float16:              &Float16Type{},
		MonthInterval:        &MonthIntervalType{},
		Time32s:              &Time32Type{Unit: Second},
		Time32ms:             &Time32Type{Unit: Millisecond},
		Time64us:             &Time64Type{Unit: Microsecond},
		Time64ns:             &Time64Type{Unit: Nanosecond},
		Timestamp_s:          &TimestampType{Unit: Second, TimeZone: "UTC"},
		Timestamp_ms:         &TimestampType{Unit: Millisecond, TimeZone: "UTC"},
		Timestamp_us:         &TimestampType{Unit: Microsecond, TimeZone: "UTC"},
		Timestamp_ns:         &TimestampType{Unit: Nanosecond, TimeZone: "UTC"},
		MonthDayNanoInterval: &MonthDayNanoIntervalType{},
	}
)
var Float16Traits float16Traits

Float16 traits

var (
	PrimitiveTypes = struct {
		Int8    DataType
		Int16   DataType
		Int32   DataType
		Int64   DataType
		Uint8   DataType
		Uint16  DataType
		Uint32  DataType
		Uint64  DataType
		Float32 DataType
		Float64 DataType
		Date32  DataType
		Date64  DataType
	}{

		Int8:    &Int8Type{},
		Int16:   &Int16Type{},
		Int32:   &Int32Type{},
		Int64:   &Int64Type{},
		Uint8:   &Uint8Type{},
		Uint16:  &Uint16Type{},
		Uint32:  &Uint32Type{},
		Uint64:  &Uint64Type{},
		Float32: &Float32Type{},
		Float64: &Float64Type{},
		Date32:  &Date32Type{},
		Date64:  &Date64Type{},
	}
)

Functions

func HashType

func HashType(seed maphash.Seed, dt DataType) uint64

func RegisterExtensionType

func RegisterExtensionType(typ ExtensionType) error

RegisterExtensionType registers the provided ExtensionType, using the result of ExtensionName as the key under which the type is registered. If a type with the same name is already registered, it returns an error saying so; otherwise it returns nil once the type has been registered. This function is safe to call from multiple goroutines simultaneously.

func TypeEqual

func TypeEqual(left, right DataType, opts ...TypeEqualOption) bool

TypeEqual checks if two DataType are the same, optionally checking metadata equality for STRUCT types.

func UnregisterExtensionType

func UnregisterExtensionType(typName string) error

UnregisterExtensionType removes the type with the given name from the registry. Incoming messages that use that type are then expressed with their metadata and underlying storage type instead of the extension type, which is no longer known. This function is safe to call from multiple goroutines simultaneously.
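The register and unregister semantics can be pictured as a name-keyed map guarded by a mutex. The sketch below is not the package's implementation: the registry type and its methods are hypothetical, the stored value is simplified to a storage type name, and returning an error when unregistering an unknown name is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical goroutine-safe, name-keyed type registry.
type registry struct {
	mu    sync.Mutex
	types map[string]string // extension name -> storage type name (simplified)
}

func newRegistry() *registry {
	return &registry{types: make(map[string]string)}
}

// register adds name to the registry and fails if it is already present.
func (r *registry) register(name, storage string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.types[name]; ok {
		return fmt.Errorf("type %q is already registered", name)
	}
	r.types[name] = storage
	return nil
}

// unregister removes name from the registry. Failing on an unknown name
// is an assumption made for this sketch.
func (r *registry) unregister(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.types[name]; !ok {
		return fmt.Errorf("type %q is not registered", name)
	}
	delete(r.types, name)
	return nil
}

func main() {
	reg := newRegistry()
	fmt.Println(reg.register("uuid", "fixed_size_binary[16]"))
	fmt.Println(reg.register("uuid", "fixed_size_binary[16]"))
	fmt.Println(reg.unregister("uuid"))
}
```

The second register call reports a duplicate, while the first register and the unregister both return nil.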

Types

type BinaryDataType

type BinaryDataType interface {
	DataType
	// contains filtered or unexported methods
}

type BinaryType

type BinaryType struct{}

func (*BinaryType) Fingerprint

func (t *BinaryType) Fingerprint() string

func (*BinaryType) ID

func (t *BinaryType) ID() Type

func (*BinaryType) Name

func (t *BinaryType) Name() string

func (*BinaryType) String

func (t *BinaryType) String() string

type BooleanType

type BooleanType struct{}

func (*BooleanType) BitWidth

func (t *BooleanType) BitWidth() int

BitWidth returns the number of bits required to store a single element of this data type in memory.

func (*BooleanType) Fingerprint

func (t *BooleanType) Fingerprint() string

func (*BooleanType) ID

func (t *BooleanType) ID() Type

func (*BooleanType) Name

func (t *BooleanType) Name() string

func (*BooleanType) String

func (t *BooleanType) String() string

type DataType

type DataType interface {
	ID() Type
	// Name is name of the data type.
	Name() string
	Fingerprint() string
}

DataType is the representation of an Arrow type.

type Date32

type Date32 int32

type Date32Type

type Date32Type struct{}

func (*Date32Type) BitWidth

func (t *Date32Type) BitWidth() int

func (*Date32Type) Fingerprint

func (t *Date32Type) Fingerprint() string

func (*Date32Type) ID

func (t *Date32Type) ID() Type

func (*Date32Type) Name

func (t *Date32Type) Name() string

func (*Date32Type) String

func (t *Date32Type) String() string

type Date64

type Date64 int64

type Date64Type

type Date64Type struct{}

func (*Date64Type) BitWidth

func (t *Date64Type) BitWidth() int

func (*Date64Type) Fingerprint

func (t *Date64Type) Fingerprint() string

func (*Date64Type) ID

func (t *Date64Type) ID() Type

func (*Date64Type) Name

func (t *Date64Type) Name() string

func (*Date64Type) String

func (t *Date64Type) String() string

type DayTimeInterval

type DayTimeInterval struct {
	Days         int32 `json:"days"`
	Milliseconds int32 `json:"milliseconds"`
}

DayTimeInterval represents a number of days and milliseconds (fraction of day).

type DayTimeIntervalType

type DayTimeIntervalType struct{}

DayTimeIntervalType is encoded as a pair of 32-bit signed integers, representing a number of days and milliseconds (fraction of day).

func (*DayTimeIntervalType) BitWidth

func (t *DayTimeIntervalType) BitWidth() int

BitWidth returns the number of bits required to store a single element of this data type in memory.

func (*DayTimeIntervalType) Fingerprint

func (*DayTimeIntervalType) Fingerprint() string

func (*DayTimeIntervalType) ID

func (*DayTimeIntervalType) ID() Type

func (*DayTimeIntervalType) Name

func (*DayTimeIntervalType) Name() string

func (*DayTimeIntervalType) String

func (*DayTimeIntervalType) String() string

type Decimal128Type

type Decimal128Type struct {
	Precision int32
	Scale     int32
}

Decimal128Type represents a fixed-size 128-bit decimal type.

func (*Decimal128Type) BitWidth

func (*Decimal128Type) BitWidth() int

func (*Decimal128Type) Fingerprint

func (t *Decimal128Type) Fingerprint() string

func (*Decimal128Type) ID

func (*Decimal128Type) ID() Type

func (*Decimal128Type) Name

func (*Decimal128Type) Name() string

func (*Decimal128Type) String

func (t *Decimal128Type) String() string

type Duration

type Duration int64

type DurationType

type DurationType struct {
	Unit TimeUnit
}

DurationType is encoded as a 64-bit signed integer, representing an amount of elapsed time without any relation to a calendar artifact.

func (*DurationType) BitWidth

func (*DurationType) BitWidth() int

func (*DurationType) Fingerprint

func (t *DurationType) Fingerprint() string

func (*DurationType) ID

func (*DurationType) ID() Type

func (*DurationType) Name

func (*DurationType) Name() string

func (*DurationType) String

func (t *DurationType) String() string

type ExtensionBase

type ExtensionBase struct {
	// Storage is the underlying storage type
	Storage DataType
}

ExtensionBase is the base struct for user-defined extension types. It must be embedded in any user-defined type, like so:

type UserDefinedType struct {
    arrow.ExtensionBase
    // any other data
}

func (*ExtensionBase) Fingerprint

func (e *ExtensionBase) Fingerprint() string

func (*ExtensionBase) ID

func (*ExtensionBase) ID() Type

ID always returns arrow.EXTENSION and should not be overridden.

func (*ExtensionBase) Name

func (*ExtensionBase) Name() string

Name should always return "extension" and should not be overridden.

func (*ExtensionBase) StorageType

func (e *ExtensionBase) StorageType() DataType

StorageType returns the underlying storage type and exists so that functions written against the ExtensionType interface can access the storage type.

func (*ExtensionBase) String

func (e *ExtensionBase) String() string

String by default will return "extension_type<storage=storage_type>" but can be overridden to customize what is printed out when printing this extension type.
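
The embedding and overriding behavior described above can be sketched with plain Go structs. Everything in this example is a simplified stand-in: extensionBase replaces arrow.ExtensionBase with its storage type reduced to a string, and uuidType is a hypothetical user-defined type.

```go
package main

import "fmt"

// extensionBase is a simplified local stand-in for arrow.ExtensionBase;
// the storage type is reduced to a plain string for illustration.
type extensionBase struct {
	Storage string
}

func (*extensionBase) Name() string { return "extension" }

func (e *extensionBase) String() string {
	return fmt.Sprintf("extension_type<storage=%s>", e.Storage)
}

// uuidType is a hypothetical user-defined type. It embeds the base and
// overrides only String; Name is promoted from the embedded struct.
type uuidType struct {
	extensionBase
}

func (*uuidType) String() string { return "uuid" }

func main() {
	u := &uuidType{extensionBase{Storage: "fixed_size_binary[16]"}}
	fmt.Println(u.Name())                 // extension
	fmt.Println(u.String())               // uuid
	fmt.Println(u.extensionBase.String()) // extension_type<storage=fixed_size_binary[16]>
}
```

The default implementation stays reachable through the embedded field, which is why an override can still delegate to it.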

type ExtensionType

type ExtensionType interface {
	DataType
	// ArrayType should return the reflect.TypeOf(ExtensionArrayType{}) where the
	// ExtensionArrayType is a type that implements the array.ExtensionArray interface.
	// Such a type must also embed the array.ExtensionArrayBase in it. This will be used
	// when creating arrays of this ExtensionType by using reflect.New
	ArrayType() reflect.Type
	// ExtensionName is what will be used when registering / unregistering this extension
	// type. Multiple user-defined types can be defined with a parameterized ExtensionType
	// as long as the parameter is used in the ExtensionName to distinguish the instances
	// in the global Extension Type registry.
	// The return from this is also what will be placed in the metadata for IPC communication
	// under the key ARROW:extension:name
	ExtensionName() string
	// StorageType returns the underlying storage type which is used by this extension
	// type. It is already implemented by the ExtensionBase struct and thus does not need
	// to be re-implemented by a user-defined type.
	StorageType() DataType
	// ExtensionEquals is used to tell whether two ExtensionType instances are equal types.
	ExtensionEquals(ExtensionType) bool
	// Serialize should produce any extra metadata necessary for initializing an instance of
	// this user-defined type. Not all user-defined types require this and it is valid to return
	// an empty string from this function. This is used for the IPC format and will be
	// added to metadata for IPC communication under the key ARROW:extension:metadata
	// This should be implemented such that it is valid to be called by multiple goroutines
	// concurrently.
	Serialize() string
	// Deserialize is called when reading in extension arrays and types via the IPC format
	// in order to construct an instance of the appropriate extension type. The data passed in
	// is pulled from the ARROW:extension:metadata key and may be an empty string.
	// If the storage type is incorrect or something else is invalid with the data this should
	// return nil and an appropriate error.
	Deserialize(storageType DataType, data string) (ExtensionType, error)
	// contains filtered or unexported methods
}

ExtensionType is an interface for handling user-defined types. Implementations must be DataTypes and must embed arrow.ExtensionBase, which ensures that they always have the expected base behavior.

The embedded arrow.ExtensionBase implements the DataType interface, so the user-defined type only has to implement the remaining methods in order to be handled properly.

func GetExtensionType

func GetExtensionType(typName string) ExtensionType

GetExtensionType retrieves and returns the extension type of the given name from the global extension type registry. If the type isn't found it will return nil. This function is safe to call from multiple goroutines concurrently.
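
One common way to obtain the lookup behavior described above is a map guarded by a read/write mutex. The sketch below shows that pattern only; the registry, extType, and namedType names are hypothetical, and nothing here is taken from the package's actual registry implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// extType is a minimal local stand-in for the ExtensionType interface.
type extType interface {
	ExtensionName() string
}

// registry is a hypothetical name-to-type map guarded by a read/write mutex.
type registry struct {
	mu    sync.RWMutex
	types map[string]extType
}

func newRegistry() *registry {
	return &registry{types: make(map[string]extType)}
}

// register stores t under its extension name and rejects duplicates.
func (r *registry) register(t extType) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.types[t.ExtensionName()]; ok {
		return fmt.Errorf("extension type %q already registered", t.ExtensionName())
	}
	r.types[t.ExtensionName()] = t
	return nil
}

// get returns the registered type, or nil when the name is unknown.
func (r *registry) get(name string) extType {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.types[name]
}

// namedType is a trivial implementation used only for the demonstration.
type namedType string

func (n namedType) ExtensionName() string { return string(n) }

func main() {
	r := newRegistry()
	_ = r.register(namedType("uuid"))
	fmt.Println(r.get("uuid") != nil, r.get("missing") == nil) // true true
}
```

Readers take the shared lock, so concurrent lookups do not block each other; only registration takes the exclusive lock.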

type Field

type Field struct {
	Name     string   // Field name
	Type     DataType // The field's data type
	Nullable bool     // Fields can be nullable
	Metadata Metadata // The field's metadata, if any
}

func (Field) Equal

func (f Field) Equal(o Field) bool

func (Field) Fingerprint

func (f Field) Fingerprint() string

func (Field) HasMetadata

func (f Field) HasMetadata() bool

func (Field) String

func (f Field) String() string

type FixedSizeBinaryType

type FixedSizeBinaryType struct {
	ByteWidth int
}

func (*FixedSizeBinaryType) BitWidth

func (t *FixedSizeBinaryType) BitWidth() int

func (*FixedSizeBinaryType) Fingerprint

func (t *FixedSizeBinaryType) Fingerprint() string

func (*FixedSizeBinaryType) ID

func (*FixedSizeBinaryType) ID() Type

func (*FixedSizeBinaryType) Name

func (*FixedSizeBinaryType) Name() string

func (*FixedSizeBinaryType) String

func (t *FixedSizeBinaryType) String() string

type FixedSizeListType

type FixedSizeListType struct {
	// contains filtered or unexported fields
}

FixedSizeListType describes a nested type in which each array slot contains a fixed-size sequence of values, all having the same relative type.

func FixedSizeListOf

func FixedSizeListOf(n int32, t DataType) *FixedSizeListType

FixedSizeListOf returns the list type with element type t. For example, if t represents int32, FixedSizeListOf(10, t) represents [10]int32.

FixedSizeListOf panics if t is nil or invalid, or if n is <= 0. NullableElem defaults to true.

func FixedSizeListOfField

func FixedSizeListOfField(n int32, f Field) *FixedSizeListType

func FixedSizeListOfNonNullable

func FixedSizeListOfNonNullable(n int32, t DataType) *FixedSizeListType

FixedSizeListOfNonNullable is like FixedSizeListOf but NullableElem defaults to false, indicating that the child type should be marked as non-nullable.

func (*FixedSizeListType) Elem

func (t *FixedSizeListType) Elem() DataType

Elem returns the FixedSizeListType's element type.

func (*FixedSizeListType) ElemField

func (t *FixedSizeListType) ElemField() Field

func (*FixedSizeListType) Fingerprint

func (t *FixedSizeListType) Fingerprint() string

func (*FixedSizeListType) ID

func (*FixedSizeListType) ID() Type

func (*FixedSizeListType) Len

func (t *FixedSizeListType) Len() int32

Len returns the FixedSizeListType's size.

func (*FixedSizeListType) Name

func (*FixedSizeListType) Name() string

func (*FixedSizeListType) String

func (t *FixedSizeListType) String() string

type FixedWidthDataType

type FixedWidthDataType interface {
	DataType
	// BitWidth returns the number of bits required to store a single element of this data type in memory.
	BitWidth() int
}

FixedWidthDataType is the representation of an Arrow type that requires a fixed number of bits in memory for each element.
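
A short sketch of what BitWidth makes possible: computing how many bytes a run of tightly packed elements occupies. The fixedWidth interface, the boolLike and int32Like types, and the bytesFor helper are all hypothetical stand-ins, and padding or alignment rules are deliberately ignored.

```go
package main

import "fmt"

// fixedWidth is a local stand-in exposing only the BitWidth method.
type fixedWidth interface {
	BitWidth() int
}

type boolLike struct{}  // 1 bit per element, bit-packed
type int32Like struct{} // 32 bits per element

func (boolLike) BitWidth() int  { return 1 }
func (int32Like) BitWidth() int { return 32 }

// bytesFor is a hypothetical helper: bytes needed for n tightly packed
// elements, rounded up to a whole byte. Padding and alignment are ignored.
func bytesFor(dt fixedWidth, n int) int {
	return (dt.BitWidth()*n + 7) / 8
}

func main() {
	fmt.Println(bytesFor(boolLike{}, 10))  // 2
	fmt.Println(bytesFor(int32Like{}, 10)) // 40
}
```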

type Float16Type

type Float16Type struct{}

Float16Type represents a floating point value encoded with a 16-bit precision.

func (*Float16Type) BitWidth

func (t *Float16Type) BitWidth() int

BitWidth returns the number of bits required to store a single element of this data type in memory.

func (*Float16Type) Fingerprint

func (t *Float16Type) Fingerprint() string

func (*Float16Type) ID

func (t *Float16Type) ID() Type

func (*Float16Type) Name

func (t *Float16Type) Name() string

func (*Float16Type) String

func (t *Float16Type) String() string

type Float32Type

type Float32Type struct{}

func (*Float32Type) BitWidth

func (t *Float32Type) BitWidth() int

func (*Float32Type) Fingerprint

func (t *Float32Type) Fingerprint() string

func (*Float32Type) ID

func (t *Float32Type) ID() Type

func (*Float32Type) Name

func (t *Float32Type) Name() string

func (*Float32Type) String

func (t *Float32Type) String() string

type Float64Type

type Float64Type struct{}

func (*Float64Type) BitWidth

func (t *Float64Type) BitWidth() int

func (*Float64Type) Fingerprint

func (t *Float64Type) Fingerprint() string

func (*Float64Type) ID

func (t *Float64Type) ID() Type

func (*Float64Type) Name

func (t *Float64Type) Name() string

func (*Float64Type) String

func (t *Float64Type) String() string

type Int16Type

type Int16Type struct{}

func (*Int16Type) BitWidth

func (t *Int16Type) BitWidth() int

func (*Int16Type) Fingerprint

func (t *Int16Type) Fingerprint() string

func (*Int16Type) ID

func (t *Int16Type) ID() Type

func (*Int16Type) Name

func (t *Int16Type) Name() string

func (*Int16Type) String

func (t *Int16Type) String() string

type Int32Type

type Int32Type struct{}

func (*Int32Type) BitWidth

func (t *Int32Type) BitWidth() int

func (*Int32Type) Fingerprint

func (t *Int32Type) Fingerprint() string

func (*Int32Type) ID

func (t *Int32Type) ID() Type

func (*Int32Type) Name

func (t *Int32Type) Name() string

func (*Int32Type) String

func (t *Int32Type) String() string

type Int64Type

type Int64Type struct{}

func (*Int64Type) BitWidth

func (t *Int64Type) BitWidth() int

func (*Int64Type) Fingerprint

func (t *Int64Type) Fingerprint() string

func (*Int64Type) ID

func (t *Int64Type) ID() Type

func (*Int64Type) Name

func (t *Int64Type) Name() string

func (*Int64Type) String

func (t *Int64Type) String() string

type Int8Type

type Int8Type struct{}

func (*Int8Type) BitWidth

func (t *Int8Type) BitWidth() int

func (*Int8Type) Fingerprint

func (t *Int8Type) Fingerprint() string

func (*Int8Type) ID

func (t *Int8Type) ID() Type

func (*Int8Type) Name

func (t *Int8Type) Name() string

func (*Int8Type) String

func (t *Int8Type) String() string

type ListType

type ListType struct {
	// contains filtered or unexported fields
}

ListType describes a nested type in which each array slot contains a variable-size sequence of values, all having the same relative type.

func ListOf

func ListOf(t DataType) *ListType

ListOf returns the list type with element type t. For example, if t represents int32, ListOf(t) represents []int32.

ListOf panics if t is nil or invalid. NullableElem defaults to true.

func ListOfField

func ListOfField(f Field) *ListType

func ListOfNonNullable

func ListOfNonNullable(t DataType) *ListType

ListOfNonNullable is like ListOf but NullableElem defaults to false, indicating that the child type should be marked as non-nullable.

func (*ListType) Elem

func (t *ListType) Elem() DataType

Elem returns the ListType's element type.

func (*ListType) ElemField

func (t *ListType) ElemField() Field

func (*ListType) Fingerprint

func (t *ListType) Fingerprint() string

func (*ListType) ID

func (*ListType) ID() Type

func (*ListType) Name

func (*ListType) Name() string

func (*ListType) SetElemMetadata

func (t *ListType) SetElemMetadata(md Metadata)

func (*ListType) SetElemNullable

func (t *ListType) SetElemNullable(n bool)

func (*ListType) String

func (t *ListType) String() string

type MapType

type MapType struct {
	KeysSorted bool
	// contains filtered or unexported fields
}

func MapOf

func MapOf(key, item DataType) *MapType

func (*MapType) Fingerprint

func (t *MapType) Fingerprint() string

func (*MapType) ID

func (*MapType) ID() Type

func (*MapType) ItemField

func (t *MapType) ItemField() Field

func (*MapType) ItemType

func (t *MapType) ItemType() DataType

func (*MapType) KeyField

func (t *MapType) KeyField() Field

func (*MapType) KeyType

func (t *MapType) KeyType() DataType

func (*MapType) Name

func (*MapType) Name() string

func (*MapType) SetItemNullable

func (t *MapType) SetItemNullable(nullable bool)

func (*MapType) String

func (t *MapType) String() string

func (*MapType) ValueField

func (t *MapType) ValueField() Field

func (*MapType) ValueType

func (t *MapType) ValueType() *StructType

type Metadata

type Metadata struct {
	// contains filtered or unexported fields
}

func MetadataFrom

func MetadataFrom(kv map[string]string) Metadata

func NewMetadata

func NewMetadata(keys, values []string) Metadata

func (Metadata) Equal

func (md Metadata) Equal(rhs Metadata) bool

func (Metadata) FindKey

func (md Metadata) FindKey(k string) int

FindKey returns the index of the key-value pair with the provided key name, or -1 if such a key does not exist.
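
The lookup contract of FindKey (an index, or -1 when the key is absent) can be sketched as follows. The metadata type is a simplified stand-in that stores keys and values in parallel slices, which is an assumption suggested by the NewMetadata(keys, values) signature; lookup is a hypothetical convenience wrapper.

```go
package main

import "fmt"

// metadata is a simplified local stand-in that keeps keys and values in
// parallel slices, which is an assumption about the internal layout.
type metadata struct {
	keys   []string
	values []string
}

// findKey returns the index of the first pair whose key equals k, or -1.
func (md metadata) findKey(k string) int {
	for i, key := range md.keys {
		if key == k {
			return i
		}
	}
	return -1
}

// lookup is a hypothetical convenience wrapper built on findKey.
func (md metadata) lookup(k string) (string, bool) {
	if i := md.findKey(k); i >= 0 {
		return md.values[i], true
	}
	return "", false
}

func main() {
	md := metadata{
		keys:   []string{"unit", "source"},
		values: []string{"ms", "sensor-7"},
	}
	fmt.Println(md.findKey("source"), md.findKey("missing")) // 1 -1
	v, ok := md.lookup("unit")
	fmt.Println(v, ok) // ms true
}
```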

func (Metadata) Keys

func (md Metadata) Keys() []string

func (Metadata) Len

func (md Metadata) Len() int

func (Metadata) String

func (md Metadata) String() string

func (Metadata) Values

func (md Metadata) Values() []string

type MonthDayNanoInterval

type MonthDayNanoInterval struct {
	Months      int32 `json:"months"`
	Days        int32 `json:"days"`
	Nanoseconds int64 `json:"nanoseconds"`
}

MonthDayNanoInterval represents a number of months, days and nanoseconds (fraction of day).

type MonthDayNanoIntervalType

type MonthDayNanoIntervalType struct{}

MonthDayNanoIntervalType is encoded as two signed 32-bit integers representing a number of months and a number of days, followed by a 64-bit integer representing the number of nanoseconds since midnight for fractions of a day.
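
To make the three components concrete, the sketch below applies such an interval to a time.Time using the standard library's calendar arithmetic. The struct is a local copy of MonthDayNanoInterval, and addTo and utcDate are hypothetical helpers. This is one possible interpretation and not necessarily how Arrow consumers apply intervals.

```go
package main

import (
	"fmt"
	"time"
)

// MonthDayNanoInterval is a local stand-in mirroring the struct above.
type MonthDayNanoInterval struct {
	Months      int32
	Days        int32
	Nanoseconds int64
}

// addTo is a hypothetical helper: months and days are applied with
// calendar arithmetic, then the nanoseconds are added as a duration.
// time.AddDate normalizes overflowing dates (for example Jan 31 plus
// one month becomes a date in early March).
func (i MonthDayNanoInterval) addTo(t time.Time) time.Time {
	return t.AddDate(0, int(i.Months), int(i.Days)).
		Add(time.Duration(i.Nanoseconds))
}

// utcDate builds midnight UTC for the given calendar date.
func utcDate(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	iv := MonthDayNanoInterval{Months: 1, Days: 2, Nanoseconds: 3000000000}
	fmt.Println(iv.addTo(utcDate(2021, 1, 15)).Format(time.RFC3339)) // 2021-02-17T00:00:03Z
}
```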

func (*MonthDayNanoIntervalType) BitWidth

func (*MonthDayNanoIntervalType) BitWidth() int

BitWidth returns the number of bits required to store a single element of this data type in memory.

func (*MonthDayNanoIntervalType) Fingerprint

func (*MonthDayNanoIntervalType) Fingerprint() string

func (*MonthDayNanoIntervalType) ID

func (*MonthDayNanoIntervalType) ID() Type

func (*MonthDayNanoIntervalType) Name

func (*MonthDayNanoIntervalType) Name() string

func (*MonthDayNanoIntervalType) String

func (*MonthDayNanoIntervalType) String() string

type MonthInterval

type MonthInterval int32

MonthInterval represents a number of months.

type MonthIntervalType

type MonthIntervalType struct{}

MonthIntervalType is encoded as a 32-bit signed integer, representing a number of months.

func (*MonthIntervalType) BitWidth

func (t *MonthIntervalType) BitWidth() int

BitWidth returns the number of bits required to store a single element of this data type in memory.

func (*MonthIntervalType) Fingerprint

func (*MonthIntervalType) Fingerprint() string

func (*MonthIntervalType) ID

func (*MonthIntervalType) ID() Type

func (*MonthIntervalType) Name

func (*MonthIntervalType) Name() string

func (*MonthIntervalType) String

func (*MonthIntervalType) String() string

type NullType

type NullType struct{}

NullType describes a degenerate array, with zero physical storage.

var (
	Null *NullType
)

func (*NullType) Fingerprint

func (*NullType) Fingerprint() string

func (*NullType) ID

func (*NullType) ID() Type

func (*NullType) Name

func (*NullType) Name() string

func (*NullType) String

func (*NullType) String() string

type Schema

type Schema struct {
	// contains filtered or unexported fields
}

Schema is a sequence of Field values, describing the columns of a table or a record batch.

func NewSchema

func NewSchema(fields []Field, metadata *Metadata) *Schema

NewSchema returns a new Schema value from the slice of fields and metadata.

NewSchema panics if there is a field with an invalid DataType.

func (*Schema) Equal

func (sc *Schema) Equal(o *Schema) bool

Equal returns whether two schemas are equal. Equal does not compare the metadata.

func (*Schema) Field

func (sc *Schema) Field(i int) Field

func (*Schema) FieldIndices

func (sc *Schema) FieldIndices(n string) []int

FieldIndices returns the indices of the named field or nil.
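
Because the result is a slice, a name may match more than one field. A minimal sketch of such a lookup, assuming a precomputed map from names to positions, is shown below; nameIndex, buildIndex, and fieldIndices are hypothetical names and do not describe the package's internals.

```go
package main

import "fmt"

// nameIndex is a hypothetical lookup structure: field name -> positions.
type nameIndex map[string][]int

// buildIndex records every position at which each name occurs, so
// duplicate field names yield more than one index.
func buildIndex(names []string) nameIndex {
	idx := make(nameIndex, len(names))
	for i, n := range names {
		idx[n] = append(idx[n], i)
	}
	return idx
}

// fieldIndices returns the positions of name, or nil if it is absent.
func (idx nameIndex) fieldIndices(name string) []int {
	return idx[name]
}

func main() {
	idx := buildIndex([]string{"id", "value", "id"})
	fmt.Println(idx.fieldIndices("id"))      // [0 2]
	fmt.Println(idx.fieldIndices("missing")) // []
}
```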

func (*Schema) Fields

func (sc *Schema) Fields() []Field

func (*Schema) FieldsByName

func (sc *Schema) FieldsByName(n string) ([]Field, bool)

func (*Schema) Fingerprint

func (s *Schema) Fingerprint() string

func (*Schema) HasField

func (sc *Schema) HasField(n string) bool

func (*Schema) HasMetadata

func (sc *Schema) HasMetadata() bool

func (*Schema) Metadata

func (sc *Schema) Metadata() Metadata

func (*Schema) String

func (s *Schema) String() string

type StringType

type StringType struct{}

func (*StringType) Fingerprint

func (t *StringType) Fingerprint() string

func (*StringType) ID

func (t *StringType) ID() Type

func (*StringType) Name

func (t *StringType) Name() string

func (*StringType) String

func (t *StringType) String() string

type StructType

type StructType struct {
	// contains filtered or unexported fields
}

StructType describes a nested type parameterized by an ordered sequence of relative types, called its fields.

func StructOf

func StructOf(fs ...Field) *StructType

StructOf returns the struct type with fields fs.

StructOf panics if there are duplicated fields. StructOf panics if there is a field with an invalid DataType.

func (*StructType) Field

func (t *StructType) Field(i int) Field

func (*StructType) FieldByName

func (t *StructType) FieldByName(name string) (Field, bool)

func (*StructType) FieldIdx

func (t *StructType) FieldIdx(name string) (int, bool)

func (*StructType) Fields

func (t *StructType) Fields() []Field

func (*StructType) Fingerprint

func (t *StructType) Fingerprint() string

func (*StructType) ID

func (*StructType) ID() Type

func (*StructType) Name

func (*StructType) Name() string

func (*StructType) String

func (t *StructType) String() string

type Time32

type Time32 int32

type Time32Type

type Time32Type struct {
	Unit TimeUnit
}

Time32Type is encoded as a 32-bit signed integer, representing either seconds or milliseconds since midnight.

func (*Time32Type) BitWidth

func (*Time32Type) BitWidth() int

func (*Time32Type) Fingerprint

func (t *Time32Type) Fingerprint() string

func (*Time32Type) ID

func (*Time32Type) ID() Type

func (*Time32Type) Name

func (*Time32Type) Name() string

func (*Time32Type) String

func (t *Time32Type) String() string

type Time64

type Time64 int64

type Time64Type

type Time64Type struct {
	Unit TimeUnit
}

Time64Type is encoded as a 64-bit signed integer, representing either microseconds or nanoseconds since midnight.

func (*Time64Type) BitWidth

func (*Time64Type) BitWidth() int

func (*Time64Type) Fingerprint

func (t *Time64Type) Fingerprint() string

func (*Time64Type) ID

func (*Time64Type) ID() Type

func (*Time64Type) Name

func (*Time64Type) Name() string

func (*Time64Type) String

func (t *Time64Type) String() string

type TimeUnit

type TimeUnit int

const (
	Nanosecond TimeUnit = iota
	Microsecond
	Millisecond
	Second
)

func (TimeUnit) Multiplier

func (u TimeUnit) Multiplier() time.Duration

func (TimeUnit) String

func (u TimeUnit) String() string

type Timestamp

type Timestamp int64

type TimestampType

type TimestampType struct {
	Unit     TimeUnit
	TimeZone string
}

TimestampType is encoded as a 64-bit signed integer since the UNIX epoch (1970-01-01T00:00:00Z). The zero value uses nanosecond units and is time zone neutral. Time zone neutral can be considered UTC without having "UTC" as a time zone.
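
A minimal sketch of decoding such a value, taking the UNIX epoch as 1970-01-01T00:00:00Z. TimeUnit and Timestamp are local copies of the declarations in this package; the multiplier method is an assumed mapping from a unit to the duration of one tick, and toTime is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// TimeUnit and Timestamp are local stand-ins mirroring this package's types.
type TimeUnit int

const (
	Nanosecond TimeUnit = iota
	Microsecond
	Millisecond
	Second
)

type Timestamp int64

// multiplier is an assumed mapping from a unit to the duration of one tick.
func (u TimeUnit) multiplier() time.Duration {
	switch u {
	case Microsecond:
		return time.Microsecond
	case Millisecond:
		return time.Millisecond
	case Second:
		return time.Second
	default:
		return time.Nanosecond
	}
}

// toTime is a hypothetical helper interpreting ts as a count of `unit`
// ticks since the epoch and returning the instant in UTC.
func (ts Timestamp) toTime(unit TimeUnit) time.Time {
	perSecond := int64(time.Second / unit.multiplier())
	secs := int64(ts) / perSecond
	rem := int64(ts) % perSecond
	return time.Unix(secs, rem*int64(unit.multiplier())).UTC()
}

func main() {
	ts := Timestamp(1500000000000) // milliseconds
	fmt.Println(ts.toTime(Millisecond).Format(time.RFC3339)) // 2017-07-14T02:40:00Z
}
```

Splitting into whole seconds and a remainder avoids overflowing int64 when coarse units are converted to nanoseconds.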

func (*TimestampType) BitWidth

func (*TimestampType) BitWidth() int

BitWidth returns the number of bits required to store a single element of this data type in memory.

func (*TimestampType) Fingerprint

func (t *TimestampType) Fingerprint() string

func (*TimestampType) ID

func (*TimestampType) ID() Type

func (*TimestampType) Name

func (*TimestampType) Name() string

func (*TimestampType) String

func (t *TimestampType) String() string

type Type

type Type int

Type is a logical type. It can be expressed as either a primitive physical type (bytes or bits of some fixed size), a nested type consisting of other data types, or another data type (e.g. a timestamp encoded as an int64).

const (
	// NULL type having no physical storage
	NULL Type = iota

	// BOOL is a 1 bit, LSB bit-packed ordering
	BOOL

	// UINT8 is an Unsigned 8-bit little-endian integer
	UINT8

	// INT8 is a Signed 8-bit little-endian integer
	INT8

	// UINT16 is an Unsigned 16-bit little-endian integer
	UINT16

	// INT16 is a Signed 16-bit little-endian integer
	INT16

	// UINT32 is an Unsigned 32-bit little-endian integer
	UINT32

	// INT32 is a Signed 32-bit little-endian integer
	INT32

	// UINT64 is an Unsigned 64-bit little-endian integer
	UINT64

	// INT64 is a Signed 64-bit little-endian integer
	INT64

	// FLOAT16 is a 2-byte floating point value
	FLOAT16

	// FLOAT32 is a 4-byte floating point value
	FLOAT32

	// FLOAT64 is an 8-byte floating point value
	FLOAT64

	// STRING is a UTF8 variable-length string
	STRING

	// BINARY is a Variable-length byte type (no guarantee of UTF8-ness)
	BINARY

	// FIXED_SIZE_BINARY is a binary where each value occupies the same number of bytes
	FIXED_SIZE_BINARY

	// DATE32 is int32 days since the UNIX epoch
	DATE32

	// DATE64 is int64 milliseconds since the UNIX epoch
	DATE64

	// TIMESTAMP is an exact timestamp encoded with int64 since UNIX epoch
	// Default unit millisecond
	TIMESTAMP

	// TIME32 is a signed 32-bit integer, representing either seconds or
	// milliseconds since midnight
	TIME32

	// TIME64 is a signed 64-bit integer, representing either microseconds or
	// nanoseconds since midnight
	TIME64

	// INTERVAL_MONTHS is YEAR_MONTH interval in SQL style
	INTERVAL_MONTHS

	// INTERVAL_DAY_TIME is DAY_TIME in SQL Style
	INTERVAL_DAY_TIME

	// DECIMAL128 is a precision- and scale-based decimal type. Storage type depends on the
	// parameters.
	DECIMAL128

	// DECIMAL256 is a precision and scale based decimal type, with 256 bit max. not yet implemented
	DECIMAL256

	// LIST is a list of some logical data type
	LIST

	// STRUCT of logical types
	STRUCT

	// SPARSE_UNION of logical types. not yet implemented
	SPARSE_UNION

	// DENSE_UNION of logical types. not yet implemented
	DENSE_UNION

	// DICTIONARY aka Category type
	DICTIONARY

	// MAP is a repeated struct logical type
	MAP

	// Custom data type, implemented by user
	EXTENSION

	// Fixed size list of some logical type
	FIXED_SIZE_LIST

	// Measure of elapsed time in either seconds, milliseconds, microseconds
	// or nanoseconds.
	DURATION

	// like STRING, but 64-bit offsets. not yet implemented
	LARGE_STRING

	// like BINARY but with 64-bit offsets, not yet implemented
	LARGE_BINARY

	// like LIST but with 64-bit offsets. not yet implemented
	LARGE_LIST

	// calendar interval with three fields
	INTERVAL_MONTH_DAY_NANO

	// INTERVAL could be any of the interval types, kept to avoid breaking anyone
	// after switching to individual type ids for the interval types that were using
	// it when calling MakeFromData or NewBuilder
	//
	// Deprecated and will be removed in the next major version release
	INTERVAL

	// Alias to ensure we do not break any consumers
	DECIMAL = DECIMAL128
)

func (Type) String

func (i Type) String() string
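
The constant block above follows the common Go pattern of an iota-based enumeration with a trailing alias (DECIMAL = DECIMAL128). A reduced sketch of that pattern is given below; typeID, its three constants, and the name table are hypothetical and much smaller than the real Type enumeration.

```go
package main

import "fmt"

// typeID is a reduced, hypothetical counterpart of the Type enumeration.
type typeID int

const (
	null typeID = iota
	boolean
	decimal128

	// decimal is an alias: it shares the value of decimal128 instead of
	// receiving a fresh iota value.
	decimal = decimal128
)

// typeNames maps each enumerated value to its printable name.
var typeNames = [...]string{"NULL", "BOOL", "DECIMAL128"}

// String returns the name of the type ID, with a fallback for values
// outside the known range.
func (i typeID) String() string {
	if i < 0 || int(i) >= len(typeNames) {
		return fmt.Sprintf("typeID(%d)", int(i))
	}
	return typeNames[i]
}

func main() {
	fmt.Println(boolean, decimal, decimal == decimal128) // BOOL DECIMAL128 true
}
```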

type TypeEqualOption

type TypeEqualOption func(*typeEqualsConfig)

TypeEqualOption is a functional option type used for configuring type equality checks.

func CheckMetadata

func CheckMetadata() TypeEqualOption

CheckMetadata is an option for TypeEqual that enables checking metadata equality in addition to type equality. It only makes sense for the STRUCT type.
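
TypeEqualOption follows the functional-options pattern. The sketch below shows the general shape of that pattern with local stand-ins: equalsConfig plays the role of the unexported typeEqualsConfig, withMetadataCheck plays the role of CheckMetadata, and fieldsEqual is a hypothetical comparison that is far simpler than a real type-equality check.

```go
package main

import "fmt"

// equalsConfig is a local stand-in for the unexported typeEqualsConfig.
type equalsConfig struct {
	checkMetadata bool
}

// equalOption mutates the config, mirroring the shape of TypeEqualOption.
type equalOption func(*equalsConfig)

// withMetadataCheck plays the role of CheckMetadata in this sketch.
func withMetadataCheck() equalOption {
	return func(c *equalsConfig) { c.checkMetadata = true }
}

// field is a reduced stand-in carrying a name and one metadata string.
type field struct {
	name, meta string
}

// fieldsEqual applies every option to a default config before comparing.
// Metadata is compared only when the corresponding option was passed.
func fieldsEqual(a, b field, opts ...equalOption) bool {
	var cfg equalsConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	if a.name != b.name {
		return false
	}
	return !cfg.checkMetadata || a.meta == b.meta
}

func main() {
	a := field{name: "x", meta: "unit=ms"}
	b := field{name: "x", meta: "unit=s"}
	fmt.Println(fieldsEqual(a, b))                      // true
	fmt.Println(fieldsEqual(a, b, withMetadataCheck())) // false
}
```

Keeping the config type unexported lets new options be added later without changing the signature of the comparison function.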

type Uint16Type

type Uint16Type struct{}

func (*Uint16Type) BitWidth

func (t *Uint16Type) BitWidth() int

func (*Uint16Type) Fingerprint

func (t *Uint16Type) Fingerprint() string

func (*Uint16Type) ID

func (t *Uint16Type) ID() Type

func (*Uint16Type) Name

func (t *Uint16Type) Name() string

func (*Uint16Type) String

func (t *Uint16Type) String() string

type Uint32Type

type Uint32Type struct{}

func (*Uint32Type) BitWidth

func (t *Uint32Type) BitWidth() int

func (*Uint32Type) Fingerprint

func (t *Uint32Type) Fingerprint() string

func (*Uint32Type) ID

func (t *Uint32Type) ID() Type

func (*Uint32Type) Name

func (t *Uint32Type) Name() string

func (*Uint32Type) String

func (t *Uint32Type) String() string

type Uint64Type

type Uint64Type struct{}

func (*Uint64Type) BitWidth

func (t *Uint64Type) BitWidth() int

func (*Uint64Type) Fingerprint

func (t *Uint64Type) Fingerprint() string

func (*Uint64Type) ID

func (t *Uint64Type) ID() Type

func (*Uint64Type) Name

func (t *Uint64Type) Name() string

func (*Uint64Type) String

func (t *Uint64Type) String() string

type Uint8Type

type Uint8Type struct{}

func (*Uint8Type) BitWidth

func (t *Uint8Type) BitWidth() int

func (*Uint8Type) Fingerprint

func (t *Uint8Type) Fingerprint() string

func (*Uint8Type) ID

func (t *Uint8Type) ID() Type

func (*Uint8Type) Name

func (t *Uint8Type) Name() string

func (*Uint8Type) String

func (t *Uint8Type) String() string

Directories

Path Synopsis
_examples
_tools
Package array provides implementations of various Arrow array types.
Package arrio exposes functions to manipulate records, exposing and using interfaces not unlike the ones defined in the stdlib io package.
Package csv reads CSV files and presents the extracted data as records, also writes data as record into CSV files
internal
arrdata
Package arrdata exports arrays and records data ready to be used for tests.
arrjson
Package arrjson provides types and functions to encode and decode ARROW types and data to and from JSON files.
cpu
Package cpu implements processor feature detection used by the Go standard library.
debug
Package debug provides APIs for conditional runtime assertions and debug logging.
flight_integration/cmd/arrow-flight-integration-client
Client for use with Arrow Flight Integration tests via archery
testing/types
Package types contains user-defined types for use in the tests for the arrow package
ipc
cmd/arrow-cat
Command arrow-cat displays the content of an Arrow stream or file.
cmd/arrow-ls
Command arrow-ls displays the listing of an Arrow file.
Package math provides optimized mathematical functions for processing Arrow arrays.
Package memory provides support for allocating and manipulating memory at a low level.
Package tensor provides types that implement n-dimensional arrays.
