impalathing

package module
v1.2.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 16, 2022 License: MIT Imports: 10 Imported by: 0

README

Impalathing is a small Go wrapper library the thrift interface go Impala

It's based on hivething

Working on this you quickly realize that having strings deliminated by tabs is a ugly API... (That's the thrift side of things)

Usage

package main

import (
    "log"
    "fmt"
    "time"
    "github.com/koblas/impalathing"
)

func main() {
    host := "impala-host"
    port := 21000

    con, err := impalathing.Connect(host, port, impalathing.DefaultOptions)

    if err != nil {
        log.Fatal("Error connecting", err)
        return
    }

    query, err := con.Query("SELECT user_id, action, yyyymm FROM engagements LIMIT 10000")

    startTime := time.Now()
    total := 0
    for query.Next() {
        var (
            user_id     string
            action      string
            yyyymm      int
        )

        query.Scan(&user_id, &action, &yyyymm)
        total += 1

        fmt.Println(user_id, action)
    }

    log.Printf("Fetch %d rows(s) in %.2fs", total, time.Duration(time.Since(startTime)).Seconds())
}

NOTES: changes for MediaMath:

* there are a few updates and improvements in connection and rowset:
     - fix connection leaks
     - fix rowset metadata
* interfaces are from whatver impala build we are using (currently 3.0.1)
* interfaces are edited to build thrift interfaces for Go. In particular  we need to work around list (in go []slice) map keys, which are
not valid Go. For now I have manually removed an optional member struct: SkewedInfo

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	DefaultOptions = Options{PollIntervalSeconds: 0.1, BatchSize: 10000}
)

Functions

This section is empty.

Types

type Connection

type Connection struct {
	Host    string
	Port    int
	Timeout time.Duration
	// contains filtered or unexported fields
}

func Connect

func Connect(ctx context.Context, host string, port int, options Options, timeout time.Duration) (*Connection, error)

func (*Connection) Close

func (c *Connection) Close() error

func (*Connection) CloseInsert

func (c *Connection) CloseInsert(ctx context.Context, handle *beeswax.QueryHandle) (map[string]int64, error)

func (*Connection) CloseQuery

func (c *Connection) CloseQuery(ctx context.Context, handle *beeswax.QueryHandle) error

func (*Connection) ExecuteAndWait

func (c *Connection) ExecuteAndWait(ctx context.Context, query string) (RowSet, error)

func (*Connection) Query

func (c *Connection) Query(ctx context.Context, query string) (RowSet, error)

type Options

type Options struct {
	PollIntervalSeconds float64
	BatchSize           int64
}

type RowSet

type RowSet interface {
	Columns() []string
	Next() bool
	Scan(dest ...interface{}) error
	GetRow() ([]string, error)
	Poll() (*Status, error)
	Wait() (*Status, error)
	FetchAll() []map[string]interface{}
	MapScan(dest map[string]interface{}) error
	Handle() *beeswax.QueryHandle
	Cancel() error
}

A RowSet represents an asyncronous hive operation. You can Reattach to a previously submitted hive operation if you have a valid thrift client, and the serialized Handle() from the prior operation.

type Status

type Status struct {
	Error error
	// contains filtered or unexported fields
}

Represents job status, including success state and time the status was updated.

func (*Status) IsComplete

func (s *Status) IsComplete() bool

func (*Status) IsSuccess

func (s *Status) IsSuccess() bool

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL