elasticsearch

package
v0.0.0-...-48dec30
Published: Aug 12, 2018 License: MIT Imports: 16 Imported by: 0

README

MySQL access to Elasticsearch

Elasticsearch is awesome, but sometimes querying it is not, and integrating it into other apps requires custom code.
It would be great if MySQL, and all the tools that speak the MySQL protocol, could access it.

Example Usage

This example imports a couple of hours' worth of historical data from https://www.githubarchive.org/ into a local Elasticsearch server for example queries. It imports only github_issues and github_watch.

# from root of this repo, there is a docker-compose
# which has an elasticsearch server in it we will use for example.
cd github.com/dataux/dataux
docker-compose up

# assuming elasticsearch on localhost else --host=myeshost
# lets import the data into elasticsearch
cd backends/elasticsearch/importgithub
go build && ./importgithub

# using docker start a dataux server
docker run --rm -it --net=host -p 4000:4000 gcr.io/dataux-io/dataux:latest

# now that dataux is running use mysql-client to connect
mysql -h 127.0.0.1 -P 4000

Now run some queries:

show databases;

-- first register the schema
-- dataux will introspect the tables
-- to create schema for the tables
CREATE schema github_archive IF NOT EXISTS WITH {
  "type":"elasticsearch",
  "schema":"github_archive",
  "hosts": ["http://127.0.0.1:9200"]
};

-- now a sql tour
show databases;
use github_archive;

show tables;

describe github_watch;

select actor, repository.url from github_watch limit 10;

select cardinality(`actor`) AS users_who_watched, min(`repository.id`) as oldest_repo from github_watch;

SELECT actor, `repository.name`, `repository.stargazers_count`, `repository.language`
FROM github_watch where `repository.language` = "Go";

select actor, repository.name from github_watch where repository.stargazers_count BETWEEN "1000" AND 1100;

SELECT actor, repository.organization AS org
FROM github_watch 
WHERE repository.created_at BETWEEN "2008-10-21T17:20:37Z" AND "2008-10-21T19:20:37Z";

select actor, repository.name from github_watch where repository.name IN ("node", "docker","d3","myicons", "bootstrap") limit 100;

select cardinality(`actor`) AS users_who_watched, count(*) as ct, min(`repository.id`) as oldest_repo
FROM github_watch
WHERE repository.description LIKE "database";


SQL -> Elasticsearch

ES API                   SQL Query
Aliases                  show tables;
Mapping                  describe mytable;
hits.total for filter    select count(*) from table WHERE exists(a);
aggs min, max, avg, sum  select min(year), max(year), avg(year), sum(year) from table WHERE exists(a);
filter: terms            select * from table WHERE year IN (2015,2014,2013);
filter: gte, range       select * from table WHERE year BETWEEN 2012 AND 2014;

Configuration

  • tables_to_load Use this if you do not want to load all tables:
CREATE schema github_archive IF NOT EXISTS WITH {
  "type":"elasticsearch",
  "schema":"github_archive",
  "tables_to_load":["indexa","indexb"],
  "hosts": ["http://127.0.0.1:9200"]
};

Documentation

Index

Constants

const (
	// SourceType defines the "source" type from qlbridge datasource
	SourceType = "elasticsearch"
)

Variables

var (
	// DefaultLimit is default page size limit.
	DefaultLimit = 1000
)

Functions

This section is empty.

Types

type ResultReader

type ResultReader struct {
	*exec.TaskBase

	Docs []u.JsonHelper
	Vals [][]driver.Value

	Total    int
	Aggs     u.JsonHelper
	ScrollId string
	Req      *SqlToEs
	// contains filtered or unexported fields
}

ResultReader is the Elasticsearch result provider; it adapts the Elasticsearch HTTP JSON responses to dataux/driver values.

func NewResultReader

func NewResultReader(req *SqlToEs) *ResultReader

func (*ResultReader) Close

func (m *ResultReader) Close() error

func (*ResultReader) Columns

func (m *ResultReader) Columns() []string

func (*ResultReader) Run

func (m *ResultReader) Run() error

Run() fetches the API response, waits for it, then converts the response into (static) rows and allows Next() to start iterating through them.

type ResultReaderNext

type ResultReaderNext struct {
	*ResultReader
}

A wrapper allowing us to implement the sql/driver Next() interface, which is different from qlbridge/datasource Next().

type Source

type Source struct {
	// contains filtered or unexported fields
}

Source is the elasticsearch datasource

func (*Source) Close

func (m *Source) Close() error

Close this source.

func (*Source) Init

func (m *Source) Init()

Init this source

func (*Source) Open

func (m *Source) Open(schemaName string) (schema.Conn, error)

Open opens a connection to the elasticsearch source.

func (*Source) Setup

func (m *Source) Setup(ss *schema.Schema) error

Setup this source

func (*Source) Table

func (m *Source) Table(table string) (*schema.Table, error)

Table get a single table.

func (*Source) Tables

func (m *Source) Tables() []string

Tables returns the list of table names.

type SqlToEs

type SqlToEs struct {
	// contains filtered or unexported fields
}

SqlToEs is the SQL-to-Elasticsearch request object; it maps SQL queries into Elasticsearch JSON requests.

func NewSqlToEs

func NewSqlToEs(table *schema.Table) *SqlToEs

NewSqlToEs creates a request object for translating SQL to Elasticsearch.

func (*SqlToEs) Close

func (m *SqlToEs) Close() error

Close this request

func (*SqlToEs) Columns

func (m *SqlToEs) Columns() []string

func (*SqlToEs) Host

func (m *SqlToEs) Host() string

Host returns the Elasticsearch host.

func (*SqlToEs) WalkAggs

func (m *SqlToEs) WalkAggs(cur expr.Node) (q esMap, _ error)

WalkAggs() walks aggregate expressions when used as part of the <select_list>.

  • For Aggregates (functions) it builds aggs
  • For Projections (non-functions) it does nothing; that is done later during projection

func (*SqlToEs) WalkExecSource

func (m *SqlToEs) WalkExecSource(p *plan.Source) (exec.Task, error)

func (*SqlToEs) WalkGroupBy

func (m *SqlToEs) WalkGroupBy() error

WalkGroupBy builds aggregations from the <select_list> for WHERE ... GROUP BY x,y,z.

func (*SqlToEs) WalkNode

func (m *SqlToEs) WalkNode(cur expr.Node, q *esMap) (value.Value, error)

WalkNode walks an expression, and its AND/OR/() logic, to create an appropriately nested JSON document for Elasticsearch queries.

TODO: we may need to separate Value Nodes from those that return ES types.

func (*SqlToEs) WalkSelectList

func (m *SqlToEs) WalkSelectList() error

WalkSelectList builds aggregations from the <select_list> in SELECT <select_list> FROM ... WHERE.

func (*SqlToEs) WalkSourceSelect

func (m *SqlToEs) WalkSourceSelect(planner plan.Planner, p *plan.Source) (plan.Task, error)

WalkSourceSelect is used during the planning phase to create a plan (plan.Task) or an error, and to report back any poly-fill necessary.

