splitquery

package
v1.0.10 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 21, 2020 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package splitquery contains the logic needed for implementing the tabletserver's SplitQuery RPC.

It defines the Splitter type, which drives the query-splitting procedure. Splitter works together with the SplitParams type and the SplitAlgorithmInterface interface. See example_test.go for a usage example.

General guidelines for contributing to this package: 1) Error messages should not contain the "splitquery:" prefix. It will be added by the calling code in 'tabletserver'.
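The prefix guideline can be illustrated with a minimal sketch. Both functions and the error text are hypothetical stand-ins (the real validation and the real calling code in tabletserver are not shown on this page); the point is only that the prefix is added once, by the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// validateSplitCount is a hypothetical stand-in for code inside this package.
// Its error message deliberately carries no "splitquery:" prefix.
func validateSplitCount(splitCount int64) error {
	if splitCount <= 0 {
		return errors.New("splitCount must be positive")
	}
	return nil
}

// callSplitQuery is a hypothetical stand-in for the calling code in
// tabletserver, which adds the prefix in a single place.
func callSplitQuery(splitCount int64) error {
	if err := validateSplitCount(splitCount); err != nil {
		return fmt.Errorf("splitquery: %v", err)
	}
	return nil
}

func main() {
	fmt.Println(callSplitQuery(0)) // splitquery: splitCount must be positive
}
```

If the package added the prefix itself, the caller's wrapping would produce a doubled "splitquery: splitquery:" message.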

Example
package main

import (
	"fmt"

	"github.com/xsec-lab/vitess/go/sqltypes"
	"github.com/xsec-lab/vitess/go/vt/sqlparser"
	"github.com/xsec-lab/vitess/go/vt/vttablet/tabletserver/schema"

	querypb "github.com/xsec-lab/vitess/go/vt/proto/query"
)

func main() {
	// 1. Create a SplitParams object.
	// There are two "constructors": NewSplitParamsGivenSplitCount and
	// NewSplitParamsGivenNumRowsPerQueryPart. They each take several parameters including a "schema"
	// object which should be a map[string]*schema.Table that maps a table name to its schema.Table
	// object. It is used for error-checking the split columns and their types. We use an empty
	// object for this toy example, but in real code this object must have correct entries.
	//
	// This schema is typically derived from tabletserver.TabletServer.qe.se.
	schema := map[string]*schema.Table{}
	splitParams, err := NewSplitParamsGivenSplitCount(
		&querypb.BoundQuery{
			Sql:           "SELECT * FROM table WHERE id > :id",
			BindVariables: map[string]*querypb.BindVariable{"id": sqltypes.Int64BindVariable(5)},
		},
		[]sqlparser.ColIdent{
			sqlparser.NewColIdent("id"),
			sqlparser.NewColIdent("user_id"),
		}, // SplitColumns
		1000, // SplitCount
		schema)
	if err != nil {
		panic(fmt.Sprintf("NewSplitParamsGivenSplitCount failed with: %v", err))
	}

	// 2. Create the SplitAlgorithmInterface object used for splitting.
	// SplitQuery supports multiple algorithms for splitting the query. These are encapsulated as
	// types implementing the SplitAlgorithmInterface. Currently two algorithms are supported
	// represented by the FullScanAlgorithm and EqualSplitsAlgorithm types. See the documentation
	// of these types for more details on each algorithm.
	// To do the split we'll need to create an object of one of these types and pass it to the
	// Splitter (see below). Here we use the FullScan algorithm.
	// We also pass a type implementing the SQLExecuter interface that the algorithm will
	// use to send statements to MySQL.
	algorithm, err := NewFullScanAlgorithm(splitParams, getSQLExecuter())
	if err != nil {
		panic(fmt.Sprintf("NewFullScanAlgorithm failed with: %v", err))
	}

	// 3. Create a splitter object. Always succeeds.
	splitter := NewSplitter(splitParams, algorithm)

	// 4. Call splitter.Split() to split the query.
	// The result is a slice of *querypb.QuerySplit objects (and an error object).
	queryParts, err := splitter.Split()
	if err != nil {
		panic(fmt.Sprintf("splitter.Split() failed with: %v", err))
	}
	fmt.Println(queryParts)
}

func getSQLExecuter() SQLExecuter {
	// In real code, this should be an object implementing the SQLExecuter interface.
	return nil
}

Types

type EqualSplitsAlgorithm

type EqualSplitsAlgorithm struct {
	// contains filtered or unexported fields
}

EqualSplitsAlgorithm implements the SplitAlgorithmInterface and represents the equal-splits algorithm for generating the boundary tuples. If this algorithm is used then SplitParams.split_columns must contain only one split_column. Additionally, the split_column must have numeric type (integral or floating point).

The algorithm works by issuing a query to the database to find the minimum and maximum elements of the split column in the table referenced by the given SQL query. Denote these by min and max, respectively. The algorithm then "splits" the interval [min, max] into SplitParams.split_count sub-intervals of equal length: [a_1, a_2], [a_2, a_3], ..., [a_{split_count}, a_{split_count+1}], where min = a_1 < a_2 < a_3 < ... < a_{split_count} < a_{split_count+1} = max. The boundary points returned by this algorithm are then a_2, a_3, ..., a_{split_count} (an empty list of boundary points is returned if split_count <= 1). If the type of the split column is integral, the boundary points are truncated to the integer part.
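The boundary arithmetic described above can be sketched as follows. This is an illustration only: equalSplitBoundaries is a hypothetical helper, not part of the package API, it uses float64 for both numeric cases, and the guard for min >= max is an assumption of this sketch (the description presumes min < max).

```go
package main

import (
	"fmt"
	"math"
)

// equalSplitBoundaries returns the interior boundary points
// a_2, ..., a_{splitCount} of the interval [min, max].
// Hypothetical helper for illustration; not part of splitquery.
func equalSplitBoundaries(min, max float64, splitCount int64, integral bool) []float64 {
	// An empty list is returned if splitCount <= 1. The min >= max guard
	// is an assumption of this sketch.
	if splitCount <= 1 || !(min < max) {
		return nil
	}
	width := (max - min) / float64(splitCount)
	boundaries := make([]float64, 0, splitCount-1)
	for i := int64(1); i < splitCount; i++ {
		b := min + float64(i)*width
		if integral {
			b = math.Trunc(b) // integral columns: keep only the integer part
		}
		boundaries = append(boundaries, b)
	}
	return boundaries
}

func main() {
	fmt.Println(equalSplitBoundaries(0, 100, 4, false)) // [25 50 75]
	fmt.Println(equalSplitBoundaries(1, 10, 4, true))   // [3 5 7]
}
```

With split_count = 4 there are three boundary points, which delimit four sub-intervals.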

func NewEqualSplitsAlgorithm

func NewEqualSplitsAlgorithm(splitParams *SplitParams, sqlExecuter SQLExecuter) (
	*EqualSplitsAlgorithm, error)

NewEqualSplitsAlgorithm constructs a new equal splits algorithm. It requires an SQLExecuter since it needs to execute a query to figure out the minimum and maximum elements in the table.

type FullScanAlgorithm

type FullScanAlgorithm struct {
	// contains filtered or unexported fields
}

FullScanAlgorithm implements the SplitAlgorithmInterface and represents the full-scan algorithm for generating the boundary tuples. The algorithm regards the table as sorted in ascending order by the split columns. It then returns boundary tuples from rows that are splitParams.numRowsPerQueryPart rows apart. More precisely, it iteratively executes the following query against the replica's database (recall that MySQL performs tuple comparisons lexicographically):

SELECT <split_columns> FROM <table> FORCE INDEX (PRIMARY)
                       WHERE :prev_boundary <= (<split_columns>)
                       ORDER BY <split_columns>
                       LIMIT <num_rows_per_query_part>, 1

where <split_columns> denotes the ordered list of split columns and <table> is the value of the FROM clause. The 'prev_boundary' bind variable holds a tuple consisting of split column values. It is updated after each iteration with the result of the query. In the query executed in the first iteration (the initial query) the term ':prev_boundary <= (<split_columns>)' is omitted. The algorithm stops when the query returns no results. The result of this algorithm is the list consisting of the result of each query in order.
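The iteration described above can be simulated in memory to see which rows become boundaries. This is a sketch under stated assumptions, not the package's implementation: a single int64 column stands in for the tuple of split columns, the keys are assumed sorted and distinct, and fullScanBoundaries is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// fullScanBoundaries mimics the repeated "LIMIT n, 1" query on an in-memory
// column. Assumptions of this sketch: sortedKeys is ascending with distinct
// values, and one int64 column stands in for the split-column tuple.
func fullScanBoundaries(sortedKeys []int64, numRowsPerQueryPart int) []int64 {
	if numRowsPerQueryPart <= 0 {
		return nil // guard added for this sketch only
	}
	var boundaries []int64
	start := 0 // first row with prev_boundary <= key; 0 for the initial query
	for {
		idx := start + numRowsPerQueryPart // LIMIT n, 1: skip n rows, return the next
		if idx >= len(sortedKeys) {
			return boundaries // the query returned no rows: stop
		}
		prev := sortedKeys[idx]
		boundaries = append(boundaries, prev)
		// The next query only sees rows with prev <= key.
		start = sort.Search(len(sortedKeys), func(i int) bool { return sortedKeys[i] >= prev })
	}
}

func main() {
	keys := []int64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(fullScanBoundaries(keys, 3)) // [4 7 10]
}
```

Each boundary is the row found after skipping numRowsPerQueryPart rows from the previous boundary, so consecutive boundaries are that many rows apart.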

In practice, the implementation differs slightly from the description above: the lexicographical tuple inequality in the query is rewritten to use only scalar comparisons, since MySQL does not optimize queries involving tuple inequalities correctly. Instead of a single tuple bind variable 'prev_boundary', the code uses a list of scalar bind variables, one for each element of the tuple. The bind variable storing the tuple element corresponding to a split column named 'x' is called <prevBindVariablePrefix><x>, where prevBindVariablePrefix is a string constant defined in the package source.
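One way to expand a lexicographic tuple inequality into scalar comparisons is sketched below. It is not necessarily the exact form the package emits: scalarTupleGE is a hypothetical helper, and the "prev_" prefix is a placeholder because the value of prevBindVariablePrefix is not shown on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// scalarTupleGE builds a WHERE fragment equivalent to
// (:prev_c1, ..., :prev_cn) <= (c1, ..., cn) using only scalar comparisons.
// Hypothetical helper; the prefix argument is a placeholder value.
func scalarTupleGE(cols []string, prefix string) string {
	var disjuncts []string
	for i, col := range cols {
		var conj []string
		// All earlier columns must equal their previous-boundary values.
		for _, eq := range cols[:i] {
			conj = append(conj, fmt.Sprintf("%s = :%s%s", eq, prefix, eq))
		}
		// Strictly greater on every column except the last, which admits equality.
		op := ">"
		if i == len(cols)-1 {
			op = ">="
		}
		conj = append(conj, fmt.Sprintf("%s %s :%s%s", col, op, prefix, col))
		disjuncts = append(disjuncts, "("+strings.Join(conj, " AND ")+")")
	}
	return strings.Join(disjuncts, " OR ")
}

func main() {
	fmt.Println(scalarTupleGE([]string{"id", "user_id"}, "prev_"))
	// (id > :prev_id) OR (id = :prev_id AND user_id >= :prev_user_id)
}
```

Each bind variable in the output is a scalar, named by concatenating the prefix with the split-column name.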

func NewFullScanAlgorithm

func NewFullScanAlgorithm(
	splitParams *SplitParams, sqlExecuter SQLExecuter) (*FullScanAlgorithm, error)

NewFullScanAlgorithm constructs a new FullScanAlgorithm.

type SQLExecuter

type SQLExecuter interface {
	SQLExecute(sql string, bindVariables map[string]*querypb.BindVariable) (*sqltypes.Result, error)
}

SQLExecuter encapsulates access to the MySQL database for this package.

type SplitAlgorithmInterface

type SplitAlgorithmInterface interface {
	// contains filtered or unexported methods
}

SplitAlgorithmInterface defines the interface for a splitting algorithm.

type SplitParams

type SplitParams struct {
	// contains filtered or unexported fields
}

SplitParams stores the context for a splitquery computation. It is used by both the Splitter object and the SplitAlgorithm object and caches some data that is used by both.

func NewSplitParamsGivenNumRowsPerQueryPart

func NewSplitParamsGivenNumRowsPerQueryPart(
	query *querypb.BoundQuery,
	splitColumnNames []sqlparser.ColIdent,
	numRowsPerQueryPart int64,
	schema map[string]*schema.Table,
) (*SplitParams, error)

NewSplitParamsGivenNumRowsPerQueryPart returns a new SplitParams object to be used in a splitquery request in which the Vitess client specified a numRowsPerQueryPart parameter. See NewSplitParamsGivenSplitCount for the constructor to use if the Vitess client specified a splitCount parameter.

Parameters:

'query.Sql' is the SQL query to split. The query must satisfy the restrictions found in the documentation of the vtgate.SplitQueryRequest.query protocol buffer field.

'query.BindVariables' are the bind variables for the SQL query.

'splitColumnNames' should contain the names of split columns to use. These must adhere to the restrictions found in the documentation of the vtgate.SplitQueryRequest.split_column protocol buffer field. If splitColumnNames is empty, the split columns used are the primary key columns (in order).

'numRowsPerQueryPart' is the desired number of rows per query part returned. The actual number may be different depending on the split-algorithm used.

'schema' should map a table name to a schema.Table. It is used for looking up the split-column types and error checking.

func NewSplitParamsGivenSplitCount

func NewSplitParamsGivenSplitCount(
	query *querypb.BoundQuery,
	splitColumnNames []sqlparser.ColIdent,
	splitCount int64,
	schema map[string]*schema.Table,
) (*SplitParams, error)

NewSplitParamsGivenSplitCount returns a new SplitParams object to be used in a splitquery request in which the Vitess client specified a splitCount parameter. See NewSplitParamsGivenNumRowsPerQueryPart for the constructor to use if the Vitess client specified a numRowsPerQueryPart parameter.

Parameters:

'query.Sql' is the SQL query to split. The query must satisfy the restrictions found in the documentation of the vtgate.SplitQueryRequest.query protocol buffer field.

'query.BindVariables' are the bind variables for the SQL query.

'splitColumnNames' should contain the names of split columns to use. These must adhere to the restrictions found in the documentation of the vtgate.SplitQueryRequest.split_column protocol buffer field. If splitColumnNames is empty, the split columns used are the primary key columns (in order).

'splitCount' is the desired number of query parts to produce. The actual number may be different depending on the split-algorithm used.

'schema' should map a table name to a schema.Table. It is used for looking up the split-column types and error checking.

func (*SplitParams) GetSplitTableName

func (sp *SplitParams) GetSplitTableName() sqlparser.TableIdent

GetSplitTableName returns the name of the table to split.

type Splitter

type Splitter struct {
	// contains filtered or unexported fields
}

Splitter is used to drive the splitting procedure.

func NewSplitter

func NewSplitter(splitParams *SplitParams, algorithm SplitAlgorithmInterface) *Splitter

NewSplitter creates a new Splitter object.

func (*Splitter) Split

func (splitter *Splitter) Split() ([]*querypb.QuerySplit, error)

Split does the actual work of splitting the query. It returns a slice of *querypb.QuerySplit objects representing the query parts.
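This page does not spell out how boundary values are turned into query parts. The sketch below assumes the common scheme of half-open ranges over a single integer split column, where k boundaries yield k+1 parts; rangeClauses is a hypothetical helper and the generated fragments are illustrative, not the SQL the package produces.

```go
package main

import "fmt"

// rangeClauses turns k boundary values into k+1 WHERE fragments covering
// (-inf, b_1), [b_1, b_2), ..., [b_k, +inf).
// Hypothetical helper; the half-open range scheme is an assumption.
func rangeClauses(col string, boundaries []int64) []string {
	if len(boundaries) == 0 {
		return []string{"TRUE"} // a single part: the unrestricted query
	}
	clauses := make([]string, 0, len(boundaries)+1)
	clauses = append(clauses, fmt.Sprintf("%s < %d", col, boundaries[0]))
	for i := 1; i < len(boundaries); i++ {
		clauses = append(clauses,
			fmt.Sprintf("%s >= %d AND %s < %d", col, boundaries[i-1], col, boundaries[i]))
	}
	last := boundaries[len(boundaries)-1]
	clauses = append(clauses, fmt.Sprintf("%s >= %d", col, last))
	return clauses
}

func main() {
	for _, c := range rangeClauses("id", []int64{4, 7, 10}) {
		fmt.Println(c)
	}
}
```

Under this assumption every row falls into exactly one part, so the parts together return the same rows as the original query.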

Directories

Path Synopsis
Package splitquery_testing is a generated GoMock package.
