splitter

package
v0.0.0-...-1643519
Warning

This package is not in the latest version of its module.

Published: Jul 16, 2024 License: Apache-2.0 Imports: 4 Imported by: 1

Documentation

Overview

Package splitter implements the SplitIntoRanges function, which is useful for splitting a large datastore query into a number of smaller queries with approximately evenly sized result sets.

It is based on the special __scatter__ property. For more information, see: https://github.com/GoogleCloudPlatform/appengine-mapreduce/wiki/ScatterPropertyImplementation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Params

type Params struct {
	// Shards is the maximum number of key ranges to return.
	//
	// Should be >=1. The function may return fewer key ranges if the query has
	// very few results. In the most extreme case it can return one shard that
	// covers the entirety of the key space.
	Shards int

	// Samples is the number of random entities to sample when deciding where
	// to split the query.
	//
	// A higher number of samples gives a more accurate split at the cost of
	// slower execution of SplitIntoRanges. For a large number of shards
	// (hundreds), the number of samples can be set to the number of shards. For
	// a small number of shards (tens), it makes sense to sample 16x or even 32x
	// as many entities.
	//
	// If Samples is 0, the default of 512 is used. If Shards >= Samples, Shards
	// is used as the number of samples instead.
	Samples int
}

Params are passed to SplitIntoRanges.

See the doc for SplitIntoRanges for more info.
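The defaulting rules in the Params comments can be sketched as a small normalization step. This is a minimal sketch, not the package's code: normalize is a hypothetical helper, and rejecting Shards < 1 with an error is an assumption (the doc only says Shards should be >= 1).

```go
package main

import (
	"errors"
	"fmt"
)

// Params mirrors the two documented splitter.Params fields.
type Params struct {
	Shards  int
	Samples int
}

// normalize is a hypothetical helper (not part of the splitter API) that
// applies the documented defaults: Samples == 0 becomes 512, and Samples
// is raised to Shards whenever Shards >= Samples.
func normalize(p Params) (Params, error) {
	// Assumption: the doc says Shards "should be >= 1"; here it is enforced.
	if p.Shards < 1 {
		return p, errors.New("Shards should be >= 1")
	}
	if p.Samples == 0 {
		p.Samples = 512
	}
	if p.Shards >= p.Samples {
		p.Samples = p.Shards
	}
	return p, nil
}

func main() {
	for _, p := range []Params{{Shards: 8}, {Shards: 1000}, {Shards: 16, Samples: 256}} {
		n, err := normalize(p)
		fmt.Println(n, err)
	}
}
```

With the guidance above, a caller asking for 16 shards might pass Samples: 16 * 32 to get a finer split.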

type Range

type Range struct {
	Start *datastore.Key // if nil, then the range represents (0x000..., End]
	End   *datastore.Key // if nil, then the range represents (Start, 0xfff...)
}

Range represents a range of datastore keys (Start, End].
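The (Start, End] semantics, with nil meaning "unbounded on that side", can be illustrated with integer keys standing in for *datastore.Key. This is a hypothetical sketch: Key and contains are not part of the package.

```go
package main

import "fmt"

// Key stands in for *datastore.Key; integer ordering is a simplification.
type Key int

// Range mirrors splitter.Range: it covers keys k with Start < k <= End,
// and a nil bound means the range is unbounded on that side.
type Range struct {
	Start *Key
	End   *Key
}

// contains is a hypothetical helper checking membership in (Start, End].
func (r Range) contains(k Key) bool {
	if r.Start != nil && k <= *r.Start {
		return false // Start itself is excluded
	}
	if r.End != nil && k > *r.End {
		return false // End itself is included
	}
	return true
}

func keyPtr(k Key) *Key { return &k }

func main() {
	r := Range{Start: keyPtr(10), End: keyPtr(20)}
	fmt.Println(r.contains(10), r.contains(11), r.contains(20), r.contains(21)) // false true true false

	// The next range starts where r ends, so key 20 belongs to exactly one.
	next := Range{Start: keyPtr(20)}
	fmt.Println(next.contains(20), next.contains(21)) // false true

	fmt.Println(Range{}.contains(-1000)) // true: both bounds are nil
}
```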

func SplitIntoRanges

func SplitIntoRanges(c context.Context, q *datastore.Query, p Params) ([]Range, error)

SplitIntoRanges returns a list of key ranges (up to 'Shards') that together cover the results of the provided query.

If all query results were fetched and bucketed by the returned ranges, the resulting buckets would be approximately equal in size.
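One way to check the evenness claim is to bucket a set of keys by the returned ranges and count each bucket. The sketch below is hypothetical: bucketSizes is not a package function, integer keys stand in for datastore keys, and the split points 250, 500 and 750 are made up for the example.

```go
package main

import "fmt"

// Key stands in for *datastore.Key.
type Key int

// Range covers keys k with Start < k <= End; a nil bound is unbounded.
type Range struct{ Start, End *Key }

func (r Range) contains(k Key) bool {
	return (r.Start == nil || k > *r.Start) && (r.End == nil || k <= *r.End)
}

// bucketSizes is a hypothetical helper: it assigns every key to the first
// range that contains it and returns the number of keys per range.
func bucketSizes(keys []Key, ranges []Range) []int {
	sizes := make([]int, len(ranges))
	for _, k := range keys {
		for i, r := range ranges {
			if r.contains(k) {
				sizes[i]++
				break
			}
		}
	}
	return sizes
}

func main() {
	// Pretend the query matches keys 1..1000 and the split points are 250, 500, 750.
	keys := make([]Key, 1000)
	for i := range keys {
		keys[i] = Key(i + 1)
	}
	a, b, c := Key(250), Key(500), Key(750)
	ranges := []Range{{nil, &a}, {&a, &b}, {&b, &c}, {&c, nil}}
	fmt.Println(bucketSizes(keys, ranges)) // [250 250 250 250]
}
```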

Internally, it uses the special entity property __scatter__, which is set on ~0.8% of datastore entities. Querying entities ordered by __scatter__ returns a pseudorandom sample of the entities that match the query. To make the split more even, SplitIntoRanges queries 'Samples' entities and then picks the split points evenly among them.
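The "pick split points evenly among the samples" step can be sketched in isolation. This is a minimal sketch under stated assumptions, not the package's implementation: splitRanges is hypothetical, the sample is given as integer keys rather than fetched with a __scatter__-ordered datastore query, and the index formula i*len/shards and the skipping of repeated split points are choices made here.

```go
package main

import (
	"fmt"
	"sort"
)

// Key stands in for *datastore.Key; plain integers keep the sketch runnable.
type Key int

// Range covers keys k with Start < k <= End; a nil bound is unbounded.
type Range struct {
	Start *Key
	End   *Key
}

func (r Range) String() string {
	s, e := "-inf", "+inf"
	if r.Start != nil {
		s = fmt.Sprint(*r.Start)
	}
	if r.End != nil {
		e = fmt.Sprint(*r.End)
	}
	return fmt.Sprintf("(%s, %s]", s, e)
}

// splitRanges is a hypothetical sketch of the splitting step. sample stands
// for the keys returned by a query ordered by __scatter__ (limited to
// Samples results), and shards is the requested number of ranges (>= 1).
func splitRanges(sample []Key, shards int) []Range {
	// The scatter sample arrives in pseudorandom order; sort it by key.
	keys := append([]Key(nil), sample...)
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	// Pick up to shards-1 split points spaced evenly through the sorted
	// sample. Repeated picks are skipped, so a small sample yields fewer
	// ranges, and an empty sample yields no split points at all.
	var splits []Key
	for i := 1; i < shards && len(keys) > 0; i++ {
		k := keys[i*len(keys)/shards]
		if len(splits) == 0 || splits[len(splits)-1] < k {
			splits = append(splits, k)
		}
	}

	// Chain the split points into consecutive (Start, End] ranges: the
	// first range has a nil Start and the last one has a nil End.
	ranges := make([]Range, 0, len(splits)+1)
	var prev *Key
	for i := range splits {
		ranges = append(ranges, Range{Start: prev, End: &splits[i]})
		prev = &splits[i]
	}
	return append(ranges, Range{Start: prev})
}

func main() {
	sample := []Key{50, 10, 90, 30, 70, 20, 80, 40, 60, 100}
	// Prints (-inf, 30], (30, 60], (60, 80] and (80, +inf] on separate lines.
	for _, r := range splitRanges(sample, 4) {
		fmt.Println(r)
	}
	fmt.Println(splitRanges(nil, 4)) // [(-inf, +inf]]: a single unbounded range
}
```

Because every range is delimited by sampled keys, each one receives roughly the same share of the sample, which is what makes the real buckets approximately even when the sample is representative.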

If the given query has filters, SplitIntoRanges may need a corresponding composite index that includes the __scatter__ field.

It may return fewer ranges than requested if it detects that there are too few entities. In the extreme case, it may return a single range (000..., fff...), represented by a Range struct with both 'Start' and 'End' set to nil.

func (Range) Apply

func (r Range) Apply(q *datastore.Query) *datastore.Query

Apply adds >Start and <=End filters to the query and returns the resulting query.
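The effect of Apply can be illustrated with a toy query type that only records its filters. Everything here is hypothetical: the real Apply takes and returns *datastore.Query, the Query type below is a stand-in, and naming the filtered property "__key__" is an assumption of this sketch. Skipping a nil bound follows from the Range doc, where nil means the range is unbounded on that side.

```go
package main

import "fmt"

// Key stands in for *datastore.Key.
type Key int

// Query is a hypothetical stand-in for *datastore.Query that only records
// filters as strings.
type Query struct {
	Filters []string
}

// withFilter returns a copy of q with one extra filter, leaving q untouched.
func (q Query) withFilter(f string) Query {
	q.Filters = append(append([]string(nil), q.Filters...), f)
	return q
}

// Range covers keys k with Start < k <= End; a nil bound is unbounded.
type Range struct{ Start, End *Key }

// Apply sketches the documented behavior: add "> Start" and "<= End" key
// filters, omitting the filter for a bound that is nil.
func (r Range) Apply(q Query) Query {
	if r.Start != nil {
		q = q.withFilter(fmt.Sprintf("__key__ > %d", *r.Start))
	}
	if r.End != nil {
		q = q.withFilter(fmt.Sprintf("__key__ <= %d", *r.End))
	}
	return q
}

func main() {
	base := Query{Filters: []string{"status = done"}}
	lo, hi := Key(10), Key(20)
	fmt.Println(Range{Start: &lo, End: &hi}.Apply(base).Filters) // [status = done __key__ > 10 __key__ <= 20]
	fmt.Println(Range{}.Apply(base).Filters)                     // [status = done]
}
```

Returning a new query instead of mutating the input lets one base query be reused for every shard.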

func (Range) IsEmpty

func (r Range) IsEmpty() bool

IsEmpty is true if the range represents an empty set.
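For integer keys, a range (Start, End] is empty exactly when both bounds are set and End <= Start. The sketch below is an assumption-based illustration, not the package's code; in particular, treating any range with a nil bound as non-empty is a choice made here because such a range extends indefinitely.

```go
package main

import "fmt"

// Key stands in for *datastore.Key.
type Key int

// Range covers keys k with Start < k <= End; a nil bound is unbounded.
type Range struct{ Start, End *Key }

// IsEmpty sketches the check for an integer key space: (Start, End] holds
// no keys exactly when both bounds are set and End <= Start.
func (r Range) IsEmpty() bool {
	if r.Start == nil || r.End == nil {
		return false // assumption: a range with a nil bound is never empty
	}
	return *r.End <= *r.Start
}

func main() {
	ten, eleven := Key(10), Key(11)
	fmt.Println(Range{&ten, &ten}.IsEmpty())    // true: (10, 10] is empty
	fmt.Println(Range{&ten, &eleven}.IsEmpty()) // false: (10, 11] holds 11
	fmt.Println(Range{&eleven, &ten}.IsEmpty()) // true: inverted bounds
	fmt.Println(Range{}.IsEmpty())              // false: the whole key space
}
```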
