mergeplan

package
v2.4.2
Published: Jul 26, 2024 License: Apache-2.0 Imports: 5 Imported by: 5

Documentation

Overview

Package mergeplan provides a segment merge planning approach that's inspired by Lucene's TieredMergePolicy.java and descriptions like http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Constants

const MaxSegmentSizeLimit = 1<<31 - 1

MaxSegmentSizeLimit is the maximum size of a segment. The limit comes from the hit-1 optimisation, whose maximum encoding is a 31-bit unsigned value (uint31).
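As a quick check of the constant's value: in Go, `<<` binds tighter than `-`, so `1<<31 - 1` parses as `(1<<31) - 1`, which is the largest value that fits in 31 bits. The sketch below redeclares the constant under a local name (an assumption made only so the snippet compiles on its own; it does not import mergeplan).

```go
package main

import (
	"fmt"
	"math"
)

// maxSegmentSizeLimit mirrors the documented constant. It is redeclared
// locally (hypothetical name) so that this snippet is self-contained.
const maxSegmentSizeLimit = 1<<31 - 1

func main() {
	// The shift is evaluated before the subtraction.
	fmt.Println(maxSegmentSizeLimit == math.MaxInt32) // true
	fmt.Println(maxSegmentSizeLimit)                  // 2147483647
}
```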

Variables

var DefaultMergePlanOptions = MergePlanOptions{
	MaxSegmentsPerTier:   10,
	MaxSegmentSize:       5000000,
	MaxSegmentFileSize:   4000000000,
	TierGrowth:           10.0,
	SegmentsPerMergeTask: 10,
	FloorSegmentSize:     2000,
	ReclaimDeletesWeight: 2.0,
}

DefaultMergePlanOptions holds the suggested default options.

var ErrMaxSegmentSizeTooLarge = errors.New("MaxSegmentSize exceeds the size limit")

ErrMaxSegmentSizeTooLarge is returned when the configured MaxSegmentSize exceeds MaxSegmentSizeLimit.

var SingleSegmentMergePlanOptions = MergePlanOptions{
	MaxSegmentsPerTier:   1,
	MaxSegmentSize:       1 << 30,
	MaxSegmentFileSize:   1 << 40,
	TierGrowth:           1.0,
	SegmentsPerMergeTask: 10,
	FloorSegmentSize:     1 << 30,
	ReclaimDeletesWeight: 2.0,
}

SingleSegmentMergePlanOptions helps in creating a single-segment index.

Functions

func CalcBudget

func CalcBudget(totalSize int64, firstTierSize int64, o *MergePlanOptions) (
	budgetNumSegments int)

CalcBudget computes the number of segments needed to cover totalSize by climbing a logarithmically growing staircase of segment tiers.
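The staircase idea can be sketched as follows. This is an illustration under stated assumptions, not the package's CalcBudget: the helper name `staircaseBudget` is hypothetical, and the sketch assumes each tier holds at most `maxPerTier` segments, with the per-segment size multiplied by `growth` from one tier to the next.

```go
package main

import (
	"fmt"
	"math"
)

// staircaseBudget is a hypothetical helper that illustrates the staircase
// described above. It fills tiers of at most maxPerTier segments each,
// multiplying the per-segment size by growth for every new tier, until
// totalSize is covered. It returns the number of segments used.
func staircaseBudget(totalSize, firstTierSize int64, maxPerTier int, growth float64) int {
	tierSize := firstTierSize
	if tierSize < 1 {
		tierSize = 1
	}
	if maxPerTier < 1 {
		maxPerTier = 1
	}
	if growth < 1.0 {
		growth = 1.0
	}

	budget := 0
	for totalSize > 0 {
		segmentsInTier := float64(totalSize) / float64(tierSize)
		if segmentsInTier < float64(maxPerTier) {
			// The remainder fits in a partially filled tier.
			budget += int(math.Ceil(segmentsInTier))
			break
		}
		// Fill this tier completely and climb to the next step.
		budget += maxPerTier
		totalSize -= int64(maxPerTier) * tierSize
		tierSize = int64(float64(tierSize) * growth)
	}
	return budget
}

func main() {
	// 10 segments of size 10 cover 100; the remaining 900 needs 9 of size 100.
	fmt.Println(staircaseBudget(1000, 10, 10, 10.0)) // 19
	// Everything fits in a partially filled first tier.
	fmt.Println(staircaseBudget(50, 10, 10, 10.0)) // 5
}
```

With the default options (MaxSegmentsPerTier of 10 and TierGrowth of 10.0), each step of the staircase is ten times taller than the previous one, so the budget grows slowly as the total size increases.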

func ScoreSegments

func ScoreSegments(segments []Segment, o *MergePlanOptions) float64

ScoreSegments returns a score for the given segments. A smaller score is better.

func ToBarChart

func ToBarChart(prefix string, barMax int, segments []Segment, plan *MergePlan) string

ToBarChart returns an ASCII rendering of the segments and the plan. The barMax is the max width of the bars in the bar chart.

func ValidateMergePlannerOptions

func ValidateMergePlannerOptions(options *MergePlanOptions) error

ValidateMergePlannerOptions validates the merge planner options.
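One check that such a validation plausibly performs, judging from ErrMaxSegmentSizeTooLarge and MaxSegmentSizeLimit, is sketched below. The `options` type and `validate` function are hypothetical stand-ins, and the exact set of checks the package performs is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the documented constant and error, under hypothetical
// names, so that the snippet compiles without importing mergeplan.
const maxSegmentSizeLimit = 1<<31 - 1

var errMaxSegmentSizeTooLarge = errors.New("MaxSegmentSize exceeds the size limit")

// options is a trimmed-down, hypothetical stand-in for MergePlanOptions.
type options struct {
	MaxSegmentSize int64
}

// validate rejects a MaxSegmentSize above the limit. This is an assumed
// check, not necessarily everything the real function verifies.
func validate(o *options) error {
	if o.MaxSegmentSize > maxSegmentSizeLimit {
		return errMaxSegmentSizeTooLarge
	}
	return nil
}

func main() {
	fmt.Println(validate(&options{MaxSegmentSize: 5000000})) // <nil>
	fmt.Println(validate(&options{MaxSegmentSize: 1 << 31})) // MaxSegmentSize exceeds the size limit
}
```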

Types

type MergePlan

type MergePlan struct {
	Tasks []*MergeTask
}

A MergePlan is the result of the Plan() API.

The planner doesn’t know how or whether these tasks are executed -- that’s up to a separate merge execution system, which might execute these tasks concurrently or not, and which might execute all the tasks or not.

func Plan

func Plan(segments []Segment, o *MergePlanOptions) (*MergePlan, error)

Plan() will functionally compute a merge plan. A segment will be assigned to at most a single MergeTask in the output MergePlan. A segment not assigned to any MergeTask means the segment should remain unmerged.
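Because each segment appears in at most one MergeTask, a caller can work out which segments stay unmerged by collecting the ids referenced by the plan. The sketch below uses local, trimmed-down stand-ins for the documented types (`segment`, `mergeTask`, `mergePlan`, and `seg` are hypothetical names), and `unmergedIDs` is an illustrative helper rather than part of the package.

```go
package main

import "fmt"

// Trimmed-down local stand-ins for the documented Segment, MergeTask and
// MergePlan types, so that the snippet compiles on its own.
type segment interface{ Id() uint64 }

type mergeTask struct{ Segments []segment }

type mergePlan struct{ Tasks []*mergeTask }

// seg is a hypothetical segment that carries only an id.
type seg uint64

func (s seg) Id() uint64 { return uint64(s) }

// unmergedIDs returns the ids of the segments that no task in the plan
// references, in their original order.
func unmergedIDs(segments []segment, plan *mergePlan) []uint64 {
	assigned := make(map[uint64]bool)
	if plan != nil {
		for _, task := range plan.Tasks {
			for _, s := range task.Segments {
				assigned[s.Id()] = true
			}
		}
	}
	var out []uint64
	for _, s := range segments {
		if !assigned[s.Id()] {
			out = append(out, s.Id())
		}
	}
	return out
}

func main() {
	segs := []segment{seg(1), seg(2), seg(3), seg(4)}
	plan := &mergePlan{Tasks: []*mergeTask{
		{Segments: []segment{seg(1), seg(3)}},
	}}
	fmt.Println(unmergedIDs(segs, plan)) // [2 4]
}
```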

type MergePlanOptions

type MergePlanOptions struct {
	// Max # segments per logarithmic tier, or max width of any
	// logarithmic “step”.  Smaller values mean more merging but fewer
	// segments.  Should be >= SegmentsPerMergeTask, else you'll have
	// too much merging.
	MaxSegmentsPerTier int

	// Max size of any segment produced after merging.  Actual
	// merging, however, may produce segment sizes different than the
	// planner’s predicted sizes.
	MaxSegmentSize int64

	// Max size (in bytes) of the persisted segment file that contains the
	// vectors.  This is used to prevent merging of segments that
	// contain vectors that are too large.
	MaxSegmentFileSize int64

	// The growth factor for each tier in a staircase of idealized
	// segments computed by CalcBudget().
	TierGrowth float64

	// The number of segments in any resulting MergeTask.  e.g.,
	// len(result.Tasks[ * ].Segments) == SegmentsPerMergeTask.
	SegmentsPerMergeTask int

	// Small segments are rounded up to this size, i.e., treated as
	// equal (floor) size for consideration.  This is to prevent lots
	// of tiny segments from resulting in a long tail in the index.
	FloorSegmentSize int64

	// Controls how aggressively merges that reclaim more deletions
	// are favored.  Higher values will more aggressively target
	// merges that reclaim deletions, but be careful not to go so high
	// that way too much merging takes place; a value of 3.0 is
	// probably nearly too high.  A value of 0.0 means deletions don't
	// impact merge selection.
	ReclaimDeletesWeight float64

	// Optional, defaults to mergeplan.CalcBudget().
	CalcBudget func(totalSize int64, firstTierSize int64,
		o *MergePlanOptions) (budgetNumSegments int)

	// Optional, defaults to mergeplan.ScoreSegments().
	ScoreSegments func(segments []Segment, o *MergePlanOptions) float64

	// Optional.
	Logger func(string)
}

The MergePlanOptions is designed to be reusable between planning calls.

func (*MergePlanOptions) RaiseToFloorSegmentSize

func (o *MergePlanOptions) RaiseToFloorSegmentSize(s int64) int64

RaiseToFloorSegmentSize returns the larger of s and FloorSegmentSize.
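The rounding amounts to taking a maximum. A minimal sketch, using a hypothetical free function `raiseToFloor` in place of the method and the FloorSegmentSize value of 2000 from DefaultMergePlanOptions:

```go
package main

import "fmt"

// raiseToFloor is a hypothetical stand-in for the method: it returns the
// larger of s and floor.
func raiseToFloor(s, floor int64) int64 {
	if s < floor {
		return floor
	}
	return s
}

func main() {
	const floor = 2000 // FloorSegmentSize in DefaultMergePlanOptions

	fmt.Println(raiseToFloor(150, floor))  // 2000: a tiny segment is rounded up
	fmt.Println(raiseToFloor(7500, floor)) // 7500: a larger segment is unchanged
}
```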

type MergeTask

type MergeTask struct {
	Segments []Segment
}

A MergeTask represents several segments that should be merged together into a single segment.

type Segment

type Segment interface {
	// Unique id of the segment -- used for sorting.
	Id() uint64

	// Full segment size (the size before any logical deletions).
	FullSize() int64

	// Size of the live data of the segment; i.e., FullSize() minus
	// any logical deletions.
	LiveSize() int64

	// Whether the segment contains vector data.
	HasVector() bool

	// Size of the persisted segment file.
	FileSize() int64
}

A Segment represents the information that the planner needs to calculate segment merging.
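A minimal implementation of the interface might look like the sketch below. The interface is copied locally so the snippet is self-contained, and `memSegment` with its fields is a hypothetical type, not something the package provides. Here LiveSize is derived as the full size minus the deleted size, following the interface comments.

```go
package main

import "fmt"

// Segment is copied from the documented interface so that this snippet
// compiles without importing mergeplan.
type Segment interface {
	Id() uint64
	FullSize() int64
	LiveSize() int64
	HasVector() bool
	FileSize() int64
}

// memSegment is a hypothetical in-memory description of a segment.
type memSegment struct {
	id        uint64
	fullSize  int64 // size before any logical deletions
	deleted   int64 // size taken up by logical deletions
	fileSize  int64 // size of the persisted segment file
	hasVector bool
}

func (s *memSegment) Id() uint64      { return s.id }
func (s *memSegment) FullSize() int64 { return s.fullSize }
func (s *memSegment) LiveSize() int64 { return s.fullSize - s.deleted }
func (s *memSegment) HasVector() bool { return s.hasVector }
func (s *memSegment) FileSize() int64 { return s.fileSize }

// Compile-time check that memSegment satisfies Segment.
var _ Segment = (*memSegment)(nil)

func main() {
	var s Segment = &memSegment{id: 7, fullSize: 1000, deleted: 250, fileSize: 4096}
	fmt.Println(s.Id(), s.FullSize(), s.LiveSize(), s.HasVector(), s.FileSize())
	// 7 1000 750 false 4096
}
```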
