Documentation ¶
Overview ¶
Package mergeplan provides a segment merge planning approach that's inspired by Lucene's TieredMergePolicy.java and descriptions like http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
Index ¶
- Constants
- Variables
- func CalcBudget(totalSize int64, firstTierSize int64, o *MergePlanOptions) (budgetNumSegments int)
- func ScoreSegments(segments []Segment, o *MergePlanOptions) float64
- func ToBarChart(prefix string, barMax int, segments []Segment, plan *MergePlan) string
- func ValidateMergePlannerOptions(options *MergePlanOptions) error
- type MergePlan
- type MergePlanOptions
- type MergeTask
- type Segment
Constants ¶
const MaxSegmentSizeLimit = 1<<31 - 1
MaxSegmentSizeLimit is the maximum size of a segment. The limit comes from the hit-1 optimisation and its maximum encoding limit of uint31 (1<<31 - 1).
Variables ¶
var DefaultMergePlanOptions = MergePlanOptions{
	MaxSegmentsPerTier:   10,
	MaxSegmentSize:       5000000,
	MaxSegmentFileSize:   4000000000,
	TierGrowth:           10.0,
	SegmentsPerMergeTask: 10,
	FloorSegmentSize:     2000,
	ReclaimDeletesWeight: 2.0,
}
DefaultMergePlanOptions suggests the default options.
var ErrMaxSegmentSizeTooLarge = errors.New("MaxSegmentSize exceeds the size limit")
ErrMaxSegmentSizeTooLarge is returned when the configured MaxSegmentSize exceeds MaxSegmentSizeLimit.
var SingleSegmentMergePlanOptions = MergePlanOptions{
	MaxSegmentsPerTier:   1,
	MaxSegmentSize:       1 << 30,
	MaxSegmentFileSize:   1 << 40,
	TierGrowth:           1.0,
	SegmentsPerMergeTask: 10,
	FloorSegmentSize:     1 << 30,
	ReclaimDeletesWeight: 2.0,
}
SingleSegmentMergePlanOptions helps in creating a single segment index.
Functions ¶
func CalcBudget ¶
func CalcBudget(totalSize int64, firstTierSize int64, o *MergePlanOptions) (budgetNumSegments int)
CalcBudget computes the number of segments needed to cover totalSize by climbing a logarithmically growing staircase of segment tiers.
func ScoreSegments ¶
func ScoreSegments(segments []Segment, o *MergePlanOptions) float64
ScoreSegments returns a score for the given segments. A smaller score is better.
func ToBarChart ¶
func ToBarChart(prefix string, barMax int, segments []Segment, plan *MergePlan) string
ToBarChart returns an ASCII rendering of the segments and the plan. The barMax is the max width of the bars in the bar chart.
func ValidateMergePlannerOptions ¶
func ValidateMergePlannerOptions(options *MergePlanOptions) error
ValidateMergePlannerOptions validates the merge planner options.
Types ¶
type MergePlan ¶
type MergePlan struct {
	Tasks []*MergeTask
}
A MergePlan is the result of the Plan() API.
The planner doesn’t know how or whether these tasks are executed -- that’s up to a separate merge execution system, which might execute these tasks concurrently or not, and which might execute all the tasks or not.
type MergePlanOptions ¶
type MergePlanOptions struct {
	// Max # segments per logarithmic tier, or max width of any
	// logarithmic “step”. Smaller values mean more merging but fewer
	// segments. Should be >= SegmentsPerMergeTask, else you'll have
	// too much merging.
	MaxSegmentsPerTier int

	// Max size of any segment produced after merging. Actual
	// merging, however, may produce segment sizes different than the
	// planner’s predicted sizes.
	MaxSegmentSize int64

	// Max size (in bytes) of the persisted segment file that contains the
	// vectors. This is used to prevent merging of segments that
	// contain vectors that are too large.
	MaxSegmentFileSize int64

	// The growth factor for each tier in a staircase of idealized
	// segments computed by CalcBudget().
	TierGrowth float64

	// The number of segments in any resulting MergeTask. e.g.,
	// len(result.Tasks[ * ].Segments) == SegmentsPerMergeTask.
	SegmentsPerMergeTask int

	// Small segments are rounded up to this size, i.e., treated as
	// equal (floor) size for consideration. This is to prevent lots
	// of tiny segments from resulting in a long tail in the index.
	FloorSegmentSize int64

	// Controls how aggressively merges that reclaim more deletions
	// are favored. Higher values will more aggressively target
	// merges that reclaim deletions, but be careful not to go so high
	// that way too much merging takes place; a value of 3.0 is
	// probably nearly too high. A value of 0.0 means deletions don't
	// impact merge selection.
	ReclaimDeletesWeight float64

	// Optional, defaults to mergeplan.CalcBudget().
	CalcBudget func(totalSize int64, firstTierSize int64,
		o *MergePlanOptions) (budgetNumSegments int)

	// Optional, defaults to mergeplan.ScoreSegments().
	ScoreSegments func(segments []Segment, o *MergePlanOptions) float64

	// Optional.
	Logger func(string)
}
The MergePlanOptions is designed to be reusable between planning calls.
func (*MergePlanOptions) RaiseToFloorSegmentSize ¶
func (o *MergePlanOptions) RaiseToFloorSegmentSize(s int64) int64
Returns the higher of the input or FloorSegmentSize.
type MergeTask ¶
type MergeTask struct {
	Segments []Segment
}
A MergeTask represents several segments that should be merged together into a single segment.
type Segment ¶
type Segment interface {
	// Unique id of the segment -- used for sorting.
	Id() uint64

	// Full segment size (the size before any logical deletions).
	FullSize() int64

	// Size of the live data of the segment; i.e., FullSize() minus
	// any logical deletions.
	LiveSize() int64

	HasVector() bool

	// Size of the persisted segment file.
	FileSize() int64
}
A Segment represents the information that the planner needs to calculate segment merging.