cardinality

Version: v1.1.0-beta.0...-91bfa27
Warning: This package is not in the latest version of its module.
Published: Jan 18, 2025 License: Apache-2.0 Imports: 36 Imported by: 0

Documentation

Constants

const (
	IndexType = iota
	PkType
	ColType
)

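IndexType, PkType, and ColType are the possible values of StatsNode.Tp. They indicate whether a StatsNode represents an index, the primary key, or a column.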

const SelectionFactor = 0.8

SelectionFactor is the factor used to estimate the row count of a selection (filter).
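As a minimal sketch, assuming the factor is applied as a flat multiplier to the input row count (the helper `estimateSelectionRows` is hypothetical and not part of this package), a selection over 1,000 input rows would be estimated to keep 800 of them:

```go
package main

import "fmt"

// selectionFactor mirrors the documented constant SelectionFactor = 0.8.
const selectionFactor = 0.8

// estimateSelectionRows is a hypothetical helper: it assumes the filter keeps
// a fixed 80% of its input rows and scales the input row count accordingly.
func estimateSelectionRows(inputRows float64) float64 {
	return inputRows * selectionFactor
}

func main() {
	fmt.Println(estimateSelectionRows(1000)) // 800
}
```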

Variables

var (
	CollectFilters4MVIndex func(
		sctx planctx.PlanContext,
		filters []expression.Expression,
		idxCols []*expression.Column,
	) (
		accessFilters,
		remainingFilters []expression.Expression,
		accessTp int,
	)
	BuildPartialPaths4MVIndex func(
		sctx planctx.PlanContext,
		accessFilters []expression.Expression,
		idxCols []*expression.Column,
		mvIndex *model.IndexInfo,
		histColl *statistics.HistColl,
	) (
		partialPaths []*planutil.AccessPath,
		isIntersection bool,
		ok bool,
		err error,
	)
)

CollectFilters4MVIndex and BuildPartialPaths4MVIndex match JSON expressions against a multi-valued (MV) index. This logic is shared between the estimation logic and the access path generation logic. Because both functions are defined in the planner/core package and are hard to move here, they are declared here as function variables and assigned from planner/core, which avoids an import cycle.
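The general pattern of breaking an import cycle with a package-level function variable can be sketched as follows. Everything here is hypothetical: the two "packages" are collapsed into one file so the sketch runs on its own, `string` stands in for `expression.Expression`, and the toy matching rule is not how MV index matching works.

```go
package main

import "fmt"

// --- lower-level package (must not import the planner) ---

// CollectFilters is a hypothetical hook: declared here, implemented elsewhere.
var CollectFilters func(filters []string) (access, remaining []string)

// EstimateUsingHook calls the hook if the higher-level package installed it.
func EstimateUsingHook(filters []string) int {
	if CollectFilters == nil {
		return 0 // hook not installed; fall back to a default
	}
	access, _ := CollectFilters(filters)
	return len(access)
}

// --- higher-level package (imports the lower-level one) ---

// collectFilters is the concrete implementation owned by the planner.
// Toy rule: filters whose text starts with "json" can use the index.
func collectFilters(filters []string) (access, remaining []string) {
	for _, f := range filters {
		if len(f) >= 4 && f[:4] == "json" {
			access = append(access, f)
		} else {
			remaining = append(remaining, f)
		}
	}
	return access, remaining
}

// init wires the implementation into the lower-level package's variable.
func init() {
	CollectFilters = collectFilters
}

func main() {
	fmt.Println(EstimateUsingHook([]string{"json_contains", "a > 1", "json_overlaps"})) // 2
}
```

The lower-level package only depends on the function's signature, so the dependency arrow still points one way; the cost is that callers must tolerate the hook being unset.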

var GetTblInfoForUsedStatsByPhysicalID func(sctx planctx.PlanContext, id int64) (fullName string, tblInfo *model.TableInfo)

GetTblInfoForUsedStatsByPhysicalID gets the table name, partition name, and TableInfo that are used to record which statistics were used.

Functions

func AdjustRowCountForIndexScanByLimit

func AdjustRowCountForIndexScanByLimit(sctx planctx.PlanContext,
	dsStatsInfo, dsTableStats *property.StatsInfo, dsStatisticTable *statistics.Table,
	path *util.AccessPath, expectedCnt float64, desc bool) float64

AdjustRowCountForIndexScanByLimit adjusts the row count for an index scan by the limit. For a query like `select k from t using index(k) where k > 10 limit 1`, the row count of the index scan should be adjusted by the limit number 1, because only one row is returned.

func AdjustRowCountForTableScanByLimit

func AdjustRowCountForTableScanByLimit(sctx planctx.PlanContext,
	dsStatsInfo, dsTableStats *property.StatsInfo, dsStatisticTable *statistics.Table,
	path *util.AccessPath, expectedCnt float64, desc bool) float64

AdjustRowCountForTableScanByLimit adjusts the row count for a table scan by the limit. For a query like `select pk from t using index(primary) where pk > 10 limit 1`, the row count of the table scan should be adjusted by the limit number 1, because only one row is returned.

func AvgColSize

func AvgColSize(c *statistics.Column, count int64, isKey bool) float64

AvgColSize returns the average column size computed from the histogram. These sizes are derived from the `encode` function and `Datum::ConvertTo`, so they must be updated if either function changes.

func AvgColSizeChunkFormat

func AvgColSizeChunkFormat(c *statistics.Column, count int64) float64

AvgColSizeChunkFormat returns the average column size in chunk format, computed from the histogram. These sizes are derived from `Encode` and `DecodeToChunk`, so they must be updated if either function changes.

func AvgColSizeDataInDiskByRows

func AvgColSizeDataInDiskByRows(c *statistics.Column, count int64) float64

AvgColSizeDataInDiskByRows returns the average column size computed from the histogram. These sizes are derived from `chunk.DataInDiskByRows`, so they must be updated if that function changes.

func CalcTotalSelectivityForMVIdxPath

func CalcTotalSelectivityForMVIdxPath(
	coll *statistics.HistColl,
	partialPaths []*planutil.AccessPath,
	isIntersection bool,
) float64

CalcTotalSelectivityForMVIdxPath calculates the total selectivity for the given partial paths of an MV index merge path. Its meaning corresponds to that of AccessPath.CountAfterAccess, as used in buildPartialPathUp4MVIndex. It relies on the independence assumption to estimate the selectivity.
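The independence assumption gives two standard combination rules: an intersection multiplies the selectivities, and a union takes one minus the product of the complements. The sketch below shows only these rules on plain float64 values; the helper name is hypothetical, and it is not claimed to be the exact code of CalcTotalSelectivityForMVIdxPath, which operates on AccessPath values.

```go
package main

import "fmt"

// combineSelectivities combines per-path selectivities, each in [0, 1],
// assuming the paths' predicates are statistically independent.
func combineSelectivities(sels []float64, isIntersection bool) float64 {
	if isIntersection {
		// P(A and B and ...) = P(A) * P(B) * ...
		total := 1.0
		for _, s := range sels {
			total *= s
		}
		return total
	}
	// P(A or B or ...) = 1 - (1-P(A)) * (1-P(B)) * ...
	none := 1.0
	for _, s := range sels {
		none *= 1 - s
	}
	return 1 - none
}

func main() {
	sels := []float64{0.5, 0.2}
	fmt.Println(combineSelectivities(sels, true))  // 0.1
	fmt.Println(combineSelectivities(sels, false)) // 0.6
}
```

The union rule avoids double-counting rows that match several partial paths, which a plain sum of selectivities would do.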

func ColumnEqualRowCount

func ColumnEqualRowCount(sctx planctx.PlanContext, t *statistics.Table, value types.Datum, colID int64) (float64, error)

ColumnEqualRowCount estimates the row count where the column equals value.

func ColumnGreaterRowCount

func ColumnGreaterRowCount(sctx planctx.PlanContext, t *statistics.Table, value types.Datum, colID int64) float64

ColumnGreaterRowCount estimates the row count where the column is greater than value.

func EstimateColsDNVWithMatchedLenFromUniqueIDs

func EstimateColsDNVWithMatchedLenFromUniqueIDs(ids []int64, schema *expression.Schema, profile *property.StatsInfo) (float64, int)

EstimateColsDNVWithMatchedLenFromUniqueIDs is similar to EstimateColsNDVWithMatchedLen, but it takes column UniqueIDs instead of Columns.

func EstimateColsNDVWithMatchedLen

func EstimateColsNDVWithMatchedLen(cols []*expression.Column, schema *expression.Schema, profile *property.StatsInfo) (float64, int)

EstimateColsNDVWithMatchedLen returns the NDV (number of distinct values) of a set of columns. If the columns match a GroupNDV maintained by the child operator, the NDV is accurate. Otherwise, it returns the maximum NDV among the individual columns, which is a lower bound.
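The fallback is a lower bound because every distinct value of any single column appears in at least one distinct combination of all the columns, so NDV(a, b) >= max(NDV(a), NDV(b)). A minimal sketch of that fallback, using a hypothetical map of per-column NDVs in place of the StatsInfo profile:

```go
package main

import "fmt"

// maxColumnNDV returns the largest single-column NDV among cols, a lower
// bound on the NDV of the column combination. It starts at 1 (an assumption
// of this sketch) so an empty or unknown column set never yields 0.
func maxColumnNDV(ndvs map[string]float64, cols []string) float64 {
	result := 1.0
	for _, c := range cols {
		if v, ok := ndvs[c]; ok && v > result {
			result = v
		}
	}
	return result
}

func main() {
	ndvs := map[string]float64{"a": 100, "b": 40}
	fmt.Println(maxColumnNDV(ndvs, []string{"a", "b"})) // 100
}
```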

func EstimateColumnNDV

func EstimateColumnNDV(tbl *statistics.Table, colID int64) (ndv float64)

EstimateColumnNDV computes the estimated NDV of the specified column using the original histogram of the `DataSource`, which is retrieved from storage (not the derived one).

func EstimateFullJoinRowCount

func EstimateFullJoinRowCount(sctx planctx.PlanContext,
	isCartesian bool,
	leftProfile, rightProfile *property.StatsInfo,
	leftJoinKeys, rightJoinKeys []*expression.Column,
	leftSchema, rightSchema *expression.Schema,
	leftNAJoinKeys, rightNAJoinKeys []*expression.Column) float64

EstimateFullJoinRowCount estimates the row count of a full join.
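A common textbook estimate for an equi-join is |L| * |R| / max(NDV(L.key), NDV(R.key)), and a Cartesian product is simply |L| * |R|. The sketch below shows that formula with a hypothetical helper; the key NDVs would come from something like EstimateColsNDVWithMatchedLen, and it is not claimed that EstimateFullJoinRowCount uses exactly this formula or handles the null-aware keys this way.

```go
package main

import (
	"fmt"
	"math"
)

// estimateEquiJoinRows applies the textbook equi-join estimate
//   |L| * |R| / max(NDV(L.key), NDV(R.key))
// and returns |L| * |R| for a Cartesian product.
func estimateEquiJoinRows(leftRows, rightRows, leftKeyNDV, rightKeyNDV float64, isCartesian bool) float64 {
	if isCartesian {
		return leftRows * rightRows
	}
	ndv := math.Max(leftKeyNDV, rightKeyNDV)
	if ndv < 1 {
		ndv = 1 // guard against division by zero with empty statistics
	}
	return leftRows * rightRows / ndv
}

func main() {
	// 1,000 x 500 rows with join-key NDVs 100 and 50: 1000*500/100 = 5000.
	fmt.Println(estimateEquiJoinRows(1000, 500, 100, 50, false))
}
```

Dividing by the larger NDV reflects the containment assumption: every key value on the side with fewer distinct values is assumed to find matches on the other side.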

func GetAvgRowSize

func GetAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, isEncodedKey bool, isForScan bool) (size float64)

GetAvgRowSize computes the average row size of the given columns.

func GetAvgRowSizeDataInDiskByRows

func GetAvgRowSizeDataInDiskByRows(coll *statistics.HistColl, cols []*expression.Column) (size float64)

GetAvgRowSizeDataInDiskByRows computes the average row size of the given columns, based on the sizes from AvgColSizeDataInDiskByRows.

func GetColumnRowCount

func GetColumnRowCount(sctx planctx.PlanContext, c *statistics.Column, ranges []*ranger.Range, realtimeRowCount, modifyCount int64, pkIsHandle bool) (float64, error)

GetColumnRowCount estimates the row count for a slice of ranges.

func GetIndexAvgRowSize

func GetIndexAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, isUnique bool) (size float64)

GetIndexAvgRowSize computes the average row size for an index scan.

func GetRowCountByColumnRanges

func GetRowCountByColumnRanges(sctx planctx.PlanContext, coll *statistics.HistColl, colUniqueID int64, colRanges []*ranger.Range) (result float64, err error)

GetRowCountByColumnRanges estimates the row count for a slice of column ranges.

func GetRowCountByIndexRanges

func GetRowCountByIndexRanges(sctx planctx.PlanContext, coll *statistics.HistColl, idxID int64, indexRanges []*ranger.Range) (result float64, err error)

GetRowCountByIndexRanges estimates the row count for a slice of index ranges.

func GetRowCountByIntColumnRanges

func GetRowCountByIntColumnRanges(sctx planctx.PlanContext, coll *statistics.HistColl, colUniqueID int64, intRanges []*ranger.Range) (result float64, err error)

GetRowCountByIntColumnRanges estimates the row count for a slice of integer column ranges (IntColumnRange).

func GetSelectivityByFilter

func GetSelectivityByFilter(sctx planctx.PlanContext, coll *statistics.HistColl, filters []expression.Expression) (ok bool, selectivity float64, err error)

GetSelectivityByFilter tries to estimate the selectivity of expressions by evaluating them against the TopN values, the histogram bucket boundaries, and NULL. Currently, this method can only handle expressions involving a single column.
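The core idea of estimating selectivity by evaluation can be sketched on (value, count) pairs such as TopN entries: evaluate the predicate on each known value and divide the matching row count by the total. This simplified sketch uses hypothetical names, covers only TopN-like data, and ignores histogram bucket boundaries and NULL handling, which the real function also uses.

```go
package main

import "fmt"

// valueCount is a hypothetical (value, row count) pair, such as a TopN entry.
type valueCount struct {
	value int64
	count float64
}

// selectivityBySamples evaluates pred on each known value and returns the
// fraction of rows whose value satisfies it. ok is false when there is no
// data to evaluate against.
func selectivityBySamples(samples []valueCount, pred func(int64) bool) (ok bool, sel float64) {
	var total, matched float64
	for _, s := range samples {
		total += s.count
		if pred(s.value) {
			matched += s.count
		}
	}
	if total == 0 {
		return false, 0
	}
	return true, matched / total
}

func main() {
	topN := []valueCount{{1, 50}, {2, 30}, {3, 20}}
	// Filter: col % 2 = 1, a predicate that does not map to a simple range.
	ok, sel := selectivityBySamples(topN, func(v int64) bool { return v%2 == 1 })
	fmt.Println(ok, sel) // true 0.7
}
```

Evaluating the expression directly is what makes arbitrary single-column predicates tractable, at the cost of only seeing the values that the statistics happen to record.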

func GetTableAvgRowSize

func GetTableAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, storeType kv.StoreType, handleInCols bool) (size float64)

GetTableAvgRowSize computes the average row size for a table scan, excluding the index key-value pairs.

func PseudoAvgCountPerValue

func PseudoAvgCountPerValue(t *statistics.Table) float64

PseudoAvgCountPerValue returns a pseudo average count per value when no histogram exists.

Types

type StatsNode

type StatsNode struct {
	// Ranges contains all the Ranges we got.
	Ranges []*ranger.Range
	Tp     int
	ID     int64

	// Selectivity indicates the Selectivity of this column/index.
	Selectivity float64
	// contains filtered or unexported fields
}

StatsNode is used for calculating selectivity.

func GetUsableSetsByGreedy

func GetUsableSetsByGreedy(nodes []*StatsNode) (newBlocks []*StatsNode)

GetUsableSetsByGreedy uses a greedy algorithm to select the indexes and primary key used to calculate selectivity.
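A greedy selection of this kind is essentially greedy set cover: repeatedly pick the candidate that covers the most not-yet-covered expressions, and stop when no candidate adds anything. In the sketch below, the `node` type is a hypothetical stand-in for StatsNode (whose coverage information is unexported), and ties are broken by input order; the real implementation may use other tie-breakers.

```go
package main

import (
	"fmt"
	"math/bits"
)

// node is a hypothetical stand-in for StatsNode: bit i of mask is set when
// the index, primary key, or column can estimate expression i.
type node struct {
	name string
	mask uint64
}

// greedyCover repeatedly picks the node covering the most still-uncovered
// expressions, stopping when no node covers anything new.
func greedyCover(nodes []node) []node {
	var chosen []node
	var covered uint64
	for {
		best, bestGain := -1, 0
		for i, n := range nodes {
			gain := bits.OnesCount64(n.mask &^ covered)
			if gain > bestGain { // strict '>' keeps the earliest node on ties
				best, bestGain = i, gain
			}
		}
		if best < 0 {
			return chosen
		}
		chosen = append(chosen, nodes[best])
		covered |= nodes[best].mask
	}
}

func main() {
	nodes := []node{
		{"idx_ab", 0b0011}, // covers expressions 0 and 1
		{"idx_b", 0b0010},  // covers expression 1
		{"pk", 0b1100},     // covers expressions 2 and 3
	}
	for _, n := range greedyCover(nodes) {
		fmt.Println(n.name) // idx_ab, then pk
	}
}
```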

func Selectivity

func Selectivity(
	ctx planctx.PlanContext,
	coll *statistics.HistColl,
	exprs []expression.Expression,
	filledPaths []*planutil.AccessPath,
) (
	result float64,
	retStatsNodes []*StatsNode,
	err error,
)

Selectivity calculates the selectivity of the expressions on the specified HistColl. Selectivity is defined as (row count after filter) / (row count before filter). The exprs must be in CNF; in other words, `exprs[0] and exprs[1] and ... and exprs[len - 1]` must hold when you call this function. Currently, the time complexity is O(n^2).
