Documentation ¶
Index ¶
- Constants
- Variables
- func AdjustRowCountForIndexScanByLimit(sctx planctx.PlanContext, dsStatsInfo, dsTableStats *property.StatsInfo, ...) float64
- func AdjustRowCountForTableScanByLimit(sctx planctx.PlanContext, dsStatsInfo, dsTableStats *property.StatsInfo, ...) float64
- func AvgColSize(c *statistics.Column, count int64, isKey bool) float64
- func AvgColSizeChunkFormat(c *statistics.Column, count int64) float64
- func AvgColSizeDataInDiskByRows(c *statistics.Column, count int64) float64
- func CalcTotalSelectivityForMVIdxPath(coll *statistics.HistColl, partialPaths []*planutil.AccessPath, ...) float64
- func ColumnEqualRowCount(sctx planctx.PlanContext, t *statistics.Table, value types.Datum, colID int64) (float64, error)
- func ColumnGreaterRowCount(sctx planctx.PlanContext, t *statistics.Table, value types.Datum, colID int64) float64
- func EstimateColsDNVWithMatchedLenFromUniqueIDs(ids []int64, schema *expression.Schema, profile *property.StatsInfo) (float64, int)
- func EstimateColsNDVWithMatchedLen(cols []*expression.Column, schema *expression.Schema, ...) (float64, int)
- func EstimateColumnNDV(tbl *statistics.Table, colID int64) (ndv float64)
- func EstimateFullJoinRowCount(sctx planctx.PlanContext, isCartesian bool, ...) float64
- func GetAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, ...) (size float64)
- func GetAvgRowSizeDataInDiskByRows(coll *statistics.HistColl, cols []*expression.Column) (size float64)
- func GetColumnRowCount(sctx planctx.PlanContext, c *statistics.Column, ranges []*ranger.Range, ...) (float64, error)
- func GetIndexAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, ...) (size float64)
- func GetRowCountByColumnRanges(sctx planctx.PlanContext, coll *statistics.HistColl, colUniqueID int64, ...) (result float64, err error)
- func GetRowCountByIndexRanges(sctx planctx.PlanContext, coll *statistics.HistColl, idxID int64, ...) (result float64, err error)
- func GetRowCountByIntColumnRanges(sctx planctx.PlanContext, coll *statistics.HistColl, colUniqueID int64, ...) (result float64, err error)
- func GetSelectivityByFilter(sctx planctx.PlanContext, coll *statistics.HistColl, ...) (ok bool, selectivity float64, err error)
- func GetTableAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, ...) (size float64)
- func PseudoAvgCountPerValue(t *statistics.Table) float64
- type StatsNode
Constants ¶
const (
	IndexType = iota
	PkType
	ColType
)
The type of the StatsNode.
const SelectionFactor = 0.8
SelectionFactor is the factor which is used to estimate the row count of selection.
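As a rough illustration of how a fixed factor like this can be applied, the sketch below multiplies an input row count by the factor. The helper name `estimateSelectionRows` is hypothetical and not part of this package, and the assumption that the factor is applied as a plain multiplication is ours, not something the documentation states.

```go
package main

import "fmt"

// selectionFactor mirrors the documented constant SelectionFactor = 0.8.
const selectionFactor = 0.8

// estimateSelectionRows is a hypothetical helper (not part of the package).
// It assumes the factor is applied as a simple multiplier on the input rows.
func estimateSelectionRows(inputRows float64) float64 {
	return inputRows * selectionFactor
}

func main() {
	fmt.Println("estimated rows after selection:", estimateSelectionRows(1000))
}
```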
Variables ¶
var (
	CollectFilters4MVIndex func(
		sctx planctx.PlanContext,
		filters []expression.Expression,
		idxCols []*expression.Column,
	) (
		accessFilters, remainingFilters []expression.Expression,
		accessTp int,
	)
	BuildPartialPaths4MVIndex func(
		sctx planctx.PlanContext,
		accessFilters []expression.Expression,
		idxCols []*expression.Column,
		mvIndex *model.IndexInfo,
		histColl *statistics.HistColl,
	) (
		partialPaths []*planutil.AccessPath,
		isIntersection bool,
		ok bool,
		err error,
	)
)
CollectFilters4MVIndex and BuildPartialPaths4MVIndex match JSON expressions against a multi-valued (MV) index. This logic is shared between the estimation logic and the access path generation logic, but the two functions are defined in the planner/core package and are hard to move here. They are therefore exposed as function variables, which avoids an import cycle.
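The function-variable approach described above can be sketched in a single file as follows. This is a minimal illustration, not the package's code: `CollectFilters`, `countAccessFilters`, and `collectFiltersImpl` are hypothetical names, and the two "packages" are simulated with comments.

```go
package main

import (
	"fmt"
	"strings"
)

// ---- Would live in the low-level package (the estimation side). ----

// CollectFilters is a hypothetical hook variable. The low-level package only
// declares it and never imports the package that implements it.
var CollectFilters func(filters []string) (access, remaining []string)

// countAccessFilters calls the hook if it has been registered.
func countAccessFilters(filters []string) int {
	if CollectFilters == nil {
		return 0 // hook not registered yet
	}
	access, _ := CollectFilters(filters)
	return len(access)
}

// ---- Would live in the high-level package (the planner side). ----

// collectFiltersImpl is a toy implementation: filters that start with
// "json_" become access filters, everything else stays as remaining.
func collectFiltersImpl(filters []string) (access, remaining []string) {
	for _, f := range filters {
		if strings.HasPrefix(f, "json_") {
			access = append(access, f)
		} else {
			remaining = append(remaining, f)
		}
	}
	return access, remaining
}

// init registers the implementation, as the high-level package would.
func init() {
	CollectFilters = collectFiltersImpl
}

func main() {
	n := countAccessFilters([]string{"json_contains(a, '1')", "b > 1"})
	fmt.Println("access filters:", n)
}
```

The dependency points only one way: the high-level package imports the low-level one to assign the variable, so no cycle arises. The cost is that callers must tolerate a nil hook.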
var GetTblInfoForUsedStatsByPhysicalID func(sctx planctx.PlanContext, id int64) (fullName string, tblInfo *model.TableInfo)
GetTblInfoForUsedStatsByPhysicalID gets the table name, partition name, and TableInfo that will be used to record used stats.
Functions ¶
func AdjustRowCountForIndexScanByLimit ¶
func AdjustRowCountForIndexScanByLimit(sctx planctx.PlanContext, dsStatsInfo, dsTableStats *property.StatsInfo, dsStatisticTable *statistics.Table, path *util.AccessPath, expectedCnt float64, desc bool) float64
AdjustRowCountForIndexScanByLimit will adjust the row count for index scan by limit. For a query like `select k from t using index(k) where k > 10 limit 1`, the row count of the index scan should be adjusted by the limit number 1, because only one row is returned.
func AdjustRowCountForTableScanByLimit ¶
func AdjustRowCountForTableScanByLimit(sctx planctx.PlanContext, dsStatsInfo, dsTableStats *property.StatsInfo, dsStatisticTable *statistics.Table, path *util.AccessPath, expectedCnt float64, desc bool) float64
AdjustRowCountForTableScanByLimit will adjust the row count for table scan by limit. For a query like `select pk from t using index(primary) where pk > 10 limit 1`, the row count of the table scan should be adjusted by the limit number 1, because only one row is returned.
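The idea behind the limit adjustment can be sketched as follows. This is a simplified model under a uniform-distribution assumption, not the package's actual logic: the helper `adjustByLimit` and its parameters are hypothetical, and factors such as scan direction (`desc`) are ignored.

```go
package main

import (
	"fmt"
	"math"
)

// adjustByLimit is a hypothetical helper (not part of the package).
// scanRows is the number of rows a full scan would read, matchRows is the
// number of those rows that pass the filters, and expectedCnt is the number
// of rows the limit needs. Assuming matches are spread uniformly, the scan
// can stop after about expectedCnt / selectivity rows.
func adjustByLimit(scanRows, matchRows, expectedCnt float64) float64 {
	if scanRows <= 0 || matchRows <= 0 {
		return scanRows // nothing to adjust
	}
	selectivity := matchRows / scanRows
	return math.Min(scanRows, expectedCnt/selectivity)
}

func main() {
	// 2500 of 10000 rows match, and the query has "limit 1".
	fmt.Println("rows to scan:", adjustByLimit(10000, 2500, 1))
}
```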
func AvgColSize ¶
func AvgColSize(c *statistics.Column, count int64, isKey bool) float64
AvgColSize is the average column size of the histogram. These sizes are derived from function `encode` and `Datum::ConvertTo`, so we need to update them if those 2 functions are changed.
func AvgColSizeChunkFormat ¶
func AvgColSizeChunkFormat(c *statistics.Column, count int64) float64
AvgColSizeChunkFormat is the average column size of the histogram. These sizes are derived from function `Encode` and `DecodeToChunk`, so we need to update them if those 2 functions are changed.
func AvgColSizeDataInDiskByRows ¶
func AvgColSizeDataInDiskByRows(c *statistics.Column, count int64) float64
AvgColSizeDataInDiskByRows is the average column size of the histogram. These sizes are derived from `chunk.DataInDiskByRows`, so we need to update them if it is changed.
func CalcTotalSelectivityForMVIdxPath ¶
func CalcTotalSelectivityForMVIdxPath(
	coll *statistics.HistColl,
	partialPaths []*planutil.AccessPath,
	isIntersection bool,
) float64
CalcTotalSelectivityForMVIdxPath calculates the total selectivity for the given partial paths of an MV index merge path. It corresponds with the meaning of AccessPath.CountAfterAccess, as used in buildPartialPathUp4MVIndex. It uses the independence assumption to estimate the selectivity.
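The independence assumption mentioned above is commonly applied as in the sketch below: selectivities multiply for an intersection, and non-match probabilities multiply for a union. The helper `totalSelectivity` is hypothetical and shows the textbook formulas only. It makes no claim about the exact arithmetic inside this function.

```go
package main

import "fmt"

// totalSelectivity is a hypothetical helper (not part of the package).
// It combines per-path selectivities, treating the paths as independent.
func totalSelectivity(sels []float64, isIntersection bool) float64 {
	if isIntersection {
		// P(A and B) = P(A) * P(B) under independence.
		p := 1.0
		for _, s := range sels {
			p *= s
		}
		return p
	}
	// P(A or B) = 1 - (1 - P(A)) * (1 - P(B)) under independence.
	miss := 1.0
	for _, s := range sels {
		miss *= 1 - s
	}
	return 1 - miss
}

func main() {
	sels := []float64{0.5, 0.5}
	fmt.Println("intersection:", totalSelectivity(sels, true))
	fmt.Println("union:", totalSelectivity(sels, false))
}
```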
func ColumnEqualRowCount ¶
func ColumnEqualRowCount(sctx planctx.PlanContext, t *statistics.Table, value types.Datum, colID int64) (float64, error)
ColumnEqualRowCount estimates the row count where the column equals the value.
func ColumnGreaterRowCount ¶
func ColumnGreaterRowCount(sctx planctx.PlanContext, t *statistics.Table, value types.Datum, colID int64) float64
ColumnGreaterRowCount estimates the row count where the column is greater than the value.
func EstimateColsDNVWithMatchedLenFromUniqueIDs ¶
func EstimateColsDNVWithMatchedLenFromUniqueIDs(ids []int64, schema *expression.Schema, profile *property.StatsInfo) (float64, int)
EstimateColsDNVWithMatchedLenFromUniqueIDs is similar to EstimateColsNDVWithMatchedLen, but it receives UniqueIDs instead of Columns.
func EstimateColsNDVWithMatchedLen ¶
func EstimateColsNDVWithMatchedLen(cols []*expression.Column, schema *expression.Schema, profile *property.StatsInfo) (float64, int)
EstimateColsNDVWithMatchedLen returns the NDV of a set of columns. If the columns match any GroupNDV maintained by the child operator, we can get an accurate NDV. Otherwise, we simply return the max NDV among the columns, which is a lower bound.
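To see why the max per-column NDV is a lower bound for the NDV of the column combination, the sketch below counts distinct values on a few sample rows. The type `ndvStats` and the helpers `computeNDV` and `maxInt` are hypothetical and exist only for this illustration. All rows are assumed to have the same number of columns.

```go
package main

import (
	"fmt"
	"strings"
)

// ndvStats holds distinct-value counts computed from sample rows.
type ndvStats struct {
	perColumn []int // NDV of each column on its own
	combined  int   // NDV of the whole column tuple
}

// computeNDV is a hypothetical helper (not part of the package).
// It assumes every row has as many columns as the first row.
func computeNDV(rows [][]string) ndvStats {
	if len(rows) == 0 {
		return ndvStats{}
	}
	colSets := make([]map[string]struct{}, len(rows[0]))
	for i := range colSets {
		colSets[i] = map[string]struct{}{}
	}
	tuples := map[string]struct{}{}
	for _, row := range rows {
		for i, v := range row {
			colSets[i][v] = struct{}{}
		}
		// Join with a separator that does not occur in the sample values.
		tuples[strings.Join(row, "\x00")] = struct{}{}
	}
	stats := ndvStats{combined: len(tuples)}
	for _, set := range colSets {
		stats.perColumn = append(stats.perColumn, len(set))
	}
	return stats
}

// maxInt returns the largest element of xs, or 0 for an empty slice.
func maxInt(xs []int) int {
	m := 0
	for _, x := range xs {
		if x > m {
			m = x
		}
	}
	return m
}

func main() {
	rows := [][]string{{"a", "x"}, {"a", "y"}, {"b", "x"}, {"b", "x"}}
	stats := computeNDV(rows)
	fmt.Println("per-column NDV:", stats.perColumn)
	fmt.Println("combined NDV:", stats.combined)
	fmt.Println("lower bound (max):", maxInt(stats.perColumn))
}
```

Every distinct value of a single column appears in at least one distinct tuple, so the combined NDV can never fall below the largest per-column NDV.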
func EstimateColumnNDV ¶
func EstimateColumnNDV(tbl *statistics.Table, colID int64) (ndv float64)
EstimateColumnNDV computes the estimated NDV of the specified column using the original histogram of `DataSource`, which is retrieved from storage (not the derived one).
func EstimateFullJoinRowCount ¶
func EstimateFullJoinRowCount(sctx planctx.PlanContext, isCartesian bool, leftProfile, rightProfile *property.StatsInfo, leftJoinKeys, rightJoinKeys []*expression.Column, leftSchema, rightSchema *expression.Schema, leftNAJoinKeys, rightNAJoinKeys []*expression.Column) float64
EstimateFullJoinRowCount estimates the row count of a full join.
func GetAvgRowSize ¶
func GetAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, isEncodedKey bool, isForScan bool) (size float64)
GetAvgRowSize computes average row size for given columns.
func GetAvgRowSizeDataInDiskByRows ¶
func GetAvgRowSizeDataInDiskByRows(coll *statistics.HistColl, cols []*expression.Column) (size float64)
GetAvgRowSizeDataInDiskByRows computes average row size for given columns.
func GetColumnRowCount ¶
func GetColumnRowCount(sctx planctx.PlanContext, c *statistics.Column, ranges []*ranger.Range, realtimeRowCount, modifyCount int64, pkIsHandle bool) (float64, error)
GetColumnRowCount estimates the row count by a slice of Range.
func GetIndexAvgRowSize ¶
func GetIndexAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, isUnique bool) (size float64)
GetIndexAvgRowSize computes average row size for an index scan.
func GetRowCountByColumnRanges ¶
func GetRowCountByColumnRanges(sctx planctx.PlanContext, coll *statistics.HistColl, colUniqueID int64, colRanges []*ranger.Range) (result float64, err error)
GetRowCountByColumnRanges estimates the row count by a slice of Range.
func GetRowCountByIndexRanges ¶
func GetRowCountByIndexRanges(sctx planctx.PlanContext, coll *statistics.HistColl, idxID int64, indexRanges []*ranger.Range) (result float64, err error)
GetRowCountByIndexRanges estimates the row count by a slice of Range.
func GetRowCountByIntColumnRanges ¶
func GetRowCountByIntColumnRanges(sctx planctx.PlanContext, coll *statistics.HistColl, colUniqueID int64, intRanges []*ranger.Range) (result float64, err error)
GetRowCountByIntColumnRanges estimates the row count by a slice of IntColumnRange.
func GetSelectivityByFilter ¶
func GetSelectivityByFilter(sctx planctx.PlanContext, coll *statistics.HistColl, filters []expression.Expression) (ok bool, selectivity float64, err error)
GetSelectivityByFilter tries to estimate the selectivity of expressions by evaluating them against TopN values, histogram bucket boundaries, and NULL. Currently, this method can only handle expressions involving a single column.
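The general idea of estimating selectivity by evaluating a filter on representative values can be sketched as follows. This is an illustration under simplifying assumptions, not the package's implementation: the `sample` type and `selectivityBySamples` helper are hypothetical, values are plain integers, and each sample carries the row count it stands for.

```go
package main

import "fmt"

// sample is a hypothetical representative value, such as a frequent value,
// a bucket boundary, or NULL, together with the rows it stands for.
type sample struct {
	value  int64
	isNull bool
	count  float64
}

// selectivityBySamples is a hypothetical helper (not part of the package).
// It evaluates pred on each sample and returns matched rows / total rows.
// A NULL sample never matches, mirroring SQL comparison semantics.
// ok is false when there is nothing to evaluate.
func selectivityBySamples(samples []sample, pred func(int64) bool) (sel float64, ok bool) {
	var total, matched float64
	for _, s := range samples {
		total += s.count
		if !s.isNull && pred(s.value) {
			matched += s.count
		}
	}
	if total == 0 {
		return 0, false
	}
	return matched / total, true
}

func main() {
	samples := []sample{
		{value: 1, count: 10},
		{value: 5, count: 30},
		{isNull: true, count: 10},
		{value: 9, count: 50},
	}
	sel, ok := selectivityBySamples(samples, func(v int64) bool { return v > 4 })
	fmt.Println("ok:", ok, "selectivity:", sel)
}
```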
func GetTableAvgRowSize ¶
func GetTableAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, storeType kv.StoreType, handleInCols bool) (size float64)
GetTableAvgRowSize computes average row size for a table scan, excluding the index key-value pairs.
func PseudoAvgCountPerValue ¶
func PseudoAvgCountPerValue(t *statistics.Table) float64
PseudoAvgCountPerValue gets a pseudo average count if the histogram does not exist.
Types ¶
type StatsNode ¶
type StatsNode struct {
	// Ranges contains all the Ranges we got.
	Ranges []*ranger.Range
	Tp     int
	ID     int64

	// Selectivity indicates the Selectivity of this column/index.
	Selectivity float64
	// contains filtered or unexported fields
}
StatsNode is used for calculating selectivity.
func GetUsableSetsByGreedy ¶
GetUsableSetsByGreedy selects the indices and pk used for calculating selectivity, using a greedy algorithm.
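A generic greedy selection of this kind can be sketched as follows. This is a textbook greedy cover over bitmasks, offered as an assumption about the general shape of such an algorithm. It is not the package's actual selection rule, and the `node` type and `pickGreedy` helper are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// node is a hypothetical stand-in for an index or pk candidate.
// Bit i of mask is set if the candidate covers expression i.
type node struct {
	id   int64
	mask uint64
}

// pickGreedy is a hypothetical helper (not part of the package).
// It repeatedly picks the unused candidate that covers the most
// still-uncovered expressions, until nothing more can be covered.
func pickGreedy(nodes []node, all uint64) []node {
	remaining := all
	used := make([]bool, len(nodes))
	var picked []node
	for remaining != 0 {
		best, bestBits := -1, 0
		for i, n := range nodes {
			if used[i] {
				continue
			}
			if b := bits.OnesCount64(n.mask & remaining); b > bestBits {
				best, bestBits = i, b
			}
		}
		if best < 0 {
			break // no candidate covers any remaining expression
		}
		used[best] = true
		picked = append(picked, nodes[best])
		remaining &^= nodes[best].mask
	}
	return picked
}

func main() {
	nodes := []node{{id: 1, mask: 0b011}, {id: 2, mask: 0b110}, {id: 3, mask: 0b100}}
	for _, n := range pickGreedy(nodes, 0b111) {
		fmt.Println("picked candidate", n.id)
	}
}
```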
func Selectivity ¶
func Selectivity(
	ctx planctx.PlanContext,
	coll *statistics.HistColl,
	exprs []expression.Expression,
	filledPaths []*planutil.AccessPath,
) (
	result float64,
	retStatsNodes []*StatsNode,
	err error,
)
Selectivity calculates the selectivity of the expressions on the specified HistColl. Selectivity is defined as (row count after filter / row count before filter). Currently, exprs must be in CNF; that is, the slice is treated as the conjunction `exprs[0] and exprs[1] and ... and exprs[len - 1]`. Currently, the time complexity is O(n^2).
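The definition of selectivity and the CNF requirement can be illustrated on concrete rows. The sketch below computes the exact ratio by evaluating every conjunct on every row, whereas the real function estimates it from statistics. The `row` and `filter` types and the `exactSelectivity` helper are hypothetical.

```go
package main

import "fmt"

// row is a hypothetical two-column row used only for this illustration.
type row struct{ a, b int }

// filter is one conjunct of a CNF expression list.
type filter func(row) bool

// exactSelectivity is a hypothetical helper (not part of the package).
// It returns (rows passing all conjuncts) / (rows before filtering).
func exactSelectivity(rows []row, cnf []filter) float64 {
	if len(rows) == 0 {
		return 0
	}
	after := 0
	for _, r := range rows {
		pass := true
		for _, f := range cnf {
			if !f(r) {
				pass = false // one failed conjunct rejects the row
				break
			}
		}
		if pass {
			after++
		}
	}
	return float64(after) / float64(len(rows))
}

func main() {
	rows := []row{{1, 1}, {2, 5}, {3, 9}, {4, 2}}
	cnf := []filter{
		func(r row) bool { return r.a > 1 },
		func(r row) bool { return r.b > 4 },
	}
	fmt.Println("selectivity:", exactSelectivity(rows, cnf))
}
```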