package cardinality

Version: v1.1.0-beta.0...-a97aa45 | Published: Sep 20, 2024 | License: Apache-2.0 | Imports: 36 | Imported by: 0

Repository

github.com/pingcap/tidb

Documentation ¶

Index ¶

Constants
Variables
func AdjustRowCountForIndexScanByLimit(sctx planctx.PlanContext, dsStatsInfo, dsTableStats *property.StatsInfo, ...) float64
func AdjustRowCountForTableScanByLimit(sctx planctx.PlanContext, dsStatsInfo, dsTableStats *property.StatsInfo, ...) float64
func AvgColSize(c *statistics.Column, count int64, isKey bool) float64
func AvgColSizeChunkFormat(c *statistics.Column, count int64) float64
func AvgColSizeDataInDiskByRows(c *statistics.Column, count int64) float64
func CalcTotalSelectivityForMVIdxPath(coll *statistics.HistColl, partialPaths []*planutil.AccessPath, ...) float64
func ColumnEqualRowCount(sctx planctx.PlanContext, t *statistics.Table, value types.Datum, colID int64) (float64, error)
func ColumnGreaterRowCount(sctx planctx.PlanContext, t *statistics.Table, value types.Datum, colID int64) float64
func EstimateColsDNVWithMatchedLenFromUniqueIDs(ids []int64, schema *expression.Schema, profile *property.StatsInfo) (float64, int)
func EstimateColsNDVWithMatchedLen(cols []*expression.Column, schema *expression.Schema, ...) (float64, int)
func EstimateColumnNDV(tbl *statistics.Table, colID int64) (ndv float64)
func EstimateFullJoinRowCount(sctx planctx.PlanContext, isCartesian bool, ...) float64
func GetAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, ...) (size float64)
func GetAvgRowSizeDataInDiskByRows(coll *statistics.HistColl, cols []*expression.Column) (size float64)
func GetColumnRowCount(sctx planctx.PlanContext, c *statistics.Column, ranges []*ranger.Range, ...) (float64, error)
func GetIndexAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, ...) (size float64)
func GetRowCountByColumnRanges(sctx planctx.PlanContext, coll *statistics.HistColl, colUniqueID int64, ...) (result float64, err error)
func GetRowCountByIndexRanges(sctx planctx.PlanContext, coll *statistics.HistColl, idxID int64, ...) (result float64, err error)
func GetRowCountByIntColumnRanges(sctx planctx.PlanContext, coll *statistics.HistColl, colUniqueID int64, ...) (result float64, err error)
func GetSelectivityByFilter(sctx planctx.PlanContext, coll *statistics.HistColl, ...) (ok bool, selectivity float64, err error)
func GetTableAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, ...) (size float64)
func PseudoAvgCountPerValue(t *statistics.Table) float64
type StatsNode
- func GetUsableSetsByGreedy(nodes []*StatsNode) (newBlocks []*StatsNode)
- func Selectivity(ctx planctx.PlanContext, coll *statistics.HistColl, ...) (result float64, retStatsNodes []*StatsNode, err error)

Constants ¶

const (
	IndexType = iota
	PkType
	ColType
)

The type of the StatsNode.

const SelectionFactor = 0.8

SelectionFactor is the factor used to estimate the row count of a selection.

Variables ¶

var (
	CollectFilters4MVIndex func(
		sctx planctx.PlanContext,
		filters []expression.Expression,
		idxCols []*expression.Column,
	) (
		accessFilters,
		remainingFilters []expression.Expression,
		accessTp int,
	)
	BuildPartialPaths4MVIndex func(
		sctx planctx.PlanContext,
		accessFilters []expression.Expression,
		idxCols []*expression.Column,
		mvIndex *model.IndexInfo,
		histColl *statistics.HistColl,
	) (
		partialPaths []*planutil.AccessPath,
		isIntersection bool,
		ok bool,
		err error,
	)
)

CollectFilters4MVIndex and BuildPartialPaths4MVIndex match JSON expressions against a multi-valued (MV) index. This logic is shared between the estimation logic and the access path generation logic. However, the two functions are defined in the planner/core package and are hard to move here, so they are exposed as function variables to avoid an import cycle.
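The function-variable pattern described above can be sketched roughly as follows. This is a minimal, single-file illustration and not the package's code: the `filter` type, `collectFilters`, `plannerCollectFilters`, and `estimate` are all hypothetical names, and the `init` function stands in for whatever code on the planner side assigns the variable.

```go
package main

import (
	"errors"
	"fmt"
)

// filter is a hypothetical stand-in for an expression on a single column.
type filter struct {
	Col  string
	Expr string
}

// collectFilters is a package-level function variable declared on the
// "estimation" side. It stays nil until the other side assigns it, so the
// estimation side never has to import the planner side.
var collectFilters func(filters []filter, idxCol string) (access, remaining []filter)

// plannerCollectFilters stands in for the implementation that lives in the
// package that cannot be imported without creating a cycle.
func plannerCollectFilters(filters []filter, idxCol string) (access, remaining []filter) {
	for _, f := range filters {
		if f.Col == idxCol {
			access = append(access, f)
		} else {
			remaining = append(remaining, f)
		}
	}
	return access, remaining
}

// init plays the role of the planner side wiring itself in at start-up.
func init() {
	collectFilters = plannerCollectFilters
}

// estimate calls through the hook and guards against it being unset.
func estimate(filters []filter, idxCol string) (int, error) {
	if collectFilters == nil {
		return 0, errors.New("collectFilters hook is not registered")
	}
	access, _ := collectFilters(filters, idxCol)
	return len(access), nil
}

func main() {
	n, err := estimate([]filter{{Col: "a", Expr: "a = 1"}, {Col: "b", Expr: "b > 2"}}, "a")
	fmt.Println(n, err)
}
```

The nil check matters with this pattern: a hook that was never assigned would otherwise panic at the call site.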

var GetTblInfoForUsedStatsByPhysicalID func(sctx planctx.PlanContext, id int64) (fullName string, tblInfo *model.TableInfo)

GetTblInfoForUsedStatsByPhysicalID gets the table name, partition name, and TableInfo that will be used to record used stats.

Functions ¶

func AdjustRowCountForIndexScanByLimit ¶

func AdjustRowCountForIndexScanByLimit(sctx planctx.PlanContext,
	dsStatsInfo, dsTableStats *property.StatsInfo, dsStatisticTable *statistics.Table,
	path *util.AccessPath, expectedCnt float64, desc bool) float64

AdjustRowCountForIndexScanByLimit will adjust the row count for index scan by limit. For a query like `select k from t using index(k) where k > 10 limit 1`, the row count of the index scan should be adjusted by the limit number 1, because only one row is returned.

func AdjustRowCountForTableScanByLimit ¶

func AdjustRowCountForTableScanByLimit(sctx planctx.PlanContext,
	dsStatsInfo, dsTableStats *property.StatsInfo, dsStatisticTable *statistics.Table,
	path *util.AccessPath, expectedCnt float64, desc bool) float64

AdjustRowCountForTableScanByLimit will adjust the row count for table scan by limit. For a query like `select pk from t using index(primary) where pk > 10 limit 1`, the row count of the table scan should be adjusted by the limit number 1, because only one row is returned.
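The idea behind both limit adjustments can be illustrated with a deliberately simplified model. This sketch is not the package's implementation: `adjustRowCountByLimit` and its parameters are hypothetical, and it assumes that any filters applied after the scan have a known selectivity and that matching rows are spread uniformly.

```go
package main

import (
	"fmt"
	"math"
)

// adjustRowCountByLimit is a hypothetical, simplified model. If rows leaving
// the scan still pass through filters with the given selectivity, roughly
// limit/selectivity rows must be scanned to produce `limit` rows. The result
// never exceeds the unadjusted estimate.
func adjustRowCountByLimit(scanRows, selectivity, limit float64) float64 {
	if selectivity <= 0 || selectivity > 1 {
		// Without a usable selectivity, keep the original estimate.
		return scanRows
	}
	return math.Min(scanRows, limit/selectivity)
}

func main() {
	// All scanned rows qualify (a pure range scan), limit 1.
	fmt.Println(adjustRowCountByLimit(10000, 1, 1))
	// Only one in ten scanned rows survives a later filter, limit 1.
	fmt.Println(adjustRowCountByLimit(10000, 0.1, 1))
}
```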

func AvgColSize ¶

func AvgColSize(c *statistics.Column, count int64, isKey bool) float64

AvgColSize is the average column size of the histogram. These sizes are derived from function `encode` and `Datum::ConvertTo`, so we need to update them if those 2 functions are changed.

func AvgColSizeChunkFormat ¶

func AvgColSizeChunkFormat(c *statistics.Column, count int64) float64

AvgColSizeChunkFormat is the average column size of the histogram. These sizes are derived from function `Encode` and `DecodeToChunk`, so we need to update them if those 2 functions are changed.

func AvgColSizeDataInDiskByRows ¶

func AvgColSizeDataInDiskByRows(c *statistics.Column, count int64) float64

AvgColSizeDataInDiskByRows is the average column size of the histogram. These sizes are derived from `chunk.DataInDiskByRows`, so we need to update them if it is changed.

func CalcTotalSelectivityForMVIdxPath ¶

func CalcTotalSelectivityForMVIdxPath(
	coll *statistics.HistColl,
	partialPaths []*planutil.AccessPath,
	isIntersection bool,
) float64

CalcTotalSelectivityForMVIdxPath calculates the total selectivity for the given partial paths of an MV index merge path. It corresponds with the meaning of AccessPath.CountAfterAccess, as used in buildPartialPathUp4MVIndex. It uses the independence assumption to estimate the selectivity.
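Under an independence assumption, selectivities are usually combined with the standard probability identities for independent events. The sketch below shows those identities only; `combineSelectivity` is a hypothetical helper, and the package's exact computation may differ.

```go
package main

import "fmt"

// combineSelectivity is a hypothetical helper that combines per-path
// selectivities as if the paths were statistically independent.
//   - intersection: P(A and B) = P(A) * P(B)
//   - union:        P(A or B)  = 1 - (1 - P(A)) * (1 - P(B))
func combineSelectivity(sels []float64, isIntersection bool) float64 {
	if isIntersection {
		total := 1.0
		for _, s := range sels {
			total *= s
		}
		return total
	}
	// Probability that a row matches none of the paths.
	none := 1.0
	for _, s := range sels {
		none *= 1 - s
	}
	return 1 - none
}

func main() {
	sels := []float64{0.5, 0.2}
	fmt.Println(combineSelectivity(sels, true))
	fmt.Println(combineSelectivity(sels, false))
}
```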

func ColumnEqualRowCount ¶

func ColumnEqualRowCount(sctx planctx.PlanContext, t *statistics.Table, value types.Datum, colID int64) (float64, error)

ColumnEqualRowCount estimates the row count where the column equals value.

func ColumnGreaterRowCount ¶

func ColumnGreaterRowCount(sctx planctx.PlanContext, t *statistics.Table, value types.Datum, colID int64) float64

ColumnGreaterRowCount estimates the row count where the column is greater than value.

func EstimateColsDNVWithMatchedLenFromUniqueIDs ¶

func EstimateColsDNVWithMatchedLenFromUniqueIDs(ids []int64, schema *expression.Schema, profile *property.StatsInfo) (float64, int)

EstimateColsDNVWithMatchedLenFromUniqueIDs is similar to EstimateColsNDVWithMatchedLen, but it receives UniqueIDs instead of Columns.

func EstimateColsNDVWithMatchedLen ¶

func EstimateColsNDVWithMatchedLen(cols []*expression.Column, schema *expression.Schema, profile *property.StatsInfo) (float64, int)

EstimateColsNDVWithMatchedLen returns the NDV of a set of columns. If the columns match any GroupNDV maintained by the child operator, we can get an accurate NDV. Otherwise, we simply return the max NDV among the columns, which is a lower bound.
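The fallback is a lower bound because the number of distinct combinations of several columns can never be smaller than the number of distinct values of any single one of them. A minimal sketch of the two cases follows; `estimateColsNDV`, `groupKey`, and the map-based statistics are hypothetical simplifications, not the package's data structures.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupKey builds an order-independent key for a set of column names.
func groupKey(cols []string) string {
	sorted := append([]string(nil), cols...)
	sort.Strings(sorted)
	return strings.Join(sorted, ",")
}

// estimateColsNDV is a hypothetical helper. If a joint NDV is recorded for
// exactly this column set, it is returned as accurate. Otherwise the largest
// single-column NDV is returned, which is only a lower bound on the joint NDV.
func estimateColsNDV(cols []string, colNDV, groupNDV map[string]float64) (ndv float64, accurate bool) {
	if n, ok := groupNDV[groupKey(cols)]; ok {
		return n, true
	}
	maxNDV := 1.0
	for _, c := range cols {
		if n, ok := colNDV[c]; ok && n > maxNDV {
			maxNDV = n
		}
	}
	return maxNDV, false
}

func main() {
	colNDV := map[string]float64{"a": 100, "b": 40, "c": 10}
	groupNDV := map[string]float64{"a,b": 250}

	fmt.Println(estimateColsNDV([]string{"b", "a"}, colNDV, groupNDV))
	fmt.Println(estimateColsNDV([]string{"a", "c"}, colNDV, groupNDV))
}
```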

func EstimateColumnNDV ¶

func EstimateColumnNDV(tbl *statistics.Table, colID int64) (ndv float64)

EstimateColumnNDV computes the estimated NDV of the specified column using the original histogram of `DataSource`, which is retrieved from storage (not the derived one).

func EstimateFullJoinRowCount ¶

func EstimateFullJoinRowCount(sctx planctx.PlanContext,
	isCartesian bool,
	leftProfile, rightProfile *property.StatsInfo,
	leftJoinKeys, rightJoinKeys []*expression.Column,
	leftSchema, rightSchema *expression.Schema,
	leftNAJoinKeys, rightNAJoinKeys []*expression.Column) float64

EstimateFullJoinRowCount estimates the row count of a full join.

func GetAvgRowSize ¶

func GetAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, isEncodedKey bool, isForScan bool) (size float64)

GetAvgRowSize computes average row size for given columns.

func GetAvgRowSizeDataInDiskByRows ¶

func GetAvgRowSizeDataInDiskByRows(coll *statistics.HistColl, cols []*expression.Column) (size float64)

GetAvgRowSizeDataInDiskByRows computes average row size for given columns.

func GetColumnRowCount ¶

func GetColumnRowCount(sctx planctx.PlanContext, c *statistics.Column, ranges []*ranger.Range, realtimeRowCount, modifyCount int64, pkIsHandle bool) (float64, error)

GetColumnRowCount estimates the row count by a slice of Range.

func GetIndexAvgRowSize ¶

func GetIndexAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, isUnique bool) (size float64)

GetIndexAvgRowSize computes average row size for an index scan.

func GetRowCountByColumnRanges ¶

func GetRowCountByColumnRanges(sctx planctx.PlanContext, coll *statistics.HistColl, colUniqueID int64, colRanges []*ranger.Range) (result float64, err error)

GetRowCountByColumnRanges estimates the row count by a slice of Range.
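To make range-based estimation concrete, here is a generic histogram sketch. It is not the package's algorithm: `bucket`, `valueRange`, and `rowCountByRanges` are hypothetical, values are treated as continuous and uniformly distributed within each bucket, and the input ranges are assumed not to overlap.

```go
package main

import (
	"fmt"
	"math"
)

// bucket is a hypothetical histogram bucket covering [Lower, Upper].
type bucket struct {
	Lower, Upper float64
	Count        float64
}

// valueRange is a hypothetical closed range [Low, High].
type valueRange struct {
	Low, High float64
}

// rowCountByRanges sums, for every range, the share of each bucket that the
// range overlaps, assuming values are uniform inside a bucket.
func rowCountByRanges(buckets []bucket, ranges []valueRange) float64 {
	total := 0.0
	for _, r := range ranges {
		for _, b := range buckets {
			lo := math.Max(r.Low, b.Lower)
			hi := math.Min(r.High, b.Upper)
			if hi < lo {
				continue // no overlap with this bucket
			}
			width := b.Upper - b.Lower
			if width == 0 {
				// Single-value bucket that lies inside the range.
				total += b.Count
				continue
			}
			total += b.Count * (hi - lo) / width
		}
	}
	return total
}

func main() {
	buckets := []bucket{
		{Lower: 0, Upper: 10, Count: 100},
		{Lower: 10, Upper: 20, Count: 50},
	}
	fmt.Println(rowCountByRanges(buckets, []valueRange{{Low: 5, High: 15}}))
}
```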

func GetRowCountByIndexRanges ¶

func GetRowCountByIndexRanges(sctx planctx.PlanContext, coll *statistics.HistColl, idxID int64, indexRanges []*ranger.Range) (result float64, err error)

GetRowCountByIndexRanges estimates the row count by a slice of Range.

func GetRowCountByIntColumnRanges ¶

func GetRowCountByIntColumnRanges(sctx planctx.PlanContext, coll *statistics.HistColl, colUniqueID int64, intRanges []*ranger.Range) (result float64, err error)

GetRowCountByIntColumnRanges estimates the row count by a slice of IntColumnRange.

func GetSelectivityByFilter ¶

func GetSelectivityByFilter(sctx planctx.PlanContext, coll *statistics.HistColl, filters []expression.Expression) (ok bool, selectivity float64, err error)

GetSelectivityByFilter tries to estimate the selectivity of expressions by evaluating them against TopN values, histogram bucket boundaries, and NULL. Currently, this method can only handle expressions involving a single column.
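The general technique of evaluating a predicate directly against stored statistics can be sketched as below. This is an assumption-laden illustration, not the package's logic: `valueCount` and `selectivityByFilter` are hypothetical, each bucket is represented by a single boundary value carrying the whole bucket count, and the predicate is assumed never to be true for NULL.

```go
package main

import "fmt"

// valueCount is a hypothetical (value, row count) pair, used both for TopN
// entries and for a bucket represented by one boundary value.
type valueCount struct {
	Value int64
	Count float64
}

// selectivityByFilter evaluates pred on every known value and returns the
// matched share of all rows. NULL rows count toward the total but are assumed
// never to satisfy the predicate. ok is false when there are no rows at all.
func selectivityByFilter(topN, buckets []valueCount, nullCount float64, pred func(int64) bool) (sel float64, ok bool) {
	total := nullCount
	matched := 0.0
	for _, group := range [][]valueCount{topN, buckets} {
		for _, vc := range group {
			total += vc.Count
			if pred(vc.Value) {
				matched += vc.Count
			}
		}
	}
	if total == 0 {
		return 0, false
	}
	return matched / total, true
}

func main() {
	topN := []valueCount{{Value: 1, Count: 50}, {Value: 2, Count: 30}}
	buckets := []valueCount{{Value: 10, Count: 10}, {Value: 20, Count: 5}}
	sel, ok := selectivityByFilter(topN, buckets, 5, func(v int64) bool { return v > 1 })
	fmt.Println(sel, ok)
}
```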

func GetTableAvgRowSize ¶

func GetTableAvgRowSize(ctx planctx.PlanContext, coll *statistics.HistColl, cols []*expression.Column, storeType kv.StoreType, handleInCols bool) (size float64)

GetTableAvgRowSize computes average row size for a table scan, excluding the index key-value pairs.

func PseudoAvgCountPerValue ¶

func PseudoAvgCountPerValue(t *statistics.Table) float64

PseudoAvgCountPerValue gets a pseudo average count if the histogram does not exist.

Types ¶

type StatsNode ¶

type StatsNode struct {
	// Ranges contains all the Ranges we got.
	Ranges []*ranger.Range
	Tp     int
	ID     int64

	// Selectivity indicates the Selectivity of this column/index.
	Selectivity float64
	// contains filtered or unexported fields
}

StatsNode is used for calculating selectivity.

func GetUsableSetsByGreedy ¶

func GetUsableSetsByGreedy(nodes []*StatsNode) (newBlocks []*StatsNode)

GetUsableSetsByGreedy selects the indices and pk used for calculating selectivity, using a greedy algorithm.
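A generic greedy set-cover loop gives a feel for this kind of selection. The sketch below is not the package's algorithm: `node`, its `Mask` field, and `pickByGreedy` are hypothetical, and the real selection rules (for example how overlapping coverage or node types are treated) may differ.

```go
package main

import (
	"fmt"
	"math/bits"
)

// node is a hypothetical candidate (an index, pk, or column). Bit i of Mask
// is set when the node can cover filter expression i.
type node struct {
	ID   int64
	Mask uint64
}

// pickByGreedy repeatedly picks the node that covers the most expressions not
// yet covered, and stops when no remaining node adds anything new.
func pickByGreedy(nodes []node) []node {
	var picked []node
	var covered uint64
	used := make([]bool, len(nodes))
	for {
		best, bestGain := -1, 0
		for i, n := range nodes {
			if used[i] {
				continue
			}
			gain := bits.OnesCount64(n.Mask &^ covered)
			if gain > bestGain {
				best, bestGain = i, gain
			}
		}
		if best < 0 {
			return picked
		}
		used[best] = true
		covered |= nodes[best].Mask
		picked = append(picked, nodes[best])
	}
}

func main() {
	nodes := []node{
		{ID: 1, Mask: 0b0111},
		{ID: 2, Mask: 0b0011},
		{ID: 3, Mask: 0b1100},
	}
	for _, n := range pickByGreedy(nodes) {
		fmt.Println(n.ID)
	}
}
```

Greedy selection is not guaranteed to find the smallest covering set, but it is cheap and usually close enough for estimation purposes.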

func Selectivity ¶

func Selectivity(
	ctx planctx.PlanContext,
	coll *statistics.HistColl,
	exprs []expression.Expression,
	filledPaths []*planutil.AccessPath,
) (
	result float64,
	retStatsNodes []*StatsNode,
	err error,
)

Selectivity calculates the selectivity of the expressions on the specified HistColl. Selectivity is defined as (row count after filter / row count before filter). exprs must currently be in CNF; in other words, `exprs[0] and exprs[1] and ... and exprs[len - 1]` must hold when you call this. Currently, the time complexity is O(n^2).
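The definition above, together with the CNF requirement, can be illustrated with a small model. This is only a sketch under stated assumptions: `conjunct`, `cnfSelectivity`, and `rowsAfterFilter` are hypothetical, conjuncts are treated as independent, and applying the documented 0.8 factor once per conjunct that has no usable statistics is an assumption rather than the package's confirmed behavior.

```go
package main

import "fmt"

// selectionFactor mirrors the documented SelectionFactor value.
const selectionFactor = 0.8

// conjunct is a hypothetical top-level CNF item. Known reports whether
// statistics produced a selectivity for it.
type conjunct struct {
	Sel   float64
	Known bool
}

// cnfSelectivity multiplies the selectivities of all conjuncts, treating them
// as independent. Conjuncts without statistics fall back to selectionFactor
// (an assumption made for this sketch).
func cnfSelectivity(conjuncts []conjunct) float64 {
	sel := 1.0
	for _, c := range conjuncts {
		if c.Known {
			sel *= c.Sel
		} else {
			sel *= selectionFactor
		}
	}
	return sel
}

// rowsAfterFilter applies the definition:
// selectivity = row count after filter / row count before filter.
func rowsAfterFilter(rowsBefore, selectivity float64) float64 {
	return rowsBefore * selectivity
}

func main() {
	sel := cnfSelectivity([]conjunct{{Sel: 0.5, Known: true}, {Known: false}})
	fmt.Println(sel, rowsAfterFilter(1000, sel))
}
```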
