colldata

package
v0.19.4
Warning: This package is not in the latest version of its module.

Published: May 8, 2024 License: Apache-2.0 Imports: 14 Imported by: 0

Documentation

Constants

const PadToMax = math.MaxInt32
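The three `numCodepoints` modes described for WeightString (zero, a positive `CHAR(n)` width, and PadToMax) can be illustrated with a toy PAD SPACE collation. This is a minimal sketch, assuming a hypothetical single-byte charset in which each byte's weight is the byte itself and SPACE weighs 0x20; `toyWeightString` is not part of colldata, and real collations emit multi-byte weights.

```go
package main

import (
	"fmt"
	"math"
)

// padToMax mirrors the documented colldata.PadToMax constant.
const padToMax = math.MaxInt32

// spaceWeight is the weight of the SPACE character in this toy collation.
const spaceWeight = 0x20

// toyWeightString is a hypothetical PAD SPACE weight-string function in which
// every byte weighs itself. It follows the documented numCodepoints rules:
//   - 0: no padding (VARCHAR semantics)
//   - n > 0: pad with SPACE weights up to n codepoints (CHAR(n) semantics)
//   - padToMax: fill all remaining capacity of dst with SPACE weights
func toyWeightString(dst, src []byte, numCodepoints int) []byte {
	if numCodepoints == padToMax {
		// dst must arrive pre-allocated (len 0, cap = target size);
		// otherwise an empty slice is returned.
		if cap(dst) < len(src) {
			return dst[:0]
		}
		dst = append(dst[:0], src...)
		for len(dst) < cap(dst) {
			dst = append(dst, spaceWeight)
		}
		return dst
	}
	dst = append(dst, src...) // one weight byte per codepoint
	for i := len(src); i < numCodepoints; i++ {
		dst = append(dst, spaceWeight)
	}
	return dst
}

func main() {
	fmt.Printf("%q\n", toyWeightString(nil, []byte("ab"), 0)) // "ab"
	fmt.Printf("%q\n", toyWeightString(nil, []byte("ab"), 4)) // "ab  "
	buf := make([]byte, 0, 6)
	fmt.Printf("%q\n", toyWeightString(buf, []byte("ab"), padToMax)) // "ab    "
}
```

The PadToMax branch never grows `dst`, which is why the documentation insists on a pre-allocated buffer: the capacity is the only thing that tells the function how wide the filesort key must be.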

Variables

This section is empty.

Functions

func Merge

Merge returns a Coercion function for a pair of TypedCollations based on their coercibility.

The function takes the typed collations for the two sides of a text operation (namely, a comparison or concatenation of two textual expressions). These typed collations include the actual collation for the expression on each side, their coercibility values (see: Coercibility) and their respective repertoires. The function returns the target collation (i.e. the collation into which the two expressions must be coerced) and a Coercion function. The Coercion function can be called repeatedly with the different values for the two expressions and will transcode either the left-hand or right-hand value to the appropriate charset so it can be collated against the other value.

If the collations for both sides of the expression are the same, the returned Coercion function will be a no-op. Likewise, if the two collations are not the same, but they are compatible and have the same charset, the Coercion function will also be a no-op.

If the collations for both sides of the expression are not compatible, an error will be returned and the returned TypedCollation and Coercion will be nil.
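The no-op and error outcomes described above can be summarized in a few lines. This is a minimal sketch, not colldata's implementation of Merge: it assumes a hypothetical `typedCollation` carrying only a collation name and a charset name, treats "compatible" as "same charset" (a simplification), and ignores coercibility, repertoires and CoercionOptions entirely.

```go
package main

import (
	"errors"
	"fmt"
)

// coercion has the same shape as colldata.Coercion.
type coercion func(dst, in []byte) ([]byte, error)

// typedCollation is a hypothetical, simplified stand-in for
// collations.TypedCollation: only the collation and charset names.
type typedCollation struct {
	Collation string
	Charset   string
}

// noop appends the input to dst unchanged.
func noop(dst, in []byte) ([]byte, error) {
	return append(dst, in...), nil
}

var errIncompatible = errors.New("illegal mix of collations")

// mergeSketch mirrors the documented outcomes of Merge:
//   - same collation: no-op coercion
//   - different collations sharing a charset: no-op coercion
//   - otherwise: error, with a zero target and a nil coercion
//
// When the collations differ, the left side is picked as the target
// arbitrarily; the real function decides using coercibility.
func mergeSketch(left, right typedCollation) (typedCollation, coercion, error) {
	switch {
	case left.Collation == right.Collation:
		return left, noop, nil
	case left.Charset == right.Charset:
		return left, noop, nil
	default:
		return typedCollation{}, nil, errIncompatible
	}
}

func main() {
	a := typedCollation{"utf8mb4_0900_ai_ci", "utf8mb4"}
	b := typedCollation{"utf8mb4_bin", "utf8mb4"}
	c := typedCollation{"latin1_swedish_ci", "latin1"}

	target, coerce, err := mergeSketch(a, b)
	out, _ := coerce(nil, []byte("abc"))
	fmt.Println(target.Collation, string(out), err) // utf8mb4_0900_ai_ci abc <nil>

	_, coerce, err = mergeSketch(a, c)
	fmt.Println(coerce == nil, err) // true illegal mix of collations
}
```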

Types

type CaseAwareCollation

type CaseAwareCollation interface {
	Collation
	ToUpper(dst []byte, src []byte) []byte
	ToLower(dst []byte, src []byte) []byte
}

CaseAwareCollation is a Collation that also implements its lowercase and uppercase conversion conventions.
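A type satisfies CaseAwareCollation by adding ToUpper and ToLower on top of the Collation methods. The sketch below shows only the case-conversion half, for a hypothetical ASCII-only charset; it assumes (as the `dst, src` signature suggests, but this page does not state) that the converted bytes are appended to `dst`. The `asciiCase` type is invented for illustration.

```go
package main

import "fmt"

// asciiCase is a hypothetical type providing ASCII-only case mapping with
// the same method signatures as CaseAwareCollation's ToUpper/ToLower.
type asciiCase struct{}

// ToUpper appends the upper-cased form of src to dst and returns the result.
func (asciiCase) ToUpper(dst, src []byte) []byte {
	for _, b := range src {
		if 'a' <= b && b <= 'z' {
			b -= 'a' - 'A'
		}
		dst = append(dst, b)
	}
	return dst
}

// ToLower appends the lower-cased form of src to dst and returns the result.
func (asciiCase) ToLower(dst, src []byte) []byte {
	for _, b := range src {
		if 'A' <= b && b <= 'Z' {
			b += 'a' - 'A'
		}
		dst = append(dst, b)
	}
	return dst
}

func main() {
	var c asciiCase
	buf := make([]byte, 0, 16) // reuse one buffer across calls
	buf = c.ToUpper(buf[:0], []byte("Hello, Vitess"))
	fmt.Println(string(buf)) // HELLO, VITESS
	buf = c.ToLower(buf[:0], []byte("Hello, Vitess"))
	fmt.Println(string(buf)) // hello, vitess
}
```

Passing `buf[:0]` lets callers recycle a buffer across rows instead of allocating a fresh slice per conversion.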

type Charset

type Charset = charset.Charset

type Coercion

type Coercion func(dst, in []byte) ([]byte, error)

Coercion is a function that transcodes its input argument (either the left-hand or the right-hand value of an expression) into a specific character set. The `dst` argument is used as the destination buffer for the coerced value, but it can be nil.
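As an illustration of a function with the Coercion signature, the sketch below transcodes Latin-1 input into UTF-8, appending to `dst` (which may be nil). It assumes ISO-8859-1 semantics, where each byte equals its Unicode codepoint; MySQL's `latin1` is actually closer to Windows-1252, so this is a simplification. The `coercion` type and `latin1ToUTF8` are hypothetical names.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// coercion has the same shape as colldata.Coercion.
type coercion func(dst, in []byte) ([]byte, error)

// latin1ToUTF8 transcodes ISO-8859-1 bytes to UTF-8. Every ISO-8859-1 byte
// maps to the Unicode codepoint of the same value, so this direction
// (subset to superset) cannot fail.
var latin1ToUTF8 coercion = func(dst, in []byte) ([]byte, error) {
	for _, b := range in {
		dst = utf8.AppendRune(dst, rune(b))
	}
	return dst, nil
}

func main() {
	out, err := latin1ToUTF8(nil, []byte{'c', 'a', 'f', 0xE9}) // "café" in Latin-1
	fmt.Println(string(out), len(out), err)                    // café 5 <nil>
}
```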

type CoercionOptions

type CoercionOptions struct {
	// ConvertToSuperset allows merging two different collations as long
	// as the charset of one of them is a strict superset of the other. In
	// order to operate on the two expressions, one of them will need to
	// be transcoded. This transcoding will always be safe because the string
	// with the smallest repertoire will be transcoded to its superset, which
	// cannot fail.
	ConvertToSuperset bool

	// ConvertWithCoercion allows merging two different collations by forcing
	// a coercion as long as the coercibility of the two sides is lax enough.
	// This will force a transcoding of one of the expressions even if their
	// respective charsets are not a strict superset, so the resulting transcoding
	// CAN fail depending on the content of their strings.
	ConvertWithCoercion bool
}

CoercionOptions is used to configure how aggressive the algorithm can be when merging two different collations by transcoding them.
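How the two flags could gate the choice of which side to transcode is sketched below. This is a hypothetical simplification, not colldata's algorithm: `leftInRight`/`rightInLeft` stand for "this side's charset is a strict subset of the other's", `lax` stands for "the coercibility of both sides is lax enough", and the names `direction` and `pickTranscode` are invented for illustration.

```go
package main

import "fmt"

// coercionOptions mirrors the two fields of colldata.CoercionOptions.
type coercionOptions struct {
	ConvertToSuperset   bool
	ConvertWithCoercion bool
}

// direction says which side (if any) must be transcoded.
type direction int

const (
	noMerge direction = iota
	transcodeLeft
	transcodeRight
)

// pickTranscode decides which side to transcode when the two charsets differ.
func pickTranscode(opt coercionOptions, leftInRight, rightInLeft, lax bool) direction {
	if opt.ConvertToSuperset {
		// Safe: the smaller repertoire is widened into its superset.
		if leftInRight {
			return transcodeLeft
		}
		if rightInLeft {
			return transcodeRight
		}
	}
	if opt.ConvertWithCoercion && lax {
		// Forced: may fail at run time for characters missing from the target.
		// Forcing the left side is arbitrary here; the real decision would
		// depend on coercibility.
		return transcodeLeft
	}
	return noMerge
}

func main() {
	safe := coercionOptions{ConvertToSuperset: true}
	fmt.Println(pickTranscode(safe, true, false, false) == transcodeLeft) // true
	fmt.Println(pickTranscode(safe, false, false, true) == noMerge)       // true
	forced := coercionOptions{ConvertWithCoercion: true}
	fmt.Println(pickTranscode(forced, false, false, true) == transcodeLeft) // true
}
```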

type Collation

type Collation interface {
	// ID returns the numerical identifier for this collation. This is the same
	// value that is returned by MySQL in a query's headers to identify the collation
	// for a given column
	ID() collations.ID

	// Name is the full name of this collation, in the form of "ENCODING_LANG_SENSITIVITY"
	Name() string

	// Collate compares two strings using this collation. `left` and `right` must be the
	// two strings encoded in the proper encoding for this collation. If `isPrefix` is true,
	// the function instead behaves equivalently to `strings.HasPrefix(left, right)`, but
	// being collation-aware.
	// It returns a numeric value like a normal comparison function: <0 if left < right,
	// 0 if left == right, >0 if left > right
	Collate(left, right []byte, isPrefix bool) int

	// WeightString returns a weight string for the given `src` string. A weight string
	// is a binary representation of the weights for the given string, that can be
	// compared byte-wise to return identical results to collating this string.
	//
	// This means:
	//		bytes.Compare(WeightString(left), WeightString(right)) == Collate(left, right)
	//
	// The semantics of this API have been carefully designed to match MySQL's behavior
	// in its `strnxfrm` API. Most notably, the `numCodepoints` argument implies different
	// behaviors depending on the collation's padding mode:
	//
	// - For collations that pad WITH SPACE (that is, all legacy collations in MySQL except
	//	for the newly introduced UCA v9.0.0 utf8mb4 collations in MySQL 8.0), `numCodepoints`
	// 	can have the following values:
	//
	//		- if `numCodepoints` is any integer greater than zero, this treats the `src` string
	//		as if it were in a `CHAR(numCodepoints)` column in MySQL, meaning that the resulting
	//		weight string will be padded with the weight for the SPACE character until it becomes
	//		wide enough to fill the `CHAR` column. This is necessary to perform weight comparisons
	//		in fixed-`CHAR` columns. If `numCodepoints` is smaller than the actual amount of
	//		codepoints stored in `src`, the result is unspecified.
	//
	//		- if `numCodepoints` is zero, this is equivalent to `numCodepoints = RuneCount(src)`,
	//		meaning that the resulting weight string will have no padding at the end: it'll only have
	//		the weight values for the exact amount of codepoints contained in `src`. This is the
	//		behavior required to sort `VARCHAR` columns.
	//
	//		- if `numCodepoints` is the special constant PadToMax, then the `dst` slice must be
	//		pre-allocated to a zero-length slice with enough capacity to hold the complete weight
	//		string, and any remaining capacity in `dst` will be filled by the weights for the
	//		padding character, repeatedly. This is a special flag used by MySQL when performing
	//		filesorts, where all the sorting keys must have identical sizes, even for `VARCHAR`
	//		columns.
	//
	//	- For collations that have NO PAD (that is, the newly introduced UCA v9.0.0 utf8mb4 collations
	//	in MySQL 8.0), the only meaningful value for `numCodepoints` is the special constant `PadToMax`,
	//	which makes the weight string padding equivalent to that of a PAD SPACE collation (as explained
	//	in the previous item). All other values of `numCodepoints` are ignored, because NO PAD collations
	//	always return the weights for the codepoints in their strings, with no further padding at the end.
	//
	// The resulting weight string is written to `dst`, which can be pre-allocated to
	// WeightStringLen() bytes to prevent growing the slice. `dst` can also be nil, in which
	// case it will grow dynamically. If `numCodepoints` has the special PadToMax value explained
	// earlier, `dst` MUST be pre-allocated to the target size or the function will return an
	// empty slice.
	WeightString(dst, src []byte, numCodepoints int) []byte

	// WeightStringLen returns a size (in bytes) large enough to fit the weight string of any
	// string with `numCodepoints` codepoints using this collation. Note that this is an upper
	// bound for the size of the weight string, and in practice weight strings can be
	// significantly smaller than the returned value.
	WeightStringLen(numCodepoints int) int

	// Hash writes into `hasher` a hash of the given string based on this collation. It is
	// functionally equivalent to calling WeightString and then hashing the result.
	//
	// Consequently, if the hashes for two strings are different, then the two strings are considered
	// different according to this collation. If the hashes for two strings are equal, the two strings
	// may or may not be considered equal according to this collation, because, unlike weight strings,
	// hashes can collide.
	//
	// The numCodepoints argument has the same behavior as in WeightString: if this collation uses PAD SPACE,
	// the hash will interpret the source string as if it were stored in a `CHAR(n)` column. If the value of
	// numCodepoints is 0, this is equivalent to setting `numCodepoints = RuneCount(src)`.
	// For collations with NO PAD, the numCodepoints argument is ignored.
	Hash(hasher *vthash.Hasher, src []byte, numCodepoints int)

	// Wildcard returns a matcher for the given wildcard pattern. The matcher can be used to repeatedly
	// test different strings to check if they match the pattern. The pattern must be a traditional wildcard
	// pattern, which may contain the provided special characters for matching one character or several characters.
	// The provided `escape` character will be used as an escape sequence in front of the other special characters.
	//
	// This method is fully collation aware; the matching will be performed according to the underlying collation.
	// I.e. if this is a case-insensitive collation, matching will be case-insensitive.
	//
	// The returned WildcardPattern is always valid, but if the provided special characters do not exist in this
	// collation's repertoire, the returned pattern will not match any strings. Likewise, if the provided pattern
	// has invalid syntax, the returned pattern will not match any strings.
	//
	// If the provided special characters are 0, the defaults for parsing an SQL 'LIKE' expression are used.
	// That is, '_' for matching one character, '%' for matching many and '\\' for escape.
	//
	// This method can also be used for Shell-like matching with '?', '*' and '\\' as their respective special
	// characters.
	Wildcard(pat []byte, matchOne, matchMany, escape rune) WildcardPattern

	// Charset returns the Charset with which this collation is encoded
	Charset() Charset

	// IsBinary returns whether this collation is a binary collation
	IsBinary() bool
}

Collation represents a MySQL-compatible collation. It defines how two strings with the same encoding are compared for sorting order and equality.

func All

func All(env *collations.Environment) []Collation

All returns a slice with all known collations in Vitess.

func Lookup

func Lookup(id collations.ID) Collation

type Collation_8bit_bin

type Collation_8bit_bin struct {
	// contains filtered or unexported fields
}

func (*Collation_8bit_bin) Charset

func (c *Collation_8bit_bin) Charset() charset.Charset

func (*Collation_8bit_bin) Collate

func (c *Collation_8bit_bin) Collate(left, right []byte, rightIsPrefix bool) int

func (*Collation_8bit_bin) Hash

func (c *Collation_8bit_bin) Hash(hasher *vthash.Hasher, src []byte, numCodepoints int)

func (*Collation_8bit_bin) ID

func (*Collation_8bit_bin) IsBinary

func (c *Collation_8bit_bin) IsBinary() bool

func (*Collation_8bit_bin) Name

func (c *Collation_8bit_bin) Name() string

func (*Collation_8bit_bin) ToLower

func (c *Collation_8bit_bin) ToLower(dst, src []byte) []byte

func (*Collation_8bit_bin) ToUpper

func (c *Collation_8bit_bin) ToUpper(dst, src []byte) []byte

func (*Collation_8bit_bin) WeightString

func (c *Collation_8bit_bin) WeightString(dst, src []byte, numCodepoints int) []byte

func (*Collation_8bit_bin) WeightStringLen

func (c *Collation_8bit_bin) WeightStringLen(numBytes int) int

func (*Collation_8bit_bin) Wildcard

func (c *Collation_8bit_bin) Wildcard(pat []byte, matchOne rune, matchMany rune, escape rune) WildcardPattern

type Collation_8bit_simple_ci

type Collation_8bit_simple_ci struct {
	// contains filtered or unexported fields
}

func (*Collation_8bit_simple_ci) Charset

func (*Collation_8bit_simple_ci) Collate

func (c *Collation_8bit_simple_ci) Collate(left, right []byte, rightIsPrefix bool) int

func (*Collation_8bit_simple_ci) Hash

func (c *Collation_8bit_simple_ci) Hash(hasher *vthash.Hasher, src []byte, numCodepoints int)

func (*Collation_8bit_simple_ci) ID

func (*Collation_8bit_simple_ci) IsBinary

func (c *Collation_8bit_simple_ci) IsBinary() bool

func (*Collation_8bit_simple_ci) Name

func (c *Collation_8bit_simple_ci) Name() string

func (*Collation_8bit_simple_ci) TinyWeightString added in v0.19.0

func (c *Collation_8bit_simple_ci) TinyWeightString(src []byte) uint32

func (*Collation_8bit_simple_ci) ToLower

func (c *Collation_8bit_simple_ci) ToLower(dst, src []byte) []byte

func (*Collation_8bit_simple_ci) ToUpper

func (c *Collation_8bit_simple_ci) ToUpper(dst, src []byte) []byte

func (*Collation_8bit_simple_ci) WeightString

func (c *Collation_8bit_simple_ci) WeightString(dst, src []byte, numCodepoints int) []byte

func (*Collation_8bit_simple_ci) WeightStringLen

func (c *Collation_8bit_simple_ci) WeightStringLen(numBytes int) int

func (*Collation_8bit_simple_ci) Wildcard

func (c *Collation_8bit_simple_ci) Wildcard(pat []byte, matchOne rune, matchMany rune, escape rune) WildcardPattern

type Collation_binary

type Collation_binary struct{}

func (*Collation_binary) Charset

func (c *Collation_binary) Charset() charset.Charset

func (*Collation_binary) Collate

func (c *Collation_binary) Collate(left, right []byte, isPrefix bool) int

func (*Collation_binary) Hash

func (c *Collation_binary) Hash(hasher *vthash.Hasher, src []byte, numCodepoints int)

func (*Collation_binary) ID

func (c *Collation_binary) ID() collations.ID

func (*Collation_binary) IsBinary

func (c *Collation_binary) IsBinary() bool

func (*Collation_binary) Name

func (c *Collation_binary) Name() string

func (*Collation_binary) TinyWeightString added in v0.19.0

func (c *Collation_binary) TinyWeightString(src []byte) uint32

func (*Collation_binary) ToLower

func (c *Collation_binary) ToLower(dst, raw []byte) []byte

func (*Collation_binary) ToUpper

func (c *Collation_binary) ToUpper(dst, raw []byte) []byte

func (*Collation_binary) WeightString

func (c *Collation_binary) WeightString(dst, src []byte, numCodepoints int) []byte

func (*Collation_binary) WeightStringLen

func (c *Collation_binary) WeightStringLen(numBytes int) int

func (*Collation_binary) Wildcard

func (c *Collation_binary) Wildcard(pat []byte, matchOne rune, matchMany rune, escape rune) WildcardPattern

type Collation_multibyte

type Collation_multibyte struct {
	// contains filtered or unexported fields
}

func (*Collation_multibyte) Charset

func (c *Collation_multibyte) Charset() charset.Charset

func (*Collation_multibyte) Collate

func (c *Collation_multibyte) Collate(left, right []byte, isPrefix bool) int

func (*Collation_multibyte) Hash

func (c *Collation_multibyte) Hash(hasher *vthash.Hasher, src []byte, numCodepoints int)

func (*Collation_multibyte) ID

func (*Collation_multibyte) IsBinary

func (c *Collation_multibyte) IsBinary() bool

func (*Collation_multibyte) Name

func (c *Collation_multibyte) Name() string

func (*Collation_multibyte) WeightString

func (c *Collation_multibyte) WeightString(dst, src []byte, numCodepoints int) []byte

func (*Collation_multibyte) WeightStringLen

func (c *Collation_multibyte) WeightStringLen(numCodepoints int) int

func (*Collation_multibyte) Wildcard

func (c *Collation_multibyte) Wildcard(pat []byte, matchOne rune, matchMany rune, escape rune) WildcardPattern

type Collation_uca_legacy

type Collation_uca_legacy struct {
	// contains filtered or unexported fields
}

func (*Collation_uca_legacy) Charset

func (c *Collation_uca_legacy) Charset() charset.Charset

func (*Collation_uca_legacy) Collate

func (c *Collation_uca_legacy) Collate(left, right []byte, isPrefix bool) int

func (*Collation_uca_legacy) Hash

func (c *Collation_uca_legacy) Hash(hasher *vthash.Hasher, src []byte, numCodepoints int)

func (*Collation_uca_legacy) ID

func (*Collation_uca_legacy) IsBinary

func (c *Collation_uca_legacy) IsBinary() bool

func (*Collation_uca_legacy) Name

func (c *Collation_uca_legacy) Name() string

func (*Collation_uca_legacy) WeightString

func (c *Collation_uca_legacy) WeightString(dst, src []byte, numCodepoints int) []byte

func (*Collation_uca_legacy) WeightStringLen

func (c *Collation_uca_legacy) WeightStringLen(numBytes int) int

func (*Collation_uca_legacy) Wildcard

func (c *Collation_uca_legacy) Wildcard(pat []byte, matchOne rune, matchMany rune, escape rune) WildcardPattern

type Collation_unicode_bin

type Collation_unicode_bin struct {
	// contains filtered or unexported fields
}

func (*Collation_unicode_bin) Charset

func (c *Collation_unicode_bin) Charset() charset.Charset

func (*Collation_unicode_bin) Collate

func (c *Collation_unicode_bin) Collate(left, right []byte, isPrefix bool) int

func (*Collation_unicode_bin) Hash

func (c *Collation_unicode_bin) Hash(hasher *vthash.Hasher, src []byte, numCodepoints int)

func (*Collation_unicode_bin) ID

func (*Collation_unicode_bin) IsBinary

func (c *Collation_unicode_bin) IsBinary() bool

func (*Collation_unicode_bin) Name

func (c *Collation_unicode_bin) Name() string

func (*Collation_unicode_bin) WeightString

func (c *Collation_unicode_bin) WeightString(dst, src []byte, numCodepoints int) []byte

func (*Collation_unicode_bin) WeightStringLen

func (c *Collation_unicode_bin) WeightStringLen(numBytes int) int

func (*Collation_unicode_bin) Wildcard

func (c *Collation_unicode_bin) Wildcard(pat []byte, matchOne rune, matchMany rune, escape rune) WildcardPattern

type Collation_unicode_general_ci

type Collation_unicode_general_ci struct {
	// contains filtered or unexported fields
}

func (*Collation_unicode_general_ci) Charset

func (*Collation_unicode_general_ci) Collate

func (c *Collation_unicode_general_ci) Collate(left, right []byte, isPrefix bool) int

func (*Collation_unicode_general_ci) Hash

func (c *Collation_unicode_general_ci) Hash(hasher *vthash.Hasher, src []byte, numCodepoints int)

func (*Collation_unicode_general_ci) ID

func (*Collation_unicode_general_ci) IsBinary

func (c *Collation_unicode_general_ci) IsBinary() bool

func (*Collation_unicode_general_ci) Name

func (*Collation_unicode_general_ci) WeightString

func (c *Collation_unicode_general_ci) WeightString(dst, src []byte, numCodepoints int) []byte

func (*Collation_unicode_general_ci) WeightStringLen

func (c *Collation_unicode_general_ci) WeightStringLen(numBytes int) int

func (*Collation_unicode_general_ci) Wildcard

func (c *Collation_unicode_general_ci) Wildcard(pat []byte, matchOne rune, matchMany rune, escape rune) WildcardPattern

type Collation_utf8mb4_0900_bin

type Collation_utf8mb4_0900_bin struct{}

func (*Collation_utf8mb4_0900_bin) Charset

func (*Collation_utf8mb4_0900_bin) Collate

func (c *Collation_utf8mb4_0900_bin) Collate(left, right []byte, isPrefix bool) int

func (*Collation_utf8mb4_0900_bin) Hash

func (c *Collation_utf8mb4_0900_bin) Hash(hasher *vthash.Hasher, src []byte, _ int)

func (*Collation_utf8mb4_0900_bin) ID

func (*Collation_utf8mb4_0900_bin) IsBinary

func (c *Collation_utf8mb4_0900_bin) IsBinary() bool

func (*Collation_utf8mb4_0900_bin) Name

func (*Collation_utf8mb4_0900_bin) ToLower

func (c *Collation_utf8mb4_0900_bin) ToLower(dst, src []byte) []byte

func (*Collation_utf8mb4_0900_bin) ToUpper

func (c *Collation_utf8mb4_0900_bin) ToUpper(dst, src []byte) []byte

func (*Collation_utf8mb4_0900_bin) WeightString

func (c *Collation_utf8mb4_0900_bin) WeightString(dst, src []byte, numCodepoints int) []byte

func (*Collation_utf8mb4_0900_bin) WeightStringLen

func (c *Collation_utf8mb4_0900_bin) WeightStringLen(numBytes int) int

func (*Collation_utf8mb4_0900_bin) Wildcard

func (c *Collation_utf8mb4_0900_bin) Wildcard(pat []byte, matchOne rune, matchMany rune, escape rune) WildcardPattern

type Collation_utf8mb4_uca_0900

type Collation_utf8mb4_uca_0900 struct {
	// contains filtered or unexported fields
}

func (*Collation_utf8mb4_uca_0900) Charset

func (*Collation_utf8mb4_uca_0900) Collate

func (c *Collation_utf8mb4_uca_0900) Collate(left, right []byte, rightIsPrefix bool) int

func (*Collation_utf8mb4_uca_0900) Hash

func (c *Collation_utf8mb4_uca_0900) Hash(hasher *vthash.Hasher, src []byte, _ int)

func (*Collation_utf8mb4_uca_0900) ID

func (*Collation_utf8mb4_uca_0900) IsBinary

func (c *Collation_utf8mb4_uca_0900) IsBinary() bool

func (*Collation_utf8mb4_uca_0900) Name

func (*Collation_utf8mb4_uca_0900) TinyWeightString added in v0.19.0

func (c *Collation_utf8mb4_uca_0900) TinyWeightString(src []byte) uint32

func (*Collation_utf8mb4_uca_0900) ToLower

func (c *Collation_utf8mb4_uca_0900) ToLower(dst, src []byte) []byte

func (*Collation_utf8mb4_uca_0900) ToUpper

func (c *Collation_utf8mb4_uca_0900) ToUpper(dst, src []byte) []byte

func (*Collation_utf8mb4_uca_0900) WeightString

func (c *Collation_utf8mb4_uca_0900) WeightString(dst, src []byte, numCodepoints int) []byte

func (*Collation_utf8mb4_uca_0900) WeightStringLen

func (c *Collation_utf8mb4_uca_0900) WeightStringLen(numBytes int) int

func (*Collation_utf8mb4_uca_0900) Wildcard

func (c *Collation_utf8mb4_uca_0900) Wildcard(pat []byte, matchOne rune, matchMany rune, escape rune) WildcardPattern

type TinyWeightCollation added in v0.19.0

type TinyWeightCollation interface {
	Collation
	// TinyWeightString returns a 32-bit weight string for a source string based on this collation.
	// This is usually the 4-byte prefix of the full weight string, calculated more efficiently.
	TinyWeightString(src []byte) uint32
}

TinyWeightCollation is a Collation that also implements the TinyWeightString API.
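The interface comment says a tiny weight string is usually the 4-byte prefix of the full weight string. Under that assumption, the sketch below packs the first four weight bytes big-endian into a uint32 (zero-filling short weight strings, which is itself an assumption), so comparing two tiny weights agrees with comparing the full weight strings whenever the tiny weights differ. Equal tiny weights decide nothing, so a full comparison is still needed. `tinyFromWeights` and `compareWithTiny` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// tinyFromWeights packs the first 4 bytes of a full weight string into a
// big-endian uint32, zero-filling when the weight string is shorter.
func tinyFromWeights(ws []byte) uint32 {
	var buf [4]byte
	copy(buf[:], ws)
	return binary.BigEndian.Uint32(buf[:])
}

// compareWithTiny uses the tiny weights as a fast path and falls back to a
// full weight-string comparison only when the tiny weights tie.
func compareWithTiny(a, b []byte) int {
	ta, tb := tinyFromWeights(a), tinyFromWeights(b)
	switch {
	case ta < tb:
		return -1
	case ta > tb:
		return 1
	}
	return bytes.Compare(a, b)
}

func main() {
	x := []byte{0x10, 0x20, 0x30, 0x40, 0x01}
	y := []byte{0x10, 0x20, 0x30, 0x40, 0x02}
	fmt.Printf("%#x %#x\n", tinyFromWeights(x), tinyFromWeights(y)) // tiny weights tie
	fmt.Println(compareWithTiny(x, y))                                // -1, from the full comparison
}
```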

type UnicaseChar

type UnicaseChar struct {
	ToUpper, ToLower, Sort rune
}

type UnicaseInfo

type UnicaseInfo struct {
	MaxChar   rune
	Page      []*[]UnicaseChar
	LowerSort bool
}
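UnicaseChar and UnicaseInfo carry no documentation on this page. Their shape matches MySQL's two-level case tables, where a codepoint is split into a page index (`r >> 8`) and an offset within a 256-entry page (`r & 0xFF`). The sketch below assumes that layout, plus the convention that a missing or nil page means "no case mapping"; the local types copy the documented fields, and `toUpper` and `newASCIITable` are hypothetical helpers, not colldata functions.

```go
package main

import "fmt"

// unicaseChar and unicaseInfo copy the field layout documented above.
type unicaseChar struct {
	ToUpper, ToLower, Sort rune
}

type unicaseInfo struct {
	MaxChar   rune
	Page      []*[]unicaseChar
	LowerSort bool
}

// toUpper looks r up in a two-level table: Page[r>>8] selects a page and
// r&0xFF selects the entry. Codepoints outside the table, on a nil page or
// past the end of a short page are returned unchanged.
func (u *unicaseInfo) toUpper(r rune) rune {
	if r < 0 || r > u.MaxChar {
		return r
	}
	idx := int(r >> 8)
	if idx >= len(u.Page) || u.Page[idx] == nil {
		return r
	}
	page := *u.Page[idx]
	off := int(r & 0xFF)
	if off >= len(page) {
		return r
	}
	return page[off].ToUpper
}

// newASCIITable builds a table whose only page (page 0) maps ASCII letters.
func newASCIITable() *unicaseInfo {
	page0 := make([]unicaseChar, 256)
	for i := range page0 {
		r := rune(i)
		page0[i] = unicaseChar{ToUpper: r, ToLower: r, Sort: r}
		if 'a' <= r && r <= 'z' {
			page0[i].ToUpper = r - 32
			page0[i].Sort = r - 32
		} else if 'A' <= r && r <= 'Z' {
			page0[i].ToLower = r + 32
		}
	}
	return &unicaseInfo{MaxChar: 0xFFFF, Page: []*[]unicaseChar{&page0}}
}

func main() {
	t := newASCIITable()
	fmt.Println(string(t.toUpper('q')), string(t.toUpper('Q')), string(t.toUpper('ж'))) // Q Q ж
}
```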

type WildcardPattern

type WildcardPattern interface {
	// Match returns whether the given string matches this pattern
	Match(in []byte) bool
}

WildcardPattern is a matcher for a wildcard pattern, constructed by a Collation's Wildcard method.
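A WildcardPattern only needs a Match method. The sketch below implements one for the default SQL LIKE syntax ('_', '%' and '\\') by simple backtracking over bytes. It is byte-wise and case-sensitive, so it behaves like a binary collation rather than the collation-aware matchers returned by Wildcard; `wildcardPattern`, `likePattern`, `newLike` and `match` are hypothetical names.

```go
package main

import "fmt"

// wildcardPattern has the same method set as colldata.WildcardPattern.
type wildcardPattern interface {
	Match(in []byte) bool
}

// likePattern matches bytes against a LIKE pattern using '_', '%' and '\\'.
type likePattern struct{ pat []byte }

func newLike(pat string) wildcardPattern { return likePattern{pat: []byte(pat)} }

func (l likePattern) Match(in []byte) bool { return match(l.pat, in) }

// match reports whether s matches p, backtracking on '%' to try every split.
func match(p, s []byte) bool {
	for len(p) > 0 {
		switch p[0] {
		case '%':
			for len(p) > 0 && p[0] == '%' {
				p = p[1:] // collapse runs of '%'
			}
			if len(p) == 0 {
				return true
			}
			for i := 0; i <= len(s); i++ {
				if match(p, s[i:]) {
					return true
				}
			}
			return false
		case '_':
			if len(s) == 0 {
				return false
			}
			p, s = p[1:], s[1:]
		default:
			if p[0] == '\\' && len(p) > 1 {
				p = p[1:] // match the escaped character literally
			}
			if len(s) == 0 || s[0] != p[0] {
				return false
			}
			p, s = p[1:], s[1:]
		}
	}
	return len(s) == 0
}

func main() {
	p := newLike(`V_tess%`)
	fmt.Println(p.Match([]byte("Vitess rocks")), p.Match([]byte("vitess")))                    // true false
	fmt.Println(newLike(`100\%`).Match([]byte("100%")), newLike(`100\%`).Match([]byte("1000"))) // true false
}
```

Because the pattern is parsed once and reused, the same matcher can test many input strings, which is the usage pattern the Wildcard documentation describes.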
