Documentation ¶
Index ¶
- Constants
- func PageOffset(cp rune) (int, int)
- func UnicodeDecomposeHangulSyllable(syl rune) []rune
- func UnicodeImplicitWeights900(weights []uint16, codepoint rune)
- func UnicodeImplicitWeightsLegacy(weights []uint16, codepoint rune)
- type Collation
- type Collation900
- func (c *Collation900) Charset() charset.Charset
- func (c *Collation900) Contractor() Contractor
- func (c *Collation900) Iterator(input []byte) WeightIterator
- func (c *Collation900) MaxLevel() int
- func (c *Collation900) WeightForSpace() uint16
- func (c *Collation900) Weights() (Weights, Layout)
- func (c *Collation900) WeightsEqual(left, right rune) bool
- type CollationLegacy
- func (c *CollationLegacy) Charset() charset.Charset
- func (c *CollationLegacy) Contractor() Contractor
- func (c *CollationLegacy) Iterator(input []byte) WeightIteratorLegacy
- func (c *CollationLegacy) WeightForSpace() uint16
- func (c *CollationLegacy) Weights() (Weights, Layout)
- func (c *CollationLegacy) WeightsEqual(left, right rune) bool
- type Contraction
- type Contractor
- type FastIterator900
- type Layout
- type Layout_uca900
- type Layout_uca_legacy
- type Patch
- type Reorder
- type WeightIterator
- type WeightIteratorLegacy
- type Weights
Constants ¶
const CodepointsPerPage = 256
const MaxCodepoint = 0x10FFFF + 1
const MaxCollationElementsPerCodepoint = 8
Functions ¶
func PageOffset ¶
func UnicodeDecomposeHangulSyllable ¶
UnicodeDecomposeHangulSyllable decomposes a precomposed Korean Hangul syllable into its 2 or 3 constituent jamo codepoints. This is a straight port of the algorithm in chapter 3 of the Unicode Standard 9.0.0 (http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf).
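The arithmetic behind that decomposition can be sketched as follows. This is a minimal illustration of the algorithm from the Unicode Standard, not the package's own code: `decomposeHangul` is a hypothetical name, and returning nil for a rune outside the Hangul syllable block is an assumption of this sketch.

```go
package main

import "fmt"

// Constants from the Hangul syllable decomposition algorithm in
// chapter 3 of the Unicode Standard.
const (
	sBase  = 0xAC00 // first precomposed syllable
	lBase  = 0x1100 // first leading consonant
	vBase  = 0x1161 // first vowel
	tBase  = 0x11A7 // one before the first trailing consonant
	vCount = 21
	tCount = 28
	nCount = vCount * tCount // 588 syllables per leading consonant
	sCount = 19 * nCount     // 11172 precomposed syllables in total
)

// decomposeHangul is a hypothetical stand-in written for illustration.
// It returns nil when syl is not a precomposed Hangul syllable (an
// assumption of this sketch).
func decomposeHangul(syl rune) []rune {
	sIndex := syl - sBase
	if sIndex < 0 || sIndex >= sCount {
		return nil
	}
	l := lBase + sIndex/nCount
	v := vBase + (sIndex%nCount)/tCount
	t := tBase + sIndex%tCount
	if t == tBase {
		// No trailing consonant: the syllable has only 2 parts.
		return []rune{l, v}
	}
	return []rune{l, v, t}
}

func main() {
	fmt.Printf("%U\n", decomposeHangul('한')) // U+1112, U+1161, U+11AB
	fmt.Printf("%U\n", decomposeHangul('가')) // U+1100, U+1161
}
```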
func UnicodeImplicitWeights900 ¶
UnicodeImplicitWeights900 generates the implicit weights for this codepoint. This is a straight port of the algorithm in UTS #10 (https://www.unicode.org/reports/tr10/tr10-34.html#Implicit_Weights). It applies only to the UCA Standard v9.0.0.
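As a rough illustration, the core formula from the linked section of UTS #10 derives two collation elements, [.AAAA.0020.0002] and [.BBBB.0000.0000], from the code point and a base value. The sketch below shows only that formula. `implicitWeights` is a hypothetical helper, the caller supplies the base (0xFB40 for core Han ideographs and 0xFBC0 for unassigned code points in the UTS #10 table), and the real function may classify code points and lay out its output slice differently.

```go
package main

import "fmt"

// implicitWeights is a hypothetical helper written for illustration.
// It applies the UTS #10 formula
//
//	AAAA = base + (cp >> 15)
//	BBBB = (cp & 0x7FFF) | 0x8000
//
// and returns the two collation elements as six 16-bit weights.
// Choosing the right base for a code point is left to the caller
// (an assumption of this sketch).
func implicitWeights(base uint16, cp rune) [6]uint16 {
	aaaa := base + uint16(cp>>15)
	bbbb := uint16(cp&0x7FFF) | 0x8000
	return [6]uint16{
		aaaa, 0x0020, 0x0002, // [.AAAA.0020.0002]
		bbbb, 0x0000, 0x0000, // [.BBBB.0000.0000]
	}
}

func main() {
	// U+4E00 is a core Han ideograph, so the base is 0xFB40.
	fmt.Printf("%04X\n", implicitWeights(0xFB40, 0x4E00))
	// U+50001 is unassigned, so the base is 0xFBC0.
	fmt.Printf("%04X\n", implicitWeights(0xFBC0, 0x50001))
}
```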
func UnicodeImplicitWeightsLegacy ¶
UnicodeImplicitWeightsLegacy generates the implicit weights for this codepoint. This is a straight port of the algorithm in UTS #10 (https://www.unicode.org/reports/tr10/tr10-20.html#Implicit_Weights). It applies only to the UCA Standard v4.0.0 and v5.2.0.
Types ¶
type Collation900 ¶
type Collation900 struct {
// contains filtered or unexported fields
}
func NewCollation ¶
func NewCollation(name string, weights Weights, weightPatches []Patch, reorder []Reorder, contract Contractor, upperCaseFirst bool, levels int) *Collation900
func (*Collation900) Charset ¶
func (c *Collation900) Charset() charset.Charset
func (*Collation900) Contractor ¶ added in v0.17.0
func (c *Collation900) Contractor() Contractor
func (*Collation900) Iterator ¶
func (c *Collation900) Iterator(input []byte) WeightIterator
func (*Collation900) MaxLevel ¶ added in v0.17.0
func (c *Collation900) MaxLevel() int
func (*Collation900) WeightForSpace ¶
func (c *Collation900) WeightForSpace() uint16
func (*Collation900) Weights ¶
func (c *Collation900) Weights() (Weights, Layout)
func (*Collation900) WeightsEqual ¶
func (c *Collation900) WeightsEqual(left, right rune) bool
type CollationLegacy ¶
type CollationLegacy struct {
// contains filtered or unexported fields
}
func NewCollationLegacy ¶
func NewCollationLegacy(cs charset.Charset, weights Weights, weightPatches []Patch, contract Contractor, maxCodepoint rune) *CollationLegacy
func (*CollationLegacy) Charset ¶
func (c *CollationLegacy) Charset() charset.Charset
func (*CollationLegacy) Contractor ¶ added in v0.17.0
func (c *CollationLegacy) Contractor() Contractor
func (*CollationLegacy) Iterator ¶
func (c *CollationLegacy) Iterator(input []byte) WeightIteratorLegacy
func (*CollationLegacy) WeightForSpace ¶
func (c *CollationLegacy) WeightForSpace() uint16
func (*CollationLegacy) Weights ¶
func (c *CollationLegacy) Weights() (Weights, Layout)
func (*CollationLegacy) WeightsEqual ¶
func (c *CollationLegacy) WeightsEqual(left, right rune) bool
type Contraction ¶
type Contractor ¶
type Contractor interface {
Find(cs charset.Charset, cp rune, remainder []byte) ([]uint16, []byte, int)
FindContextual(cp1, cp0 rune) []uint16
}
func NewTrieContractor ¶
func NewTrieContractor(all []Contraction) Contractor
type FastIterator900 ¶
type FastIterator900 struct {
// contains filtered or unexported fields
}
func (*FastIterator900) Done ¶
func (it *FastIterator900) Done()
func (*FastIterator900) FastForward32 ¶ added in v0.14.0
func (it *FastIterator900) FastForward32(it2 *FastIterator900) int
FastForward32 fast-forwards this iterator and the given it2 in parallel until there is a mismatch in their weights, and returns their difference. This function is similar to NextWeightBlock64 in that it only succeeds if the iterators are composed of (mostly) ASCII characters. See the docs for NextWeightBlock64 for documentation on how these fast comparisons work.
func (*FastIterator900) Next ¶
func (it *FastIterator900) Next() (uint16, bool)
func (*FastIterator900) NextWeightBlock64 ¶ added in v0.14.0
func (it *FastIterator900) NextWeightBlock64(dstbytes []byte) int
NextWeightBlock64 takes a byte slice of 16 bytes and fills it with the next chunk of weights from this iterator. If the input slice is smaller than 16 bytes, the function will panic.
The function returns the weights in Big Endian ordering: this is the same ordering that MySQL uses when generating weight strings, so the return of this function can be inserted directly into a weight string and the result will be compatible with MySQL. Likewise, the resulting slice can be compared byte-wise (bytes.Compare) to obtain a proper collation ordering against another string.
Returns the number of bytes written to `dstbytes`. If 0, this iterator has been fully consumed.
Implementation notes: This is a fast-path algorithm that only works for UCA900 collations that have no reorderings, contractions or weight patches. The idea is to detect runs of 8 ASCII characters in a row, which are very frequent in most UTF-8 text, particularly in English, and to generate the weights for these 8 characters directly from an optimized table instead of going through the whole Unicode Collation Algorithm. This is feasible because in UCA900, all characters in the ASCII range have either 0 or 1 weight triplets, so their weight can be calculated with a single lookup in a 128-entry table for each level (0, 1, 2).
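One common way to detect such a run is to load 8 bytes as a single 64-bit word and test the high bit of every byte at once. The sketch below illustrates that idea under stated assumptions and is not the package's implementation: `isASCII8` and `fastWeights8` are hypothetical names, the table holds made-up weights for a single level, and every ASCII byte is assumed to map to exactly one weight (the real tables also have to handle characters with 0 weights).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// highBits has the top bit of each of its 8 bytes set. A byte is
// ASCII exactly when its top bit is clear.
const highBits = 0x8080808080808080

// isASCII8 reports whether the first 8 bytes of p are all ASCII.
func isASCII8(p []byte) bool {
	return len(p) >= 8 && binary.LittleEndian.Uint64(p)&highBits == 0
}

// fastWeights8 is a hypothetical helper written for illustration.
// If src starts with 8 ASCII bytes, it writes their weights to dst in
// Big Endian order and returns 16; otherwise it returns 0 so that the
// caller can fall back to the slow path. Like the documented function,
// it panics when dst is smaller than 16 bytes.
func fastWeights8(dst, src []byte, table *[128]uint16) int {
	if len(dst) < 16 {
		panic("dst must hold at least 16 bytes")
	}
	if !isASCII8(src) {
		return 0
	}
	for i := 0; i < 8; i++ {
		binary.BigEndian.PutUint16(dst[2*i:], table[src[i]])
	}
	return 16
}

func main() {
	// Made-up single-level table: NOT real UCA 9.0.0 weights.
	var table [128]uint16
	for i := range table {
		table[i] = 0x1000 + uint16(i)
	}

	dst := make([]byte, 16)
	fmt.Println(fastWeights8(dst, []byte("abcdefgh"), &table), dst)
	fmt.Println(fastWeights8(dst, []byte("abcdéfgh"), &table))
}
```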
func (*FastIterator900) SkipLevel ¶
func (it *FastIterator900) SkipLevel() int
type Layout_uca900 ¶
type Layout_uca900 struct{}
func (Layout_uca900) DebugWeights ¶
func (Layout_uca900) DebugWeights(table Weights, codepoint rune) (result []uint16)
func (Layout_uca900) MaxCodepoint ¶
func (Layout_uca900) MaxCodepoint() rune
type Layout_uca_legacy ¶
type Layout_uca_legacy struct {
Max rune
}
func (Layout_uca_legacy) DebugWeights ¶
func (l Layout_uca_legacy) DebugWeights(table Weights, codepoint rune) (result []uint16)
func (Layout_uca_legacy) MaxCodepoint ¶
func (l Layout_uca_legacy) MaxCodepoint() rune
type WeightIterator ¶
type WeightIteratorLegacy ¶
type WeightIteratorLegacy struct {
// Constant
*CollationLegacy
// contains filtered or unexported fields
}
func (*WeightIteratorLegacy) DebugCodepoint ¶
func (it *WeightIteratorLegacy) DebugCodepoint() (rune, int)
func (*WeightIteratorLegacy) Length ¶
func (it *WeightIteratorLegacy) Length() int
func (*WeightIteratorLegacy) Next ¶
func (it *WeightIteratorLegacy) Next() (uint16, bool)