uca

package
v0.14.0
Published: Jun 28, 2022 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Constants

const CodepointsPerPage = 256
const MaxCodepoint = 0x10FFFF + 1
const MaxCollationElementsPerCodepoint = 8

Variables

This section is empty.

Functions

func PageOffset

func PageOffset(cp rune) (int, int)
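
PageOffset has no doc comment. A minimal sketch of what it plausibly computes, assuming the two return values are the page index and the offset within that page for a table split into pages of CodepointsPerPage entries. The lowercase names are hypothetical stand-ins, not the package's own code:

```go
package main

import "fmt"

// codepointsPerPage mirrors the documented constant CodepointsPerPage.
const codepointsPerPage = 256

// pageOffset is a hypothetical stand-in for PageOffset. It assumes the
// first result is the page index and the second is the offset in that page.
func pageOffset(cp rune) (page, offset int) {
	return int(cp) / codepointsPerPage, int(cp) % codepointsPerPage
}

func main() {
	fmt.Println(pageOffset(0x00E9)) // 0 233
	fmt.Println(pageOffset(0x4E2D)) // 78 45
}
```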

func UnicodeDecomposeHangulSyllable

func UnicodeDecomposeHangulSyllable(syl rune) []rune

UnicodeDecomposeHangulSyllable decomposes a precomposed Korean Hangul syllable into its 2 or 3 constituent jamo codepoints. This is a straight port of the algorithm in http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf
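
The arithmetic decomposition from chapter 3 of the Unicode Standard can be sketched as follows. This is an independent illustration of that algorithm, not the package's source; decomposeHangul is a hypothetical name, and returning nil for a rune outside the Hangul syllable block is an assumption of this sketch:

```go
package main

import "fmt"

// Constants from the Hangul syllable decomposition algorithm.
const (
	sBase  = 0xAC00 // first precomposed syllable
	lBase  = 0x1100 // first leading consonant
	vBase  = 0x1161 // first vowel
	tBase  = 0x11A7 // one before the first trailing consonant
	vCount = 21
	tCount = 28
	nCount = vCount * tCount // 588
	sCount = 19 * nCount     // 11172
)

// decomposeHangul returns the 2 or 3 jamo that make up syl,
// or nil if syl is not a precomposed Hangul syllable.
func decomposeHangul(syl rune) []rune {
	sIndex := syl - sBase
	if sIndex < 0 || sIndex >= sCount {
		return nil
	}
	l := lBase + sIndex/nCount
	v := vBase + (sIndex%nCount)/tCount
	t := tBase + sIndex%tCount
	if t == tBase {
		// No trailing consonant: the syllable has only two parts.
		return []rune{l, v}
	}
	return []rune{l, v, t}
}

func main() {
	fmt.Printf("%U\n", decomposeHangul(0xD55C)) // [U+1112 U+1161 U+11AB]
	fmt.Printf("%U\n", decomposeHangul(0xAC00)) // [U+1100 U+1161]
}
```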

func UnicodeImplicitWeights900

func UnicodeImplicitWeights900(weights []uint16, codepoint rune)

UnicodeImplicitWeights900 generates the implicit weights for this codepoint. This is a straight port of the algorithm in https://www.unicode.org/reports/tr10/tr10-34.html#Implicit_Weights and only applies to the UCA Standard v9.0.0.
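
As a rough illustration of the general case in UTS #10, a codepoint without an explicit table entry gets two collation elements, [.AAAA.0020.0002] and [.BBBB.0000.0000], where AAAA and BBBB are derived from the codepoint and a per-range base. The sketch below covers only that general formula (it omits the Tangut special case), and the function name and the six-value return layout are assumptions, not the package's actual output format:

```go
package main

import "fmt"

// implicitWeights computes the two implicit collation elements for cp.
// base is 0xFB40 for the core CJK unified ideograph blocks, 0xFB80 for
// other unified ideographs and 0xFBC0 for everything else.
func implicitWeights(base uint16, cp rune) [6]uint16 {
	aaaa := base + uint16(cp>>15)
	bbbb := uint16(cp&0x7FFF) | 0x8000
	return [6]uint16{aaaa, 0x0020, 0x0002, bbbb, 0x0000, 0x0000}
}

func main() {
	fmt.Printf("%04X\n", implicitWeights(0xFB40, 0x4E00))   // [FB40 0020 0002 CE00 0000 0000]
	fmt.Printf("%04X\n", implicitWeights(0xFBC0, 0x10FFFF)) // [FBE1 0020 0002 FFFF 0000 0000]
}
```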

func UnicodeImplicitWeightsLegacy

func UnicodeImplicitWeightsLegacy(weights []uint16, codepoint rune)

UnicodeImplicitWeightsLegacy generates the implicit weights for this codepoint. This is a straight port of the algorithm in https://www.unicode.org/reports/tr10/tr10-20.html#Implicit_Weights and only applies to the UCA Standard v4.0.0 and v5.2.0.

Types

type Collation

type Collation interface {
	Charset() charset.Charset
	Weights() (Weights, Layout)
	WeightForSpace() uint16
	WeightsEqual(left, right rune) bool
}

type Collation900

type Collation900 struct {
	// contains filtered or unexported fields
}

func NewCollation

func NewCollation(name string, weights Weights, weightPatches []Patch, reorder []Reorder, contract Contractor, upperCaseFirst bool, levels int) *Collation900

func (*Collation900) Charset

func (c *Collation900) Charset() charset.Charset

func (*Collation900) Iterator

func (c *Collation900) Iterator(input []byte) WeightIterator

func (*Collation900) WeightForSpace

func (c *Collation900) WeightForSpace() uint16

func (*Collation900) Weights

func (c *Collation900) Weights() (Weights, Layout)

func (*Collation900) WeightsEqual

func (c *Collation900) WeightsEqual(left, right rune) bool

type CollationLegacy

type CollationLegacy struct {
	// contains filtered or unexported fields
}

func NewCollationLegacy

func NewCollationLegacy(cs charset.Charset, weights Weights, weightPatches []Patch, contract Contractor, maxCodepoint rune) *CollationLegacy

func (*CollationLegacy) Charset

func (c *CollationLegacy) Charset() charset.Charset

func (*CollationLegacy) Iterator

func (c *CollationLegacy) Iterator(input []byte) *WeightIteratorLegacy

func (*CollationLegacy) WeightForSpace

func (c *CollationLegacy) WeightForSpace() uint16

func (*CollationLegacy) Weights

func (c *CollationLegacy) Weights() (Weights, Layout)

func (*CollationLegacy) WeightsEqual

func (c *CollationLegacy) WeightsEqual(left, right rune) bool

type Contraction

type Contraction struct {
	Path       []rune
	Weights    []uint16
	Contextual bool
}

type Contractor

type Contractor interface {
	Find(cs charset.Charset, cp rune, remainder []byte) ([]uint16, []byte, int)
	FindContextual(cp1, cp0 rune) []uint16
}

func NewTrieContractor

func NewTrieContractor(all []Contraction) Contractor
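
The name NewTrieContractor suggests a trie keyed by the runes of each Contraction's Path. The following is a simplified sketch of that idea, assuming longest-match semantics. It works on decoded runes rather than on a charset and byte slice as the real Find does, ignores the Contextual flag, and every identifier and weight value in it is hypothetical:

```go
package main

import "fmt"

// contraction is a reduced stand-in for the package's Contraction type.
type contraction struct {
	path    []rune
	weights []uint16
}

// trieNode is one node of the rune trie.
type trieNode struct {
	children map[rune]*trieNode
	weights  []uint16 // non-nil when a contraction ends at this node
}

// buildTrie inserts every contraction path into a fresh trie.
func buildTrie(all []contraction) *trieNode {
	root := &trieNode{children: map[rune]*trieNode{}}
	for _, c := range all {
		n := root
		for _, r := range c.path {
			next, ok := n.children[r]
			if !ok {
				next = &trieNode{children: map[rune]*trieNode{}}
				n.children[r] = next
			}
			n = next
		}
		n.weights = c.weights
	}
	return root
}

// longestMatch returns the weights of the longest contraction that is a
// prefix of input, plus the number of runes it consumed (0 if none).
func (root *trieNode) longestMatch(input []rune) ([]uint16, int) {
	var best []uint16
	bestLen := 0
	n := root
	for i, r := range input {
		next, ok := n.children[r]
		if !ok {
			break
		}
		n = next
		if n.weights != nil {
			best, bestLen = n.weights, i+1
		}
	}
	return best, bestLen
}

func main() {
	// The weights here are placeholders, not real UCA values.
	trie := buildTrie([]contraction{
		{path: []rune("ch"), weights: []uint16{0x1000}},
		{path: []rune("chs"), weights: []uint16{0x1001}},
	})
	fmt.Println(trie.longestMatch([]rune("chat")))
	fmt.Println(trie.longestMatch([]rune("cat")))
}
```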

type FastIterator900

type FastIterator900 struct {
	// contains filtered or unexported fields
}

func (*FastIterator900) Done

func (it *FastIterator900) Done()

func (*FastIterator900) FastForward32 added in v0.14.0

func (it *FastIterator900) FastForward32(it2 *FastIterator900) int

FastForward32 fast-forwards this iterator and the given it2 in parallel until there is a mismatch in their weights, and returns their difference. This function is similar to NextWeightBlock64 in that it only succeeds if the iterators are composed of (mostly) ASCII characters. See the documentation for NextWeightBlock64 for details on how these fast comparisons work.

func (*FastIterator900) Level

func (it *FastIterator900) Level() int

func (*FastIterator900) Next

func (it *FastIterator900) Next() (uint16, bool)

func (*FastIterator900) NextWeightBlock64 added in v0.14.0

func (it *FastIterator900) NextWeightBlock64(dstbytes []byte) int

NextWeightBlock64 takes a byte slice of at least 16 bytes and fills it with the next chunk of weights from this iterator. If the input slice is smaller than 16 bytes, the function will panic.

The function returns the weights in Big Endian ordering: this is the same ordering that MySQL uses when generating weight strings, so the return of this function can be inserted directly into a weight string and the result will be compatible with MySQL. Likewise, the resulting slice can be compared byte-wise (bytes.Compare) to obtain a proper collation ordering against another string.

Returns the number of bytes written to `dstbytes`. If 0, this iterator has been fully consumed.
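
To see why big-endian output makes byte-wise comparison agree with numeric weight order, here is a small standalone sketch. It does not call into this package; weightString is a hypothetical helper and the weight values are arbitrary:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// weightString serializes 16-bit weights in big-endian order, so the most
// significant byte of each weight is compared first.
func weightString(weights []uint16) []byte {
	out := make([]byte, 2*len(weights))
	for i, w := range weights {
		binary.BigEndian.PutUint16(out[2*i:], w)
	}
	return out
}

func main() {
	a := weightString([]uint16{0x1C47, 0x00FF})
	b := weightString([]uint16{0x1C47, 0x0100})
	// 0x00FF < 0x0100, and the big-endian byte strings agree.
	fmt.Println(bytes.Compare(a, b)) // -1
}
```

With little-endian output the second weights would serialize as FF 00 and 00 01, and bytes.Compare would report the opposite order.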

Implementation notes: This is a fast-path algorithm that can only work for UCA900 collations that do not have reorderings, contractions or any weight patches. The idea is to detect runs of 8 ASCII characters in a row, which are very frequent in most UTF-8 text, particularly in English, and to generate the weights for these 8 characters directly from an optimized table, instead of going through the whole Unicode Collation Algorithm. This is feasible because in UCA900, every character in the ASCII range has either 0 or 1 weight triplets, so its weight can be calculated with a single lookup in a 128-entry table for each level (0, 1, 2).
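
The fast path described above can be sketched for a single level as follows. This is an illustration under stated assumptions, not the package's implementation: isASCII8, fastBlock and placeholderTable are hypothetical names, the table holds made-up weights rather than real UCA 9.0.0 values, and a zero entry is assumed to mean "no weight at this level":

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// isASCII8 reports whether the first 8 bytes of p are all ASCII, by
// checking the high bit of every byte with one 64-bit mask.
func isASCII8(p []byte) bool {
	if len(p) < 8 {
		return false
	}
	return binary.LittleEndian.Uint64(p)&0x8080808080808080 == 0
}

// placeholderTable builds a 128-entry table with invented weights.
func placeholderTable() *[128]uint16 {
	var t [128]uint16
	for c := 0x20; c < 0x7F; c++ {
		t[c] = 0x0100 + uint16(c)
	}
	return &t
}

// fastBlock writes big-endian weights for the next 8 ASCII bytes of src
// into dst. ok is false when the fast path does not apply.
func fastBlock(dst, src []byte, table *[128]uint16) (n int, ok bool) {
	if len(dst) < 16 {
		panic("dst must hold at least 16 bytes")
	}
	if !isASCII8(src) {
		return 0, false
	}
	for _, c := range src[:8] {
		if w := table[c]; w != 0 {
			binary.BigEndian.PutUint16(dst[n:], w)
			n += 2
		}
	}
	return n, true
}

func main() {
	dst := make([]byte, 16)
	n, ok := fastBlock(dst, []byte("abcdefgh"), placeholderTable())
	fmt.Printf("%v %d % X\n", ok, n, dst[:n])
}
```

A caller would fall back to the full collation algorithm whenever ok is false.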

func (*FastIterator900) SkipLevel

func (it *FastIterator900) SkipLevel() int

type Layout

type Layout interface {
	MaxCodepoint() rune
	DebugWeights(table Weights, codepoint rune) []uint16
	// contains filtered or unexported methods
}

type Layout_uca900

type Layout_uca900 struct{}

func (Layout_uca900) DebugWeights

func (Layout_uca900) DebugWeights(table Weights, codepoint rune) (result []uint16)

func (Layout_uca900) MaxCodepoint

func (Layout_uca900) MaxCodepoint() rune

type Layout_uca_legacy

type Layout_uca_legacy struct {
	Max rune
}

func (Layout_uca_legacy) DebugWeights

func (l Layout_uca_legacy) DebugWeights(table Weights, codepoint rune) (result []uint16)

func (Layout_uca_legacy) MaxCodepoint

func (l Layout_uca_legacy) MaxCodepoint() rune

type Patch

type Patch struct {
	Codepoint rune
	Patch     []uint16
}

type Reorder

type Reorder struct {
	FromMin, FromMax uint16
	ToMin, ToMax     uint16
}
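
The Reorder fields are undocumented. One plausible reading, and only an assumption here, is that a weight falling inside [FromMin, FromMax] is shifted into the range starting at ToMin. A minimal sketch of that interpretation, with hypothetical names and made-up ranges:

```go
package main

import "fmt"

// reorder is a stand-in for the package's Reorder type.
type reorder struct {
	fromMin, fromMax uint16
	toMin, toMax     uint16
}

// applyReorder maps w through the first range that contains it and
// returns w unchanged when no range matches. This is an assumed
// interpretation, not the documented behavior.
func applyReorder(rs []reorder, w uint16) uint16 {
	for _, r := range rs {
		if w >= r.fromMin && w <= r.fromMax {
			return w - r.fromMin + r.toMin
		}
	}
	return w
}

func main() {
	rs := []reorder{{fromMin: 0x2000, fromMax: 0x2FFF, toMin: 0x1000, toMax: 0x1FFF}}
	fmt.Printf("%04X\n", applyReorder(rs, 0x2345)) // 1345
	fmt.Printf("%04X\n", applyReorder(rs, 0x4000)) // 4000
}
```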

type WeightIterator

type WeightIterator interface {
	Next() (uint16, bool)
	Level() int
	SkipLevel() int
	Done()
	// contains filtered or unexported methods
}
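
A sketch of how two iterators with this shape might be consumed to compare strings. It assumes the boolean returned by Next reports whether a weight was produced, which the listing above does not state. The interface is reduced to Next, and sliceIter is a hypothetical slice-backed iterator used only for the demonstration:

```go
package main

import "fmt"

// weightIter is a reduced version of WeightIterator with only Next.
type weightIter interface {
	Next() (uint16, bool)
}

// sliceIter yields the weights of a fixed slice, one at a time.
type sliceIter struct {
	weights []uint16
	pos     int
}

func (it *sliceIter) Next() (uint16, bool) {
	if it.pos >= len(it.weights) {
		return 0, false
	}
	w := it.weights[it.pos]
	it.pos++
	return w, true
}

// compare walks both iterators in lockstep and returns -1, 0 or 1.
// A sequence that ends first sorts before a longer one.
func compare(a, b weightIter) int {
	for {
		wa, okA := a.Next()
		wb, okB := b.Next()
		switch {
		case !okA && !okB:
			return 0
		case !okA:
			return -1
		case !okB:
			return 1
		case wa < wb:
			return -1
		case wa > wb:
			return 1
		}
	}
}

func main() {
	a := &sliceIter{weights: []uint16{0x1C47, 0x1C60}}
	b := &sliceIter{weights: []uint16{0x1C47, 0x1C7A}}
	fmt.Println(compare(a, b)) // -1
}
```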

type WeightIteratorLegacy

type WeightIteratorLegacy struct {
	// Constant
	CollationLegacy
	// contains filtered or unexported fields
}

func (*WeightIteratorLegacy) DebugCodepoint

func (it *WeightIteratorLegacy) DebugCodepoint() (rune, int)

func (*WeightIteratorLegacy) Done

func (it *WeightIteratorLegacy) Done()

func (*WeightIteratorLegacy) Length

func (it *WeightIteratorLegacy) Length() int

func (*WeightIteratorLegacy) Next

func (it *WeightIteratorLegacy) Next() (uint16, bool)

type Weights

type Weights []*[]uint16

func ApplyTailoring

func ApplyTailoring(layout Layout, base Weights, patches []Patch) Weights
