Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type PreToken ¶
type PreToken struct {
	// The pre-tokenized substring
	String string
	// Start rune position on the original string, inclusive
	Start int
	// End rune position on the original string, exclusive
	End int
}
PreToken represents a pre-tokenized substring, along with its offsets in the original string. Start and End are rune positions, not byte positions.
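Because the offsets count runes rather than bytes, slicing the original string directly with Start and End can give the wrong result for non-ASCII text. The following is a minimal sketch of recovering a token's text from its offsets. It redeclares PreToken locally so that it is self-contained, and sliceByRunes is a hypothetical helper introduced for illustration, not part of this package.

```go
package main

import "fmt"

// PreToken mirrors the struct documented above; it is redeclared here
// only so that the example compiles on its own.
type PreToken struct {
	String string
	Start  int
	End    int
}

// sliceByRunes is a hypothetical helper (not part of the package).
// It returns the part of original covered by the rune range [start, end).
func sliceByRunes(original string, start, end int) string {
	return string([]rune(original)[start:end])
}

func main() {
	original := "日本語 text"

	// "text" begins after three CJK runes and one space, so it covers
	// rune positions 4 (inclusive) to 8 (exclusive).
	tok := PreToken{String: "text", Start: 4, End: 8}

	// Converting to runes first makes the offsets line up with the token.
	fmt.Println(sliceByRunes(original, tok.Start, tok.End) == tok.String)

	// The byte length differs from the rune count, which is why
	// original[tok.Start:tok.End] would not be a safe substitute here.
	fmt.Println(len(original), len([]rune(original)))
}
```

Converting the whole string to a rune slice is the simplest approach for a short example; for long inputs, a single pass that maps rune positions to byte positions would avoid the extra allocation.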
type PreTokenizer ¶
type PreTokenizer interface {
PreTokenize(pts *pretokenizedstring.PreTokenizedString) error
}
PreTokenizer is implemented by any value that has a PreTokenize method, which performs a pre-segmentation step.
Pre-tokenization splits the given string into multiple substrings, keeping track of the offsets between the original string and the substrings. In some cases, the NormalizedString might be modified.
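As an illustration of the offset tracking described above, here is a minimal sketch of a whitespace-based split that records rune offsets for each substring. It does not implement the PreTokenizer interface, since the pretokenizedstring API is not shown in this section. The PreToken type is redeclared locally, and splitOnWhitespace is a hypothetical function, not part of this package.

```go
package main

import (
	"fmt"
	"unicode"
)

// PreToken mirrors the struct documented above; it is redeclared here
// only so that the example compiles on its own.
type PreToken struct {
	String string
	Start  int
	End    int
}

// splitOnWhitespace is a hypothetical helper (not part of the package).
// It splits s on Unicode whitespace and records, for each piece, its
// rune offsets in s: Start inclusive, End exclusive.
func splitOnWhitespace(s string) []PreToken {
	var tokens []PreToken
	runes := []rune(s)
	start := -1 // -1 means "not currently inside a token"

	for i, r := range runes {
		if unicode.IsSpace(r) {
			if start >= 0 {
				// Whitespace closes the token that was in progress.
				tokens = append(tokens, PreToken{
					String: string(runes[start:i]),
					Start:  start,
					End:    i,
				})
				start = -1
			}
			continue
		}
		if start < 0 {
			start = i // first rune of a new token
		}
	}

	// Flush a token that runs to the end of the input.
	if start >= 0 {
		tokens = append(tokens, PreToken{
			String: string(runes[start:]),
			Start:  start,
			End:    len(runes),
		})
	}
	return tokens
}

func main() {
	for _, t := range splitOnWhitespace("Hello  brave world") {
		fmt.Printf("%q [%d, %d)\n", t.String, t.Start, t.End)
	}
}
```

Runs of consecutive whitespace produce no empty tokens in this sketch; a real pre-tokenizer may choose differently, for example by keeping the delimiters as their own substrings.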