pretokenizers

package
v0.2.0
Warning

This package is not in the latest version of its module.

Published: Dec 12, 2020 License: BSD-2-Clause Imports: 1 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type PreToken

type PreToken struct {
	// The pre-tokenized substring
	String string
	// Start rune position on the original string, inclusive
	Start int
	// End rune position on the original string, exclusive
	End int
}

PreToken represents a pre-tokenized substring, along with its offset positions in the original string. Offsets are counted in runes, not bytes: Start is inclusive and End is exclusive.
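
The rune-based, half-open offsets can be illustrated with a minimal sketch. It redeclares PreToken locally so that it compiles on its own, and splitOnWhitespace is a hypothetical helper written for this illustration; it is not part of the package.

```go
package main

import (
	"fmt"
	"unicode"
)

// PreToken mirrors the struct documented above. It is redeclared here only
// so that this sketch is self-contained.
type PreToken struct {
	String string
	Start  int // rune position, inclusive
	End    int // rune position, exclusive
}

// splitOnWhitespace is a hypothetical helper (an assumption made for this
// example, not a function of the package). It splits s on Unicode whitespace
// and records rune offsets for every resulting substring.
func splitOnWhitespace(s string) []PreToken {
	var tokens []PreToken
	runes := []rune(s) // index by rune so offsets are not byte positions
	start := -1        // rune index where the current token began, -1 if none

	for i, r := range runes {
		if unicode.IsSpace(r) {
			if start >= 0 {
				tokens = append(tokens, PreToken{
					String: string(runes[start:i]),
					Start:  start,
					End:    i,
				})
				start = -1
			}
			continue
		}
		if start < 0 {
			start = i
		}
	}
	// Flush the trailing token, if the input does not end in whitespace.
	if start >= 0 {
		tokens = append(tokens, PreToken{
			String: string(runes[start:]),
			Start:  start,
			End:    len(runes),
		})
	}
	return tokens
}

func main() {
	for _, t := range splitOnWhitespace("héllo wörld") {
		fmt.Printf("%q [%d, %d)\n", t.String, t.Start, t.End)
	}
}
```

Because "é" and "ö" each occupy two bytes in UTF-8, byte offsets for the second token would differ from the rune offsets stored here; the rune positions are 0 to 5 for the first token and 6 to 11 for the second.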

type PreTokenizer

type PreTokenizer interface {
	PreTokenize(pts *pretokenizedstring.PreTokenizedString) error
}

PreTokenizer is implemented by any value that has a PreTokenize method, which takes care of performing a pre-segmentation step.

Pre-tokenization splits the given string into multiple substrings, keeping track of how each substring's offsets map back to the original string. On some occasions, the NormalizedString might also be modified.
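
As a rough sketch of what satisfying an interface of this shape looks like, the example below uses simplified stand-ins. The types preTokenizedString, preTokenizer, and whitespaceSplitter are all hypothetical and exist only for this illustration; the real pretokenizedstring.PreTokenizedString is not documented on this page, so its fields and methods are not assumed here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// preTokenizedString is a simplified, hypothetical stand-in for
// pretokenizedstring.PreTokenizedString. The real type is not shown in this
// documentation and will differ.
type preTokenizedString struct {
	Original string   // the input text
	Splits   []string // substrings produced by pre-tokenization
}

// preTokenizer has the same method shape as the PreTokenizer interface
// above, but it operates on the stand-in type.
type preTokenizer interface {
	PreTokenize(pts *preTokenizedString) error
}

// whitespaceSplitter is a hypothetical implementation that splits the
// original text on runs of whitespace.
type whitespaceSplitter struct{}

// PreTokenize fills pts.Splits in place and reports an error on nil input.
func (whitespaceSplitter) PreTokenize(pts *preTokenizedString) error {
	if pts == nil {
		return errors.New("nil pre-tokenized string")
	}
	pts.Splits = strings.Fields(pts.Original)
	return nil
}

func main() {
	// Any value with a matching PreTokenize method satisfies the interface.
	var pt preTokenizer = whitespaceSplitter{}

	pts := &preTokenizedString{Original: "Hello there, world"}
	if err := pt.PreTokenize(pts); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", pts.Splits)
}
```

The method mutates its argument and returns only an error, which matches the signature of PreTokenize above: the results of the pre-segmentation step live in the value passed in rather than in a return value.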
