Documentation
Functions
func NewSentenceTokenizer
func NewSentenceTokenizer(s *sentences.Storage) (*sentences.DefaultSentenceTokenizer, error)
NewSentenceTokenizer returns a sentence tokenizer customized for English.
Types
type MultiPunctWordAnnotation
type MultiPunctWordAnnotation struct {
	*sentences.Storage
	sentences.TokenParser
	sentences.TokenGrouper
	sentences.Ortho
}
MultiPunctWordAnnotation attempts to tease out custom abbreviations that contain several periods, e.g. F.B.I.
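The idea of spotting an abbreviation such as F.B.I. can be sketched with the standard library alone. The pattern and the helper name looksLikeMultiPunctAbbrev below are assumptions made for illustration; they are not taken from this package, whose annotation works on tokens and may apply different rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// multiPunctAbbrev is a hypothetical pattern, not the package's own:
// two or more single ASCII letters, each followed by a period.
var multiPunctAbbrev = regexp.MustCompile(`^(?:[A-Za-z]\.){2,}$`)

// looksLikeMultiPunctAbbrev reports whether tok has the shape of an
// abbreviation such as "F.B.I." under the assumed pattern above.
func looksLikeMultiPunctAbbrev(tok string) bool {
	return multiPunctAbbrev.MatchString(tok)
}

func main() {
	for _, tok := range []string{"F.B.I.", "U.S.", "end.", "A."} {
		fmt.Printf("%-7s %v\n", tok, looksLikeMultiPunctAbbrev(tok))
	}
}
```

Under this assumed pattern, "F.B.I." and "U.S." match, while "end." and the single initial "A." do not, so an ordinary sentence-final word is not mistaken for an abbreviation.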
type WordTokenizer
type WordTokenizer struct {
	sentences.DefaultWordTokenizer
}
func NewWordTokenizer
func NewWordTokenizer(p sentences.PunctStrings) *WordTokenizer
func (*WordTokenizer) HasSentEndChars
func (e *WordTokenizer) HasSentEndChars(t *sentences.Token) bool
HasSentEndChars reports whether the token contains sentence-ending punctuation other than a final period.
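As a rough illustration of this kind of check, the sketch below tests a plain string against a small list of suffixes. The list sentEnders and the helper hasSentEndChars are hypothetical and operate on a string rather than a *sentences.Token; the method's actual rules are not shown on this page and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// sentEnders is an assumed, illustrative set of suffixes that end a
// sentence without being a bare final period.
var sentEnders = []string{`?`, `!`, `."`, `?"`, `!"`, `.)`, `?)`, `!)`}

// hasSentEndChars reports whether tok ends with one of the assumed
// sentence-ending suffixes. A token ending in a plain "." is excluded.
func hasSentEndChars(tok string) bool {
	for _, e := range sentEnders {
		if strings.HasSuffix(tok, e) {
			return true
		}
	}
	return false
}

func main() {
	for _, tok := range []string{"really?", `done."`, "end.", "word"} {
		fmt.Printf("%q -> %v\n", tok, hasSentEndChars(tok))
	}
}
```

With this list, "really?" and `done."` are flagged, while "end." is left to whatever logic handles the ordinary final period.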
func (*WordTokenizer) HasUnreliableEndChars
func (e *WordTokenizer) HasUnreliableEndChars(t *sentences.Token) bool
HasUnreliableEndChars reports whether the token contains punctuation that may, but need not, mark the end of a sentence.
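One way a caller might use an "unreliable" signal is to defer the decision to the following token. The sketch below is a minimal stand-alone example under that assumption: the suffix list, hasUnreliableEndChars, and the capitalization heuristic in likelyBreak are all hypothetical and are not this package's behavior.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// unreliableEnders is an assumed set of suffixes after which a sentence
// may or may not end, e.g. punctuation followed by a closing quote or paren.
var unreliableEnders = []string{`."`, `.)`, `?"`, `?)`, `!"`, `!)`}

// hasUnreliableEndChars reports whether tok ends with one of the
// assumed ambiguous suffixes.
func hasUnreliableEndChars(tok string) bool {
	for _, e := range unreliableEnders {
		if strings.HasSuffix(tok, e) {
			return true
		}
	}
	return false
}

// likelyBreak applies a hypothetical tie-breaker: an ambiguous ending
// counts as a sentence break only if the next token starts with an
// uppercase letter.
func likelyBreak(tok, next string) bool {
	if !hasUnreliableEndChars(tok) {
		return false
	}
	r, _ := utf8.DecodeRuneInString(next)
	return unicode.IsUpper(r)
}

func main() {
	fmt.Println(likelyBreak(`"stop."`, "Then"))
	fmt.Println(likelyBreak(`"stop."`, "and"))
}
```

The capitalization test is deliberately crude; it only illustrates why such punctuation is treated as a hint rather than a definite boundary.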