Documentation ¶
Overview ¶
Package filter provides token filters and a sentence splitter for preparing tokenizer inputs and outputs.
Index ¶
- func Drop(tokens *[]tokenizer.Token, match func(t tokenizer.Token) bool)
- func Keep(tokens *[]tokenizer.Token, match func(t tokenizer.Token) bool)
- func ScanSentences(data []byte, atEOF bool) (advance int, token []byte, err error)
- type Feature
- type Features
- type FeaturesFilter
- type POS
- type POSFilter
- type SentenceSplitter
- type WordFilter
Functions ¶
func ScanSentences ¶
func ScanSentences(data []byte, atEOF bool) (advance int, token []byte, err error)
ScanSentences is a bufio.SplitFunc for use with bufio.Scanner that returns each sentence of text. See https://pkg.go.dev/bufio#SplitFunc
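To illustrate the bufio.SplitFunc contract that ScanSentences follows, here is a minimal sketch using only the standard library. The names `scanByPeriod` and `sentences` are hypothetical stand-ins, and the splitting rule (cut after '。' or '.') is an assumption made for the example, not the package's actual behavior.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode/utf8"
)

// scanByPeriod is a hypothetical split function (NOT the package's
// implementation). It returns one token per sentence, where a sentence
// ends after '。' or '.'.
func scanByPeriod(data []byte, atEOF bool) (advance int, token []byte, err error) {
	for i := 0; i < len(data); {
		r, size := utf8.DecodeRune(data[i:])
		i += size
		if r == '。' || r == '.' {
			// Consume up to and including the delimiter.
			return i, data[:i], nil
		}
	}
	// No delimiter found: at EOF, emit the remainder as the last sentence.
	if atEOF && len(data) > 0 {
		return len(data), data, nil
	}
	// Otherwise ask the Scanner for more data.
	return 0, nil, nil
}

// sentences collects every token the Scanner yields for text.
func sentences(text string) []string {
	sc := bufio.NewScanner(strings.NewReader(text))
	sc.Split(scanByPeriod)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	for _, s := range sentences("すもも。もも。ももの") {
		fmt.Println(s)
	}
}
```

With the real package, the documented ScanSentences function would presumably be passed to `Scanner.Split` in the same position as `scanByPeriod` above.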
Types ¶
type Feature ¶
type Feature = string
Feature represents a feature.
const Any Feature = "\x00"
Any represents an arbitrary feature.
type FeaturesFilter ¶
type FeaturesFilter struct {
// contains filtered or unexported fields
}
FeaturesFilter represents a filter that matches feature vectors.
func NewFeaturesFilter ¶
func NewFeaturesFilter(fs ...Features) *FeaturesFilter
NewFeaturesFilter returns a features filter.
func (FeaturesFilter) Match ¶
func (f FeaturesFilter) Match(fs Features) bool
Match reports whether the filter matches the given features.
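The idea of matching a feature vector against patterns that may contain the Any wildcard can be sketched as follows. This is only an illustration: the `Features` definition, the helpers `matchPattern` and `matchAny`, and the rule that a shorter pattern matches as a prefix are all assumptions for the example, and the real FeaturesFilter may be implemented and behave differently.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented names (assumptions for this sketch).
type Feature = string
type Features = []string

// Any acts as a wildcard for a single position.
const Any Feature = "\x00"

// matchPattern reports whether fs matches pattern position by position.
// Assumption: a pattern shorter than fs matches as a prefix.
func matchPattern(pattern, fs Features) bool {
	if len(pattern) > len(fs) {
		return false
	}
	for i, p := range pattern {
		if p != Any && p != fs[i] {
			return false
		}
	}
	return true
}

// matchAny reports whether at least one pattern matches fs.
func matchAny(patterns []Features, fs Features) bool {
	for _, p := range patterns {
		if matchPattern(p, fs) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []Features{{"名詞", Any, "人名"}}
	fmt.Println(matchAny(patterns, Features{"名詞", "固有名詞", "人名", "姓"}))
	fmt.Println(matchAny(patterns, Features{"名詞", "一般", "地域"}))
}
```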
func (FeaturesFilter) String ¶
func (f FeaturesFilter) String() string
String implements the fmt.Stringer interface.
type POSFilter ¶
type POSFilter struct {
// contains filtered or unexported fields
}
POSFilter represents a part-of-speech filter.
func NewPOSFilter ¶
NewPOSFilter returns a part-of-speech filter.
type SentenceSplitter ¶
type SentenceSplitter struct {
	Delim               []rune // delimiter set, e.g. {'。', '.'}
	Follower            []rune // runes allowed to follow a delimiter, e.g. {'」', '』'}
	SkipWhiteSpace      bool   // whether to eliminate white space
	DoubleLineFeedSplit bool   // whether to split at "\n\n"
	MaxRuneLen          int    // maximum sentence length
}
SentenceSplitter is a tiny sentence splitter for Japanese texts.
func (SentenceSplitter) ScanSentences ¶
func (s SentenceSplitter) ScanSentences(data []byte, atEOF bool) (advance int, token []byte, err error)
ScanSentences is a split function for a Scanner that returns each sentence of text.
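How a delimiter set and a follower set might interact can be sketched with a simplified, standalone function. `splitSentences` and `contains` are hypothetical helpers written for this example; they ignore SkipWhiteSpace, DoubleLineFeedSplit, and MaxRuneLen, and they are not the package's implementation.

```go
package main

import "fmt"

// contains reports whether set includes r.
func contains(set []rune, r rune) bool {
	for _, s := range set {
		if s == r {
			return true
		}
	}
	return false
}

// splitSentences is a simplified illustration (an assumption, not the real
// splitter): a sentence ends at a delimiter, and any delimiters or follower
// runes immediately after it stay attached to that sentence.
func splitSentences(text string, delim, follower []rune) []string {
	var out []string
	var buf []rune
	rs := []rune(text)
	for i := 0; i < len(rs); i++ {
		buf = append(buf, rs[i])
		if !contains(delim, rs[i]) {
			continue
		}
		// Absorb trailing delimiters and followers, e.g. the '」' in "。」".
		for i+1 < len(rs) && (contains(delim, rs[i+1]) || contains(follower, rs[i+1])) {
			i++
			buf = append(buf, rs[i])
		}
		out = append(out, string(buf))
		buf = buf[:0]
	}
	// Emit any remaining text that has no closing delimiter.
	if len(buf) > 0 {
		out = append(out, string(buf))
	}
	return out
}

func main() {
	delim := []rune{'。', '.'}
	follower := []rune{'」', '』'}
	for _, s := range splitSentences("「こんにちは。」と言った。また明日。", delim, follower) {
		fmt.Println(s)
	}
}
```

Keeping followers attached means a closing bracket after a delimiter does not start the next sentence, which is the purpose the Follower field's comment describes.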
type WordFilter ¶
type WordFilter struct {
// contains filtered or unexported fields
}
WordFilter represents a word filter.
func NewWordFilter ¶
func NewWordFilter(words []string) *WordFilter
NewWordFilter returns a word filter.
func (WordFilter) Drop ¶
func (f WordFilter) Drop(tokens *[]tokenizer.Token)
Drop removes each token whose surface matches the filter.
func (WordFilter) Keep ¶
func (f WordFilter) Keep(tokens *[]tokenizer.Token)
Keep retains only the tokens whose surface matches the filter.
func (WordFilter) Match ¶
func (f WordFilter) Match(w string) bool
Match reports whether the filter matches the given word.
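Because Drop and Keep take a pointer to a slice, they can rewrite the caller's slice in place. The sketch below shows that pattern with local stand-ins: `token` (modelling only a Surface field), `wordSet`, and its methods are hypothetical names chosen for the example and are not the package's types.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a hypothetical stand-in for tokenizer.Token; only Surface is modelled.
type token struct{ Surface string }

// wordSet is a hypothetical stand-in for a word filter.
type wordSet map[string]struct{}

func newWordSet(words []string) wordSet {
	s := make(wordSet, len(words))
	for _, w := range words {
		s[w] = struct{}{}
	}
	return s
}

// match reports whether w is in the set.
func (s wordSet) match(w string) bool {
	_, ok := s[w]
	return ok
}

// apply filters *tokens in place, retaining tokens whose match result equals keep.
func (s wordSet) apply(tokens *[]token, keep bool) {
	if tokens == nil {
		return
	}
	out := (*tokens)[:0] // reuse the backing array
	for _, t := range *tokens {
		if s.match(t.Surface) == keep {
			out = append(out, t)
		}
	}
	*tokens = out
}

func (s wordSet) drop(tokens *[]token) { s.apply(tokens, false) }
func (s wordSet) keep(tokens *[]token) { s.apply(tokens, true) }

// surfaces joins token surfaces with spaces for display.
func surfaces(tokens []token) string {
	parts := make([]string, 0, len(tokens))
	for _, t := range tokens {
		parts = append(parts, t.Surface)
	}
	return strings.Join(parts, " ")
}

func main() {
	stop := newWordSet([]string{"は", "の"})
	tokens := []token{{"人魚"}, {"は"}, {"南"}, {"の"}, {"方"}}
	stop.drop(&tokens)
	fmt.Println(surfaces(tokens))
}
```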