ja

package
v0.0.2
Published: Apr 22, 2024 License: Apache-2.0 Imports: 15 Imported by: 0

Documentation

Index

Constants

const (
	Name    = "ja_kagome"
	DictIPA = "ipa"
	DictUni = "uni"
)
const NormalizeCharFilterName = "ja_normalize_unicode"
const StopTagsName = "stop_tags_ja"
const StopWordsName = "stop_words_ja"

StopWordsName is the name of the stop words filter.

Variables

var DefaultInflected = analysis.TokenMap{
	"動詞":   true,
	"形容詞":  true,
	"形容動詞": true,
}

DefaultInflected is the set of parts of speech (POS) that have inflected forms: verbs (動詞), adjectives (形容詞), and adjectival verbs (形容動詞).
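A minimal sketch of pairing DefaultInflected with the BaseFormFilter option (documented below), so inflected tokens come back as their base form. The import path for this package is an assumption; the kagome IPA dictionary is used for illustration.

package main

import (
	"fmt"

	"github.com/ikawaha/kagome-dict/ipa"

	ja "example.com/analysis/lang/ja" // assumed import path for this package
)

func main() {
	// Restore verbs, adjectives, and adjectival verbs to their base form.
	t := ja.NewJapaneseTokenizer(ipa.Dict(), ja.BaseFormFilter(ja.DefaultInflected))
	for _, token := range t.Tokenize([]byte("走った")) {
		fmt.Println(string(token.Term)) // an inflected verb is emitted as its base form
	}
}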

var StopTagsBytes []byte

StopTagsBytes is a stop tag list taken from Lucene's Kuromoji analyzer. See https://github.com/apache/lucene-solr/blob/master/lucene/analysis/kuromoji/src/resources/org/apache/lucene/analysis/ja/stoptags.txt

var StopWordsBytes []byte

StopWordsBytes is a stop word list taken from Lucene's Kuromoji analyzer. See https://github.com/apache/lucene-solr/blob/master/lucene/analysis/kuromoji/src/resources/org/apache/lucene/analysis/ja/stopwords.txt
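Either byte slice can be loaded into an analysis.TokenMap with bleve's TokenMap.LoadBytes. A sketch (this package's import path is assumed):

package main

import (
	"fmt"
	"log"

	"github.com/blevesearch/bleve/v2/analysis"

	ja "example.com/analysis/lang/ja" // assumed import path for this package
)

func main() {
	// Build a stop word token map from the embedded Lucene list.
	stopWords := analysis.NewTokenMap()
	if err := stopWords.LoadBytes(ja.StopWordsBytes); err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(stopWords), "stop words")
}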

Functions

func NewJapaneseTokenizer

func NewJapaneseTokenizer(dict *dict.Dict, opts ...TokenizerOption) analysis.Tokenizer

NewJapaneseTokenizer returns a Japanese tokenizer.
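A minimal usage sketch; the dict argument comes from the kagome IPA dictionary package, and this package's import path is an assumption:

package main

import (
	"fmt"

	"github.com/ikawaha/kagome-dict/ipa"

	ja "example.com/analysis/lang/ja" // assumed import path for this package
)

func main() {
	t := ja.NewJapaneseTokenizer(ipa.Dict())
	for _, token := range t.Tokenize([]byte("関西国際空港")) {
		// Each token carries its term, byte offsets, and position.
		fmt.Printf("%s\t[%d:%d]\tpos=%d\n", token.Term, token.Start, token.End, token.Position)
	}
}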

func NewUnicodeNormalizeCharFilter

func NewUnicodeNormalizeCharFilter(form norm.Form) analysis.CharFilter

NewUnicodeNormalizeCharFilter returns a char filter that applies the given Unicode normalization form.
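The form argument is one of the golang.org/x/text/unicode/norm forms. A sketch using NFKC, which folds full-width Latin letters and other compatibility characters (this package's import path is assumed):

package main

import (
	"fmt"

	"golang.org/x/text/unicode/norm"

	ja "example.com/analysis/lang/ja" // assumed import path for this package
)

func main() {
	f := ja.NewUnicodeNormalizeCharFilter(norm.NFKC)
	// NFKC folds full-width Latin letters to ASCII: Ｇｏ -> Go.
	fmt.Println(string(f.Filter([]byte("Ｇｏ言語"))))
}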

func StopTagsTokenMapConstructor

func StopTagsTokenMapConstructor(_ map[string]any, _ *registry.Cache) (analysis.TokenMap, error)

StopTagsTokenMapConstructor returns a token map for stop tags (for the IPA dictionary).

func StopWordsTokenFilterConstructor

func StopWordsTokenFilterConstructor(_ map[string]any, cache *registry.Cache) (analysis.TokenFilter, error)

StopWordsTokenFilterConstructor returns a token filter for stop words.

func StopWordsTokenMapConstructor

func StopWordsTokenMapConstructor(_ map[string]any, _ *registry.Cache) (analysis.TokenMap, error)

StopWordsTokenMapConstructor returns a token map for stop words.
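Per the signatures above, both token map constructors ignore their arguments, so they can be called directly with nil values. A sketch (import path assumed):

package main

import (
	"fmt"
	"log"

	ja "example.com/analysis/lang/ja" // assumed import path for this package
)

func main() {
	stopWords, err := ja.StopWordsTokenMapConstructor(nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(stopWords), "stop words loaded")
}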

func TokenizerConstructor

func TokenizerConstructor(config map[string]any, cache *registry.Cache) (analysis.Tokenizer, error)
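TokenizerConstructor builds the tokenizer from a bleve config map, which lets the tokenizer be referenced by name in an index mapping. A sketch of that wiring, assuming this package registers the constructor under Name ("ja_kagome") at init time; the "dict" config key selecting DictIPA/DictUni is an assumption, so verify the key against the source:

package main

import (
	"log"

	"github.com/blevesearch/bleve/v2"

	_ "example.com/analysis/lang/ja" // assumed import path; blank import for side-effect registration
)

func main() {
	m := bleve.NewIndexMapping()
	err := m.AddCustomTokenizer("ja_tokenizer", map[string]interface{}{
		"type": "ja_kagome", // ja.Name
		"dict": "ipa",       // ja.DictIPA; the "dict" key itself is an assumption
	})
	if err != nil {
		log.Fatal(err)
	}
}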

func UnicodeNormalizeCharFilterConstructor

func UnicodeNormalizeCharFilterConstructor(config map[string]any, _ *registry.Cache) (analysis.CharFilter, error)

Types

type JapaneseTokenizer

type JapaneseTokenizer struct {
	*tokenizer.Tokenizer
	// contains filtered or unexported fields
}

JapaneseTokenizer represents a Japanese tokenizer with filters.

func (*JapaneseTokenizer) Tokenize

func (t *JapaneseTokenizer) Tokenize(input []byte) analysis.TokenStream

Tokenize tokenizes the input and applies the configured filters to the resulting tokens.

type TokenizerOption

type TokenizerOption func(t *JapaneseTokenizer)

TokenizerOption represents an option of the Japanese tokenizer.

func BaseFormFilter

func BaseFormFilter(m analysis.TokenMap) TokenizerOption

BaseFormFilter returns a base form filter option.

func StopTagsFilter

func StopTagsFilter(m analysis.TokenMap) TokenizerOption

StopTagsFilter returns a stop tags filter option.
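A sketch combining both options with the embedded stop tag list (import paths for this package and the kagome IPA dictionary assumed):

package main

import (
	"fmt"
	"log"

	"github.com/blevesearch/bleve/v2/analysis"
	"github.com/ikawaha/kagome-dict/ipa"

	ja "example.com/analysis/lang/ja" // assumed import path for this package
)

func main() {
	stopTags := analysis.NewTokenMap()
	if err := stopTags.LoadBytes(ja.StopTagsBytes); err != nil {
		log.Fatal(err)
	}
	t := ja.NewJapaneseTokenizer(ipa.Dict(),
		ja.BaseFormFilter(ja.DefaultInflected), // restore inflected tokens to base form
		ja.StopTagsFilter(stopTags),            // drop tokens whose POS tag is a stop tag
	)
	for _, token := range t.Tokenize([]byte("寿司が食べたい。")) {
		fmt.Println(string(token.Term))
	}
}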

type UnicodeNormalizeCharFilter

type UnicodeNormalizeCharFilter struct {
	// contains filtered or unexported fields
}

UnicodeNormalizeCharFilter represents a Unicode normalization char filter.

func (UnicodeNormalizeCharFilter) Filter

func (f UnicodeNormalizeCharFilter) Filter(input []byte) []byte

Filter applies per-char normalization.
