tokenizer

package module
v0.0.0-...-e165605
Published: Nov 15, 2022 License: GPL-3.0 Imports: 13 Imported by: 0

README

jieba-go

A copy-cat implementation of jieba as a learning exercise. Only the tokenizer is implemented.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Tokenizer

type Tokenizer struct {
	// contains filtered or unexported fields
}

func NewJiebaTokenizer

func NewJiebaTokenizer() *Tokenizer

func NewTokenizer

func NewTokenizer(dictionaryFile string) *Tokenizer

func (*Tokenizer) AddWord

func (tk *Tokenizer) AddWord(word string, freq int)

Add a word to the prefix dictionary. If the word already exists, its frequency is updated. If freq is less than 1, a frequency is calculated automatically.
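A minimal sketch of registering a custom word before cutting. The import path below is hypothetical; substitute the module's real path:

```go
package main

import (
	"fmt"

	// Hypothetical import path — replace with the real module path.
	"example.com/jieba-go/tokenizer"
)

func main() {
	tk := tokenizer.NewJiebaTokenizer()

	// freq < 1: a frequency is calculated automatically.
	tk.AddWord("量子计算", 0)

	// Calling AddWord again on an existing word updates its frequency.
	tk.AddWord("量子计算", 42)

	fmt.Println(tk.Cut("量子计算是热门领域", true))
}
```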

func (*Tokenizer) Cut

func (tk *Tokenizer) Cut(text string, useHmm bool) []string

Cut segments text and returns a slice of tokens. When useHmm is true, an HMM-based model is applied to segment words not found in the dictionary.
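A basic usage sketch covering both constructors and Cut. The import path and dictionary filename are hypothetical stand-ins:

```go
package main

import (
	"fmt"

	// Hypothetical import path — replace with the real module path.
	"example.com/jieba-go/tokenizer"
)

func main() {
	// NewJiebaTokenizer builds a tokenizer with the default dictionary.
	tk := tokenizer.NewJiebaTokenizer()

	// Alternatively, load a custom dictionary file (path is illustrative):
	// tk := tokenizer.NewTokenizer("dict.txt")

	// Enable the HMM so out-of-dictionary words can still be segmented.
	tokens := tk.Cut("我来到北京清华大学", true)
	fmt.Println(tokens)
}
```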

func (*Tokenizer) CutParallel

func (tk *Tokenizer) CutParallel(text string, hmm bool, numWorkers int, ordered bool) []string

Performs Cut concurrently across numWorkers goroutines. If ordered is true, the returned slice is sorted to match the order of the input text; sorting reduces performance by approximately 30%.
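A sketch of parallel cutting over a longer input, again assuming a hypothetical import path:

```go
package main

import (
	"fmt"
	"strings"

	// Hypothetical import path — replace with the real module path.
	"example.com/jieba-go/tokenizer"
)

func main() {
	tk := tokenizer.NewJiebaTokenizer()

	// Some longer input worth splitting across workers.
	longText := strings.Repeat("我来到北京清华大学。", 1000)

	// 4 workers; ordered=true keeps tokens in input order at a
	// roughly 30% throughput cost. Pass false when order is irrelevant.
	tokens := tk.CutParallel(longText, true, 4, true)
	fmt.Println(len(tokens))
}
```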
