tokenize

package
v7.0.0
Published: Nov 14, 2019 License: MIT Imports: 5 Imported by: 0

Documentation

Constants

const (
	ADJ = 1 << iota
	ADP
	ADV
	AFFIX
	CONJ
	DET
	NOUN
	NUM
	PRON
	PRT
	PUNCT
	UNKN
	VERB
	X
	ANY = ADJ | ADP | ADV | AFFIX | CONJ | DET | NOUN | NUM | PRON | PRT | PUNCT | UNKN | VERB | X
)

Part of speech
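
The constants are bit flags, so several parts of speech can be combined with bitwise OR and tested with bitwise AND. A minimal sketch; the import path below is a placeholder and has to be replaced with the module's real path:

package tokenize_test

import (
	"fmt"

	"example.com/tokenize" // placeholder import path for this package
)

func Example_partOfSpeech() {
	// Combine several parts of speech into a single filter value.
	poS := tokenize.NOUN | tokenize.VERB | tokenize.ADJ

	fmt.Println(poS&tokenize.NOUN != 0) // NOUN is selected
	fmt.Println(poS&tokenize.PRON != 0) // PRON is not
	// ANY selects every part of speech.
	fmt.Println(tokenize.ANY&tokenize.PUNCT != 0)
	// Output:
	// true
	// false
	// true
}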

Variables

This section is empty.

Functions

This section is empty.

Types

type Lang

type Lang string

Lang defines the language used to examine the text. Both ISO and BCP-47 language codes are accepted.

var AutoLang Lang = "auto"

AutoLang tries to automatically recognize the language
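
A concrete language code and automatic detection can both be expressed as a Lang value; a short sketch, again with a placeholder import path:

package tokenize_test

import (
	"fmt"

	"example.com/tokenize" // placeholder import path for this package
)

func ExampleAutoLang() {
	english := tokenize.Lang("en") // explicit ISO/BCP-47 code
	auto := tokenize.AutoLang      // let the library detect the language

	fmt.Println(english, auto)
	// Output: en auto
}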

type NLP

type NLP struct {
	// contains filtered or unexported fields
}

NLP tokenizes a text using natural language processing

func NewNLP

func NewNLP(credentialsFile, text string, entities []string, lang Lang) (*NLP, error)

NewNLP returns a new NLP instance
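
A sketch of constructing an instance; the credentials file, text and entities below are made-up values, and the import path is a placeholder:

package tokenize_test

import (
	"log"

	"example.com/tokenize" // placeholder import path for this package
)

func ExampleNewNLP() {
	nlp, err := tokenize.NewNLP(
		"credentials.json",           // made-up path to a credentials file
		"Max Payne is a video game",  // made-up text to tokenize
		[]string{"Max Payne", "Max"}, // made-up entities to look for
		tokenize.AutoLang,            // detect the language automatically
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = nlp
}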

func (*NLP) TokenizeEntities

func (nlp *NLP) TokenizeEntities() ([][]Token, error)

TokenizeEntities returns nested tokenized entities
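
A sketch of retrieving the per-entity token slices, using the same made-up inputs and placeholder import path as above:

package tokenize_test

import (
	"fmt"
	"log"

	"example.com/tokenize" // placeholder import path for this package
)

func ExampleNLP_TokenizeEntities() {
	nlp, err := tokenize.NewNLP("credentials.json", "Max Payne is a video game", []string{"Max Payne"}, tokenize.AutoLang)
	if err != nil {
		log.Fatal(err)
	}

	// Each entity becomes its own slice of tokens.
	entities, err := nlp.TokenizeEntities()
	if err != nil {
		log.Fatal(err)
	}
	for _, entity := range entities {
		for _, tok := range entity {
			fmt.Println(tok.Token, tok.PoS)
		}
	}
}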

func (*NLP) TokenizeText

func (nlp *NLP) TokenizeText() ([]Token, error)

TokenizeText tokenizes a text
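
A sketch of tokenizing the whole text, again with made-up inputs and a placeholder import path:

package tokenize_test

import (
	"fmt"
	"log"

	"example.com/tokenize" // placeholder import path for this package
)

func ExampleNLP_TokenizeText() {
	nlp, err := tokenize.NewNLP("credentials.json", "Max Payne is a video game", []string{"Max Payne"}, tokenize.AutoLang)
	if err != nil {
		log.Fatal(err)
	}

	tokens, err := nlp.TokenizeText()
	if err != nil {
		log.Fatal(err)
	}
	for _, tok := range tokens {
		fmt.Println(tok.Token, tok.PoS)
	}
}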

type PoSDeterm

type PoSDeterm struct {
	// contains filtered or unexported fields
}

PoSDeterm represents the default part of speech determiner

func NewPoSDetermer

func NewPoSDetermer(poS int) *PoSDeterm

NewPoSDetermer returns a new default part of speech determiner

func (*PoSDeterm) Determ

func (dps *PoSDeterm) Determ(tokenizer Tokenizer) ([]Token, error)

Determ determines whether a part of speech tag should be deleted
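
A sketch of filtering tokens through the default determiner. It assumes that tokens whose part of speech is not contained in the value passed to NewPoSDetermer are the ones that get deleted, and reuses the made-up inputs and placeholder import path from above:

package tokenize_test

import (
	"fmt"
	"log"

	"example.com/tokenize" // placeholder import path for this package
)

func ExamplePoSDeterm_Determ() {
	nlp, err := tokenize.NewNLP("credentials.json", "Max Payne is a video game", []string{"Max Payne"}, tokenize.AutoLang)
	if err != nil {
		log.Fatal(err)
	}

	// Keep only nouns and verbs; pass tokenize.ANY to keep everything.
	determ := tokenize.NewPoSDetermer(tokenize.NOUN | tokenize.VERB)

	// *NLP implements Tokenizer, so it can be passed directly.
	tokens, err := determ.Determ(nlp)
	if err != nil {
		log.Fatal(err)
	}
	for _, tok := range tokens {
		fmt.Println(tok.Token)
	}
}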

type PoSDetermer

type PoSDetermer interface {
	Determ(Tokenizer) ([]Token, error)
}

PoSDetermer determines whether part of speech tags should be deleted
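
Any type with a matching Determ method satisfies the interface. A hypothetical pass-through determiner that deletes nothing could look like this (placeholder import path):

package tokenize_test

import (
	"example.com/tokenize" // placeholder import path for this package
)

// keepAll is a hypothetical PoSDetermer that deletes nothing and simply
// returns the plain text tokens.
type keepAll struct{}

func (keepAll) Determ(tok tokenize.Tokenizer) ([]tokenize.Token, error) {
	return tok.TokenizeText()
}

// Compile-time check that keepAll satisfies the interface.
var _ tokenize.PoSDetermer = keepAll{}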

type Token

type Token struct {
	PoS   int    // Part of speech
	Token string // Text
}

Token represents a tokenized text unit
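
Because PoS holds one of the bit flags above, a token's part of speech can be tested with a bitwise AND. A small sketch with a hand-built token and a placeholder import path:

package tokenize_test

import (
	"fmt"

	"example.com/tokenize" // placeholder import path for this package
)

func ExampleToken() {
	tok := tokenize.Token{PoS: tokenize.NOUN, Token: "game"}

	if tok.PoS&(tokenize.NOUN|tokenize.PRON) != 0 {
		fmt.Println(tok.Token, "is a noun or pronoun")
	}
	// Output: game is a noun or pronoun
}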

type Tokenizer

type Tokenizer interface {
	TokenizeText() ([]Token, error)
	TokenizeEntities() ([][]Token, error)
}

Tokenizer tokenizes a text and entities
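
Since PoSDeterm.Determ accepts the interface rather than *NLP, a canned in-memory tokenizer can stand in for the real one, for example in tests. A hypothetical sketch with a placeholder import path:

package tokenize_test

import (
	"example.com/tokenize" // placeholder import path for this package
)

// fixedTokenizer is a hypothetical Tokenizer that returns canned tokens
// instead of calling a natural language API.
type fixedTokenizer struct {
	tokens   []tokenize.Token
	entities [][]tokenize.Token
}

func (f fixedTokenizer) TokenizeText() ([]tokenize.Token, error) {
	return f.tokens, nil
}

func (f fixedTokenizer) TokenizeEntities() ([][]tokenize.Token, error) {
	return f.entities, nil
}

// Compile-time check that fixedTokenizer satisfies Tokenizer.
var _ tokenize.Tokenizer = fixedTokenizer{}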
