model

package
v1.0.0
Published: Aug 29, 2018 License: BSD-3-Clause Imports: 9 Imported by: 0

Documentation

Overview

Package model provides the tagger's data model.

Constants

const EndToken = "<END>"

EndToken is the end-of-sentence marker.

const StartToken = "<START>"

StartToken is the start-of-sentence marker.
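
The following is a minimal sketch of how sentence markers like these are commonly used when counting tag trigrams. The constants are redeclared locally so the example is self-contained. The helper `padTags` and the choice of two start markers and one end marker are assumptions for illustration; the package does not document its padding scheme.

```go
package main

import "fmt"

// Local copies of the package constants, redeclared so the sketch compiles
// on its own.
const (
	StartToken = "<START>"
	EndToken   = "<END>"
)

// padTags is a hypothetical helper. It surrounds a tag sequence with two
// start markers and one end marker, so that every real tag has two
// predecessors and the sentence end is observable.
func padTags(tags []string) []string {
	padded := make([]string, 0, len(tags)+3)
	padded = append(padded, StartToken, StartToken)
	padded = append(padded, tags...)
	padded = append(padded, EndToken)
	return padded
}

func main() {
	fmt.Println(padTags([]string{"DT", "NN"}))
	// prints: [<START> <START> DT NN <END>]
}
```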

Variables

This section is empty.

Functions

This section is empty.

Types

type Bigram

type Bigram struct {
	T1 Tag
	T2 Tag
}

Bigram stores a tag bigram.

type ClosedClassSet

type ClosedClassSet map[string]interface{}

ClosedClassSet is a set of closed class tags, stored as the keys of a map.

type FrequencyCollector

type FrequencyCollector struct {
	// contains filtered or unexported fields
}

A FrequencyCollector collects frequencies from the training corpus that are relevant to a trigram HMM tagger.

func NewFrequencyCollector

func NewFrequencyCollector() FrequencyCollector

NewFrequencyCollector constructs a FrequencyCollector instance.

func (FrequencyCollector) Model

func (c FrequencyCollector) Model() Model

Model returns the collected frequencies as a model.

func (FrequencyCollector) ModelWithClosedClass

func (c FrequencyCollector) ModelWithClosedClass(closedClassTags ClosedClassSet) Model

ModelWithClosedClass returns the collected frequencies as a model that also carries the given closed class set. The set can be used by, for example, word handlers.

func (FrequencyCollector) Process

func (c FrequencyCollector) Process(sentence []conllx.Token) error

Process collects frequencies from a sentence.

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model stores a model of the training data.

func (Model) BigramFreqs

func (m Model) BigramFreqs() map[Bigram]int

BigramFreqs returns the tag bigram frequencies in the training data.

func (Model) ClosedClassTags

func (m Model) ClosedClassTags() ClosedClassSet

ClosedClassTags returns the closed class tag set stored in the model.

func (*Model) GobDecode

func (m *Model) GobDecode(data []byte) error

GobDecode decodes a Model from a gob.

func (Model) GobEncode

func (m Model) GobEncode() ([]byte, error)

GobEncode encodes a Model as a gob.
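
Because Model has only unexported fields, it needs custom GobEncode and GobDecode methods to be serializable with `encoding/gob`. The sketch below shows that pattern on a hypothetical stand-in type, `counts`. It is not the package's actual encoding, which is not documented here.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// counts is a hypothetical stand-in for a type whose fields are unexported.
type counts struct {
	freqs map[string]int
}

// GobEncode serializes the unexported map into a byte slice.
func (c counts) GobEncode() ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(c.freqs); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// GobDecode restores the unexported map from a byte slice.
func (c *counts) GobDecode(data []byte) error {
	return gob.NewDecoder(bytes.NewReader(data)).Decode(&c.freqs)
}

// roundTrip encodes a value with gob and decodes it into a fresh value.
func roundTrip(in counts) (counts, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return counts{}, err
	}
	var out counts
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTrip(counts{freqs: map[string]int{"NN": 3}})
	fmt.Println(out.freqs["NN"], err)
}
```

The value receiver on GobEncode and pointer receiver on GobDecode mirror the signatures listed for Model.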

func (Model) String

func (m Model) String() string

String returns a summary of the model as a string.

func (Model) TagNumberer

func (m Model) TagNumberer() *StringNumberer

TagNumberer returns the tag <-> number bijection.

func (Model) TrigramFreqs

func (m Model) TrigramFreqs() map[Trigram]int

TrigramFreqs returns the tag trigram frequencies in the training data.
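
As an illustration of how bigram and trigram frequencies can feed a trigram HMM, the sketch below computes an unsmoothed maximum-likelihood transition estimate, P(t3 | t1, t2) = c(t1, t2, t3) / c(t1, t2). The types are local copies of the ones on this page, and `transitionProb` is a hypothetical helper. A practical tagger would typically add smoothing, which this sketch omits.

```go
package main

import "fmt"

// Local copies of the package types, redeclared to keep the sketch
// self-contained.
type Tag struct {
	Tag     uint
	Capital bool
}

type Bigram struct{ T1, T2 Tag }

type Trigram struct{ T1, T2, T3 Tag }

// transitionProb is a hypothetical helper. It divides the trigram count by
// the count of its two-tag context and returns 0 for an unseen context.
func transitionProb(tri map[Trigram]int, bi map[Bigram]int, t Trigram) float64 {
	context := bi[Bigram{T1: t.T1, T2: t.T2}]
	if context == 0 {
		return 0
	}
	return float64(tri[t]) / float64(context)
}

func main() {
	dt, jj, nn := Tag{Tag: 0}, Tag{Tag: 1}, Tag{Tag: 2}
	bi := map[Bigram]int{{T1: dt, T2: jj}: 4}
	tri := map[Trigram]int{{T1: dt, T2: jj, T3: nn}: 3}
	fmt.Println(transitionProb(tri, bi, Trigram{T1: dt, T2: jj, T3: nn}))
	// prints: 0.75
}
```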

func (Model) UnigramFreqs

func (m Model) UnigramFreqs() map[Unigram]int

UnigramFreqs returns the tag unigram frequencies in the training data.

func (Model) WordTagFreqs

func (m Model) WordTagFreqs() map[string]map[Tag]int

WordTagFreqs returns the word-tag frequencies in the training data.
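
The nested map returned by WordTagFreqs can be combined with the unigram frequencies to estimate how likely a tag is to emit a word. The sketch below uses the unsmoothed estimate P(word | tag) = c(word, tag) / c(tag). The types are local copies, and `emissionProb` is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Local copies of the package types, redeclared to keep the sketch
// self-contained.
type Tag struct {
	Tag     uint
	Capital bool
}

type Unigram struct{ T1 Tag }

// emissionProb is a hypothetical helper. Indexing a missing word yields a
// nil inner map, and reading from a nil map yields 0, so unseen words get
// probability 0 without extra checks.
func emissionProb(wordTags map[string]map[Tag]int, uni map[Unigram]int, word string, tag Tag) float64 {
	tagCount := uni[Unigram{T1: tag}]
	if tagCount == 0 {
		return 0
	}
	return float64(wordTags[word][tag]) / float64(tagCount)
}

func main() {
	dt := Tag{Tag: 0}
	wordTags := map[string]map[Tag]int{"the": {dt: 5}}
	uni := map[Unigram]int{{T1: dt}: 10}
	fmt.Println(emissionProb(wordTags, uni, "the", dt))
	// prints: 0.5
}
```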

type StringNumberer

type StringNumberer struct {
	// contains filtered or unexported fields
}

A StringNumberer creates a bijection between (string-based) labels and numbers.

func NewStringStringNumberer

func NewStringStringNumberer() *StringNumberer

NewStringStringNumberer creates a new StringNumberer that is empty (it has no mappings yet).

func (*StringNumberer) GobDecode

func (l *StringNumberer) GobDecode(data []byte) error

GobDecode decodes a StringNumberer from a gob.

func (*StringNumberer) GobEncode

func (l *StringNumberer) GobEncode() ([]byte, error)

GobEncode encodes a StringNumberer as a gob.

func (*StringNumberer) Label

func (l *StringNumberer) Label(number uint) string

Label returns the label (string) for a number.

func (*StringNumberer) Number

func (l *StringNumberer) Number(label string) uint

Number returns the (unique) number for a label (string).

func (*StringNumberer) Read

func (l *StringNumberer) Read(reader io.Reader) error

Read reads a label <-> number bijection from a Reader.

func (*StringNumberer) Size

func (l *StringNumberer) Size() int

Size returns the number of labels known in the bijection.

func (*StringNumberer) WriteStringStringNumberer

func (l *StringNumberer) WriteStringStringNumberer(writer io.Writer) error

WriteStringStringNumberer writes the bijection in a StringNumberer to a Writer.
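
A label <-> number bijection of this kind can be sketched with a map for one direction and a slice for the other. The `numberer` type below is a hypothetical stand-in, not the package's implementation. In particular, it assumes that Number assigns the next free number to a label it has not seen before, which the documentation above does not confirm.

```go
package main

import "fmt"

// numberer is a hypothetical stand-in for StringNumberer. The real type's
// fields are unexported and may differ.
type numberer struct {
	numbers map[string]uint // label -> number
	labels  []string        // number -> label
}

func newNumberer() *numberer {
	return &numberer{numbers: make(map[string]uint)}
}

// Number returns the number for a label. In this sketch, an unseen label
// is assigned the next free number.
func (n *numberer) Number(label string) uint {
	if num, ok := n.numbers[label]; ok {
		return num
	}
	num := uint(len(n.labels))
	n.numbers[label] = num
	n.labels = append(n.labels, label)
	return num
}

// Label returns the label for a number. It panics for unknown numbers.
func (n *numberer) Label(number uint) string {
	return n.labels[number]
}

// Size returns the number of known labels.
func (n *numberer) Size() int {
	return len(n.labels)
}

func main() {
	n := newNumberer()
	nn := n.Number("NN")
	vb := n.Number("VB")
	fmt.Println(nn, vb, n.Number("NN"), n.Label(vb), n.Size())
	// prints: 0 1 0 VB 2
}
```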

type Tag

type Tag struct {
	Tag     uint
	Capital bool
}

Tag represents a part of speech tag. The Capital field is used to mark whether the corresponding word started with a capital letter.
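
Since Capital is part of the struct, two tags with the same number but different capitalization compare as different values and act as different map keys. The sketch below illustrates this with a local copy of the type. The helper `makeTag` and its use of `unicode.IsUpper` on the first rune are assumptions, not the package's documented behavior.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Local copy of the package type, redeclared to keep the sketch
// self-contained.
type Tag struct {
	Tag     uint
	Capital bool
}

// makeTag is a hypothetical helper. It pairs a tag number with a flag that
// records whether the word's first rune is an uppercase letter. An empty
// word yields Capital == false.
func makeTag(number uint, word string) Tag {
	r, _ := utf8.DecodeRuneInString(word)
	return Tag{Tag: number, Capital: unicode.IsUpper(r)}
}

func main() {
	a := makeTag(7, "Berlin")
	b := makeTag(7, "berlin")
	fmt.Println(a.Capital, b.Capital, a == b)
	// prints: true false false
}
```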

type Trigram

type Trigram struct {
	T1 Tag
	T2 Tag
	T3 Tag
}

Trigram stores a tag trigram.

type Unigram

type Unigram struct {
	T1 Tag
}

Unigram stores a tag unigram.
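
Unigram, Bigram and Trigram contain only comparable fields, so they can be used directly as map keys, as the frequency maps returned by Model do. A minimal sketch of counting n-grams over a tag sequence follows. The types are local copies, and `countNGrams` is a hypothetical helper, not the FrequencyCollector's actual logic.

```go
package main

import "fmt"

// Local copies of the package types, redeclared to keep the sketch
// self-contained.
type Tag struct {
	Tag     uint
	Capital bool
}

type Unigram struct{ T1 Tag }

type Bigram struct{ T1, T2 Tag }

type Trigram struct{ T1, T2, T3 Tag }

// countNGrams is a hypothetical helper. It makes a single pass over the
// tags and counts every unigram, bigram and trigram ending at each position.
func countNGrams(tags []Tag) (map[Unigram]int, map[Bigram]int, map[Trigram]int) {
	uni := make(map[Unigram]int)
	bi := make(map[Bigram]int)
	tri := make(map[Trigram]int)
	for i, t := range tags {
		uni[Unigram{T1: t}]++
		if i >= 1 {
			bi[Bigram{T1: tags[i-1], T2: t}]++
		}
		if i >= 2 {
			tri[Trigram{T1: tags[i-2], T2: tags[i-1], T3: t}]++
		}
	}
	return uni, bi, tri
}

func main() {
	a, b := Tag{Tag: 0}, Tag{Tag: 1}
	uni, bi, tri := countNGrams([]Tag{a, b, a, b})
	fmt.Println(uni[Unigram{T1: a}], bi[Bigram{T1: a, T2: b}], len(tri))
	// prints: 2 2 2
}
```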
