package corpus

Version: v0.0.0-...-157c9c8

Warning: This package is not in the latest version of its module.

Published: Apr 26, 2024 License: GPL-3.0 Imports: 3 Imported by: 0

Documentation

Constants

const (
	MaxUint32    = ^uint32(0)
	MaxVariantID = VariantID(MaxUint32)
	MaxRootID    = RootID(MaxUint32)
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Corpus

type Corpus struct {
	Splitter    func(string) (string, []string)
	RootVariant func(str string) (string, document.Variant)
	Root        func(str string) string
	// contains filtered or unexported fields
}

Corpus holds a collection of documents for text indexing. It fulfills document.Encoder and document.Decoder.

func New

func New() *Corpus

New creates a Corpus.

func (*Corpus) AddDoc

func (c *Corpus) AddDoc(str string) *Document

AddDoc adds str to the Corpus and returns it encoded as a Document with its assigned DocID.
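A minimal sketch of the add-then-look-up pattern that AddDoc, GetDoc and GetDocs describe. The `miniCorpus` type is hypothetical: it stores plain strings instead of encoded Documents, assumes DocIDs are assigned sequentially from zero, and reports a missing ID with a `bool`; none of these details are confirmed for the real package.

```go
package main

import (
	"fmt"
	"strings"
)

// DocID mirrors the package's uint32-based DocID type.
type DocID uint32

// miniCorpus is a hypothetical stand-in that stores raw strings.
type miniCorpus struct {
	docs []string
}

// AddDoc stores str and returns its DocID, here simply its index (an assumption).
func (c *miniCorpus) AddDoc(str string) DocID {
	c.docs = append(c.docs, str)
	return DocID(len(c.docs) - 1)
}

// GetDoc returns the document for id, or false if id was never assigned.
func (c *miniCorpus) GetDoc(id DocID) (string, bool) {
	if uint64(id) >= uint64(len(c.docs)) {
		return "", false
	}
	return c.docs[id], true
}

// GetDocs looks up several IDs in order, skipping unknown ones.
func (c *miniCorpus) GetDocs(ids []DocID) []string {
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		if d, ok := c.GetDoc(id); ok {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	c := &miniCorpus{}
	a := c.AddDoc("Hello, world")
	b := c.AddDoc("Goodbye")
	fmt.Println(a, b)                                          // 0 1
	fmt.Println(strings.Join(c.GetDocs([]DocID{b, a}), " | ")) // Goodbye | Hello, world
}
```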

func (*Corpus) Containing

func (c *Corpus) Containing(gram string) prefix.Nodes

Containing returns a prefix.Nodes for all nodes containing the given gram.

func (*Corpus) Find

func (c *Corpus) Find(word string) *lset.Set[DocID]

Find returns the set of DocIDs for all documents containing word.
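Find maps a word to a set of DocIDs, which is the job of an inverted index. The sketch below is an illustration under stated assumptions, not this package's implementation: `map[DocID]struct{}` stands in for `lset.Set[DocID]`, and the hypothetical `words` tokenizer (lowercase, split on non-alphanumerics) stands in for the Corpus's `Splitter` and `Root` functions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

type DocID uint32

// index maps a normalized word to the set of documents containing it.
type index map[string]map[DocID]struct{}

// words lowercases s and splits it on any rune that is not a letter or digit.
func words(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// add records every word of text as occurring in document id.
func (ix index) add(id DocID, text string) {
	for _, w := range words(text) {
		if ix[w] == nil {
			ix[w] = map[DocID]struct{}{}
		}
		ix[w][id] = struct{}{}
	}
}

// find returns the IDs of documents containing word, sorted for stable output.
func (ix index) find(word string) []DocID {
	var ids []DocID
	for id := range ix[strings.ToLower(word)] {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	ix := index{}
	ix.add(0, "The quick brown fox")
	ix.add(1, "A quick test")
	ix.add(2, "Nothing here")
	fmt.Println(ix.find("Quick")) // [0 1]
}
```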

func (*Corpus) GetDoc

func (c *Corpus) GetDoc(id DocID) *Document

GetDoc returns a Document by DocID.

func (*Corpus) GetDocs

func (c *Corpus) GetDocs(ids []DocID) Documents

GetDocs returns the Documents for the given DocIDs.

func (*Corpus) IDToVariant

func (c *Corpus) IDToVariant(vID VariantID) document.Variant

IDToVariant converts a VariantID to a document.Variant, fulfilling document.Decoder.

func (*Corpus) IDToWord

func (c *Corpus) IDToWord(rID RootID) string

IDToWord converts a RootID to a root word, fulfilling document.Decoder.

func (*Corpus) Prefix

func (c *Corpus) Prefix(gram string) prefix.Node

Prefix returns the prefix.Node for gram within the prefix tree of all words in the corpus.
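The package answers prefix queries with a prefix tree (`prefix.Node`). To show the same query without that dependency, the sketch below swaps in a different technique: binary search over a sorted vocabulary. Words sharing a prefix sort contiguously, starting at the first word that is not less than the prefix. `wordsWithPrefix` is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// wordsWithPrefix returns every word in sorted that begins with gram.
// sorted must be in ascending order; matches form one contiguous run
// starting at the first element >= gram.
func wordsWithPrefix(sorted []string, gram string) []string {
	start := sort.SearchStrings(sorted, gram)
	end := start
	for end < len(sorted) && strings.HasPrefix(sorted[end], gram) {
		end++
	}
	return sorted[start:end]
}

func main() {
	vocab := []string{"car", "card", "care", "cat", "dog"}
	sort.Strings(vocab)
	fmt.Println(wordsWithPrefix(vocab, "car")) // [car card care]
}
```

A trie makes each lookup proportional to the length of the prefix rather than the vocabulary size, which is one reason a text index might prefer it over a sorted slice.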

func (*Corpus) VariantToID

func (c *Corpus) VariantToID(v document.Variant) VariantID

VariantToID converts a document.Variant to a VariantID, fulfilling document.Encoder.

func (*Corpus) WordToID

func (c *Corpus) WordToID(rStr string) RootID

WordToID converts a root word to a RootID, fulfilling document.Encoder.
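WordToID and IDToWord form an encode/decode pair. One common way to build such a pair is an interning table: a map from word to ID plus a slice from ID back to word. The sketch assumes, without confirmation from these docs, that an unseen word is assigned the next free ID; the `interner` type is hypothetical. VariantToID and IDToVariant could follow the same pattern keyed on variants.

```go
package main

import "fmt"

type RootID uint32

// interner assigns each distinct root word a dense RootID and maps it back.
type interner struct {
	ids   map[string]RootID
	words []string
}

func newInterner() *interner {
	return &interner{ids: map[string]RootID{}}
}

// WordToID returns the existing ID for w, or assigns the next free one.
func (in *interner) WordToID(w string) RootID {
	if id, ok := in.ids[w]; ok {
		return id
	}
	id := RootID(len(in.words))
	in.ids[w] = id
	in.words = append(in.words, w)
	return id
}

// IDToWord reverses WordToID; it panics on an ID that was never assigned.
func (in *interner) IDToWord(id RootID) string {
	return in.words[id]
}

func main() {
	in := newInterner()
	a := in.WordToID("run")
	b := in.WordToID("jump")
	c := in.WordToID("run")
	fmt.Println(a, b, c)        // 0 1 0
	fmt.Println(in.IDToWord(b)) // jump
}
```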

type DocID

type DocID uint32

DocID identifies a document in a Corpus, allowing references to it to be passed around.

func (DocID) ID

func (id DocID) ID() DocID

ID fulfills DocIDer.

type DocIDer

type DocIDer interface {
	ID() DocID
}

DocIDer allows anything that can reference a DocID to be used to retrieve a document.
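Because DocID has an ID method, a bare DocID satisfies DocIDer, and so does any struct that embeds DocID, as Document does (provided no other embedded field also defines ID, which would make the promoted method ambiguous). The sketch uses a hypothetical `searchHit` type to show that embedding pattern and a hypothetical `collectIDs` function that accepts any DocIDer.

```go
package main

import "fmt"

type DocID uint32

// ID lets a bare DocID satisfy DocIDer.
func (id DocID) ID() DocID { return id }

type DocIDer interface {
	ID() DocID
}

// searchHit embeds DocID, so the ID method is promoted and
// searchHit satisfies DocIDer without extra code.
type searchHit struct {
	DocID
	Score float64
}

// collectIDs accepts anything that can name a document.
func collectIDs(items ...DocIDer) []DocID {
	ids := make([]DocID, 0, len(items))
	for _, it := range items {
		ids = append(ids, it.ID())
	}
	return ids
}

func main() {
	fmt.Println(collectIDs(DocID(4), searchHit{DocID: 9, Score: 0.5})) // [4 9]
}
```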

type Document

type Document struct {
	DocID
	*document.Document[RootID, VariantID]
	// contains filtered or unexported fields
}

Document uses a Corpus to fulfill Encoder and Decoder for document.Document.

func (*Document) String

func (d *Document) String() string

String decodes the Document back into a string.

type Documents

type Documents []*Document

Documents is a collection of documents.

func (Documents) Strings

func (ds Documents) Strings() []string

Strings converts every document in the collection to a string.

type RootID

type RootID uint32

RootID represents a root word: a lowercase alphanumeric string.
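A function with the same shape as the Corpus `Root` field (`func(string) string`) might reduce a word to such a root by lowercasing it and dropping everything that is not a letter or digit. The `root` function below is a hypothetical normalizer written to match that one-line description, not the package's default; note that `unicode.IsLetter` also keeps non-ASCII letters, which may or may not match the package's meaning of "alphanumeric".

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// root lowercases word and keeps only letters and digits.
func root(word string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(word) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(root("Don't"))   // dont
	fmt.Println(root("ABC-123")) // abc123
}
```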

type VariantID

type VariantID uint32

VariantID represents a Variant by ID.
