sear

v0.3.0
Published: Jan 13, 2025 License: Apache-2.0 Imports: 10 Imported by: 1

README

sear


Sear is a Bleve index implementation designed for efficiently executing searches against a single document (or a sequence of documents one at a time).

Why is this useful? Sometimes a use case calls for answering the question, "would this document have matched this search?", and frequently we want to ask that same question for several documents in turn. This index implementation is designed to support this use case.

Details

  • This index implementation is NOT thread-safe. It is expected that a single thread will invoke all methods, from NewMatcher() to Close().
  • This index will ONLY ever contain 0 or 1 documents. Subsequent calls to Update() overwrite the previous document, even when the documents have different identifiers.
  • The Batch() method is unsupported, and always returns an error.
  • The Reader returned is NOT isolated, and will always see the currently indexed document.
  • Currently, the Document() method on a Reader is not supported (this could be added in the future).
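
The rules above can be modeled with a small toy type. This is a minimal sketch using only the standard library; singleDocIndex and reader are hypothetical names and are not Sear's implementation. It only illustrates the overwrite-on-update rule, the always-failing Batch(), and a reader that is not isolated from later writes.

```go
package main

import (
	"errors"
	"fmt"
)

// singleDocIndex is a hypothetical stand-in for a one-document index.
// It models the documented rules only; it is not how Sear is built.
type singleDocIndex struct {
	doc    map[string]string // field name -> value of the current document
	hasDoc bool              // false while the index is empty
}

// Update replaces whatever was indexed before; the id plays no role.
func (s *singleDocIndex) Update(id string, fields map[string]string) {
	s.doc = fields
	s.hasDoc = true
}

// Delete empties the index regardless of the id passed in.
func (s *singleDocIndex) Delete(id string) {
	s.doc = nil
	s.hasDoc = false
}

// Batch mirrors the documented behavior: it always returns an error.
func (s *singleDocIndex) Batch() error {
	return errors.New("batch is not supported")
}

// reader keeps a pointer to the index rather than a snapshot,
// so it always observes the currently indexed document.
type reader struct{ idx *singleDocIndex }

// Reader returns a non-isolated view of the index.
func (s *singleDocIndex) Reader() *reader { return &reader{idx: s} }

// DocCount is 0 or 1, never more.
func (r *reader) DocCount() int {
	if r.idx.hasDoc {
		return 1
	}
	return 0
}

// Field returns a field value from the currently indexed document.
func (r *reader) Field(name string) string { return r.idx.doc[name] }

func main() {
	idx := &singleDocIndex{}
	r := idx.Reader() // opened before any document is indexed

	idx.Update("a", map[string]string{"title": "first"})
	idx.Update("b", map[string]string{"title": "second"}) // overwrites "a"

	// The reader opened earlier sees the latest document.
	fmt.Println(r.DocCount(), r.Field("title"))
	fmt.Println(idx.Batch() != nil)
}
```

Because the reader shares state with the index, a caller in this model must finish searching one document before updating to the next, which matches the single-threaded expectation stated above.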

Approach

  • Since the index only ever contains a single document, data sizes are small.
  • Therefore, avoid heavy document analysis and complex data structures.
  • After regular document analysis is complete, use the resulting structure in place.
  • Do not build more complicated structures like vellums or roaring bitmaps.
  • If additional structure is needed, prefer arrays which have good cache locality, and can be reused.
  • Avoid copying data, prefer sub-slicing, and brute-force processing over arrays.
  • Cache reusable parts of the query, as we expect the same query to be run over multiple documents.
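
As an illustration of the points above, the following sketch answers a prefix query against one document's terms using a sorted slice, a sub-slice for the result, and a scratch buffer that is reused from one document to the next. The names prefixRange and matcher are hypothetical and the code is an assumption about how such an approach could look, not an excerpt from Sear.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixRange returns the sub-slice of sorted terms that start with prefix.
// Nothing is copied: the result shares the backing array of terms.
func prefixRange(terms []string, prefix string) []string {
	start := sort.SearchStrings(terms, prefix)
	end := start
	// Brute-force scan; cheap for a single document's worth of terms.
	for end < len(terms) && strings.HasPrefix(terms[end], prefix) {
		end++
	}
	return terms[start:end]
}

// matcher is a hypothetical holder for a prepared query and a reusable buffer.
type matcher struct {
	prefix string   // prepared once, reused for every document
	terms  []string // scratch buffer, truncated and refilled per document
}

// matches loads one document's terms into the reused buffer and runs the query.
// The terms are copied into the buffer only so that sorting does not reorder
// the caller's slice; the buffer's capacity is kept between documents.
func (m *matcher) matches(docTerms []string) bool {
	m.terms = append(m.terms[:0], docTerms...)
	sort.Strings(m.terms)
	return len(prefixRange(m.terms, m.prefix)) > 0
}

func main() {
	m := &matcher{prefix: "sea"}
	docs := [][]string{
		{"fast", "search", "index"},
		{"roaring", "bitmap"},
	}
	for i, d := range docs {
		fmt.Println(i, m.matches(d))
	}
}
```

With only one document in play, a sorted slice plus a linear scan tends to be competitive with heavier structures, and it allocates nothing once the buffer has grown to a typical document size.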

License

Apache License Version 2.0

Documentation

Index

Constants

const Name = "sear"

Variables

This section is empty.

Functions

func New

func New(storeName string,
	config map[string]interface{},
	analysisQueue *index.AnalysisQueue) (index.Index, error)

New creates a new instance of a Sear index. This method signature is compatible with the Bleve registry RegisterIndexType() method.

For example, in your application's init(): registry.RegisterIndexType(sear.Name, sear.New)
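
The registration pattern can be sketched without any Bleve dependency. In the sketch below, Index, constructor, registerIndexType, toyIndex, and newToy are all hypothetical stand-ins (the constructor here also drops the analysis queue parameter for brevity); the real types and the real RegisterIndexType() live in Bleve's packages. The point is only that a constructor with the expected shape can be stored under a name in init() and looked up later.

```go
package main

import "fmt"

// Index and constructor are hypothetical stand-ins for Bleve's index
// interface and index constructor type.
type Index interface{ Name() string }

type constructor func(storeName string, config map[string]interface{}) (Index, error)

// indexTypes plays the role of the registry: type name -> constructor.
var indexTypes = map[string]constructor{}

// registerIndexType stores a constructor under a name.
// Rejecting duplicates with a panic is a choice made for this sketch.
func registerIndexType(name string, c constructor) {
	if _, exists := indexTypes[name]; exists {
		panic("index type already registered: " + name)
	}
	indexTypes[name] = c
}

// searName has the same value as the Name constant documented above.
const searName = "sear"

// toyIndex is a placeholder index used only to make the sketch runnable.
type toyIndex struct{ store string }

func (t *toyIndex) Name() string { return searName }

// newToy has the shape the registry expects, so it can be registered by name.
func newToy(storeName string, config map[string]interface{}) (Index, error) {
	return &toyIndex{store: storeName}, nil
}

// init registers the constructor once, before main runs.
func init() {
	registerIndexType(searName, newToy)
}

func main() {
	c, ok := indexTypes[searName]
	if !ok {
		fmt.Println("index type not registered")
		return
	}
	idx, err := c("", nil)
	fmt.Println(idx.Name(), err)
}
```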

Types

type DocIDReader

type DocIDReader struct {
	// contains filtered or unexported fields
}

func NewDocIDReader

func NewDocIDReader() *DocIDReader

func NewDocIDReaderEmpty

func NewDocIDReaderEmpty() *DocIDReader

func (*DocIDReader) Advance

func (*DocIDReader) Close

func (d *DocIDReader) Close() error

func (*DocIDReader) Next

func (d *DocIDReader) Next() (index.IndexInternalID, error)

func (*DocIDReader) Size

func (d *DocIDReader) Size() int

type DocValueReader

type DocValueReader struct {
	// contains filtered or unexported fields
}

func (*DocValueReader) BytesRead added in v0.1.0

func (d *DocValueReader) BytesRead() uint64

func (*DocValueReader) VisitDocValues

func (d *DocValueReader) VisitDocValues(id index.IndexInternalID, visitor index.DocValueVisitor) error

type Document

type Document struct {
	// contains filtered or unexported fields
}

func NewDocument

func NewDocument() *Document

func (*Document) Fields

func (d *Document) Fields() []string

func (*Document) Reset

func (d *Document) Reset(doc index.Document)

func (*Document) SortedTermsForField

func (d *Document) SortedTermsForField(fieldName string) ([]string, error)

func (*Document) TokenFreqsAndLen

func (d *Document) TokenFreqsAndLen(fieldName string) (index.TokenFrequencies, int, error)

func (*Document) VectorDims added in v0.2.0

func (d *Document) VectorDims(fieldName string) (dims int, err error)

type FieldDict

type FieldDict struct {
	// contains filtered or unexported fields
}

func NewFieldDictEmpty

func NewFieldDictEmpty() *FieldDict

func NewFieldDictWithTerms

func NewFieldDictWithTerms(terms []string, include func(string) bool) *FieldDict

func (*FieldDict) BytesRead added in v0.1.0

func (d *FieldDict) BytesRead() uint64

func (*FieldDict) Cardinality added in v0.3.0

func (d *FieldDict) Cardinality() int

func (*FieldDict) Close

func (d *FieldDict) Close() error

func (*FieldDict) Next

func (d *FieldDict) Next() (*index.DictEntry, error)

type FieldDictContains

type FieldDictContains struct {
	// contains filtered or unexported fields
}

func NewFieldDictContainsEmpty

func NewFieldDictContainsEmpty() *FieldDictContains

func NewFieldDictContainsFromTokenFrequencies

func NewFieldDictContainsFromTokenFrequencies(atf index.TokenFrequencies) *FieldDictContains

func (*FieldDictContains) BytesRead added in v0.1.0

func (d *FieldDictContains) BytesRead() uint64

func (*FieldDictContains) Contains

func (d *FieldDictContains) Contains(key []byte) (bool, error)

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader is responsible for reading the index data. It is also responsible for caching some portions of a read operation which can be used for subsequent reads.

func NewReader

func NewReader(m *Sear) *Reader

NewReader returns a new reader for the provided Sear instance.

func (*Reader) Close

func (r *Reader) Close() error

func (*Reader) DocCount

func (r *Reader) DocCount() (uint64, error)

func (*Reader) DocIDReaderAll

func (r *Reader) DocIDReaderAll() (index.DocIDReader, error)

func (*Reader) DocIDReaderOnly

func (r *Reader) DocIDReaderOnly(ids []string) (index.DocIDReader, error)

func (*Reader) DocValueReader

func (r *Reader) DocValueReader(fields []string) (index.DocValueReader, error)

func (*Reader) Document

func (r *Reader) Document(id string) (index.Document, error)

func (*Reader) ExternalID

func (r *Reader) ExternalID(id index.IndexInternalID) (string, error)

func (*Reader) FieldDict

func (r *Reader) FieldDict(field string) (index.FieldDict, error)

func (*Reader) FieldDictContains

func (r *Reader) FieldDictContains(field string) (index.FieldDictContains, error)

func (*Reader) FieldDictFuzzy

func (r *Reader) FieldDictFuzzy(field, term string, fuzziness int, prefix string) (
	index.FieldDict, error)

func (*Reader) FieldDictFuzzyAutomaton added in v0.3.0

func (r *Reader) FieldDictFuzzyAutomaton(field, term string, fuzziness int, prefix string) (
	index.FieldDict, index.FuzzyAutomaton, error)

func (*Reader) FieldDictPrefix

func (r *Reader) FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error)

func (*Reader) FieldDictRange

func (r *Reader) FieldDictRange(field string, startTerm, endTerm []byte) (index.FieldDict, error)

func (*Reader) FieldDictRegexp

func (r *Reader) FieldDictRegexp(field, regexStr string) (index.FieldDict, error)

func (*Reader) FieldDictRegexpAutomaton added in v0.3.0

func (r *Reader) FieldDictRegexpAutomaton(field, regexStr string) (
	index.FieldDict, index.RegexAutomaton, error)

func (*Reader) Fields

func (r *Reader) Fields() ([]string, error)

func (*Reader) GetInternal

func (r *Reader) GetInternal(key []byte) ([]byte, error)

func (*Reader) InternalID

func (r *Reader) InternalID(id string) (index.IndexInternalID, error)

func (*Reader) TermFieldReader

func (r *Reader) TermFieldReader(ctx context.Context, term []byte, field string, includeFreq, includeNorm,
	includeTermVectors bool) (index.TermFieldReader, error)

type Sear

type Sear struct {
	// contains filtered or unexported fields
}

Sear implements an index containing a single document.

func (*Sear) Batch

func (s *Sear) Batch(batch *index.Batch) error

Batch is not supported by this index.

func (*Sear) Close

func (s *Sear) Close() error

Close the index

func (*Sear) Delete

func (s *Sear) Delete(id string) error

Delete document from the index. Unlike other Bleve indexes, this operation will delete the document from the index, regardless of its identifier.

func (*Sear) DeleteInternal

func (s *Sear) DeleteInternal(key []byte) error

DeleteInternal deletes a value from the index internal storage.

func (*Sear) Open

func (s *Sear) Open() error

Open the index

func (*Sear) Reader

func (s *Sear) Reader() (index.IndexReader, error)

Reader returns a reader for this index. Unlike other Bleve indexes, this reader is NOT isolated.

func (*Sear) SetInternal

func (s *Sear) SetInternal(key, val []byte) error

SetInternal sets a value in the index internal storage.

func (*Sear) StatsMap

func (s *Sear) StatsMap() map[string]interface{}

StatsMap returns stats about this index.

func (*Sear) Update

func (s *Sear) Update(doc index.Document) error

Update the index to include this document. Unlike other Bleve indexes, this operation will overwrite a previously indexed document, regardless of the document's identifiers.

type TermFieldReader

type TermFieldReader struct {
	// contains filtered or unexported fields
}

func NewTermFieldReaderEmpty

func NewTermFieldReaderEmpty() *TermFieldReader

func NewTermFieldReaderFromTokenFreqAndLen

func NewTermFieldReaderFromTokenFreqAndLen(tf *index.TokenFreq, l int, includeFreq, includeNorm,
	includeTermVectors bool) *TermFieldReader

func (*TermFieldReader) Advance

Advance resets the enumeration at the specified document or, if that document is not present, at its immediate follower.
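
The "specified document or its immediate follower" rule can be illustrated with a generic cursor over sorted document numbers. This is a sketch under assumptions: cursor, advance, and next are hypothetical names, and the code does not reproduce TermFieldReader's actual signature or internals.

```go
package main

import (
	"fmt"
	"sort"
)

// cursor is a hypothetical enumerator over sorted document numbers.
type cursor struct {
	ids []uint64 // sorted ascending
	pos int      // index of the next id to return
}

// next returns the id at the current position and moves forward.
// The boolean is false once the enumeration is exhausted.
func (c *cursor) next() (uint64, bool) {
	if c.pos >= len(c.ids) {
		return 0, false
	}
	id := c.ids[c.pos]
	c.pos++
	return id, true
}

// advance repositions the cursor at target if it is present, otherwise at
// the smallest id greater than target, and returns that id. The search
// covers the whole slice, so the cursor may also move backwards.
func (c *cursor) advance(target uint64) (uint64, bool) {
	c.pos = sort.Search(len(c.ids), func(i int) bool { return c.ids[i] >= target })
	return c.next()
}

func main() {
	c := &cursor{ids: []uint64{2, 5, 9}}
	fmt.Println(c.advance(5))  // target present
	fmt.Println(c.advance(6))  // target absent, lands on its follower
	fmt.Println(c.advance(10)) // past the end
}
```

In an index that holds at most one document, the list of ids has zero or one entries, so an advance either lands on that single document or exhausts the enumeration.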

func (*TermFieldReader) Close

func (t *TermFieldReader) Close() error

func (*TermFieldReader) Count

func (t *TermFieldReader) Count() uint64

func (*TermFieldReader) Next

func (t *TermFieldReader) Next(preAlloced *index.TermFieldDoc) (*index.TermFieldDoc, error)

func (*TermFieldReader) Size

func (t *TermFieldReader) Size() int
