spi

package
v0.0.3
Published: Dec 24, 2020 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Constants

const (
	STORED_FIELD_VISITOR_STATUS_YES  = StoredFieldVisitorStatus(1)
	STORED_FIELD_VISITOR_STATUS_NO   = StoredFieldVisitorStatus(2)
	STORED_FIELD_VISITOR_STATUS_STOP = StoredFieldVisitorStatus(3)
)

Variables

var DefaultCodec = func() Codec {
	ans := LoadCodec("Lucene410")
	assert(ans != nil)
	return ans
}

Expert: returns the default codec used for newly created IndexWriterConfig(s).

Functions

func AvailableCodecs

func AvailableCodecs() []string

Returns a list of all available codec names.

func AvailablePostingsFormats

func AvailablePostingsFormats() []string

Returns a list of all available format names.

func RegisterCodec

func RegisterCodec(codecs ...Codec)

Works around the absence of Lucene Java's SPI mechanism by registering codecs explicitly.

func RegisterDocValuesFormat

func RegisterDocValuesFormat(formats ...DocValuesFormat)

Works around the absence of Lucene Java's SPI mechanism by registering doc values formats explicitly.

func RegisterPostingsFormat

func RegisterPostingsFormat(formats ...PostingsFormat)

Works around the absence of Lucene Java's SPI mechanism by registering postings formats explicitly.
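
The Register*, Load* and Available* functions together suggest a name-keyed registry that stands in for Java's service loader. The following is a minimal sketch of that pattern only; the reduced `Codec` interface, the `registry` map and the helper names are assumptions made for illustration, and the package's actual internals may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Codec is a hypothetical stand-in exposing only the name; the
// documented interface has many more methods.
type Codec interface {
	Name() string
}

// namedCodec is a toy implementation used only for this sketch.
type namedCodec string

func (c namedCodec) Name() string { return string(c) }

// registry maps codec names to implementations (assumed layout).
var registry = map[string]Codec{}

// register stores each codec under its own name, replacing any
// earlier entry with the same name.
func register(codecs ...Codec) {
	for _, c := range codecs {
		registry[c.Name()] = c
	}
}

// load returns the codec registered under name, or nil if there is none.
func load(name string) Codec {
	return registry[name]
}

// available lists the registered names in sorted order.
func available() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	register(namedCodec("Lucene410"), namedCodec("Custom"))
	fmt.Println(available(), load("Lucene410") != nil, load("Missing") == nil)
}
```

Because nothing is discovered automatically, a codec that is never registered cannot be resolved by name when a segment that references it is opened.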

Types

type BinaryDocValues

type BinaryDocValues interface {
	// Looks up the value for the given document. The returned slice
	// may be reused across calls to Get(), so make sure to copy it if
	// you want to keep it around.
	Get(docId int) []byte
}

A per-document []byte
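
The reuse warning on Get() is easy to overlook, so here is a small sketch of what can go wrong and how copying avoids it. `reusingValues` and `copyValue` are hypothetical names introduced for this example; whether a given implementation really shares a buffer is an assumption.

```go
package main

import "fmt"

// BinaryDocValues mirrors the documented interface.
type BinaryDocValues interface {
	Get(docId int) []byte
}

// reusingValues is a hypothetical implementation that decodes every
// value into one shared buffer, so earlier results get overwritten.
type reusingValues struct {
	data [][]byte
	buf  []byte
}

func (v *reusingValues) Get(docId int) []byte {
	v.buf = append(v.buf[:0], v.data[docId]...)
	return v.buf
}

// copyValue returns a private copy that survives later Get calls.
func copyValue(dv BinaryDocValues, docId int) []byte {
	src := dv.Get(docId)
	dst := make([]byte, len(src))
	copy(dst, src)
	return dst
}

func main() {
	dv := &reusingValues{data: [][]byte{[]byte("aa"), []byte("bb")}}
	kept := copyValue(dv, 0) // safe: owns its bytes
	aliased := dv.Get(0)     // unsafe: shares the internal buffer
	dv.Get(1)                // overwrites the shared buffer
	fmt.Println(string(kept), string(aliased))
}
```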

type BlockTermState

type BlockTermState struct {
	*OrdTermState
	// Allow sub-class to be converted
	Self TermState

	// how many docs have this term
	DocFreq int
	// total number of occurrences of this term
	TotalTermFreq int64

	// the term's ord in the current block
	TermBlockOrd int
	// contains filtered or unexported fields
}

BlockTermState.java

Holds all state required for PostingsReaderBase to produce a DocsEnum without re-seeking the terms dict.

func NewBlockTermState

func NewBlockTermState() *BlockTermState

func (*BlockTermState) Clone

func (ts *BlockTermState) Clone() TermState

func (*BlockTermState) CopyFrom

func (ts *BlockTermState) CopyFrom(other TermState)

func (*BlockTermState) CopyFrom_

func (ts *BlockTermState) CopyFrom_(other TermState)

func (*BlockTermState) String

func (ts *BlockTermState) String() string

type Codec

type Codec interface {
	// Returns this codec's name
	Name() string
	// Encodes/decodes postings
	PostingsFormat() PostingsFormat
	// Encodes/decodes docvalues
	DocValuesFormat() DocValuesFormat
	// Encodes/decodes stored fields
	StoredFieldsFormat() StoredFieldsFormat
	// Encodes/decodes term vectors
	TermVectorsFormat() TermVectorsFormat
	// Encodes/decodes field infos file
	FieldInfosFormat() FieldInfosFormat
	// Encodes/decodes segment info file
	SegmentInfoFormat() SegmentInfoFormat
	// Encodes/decodes document normalization values
	NormsFormat() NormsFormat
	// Encodes/decodes live docs
	LiveDocsFormat() LiveDocsFormat
}

Encodes/decodes an inverted index segment.

Note: when implementing this interface, the name is written into the index. For the segment to be read, the name must resolve to your implementation via LoadCodec() (forName() in Lucene Java), which uses a hard-coded map to resolve codec names.

If you implement your own codec, make sure that it is registered (see RegisterCodec), because Go has no SPI mechanism that could discover it automatically.

func LoadCodec

func LoadCodec(name string) Codec

Looks up a codec by name.

type CodecImpl

type CodecImpl struct {
	// contains filtered or unexported fields
}

func NewCodec

func NewCodec(name string,
	fieldsFormat StoredFieldsFormat,
	vectorsFormat TermVectorsFormat,
	fieldInfosFormat FieldInfosFormat,
	infosFormat SegmentInfoFormat,
	liveDocsFormat LiveDocsFormat,
	postingsFormat PostingsFormat,
	docValuesFormat DocValuesFormat,
	normsFormat NormsFormat) *CodecImpl

func (*CodecImpl) DocValuesFormat

func (codec *CodecImpl) DocValuesFormat() DocValuesFormat

func (*CodecImpl) FieldInfosFormat

func (codec *CodecImpl) FieldInfosFormat() FieldInfosFormat

func (*CodecImpl) LiveDocsFormat

func (codec *CodecImpl) LiveDocsFormat() LiveDocsFormat

func (*CodecImpl) Name

func (codec *CodecImpl) Name() string

func (*CodecImpl) NormsFormat

func (codec *CodecImpl) NormsFormat() NormsFormat

func (*CodecImpl) PostingsFormat

func (codec *CodecImpl) PostingsFormat() PostingsFormat

func (*CodecImpl) SegmentInfoFormat

func (codec *CodecImpl) SegmentInfoFormat() SegmentInfoFormat

func (*CodecImpl) StoredFieldsFormat

func (codec *CodecImpl) StoredFieldsFormat() StoredFieldsFormat

func (*CodecImpl) String

func (codec *CodecImpl) String() string

Returns the codec's name. Subclasses can override this to provide more detail (such as parameters).

func (*CodecImpl) TermVectorsFormat

func (codec *CodecImpl) TermVectorsFormat() TermVectorsFormat

type DocValuesConsumer

type DocValuesConsumer interface {
	io.Closer
	// Writes numeric docvalues for a field.
	AddNumericField(*FieldInfo, func() func() (interface{}, bool)) error
}

codecs/DocValuesConsumer.java

Abstract API that consumes numeric, binary and sorted docvalues. Concrete implementations of this actually do "something" with the docvalues (write them into the index in a specific format).

The lifecycle is:

  1. DocValuesConsumer is created by DocValuesFormat.FieldsConsumer() or NormsFormat.NormsConsumer().
  2. AddNumericField, AddBinaryField, or addSortedField are called for each Numeric, Binary, or Sorted docvalues field. The API is a "pull" rather than "push", and the implementation is free to iterate over the values multiple times.
  3. After all fields are added, the consumer is closed.
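
The "pull" style in step 2 can be sketched as follows. The callback type matches the shape of the second AddNumericField parameter; everything else (`FieldInfo` with only a name, `fromSlice`, `rangeConsumer`, and the convention that the iterator returns false when exhausted) is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// FieldInfo is a hypothetical stand-in carrying only a name.
type FieldInfo struct{ Name string }

// valuesIterable has the shape of the AddNumericField callback: each
// call yields a fresh iterator, and the iterator is assumed to report
// ok=false once it is exhausted.
type valuesIterable func() func() (interface{}, bool)

// fromSlice adapts a slice into a restartable iterable.
func fromSlice(values []int64) valuesIterable {
	return func() func() (interface{}, bool) {
		i := 0
		return func() (interface{}, bool) {
			if i >= len(values) {
				return nil, false
			}
			v := values[i]
			i++
			return v, true
		}
	}
}

// rangeConsumer pulls the values twice: once to find the minimum and
// once to record each value as a delta from that minimum.
type rangeConsumer struct {
	written map[string][]int64
	closed  bool
}

func (c *rangeConsumer) AddNumericField(field *FieldInfo, values valuesIterable) error {
	if c.closed {
		return errors.New("consumer already closed")
	}
	// First pass: find the smallest value.
	var lowest int64
	seen := false
	next := values()
	for v, ok := next(); ok; v, ok = next() {
		if n := v.(int64); !seen || n < lowest {
			lowest, seen = n, true
		}
	}
	// Second pass: a fresh iterator replays the same values.
	deltas := []int64{}
	next = values()
	for v, ok := next(); ok; v, ok = next() {
		deltas = append(deltas, v.(int64)-lowest)
	}
	c.written[field.Name] = deltas
	return nil
}

// Close ends the lifecycle; later AddNumericField calls fail.
func (c *rangeConsumer) Close() error {
	c.closed = true
	return nil
}

func main() {
	c := &rangeConsumer{written: map[string][]int64{}}
	err := c.AddNumericField(&FieldInfo{Name: "price"}, fromSlice([]int64{12, 10, 15}))
	fmt.Println(c.written["price"], err, c.Close())
}
```

Passing a factory rather than a single iterator is what lets the consumer make a second pass without the caller buffering anything on its behalf.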

type DocValuesFormat

type DocValuesFormat interface {
	Name() string
	// Returns a DocValuesConsumer to write docvalues to the index.
	FieldsConsumer(state *SegmentWriteState) (w DocValuesConsumer, err error)
	// Returns a DocValuesProducer to read docvalues from the index.
	//
	// NOTE: by the time this call returns, it must
	// hold open any files it will need to use; else, those files may
	// be deleted. Additionally, required files may be deleted during
	// the execution of this call before there is a chance to open them.
	// Under these circumstances an IO error should be returned by the
	// implementation. IO errors are expected and will automatically
	// cause a retry of the segment opening logic with the newly
	// revised segments.
	FieldsProducer(state SegmentReadState) (r DocValuesProducer, err error)
}

Encodes/decodes per-document values.

Note: when implementing this interface, the name returned by Name() may be written into the index in certain configurations. For the segment to be read, the name must resolve to your implementation via LoadXYZ(). Since Go doesn't have Java's SPI lookup mechanism, this method uses manual mappings to resolve format names.

If you implement your own format, make sure that it is manually included.

type DocValuesProducer

type DocValuesProducer interface {
	io.Closer
	Numeric(field *FieldInfo) (v NumericDocValues, err error)
	Binary(field *FieldInfo) (v BinaryDocValues, err error)
	Sorted(field *FieldInfo) (v SortedDocValues, err error)
	SortedSet(field *FieldInfo) (v SortedSetDocValues, err error)
}

Abstract API that produces numeric, binary and sorted docvalues.

func LoadDocValuesProducer

func LoadDocValuesProducer(name string, state SegmentReadState) (fp DocValuesProducer, err error)

type FieldInfosFormat

type FieldInfosFormat interface {
	// Returns a FieldInfosReader to read field infos from the index
	FieldInfosReader() FieldInfosReader
	// Returns a FieldInfosWriter to write field infos to the index
	FieldInfosWriter() FieldInfosWriter
}

Encodes/decodes FieldInfos

type FieldInfosReader

type FieldInfosReader func(d store.Directory, name, suffix string, ctx store.IOContext) (infos FieldInfos, err error)

Codec API for reading FieldInfos.

type FieldInfosWriter

type FieldInfosWriter func(d store.Directory, name, suffix string, infos FieldInfos, ctx store.IOContext) error

Codec API for writing FieldInfos.

type FieldsConsumer

type FieldsConsumer interface {
	io.Closer
	// Add a new field
	AddField(field *FieldInfo) (TermsConsumer, error)
}

Abstract API that consumes terms, doc, freq, prox, offset and payloads postings. Concrete implementations of this actually do "something" with the postings (write it into the index in a specific format).

The lifecycle is:

  1. FieldsConsumer is created by PostingsFormat.FieldsConsumer().
  2. For each field, AddField() is called, returning a TermsConsumer for the field.

type FieldsProducer

type FieldsProducer interface {
	Fields
	io.Closer
}

type LiveDocsFormat

type LiveDocsFormat interface {
	// Creates a new MutableBits, with all bits set, for the specified size.
	NewLiveDocs(size int) util.MutableBits
	// Creates a new MutableBits of the same bits set and size of existing.
	// NewLiveDocs(existing util.Bits) (util.MutableBits, error)
	// Persist live docs bits. Use SegmentCommitInfo.nextDelGen() to
	// determine the generation of the deletes file you should write to.
	WriteLiveDocs(bits util.MutableBits, dir store.Directory,
		info *SegmentCommitInfo, newDelCount int, ctx store.IOContext) error
	// Records all files in use by this SegmentCommitInfo
	Files(*SegmentCommitInfo) []string
}

Format for live/deleted documents

type NormsFormat

type NormsFormat interface {
	// Returns a DocValuesConsumer to write norms to the index.
	NormsConsumer(state *SegmentWriteState) (w DocValuesConsumer, err error)
	// Returns a DocValuesProducer to read norms from the index.
	//
	// NOTE: by the time this call returns, it must
	// hold open any files it will need to use; else, those files may
	// be deleted. Additionally, required files may be deleted during
	// the execution of this call before there is a chance to open them.
	// Under these circumstances an IO error should be returned by the
	// implementation. IO errors are expected and will automatically
	// cause a retry of the segment opening logic with the newly
	// revised segments.
	NormsProducer(state SegmentReadState) (r DocValuesProducer, err error)
}

Encodes/decodes per-document score normalization values.

type NumericDocValues

type NumericDocValues func(docID int) int64
type NumericDocValues interface {
	Value(docID int) int64
}

type OrdTermState

type OrdTermState struct {
	// contains filtered or unexported fields
}

An ordinal based TermState

func (*OrdTermState) Clone

func (ts *OrdTermState) Clone() TermState

func (*OrdTermState) CopyFrom

func (ts *OrdTermState) CopyFrom(other TermState)

func (*OrdTermState) String

func (ts *OrdTermState) String() string

type PostingsFormat

type PostingsFormat interface {
	// Returns this posting format's name
	Name() string
	// Writes a new segment
	FieldsConsumer(state *SegmentWriteState) (FieldsConsumer, error)
	// Reads a segment. NOTE: by the time this call returns, it must
	// hold open any files it will need to use; else, those files may
	// be deleted. Additionally, required files may be deleted during
	// the execution of this call before there is a chance to open them.
	// Under these circumstances an IO error should be returned by the
	// implementation. IO errors are expected and will automatically
	// cause a retry of the segment opening logic with the newly
	// revised segments.
	FieldsProducer(state SegmentReadState) (FieldsProducer, error)
}

Encodes/decodes terms, postings, and proximity data.

Note: when implementing this interface, the name returned by Name() may be written into the index in certain configurations. For the segment to be read, the name must resolve to your implementation via LoadPostingsFormat(). Since Go doesn't have Java's SPI lookup mechanism, this method uses manual mappings to resolve format names.

If you implement your own format, make sure that it is manually included.

func LoadPostingsFormat

func LoadPostingsFormat(name string) PostingsFormat

Looks up a format by name.

type PostingsFormatImpl

type PostingsFormatImpl struct {
	// contains filtered or unexported fields
}

func (*PostingsFormatImpl) Name

func (pf *PostingsFormatImpl) Name() string

Returns this posting format's name

func (*PostingsFormatImpl) String

func (pf *PostingsFormatImpl) String() string

type PostingsReaderBase

type PostingsReaderBase interface {
	io.Closer
	// Performs any initialization, such as reading and verifying the
	// header from the provided terms dictionary IndexInput.
	Init(termsIn store.IndexInput) error
	// Returns a newly created empty TermState.
	NewTermState() *BlockTermState
	// Actually decodes metadata for the next term.
	DecodeTerm([]int64, util.DataInput, *FieldInfo, *BlockTermState, bool) error
	// Must fully consume state, since after this call that TermState
	// may be reused.
	Docs(fieldInfo *FieldInfo, state *BlockTermState, skipDocs util.Bits, reuse DocsEnum, flags int) (de DocsEnum, err error)
}

The core terms dictionaries (BlockTermsReader, BlockTreeTermsReader) interact with a single instance of this class to manage creation of DocsEnum and DocsAndPositionsEnum instances. They provide an IndexInput (termsIn) from which this class may read any data that its corresponding PostingsWriterBase stored at indexing time.

type SegmentCommitInfo

type SegmentCommitInfo struct {
	// The SegmentInfo that we wrap.
	Info *SegmentInfo

	// NOTE: only used in-RAM by IW to track buffered deletes;
	// this is never written to/read from the Directory
	BufferedUpdatesGen int64
	// contains filtered or unexported fields
}

Embeds a [read-only] SegmentInfo and adds per-commit fields.

func NewSegmentCommitInfo

func NewSegmentCommitInfo(info *SegmentInfo,
	delCount int, delGen, fieldInfosGen, docValuesGen int64) *SegmentCommitInfo

func (*SegmentCommitInfo) AdvanceDelGen

func (info *SegmentCommitInfo) AdvanceDelGen()

Called when we succeed in writing deletes

func (*SegmentCommitInfo) AdvanceNextWriteDelGen

func (info *SegmentCommitInfo) AdvanceNextWriteDelGen()

Called if there was an error while writing deletes, so that we don't try to write to the same file more than once.

func (*SegmentCommitInfo) Clone

func (si *SegmentCommitInfo) Clone() *SegmentCommitInfo

func (*SegmentCommitInfo) CloneDeep

func (si *SegmentCommitInfo) CloneDeep(cloneSegmentInfo bool) *SegmentCommitInfo

func (*SegmentCommitInfo) DelCount

func (si *SegmentCommitInfo) DelCount() int

Returns the number of deleted docs in the segment.

func (*SegmentCommitInfo) DelGen

func (si *SegmentCommitInfo) DelGen() int64

Returns generation number of the live docs file or -1 if there are no deletes yet.

func (*SegmentCommitInfo) DocValuesGen

func (si *SegmentCommitInfo) DocValuesGen() int64

func (*SegmentCommitInfo) DocValuesUpdatesFiles

func (si *SegmentCommitInfo) DocValuesUpdatesFiles() map[int]map[string]bool

func (*SegmentCommitInfo) FieldInfosFiles

func (si *SegmentCommitInfo) FieldInfosFiles() map[string]bool

func (*SegmentCommitInfo) FieldInfosGen

func (si *SegmentCommitInfo) FieldInfosGen() int64

func (*SegmentCommitInfo) Files

func (si *SegmentCommitInfo) Files() []string

Returns all files in use by this segment.

func (*SegmentCommitInfo) HasDeletions

func (si *SegmentCommitInfo) HasDeletions() bool

Returns true if there are any deletions for the segment at this commit.

func (*SegmentCommitInfo) HasFieldUpdates

func (si *SegmentCommitInfo) HasFieldUpdates() bool

func (*SegmentCommitInfo) NextDelGen

func (si *SegmentCommitInfo) NextDelGen() int64

Returns the next available generation number of the live docs file.
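
How DelGen, NextDelGen, AdvanceDelGen and AdvanceNextWriteDelGen relate is not spelled out here. The sketch below shows one consistent reading, modeled on the bookkeeping in Lucene Java's SegmentCommitInfo; the `delGens` type and its field names are hypothetical, and the Go port may implement this differently.

```go
package main

import "fmt"

// delGens is a hypothetical stand-in for the generation bookkeeping
// inside SegmentCommitInfo.
type delGens struct {
	delGen          int64 // -1 while no deletes have been written
	nextWriteDelGen int64 // generation the next write should target
}

// newDelGens starts writing at generation 1 when there are no deletes
// yet, otherwise at the generation after the current one.
func newDelGens(delGen int64) *delGens {
	next := delGen + 1
	if delGen == -1 {
		next = 1
	}
	return &delGens{delGen: delGen, nextWriteDelGen: next}
}

func (g *delGens) HasDeletions() bool { return g.delGen != -1 }
func (g *delGens) DelGen() int64      { return g.delGen }
func (g *delGens) NextDelGen() int64  { return g.nextWriteDelGen }

// AdvanceDelGen records a successful write of the live docs file.
func (g *delGens) AdvanceDelGen() {
	g.delGen = g.nextWriteDelGen
	g.nextWriteDelGen = g.delGen + 1
}

// AdvanceNextWriteDelGen skips a generation after a failed write so
// that the same file name is never written twice.
func (g *delGens) AdvanceNextWriteDelGen() { g.nextWriteDelGen++ }

func main() {
	g := newDelGens(-1)
	fmt.Println(g.HasDeletions(), g.NextDelGen())
	g.AdvanceNextWriteDelGen() // a write to generation 1 failed
	g.AdvanceDelGen()          // a write to generation 2 succeeded
	fmt.Println(g.HasDeletions(), g.DelGen(), g.NextDelGen())
}
```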

func (*SegmentCommitInfo) SetBufferedUpdatesGen

func (si *SegmentCommitInfo) SetBufferedUpdatesGen(v int64)

func (*SegmentCommitInfo) SetDelCount

func (si *SegmentCommitInfo) SetDelCount(delCount int)

func (*SegmentCommitInfo) SetDocValuesUpdatesFiles

func (si *SegmentCommitInfo) SetDocValuesUpdatesFiles(dvUpdatesFiles map[int]map[string]bool)

func (*SegmentCommitInfo) SetFieldInfosFiles

func (si *SegmentCommitInfo) SetFieldInfosFiles(fieldInfosFiles map[string]bool)

func (*SegmentCommitInfo) SizeInBytes

func (si *SegmentCommitInfo) SizeInBytes() (sum int64, err error)

Returns total size in bytes of all files for this segment.

NOTE: This value is not correct for 3.0 segments that have shared docstores. To get correct value, upgrade.

func (*SegmentCommitInfo) String

func (si *SegmentCommitInfo) String() string

func (*SegmentCommitInfo) StringOf

func (si *SegmentCommitInfo) StringOf(dir store.Directory, pendingDelCount int) string

type SegmentInfoFormat

type SegmentInfoFormat interface {
	// Returns the SegmentInfoReader for reading SegmentInfo instances.
	SegmentInfoReader() SegmentInfoReader
	// Returns the SegmentInfoWriter for writing SegmentInfo instances.
	SegmentInfoWriter() SegmentInfoWriter
}

Expert: Control the format of SegmentInfo (segment metadata file).

type SegmentInfoReader

type SegmentInfoReader interface {
	Read(store.Directory, string, store.IOContext) (*SegmentInfo, error)
}

type SegmentInfoWriter

type SegmentInfoWriter interface {
	Write(store.Directory, *SegmentInfo, FieldInfos, store.IOContext) error
}

Write SegmentInfo data.

type SortedDocValues

type SortedDocValues interface {
	BinaryDocValues
	Ord(docID int) int
	LookupOrd(int) []byte
	ValueCount() int
}
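
As a rough illustration of how the SortedDocValues methods fit together, the sketch below builds an in-memory implementation in which Get(docID) is simply LookupOrd(Ord(docID)). The `sortedValues` type and `newSortedValues` builder are hypothetical, and documents without a value are not modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// BinaryDocValues and SortedDocValues mirror the documented interfaces.
type BinaryDocValues interface {
	Get(docId int) []byte
}

type SortedDocValues interface {
	BinaryDocValues
	Ord(docID int) int
	LookupOrd(int) []byte
	ValueCount() int
}

// sortedValues is a hypothetical in-memory implementation.
type sortedValues struct {
	ords  []int    // ordinal of each document's value
	terms [][]byte // distinct values in sorted order
}

// Compile-time check that the sketch satisfies the interface.
var _ SortedDocValues = (*sortedValues)(nil)

func (s *sortedValues) Ord(docID int) int        { return s.ords[docID] }
func (s *sortedValues) LookupOrd(ord int) []byte { return s.terms[ord] }
func (s *sortedValues) ValueCount() int          { return len(s.terms) }
func (s *sortedValues) Get(docID int) []byte     { return s.LookupOrd(s.Ord(docID)) }

// newSortedValues deduplicates and sorts the per-document values, then
// assigns each document the ordinal of its value.
func newSortedValues(values []string) *sortedValues {
	seen := map[string]bool{}
	var unique []string
	for _, v := range values {
		if !seen[v] {
			seen[v] = true
			unique = append(unique, v)
		}
	}
	sort.Strings(unique)
	s := &sortedValues{}
	for _, u := range unique {
		s.terms = append(s.terms, []byte(u))
	}
	for _, v := range values {
		s.ords = append(s.ords, sort.SearchStrings(unique, v))
	}
	return s
}

func main() {
	dv := newSortedValues([]string{"pear", "apple", "pear"})
	fmt.Println(dv.ValueCount(), dv.Ord(0), dv.Ord(1), string(dv.Get(2)))
}
```

Because ordinals follow the sort order of the values, comparing two documents by Ord gives the same result as comparing their bytes.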

type SortedSetDocValues

type SortedSetDocValues interface {
	NextOrd() int64
	SetDocument(docID int)
	LookupOrd(int64) []byte
	ValueCount() int64
}

type StoredFieldVisitor

type StoredFieldVisitor interface {
	BinaryField(fi *model.FieldInfo, value []byte) error
	StringField(fi *model.FieldInfo, value string) error
	IntField(fi *model.FieldInfo, value int) error
	LongField(fi *model.FieldInfo, value int64) error
	FloatField(fi *model.FieldInfo, value float32) error
	DoubleField(fi *model.FieldInfo, value float64) error
	NeedsField(fi *model.FieldInfo) (StoredFieldVisitorStatus, error)
}

type StoredFieldVisitorStatus

type StoredFieldVisitorStatus int
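
The three status constants are not described on this page. In Lucene Java they mean "load this field", "skip this field" and "stop visiting this document"; assuming the Go port keeps those semantics, a reader loop could honor them as sketched below. The reduced `stringVisitor` interface, `FieldInfo`, `storedField` and `visitDocument` are hypothetical simplifications.

```go
package main

import "fmt"

type StoredFieldVisitorStatus int

const (
	STORED_FIELD_VISITOR_STATUS_YES  = StoredFieldVisitorStatus(1)
	STORED_FIELD_VISITOR_STATUS_NO   = StoredFieldVisitorStatus(2)
	STORED_FIELD_VISITOR_STATUS_STOP = StoredFieldVisitorStatus(3)
)

// FieldInfo is a hypothetical stand-in carrying only a name.
type FieldInfo struct{ Name string }

// stringVisitor is a reduced visitor that handles only string fields.
type stringVisitor interface {
	StringField(fi *FieldInfo, value string) error
	NeedsField(fi *FieldInfo) (StoredFieldVisitorStatus, error)
}

// firstFieldVisitor loads the first occurrence of one field, then
// asks the reader to stop.
type firstFieldVisitor struct {
	want  string
	value string
	found bool
}

func (v *firstFieldVisitor) NeedsField(fi *FieldInfo) (StoredFieldVisitorStatus, error) {
	switch {
	case v.found:
		return STORED_FIELD_VISITOR_STATUS_STOP, nil
	case fi.Name == v.want:
		return STORED_FIELD_VISITOR_STATUS_YES, nil
	default:
		return STORED_FIELD_VISITOR_STATUS_NO, nil
	}
}

func (v *firstFieldVisitor) StringField(fi *FieldInfo, value string) error {
	v.value, v.found = value, true
	return nil
}

type storedField struct{ name, value string }

// visitDocument walks the fields of one document and returns how many
// fields were examined before the visitor stopped or the fields ran out.
func visitDocument(fields []storedField, visitor stringVisitor) (int, error) {
	examined := 0
	for _, f := range fields {
		fi := &FieldInfo{Name: f.name}
		status, err := visitor.NeedsField(fi)
		if err != nil {
			return examined, err
		}
		examined++
		switch status {
		case STORED_FIELD_VISITOR_STATUS_YES:
			if err := visitor.StringField(fi, f.value); err != nil {
				return examined, err
			}
		case STORED_FIELD_VISITOR_STATUS_STOP:
			return examined, nil
		}
		// STORED_FIELD_VISITOR_STATUS_NO: skip the value, keep going.
	}
	return examined, nil
}

func main() {
	v := &firstFieldVisitor{want: "title"}
	n, err := visitDocument([]storedField{{"id", "7"}, {"title", "Go"}, {"body", "text"}}, v)
	fmt.Println(v.value, n, err)
}
```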

type StoredFieldsFormat

type StoredFieldsFormat interface {
	// Returns a StoredFieldsReader to load stored fields.
	FieldsReader(d store.Directory, si *SegmentInfo, fn FieldInfos, context store.IOContext) (r StoredFieldsReader, err error)
	// Returns a StoredFieldsWriter to write stored fields.
	FieldsWriter(d store.Directory, si *SegmentInfo, context store.IOContext) (w StoredFieldsWriter, err error)
}

Controls the format of stored fields

type StoredFieldsReader

type StoredFieldsReader interface {
	io.Closer
	VisitDocument(n int, visitor StoredFieldVisitor) error
	Clone() StoredFieldsReader
}

type StoredFieldsWriter

type StoredFieldsWriter interface {
	io.Closer
	// Called before writing the stored fields of the document.
	// WriteField() will be called for each stored field. Note that
	// this is called even if the document has no stored fields.
	StartDocument() error
	// Called when a document and all its fields have been added.
	FinishDocument() error
	// Writes a single stored field.
	WriteField(info *model.FieldInfo, field model.IndexableField) error
	// Aborts writing entirely, implementation should remove any
	// partially-written files, etc.
	Abort()
	// Called before Close(), passing in the number of documents that
	// were written. Note that this is intentionally redundant
	// (equivalent to the number of calls to StartDocument()), but a
	// Codec should check that this is the case to detect the JRE bug
	// described in LUCENE-1282.
	Finish(fis model.FieldInfos, numDocs int) error
}

Codec API for writing stored fields:

  1. For every document, StartDocument() is called, informing the Codec how many fields will be written.
  2. WriteField() is called for each field in the document.
  3. After all documents have been written, Finish() is called for verification/sanity-checks.
  4. Finally the writer is closed.
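
The call order above can be sketched with an in-memory writer. This is an illustration under simplifying assumptions, not the package's implementation: `memoryWriter`, `field` and `writeAll` are hypothetical, WriteField takes plain strings instead of model types, and Finish takes only the document count.

```go
package main

import (
	"errors"
	"fmt"
)

// field is a hypothetical name/value pair standing in for a stored field.
type field struct{ name, value string }

// memoryWriter keeps every document's fields in memory.
type memoryWriter struct {
	docs [][]string
}

// StartDocument opens a new, initially empty document.
func (w *memoryWriter) StartDocument() error {
	w.docs = append(w.docs, nil)
	return nil
}

// WriteField appends one field to the most recently started document.
func (w *memoryWriter) WriteField(name, value string) error {
	if len(w.docs) == 0 {
		return errors.New("WriteField called before StartDocument")
	}
	last := len(w.docs) - 1
	w.docs[last] = append(w.docs[last], name+"="+value)
	return nil
}

func (w *memoryWriter) FinishDocument() error { return nil }

// Finish performs the sanity check: the caller's document count must
// match the number of StartDocument calls.
func (w *memoryWriter) Finish(numDocs int) error {
	if numDocs != len(w.docs) {
		return fmt.Errorf("wrote %d docs but Finish was told %d", len(w.docs), numDocs)
	}
	return nil
}

func (w *memoryWriter) Close() error { return nil }

// writeAll drives the writer through the documented lifecycle.
func writeAll(w *memoryWriter, docs [][]field) error {
	for _, doc := range docs {
		if err := w.StartDocument(); err != nil {
			return err
		}
		for _, f := range doc {
			if err := w.WriteField(f.name, f.value); err != nil {
				return err
			}
		}
		if err := w.FinishDocument(); err != nil {
			return err
		}
	}
	if err := w.Finish(len(docs)); err != nil {
		return err
	}
	return w.Close()
}

func main() {
	w := &memoryWriter{}
	// The second document has no stored fields but is still started.
	err := writeAll(w, [][]field{{{"id", "1"}, {"title", "Go"}}, {}})
	fmt.Println(len(w.docs), err)
}
```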

type TermVectorsFormat

type TermVectorsFormat interface {
	// Returns a TermVectorsReader to read term vectors.
	VectorsReader(d store.Directory, si *SegmentInfo, fn FieldInfos, ctx store.IOContext) (r TermVectorsReader, err error)
	// Returns a TermVectorsWriter to write term vectors.
	VectorsWriter(d store.Directory, si *SegmentInfo, ctx store.IOContext) (w TermVectorsWriter, err error)
}

Controls the format of term vectors

type TermVectorsReader

type TermVectorsReader interface {
	io.Closer
	Get(doc int) model.Fields
	Clone() TermVectorsReader
}

type TermVectorsWriter

type TermVectorsWriter interface {
	io.Closer
	// Called before writing the term vectors of the document.
	// startField() will be called numVectorFields times. Note that if
	// term vectors are enabled, this is called even if the document
	// has no vector fields, in this case numVectorFields will be zero.
	StartDocument(int) error
	// Called after a doc and all its fields have been added
	FinishDocument() error
	// Aborts writing entirely, implementation should remove any
	// partially-written files, etc.
	Abort()
	// Called before Close(), passing in the number of documents that
	// were written. Note that this is intentionally redundant
	// (equivalent to the number of calls to StartDocument(int)), but a
	// Codec should check that this is the case to detect the JRE bug
	// described in LUCENE-1282.
	Finish(model.FieldInfos, int) error
}

Codec API for writing term vectors:

  1. For every document, StartDocument() is called, informing the Codec how many fields will be written.
  2. StartField() is called for each field in the document, informing the codec how many terms will be written for that field, and whether or not positions, offsets, or payloads are enabled.
  3. Within each field, StartTerm() is called for each term.
  4. If offsets and/or positions are enabled, then AddPosition() will be called for each term occurrence.
  5. After all documents have been written, Finish() is called for verification/sanity-checks.
  6. Finally the writer is closed.

type TermsConsumer

type TermsConsumer interface {
	// Starts a new term in this field; this may be called with no
	// corresponding call to finish if the term had no docs.
	StartTerm([]byte) (PostingsConsumer, error)
	// Finishes the current term; numDocs must be > 0.
	// stats.totalTermFreq will be -1 when term frequencies are omitted
	// for the field.
	FinishTerm([]byte, *TermStats) error
	// Called when we are done adding terms to this field.
	// sumTotalTermFreq will be -1 when term frequencies are omitted
	// for the field.
	Finish(sumTotalTermFreq, sumDocFreq int64, docCount int) error
	// Return the BytesRef comparator used to sort terms before feeding
	// to this API.
	Comparator() func(a, b []byte) bool
}

Abstract API that consumes terms for an individual field.

The lifecycle is:

  • TermsConsumer is returned for each field by FieldsConsumer.AddField().
  • TermsConsumer returns a PostingsConsumer for each term in StartTerm().
  • When the producer (e.g. IndexWriter) is done adding documents for the term, it calls FinishTerm(), passing in the accumulated term statistics.
  • Producer calls Finish() with the accumulated collection statistics when it is finished adding terms to the field.
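
The lifecycle above can be sketched for a single field as follows. The types here are hypothetical reductions: `TermStats` holds only two counters, `docCollector` stands in for PostingsConsumer, `memoryTerms` returns it directly from StartTerm, and the comparator is assumed to mean "a sorts before b".

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// TermStats is a hypothetical stand-in for the statistics passed to FinishTerm.
type TermStats struct {
	DocFreq       int
	TotalTermFreq int64
}

// docCollector is a hypothetical, reduced PostingsConsumer.
type docCollector struct{ docs []int }

func (p *docCollector) AddDoc(docID int) { p.docs = append(p.docs, docID) }

// memoryTerms records the finished terms of one field.
type memoryTerms struct {
	terms      [][]byte
	sumDocFreq int64
}

// Comparator is assumed to report whether a sorts strictly before b.
func (t *memoryTerms) Comparator() func(a, b []byte) bool {
	return func(a, b []byte) bool { return bytes.Compare(a, b) < 0 }
}

func (t *memoryTerms) StartTerm(term []byte) (*docCollector, error) {
	return &docCollector{}, nil
}

// FinishTerm rejects terms without docs and terms that arrive out of order.
func (t *memoryTerms) FinishTerm(term []byte, stats *TermStats) error {
	if stats.DocFreq <= 0 {
		return fmt.Errorf("term %q finished without docs", term)
	}
	if n := len(t.terms); n > 0 && !t.Comparator()(t.terms[n-1], term) {
		return fmt.Errorf("term %q arrived out of order", term)
	}
	t.terms = append(t.terms, append([]byte(nil), term...))
	t.sumDocFreq += int64(stats.DocFreq)
	return nil
}

// Finish cross-checks the collection statistics against what was recorded.
func (t *memoryTerms) Finish(sumTotalTermFreq, sumDocFreq int64, docCount int) error {
	if sumDocFreq != t.sumDocFreq {
		return fmt.Errorf("sumDocFreq %d does not match recorded %d", sumDocFreq, t.sumDocFreq)
	}
	return nil
}

// writeField plays the producer role for one field.
func writeField(t *memoryTerms, postings map[string][]int) error {
	terms := make([]string, 0, len(postings))
	for term := range postings {
		terms = append(terms, term)
	}
	sort.Strings(terms)
	var sumDocFreq int64
	seen := map[int]bool{}
	for _, term := range terms {
		pc, err := t.StartTerm([]byte(term))
		if err != nil {
			return err
		}
		for _, docID := range postings[term] {
			pc.AddDoc(docID)
			seen[docID] = true
		}
		if len(pc.docs) == 0 {
			continue // a started term without docs is never finished
		}
		// TotalTermFreq is -1 because this sketch omits term frequencies.
		stats := &TermStats{DocFreq: len(pc.docs), TotalTermFreq: -1}
		if err := t.FinishTerm([]byte(term), stats); err != nil {
			return err
		}
		sumDocFreq += int64(stats.DocFreq)
	}
	return t.Finish(-1, sumDocFreq, len(seen))
}

func main() {
	field := &memoryTerms{}
	err := writeField(field, map[string][]int{"go": {0, 2}, "lucene": {1}, "unused": {}})
	fmt.Println(len(field.terms), field.sumDocFreq, err)
}
```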
