zap

Version: v16.2.1 (latest) | Published: Feb 6, 2025 | License: Apache-2.0 | Imports: 21 | Imported by: 4

README ¶

zapx file format

The zapx module is a fork of the zap module that maintains file format compatibility but removes the dependency on bleve, depending instead only on the independent interface modules.

Advanced ZAP File Format Documentation is here.

The file is written in the reverse order that we typically access data. This helps us write in one pass since later sections of the file require file offsets of things we've already written.

Current usage:

mmap the entire file
crc-32 bytes and version are in fixed position at end of the file
reading remainder of footer could be version specific
remainder of footer gives us:
- 3 important offsets (docValue, fields index, and stored data index)
- 2 important values (number of docs and chunk factor)
field data is processed once and memoized onto the heap so that we never have to go back to disk for it
access to stored data by doc number means first navigating to the stored data index, then reading the fixed-position entry for that doc number, which gives us the actual address of the data. The first bytes of that section tell us the size of the data so that we know where it ends.
access to all other indexed data follows this pattern:
- first know the field name -> convert to id
- next navigate to term dictionary for that field
  - some operations stop here and do dictionary ops
- next use dictionary to navigate to posting list for a specific term
- walk posting list
- if necessary, walk posting details as we go
- if location info is desired, consult location bitmap to see if it is there
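
As a rough illustration of the first steps in the list above, the sketch below reads the version and CRC-32 from their fixed positions at the end of a byte slice and checks the CRC over everything before it. The helper names `checkTail` and `buildTail`, the use of the IEEE polynomial, and the in-memory slice standing in for the mmap'ed file are assumptions made for illustration; the 4-byte big endian widths follow the footer description in this README.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// checkTail is a hypothetical helper. It assumes the last 8 bytes of data are
// the version (big endian uint32) followed by the CRC-32 (big endian uint32),
// and that the CRC covers every byte before the CRC field itself.
func checkTail(data []byte) (uint32, error) {
	if len(data) < 8 {
		return 0, errors.New("data too short for version and crc")
	}
	crcPos := len(data) - 4
	version := binary.BigEndian.Uint32(data[crcPos-4 : crcPos])
	want := binary.BigEndian.Uint32(data[crcPos:])
	// Assumption: the checksum uses the IEEE polynomial.
	if got := crc32.ChecksumIEEE(data[:crcPos]); got != want {
		return version, fmt.Errorf("crc mismatch: got %08x, want %08x", got, want)
	}
	return version, nil
}

// buildTail appends a version and a CRC to body, mimicking the tail layout
// that checkTail expects. It exists only to make the example self-contained.
func buildTail(body []byte, version uint32) []byte {
	out := append([]byte{}, body...)
	out = binary.BigEndian.AppendUint32(out, version)
	out = binary.BigEndian.AppendUint32(out, crc32.ChecksumIEEE(out))
	return out
}

func main() {
	file := buildTail([]byte("segment body"), 16)
	version, err := checkTail(file)
	fmt.Println(version, err)
}
```

Because only the tail is at a fixed position, a reader can decide how to interpret the rest of the footer after it has seen the version.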

stored fields section

for each document
- preparation phase:
  - produce a slice of metadata bytes and data bytes
  - produce these slices in field id order
  - field value is appended to the data slice
  - metadata slice is varint encoded with the following values for each field value
    - field id (uint16)
    - field type (byte)
    - field value start offset in uncompressed data slice (uint64)
    - field value length (uint64)
    - field number of array positions (uint64)
    - one additional value for each array position (uint64)
  - compress the data slice using snappy
- file writing phase:
  - remember the start offset for this document
  - write out meta data length (varint uint64)
  - write out compressed data length (varint uint64)
  - write out the metadata bytes
  - write out the compressed data bytes
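
The preparation phase above can be sketched as follows. This is a minimal illustration, not the package's encoder: the `fieldMeta` type and the `appendFieldMeta` and `encodeDoc` helpers are hypothetical names, and the snappy compression step is left out because it needs a third-party module. Only the order of the varint-encoded values is taken from the list above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fieldMeta mirrors the per-value metadata listed above. The struct and its
// field names are hypothetical; only the value order comes from the text.
type fieldMeta struct {
	ID       uint16
	Type     byte
	Start    uint64 // start offset in the uncompressed data slice
	Length   uint64
	ArrayPos []uint64
}

// appendFieldMeta appends one value's metadata to meta as unsigned varints.
func appendFieldMeta(meta []byte, m fieldMeta) []byte {
	meta = binary.AppendUvarint(meta, uint64(m.ID))
	meta = binary.AppendUvarint(meta, uint64(m.Type))
	meta = binary.AppendUvarint(meta, m.Start)
	meta = binary.AppendUvarint(meta, m.Length)
	meta = binary.AppendUvarint(meta, uint64(len(m.ArrayPos)))
	for _, p := range m.ArrayPos {
		meta = binary.AppendUvarint(meta, p)
	}
	return meta
}

// encodeDoc builds the metadata slice and the (still uncompressed) data slice
// for one document. The caller must pass values in field id order.
func encodeDoc(ids []uint16, types []byte, values [][]byte) (meta, data []byte) {
	for i, v := range values {
		m := fieldMeta{
			ID:     ids[i],
			Type:   types[i],
			Start:  uint64(len(data)),
			Length: uint64(len(v)),
		}
		meta = appendFieldMeta(meta, m)
		data = append(data, v...)
	}
	return meta, data
}

func main() {
	meta, data := encodeDoc(
		[]uint16{0, 1},
		[]byte{'t', 't'},
		[][]byte{[]byte("doc-1"), []byte("hello")},
	)
	fmt.Printf("meta=%v data=%q\n", meta, data)
}
```

In the file writing phase the two lengths are written first, so a reader can slice out the metadata and the compressed data without scanning.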

stored fields idx

for each document
- write start offset (remembered from previous section) of stored data (big endian uint64)

With this index and a known document number, we have direct access to all the stored field data.
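
A minimal sketch of that direct access, assuming the whole file is available as a byte slice: the index entry for a document sits at `storedIndexOffset + 8*docNum`, and the two varint lengths follow at the address it points to. The helpers `storedDocHeader` and `buildStored` are hypothetical and exist only for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// storedDocHeader is a hypothetical helper. Given the file bytes and the
// stored index offset from the footer, it returns the position where the
// document's metadata bytes begin, plus the metadata and data lengths.
func storedDocHeader(mem []byte, storedIndexOffset, docNum uint64) (uint64, uint64, uint64, error) {
	slot := storedIndexOffset + 8*docNum
	if slot+8 > uint64(len(mem)) {
		return 0, 0, 0, errors.New("doc number out of range")
	}
	docStart := binary.BigEndian.Uint64(mem[slot : slot+8])
	if docStart >= storedIndexOffset {
		return 0, 0, 0, errors.New("corrupt stored index entry")
	}
	metaLen, n := binary.Uvarint(mem[docStart:])
	if n <= 0 {
		return 0, 0, 0, errors.New("bad metadata length")
	}
	dataLen, m := binary.Uvarint(mem[docStart+uint64(n):])
	if m <= 0 {
		return 0, 0, 0, errors.New("bad data length")
	}
	return docStart + uint64(n+m), metaLen, dataLen, nil
}

// buildStored writes a toy stored fields section followed by its index.
// Each doc is a {metadata, data} pair of opaque strings.
func buildStored(docs [][2]string) ([]byte, uint64) {
	var mem []byte
	var starts []uint64
	for _, d := range docs {
		starts = append(starts, uint64(len(mem)))
		mem = binary.AppendUvarint(mem, uint64(len(d[0])))
		mem = binary.AppendUvarint(mem, uint64(len(d[1])))
		mem = append(mem, d[0]...)
		mem = append(mem, d[1]...)
	}
	indexOffset := uint64(len(mem))
	for _, s := range starts {
		mem = binary.BigEndian.AppendUint64(mem, s)
	}
	return mem, indexOffset
}

func main() {
	mem, idx := buildStored([][2]string{{"mm", "ddd"}, {"m", "dddd"}})
	metaStart, metaLen, dataLen, err := storedDocHeader(mem, idx, 1)
	fmt.Println(metaStart, metaLen, dataLen, err) // prints "9 1 4 <nil>"
}
```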

posting details (freq/norm) section

for each posting list
- produce a slice containing multiple consecutive chunks (each chunk is varint stream)
- produce a slice remembering offsets of where each chunk starts
- preparation phase:
  - for each hit in the posting list
  - if this hit is in the next chunk, close out encoding of the last chunk and record the start offset of the next
  - encode term frequency (uint64)
  - encode norm factor (float32)
- file writing phase:
  - remember start position for this posting list details
  - write out number of chunks that follow (varint uint64)
  - write out length of each chunk (each a varint uint64)
  - write out the byte slice containing all the chunk data

If you know the doc number you're interested in, this format lets you jump to the correct chunk (docNum/chunkFactor) directly and then seek within that chunk until you find it.
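
The jump can be sketched as below, assuming a fixed chunk factor as described here and the list of chunk lengths already decoded from the section header. `chunkBounds` is a hypothetical helper; the returned offsets are relative to the start of the chunk data.

```go
package main

import "fmt"

// chunkBounds is a hypothetical helper. Given the chunk lengths read from the
// header of a posting details section, it returns the byte range (relative to
// the start of the chunk data) of the chunk that would hold docNum.
func chunkBounds(chunkLens []uint64, docNum, chunkFactor uint64) (start, end uint64, ok bool) {
	if chunkFactor == 0 {
		return 0, 0, false
	}
	chunk := docNum / chunkFactor
	if chunk >= uint64(len(chunkLens)) {
		return 0, 0, false
	}
	// The chunk starts where all earlier chunks end.
	for _, l := range chunkLens[:chunk] {
		start += l
	}
	return start, start + chunkLens[chunk], true
}

func main() {
	lens := []uint64{10, 0, 7} // the middle chunk holds no hits
	start, end, ok := chunkBounds(lens, 2050, 1024)
	fmt.Println(start, end, ok) // prints "10 17 true"
}
```

Within the returned range the reader still has to decode entries in order, since each entry is a variable-length varint stream.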

posting details (location) section

for each posting list
- produce a slice containing multiple consecutive chunks (each chunk is varint stream)
- produce a slice remembering offsets of where each chunk starts
- preparation phase:
  - for each hit in the posting list
  - if this hit is in the next chunk, close out encoding of the last chunk and record the start offset of the next
  - encode field (uint16)
  - encode field pos (uint64)
  - encode field start (uint64)
  - encode field end (uint64)
  - encode number of array positions to follow (uint64)
  - encode each array position (each uint64)
- file writing phase:
  - remember start position for this posting list details
  - write out number of chunks that follow (varint uint64)
  - write out length of each chunk (each a varint uint64)
  - write out the byte slice containing all the chunk data

If you know the doc number you're interested in, this format lets you jump to the correct chunk (docNum/chunkFactor) directly and then seek within that chunk until you find it.

postings list section

for each posting list
- preparation phase:
  - encode roaring bitmap posting list to bytes (so we know the length)
- file writing phase:
  - remember the start position for this posting list
  - write freq/norm details offset (remembered from previous, as varint uint64)
  - write location details offset (remembered from previous, as varint uint64)
  - write length of encoded roaring bitmap
  - write the serialized roaring bitmap data

dictionary

for each field
- preparation phase:
  - encode vellum FST with dictionary data pointing to file offset of posting list (remembered from previous)
- file writing phase:
  - remember the start position of this persistDictionary
  - write length of vellum data (varint uint64)
  - write out vellum data

fields section

for each field
- file writing phase:
  - remember start offset for each field
  - write dictionary address (remembered from previous) (varint uint64)
  - write length of field name (varint uint64)
  - write field name bytes

fields idx

for each field
- file writing phase:
  - write big endian uint64 of start offset for each field

NOTE: currently we don't know or record the length of this fields index. Instead we rely on the fact that we know it immediately precedes a footer of known size.
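
A sketch of how a reader might use this, assuming the fields index ends exactly where the footer starts: the number of fields is the distance between the two divided by 8, and each index entry points at a field record laid out as in the fields section. `readField` and `buildFields` are hypothetical helpers for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readField is a hypothetical helper. It resolves fieldID through the fields
// index and decodes the field record: dictionary address, name length, name.
func readField(mem []byte, fieldsIndexOffset, footerStart, fieldID uint64) (string, uint64, error) {
	if footerStart > uint64(len(mem)) || fieldsIndexOffset > footerStart {
		return "", 0, errors.New("invalid offsets")
	}
	// The index has no recorded length; it runs up to the footer.
	numFields := (footerStart - fieldsIndexOffset) / 8
	if fieldID >= numFields {
		return "", 0, errors.New("field id out of range")
	}
	slot := fieldsIndexOffset + 8*fieldID
	pos := binary.BigEndian.Uint64(mem[slot : slot+8])
	if pos >= fieldsIndexOffset {
		return "", 0, errors.New("field offset out of range")
	}
	section := mem[pos:fieldsIndexOffset]
	dictAddr, n := binary.Uvarint(section)
	if n <= 0 {
		return "", 0, errors.New("bad dictionary address")
	}
	nameLen, m := binary.Uvarint(section[n:])
	if m <= 0 || uint64(len(section)-n-m) < nameLen {
		return "", 0, errors.New("bad field name length")
	}
	name := string(section[n+m : n+m+int(nameLen)])
	return name, dictAddr, nil
}

// buildFields writes a toy fields section followed by its index.
func buildFields(names []string, dictAddrs []uint64) ([]byte, uint64) {
	var mem []byte
	var starts []uint64
	for i, name := range names {
		starts = append(starts, uint64(len(mem)))
		mem = binary.AppendUvarint(mem, dictAddrs[i])
		mem = binary.AppendUvarint(mem, uint64(len(name)))
		mem = append(mem, name...)
	}
	indexOffset := uint64(len(mem))
	for _, s := range starts {
		mem = binary.BigEndian.AppendUint64(mem, s)
	}
	return mem, indexOffset
}

func main() {
	mem, idx := buildFields([]string{"_id", "title"}, []uint64{100, 200})
	// In this toy layout there is no footer, so it "starts" at len(mem).
	name, dictAddr, err := readField(mem, idx, uint64(len(mem)), 1)
	fmt.Println(name, dictAddr, err)
}
```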

fields DocValue

for each field
- preparation phase:
  - produce a slice containing multiple consecutive chunks, where each chunk is composed of a meta section followed by compressed columnar field data
  - produce a slice remembering the length of each chunk
- file writing phase:
  - remember the start position of the first field's DocValue data (this is the DocValue offset recorded in the footer)
  - write out number of chunks that follow (varint uint64)
  - write out length of each chunk (each a varint uint64)
  - write out the byte slice containing all the chunk data

NOTE: currently the meta header inside each chunk gives the location offsets and sizes of the data pertaining to a given docID, and any read operation leverages that meta information to extract the document-specific data from the file.

footer section

file writing phase
- write number of docs (big endian uint64)
- write stored field index location (big endian uint64)
- write field index location (big endian uint64)
- write field docValue location (big endian uint64)
- write out chunk factor (big endian uint32)
- write out version (big endian uint32)
- write out file CRC of everything preceding this (big endian uint32)
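
The seven values listed above add up to 44 bytes, and a reader can decode them from the end of the file as sketched below. This follows the README list only; the `FooterSize` constant documented for this version also counts a sectionsIndexOffset, so the sketch should not be used as-is on real v16 files. The `footer` type and helper names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// footer holds the values in the order the list above writes them.
type footer struct {
	NumDocs           uint64
	StoredIndexOffset uint64
	FieldsIndexOffset uint64
	DocValueOffset    uint64
	ChunkFactor       uint32
	Version           uint32
	CRC               uint32
}

// footerLen is the size of the README layout: four uint64 and three uint32.
const footerLen = 4*8 + 3*4

// parseFooter decodes the last footerLen bytes of mem.
func parseFooter(mem []byte) (footer, error) {
	var f footer
	if len(mem) < footerLen {
		return f, errors.New("file too short for footer")
	}
	b := mem[len(mem)-footerLen:]
	f.NumDocs = binary.BigEndian.Uint64(b[0:8])
	f.StoredIndexOffset = binary.BigEndian.Uint64(b[8:16])
	f.FieldsIndexOffset = binary.BigEndian.Uint64(b[16:24])
	f.DocValueOffset = binary.BigEndian.Uint64(b[24:32])
	f.ChunkFactor = binary.BigEndian.Uint32(b[32:36])
	f.Version = binary.BigEndian.Uint32(b[36:40])
	f.CRC = binary.BigEndian.Uint32(b[40:44])
	return f, nil
}

// appendFooter writes f after mem in the same order, for the example only.
func appendFooter(mem []byte, f footer) []byte {
	mem = binary.BigEndian.AppendUint64(mem, f.NumDocs)
	mem = binary.BigEndian.AppendUint64(mem, f.StoredIndexOffset)
	mem = binary.BigEndian.AppendUint64(mem, f.FieldsIndexOffset)
	mem = binary.BigEndian.AppendUint64(mem, f.DocValueOffset)
	mem = binary.BigEndian.AppendUint32(mem, f.ChunkFactor)
	mem = binary.BigEndian.AppendUint32(mem, f.Version)
	mem = binary.BigEndian.AppendUint32(mem, f.CRC)
	return mem
}

func main() {
	in := footer{NumDocs: 3, StoredIndexOffset: 120, FieldsIndexOffset: 400,
		DocValueOffset: 300, ChunkFactor: 1024, Version: 16, CRC: 0xdeadbeef}
	mem := appendFooter([]byte("body bytes"), in)
	out, err := parseFooter(mem)
	fmt.Printf("%+v %v\n", out, err)
}
```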

Documentation ¶

Index ¶

Constants
Variables
func FSTValDecode1Hit(v uint64) (docNum uint64, normBits uint64)
func FSTValEncode1Hit(docNum uint64, normBits uint64) uint64
func PersistSegmentBase(sb *SegmentBase, path string) error
func PostingsIteratorFrom1Hit(docNum1Hit uint64, includeFreqNorm, includeLocs bool) (segment.PostingsIterator, error)
func PostingsIteratorFromBitmap(bm *roaring.Bitmap, includeFreqNorm, includeLocs bool) (segment.PostingsIterator, error)
func ReadDocValueBoundary(chunk int, metaHeaders []MetaData) (uint64, uint64)
type CountHashWriter
- func NewCountHashWriter(w io.Writer) *CountHashWriter
- func NewCountHashWriterWithStatsReporter(w io.Writer, s segment.StatsReporter) *CountHashWriter
- func (c *CountHashWriter) Count() int
- func (c *CountHashWriter) Sum32() uint32
- func (c *CountHashWriter) Write(b []byte) (int, error)
type Dictionary
- func (d *Dictionary) AutomatonIterator(a segment.Automaton, startKeyInclusive, endKeyExclusive []byte) segment.DictionaryIterator
- func (d *Dictionary) BytesRead() uint64
- func (d *Dictionary) BytesWritten() uint64
- func (d *Dictionary) Cardinality() int
- func (d *Dictionary) Contains(key []byte) (bool, error)
- func (d *Dictionary) PostingsList(term []byte, except *roaring.Bitmap, prealloc segment.PostingsList) (segment.PostingsList, error)
- func (d *Dictionary) ResetBytesRead(val uint64)
type DictionaryIterator
- func (i *DictionaryIterator) Next() (*index.DictEntry, error)
type Location
- func (l *Location) ArrayPositions() []uint64
- func (l *Location) End() uint64
- func (l *Location) Field() string
- func (l *Location) Pos() uint64
- func (l *Location) Size() int
- func (l *Location) Start() uint64
type MetaData
type Posting
- func (p *Posting) Frequency() uint64
- func (p *Posting) Locations() []segment.Location
- func (p *Posting) Norm() float64
- func (p *Posting) NormUint64() uint64
- func (p *Posting) Number() uint64
- func (p *Posting) Size() int
type PostingsIterator
- func (p *PostingsIterator) ActualBitmap() *roaring.Bitmap
- func (i *PostingsIterator) Advance(docNum uint64) (segment.Posting, error)
- func (i *PostingsIterator) BytesRead() uint64
- func (i *PostingsIterator) BytesWritten() uint64
- func (p *PostingsIterator) DocNum1Hit() (uint64, bool)
- func (i *PostingsIterator) Next() (segment.Posting, error)
- func (p *PostingsIterator) ReplaceActual(abm *roaring.Bitmap)
- func (i *PostingsIterator) ResetBytesRead(val uint64)
- func (i *PostingsIterator) Size() int
type PostingsList
- func (p *PostingsList) BytesRead() uint64
- func (p *PostingsList) BytesWritten() uint64
- func (p *PostingsList) Count() uint64
- func (p *PostingsList) Iterator(includeFreq, includeNorm, includeLocs bool, prealloc segment.PostingsIterator) segment.PostingsIterator
- func (p *PostingsList) OrInto(receiver *roaring.Bitmap)
- func (p *PostingsList) ResetBytesRead(val uint64)
- func (p *PostingsList) Size() int
type Segment
- func (s *Segment) AddRef()
- func (s *Segment) BytesRead() uint64
- func (s *Segment) BytesWritten() uint64
- func (s *Segment) CRC() uint32
- func (s *Segment) ChunkMode() uint32
- func (s *Segment) Close() (err error)
- func (s *Segment) Data() []byte
- func (s *Segment) DecRef() (err error)
- func (s *Segment) DictAddr(field string) (uint64, error)
- func (s *Segment) DocValueOffset() uint64
- func (s *Segment) FieldsIndexOffset() uint64
- func (s *Segment) NumDocs() uint64
- func (s *Segment) Path() string
- func (s *Segment) ResetBytesRead(val uint64)
- func (s *Segment) Size() int
- func (s *Segment) StoredIndexOffset() uint64
- func (s *Segment) ThesaurusAddr(name string) (uint64, error)
- func (s *Segment) Version() uint32
type SegmentBase
- func InitSegmentBase(mem []byte, memCRC uint32, chunkMode uint32, numDocs uint64, ...) (*SegmentBase, error)
- func (sb *SegmentBase) AddRef()
- func (s *SegmentBase) BytesRead() uint64
- func (s *SegmentBase) BytesWritten() uint64
- func (sb *SegmentBase) Close() (err error)
- func (s *SegmentBase) Count() uint64
- func (sb *SegmentBase) DecRef() (err error)
- func (s *SegmentBase) Dictionary(field string) (segment.TermDictionary, error)
- func (s *SegmentBase) DocID(num uint64) ([]byte, error)
- func (s *SegmentBase) DocNumbers(ids []string) (*roaring.Bitmap, error)
- func (s *SegmentBase) Fields() []string
- func (sb *SegmentBase) Persist(path string) error
- func (s *SegmentBase) ResetBytesRead(val uint64)
- func (sb *SegmentBase) Size() int
- func (s *SegmentBase) Thesaurus(name string) (segment.Thesaurus, error)
- func (s *SegmentBase) VisitDocValues(localDocNum uint64, fields []string, visitor index.DocValueVisitor, ...) (segment.DocVisitState, error)
- func (s *SegmentBase) VisitStoredFields(num uint64, visitor segment.StoredFieldValueVisitor) error
- func (s *SegmentBase) VisitableDocValueFields() ([]string, error)
- func (sb *SegmentBase) WriteTo(w io.Writer) (int64, error)
type Synonym
- func (s *Synonym) Number() uint32
- func (p *Synonym) Size() int
- func (s *Synonym) Term() string
type SynonymsIterator
- func (i *SynonymsIterator) Next() (segment.Synonym, error)
- func (i *SynonymsIterator) Size() int
type SynonymsList
- func (s *SynonymsList) Iterator(prealloc segment.SynonymsIterator) segment.SynonymsIterator
- func (p *SynonymsList) Size() int
type Thesaurus
- func (t *Thesaurus) AutomatonIterator(a segment.Automaton, startKeyInclusive, endKeyExclusive []byte) segment.ThesaurusIterator
- func (t *Thesaurus) Contains(key []byte) (bool, error)
- func (t *Thesaurus) SynonymsList(term []byte, except *roaring.Bitmap, prealloc segment.SynonymsList) (segment.SynonymsList, error)
type ThesaurusIterator
- func (i *ThesaurusIterator) Next() (*index.ThesaurusEntry, error)
type ZapPlugin
- func (*ZapPlugin) Merge(segments []seg.Segment, drops []*roaring.Bitmap, path string, ...) ([][]uint64, uint64, error)
- func (z *ZapPlugin) New(results []index.Document) (segment.Segment, uint64, error)
- func (*ZapPlugin) Open(path string) (segment.Segment, error)
- func (*ZapPlugin) Type() string
- func (*ZapPlugin) Version() uint32

Constants ¶

const (
	SectionInvertedTextIndex = iota
	SectionFaissVectorIndex
	SectionSynonymIndex
)

const DocNum1HitFinished = math.MaxUint64

const FSTValEncoding1Hit = uint64(0x8000000000000000)

const FSTValEncodingGeneral = uint64(0x0000000000000000)

const FSTValEncodingMask = uint64(0xc000000000000000)

const FooterSize = 4 + 4 + 4 + 8 + 8 + 8 + 8 + 8

FooterSize is the size of the footer record in bytes: crc + ver + chunk + docValueOffset + sectionsIndexOffset + field offset + stored offset + num docs.

const IndexSectionsVersion uint32 = 16

const Type string = "zap"

const Version uint32 = 16

Variables ¶

var DefaultChunkMode uint32 = 1026

DefaultChunkMode is the most recent improvement to chunking and should be used by default.

var DefaultFileMergerBufferSize = 1024 * 1024

var ErrChunkSizeZero = errors.New("chunk size is zero")

var LegacyChunkMode uint32 = 1024

LegacyChunkMode was the original chunk mode (always chunk size 1024). This mode is still used for chunking doc values.

var NewSegmentBufferAvgBytesPerDocFactor float64 = 1.0

var NewSegmentBufferNumResultsBump int = 100

var NewSegmentBufferNumResultsFactor float64 = 1.0

var NormBits1Hit = uint64(1)

var SizeOfBool int

var SizeOfFloat32 int

var SizeOfFloat64 int

var SizeOfInt int

var SizeOfMap int

var SizeOfPtr int

var SizeOfSlice int

var SizeOfString int

var SizeOfUint16 int

var SizeOfUint32 int

var SizeOfUint64 int

var SizeOfUint8 int

var ValidateDocFields = func(field index.Field) error {
	return nil
}

ValidateDocFields can be set by applications to perform additional checks on fields in a document being added to a new segment. By default it does nothing. This API is experimental and may be removed at any time.

Functions ¶

func FSTValDecode1Hit ¶

func FSTValDecode1Hit(v uint64) (docNum uint64, normBits uint64)

func FSTValEncode1Hit ¶

func FSTValEncode1Hit(docNum uint64, normBits uint64) uint64
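
The bit layout behind the 1-hit encoding is not documented on this page. The sketch below is one plausible packing, assuming 31 bits for the doc number and 31 bits for the norm underneath the two flag bits; the function names and the 31-bit widths are assumptions, while the flag and mask values are copied from the constants above.

```go
package main

import "fmt"

const (
	flag1Hit   = uint64(0x8000000000000000) // same value as FSTValEncoding1Hit
	flagMask   = uint64(0xc000000000000000) // same value as FSTValEncodingMask
	mask31Bits = uint64(0x7fffffff)         // assumed width of each packed value
)

// encode1Hit packs a doc number and norm bits into one uint64 under the
// 1-hit flag. Values wider than 31 bits are truncated in this sketch.
func encode1Hit(docNum, normBits uint64) uint64 {
	return flag1Hit | ((normBits & mask31Bits) << 31) | (docNum & mask31Bits)
}

// decode1Hit reverses encode1Hit.
func decode1Hit(v uint64) (docNum, normBits uint64) {
	return v & mask31Bits, (v >> 31) & mask31Bits
}

// is1Hit reports whether the two flag bits mark v as a 1-hit value.
func is1Hit(v uint64) bool {
	return v&flagMask == flag1Hit
}

func main() {
	v := encode1Hit(42, 1)
	docNum, normBits := decode1Hit(v)
	fmt.Println(is1Hit(v), docNum, normBits) // prints "true 42 1"
}
```

Packing a single hit into the FST value lets a term that occurs in exactly one document skip the postings list entirely.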

func PersistSegmentBase ¶

func PersistSegmentBase(sb *SegmentBase, path string) error

PersistSegmentBase persists SegmentBase in the zap file format.

func PostingsIteratorFrom1Hit ¶

func PostingsIteratorFrom1Hit(docNum1Hit uint64,
	includeFreqNorm, includeLocs bool) (segment.PostingsIterator, error)

PostingsIteratorFrom1Hit constructs a PostingsIterator given a 1-hit docNum.

func PostingsIteratorFromBitmap ¶

func PostingsIteratorFromBitmap(bm *roaring.Bitmap,
	includeFreqNorm, includeLocs bool) (segment.PostingsIterator, error)

PostingsIteratorFromBitmap constructs a PostingsIterator given an "actual" bitmap.

func ReadDocValueBoundary ¶

func ReadDocValueBoundary(chunk int, metaHeaders []MetaData) (uint64, uint64)

ReadDocValueBoundary elicits the start, end offsets from a metaData header slice
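
One way to read this, offered as an assumption rather than the package's implementation: if each header's DocDvOffset marks where that entry's data ends inside the chunk, then an entry's data starts where the previous entry's ends. The `metaData` type below copies the two documented fields, and `docValueBoundary` is a hypothetical stand-in.

```go
package main

import "fmt"

// metaData copies the two exported fields of the MetaData type documented
// for this package.
type metaData struct {
	DocNum      uint64
	DocDvOffset uint64
}

// docValueBoundary is a hypothetical stand-in. It assumes DocDvOffset records
// the end of each entry's data, so entry i spans from the previous entry's
// offset (or 0 for the first entry) to its own offset.
func docValueBoundary(i int, headers []metaData) (start, end uint64) {
	if i > 0 {
		start = headers[i-1].DocDvOffset
	}
	return start, headers[i].DocDvOffset
}

func main() {
	headers := []metaData{
		{DocNum: 0, DocDvOffset: 12},
		{DocNum: 1, DocDvOffset: 12}, // no doc value data for this doc
		{DocNum: 2, DocDvOffset: 30},
	}
	for i := range headers {
		start, end := docValueBoundary(i, headers)
		fmt.Println(headers[i].DocNum, start, end)
	}
}
```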

Types ¶

type CountHashWriter ¶

type CountHashWriter struct {
	// contains filtered or unexported fields
}

CountHashWriter is a wrapper around a Writer which counts the number of bytes which have been written and computes a crc32 hash

func NewCountHashWriter ¶

func NewCountHashWriter(w io.Writer) *CountHashWriter

NewCountHashWriter returns a CountHashWriter which wraps the provided Writer

func NewCountHashWriterWithStatsReporter ¶

func NewCountHashWriterWithStatsReporter(w io.Writer, s segment.StatsReporter) *CountHashWriter

func (*CountHashWriter) Count ¶

func (c *CountHashWriter) Count() int

Count returns the number of bytes written

func (*CountHashWriter) Sum32 ¶

func (c *CountHashWriter) Sum32() uint32

Sum32 returns the CRC-32 hash of the content written to this writer

func (*CountHashWriter) Write ¶

func (c *CountHashWriter) Write(b []byte) (int, error)

Write writes the provided bytes to the wrapped writer and counts the bytes

type Dictionary ¶

type Dictionary struct {
	// contains filtered or unexported fields
}

Dictionary is the zap representation of the term dictionary

func (*Dictionary) AutomatonIterator ¶

func (d *Dictionary) AutomatonIterator(a segment.Automaton,
	startKeyInclusive, endKeyExclusive []byte) segment.DictionaryIterator

AutomatonIterator returns an iterator which only visits terms accepted by the vellum automaton and within the start/end key range

func (*Dictionary) BytesRead ¶

func (d *Dictionary) BytesRead() uint64

func (*Dictionary) BytesWritten ¶

func (d *Dictionary) BytesWritten() uint64

func (*Dictionary) Cardinality ¶ added in v16.2.0

func (d *Dictionary) Cardinality() int

func (*Dictionary) Contains ¶

func (d *Dictionary) Contains(key []byte) (bool, error)

func (*Dictionary) PostingsList ¶

func (d *Dictionary) PostingsList(term []byte, except *roaring.Bitmap,
	prealloc segment.PostingsList) (segment.PostingsList, error)

PostingsList returns the postings list for the specified term

func (*Dictionary) ResetBytesRead ¶

func (d *Dictionary) ResetBytesRead(val uint64)

type DictionaryIterator ¶

type DictionaryIterator struct {
	// contains filtered or unexported fields
}

DictionaryIterator is an iterator for term dictionary

func (*DictionaryIterator) Next ¶

func (i *DictionaryIterator) Next() (*index.DictEntry, error)

Next returns the next entry in the dictionary

type Location ¶

type Location struct {
	// contains filtered or unexported fields
}

Location represents the location of a single occurrence

func (*Location) ArrayPositions ¶

func (l *Location) ArrayPositions() []uint64

ArrayPositions returns the array position vector associated with this occurrence

func (*Location) End ¶

func (l *Location) End() uint64

End returns the end byte offset of this occurrence

func (*Location) Field ¶

func (l *Location) Field() string

Field returns the name of the field (useful in composite fields to know which original field the value came from)

func (*Location) Pos ¶

func (l *Location) Pos() uint64

Pos returns the 1-based phrase position of this occurrence

func (*Location) Size ¶

func (l *Location) Size() int

func (*Location) Start ¶

func (l *Location) Start() uint64

Start returns the start byte offset of this occurrence

type MetaData ¶

type MetaData struct {
	DocNum      uint64 // docNum of the data inside the chunk
	DocDvOffset uint64 // offset of data inside the chunk for the given docid
}

MetaData represents the data information inside a chunk.

type Posting ¶

type Posting struct {
	// contains filtered or unexported fields
}

Posting is a single entry in a postings list

func (*Posting) Frequency ¶

func (p *Posting) Frequency() uint64

Frequency returns the frequencies of occurrence of this term in this doc/field

func (*Posting) Locations ¶

func (p *Posting) Locations() []segment.Location

Locations returns the location information for each occurrence

func (*Posting) Norm ¶

func (p *Posting) Norm() float64

Norm returns the normalization factor for this posting

func (*Posting) NormUint64 ¶

func (p *Posting) NormUint64() uint64

NormUint64 returns the norm value as uint64

func (*Posting) Number ¶

func (p *Posting) Number() uint64

Number returns the document number of this posting in this segment

func (*Posting) Size ¶

func (p *Posting) Size() int

type PostingsIterator ¶

type PostingsIterator struct {
	Actual   roaring.IntPeekable
	ActualBM *roaring.Bitmap
	// contains filtered or unexported fields
}

PostingsIterator provides a way to iterate through the postings list

func (*PostingsIterator) ActualBitmap ¶

func (p *PostingsIterator) ActualBitmap() *roaring.Bitmap

ActualBitmap returns the underlying actual bitmap which can be used up the stack for optimizations

func (*PostingsIterator) Advance ¶

func (i *PostingsIterator) Advance(docNum uint64) (segment.Posting, error)

Advance returns the posting at the specified docNum or, if it is not present, the next posting. If the end is reached, it returns nil.

func (*PostingsIterator) BytesRead ¶

func (i *PostingsIterator) BytesRead() uint64

func (*PostingsIterator) BytesWritten ¶

func (i *PostingsIterator) BytesWritten() uint64

func (*PostingsIterator) DocNum1Hit ¶

func (p *PostingsIterator) DocNum1Hit() (uint64, bool)

DocNum1Hit returns the docNum and true if this is "1-hit" optimized and the docNum is available.

func (*PostingsIterator) Next ¶

func (i *PostingsIterator) Next() (segment.Posting, error)

Next returns the next posting on the postings list, or nil at the end

func (*PostingsIterator) ReplaceActual ¶

func (p *PostingsIterator) ReplaceActual(abm *roaring.Bitmap)

ReplaceActual replaces the ActualBM with the provided bitmap

func (*PostingsIterator) ResetBytesRead ¶

func (i *PostingsIterator) ResetBytesRead(val uint64)

Implements the segment.DiskStatsReporter interface. The purpose of this implementation is to get the bytes read from the disk, which includes the freqNorm and location-specific information of a hit.

func (*PostingsIterator) Size ¶

func (i *PostingsIterator) Size() int

type PostingsList ¶

type PostingsList struct {
	// contains filtered or unexported fields
}

PostingsList is an in-memory representation of a postings list

func (*PostingsList) BytesRead ¶

func (p *PostingsList) BytesRead() uint64

func (*PostingsList) BytesWritten ¶

func (p *PostingsList) BytesWritten() uint64

func (*PostingsList) Count ¶

func (p *PostingsList) Count() uint64

Count returns the number of items on this postings list

func (*PostingsList) Iterator ¶

func (p *PostingsList) Iterator(includeFreq, includeNorm, includeLocs bool,
	prealloc segment.PostingsIterator) segment.PostingsIterator

Iterator returns an iterator for this postings list

func (*PostingsList) OrInto ¶

func (p *PostingsList) OrInto(receiver *roaring.Bitmap)

func (*PostingsList) ResetBytesRead ¶

func (p *PostingsList) ResetBytesRead(val uint64)

Implements the segment.DiskStatsReporter interface. The purpose of this implementation is to get the bytes read from the postings lists stored on disk while querying.

func (*PostingsList) Size ¶

func (p *PostingsList) Size() int

type Segment ¶

type Segment struct {
	SegmentBase
	// contains filtered or unexported fields
}

Segment implements a persisted segment.Segment interface, by embedding an mmap()'ed SegmentBase.

func (*Segment) AddRef ¶

func (s *Segment) AddRef()

func (*Segment) BytesRead ¶

func (s *Segment) BytesRead() uint64

func (*Segment) BytesWritten ¶

func (s *Segment) BytesWritten() uint64

func (*Segment) CRC ¶

func (s *Segment) CRC() uint32

CRC returns the CRC value stored in the file footer

func (*Segment) ChunkMode ¶

func (s *Segment) ChunkMode() uint32

ChunkMode returns the chunk mode in the file footer

func (*Segment) Close ¶

func (s *Segment) Close() (err error)

Close releases all resources associated with this segment

func (*Segment) Data ¶

func (s *Segment) Data() []byte

Data returns the underlying mmaped data slice

func (*Segment) DecRef ¶

func (s *Segment) DecRef() (err error)

func (*Segment) DictAddr ¶

func (s *Segment) DictAddr(field string) (uint64, error)

DictAddr is a helper function to compute the file offset where the dictionary is stored for the specified field.

func (*Segment) DocValueOffset ¶

func (s *Segment) DocValueOffset() uint64

DocValueOffset returns the docValue offset in the file footer

func (*Segment) FieldsIndexOffset ¶

func (s *Segment) FieldsIndexOffset() uint64

FieldsIndexOffset returns the fields index offset in the file footer

func (*Segment) NumDocs ¶

func (s *Segment) NumDocs() uint64

NumDocs returns the number of documents in the file footer

func (*Segment) Path ¶

func (s *Segment) Path() string

Path returns the path of this segment on disk

func (*Segment) ResetBytesRead ¶

func (s *Segment) ResetBytesRead(val uint64)

Implements the segment.DiskStatsReporter interface. Only the persistedSegment type implements the interface, as the intention is to retrieve the bytes read from the on-disk segment as part of the current query.

func (*Segment) Size ¶

func (s *Segment) Size() int

func (*Segment) StoredIndexOffset ¶

func (s *Segment) StoredIndexOffset() uint64

StoredIndexOffset returns the stored value index offset in the file footer

func (*Segment) ThesaurusAddr ¶ added in v16.2.0

func (s *Segment) ThesaurusAddr(name string) (uint64, error)

ThesaurusAddr is a helper function to compute the file offset where the thesaurus is stored with the specified name.

func (*Segment) Version ¶

func (s *Segment) Version() uint32

Version returns the file version in the file footer

type SegmentBase ¶

type SegmentBase struct {
	// contains filtered or unexported fields
}

SegmentBase is a memory only, read-only implementation of the segment.Segment interface, using zap's data representation.

func InitSegmentBase ¶

func InitSegmentBase(mem []byte, memCRC uint32, chunkMode uint32, numDocs uint64,
	storedIndexOffset uint64, sectionsIndexOffset uint64) (*SegmentBase, error)

func (*SegmentBase) AddRef ¶

func (sb *SegmentBase) AddRef()

func (*SegmentBase) BytesRead ¶

func (s *SegmentBase) BytesRead() uint64

func (*SegmentBase) BytesWritten ¶

func (s *SegmentBase) BytesWritten() uint64

func (*SegmentBase) Close ¶

func (sb *SegmentBase) Close() (err error)

func (*SegmentBase) Count ¶

func (s *SegmentBase) Count() uint64

Count returns the number of documents in this segment.

func (*SegmentBase) DecRef ¶

func (sb *SegmentBase) DecRef() (err error)

func (*SegmentBase) Dictionary ¶

func (s *SegmentBase) Dictionary(field string) (segment.TermDictionary, error)

Dictionary returns the term dictionary for the specified field

func (*SegmentBase) DocID ¶

func (s *SegmentBase) DocID(num uint64) ([]byte, error)

DocID returns the value of the _id field for the given docNum

func (*SegmentBase) DocNumbers ¶

func (s *SegmentBase) DocNumbers(ids []string) (*roaring.Bitmap, error)

DocNumbers returns a bitset corresponding to the doc numbers of all the provided _id strings

func (*SegmentBase) Fields ¶

func (s *SegmentBase) Fields() []string

Fields returns the field names used in this segment

func (*SegmentBase) Persist ¶

func (sb *SegmentBase) Persist(path string) error

func (*SegmentBase) ResetBytesRead ¶

func (s *SegmentBase) ResetBytesRead(val uint64)

func (*SegmentBase) Size ¶

func (sb *SegmentBase) Size() int

func (*SegmentBase) Thesaurus ¶ added in v16.2.0

func (s *SegmentBase) Thesaurus(name string) (segment.Thesaurus, error)

Thesaurus returns the thesaurus with the specified name, or an empty thesaurus if not found.

func (*SegmentBase) VisitDocValues ¶

func (s *SegmentBase) VisitDocValues(localDocNum uint64, fields []string,
	visitor index.DocValueVisitor, dvsIn segment.DocVisitState) (
	segment.DocVisitState, error)

VisitDocValues is an implementation of the DocValueVisitable interface

func (*SegmentBase) VisitStoredFields ¶

func (s *SegmentBase) VisitStoredFields(num uint64, visitor segment.StoredFieldValueVisitor) error

VisitStoredFields invokes the StoredFieldValueVisitor for each stored field for the specified doc number

func (*SegmentBase) VisitableDocValueFields ¶

func (s *SegmentBase) VisitableDocValueFields() ([]string, error)

VisitableDocValueFields returns the list of fields with persisted doc value terms ready to be visitable using the VisitDocumentFieldTerms method.

func (*SegmentBase) WriteTo ¶

func (sb *SegmentBase) WriteTo(w io.Writer) (int64, error)

WriteTo is an implementation of io.WriterTo interface.

type Synonym ¶ added in v16.2.0

type Synonym struct {
	// contains filtered or unexported fields
}

Synonym represents a single synonym, containing the term, synonymID, and document number.

func (*Synonym) Number ¶ added in v16.2.0

func (s *Synonym) Number() uint32

Number returns the document number of the Synonym.

func (*Synonym) Size ¶ added in v16.2.0

func (p *Synonym) Size() int

Size returns the memory size of the Synonym, including the length of the term string.

func (*Synonym) Term ¶ added in v16.2.0

func (s *Synonym) Term() string

Term returns the term of the Synonym.

type SynonymsIterator ¶ added in v16.2.0

type SynonymsIterator struct {
	Actual   roaring64.IntPeekable64
	ActualBM *roaring64.Bitmap
	// contains filtered or unexported fields
}

SynonymsIterator provides a way to iterate through the synonyms list.

func (*SynonymsIterator) Next ¶ added in v16.2.0

func (i *SynonymsIterator) Next() (segment.Synonym, error)

Next returns the next Synonym in the iteration or an error if the end is reached.

func (*SynonymsIterator) Size ¶ added in v16.2.0

func (i *SynonymsIterator) Size() int

type SynonymsList ¶ added in v16.2.0

type SynonymsList struct {
	// contains filtered or unexported fields
}

SynonymsList represents a list of synonyms for a term, stored in a Roaring64 bitmap.

func (*SynonymsList) Iterator ¶ added in v16.2.0

func (s *SynonymsList) Iterator(prealloc segment.SynonymsIterator) segment.SynonymsIterator

Iterator creates and returns a SynonymsIterator for the SynonymsList. If the synonyms bitmap is nil, it returns an empty iterator.

func (*SynonymsList) Size ¶ added in v16.2.0

func (p *SynonymsList) Size() int

type Thesaurus ¶ added in v16.2.0

type Thesaurus struct {
	// contains filtered or unexported fields
}

Thesaurus is the zap representation of a Thesaurus

func (*Thesaurus) AutomatonIterator ¶ added in v16.2.0

func (t *Thesaurus) AutomatonIterator(a segment.Automaton,
	startKeyInclusive, endKeyExclusive []byte) segment.ThesaurusIterator

AutomatonIterator returns an iterator which only visits terms accepted by the vellum automaton and within the start/end key range

func (*Thesaurus) Contains ¶ added in v16.2.0

func (t *Thesaurus) Contains(key []byte) (bool, error)

func (*Thesaurus) SynonymsList ¶ added in v16.2.0

func (t *Thesaurus) SynonymsList(term []byte, except *roaring.Bitmap, prealloc segment.SynonymsList) (segment.SynonymsList, error)

SynonymsList returns the synonyms list for the specified term

type ThesaurusIterator ¶ added in v16.2.0

type ThesaurusIterator struct {
	// contains filtered or unexported fields
}

ThesaurusIterator is an iterator for a thesaurus

func (*ThesaurusIterator) Next ¶ added in v16.2.0

func (i *ThesaurusIterator) Next() (*index.ThesaurusEntry, error)

Next returns the next entry in the thesaurus

type ZapPlugin ¶

type ZapPlugin struct{}

ZapPlugin implements the Plugin interface of the blevesearch/scorch_segment_api pkg

func (*ZapPlugin) Merge ¶

func (*ZapPlugin) Merge(segments []seg.Segment, drops []*roaring.Bitmap, path string,
	closeCh chan struct{}, s seg.StatsReporter) (
	[][]uint64, uint64, error)

Merge takes a slice of segments and bit masks describing which documents may be dropped, and creates a new segment containing the remaining data. This new segment is built at the specified path.

func (*ZapPlugin) New ¶

func (z *ZapPlugin) New(results []index.Document) (
	segment.Segment, uint64, error)

New creates an in-memory zap-encoded SegmentBase from a set of Documents

func (*ZapPlugin) Open ¶

func (*ZapPlugin) Open(path string) (segment.Segment, error)

Open returns a zap impl of a segment

func (*ZapPlugin) Type ¶

func (*ZapPlugin) Type() string

func (*ZapPlugin) Version ¶

func (*ZapPlugin) Version() uint32

Directories ¶

Path	Synopsis
cmd
zap
zap/cmd
