extsort

package
v1.1.0-beta.0...-a3cc774 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 14, 2025 License: Apache-2.0 Imports: 24 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type DiskSorter

type DiskSorter struct {
	// contains filtered or unexported fields
}

DiskSorter is an external sorter that sorts data on disk.

func OpenDiskSorter

func OpenDiskSorter(dirname string, opts *DiskSorterOptions) (*DiskSorter, error)

OpenDiskSorter opens a DiskSorter with the given directory.

func (*DiskSorter) Close

func (d *DiskSorter) Close() error

Close implements the ExternalSorter.Close.

func (*DiskSorter) CloseAndCleanup

func (d *DiskSorter) CloseAndCleanup() error

CloseAndCleanup implements the ExternalSorter.CloseAndCleanup.

func (*DiskSorter) IsSorted

func (d *DiskSorter) IsSorted() bool

IsSorted implements the ExternalSorter.IsSorted.

func (*DiskSorter) NewIterator

func (d *DiskSorter) NewIterator(_ context.Context) (Iterator, error)

NewIterator implements the ExternalSorter.NewIterator.

func (*DiskSorter) NewWriter

func (d *DiskSorter) NewWriter(_ context.Context) (Writer, error)

NewWriter implements the ExternalSorter.NewWriter.

func (*DiskSorter) Sort

func (d *DiskSorter) Sort(ctx context.Context) error

Sort implements the ExternalSorter.Sort.

type DiskSorterOptions

type DiskSorterOptions struct {
	// Concurrency is the maximum number of goroutines that can be used to
	// sort data in parallel.
	//
	// The default value is runtime.GOMAXPROCS(0).
	Concurrency int

	// WriterBufferSize is the size of the buffer used by the writer.
	// Larger buffer size can improve the write and sort performance,
	// and reduce the number of disk operations.
	//
	// The default value is 128MB.
	WriterBufferSize int

	// CompactionThreshold is maximum overlap depth necessary to trigger a
	// compaction. The overlap depth is the number of files that overlap at
	// same interval.
	//
	// For example, consider the following files:
	//
	// file 0: a-----d
	// file 1:   b-----e
	// file 2:     c-------g
	// file 3:       d---f
	//
	// The overlap depth of these files is 3, because file 0, 1, 2 overlap at
	// the interval [c, d), and file 1, 2, 3 overlap at the interval [d, e).
	//
	// If the overlap depth reached CompactionThreshold, the sorter will compact
	// files to reduce the overlap depth during sorting. The larger the overlap
	// depth, the larger read amplification will be during iteration. This is a
	// trade-off between read amplification and sorting cost. Setting this value
	// to math.MaxInt will disable the compaction.
	//
	// The default value is 16.
	CompactionThreshold int

	// MaxCompactionDepth is the maximum files involved in a single compaction.
	// The minimum value is 2. Any value less than 2 will be treated as not set.
	//
	// The default value is 64.
	MaxCompactionDepth int

	// MaxCompactionSize is the maximum size of key-value pairs involved in a
	// single compaction.
	//
	// The default value is 512MB.
	MaxCompactionSize int

	// Logger is used to write log messages.
	//
	// The default value is log.L().
	Logger *zap.Logger
}

DiskSorterOptions holds the optional parameters for DiskSorter.

type ExternalSorter

type ExternalSorter interface {
	// NewWriter creates a new writer for writing key-value pairs before sorting.
	// If the sorter starts sorting or is already sorted, it will return an error.
	//
	// Multiple writers can be opened and used concurrently.
	NewWriter(ctx context.Context) (Writer, error)
	// Sort sorts the key-value pairs written by the writer.
	// It should be called after all open writers are closed.
	//
	// Implementations should guarantee that Sort() is idempotent and atomic.
	// If it returns an error, or process is killed during Sort(), the sorter should be able
	// to recover from the error or crash, and the external storage should not be corrupted.
	Sort(ctx context.Context) error
	// IsSorted returns true if the sorter is already sorted, iterators are ready to create.
	IsSorted() bool
	// NewIterator creates a new iterator for iterating over the key-value pairs after sorting.
	// If the sorter is not sorted yet, it will return an error.
	//
	// Multiple iterators can be opened and used concurrently.
	NewIterator(ctx context.Context) (Iterator, error)
	// Close releases all resources held by the sorter. It will not clean up the external storage,
	// so the sorter can recover from a crash.
	Close() error
	// CloseAndCleanup closes the external sorter and cleans up all resources created by the sorter.
	CloseAndCleanup() error
}

ExternalSorter is an interface for sorting key-value pairs in external storage. The key-value pairs are sorted by the key, duplicate keys are automatically removed.

type Iterator

type Iterator interface {
	// Seek moves the iterator to the first key-value pair whose key is greater
	// than or equal to the given key.
	Seek(key []byte) bool
	// First moves the iterator to the first key-value pair.
	First() bool
	// Next moves the iterator to the next key-value pair.
	//
	// Implementations must guarantee the next key is greater than the current key.
	Next() bool
	// Last moves the iterator to the last key-value pair.
	Last() bool
	// Valid returns true if the iterator is positioned at a valid key-value pair.
	Valid() bool
	// Error returns the error, if any, that was encountered during iteration.
	Error() error
	// UnsafeKey returns the key of the current key-value pair, without copying.
	// The memory is only valid until the next call to change the iterator.
	UnsafeKey() []byte
	// UnsafeValue returns the value of the current key-value pair, without copying.
	// The memory is only valid until the next call to change the iterator.
	UnsafeValue() []byte
	// Close releases all resources held by the iterator.
	Close() error
}

Iterator is an interface for iterating over the key-value pairs after sorting.

type Writer

type Writer interface {
	// Put adds a key-value pair to the writer.
	//
	// Implementations must not modify or retain slices passed to Put().
	Put(key, value []byte) error
	// Flush flushes all buffered key-value pairs to the external sorter.
	// the writer can be reused after calling Flush().
	Flush() error
	// Close flushes all buffered key-value pairs to the external sorter,
	// and releases all resources held by the writer.
	Close() error
}

Writer is an interface for writing key-value pairs to the external sorter.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL