extsort

package
v0.0.0-...-547a3fa Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 27, 2024 License: Apache-2.0 Imports: 11 Imported by: 0

Documentation

Overview

Package extsort implements external sorting algorithm, it has several differnet design choices compared with alternatives like https://github.com/lanrat/extsort:

  • apply efficient compressions(delta encoding + snappy) to the chunk files to reduce IO cost, since the items are sorted, delta encoding should be effective to it, and snappy is pretty efficient.
  • chunks are stored in separate temporary files, so the chunk sorting and saving can run in parallel (eats more ram though).
  • clean interface, user just feed `[]byte` directly, and provides a compare function based on `[]byte`.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Spawn

func Spawn(tmpDir string, opts Options, bufferSize int) (chan []byte, chan []byte)

Spawn spawns a new goroutine to do the external sorting, returns two channels for user to interact.

Types

type DeltaDecoder

type DeltaDecoder struct {
	// contains filtered or unexported fields
}

DeltaDecoder decodes delta-encoded keys

func NewDeltaDecoder

func NewDeltaDecoder() *DeltaDecoder

func (*DeltaDecoder) Read

func (dd *DeltaDecoder) Read(
	reader reader,
) ([]byte, error)

type DeltaEncoder

type DeltaEncoder struct {
	// contains filtered or unexported fields
}

DeltaEncoder applies delta encoding to a sequence of keys

func NewDeltaEncoder

func NewDeltaEncoder() *DeltaEncoder

func (*DeltaEncoder) Write

func (de *DeltaEncoder) Write(w io.Writer, key []byte) error

type ExtSorter

type ExtSorter struct {
	// contains filtered or unexported fields
}

ExtSorter implements external sorting. It split the inputs into chunks, sort each chunk separately and save to disk file, then provide an iterator of sorted items by doing a k-way merge with all the sorted chunk files.

func New

func New(tmpDir string, opts Options) *ExtSorter

New creates a new `ExtSorter`.

func (*ExtSorter) Close

func (s *ExtSorter) Close() error

Close closes and remove all the temporary chunk files

func (*ExtSorter) Feed

func (s *ExtSorter) Feed(item []byte) error

Feed add un-ordered items to the sorter.

func (*ExtSorter) Finalize

func (s *ExtSorter) Finalize() (*MultiWayMerge, error)

Finalize wait for all chunking goroutines to finish, and return the merged sorted stream.

type LesserFunc

type LesserFunc func(a, b []byte) bool

LesserFunc compares two items in external sorter

type MultiWayMerge

type MultiWayMerge struct {
	// contains filtered or unexported fields
}

MultiWayMerge implements k-way merge using min-heap provided by golang builtin library, it implements `heap.Interface` interface.

func NewMultiWayMerge

func NewMultiWayMerge(streams []NextFunc, lesserFunc LesserFunc) (*MultiWayMerge, error)

NewMultiWayMerge initialize a new `MultiWayMerge` instance.

func (*MultiWayMerge) Len

func (merge *MultiWayMerge) Len() int

Len implements `heap.Interface`

func (*MultiWayMerge) Less

func (merge *MultiWayMerge) Less(i, j int) bool

Less implements `heap.Interface`

func (*MultiWayMerge) Next

func (merge *MultiWayMerge) Next() ([]byte, error)

Next provides an iterator that yields sorted items

func (*MultiWayMerge) Pop

func (merge *MultiWayMerge) Pop() interface{}

Pop implements `heap.Interface`

func (*MultiWayMerge) Push

func (merge *MultiWayMerge) Push(x interface{})

Push implements `heap.Interface`

func (*MultiWayMerge) Swap

func (merge *MultiWayMerge) Swap(i, j int)

Swap implements `heap.Interface`

type NextFunc

type NextFunc func() ([]byte, error)

NextFunc is a stream that yields bytes, it should return `nil` when stream is exhausted

type Options

type Options struct {
	// if apply delta encoding to the items in sorted chunk
	DeltaEncoding bool
	// if apply snappy compression to the items in sorted chunk
	SnappyCompression bool
	// Maxiumum uncompressed size of the sorted chunk
	MaxChunkSize int64
	// function that compares two items
	LesserFunc LesserFunc
}

Options defines configurable options for `ExtSorter`

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL