Documentation ¶
Overview ¶
Package extsort implements external sorting algorithm, it has several differnet design choices compared with alternatives like https://github.com/lanrat/extsort:
- apply efficient compressions(delta encoding + snappy) to the chunk files to reduce IO cost, since the items are sorted, delta encoding should be effective to it, and snappy is pretty efficient.
- chunks are stored in separate temporary files, so the chunk sorting and saving can run in parallel (eats more ram though).
- clean interface, user just feed `[]byte` directly, and provides a compare function based on `[]byte`.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type DeltaDecoder ¶
type DeltaDecoder struct {
// contains filtered or unexported fields
}
DeltaDecoder decodes delta-encoded keys
func NewDeltaDecoder ¶
func NewDeltaDecoder() *DeltaDecoder
func (*DeltaDecoder) Read ¶
func (dd *DeltaDecoder) Read( reader reader, ) ([]byte, error)
type DeltaEncoder ¶
type DeltaEncoder struct {
// contains filtered or unexported fields
}
DeltaEncoder applies delta encoding to a sequence of keys
func NewDeltaEncoder ¶
func NewDeltaEncoder() *DeltaEncoder
type ExtSorter ¶
type ExtSorter struct {
// contains filtered or unexported fields
}
ExtSorter implements external sorting. It split the inputs into chunks, sort each chunk separately and save to disk file, then provide an iterator of sorted items by doing a k-way merge with all the sorted chunk files.
func (*ExtSorter) Finalize ¶
func (s *ExtSorter) Finalize() (*MultiWayMerge, error)
Finalize wait for all chunking goroutines to finish, and return the merged sorted stream.
type LesserFunc ¶
LesserFunc compares two items in external sorter
type MultiWayMerge ¶
type MultiWayMerge struct {
// contains filtered or unexported fields
}
MultiWayMerge implements k-way merge using min-heap provided by golang builtin library, it implements `heap.Interface` interface.
func NewMultiWayMerge ¶
func NewMultiWayMerge(streams []NextFunc, lesserFunc LesserFunc) (*MultiWayMerge, error)
NewMultiWayMerge initialize a new `MultiWayMerge` instance.
func (*MultiWayMerge) Less ¶
func (merge *MultiWayMerge) Less(i, j int) bool
Less implements `heap.Interface`
func (*MultiWayMerge) Next ¶
func (merge *MultiWayMerge) Next() ([]byte, error)
Next provides an iterator that yields sorted items
func (*MultiWayMerge) Pop ¶
func (merge *MultiWayMerge) Pop() interface{}
Pop implements `heap.Interface`
func (*MultiWayMerge) Push ¶
func (merge *MultiWayMerge) Push(x interface{})
Push implements `heap.Interface`
func (*MultiWayMerge) Swap ¶
func (merge *MultiWayMerge) Swap(i, j int)
Swap implements `heap.Interface`
type NextFunc ¶
NextFunc is a stream that yields bytes, it should return `nil` when stream is exhausted
type Options ¶
type Options struct { // if apply delta encoding to the items in sorted chunk DeltaEncoding bool // if apply snappy compression to the items in sorted chunk SnappyCompression bool // Maxiumum uncompressed size of the sorted chunk MaxChunkSize int64 // function that compares two items LesserFunc LesserFunc }
Options defines configurable options for `ExtSorter`