disksort

package
v0.0.24 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 16, 2016 License: Apache-2.0 Imports: 12 Imported by: 3

Documentation

Overview

Package disksort implements sorting algorithms for sets of data too large to fit fully in-memory. If the number of elements becomes to large, data are paged onto the disk.

Index

Constants

View Source
const DefaultIOBufferSize = 2 << 13

DefaultIOBufferSize is the default size of the reading/writing buffers for the temporary file shards.

View Source
const DefaultMaxInMemory = 32000

DefaultMaxInMemory is the default number of elements to keep in-memory during a merge sort.

Variables

View Source
var (
	// ErrAlreadyFinalized is returned from Interface#Add and Interface#Read when
	// Interface#Read has already been called, freezing the sort's inputs/outputs.
	ErrAlreadyFinalized = errors.New("sorter already finalized")
)

Functions

This section is empty.

Types

type Interface

type Interface interface {
	// Add adds a new element to the set of data to be sorted.
	Add(i interface{}) error

	// Iterator returns an Iterator to read each of the elements previously added
	// to the set of data to be sorted.  Once Iterator is called, no more data may
	// be added to the sorter.  This is a lower-level version of Read.  Iterator
	// and Read may only be called once.
	Iterator() (Iterator, error)

	// Read calls f on each elements previously added the set of data to be
	// sorted.  If f returns an error, it is returned immediately and f is no
	// longer called.  Once Read is called, no more data may be added to the
	// sorter.  Iterator and Read may only be called once.
	Read(f func(interface{}) error) error
}

Interface is the standard interface for disk sorting algorithms. Each element in the set of data to be sorted is added to the sorter with Add. Once all elements are added, Read can then be called to retrieve each element sequentially in sorted order. Once Read is called, no other operations on the sorter are allowed.

func NewMergeSorter

func NewMergeSorter(opts MergeOptions) (Interface, error)

NewMergeSorter returns a new disk sorter using a mergesort algorithm.

type Iterator added in v0.0.22

type Iterator interface {
	// Next returns the next ordered element.  If none exist, an io.EOF error is
	// returned.
	Next() (interface{}, error)

	// Close releases all of the Iterator's used resources.  Each Iterator must be
	// closed after the client's last call to Next or stray temporary files may be
	// left on disk.
	Close() error
}

An Iterator reads each element, in order, in a sorted dataset.

type Marshaler

type Marshaler interface {
	// Marshal binary encodes the given element.
	Marshal(interface{}) ([]byte, error)

	// Unmarshal decodes the given encoding of an element.
	Unmarshal([]byte) (interface{}, error)
}

Marshaler is an interface to functions that can binary encode/decode elements.

type MergeOptions

type MergeOptions struct {
	// Lesser is the comparison function for sorting the given elements.
	Lesser sortutil.Lesser
	// Marshaler is used for encoding/decoding elements in temporary file shards.
	Marshaler Marshaler

	// WorkDir is the directory used for writing temporary file shards.  If empty,
	// the default directory for temporary files is used.
	WorkDir string

	// MaxInMemory is the maximum number of elements to keep in-memory before
	// paging them to a temporary file shard.  If non-positive, DefaultMaxInMemory
	// is used.
	MaxInMemory int

	// CompressShards determines whether the temporary file shards should be
	// compressed.
	CompressShards bool

	// IOBufferSize is the size of the reading/writing buffers for the temporary
	// file shards.  If non-positive, DefaultIOBufferSize is used.
	IOBufferSize int
}

MergeOptions specifies how to sort elements.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL