ranger

Version: v0.1.0
Published: Jan 8, 2023 License: MIT Imports: 7 Imported by: 2

README

RANGER

Download large files in parallel chunks in Go.

Why?

Current Go HTTP clients download files as a stream of bytes, usually with a buffer. If you consider every file an array of bytes, this means that when you initiate a download a connection is opened, and you receive an io.Reader. As you Read bytes off this Reader, more bytes are loaded up into an internal buffer (an in-memory byte array that stores a certain amount of data in the expectation that you'll read it soon). As you keep reading data, the HTTP client will fill the buffer up as fast as it can from the server.

So? Why is this a problem?

Most of the time this is what we want and need. But when we're downloading large files (say from Amazon S3 or CloudFront, or any other file server) this is usually not optimal. These services limit outgoing speed per connection, and a very large file (say over 25 GB) is also unlikely to be served from their caches. This means that the number of bytes coming in per second (throughput) is usually lower than what your connection actually supports.

What does Ranger do?

Ranger does two orthogonal things to speed up transfers. One, it downloads files in chunks: if a file has 1000 bytes, for example, it can download it in chunks of 100 bytes by requesting byte ranges 0-99, 100-199, 200-299 and so on, each with an HTTP GET that carries a Range header. This allows the service to cache each chunk, because even if the whole file is too large to cache, each chunk is still small enough to fit. See the CloudFront Developer Guide for more information.
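The chunking arithmetic can be sketched as follows. This is a minimal illustration, not Ranger's implementation: byteRange, header and splitRanges are hypothetical names, and the range is assumed to be inclusive on both ends, as in the HTTP Range header.

```go
package main

import "fmt"

// byteRange is a hypothetical inclusive range [From, To], for illustration only.
type byteRange struct {
	From, To int64
}

// header formats the range as an HTTP Range header value.
func (r byteRange) header() string {
	return fmt.Sprintf("bytes=%d-%d", r.From, r.To)
}

// splitRanges divides a file of the given length into chunks of at most
// chunkSize bytes. The last chunk is shorter when length is not a multiple
// of chunkSize.
func splitRanges(length, chunkSize int64) []byteRange {
	if length <= 0 || chunkSize <= 0 {
		return nil
	}
	var out []byteRange
	for from := int64(0); from < length; from += chunkSize {
		to := from + chunkSize - 1
		if to > length-1 {
			to = length - 1
		}
		out = append(out, byteRange{From: from, To: to})
	}
	return out
}

func main() {
	// The 1000-byte file from the text, in chunks of 100 bytes.
	for _, r := range splitRanges(1000, 100) {
		fmt.Println(r.header())
	}
}
```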

Two, it downloads upcoming chunks in parallel, so if the parallelism count is set to 3, in the example above it would download byte ranges 0-99, 100-199 and 200-299 in parallel, even while the first range is being Read. It will also start downloading the fourth range after the first one is read, and so on. This trades RAM for speed: dedicating 3 x 100 bytes of memory keeps three requests in flight at once. In practice, 8 MB to 16 MB is a good chunk size, especially if that lines up with the multipart upload boundaries in a system like S3. See the S3 Developer Guide for more information.
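One way to picture the sliding window described above is the sketch below. It is an assumption-laden illustration rather than Ranger's code: fetchOrdered, fetch and consume are hypothetical names, and chunks are identified by index instead of byte range. At most `parallelism` chunks are in flight or buffered at any time, and chunks are handed to the consumer strictly in order.

```go
package main

import (
	"bytes"
	"fmt"
)

// result carries one fetched chunk or the error that prevented it.
type result struct {
	data []byte
	err  error
}

// fetchOrdered fetches chunks 0..n-1 using fetch, keeping up to parallelism
// chunks in flight, and passes them to consume in index order. A new fetch
// starts only after an earlier chunk has been consumed.
func fetchOrdered(n, parallelism int, fetch func(i int) ([]byte, error), consume func([]byte) error) error {
	if parallelism < 1 {
		parallelism = 1
	}
	pending := make([]chan result, n)
	start := func(i int) {
		ch := make(chan result, 1) // buffered so the goroutine never blocks
		pending[i] = ch
		go func() {
			data, err := fetch(i)
			ch <- result{data: data, err: err}
		}()
	}
	// Prime the window with the first few chunks.
	for i := 0; i < n && i < parallelism; i++ {
		start(i)
	}
	for i := 0; i < n; i++ {
		res := <-pending[i]
		if res.err != nil {
			return res.err
		}
		if err := consume(res.data); err != nil {
			return err
		}
		// Chunk i has been read, so the window can advance by one.
		if next := i + parallelism; next < n {
			start(next)
		}
	}
	return nil
}

func main() {
	src := bytes.Repeat([]byte("0123456789"), 100) // 1000 bytes
	const chunk = 100
	var got bytes.Buffer
	err := fetchOrdered(len(src)/chunk, 3,
		func(i int) ([]byte, error) { return src[i*chunk : (i+1)*chunk], nil },
		func(b []byte) error { _, err := got.Write(b); return err },
	)
	fmt.Println(err, bytes.Equal(got.Bytes(), src))
}
```

Memory use in this sketch is bounded by roughly parallelism times the chunk size, which is the RAM-for-speed trade the text describes.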

Usage & Integration

The lowest-level usage is to create a new Ranger with chunk size and parallelism, and a fetcher function passed in. When the Ranger is invoked with a file length, it calls the fetch function with the byte range, in parallel, and collects and orders the resulting chunk Readers. This is a low level API that you can use if you have a custom protocol to fetch data.

For regular use, Ranger provides an HTTPClient wrapper that wraps any other HTTPClient. Calls made to GET files will use the wrapped client to check file length using a HEAD request, and then range over the chunks using the given chunk size and parallelism parameters.
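The HEAD-then-range flow can be sketched with the standard library alone. This is not the RangingHTTPClient implementation: fetchInChunks and newTestServer are hypothetical helpers, the ranges are fetched sequentially for brevity, and the server is a local httptest server backed by http.ServeContent, which honours Range headers.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchInChunks learns the file length from a HEAD request, then issues one
// Range GET per chunk and concatenates the bodies in order.
func fetchInChunks(client *http.Client, url string, chunkSize int64) ([]byte, error) {
	head, err := client.Head(url)
	if err != nil {
		return nil, err
	}
	head.Body.Close()
	length := head.ContentLength
	if length < 0 {
		return nil, fmt.Errorf("server did not report a content length")
	}
	var out bytes.Buffer
	for from := int64(0); from < length; from += chunkSize {
		to := from + chunkSize - 1
		if to > length-1 {
			to = length - 1
		}
		req, err := http.NewRequest(http.MethodGet, url, nil)
		if err != nil {
			return nil, err
		}
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", from, to))
		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusPartialContent {
			resp.Body.Close()
			return nil, fmt.Errorf("expected 206 Partial Content, got %d", resp.StatusCode)
		}
		_, err = io.Copy(&out, resp.Body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
	}
	return out.Bytes(), nil
}

// newTestServer serves payload with Range support via http.ServeContent.
func newTestServer(payload []byte) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "blob.bin", time.Time{}, bytes.NewReader(payload))
	}))
}

func main() {
	payload := bytes.Repeat([]byte("ranger"), 50) // 300 bytes
	srv := newTestServer(payload)
	defer srv.Close()

	got, err := fetchInChunks(srv.Client(), srv.URL, 64)
	fmt.Println(err, bytes.Equal(got, payload))
}
```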

This means that Ranger integrates well on both sides - it can be passed as a custom HTTPClient to download managers like Grab, while wrapping other HTTPClients that provide automatic retry and other features like Heimdall or go-retryablehttp.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ByteRange

type ByteRange struct {
	From int64
	To   int64
}

func (ByteRange) Header

func (r ByteRange) Header() string

func (ByteRange) Length

func (r ByteRange) Length() int64

type Chunk

type Chunk struct {
	Loader    Loader
	ByteRange ByteRange
}

func (*Chunk) Load

func (c *Chunk) Load() ([]byte, error)

type HTTPClient

type HTTPClient interface {
	Do(req *http.Request) (*http.Response, error)
}

HTTPClient provides an interface allowing us to perform HTTP requests.

type Loader

type Loader interface {
	Load(br ByteRange) ([]byte, error)
}

Loader implements a Load method that provides data as a byte slice for a given byte range chunk.

Load should be safe to call from multiple goroutines.

If err is nil, the returned byte slice must always contain exactly as many bytes as were asked for, i.e. the length of the returned slice must always be equal to br.Length().
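The length contract can be illustrated with a small self-contained sketch. It deliberately does not use this package's types: span and loadFunc are hypothetical stand-ins (an offset plus a length, and a plain function), and enforceLength and memoryLoader are illustrative helpers, so nothing here should be read as the package's own behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// span is a hypothetical stand-in for a byte range: an offset plus a length.
type span struct{ Off, Len int64 }

// loadFunc is a hypothetical stand-in for a Load method.
type loadFunc func(s span) ([]byte, error)

var errWrongLength = errors.New("loader returned the wrong number of bytes")

// enforceLength wraps load so that a nil error always implies
// len(data) == s.Len, turning a silent short read into an error.
func enforceLength(load loadFunc) loadFunc {
	return func(s span) ([]byte, error) {
		data, err := load(s)
		if err != nil {
			return nil, err
		}
		if int64(len(data)) != s.Len {
			return nil, fmt.Errorf("%w: want %d, got %d", errWrongLength, s.Len, len(data))
		}
		return data, nil
	}
}

// memoryLoader serves spans from an in-memory slice. It only reads src, so
// it is safe to call from multiple goroutines as long as src is not mutated.
func memoryLoader(src []byte) loadFunc {
	return func(s span) ([]byte, error) {
		if s.Off < 0 || s.Len < 0 || s.Off+s.Len > int64(len(src)) {
			return nil, errors.New("span out of bounds")
		}
		return src[s.Off : s.Off+s.Len], nil
	}
}

func main() {
	load := enforceLength(memoryLoader([]byte("hello world")))
	data, err := load(span{Off: 6, Len: 5})
	fmt.Println(string(data), err)
}
```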

func WrapLoaderWithLRUCache

func WrapLoaderWithLRUCache(loader Loader, slots int) Loader

WrapLoaderWithLRUCache wraps a loader to cache the results returned by the inner loader in an LRU cache of the given slot count. For best results, wrap the returned Loader with WrapLoaderWithSingleFlight to make sure multiple calls are not made while the cache is being filled. If the given slots count is negative, zero is used.
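For readers unfamiliar with the technique, a slot-limited LRU cache in front of a load function might look like the sketch below. This is a generic illustration using container/list, not the package's implementation: lruCache, newLRUCache and the string keys are assumptions made for the example. Note that two concurrent misses for the same key can both reach the inner load function here, which is the situation the single-flight advice above is meant to address.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// entry is one cached value together with its key, needed for eviction.
type entry struct {
	key  string
	data []byte
}

// lruCache is a hypothetical cache with a fixed number of slots.
type lruCache struct {
	mu    sync.Mutex
	slots int
	order *list.List // front is the most recently used entry
	items map[string]*list.Element
	load  func(key string) ([]byte, error)
}

func newLRUCache(slots int, load func(key string) ([]byte, error)) *lruCache {
	if slots < 0 {
		slots = 0 // negative slot counts are treated as zero
	}
	return &lruCache{slots: slots, order: list.New(), items: make(map[string]*list.Element), load: load}
}

// Load returns the cached value for key, calling the inner load on a miss.
func (c *lruCache) Load(key string) ([]byte, error) {
	c.mu.Lock()
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		data := el.Value.(*entry).data
		c.mu.Unlock()
		return data, nil
	}
	c.mu.Unlock()

	// The lock is released while loading so slow loads do not block hits.
	data, err := c.load(key)
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if c.slots == 0 {
		return data, nil
	}
	if el, ok := c.items[key]; ok { // another goroutine filled the slot first
		c.order.MoveToFront(el)
		return el.Value.(*entry).data, nil
	}
	c.items[key] = c.order.PushFront(&entry{key: key, data: data})
	if c.order.Len() > c.slots {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	return data, nil
}

func main() {
	calls := 0
	cache := newLRUCache(2, func(key string) ([]byte, error) {
		calls++
		return []byte("chunk " + key), nil
	})
	for _, k := range []string{"a", "b", "a", "c", "b"} {
		cache.Load(k)
	}
	fmt.Println("inner loads:", calls)
}
```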

func WrapLoaderWithSingleFlight

func WrapLoaderWithSingleFlight(loader Loader) Loader

WrapLoaderWithSingleFlight wraps a Loader to ensure that only one call at a time for a given byte range is made to the wrapped loader.
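The single-flight idea can be sketched with a mutex and a map of in-flight calls. This is a generic illustration of the pattern and not the package's implementation: singleFlight, newSingleFlight and the string keys are hypothetical. Callers that arrive while a load for the same key is running wait for it and share its result; once it finishes, the next call starts a fresh load.

```go
package main

import (
	"fmt"
	"sync"
)

// call tracks one in-flight load and the result shared with every waiter.
type call struct {
	done chan struct{}
	data []byte
	err  error
}

// singleFlight is a hypothetical wrapper that collapses concurrent loads
// of the same key into a single call to the inner load function.
type singleFlight struct {
	mu       sync.Mutex
	inFlight map[string]*call
	load     func(key string) ([]byte, error)
}

func newSingleFlight(load func(key string) ([]byte, error)) *singleFlight {
	return &singleFlight{inFlight: make(map[string]*call), load: load}
}

// Load runs the inner load for key, or waits for one that is already running.
func (s *singleFlight) Load(key string) ([]byte, error) {
	s.mu.Lock()
	if c, ok := s.inFlight[key]; ok {
		s.mu.Unlock()
		<-c.done // wait for the leader to finish, then share its result
		return c.data, c.err
	}
	c := &call{done: make(chan struct{})}
	s.inFlight[key] = c
	s.mu.Unlock()

	c.data, c.err = s.load(key)

	s.mu.Lock()
	delete(s.inFlight, key)
	s.mu.Unlock()
	close(c.done) // results are written before close, so waiters see them
	return c.data, c.err
}

func main() {
	sf := newSingleFlight(func(key string) ([]byte, error) {
		return []byte("data for " + key), nil
	})
	data, err := sf.Load("bytes=0-99")
	fmt.Println(string(data), err)
}
```

Unlike a cache, this wrapper keeps nothing once a call completes, which is why the two wrappers compose.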

type LoaderFunc

type LoaderFunc func(br ByteRange) ([]byte, error)

LoaderFunc converts a Load function into a Loader type.

func (LoaderFunc) Load

func (l LoaderFunc) Load(br ByteRange) ([]byte, error)

type RangedSource

type RangedSource struct {
	// contains filtered or unexported fields
}

func NewRangedSource

func NewRangedSource(length int64, loader Loader, ranger Ranger) RangedSource

func (RangedSource) ReadAt

func (r RangedSource) ReadAt(p []byte, off int64) (n int, err error)

func (RangedSource) Reader

func (r RangedSource) Reader() ReaderSeekerReadAt

type Ranger

type Ranger struct {
	// contains filtered or unexported fields
}

func NewRanger

func NewRanger(chunkSize int64) Ranger

func (Ranger) Index

func (r Ranger) Index(i int64) int

func (Ranger) Ranges

func (r Ranger) Ranges(length int64) []ByteRange

type RangingHTTPClient

type RangingHTTPClient struct {
	HTTPClient
	// contains filtered or unexported fields
}

RangingHTTPClient wraps another HTTP client to issue all requests based on the Ranges provided.

func NewRangingHTTPClient

func NewRangingHTTPClient(ranger Ranger, client HTTPClient, parallelism int) RangingHTTPClient

func (RangingHTTPClient) Do

func (rhc RangingHTTPClient) Do(req *http.Request) (*http.Response, error)

type ReaderSeekerReadAt

type ReaderSeekerReadAt interface {
	io.Reader
	io.Seeker
	io.ReaderAt
	Size() int64
}

Directories

Path Synopsis
cmd
