ranger

package module

v0.5.0 (latest) | Published: Jun 21, 2023 | License: MIT | Imports: 13 | Imported by: 2

Repository

github.com/sudhirj/ranger

README ¶

RANGER

Download large files in parallel chunks in Go.

Why?

Current Go HTTP clients download files as a stream of bytes, usually with a buffer. If you consider every file an array of bytes, this means that when you initiate a download a connection is opened, and you receive an io.Reader. As you Read bytes off this Reader, more bytes are loaded up into an internal buffer (an in-memory byte array that stores a certain amount of data in the expectation that you'll read it soon). As you keep reading data, the HTTP client will fill the buffer up as fast as it can from the server.

So? Why is this a problem?

Most of the time this is what we want and need. But when we're downloading large files (say from Amazon S3 or CloudFront, or any other file server) this is usually not optimal. These services limit the outgoing speed of each connection, and if you're downloading a very large file (say over 25 GB) you're also unlikely to benefit from their caches. This means that the number of bytes coming in per second (bandwidth) is usually lower than what your connection actually supports.

What does Ranger do?

Ranger does two orthogonal things to speed up transfers. One, it downloads files in chunks: if a file has 1000 bytes, for example, it can download it in chunks of 100 bytes by requesting byte ranges 0-99, 100-199, 200-299 and so on, each with an HTTP GET request carrying a Range header. This allows the service to cache each chunk, because even if the whole file is too large to cache, each chunk is still small enough to fit. See the CloudFront Developer Guide for more information.
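The chunk arithmetic described above can be sketched with the standard library alone. This is an illustration of the idea, not the package's implementation; `byteRange` and `splitRanges` are hypothetical names, and both ends of a range are assumed to be inclusive, as they are in an HTTP Range header.

```go
package main

import "fmt"

// byteRange is a hypothetical stand-in for one chunk of a file.
// Both ends are inclusive, matching the HTTP Range header convention.
type byteRange struct {
	From, To int64
}

// splitRanges divides a file of the given length into consecutive
// ranges of at most chunkSize bytes. The last range may be shorter.
func splitRanges(length, chunkSize int64) []byteRange {
	if length <= 0 || chunkSize <= 0 {
		return nil
	}
	var ranges []byteRange
	for from := int64(0); from < length; from += chunkSize {
		to := from + chunkSize - 1
		if to > length-1 {
			to = length - 1 // clamp the final chunk to the end of the file
		}
		ranges = append(ranges, byteRange{From: from, To: to})
	}
	return ranges
}

func main() {
	// Print the Range header for the first three chunks of a 1000-byte file.
	for _, r := range splitRanges(1000, 100)[:3] {
		fmt.Printf("Range: bytes=%d-%d\n", r.From, r.To)
	}
}
```

Each resulting range would become the `Range` header of one GET request.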

Two, it downloads upcoming chunks in parallel, so if the parallelism count is set at 3, in the example above it would download byte ranges 0-99, 100-199 and 200-299 in parallel, even while the first range is being Read. It will also start downloading the fourth range after the first one is read, and so on. This trades RAM for speed: dedicating 3 x 100 bytes of memory to in-flight chunks lets the download proceed that much faster. In practice, 8 MB to 16 MB is a good chunk size, especially if that lines up with the multipart upload boundaries in a system like S3. See the S3 Developer Guide for more information.

Usage & Integration

The lowest-level usage is to create a new Ranger with chunk size and parallelism, and a fetcher function passed in. When the Ranger is invoked with a file length, it calls the fetch function with the byte range, in parallel, and collects and orders the resulting chunk Readers. This is a low level API that you can use if you have a custom protocol to fetch data.

For regular use, RangingHTTPClient uses a given http.Client to fetch chunks as configured. RangingHTTPClient also exposes a standard http.Client via the RangingHTTPClient.StandardClient method. The returned client will fetch chunk ranges using the RangingHTTPClient.

This means that Ranger integrates well on both sides: Grab and other download managers can use a ranging client via a standard http.Client, while Ranger itself can wrap other HTTP clients that provide automatic retries and similar features, such as Heimdall or go-retryablehttp.

Documentation ¶

Index ¶

func GetContentLength(url *url.URL, client *http.Client) (int64, error)
func GetContentLengthViaGET(url *url.URL, client *http.Client) (int64, error)
func GetContentLengthViaHEAD(url *url.URL, client *http.Client) (int64, error)
type ByteRange
- func (br ByteRange) Contains(offset int64) bool
- func (br ByteRange) Length() int64
- func (br ByteRange) RangeHeader() string
type Loader
- func HTTPLoader(url *url.URL, client *http.Client) Loader
- func NewSingleFlightLoader(loader Loader) Loader
type LoaderFunc
- func (l LoaderFunc) Load(br ByteRange) ([]byte, error)
type ParallelWriter
- func NewParallelWriter(length int64, loader Loader, ranger Ranger) *ParallelWriter
- func (pw ParallelWriter) WriteInto(w io.WriterAt, parallelism int) error
type RangedSource
- func NewRangedSource(length int64, loader Loader, ranger Ranger) RangedSource
- func (rs RangedSource) Ranges() []ByteRange
- func (rs RangedSource) Reader(parallelism int) RemoteReader
type Ranger
- func NewRanger(chunkSize int64) Ranger
- func (r Ranger) Index(i int64) int
- func (r Ranger) Ranges(length int64) []ByteRange
type RangingHTTPClient
- func NewRangingClient(ranger Ranger, client *http.Client, parallelism int) *RangingHTTPClient
- func (rhc RangingHTTPClient) Do(req *http.Request) (*http.Response, error)
- func (rhc *RangingHTTPClient) StandardClient() *http.Client
type RemoteReader
type RoundTripper
- func (rt *RoundTripper) RoundTrip(req *http.Request) (*http.Response, error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func GetContentLength ¶ added in v0.2.0

func GetContentLength(url *url.URL, client *http.Client) (int64, error)

GetContentLength returns the content length of the given URL, using the given http.Client. It first attempts a HEAD request and, if that fails, falls back to a GET request.

func GetContentLengthViaGET ¶ added in v0.2.0

func GetContentLengthViaGET(url *url.URL, client *http.Client) (int64, error)

GetContentLengthViaGET returns the content length of the given URL, using the given http.Client. It uses a GET request with a zeroed Range header to get the content length.
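The general technique of learning a file's size from a ranged GET can be sketched as follows. This is not the package's code, and it makes an assumption about what "zeroed" means: the sketch requests `bytes=0-0` (only the first byte) and reads the total from the `Content-Range` response header. `parseTotal` and `lengthViaRangeGET` are hypothetical helpers, and the test server exists only to make the example runnable.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
	"time"
)

// parseTotal extracts the full resource length from a Content-Range
// value such as "bytes 0-0/1000".
func parseTotal(contentRange string) (int64, error) {
	slash := strings.LastIndexByte(contentRange, '/')
	if slash < 0 {
		return 0, fmt.Errorf("malformed Content-Range %q", contentRange)
	}
	total := contentRange[slash+1:]
	if total == "*" {
		return 0, errors.New("server did not report a total length")
	}
	return strconv.ParseInt(total, 10, 64)
}

// lengthViaRangeGET requests only the first byte and reads the total
// length from the response headers.
func lengthViaRangeGET(client *http.Client, rawURL string) (int64, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Range", "bytes=0-0") // assumption: one reading of "zeroed"
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusPartialContent:
		return parseTotal(resp.Header.Get("Content-Range"))
	case http.StatusOK: // the server ignored the Range header
		if resp.ContentLength >= 0 {
			return resp.ContentLength, nil
		}
		return 0, errors.New("response has no Content-Length")
	default:
		return 0, fmt.Errorf("unexpected status %s", resp.Status)
	}
}

func main() {
	payload := strings.Repeat("x", 1000)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// http.ServeContent answers Range requests on our behalf.
		http.ServeContent(w, r, "", time.Time{}, strings.NewReader(payload))
	}))
	defer srv.Close()

	n, err := lengthViaRangeGET(srv.Client(), srv.URL)
	fmt.Println(n, err)
}
```

A ranged GET is useful as a fallback because some servers reject or mishandle HEAD requests while still honouring Range headers on GET.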

func GetContentLengthViaHEAD ¶ added in v0.2.0

func GetContentLengthViaHEAD(url *url.URL, client *http.Client) (int64, error)

GetContentLengthViaHEAD returns the content length of the given URL, using the given http.Client. It uses a HEAD request to get the content length.

Types ¶

type ByteRange ¶

type ByteRange struct {
	From int64
	To   int64
}

ByteRange represents a range of bytes available in a file.

func (ByteRange) Contains ¶ added in v0.2.0

func (br ByteRange) Contains(offset int64) bool

func (ByteRange) Length ¶

func (br ByteRange) Length() int64

Length returns the length of the byte range.

func (ByteRange) RangeHeader ¶ added in v0.2.0

func (br ByteRange) RangeHeader() string

RangeHeader returns the HTTP header representation of the byte range, suitable for use in the Range header, as described in https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range
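For reference, the header format can be illustrated with a small stand-alone type. The sketch assumes that both ends of the range are inclusive, which is how the HTTP Range header itself works; whether ByteRange stores its bounds the same way is an assumption here. `span` and its methods are hypothetical names, not the package's code.

```go
package main

import "fmt"

// span is a hypothetical byte range with inclusive ends.
type span struct {
	From, To int64
}

// header formats the span as an HTTP Range header value.
func (s span) header() string {
	return fmt.Sprintf("bytes=%d-%d", s.From, s.To)
}

// length counts the bytes covered; with inclusive ends that is To-From+1.
func (s span) length() int64 {
	return s.To - s.From + 1
}

// contains reports whether offset falls inside the span.
func (s span) contains(offset int64) bool {
	return offset >= s.From && offset <= s.To
}

func main() {
	s := span{From: 100, To: 199}
	fmt.Println(s.header(), s.length(), s.contains(199), s.contains(200))
}
```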

type Loader ¶

type Loader interface {
	Load(br ByteRange) ([]byte, error)
}

A Loader provides the data for a given byte range chunk as a byte slice through its Load method.

`Load` should be safe to call from multiple goroutines.

If err is nil, the returned byte slice must contain exactly as many bytes as were asked for, i.e. the `len` of the returned slice must always be equal to `br.Length()`.

func HTTPLoader ¶ added in v0.2.0

func HTTPLoader(url *url.URL, client *http.Client) Loader

func NewSingleFlightLoader ¶ added in v0.4.0

func NewSingleFlightLoader(loader Loader) Loader

type LoaderFunc ¶

type LoaderFunc func(br ByteRange) ([]byte, error)

LoaderFunc converts a Load function into a Loader type.

func (LoaderFunc) Load ¶

func (l LoaderFunc) Load(br ByteRange) ([]byte, error)
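The adapter pattern behind LoaderFunc is the same one that `http.HandlerFunc` uses. The sketch below re-declares the documented types locally so that it compiles on its own. The method bodies, the inclusive-end convention in `Length`, and the helpers `memoryLoader` and `checked` are assumptions made for illustration, not the package's implementation.

```go
package main

import "fmt"

// Local copies of the documented declarations, so the snippet stands alone.
type ByteRange struct{ From, To int64 }

// Length assumes both ends of the range are inclusive.
func (br ByteRange) Length() int64 { return br.To - br.From + 1 }

type Loader interface {
	Load(br ByteRange) ([]byte, error)
}

type LoaderFunc func(br ByteRange) ([]byte, error)

// Load calls the function itself, which is what lets a plain function
// satisfy the Loader interface.
func (l LoaderFunc) Load(br ByteRange) ([]byte, error) { return l(br) }

// memoryLoader is a hypothetical Loader that serves ranges from a byte slice.
func memoryLoader(src []byte) Loader {
	return LoaderFunc(func(br ByteRange) ([]byte, error) {
		if br.From < 0 || br.To < br.From || br.To >= int64(len(src)) {
			return nil, fmt.Errorf("range %d-%d outside source of %d bytes", br.From, br.To, len(src))
		}
		return src[br.From : br.To+1], nil
	})
}

// checked is a hypothetical wrapper that enforces the documented contract:
// a successful Load must return exactly br.Length() bytes.
func checked(inner Loader) Loader {
	return LoaderFunc(func(br ByteRange) ([]byte, error) {
		data, err := inner.Load(br)
		if err != nil {
			return nil, err
		}
		if int64(len(data)) != br.Length() {
			return nil, fmt.Errorf("loader returned %d bytes, want %d", len(data), br.Length())
		}
		return data, nil
	})
}

func main() {
	loader := checked(memoryLoader([]byte("hello, ranger")))
	data, err := loader.Load(ByteRange{From: 7, To: 12})
	fmt.Printf("%s %v\n", data, err)
}
```

A custom protocol (for example, reading ranges from object storage through an SDK) could be plugged in the same way, by converting a fetch function to a LoaderFunc.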

type ParallelWriter ¶ added in v0.5.0

type ParallelWriter struct {
	Length int64
	Loader Loader
	Ranger Ranger
}

func NewParallelWriter ¶ added in v0.5.0

func NewParallelWriter(length int64, loader Loader, ranger Ranger) *ParallelWriter

func (ParallelWriter) WriteInto ¶ added in v0.5.0

func (pw ParallelWriter) WriteInto(w io.WriterAt, parallelism int) error

type RangedSource ¶

type RangedSource struct {
	// contains filtered or unexported fields
}

RangedSource represents a remote file that can be read in chunks using the given loader.

func NewRangedSource ¶

func NewRangedSource(length int64, loader Loader, ranger Ranger) RangedSource

func (RangedSource) Ranges ¶ added in v0.3.0

func (rs RangedSource) Ranges() []ByteRange

func (RangedSource) Reader ¶

func (rs RangedSource) Reader(parallelism int) RemoteReader

Reader returns a RemoteReader that reads the data in parallel, using a number of goroutines equal to the given parallelism count. Data is still returned in order. Because RemoteReader also implements io.Seeker and io.ReaderAt, reading can start at any offset.

type Ranger ¶

type Ranger struct {
	// contains filtered or unexported fields
}

Ranger can split a file into chunks of a given size.

func NewRanger ¶

func NewRanger(chunkSize int64) Ranger

NewRanger creates a new Ranger with the given chunk size. If the chunk size is <= 0, the default chunk size is used.

func (Ranger) Index ¶

func (r Ranger) Index(i int64) int

Index returns the index of the chunk that contains the given offset.

func (Ranger) Ranges ¶

func (r Ranger) Ranges(length int64) []ByteRange

Ranges returns the list of byte ranges that cover a file of the given length, split according to the Ranger's chunk size.

type RangingHTTPClient ¶

type RangingHTTPClient struct {
	// contains filtered or unexported fields
}

RangingHTTPClient wraps another HTTP client to issue all requests in pre-defined chunks.

func NewRangingClient ¶ added in v0.3.0

func NewRangingClient(ranger Ranger, client *http.Client, parallelism int) *RangingHTTPClient

NewRangingClient wraps and uses the given http.Client to make requests only for chunks designated by the given Ranger, but does so in parallel with the given number of goroutines. This is useful for downloading large files from cache-friendly sources in manageable chunks, with the added speed benefits of parallelism.

func (RangingHTTPClient) Do ¶

func (rhc RangingHTTPClient) Do(req *http.Request) (*http.Response, error)

func (*RangingHTTPClient) StandardClient ¶ added in v0.4.0

func (rhc *RangingHTTPClient) StandardClient() *http.Client

StandardClient returns a standard HTTP client that wraps a ranging HTTP client.

type RemoteReader ¶ added in v0.3.0

type RemoteReader interface {
	io.Reader
	io.Seeker
	io.Closer
	io.ReaderAt
}

type RoundTripper ¶ added in v0.4.0

type RoundTripper struct {
	// The client to use during requests. If nil, a default ranging client is used.
	RangingClient *RangingHTTPClient
	// contains filtered or unexported fields
}

RoundTripper implements the http.RoundTripper interface, using a ranging HTTP client to execute requests.

func (*RoundTripper) RoundTrip ¶ added in v0.4.0

func (rt *RoundTripper) RoundTrip(req *http.Request) (*http.Response, error)

RoundTrip satisfies the http.RoundTripper interface.
