s3gof3r

Version: v2.0.0+incompatible
Published: May 5, 2020 License: MIT Imports: 29 Imported by: 0

README

s3gof3r

s3gof3r provides fast, parallelized, pipelined streaming access to Amazon S3. It includes a command-line interface: gof3r.

It is optimized for high-speed transfer of large objects into and out of Amazon S3. Streaming support allows for usage like:

  $ tar -czf - <my_dir/> | gof3r put -b <s3_bucket> -k <s3_object>    
  $ gof3r get -b <s3_bucket> -k <s3_object> | tar -zx

Speed Benchmarks

On an EC2 instance, gof3r can exceed 1 Gbps for both puts and gets:

  $ gof3r get -b test-bucket -k 8_GB_tar | pv -a | tar -x
  Duration: 53.201632211s
  [ 167MB/s]
  

  $ tar -cf - test_dir/ | pv -a | gof3r put -b test-bucket -k 8_GB_tar
  Duration: 1m16.080800315s
  [ 119MB/s]

These tests were performed on an m1.xlarge EC2 instance with a virtualized 1 Gigabit ethernet interface. See Amazon EC2 Instance Details for more information.

Features

  • Speed: Especially for larger S3 objects where parallelism can be exploited, s3gof3r will saturate the bandwidth of an EC2 instance. See the Benchmarks above.

  • Streaming Uploads and Downloads: As the above examples illustrate, streaming allows the gof3r command-line tool to be used with Linux/Unix pipes. This allows transformation of the data in parallel as it is uploaded to or downloaded from S3.

  • End-to-end Integrity Checking: s3gof3r calculates the md5 hash of the stream in parallel while uploading and downloading. On upload, a file containing the md5 hash is saved in S3. This is checked against the calculated md5 on download. On upload, the content-md5 of each part is also calculated and sent with the header to be checked by AWS. s3gof3r also checks the 'hash of hashes' returned by S3 in the ETag field on completion of a multipart upload. See the S3 API Reference for details.

  • Retry Everything: All HTTP requests and every part are retried on both uploads and downloads. Requests to S3 frequently time out, especially under high load, so this is essential to complete large uploads or downloads.

  • Memory Efficiency: Memory used to upload and download parts is recycled. For an upload or download with the default concurrency of 10 and part size of 20 MB, the maximum memory usage is less than 300 MB. Memory footprint can be further reduced by reducing part size or concurrency.
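The 'hash of hashes' mentioned under End-to-end Integrity Checking can be illustrated with a short sketch. It assumes the commonly observed multipart ETag format (the md5 of the concatenated binary md5 digests of each part, followed by a dash and the part count), which typically applies to uploads without KMS or customer-key encryption. The function name multipartETag is hypothetical and is not part of s3gof3r.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// multipartETag is a hypothetical helper. It hashes each part with md5,
// concatenates the raw digests, hashes the result again, and appends
// "-<number of parts>".
func multipartETag(parts [][]byte) string {
	var digests []byte
	for _, p := range parts {
		sum := md5.Sum(p)
		digests = append(digests, sum[:]...)
	}
	final := md5.Sum(digests)
	return fmt.Sprintf("%s-%d", hex.EncodeToString(final[:]), len(parts))
}

func main() {
	parts := [][]byte{[]byte("part one"), []byte("part two")}
	fmt.Println(multipartETag(parts))
}
```

Comparing a locally computed value like this against the ETag returned on completion is one way to detect a corrupted or missing part.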

Installation

s3gof3r is written in Go and requires Go 1.5 or later. It can be installed with go get to download and compile it from source. To install the command-line tool, gof3r, set GO15VENDOREXPERIMENT=1 in your environment:

$ go get github.com/rlmcpherson/s3gof3r/gof3r

To install just the package for use in other Go programs:

$ go get github.com/rlmcpherson/s3gof3r

Release Binaries

To try the latest release of the gof3r command-line interface without installing Go, download the statically-linked binary for your architecture from GitHub Releases.

gof3r (command-line interface) usage:

  To stream up to S3:
     $ <input_stream> | gof3r put -b <bucket> -k <s3_path>
  To stream down from S3:
     $ gof3r get -b <bucket> -k <s3_path> | <output_stream>
  To upload a file to S3:
     $ gof3r cp <local_path> s3://<bucket>/<s3_path>
  To download a file from S3:
     $ gof3r cp s3://<bucket>/<s3_path> <local_path>

Set AWS keys as environment variables:

  $ export AWS_ACCESS_KEY_ID=<access_key>
  $ export AWS_SECRET_ACCESS_KEY=<secret_key>

gof3r also supports IAM role-based keys from EC2 instance metadata. If they are available and the environment variables are not set, these keys are used automatically.

Examples:

$ tar -cf - /foo_dir/ | gof3r put -b my_s3_bucket -k bar_dir/s3_object -m x-amz-meta-custom-metadata:abc123 -m x-amz-server-side-encryption:AES256
$ gof3r get -b my_s3_bucket -k bar_dir/s3_object | tar -x    

See the gof3r man page for complete usage.

Documentation

s3gof3r package: See the godocs for API documentation.

gof3r CLI: See the godoc and the gof3r man page.

Have a question? Ask it on the s3gof3r Mailing List

Documentation

Overview

Package s3gof3r provides fast, parallelized, streaming access to Amazon S3. It includes a command-line interface: `gof3r`.

Variables

var DefaultConfig = &Config{
	Concurrency: 10,
	PartSize:    20 * mb,
	NTry:        10,
	Md5Check:    true,
	Scheme:      "https",
	Client:      ClientWithTimeout(clientTimeout),
}

DefaultConfig contains the defaults used if *Config is nil.

var DefaultDomain = "s3.amazonaws.com"

DefaultDomain is set to the endpoint for the U.S. S3 service.

Functions

func ClientWithTimeout

func ClientWithTimeout(timeout time.Duration) *http.Client

ClientWithTimeout returns an HTTP client optimized for high throughput to S3. It times out more aggressively than the default HTTP client in net/http and also sets deadlines on the TCP connection.

func SetLogger added in v0.4.0

func SetLogger(out io.Writer, prefix string, flag int, debug bool)

SetLogger wraps the standard library log package.

It allows the internal logging of s3gof3r to be set to a desired output and format. Setting debug to true enables debug logging output. s3gof3r does not log output by default.

Types

type Bucket

type Bucket struct {
	*S3
	Name string
	*Config
}

A Bucket for an S3 service.

func (*Bucket) Delete added in v0.4.6

func (b *Bucket) Delete(path string) error

Delete deletes the key at path. If the path does not exist, Delete returns nil (no error).

func (*Bucket) GetReader

func (b *Bucket) GetReader(path string, c *Config) (r io.ReadCloser, h http.Header, err error)

GetReader provides a reader and downloads data using parallel ranged get requests. Data from the requests are ordered and written sequentially.

Data integrity is verified via the option specified in c. Header data from the downloaded object is also returned, which is useful for reading object metadata. DefaultConfig is used if c is nil. Callers should call Close on r to ensure that all resources are released.

To specify an object version in a versioned bucket, the version ID may be included in the path as a url parameter. See http://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectVersions.html
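The idea of parallel ranged requests whose results are reassembled in order can be illustrated without any network access. In this sketch an in-memory byte slice stands in for the S3 object, and splitRanges and fetchOrdered are hypothetical names. Unlike a streaming reader, the sketch buffers every part before joining them, so it shows the ordering logic only.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// byteRange uses inclusive offsets, like an HTTP "Range: bytes=start-end" header.
type byteRange struct{ start, end int64 }

// splitRanges divides size bytes into consecutive ranges of at most partSize bytes.
func splitRanges(size, partSize int64) []byteRange {
	var rs []byteRange
	if partSize <= 0 {
		return rs
	}
	for start := int64(0); start < size; start += partSize {
		end := start + partSize - 1
		if end >= size {
			end = size - 1
		}
		rs = append(rs, byteRange{start, end})
	}
	return rs
}

// fetchOrdered fetches every range with at most concurrency workers in
// flight, storing each result at its own index so the final join is in order.
func fetchOrdered(src []byte, partSize int64, concurrency int) []byte {
	if concurrency <= 0 {
		return nil
	}
	ranges := splitRanges(int64(len(src)), partSize)
	parts := make([][]byte, len(ranges))
	sem := make(chan struct{}, concurrency)
	var wg sync.WaitGroup
	for i, r := range ranges {
		wg.Add(1)
		sem <- struct{}{} // wait for a free worker slot
		go func(i int, r byteRange) {
			defer wg.Done()
			defer func() { <-sem }()
			// Stand-in for a ranged GET request against S3.
			parts[i] = src[r.start : r.end+1]
		}(i, r)
	}
	wg.Wait()
	return bytes.Join(parts, nil)
}

func main() {
	data := []byte("abcdefghij")
	fmt.Println(splitRanges(int64(len(data)), 3))
	fmt.Println(string(fetchOrdered(data, 3, 2)))
}
```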

Example
k, err := EnvKeys() // get S3 keys from environment
if err != nil {
	return err
}

// Open bucket to get the object from
s3 := New("", k)
b := s3.Bucket("bucketName")

r, h, err := b.GetReader("keyName", nil)
if err != nil {
	return err
}
// stream to standard output
if _, err = io.Copy(os.Stdout, r); err != nil {
	return err
}
err = r.Close()
if err != nil {
	return err
}
fmt.Println(h) // print key header data
return nil

func (*Bucket) PutWriter

func (b *Bucket) PutWriter(path string, h http.Header, c *Config) (w io.WriteCloser, err error)

PutWriter provides a writer to upload data as multipart upload requests.

Each header in h is added to the HTTP request header. This is useful for specifying options such as server-side encryption in metadata as well as custom user metadata. DefaultConfig is used if c is nil. Callers should call Close on w to ensure that all resources are released.
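As an illustration of what h might contain, the sketch below builds an http.Header with the same server-side encryption and custom metadata values used in the gof3r command-line example earlier in this page. The uploadHeaders helper is hypothetical; only net/http from the standard library is used.

```go
package main

import (
	"fmt"
	"net/http"
)

// uploadHeaders is a hypothetical helper. It requests AES256 server-side
// encryption and adds each user metadata entry under the x-amz-meta- prefix.
func uploadHeaders(userMeta map[string]string) http.Header {
	h := http.Header{}
	h.Set("x-amz-server-side-encryption", "AES256")
	for k, v := range userMeta {
		h.Set("x-amz-meta-"+k, v)
	}
	return h
}

func main() {
	h := uploadHeaders(map[string]string{"custom-metadata": "abc123"})
	fmt.Println(h.Get("x-amz-server-side-encryption"))
	fmt.Println(h.Get("x-amz-meta-custom-metadata"))
}
```

A header built this way would be passed as the second argument of PutWriter in place of nil.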

Example
k, err := EnvKeys() // get S3 keys from environment
if err != nil {
	return err
}
// Open bucket to put file into
s3 := New("", k)
b := s3.Bucket("bucketName")

// open file to upload
file, err := os.Open("fileName")
if err != nil {
	return err
}
defer file.Close() // release the file handle once the upload finishes

// Open a PutWriter for upload
w, err := b.PutWriter(file.Name(), nil, nil)
if err != nil {
	return err
}
if _, err = io.Copy(w, file); err != nil { // Copy into S3
	return err
}
if err = w.Close(); err != nil {
	return err
}
return nil

func (*Bucket) Sign

func (b *Bucket) Sign(req *http.Request)

Sign signs the http.Request

type Config

type Config struct {
	*http.Client       // http client to use for requests
	Concurrency  int   // number of parts to get or put concurrently
	PartSize     int64 // initial part size in bytes to use for multipart gets or puts
	NTry         int   // maximum attempts for each part
	Md5Check     bool  // The md5 hash of the object is stored in <bucket>/.md5/<object_key>.md5
	// When true, it is stored on puts and verified on gets
	Scheme    string // url scheme, defaults to 'https'
	PathStyle bool   // use path style bucket addressing instead of virtual host style
}

Config includes configuration parameters for s3gof3r
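The Scheme and PathStyle fields affect how an object URL is formed. The sketch below shows the two standard S3 addressing styles. The objectURL function is hypothetical, it does not escape special characters in keys, and it is not taken from the s3gof3r source.

```go
package main

import "fmt"

// objectURL is a hypothetical helper. With pathStyle the bucket is the first
// path segment; otherwise it becomes a subdomain (virtual host style).
func objectURL(scheme, domain, bucket, key string, pathStyle bool) string {
	if pathStyle {
		return fmt.Sprintf("%s://%s/%s/%s", scheme, domain, bucket, key)
	}
	return fmt.Sprintf("%s://%s.%s/%s", scheme, bucket, domain, key)
}

func main() {
	fmt.Println(objectURL("https", "s3.amazonaws.com", "my_s3_bucket", "bar_dir/s3_object", true))
	fmt.Println(objectURL("https", "s3.amazonaws.com", "my_s3_bucket", "bar_dir/s3_object", false))
}
```

Path style is often needed for S3-compatible services that do not provide per-bucket DNS names.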

type Keys

type Keys struct {
	AccessKey     string
	SecretKey     string
	SecurityToken string
}

Keys for an Amazon Web Services account. Used for signing http requests.

func EnvKeys added in v0.3.2

func EnvKeys() (keys Keys, err error)

EnvKeys reads the AWS keys from the environment.

func InstanceKeys added in v0.3.2

func InstanceKeys() (keys Keys, err error)

InstanceKeys requests the AWS keys from the instance-based metadata on EC2. It assumes only one IAM role.

type RespError added in v0.4.3

type RespError struct {
	Code       string
	Message    string
	Resource   string
	RequestID  string `xml:"RequestId"`
	StatusCode int
}

RespError represents an HTTP error response. See http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html

func (*RespError) Error added in v0.4.3

func (e *RespError) Error() string
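To show how the struct tags line up with an S3 error body, the sketch below decodes a sample XML error into a local copy of the struct. The respError type and parseError function are hypothetical copies made so the example runs without the package, and the format returned by Error here is an assumption rather than the package's actual message format.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// respError is a local copy of the fields shown above, for illustration only.
type respError struct {
	Code       string
	Message    string
	Resource   string
	RequestID  string `xml:"RequestId"`
	StatusCode int
}

// Error uses an assumed format; the real package may format it differently.
func (e *respError) Error() string {
	return fmt.Sprintf("%d %s: %s", e.StatusCode, e.Code, e.Message)
}

// parseError decodes an XML error body and attaches the HTTP status code,
// which is not part of the XML document.
func parseError(statusCode int, body []byte) error {
	e := &respError{StatusCode: statusCode}
	if err := xml.Unmarshal(body, e); err != nil {
		return err
	}
	return e
}

func main() {
	body := []byte(`<Error>
  <Code>NoSuchKey</Code>
  <Message>The resource you requested does not exist</Message>
  <Resource>/mybucket/myfoto.jpg</Resource>
  <RequestId>4442587FB7D0A2F9</RequestId>
</Error>`)
	fmt.Println(parseError(404, body))
}
```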

type S3

type S3 struct {
	Domain string // The s3-compatible endpoint. Defaults to "s3.amazonaws.com"
	Keys
}

S3 contains the domain or endpoint of an S3-compatible service and the authentication keys for that service.

func New

func New(domain string, keys Keys) *S3

New returns a new S3. The domain defaults to DefaultDomain if empty.

func (*S3) Bucket

func (s *S3) Bucket(name string) *Bucket

Bucket returns a bucket on S3. The bucket's Config is initialized to DefaultConfig.

func (*S3) Region added in v0.5.0

func (s *S3) Region() string

Region returns the service region, inferring it from the S3 domain.
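One way a region could be inferred from an endpoint domain is sketched below. This is an assumption-laden illustration, not the logic of the Region method: inferRegion is a hypothetical function that recognizes only the s3.amazonaws.com, s3-<region>.amazonaws.com, and s3.<region>.amazonaws.com forms and returns an empty string for anything else (including dual-stack and non-AWS endpoints).

```go
package main

import (
	"fmt"
	"strings"
)

// inferRegion is a hypothetical helper that maps a few common S3 endpoint
// forms to a region name. Unknown forms yield "".
func inferRegion(domain string) string {
	const suffix = ".amazonaws.com"
	if domain == "s3.amazonaws.com" {
		return "us-east-1"
	}
	if !strings.HasSuffix(domain, suffix) {
		return ""
	}
	host := strings.TrimSuffix(domain, suffix)
	if strings.HasPrefix(host, "s3-") || strings.HasPrefix(host, "s3.") {
		region := host[3:]
		if region != "" && !strings.Contains(region, ".") {
			return region
		}
	}
	return ""
}

func main() {
	for _, d := range []string{
		"s3.amazonaws.com",
		"s3-us-west-2.amazonaws.com",
		"s3.eu-central-1.amazonaws.com",
		"storage.example.com",
	} {
		fmt.Printf("%s -> %q\n", d, inferRegion(d))
	}
}
```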

Directories

Path Synopsis
gof3r is a command-line interface for s3gof3r: fast, concurrent, streaming access to Amazon S3.
