ptarchive

command module
v0.0.0-...-f061898
Published: Feb 4, 2022 License: MIT Imports: 1 Imported by: 0

README

Command-line tool for downloading, unzipping, and filtering Papertrail archives.

Features
  • Control which archives are downloaded based on date range (--start & --end)
  • Automatically unzip downloaded archives (if either --substr or --pattern is given)
  • Filter content of archives by sub-string search (faster) or regexp (slower)
    • Archives are filtered as they're streamed to reduce disk usage
  • Dry run shows details of what would be performed without actually doing it (--dry)
    • Summary, including number of files to be downloaded and total size in bytes
    • List of files to be downloaded
    • Output directory
    • etc.
  • Concurrent archive download and processing (--concurrent)
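
The "filtered as they're streamed" feature can be illustrated with a minimal sketch. This is not ptarchive's actual code: the function names filterGzip and filterString are hypothetical, and the sketch only assumes that an archive is gzip-compressed, line-oriented text. Each line is decompressed, tested, and either written or dropped, so the unfiltered archive never has to be stored on disk.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// filterGzip (hypothetical name) reads gzip-compressed text from src and
// writes to dst only the lines that contain substr. It returns the number
// of lines written. Lines are processed one at a time.
func filterGzip(dst io.Writer, src io.Reader, substr string) (int, error) {
	zr, err := gzip.NewReader(src)
	if err != nil {
		return 0, err
	}
	defer zr.Close()

	sc := bufio.NewScanner(zr)
	// Allow log lines of up to 1 MiB; the default limit is 64 KiB.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)

	matched := 0
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, substr) {
			if _, err := fmt.Fprintln(dst, line); err != nil {
				return matched, err
			}
			matched++
		}
	}
	return matched, sc.Err()
}

// filterString (hypothetical demo helper) gzips input in memory, runs it
// through filterGzip, and returns the filtered text and the match count.
func filterString(input, substr string) (string, int, error) {
	var zipped bytes.Buffer
	zw := gzip.NewWriter(&zipped)
	if _, err := zw.Write([]byte(input)); err != nil {
		return "", 0, err
	}
	if err := zw.Close(); err != nil {
		return "", 0, err
	}
	var out bytes.Buffer
	n, err := filterGzip(&out, &zipped, substr)
	return out.String(), n, err
}

func main() {
	in := "1\tprod-abc123-eu-west-1-data\tstarted\n2\tstaging-xyz\tstarted\n"
	out, n, err := filterString(in, "prod-abc123-eu-west-1-data")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d matching line(s):\n%s", n, out)
}
```

A regexp filter would have the same shape, with the strings.Contains test swapped for a compiled regexp's MatchString. That is generally the slower test, which fits the "faster" and "slower" labels in the feature list.
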
Install

You'll need the Go toolchain installed (Go 1.16 or later for the @latest form of go install). Then run:

go install github.com/dgnorton/ptarchive@latest

Usage

ptarchive requires a Papertrail API token, supplied through the PAPERTRAIL_API_TOK environment variable:

export PAPERTRAIL_API_TOK=your_token_here

Command line help:

ptarchive -h
ptarchive <command> -h

List archives available:

ptarchive ls

Download / copy archive files:

Before downloading, you'll usually want to do a "dry run" to see which archives would be downloaded and processed. Confirm that the command line options do what you expect before starting this potentially lengthy operation.

ptarchive cp -d

fetching list of archives between 2019-02-10T01:10:01Z and 2019-02-10T02:10:01Z...
found 0 matching archive files totalling 0 bytes

The -d or --dry flag makes it a dry run. Running it with no other parameters is an easy way to get a reminder of the date format, as shown in the example above.
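
The timestamps in the sample output (for example 2019-02-10T01:10:01Z) are in RFC 3339 format, in UTC. As a minimal sketch of how such a start/end pair could be parsed and checked in Go, assuming nothing about how ptarchive itself does it (parseRange is a hypothetical name):

```go
package main

import (
	"fmt"
	"time"
)

// parseRange (hypothetical name) parses two RFC 3339 timestamps and
// rejects a range whose start lies after its end.
func parseRange(start, end string) (time.Time, time.Time, error) {
	s, err := time.Parse(time.RFC3339, start)
	if err != nil {
		return time.Time{}, time.Time{}, fmt.Errorf("bad start %q: %w", start, err)
	}
	e, err := time.Parse(time.RFC3339, end)
	if err != nil {
		return time.Time{}, time.Time{}, fmt.Errorf("bad end %q: %w", end, err)
	}
	if s.After(e) {
		return time.Time{}, time.Time{}, fmt.Errorf("start %s is after end %s", start, end)
	}
	return s, e, nil
}

func main() {
	// The two timestamps from the sample dry-run output above.
	s, e, err := parseRange("2019-02-10T01:10:01Z", "2019-02-10T02:10:01Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Sub(s)) // prints 1h0m0s
}
```

A date without a time and zone, such as 2019-01-30, is not valid RFC 3339 and is rejected by this sketch.
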

Dry run example with concurrency bumped up to 8, a start and end date range, and sub-string filtering:

ptarchive cp --concurrent 8 -s 2019-01-21T00:00:00Z -e 2019-01-30T00:00:00Z --substr "prod-abc123-eu-west-1-data" -d

fetching list of archives between 2019-01-21T00:00:00Z and 2019-01-30T00:00:00Z...
found 217 matching archive files totalling 62831924861 bytes
output directory: /tmp/ptarchive455907957
files will contain only lines matching substring: prod-abc123-eu-west-1-data
files will be automatically unzipped
8 files will be downloaded concurrently
Dry run - these files would be downloaded...
2019-01-30-00.tsv.gz    311489677
2019-01-29-23.tsv.gz    274523994
2019-01-29-22.tsv.gz    279747796
2019-01-29-21.tsv.gz    294226547
2019-01-29-20.tsv.gz    296502765
2019-01-29-19.tsv.gz    281413200
<snip>
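
The numbers in this dry-run summary can be sanity-checked with a short sketch. The file names in the listing look like one archive per hour, named YYYY-MM-DD-HH.tsv.gz and listed newest first. That naming rule, and the rule that both ends of the range are included, are inferences from the output above, not documented behavior. The function names hourlyNames and gib are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// hourlyNames (hypothetical name) lists one assumed archive name per hour
// from end back to start, both inclusive, newest first. Both timestamps
// are RFC 3339 and are truncated to the hour.
func hourlyNames(start, end string) ([]string, error) {
	s, err := time.Parse(time.RFC3339, start)
	if err != nil {
		return nil, err
	}
	e, err := time.Parse(time.RFC3339, end)
	if err != nil {
		return nil, err
	}
	first := s.UTC().Truncate(time.Hour)
	var names []string
	for t := e.UTC().Truncate(time.Hour); !t.Before(first); t = t.Add(-time.Hour) {
		names = append(names, t.Format("2006-01-02-15")+".tsv.gz")
	}
	return names, nil
}

// gib converts a byte count to binary gigabytes (GiB).
func gib(n int64) float64 {
	return float64(n) / (1 << 30)
}

func main() {
	names, err := hourlyNames("2019-01-21T00:00:00Z", "2019-01-30T00:00:00Z")
	if err != nil {
		panic(err)
	}
	// 9 days is 216 hours; counting both end points gives 217 names.
	fmt.Println(len(names), names[0]) // prints 217 2019-01-30-00.tsv.gz
	fmt.Printf("%.2f GiB\n", gib(62831924861)) // prints 58.52 GiB
}
```

Under those assumptions the count matches the 217 files reported, and the total of 62831924861 bytes comes to roughly 58.5 GiB of compressed data.
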

Once the dry run output looks good, re-run the same command without the -d.

Progress will be displayed as downloads start and finish.
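
For readers curious how a limit like --concurrent 8 can be enforced, one common Go pattern is a buffered channel used as a counting semaphore. The sketch below is an illustration of that pattern only. It is not taken from ptarchive's source, and processAll and gauge are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// processAll (hypothetical name) calls work once per name, with at most
// limit calls running at the same time. Errors are returned in input order.
func processAll(names []string, limit int, work func(name string) error) []error {
	if limit < 1 {
		limit = 1
	}
	errs := make([]error, len(names))
	sem := make(chan struct{}, limit) // one slot per allowed worker
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		sem <- struct{}{} // blocks while all slots are in use
		go func(i int, name string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when done
			errs[i] = work(name)
		}(i, name)
	}
	wg.Wait()
	return errs
}

// gauge (hypothetical demo helper) records how many calls are active and
// the highest number seen.
type gauge struct {
	mu        sync.Mutex
	cur, peak int
}

func (g *gauge) enter() {
	g.mu.Lock()
	g.cur++
	if g.cur > g.peak {
		g.peak = g.cur
	}
	g.mu.Unlock()
}

func (g *gauge) leave() {
	g.mu.Lock()
	g.cur--
	g.mu.Unlock()
}

func main() {
	names := make([]string, 20)
	for i := range names {
		names[i] = fmt.Sprintf("archive-%02d", i) // placeholder names
	}
	g := &gauge{}
	processAll(names, 8, func(name string) error {
		g.enter()
		defer g.leave()
		time.Sleep(10 * time.Millisecond) // stands in for a download
		return nil
	})
	fmt.Printf("peak concurrency: %d (limit 8)\n", g.peak)
}
```

Each slot is acquired before its goroutine starts, so at most limit goroutines exist at any moment rather than one per file waiting on the semaphore.
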
