GCS Cacher
GCS Cacher is a small CLI and Docker container that saves and restores caches on
Google Cloud Storage. It is intended to be used in CI/CD systems like
Cloud Build, but may have applications elsewhere.
Usage
-
Create a new Cloud Storage bucket. Alternatively, you can
use an existing Cloud Storage bucket. To automatically clean up the cache
after a certain period of time, set a lifecycle policy.
-
Create a cache:
gcs-cacher -bucket "my-bucket" -cache "go-mod" -dir "$GOPATH/pkg/mod"
This will compress and upload the contents at pkg/mod
to Google Cloud
Storage at the key "go-mod".
-
Restore a cache:
gcs-cacher -bucket "my-bucket" -restore "go-mod" -dir "$GOPATH/pkg/mod"
This will download the Google Cloud Storage object named "go-mod" and
decompress it to pkg/mod
.
Installation
Choose from one of the following:
-
Download the latest version from the releases.
-
Use a pre-built Docker container:
us-docker.pkg.dev/vargolabs/gcs-cacher/gcs-cacher
docker.pkg.github.com/sethvargo/gcs-cacher/gcs-cacher
Implementation
When saving the cache, the provided directory is made into a tarball, then
gzipped, then uploaded to Google Cloud Storage. When restoring the cache, the
reverse happens.
It's strongly recommend that you use a cache key based on your dependency file,
and restore up the chain. For example:
gcs-cacher \
-bucket "my-bucket" \
-cache "ruby-{{ hashGlob "Gemfile.lock" }}"
gcs-cacher \
-bucket "my-bucket" \
-restore "ruby-{{ hashGlob "Gemfile.lock" }}"
-restore "ruby-"
This will maximize cache hits.
It is strongly recommended that you enable a lifecycle rule on your cache
bucket! This will automatically purge stale entities and keep costs lower.
Why?
The primary use case is to cache large and/or expensive dependency trees like a
Ruby vendor directory or a Go module cache as part of a CI/CD step. Downloading
a compressed, packaged archive is often much faster than a full dependency
resolution. It has an unintended benefit of also reducing dependencies on
external build systems.
Why not just use gsutil?
That's a great question. In fact, there's already a cloud builder
that uses gsutil
to accomplish similar things. However, that approach has a
few drawbacks:
-
It doesn't work with large files because containers don't package the crc
package. If you're cache is > 500mb it will fail. GCS Cacher does not have
this same limitation.
-
You have to build, publish, and manage the container to your own project. We
publish pre-compiled binaries and Docker containers from multiple
registries. You're still free to build it yourself, but you don't have to.
-
The container image itself is huge. It's nearly 1GB in size. The
gcs-cacher container is just a few MBs. Since we're optimzing for build
speed, container size is important.
-
It's actually really hard to get the fallback key logic correct in bash.
There are some subtle edge cases (like when your filename contains a $
)
where this approach completely fails.
-
Despite supporting parallel uploads, that cacher is still ~3.2x slower than
GCS Cacher.