tar256sum
Repeatable tarball checksums in 100 lines of Go.
- Doesn't write to disk
- Resilient to zip-bomb attacks (see --max-decompress)
- Stable output
Why
GitHub recently broke many build pipelines by accident. This project explores an alternative checksum method that might be more resilient.
Limitations
- Currently only supports .tar.gz
- Needs ~512 bytes of memory per file in the archive
Install
Nix:
nix profile install github:andrewhamon/tar256sum
Go install:
go install github.com/andrewhamon/tar256sum@latest
Usage
cat archive.tar.gz | tar256sum
Demo
- Check out a sizeable git repo:
git clone https://github.com/NixOS/nixpkgs.git
cd nixpkgs
- Have an older version of git handy:
oldgit --version
# git version 2.36.2
git --version
# git version 2.38.3
- Compare sha256sum for two archives of the same commit:
git archive --format tar.gz dbae6eb51edb8afe281e995eff341be07fc43247 | sha256sum
# 8d88969fcaf813e4d4c2f1d14f26ad45a2c35108d5419a31001b04c34cad3579 -
oldgit archive --format tar.gz dbae6eb51edb8afe281e995eff341be07fc43247 | sha256sum
# f1f69372dbb92c00a16e7f73b03d26d7d0462864df7ab854061952be7976e02c -
Observe the non-repeatability. Feel sad.
- Try again with tar256sum:
git archive --format tar.gz dbae6eb51edb8afe281e995eff341be07fc43247 | tar256sum
# 1d4d42cf4f450f7dc3c4d071d5ce684029a09b50844d472104405bfa1bfd3efc -
oldgit archive --format tar.gz dbae6eb51edb8afe281e995eff341be07fc43247 | tar256sum
# 1d4d42cf4f450f7dc3c4d071d5ce684029a09b50844d472104405bfa1bfd3efc -
Hooray, a stable result!
How does this work?
- for each tar entry:
  - checksum the entry header
  - checksum the contents of the entry
  - store this pair of checksums (in memory)
- sort these pairs and checksum them to produce final result
Why not cat archive.tar.gz | gunzip | sha256sum?
You know, I'm starting to ask myself the same question. Obviously there is some
zip-bomb risk in piping to gunzip (is there a flag for that with gunzip?). But I can
only produce different git archive results with compression -- plain tar seems
more stable across versions, so this program could perhaps be made much simpler
by just checksumming the raw tar without sorting the entries.
If you know of two equivalent git tar archives (git archive --format tar)
that have different checksums, I would love to know about it.