Documentation ¶
Overview ¶
Package zipindex provides a size-optimized representation of a zip file that allows its files to be decompressed without reading the zip file index.
It will only provide the minimal needed data for successful decompression and CRC checks.
Custom metadata can be stored per file and filtering can be performed on the incoming files.
Index ¶
- Constants
- Variables
- func RegisterDecompressor(method uint16, dcomp Decompressor)
- type Decompressor
- type ErrNeedMoreData
- type File
- func (z *File) DecodeMsg(dc *msgp.Reader) (err error)
- func (z *File) EncodeMsg(en *msgp.Writer) (err error)
- func (z *File) MarshalMsg(b []byte) (o []byte, err error)
- func (z *File) Msgsize() (s int)
- func (f *File) Open(r io.Reader) (io.ReadCloser, error)
- func (f *File) OpenRaw(r io.Reader) (io.Reader, error)
- func (f *File) UnmarshalMsg(bts []byte) (o []byte, err error)
- type FileFilter
- type Files
- type ZipDirEntry
Examples ¶
Constants ¶
const (
	Store   uint16 = 0                    // no compression
	Deflate uint16 = 8                    // DEFLATE compressed
	Zstd    uint16 = zstd.ZipMethodWinZip // Zstd in zip.
)
Compression methods.
const MaxCustomEntries = 1000
MaxCustomEntries is the maximum number of custom entries per file.
const MaxFiles = 1_000_000_000
MaxFiles is the maximum number of files inside a zip file.
const MaxIndexSize = 128 << 20
MaxIndexSize is the maximum index size, uncompressed.
Variables ¶
var (
	// ErrFormat is returned when zip file cannot be parsed.
	ErrFormat = errors.New("zip: not a valid zip file")
	// ErrAlgorithm is returned if an unsupported compression type is used.
	ErrAlgorithm = errors.New("zip: unsupported compression algorithm")
	// ErrChecksum is returned if a file fails a CRC check.
	ErrChecksum = errors.New("zip: checksum error")
)
var ErrMaxSizeExceeded = errors.New("index maximum size exceeded")
ErrMaxSizeExceeded is returned if the maximum size of data is exceeded.
var ErrTooManyCustomEntries = errors.New("custom entry count exceeded")
ErrTooManyCustomEntries is returned when a file in the zip has too many custom entries.
var ErrTooManyFiles = errors.New("too many files")
ErrTooManyFiles is returned when a zip file contains too many files.
Functions ¶
func RegisterDecompressor ¶
func RegisterDecompressor(method uint16, dcomp Decompressor)
RegisterDecompressor registers a custom decompressor for the specified method ID. The common methods Store (0), Deflate (8), and Zstandard (93) are built in.
Types ¶
type Decompressor ¶
type Decompressor func(r io.Reader) io.ReadCloser
A Decompressor returns a new decompressing reader, reading from r. The ReadCloser's Close method must be used to release associated resources. The Decompressor itself must be safe to invoke from multiple goroutines simultaneously, but each returned reader will be used only by one goroutine at a time.
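As a minimal sketch of what a function with this shape can look like, the program below wraps the standard library's compress/flate reader. The local Decompressor type only mirrors the documented signature so the sketch compiles without importing zipindex; newFlateReader and roundTrip are hypothetical names used for illustration.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// Decompressor mirrors the documented signature so that this sketch
// compiles on its own (assumption: it is not the package's type).
type Decompressor func(r io.Reader) io.ReadCloser

// newFlateReader is a hypothetical decompressor built on compress/flate.
// It holds no shared state, so it may be called from several goroutines.
var newFlateReader Decompressor = func(r io.Reader) io.ReadCloser {
	return flate.NewReader(r)
}

// roundTrip compresses s with DEFLATE and reads it back through dcomp.
func roundTrip(dcomp Decompressor, s string) (string, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return "", err
	}
	if _, err := w.Write([]byte(s)); err != nil {
		return "", err
	}
	if err := w.Close(); err != nil {
		return "", err
	}
	rc := dcomp(&buf)
	defer rc.Close() // Close releases the reader's resources.
	out, err := io.ReadAll(rc)
	return string(out), err
}

func main() {
	got, err := roundTrip(newFlateReader, "hello zip")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

With the real package, a function of this shape would presumably be passed to RegisterDecompressor together with the method ID it handles.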
type ErrNeedMoreData ¶
type ErrNeedMoreData struct {
FromEnd int64
}
ErrNeedMoreData is returned by ReadDir when more data is required to read the directory. FromEnd is the exact number of bytes, counted from the end of the file, that must be provided. Callers can reasonably reject values that are too large, to avoid running out of memory.
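One way to apply such a limit is sketched below. nextTailSize, errTailTooLarge and the maxTail budget are hypothetical names for this illustration, and treating a request larger than the file as invalid is an assumption of the sketch, not documented behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// errTailTooLarge is a hypothetical error used only by this sketch.
var errTailTooLarge = errors.New("requested directory tail exceeds limit")

// nextTailSize decides how many bytes from the end of the file to fetch
// after a read reported that fromEnd bytes are needed.
// maxTail is a caller-chosen memory budget.
func nextTailSize(fromEnd, fileSize, maxTail int64) (int64, error) {
	// Assumption: a request outside (0, fileSize] cannot be satisfied.
	if fromEnd <= 0 || fromEnd > fileSize {
		return 0, fmt.Errorf("invalid request for %d bytes of a %d byte file", fromEnd, fileSize)
	}
	if fromEnd > maxTail {
		return 0, errTailTooLarge
	}
	return fromEnd, nil
}

func main() {
	// A request within the 1MiB budget is accepted.
	n, err := nextTailSize(57912, 1<<20, 1<<20)
	fmt.Println(n, err)

	// A 5MiB request against a 1MiB budget is rejected.
	_, err = nextTailSize(5<<20, 10<<20, 1<<20)
	fmt.Println(err)
}
```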
func (ErrNeedMoreData) Error ¶
func (e ErrNeedMoreData) Error() string
Error returns the error as string.
type File ¶
type File struct {
	Name               string // Name of the file as stored in the zip.
	CompressedSize64   uint64 // Size of compressed data, excluding ZIP headers.
	UncompressedSize64 uint64 // Size of the Uncompressed data.
	Offset             int64  // Offset where file data header starts.
	CRC32              uint32 // CRC of the uncompressed data.
	Method             uint16 // Storage method.
	Flags              uint16 // General purpose bit flag

	// Custom data.
	Custom map[string]string
}
File is a sparse representation of a File inside a zip file.
func DefaultFileFilter ¶
func DefaultFileFilter(dst *File, entry *ZipDirEntry) *File
DefaultFileFilter keeps only regular files that can be decompressed; all other entries are filtered out.
func FindSerialized ¶
FindSerialized will locate a file by name and return it. This is less resource intensive than decoding all files if only one is requested. Expected speed scales O(n) for n files. Returns nil, io.EOF if not found.
Example ¶
This example demonstrates how to look up a single file in a serialized index.
package main

import (
	"fmt"

	"github.com/minio/zipindex"
)

func main() {
	files, err := zipindex.ReadFile("testdata/go-with-datadesc-sig.zip", nil)
	if err != nil {
		panic(err)
	}
	files.OptimizeSize()
	serialized, err := files.Serialize()
	if err != nil {
		panic(err)
	}
	file, err := zipindex.FindSerialized(serialized, "bar.txt")
	if err != nil {
		panic(err)
	}
	fmt.Printf("bar.txt: %+v", *file)
}

Output:
bar.txt: {Name:bar.txt CompressedSize64:4 UncompressedSize64:4 Offset:57 CRC32:0 Method:0 Flags:8 Custom:map[]}
func (*File) MarshalMsg ¶
MarshalMsg implements msgp.Marshaler
func (*File) Msgsize ¶
Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message
func (*File) Open ¶
Open returns a ReadCloser that provides access to the File's contents. The Reader 'r' must be forwarded to f.Offset before being provided.
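When the source is a plain stream that cannot seek, one way to forward it is to discard the leading bytes. The sketch below shows that idea with the standard library only; forward and readFrom are hypothetical helpers, and the string stands in for zip data.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// forward discards offset bytes from r so that the next read starts at
// offset. It is meant for readers that cannot seek.
func forward(r io.Reader, offset int64) error {
	_, err := io.CopyN(io.Discard, r, offset)
	return err
}

// readFrom returns everything in data after skipping offset bytes.
func readFrom(data string, offset int64) (string, error) {
	r := strings.NewReader(data)
	if err := forward(r, offset); err != nil {
		return "", err
	}
	rest, err := io.ReadAll(r)
	return string(rest), err
}

func main() {
	rest, err := readFrom("headerPAYLOAD", 6)
	if err != nil {
		panic(err)
	}
	fmt.Println(rest)
}
```

If the stream ends before the offset is reached, io.CopyN returns an error, so a truncated source is reported rather than silently read from the wrong position.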
type FileFilter ¶
type FileFilter = func(dst *File, entry *ZipDirEntry) *File
FileFilter allows transforming the incoming data. If the returned file is nil it will not be added. Custom fields can be added. Note that the Custom field will usually be nil on entry, so the map must be allocated before it is written to.
Example ¶
ExampleFileFilter demonstrates how to filter incoming files.
package main

import (
	"fmt"

	"github.com/minio/zipindex"
)

func main() {
	files, err := zipindex.ReadFile("testdata/unix.zip", func(dst *zipindex.File, entry *zipindex.ZipDirEntry) *zipindex.File {
		if dst.Name == "hello" {
			// Filter out on specific properties.
			return nil
		}
		// Add custom data.
		if dst.Custom == nil {
			dst.Custom = make(map[string]string, 3)
		}
		dst.Custom["modified"] = entry.Modified.String()
		dst.Custom["perm"] = fmt.Sprintf("0%o", entry.Mode().Perm())
		if len(entry.Comment) > 0 {
			dst.Custom["comment"] = entry.Comment
		}
		return dst
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("Got %d files\n", len(files))
	for i, file := range files {
		fmt.Printf("%d: %+v\n", i, file)
	}
}

Output:
Got 3 files
0: {Name:dir/bar CompressedSize64:6 UncompressedSize64:6 Offset:71 CRC32:2055117726 Method:0 Flags:0 Custom:map[modified:2011-12-08 10:04:50 +0000 +0000 perm:0666]}
1: {Name:dir/empty/ CompressedSize64:0 UncompressedSize64:0 Offset:142 CRC32:0 Method:0 Flags:0 Custom:map[modified:2011-12-08 10:08:06 +0000 +0000 perm:0777]}
2: {Name:readonly CompressedSize64:12 UncompressedSize64:12 Offset:210 CRC32:3127775578 Method:0 Flags:0 Custom:map[modified:2011-12-08 10:06:08 +0000 +0000 perm:0444]}
type Files ¶
type Files []File
Files is a collection of files.
func DeserializeFiles ¶
DeserializeFiles will de-serialize the files.
Example ¶
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/minio/zipindex"
)

func main() {
	exitOnErr := func(err error) {
		if err != nil {
			log.Fatalln(err)
		}
	}
	b, err := os.ReadFile("testdata/big.zip")
	exitOnErr(err)
	// We only need the end of the file to parse the directory.
	// Usually this should be at least 64K on initial try.
	sz := 64 << 10
	var files zipindex.Files
	files, err = zipindex.ReadDir(b[len(b)-sz:], int64(len(b)), nil)
	// Omitted: Check if ErrNeedMoreData and retry with more data
	exitOnErr(err)

	// OptimizeSize will make the size as efficient as possible
	// without losing data.
	files.OptimizeSize()

	// Serialize files to binary.
	serialized, err := files.Serialize()
	exitOnErr(err)

	// This output may change if compression is improved.
	// Output is rounded up.
	fmt.Printf("Size of serialized data: %dKB\n", (len(serialized)+1023)/1024)

	// StripCRC(true) will strip CRC, even if there is no data descriptor.
	files.StripCRC(true)
	// StripFlags(1<<3) will strip all flags that aren't a data descriptor flag (bit 3).
	files.StripFlags(1 << 3)
	noCRC, err := files.Serialize()
	exitOnErr(err)

	// This output may change if compression is improved.
	// Output is rounded up.
	fmt.Printf("Size of serialized data without CRC: %dKB\n", (len(noCRC)+1023)/1024)

	// Deserialize the content (with CRC).
	files, err = zipindex.DeserializeFiles(serialized)
	exitOnErr(err)

	file := files.Find("file-10.txt")
	fmt.Printf("Reading file: %+v\n", *file)

	// Create a reader with entire zip file...
	rs := bytes.NewReader(b)
	// Seek to the file offset.
	_, err = rs.Seek(file.Offset, io.SeekStart)
	exitOnErr(err)

	// Provide the forwarded reader..
	rc, err := file.Open(rs)
	exitOnErr(err)
	defer rc.Close()

	// Read the zip file content.
	content, err := io.ReadAll(rc)
	exitOnErr(err)
	fmt.Printf("File content is '%s'\n", string(content))
}

Output:
Size of serialized data: 6KB
Size of serialized data without CRC: 1KB
Reading file: {Name:file-10.txt CompressedSize64:2 UncompressedSize64:2 Offset:410 CRC32:2707236321 Method:0 Flags:0 Custom:map[]}
File content is '10'
func ReadDir ¶
func ReadDir(buf []byte, zipSize int64, filter FileFilter) (Files, error)
ReadDir will read the directory from the provided buffer. Regular files that are expected to be decompressible will be returned. ErrNeedMoreData may be returned if more data is required to read the directory. For the initial scan, at least 64KiB (or the entire file, if smaller) should be given; providing more makes it more likely that the entire directory can be read. The total size of the zip file must be provided. A custom filter can be provided. If nil, DefaultFileFilter will be used.
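The sizing rule for the first attempt can be sketched as a small helper. initialTailSize is a hypothetical name; the 64KiB figure comes from the description above.

```go
package main

import "fmt"

// initialTailSize returns how many bytes from the end of the file to pass
// on the first attempt: 64KiB, or the whole file if it is smaller.
func initialTailSize(fileSize int64) int64 {
	const firstTry = 64 << 10
	if fileSize < firstTry {
		return fileSize
	}
	return firstTry
}

func main() {
	for _, size := range []int64{1500, 64 << 10, 5 << 20} {
		n := initialTailSize(size)
		// The tail starts at offset size-n within the file.
		fmt.Printf("file of %d bytes: read the last %d bytes (from offset %d)\n", size, n, size-n)
	}
}
```

Clamping to the file size also avoids slicing past the start of a buffer when the zip file is smaller than the chosen tail.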
Example ¶
package main

import (
	"errors"
	"fmt"
	"os"

	"github.com/minio/zipindex"
)

func main() {
	b, err := os.ReadFile("testdata/big.zip")
	if err != nil {
		panic(err)
	}
	// We only need the end of the file to parse the directory.
	// Usually this should be at least 64K on initial try.
	sz := 10 << 10
	var files zipindex.Files
	for {
		files, err = zipindex.ReadDir(b[len(b)-sz:], int64(len(b)), nil)
		if err == nil {
			fmt.Printf("Got %d files\n", len(files))
			break
		}
		var terr zipindex.ErrNeedMoreData
		if errors.As(err, &terr) {
			if terr.FromEnd > 1<<20 {
				panic("we will only provide max 1MB data")
			}
			sz = int(terr.FromEnd)
			fmt.Printf("Retrying with %d bytes at the end of file\n", sz)
		} else {
			// Unable to parse...
			panic(err)
		}
	}
	fmt.Printf("First file: %+v", files[0])
}

Output:
Retrying with 57912 bytes at the end of file
Got 1000 files
First file: {Name:file-0.txt CompressedSize64:1 UncompressedSize64:1 Offset:0 CRC32:4108050209 Method:0 Flags:0 Custom:map[]}
func ReadFile ¶
func ReadFile(name string, filter FileFilter) (Files, error)
ReadFile will read the directory from a file. If the ZIP file directory exceeds 100MB it will be rejected.
Example ¶
ExampleReadFile demonstrates how to read the index of a file on disk.
package main

import (
	"fmt"

	"github.com/minio/zipindex"
)

func main() {
	files, err := zipindex.ReadFile("testdata/go-with-datadesc-sig.zip", nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Got %d files\n", len(files))
	fmt.Printf("First file: %+v", files[0])
}

Output:
Got 2 files
First file: {Name:foo.txt CompressedSize64:4 UncompressedSize64:4 Offset:0 CRC32:2117232040 Method:0 Flags:8 Custom:map[]}
func ReaderAt ¶
ReaderAt will read the directory from an io.ReaderAt. The total size of the zip file must be provided. If the ZIP file directory exceeds maxDir bytes it will be rejected.
Example ¶
This example demonstrates how to read the index of a file through an io.ReaderAt.
package main

import (
	"fmt"
	"os"

	"github.com/minio/zipindex"
)

func main() {
	f, err := os.Open("testdata/big.zip")
	if err != nil {
		panic(err)
	}
	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}
	// Read and allow up to 10MB index.
	files, err := zipindex.ReaderAt(f, fi.Size(), 10<<20, nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Got %d files\n", len(files))
	fmt.Printf("First file: %+v", files[0])
}

Output:
Got 1000 files
First file: {Name:file-0.txt CompressedSize64:1 UncompressedSize64:1 Offset:0 CRC32:4108050209 Method:0 Flags:0 Custom:map[]}
func (Files) OptimizeSize ¶
func (f Files) OptimizeSize()
OptimizeSize will sort entries and strip CRC data when the file has a data descriptor.
func (*Files) RemoveInsecurePaths ¶ added in v0.3.1
func (f *Files) RemoveInsecurePaths()
RemoveInsecurePaths will remove any file whose path is deemed insecure. A path is insecure if filepath.IsLocal(file.Name) reports false or if the name contains a backslash.
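The rule described above can be sketched as a predicate over a single name. isInsecurePath is a hypothetical helper that follows the description, not the package's own code; filepath.IsLocal requires Go 1.20 or later, and the results in the comments assume a Unix-like system.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isInsecurePath reports whether name would be removed under the rule
// described above: not a local path, or containing a backslash.
func isInsecurePath(name string) bool {
	return !filepath.IsLocal(name) || strings.Contains(name, `\`)
}

func main() {
	names := []string{
		"dir/file.txt",  // local, relative path
		"../secret.txt", // escapes the extraction directory
		"/etc/passwd",   // absolute path
		`dir\file.txt`,  // contains a backslash
	}
	for _, name := range names {
		fmt.Printf("%q insecure: %v\n", name, isInsecurePath(name))
	}
}
```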
func (Files) Sort ¶
func (f Files) Sort()
Sort files by offset in zip file. Typically, directories are already sorted by offset. This will usually provide the smallest possible serialized size.
func (Files) SortByName ¶ added in v0.4.0
func (f Files) SortByName()
SortByName will sort files by file name in zip file.
func (Files) StripCRC ¶
StripCRC will zero out the CRC for all files if there is a data descriptor (which will contain a CRC) or optionally for all.
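A minimal sketch of this behaviour on a reduced entry type follows. The entry struct and stripCRC are hypothetical, and detecting the data descriptor through bit 3 of the general purpose flags is an assumption based on the zip format rather than on the package source.

```go
package main

import "fmt"

// dataDescriptorFlag is bit 3 of the general purpose bit flag, which the
// zip format uses to signal a trailing data descriptor.
const dataDescriptorFlag = 1 << 3

// entry holds only the two fields this sketch needs.
type entry struct {
	CRC32 uint32
	Flags uint16
}

// stripCRC zeroes the CRC of entries that have a data descriptor,
// or of every entry when all is true.
func stripCRC(entries []entry, all bool) {
	for i := range entries {
		if all || entries[i].Flags&dataDescriptorFlag != 0 {
			entries[i].CRC32 = 0
		}
	}
}

func main() {
	entries := []entry{
		{CRC32: 0xdeadbeef, Flags: dataDescriptorFlag}, // CRC also stored in the descriptor
		{CRC32: 0xcafebabe, Flags: 0},                  // CRC only stored in the index
	}
	stripCRC(entries, false)
	fmt.Printf("%+v\n", entries)
}
```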
func (Files) StripFlags ¶
StripFlags will zero out the Flags, except the ones provided in mask.
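The masking amounts to a bitwise AND. The sketch below applies it to a single flag value; stripFlags is a hypothetical helper, and the constant names are illustrative labels for bits 3 and 11 of the zip general purpose flags.

```go
package main

import "fmt"

// stripFlags keeps only the bits of flags that are also set in mask.
func stripFlags(flags, mask uint16) uint16 {
	return flags & mask
}

func main() {
	const (
		dataDescriptor = 1 << 3  // bit 3: a data descriptor follows the file data
		utf8Names      = 1 << 11 // bit 11: name and comment are UTF-8 encoded
	)
	flags := uint16(dataDescriptor | utf8Names)
	kept := stripFlags(flags, dataDescriptor)
	fmt.Printf("before: %016b\nafter:  %016b\n", flags, kept)
}
```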
type ZipDirEntry ¶
type ZipDirEntry struct {
	// Name is the name of the file.
	//
	// It must be a relative path, not start with a drive letter (such as "C:"),
	// and must use forward slashes instead of back slashes. A trailing slash
	// indicates that this file is a directory and should have no data.
	//
	// When reading zip files, the Name field is populated from
	// the zip file directly and is not validated for correctness.
	// It is the caller's responsibility to sanitize it as
	// appropriate, including canonicalizing slash directions,
	// validating that paths are relative, and preventing path
	// traversal through filenames ("../../../").
	Name string

	// Comment is any arbitrary user-defined string shorter than 64KiB.
	Comment string

	// NonUTF8 indicates that Name and Comment are not encoded in UTF-8.
	//
	// By specification, the only other encoding permitted should be CP-437,
	// but historically many ZIP readers interpret Name and Comment as whatever
	// the system's local character encoding happens to be.
	//
	// This flag should only be set if the user intends to encode a non-portable
	// ZIP file for a specific localized region. Otherwise, the Writer
	// automatically sets the ZIP format's UTF-8 flag for valid UTF-8 strings.
	NonUTF8 bool

	CreatorVersion uint16
	ReaderVersion  uint16
	Flags          uint16

	// Method is the compression method. If zero, Store is used.
	Method uint16

	// Modified is the modified time of the file.
	//
	// When reading, an extended timestamp is preferred over the legacy MS-DOS
	// date field, and the offset between the times is used as the timezone.
	// If only the MS-DOS date is present, the timezone is assumed to be UTC.
	//
	// When writing, an extended timestamp (which is timezone-agnostic) is
	// always emitted. The legacy MS-DOS date field is encoded according to the
	// location of the Modified time.
	Modified time.Time

	CRC32              uint32
	CompressedSize64   uint64
	UncompressedSize64 uint64
	Extra              []byte
	ExternalAttrs      uint32 // Meaning depends on CreatorVersion
	// contains filtered or unexported fields
}
ZipDirEntry describes a file within a zip file. See the zip spec for details.
func (*ZipDirEntry) Mode ¶
func (h *ZipDirEntry) Mode() (mode os.FileMode)
Mode returns the permission and mode bits for the ZipDirEntry.