dictzip

package module
v0.2.0
Published: Nov 17, 2024 License: Apache-2.0 Imports: 12 Imported by: 0

README

go-dictzip

go-dictzip is a Go library for reading and writing dictzip files.

Status

The API is currently unstable and will change. This package will use module version numbering to manage versions and compatibility.

Installation

To install this package run

go get github.com/ianlewis/go-dictzip

Examples

Reading compressed files

You can open a dictionary file and read it much like a normal reader.

// Open the dictionary.
f, _ := os.Open("dictionary.dict.dz")
defer f.Close()

r, _ := dictzip.NewReader(f)
defer r.Close()

// Read the entire uncompressed contents.
uncompressedData, _ := io.ReadAll(r)

Random access

Random access can be performed using the ReadAt method.

// Open the dictionary.
f, _ := os.Open("dictionary.dict.dz")
defer f.Close()

r, _ := dictzip.NewReader(f)
defer r.Close()

// Read 12 bytes of uncompressed data starting at offset 5.
buf := make([]byte, 12)
_, _ = r.ReadAt(buf, 5)

Writing compressed files

Dictzip files can be written using the dictzip.Writer. Compressed data is stored in chunks and chunk sizes are stored in the archive header allowing for more efficient random access.

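// Create (or open) the dictionary file for writing.
f, _ := os.OpenFile("dictionary.dict.dz", os.O_WRONLY|os.O_CREATE, 0o644)
defer f.Close()

w, _ := dictzip.NewWriter(f)
// Close must be called; the output is only finalized on Close.
defer w.Close()

buf := []byte("Hello World!")
_, _ = w.Write(buf)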
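
The idea of chunked storage with a size table can be illustrated using only the standard library. This is a simplified sketch, not this package's implementation or the dictzip on-disk format: it deflates each chunk as a separate DEFLATE stream, whereas a real dictzip file remains a single gzip-compatible stream. The helpers `compressChunks` and `readChunk` are hypothetical names introduced for illustration.

```go
package main

import (
	"bytes"
	"compress/flate"
	"errors"
	"fmt"
	"io"
)

// compressChunks splits data into pieces of at most chunkSize bytes,
// deflates each piece independently, and returns the concatenated
// compressed bytes together with the compressed size of every chunk.
func compressChunks(data []byte, chunkSize int) ([]byte, []int, error) {
	if chunkSize <= 0 {
		return nil, nil, errors.New("chunk size must be positive")
	}
	var out bytes.Buffer
	var sizes []int
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		before := out.Len()
		fw, err := flate.NewWriter(&out, flate.DefaultCompression)
		if err != nil {
			return nil, nil, err
		}
		if _, err := fw.Write(data[start:end]); err != nil {
			return nil, nil, err
		}
		if err := fw.Close(); err != nil {
			return nil, nil, err
		}
		sizes = append(sizes, out.Len()-before)
	}
	return out.Bytes(), sizes, nil
}

// readChunk decompresses only the chunk at index idx, using the size
// table to find where its compressed bytes begin.
func readChunk(compressed []byte, sizes []int, idx int) ([]byte, error) {
	if idx < 0 || idx >= len(sizes) {
		return nil, fmt.Errorf("chunk %d out of range", idx)
	}
	off := 0
	for _, s := range sizes[:idx] {
		off += s
	}
	fr := flate.NewReader(bytes.NewReader(compressed[off : off+sizes[idx]]))
	defer fr.Close()
	return io.ReadAll(fr)
}

func main() {
	comp, sizes, err := compressChunks([]byte("aaaaabbbbbcccccdd"), 5)
	if err != nil {
		panic(err)
	}
	chunk, err := readChunk(comp, sizes, 2)
	if err != nil {
		panic(err)
	}
	fmt.Printf("chunks=%d third=%q\n", len(sizes), chunk)
}
```

Because the size table is known up front, a reader only has to decompress the chunks that overlap the requested range instead of the whole file.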

dictzip Command

This repository also includes a dictzip command that is compatible with the dictzip(1) command.

# compress dictionary.dict to dictionary.dict.dz
$ dictzip dictionary.dict

# decompress dictionary.dict.dz to dictionary.dict
$ dictzip -d dictionary.dict.dz

# decompress part of the file and print to stdout
$ dictzip --stdout --start 1024 --size 25 dictionary.dict.dz
dictionary entry contents

Documentation

Overview

Package dictzip implements the dictzip compression format. Dictzip compresses files using the gzip(1) algorithm (LZ77) in a manner which is completely compatible with the gzip file format.

See: https://linux.die.net/man/1/dictzip
See: https://linux.die.net/man/1/gzip
See: https://datatracker.ietf.org/doc/html/rfc1952
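
Gzip compatibility means a standard gzip reader can decode such a file sequentially, while the chunk table travels in the gzip EXTRA field. The sketch below uses only the standard library to write and read back a gzip stream carrying an `RA` sub-field. The field layout assumed here (version, chunk length, chunk count, then one 16-bit size per chunk, all little-endian) is taken from dictzip(1) and should be checked against it; `raField`, `buildRA`, `parseRA`, and `roundTrip` are hypothetical helpers, not part of this package, and the chunk values are placeholders.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// raField holds the decoded contents of an "RA" extra sub-field.
type raField struct {
	Version    uint16
	ChunkLen   uint16
	ChunkSizes []uint16
}

// buildRA encodes an RA sub-field: SI1, SI2, LEN, then the payload.
func buildRA(chunkLen uint16, sizes []uint16) []byte {
	data := make([]byte, 6+2*len(sizes))
	binary.LittleEndian.PutUint16(data[0:], 1) // assumed format version
	binary.LittleEndian.PutUint16(data[2:], chunkLen)
	binary.LittleEndian.PutUint16(data[4:], uint16(len(sizes)))
	for i, s := range sizes {
		binary.LittleEndian.PutUint16(data[6+2*i:], s)
	}
	sub := []byte{'R', 'A', 0, 0}
	binary.LittleEndian.PutUint16(sub[2:], uint16(len(data)))
	return append(sub, data...)
}

// parseRA walks the EXTRA sub-fields and decodes the first RA entry.
func parseRA(extra []byte) (raField, error) {
	for len(extra) >= 4 {
		n := int(binary.LittleEndian.Uint16(extra[2:]))
		if len(extra) < 4+n {
			break
		}
		if extra[0] == 'R' && extra[1] == 'A' && n >= 6 {
			d := extra[4 : 4+n]
			count := int(binary.LittleEndian.Uint16(d[4:]))
			if len(d) < 6+2*count {
				return raField{}, errors.New("truncated RA sub-field")
			}
			ra := raField{
				Version:  binary.LittleEndian.Uint16(d[0:]),
				ChunkLen: binary.LittleEndian.Uint16(d[2:]),
			}
			for i := 0; i < count; i++ {
				ra.ChunkSizes = append(ra.ChunkSizes, binary.LittleEndian.Uint16(d[6+2*i:]))
			}
			return ra, nil
		}
		extra = extra[4+n:]
	}
	return raField{}, errors.New("no RA sub-field found")
}

// roundTrip writes a gzip stream with an RA sub-field and reads it back
// with the standard gzip reader.
func roundTrip() (string, raField, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Extra = buildRA(1024, []uint16{20}) // placeholder values
	if _, err := zw.Write([]byte("Hello World!")); err != nil {
		return "", raField{}, err
	}
	if err := zw.Close(); err != nil {
		return "", raField{}, err
	}
	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return "", raField{}, err
	}
	defer zr.Close()
	ra, err := parseRA(zr.Extra)
	if err != nil {
		return "", raField{}, err
	}
	data, err := io.ReadAll(zr)
	return string(data), ra, err
}

func main() {
	data, ra, err := roundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q %+v\n", data, ra)
}
```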

Unless otherwise informed, clients should not assume implementations in this package are safe for parallel execution.

Example
path := "internal/testdata/hello.txt.dz"
f, err := os.Open(path)
if err != nil {
	panic(err)
}

r, err := dictzip.NewReader(f)
if err != nil {
	panic(err)
}

buf := make([]byte, 12)
_, err = r.ReadAt(buf, 5)
if err != nil {
	panic(err)
}

fmt.Println(string(buf))

Output:

Hello World!

Constants

const (
	// OSFAT represents an FAT filesystem OS (MS-DOS, OS/2, NT/Win32).
	OSFAT byte = iota

	// OSAmiga represents the Amiga OS.
	OSAmiga

	// OSVMS represents VMS (or OpenVMS).
	OSVMS

	// OSUnix represents Unix operating systems.
	OSUnix

	// OSVM represents VM/CMS.
	OSVM

	// OSAtari represents Atari TOS.
	OSAtari

	// OSHPFS represents HPFS filesystem (OS/2, NT).
	OSHPFS

	// OSMacintosh represents the Macintosh operating system.
	OSMacintosh

	// OSZSystem represents Z-System.
	OSZSystem

	// OSCPM represents the CP/M operating system.
	OSCPM

	// OSTOPS20 represents the TOPS-20 operating system.
	OSTOPS20

	// OSNTFS represents an NTFS filesystem OS (NT).
	OSNTFS

	// OSQDOS represents QDOS.
	OSQDOS

	// OSAcorn represents Acorn RISCOS.
	OSAcorn

	// OSUnknown represents an unknown operating system.
	OSUnknown = 0xff
)
const (
	// XFLSlowest indicates that the compressor used maximum compression (i.e. the slowest algorithm).
	XFLSlowest byte = 0x2

	// XFLFastest indicates that the compressor used the fastest algorithm.
	XFLFastest byte = 0x4
)
const (
	// NoCompression performs no compression on the input.
	NoCompression = flate.NoCompression

	// BestSpeed provides the lowest level of compression but the fastest
	// performance.
	BestSpeed = flate.BestSpeed

	// BestCompression provides the highest level of compression but the slowest
	// performance.
	BestCompression = flate.BestCompression

	// DefaultCompression is the default compression level used for compressing
	// chunks. It provides a balance between compression and performance.
	DefaultCompression = flate.DefaultCompression

	// HuffmanOnly disables Lempel-Ziv match searching and only performs Huffman
	// entropy encoding. See [flate.HuffmanOnly].
	HuffmanOnly = flate.HuffmanOnly
)
const (
	// DefaultChunkSize is the default chunk size used when writing dictzip files.
	DefaultChunkSize = math.MaxUint16
)

Variables

var (

	// ErrHeader indicates an error with gzip header data.
	ErrHeader = fmt.Errorf("%w: invalid header", errDictzip)
)

Functions

This section is empty.

Types

type Header

type Header struct {
	// Comment is the COMMENT header field.
	Comment string

	// Extra includes all EXTRA sub-fields except the dictzip RA sub-field.
	Extra []byte

	// ModTime is the MTIME modification time field.
	ModTime time.Time

	// Name is the NAME header field.
	Name string

	// OS is the OS header field.
	OS byte
	// contains filtered or unexported fields
}

Header is the gzip file header.

Strings must be UTF-8 encoded and may only contain Unicode code points U+0001 through U+00FF, due to limitations of the gzip file format.
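
The restriction on header strings can be checked before writing. A minimal sketch, where `validHeaderString` is a hypothetical helper rather than part of this package:

```go
package main

import "fmt"

// validHeaderString reports whether s is valid UTF-8 containing only
// code points U+0001 through U+00FF. Ranging over a string yields
// U+FFFD for invalid UTF-8 bytes, which falls outside the allowed range.
func validHeaderString(s string) bool {
	for _, r := range s {
		if r < 0x01 || r > 0xFF {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"dictionary.dict", "caf\u00e9", "\u65e5\u672c", "a\x00b"} {
		fmt.Printf("%q: %v\n", s, validHeaderString(s))
	}
}
```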

func (*Header) ChunkSize

func (h *Header) ChunkSize() int

ChunkSize returns the dictzip uncompressed data chunk size.

func (*Header) Sizes added in v0.2.0

func (h *Header) Sizes() []int

Sizes returns the dictzip sizes for the compressed data chunks.
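
Together, the values returned by ChunkSize and Sizes are enough to map an uncompressed offset to the chunk that holds it. The sketch below shows that arithmetic under the assumption that compressed offsets are counted from the start of the first compressed chunk; `chunkSpan` and `locate` are hypothetical names, not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// chunkSpan describes where the chunk holding an uncompressed offset lives.
type chunkSpan struct {
	Index            int   // index of the chunk in the size table
	CompressedOffset int64 // start of the chunk, relative to the first chunk
	CompressedSize   int   // number of compressed bytes in the chunk
	Skip             int   // uncompressed bytes to discard inside the chunk
}

// locate maps an uncompressed offset to a chunk, given the uncompressed
// chunk size and the compressed size of every chunk.
func locate(off int64, chunkSize int, sizes []int) (chunkSpan, error) {
	if chunkSize <= 0 || off < 0 {
		return chunkSpan{}, errors.New("invalid chunk size or offset")
	}
	idx := off / int64(chunkSize)
	if idx >= int64(len(sizes)) {
		return chunkSpan{}, errors.New("offset beyond last chunk")
	}
	var start int64
	for _, s := range sizes[:idx] {
		start += int64(s)
	}
	return chunkSpan{
		Index:            int(idx),
		CompressedOffset: start,
		CompressedSize:   sizes[idx],
		Skip:             int(off % int64(chunkSize)),
	}, nil
}

func main() {
	span, err := locate(2500, 1000, []int{400, 350, 300})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", span)
}
```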

type Reader

type Reader struct {
	// Header is the gzip header data and is valid after [NewReader] or
	// [Reader.Reset].
	Header
	// contains filtered or unexported fields
}

Reader implements io.Reader and io.ReaderAt. It provides random access to the compressed data.

func NewReader

func NewReader(r io.ReadSeeker) (*Reader, error)

NewReader returns a new dictzip Reader reading compressed data from the given reader. It does not assume control of the given io.ReadSeeker. It is the responsibility of the caller to call Close on that reader when it is no longer used.

NewReader will call Seek on the given reader to ensure that it is being read from the beginning.

It is the caller's responsibility to call Reader.Close on the returned Reader when done.

func (*Reader) Close

func (z *Reader) Close() error

Close closes the reader. It does not close the underlying io.Reader.

func (*Reader) Read

func (z *Reader) Read(p []byte) (int, error)

Read implements io.Reader.

func (*Reader) ReadAt

func (z *Reader) ReadAt(p []byte, off int64) (int, error)

ReadAt implements io.ReaderAt.ReadAt.

func (*Reader) Reset

func (z *Reader) Reset(r io.ReadSeeker) error

Reset discards the reader's state and resets it to the initial state as returned by NewReader, but reading from r instead.

Reset will call Seek on the given reader to ensure that it is being read from the beginning.

func (*Reader) Seek

func (z *Reader) Seek(offset int64, whence int) (int64, error)

Seek implements io.Seeker.Seek.

type Writer added in v0.2.0

type Writer struct {
	// Header is written to the file when [Writer.Close] is called.
	Header
	// contains filtered or unexported fields
}

Writer implements io.WriteCloser for writing dictzip files. Writer writes chunks to a temporary file during write and copies the resulting data to the final file when Writer.Close is called.

For this reason, Writer.Close must be called in order to write the file correctly.

func NewWriter added in v0.2.0

func NewWriter(w io.Writer) (*Writer, error)

NewWriter initializes a new dictzip Writer with the default compression level and chunk size.

The OS header field is set to OSUnknown (0xff) by default.

func NewWriterLevel added in v0.2.0

func NewWriterLevel(w io.Writer, level, chunkSize int) (*Writer, error)

NewWriterLevel initializes a new dictzip Writer with the given compression level and chunk size.

The OS header field is set to OSUnknown (0xff) by default.

func (*Writer) Close added in v0.2.0

func (z *Writer) Close() error

Close closes the writer by writing the header with calculated offsets and copying chunks from the temporary file to the final output file.

func (*Writer) Write added in v0.2.0

func (z *Writer) Write(p []byte) (int, error)

Directories

Path	Synopsis
cmd/dictzip	Package main is the main package for the `dictzip` command.
