block

package

v0.3.3 Latest Latest Go to latest Published: Mar 13, 2024 License: Apache-2.0 Imports: 2 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/creachadair/ffs

Links

Open Source Insights

Documentation ¶

Overview ¶

Package block implements content-sensitive partitioning of a stream of byte data into blocks, using a rolling hash function.

The algorithm used to split data into blocks is based on the one from LBFS:

http://pdos.csail.mit.edu/lbfs/

As described in the SOSP 2001 paper "A Low-Bandwidth Network File System":

https://pdos.csail.mit.edu/papers/lbfs:sosp01/lbfs.pdf

This package provides an implementation of the Rabin-Karp modular rolling hash algorithm; other algorithms can be plugged in by implementing the Hasher and Hash interfaces.

Example ¶

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/creachadair/ffs/block"
)

func main() {
	// Note that these two strings are similar, but the second one has been
	// edited. The edit changes the block splits around the point of the edit,
	// but the sliding window allows them to resynchronize after the break.
	const input1 = `abcdefg-hijklmnop-qrstuv-wxyz-abcdefg-hijklmnop-qrstuv-wxyz-abcdefghijklmnopqrstuv`
	const input2 = `abcdefg-hijklmnop-qrstuv-wxyz-*-abcdefg-hijklmnop-qrstuv-wxyz-abcdefghijklmnopqrstuv`

	opts := &block.SplitConfig{
		Min:    5,  // no blocks shorter than this
		Size:   10, // desired mean block size
		Max:    20, // no blocks longer than this
		Hasher: block.RabinKarpHasher(23, 997, 13),
	}

	for _, v := range []string{input1, input2} {
		s := block.NewSplitter(strings.NewReader(v), opts)
		var i int
		if err := s.Split(func(data []byte) error {
			i++
			fmt.Printf("%d. %s\n", i, string(data))
			return nil
		}); err != nil {
			log.Fatal(err)
		}
		fmt.Println()
	}

}

Output:


1. abcdefg-h
2. ijklmnop-qrstu
3. v-wxyz-abcdefg
4. -hijklmnop-qrstu
5. v-wxyz-abcdefghijklm
6. nopqrstuv

1. abcdefg-h
2. ijklmnop-qrstu
3. v-wxyz-*-
4. abcdefg-hi
5. jklmnop-qrstu
6. v-wxyz-abcdefghijklm
7. nopqrstuv

Index ¶

Constants
Variables
type Hash
type Hasher
- func RabinKarpHasher(base, modulus int64, windowSize int) Hasher
type SplitConfig
- func (c *SplitConfig) Hash() Hash
type Splitter
- func NewSplitter(r io.Reader, c *SplitConfig) *Splitter

Examples ¶

Package

Constants ¶

View Source

const (
	// DefaultMin is the default minimum block size, in bytes.
	DefaultMin = 2048

	// DefaultSize is the default target block size, in bytes.
	DefaultSize = 16384

	// DefaultMax is the default maximum block size, in bytes.
	DefaultMax = 65536
)

These values are the defaults used if none are specified in the config.

Variables ¶

View Source

var DefaultHasher = RabinKarpHasher(1031, 2147483659, 48)

DefaultHasher is used by a Splitter if no hasher is set in its config.

Functions ¶

This section is empty.

Types ¶

type Hasher ¶

type Hasher interface {
	// Hash returns a fresh Hash instance using the settings from the Hasher.
	// Instances are independent and can be safely used concurrently.
	Hash() Hash
}

A Hasher constructs rolling hash instances. Use the Hash method to obtain a fresh instance.

func RabinKarpHasher ¶

func RabinKarpHasher(base, modulus int64, windowSize int) Hasher

RabinKarpHasher returns a Rabin-Karp rolling hasher using the given base, modulus, and window size. The base and modulus must be coprime and the modulus should be prime (but note that the constructor does not check this).

type SplitConfig ¶

type SplitConfig struct {
	// The rolling hash to use. If nil, uses DefaultHasher.
	Hasher

	// Minimum block size, in bytes. The splitter will not split a block until
	// it is at least this size.
	Min int

	// Desired block size, in bytes. The splitter will attempt to generate
	// blocks of approximately this average size.
	Size int

	// Maximum block size, in bytes. The splitter will split any block that
	// exceeds this size, even if the rolling hash does not find a break.
	Max int
}

A SplitConfig contains the settings to construct a splitter.

func (*SplitConfig) Hash ¶

func (c *SplitConfig) Hash() Hash

Hash implements the Hasher interface for a SplitConfig.

type Splitter ¶

type Splitter struct {
	// contains filtered or unexported fields
}

A Splitter wraps an underlying io.Reader to split the data from the reader into blocks using a rolling hash.

func NewSplitter ¶

func NewSplitter(r io.Reader, c *SplitConfig) *Splitter

NewSplitter constructs a Splitter that reads its data from r and partitions it into blocks using the rolling hash from c. A nil *SplitConfig is ready for use with default sizes and hash settings.

func (*Splitter) Config ¶

func (s *Splitter) Config() *SplitConfig

Config returns the SplitConfig used to construct s, which may be nil.

func (*Splitter) Next ¶

func (s *Splitter) Next() ([]byte, error)

Next returns the next available block, or an error. The slice returned is only valid until a subsequent call of Next. Returns nil, io.EOF when no further blocks are available.

func (*Splitter) Split ¶

func (s *Splitter) Split(f func(data []byte) error) error

Split splits blocks from s and passes each block in sequence to f, until there are no further blocks or until f returns an error. If f returns an error, processing stops and that error is returned to the caller of Split.

The slice passed to f is only valid while f is active; if f wishes to store a block for later use, it must be copied.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL