rolling

package module
v0.0.0-...-5fdc4cc Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 21, 2020 License: MIT Imports: 3 Imported by: 0

README

rolling, rolling, rolling...

This is a library of rolling hashes, aka hashes on sliding windows, and some useful code that uses them, in Go (golang).

The subpackage rolling/adler32 implements the Adler-32 checksum. It can be used in a rolling fashion or as a plain checksum. In the latter case, it is likely to be faster than the standard Go hash/adler32 package. Run the benchmarks to be sure.

The main package provides a Scanner type that allows searching for substrings matching particular hash values in a Reader. It implements a multi-pattern version of the Rabin-Karp string search algorithm, parameterized on the particular rolling hash to be used.

This library was written to replace github.com/chmduquesne/rollinghash and takes some code from it. The main difference is the separation of buffering from the hash functions and access to the buffers via the Scanner.

Documentation

Overview

Package rolling implements rolling hashes and Rabin-Karp string searching.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Hash32

type Hash32 interface {
	// Reset resets the hash to its initial state.
	//
	// Reset does not change the window size.
	// It does reset WindowLeft() to WindowSize().
	Reset()

	// Roll updates a Hash by an incoming and an outgoing byte.
	//
	// Roll must not be called before at least WindowSize() bytes have been
	// fed to Write.
	Roll(in, out byte)

	// Sum32 returns a 32-bit checksum.
	Sum32() uint32

	// WindowSize returns the size of a rolling hash's window.
	WindowSize() int

	// WriteInitial feeds initial data to a hash.
	// It returns the number of bytes written.
	//
	// WriteInitial may be called multiple times to feed initial data in
	// chunks. If the total number of bytes passed to the calls exceeds the
	// window size, the result is a short write.
	WriteInitial(p []byte) int
}

A Hash32 is a 32-bit rolling hash function.

A rolling hash function has a conceptual window of bytes that it looks at; client code is responsible for maintaining that window (the Scanner does this).

type Scanner

type Scanner struct {
	// contains filtered or unexported fields
}

A Scanner scans a Reader, looking for matches to a set of 32-bit hash values.

func NewScanner

func NewScanner(r io.Reader, h Hash32) *Scanner

NewScanner returns a new scanner that scans r.

The scanner initially has no hash values for which to look. Call the Add method to add these.

The scanner takes ownership of the hash h for the span of its lifetime.

func (*Scanner) Add

func (s *Scanner) Add(h uint32)

Add instructs the scanner to look for the hash value h.

func (*Scanner) Err

func (s *Scanner) Err() error

Err returns the last error encountered by Scan, if any. EOF is not considered an error.

func (*Scanner) Hashes

func (s *Scanner) Hashes(hashes []uint32) []uint32

Hashes appends the list of hashes that s looks for to hashes.

func (*Scanner) Match

func (s *Scanner) Match(buf []byte) (h uint32, offset int64)

Match returns the hash value and start offset of the last matching block found by Scan.

If buf has sufficient space, the matching content is copied into it.

func (*Scanner) Remove

func (s *Scanner) Remove(h uint32)

Remove tells s to not longer look for the hash value h.

func (*Scanner) Scan

func (s *Scanner) Scan() bool

Scan looks for the next block in s's Reader that matches any of its hashes.

If Scan returns true, it has found a match, which can be retrieved using the Match method.

If Scan returns false, it has encountered an error or reached end-of-file. The error can be retrieved using the Err method.

func (*Scanner) WriteTo

func (s *Scanner) WriteTo(w io.Writer) (n int64, err error)

WriteTo writes the last matching block to w.

Directories

Path Synopsis
Package adler32 implements the Adler-32 rolling checksum.
Package adler32 implements the Adler-32 rolling checksum.
Package buz implements hashing by cyclic polynomial (buzhash).
Package buz implements hashing by cyclic polynomial (buzhash).

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL