Documentation ¶
Overview ¶
Package gosync is inspired by zsync and rsync. It aims to take their fundamentals and create a very flexible library that can be adapted to work in many ways.
We rely heavily on built-in Go abstractions like io.Reader and hash.Hash, as well as our own interfaces - this makes the code easier to change and to test. In particular, no part of the core library should know anything about the transport or layout of the reference data. If you want to do rsync over http/https range requests, that's just as good as a zsync client-server over an SSH tunnel. The goal is also to allow support for multiple concurrent connections, so that you can make the best use of your line in the face of the bandwidth-latency product (or other concerns that require concurrency to solve).
The following optimizations are possible:

* Generate hashes with multiple threads (both during reference generation and local file interrogation)
* Multiple ranged requests (can even be used to get the hashes)
Example ¶
// due to short example strings, use a very small block size
// using one this small in practice would increase your file transfer!
const blockSize = 4

// This is the "file" as described by the authoritative version
const reference = "The quick brown fox jumped over the lazy dog"

// This is what we have locally. Not too far off, but not correct.
const localVersion = "The qwik brown fox jumped 0v3r the lazy"

generator := filechecksum.NewFileChecksumGenerator(blockSize)

_, referenceFileIndex, _, err := indexbuilder.BuildIndexFromString(
    generator,
    reference,
)

if err != nil {
    return
}

referenceAsBytes := []byte(reference)
localVersionAsBytes := []byte(localVersion)

blockCount := len(referenceAsBytes) / blockSize
if len(referenceAsBytes)%blockSize != 0 {
    blockCount++
}

inputFile := bytes.NewReader(localVersionAsBytes)
patchedFile := bytes.NewBuffer(nil)

// This is more complicated than usual, because we're using in-memory
// "files" and sources. Normally you would use MakeRSync
summary := &BasicSummary{
    ChecksumIndex:  referenceFileIndex,
    ChecksumLookup: nil,
    BlockCount:     uint(blockCount),
    BlockSize:      blockSize,
    FileSize:       int64(len(referenceAsBytes)),
}

rsync := &RSync{
    Input:  inputFile,
    Output: patchedFile,
    Source: blocksources.NewReadSeekerBlockSource(
        bytes.NewReader(referenceAsBytes),
        blocksources.MakeNullFixedSizeResolver(uint64(blockSize)),
    ),
    Summary: summary,
    OnClose: nil,
}

if err := rsync.Patch(); err != nil {
    fmt.Printf("Error: %v", err)
    return
}

fmt.Printf("Patched result: \"%s\"\n", patchedFile.Bytes())
Output: Patched result: "The quick brown fox jumped over the lazy dog"
Example (HttpBlockSource) ¶
This is exceedingly similar to the module Example, but uses the HTTP blocksource and a local HTTP server.
package main

import (
    "bytes"
    "crypto/md5"
    "fmt"
    "net"
    "net/http"
    "time"

    "github.com/Redundancy/go-sync/blocksources"
    "github.com/Redundancy/go-sync/comparer"
    "github.com/Redundancy/go-sync/filechecksum"
    "github.com/Redundancy/go-sync/indexbuilder"
    "github.com/Redundancy/go-sync/patcher"
)

// due to short example strings, use a very small block size
// using one this small in practice would increase your file transfer!
const BLOCK_SIZE = 4

// This is the "file" as described by the authoritative version
const REFERENCE = "The quick brown fox jumped over the lazy dog"

// This is what we have locally. Not too far off, but not correct.
const LOCAL_VERSION = "The qwik brown fox jumped 0v3r the lazy"

var content = bytes.NewReader([]byte(REFERENCE))

func handler(w http.ResponseWriter, req *http.Request) {
    http.ServeContent(w, req, "", time.Now(), content)
}

// set up an http server locally that will respond predictably to ranged requests
func setupServer() <-chan int {
    var PORT = 8000
    s := http.NewServeMux()
    s.HandleFunc("/content", handler)

    portChan := make(chan int)

    go func() {
        var listener net.Listener
        var err error

        for {
            PORT++
            p := fmt.Sprintf(":%v", PORT)
            listener, err = net.Listen("tcp", p)

            if err == nil {
                break
            }
        }
        portChan <- PORT
        http.Serve(listener, s)
    }()

    return portChan
}

// This is exceedingly similar to the module Example, but uses the http blocksource and a local http server
func main() {
    PORT := <-setupServer()
    LOCAL_URL := fmt.Sprintf("http://localhost:%v/content", PORT)

    generator := filechecksum.NewFileChecksumGenerator(BLOCK_SIZE)
    _, referenceFileIndex, checksumLookup, err := indexbuilder.BuildIndexFromString(generator, REFERENCE)

    if err != nil {
        return
    }

    fileSize := int64(len([]byte(REFERENCE)))

    // This would normally be saved in a file
    blockCount := fileSize / BLOCK_SIZE
    if fileSize%BLOCK_SIZE != 0 {
        blockCount++
    }

    fs := &BasicSummary{
        ChecksumIndex:  referenceFileIndex,
        ChecksumLookup: checksumLookup,
        BlockCount:     uint(blockCount),
        BlockSize:      uint(BLOCK_SIZE),
        FileSize:       fileSize,
    }

    /*
        // Normally, this would be:
        rsync, err := MakeRSync(
            "toPatch.file",
            "http://localhost/content",
            "out.file",
            fs,
        )
    */
    // Need to replace the output and the input
    inputFile := bytes.NewReader([]byte(LOCAL_VERSION))
    patchedFile := bytes.NewBuffer(nil)

    resolver := blocksources.MakeFileSizedBlockResolver(
        uint64(fs.GetBlockSize()),
        fs.GetFileSize(),
    )

    rsync := &RSync{
        Input:  inputFile,
        Output: patchedFile,
        Source: blocksources.NewHttpBlockSource(
            LOCAL_URL,
            1,
            resolver,
            &filechecksum.HashVerifier{
                Hash:                md5.New(),
                BlockSize:           fs.GetBlockSize(),
                BlockChecksumGetter: fs,
            },
        ),
        Summary: fs,
        OnClose: nil,
    }

    err = rsync.Patch()

    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }

    err = rsync.Close()

    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }

    fmt.Printf("Patched content: \"%v\"\n", patchedFile.String())

    // Just for inspection
    remoteReferenceSource := rsync.Source.(*blocksources.BlockSourceBase)
    fmt.Printf("Downloaded Bytes: %v\n", remoteReferenceSource.ReadBytes())
}

func ToPatcherFoundSpan(sl comparer.BlockSpanList, blockSize int64) []patcher.FoundBlockSpan {
    result := make([]patcher.FoundBlockSpan, len(sl))

    for i, v := range sl {
        result[i].StartBlock = v.StartBlock
        result[i].EndBlock = v.EndBlock
        result[i].MatchOffset = v.ComparisonStartOffset
        result[i].BlockSize = blockSize
    }

    return result
}

func ToPatcherMissingSpan(sl comparer.BlockSpanList, blockSize int64) []patcher.MissingBlockSpan {
    result := make([]patcher.MissingBlockSpan, len(sl))

    for i, v := range sl {
        result[i].StartBlock = v.StartBlock
        result[i].EndBlock = v.EndBlock
        result[i].BlockSize = blockSize
    }

    return result
}
Output: Patched content: "The quick brown fox jumped over the lazy dog" Downloaded Bytes: 16
Index ¶
Examples ¶
Constants ¶
This section is empty.
Variables ¶
var (
    // DefaultConcurrency is the default concurrency level used by patching and downloading
    DefaultConcurrency = runtime.NumCPU()
)
Functions ¶
func IsSameFile ¶
IsSameFile checks if two file paths are the same file
Types ¶
type BasicSummary ¶
type BasicSummary struct {
    BlockSize  uint
    BlockCount uint
    FileSize   int64
    *index.ChecksumIndex
    filechecksum.ChecksumLookup
}
BasicSummary implements a version of the FileSummary interface
func (*BasicSummary) GetBlockCount ¶
func (fs *BasicSummary) GetBlockCount() uint
GetBlockCount gets the number of blocks
func (*BasicSummary) GetBlockSize ¶
func (fs *BasicSummary) GetBlockSize() uint
GetBlockSize gets the size of each block
func (*BasicSummary) GetFileSize ¶
func (fs *BasicSummary) GetFileSize() int64
GetFileSize gets the file size of the file
type FileSummary ¶
type FileSummary interface {
    GetBlockSize() uint
    GetBlockCount() uint
    GetFileSize() int64
    FindWeakChecksum2(bytes []byte) interface{}
    FindStrongChecksum2(bytes []byte, weak interface{}) []chunks.ChunkChecksum
    GetStrongChecksumForBlock(blockID int) []byte
}
FileSummary combines many of the interfaces that are needed. It is expected that you might implement it by embedding existing structs.
type RSync ¶
type RSync struct {
    Input   ReadSeekerAt
    Source  patcher.BlockSource
    Output  io.Writer
    Summary FileSummary
    OnClose []closer
}
RSync is an object designed to make the standard use-case for gosync as easy as possible.
To this end, it hides away many low-level choices by default, and makes some assumptions.
func MakeRSync ¶
func MakeRSync(
    InputFile, Source, OutFile string,
    Summary FileSummary,
) (r *RSync, err error)
MakeRSync creates an RSync object using string paths, inferring most of the configuration
type ReadSeekerAt ¶
type ReadSeekerAt interface {
    io.ReadSeeker
    io.ReaderAt
}
ReadSeekerAt is the combination of the ReadSeeker and ReaderAt interfaces
Directories ¶
Path | Synopsis
---|---
chunks | Package chunks provides the basic structure for a pair of the weak and strong checksums.
comparer | Package comparer is responsible for using a FileChecksumGenerator (filechecksum) and an index to move through a file and compare it to the index, producing a FileDiffSummary.
filechecksum | Package filechecksum provides the FileChecksumGenerator, whose main responsibility is to read a file and generate both weak and strong checksums for every block.
index | Package index provides the functionality to describe a reference 'file' and its contents in terms of the weak and strong checksums, in such a way that you can check if a weak checksum is present, then check if there is a strong checksum that matches.
indexbuilder | Package indexbuilder provides a few shortcuts to building a checksum index by generating and then loading the checksums, and building an index from that.
patcher | Package patcher follows a pattern established by hash, which defines the interface in the top level package, and then provides implementations below it.
patcher/sequential | The sequential patcher streams the patched version of the file to output. Since it works strictly in order, it cannot patch the local file directly (it might overwrite a block needed later), so a final copy is needed once the patching is done.
rollsum | Package rollsum provides an implementation of a rolling checksum: a checksum that is efficient to advance a byte or more at a time.
util |
util/readers | Package util/readers exists to provide convenient and composable io.Reader compatible streams to allow testing without having to check in large binary files.
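The rolling-checksum idea behind the rollsum package can be illustrated with a toy Adler-style sum: subtracting the outgoing byte and adding the incoming one updates the window in O(1), instead of rehashing the whole block. This is an illustrative sketch, not the package's actual algorithm:

```go
package main

import "fmt"

// rollingSum is a toy Adler-style checksum over a fixed-size window.
// a is the plain byte sum; b weights older bytes more heavily.
type rollingSum struct {
	a, b uint32
	size uint32
}

func newRollingSum(window []byte) *rollingSum {
	r := &rollingSum{size: uint32(len(window))}
	for i, c := range window {
		r.a += uint32(c)
		r.b += uint32(len(window)-i) * uint32(c)
	}
	return r
}

// Roll advances the window one byte in O(1): drop `out`, admit `in`.
// (Unsigned wrap-around is fine: all arithmetic is modulo 2^32.)
func (r *rollingSum) Roll(out, in byte) {
	r.a += uint32(in) - uint32(out)
	r.b += r.a - r.size*uint32(out)
}

func (r *rollingSum) Sum() uint32 { return r.b<<16 | r.a&0xffff }

func main() {
	data := []byte("The quick brown fox")
	const window = 4

	// Checksum data[0:4], then roll one byte forward to cover data[1:5].
	r := newRollingSum(data[:window])
	r.Roll(data[0], data[window])

	// The rolled sum must equal a from-scratch sum over the shifted window.
	fresh := newRollingSum(data[1 : window+1])
	fmt.Println(r.Sum() == fresh.Sum()) // true
}
```

This O(1) update is what makes it cheap for the comparer to test every byte offset of the local file against the reference index.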