Documentation ¶
Overview ¶
Package reedsolomon enables Erasure Coding in Go.
For usage and examples, see https://github.com/klauspost/reedsolomon
Index ¶
- Variables
- func AllocAligned(shards, each int) [][]byte
- func SetLog(filename string, stderr ...bool)
- type Encoder
- type Extensions
- type Option
- func WithAVX2(enabled bool) Option
- func WithAVX512(enabled bool) Option
- func WithAVXGFNI(enabled bool) Option
- func WithAutoGoroutines(shardSize int) Option
- func WithCauchyMatrix() Option
- func WithConcurrentStreamReads(enabled bool) Option
- func WithConcurrentStreamWrites(enabled bool) Option
- func WithConcurrentStreams(enabled bool) Option
- func WithCustomMatrix(customMatrix [][]byte) Option
- func WithFastOneParityMatrix() Option
- func WithGFNI(enabled bool) Option
- func WithInversionCache(enabled bool) Option
- func WithJerasureMatrix() Option
- func WithLeopardGF(enabled bool) Option
- func WithLeopardGF16(enabled bool) Option
- func WithMaxGoroutines(n int) Option
- func WithMinSplitSize(n int) Option
- func WithPAR1Matrix() Option
- func WithSSE2(enabled bool) Option
- func WithSSSE3(enabled bool) Option
- func WithStreamBlockSize(n int) Option
- type StreamEncoder
- type StreamEncoder16
- type StreamReadError
- type StreamWriteError
Examples ¶
Constants ¶
This section is empty.
Variables ¶
var ErrInvShardNum = errors.New("cannot create Encoder with less than one data shard or less than zero parity shards")
ErrInvShardNum will be returned by New, if you attempt to create an Encoder with less than one data shard or less than zero parity shards.
var ErrInvalidInput = errors.New("invalid input")
ErrInvalidInput is returned if an invalid input parameter is given to Update.
var ErrInvalidShardSize = errors.New("invalid shard size")
ErrInvalidShardSize is returned if shard length doesn't meet the requirements, typically a multiple of N.
var ErrMaxShardNum = errors.New("cannot create Encoder with more than 256 data+parity shards")
ErrMaxShardNum will be returned by New, if you attempt to create an Encoder where the total of data and parity shards exceeds the order of GF(2^8).
var ErrNotSupported = errors.New("operation not supported")
ErrNotSupported is returned when an operation is not supported.
var ErrReconstructMismatch = errors.New("valid shards and fill shards are mutually exclusive")
ErrReconstructMismatch is returned by the StreamEncoder if you supply a "valid" and a "fill" stream at the same index. Therefore it is impossible to tell if you consider the shard valid or would like to have it reconstructed.
var ErrReconstructRequired = errors.New("reconstruction required as one or more required data shards are nil")
ErrReconstructRequired is returned if too few data shards are intact and a reconstruction is required before you can successfully join the shards.
var ErrShardNoData = errors.New("no shard data")
ErrShardNoData will be returned if there are no shards, or if the length of all shards is zero.
var ErrShardSize = errors.New("shard sizes do not match")
ErrShardSize is returned if shard length isn't the same for all shards.
var ErrShortData = errors.New("not enough data to fill the number of requested shards")
ErrShortData will be returned by Split(), if there isn't enough data to fill the number of shards.
var ErrTooFewShards = errors.New("too few shards given")
ErrTooFewShards is returned if too few shards were given to Encode/Verify/Reconstruct/Update. It will also be returned from Reconstruct if there were too few shards to reconstruct the missing data.
Functions ¶
func AllocAligned ¶
AllocAligned allocates 'shards' slices, with 'each' bytes. Each slice will start on a 64 byte aligned boundary.
Types ¶
type Encoder ¶
type Encoder interface {
	// Encode parity for a set of data shards.
	// Input is 'shards' containing data shards followed by parity shards.
	// The number of shards must match the number given to New().
	// Each shard is a byte array, and they must all be the same size.
	// The parity shards will always be overwritten and the data shards
	// will remain the same, so it is safe for you to read from the
	// data shards while this is running.
	Encode(shards [][]byte) error

	// EncodeIdx will add parity for a single data shard.
	// Parity shards should start out as 0. The caller must zero them.
	// Data shards must be delivered exactly once. There is no check for this.
	// The parity shards will always be updated and the data shards will remain the same.
	EncodeIdx(dataShard []byte, idx int, parity [][]byte) error

	// Verify returns true if the parity shards contain correct data.
	// The data is the same format as Encode. No data is modified, so
	// you are allowed to read from data while this is running.
	Verify(shards [][]byte) (bool, error)

	// Reconstruct will recreate the missing shards if possible.
	//
	// Given a list of shards, some of which contain data, fills in the
	// ones that don't have data.
	//
	// The length of the array must be equal to the total number of shards.
	// You indicate that a shard is missing by setting it to nil or zero-length.
	// If a shard is zero-length but has sufficient capacity, that memory will
	// be used, otherwise a new []byte will be allocated.
	//
	// If there are too few shards to reconstruct the missing
	// ones, ErrTooFewShards will be returned.
	//
	// The reconstructed shard set is complete, but integrity is not verified.
	// Use the Verify function to check if the data set is ok.
	Reconstruct(shards [][]byte) error

	// ReconstructData will recreate any missing data shards, if possible.
	//
	// Given a list of shards, some of which contain data, fills in the
	// data shards that don't have data.
	//
	// The length of the array must be equal to Shards.
	// You indicate that a shard is missing by setting it to nil or zero-length.
	// If a shard is zero-length but has sufficient capacity, that memory will
	// be used, otherwise a new []byte will be allocated.
	//
	// If there are too few shards to reconstruct the missing
	// ones, ErrTooFewShards will be returned.
	//
	// As the reconstructed shard set may contain missing parity shards,
	// calling the Verify function is likely to fail.
	ReconstructData(shards [][]byte) error

	// ReconstructSome will recreate only requested shards, if possible.
	//
	// Given a list of shards, some of which contain data, fills in the
	// shards indicated by true values in the "required" parameter.
	// The length of the "required" array must be equal to either Shards or DataShards.
	// If the length is equal to DataShards, the reconstruction of parity shards will be ignored.
	//
	// The length of the "shards" array must be equal to Shards.
	// You indicate that a shard is missing by setting it to nil or zero-length.
	// If a shard is zero-length but has sufficient capacity, that memory will
	// be used, otherwise a new []byte will be allocated.
	//
	// If there are too few shards to reconstruct the missing
	// ones, ErrTooFewShards will be returned.
	//
	// As the reconstructed shard set may contain missing parity shards,
	// calling the Verify function is likely to fail.
	ReconstructSome(shards [][]byte, required []bool) error

	// Update is used to change a few data shards and update the parity accordingly.
	// Input 'newDatashards' contains the data shards that have changed.
	// Input 'shards' contains the old data shards (if a data shard is unchanged, it can be nil) and the old parity shards.
	// The new parity shards will be placed in shards[DataShards:].
	// Update is very useful when DataShards is much larger than ParityShards and only a few data shards have changed.
	// It is faster than Encode and does not need to read all data shards to encode.
	Update(shards [][]byte, newDatashards [][]byte) error

	// Split a data slice into the number of shards given to the encoder,
	// and create empty parity shards if necessary.
	//
	// The data will be split into equally sized shards.
	// If the data size isn't divisible by the number of shards,
	// the last shard will contain extra zeros.
	//
	// If there is extra capacity on the provided data slice
	// it will be used instead of allocating parity shards.
	// It will be zeroed out.
	//
	// There must be at least 1 byte, otherwise ErrShortData will be
	// returned.
	//
	// The data will not be copied, except for the last shard, so you
	// should not modify the data of the input slice afterwards.
	Split(data []byte) ([][]byte, error)

	// Join the shards and write the data segment to dst.
	//
	// Only the data shards are considered.
	// You must supply the exact output size you want.
	// If too few shards are given, ErrTooFewShards will be returned.
	// If the total data size is less than outSize, ErrShortData will be returned.
	Join(dst io.Writer, shards [][]byte, outSize int) error
}
Encoder is an interface to encode Reed-Solomon parity sets for your data.
Example ¶
Simple example of how to use all functions of the Encoder. Note that all error checks have been removed to keep it short.
package main

import (
	"fmt"
	"math/rand"

	"github.com/bpfs/defs/v2/reedsolomon"
)

func fillRandom(p []byte) {
	for i := 0; i < len(p); i += 7 {
		val := rand.Int63()
		for j := 0; i+j < len(p) && j < 7; j++ {
			p[i+j] = byte(val)
			val >>= 8
		}
	}
}

func main() {
	// Create some sample data
	var data = make([]byte, 250000)
	fillRandom(data)

	// Create an encoder with 17 data and 3 parity slices.
	enc, _ := reedsolomon.New(17, 3)

	// Split the data into shards
	shards, _ := enc.Split(data)

	// Encode the parity set
	_ = enc.Encode(shards)

	// Verify the parity set
	ok, _ := enc.Verify(shards)
	if ok {
		fmt.Println("ok")
	}

	// Delete two shards
	shards[10], shards[11] = nil, nil

	// Reconstruct the shards
	_ = enc.Reconstruct(shards)

	// Verify the data set
	ok, _ = enc.Verify(shards)
	if ok {
		fmt.Println("ok")
	}
}
Output:
ok
ok
Example (Slicing) ¶
This demonstrates that shards can be arbitrarily sliced and merged and still remain valid.
package main

import (
	"fmt"
	"math/rand"

	"github.com/bpfs/defs/v2/reedsolomon"
)

func fillRandom(p []byte) {
	for i := 0; i < len(p); i += 7 {
		val := rand.Int63()
		for j := 0; i+j < len(p) && j < 7; j++ {
			p[i+j] = byte(val)
			val >>= 8
		}
	}
}

func main() {
	// Create some sample data
	var data = make([]byte, 250000)
	fillRandom(data)

	// Create 5 data slices of 50000 elements each
	enc, _ := reedsolomon.New(5, 3)
	shards, _ := enc.Split(data)
	err := enc.Encode(shards)
	if err != nil {
		panic(err)
	}

	// Check that it verifies
	ok, err := enc.Verify(shards)
	if ok && err == nil {
		fmt.Println("encode ok")
	}

	// Split the data set of 50000 elements into two of 25000
	splitA := make([][]byte, 8)
	splitB := make([][]byte, 8)

	// Merge into a 100000 element set
	merged := make([][]byte, 8)

	// Split/merge the shards
	for i := range shards {
		splitA[i] = shards[i][:25000]
		splitB[i] = shards[i][25000:]

		// Concatenate it to itself
		merged[i] = append(make([]byte, 0, len(shards[i])*2), shards[i]...)
		merged[i] = append(merged[i], shards[i]...)
	}

	// Each part should still verify as ok.
	ok, err = enc.Verify(splitA)
	if ok && err == nil {
		fmt.Println("splitA ok")
	}

	ok, err = enc.Verify(splitB)
	if ok && err == nil {
		fmt.Println("splitB ok")
	}

	ok, err = enc.Verify(merged)
	if ok && err == nil {
		fmt.Println("merge ok")
	}
}
Output:
encode ok
splitA ok
splitB ok
merge ok
Example (Xor) ¶
This demonstrates that shards can be xor'ed and still remain a valid set.
The xor value must be the same for element 'n' in each shard, except if you xor with a similar sized encoded shard set.
package main

import (
	"fmt"
	"math/rand"

	"github.com/bpfs/defs/v2/reedsolomon"
)

func fillRandom(p []byte) {
	for i := 0; i < len(p); i += 7 {
		val := rand.Int63()
		for j := 0; i+j < len(p) && j < 7; j++ {
			p[i+j] = byte(val)
			val >>= 8
		}
	}
}

func main() {
	// Create some sample data
	var data = make([]byte, 25000)
	fillRandom(data)

	// Create 5 data slices of 5000 elements each
	enc, _ := reedsolomon.New(5, 3)
	shards, _ := enc.Split(data)
	err := enc.Encode(shards)
	if err != nil {
		panic(err)
	}

	// Check that it verifies
	ok, err := enc.Verify(shards)
	if !ok || err != nil {
		fmt.Println("failed initial verify", err)
	}

	// Create an xor'ed set
	xored := make([][]byte, 8)

	// We xor by the index, so you can see that the xor can change.
	// It should however be constant vertically through your slices.
	for i := range shards {
		xored[i] = make([]byte, len(shards[i]))
		for j := range xored[i] {
			xored[i][j] = shards[i][j] ^ byte(j&0xff)
		}
	}

	// Each part should still verify as ok.
	ok, err = enc.Verify(xored)
	if ok && err == nil {
		fmt.Println("verified ok after xor")
	}
}
Output: verified ok after xor
func New ¶
New creates a new encoder and initializes it to the number of data shards and parity shards that you want to use. You can reuse this encoder. Note that the maximum number of total shards is 65536, with some restrictions for a total larger than 256:
- Shard sizes must be multiple of 64
- The methods Join/Split/Update/EncodeIdx are not supported
If no options are supplied, default options are used.
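The limits above can be expressed as a small pre-flight check. Note that checkConfig is a hypothetical helper written against the documented rules, not a function in the package:

```go
package main

import "fmt"

// checkConfig mirrors the documented limits of New: one or more data
// shards, zero or more parity shards, at most 65536 shards in total,
// and shard sizes that are a multiple of 64 once the total exceeds 256.
func checkConfig(dataShards, parityShards, shardSize int) error {
	if dataShards < 1 || parityShards < 0 {
		return fmt.Errorf("need >= 1 data shard and >= 0 parity shards")
	}
	total := dataShards + parityShards
	if total > 65536 {
		return fmt.Errorf("%d total shards exceeds the maximum of 65536", total)
	}
	if total > 256 && shardSize%64 != 0 {
		// Join/Split/Update/EncodeIdx are also unsupported in this range.
		return fmt.Errorf("shard size %d must be a multiple of 64 with more than 256 shards", shardSize)
	}
	return nil
}

func main() {
	fmt.Println(checkConfig(17, 3, 1000) == nil)    // small setup, no size restriction
	fmt.Println(checkConfig(400, 100, 1000) == nil) // >256 shards, size not a multiple of 64
}
```

Running such a check before calling New makes the error cases explicit instead of relying on the returned error.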
type Extensions ¶
type Extensions interface {
	// ShardSizeMultiple will return the size the shard sizes must be a multiple of.
	ShardSizeMultiple() int

	// DataShards will return the number of data shards.
	DataShards() int

	// ParityShards will return the number of parity shards.
	ParityShards() int

	// TotalShards will return the total number of shards.
	TotalShards() int

	// AllocAligned will allocate TotalShards number of slices,
	// aligned to reasonable memory sizes.
	// Provide the size of each shard.
	AllocAligned(each int) [][]byte
}
Extensions is an optional interface. All returned instances will support this interface.
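Since the interface is optional in the general case, callers reach it through a type assertion. The mock below only illustrates the assertion pattern; with the real package you would assert on the value returned by New:

```go
package main

import "fmt"

// Extensions mirrors (a subset of) the optional interface described above.
type Extensions interface {
	ShardSizeMultiple() int
	DataShards() int
	ParityShards() int
	TotalShards() int
}

// mockEncoder stands in for the value returned by reedsolomon.New.
type mockEncoder struct{ data, parity int }

func (m mockEncoder) ShardSizeMultiple() int { return 1 }
func (m mockEncoder) DataShards() int        { return m.data }
func (m mockEncoder) ParityShards() int      { return m.parity }
func (m mockEncoder) TotalShards() int       { return m.data + m.parity }

func main() {
	var enc interface{} = mockEncoder{data: 10, parity: 4}

	// The assertion pattern: check for the optional interface at runtime.
	if ext, ok := enc.(Extensions); ok {
		fmt.Println(ext.TotalShards(), ext.ShardSizeMultiple())
	}
}
```

With the real package the assertion always succeeds, since the documentation states all returned instances support Extensions.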
type Option ¶
type Option func(*options)
Option allows you to override processing parameters.
func WithAVX2 ¶
WithAVX2 allows you to enable/disable AVX2 instructions. If not set, AVX2 will be turned on or off automatically based on CPU ID information. This will also disable AVX GFNI instructions.
func WithAVX512 ¶
WithAVX512 allows you to enable/disable AVX512 (and GFNI) instructions.
func WithAVXGFNI ¶
WithAVXGFNI allows you to enable/disable GFNI with AVX instructions. If not set, GFNI will be turned on or off automatically based on CPU ID information.
func WithAutoGoroutines ¶
WithAutoGoroutines will adjust the number of goroutines for optimal speed with a specific shard size. Send in the shard size you expect to send. Other shard sizes will work, but may not run at the optimal speed. Overwrites WithMaxGoroutines. If shardSize <= 0, it is ignored.
func WithCauchyMatrix ¶
func WithCauchyMatrix() Option
WithCauchyMatrix will make the encoder build a Cauchy style matrix. The output of this is not compatible with the standard output. A Cauchy matrix is faster to generate. This does not affect data throughput, but will result in slightly faster start-up time.
func WithConcurrentStreamReads ¶
WithConcurrentStreamReads will enable concurrent reads from the input streams. Default: Disabled, meaning only one stream will be read at a time. Ignored if not used on a stream input.
func WithConcurrentStreamWrites ¶
WithConcurrentStreamWrites will enable concurrent writes to the output streams. Default: Disabled, meaning only one stream will be written at a time. Ignored if not used on a stream input.
func WithConcurrentStreams ¶
WithConcurrentStreams will enable concurrent reads and writes on the streams. Default: Disabled, meaning only one stream will be read/written at a time. Ignored if not used on a stream input.
func WithCustomMatrix ¶
WithCustomMatrix causes the encoder to use the manually specified matrix. customMatrix represents only the parity chunks. customMatrix must have at least ParityShards rows and DataShards columns. It can be used for interoperability with libraries which generate the matrix differently or to implement more complex coding schemes like LRC (locally reconstructible codes).
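A minimal sketch of building the customMatrix argument, assuming the common convention where a row of all 1s makes the first parity shard the XOR of the data shards. buildParityMatrix is hypothetical, and the non-XOR coefficients below are placeholders only; a real scheme must choose coefficients that keep every recovery submatrix invertible:

```go
package main

import "fmt"

// buildParityMatrix constructs a ParityShards x DataShards matrix for
// WithCustomMatrix. Only the parity rows are supplied; the identity
// rows for the data shards are implicit.
func buildParityMatrix(dataShards, parityShards int) [][]byte {
	m := make([][]byte, parityShards)
	for r := range m {
		m[r] = make([]byte, dataShards)
		for c := range m[r] {
			if r == 0 {
				// All-ones row: parity shard 0 becomes the XOR of all
				// data shards (useful for LRC-style local parity).
				m[r][c] = 1
			} else {
				// Placeholder coefficients for illustration only.
				m[r][c] = byte((r*c + r + 1) & 0xff)
			}
		}
	}
	return m
}

func main() {
	m := buildParityMatrix(10, 3)
	fmt.Println(len(m), len(m[0])) // rows x columns
}
```

The matrix would then be passed as reedsolomon.New(10, 3, reedsolomon.WithCustomMatrix(m)).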
func WithFastOneParityMatrix ¶
func WithFastOneParityMatrix() Option
WithFastOneParityMatrix will switch the matrix to a simple xor if there is only one parity shard. The PAR1 matrix already has this property so it has little effect there.
func WithGFNI ¶
WithGFNI allows you to enable/disable AVX512+GFNI instructions. If not set, GFNI will be turned on or off automatically based on CPU ID information.
func WithInversionCache ¶
WithInversionCache allows you to control the inversion cache. This will cache reconstruction matrices so they can be reused. Enabled by default; for Leopard encoding it is enabled for <= 64 shards.
func WithJerasureMatrix ¶
func WithJerasureMatrix() Option
WithJerasureMatrix causes the encoder to build the Reed-Solomon-Vandermonde matrix in the same way as done by the Jerasure library. The first row and column of the coding matrix only contains 1's in this method so the first parity chunk is always equal to XOR of all data chunks.
func WithLeopardGF ¶
WithLeopardGF will use leopard GF for encoding, even when there are fewer than 256 shards. This will likely improve reconstruction time for some setups. Note that Leopard places certain restrictions on use; see other documentation.
func WithLeopardGF16 ¶
WithLeopardGF16 will always use leopard GF16 for encoding, even when there are fewer than 256 shards. This will likely improve reconstruction time for some setups. This is not compatible with Leopard output for <= 256 shards. Note that Leopard places certain restrictions on use; see other documentation.
func WithMaxGoroutines ¶
WithMaxGoroutines sets the maximum number of goroutines to use for encoding & decoding. Jobs will be split into this many parts, unless each goroutine would have to process less than minSplitSize bytes (set with WithMinSplitSize). For the best speed, keep this well above the GOMAXPROCS number for more fine-grained scheduling. If n <= 0, it is ignored.
func WithMinSplitSize ¶
WithMinSplitSize is the minimum encoding size in bytes per goroutine. By default this parameter is determined by CPU cache characteristics. See WithMaxGoroutines on how jobs are split. If n <= 0, it is ignored.
func WithPAR1Matrix ¶
func WithPAR1Matrix() Option
WithPAR1Matrix causes the encoder to build the matrix how PARv1 does. Note that the method they use is buggy, and may lead to cases where recovery is impossible, even if there are enough parity shards.
func WithSSE2 ¶
WithSSE2 allows you to enable/disable SSE2 instructions. If not set, SSE2 will be turned on or off automatically based on CPU ID information.
func WithSSSE3 ¶
WithSSSE3 allows you to enable/disable SSSE3 instructions. If not set, SSSE3 will be turned on or off automatically based on CPU ID information.
func WithStreamBlockSize ¶
WithStreamBlockSize allows you to set a custom block size per round of reads/writes. If not set, any shard size set with WithAutoGoroutines will be used. If WithAutoGoroutines is also unset, 4MB will be used. Ignored if not used on a stream.
type StreamEncoder ¶
type StreamEncoder interface {
	// Encode parity shards for a set of data shards.
	//
	// Input 'inputs' contains readers for the data shards, followed by
	// io.Writers for the parity shards in 'outputs'.
	//
	// The number of shards must match the number given to NewStream().
	//
	// Each reader must supply the same number of bytes.
	//
	// The parity shards will be written to the writers.
	// The number of bytes written will match the input size.
	//
	// If a data stream returns an error, a StreamReadError type error
	// will be returned. If a parity writer returns an error, a
	// StreamWriteError will be returned.
	Encode(inputs []io.Reader, outputs []io.Writer) error

	// Verify returns true if the parity shards contain correct data.
	//
	// The number of shards must match the total of data+parity shards
	// given to NewStream().
	//
	// Each reader must supply the same number of bytes.
	// If a shard stream returns an error, a StreamReadError type error
	// will be returned.
	Verify(shards []io.Reader) (bool, error)

	// Reconstruct will recreate the missing shards if possible.
	//
	// Given a list of valid shards (to read) and invalid shards (to write),
	// you indicate that a shard is missing by setting it to nil in the
	// 'valid' slice and at the same time setting a non-nil writer in "fill".
	// An index cannot contain both a non-nil 'valid' and 'fill' entry.
	// If both are provided, 'ErrReconstructMismatch' is returned.
	//
	// If there are too few shards to reconstruct the missing ones,
	// ErrTooFewShards will be returned.
	//
	// The reconstructed shard set is complete, but integrity is not verified.
	// Use the Verify function to check if the data set is ok.
	Reconstruct(inputs []io.Reader, outputs []io.Writer) error

	// Split an input stream into the number of shards given to the encoder.
	//
	// The data will be split into equally sized shards.
	// If the data size isn't divisible by the number of shards,
	// the last shard will contain extra zeros.
	//
	// You must supply the total size of your input.
	// 'ErrShortData' will be returned if it is unable to retrieve the
	// specified number of bytes.
	Split(data io.Reader, dst []io.Writer, size int64) error

	// Join the shards and write the data segment to dst.
	//
	// Only the data shards are considered.
	//
	// You must supply the exact output size you want.
	// If too few shards are given, ErrTooFewShards will be returned.
	// If the total data size is less than outSize, ErrShortData will be returned.
	Join(dst io.Writer, shards []io.Reader, outSize int64) error
}
StreamEncoder is an interface to encode Reed-Solomon parity sets for your data. It provides a fully streaming interface, and processes data in blocks of up to 4MB.

For small shard sizes, 10MB and below, it is recommended to use the in-memory interface, since the streaming interface has a start-up overhead.

For all operations, readers and writers should not assume any order/size of individual reads/writes.

For usage examples, see "stream-encoder.go" and "streamdecoder.go" in the examples folder.
Example ¶
This will show a simple stream encoder where we encode from a []io.Reader which contains a reader for each shard.
Input and output can be exchanged with files, network streams or what may suit your needs.
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"math/rand"

	"github.com/bpfs/defs/v2/reedsolomon"
)

func fillRandom(p []byte) {
	for i := 0; i < len(p); i += 7 {
		val := rand.Int63()
		for j := 0; i+j < len(p) && j < 7; j++ {
			p[i+j] = byte(val)
			val >>= 8
		}
	}
}

func main() {
	dataShards := 5
	parityShards := 2

	// Create a StreamEncoder with the number of data and parity shards.
	rs, err := reedsolomon.NewStream(dataShards, parityShards)
	if err != nil {
		log.Fatal(err)
	}

	shardSize := 50000

	// Create input data shards.
	input := make([][]byte, dataShards)
	for s := range input {
		input[s] = make([]byte, shardSize)
		fillRandom(input[s])
	}

	// Convert our buffers to io.Readers
	readers := make([]io.Reader, dataShards)
	for i := range readers {
		readers[i] = io.Reader(bytes.NewBuffer(input[i]))
	}

	// Create our output io.Writers
	out := make([]io.Writer, parityShards)
	for i := range out {
		out[i] = io.Discard
	}

	// Encode from input to output.
	err = rs.Encode(readers, out)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("ok")
}
Output: ok
func NewStream ¶
func NewStream(dataShards, parityShards int, o ...Option) (StreamEncoder, error)
NewStream creates a new encoder and initializes it to the number of data shards and parity shards that you want to use. You can reuse this encoder. Note that the maximum number of data shards is 256.

Parameters:
- dataShards: int, the number of data shards
- parityShards: int, the number of parity shards
- o: optional parameters; WithConcurrentStreamReads(true) and WithConcurrentStreamWrites(true) may be passed to enable concurrent reads and writes.

Returns:
- StreamEncoder: the streaming encoder
- error: an error, if any occurred
func NewStreamC ¶
func NewStreamC(dataShards, parityShards int, conReads, conWrites bool, o ...Option) (StreamEncoder, error)
NewStreamC creates a new encoder and initializes it to the given number of data shards and parity shards.

This functions exactly like 'NewStream', but allows you to enable or disable concurrent reads and writes.

Parameters:
- dataShards: int, the number of data shards
- parityShards: int, the number of parity shards
- conReads: bool, whether to enable concurrent reads
- conWrites: bool, whether to enable concurrent writes
- o: optional parameters; WithConcurrentStreamReads(true) and WithConcurrentStreamWrites(true) may also be passed to enable concurrent reads and writes.
type StreamEncoder16 ¶ added in v2.0.28
type StreamEncoder16 interface {
	// Encode generates parity shards for a set of data shards
	Encode(inputs []io.Reader, outputs []io.Writer) error

	// Verify checks the correctness of the parity shards
	Verify(shards []io.Reader) (bool, error)

	// Reconstruct rebuilds missing shards
	Reconstruct(inputs []io.Reader, outputs []io.Writer) error

	// Split splits an input stream into multiple shards
	Split(data io.Reader, dst []io.Writer, size int64) error
}
StreamEncoder16 is a Reed-Solomon encoder interface based on GF(2^16).
func NewStream16 ¶ added in v2.0.28
func NewStream16(dataShards, parityShards int, opts ...Option) (StreamEncoder16, error)
NewStream16 creates a new GF(2^16) Reed-Solomon streaming encoder.

Parameters:
- dataShards: the number of data shards
- parityShards: the number of parity shards
- opts: optional parameters

Returns:
- StreamEncoder16: the new streaming encoder
- error: an error, if one occurred
type StreamReadError ¶
StreamReadError is returned when a read error is encountered that relates to a supplied stream. This will let you know which reader failed.
func (StreamReadError) Error ¶
func (s StreamReadError) Error() string
Error returns the error as a string.

Returns:
- string: the error string
func (StreamReadError) String ¶
func (s StreamReadError) String() string
String returns the error as a string.

Returns:
- string: the error string
type StreamWriteError ¶
StreamWriteError is returned when a write error is encountered that relates to a supplied stream. This will let you know which writer failed.
func (StreamWriteError) Error ¶
func (s StreamWriteError) Error() string
Error returns the error as a string.

Returns:
- string: the error string
func (StreamWriteError) String ¶
func (s StreamWriteError) String() string
String returns the error as a string.

Returns:
- string: the error string