carbites

package module
v0.0.0-...-09cca34 Latest Latest
Published: Mar 27, 2023 License: Apache-2.0, MIT Imports: 16 Imported by: 0

README

carbites

Chunking for CAR files. Split a single CAR into multiple CARs.

Install

go get github.com/alanshaw/go-carbites

Usage

Carbites supports 2 different strategies:

  1. Simple (default) - fast but naive. Only the first CAR output has roots in its header; subsequent CARs have the placeholder "empty" root CID bafkqaaa, as recommended.
  2. Treewalk - walks the DAG to pack sub-graphs into each CAR file that is output. Every CAR file has the same root CID but contains a different portion of the DAG. The DAG is traversed from the root node and each block is decoded and links extracted in order to determine which sub-graph to include in each CAR.
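package main

import (
	"fmt"
	"io"
	"os"

	"github.com/alanshaw/go-carbites"
)

func main() {
	bigCar, err := os.Open("big.car")
	if err != nil {
		panic(err)
	}
	defer bigCar.Close()

	targetSize := 1024 * 1024   // 1MiB chunks
	strategy := carbites.Simple // also carbites.Treewalk
	spltr, err := carbites.Split(bigCar, targetSize, strategy)
	if err != nil {
		panic(err)
	}

	var i int
	for {
		// Next returns io.EOF once the input CAR is exhausted.
		car, err := spltr.Next()
		if err != nil {
			if err == io.EOF {
				break
			}
			panic(err)
		}
		// Read the whole chunk and write it to its own numbered file.
		b, err := io.ReadAll(car)
		if err != nil {
			panic(err)
		}
		if err := os.WriteFile(fmt.Sprintf("chunk-%d.car", i), b, 0644); err != nil {
			panic(err)
		}
		i++
	}
}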
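
The placeholder root CID bafkqaaa mentioned for the simple strategy can be reproduced with the standard library alone. The sketch below is an illustration rather than carbites code: it assumes the usual CIDv1 binary layout (version byte, raw codec, identity multihash with a zero-length digest) and the multibase "b" prefix for lowercase unpadded base32. The function name emptyCIDString is made up for this example.

```go
package main

import (
	"encoding/base32"
	"fmt"
)

// emptyCIDString builds the textual form of the "empty" identity CID.
// Hypothetical helper for illustration; it is not part of carbites.
func emptyCIDString() string {
	raw := []byte{
		0x01, // CID version 1
		0x55, // multicodec: raw
		0x00, // multihash code: identity
		0x00, // digest length: 0 bytes
	}
	// Multibase "b" means lowercase RFC 4648 base32 without padding.
	enc := base32.NewEncoding("abcdefghijklmnopqrstuvwxyz234567").WithPadding(base32.NoPadding)
	return "b" + enc.EncodeToString(raw)
}

func main() {
	fmt.Println(emptyCIDString()) // bafkqaaa
}
```

Because the digest is empty, the CID carries no data, which is what makes it usable as a placeholder root.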

API

pkg.go.dev Reference

Contribute

Feel free to dive in! Open an issue or submit PRs.

License

Dual-licensed under MIT + Apache 2.0

Documentation

Functions

func Join

func Join(in []io.Reader, s Strategy) (io.Reader, error)

Join together multiple CAR files into a single CAR file.

func JoinSimple

func JoinSimple(in []io.Reader) (io.Reader, error)

Join together multiple CAR files that were split using the "simple" strategy into a single CAR file.

func JoinTreewalk

func JoinTreewalk(in []io.Reader) (io.Reader, error)

Join together multiple CAR files into a single CAR file using the "treewalk" strategy. Note that binary equality between the original CAR and the joined CAR is not guaranteed.

func NewCarMerger

func NewCarMerger(in []io.Reader) (io.Reader, error)

NewCarMerger creates a new CAR file (an io.Reader) that is a result of merging the passed CAR files. The resultant CAR has the combined roots of the passed CAR files and any duplicate blocks are removed.
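
The merge semantics described here (combined roots, duplicate blocks removed) can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the car and block types are hypothetical stand-ins keyed by a CID string, not the library's types, and dropping repeated roots is an assumption, since the description only says the roots are combined.

```go
package main

import "fmt"

// block is a hypothetical stand-in for a CAR block: a CID string plus payload.
type block struct {
	cid  string
	data []byte
}

// car is a hypothetical in-memory model of a CAR file.
type car struct {
	roots  []string
	blocks []block
}

// merge combines the roots of all inputs and keeps the first occurrence
// of each block CID, preserving input order.
func merge(cars []car) car {
	var out car
	seenRoot := map[string]bool{}
	seenBlock := map[string]bool{}
	for _, c := range cars {
		for _, r := range c.roots {
			if !seenRoot[r] {
				seenRoot[r] = true
				out.roots = append(out.roots, r)
			}
		}
		for _, b := range c.blocks {
			if !seenBlock[b.cid] {
				seenBlock[b.cid] = true
				out.blocks = append(out.blocks, b)
			}
		}
	}
	return out
}

func main() {
	a := car{roots: []string{"root"}, blocks: []block{{"root", nil}, {"x", nil}}}
	b := car{roots: []string{"root"}, blocks: []block{{"root", nil}, {"y", nil}}}
	m := merge([]car{a, b})
	fmt.Println(m.roots, len(m.blocks)) // [root] 3
}
```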

Types

type BlockReader

type BlockReader interface {
	Get(context.Context, cid.Cid) (blocks.Block, error)
}

type SimpleSplitter

type SimpleSplitter struct {
	// contains filtered or unexported fields
}

func NewSimpleSplitter

func NewSimpleSplitter(in io.Reader, targetSize int) (*SimpleSplitter, error)

Create a new CAR file splitter to create multiple smaller CAR files using the "simple" strategy.

func (*SimpleSplitter) Next

func (spltr *SimpleSplitter) Next() (io.Reader, error)

type Splitter

type Splitter interface {
	// Next splits the next CAR file out from the input CAR file.
	Next() (io.Reader, error)
}

func Split

func Split(in io.Reader, targetSize int, s Strategy) (Splitter, error)

Split a CAR file and create multiple smaller CAR files.
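
Since a Splitter exposes only Next, consuming one is a loop that stops at io.EOF, as in the usage example. The sketch below wraps that loop in a reusable function. To stay runnable without the module it declares a local splitter interface that mirrors the documented Next method; collect and fakeSplitter are hypothetical names introduced for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// splitter mirrors the Next method of carbites.Splitter so that this
// sketch compiles without importing the module.
type splitter interface {
	Next() (io.Reader, error)
}

// collect reads every chunk into memory until Next reports io.EOF.
func collect(s splitter) ([][]byte, error) {
	var chunks [][]byte
	for {
		r, err := s.Next()
		if err == io.EOF {
			return chunks, nil
		}
		if err != nil {
			return nil, err
		}
		b, err := io.ReadAll(r)
		if err != nil {
			return nil, err
		}
		chunks = append(chunks, b)
	}
}

// fakeSplitter is a hypothetical test double that yields fixed payloads.
type fakeSplitter struct{ parts [][]byte }

func (f *fakeSplitter) Next() (io.Reader, error) {
	if len(f.parts) == 0 {
		return nil, io.EOF
	}
	p := f.parts[0]
	f.parts = f.parts[1:]
	return bytes.NewReader(p), nil
}

func main() {
	chunks, err := collect(&fakeSplitter{parts: [][]byte{[]byte("one"), []byte("two")}})
	fmt.Println(len(chunks), err) // 2 <nil>
}
```

A value returned by Split should satisfy the same local interface, since it has the same Next method. Collecting every chunk in memory is only reasonable for small inputs; for large CARs, write each chunk out as it is produced.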

type Strategy

type Strategy int

Strategy describes how CAR files should be split.

const (
	// Simple is fast but naive, only the first CAR output has a root CID,
	// subsequent CARs have a placeholder "empty" CID.
	Simple Strategy = iota
	// Treewalk walks the DAG to pack sub-graphs into each CAR file that is
	// output. Every CAR has the same root CID, but contains a different portion
	// of the DAG.
	Treewalk
)
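
Because the constants are declared with iota, Simple is the zero value of Strategy, which fits its role as the default. A command-line tool might map a user-supplied name onto these constants; the sketch below shows one way to do that. It redeclares the type locally so it runs on its own, and parseStrategy is a hypothetical helper, not a function exported by the package.

```go
package main

import (
	"fmt"
	"strings"
)

// Strategy mirrors the documented declaration: Simple is 0, Treewalk is 1.
type Strategy int

const (
	Simple Strategy = iota
	Treewalk
)

// parseStrategy maps a case-insensitive name to a Strategy.
// An empty name selects Simple, the default.
func parseStrategy(name string) (Strategy, error) {
	switch strings.ToLower(name) {
	case "", "simple":
		return Simple, nil
	case "treewalk":
		return Treewalk, nil
	default:
		return Simple, fmt.Errorf("unknown strategy %q", name)
	}
}

func main() {
	s, err := parseStrategy("treewalk")
	fmt.Println(int(s), err) // 1 <nil>
}
```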

type TreewalkSplitter

type TreewalkSplitter struct {
	// contains filtered or unexported fields
}

func NewTreewalkSplitter

func NewTreewalkSplitter(r io.Reader, targetSize int) (*TreewalkSplitter, error)

Create a splitter that splits a CAR file into multiple smaller CAR files using the "treewalk" strategy. Note: the entire CAR is cached in memory. To avoid holding the whole CAR in memory, use NewTreewalkSplitterFromPath or NewTreewalkSplitterFromBlockReader instead.

func NewTreewalkSplitterFromBlockReader

func NewTreewalkSplitterFromBlockReader(root cid.Cid, br BlockReader, targetSize int) (*TreewalkSplitter, error)

Split a CAR file (passed as a root CID and a block reader populated with the blocks from the CAR) and create multiple smaller CAR files using the "treewalk" strategy.

func NewTreewalkSplitterFromPath

func NewTreewalkSplitterFromPath(path string, targetSize int) (*TreewalkSplitter, error)

Split a CAR file found on disk at the given path and create multiple smaller CAR files using the "treewalk" strategy.

func (*TreewalkSplitter) Next

func (spltr *TreewalkSplitter) Next() (io.Reader, error)

Directories

Path: cmd/carbites (Module)
