index

package
v0.0.0-...-e23051b
Warning

This package is not in the latest version of its module.
Published: Jun 3, 2023 License: Apache-2.0, MIT, Apache-2.0, + 1 more Imports: 11 Imported by: 0

Documentation

Overview

Package index provides indexing functionality for a CARv1 data payload, represented as a mapping of CID to offset. The index can then be used to implement random access over a CARv1.

An Index can be written or read using the package-level functions index.WriteTo and index.ReadFrom.
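To make the "mapping of CID to offset" idea concrete, the following is a minimal, standard-library-only sketch. It does not use this package or the real CAR format: the frame layout (a uvarint length prefix, then a 1-byte key standing in for a CID, then the data) and the names appendFrame, readFrame, buildIndex and readAt are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// appendFrame appends one toy frame to payload: a uvarint length prefix,
// then a 1-byte key (a hypothetical stand-in for a CID), then the data.
func appendFrame(payload []byte, key byte, data string) []byte {
	body := append([]byte{key}, data...)
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(body)))
	payload = append(payload, lenBuf[:n]...)
	return append(payload, body...)
}

// readFrame decodes a single toy frame from r and returns its key and data.
func readFrame(r *bytes.Reader) (byte, string, error) {
	size, err := binary.ReadUvarint(r)
	if err != nil {
		return 0, "", err
	}
	if size == 0 || size > uint64(r.Len()) {
		return 0, "", fmt.Errorf("invalid frame length %d", size)
	}
	body := make([]byte, size)
	if _, err := io.ReadFull(r, body); err != nil {
		return 0, "", err
	}
	return body[0], string(body[1:]), nil
}

// buildIndex scans the payload once and records, for each key, the offset
// at which its first frame starts.
func buildIndex(payload []byte) (map[byte]uint64, error) {
	idx := make(map[byte]uint64)
	r := bytes.NewReader(payload)
	for r.Len() > 0 {
		offset := uint64(len(payload) - r.Len())
		key, _, err := readFrame(r)
		if err != nil {
			return nil, err
		}
		if _, seen := idx[key]; !seen {
			idx[key] = offset
		}
	}
	return idx, nil
}

// readAt jumps straight to offset and decodes the frame found there,
// without scanning the frames before it.
func readAt(payload []byte, offset uint64) (string, error) {
	if offset >= uint64(len(payload)) {
		return "", fmt.Errorf("offset %d out of range", offset)
	}
	_, data, err := readFrame(bytes.NewReader(payload[offset:]))
	return data, err
}

func main() {
	payload := appendFrame(nil, 'a', "hello")
	payload = appendFrame(payload, 'b', "world")

	idx, err := buildIndex(payload)
	if err != nil {
		panic(err)
	}
	data, err := readAt(payload, idx['b'])
	if err != nil {
		panic(err)
	}
	fmt.Printf("key b is at offset %d and holds %q\n", idx['b'], data)
}
```

Once the index is built, a lookup costs one map access plus one frame decode, which is the random-access property described above.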

Constants

const CarIndexNone = 0x300000

CarIndexNone is a sentinel value used as a multicodec code for the index indicating no index.

Variables

var ErrNotFound = errors.New("not found")

ErrNotFound signals a record is not found in the index.

Functions

func GetFirst

func GetFirst(idx Index, key cid.Cid) (uint64, error)

GetFirst is a wrapper over Index.GetAll, returning the offset for the first matching indexed CID.
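One way such a wrapper can be built over a callback-style lookup is sketched below. This is not the package's implementation: toyIndex, getAll, getFirst and errNotFound are hypothetical names, and string keys stand in for CIDs.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound plays the role of a "not found" sentinel in this sketch.
var errNotFound = errors.New("not found")

// toyIndex is a hypothetical in-memory index mapping a key to every
// offset recorded for it.
type toyIndex map[string][]uint64

// getAll calls fn for each offset stored under key and stops as soon as
// fn returns false. It returns errNotFound when nothing is stored.
func (t toyIndex) getAll(key string, fn func(uint64) bool) error {
	offsets := t[key]
	if len(offsets) == 0 {
		return errNotFound
	}
	for _, o := range offsets {
		if !fn(o) {
			break
		}
	}
	return nil
}

// getFirst captures the first offset reported by getAll and then asks
// getAll to stop by returning false from the callback.
func getFirst(t toyIndex, key string) (uint64, error) {
	var first uint64
	err := t.getAll(key, func(o uint64) bool {
		first = o
		return false
	})
	return first, err
}

func main() {
	idx := toyIndex{"a": {61, 200}}

	off, err := getFirst(idx, "a")
	fmt.Println(off, err)

	_, err = getFirst(idx, "missing")
	fmt.Println(errors.Is(err, errNotFound))
}
```

Returning false from the callback is what keeps the wrapper from visiting the remaining offsets.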

func ReadCodec

func ReadCodec(r io.Reader) (multicodec.Code, error)

ReadCodec reads the codec of the index by decoding the first varint read from r.
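Decoding a leading varint can be sketched with the standard library's unsigned varint helpers. The sketch below assumes the code is stored as an unsigned varint compatible with encoding/binary; encodeCode and decodeCode are hypothetical helpers, not functions of this package. The value 0x300000 is taken from the CarIndexNone constant above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeCode returns code encoded as an unsigned varint.
func encodeCode(code uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], code)
	return buf[:n]
}

// decodeCode reads one unsigned varint from the start of b. It returns the
// decoded value and the number of bytes consumed, so the caller knows where
// the rest of the data begins.
func decodeCode(b []byte) (uint64, int, error) {
	r := bytes.NewReader(b)
	code, err := binary.ReadUvarint(r)
	if err != nil {
		return 0, 0, err
	}
	return code, len(b) - r.Len(), nil
}

func main() {
	// Encode the sentinel value and append one byte of unrelated data.
	data := append(encodeCode(0x300000), 0xff)

	code, n, err := decodeCode(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("code %#x consumed %d of %d bytes\n", code, n, len(data))
}
```

Only the varint bytes are consumed, which leaves the reader positioned at the start of whatever follows the code.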

func WriteTo

func WriteTo(idx Index, w io.Writer) (uint64, error)

WriteTo writes the given idx into w. The written bytes include the index encoding, so the result can be read back using index.ReadFrom.
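The idea of writing the encoding ahead of the index body, so that a reader can tell how to decode it, can be sketched as follows. This is a toy format, not the package's serial form: toyCodec is an arbitrary value chosen for this sketch, the body layout (a count followed by fixed-width offsets) is an assumption, and writeTo, readFrom and roundTrip are hypothetical helpers.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

const (
	toyCodec   = 0x300001 // arbitrary code, used only by this sketch
	maxEntries = 1 << 20  // cap on entries accepted from the input
)

// writeTo writes the codec as a uvarint, then the entry count, then the
// offsets. It returns the number of bytes written.
func writeTo(offsets []uint64, w io.Writer) (uint64, error) {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], toyCodec)
	if _, err := w.Write(buf[:n]); err != nil {
		return 0, err
	}
	written := uint64(n)
	if err := binary.Write(w, binary.LittleEndian, uint64(len(offsets))); err != nil {
		return written, err
	}
	written += 8
	if err := binary.Write(w, binary.LittleEndian, offsets); err != nil {
		return written, err
	}
	return written + uint64(8*len(offsets)), nil
}

// readFrom reads the leading codec, rejects unknown codecs, and then
// decodes the body written by writeTo.
func readFrom(r io.Reader) ([]uint64, error) {
	br := bufio.NewReader(r)
	code, err := binary.ReadUvarint(br)
	if err != nil {
		return nil, err
	}
	if code != toyCodec {
		return nil, fmt.Errorf("unknown index codec %#x", code)
	}
	var count uint64
	if err := binary.Read(br, binary.LittleEndian, &count); err != nil {
		return nil, err
	}
	if count > maxEntries {
		return nil, fmt.Errorf("refusing to allocate %d entries", count)
	}
	offsets := make([]uint64, count)
	if err := binary.Read(br, binary.LittleEndian, offsets); err != nil {
		return nil, err
	}
	return offsets, nil
}

// roundTrip writes offsets to an in-memory buffer and reads them back.
func roundTrip(offsets []uint64) ([]uint64, uint64, error) {
	var buf bytes.Buffer
	n, err := writeTo(offsets, &buf)
	if err != nil {
		return nil, n, err
	}
	got, err := readFrom(&buf)
	return got, n, err
}

func main() {
	got, n, err := roundTrip([]uint64{61, 200})
	fmt.Println(got, n, err)

	// A stream that starts with a different code is rejected.
	_, err = readFrom(bytes.NewReader([]byte{0x01}))
	fmt.Println(err)
}
```

The entry-count cap in readFrom is there because a length field read from input should not be trusted to size an allocation.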

Example

ExampleWriteTo unmarshals an index from an indexed CARv2 file and stores it as a separate file on disk.

package main

import (
	"fmt"
	"io"
	"os"
	"reflect"

	carv2 "github.com/sevenrats/boxo/ipld/car/v2"
	"github.com/sevenrats/boxo/ipld/car/v2/index"
)

func main() {
	// Open the CARv2 file
	src := "../testdata/sample-wrapped-v2.car"
	cr, err := carv2.OpenReader(src)
	if err != nil {
		panic(err)
	}
	defer func() {
		if err := cr.Close(); err != nil {
			panic(err)
		}
	}()

	// Read and unmarshal the index within the CARv2 file.
	ir, err := cr.IndexReader()
	if err != nil {
		panic(err)
	}
	idx, err := index.ReadFrom(ir)
	if err != nil {
		panic(err)
	}

	// Store the index on its own in a destination file.
	f, err := os.CreateTemp(os.TempDir(), "example-index-*.carindex")
	if err != nil {
		panic(err)
	}
	defer func() {
		if err := f.Close(); err != nil {
			panic(err)
		}
	}()
	_, err = index.WriteTo(idx, f)
	if err != nil {
		panic(err)
	}

	// Seek to the beginning of the file to read it back.
	_, err = f.Seek(0, io.SeekStart)
	if err != nil {
		panic(err)
	}

	// Read and unmarshal the destination file as a separate index instance.
	reReadIdx, err := index.ReadFrom(f)
	if err != nil {
		panic(err)
	}

	// Expect indices to be equal.
	if reflect.DeepEqual(idx, reReadIdx) {
		fmt.Printf("Saved index file matches the index embedded in CARv2 at %v.\n", src)
	} else {
		panic("expected to get the same index as the CARv2 file")
	}

}
Output:

Saved index file matches the index embedded in CARv2 at ../testdata/sample-wrapped-v2.car.

Types

type Index

type Index interface {
	// Codec provides the multicodec code that the index implements.
	//
	// Note that this may return a reserved code if the index
	// implementation is not defined in a spec.
	Codec() multicodec.Code

	// Marshal encodes the index in serial form.
	Marshal(w io.Writer) (uint64, error)

	// Unmarshal decodes the index from its serial form.
	// Note, this function will copy the entire index into memory.
	//
	// Do not unmarshal index from untrusted CARv2 files. Instead, the index should be
	// regenerated from the CARv2 data payload.
	Unmarshal(r io.Reader) error

	// Load inserts a number of records into the index.
	// Note that Index will load all given records. Any filtering of the records such as
	// exclusion of CIDs with multihash.IDENTITY code must occur prior to calling this function.
	//
	// Further, the actual information extracted and indexed from the given records entirely
	// depends on the concrete index implementation.
	// For example, some index implementations may only store partial multihashes.
	Load([]Record) error

	// GetAll looks up all blocks matching a given CID,
	// calling a function for each one of their offsets.
	//
	// GetAll stops if the given function returns false,
	// or there are no more offsets; whichever happens first.
	//
	// If no error occurred and the CID isn't indexed,
	// meaning that no callbacks happen,
	// ErrNotFound is returned.
	GetAll(cid.Cid, func(uint64) bool) error
}

Index provides an interface for looking up the byte offset of a given CID.

Note that each indexing mechanism is free to match CIDs however it sees fit. For example, multicodec.CarIndexSorted only indexes multihash digests, meaning that GetFirst and GetAll will find matching blocks even if the CID's encoding multicodec differs. Other index implementations might index the entire CID, the entire multihash, or just part of a multihash's digest.

See: multicodec.CarIndexSorted, multicodec.CarMultihashIndexSorted
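The effect of indexing only the digest can be sketched with a toy type. Everything here is hypothetical: toyCID is a simplified stand-in for a CID with just a codec and a digest, digestIndex is not one of the package's index types, and the codec values are illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found")

// toyCID is a simplified stand-in for a CID: a content codec plus the
// digest of the block's bytes.
type toyCID struct {
	Codec  uint64
	Digest string
}

// digestIndex keys offsets by digest only and ignores the codec.
type digestIndex map[string][]uint64

// load records offset under the CID's digest.
func (d digestIndex) load(c toyCID, offset uint64) {
	d[c.Digest] = append(d[c.Digest], offset)
}

// getAll reports every offset whose digest matches c, whatever codec c has.
func (d digestIndex) getAll(c toyCID, fn func(uint64) bool) error {
	offsets := d[c.Digest]
	if len(offsets) == 0 {
		return errNotFound
	}
	for _, o := range offsets {
		if !fn(o) {
			break
		}
	}
	return nil
}

// lookup collects all offsets that getAll reports for c.
func lookup(d digestIndex, c toyCID) ([]uint64, error) {
	var out []uint64
	err := d.getAll(c, func(o uint64) bool {
		out = append(out, o)
		return true
	})
	return out, err
}

func main() {
	// Two toy CIDs share a digest but differ in codec.
	a := toyCID{Codec: 0x55, Digest: "abc"}
	b := toyCID{Codec: 0x70, Digest: "abc"}

	idx := digestIndex{}
	idx.load(a, 61)

	// The block indexed under a is also found when looking up b.
	fmt.Println(lookup(idx, b))
}
```

An index keyed on the whole CID would instead treat a and b as different entries, which is the trade-off the note above describes.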

func New

func New(codec multicodec.Code) (Index, error)

New constructs a new index corresponding to the given CAR index codec.

func ReadFrom

func ReadFrom(r io.Reader) (Index, error)

ReadFrom reads an index from r. It first reads the codec at the start of r (see ReadCodec) to determine the encoding, and returns an error if the encoding is not known.

Attempting to read index data from untrusted sources is not recommended. Instead, the index should be regenerated from the CARv2 data payload.

Example

ExampleReadFrom unmarshals an index from an indexed CARv2 file and, for each root CID, prints the offset at which its corresponding block starts relative to the wrapped CARv1 data payload.

package main

import (
	"fmt"

	carv2 "github.com/sevenrats/boxo/ipld/car/v2"
	"github.com/sevenrats/boxo/ipld/car/v2/index"
)

func main() {
	// Open the CARv2 file
	cr, err := carv2.OpenReader("../testdata/sample-wrapped-v2.car")
	if err != nil {
		panic(err)
	}
	defer cr.Close()

	// Get root CIDs in the CARv1 file.
	roots, err := cr.Roots()
	if err != nil {
		panic(err)
	}

	// Read and unmarshal the index within the CARv2 file.
	ir, err := cr.IndexReader()
	if err != nil {
		panic(err)
	}
	idx, err := index.ReadFrom(ir)
	if err != nil {
		panic(err)
	}

	// For each root CID print the offset relative to CARv1 data payload.
	for _, r := range roots {
		offset, err := index.GetFirst(idx, r)
		if err != nil {
			panic(err)
		}
		fmt.Printf("Frame with CID %v starts at offset %v relative to CARv1 data payload.\n", r, offset)
	}

}
Output:

Frame with CID bafy2bzaced4ueelaegfs5fqu4tzsh6ywbbpfk3cxppupmxfdhbpbhzawfw5oy starts at offset 61 relative to CARv1 data payload.

type IterableIndex

type IterableIndex interface {
	Index

	// ForEach takes a callback function that will be called
	// on each entry in the index. The arguments to the callback are
	// the multihash of the element, and the offset in the car file
	// where the element appears.
	//
	// If the callback returns a non-nil error, the iteration is aborted,
	// and the ForEach function returns the error to the user.
	//
	// An index may contain multiple offsets corresponding to the same multihash, e.g. via duplicate blocks.
	// In such cases, the given function may be called multiple times with the same multihash but different offset.
	//
	// The order of calls to the given function is deterministic, but entirely index-specific.
	ForEach(func(multihash.Multihash, uint64) error) error
}

IterableIndex is an index that supports iterating over its elements.
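The iteration contract described in the ForEach comment (every entry is visited, duplicates are reported separately, and a non-nil error aborts the walk) can be sketched as below. The types entry and sortedIndex, the string digests, and the chosen sort order are assumptions for illustration and do not reflect how the package stores its entries.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// errStop is a sentinel a callback can return to abort iteration.
var errStop = errors.New("stop")

// entry pairs a digest (a string stand-in for a multihash) with an offset.
type entry struct {
	Digest string
	Offset uint64
}

// sortedIndex keeps its entries ordered by digest, then by offset, so the
// iteration order is deterministic.
type sortedIndex []entry

// newSortedIndex copies and sorts the given entries.
func newSortedIndex(entries ...entry) sortedIndex {
	s := append(sortedIndex(nil), entries...)
	sort.Slice(s, func(i, j int) bool {
		if s[i].Digest != s[j].Digest {
			return s[i].Digest < s[j].Digest
		}
		return s[i].Offset < s[j].Offset
	})
	return s
}

// forEach calls fn once per entry. A digest stored at several offsets is
// reported once per offset. The first non-nil error ends the iteration and
// is returned to the caller.
func (s sortedIndex) forEach(fn func(digest string, offset uint64) error) error {
	for _, e := range s {
		if err := fn(e.Digest, e.Offset); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	idx := newSortedIndex(entry{"b", 200}, entry{"a", 61}, entry{"a", 10})

	// Visit every entry, including both offsets recorded for digest "a".
	_ = idx.forEach(func(d string, o uint64) error {
		fmt.Println(d, o)
		return nil
	})

	// Abort after the first entry by returning an error.
	err := idx.forEach(func(string, uint64) error { return errStop })
	fmt.Println(errors.Is(err, errStop))
}
```

Returning a sentinel error such as errStop is a common way for a caller to end the walk early and still tell that case apart from a real failure.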

type MultihashIndexSorted

type MultihashIndexSorted map[uint64]*multiWidthCodedIndex

MultihashIndexSorted maps multihash code (i.e. hashing algorithm) to multiWidthCodedIndex.

func NewMultihashSorted

func NewMultihashSorted() *MultihashIndexSorted

func (*MultihashIndexSorted) Codec

func (m *MultihashIndexSorted) Codec() multicodec.Code

func (*MultihashIndexSorted) ForEach

func (m *MultihashIndexSorted) ForEach(f func(mh multihash.Multihash, offset uint64) error) error

ForEach calls f for every multihash and its associated offset stored by this index.

func (*MultihashIndexSorted) GetAll

func (m *MultihashIndexSorted) GetAll(cid cid.Cid, f func(uint64) bool) error

func (*MultihashIndexSorted) Load

func (m *MultihashIndexSorted) Load(records []Record) error

func (*MultihashIndexSorted) Marshal

func (m *MultihashIndexSorted) Marshal(w io.Writer) (uint64, error)

func (*MultihashIndexSorted) Unmarshal

func (m *MultihashIndexSorted) Unmarshal(r io.Reader) error

type Record

type Record struct {
	cid.Cid
	Offset uint64
}

Record is a pre-processed record of a car item and location.
