indexer

package module
v0.8.5 Latest
Warning

This package is not in the latest version of its module.

Published: Dec 8, 2023 License: Apache-2.0, MIT Imports: 8 Imported by: 5

README

go-indexer-core


Storage specialized for indexing provider content

The indexer-core is a key-value store that is optimized for storing large numbers of multihashes mapping to relatively few provider data objects. A multihash (CID without codec) uniquely identifies a piece of content, and the provider data describes where and how to retrieve the content.

Content is indexed by giving a provider data object (the value) and a set of multihashes (keys) that map to that value. Typically, the provider value represents a storage deal and the multihash keys identify content stored within that deal. To subsequently retrieve a provider value, the indexer-core is given a multihash key to look up.
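One way to picture the many-keys-to-few-values mapping is a two-level lookup: each value is stored once under a value key, and each multihash maps to a list of value keys. The sketch below is an illustrative model only. The `index` type, its maps, and the `valueKey` helper are hypothetical and are not the storage layout used by the indexer-core.

```go
package main

import "fmt"

// value is a simplified stand-in for a provider value.
type value struct {
	ProviderID string
	ContextID  string
	Metadata   string
}

// index stores each value once and lets many multihashes refer to it.
type index struct {
	values map[string]value    // value key -> value
	keys   map[string][]string // multihash -> value keys
}

func newIndex() *index {
	return &index{values: map[string]value{}, keys: map[string][]string{}}
}

// valueKey identifies a value by provider and context (hypothetical scheme).
func valueKey(v value) string { return v.ProviderID + "/" + v.ContextID }

// put stores or updates the value, then maps each multihash to it.
func (ix *index) put(v value, multihashes ...string) {
	vk := valueKey(v)
	ix.values[vk] = v
	for _, mh := range multihashes {
		known := false
		for _, existing := range ix.keys[mh] {
			if existing == vk {
				known = true
				break
			}
		}
		if !known {
			ix.keys[mh] = append(ix.keys[mh], vk)
		}
	}
}

// get returns every value that the multihash maps to.
func (ix *index) get(mh string) ([]value, bool) {
	vks, ok := ix.keys[mh]
	if !ok {
		return nil, false
	}
	out := make([]value, 0, len(vks))
	for _, vk := range vks {
		out = append(out, ix.values[vk])
	}
	return out, true
}

func main() {
	ix := newIndex()
	ix.put(value{"provider1", "deal1", "meta-v1"}, "mh1", "mh2", "mh3")
	got, _ := ix.get("mh2")
	fmt.Println(len(ix.values), "stored value;", "mh2 ->", got[0].Metadata)
}
```

Because the value is stored once, calling `put` again with the same provider and context but no multihashes updates the metadata seen by every multihash that refers to it.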

Data Storage provides more detail on how data is stored by the indexer-core.

This indexer-core is the component of an indexer that provides data storage and retrieval for content index data. An indexer must also supply all the service functionality necessary to create an indexing service, which is not included in the indexer-core component.

Configurable Cache

An integrated cache is included to aid in fast index lookups. By default the cache is configured as a retrieval cache: items are stored in the cache only when index data is looked up, which speeds up repeated lookups of the same data. The cache can optionally be disabled, and its size is configurable. The cache interface allows alternative cache implementations to be used if desired.
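The retrieval-cache behavior can be illustrated with a small, self-contained sketch. All names here (`store`, `cachedStore`, `newCachedStore`) are hypothetical, and dropping the cached entry on write is an assumption made for simplicity, not necessarily what the built-in cache does.

```go
package main

import "fmt"

// store stands in for persistent storage and counts how often it is read.
type store struct {
	data  map[string][]string
	reads int
}

func (s *store) get(key string) ([]string, bool) {
	s.reads++
	v, ok := s.data[key]
	return v, ok
}

// cachedStore fills its cache only when data is looked up, never on writes.
type cachedStore struct {
	backing *store
	cache   map[string][]string
}

func newCachedStore() *cachedStore {
	return &cachedStore{
		backing: &store{data: map[string][]string{}},
		cache:   map[string][]string{},
	}
}

// put writes to the backing store and drops any stale cached entry.
func (c *cachedStore) put(key string, values ...string) {
	c.backing.data[key] = append(c.backing.data[key], values...)
	delete(c.cache, key)
}

// get serves from the cache when possible and caches successful lookups.
func (c *cachedStore) get(key string) ([]string, bool) {
	if v, ok := c.cache[key]; ok {
		return v, true
	}
	v, ok := c.backing.get(key)
	if ok {
		c.cache[key] = v
	}
	return v, ok
}

func main() {
	cs := newCachedStore()
	cs.put("mh1", "providerA")
	cs.get("mh1") // miss: reads the backing store and fills the cache
	cs.get("mh1") // hit: served from the cache
	fmt.Println("backing store reads:", cs.backing.reads) // prints 1
}
```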

See Usage Example for details.

Choice of Persistent Storage

The persistent storage is provided by a choice of storage systems, including pebble and an in-memory implementation. The storage interface allows any other storage system to be adapted.

See Usage Example for details.

Install

 go get github.com/ipni/go-indexer-core

Usage

import "github.com/ipni/go-indexer-core"

See pkg.go.dev documentation

Example
package main

import (
	"log"
	"os"

	"github.com/ipni/go-indexer-core"
	"github.com/ipni/go-indexer-core/cache"
	"github.com/ipni/go-indexer-core/cache/radixcache"
	"github.com/ipni/go-indexer-core/engine"
	"github.com/ipni/go-indexer-core/store/pebble"
	"github.com/ipfs/go-cid"
	"github.com/libp2p/go-libp2p-core/peer"
)

func main() {
	// Configuration values.
	const valueStoreDir = "/tmp/indexvaluestore"
	const cacheSize = 65536

	// Create value store of configured type.
	if err := os.MkdirAll(valueStoreDir, 0770); err != nil {
		log.Fatal(err)
	}
	valueStore, err := pebble.New(valueStoreDir, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Create result cache, or disable it.
	var resultCache cache.Interface
	if cacheSize > 0 {
		resultCache = radixcache.New(cacheSize)
	} else {
		log.Print("Result cache disabled")
	}

	// Create indexer core.
	indexerCore := engine.New(resultCache, valueStore)

	// Put some index data into indexer core.
	cid1, _ := cid.Decode("QmPNHBy5h7f19yJDt7ip9TvmMRbqmYsa6aetkrsc1ghjLB")
	cid2, _ := cid.Decode("QmUaPc2U1nUJeVj6HxBxS5fGxTWAmpvzwnhB8kavMVAotE")
	peerID, _ := peer.Decode("12D3KooWKRyzVWW6ChFjQjK4miCty85Niy48tpPV95XdKu1BcvMA")
	ctxID := []byte("someCtxID")
	value := indexer.Value{
		ProviderID:    peerID,
		ContextID:     ctxID,
		MetadataBytes: []byte("someMetadata"),
	}
	err = indexerCore.Put(value, cid1.Hash(), cid2.Hash())
	if err != nil {
		log.Fatal(err)
	}

	// Lookup provider data by multihash.
	values, found, err := indexerCore.Get(cid1.Hash())
	if err != nil {
		log.Fatal(err)
	}
	if found {
		log.Printf("Found %d values for cid1", len(values))
	}
	
	// Remove provider values by contextID, and multihashes that map to them.
	err = indexerCore.RemoveProviderContext(peerID, ctxID)
	if err != nil {
		log.Fatal(err)
	}
}

License

This project is dual-licensed under Apache 2.0 and MIT terms.

Documentation

Index

Constants

This section is empty.

Variables

var (
	// ErrCodecOverflow signals that an unexpected size was encountered while
	// unmarshalling bytes to Value.
	ErrCodecOverflow = errors.New("overflow")
)
var ErrStatsNotSupported = errors.New("stats is not supported by store")

ErrStatsNotSupported signals that an indexer.Interface does not support Stats calculation.
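A caller would typically treat this sentinel as "feature unavailable" rather than as a failure. The sketch below shows that pattern with `errors.Is`, using local stand-ins (`errStatsNotSupported`, `stats`, `statser`, and the two store types are hypothetical) so that it runs without the module.

```go
package main

import (
	"errors"
	"fmt"
)

// errStatsNotSupported plays the role of indexer.ErrStatsNotSupported.
var errStatsNotSupported = errors.New("stats is not supported by store")

type stats struct{ MultihashCount uint64 }

// statser is the one method of the store interface this sketch needs.
type statser interface {
	Stats() (*stats, error)
}

// noStatsStore is a store that cannot compute statistics.
type noStatsStore struct{}

func (noStatsStore) Stats() (*stats, error) {
	return nil, fmt.Errorf("example store: %w", errStatsNotSupported)
}

// countingStore is a store that reports a fixed multihash count.
type countingStore struct{ n uint64 }

func (c countingStore) Stats() (*stats, error) {
	return &stats{MultihashCount: c.n}, nil
}

// describeStats reports the count, tolerating stores without stats support.
func describeStats(s statser) (string, error) {
	st, err := s.Stats()
	switch {
	case errors.Is(err, errStatsNotSupported):
		return "stats unavailable", nil
	case err != nil:
		return "", err
	}
	return fmt.Sprintf("%d multihashes", st.MultihashCount), nil
}

func main() {
	for _, s := range []statser{noStatsStore{}, countingStore{n: 42}} {
		msg, err := describeStats(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(msg)
	}
}
```

Using `errors.Is` instead of `==` keeps the check working whether or not a store wraps the sentinel.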

Functions

This section is empty.

Types

type BinaryValueCodec

type BinaryValueCodec struct {
	// ZeroCopy controls whether decoded values of []byte type point into
	// the input []byte parameter passed to UnmarshalValue or
	// UnmarshalValueKeys.
	//
	// This optimization prevents unnecessary copying. It is optional as
	// the caller MUST ensure that the input parameter []byte is not
	// modified after the Unmarshal happens, as any changes are mirrored in
	// the decoded result.
	ZeroCopy bool
}

BinaryValueCodec serializes and deserializes Value as binary sections, each prefixed with its byte length as a varint.
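The general technique of length-prefixed sections can be sketched with the standard library's `encoding/binary` varint helpers. This is a minimal illustration, not the codec's actual wire format: `appendSection`, `readSection`, and `errOverflow` are hypothetical names, and the section order is arbitrary.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errOverflow reports a length prefix that is malformed or exceeds the input.
var errOverflow = errors.New("overflow")

// appendSection appends the section length as an unsigned varint, then the bytes.
func appendSection(buf, section []byte) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], uint64(len(section)))
	buf = append(buf, tmp[:n]...)
	return append(buf, section...)
}

// readSection reads one length-prefixed section and returns it together with
// the remaining bytes. The returned section points into buf (no copy is made).
func readSection(buf []byte) (section, rest []byte, err error) {
	size, n := binary.Uvarint(buf)
	if n <= 0 || size > uint64(len(buf)-n) {
		return nil, buf, errOverflow
	}
	end := n + int(size)
	return buf[n:end], buf[end:], nil
}

func main() {
	var buf []byte
	for _, s := range []string{"provider", "ctx", "metadata"} {
		buf = appendSection(buf, []byte(s))
	}
	for len(buf) > 0 {
		var sec []byte
		var err error
		sec, buf, err = readSection(buf)
		if err != nil {
			fmt.Println("decode failed:", err)
			return
		}
		fmt.Println(string(sec))
	}
}
```

Because `readSection` returns sub-slices of its input, it behaves like the ZeroCopy mode described above: the caller must not modify the input afterwards. A copying variant would duplicate each section before returning it.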

func (BinaryValueCodec) MarshalValue

func (BinaryValueCodec) MarshalValue(v Value) ([]byte, error)

func (BinaryValueCodec) MarshalValueKeys

func (BinaryValueCodec) MarshalValueKeys(vk [][]byte) ([]byte, error)

func (BinaryValueCodec) UnmarshalValue

func (c BinaryValueCodec) UnmarshalValue(b []byte) (Value, error)

UnmarshalValue deserializes a single value.

If a failure occurs during deserialization, an error is returned along with the partially deserialized value. Only a nil error means complete and successful deserialization.

func (BinaryValueCodec) UnmarshalValueKeys

func (c BinaryValueCodec) UnmarshalValueKeys(b []byte) ([][]byte, error)

UnmarshalValueKeys deserializes value keys.

If a failure occurs during deserialization, an error is returned along with the partially deserialized value keys. Only a nil error means complete and successful deserialization.

type BinaryWithJsonFallbackCodec

type BinaryWithJsonFallbackCodec struct {
	BinaryValueCodec
	JsonValueCodec
}

BinaryWithJsonFallbackCodec always serializes values as binary but deserializes from both binary and JSON, which gracefully and opportunistically migrates stored data from JSON to the more efficient binary format.
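The migration idea can be sketched as "always write binary, try binary on read, fall back to JSON". The code below is a simplified illustration under stated assumptions: `record` is a hypothetical two-field stand-in for Value, the binary layout is invented for the example, and the actual codec may decide when to fall back differently.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
)

// record is a hypothetical, reduced stand-in for Value.
type record struct {
	ContextID []byte `json:"c"`
	Metadata  []byte `json:"m,omitempty"`
}

// marshalBinary writes each field as a varint length followed by the bytes.
func marshalBinary(r record) []byte {
	var out []byte
	for _, sec := range [][]byte{r.ContextID, r.Metadata} {
		var tmp [binary.MaxVarintLen64]byte
		n := binary.PutUvarint(tmp[:], uint64(len(sec)))
		out = append(out, tmp[:n]...)
		out = append(out, sec...)
	}
	return out
}

// unmarshalBinary reads exactly two length-prefixed sections.
func unmarshalBinary(b []byte) (record, error) {
	var secs [2][]byte
	for i := range secs {
		size, n := binary.Uvarint(b)
		if n <= 0 || size > uint64(len(b)-n) {
			return record{}, errors.New("overflow")
		}
		secs[i] = b[n : n+int(size)]
		b = b[n+int(size):]
	}
	if len(b) != 0 {
		return record{}, errors.New("trailing bytes")
	}
	return record{ContextID: secs[0], Metadata: secs[1]}, nil
}

// unmarshalWithFallback tries binary first and JSON second, so data written
// by an older JSON codec stays readable until it is rewritten as binary.
func unmarshalWithFallback(b []byte) (record, error) {
	r, binErr := unmarshalBinary(b)
	if binErr == nil {
		return r, nil
	}
	if err := json.Unmarshal(b, &r); err != nil {
		return record{}, binErr
	}
	return r, nil
}

func main() {
	v := record{ContextID: []byte("ctx"), Metadata: []byte("meta")}
	legacy, _ := json.Marshal(v) // what an older JSON codec would have stored
	for _, stored := range [][]byte{marshalBinary(v), legacy} {
		got, err := unmarshalWithFallback(stored)
		fmt.Println(string(got.ContextID), string(got.Metadata), err)
	}
}
```

A fallback of this kind relies on JSON input failing the binary decoder. In this sketch the leading `{` is read as a length of 123, which exceeds short inputs; a production codec needs a more robust way to tell the formats apart.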

func (BinaryWithJsonFallbackCodec) MarshalValue

func (bjc BinaryWithJsonFallbackCodec) MarshalValue(v Value) ([]byte, error)

func (BinaryWithJsonFallbackCodec) MarshalValueKeys

func (bjc BinaryWithJsonFallbackCodec) MarshalValueKeys(vk [][]byte) ([]byte, error)

func (BinaryWithJsonFallbackCodec) UnmarshalValue

func (bjc BinaryWithJsonFallbackCodec) UnmarshalValue(b []byte) (Value, error)

func (BinaryWithJsonFallbackCodec) UnmarshalValueKeys

func (bjc BinaryWithJsonFallbackCodec) UnmarshalValueKeys(b []byte) ([][]byte, error)

type Interface

type Interface interface {
	// Get retrieves a slice of Value for a multihash.
	Get(multihash.Multihash) ([]Value, bool, error)

	// Put stores a Value and adds a mapping from each of the given multihashes
	// to that Value. If the Value has the same ProviderID and ContextID as a
	// previously stored Value, then update the metadata in the stored Value
	// with the metadata from the provided Value. Call Put without any
	// multihashes to only update existing values.
	Put(Value, ...multihash.Multihash) error

	// Remove removes the mapping of each multihash to the specified value.
	Remove(Value, ...multihash.Multihash) error

	// RemoveProvider removes all values for specified provider. This is used
	// when a provider is no longer indexed by the indexer.
	RemoveProvider(context.Context, peer.ID) error

	// RemoveProviderContext removes all values for specified provider that
	// have the specified contextID. This is used when a provider no longer
	// provides values for a particular context.
	RemoveProviderContext(providerID peer.ID, contextID []byte) error

	// Size returns the total bytes of storage used to store the indexed
	// content in persistent storage. This does not include memory used by any
	// in-memory cache that the indexer implementation may have, as that would
	// only contain a limited quantity of data and not represent the total
	// amount of data stored by the indexer.
	Size() (int64, error)

	// Flush commits any changes to the value storage.
	Flush() error

	// Close gracefully closes the store, flushing all pending data from memory.
	Close() error

	// Iter creates a new value store iterator.
	Iter() (Iterator, error)

	// Stats returns statistical information about the indexed values.
	// If unsupported by the backing store, ErrStatsNotSupported is returned.
	Stats() (*Stats, error)
}

type Iterator

type Iterator interface {
	// Next returns the next multihash and the values it indexes. Returns io.EOF
	// when finished iterating.
	Next() (multihash.Multihash, []Value, error)

	// Close closes the iterator releasing any resources that may be occupied by it.
	// The iterator will no longer be usable after a call to this function and is
	// discarded.
	Close() error
}

Iterator iterates multihashes and values in the value store. Any write operation invalidates the iterator.
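The iteration contract (call Next until io.EOF, then Close) can be sketched as follows. The `iterator` interface below mirrors the shape of Iterator with simplified types, and `sliceIter` is a hypothetical in-memory implementation used only to make the example runnable.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// entry pairs a key with the values it maps to.
type entry struct {
	key    string
	values []string
}

// iterator mirrors the shape of Iterator, with strings in place of
// multihashes and values.
type iterator interface {
	Next() (string, []string, error)
	Close() error
}

// sliceIter iterates over a fixed slice of entries.
type sliceIter struct {
	entries []entry
	pos     int
}

func (it *sliceIter) Next() (string, []string, error) {
	if it.pos >= len(it.entries) {
		return "", nil, io.EOF
	}
	e := it.entries[it.pos]
	it.pos++
	return e.key, e.values, nil
}

func (it *sliceIter) Close() error {
	it.entries = nil
	return nil
}

// countValues drains the iterator, treating io.EOF as normal completion, and
// returns the number of keys and values seen. It always closes the iterator.
func countValues(it iterator) (int, int, error) {
	defer it.Close()
	keys, values := 0, 0
	for {
		_, vals, err := it.Next()
		if errors.Is(err, io.EOF) {
			return keys, values, nil
		}
		if err != nil {
			return keys, values, err
		}
		keys++
		values += len(vals)
	}
}

func main() {
	it := &sliceIter{entries: []entry{
		{"mh1", []string{"providerA", "providerB"}},
		{"mh2", []string{"providerA"}},
	}}
	keys, values, err := countValues(it)
	fmt.Println(keys, values, err) // prints 2 3 <nil>
}
```

Since any write invalidates the iterator, a loop like this should not call Put or Remove on the same store while it is running.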

type JsonValueCodec

type JsonValueCodec struct{}

JsonValueCodec serializes and deserializes Value as JSON. See: json.Marshal, json.Unmarshal

func (JsonValueCodec) MarshalValue

func (JsonValueCodec) MarshalValue(v Value) ([]byte, error)

func (JsonValueCodec) MarshalValueKeys

func (JsonValueCodec) MarshalValueKeys(vk [][]byte) ([]byte, error)

func (JsonValueCodec) UnmarshalValue

func (JsonValueCodec) UnmarshalValue(b []byte) (v Value, err error)

func (JsonValueCodec) UnmarshalValueKeys

func (JsonValueCodec) UnmarshalValueKeys(b []byte) (vk [][]byte, err error)

type Stats

type Stats struct {
	// MultihashCount is the number of unique multihashes indexed.
	MultihashCount uint64
}

Stats provides statistics about the indexed values.

type Value

type Value struct {
	// ProviderID is the peer ID of the provider of the multihash.
	ProviderID peer.ID `json:"p"`
	// ContextID identifies the metadata that is part of this value.
	ContextID []byte `json:"c"`
	// MetadataBytes is serialized metadata. This is kept serialized, because
	// the indexer only uses the serialized form of this data.
	MetadataBytes []byte `json:"m,omitempty"`
}

Value is the value of an index entry that is stored for each multihash in the indexer.

func (Value) Equal

func (v Value) Equal(other Value) bool

Equal returns true if two Value instances are identical.

func (Value) Match

func (v Value) Match(other Value) bool

Match returns true if both values have the same ProviderID and ContextID.

func (Value) MatchEqual

func (v Value) MatchEqual(other Value) (isMatch bool, isEqual bool)

MatchEqual returns true for the first bool if both values have the same ProviderID and ContextID, and returns true for the second bool if the metadata is also equal.
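The difference between Match, Equal, and MatchEqual is easiest to see next to the update rule that Put describes (same ProviderID and ContextID means update the metadata). The sketch below re-implements the comparisons on a local `value` type, with a string ProviderID for simplicity; the `upsert` helper is hypothetical and only illustrates how the two booleans might be used.

```go
package main

import (
	"bytes"
	"fmt"
)

// value is a local stand-in for Value with a string provider ID.
type value struct {
	ProviderID    string
	ContextID     []byte
	MetadataBytes []byte
}

// Match reports whether both values have the same provider and context.
func (v value) Match(o value) bool {
	return v.ProviderID == o.ProviderID && bytes.Equal(v.ContextID, o.ContextID)
}

// Equal reports whether the values match and carry the same metadata.
func (v value) Equal(o value) bool {
	return v.Match(o) && bytes.Equal(v.MetadataBytes, o.MetadataBytes)
}

// MatchEqual computes both answers in one call.
func (v value) MatchEqual(o value) (isMatch, isEqual bool) {
	if v.Match(o) {
		return true, bytes.Equal(v.MetadataBytes, o.MetadataBytes)
	}
	return false, false
}

// upsert adds v to stored, or updates the metadata of a matching value.
// It returns the resulting list and whether anything changed.
func upsert(stored []value, v value) ([]value, bool) {
	for i, s := range stored {
		isMatch, isEqual := s.MatchEqual(v)
		if !isMatch {
			continue
		}
		if isEqual {
			return stored, false // already stored with identical metadata
		}
		stored[i].MetadataBytes = v.MetadataBytes
		return stored, true
	}
	return append(stored, v), true
}

func main() {
	a := value{"p1", []byte("ctx"), []byte("v1")}
	b := value{"p1", []byte("ctx"), []byte("v2")}
	fmt.Println(a.Match(b), a.Equal(b)) // prints true false

	stored, changed := upsert([]value{a}, b)
	fmt.Println(len(stored), changed, string(stored[0].MetadataBytes)) // prints 1 true v2
}
```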

type ValueCodec

type ValueCodec interface {
	// MarshalValue serializes a single value.
	MarshalValue(Value) ([]byte, error)
	// UnmarshalValue deserializes a single value.
	UnmarshalValue(b []byte) (Value, error)
	// MarshalValueKeys serializes a list of value keys for storage.
	MarshalValueKeys([][]byte) ([]byte, error)
	// UnmarshalValueKeys deserializes value keys list.
	UnmarshalValueKeys([]byte) ([][]byte, error)
}

ValueCodec represents Value serializer and deserializer to/from bytes.

Directories

Path Synopsis
store
dhstore
Package dhstore defines a dhstore client.
dhstore/client
Package client forks the HTTP request models of dhstore in order to avoid forcing upstream projects into C-bindings required by Foundation DB.
memory
Package memory defines an in-memory value store.
test
Package test provides tests and benchmarks that are usable by any store that implements store.Interface.
