provider

package module
v0.5.0
Published: Mar 3, 2022 License: Apache-2.0, MIT Imports: 10 Imported by: 0

README

Index Provider 📢

A Go implementation of an index provider

This repo provides a reference index provider implementation that can be used to advertise content to indexer nodes and serve retrieval requests over GraphSync, either as a standalone service or embedded into an existing Go application via a reusable library.

Features include:

  • provider CLI that can:
    • Run as a standalone provider daemon instance.
    • Generate and publish indexing advertisements directly from CAR files.
    • Serve retrieval requests for the advertised content over GraphSync.
    • List advertisements published by a provider instance.
    • Verify ingestion of multihashes by an indexer node from CAR files, detached CARv2 indices or an index provider's advertisement chain.
  • A Golang SDK to embed indexing integration into existing applications, which includes:
    • Programmatic advertisement for content via index provider Engine with built-in chunking functionality
    • Announcement of changes to the advertised content over GossipSub using go-legs
    • Callback integration point for fully customizable look up of advertised multihashes.
    • Utilities to advertise multihashes directly from CAR files or detached CARv2 index files.
    • Index advertisement metadata schema for retrieval over GraphSync and Bitswap

Current status 🚧

This implementation is under active development.

Background

The protocol implemented by this repository is the index provider portion of a larger indexing protocol documented here. The indexer node implementation can be found in the storetheindex repository.

For more details on the ingestion protocol itself, see Providing data to a network indexer.

Install

Prerequisite:

To use the provider as a Go library, execute:

go get github.com/filecoin-project/index-provider

To install the latest provider CLI, run:

git clone https://github.com/filecoin-project/index-provider
cd index-provider && cd cmd
go install ./provider

Alternatively, download the executables directly from the releases.

Usage

Running a standalone provider daemon

To run a provider service, first initialize it by executing:

provider init

Initialization generates a default configuration for the provider instance along with a randomly generated identity keypair. The configuration is stored in JSON format under .index-provider/config in the user's home directory. The root configuration path can be overridden by setting the PROVIDER_PATH environment variable.

Once initialized, start the service daemon by executing:

provider daemon

The running daemon allows advertisement for new content to the indexer nodes and retrieval of content over GraphSync. Additionally, it starts an admin HTTP server that enables administrative operations using the provider CLI tool. By default, the admin server is bound to http://localhost:3102.

You can then advertise content by importing/removing CAR files via the provider CLI, for example:

provider import car -l http://localhost:3102 -i <path-to-car-file>

Both CARv1 and CARv2 formats are supported. An index is generated on the fly if one is not present.

Embedding index provider integration

The root Go module offers a set of reusable libraries that can be used to embed index provider support into an existing application. The core provider.Interface is implemented by engine.Engine.

The provider Engine exposes a set of APIs that allow a user to programmatically announce the availability or removal of content to the indexer nodes. Such an announcement is referred to as an "advertisement". Advertisements are represented as an IPLD DAG, chained together via a link to the previous advertisement. An advertisement effectively captures the "diff" of the content that is either added or no longer provided.

Each advertisement contains:

  • Provider ID: the libp2p peer ID of the content provider.
  • Addresses: a list of addresses from which the content can be retrieved.
  • Metadata: a blob of bytes capturing how to retrieve the data.
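
To make the chaining concrete, here is a small self-contained sketch. The Advertisement type below is a hypothetical, simplified stand-in and not the library's schema.Advertisement: real advertisements are IPLD nodes linked by CID, whereas this sketch uses Go pointers and string context IDs. It shows how replaying the chain of "diffs" from oldest to newest yields the content that is currently provided.

```go
package main

import "fmt"

// Advertisement is a hypothetical, simplified stand-in for the real IPLD
// advertisement. The field names are illustrative assumptions.
type Advertisement struct {
	Previous  *Advertisement // link to the previous advertisement; nil for the first
	ContextID string         // identifies the content being added or removed
	IsRemove  bool           // true when the content is no longer provided
}

// currentContextIDs walks the chain from the latest advertisement back to the
// first, then replays it oldest-first to find what is still provided.
func currentContextIDs(latest *Advertisement) map[string]bool {
	var chain []*Advertisement
	for ad := latest; ad != nil; ad = ad.Previous {
		chain = append(chain, ad)
	}
	provided := make(map[string]bool)
	for i := len(chain) - 1; i >= 0; i-- {
		ad := chain[i]
		if ad.IsRemove {
			delete(provided, ad.ContextID)
		} else {
			provided[ad.ContextID] = true
		}
	}
	return provided
}

func main() {
	first := &Advertisement{ContextID: "car-1"}
	second := &Advertisement{Previous: first, ContextID: "car-2"}
	third := &Advertisement{Previous: second, ContextID: "car-1", IsRemove: true}
	fmt.Println(currentContextIDs(third)) // prints map[car-2:true]
}
```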

Within each advertisement there is an entries link that points to a list of chunked multihashes. It is the multihashes that

To do this it requires a Callback to be registered. The Callback is then used to look up the list of multihashes associated with a content advertisement. For an example of how to start up a provider engine, register a callback and advertise content, see:

provider CLI

The provider CLI can be used to interact with a running daemon via the admin server to perform a range of administrative operations. For example, the provider CLI can be used to import a CAR file and advertise its content to the indexer nodes by executing:

provider import car -l http://localhost:3102 -i <path-to-car-file>

For full usage, execute provider. Usage:

NAME:
   provider - Indexer Reference Provider Implementation

USAGE:
   provider [global options] command [command options] [arguments...]

VERSION:
   v0.2.7

COMMANDS:
   daemon             Starts a reference provider
   find               Query an indexer for indexed content
   index              Push a single content index into an indexer
   init               Initialize reference provider config file and identity
   connect            Connects to an indexer through its multiaddr
   import, i          Imports sources of multihashes to the index provider.
   register           Register provider information with an indexer that trusts the provider
   remove, rm         Removes previously advertised multihashes by the provider.
   verify-ingest, vi  Verifies ingestion of multihashes to an indexer node from a CAR file or a CARv2 Index
   list               Lists advertisements
   help, h            Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --help, -h     show help (default: false)
   --version, -v  print the version (default: false)

Storage Consumption

The index provider engine uses a given datastore to persist two general categories of data:

  1. Internal advertisement mappings, and
  2. Chunked entries chain cache

If the datastore passed to the engine is reused, it is recommended to wrap it in a namespace prior to instantiating the engine.

Internal advertisement mappings

The internal advertisement mappings are used purely by the engine to handle publication requests efficiently. They generally include:

  • mapping to the latest advertisement
  • mappings between advertisement CIDs, their context ID and their corresponding metadata.

The storage consumed by such mappings is negligible and grows linearly with the number of advertisements published.

Chunked entries chain cache

This category stores chunked entries generated by publishing an advertisement with a never-before-seen context ID. The chunks are stored in an LRU cache, the maximum size of which is set by the following parameters in the Ingest config:

  • LinkChunkSize - The maximum number of multihashes in a chunk (defaults to 16,384)
  • LinkCacheSize - The maximum number of entries links to cache (defaults to 1024)

The exact storage usage depends on the size of the multihashes. For example, using the default config to advertise 128-bit multihashes results in chunk sizes of 0.25 MiB and a maximum cache growth of 256 MiB.
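
The figures above follow from simple arithmetic, sketched below. The function names are hypothetical, and the estimate counts only the multihash bytes, ignoring any encoding or datastore overhead.

```go
package main

import "fmt"

const (
	linkChunkSize = 16384 // default LinkChunkSize: multihashes per chunk
	linkCacheSize = 1024  // default LinkCacheSize: cached entries links
)

// chunkBytes estimates the payload size of one full chunk, counting only the
// multihash bytes and ignoring any encoding overhead.
func chunkBytes(multihashLen int) int {
	return linkChunkSize * multihashLen
}

// maxCacheBytes estimates the cache size when every cached chunk is full.
func maxCacheBytes(multihashLen int) int {
	return linkCacheSize * chunkBytes(multihashLen)
}

func main() {
	const mib = 1 << 20
	mhLen := 128 / 8 // 128-bit multihashes are 16 bytes long
	fmt.Printf("chunk: %.2f MiB\n", float64(chunkBytes(mhLen))/mib) // chunk: 0.25 MiB
	fmt.Printf("cache: %d MiB\n", maxCacheBytes(mhLen)/mib)         // cache: 256 MiB
}
```

Substituting a different multihash length, for example 32 bytes for a 256-bit digest, doubles both estimates.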

To delete the cache, set PurgeLinkCache to true and restart the engine.

Note that the LRU cache may grow beyond its maximum size if the chain of chunks generated for a single advertisement is longer than the configured LinkCacheSize. This avoids partial caching of the chunks within a single advertisement. The cache expansion is logged at INFO level in the provider/engine logging subsystem.

License

SPDX-License-Identifier: Apache-2.0 OR MIT

Documentation

Overview

Package provider represents a reference implementation of an index provider. It integrates with the indexer node protocol, "storetheindex", in order to advertise the availability of a list of multihashes as an IPLD DAG. For the complete advertisement IPLD schema, see:

A reference implementation of provider.Interface can be found in engine.Engine.

Constants

This section is empty.

Variables

var (
	// ErrNoCallback is returned when no callback has been registered.
	ErrNoCallback = errors.New("no callback is registered")

	// ErrContextIDNotFound signals that no item is associated to the given context ID.
	ErrContextIDNotFound = errors.New("context ID not found")

	// ErrAlreadyAdvertised indicates that an advertisement for identical
	// content was already published.
	ErrAlreadyAdvertised = errors.New("advertisement already published")
)
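
Because these are sentinel errors, callers can match them with errors.Is even when they arrive wrapped. The sketch below redeclares the three errors locally so that it compiles on its own; real code would refer to the variables exported by this package. The classify helper and the actions it suggests are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the sentinel errors, declared here only so the sketch
// compiles on its own; real code would use the ones exported by the package.
var (
	ErrNoCallback        = errors.New("no callback is registered")
	ErrContextIDNotFound = errors.New("context ID not found")
	ErrAlreadyAdvertised = errors.New("advertisement already published")
)

// classify is a hypothetical helper that maps an error to a suggested action.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrAlreadyAdvertised):
		return "skip: content is already advertised"
	case errors.Is(err, ErrContextIDNotFound):
		return "skip: nothing to remove"
	case errors.Is(err, ErrNoCallback):
		return "fix: register a callback first"
	default:
		return "fail: " + err.Error()
	}
}

func main() {
	// errors.Is sees through wrapping added with %w.
	wrapped := fmt.Errorf("advertising car-1: %w", ErrAlreadyAdvertised)
	fmt.Println(classify(wrapped)) // prints "skip: content is already advertised"
}
```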

Functions

This section is empty.

Types

type Callback

type Callback func(ctx context.Context, contextID []byte) (MultihashIterator, error)

Callback is used by the provider to look up a list of multihashes associated with a context ID. The callback must be deterministic: it must produce the same list of multihashes in the same order for the same context ID.

See: Interface.NotifyPut, Interface.NotifyRemove, MultihashIterator.

type Interface

type Interface interface {
	// PublishLocal appends adv to the locally stored advertisement chain and
	// returns the corresponding CID to it.  This function does not publish the
	// changes to the advertisement chain onto gossip pubsub channel.  Use
	// Publish instead if indexer nodes must be made aware of the appended
	// advertisement.
	//
	// See: Publish.
	PublishLocal(context.Context, schema.Advertisement) (cid.Cid, error)

	// Publish appends adv to the locally stored advertisement chain, and
	// publishes the new advertisement onto gossip pubsub.  The CID returned
	// represents the ID of the advertisement appended to the chain.
	Publish(context.Context, schema.Advertisement) (cid.Cid, error)

	// RegisterCallback registers the callback used by the provider to look up
	// a list of multihashes by context ID.  Only a single callback is
	// supported; repeated calls to this function will replace the previous
	// callback.
	RegisterCallback(Callback)

	// NotifyPut signals the provider that the list of multihashes looked up by
	// the given contextID is available.  The given contextID is then used to
	// look up the list of multihashes via Callback.  An advertisement is then
	// generated, appended to the chain of advertisements and published onto
	// the gossip pubsub channel.
	//
	// A Callback must be registered prior to using this function.
	// ErrNoCallback is returned if no such callback is registered.
	//
	// The metadata is data that provides hints about how to retrieve data and
	// is protocol dependent.  The metadata must at least specify a protocol
	// ID, but its data is optional.
	//
	// If both the contextID and metadata are the same as a previous call to
	// NotifyPut, then ErrAlreadyAdvertised is returned.
	//
	// This function returns the ID of the advertisement published.
	NotifyPut(ctx context.Context, contextID []byte, metadata stiapi.Metadata) (cid.Cid, error)

	// NotifyRemove signals to the provider that the multihashes that
	// corresponded to the given contextID are no longer available.  An advertisement
	// is then generated, appended to the chain of advertisements and published
	// onto the gossip pubsub channel.
	// The given contextID must have previously been put via NotifyPut.
	// If not found ErrContextIDNotFound is returned.
	//
	// This function returns the ID of the advertisement published.
	NotifyRemove(ctx context.Context, contextID []byte) (cid.Cid, error)

	// GetAdv gets the advertisement that corresponds to the given cid.
	GetAdv(context.Context, cid.Cid) (schema.Advertisement, error)

	// GetLatestAdv gets the latest advertisement on this provider's
	// advertisement chain and the CID to which it corresponds.
	GetLatestAdv(context.Context) (cid.Cid, schema.Advertisement, error)

	// Shutdown shuts down this provider, and blocks until all resources
	// occupied by it are discarded.  After calling this function the provider
	// is no longer available for use.
	Shutdown() error
}

Interface represents an index provider that manages the advertisement of multihashes to indexer nodes.
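
The call-ordering rules documented on NotifyPut and NotifyRemove can be modelled with a toy in-memory type. This is only a sketch of the documented contract, not engine.Engine: toyProvider is hypothetical, it uses strings in place of context IDs, metadata and CIDs, and it publishes nothing. Behaviour the comments above do not specify, such as re-advertising a context ID with different metadata, is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the package's sentinel errors.
var (
	errNoCallback        = errors.New("no callback is registered")
	errContextIDNotFound = errors.New("context ID not found")
	errAlreadyAdvertised = errors.New("advertisement already published")
)

// toyProvider is a hypothetical in-memory model of the documented contract.
// It hands out sequential string IDs in place of advertisement CIDs.
type toyProvider struct {
	hasCallback bool
	metadata    map[string]string // context ID -> last advertised metadata
	published   int
}

func newToyProvider() *toyProvider {
	return &toyProvider{metadata: make(map[string]string)}
}

func (p *toyProvider) RegisterCallback() { p.hasCallback = true }

func (p *toyProvider) nextID() string {
	p.published++
	return fmt.Sprintf("ad-%d", p.published)
}

// NotifyPut fails without a callback and rejects an identical repeat.
// Accepting the same context ID with new metadata is an assumption.
func (p *toyProvider) NotifyPut(contextID, metadata string) (string, error) {
	if !p.hasCallback {
		return "", errNoCallback
	}
	if prev, ok := p.metadata[contextID]; ok && prev == metadata {
		return "", errAlreadyAdvertised
	}
	p.metadata[contextID] = metadata
	return p.nextID(), nil
}

// NotifyRemove requires that the context ID was previously put.
func (p *toyProvider) NotifyRemove(contextID string) (string, error) {
	if _, ok := p.metadata[contextID]; !ok {
		return "", errContextIDNotFound
	}
	delete(p.metadata, contextID)
	return p.nextID(), nil
}

func main() {
	p := newToyProvider()
	_, err := p.NotifyPut("car-1", "graphsync")
	fmt.Println(err) // no callback is registered
	p.RegisterCallback()
	id, _ := p.NotifyPut("car-1", "graphsync")
	fmt.Println(id) // ad-1
	_, err = p.NotifyPut("car-1", "graphsync")
	fmt.Println(err) // advertisement already published
	id, _ = p.NotifyRemove("car-1")
	fmt.Println(id) // ad-2
	_, err = p.NotifyRemove("car-1")
	fmt.Println(err) // context ID not found
}
```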

type MultihashIterator

type MultihashIterator interface {
	// Next returns the next multihash in the list of multihashes.  The
	// iterator fails fast: errors that occur during iteration are returned
	// immediately.  This function returns a zero multihash and io.EOF when
	// there are no more elements to return.
	Next() (multihash.Multihash, error)
}

MultihashIterator iterates over a list of multihashes.

See: CarMultihashIterator.
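
On the consuming side, an iterator is drained until Next returns io.EOF, and any other error aborts the loop. The sketch below groups the drained multihashes into fixed-size chunks to illustrate the idea behind chunked entries. It is not the chunker package's implementation: Multihash and MultihashIterator are local stand-ins, and sliceIterator and chunk are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Local stand-ins mirroring the documented types.
type Multihash []byte

type MultihashIterator interface {
	Next() (Multihash, error)
}

// sliceIterator yields multihashes from a slice in order.
type sliceIterator struct {
	mhs []Multihash
	pos int
}

func (it *sliceIterator) Next() (Multihash, error) {
	if it.pos >= len(it.mhs) {
		return nil, io.EOF
	}
	mh := it.mhs[it.pos]
	it.pos++
	return mh, nil
}

// chunk drains the iterator and groups the multihashes into slices of at most
// size elements. io.EOF ends iteration normally; any other error aborts.
func chunk(it MultihashIterator, size int) ([][]Multihash, error) {
	if size <= 0 {
		return nil, errors.New("chunk size must be positive")
	}
	var chunks [][]Multihash
	var cur []Multihash
	for {
		mh, err := it.Next()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return nil, err
		}
		cur = append(cur, mh)
		if len(cur) == size {
			chunks = append(chunks, cur)
			cur = nil
		}
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur) // final, partially filled chunk
	}
	return chunks, nil
}

func main() {
	it := &sliceIterator{mhs: []Multihash{{1}, {2}, {3}, {4}, {5}}}
	chunks, err := chunk(it, 2)
	if err != nil {
		panic(err)
	}
	for i, c := range chunks {
		fmt.Printf("chunk %d: %d multihashes\n", i, len(c))
	}
}
```

With five multihashes and a chunk size of 2, this produces two full chunks followed by one chunk holding the remaining multihash.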

func CarMultihashIterator

func CarMultihashIterator(idx carindex.IterableIndex) (MultihashIterator, error)

CarMultihashIterator constructs a new MultihashIterator from a CAR index.

This iterator supplies multihashes in deterministic order of their corresponding CAR offset. The order is maintained consistently regardless of the underlying IterableIndex implementation.

Directories

Path Synopsis
Package cardatatransfer provides a datatransfer server that can be used to retrieve multihashes supplied via engine.Engine and supplier.CarSupplier as the provider.Callback.
stores
Package stores is copied from the go-fil-markets stores package; it contains no novel code.
cmd module
Package engine provides a reference implementation of the provider.Interface in order to advertise the availability of a list of multihashes to indexer nodes such as "storetheindex".
chunker
Package chunker provides functionality for chunking the entries chain generated from provider.MultihashIterator, represented by the EntriesChunker interface.
Package metadata captures the metadata types known by the index-provider.
Package mock_provider is a generated GoMock package.
server
admin/http
Package adminserver provides an HTTP server for performing administrative operations.
Package supplier provides mechanisms to supply multihashes to an index-provider engine via provider.Callback.
