unicast

package
v0.37.9-en-fetch-retry...
Published: Sep 4, 2024 License: AGPL-3.0 Imports: 20 Imported by: 3

README

Unicast Manager

Overview

In the Flow blockchain, nodes communicate with each other in three different ways: unicast, multicast, and publish. Multicast and publish are handled by the pubsub (GossipSub) protocol. Unicast is a protocol used to send messages over direct (one-to-one) connections to remote nodes. Each unicast message is sent through a single-use, one-time stream. One can see a stream as a virtual protocol that expands the base direct connection into a full-duplex communication channel. The figure below illustrates the notion of direct connection and streams between nodes A and B. The direct connection is established between the nodes, and then the nodes can open multiple streams over the connection. The streams are shown with dashed green lines, while the direct connection is illustrated by blue lines that encapsulate the streams.

streams.png

The unicast Manager is responsible for establishing streams between nodes when they need to communicate over the unicast protocol. When the manager receives a CreateStream invocation, it will try to establish a stream to the remote peer whose identifier is provided in the invocation (peer.ID). The manager extends the libp2p functionality; hence, it operates on the notion of a peer (rather than a Flow node), and on peer.ID rather than flow.Identifier. It is the responsibility of the caller to provide the correct peer.ID of the remote node.

The UnicastManager relies on the underlying libp2p node to establish the connection to the remote peer. Once the underlying libp2p node receives a stream creation request from the UnicastManager, it will try to establish a connection to the remote peer if there is no existing connection to the peer. Otherwise, it will pick and re-use the best existing connection to the remote peer. Hence, the UnicastManager does not (and should not) care about the connection establishment, and rather relies on the underlying libp2p node to establish the connection. The UnicastManager only cares about the stream creation, and will return an error if the underlying libp2p node fails to establish a connection to the remote peer.

A stream is a one-time communication channel, i.e., it is assumed to be closed by the caller once the message is sent. The caller (i.e., the Flow node) does not necessarily re-use a stream, and the Manager creates one stream per request (i.e., CreateStream invocation), which is typically a single message.
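The one-stream-per-message pattern described above can be sketched as follows. This is only an illustration under stated assumptions: streamCreator, memStream, fakeManager, and sendTwo are hypothetical names introduced here, the peer identifier is a plain string instead of peer.ID, and the stream is reduced to an io.WriteCloser instead of a libp2p stream.

```go
package main

import (
	"context"
	"fmt"
	"io"
)

// streamCreator is a hypothetical stand-in for the Manager method used here.
// The real CreateStream takes a peer.ID and returns a libp2p stream.
type streamCreator interface {
	CreateStream(ctx context.Context, peerID string) (io.WriteCloser, error)
}

// sendOnce opens a fresh stream, writes one message, and closes the stream.
func sendOnce(ctx context.Context, m streamCreator, peerID string, msg []byte) (err error) {
	s, err := m.CreateStream(ctx, peerID)
	if err != nil {
		return fmt.Errorf("could not create stream to %s: %w", peerID, err)
	}
	// The caller owns the stream and closes it once the message is sent.
	defer func() {
		if cerr := s.Close(); err == nil && cerr != nil {
			err = fmt.Errorf("could not close stream: %w", cerr)
		}
	}()
	if _, err = s.Write(msg); err != nil {
		return fmt.Errorf("could not write message: %w", err)
	}
	return nil
}

// memStream is an in-memory fake stream that records writes and closure.
type memStream struct {
	data   []byte
	closed bool
}

func (s *memStream) Write(p []byte) (int, error) {
	s.data = append(s.data, p...)
	return len(p), nil
}

func (s *memStream) Close() error {
	s.closed = true
	return nil
}

// fakeManager hands out a new memStream on every CreateStream call.
type fakeManager struct{ streams []*memStream }

func (f *fakeManager) CreateStream(_ context.Context, _ string) (io.WriteCloser, error) {
	s := &memStream{}
	f.streams = append(f.streams, s)
	return s, nil
}

// sendTwo sends two messages to the same peer and reports how many streams
// were opened and whether every one of them was closed afterwards.
func sendTwo() (opened int, allClosed bool, err error) {
	f := &fakeManager{}
	for _, msg := range []string{"ping", "pong"} {
		if err = sendOnce(context.Background(), f, "peer-a", []byte(msg)); err != nil {
			return 0, false, err
		}
	}
	allClosed = true
	for _, s := range f.streams {
		allClosed = allClosed && s.closed
	}
	return len(f.streams), allClosed, nil
}
```

In this sketch two messages to the same peer open two separate streams, and each stream is closed by the sender as soon as its message is written.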

Note: the limit on the number of streams and connections between nodes is set through the libp2p resource manager limits (see config/default-config.yml).

Note: the pubsub protocol also establishes connections between nodes to exchange gossip messages with each other. The connection type is the same between the pubsub and unicast protocols, as they both consult the underlying libp2p node to establish the connection. However, the level of reliability, life-cycle, and other aspects of the connections are different between the two protocols. For example, pubsub requires some number of connections to some number of peers, in most cases regardless of their identity. Unicast, however, requires a connection to a specific peer, and the connection is assumed to be persistent. Hence, both protocols have their own notion of connection management: the unicast Manager is responsible for establishing connections when the unicast protocol needs to send a message to a remote peer, while the PeerManager is responsible for establishing connections for pubsub. These two work in isolation and independently of each other to satisfy different requirements.

The PeerManager regularly checks the health of the connections and closes the connections to the peers that are not part of the Flow protocol state. On the other hand, the unicast Manager only establishes a connection if there is no existing connection to the remote peer. Currently, Flow nodes operate on a full mesh topology, meaning that every node is connected to every other node through the PeerManager. The PeerManager starts connecting to every remote node of the Flow protocol upon startup, and then maintains the connections unless the node is disallow-listed or ejected by the protocol state. Accordingly, it is a rare event that a node does not have a connection to another node. That is also the reason why the unicast Manager does not close the connection after the stream is closed. The unicast Manager assumes that the connection is persistent and will be kept open by the PeerManager.

Backoff and Retry Attempts

The flowchart below explains the abstract logic of the UnicastManager when it receives a CreateStream invocation. On the happy path, the UnicastManager successfully opens a stream to the peer. However, there can be cases where the remote peer is not reliable for stream creation, or where the remote peer acts maliciously and does not respond to stream creation requests. In order to distinguish between an unreliable remote peer and a malicious one, the UnicastManager uses a backoff and retry mechanism.

retry.png

Addressing Unreliable Remote Peer

To address the unreliability of a remote peer, upon an unsuccessful attempt to establish a stream, the UnicastManager waits for a certain amount of time before it tries again (i.e., the backoff mechanism), and retries a certain number of times before it gives up (i.e., the retry mechanism). The backoff and retry parameters are configurable through runtime flags. If all backoff and retry attempts fail, the UnicastManager will return an error to the caller. The caller can then decide to retry the request or not. By default, UnicastManager retries each stream creation attempt 3 times. Also, the backoff intervals for dialing and stream creation are initialized to 1 second and progress exponentially with a factor of 2, i.e., the i-th retry attempt is made after t * 2^(i-1), where t is the backoff interval. For example, if the backoff interval is 1s, the initial attempt is made right away, the first retry attempt is made after 1s * 2^(1 - 1) = 1s, the second retry attempt is made after 1s * 2^(2 - 1) = 2s, and so on.
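The backoff formula t * 2^(i-1) can be checked with a few lines of Go. This is a minimal sketch of the arithmetic only; retryDelay is a hypothetical helper, not a function of this package, and it ignores any jitter as well as overflow for very large i.

```go
package main

import "time"

// retryDelay returns the wait before the i-th retry (i starts at 1) for an
// initial interval t and a growth factor of 2, i.e. t * 2^(i-1).
// i == 0 denotes the initial attempt, which is made without waiting.
func retryDelay(t time.Duration, i uint) time.Duration {
	if i == 0 {
		return 0
	}
	// Shifting left by (i-1) multiplies t by 2^(i-1).
	return t << (i - 1)
}
```

With t = 1s, the three default retries would wait 1s, 2s, and 4s respectively under this formula.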

These parameters are configured using the config/default-config.yml file:

  # Unicast create stream retry delay is initial delay used in the exponential backoff for create stream retries
  unicast-create-stream-retry-delay: 1s

Addressing Malicious Remote Peer

The backoff and retry mechanism addresses the cases where the remote peer is not reliable. However, there can be cases where the remote peer is malicious and does not respond to stream creation requests. Such cases may cause the UnicastManager to wait for a long time before it gives up, resulting in resource exhaustion and a slow-down of stream creation. To mitigate such cases, the UnicastManager uses a retry budget for the stream creation. The retry budgets are initialized using the config/default-config.yml file:

  # The maximum number of retry attempts for creating a unicast stream to a remote peer before giving up. If it is set to 3 for example, it means that if a peer fails to
  # create a unicast stream to a remote peer 3 times, the peer will give up and will not retry creating a unicast stream to that remote peer.
  # When it is set to zero it means that the peer will not retry creating a unicast stream to a remote peer if it fails.
  unicast-max-stream-creation-retry-attempt-times: 3

As shown in the above snippet, the stream creation retry budget is set to 3 by default for every remote peer. Each time CreateStream is invoked on the UnicastManager for pid (peer.ID), it loads the retry budgets for pid from the unicast config cache. If no unicast config record exists for pid, one is created with the default retry budgets. The UnicastManager then uses the retry budgets to decide whether to retry the stream creation attempt or not. If the retry budget for stream creation is exhausted, the UnicastManager will not retry the stream creation attempt, and returns an error to the caller. The caller can then decide to retry the request or not. Note that even when the retry budget is exhausted, the UnicastManager will try the stream creation attempt once, though it will not retry the attempt if it fails.

Penalizing Malicious Remote Peer

Each time the UnicastManager fails to create a stream to a remote peer and exhausts the retry budget, it penalizes the remote peer as follows:

  • If the UnicastManager exhausts the retry budget for stream creation, it will decrement the stream creation retry budget for the remote peer.
  • If the retry budget reaches zero, the UnicastManager will only attempt once to create a stream to the remote peer, and will not retry the attempt, and rather return an error to the caller.
  • When the budget reaches zero, the UnicastManager will not decrement the budget anymore.
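The budget and penalty rules above can be sketched as follows. This is a simplified model, not the package's code: peerConfig is a local copy of the documented Config fields, createWithBudget and errStreamFailed are hypothetical names, the backoff waits are left out, and counting one initial attempt plus one retry per unit of budget is an assumption based on the description.

```go
package main

import "errors"

// peerConfig is a local copy of the fields documented for Config.
type peerConfig struct {
	StreamCreationRetryAttemptBudget uint64
	ConsecutiveSuccessfulStream      uint64
}

// errStreamFailed is a placeholder failure used to exercise the sketch.
var errStreamFailed = errors.New("stream creation failed")

// createWithBudget makes one initial attempt plus at most one retry per unit
// of budget. If every attempt fails, the budget is decremented, but never
// below zero. Backoff between attempts is omitted for brevity.
func createWithBudget(cfg *peerConfig, try func() error) (attempts uint64, err error) {
	maxAttempts := cfg.StreamCreationRetryAttemptBudget + 1
	for attempts < maxAttempts {
		attempts++
		if err = try(); err == nil {
			return attempts, nil
		}
	}
	// All attempts failed: penalize the remote peer.
	if cfg.StreamCreationRetryAttemptBudget > 0 {
		cfg.StreamCreationRetryAttemptBudget--
	}
	return attempts, err
}
```

Under these assumptions, a peer that keeps failing is tried 4, 3, 2, and then 1 time on successive invocations, and stays at a single attempt per invocation once its budget is zero.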

Note: UnicastManager is part of the networking layer of the Flow node, which is a lower-order component than the Flow protocol engines that call the UnicastManager to send messages to remote peers. Hence, the UnicastManager must not outsmart the Flow protocol engines in deciding whether to create a stream in the first place. This means that UnicastManager will attempt to create a stream even to peers with zero retry budgets. However, UnicastManager does not retry attempts for the peers with zero budgets, and instead returns an error immediately upon a failure. It is the responsibility of the Flow protocol engines to decide whether to send a message to a remote peer or not after a certain number of failures.

Restoring Retry Budgets

The UnicastManager may reset the stream creation budget for a remote peer from zero to the default value in the following case:

  • Restoring Stream Creation Retry Budget: To restore the stream creation budget from zero to the default value, the UnicastManager keeps track of the consecutive successful streams created to the remote peer. Every time a stream is created successfully, the UnicastManager increments a counter for the remote peer. The counter is reset to zero upon the first failure to create a stream to the remote peer. If the counter reaches a certain threshold, the UnicastManager will reset the stream creation budget for the remote peer to the default value. The threshold is configurable through the config/default-config.yml file:
    # The minimum number of consecutive successful streams to reset the unicast stream creation retry budget from zero to the maximum default. If it is set to 100 for example, it
    # means that if a peer has 100 consecutive successful streams to the remote peer, and the remote peer has a zero stream creation budget,
    # the unicast stream creation retry budget for that remote peer will be reset to the maximum default.
    unicast-stream-zero-retry-reset-threshold: 100
    
    Reaching the threshold means that the remote peer is reliable enough to regain the default retry budget for stream creation.
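The restore rule can be sketched as a pair of adjustment functions. This is an illustration under assumptions: peerConfig is a local copy of the documented Config fields, the two constants mirror the default values quoted from config/default-config.yml, onStreamSuccess and onStreamFailure are hypothetical names, and treating "reaches the threshold" as a greater-or-equal comparison is a guess.

```go
package main

// peerConfig is a local copy of the fields documented for Config.
type peerConfig struct {
	StreamCreationRetryAttemptBudget uint64
	ConsecutiveSuccessfulStream      uint64
}

const (
	// defaultRetryBudget mirrors unicast-max-stream-creation-retry-attempt-times.
	defaultRetryBudget = 3
	// zeroRetryResetThreshold mirrors unicast-stream-zero-retry-reset-threshold.
	zeroRetryResetThreshold = 100
)

// onStreamSuccess counts a successful stream and restores a zero budget once
// enough consecutive successes have been observed.
func onStreamSuccess(cfg peerConfig) peerConfig {
	cfg.ConsecutiveSuccessfulStream++
	if cfg.StreamCreationRetryAttemptBudget == 0 &&
		cfg.ConsecutiveSuccessfulStream >= zeroRetryResetThreshold {
		cfg.StreamCreationRetryAttemptBudget = defaultRetryBudget
	}
	return cfg
}

// onStreamFailure resets the streak of consecutive successes.
func onStreamFailure(cfg peerConfig) peerConfig {
	cfg.ConsecutiveSuccessfulStream = 0
	return cfg
}
```

In this model a single failure anywhere in the streak restarts the count, so a peer with a zero budget needs 100 uninterrupted successes to regain the default budget.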

Documentation

Index

Constants

const (
	// MaxRetryJitter is the maximum number of milliseconds to wait between attempts for a 1-1 direct connection
	MaxRetryJitter = 5
)

Variables

This section is empty.

Functions

func IsErrMaxRetries added in v0.30.0

func IsErrMaxRetries(err error) bool

IsErrMaxRetries returns whether an error is ErrMaxRetries.

Types

type Config added in v0.33.1

type Config struct {
	StreamCreationRetryAttemptBudget uint64 // number of times we have to try to open a stream to the peer before we give up.
	ConsecutiveSuccessfulStream      uint64 // consecutive number of successful streams to the peer since the last time stream creation failed.
}

Config is a struct that represents the dial config for a peer.

type ConfigCache added in v0.33.1

type ConfigCache interface {
	// GetWithInit returns the dial config for the given peer id. If the config does not exist, it creates a new config
	// using the factory function and stores it in the cache.
	// Args:
	// - peerID: the peer id of the dial config.
	// Returns:
	//   - *Config, the dial config for the given peer id.
	//   - error if the factory function returns an error. Any error should be treated as an irrecoverable error and indicates a bug.
	GetWithInit(peerID peer.ID) (*Config, error)

	// AdjustWithInit adjusts the dial config for the given peer id using the given adjustFunc.
	// It returns an error if the adjustFunc returns an error.
	// Args:
	// - peerID: the peer id of the dial config.
	// - adjustFunc: the function that adjusts the dial config.
	// Returns:
	//   - *Config, the adjusted dial config for the given peer id.
	//   - error if the adjustFunc returns an error. Any error should be treated as an irrecoverable error and indicates a bug.
	AdjustWithInit(peerID peer.ID, adjustFunc UnicastConfigAdjustFunc) (*Config, error)

	// Size returns the number of dial configs in the cache.
	Size() uint
}

ConfigCache is a thread-safe cache for dial configs. It is used by the unicast service to store the dial configs for peers.
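To make the contract of the interface concrete, here is a minimal map-backed sketch with the same three methods. It is not the cache used by the package: mapCache, newMapCache, peerConfig, and adjustFunc are hypothetical local names, the peer identifier is a plain string instead of peer.ID, and there is no eviction or size limit. Leaving the cache untouched when the adjust function fails is an assumption.

```go
package main

import "sync"

// peerConfig is a local copy of the fields documented for Config.
type peerConfig struct {
	StreamCreationRetryAttemptBudget uint64
	ConsecutiveSuccessfulStream      uint64
}

// adjustFunc mirrors the shape of UnicastConfigAdjustFunc.
type adjustFunc func(peerConfig) (peerConfig, error)

// mapCache is a mutex-guarded map from peer id to config.
type mapCache struct {
	mu      sync.Mutex
	factory func() peerConfig
	entries map[string]peerConfig
}

func newMapCache(factory func() peerConfig) *mapCache {
	return &mapCache{factory: factory, entries: make(map[string]peerConfig)}
}

// GetWithInit returns a copy of the config for peerID, creating it from the
// factory on first access.
func (c *mapCache) GetWithInit(peerID string) (*peerConfig, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	cfg, ok := c.entries[peerID]
	if !ok {
		cfg = c.factory()
		c.entries[peerID] = cfg
	}
	return &cfg, nil
}

// AdjustWithInit applies adjust to the config for peerID (initializing it if
// absent) and stores the result. On error the cache is left unchanged.
func (c *mapCache) AdjustWithInit(peerID string, adjust adjustFunc) (*peerConfig, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	cfg, ok := c.entries[peerID]
	if !ok {
		cfg = c.factory()
	}
	adjusted, err := adjust(cfg)
	if err != nil {
		return nil, err
	}
	c.entries[peerID] = adjusted
	return &adjusted, nil
}

// Size returns the number of configs currently stored.
func (c *mapCache) Size() uint {
	c.mu.Lock()
	defer c.mu.Unlock()
	return uint(len(c.entries))
}
```

Returning copies rather than pointers into the map keeps callers from mutating cached state outside the lock; all updates go through AdjustWithInit.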

type DialConfigCacheFactory added in v0.32.2

type DialConfigCacheFactory func(configFactory func() Config) ConfigCache

type ErrMaxRetries added in v0.30.0

type ErrMaxRetries struct {
	// contains filtered or unexported fields
}

ErrMaxRetries indicates retries completed with max retries without a successful attempt.

func NewMaxRetriesErr added in v0.30.0

func NewMaxRetriesErr(attempts uint64, err error) ErrMaxRetries

NewMaxRetriesErr returns a new ErrMaxRetries.

func (ErrMaxRetries) Error added in v0.30.0

func (e ErrMaxRetries) Error() string
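A caller would typically branch on this error kind after a failed send. The sketch below shows the usual errors.As pattern for such a check with a local stand-in type; maxRetriesErr, its fields and message, isMaxRetriesErr, errDial, and wrapSendError are all hypothetical, and whether the package's own check is written this way is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// maxRetriesErr is a local stand-in for ErrMaxRetries; its fields are assumptions.
type maxRetriesErr struct {
	attempts uint64
	err      error
}

func (e maxRetriesErr) Error() string {
	return fmt.Sprintf("gave up after %d attempts: %v", e.attempts, e.err)
}

// isMaxRetriesErr reports whether err, or any error it wraps, is a maxRetriesErr.
func isMaxRetriesErr(err error) bool {
	var target maxRetriesErr
	return errors.As(err, &target)
}

// errDial is a placeholder cause used to exercise the sketch.
var errDial = errors.New("dial failed")

// wrapSendError adds caller context while keeping the cause inspectable.
func wrapSendError(cause error) error {
	return fmt.Errorf("could not send message: %w", cause)
}
```

Because errors.As walks the wrap chain, the check still succeeds after higher layers add context with %w, which lets a caller tell "all retries were used up" apart from other failures.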

type Manager

type Manager struct {
	// contains filtered or unexported fields
}

Manager manages libp2p stream negotiation and creation, which is utilized for unicast dispatches.

func NewUnicastManager

func NewUnicastManager(cfg *ManagerConfig) (*Manager, error)

NewUnicastManager creates a new unicast manager. Args:

  • cfg: configuration for the unicast manager.

Returns:

  • a new unicast manager.
  • an error if the configuration is invalid, any error is irrecoverable.

func (*Manager) CreateStream

func (m *Manager) CreateStream(ctx context.Context, peerID peer.ID) (libp2pnet.Stream, error)

CreateStream tries establishing a libp2p stream to the remote peer id. It tries creating streams in the descending order of preference until it either creates a successful stream or runs out of options. Args:

  • ctx: context for the stream creation.
  • peerID: peer ID of the remote peer.

Returns:

  • a new libp2p stream.
  • error if the stream creation fails; the error is benign and can be retried.

func (*Manager) Register

func (m *Manager) Register(protocol protocols.ProtocolName) error

Register registers the given protocol name as the preferred unicast protocol. Each invocation of Register prioritizes the current protocol over previously registered ones.
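The ordering rule (latest registration first, then fall back in descending preference) can be modelled with a slice. This is a sketch of the idea only: protocolList, register, firstWorking, and errNoProtocol are hypothetical names, and the protocol names used with it are placeholders rather than identifiers from this package.

```go
package main

import "errors"

// protocolList keeps protocol names ordered from most to least preferred.
type protocolList struct{ names []string }

// register puts name in front of all previously registered protocols.
func (p *protocolList) register(name string) {
	p.names = append([]string{name}, p.names...)
}

// errNoProtocol is returned when no registered protocol succeeds.
var errNoProtocol = errors.New("no protocol could create a stream")

// firstWorking tries each protocol in descending order of preference and
// returns the first one for which try succeeds.
func (p *protocolList) firstWorking(try func(name string) error) (string, error) {
	for _, name := range p.names {
		if err := try(name); err == nil {
			return name, nil
		}
	}
	return "", errNoProtocol
}
```

For example, registering "plain" and then "compressed" makes "compressed" the first choice, with "plain" as the fallback when the preferred protocol cannot open a stream.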

func (*Manager) SetDefaultHandler added in v0.32.0

func (m *Manager) SetDefaultHandler(defaultHandler libp2pnet.StreamHandler)

SetDefaultHandler sets the default stream handler for this unicast manager. The default handler is utilized as the core handler for other unicast protocols, e.g., compressions.

type ManagerConfig added in v0.32.2

type ManagerConfig struct {
	Logger        zerolog.Logger               `validate:"required"`
	StreamFactory p2p.StreamFactory            `validate:"required"`
	SporkId       flow.Identifier              `validate:"required"`
	Metrics       module.UnicastManagerMetrics `validate:"required"`

	Parameters *netconf.UnicastManager `validate:"required"`

	// UnicastConfigCacheFactory is a factory function to create a new dial config cache.
	UnicastConfigCacheFactory DialConfigCacheFactory `validate:"required"`
}

type UnicastConfigAdjustFunc added in v0.33.1

type UnicastConfigAdjustFunc func(Config) (Config, error)

UnicastConfigAdjustFunc is a function that is used to adjust the fields of a DialConfigEntity. The function is called with the current config and should return the adjusted record. A returned error indicates that the adjustment is not applied, and the config should not be updated. In a BFT setup, the returned error should be treated as a fatal error.
