README
¶
Unicast Manager
Overview
In Flow blockchain, nodes communicate with each other in 3 different ways; unicast
, multicast
, and publish
.
The multicast
and publish
are handled by the pubsub (GossipSub) protocol.
The unicast
is a protocol that is used to send messages over direct (one-to-one) connections to remote nodes.
Each unicast
message is sent through a single-used, one-time stream. One can see a stream as a virtual protocol
that expands the base direct connection into a full-duplex communication channel.
Figure below illustrates the notion of direct connection and streams between nodes A and B. The direct
connection is established between the nodes and then the nodes can open multiple streams over the connection.
The streams are shown with dashed green lines, while the direct connection is illustrated by blue lines that
encapsulates the streams.
The unicast
Manager
is responsible for establishing streams between nodes when they need to communicate
over unicast
protocol. When the manager receives a CreateStream
invocation, it will try to establish a stream to the
remote peer
whose identifier is provided in the invocation (peer.ID
). The manager is expanding the libp2p
functionalities, hence, it operates on the notion of the peer
(rather than Flow node), and peer.ID
rather
than flow.Identifier
. It is the responsibility of the caller to provide the correct peer.ID
of the remote
node.
The UnicastManager
relies on the underlying libp2p node to establish the connection to the remote peer. Once the underlying
libp2p node receives a stream creation request from the UnicastManager
, it will try to establish a connection to the remote peer if
there is no existing connection to the peer. Otherwise, it will pick and re-use the best existing connection to the remote peer.
Hence, the UnicastManager
does not (and should not) care about the connection establishment, and rather relies on the underlying
libp2p node to establish the connection. The UnicastManager
only cares about the stream creation, and will return an error
if the underlying libp2p node fails to establish a connection to the remote peer.
A stream is a one-time communication channel, i.e., it is assumed to be closed
by the caller once the message is sent. The caller (i.e., the Flow node) does not necessarily re-use a stream, and the
Manager
creates one stream per request (i.e., CreateStream
invocation), which is typically a single message.
Note: the limit of number of streams and connections between nodes is set throught eh libp2p resource manager limits (see config/default-config.yml
):
Note: pubsub
protocol also establishes connections between nodes to exchange gossip messages with each other.
The connection type is the same between pubsub
and unicast
protocols, as they both consult the underlying LibP2P node to
establish the connection. However, the level of reliability, life-cycle, and other aspects of the connections are different
between the two protocols. For example, pubsub
requires some number of connections to some number of peers, which in most cases
is regardless of their identity. However, unicast
requires a connection to a specific peer, and the connection is assumed
to be persistent. Hence, both these protocols have their own notion of connection management; the unicast
Manager
is responsible
for establishing connections when unicast
protocol needs to send a message to a remote peer, while the PeerManager
is responsible
for establishing connections when pubsub
. These two work in isolation and independent of each other to satisfy different requirements.
The PeerManager
regularly checks the health of the connections and closes the connections to the peers that are not part of the Flow
protocol state. One the other hand, the unicast
Manager
only establishes a connection if there is no existing connection to the remote
peer. Currently, Flow nodes operate on a full mesh topology, meaning that every node is connected to every other node through PeerManager
.
The PeerManager
starts connecting to every remote node of the Flow protocol upon startup, and then maintains the connections unless the node
is disallow-listed or ejected by the protocol state. Accordingly, it is a rare event that a node does not have a connection to another node.
Also, that is the reason behind the unicast
Manager
not closing the connection after the stream is closed. The unicast
Manager
assumes
that the connection is persistent and will be kept open by the PeerManager
.
Backoff and Retry Attempts
The flowchart below explains the abstract logic of the UnicastManager
when it receives a CreateStream
invocation.
On the happy path, the UnicastManager
successfully opens a stream to the peer.
However, there can be cases that the remote peer is not reliable for stream creation, or the remote peer acts
maliciously and does not respond stream creation requests. In order to distinguish between the cases that the remote peer
is not reliable and the cases that the remote peer is malicious, the UnicastManager
uses a backoff and retry mechanism.
Addressing Unreliable Remote Peer
To address the unreliability of remote peer, upon an unsuccessful attempt to establish a stream, the UnicastManager
will wait for a certain
amount of time before it tries to establish (i.e., the backoff mechanism), and will retry a certain number of times before it gives up (i.e., the retry mechanism).
The backoff and retry parameters are configurable through runtime flags.
If all backoff and retry attempts fail, the UnicastManager
will return an error to the caller. The caller can then decide to retry the request or not.
By default, UnicastManager
retries each stream creation attempt 3 times. Also, the backoff intervals for dialing and stream creation are initialized to 1 second and progress
exponentially with a factor of 2, i.e., the i-th
retry attempt is made after t * 2^(i-1)
, where t
is the backoff interval.
For example, if the backoff interval is 1s, the first attempt is made right-away, the first (retry) attempt is made after 1s * 2^(1 - 1) = 1s, the third (retry) attempt is made
after 1s * 2^(2 - 1) = 2s
, and so on.
These parameters are configured using the config/default-config.yml
file:
# Unicast create stream retry delay is initial delay used in the exponential backoff for create stream retries
unicast-create-stream-retry-delay: 1s
Addressing Malicious Remote Peer
The backoff and retry mechanism is used to address the cases that the remote peer is not reliable.
However, there can be cases that the remote peer is malicious and does not respond to stream creation requests.
Such cases may cause the UnicastManager
to wait for a long time before it gives up, resulting in a resource exhaustion and slow-down of the stream creation.
To mitigate such cases, the UnicastManager
uses a retry budget for the stream creation. The retry budgets are initialized
using the config/default-config.yml
file:
# The maximum number of retry attempts for creating a unicast stream to a remote peer before giving up. If it is set to 3 for example, it means that if a peer fails to create
# retry a unicast stream to a remote peer 3 times, the peer will give up and will not retry creating a unicast stream to that remote peer.
# When it is set to zero it means that the peer will not retry creating a unicast stream to a remote peer if it fails.
unicast-max-stream-creation-retry-attempt-times: 3
As shown in the above snippet, the stream creation is set to 3 by default for every remote peer.
Each time the UnicastManager
is invoked on CreateStream
to pid
(peer.ID
), it loads the retry budgets for pid
from the unicast config cache.
If no unicast config record exists for pid
, one is created with the default retry budgets. The UnicastManager
then uses the retry budgets to decide
whether to retry the stream creation attempt or not. If the retry budget for stream creation is exhausted, the UnicastManager
will not retry the stream creation attempt, and returns an error to the caller. The caller can then decide to retry the request or not.
Note that even when the retry budget is exhausted, the UnicastManager
will try the stream creation attempt once, though it will not retry the attempt if it fails.
Penalizing Malicious Remote Peer
Each time the UnicastManager
fails to create a stream to a remote peer and exhausts the retry budget, it penalizes the remote peer as follows:
- If the
UnicastManager
exhausts the retry budget for stream creation, it will decrement the stream creation retry budget for the remote peer. - If the retry budget reaches zero, the
UnicastManager
will only attempt once to create a stream to the remote peer, and will not retry the attempt, and rather return an error to the caller. - When the budget reaches zero, the
UnicastManager
will not decrement the budget anymore.
Note: UnicastManager
is part of the networking layer of the Flow node, which is a lower-order component than
the Flow protocol engines who call the UnicastManager
to send messages to remote peers. Hence, the UnicastManager
must not outsmart
the Flow protocol engines on deciding whether to create stream in the first place. This means that UnicastManager
will attempt
to create stream even to peers with zero retry budgets. However, UnicastManager
does not retry attempts for the peers with zero budgets, and rather
returns an error immediately upon a failure. This is the responsibility of the Flow protocol engines to decide whether
to send a message to a remote peer or not after a certain number of failures.
Restoring Retry Budgets
The UnicastManager
may reset the stream creation budget for a remote peers from zero to the default values in the following cases:
- Restoring Stream Creation Retry Budget: To restore the stream creation budget from zero to the default value, the
UnicastManager
keeps track of the consecutive successful streams created to the remote peer. Everytime a stream is created successfully, theUnicastManager
increments a counter for the remote peer. The counter is reset to zero upon the first failure to create a stream to the remote peer. If the counter reaches a certain threshold, theUnicastManager
will reset the stream creation budget for the remote peer to the default value. The threshold is configurable through theconfig/default-config.yml
file:
Reaching the threshold means that the remote peer is reliable enough to regain the default retry budget for stream creation.# The minimum number of consecutive successful streams to reset the unicast stream creation retry budget from zero to the maximum default. If it is set to 100 for example, it # means that if a peer has 100 consecutive successful streams to the remote peer, and the remote peer has a zero stream creation budget, # the unicast stream creation retry budget for that remote peer will be reset to the maximum default. unicast-stream-zero-retry-reset-threshold: 100
Documentation
¶
Index ¶
Constants ¶
const (
// MaxRetryJitter is the maximum number of milliseconds to wait between attempts for a 1-1 direct connection
MaxRetryJitter = 5
)
Variables ¶
This section is empty.
Functions ¶
func IsErrMaxRetries ¶ added in v0.30.0
IsErrMaxRetries returns whether an error is ErrMaxRetries.
Types ¶
type Config ¶ added in v0.33.1
type Config struct { StreamCreationRetryAttemptBudget uint64 // number of times we have to try to open a stream to the peer before we give up. ConsecutiveSuccessfulStream uint64 // consecutive number of successful streams to the peer since the last time stream creation failed. }
Config is a struct that represents the dial config for a peer.
type ConfigCache ¶ added in v0.33.1
type ConfigCache interface { // GetWithInit returns the dial config for the given peer id. If the config does not exist, it creates a new config // using the factory function and stores it in the cache. // Args: // - peerID: the peer id of the dial config. // Returns: // - *Config, the dial config for the given peer id. // - error if the factory function returns an error. Any error should be treated as an irrecoverable error and indicates a bug. GetWithInit(peerID peer.ID) (*Config, error) // Adjust adjusts the dial config for the given peer id using the given adjustFunc. // It returns an error if the adjustFunc returns an error. // Args: // - peerID: the peer id of the dial config. // - adjustFunc: the function that adjusts the dial config. // Returns: // - error if the adjustFunc returns an error. Any error should be treated as an irrecoverable error and indicates a bug. AdjustWithInit(peerID peer.ID, adjustFunc UnicastConfigAdjustFunc) (*Config, error) // Size returns the number of dial configs in the cache. Size() uint }
ConfigCache is a thread-safe cache for dial configs. It is used by the unicast service to store the dial configs for peers.
type DialConfigCacheFactory ¶ added in v0.32.2
type DialConfigCacheFactory func(configFactory func() Config) ConfigCache
type ErrMaxRetries ¶ added in v0.30.0
type ErrMaxRetries struct {
// contains filtered or unexported fields
}
ErrMaxRetries indicates retries completed with max retries without a successful attempt.
func NewMaxRetriesErr ¶ added in v0.30.0
func NewMaxRetriesErr(attempts uint64, err error) ErrMaxRetries
NewMaxRetriesErr returns a new ErrMaxRetries.
func (ErrMaxRetries) Error ¶ added in v0.30.0
func (e ErrMaxRetries) Error() string
type Manager ¶
type Manager struct {
// contains filtered or unexported fields
}
Manager manages libp2p stream negotiation and creation, which is utilized for unicast dispatches.
func NewUnicastManager ¶
func NewUnicastManager(cfg *ManagerConfig) (*Manager, error)
NewUnicastManager creates a new unicast manager. Args:
- cfg: configuration for the unicast manager.
Returns:
- a new unicast manager.
- an error if the configuration is invalid, any error is irrecoverable.
func (*Manager) CreateStream ¶
CreateStream tries establishing a libp2p stream to the remote peer id. It tries creating streams in the descending order of preference until it either creates a successful stream or runs out of options. Args:
- ctx: context for the stream creation.
- peerID: peer ID of the remote peer.
Returns:
- a new libp2p stream.
- error if the stream creation fails; the error is benign and can be retried.
func (*Manager) Register ¶
func (m *Manager) Register(protocol protocols.ProtocolName) error
Register registers given protocol name as preferred unicast. Each invocation of register prioritizes the current protocol over previously registered ones.
func (*Manager) SetDefaultHandler ¶ added in v0.32.0
func (m *Manager) SetDefaultHandler(defaultHandler libp2pnet.StreamHandler)
SetDefaultHandler sets the default stream handler for this unicast manager. The default handler is utilized as the core handler for other unicast protocols, e.g., compressions.
type ManagerConfig ¶ added in v0.32.2
type ManagerConfig struct { Logger zerolog.Logger `validate:"required"` StreamFactory p2p.StreamFactory `validate:"required"` SporkId flow.Identifier `validate:"required"` Metrics module.UnicastManagerMetrics `validate:"required"` Parameters *netconf.UnicastManager `validate:"required"` // UnicastConfigCacheFactory is a factory function to create a new dial config cache. UnicastConfigCacheFactory DialConfigCacheFactory `validate:"required"` }
type UnicastConfigAdjustFunc ¶ added in v0.33.1
UnicastConfigAdjustFunc is a function that is used to adjust the fields of a DialConfigEntity. The function is called with the current config and should return the adjusted record. Returned error indicates that the adjustment is not applied, and the config should not be updated. In BFT setup, the returned error should be treated as a fatal error.