Documentation ¶
Overview ¶
Package exponential provides an exponential backoff mechanism. Most useful when setting a single policy for all retries within a package or set of packages.
This package comes with a default policy, but can be customized for your own needs.
There is no maximum number of retries here, as exponential retries should be based on a maximum delay. This is set via a Context timeout. Note that the Context timeout is a point in the future after which the operation will not be retried. Setting a timeout of 30 * time.Second does not mean that Retry() will return after 30 seconds. It means that no new attempt will be made more than 30 seconds after Retry() is called. If the first call takes 30 seconds and then fails, no retries will happen. If the first call takes 29 seconds and then fails, the second call may or may not happen depending on policy settings.
Any error returned will be the last error returned by the Op. It will not be a context.Canceled or context.DeadlineExceeded error if the retry timer was cancelled. However, it may still yield a context error if the Op returns a context error. Error.Cancelled() tells you if the retry was cancelled. Error.IsCancelled() tells you if the last error returned by the Op was a context error.
To understand the consequences of using any specific policy, we provide a tool that generates a time table for a given policy. It is located in the timetable sub-package. Here is sample output giving the progression to the maximum interval for a policy with the default settings:
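The deadline behavior described above can be illustrated with a small standard-library sketch. It does not use this package: retryUntilDeadline, runDemo and errFail are hypothetical names, and the fixed wait stands in for the real backoff policy. It only shows that an attempt which outlasts the timeout is not followed by a retry, and that the error reported is the Op's last error rather than a context error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errFail is the error our always-failing op returns (hypothetical).
var errFail = errors.New("op failed")

// retryUntilDeadline is a hypothetical stand-in for Retry(). It calls op until op
// succeeds or ctx ends, and it never starts another attempt once ctx is done.
func retryUntilDeadline(ctx context.Context, wait time.Duration, op func(context.Context) error) (int, error) {
	attempts := 0
	for {
		attempts++
		err := op(ctx)
		if err == nil {
			return attempts, nil
		}
		timer := time.NewTimer(wait)
		select {
		case <-ctx.Done():
			timer.Stop()
			return attempts, err // Report the Op's last error, not the context error.
		case <-timer.C:
			if ctx.Err() != nil {
				return attempts, err
			}
		}
	}
}

// runDemo runs an op that takes opMS milliseconds and always fails, under a timeout
// of timeoutMS milliseconds, waiting waitMS milliseconds between attempts.
func runDemo(timeoutMS, opMS, waitMS int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMS)*time.Millisecond)
	op := func(context.Context) error {
		time.Sleep(time.Duration(opMS) * time.Millisecond)
		return errFail
	}
	n, err := retryUntilDeadline(ctx, time.Duration(waitMS)*time.Millisecond, op)
	cancel()
	return n, err
}

func main() {
	// The single attempt (60ms) outlasts the timeout (20ms), so no retry happens.
	n, err := runDemo(20, 60, 10)
	fmt.Println("attempts:", n, "err:", err)
}
```

Retry() itself may well return later than the timeout, because an attempt already in flight is allowed to finish.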
    Generating TimeTable for -1 attempts and the following settings:
    {
        "InitialInterval": 1000000000, // 1 * time.Second
        "Multiplier": 2,
        "RandomizationFactor": 0.5,
        "MaxInterval": 60000000000 // 60 * time.Second,
    }

    =============
    = TimeTable =
    =============
    +---------+----------+-------------+-------------+
    | ATTEMPT | INTERVAL | MININTERVAL | MAXINTERVAL |
    +---------+----------+-------------+-------------+
    |       1 | 0s       | 0s          | 0s          |
    |       2 | 1s       | 500ms       | 1.5s        |
    |       3 | 2s       | 1s          | 3s          |
    |       4 | 4s       | 2s          | 6s          |
    |       5 | 8s       | 4s          | 12s         |
    |       6 | 16s      | 8s          | 24s         |
    |       7 | 32s      | 16s         | 48s         |
    |       8 | 1m0s     | 30s         | 1m30s       |
    +---------+----------+-------------+-------------+
    |         |          | MINTIME     | MAXTIME     |
    |         |          | 1M1.5S      | 3M4.5S      |
    +---------+----------+-------------+-------------+

Attempt is the retry attempt, with 1 being the first (not 0). Interval is the calculated interval without randomization. MinInterval is the minimum interval after randomization. MaxInterval is the maximum interval after randomization. MINTIME and MAXTIME are the minimum and maximum total time that would be taken to reach the last attempt listed.
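The arithmetic behind the table can be reproduced with a short sketch. This is not the timetable tool's code: the policy and entry types, timeTable and samplePolicy are hypothetical names, and the sketch assumes the interval is multiplied after each attempt, capped at MaxInterval, and widened by plus or minus RandomizationFactor * Interval. The settings match the sample output (an InitialInterval of 1 second).

```go
package main

import (
	"fmt"
	"time"
)

// policy mirrors the fields shown in the sample settings (hypothetical type).
type policy struct {
	InitialInterval     time.Duration
	Multiplier          float64
	RandomizationFactor float64
	MaxInterval         time.Duration
}

// entry is one row of the table (hypothetical type).
type entry struct {
	Attempt     int
	Interval    time.Duration
	MinInterval time.Duration
	MaxInterval time.Duration
}

// samplePolicy holds the settings from the sample output.
var samplePolicy = policy{
	InitialInterval:     1 * time.Second,
	Multiplier:          2,
	RandomizationFactor: 0.5,
	MaxInterval:         60 * time.Second,
}

// timeTable lists attempts until the unrandomized interval reaches MaxInterval.
// It assumes InitialInterval > 0 and Multiplier > 1, otherwise it would not terminate.
func timeTable(p policy) (entries []entry, minTime, maxTime time.Duration) {
	// The first attempt happens immediately, so all of its intervals are zero.
	entries = append(entries, entry{Attempt: 1})
	interval := p.InitialInterval
	for attempt := 2; ; attempt++ {
		if interval > p.MaxInterval {
			interval = p.MaxInterval
		}
		delta := time.Duration(p.RandomizationFactor * float64(interval))
		e := entry{
			Attempt:     attempt,
			Interval:    interval,
			MinInterval: interval - delta,
			MaxInterval: interval + delta,
		}
		entries = append(entries, e)
		minTime += e.MinInterval
		maxTime += e.MaxInterval
		if interval == p.MaxInterval {
			return entries, minTime, maxTime
		}
		interval = time.Duration(float64(interval) * p.Multiplier)
	}
}

func main() {
	entries, minTime, maxTime := timeTable(samplePolicy)
	for _, e := range entries {
		fmt.Println(e.Attempt, e.Interval, e.MinInterval, e.MaxInterval)
	}
	fmt.Println("min:", minTime, "max:", maxTime)
}
```

Summing the MinInterval column gives 1m1.5s and summing the MaxInterval column gives 3m4.5s, matching MINTIME and MAXTIME above.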
Documentation for the timetable application is in the timetable/ directory.
The following examples show how to use this package. The list is not exhaustive.
Example: With default policy and maximum time of 30 seconds while capturing a return value:
    boff := exponential.New()

    // Captured return data.
    var data Data

    // This sets the time in which to retry to 30 seconds. This is based on the parent context,
    // so a cancel on the parent cancels this context.
    ctx, cancel := context.WithTimeout(parentCtx, 30*time.Second)

    err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
        var err error
        data, err = getData(ctx)
        return err
    })
    cancel() // Always cancel the context when done to avoid lingering goroutines. Avoid defer.

    if err != nil {
        // Handle the error.
        // This will always contain the last error returned by the Op, not a context error unless
        // the last error by the Op was a context error.
        // If the retry was cancelled, you can detect this with errors.Is(err, exponential.ErrRetryCanceled).
        // You can determine if this was a permanent error with errors.Is(err, exponential.ErrPermanent).
    }
Example: With the default policy, maximum execution time of 30 seconds and each attempt can take up to 5 seconds:
    boff := exponential.New()

    var data Data

    ctx, cancel := context.WithTimeout(parentCtx, 30*time.Second)

    err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
        var err error
        callCtx, callCancel := context.WithTimeout(ctx, 5*time.Second)
        data, err = getData(callCtx)
        callCancel()
        return err
    })
    cancel() // Always cancel the context when done to avoid lingering goroutines.
    ...
Example: Retry forever:
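    boff := exponential.New()

    var data Data

    // ctx has no timeout here, so retries continue until the Op succeeds or returns a permanent error.
    err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
        var err error
        data, err = getData(ctx)
        return err
    })
    ...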
Example: Same as before but with a permanent error that breaks the retries:
    ...
    err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
        var err error
        data, err = getData(ctx)
        if errors.Is(err, badError) {
            return fmt.Errorf("%w: %w", err, exponential.ErrPermanent)
        }
        return err
    })
    cancel()
    ...
Example: No return data:
    err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
        return doSomeOperation(ctx)
    })
    cancel()
    ...
Example: Create a custom policy:
    policy := exponential.Policy{
        InitialInterval:     1 * time.Second,
        Multiplier:          2,
        RandomizationFactor: 0.2,
        MaxInterval:         30 * time.Second,
    }
    boff := exponential.New(exponential.WithPolicy(policy))
    ...
Example: Retry a call that fails, but honor the service's retry timer:
    ...
    err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
        _, err := client.Call(ctx, req)
        if err != nil {
            // extractRetryTime is a function that extracts the retry time from the error the server sends.
            // This might also be in the body of an http.Response or in some header. Just think of
            // extractRetryTime as a placeholder for whatever that is.
            t := extractRetryTime(err)
            return exponential.ErrRetryAfter{Time: t, Err: err}
        }
        return nil
    })
Example: Test a function without any delay that eventually succeeds
    boff := exponential.New(exponential.WithTesting())

    var data Data

    err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
        var err error
        data, err = getData(ctx) // Assign to the outer data instead of shadowing it.
        if err != nil {
            return err
        }
        return nil
    })
    cancel()
Example: Test a function that eventually fails with permanent error
    boff := exponential.New(exponential.WithTesting())

    var data Data

    err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
        var err error
        data, err = getData(ctx) // Assign to the outer data instead of shadowing it.
        if err != nil {
            return fmt.Errorf("%w: %w", err, exponential.ErrPermanent)
        }
        return nil
    })
    cancel()
Example: Test a function around a gRPC call that fails on certain status codes using an ErrTransformer
    boff := exponential.New(
        exponential.WithErrTransformer(grpc.New()), // find this in the helpers sub-package
    )

    ctx, cancel := context.WithTimeout(parentCtx, 1*time.Minute)

    req := &pb.HelloRequest{Name: "John"}
    var resp *pb.HelloReply

    err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
        var err error
        resp, err = client.Call(ctx, req)
        return err
    })
    cancel() // Always cancel the context when done to avoid lingering goroutines.
Index ¶
Constants ¶
This section is empty.
Variables ¶
var (
    // ErrRetryCanceled is an error that is returned when a retry is canceled. This is substituted
    // for a context.Canceled or context.DeadlineExceeded error to differentiate between a retry
    // being cancelled and the last error from the Op being context.Canceled or
    // context.DeadlineExceeded.
    ErrRetryCanceled = errspkg.ErrRetryCanceled // This is a type alias.

    // ErrPermanent is an error that is permanent and cannot be retried. This
    // is similar to errors.ErrUnsupported in that it shouldn't be used directly, but instead
    // wrapped in another error. You can determine if you have a permanent error with
    // Is(err, ErrPermanent).
    ErrPermanent = errspkg.ErrPermanent // This is a type alias.
)
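Wrapping a sentinel such as ErrPermanent and detecting it with errors.Is can be sketched with the standard library alone. Here errPermanent is a local stand-in for exponential.ErrPermanent, and errNotFound, errTimeout, classify, isPermanent and isNotFound are hypothetical names. Using two %w verbs in one fmt.Errorf call requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// errPermanent is a local stand-in for exponential.ErrPermanent (assumption).
var errPermanent = errors.New("permanent error")

// Hypothetical application errors.
var (
	errNotFound = errors.New("not found")
	errTimeout  = errors.New("timeout")
)

// classify wraps errors that should never be retried with errPermanent while
// keeping the original error in the chain.
func classify(err error) error {
	if errors.Is(err, errNotFound) {
		return fmt.Errorf("%w: %w", err, errPermanent)
	}
	return err
}

// isPermanent reports whether err carries the permanent marker.
func isPermanent(err error) bool { return errors.Is(err, errPermanent) }

// isNotFound reports whether err still matches the original cause.
func isNotFound(err error) bool { return errors.Is(err, errNotFound) }

func main() {
	for _, err := range []error{errNotFound, errTimeout} {
		wrapped := classify(err)
		fmt.Printf("%v: permanent=%t\n", wrapped, isPermanent(wrapped))
	}
}
```

Because both errors are wrapped, callers can test for the permanent marker and for the original cause independently.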
Functions ¶
This section is empty.
Types ¶
type Backoff ¶
type Backoff struct {
// contains filtered or unexported fields
}
Backoff provides a mechanism for retrying operations with exponential backoff. This can be used in tests without a fake/mock interface to simulate retries either by using the WithTesting() option or by setting a Policy that works with your test. This keeps code leaner, avoids dynamic dispatch, unneeded allocations and is easier to test.
type ErrRetryAfter ¶
type ErrRetryAfter = errspkg.ErrRetryAfter // This is a type alias.
ErrRetryAfter can be used to wrap an error to indicate that the error can be retried after a certain time. This is useful when a remote service returns a retry interval in the response and you want to carry the signal to your retry logic. This error should not be returned to the caller of Retry(). DO NOT use this as &ErrRetryAfter{}, simply ErrRetryAfter{} or it won't work.
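One plausible reason for the value-versus-pointer warning is how errors.As matches types. The source does not say how the package detects ErrRetryAfter, so treat the following as a sketch under the assumption that detection targets the value type. The retryAfter type, findRetryAfter and detects are hypothetical stand-ins, not this package's code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryAfter is a local stand-in for ErrRetryAfter (assumption: same field names).
type retryAfter struct {
	Time time.Time
	Err  error
}

// Error and Unwrap use value receivers, so both retryAfter and *retryAfter satisfy error.
func (r retryAfter) Error() string {
	return fmt.Sprintf("retry after %s: %v", r.Time.Format(time.RFC3339), r.Err)
}

func (r retryAfter) Unwrap() error { return r.Err }

// findRetryAfter looks for a retryAfter VALUE in the error chain.
func findRetryAfter(err error) (retryAfter, bool) {
	var ra retryAfter
	ok := errors.As(err, &ra)
	return ra, ok
}

// detects returns whether findRetryAfter sees the error when it is returned
// as a value (usePointer == false) or as a pointer (usePointer == true).
func detects(usePointer bool) bool {
	base := retryAfter{Time: time.Now().Add(time.Second), Err: errors.New("server busy")}
	var err error = base
	if usePointer {
		err = &base
	}
	_, ok := findRetryAfter(fmt.Errorf("call failed: %w", err))
	return ok
}

func main() {
	fmt.Println("value detected:  ", detects(false))
	fmt.Println("pointer detected:", detects(true))
}
```

In this sketch the pointer form goes unnoticed because *retryAfter is not assignable to retryAfter, so the retry hint would be silently dropped.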
type ErrTransformer ¶
ErrTransformer is a function that can be used to transform an error before it is returned. The typical case is to make an error a permanent error based on some criteria in order to stop retries. The other use is to use errors.ErrRetryAfter as a wrapper to specify the minimum time the retry must wait based on a response from a service. This type allows packaging of custom retry logic in one place for reuse instead of in the Op. As ErrTransformers are applied in order, the last one to change an error will be the error returned.
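The in-order application of transformers can be sketched as follows. The type declaration is not shown on this page, so the signature func(err error) error is an assumption, and errTransformer, applyAll, prefix and errBoom are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransformer is an assumed shape for ErrTransformer: it takes an error and
// returns the (possibly changed) error.
type errTransformer func(err error) error

// applyAll runs each transformer in order, feeding the output of one into the next.
func applyAll(err error, ts ...errTransformer) error {
	for _, t := range ts {
		err = t(err)
	}
	return err
}

// prefix returns a transformer that wraps any non-nil error with a label.
func prefix(label string) errTransformer {
	return func(err error) error {
		if err == nil {
			return nil
		}
		return fmt.Errorf("%s: %w", label, err)
	}
}

// errBoom is a hypothetical error returned by an Op.
var errBoom = errors.New("boom")

func main() {
	// The second transformer sees the output of the first, so its change is outermost.
	fmt.Println(applyAll(errBoom, prefix("A"), prefix("B")))
}
```

The result is the error produced by the last transformer that changed it, which is why ordering matters when several transformers are combined.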
type Option ¶
Options are used to configure the backoff policy.
func WithErrTransformer ¶
func WithErrTransformer(transformers ...ErrTransformer) Option
WithErrTransformer sets the error transformers to use. If not specified, then no transformers are used. Passing multiple transformers will apply them in order. If WithErrTransformer is passed multiple times, only the final transformers are used (aka don't do that).
func WithPolicy ¶
WithPolicy sets the backoff policy to use. If not specified, then DefaultPolicy is used.
func WithTesting ¶
func WithTesting(options ...TestOption) Option
WithTesting invokes the backoff policy with no actual delay. Cannot be used outside of a test or this will panic.
type Policy ¶
type Policy struct {
    // InitialInterval is how long to wait after the first failure before retrying. Must be
    // greater than 0.
    // Defaults to 100ms.
    InitialInterval time.Duration
    // Multiplier is used to increase the delay after each failure. Must be greater than 1.
    // Defaults to 2.0.
    Multiplier float64
    // RandomizationFactor is used to randomize the delay. This prevents problems where multiple
    // clients are all retrying at the same intervals, and thus all hammering the server at the
    // same time. This is a value between 0 and 1. Zero(0) means no randomization, 1 means
    // randomize by the entire interval.
    // The randomization factor sets a range of randomness in the positive and negative direction
    // with a maximum window of +/- RandomizationFactor * Interval. For example, if the
    // RandomizationFactor is 0.5, the interval will be randomized by up to 50% in the positive and
    // negative direction. If the interval is 1s, the randomization window is 0.5s to 1.5s.
    // Randomization can push the interval above the MaxInterval. The factor can be both positive
    // and negative.
    // Defaults to 0.5
    RandomizationFactor float64
    // MaxInterval is the maximum amount of time to wait between retries. Must be > 0.
    // Defaults to 60s.
    MaxInterval time.Duration
}
Policy is the configuration for the backoff policy. Generally speaking you should use the default policy, but you can create your own if you want to customize it. Think long and hard before you do, as the default policy is a good mechanism for avoiding thundering herd problems, which always involve remote calls. If you are not doing remote calls, you should question the use of this package. Note that a Policy is ignored if the service returns a delay in the error message.
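The randomization window described for RandomizationFactor can be sketched as a single function. This is an illustration of the stated arithmetic, not this package's implementation: randomize is a hypothetical name, and the assumption is that a uniform random value in [0, 1) selects a point inside the window Interval +/- RandomizationFactor * Interval.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// randomize maps rnd, a value in [0, 1), onto the window
// [interval - factor*interval, interval + factor*interval).
func randomize(interval time.Duration, factor, rnd float64) time.Duration {
	delta := factor * float64(interval)
	low := float64(interval) - delta
	return time.Duration(low + rnd*2*delta)
}

func main() {
	interval := time.Second
	// With a factor of 0.5, every result falls between 500ms and 1.5s.
	for i := 0; i < 3; i++ {
		fmt.Println(randomize(interval, 0.5, rand.Float64()))
	}
}
```

Since the window is centered on the interval, an interval already at MaxInterval can be randomized above it, as the field comment notes.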
func (Policy) TimeTable ¶
TimeTable will return a TimeTable for the Policy. If attempts is >= 0, then the TimeTable will be for that number of attempts. If attempts is < 0, then the TimeTable will be for all entries until the maximum interval is reached. This should only be used in tools and testing.
type Record ¶
type Record struct {
    // Attempt is the number of attempts (initial + retries). A zero value of Record has Attempt == 0.
    Attempt int
    // LastInterval is the last interval used.
    LastInterval time.Duration
    // TotalInterval is the total amount of time spent in intervals between attempts.
    TotalInterval time.Duration
    // Err is the last error returned by an operation. It is important to remember that this is
    // the last error returned by the prior invocation of the Op and should only be used for logging
    // purposes.
    Err error
}
Record is the record of a Retry attempt.
type RetryOption ¶
type RetryOption func(o *retryOptions) error
RetryOption is an option for the Retry method. Functions that implement RetryOption provide an override on a single call.
type TestOption ¶
type TestOption func(t *testOptions) error
TestOption is an option for WithTesting(). Functions that implement TestOption provide options for tests. This is a placeholder for future test options and is not used at this time.
type TimeTable ¶
type TimeTable struct {
    // MinTime is the minimum time a program will have to wait if every attempt gets the minimum
    // interval when calculating the RandomizationFactor. This value changes depending
    // on if Policy.TimeTable() had attempts set >= 0 or < 0. If attempts is >= 0, then MinTime
    // is the sum of all the MinInterval values up through the attempts. If attempts is < 0, then
    // MinTime is the sum of all the MinInterval values until we reach our maximum interval setting.
    MinTime time.Duration
    // MaxTime is the maximum time a program will have to wait if every attempt gets the maximum
    // interval when calculating the RandomizationFactor. This value changes depending on
    // if Policy.TimeTable() had attempts set >= 0 or < 0. If attempts is >= 0, then MaxTime
    // is the sum of all the MaxInterval values up through the attempts. If attempts is < 0, then
    // MaxTime is the sum of all the MaxInterval values until we reach our maximum interval setting.
    MaxTime time.Duration
    // Entries is the list of minimum and maximum intervals for each attempt.
    Entries []TimeTableEntry
}
TimeTable is a table of intervals describing the wait time between retries. This is useful for both testing and understanding what a policy will do.
type TimeTableEntry ¶
type TimeTableEntry struct {
    // Attempt is the attempt number that this entry is for.
    Attempt int
    // Interval is the interval to wait before the next attempt. However, this is
    // not the actual interval. The actual interval is the Interval plus or minus
    // the RandomizationFactor.
    Interval time.Duration
    // MinInterval is the minimum interval to wait before the next attempt. This is
    // Interval minus the maximum randomization factor.
    MinInterval time.Duration
    // MaxInterval is the maximum interval to wait before the next attempt. This is
    // Interval plus the maximum randomization factor.
    MaxInterval time.Duration
}
TimeTableEntry is an entry in the time table.
Directories ¶
Path | Synopsis |
---|---|
helpers | |
helpers/grpc | Package gRPC provides an exponential.ErrTransformer that can be used to detect non-retriable errors for gRPC calls. |
helpers/http | Package http provides an ErrTransformer for http.Client from the standard library. |