exponential

package

v0.0.0-...-92c9290 Published: Feb 21, 2025 License: MIT Imports: 11 Imported by: 6

Repository

github.com/Azure/retry

README ¶

Exponential - The Exponential Backoff Package

Introduction

This package provides an implementation of the exponential backoff algorithm.

Exponential backoff is an algorithm that uses feedback to multiplicatively decrease the rate of some process in order to gradually find an acceptable rate. The delay between retries increases exponentially and stops increasing once it reaches a configured maximum.

This is a rewrite of an existing package. The original works as intended, but with the inclusion of generics in its latest version, it now has many function calls and return values that do similar things. This rewrite aims to be more efficient and easier to use. I also used the opportunity to add some features that I found useful.

Like that package, this package has its heritage from Google's HTTP Client Library for Java.

Usage

The import path is github.com/Azure/retry/exponential.

This package has a lot of different options, but can be used with the default settings like this:

boff, err := exponential.New()
if err != nil {
	// Handle the error.
}

// Captured return data from the operation.
var data Data

// This sets the maximum time in which the operation can be retried to 30 seconds.
// This is based on the parent context, so a cancel on the parent cancels
// this context.
ctx, cancel := context.WithTimeout(parentCtx, 30*time.Second)

err = boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
	var err error
	data, err = getData(ctx)
	return err
})
cancel() // Always cancel the context when done to avoid lingering goroutines.

There are many different options for the backoff such as:

Setting a custom Policy for the backoff.
Logging backoff attempts with the Record object.
Forcing the backoff to stop on permanent errors.
Influencing the backoff with a retry timer set to a specific time.
Using a Transformer to deal with common errors like gRPC, HTTP, or SQL errors.
Using the timetable tool to see the results of a custom backoff policy.
...

Use https://pkg.go.dev/github.com/Azure/retry/exponential to view the documentation.

Documentation ¶

Overview ¶

Package exponential provides an exponential backoff mechanism. It is most useful when setting a single policy for all retries within a package or set of packages.

This package comes with a default policy, but can be customized for your own needs.

There is no maximum number of retries by default, as exponential retries should be bounded by a maximum delay. This is set via a Context timeout. Note that the Context timeout is a point in the future after which the operation will not be retried. Setting a timeout of 30 seconds does not mean that Retry() will return after 30 seconds. It means that no new attempt will be started more than 30 seconds after Retry() is called. If the first call takes 30 seconds and then fails, no retries will happen. If the first call takes 29 seconds and then fails, the second call may or may not happen depending on policy settings.

Any error returned will be the last error returned by the Op. It will not be a context.Canceled or context.DeadlineExceeded error merely because the retry timer was cancelled. However, it may still be a context error if the Op itself returned one. Use errors.Is(err, ErrRetryCanceled) to determine whether the retry was cancelled.

To understand the consequences of using any specific policy, we provide a tool to generate a time table for a given policy. It is located in the timetable sub-package. Here is sample output giving the progression to the maximum interval for the settings shown:

Generating TimeTable for -1 attempts and the following settings:
{
	"InitialInterval":     1000000000, // 1 * time.Second
	"Multiplier":          2,
	"RandomizationFactor": 0.5,
	"MaxInterval":         60000000000 // 60 * time.Second
}

=============
= TimeTable =
=============
+---------+----------+-------------+-------------+
| ATTEMPT | INTERVAL | MININTERVAL | MAXINTERVAL |
+---------+----------+-------------+-------------+
|       1 |       0s |          0s |          0s |
|       2 |       1s |       500ms |        1.5s |
|       3 |       2s |          1s |          3s |
|       4 |       4s |          2s |          6s |
|       5 |       8s |          4s |         12s |
|       6 |      16s |          8s |         24s |
|       7 |      32s |         16s |         48s |
|       8 |     1m0s |         30s |       1m30s |
+---------+----------+-------------+-------------+
|         |          |     MINTIME |     MAXTIME |
|         |          |      1M1.5S |      3M4.5S |
+---------+----------+-------------+-------------+

Attempt is the attempt number, with 1 being the first (not 0). Interval is the calculated interval without randomization. MinInterval is the minimum interval after randomization. MaxInterval is the maximum interval after randomization. MINTIME and MAXTIME are the minimum and maximum total time that would be taken to reach the last attempt listed.

Documentation for the timetable application is in the timetable/ directory.

The following is a list of examples of how to use this package. It is not exhaustive.

Example: With default policy and maximum time of 30 seconds while capturing a return value:

boff, err := exponential.New()
if err != nil {
	// Handle the error.
}

// Captured return data.
var data Data

// This sets the time in which to retry to 30 seconds. This is based on the parent context, so a cancel
// on the parent cancels this context.
ctx, cancel := context.WithTimeout(parentCtx, 30*time.Second)

err = boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
	var err error
	data, err = getData(ctx)
	return err
})
cancel() // Always cancel the context when done to avoid lingering goroutines. Avoid defer.

if err != nil {
	// Handle the error.
	// This will always contain the last error returned by the Op, not a context error unless
	// the last error by the Op was a context error.
	// If the retry was cancelled, you can detect this with errors.Is(err, exponential.ErrRetryCanceled).
	// You can determine if this was a permanent error with errors.Is(err, exponential.ErrPermanent).
}

Example: With the default policy, maximum execution time of 30 seconds and each attempt can take up to 5 seconds:

boff, err := exponential.New()
if err != nil {
	// Handle the error.
}

var data Data

ctx, cancel := context.WithTimeout(parentCtx, 30*time.Second)

err = boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
	var err error
	// Each attempt gets its own 5 second deadline derived from the retry context.
	callCtx, callCancel := context.WithTimeout(ctx, 5*time.Second)
	data, err = getData(callCtx)
	callCancel()
	return err
})
cancel() // Always cancel the context when done to avoid lingering goroutines.
...

Example: Retry forever:

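boff, err := exponential.New()
if err != nil {
	// Handle the error.
}
var data Data

// No deadline is set here, so (assuming parentCtx has none) retries continue until the Op
// succeeds, returns a permanent error, or the context is cancelled.
ctx, cancel := context.WithCancel(parentCtx)

err = boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
	var err error
	data, err = getData(ctx)
	return err
})
cancel()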
...

Example: Same as before but with a permanent error that breaks the retries:

...
err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
	var err error
	data, err = getData(ctx)
	if errors.Is(err, badError) {
		// Wrapping ErrPermanent stops any further retries.
		return fmt.Errorf("%w: %w", err, exponential.ErrPermanent)
	}
	return err
})
cancel()
...

Example: No return data:

err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
	return doSomeOperation(ctx)
})
cancel()
...

Example: Create a custom policy:

policy := exponential.Policy{
	InitialInterval:     1 * time.Second,
	Multiplier:          2,
	RandomizationFactor: 0.2,
	MaxInterval:         30 * time.Second,
}
boff, err := exponential.New(exponential.WithPolicy(policy))
if err != nil {
	// Handle the error.
}
...

Example: Retry a call that fails, but honor the service's retry timer:

...
err := boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
	_, err := client.Call(ctx, req)
	if err != nil {
		// extractRetryTime is a placeholder for whatever extracts the retry time from the
		// error the server sends. The time might also be in the body of an http.Response
		// or in some header.
		t := extractRetryTime(err)
		return exponential.ErrRetryAfter{Time: t, Err: err}
	}
	return nil
})

Example: Test a function without any delay that eventually succeeds

boff, err := exponential.New(exponential.WithTesting())
if err != nil {
	// Handle the error.
}

var data Data

err = boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
	var err error
	data, err = getData(ctx) // Assign to the outer data instead of shadowing it.
	return err
})
cancel()

Example: Test a function that eventually fails with permanent error

boff, err := exponential.New(exponential.WithTesting())
if err != nil {
	// Handle the error.
}

var data Data

err = boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
	var err error
	data, err = getData(ctx)
	if err != nil {
		return fmt.Errorf("%w: %w", err, exponential.ErrPermanent)
	}
	return nil
})
cancel()

Example: Test a function around a gRPC call that fails on certain status codes using an ErrTransformer

boff, err := exponential.New(
	exponential.WithErrTransformer(grpc.New()), // find this in the helpers sub-package
)
if err != nil {
	// Handle the error.
}

ctx, cancel := context.WithTimeout(parentCtx, 1*time.Minute)

req := &pb.HelloRequest{Name: "John"}
var resp *pb.HelloReply

err = boff.Retry(ctx, func(ctx context.Context, r exponential.Record) error {
	var err error
	resp, err = client.Call(ctx, req)
	return err
})
cancel() // Always cancel the context when done to avoid lingering goroutines.

Constants ¶

This section is empty.

Variables ¶

var (
	// ErrRetryCanceled is an error that is returned when a retry is canceled. This is substituted for a context.Canceled
	// or context.DeadlineExceeded error to differentiate between a retry being cancelled and the last error from the Op being
	// context.Canceled or context.DeadlineExceeded.
	ErrRetryCanceled = errspkg.ErrRetryCanceled // This is a type alias.

	// ErrPermanent is an error that is permanent and cannot be retried. This
	// is similar to errors.ErrUnsupported in that it shouldn't be used directly, but instead
	// wrapped in another error. You can determine if you have a permanent error with
	// Is(err, ErrPermanent).
	ErrPermanent = errspkg.ErrPermanent // This is a type alias.
)

Functions ¶

This section is empty.

Types ¶

type Backoff ¶

type Backoff struct {
	// contains filtered or unexported fields
}

Backoff provides a mechanism for retrying operations with exponential backoff. This can be used in tests without a fake/mock interface to simulate retries either by using the WithTesting() option or by setting a Policy that works with your test. This keeps code leaner, avoids dynamic dispatch and unneeded allocations, and is easier to test.

func New ¶

func New(options ...Option) (*Backoff, error)

New creates a new Backoff instance with the given options.

func (*Backoff) Retry ¶

func (b *Backoff) Retry(ctx context.Context, op Op, options ...RetryOption) error

Retry will retry the given operation until it succeeds, the context is cancelled, or the operation returns an error that wraps ErrPermanent. This is safe to call concurrently.

type ErrRetryAfter ¶

type ErrRetryAfter = errspkg.ErrRetryAfter // This is a type alias.

ErrRetryAfter can be used to wrap an error to indicate that the error can be retried after a certain time. This is useful when a remote service returns a retry interval in the response and you want to carry the signal to your retry logic. This error should not be returned to the caller of Retry(). DO NOT use this as &ErrRetryAfter{}, simply ErrRetryAfter{} or it won't work.

type ErrTransformer ¶

type ErrTransformer func(err error) error

ErrTransformer is a function that can be used to transform an error before it is returned. The typical case is to make an error a permanent error based on some criteria in order to stop retries. The other use is to use ErrRetryAfter as a wrapper to specify the minimum time the retry must wait based on a response from a service. This type allows packaging of custom retry logic in one place for reuse instead of in the Op. As ErrTransformers are applied in order, the last one to change an error will be the error returned.

type Op ¶

type Op func(context.Context, Record) error

Op is a function that can be retried.

type Option ¶

type Option func(*Backoff) error

Options are used to configure the backoff policy.

func WithErrTransformer ¶

func WithErrTransformer(transformers ...ErrTransformer) Option

WithErrTransformer sets the error transformers to use. If not specified, then no transformers are used. Passing multiple transformers will apply them in order. If WithErrTransformer is passed multiple times, only the final transformers are used (aka don't do that).

func WithPolicy ¶

func WithPolicy(policy Policy) Option

WithPolicy sets the backoff policy to use. If not specified, then DefaultPolicy is used.

func WithTesting ¶

func WithTesting(options ...TestOption) Option

WithTesting invokes the backoff policy with no actual delay. Cannot be used outside of a test or this will panic.

type Policy ¶

type Policy struct {
	// InitialInterval is how long to wait after the first failure before retrying. Must be
	// greater than 0.
	// Defaults to 100ms.
	InitialInterval time.Duration
	// Multiplier is used to increase the delay after each failure. Must be greater than 1.
	// Defaults to 2.0.
	Multiplier float64
	// RandomizationFactor is used to randomize the delay. This prevents problems where multiple
	// clients are all retrying at the same intervals, and thus all hammering the server at the same time.
	// This is a value between 0 and 1. Zero(0) means no randomization, 1 means randomize by the entire interval.
	// The randomization factor sets a range of randomness in the positive and negative direction with a maximum
	// window of +/- RandomizationFactor * Interval. For example, if the RandomizationFactor is 0.5, the interval
	// will be randomized by up to 50% in the positive and negative direction. If the interval is 1s, the randomization
	// window is 0.5s to 1.5s.
	// Randomization can push the interval above the MaxInterval. The factor can be both positive and negative.
	// Defaults to 0.5
	RandomizationFactor float64
	// MaxInterval is the maximum amount of time to wait between retries. Must be > 0.
	// Defaults to 60s.
	MaxInterval time.Duration
	// MaxAttempts is the maximum number of attempts to make before giving up. If 0, then there is no limit.
	// Defaults to 0. When the limit is reached, the error returned will contain ErrPermanent.
	MaxAttempts int
}

Policy is the configuration for the backoff policy. Generally speaking you should use the default policy, but you can create your own if you want to customize it. But think long and hard about it before you do, as the default policy is a good mechanism for avoiding thundering herd problems, which always involve remote calls. If you are not making remote calls, you should question the use of this package. Note that a Policy is ignored if the service returns a delay in the error message.

func (Policy) TimeTable ¶

func (p Policy) TimeTable(attempts int) TimeTable

TimeTable will return a TimeTable for the Policy. If attempts is >= 0, then the TimeTable will be for that number of attempts. If attempts is < 0, then the TimeTable will be for all entries until the maximum interval is reached. This should only be used in tools and testing.

type Record ¶

type Record struct {
	// Attempt is the number of attempts (initial + retries). A zero value of Record has Attempt == 0.
	Attempt int
	// LastInterval is the last interval used.
	LastInterval time.Duration
	// TotalInterval is the total amount of time spent in intervals between attempts.
	TotalInterval time.Duration
	// Err is the last error returned by an operation. It is important to remember that this is
	// the last error returned by the prior invocation of the Op and should only be used for logging
	// purposes.
	Err error
}

Record is the record of a Retry attempt.

type RetryOption ¶

type RetryOption func(o *retryOptions) error

RetryOption is an option for the Retry method. Functions that implement RetryOption provide an override on a single call.

type TestOption ¶

type TestOption func(t *testOptions) error

TestOption is an option for WithTesting(). Functions that implement TestOption provide options for tests. This is a placeholder for future test options and is not used at this time.

type TimeTable ¶

type TimeTable struct {
	// MinTime is the minimum time a program will have to wait if every attempt gets the minimum interval
	// when calculating the RandomizationFactor. This value changes depending
	// on if Policy.TimeTable() had attempts set >= 0 or < 0. If attempts is >= 0, then MinTime
	// is the sum of all the MinInterval values up through the attempts. If attempts is < 0, then
	// MinTime is the sum of all the MinInterval values until we reach our maximum interval setting.
	MinTime time.Duration
	// MaxTime is the maximum time a program will have to wait if every attempt gets the maximum interval
	// when calculating the RandomizationFactor. This value changes depending on
	// if Policy.TimeTable() had attempts set >= 0 or < 0. If attempts is >= 0, then MaxTime
	// is the sum of all the MaxInterval values up through the attempts. If attempts is < 0, then
	// MaxTime is the sum of all the MaxInterval values until we reach our maximum interval setting.
	MaxTime time.Duration
	// Entries is the list of minimum and maximum intervals for each attempt.
	Entries []TimeTableEntry
}

TimeTable is a table of intervals describing the wait time between retries. This is useful for both testing and understanding what a policy will do.

func (TimeTable) Litter ¶

func (t TimeTable) Litter() string

Litter writes the TimeTable as a Go struct that can be used to recreate the TimeTable. For use in internal testing only.

func (TimeTable) String ¶

func (t TimeTable) String() string

String implements fmt.Stringer.

type TimeTableEntry ¶

type TimeTableEntry struct {
	// Attempt is the attempt number that this entry is for.
	Attempt int
	// Interval is the interval to wait before the next attempt. However, this is
	// not the actual interval. The actual interval is the Interval plus or minus
	// the RandomizationFactor.
	Interval time.Duration
	// MinInterval is the minimum interval to wait before the next attempt. This is
	// Interval minus the maximum randomization factor.
	MinInterval time.Duration
	// MaxInterval is the maximum interval to wait before the next attempt. This is
	// Interval plus the maximum randomization factor.
	MaxInterval time.Duration
}

TimeTableEntry is an entry in the time table.

Directories ¶

Path	Synopsis
helpers
helpers/grpc	Package gRPC provides an exponential.ErrTransformer that can be used to detect non-retriable errors for gRPC calls.
helpers/http	Package http provides an ErrTransformer for http.Client from the standard library.
timetable