roko

Version: v1.2.0
Published: Feb 28, 2024 License: MIT Imports: 5 Imported by: 5

README

Roko

A Lightweight, Configurable, Easy-to-use Retry Library for Go

Installation

To install, run

go get -u github.com/buildkite/roko

This will add Roko to your go.mod file, and make it available for use in your project.

Usage

Roko allows you to configure how your application should respond to operations that can fail. Its core interface is the Retrier, which lets you tell your application how, and under what circumstances, it should retry an operation.

Let's say we have some operation that we want to perform:

func canFail() error {
  // ...
}

and if it fails, we want it to retry every 5 seconds, and give up after 3 tries. To do this, we can configure a retrier, and then perform our operation using the roko.Retrier.Do() function:

r := roko.NewRetrier(
  roko.WithMaxAttempts(3),                           // Only try 3 times, then give up
  roko.WithStrategy(roko.Constant(5 * time.Second)), // Wait 5 seconds between attempts
)

err := r.Do(func(r *roko.Retrier) error {
  return canFail()
})

In this situation, we'll try to run the canFail function, and if it returns an error, we'll wait 5 seconds, then try again. If canFail still returns an error once the max attempt count is reached, r.Do will return that error. If canFail succeeds (i.e. it doesn't return an error), r.Do will return nil.
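The behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not roko's implementation: retryConstant and countCalls are hypothetical names, and the choice not to sleep after the final attempt is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryConstant calls op up to maxAttempts times, sleeping for interval
// between failed attempts. It returns nil on the first success, or the last
// error once the attempts are exhausted.
// Hypothetical stand-in for illustration; this is not roko's code.
func retryConstant(maxAttempts int, interval time.Duration, sleep func(time.Duration), op func() error) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			sleep(interval) // assumption: no wait after the final attempt
		}
	}
	return err
}

// countCalls retries an operation that fails `failures` times before
// succeeding, and reports how many times it was called. Sleeping is a no-op
// so the example finishes immediately.
func countCalls(maxAttempts, failures int) (calls int, err error) {
	err = retryConstant(maxAttempts, 5*time.Second, func(time.Duration) {}, func() error {
		calls++
		if calls <= failures {
			return errors.New("temporary failure")
		}
		return nil
	})
	return calls, err
}

func main() {
	fmt.Println(countCalls(3, 2)) // succeeds on the third attempt
	fmt.Println(countCalls(3, 5)) // gives up after three attempts
}
```

The first call returns nil because the third attempt succeeds. The second returns the last error because all three attempts fail.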

Giving up early

Sometimes, an error that your operation returns might not be recoverable, so we don't want to retry it. In this case, we can use the roko.Retrier.Break function. Break() instructs the retrier to stop once the current attempt finishes - note that it doesn't immediately halt the callback, so you still need to return from it.

r := roko.NewRetrier(
  roko.WithMaxAttempts(3),                           // Only try 3 times, then give up
  roko.WithStrategy(roko.Constant(5 * time.Second)), // Wait 5 seconds between attempts
)

err := r.Do(func(r *roko.Retrier) error {
  err := canFail()
  if errors.Is(err, errorUnrecoverable) {
    r.Break()  // Give up, we can't recover from this error
    return err // We still need to return from this function, Break() doesn't halt this callback
    // return nil would be appropriate too, if we don't want to handle this error further
  }
  return err // Any other error (or nil) is returned as usual, so the retrier can decide whether to retry
})

In this example, if canFail() returns an unrecoverable error, the result returned by the r.Do() call is the unrecoverable error.
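To make the control flow concrete, here is a small standard-library sketch of a loop with a break flag. The loop type, run and attemptsFor are hypothetical names invented for this illustration, and the waits between attempts are left out. It only shows that a break request ends the loop after the current callback returns, and that errors.Is also matches wrapped errors.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errUnrecoverable = errors.New("unrecoverable")
	errTemporary     = errors.New("temporary")
)

// loop is a hypothetical, pared-down retrier that only tracks a break flag.
type loop struct{ stopped bool }

// Break asks the loop to stop once the current callback has returned.
func (l *loop) Break() { l.stopped = true }

// run calls op up to maxAttempts times, stopping early on success or if op
// called Break. It returns the number of attempts made and the last error.
func run(maxAttempts int, op func(l *loop) error) (int, error) {
	l := &loop{}
	var err error
	attempts := 0
	for attempts < maxAttempts {
		attempts++
		err = op(l)
		if err == nil || l.stopped {
			break
		}
	}
	return attempts, err
}

// attemptsFor runs an operation that always fails with err, calling Break
// when err matches errUnrecoverable, and returns how many attempts were made.
func attemptsFor(err error) int {
	n, _ := run(3, func(l *loop) error {
		if errors.Is(err, errUnrecoverable) {
			l.Break()
		}
		return err
	})
	return n
}

func main() {
	wrapped := fmt.Errorf("calling service: %w", errUnrecoverable)
	fmt.Println(attemptsFor(wrapped))      // 1: the loop stops after the first attempt
	fmt.Println(attemptsFor(errTemporary)) // 3: every attempt is used
}
```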

Never give up!

Alternatively (or as well as!), you might want your retrier to never give up, and continue trying until it eventually succeeds. Roko can facilitate this through the TryForever() option.

r := roko.NewRetrier(
  roko.TryForever(),
  roko.WithStrategy(roko.Constant(5 * time.Second)), // Wait 5 seconds between attempts
)

err := r.Do(func(r *roko.Retrier) error {
  return canFail()
})

This will try to perform canFail() until it eventually succeeds.

Note that the Break() method mentioned above still works when TryForever() is enabled - this allows you to still exit when an unrecoverable error comes along.

Jitter

In order to avoid a thundering herd problem, roko can be configured to add jitter to its retry interval calculations. When jitter is used, the interval calculator will add a random length of time up to one second to each interval calculation.

r := roko.NewRetrier(
  roko.WithMaxAttempts(3),                           // Only try 3 times, then give up
  roko.WithJitter(),                                 // Add up to a second of jitter
  roko.WithStrategy(roko.Constant(5 * time.Second)), // Wait 5ish seconds between attempts
)

err := r.Do(func(r *roko.Retrier) error {
  return canFail()
})

In this example, everything is the same as the first example, but instead of always waiting 5 seconds, the retrier will wait for a random interval between 5 and 6 seconds. This can help reduce resource contention.
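As a rough illustration of the arithmetic, the sketch below adds a random delay of less than one second to a base interval. The names withJitter and allWithinOneSecond are hypothetical, and the exact bounds roko uses for its jitter are not reproduced here. The sketch draws from [0s, 1s) purely for simplicity.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// withJitter returns base plus a random extra delay in [0s, 1s).
// Assumption: the half-open range is a simplification for this sketch.
func withJitter(base time.Duration, rng *rand.Rand) time.Duration {
	return base + time.Duration(rng.Int63n(int64(time.Second)))
}

// allWithinOneSecond reports whether n jittered samples all fall in
// [base, base+1s).
func allWithinOneSecond(base time.Duration, n int) bool {
	rng := rand.New(rand.NewSource(1))
	for i := 0; i < n; i++ {
		d := withJitter(base, rng)
		if d < base || d >= base+time.Second {
			return false
		}
	}
	return true
}

func main() {
	rng := rand.New(rand.NewSource(time.Now().UnixNano()))
	for i := 0; i < 3; i++ {
		fmt.Println(withJitter(5*time.Second, rng)) // somewhere between 5s and 6s
	}
	fmt.Println(allWithinOneSecond(5*time.Second, 1000))
}
```

Because each client draws a different offset, retries that started together drift apart instead of hitting the target service at the same instant.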

Exponential Backoff

If a constant retry strategy isn't to your liking, roko can be configured to use exponential backoff instead, based on the number of attempts that have occurred so far:

r := roko.NewRetrier(
  roko.WithMaxAttempts(5),                               // Only try 5 times, then give up
  roko.WithStrategy(roko.Exponential(2*time.Second, 0)), // Wait (2 ^ attemptCount) + 0 seconds between attempts
)

err := r.Do(func(r *roko.Retrier) error {
  return canFail()
})

In this case, the amount of time the retrier will wait between attempts depends on how many attempts have passed - the first wait will be 2^0 == 1 second, then 2^1 == 2 seconds, then 2^2 == 4 seconds, and so on.

The second argument to roko.Exponential() is a constant adjustment - roko adds this duration to every calculated interval.
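The schedule is easy to check by hand. The following sketch computes adjustment + base^attempt seconds using only the standard library. exponentialInterval is a hypothetical helper written for this illustration, not a roko function, and it leaves jitter out.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// exponentialInterval returns adjustment + baseSeconds^attempt seconds, where
// attempt counts the attempts made so far (starting at 0).
func exponentialInterval(baseSeconds float64, adjustment time.Duration, attempt int) time.Duration {
	seconds := math.Pow(baseSeconds, float64(attempt))
	return adjustment + time.Duration(seconds)*time.Second
}

func main() {
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Println(attempt, exponentialInterval(2, 0, attempt))
	}
}
```

With a base of 2 and no adjustment, this prints waits of 1s, 2s, 4s, 8s and 16s. A non-zero adjustment shifts every wait by the same amount.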

Using a custom strategy

If the two retry strategies built into roko (Constant and Exponential) aren't sufficient, you can define your own - the roko.WithStrategy method will accept anything that returns a tuple of (roko.Strategy, string). For example, we could implement a custom Linear strategy, that multiplies the attempt count by a fixed number:

func Linear(gradient float64, yIntercept float64) (roko.Strategy, string) {
	return func(r *roko.Retrier) time.Duration {
		// Scale to nanoseconds before converting, so fractional seconds aren't truncated
		seconds := (gradient * float64(r.AttemptCount())) + yIntercept
		return time.Duration(seconds * float64(time.Second))
	}, "linear" // The second element of the return tuple is the name of the strategy
}

err := roko.NewRetrier(
  roko.WithMaxAttempts(3),             // Only try 3 times, then give up
  roko.WithStrategy(Linear(0.5, 5.0)), // Wait 5 seconds + half of the attempt count seconds
).Do(func(r *roko.Retrier) error {
  return canFail()
})
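For reference, here is the linear calculation on its own, without the retrier. linearInterval is a hypothetical helper for this illustration. It multiplies by float64(time.Second) before converting, because converting the float to a time.Duration first would truncate 5.5 to 5.

```go
package main

import (
	"fmt"
	"time"
)

// linearInterval returns (gradient*attempt + yIntercept) seconds, keeping
// fractional seconds by scaling before converting to a time.Duration.
func linearInterval(gradient, yIntercept float64, attempt int) time.Duration {
	seconds := gradient*float64(attempt) + yIntercept
	return time.Duration(seconds * float64(time.Second))
}

func main() {
	for attempt := 0; attempt < 3; attempt++ {
		fmt.Println(attempt, linearInterval(0.5, 5.0, attempt))
	}
}
```

With a gradient of 0.5 and an intercept of 5, attempts 0, 1 and 2 give waits of 5s, 5.5s and 6s.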

Manually setting the next interval

Sometimes you only know the desired interval after each try, e.g. a rate-limited API may include a Retry-After header. For these cases, the SetNextInterval(time.Duration) method can be used. It will apply only to the next interval, and then revert to the configured strategy unless called again on the next attempt.

// manually specify interval during each try, defaulting to 10 seconds
// (apiCall, response and HttpTooManyRequests stand in for your own HTTP client code)
roko.NewRetrier(
  roko.WithStrategy(roko.Constant(10 * time.Second)),
  roko.WithMaxAttempts(10),
).Do(func(r *roko.Retrier) error {
  response := apiCall() // may be rate limited

  if err := response.HTTPError(); err != nil {
    if response.Status == HttpTooManyRequests {
      // Only override the interval when the header parses as a whole number of seconds
      if retryAfter, convErr := strconv.Atoi(response.Header("Retry-After")); convErr == nil {
        r.SetNextInterval(time.Duration(retryAfter) * time.Second) // respect the API
      }
    }
    return err
  }
  return nil
})
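The header handling can be pulled out into a small helper that is easy to test by itself. The sketch below is one possible shape: retryAfter is a hypothetical function, it only understands the delay-in-seconds form of Retry-After, and it falls back to a caller-supplied default for anything else (including the HTTP-date form the header may also use).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// retryAfter converts a Retry-After header value given in whole seconds into
// a time.Duration. It returns fallback when the value is empty, malformed or
// negative. The HTTP-date form of the header is not handled here.
func retryAfter(header string, fallback time.Duration) time.Duration {
	seconds, err := strconv.Atoi(strings.TrimSpace(header))
	if err != nil || seconds < 0 {
		return fallback
	}
	return time.Duration(seconds) * time.Second
}

func main() {
	fmt.Println(retryAfter("30", 10*time.Second))                            // 30s
	fmt.Println(retryAfter("", 10*time.Second))                              // 10s
	fmt.Println(retryAfter("Wed, 21 Oct 2015 07:28:00 GMT", 10*time.Second)) // 10s
}
```

The returned duration is what you would hand to SetNextInterval inside the callback.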

Retries and Testing

To speed up tests, roko can be configured with a custom sleep function:

err := roko.NewRetrier(
  roko.WithStrategy(roko.Constant(50000 * time.Hour)), // Wait a very long time between attempts...
  roko.WithSleepFunc(func(time.Duration) {}),          // ...but don't actually sleep
  roko.WithMaxAttempts(3),
).Do(func(r *roko.Retrier) error {
  return canFail()
})

The actual function passed to WithSleepFunc() is arbitrary, but using a noop is probably going to be the most useful.
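A no-op is not the only useful choice: a sleep function that records its argument lets a test assert which intervals were requested without waiting for them. The sketch below shows the idea against a hypothetical standard-library retry loop (retry and recordedSleeps are names made up for this example). The same kind of recording function could be passed to WithSleepFunc.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry is a hypothetical stand-in for a retrier with an injectable sleep
// function. It sleeps between failed attempts but not after the last one.
func retry(maxAttempts int, interval time.Duration, sleep func(time.Duration), op func() error) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			sleep(interval)
		}
	}
	return err
}

// recordedSleeps runs an always-failing operation and returns every duration
// the loop asked to sleep for, without actually waiting.
func recordedSleeps(maxAttempts int, interval time.Duration) []time.Duration {
	var slept []time.Duration
	record := func(d time.Duration) { slept = append(slept, d) }
	_ = retry(maxAttempts, interval, record, func() error {
		return errors.New("always fails")
	})
	return slept
}

func main() {
	// Three attempts means two requested sleeps of 50000 hours each, yet the
	// program returns immediately.
	fmt.Println(recordedSleeps(3, 50000*time.Hour))
}
```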

For deterministically-generated jitter, the Retrier also accepts a *rand.Rand:

err := roko.NewRetrier(
  roko.WithStrategy(roko.Constant(5 * time.Second)),
  roko.WithRand(rand.New(rand.NewSource(12345))), // Generate the same jitters every time, using a seeded random number generator
  roko.WithMaxAttempts(3),
  roko.WithJitter(),
).Do(func(r *roko.Retrier) error {
  return canFail()
})

The random number generator is only used for jitter, so it only makes sense to pass one if you're using jitter.
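The reason a seeded generator helps in tests is that math/rand produces the same sequence for the same seed. The sketch below demonstrates that property with hypothetical helpers (jitters and sameSequence). It does not depend on how roko turns random numbers into jitter.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitters returns n pseudo-random delays in [0s, 1s) drawn from a generator
// seeded with seed.
func jitters(seed int64, n int) []time.Duration {
	rng := rand.New(rand.NewSource(seed))
	out := make([]time.Duration, n)
	for i := range out {
		out[i] = time.Duration(rng.Int63n(int64(time.Second)))
	}
	return out
}

// sameSequence reports whether two slices hold identical durations.
func sameSequence(a, b []time.Duration) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	first := jitters(12345, 5)
	second := jitters(12345, 5)
	fmt.Println(sameSequence(first, second)) // true: same seed, same jitters
}
```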

What's in a name?

Roko is named after Josevata Rokocoko, a Fijian-New Zealand rugby player, and one of the best to ever do it. He scored a lot of tries, thus, he's a re-trier.

Depending on who you ask, it's also the owner of a basilisk.

Contributing

By all means, please contribute! We'd love to have your input. If you run into a bug, feel free to open an issue, and if you find missing functionality, please don't hesitate to open a PR. If you have a weird and wonderful retry strategy you'd like to add, we'd love to see it.

Looking for a great CI provider? Look no further.

Buildkite is a platform for running fast, secure, and scalable CI pipelines on your own infrastructure.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DoFunc added in v1.2.0

func DoFunc[T any](ctx context.Context, r *Retrier, callback func(*Retrier) (T, error)) (T, error)

DoFunc is a helper for retrying callback functions that return a value or an error. It returns the last value returned by a call to callback, and reports an error if none of the calls succeeded. (Note this is not a method of Retrier, since methods can't be generic.)
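The shape of such a helper can be sketched with Go generics and no dependencies. This is an illustration of the pattern only, not DoFunc itself: retryValue and fetchOnThird are hypothetical, and the context argument and the waits between attempts are omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// retryValue calls fn up to maxAttempts times and returns the value and error
// from the last call. It stops as soon as a call succeeds.
func retryValue[T any](maxAttempts int, fn func(attempt int) (T, error)) (T, error) {
	var (
		v   T
		err error
	)
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if v, err = fn(attempt); err == nil {
			return v, nil
		}
	}
	return v, err
}

// fetchOnThird is a hypothetical operation that succeeds on its third call.
func fetchOnThird(attempt int) (string, error) {
	if attempt < 3 {
		return "", errors.New("not ready")
	}
	return "payload", nil
}

func main() {
	fmt.Println(retryValue(3, fetchOnThird)) // payload <nil>
	fmt.Println(retryValue(2, fetchOnThird)) // empty value and the last error
}
```

A free function is needed here because a Go method cannot declare its own type parameters, which matches the note in the DoFunc description.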

func DoFunc2 added in v1.2.0

func DoFunc2[T1, T2 any](ctx context.Context, r *Retrier, callback func(*Retrier) (T1, T2, error)) (T1, T2, error)

DoFunc2 is a helper for retrying callback functions that return two values or an error. It returns the last values returned by a call to callback, and reports an error if none of the calls succeeded. (Note this is not a method of Retrier, since methods can't be generic.)

func DoFunc3 added in v1.2.0

func DoFunc3[T1, T2, T3 any](ctx context.Context, r *Retrier, callback func(*Retrier) (T1, T2, T3, error)) (T1, T2, T3, error)

DoFunc3 is a helper for retrying callback functions that return 3 values or an error. It returns the last values returned by a call to callback, and reports an error if none of the calls succeeded. (Note this is not a method of Retrier, since methods can't be generic.)

func TryForever

func TryForever() retrierOpt

TryForever causes the retrier to never give up retrying, until either the operation succeeds, or the operation calls retrier.Break().

func WithJitter

func WithJitter() retrierOpt

WithJitter enables jitter on the retrier, which will cause all of the retries to wait a random amount of time < 1 second. The idea here is to avoid thundering herds - retries that are in parallel will happen at slightly different times when jitter is enabled, whereas if jitter is disabled, all the retries might happen at the same time, causing further load on the system that we're trying to do something with.

func WithMaxAttempts

func WithMaxAttempts(maxAttempts int) retrierOpt

WithMaxAttempts sets the maximum number of retries that a retrier will attempt

func WithRand

func WithRand(rand *rand.Rand) retrierOpt

func WithSleepFunc

func WithSleepFunc(f func(time.Duration)) retrierOpt

WithSleepFunc sets the function that the retrier uses to sleep between successive attempts. Only really useful for testing.

func WithStrategy

func WithStrategy(strategy Strategy, strategyType string) retrierOpt

WithStrategy sets the retry strategy that the retrier will use to determine how long to wait between retries

Types

type Retrier

type Retrier struct {
	// contains filtered or unexported fields
}

func NewRetrier

func NewRetrier(opts ...retrierOpt) *Retrier

NewRetrier creates a new instance of the Retrier struct. Pass in retrierOpt functions to customise the behaviour of the retrier

func (*Retrier) AttemptCount

func (r *Retrier) AttemptCount() int

func (*Retrier) Break

func (r *Retrier) Break()

Break causes the Retrier to stop retrying after it completes the next retry cycle

func (*Retrier) Do

func (r *Retrier) Do(callback func(*Retrier) error) error

Do is the core loop of a Retrier. It defines the operation that the Retrier will attempt to perform, retrying it if necessary. Calling retrier.Do(someFunc) will cause the Retrier to attempt to call the function, and if it returns an error, retry it using the settings provided to it.

func (*Retrier) DoWithContext added in v1.0.2

func (r *Retrier) DoWithContext(ctx context.Context, callback func(*Retrier) error) error

DoWithContext is a context-aware variant of Do.
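The documentation above does not spell out how cancellation is handled, so the following is only a sketch of the usual standard-library pattern for a context-aware wait between attempts. sleepContext and cancelledSleep are hypothetical names, and nothing here is taken from roko's source.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sleepContext waits for d, returning early with ctx.Err() if ctx is done
// before the timer fires.
func sleepContext(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}

// cancelledSleep calls sleepContext with an already-cancelled context, so it
// returns immediately regardless of d.
func cancelledSleep(d time.Duration) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return sleepContext(ctx, d)
}

func main() {
	fmt.Println(cancelledSleep(time.Hour))                               // context canceled
	fmt.Println(sleepContext(context.Background(), 10*time.Millisecond)) // <nil>
}
```

A retry loop built on a wait like this can stop promptly when the caller cancels, rather than sleeping out the rest of a long interval.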

func (*Retrier) Jitter

func (r *Retrier) Jitter() time.Duration

Jitter returns a duration in the interval (0, 1] s if jitter is enabled, or 0 s if it's not

func (*Retrier) MarkAttempt

func (r *Retrier) MarkAttempt()

MarkAttempt increments the attempt count for the retrier. This affects ShouldGiveUp, and also affects the retry interval for the Exponential retry strategy.

func (*Retrier) NextInterval

func (r *Retrier) NextInterval() time.Duration

NextInterval returns the next interval that the retrier will use. Behind the scenes, it calls the function generated by the retrier's strategy.

func (*Retrier) SetNextInterval added in v1.1.0

func (r *Retrier) SetNextInterval(d time.Duration)

SetNextInterval overrides the strategy for the interval before the next try. The override applies to that one interval only.

func (*Retrier) ShouldGiveUp

func (r *Retrier) ShouldGiveUp() bool

ShouldGiveUp returns whether the retrier should stop trying to do the thing it's been asked to do. It returns true if the retry count is greater than r.maxAttempts, or if r.Break() has been called. It returns false if the retrier is supposed to try forever.

func (*Retrier) String

func (r *Retrier) String() string

type Strategy

type Strategy func(*Retrier) time.Duration

func Constant

func Constant(interval time.Duration) (Strategy, string)

Constant returns a strategy that always returns the same value, the interval passed in as an arg to the function. Semantically, when this is used with a roko.Retrier, it means that the retrier will always wait the given duration before retrying.

func Exponential

func Exponential(base, adjustment time.Duration) (Strategy, string)

Exponential returns a strategy that increases exponentially based on the number of attempts the retrier has made. It uses the calculation: adjustment + (base ** attempts) + jitter

func ExponentialSubsecond added in v1.1.0

func ExponentialSubsecond(initial time.Duration) (Strategy, string)

The 16 exponent-divisor is arbitrarily chosen to scale curves nicely for a reasonable range of initial delays and number of attempts (e.g. 1 second, 10 attempts).

Examples of a small, medium and large initial delay growing over 10 attempts:

100ms → 133ms → 177ms → 237ms → 316ms → 421ms → 562ms → 749ms → 1000ms
1.0s  → 1.5s  → 2.4s  → 3.7s  → 5.6s  → 8.7s  → 13.3s → 20.6s → 31.6s
5s    → 9s    → 14s   → 25s   → 42s   → 72s   → 120s  → 208s  → 354s
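The rows above appear consistent with raising the initial delay, measured in milliseconds, to the power 1 + attempt/16. That formula is an inference from the listed values and the mention of a 16 exponent-divisor, not something quoted from the library, so treat the sketch below as an approximation. subsecondInterval is a hypothetical helper, and its rounding may differ from the table by a millisecond.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// subsecondInterval returns initial^(1 + attempt/16), computed in
// milliseconds and rounded to the nearest millisecond.
// Assumption: this formula is inferred from the example rows, not taken from
// the library's source.
func subsecondInterval(initial time.Duration, attempt int) time.Duration {
	ms := float64(initial) / float64(time.Millisecond)
	grown := math.Pow(ms, 1+float64(attempt)/16)
	return time.Duration(math.Round(grown)) * time.Millisecond
}

func main() {
	initials := []time.Duration{100 * time.Millisecond, time.Second, 5 * time.Second}
	for _, initial := range initials {
		for attempt := 0; attempt <= 8; attempt++ {
			fmt.Print(subsecondInterval(initial, attempt), " ")
		}
		fmt.Println()
	}
}
```

Under this assumption a 100ms initial delay reaches one second at the eighth step, which lines up with the end of the first row.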
