suture

Version: v0.11.3-debian
Published: May 11, 2015 License: MIT, MPL-2.0 Imports: 7 Imported by: 0

README

Suture

Suture provides Erlang-ish supervisor trees for Go. "Supervisor trees" -> "sutree" -> "suture" -> holds your code together when it's trying to die.

This is intended to be a production-quality library going into code that I will be very early on the phone tree to support when it goes down. However, it has not been deployed into something quite that serious yet. (I will update this statement when that changes.)

It is intended to deal gracefully with the real failure cases that can occur with supervision trees (such as burning all your CPU time endlessly restarting dead services), while also making no unnecessary demands on the "service" code, and providing hooks to perform adequate logging within a production environment.

A blog post describing the design decisions is available at http://www.jerf.org/iri/post/2930 .

This module is fully covered with godoc, including an example, usage, and everything else you might expect from a README.md on GitHub. (DRY.)

This is not currently tagged with particular git tags for Go, as it is still considered alpha code. As I move this into production and feel more confident about it, I'll give it relevant tags.

Code Signing

Starting with the commit after ac7cf8591b, I will be signing this repository with the "jerf" keybase account.

Aspiration

One of the big wins the Erlang community has with their pervasive OTP support is that it makes it easy for them to distribute libraries that easily fit into the OTP paradigm. It ought to someday be considered a good idea to distribute libraries that provide some sort of supervisor tree functionality out of the box. It is possible to provide this functionality without explicitly depending on the Suture library.
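Here is a sketch of what that could look like, using a hypothetical library type named Counter. Go interfaces are satisfied implicitly, so any type with Serve() and Stop() methods already fits the Service interface documented below without importing suture. The local `service` interface is a stand-in that mirrors suture's Service so the example compiles on its own.

```go
package main

import "fmt"

// service mirrors the method set of suture's Service interface. It is
// declared locally only so this sketch compiles without importing suture.
type service interface {
	Serve()
	Stop()
}

// Counter is a hypothetical library type. It never imports suture, yet a
// program that does could hand a *Counter to (*suture.Supervisor).Add.
type Counter struct {
	Next chan int
	stop chan struct{}
}

// NewCounter returns a Counter ready to be served.
func NewCounter() *Counter {
	return &Counter{Next: make(chan int), stop: make(chan struct{})}
}

// Serve hands out increasing integers on Next until Stop is called.
// Restarting Serve starts counting from zero again.
func (c *Counter) Serve() {
	for n := 0; ; n++ {
		select {
		case c.Next <- n:
		case <-c.stop:
			return
		}
	}
}

// Stop blocks until Serve has taken the request, after which Serve
// returns without handing out further values.
func (c *Counter) Stop() {
	c.stop <- struct{}{}
}

// Compile-time check that *Counter satisfies the Service-shaped interface.
var _ service = (*Counter)(nil)

func main() {
	c := NewCounter()
	go c.Serve()
	fmt.Println(<-c.Next, <-c.Next) // 0 1
	c.Stop()
}
```

Because the dependency points only at a method set, the library stays usable with or without a supervisor tree.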

Documentation

Overview

Package suture provides Erlang-like supervisor trees.

This implements Erlang-esque supervisor trees, as adapted for Go. This is intended to be an industrial-strength implementation, but it has not yet been deployed in a hostile environment. (It's headed there, though.)

Supervisor Tree -> SuTree -> suture -> holds your code together when it's trying to fall apart.

Why use Suture?

  • You want to write bullet-resistant services that will remain available despite unforeseen failure.
  • You need the code to be smart enough not to consume 100% of the CPU restarting things.
  • You want to easily compose multiple such services in one program.
  • You want the Erlang programmers to stop lording their supervision trees over you.

Suture has 100% test coverage, and is golint clean. This doesn't prove it free of bugs, but it shows I care.

A blog post describing the design decisions is available at http://www.jerf.org/iri/post/2930 .

Using Suture

To idiomatically use Suture, create a Supervisor which is your top level "application" supervisor. This will often occur in your program's "main" function.

Create services that implement the Service interface, and .Add() them to your Supervisor. Supervisors are also services, so you can nest them into a tree structure, depending on the exact combination of restarts you want.

Finally, as what is probably the last line of your main() function, call .Serve() on your top level supervisor. This will start all the services you've defined.

See the Example for an example, using a simple service that serves out incrementing integers.

Constants

This section is empty.

Variables

var ErrWrongSupervisor = errors.New("wrong supervisor for this service token, no service removed")

ErrWrongSupervisor is returned by the (*Supervisor).Remove method if you pass a ServiceToken from the wrong Supervisor.

Functions

This section is empty.

Types

type Service

type Service interface {
	Serve()
	Stop()
}

Service is the interface that describes a service to a Supervisor.

Serve Method

The Serve method is called by a Supervisor to start the service. The service should execute within the goroutine that this is called in. If this function either returns or panics, the Supervisor will call it again.

A Serve method SHOULD do as much cleanup of the state as possible, to prevent any corruption in the previous state from crashing the service again.
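The restart contract can be shown with a toy loop that calls Serve again whenever it returns or panics. This is not suture's implementation, and the `flaky` type and `superviseN` helper are hypothetical. A real supervisor keeps restarting indefinitely and applies the failure backoff described under New. This loop stops after a fixed number of attempts so that it terminates.

```go
package main

import "fmt"

// flaky is a hypothetical service whose first two Serve calls panic.
type flaky struct{ calls int }

func (f *flaky) Serve() {
	f.calls++
	if f.calls <= 2 {
		panic(fmt.Sprintf("simulated failure %d", f.calls))
	}
}

// runOnce calls serve and returns the recovered panic value, or nil if
// serve returned normally.
func runOnce(serve func()) (panicked interface{}) {
	defer func() { panicked = recover() }()
	serve()
	return nil
}

// superviseN calls Serve up to attempts times, restarting it whether it
// returned or panicked, and reports how many calls panicked.
func superviseN(svc interface{ Serve() }, attempts int) (panics int) {
	for i := 0; i < attempts; i++ {
		if p := runOnce(svc.Serve); p != nil {
			fmt.Println("service panicked, restarting:", p)
			panics++
		}
	}
	return panics
}

func main() {
	f := &flaky{}
	panics := superviseN(f, 3)
	fmt.Println(panics, f.calls) // 2 3
}
```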

Stop Method

This method is used by the supervisor to stop the service. Calling this directly on a Service given to a Supervisor will simply result in the Service being restarted; use the Supervisor's .Remove(ServiceToken) method to stop a service. A supervisor will call .Stop() only once. Thus, it may be as destructive as it likes to get the service to stop.

Once Stop has been called on a Service, the Service SHOULD NOT be reused in any other supervisor! Because of the impossibility of guaranteeing that the service has actually stopped in Go, you can't prove that you won't be starting two goroutines using the exact same memory to store state, causing completely unpredictable behavior.

Stop should not return until the service has actually stopped. "Stopped" here is defined as "the service will stop servicing any further requests in the future". For instance, a common implementation is to receive a message on a dedicated "stop" channel and return immediately. Once the stop command has been processed, the service is stopped.
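Here is a minimal sketch of the stop-channel pattern with a hypothetical `worker` type. The stop channel carries a reply channel, so Stop can block until Serve has finished any cleanup and exited. This is one reasonable way to meet the "don't return until stopped" rule, not the only one.

```go
package main

import "fmt"

// worker is a hypothetical service. Its stop channel carries a reply
// channel so Stop can wait until Serve has finished cleaning up.
type worker struct {
	jobs    chan string
	stop    chan chan struct{}
	handled []string
}

func newWorker() *worker {
	return &worker{jobs: make(chan string), stop: make(chan chan struct{})}
}

func (w *worker) Serve() {
	for {
		select {
		case j := <-w.jobs:
			w.handled = append(w.handled, j)
		case ack := <-w.stop:
			// Any cleanup goes here, before acknowledging.
			close(ack)
			return
		}
	}
}

// Stop returns only after Serve has acknowledged, so once it returns
// Serve has exited and will handle no further jobs.
func (w *worker) Stop() {
	ack := make(chan struct{})
	w.stop <- ack
	<-ack
}

func main() {
	w := newWorker()
	go w.Serve()
	w.jobs <- "a"
	w.jobs <- "b"
	w.Stop()
	fmt.Println(w.handled) // [a b]
}
```

Closing `ack` inside Serve also orders every earlier write to `handled` before Stop returns, so reading `handled` afterwards is race-free.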

Another common Stop implementation is to forcibly close an open socket or other resource, which will cause detectable errors to manifest in the service code. Bear in mind that using this approach perfectly correctly requires a bit more work, to handle the chance of a Stop command arriving before the resource has been created.
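Here is a sketch of the close-the-resource approach that also handles a Stop arriving before the resource exists. It assumes a hypothetical `pipeService` that reads from one end of an in-memory `net.Pipe`, which stands in for a socket. A mutex guards the connection and a `stopped` flag. If Stop arrives before Serve has created the connection, the flag records it and Serve exits at once instead of blocking forever. Closing the connection makes the pending Read return an error, which ends Serve.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// pipeService is hypothetical: it reads from one end of an in-memory
// net.Pipe, standing in for a socket. Stop closes that end to unblock Read.
type pipeService struct {
	mu      sync.Mutex
	conn    net.Conn // nil until Serve has created it
	stopped bool     // set by Stop, possibly before conn exists
}

func (p *pipeService) Serve() {
	server, client := net.Pipe()
	defer client.Close()

	// Publish the connection under the lock; if Stop already ran, give up.
	p.mu.Lock()
	if p.stopped {
		p.mu.Unlock()
		server.Close()
		return
	}
	p.conn = server
	p.mu.Unlock()

	buf := make([]byte, 64)
	for {
		// Once Stop closes the connection, Read returns an error.
		if _, err := server.Read(buf); err != nil {
			return
		}
	}
}

func (p *pipeService) Stop() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.stopped = true
	if p.conn != nil {
		p.conn.Close()
	}
}

func main() {
	// Stop arrives before Serve has created its connection.
	early := &pipeService{}
	early.Stop()
	early.Serve() // returns at once instead of blocking forever
	fmt.Println("early stop handled")

	// Stop races with Serve; either path ends Serve.
	svc := &pipeService{}
	done := make(chan struct{})
	go func() {
		svc.Serve()
		close(done)
	}()
	svc.Stop()
	<-done
	fmt.Println("service stopped")
}
```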

If a service does not Stop within the supervisor's timeout duration, a log entry will be made with a descriptive string to that effect. This does not guarantee that the service is hung; it may still get around to being properly stopped in the future. Until the service is fully stopped, both the service and the spawned goroutine trying to stop it will be "leaked".

Stringer Interface

It is not mandatory to implement the fmt.Stringer interface on your service, but if your Service does happen to implement that, the log messages that describe your service will use that when naming the service. Otherwise, you'll see the GoString of your service object, obtained via fmt.Sprintf("%#v", service).
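Here is a small sketch of that naming rule using a hypothetical `serviceName` helper. A type assertion to `fmt.Stringer` picks String() when the service implements it. Otherwise the helper falls back to `fmt.Sprintf("%#v", ...)`.

```go
package main

import "fmt"

// serviceName is a hypothetical helper mimicking the rule described above.
func serviceName(service interface{}) string {
	if s, ok := service.(fmt.Stringer); ok {
		return s.String()
	}
	return fmt.Sprintf("%#v", service)
}

// named implements fmt.Stringer; anonymous does not.
type named struct{}

func (named) String() string { return "named service" }

type anonymous struct{ port int }

func main() {
	fmt.Println(serviceName(named{}))              // named service
	fmt.Println(serviceName(&anonymous{port: 80})) // &main.anonymous{port:80}
}
```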

type ServiceToken

type ServiceToken struct {
	// contains filtered or unexported fields
}

ServiceToken is an opaque identifier that can be used to terminate a service that has been Add()ed to a Supervisor.

type Spec

type Spec struct {
	Log              func(string)
	FailureDecay     float64
	FailureThreshold float64
	FailureBackoff   time.Duration
	Timeout          time.Duration
}

Spec is used to pass arguments to the New function to create a supervisor. See the New function for full documentation.

type Supervisor

type Supervisor struct {
	Name string
	// contains filtered or unexported fields
}

Supervisor is the core type of the module that represents a Supervisor.

Supervisors should be constructed either by New or NewSimple.

Once constructed, a Supervisor should be started in one of three ways:

  1. Calling .Serve().
  2. Calling .ServeBackground().
  3. Adding it to an existing Supervisor.

Calling Serve will cause the supervisor to run until it is shut down by an external user calling Stop() on it. If that never happens, it simply runs forever. I suggest creating your services in Supervisors, then making a Serve() call on your top-level Supervisor the last line of your main func.

Calling ServeBackground will CORRECTLY start the supervisor running in a new goroutine. You do not want to just:

go supervisor.Serve()

because that will briefly create a race condition as it starts up, if you try to .Add() services immediately afterward.
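The readiness guarantee can be pictured with a toy event loop. It is not suture's implementation, and every name in it is hypothetical. ServeBackground launches Serve in a goroutine and then sends on an unbuffered channel. That send can only complete once the loop is actually receiving. In suture, Add behaves differently depending on whether the supervisor is already running, and that is presumably where the race with a bare `go supervisor.Serve()` comes from. This toy skips that distinction and shows only the handshake.

```go
package main

import "fmt"

// Hypothetical control messages for a toy supervisor loop.
type syncMsg struct{}
type addMsg struct{ name string }
type listMsg struct{ reply chan []string }
type stopMsg struct{}

// miniSupervisor is a toy sketch. Its state is owned by the Serve
// goroutine and only changed in response to control messages.
type miniSupervisor struct {
	control  chan interface{} // unbuffered: a send completes only when Serve receives
	services []string
}

func newMiniSupervisor() *miniSupervisor {
	return &miniSupervisor{control: make(chan interface{})}
}

func (s *miniSupervisor) Serve() {
	for {
		switch m := (<-s.control).(type) {
		case syncMsg:
			// Receiving this proves the loop is running; nothing else to do.
		case addMsg:
			s.services = append(s.services, m.name)
		case listMsg:
			m.reply <- append([]string(nil), s.services...)
		case stopMsg:
			return
		}
	}
}

// ServeBackground starts Serve in a new goroutine and returns only after
// the loop has accepted a message, so later calls can rely on it running.
func (s *miniSupervisor) ServeBackground() {
	go s.Serve()
	s.control <- syncMsg{}
}

func (s *miniSupervisor) Add(name string) { s.control <- addMsg{name} }

func (s *miniSupervisor) Services() []string {
	reply := make(chan []string)
	s.control <- listMsg{reply}
	return <-reply
}

func (s *miniSupervisor) Stop() { s.control <- stopMsg{} }

func main() {
	s := newMiniSupervisor()
	s.ServeBackground()
	s.Add("incrementor")
	fmt.Println(s.Services()) // [incrementor]
	s.Stop()
}
```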

func New

func New(name string, spec Spec) (s *Supervisor)

New is the full constructor function for a supervisor.

The name is a friendly human name for the supervisor, used in logging. Suture does not care if this is unique, but it is good for your sanity if it is.

If not set, the following values are used:

  • Log: A function is created that uses log.Print.
  • FailureDecay: 30 seconds
  • FailureThreshold: 5 failures
  • FailureBackoff: 15 seconds
  • Timeout: 10 seconds

The Log function will be called when errors occur. Suture will log the following:

  • When a service has failed, with a descriptive message about the current backoff status, and whether it was immediately restarted
  • When the supervisor has gone into its backoff mode, and when it exits it
  • When a service fails to stop

The FailureDecay, FailureThreshold, and FailureBackoff values control how failures are handled, in order to avoid the supervisor failure case where the program does nothing but restart failed services. If you do not care how failures behave, the default values should be fine for the vast majority of services, but if you want the details:

The supervisor tracks the number of failures that have occurred, with an exponential decay on the count. Every FailureDecay seconds, the number of failures that have occurred is cut in half. (This is done smoothly with an exponential function.) When a failure occurs, the number of failures is incremented by one. When the number of failures passes the FailureThreshold, the supervisor waits for the FailureBackoff duration before attempting any further restarts, at which point it resets its failure count to zero.
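The counting rule can be written as failures(now) = failures(last) × 0.5^((now − last) / FailureDecay) + 1 for each new failure. Below is a sketch of that rule. The `failureTracker` type and its field names are hypothetical and do not describe suture's internals.

```go
package main

import (
	"fmt"
	"math"
)

// decayedFailures halves count every `decay` seconds, smoothly.
func decayedFailures(count, elapsed, decay float64) float64 {
	return count * math.Pow(0.5, elapsed/decay)
}

// failureTracker is a hypothetical model of the policy described above.
type failureTracker struct {
	failures  float64
	lastFail  float64 // time of the previous failure, in seconds
	decay     float64 // FailureDecay, in seconds
	threshold float64 // FailureThreshold
}

// fail decays the count to now, adds one, and reports whether the count
// has passed the threshold. A supervisor would then wait FailureBackoff
// before restarting anything, and afterwards reset the count to zero.
func (t *failureTracker) fail(now float64) bool {
	t.failures = decayedFailures(t.failures, now-t.lastFail, t.decay) + 1
	t.lastFail = now
	return t.failures > t.threshold
}

func main() {
	fmt.Println(decayedFailures(4, 30, 30)) // 2
	fmt.Println(decayedFailures(4, 60, 30)) // 1

	tr := &failureTracker{decay: 30, threshold: 5}
	for i := 1; i <= 6; i++ {
		fmt.Println(i, tr.fail(0)) // only the sixth failure prints true
	}
}
```

With the defaults, six failures in quick succession are needed to pass a threshold of 5. If the same six failures are spread over a few minutes, the count decays and backoff never triggers.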

Timeout is how long Suture will wait for a service to properly terminate.

Example (Simple)
package main

import (
	"fmt"

	"github.com/thejerf/suture"
)

type Incrementor struct {
	current int
	next    chan int
	stop    chan bool
}

func (i *Incrementor) Stop() {
	fmt.Println("Stopping the service")
	i.stop <- true
}

func (i *Incrementor) Serve() {
	for {
		select {
		case i.next <- i.current:
			i.current++
		case <-i.stop:
			// We sync here just to guarantee the output of "Stopping the service",
			// so this passes the test reliably.
			// Most services would simply "return" here.
			i.stop <- true
			return
		}
	}
}

func main() {
	supervisor := suture.NewSimple("Supervisor")
	service := &Incrementor{0, make(chan int), make(chan bool)}
	supervisor.Add(service)

	// ServeBackground starts its own goroutine, so it is not called with "go".
	supervisor.ServeBackground()

	fmt.Println("Got:", <-service.next)
	fmt.Println("Got:", <-service.next)
	supervisor.Stop()

	// We sync here just to guarantee the output of "Stopping the service"
	<-service.stop
}
Output:

Got: 0
Got: 1
Stopping the service

func NewSimple

func NewSimple(name string) *Supervisor

NewSimple is a convenience function to create a supervisor with just a name and the sensible defaults.

func (*Supervisor) Add

func (s *Supervisor) Add(service Service) ServiceToken

Add adds a service to this supervisor.

If the supervisor is currently running, the service will be started immediately. If the supervisor is not currently running, the service will be started when the supervisor is.

The returned ServiceToken may be passed to the Remove method of the Supervisor to terminate the service.

func (*Supervisor) Remove

func (s *Supervisor) Remove(id ServiceToken) error

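Remove will remove the given service from the Supervisor, and attempt to Stop() it. The ServiceToken comes from the Add() call. If the token was issued by a different Supervisor, Remove returns ErrWrongSupervisor and removes nothing.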
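Here is one way such a check could work, sketched under the assumption that each token simply records which supervisor issued it. ServiceToken's real fields are unexported and not documented here. The `registry` and `token` types are therefore hypothetical stand-ins, and only the error message is copied from ErrWrongSupervisor.

```go
package main

import (
	"errors"
	"fmt"
)

// errWrongSupervisor copies the message of suture's ErrWrongSupervisor.
var errWrongSupervisor = errors.New("wrong supervisor for this service token, no service removed")

// token is a hypothetical opaque token that records which registry issued it.
type token struct {
	registryID uint32
	serviceID  uint32
}

// registry is a toy stand-in for a supervisor's service table.
type registry struct {
	id       uint32
	nextID   uint32
	services map[uint32]string
}

func newRegistry(id uint32) *registry {
	return &registry{id: id, services: make(map[uint32]string)}
}

// Add stores a service and returns a token bound to this registry.
func (r *registry) Add(name string) token {
	r.nextID++
	r.services[r.nextID] = name
	return token{registryID: r.id, serviceID: r.nextID}
}

// Remove refuses tokens issued by another registry. A real supervisor
// would also call the removed service's Stop method.
func (r *registry) Remove(t token) error {
	if t.registryID != r.id {
		return errWrongSupervisor
	}
	delete(r.services, t.serviceID)
	return nil
}

func main() {
	a, b := newRegistry(1), newRegistry(2)
	tok := a.Add("incrementor")
	fmt.Println(b.Remove(tok)) // wrong supervisor for this service token, no service removed
	err := a.Remove(tok)
	fmt.Println(err, len(a.services)) // <nil> 0
}
```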

func (*Supervisor) Serve

func (s *Supervisor) Serve()

Serve starts the supervisor. You should call this on the top-level supervisor, but nothing else.

func (*Supervisor) ServeBackground

func (s *Supervisor) ServeBackground()

ServeBackground starts running a supervisor in its own goroutine. This method does not return until it is safe to use .Add() on the Supervisor.

func (*Supervisor) Services

func (s *Supervisor) Services() []Service

Services returns a []Service containing a snapshot of the services this Supervisor is managing.

func (*Supervisor) Stop

func (s *Supervisor) Stop()

Stop stops the Supervisor.

func (*Supervisor) String

func (s *Supervisor) String() string

String implements the fmt.Stringer interface.