debug

package
v2.2.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 24, 2022 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Overview

The debug package is intended to aid in live debugging of misbehaving Ambassadors in production. Its role is somewhat akin to a debugger and/or a profiler in that it gives you visiblity into important internal state and performance charactaristics of Ambassador, however unlike a debugger and/or profiler it is intended to function well in a production context.

The debug package is also very complementary to a logging library. To make an analogy to GPS Navigation, a logging library is kind of like directions that your nav system produces. The information that logging provides is often similar to "turn left", "go straight", etc. This is super useful if you are trying to understand how your code got to the wrong destination, but often if you are trying to live debug a misbehaving Ambassador in production, you don't necessarily care as much about how it got to where it is. You care about *quickly* understanding where it is and how to get it to someplace better. The debug library is intended to help with that by giving you an understanding of exactly where Ambassador is without requiring the tedius exercise of pooring over the logs while also staring at and mentally executing the code in order to reconstruct the relevant state. Instead of doing that we just make the code capable of telling us exactly where it has ended up.

So how does this work? Well there is a new endpoint at `localhost:8877/debug` and you can run `curl localhost:8877/debug` to see some useful information.

There are currently two kinds of debug information that it exposes:

1. Timers

We've learned that the timing of various actions that ambassador takes is actually very important to its production behavior. Things like:

  • how long it takes us to load secrets
  • how long it takes us to validate envoy configuration
  • how long it takes us to compute a new envoy configuration
  • how long it takes us to respond to a probe request

Anywhere in the code can now easily add new timing information to this debug endpoint by doing the following:

dbg := debug.FromContext(ctx)
timer := dbg.Timer("myAction")

timer.Time(func() {
  // ... do some work
})

// or

stop := timer.Start()
// ... do some work
stop()

// or

func() {
  defer timer.Start()() // this is really the same as the stop case but using defer to guarantee stop gets called
  // ... do some work
}()

There are also special purpose convenience middleware for adding timing information to http handlers:

handler = timer.TimedHandler(handler)
// or
handlerFunc = timer.TimedHandlerFunc(handlerFunc)

A timer tracks a count, minimum, maximum, and average timing info for all actions:

{
  "timers": {
    "check_alive": "0, 0s/0s/0s",
    "check_ready": "0, 0s/0s/0s",
    "consulUpdate": "0, 0s/0s/0s",
    "katesUpdate": "615, 29.385µs/297.114µs/95.220222ms",
    "notifyWebhook:diagd": "2, 1.206967947s/1.3298432s/1.452718454s",
    "notifyWebhooks": "2, 1.207007216s/1.329901037s/1.452794859s",
    "parseAnnotations": "2, 21.944µs/22.541µs/23.138µs",
    "reconcileConsul": "2, 50.104µs/55.499µs/60.894µs",
    "reconcileSecrets": "2, 18.704µs/20.786µs/22.868µs"
  },
  ...
}

2. Atomic Values

Another tool in the toolkit for externalizing relevant state is atomic values. Anywhere in the code can now expose important values in the following way:

dbg := debug.FromContext(ctx)
value := dbg.Value("myValue")

// ...

value.Store("blah")

// or

v := StructThatCanBeJsonMarshalled{...}
value.Store(v)

The debug endpoint will now show the current value for "myValue". The only requirement for the stored value is that it can be marshalled as json.

We currently use this to expose how much memory ambassador is using as well as the state of the envoy reconfiguration, but there are lots of other relevant state we can/should expose in the future as we understand what is important to ambassador's behavior:

{
  ...
  "values": {
    "envoyReconfigs": {
      "times": [
        "2020-11-06T13:13:24.218707995-05:00",
        "2020-11-06T13:13:27.185754494-05:00",
        "2020-11-06T13:13:28.612279777-05:00"
      ],
      "staleCount": 2,
      "staleMax": 0,
      "synced": true
    },
    "memory": "39.68Gi of Unlimited (0%)"
  }
}

The full output of the debug endpoint now currently looks like this:

$ curl localhost:8877/debug
{
  "timers": {
    # these two timers track how long it takes to respond to liveness and readiness probes
    "check_alive": "7, 45.411495ms/61.85999ms/81.358927ms",
    "check_ready": "7, 49.951304ms/61.976205ms/86.279038ms",

    # these two timers track how long we spend updating our in-memory snapshot when our kubernetes
    # watches tell us something has changed
    "consulUpdate": "0, 0s/0s/0s",
    "katesUpdate": "3382, 28.662µs/102.784µs/95.220222ms",

    # These timers tell us how long we spend notifying the sidecars if changed input. This
    # includes how long the sidecars take to process that input.
    "notifyWebhook:diagd": "2, 1.206967947s/1.3298432s/1.452718454s",
    "notifyWebhooks": "2, 1.207007216s/1.329901037s/1.452794859s",

    # This timer tells us how long we spend parsing annotations.
    "parseAnnotations": "2, 21.944µs/22.541µs/23.138µs",

    # This timer tells us how long we spend reconciling changes to consul inputs.
    "reconcileConsul": "2, 50.104µs/55.499µs/60.894µs",

    # This timer tells us how long we spend reconciling secrets related changes to ambassador
    # inputs.
    "reconcileSecrets": "2, 18.704µs/20.786µs/22.868µs"
  },
  "values": {
    "envoyReconfigs": {
      "times": [
        "2020-11-06T13:13:24.218707995-05:00",
        "2020-11-06T13:13:27.185754494-05:00",
        "2020-11-06T13:13:28.612279777-05:00"
      ],
      "staleCount": 2,
      "staleMax": 0,
      "synced": true
    },
    "memory": "39.73Gi of Unlimited (0%)"
  }
}

TODO:

  • Extend the API to permit a description of a given Timer or Value in the code. This will let us implement `curl localhost:8877/debug/help` or similar instead of manually keeping the above docs up to date.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewContext

func NewContext(parent context.Context, debug *Debug) context.Context

The NewContext function creates a child context associated with the given Debug root.

Types

type ClockFunc

type ClockFunc func() time.Time

The type of the clock function to use for timing.

type Debug

type Debug struct {
	// contains filtered or unexported fields
}

This struct serves as the root of all runtime debug info for the process. This consists of timers that aggregate timing info for various kinds of actions, as well as atomic values that can be updated as relevant state changes.

func FromContext

func FromContext(ctx context.Context) (result *Debug)

The FromContext function retrieves the correct debug root for the given context.

func NewDebug

func NewDebug() *Debug

Create a new set of debug info.

func NewDebugWithClock

func NewDebugWithClock(clock ClockFunc) *Debug

Create a new set of debug info with the specified clock function.

func (*Debug) ServeHTTP

func (d *Debug) ServeHTTP(w http.ResponseWriter, r *http.Request)

The ServeHTTP() method will serve a json representation of the contents of the debug root.

func (*Debug) Timer

func (d *Debug) Timer(name string) (result *Timer)

The Timer() method ensures the named timer exists and returns it.

func (*Debug) Value

func (d *Debug) Value(name string) (result *atomic.Value)

The Value() method ensures the named atomic.Value exists and returns it.

type StopFunc

type StopFunc func()

The type of the function used to stop timing.

type Timer

type Timer struct {
	// contains filtered or unexported fields
}

The Timer struct can be used to time discrete actions. It tracks min, max, average, and total elapsed time for all actions. The Timer struct is thread safe, and the Start() method is designed to do proper bookkeeping regardless of concurrent use from multiple goroutines.

Example 1

var GlobalTimer = NewTimer()

func foo() {
  GlobalTimer.Time(func() {
    // ...
  })
}

Example 2

func foo() {
  stop := GlobalTimer.Start()
  defer stop()
  ...
}

Example 3

func foo() {
  defer GlobalTimer.Start()()
  ...
}

func NewTimer

func NewTimer() *Timer

The NewTimer function creates a new timer. It uses time.Now as the clock for the timer.

func NewTimerWithClock

func NewTimerWithClock(clock ClockFunc) *Timer

The NewTimerWithClock function creates a new timer with the given name and clock.

func (*Timer) Average

func (t *Timer) Average() (result time.Duration)

The Average() method returns the average duration of all timed events.

func (*Timer) Copy

func (t *Timer) Copy() (result *Timer)

The Copy() method returns a copy of the timer. This can be used to get a consistent snapshot of all the Count/Min/Max/Average/Total values.

func (*Timer) Count

func (t *Timer) Count() (result int)

The Count() method returns the number of events that have been timed.

func (*Timer) MarshalJSON

func (t *Timer) MarshalJSON() ([]byte, error)

func (*Timer) Max

func (t *Timer) Max() (result time.Duration)

The Max() method returns the maximum duration of all timed events.

func (*Timer) Min

func (t *Timer) Min() (result time.Duration)

The Min() method retuns the minimum duration of all timed events.

func (*Timer) Start

func (t *Timer) Start() StopFunc

The Start() method starts timing an action.

func (*Timer) Time

func (t *Timer) Time(f func())

The Time() method is a convenience method that times invocation of the supplied function.

func (*Timer) TimedHandler

func (t *Timer) TimedHandler(h http.Handler) http.Handler

The TimedHandler() method wraps the supplied Handler with a Handler that times every request.

func (*Timer) TimedHandlerFunc

func (t *Timer) TimedHandlerFunc(h http.HandlerFunc) http.HandlerFunc

The TimedHandlerFunc() method wraps the supplied HandlerFunc with a HandlerFunc that times every request.

func (*Timer) Total

func (t *Timer) Total() (result time.Duration)

The Total() method returns the total duration of all events.

type Value

type Value atomic.Value

An atomic.Value with custom json marshalling.

func (*Value) MarshalJSON

func (v *Value) MarshalJSON() ([]byte, error)

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL