bmdb

package
v0.0.0-...-fc6e1cf Latest
Warning

This package is not in the latest version of its module.

Published: Sep 23, 2024 License: Apache-2.0 Imports: 14 Imported by: 0

Documentation

Overview

Package bmdb implements a connector to the Bare Metal Database, which is the main data store backing information about bare metal machines.

All components of the BMaaS project connect directly to the underlying CockroachDB database storing this data via this library. In the future, this library might turn into a shim which instead connects to a coordinator service over gRPC.

Constants

This section is empty.

Variables

var (
	// ErrSessionExpired is returned when attempting to Transact or Work on a
	// Session that has expired or been canceled. Once a Session starts returning
	// these errors, it must be re-created by another StartSession call, as no other
	// calls will succeed.
	ErrSessionExpired = errors.New("session expired")
	// ErrWorkConflict is returned when attempting to Work on a Session with a
	// process name that's already performing some work, concurrently, on the
	// requested machine.
	ErrWorkConflict = errors.New("conflicting work on machine")
)
var (
	ErrNothingToDo = errors.New("nothing to do")
)

Functions

This section is empty.

Types

type BMDB

type BMDB struct {
	Config
	// contains filtered or unexported fields
}

BMDB is the Bare Metal Database, a common schema to store information about bare metal machines in CockroachDB. This struct is supposed to be embedded/contained by different components that interact with the BMDB, and provides a common interface to BMDB operations to these components.

The BMDB provides two mechanisms that facilitate implementing a 'reactive work system' on top of the bare metal machine data:

  • Sessions, which are maintained by heartbeats by components and signal the liveness of said components to other components operating on the BMDB. These effectively extend CockroachDB's transactions to be visible as row data. Any session that is not actively being updated by a component can be expired by a component responsible for lease garbage collection.
  • Work locking, which builds on Sessions and allows long-running, multi-transaction work to be performed on given machines while preventing conflicting work from being performed by other components. As both Work locking and Sessions are plain row data, other components can use ordinary SQL constraints to exclude machines with active work from their SELECT queries.

func (*BMDB) EnableMetrics

func (b *BMDB) EnableMetrics(registry *prometheus.Registry)

EnableMetrics configures BMDB metrics collection and registers it on the given registry. This method should only be called once, and is not goroutine safe.

func (*BMDB) Open

func (b *BMDB) Open(migrate bool) (*Connection, error)

Open creates a new Connection to the BMDB for the calling component. Multiple connections can be opened (although there is no advantage to doing so, as Connections manage an underlying CockroachDB connection pool, which performs required reconnects and connection pooling automatically).

type Backoff

type Backoff struct {
	// Initial backoff period, used for the backoff if this item failed for the first
	// time (i.e. has not had a Finish call in between two Fail calls).
	//
	// Subsequent calls will ignore this field if the backoff is exponential. If
	// non-exponential, the initial time will always override whatever was previously
	// persisted in the database, i.e. the backoff will always be of value 'Initial'.
	//
	// Cannot be lower than one second. If it is, it will be capped to it.
	Initial time.Duration `u:"initial"`

	// Maximum time for backoff. If the calculation of the next back off period
	// (based on the Exponent and last backoff value) exceeds this maximum, it will
	// be capped to it.
	//
	// Maximum is not persisted in the database. Instead, it is always read from this
	// structure.
	//
	// Cannot be lower than Initial. If it is, it will be capped to it.
	Maximum time.Duration `u:"maximum"`

	// Exponent used for next backoff calculation. Any time a work item fails
	// directly after another failure, the previous backoff period will be multiplied
	// by the exponent to yield the new backoff period. The new period will then be
	// capped to Maximum.
	//
	// Exponent is not persisted in the database. Instead, it is always read from
	// this structure.
	//
	// Cannot be lower than 1.0. If it is, it will be capped to it.
	Exponent float64 `u:"exponent"`
}

Backoff describes the configuration of backoff for a failed work item. It can be passed to Work.Fail to cause an item to not be processed again (to be 'in backoff') for a given period of time. Exponential backoff can be configured so that subsequent failures of a process will have exponentially increasing backoff periods, up to some maximum length.

The underlying unit of backoff period length in the database is one second. What that means is that all effective calculated backoff periods must be an integer number of seconds. This is performed by always rounding up this period to the nearest second. A side effect of this is that with exponential backoff, non-integer exponents will be less precisely applied for small backoff values, e.g. an exponent of 1.1 with initial backoff of 1s will generate the following sequence of backoff periods:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17

Which corresponds to the following approximate multipliers in between periods:

2.00, 1.50, 1.33, 1.25, 1.20, 1.17, 1.14, 1.12, 1.11, 1.10, 1.18, 1.15, 1.13

Thus, the exponent value should be treated more as a limit that the sequence of periods will approach than a hard rule for calculating the periods. However, if the exponent is larger than 1 (i.e. any time exponential backoff is requested), always rounding up guarantees that the backoff won't get 'stuck' on a repeated period value due to a rounding error.
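The rounding behaviour described above can be reproduced with a short sketch. It assumes the rule is exactly "multiply the previous period by the exponent, round up to a whole second, cap to the maximum"; the helper names nextPeriod and periods are hypothetical and are not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// nextPeriod returns the backoff period (in whole seconds) that follows prev,
// assuming the rule described in the text: multiply by the exponent, round up
// to a full second, then cap to maximum. Illustrative only; this is not the
// BMDB implementation.
func nextPeriod(prev int64, exponent float64, maximum int64) int64 {
	next := int64(math.Ceil(float64(prev) * exponent))
	if next > maximum {
		next = maximum
	}
	return next
}

// periods returns the first n backoff periods, starting from initial.
func periods(initial int64, exponent float64, maximum int64, n int) []int64 {
	out := make([]int64, 0, n)
	cur := initial
	for i := 0; i < n; i++ {
		out = append(out, cur)
		cur = nextPeriod(cur, exponent, maximum)
	}
	return out
}

func main() {
	// Exponent 1.1, initial period of 1s, maximum of one hour.
	fmt.Println(periods(1, 1.1, 3600, 14))
}
```

Because the period is rounded up after every multiplication, any exponent above 1 advances the period by at least one second per step until the maximum is reached.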

A zero backoff structure is valid and represents a non-exponential backoff of one second.

A partially filled structure is also valid. See the field comments for more information about how fields are capped if not set. The described behaviour allows for two useful shorthands:

  1. If only Initial is set, then the backoff is non-exponential and will always be of value Initial (or whatever previous period was already persisted in the database).
  2. If only Maximum and Exponent are set, the backoff will be exponential, starting at one second, and exponentially increasing to Maximum.
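As a rough illustration of how the capping rules make a zero or partially filled structure usable, the sketch below applies them to a local stand-in type. backoffSketch and normalized are hypothetical names; the real capping happens inside the package and may be implemented differently.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSketch mirrors the three documented Backoff fields. It is a local
// stand-in for illustration, not the bmdb.Backoff type itself.
type backoffSketch struct {
	Initial  time.Duration
	Maximum  time.Duration
	Exponent float64
}

// normalized applies the capping rules from the field comments: Initial is at
// least one second, Maximum is at least Initial, Exponent is at least 1.0.
func (b backoffSketch) normalized() backoffSketch {
	if b.Initial < time.Second {
		b.Initial = time.Second
	}
	if b.Maximum < b.Initial {
		b.Maximum = b.Initial
	}
	if b.Exponent < 1.0 {
		b.Exponent = 1.0
	}
	return b
}

func main() {
	// Zero value: non-exponential backoff of one second.
	fmt.Printf("%+v\n", backoffSketch{}.normalized())
	// Shorthand 1: only Initial set, so the period is fixed.
	fmt.Printf("%+v\n", backoffSketch{Initial: 5 * time.Minute}.normalized())
	// Shorthand 2: only Maximum and Exponent set, so the period starts at 1s.
	fmt.Printf("%+v\n", backoffSketch{Maximum: time.Hour, Exponent: 1.5}.normalized())
}
```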

It is recommended to define Backoff structures once, as package-level values (Go has no const structs), and treat them as read-only 'descriptors', one per work kind / process.

One feature currently missing from the Backoff implementation is jitter. This might be introduced in the future if deemed necessary.

type Config

type Config struct {
	Database component.CockroachConfig

	// ComponentName is a human-readable name of the component connecting to the
	// BMDB, and is stored in any Sessions managed by this component's connector.
	ComponentName string
	// RuntimeInfo is a human-readable 'runtime information' (e.g. software version,
	// host machine/job information, IP address, etc.) stored alongside the
	// ComponentName in active Sessions.
	RuntimeInfo string
}

Config is the configuration of the BMDB connector.

type Connection

type Connection struct {

	// The database name that we're connected to.
	DatabaseName string
	// The address of the CockroachDB endpoint we've connected to.
	Address string
	// Whether this connection is to an in-memory database. Note: this only works if
	// this Connection came directly from calling Open on a BMDB that was defined to
	// be in-memory. If you just connect to an in-memory CRDB manually, this will
	// still be false.
	InMemory bool
	// contains filtered or unexported fields
}

Connection to the BMDB. Internally, this contains a sql.DB connection pool, so components can (and likely should) reuse Connections as much as possible internally.

func (*Connection) GetSession

func (c *Connection) GetSession(ctx context.Context, session uuid.UUID) ([]model.Session, error)

GetSession retrieves all information about a session. It can be read without a session/transaction for debugging purposes.

func (*Connection) ListHistoryOf

func (c *Connection) ListHistoryOf(ctx context.Context, machine uuid.UUID) ([]model.WorkHistory, error)

ListHistoryOf retrieves a full audit history of a machine, sorted chronologically. It can be read without a session / transaction for debugging purposes.

func (*Connection) Reflect

func (c *Connection) Reflect(ctx context.Context) (*reflection.Schema, error)

Reflect returns a reflection.Schema as detected by inspecting the table information of this connection to the BMDB. The Schema can then be used to retrieve arbitrary tag/machine information without going through the concurrency/ordering mechanism of the BMDB.

This should only be used to implement debugging tooling and should absolutely not be in the path of any user requests.

This Connection will be used not only to query the Schema information, but also for all subsequent data retrieval operations on it. Please ensure that the Schema is rebuilt in the event of a database connection failure. Ideally, you should rebuild the schema often enough to follow what is currently available on the production database, but not for every request; caching the Schema for a bounded time is one way to do this.
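One possible shape for such a cache is sketched below: a single value that is reloaded once it is older than a time-to-live, plus an explicit invalidation hook for connection failures. The cache type, its fields and the demo function are all hypothetical; in real use the load function would be a closure around Connection.Reflect.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cache holds one value of type T and reloads it once it is older than ttl.
type cache[T any] struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time  // injectable clock, time.Now in production
	load    func() (T, error) // e.g. a closure around Connection.Reflect
	value   T
	fetched time.Time
	valid   bool
}

// Get returns the cached value, reloading it if it is missing or stale.
func (c *cache[T]) Get() (T, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.valid && c.now().Sub(c.fetched) < c.ttl {
		return c.value, nil
	}
	v, err := c.load()
	if err != nil {
		c.valid = false
		var zero T
		return zero, err
	}
	c.value, c.fetched, c.valid = v, c.now(), true
	return v, nil
}

// Invalidate forces the next Get to reload, e.g. after a connection failure.
func (c *cache[T]) Invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.valid = false
}

// demo counts loads for: two quick reads, a read after the TTL has passed,
// and a read after an explicit invalidation.
func demo() int {
	clock := time.Unix(0, 0)
	loads := 0
	c := &cache[string]{
		ttl:  time.Minute,
		now:  func() time.Time { return clock },
		load: func() (string, error) { loads++; return "schema", nil },
	}
	c.Get() // first read, loads
	c.Get() // served from cache
	clock = clock.Add(2 * time.Minute)
	c.Get() // stale, reloads
	c.Invalidate()
	c.Get() // invalidated, reloads
	return loads
}

func main() {
	fmt.Println("loads:", demo())
}
```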

func (*Connection) StartSession

func (c *Connection) StartSession(ctx context.Context, opts ...SessionOption) (*Session, error)

StartSession creates a new BMDB session which will be maintained in a background goroutine as long as the given context is valid. Each Session is represented by an entry in a sessions table within the BMDB, and subsequent Transact calls emit SQL transactions which depend on that entry still being present and up to date. A garbage collection system (to be implemented) will remove expired sessions from the BMDB, but this mechanism is not necessary for the session expiry mechanism to work.

When the session becomes invalid (for example due to network partition), subsequent attempts to call Transact will fail with ErrSessionExpired. This means that the caller within the component is responsible for recreating a new Session if a previously used one expires.
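A caller-side restart loop might look roughly like the following. To stay self-contained, the sketch uses local stand-ins: errSessionExpired plays the role of ErrSessionExpired, and the transactor interface is a reduced, hypothetical view of a Session (the real Transact also takes a function operating on *model.Queries).

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errSessionExpired stands in for bmdb.ErrSessionExpired.
var errSessionExpired = errors.New("session expired")

// transactor is a hypothetical, reduced view of a session.
type transactor interface {
	Transact(ctx context.Context) error
}

// transactWithRestart runs one transaction, starting a fresh session whenever
// the current one reports expiry. It gives up after maxSessions sessions.
func transactWithRestart(ctx context.Context, start func(context.Context) (transactor, error), maxSessions int) error {
	err := errors.New("no session attempts allowed")
	for i := 0; i < maxSessions; i++ {
		var s transactor
		if s, err = start(ctx); err != nil {
			return fmt.Errorf("starting session: %w", err)
		}
		if err = s.Transact(ctx); !errors.Is(err, errSessionExpired) {
			return err // success (nil) or an unrelated error
		}
	}
	return err
}

// fakeSession is a test double that is either healthy or already expired.
type fakeSession struct{ expired bool }

func (f fakeSession) Transact(context.Context) error {
	if f.expired {
		return fmt.Errorf("transact: %w", errSessionExpired)
	}
	return nil
}

// demo simulates a first session that has expired and a second, healthy one.
// It returns how many sessions were started and the final error.
func demo() (int, error) {
	started := 0
	start := func(context.Context) (transactor, error) {
		started++
		return fakeSession{expired: started == 1}, nil
	}
	err := transactWithRestart(context.Background(), start, 3)
	return started, err
}

func main() {
	n, err := demo()
	fmt.Println("sessions started:", n, "error:", err)
}
```

Using errors.Is rather than a direct comparison keeps the check valid if the error is wrapped on its way to the caller.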

type Session

type Session struct {
	UUID uuid.UUID
	// contains filtered or unexported fields
}

Session is a session (identified by UUID) that has been started in the BMDB. Its liveness is maintained by a background goroutine, and as long as that session is alive, it can perform transactions and work on the BMDB.

func (*Session) Expired

func (s *Session) Expired() bool

Expired returns true if this session is expired and will fail all subsequent transactions/work.

func (*Session) Transact

func (s *Session) Transact(ctx context.Context, fn func(q *model.Queries) error) error

Transact runs a given function in the context of both a CockroachDB and BMDB transaction, retrying as necessary.

Most pure (meaning without side effects outside the database itself) BMDB transactions should be run this way.

func (*Session) Work

func (s *Session) Work(ctx context.Context, process model.Process, fn func(q *model.Queries) ([]uuid.UUID, error)) (*Work, error)

Work starts work on a machine. Full work execution is performed in three phases:

  1. Retrieval phase. This is performed by 'fn' given to this function. The retrieval function must return zero or more machines that some work should be performed on per the BMDB. The first returned machine will be locked for work under the given process and made available in the Work structure returned by this call. The function may be called multiple times, as it's run within a CockroachDB transaction which may be retried an arbitrary number of times. Thus, it should be side-effect free, ideally only performing read queries to the database.
  2. Work phase. This is performed by user code while holding on to the Work structure instance.
  3. Commit phase. This is performed by the function passed to Work.Finish. See that method's documentation for more details.

Important: after retrieving Work successfully, either Finish or Cancel must be called, otherwise the machine will be locked until the parent session expires or is closed! It's safe and recommended to `defer work.Cancel(ctx)` after calling Work(), as Cancel is a no-op once the work has been finished or canceled.

If no machine is eligible for work, ErrNothingToDo should be returned by the retrieval function, and the same error (wrapped) will be returned by Work. In case the retrieval function returns no machines and no error, ErrNothingToDo will also be returned.

The returned Work object is _not_ goroutine safe.
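The three phases and the release requirement can be sketched as a small control-flow skeleton. The types here are local stand-ins, not the package's API: errNothingToDo plays the role of ErrNothingToDo, workHandle is a reduced, hypothetical view of Work (the real Finish also takes a commit function), and acquire stands in for a call to Session.Work.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errNothingToDo stands in for bmdb.ErrNothingToDo.
var errNothingToDo = errors.New("nothing to do")

// workHandle is a hypothetical, reduced view of a Work object.
type workHandle interface {
	Finish(ctx context.Context) error
	Cancel(ctx context.Context)
}

// processOne acquires work, runs do, and finishes. It reports whether a
// machine was processed; "nothing to do" is treated as a normal outcome.
func processOne(ctx context.Context, acquire func(context.Context) (workHandle, error), do func() error) (bool, error) {
	w, err := acquire(ctx) // retrieval phase
	if errors.Is(err, errNothingToDo) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	// Releases the machine on every early return; a no-op after Finish.
	defer w.Cancel(ctx)
	if err := do(); err != nil { // work phase
		return false, err
	}
	if err := w.Finish(ctx); err != nil { // commit phase
		return false, err
	}
	return true, nil
}

// fakeWork records how the machine lock was released.
type fakeWork struct{ finished, canceled bool }

func (f *fakeWork) Finish(context.Context) error { f.finished = true; return nil }
func (f *fakeWork) Cancel(context.Context) {
	if !f.finished {
		f.canceled = true
	}
}

// demo runs processOne against a fake and returns the fake's final state.
func demo(fail bool) (processed, finished, canceled bool) {
	w := &fakeWork{}
	acquire := func(context.Context) (workHandle, error) { return w, nil }
	do := func() error {
		if fail {
			return errors.New("work failed")
		}
		return nil
	}
	processed, _ = processOne(context.Background(), acquire, do)
	return processed, w.finished, w.canceled
}

// demoIdle reports how processOne treats a wrapped errNothingToDo.
func demoIdle() (bool, error) {
	acquire := func(context.Context) (workHandle, error) {
		return nil, fmt.Errorf("work: %w", errNothingToDo)
	}
	return processOne(context.Background(), acquire, func() error { return nil })
}

func main() {
	fmt.Println(demo(false))
	fmt.Println(demo(true))
	fmt.Println(demoIdle())
}
```

Deferring Cancel right after acquisition means no return path can leave the machine locked, while the successful path is unaffected because Cancel does nothing once Finish has run.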

type SessionOption

type SessionOption struct {
	Processor metrics.Processor
}

type Work

type Work struct {
	// Machine that this work is being performed on, as retrieved by the retrieval
	// function passed to the Work method.
	Machine uuid.UUID
	// contains filtered or unexported fields
}

Work being performed on a machine.

func (*Work) Cancel

func (w *Work) Cancel(ctx context.Context)

Cancel the Work started on a machine. If the work has already been finished or canceled, this is a no-op. In case of error, a log line will be emitted.

func (*Work) Fail

func (w *Work) Fail(ctx context.Context, backoff *Backoff, cause string) error

Fail work and introduce backoff. The given cause is an operator-readable string that will be persisted alongside the backoff and the work history/audit table.

The backoff describes a period during which the same process will not be retried on this machine.

The given backoff is a structure which describes both the initial backoff period if the work failed for the first time, and a mechanism to exponentially increase the backoff period if that work failed repeatedly. The work is defined to have failed repeatedly if it only resulted in Cancel/Fail calls without any Finish calls in the meantime.

Only the last backoff period is persisted in the database. The exponential backoff behaviour (including its maximum time) is always calculated based on the given backoff structure.

If nil, the backoff defaults to a non-exponential, one second backoff. This is the minimum designed to keep the system chugging along without repeatedly trying a failed job in a loop. However, the backoff should generally be set to some well engineered value to prevent spurious retries.

func (*Work) Finish

func (w *Work) Finish(ctx context.Context, fn func(q *model.Queries) error) error

Finish work by executing a commit function 'fn' and releasing the machine from the work performed. The function given should apply tags to the processed machine in a way that causes it to not be eligible for retrieval again. As with the retrieval function, the commit function might be called an arbitrary number of times as part of CockroachDB transaction retries.

This may be called only once.

Directories

Path        Synopsis
metrics     Package metrics implements a Prometheus metrics submission interface for BMDB client components.
reflection  Package reflection implements facilities to retrieve information about the implemented Tags and their types from a plain CockroachDB SQL connection, bypassing the queries/types defined in models.
webug       Package webug implements a web-based debug/troubleshooting/introspection system for the BMDB.