Documentation ¶
Index ¶
- Constants
- func WrapHistogramVec(hist *prometheus.HistogramVec) revsource.ObserveCallback
- type Config
- type Dispatcher
- func (disp *Dispatcher) Call(ctx context.Context, logger *zap.Logger, timeout time.Duration, ...) (*MonitorResult, error)
- func (disp *Dispatcher) ExitError() error
- func (disp *Dispatcher) ExitSignal() <-chan struct{}
- func (disp *Dispatcher) Exited() bool
- func (disp *Dispatcher) HandleMessage(ctx context.Context, logger *zap.Logger, handlers messageHandlerFuncs) error
- type DumpStateConfig
- type EnvArgs
- type GlobalMetrics
- type MainRunner
- type MetricsConfig
- type MetricsSourceConfig
- type MonitorConfig
- type MonitorResult
- type MonitorState
- type NeonVMConfig
- type PerVMMetrics
- type RateThresholdConfig
- type Runner
- func (r *Runner) DoSchedulerRequest(ctx context.Context, logger *zap.Logger, resources api.Resources, ...) (_ *api.PluginResponse, err error)
- func (r *Runner) Run(ctx context.Context, logger *zap.Logger, ...) error
- func (r *Runner) Spawn(ctx context.Context, logger *zap.Logger, ...)
- func (r *Runner) State(ctx context.Context) (*RunnerState, error)
- type RunnerState
- type ScalingConfig
- type SchedulerConfig
- type SchedulerState
- type StateDump
Constants ¶
const ( MinMonitorProtocolVersion api.MonitorProtoVersion = api.MonitorProtoV1_0 MaxMonitorProtocolVersion api.MonitorProtoVersion = api.MonitorProtoV1_0 )
const ( RunnerRestartMinWaitSeconds = 5 RunnerRestartMaxWaitSeconds = 10 )
FIXME: make these timings configurable.
const PluginProtocolVersion api.PluginProtoVersion = api.PluginProtoV5_0
PluginProtocolVersion is the current version of the agent<->scheduler plugin in use by this autoscaler-agent.
Currently, each autoscaler-agent supports only one version at a time. In the future, this may change.
Variables ¶
This section is empty.
Functions ¶
func WrapHistogramVec ¶ added in v0.33.0
func WrapHistogramVec(hist *prometheus.HistogramVec) revsource.ObserveCallback
Types ¶
type Config ¶
type Config struct { RefreshStateIntervalSeconds uint `json:"refereshStateIntervalSeconds"` Scaling ScalingConfig `json:"scaling"` Metrics MetricsConfig `json:"metrics"` Scheduler SchedulerConfig `json:"scheduler"` Monitor MonitorConfig `json:"monitor"` NeonVM NeonVMConfig `json:"neonvm"` Billing billing.Config `json:"billing"` DumpState *DumpStateConfig `json:"dumpState"` }
func ReadConfig ¶
type Dispatcher ¶ added in v0.16.3
type Dispatcher struct {
// contains filtered or unexported fields
}
The Dispatcher is the main object managing the websocket connection to the monitor. For more information on the protocol, see pkg/api/types.go
func NewDispatcher ¶ added in v0.16.3
func NewDispatcher( ctx context.Context, logger *zap.Logger, addr string, runner *Runner, sendUpscaleRequested func(request api.MoreResources, withLock func()), ) (_finalDispatcher *Dispatcher, _ error)
Create a new Dispatcher, establishing a connection with the vm-monitor and setting up all the background threads to manage the connection.
func (*Dispatcher) Call ¶ added in v0.16.3
func (disp *Dispatcher) Call( ctx context.Context, logger *zap.Logger, timeout time.Duration, messageType string, message any, ) (*MonitorResult, error)
Make a request to the monitor and wait for a response. The value passed as message must be a valid value to send to the monitor. See the docs for SerializeMonitorMessage for more.
This function must NOT be called while holding disp.runner.lock.
func (*Dispatcher) ExitError ¶ added in v0.18.0
func (disp *Dispatcher) ExitError() error
ExitError returns the error that caused the dispatcher to exit, if there was one
func (*Dispatcher) ExitSignal ¶ added in v0.18.0
func (disp *Dispatcher) ExitSignal() <-chan struct{}
ExitSignal returns a channel that is closed when the Dispatcher is no longer running
func (*Dispatcher) Exited ¶ added in v0.18.0
func (disp *Dispatcher) Exited() bool
Exited returns whether the Dispatcher is no longer running
Exited will return true iff the channel returned by ExitSignal is closed.
func (*Dispatcher) HandleMessage ¶ added in v0.16.3
func (disp *Dispatcher) HandleMessage( ctx context.Context, logger *zap.Logger, handlers messageHandlerFuncs, ) error
Handle messages from the monitor. Make sure that all message types the monitor can send are included in the inner switch statement.
type DumpStateConfig ¶ added in v0.5.0
type DumpStateConfig struct { // Port is the port to serve on Port uint16 `json:"port"` // TimeoutSeconds gives the maximum duration, in seconds, that we allow for a request to dump // internal state. TimeoutSeconds uint `json:"timeoutSeconds"` }
DumpStateConfig configures the endpoint to dump all internal state
type EnvArgs ¶
type EnvArgs struct { // ConfigPath gives the path to read static configuration from. It is taken from the CONFIG_PATH // environment variable. ConfigPath string // K8sNodeName is the Kubernetes node the autoscaler agent is running on. It is taken from the // K8S_NODE_NAME environment variable, which is set equal to the pod's Spec.NodeName. // // The Kubernetes documentation doesn't say this, but the NodeName is always populated with the // final node the pod was placed on by the time the environment variables are set. K8sNodeName string // K8sPodIP is the IP address of the Kubernetes pod that this autoscaler-agent is running in K8sPodIP string }
EnvArgs stores the static configuration data assigned to the autoscaler agent by its environment
func ArgsFromEnv ¶
type GlobalMetrics ¶ added in v0.19.0
type GlobalMetrics struct {
// contains filtered or unexported fields
}
func (*GlobalMetrics) MonitorLatency ¶ added in v0.33.0
func (m *GlobalMetrics) MonitorLatency() *prometheus.HistogramVec
func (*GlobalMetrics) NeonVMLatency ¶ added in v0.33.0
func (m *GlobalMetrics) NeonVMLatency() *prometheus.HistogramVec
func (*GlobalMetrics) PluginLatency ¶ added in v0.33.0
func (m *GlobalMetrics) PluginLatency() *prometheus.HistogramVec
type MainRunner ¶
type MetricsConfig ¶
type MetricsConfig struct { System MetricsSourceConfig `json:"system"` LFC MetricsSourceConfig `json:"lfc"` }
MetricsConfig defines a few parameters for metrics requests to the VM
type MetricsSourceConfig ¶ added in v0.31.0
type MetricsSourceConfig struct { // Port is the port that VMs are expected to provide the metrics on // // For system metrics, vm-builder installs vector (from vector.dev) to expose them on port 9100. Port uint16 `json:"port"` // RequestTimeoutSeconds gives the timeout duration, in seconds, for metrics requests RequestTimeoutSeconds uint `json:"requestTimeoutSeconds"` // SecondsBetweenRequests sets the number of seconds to wait between metrics requests SecondsBetweenRequests uint `json:"secondsBetweenRequests"` }
type MonitorConfig ¶ added in v0.16.3
type MonitorConfig struct { ResponseTimeoutSeconds uint `json:"responseTimeoutSeconds"` // ConnectionTimeoutSeconds gives how long we may take to connect to the // monitor before cancelling. ConnectionTimeoutSeconds uint `json:"connectionTimeoutSeconds"` // ConnectionRetryMinWaitSeconds gives the minimum amount of time we must wait between attempts // to connect to the vm-monitor, regardless of whether they're successful. ConnectionRetryMinWaitSeconds uint `json:"connectionRetryMinWaitSeconds"` // ServerPort is the port that the dispatcher serves from ServerPort uint16 `json:"serverPort"` // UnhealthyAfterSilenceDurationSeconds gives the duration, in seconds, after which failing to // receive a successful request from the monitor indicates that it is probably unhealthy. UnhealthyAfterSilenceDurationSeconds uint `json:"unhealthyAfterSilenceDurationSeconds"` // UnhealthyStartupGracePeriodSeconds gives the duration, in seconds, after which we will no // longer excuse total VM monitor failures - i.e. when unhealthyAfterSilenceDurationSeconds // kicks in. UnhealthyStartupGracePeriodSeconds uint `json:"unhealthyStartupGracePeriodSeconds"` // MaxHealthCheckSequentialFailuresSeconds gives the duration, in seconds, after which we // should restart the connection to the vm-monitor if health checks aren't succeeding. MaxHealthCheckSequentialFailuresSeconds uint `json:"maxHealthCheckSequentialFailuresSeconds"` // MaxFailedRequestRate defines the maximum rate of failed monitor requests, above which // a VM is considered stuck. MaxFailedRequestRate RateThresholdConfig `json:"maxFailedRequestRate"` // RetryFailedRequestSeconds gives the duration, in seconds, that we must wait before retrying a // request that previously failed. RetryFailedRequestSeconds uint `json:"retryFailedRequestSeconds"` // RetryDeniedDownscaleSeconds gives the duration, in seconds, that we must wait before retrying // a downscale request that was previously denied RetryDeniedDownscaleSeconds uint `json:"retryDeniedDownscaleSeconds"` // RequestedUpscaleValidSeconds gives the duration, in seconds, that requested upscaling should // be respected for, before allowing re-downscaling. RequestedUpscaleValidSeconds uint `json:"requestedUpscaleValidSeconds"` }
type MonitorResult ¶ added in v0.16.3
type MonitorResult struct { Result *api.DownscaleResult Confirmation *api.UpscaleConfirmation HealthCheck *api.HealthCheck }
This struct represents the result of a dispatcher.Call. Because the SignalSender passed in can only be generic over one type - we have this mock enum. Only one field should ever be non-nil, and it should always be clear which field is readable. For example, the caller of dispatcher.call(HealthCheck { .. }) should only read the healthcheck field.
type MonitorState ¶ added in v0.18.0
type MonitorState struct {
WaitersSize int `json:"waitersSize"`
}
Temporary type, to hopefully help with debugging https://github.com/neondatabase/autoscaling/issues/503
type NeonVMConfig ¶ added in v0.28.0
type NeonVMConfig struct { // RequestTimeoutSeconds gives the timeout duration, in seconds, for VM patch requests RequestTimeoutSeconds uint `json:"requestTimeoutSeconds"` // RetryFailedRequestSeconds gives the duration, in seconds, that we must wait after a previous // failed request before making another one. RetryFailedRequestSeconds uint `json:"retryFailedRequestSeconds"` // MaxFailedRequestRate defines the maximum rate of failed NeonVM requests, above which // a VM is considered stuck. MaxFailedRequestRate RateThresholdConfig `json:"maxFailedRequestRate"` }
NeonVMConfig defines a few parameters for NeonVM requests
type PerVMMetrics ¶ added in v0.19.0
type PerVMMetrics struct {
// contains filtered or unexported fields
}
type RateThresholdConfig ¶ added in v0.28.0
type Runner ¶
type Runner struct {
// contains filtered or unexported fields
}
Runner is per-VM Pod god object responsible for handling everything
It primarily operates as a source of shared data for a number of long-running tasks. For additional general information, refer to the comment at the top of this file.
func (*Runner) DoSchedulerRequest ¶ added in v0.20.0
func (r *Runner) DoSchedulerRequest( ctx context.Context, logger *zap.Logger, resources api.Resources, lastPermit *api.Resources, metrics *api.Metrics, ) (_ *api.PluginResponse, err error)
DoSchedulerRequest sends a request to the scheduler and does not validate the response.
func (*Runner) Run ¶
func (r *Runner) Run(ctx context.Context, logger *zap.Logger, vmInfoUpdated util.CondChannelReceiver) error
Run is the main entrypoint to the long-running per-VM pod tasks
type RunnerState ¶
type RunnerState struct { PodIP string `json:"podIP"` ExecutorState executor.StateDump `json:"executorState"` Monitor *MonitorState `json:"monitor"` BackgroundWorkerCount int64 `json:"backgroundWorkerCount"` }
RunnerState is the serializable state of the Runner, extracted by its State method
type ScalingConfig ¶
type ScalingConfig struct { // ComputeUnit is the desired ratio between CPU and memory that the autoscaler-agent should // uphold when making changes to a VM ComputeUnit api.Resources `json:"computeUnit"` // DefaultConfig gives the default scaling config, to be used if there is no configuration // supplied with the "autoscaling.neon.tech/config" annotation. DefaultConfig api.ScalingConfig `json:"defaultConfig"` }
ScalingConfig defines the scheduling we use for scaling up and down
type SchedulerConfig ¶
type SchedulerConfig struct { // SchedulerName is the name of the scheduler we're expecting to communicate with. // // Any VMs that don't have a matching Spec.SchedulerName will not be autoscaled. SchedulerName string `json:"schedulerName"` // RequestTimeoutSeconds gives the timeout duration, in seconds, for requests to the scheduler // // If zero, requests will have no timeout. RequestTimeoutSeconds uint `json:"requestTimeoutSeconds"` // RequestAtLeastEverySeconds gives the maximum duration we should go without attempting a // request to the scheduler, even if nothing's changed. RequestAtLeastEverySeconds uint `json:"requestAtLeastEverySeconds"` // RetryFailedRequestSeconds gives the duration, in seconds, that we must wait after a previous // failed request before making another one. RetryFailedRequestSeconds uint `json:"retryFailedRequestSeconds"` // RetryDeniedUpscaleSeconds gives the duration, in seconds, that we must wait before resending // a request for resources that were not approved RetryDeniedUpscaleSeconds uint `json:"retryDeniedUpscaleSeconds"` // RequestPort defines the port to access the scheduler's ✨special✨ API with RequestPort uint16 `json:"requestPort"` // MaxFailedRequestRate defines the maximum rate of failed scheduler requests, above which // a VM is considered stuck. MaxFailedRequestRate RateThresholdConfig `json:"maxFailedRequestRate"` }
SchedulerConfig defines a few parameters for scheduler requests
type SchedulerState ¶
type SchedulerState struct {
Info schedwatch.SchedulerInfo `json:"info"`
}
SchedulerState is the state of a Scheduler, constructed as part of a Runner's State Method