externalgrpc

package

v0.0.0-...-e4898a9 (latest) Published: Dec 17, 2024 License: Apache-2.0 Imports: 23 Imported by: 0

Repository

github.com/kubernetes/autoscaler

README ¶

External gRPC Cloud Provider

The External gRPC Cloud Provider provides a plugin system to support out-of-tree cloud provider implementations.

Cluster Autoscaler adds or removes nodes from the cluster by creating or deleting VMs. To separate the autoscaling logic (the same for all clouds) from the API calls required to execute it (different for each cloud), the latter are hidden behind an interface, CloudProvider. Each supported cloud has its own implementation in this repository, and the --cloud-provider flag determines which one is used.

The External gRPC Cloud Provider acts as a client for a cloud provider that implements its custom logic separately from the cluster autoscaler and serves it as a CloudProvider gRPC service (similar to the CloudProvider interface). This removes the need to fork this project, follow its development lifecycle, adhere to its rules (e.g. do not use additional external dependencies) or implement the Cluster API.

Configuration

For the cluster autoscaler parameters, use the --cloud-provider=externalgrpc flag and define the cloud configuration file with --cloud-config=<file location>. This is a YAML file with the following parameters:

Key	Value	Mandatory	Default
address	external gRPC cloud provider service address of the form "host:port", "host%zone:port", "[host]:port" or "[host%zone]:port"	yes	none
key	path to the file containing the TLS key, if using mTLS	no	none
cert	path to the file containing the TLS certificate, if using mTLS	no	none
cacert	path to the file containing the CA certificate, if using mTLS	no	none
grpc_timeout	timeout for each gRPC call	no	5s

The use of mTLS is recommended, since simple, non-authenticated calls to the external gRPC cloud provider service will result in the creation / deletion of nodes.
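
As an illustration of what the key, cert and cacert parameters feed into, here is a minimal sketch of assembling a client-side mTLS configuration with the Go standard library. The `newMTLSConfig` helper is a hypothetical name, and the provider's actual loading code may differ. The function takes PEM bytes rather than file paths so the example stays self-contained.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// newMTLSConfig builds a client TLS config from PEM-encoded material:
// the client certificate and key authenticate the autoscaler to the
// server, and the CA certificate is used to verify the server.
func newMTLSConfig(certPEM, keyPEM, caPEM []byte) (*tls.Config, error) {
	clientCert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("loading client key pair: %w", err)
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificate found in PEM data")
	}
	return &tls.Config{
		Certificates: []tls.Certificate{clientCert},
		RootCAs:      roots,
		MinVersion:   tls.VersionTLS12, // assumption: a conservative floor
	}, nil
}

func main() {
	// Placeholder bytes stand in for the contents of the files named by
	// cert, key and cacert; real PEM data is needed for a usable config.
	_, err := newMTLSConfig([]byte("not a cert"), []byte("not a key"), []byte("not a ca"))
	fmt.Println("placeholder input rejected:", err != nil)
}
```

With grpc-go, a `*tls.Config` like this is typically wrapped with `credentials.NewTLS` and passed as a dial option.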

Log levels of interest for this provider are:

1 (flag: --v=1): basic logging of errors;
5 (flag: --v=5): detailed logging of every call;

For the deployment and configuration of an external gRPC cloud provider of choice, see its specific documentation.

Examples

You can find an example of an external gRPC cloud provider service implementation in the examples/external-grpc-cloud-provider-service directory. It is a server that wraps all the in-tree cloud providers.

A complete example:

deploy cert-manager and the manifests in examples/certmanager-manifests to generate certificates for gRPC client and server;
build the image for the example external gRPC cloud provider service as defined in examples/external-grpc-cloud-provider-service;
deploy the example external gRPC cloud provider service using the manifests at examples/external-grpc-cloud-provider-service-manifests, changing the parameters as needed to test whichever cloud provider you want;
deploy the cluster autoscaler selecting the External gRPC Cloud Provider using the manifests at examples/cluster-autoscaler-manifests.

Development

External gRPC Cloud Provider service Implementation

To build a cloud provider, create a gRPC server for the CloudProvider service defined in protos/externalgrpc.proto that implements all its required RPCs.

Caching

The CloudProvider interface was designed with the assumption that its implementation functions would be fast. This may no longer hold with the added overhead of gRPC. In the interest of performance, some gRPC API responses are cached by this cloud provider:

NodeGroupForNode() caches the node group for a node until Refresh() is called;
NodeGroups() caches the current node groups until Refresh() is called;
GPULabel() and GetAvailableGPUTypes() are cached at first call and never wiped;
A NodeGroup caches the MaxSize(), MinSize() and Debug() return values during its creation, and the TemplateNodeInfo() return value at its first call. These values are cached for the lifetime of the NodeGroup object.
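
The "cache until Refresh()" behaviour listed above can be sketched as follows. This is an illustration under assumptions, not the provider's implementation: `backend`, `countingBackend` and `cachingClient` are hypothetical names, and the remote gRPC service is replaced by an in-memory stub that counts calls.

```go
package main

import (
	"fmt"
	"sync"
)

// backend is a hypothetical stand-in for the remote gRPC service.
type backend interface {
	NodeGroupForNode(nodeName string) (string, error)
}

// countingBackend records how many "remote" calls were made.
type countingBackend struct{ calls int }

func (b *countingBackend) NodeGroupForNode(nodeName string) (string, error) {
	b.calls++
	return "group-of-" + nodeName, nil
}

// cachingClient memoizes node-to-group lookups until Refresh is called.
type cachingClient struct {
	mu     sync.Mutex
	remote backend
	byNode map[string]string
}

func newCachingClient(b backend) *cachingClient {
	return &cachingClient{remote: b, byNode: make(map[string]string)}
}

// NodeGroupForNode answers from the cache when possible and only falls
// back to the remote backend on a miss.
func (c *cachingClient) NodeGroupForNode(nodeName string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if id, ok := c.byNode[nodeName]; ok {
		return id, nil
	}
	id, err := c.remote.NodeGroupForNode(nodeName)
	if err != nil {
		return "", err // errors are not cached in this sketch
	}
	c.byNode[nodeName] = id
	return id, nil
}

// Refresh drops all cached lookups so the next call hits the backend.
func (c *cachingClient) Refresh() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byNode = make(map[string]string)
}

func main() {
	b := &countingBackend{}
	c := newCachingClient(b)
	c.NodeGroupForNode("node-1")
	c.NodeGroupForNode("node-1") // served from the cache
	fmt.Println("remote calls before Refresh:", b.calls)
	c.Refresh()
	c.NodeGroupForNode("node-1") // cache was wiped, so this is remote again
	fmt.Println("remote calls after Refresh:", b.calls)
}
```

A consequence for service implementers is that changes to cached values only become visible to the autoscaler after the next Refresh(), or never for values cached for the lifetime of an object.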

Code Generation

To regenerate the gRPC code:

install protoc, then the protoc-gen-go and protoc-gen-go-grpc plugins:

go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.31
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.3

generate gRPC client and server code:

protoc \
  -I ./cluster-autoscaler \
  -I ./cluster-autoscaler/vendor \
  --go_out=. \
  --go-grpc_out=. \
  ./cluster-autoscaler/cloudprovider/externalgrpc/protos/externalgrpc.proto

General considerations

Abstractions used by Cluster Autoscaler assume nodes belong to "node groups". All nodes within a group must be of the same machine type (have the same amount of resources), have the same set of labels and taints, and be located in the same availability zone. This doesn't mean a cloud has to have a concept of such node groups, but it helps.

There must be a way to delete a specific node. If your cloud supports instance groups, and you are only able to provide a method to decrease the size of a given group, without guaranteeing which instance will be killed, it won't work well.

There must be a way to match a Kubernetes node to the instance it is running on. This is usually done by the kubelet setting the node's ProviderID field to an instance ID that can be used in API calls to the cloud.
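
To make the node-to-instance matching concrete, the sketch below extracts an instance ID from a provider ID string. The `examplecloud://` scheme and the `instanceIDFromProviderID` helper are hypothetical. Each cloud defines its own provider ID format, so treat this only as an illustration of the idea.

```go
package main

import (
	"fmt"
	"strings"
)

// providerPrefix is a hypothetical scheme; real clouds use their own.
const providerPrefix = "examplecloud://"

// instanceIDFromProviderID maps the provider ID stored on a Kubernetes
// node back to the instance ID understood by the cloud API.
func instanceIDFromProviderID(providerID string) (string, error) {
	if !strings.HasPrefix(providerID, providerPrefix) {
		return "", fmt.Errorf("provider ID %q does not start with %q", providerID, providerPrefix)
	}
	id := strings.TrimPrefix(providerID, providerPrefix)
	if id == "" {
		return "", fmt.Errorf("provider ID %q has no instance ID", providerID)
	}
	return id, nil
}

func main() {
	for _, pid := range []string{"examplecloud://i-123", "othercloud://i-123", "examplecloud://"} {
		id, err := instanceIDFromProviderID(pid)
		fmt.Printf("providerID=%q instanceID=%q err=%v\n", pid, id, err)
	}
}
```

Rejecting unknown prefixes lets a provider ignore nodes it does not manage instead of issuing cloud API calls with a bogus ID.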

Documentation ¶

Index ¶

func BuildExternalGrpc(opts config.AutoscalingOptions, do cloudprovider.NodeGroupDiscoveryOptions, ...) cloudprovider.CloudProvider
type NodeGroup

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func BuildExternalGrpc ¶

func BuildExternalGrpc(
	opts config.AutoscalingOptions,
	do cloudprovider.NodeGroupDiscoveryOptions,
	rl *cloudprovider.ResourceLimiter,
) cloudprovider.CloudProvider

BuildExternalGrpc builds the externalgrpc cloud provider.

Types ¶

type NodeGroup ¶

type NodeGroup struct {
	// contains filtered or unexported fields
}

NodeGroup implements cloudprovider.NodeGroup interface. NodeGroup contains configuration info and functions to control a set of nodes that have the same capacity and set of labels.

func (*NodeGroup) AtomicIncreaseSize ¶

func (n *NodeGroup) AtomicIncreaseSize(delta int) error

AtomicIncreaseSize is not implemented.

func (*NodeGroup) Autoprovisioned ¶

func (n *NodeGroup) Autoprovisioned() bool

Autoprovisioned returns true if the node group is autoprovisioned. An autoprovisioned group was created by CA and can be deleted when scaled to 0.

func (*NodeGroup) Create ¶

func (n *NodeGroup) Create() (cloudprovider.NodeGroup, error)

Create creates the node group on the cloud provider side. Implementation optional.

func (*NodeGroup) Debug ¶

func (n *NodeGroup) Debug() string

Debug returns a string containing all information regarding this node group.

func (*NodeGroup) DecreaseTargetSize ¶

func (n *NodeGroup) DecreaseTargetSize(delta int) error

DecreaseTargetSize decreases the target size of the node group. This function does not delete any existing node and can be used only to reduce the request for new nodes that have not yet been fulfilled. Delta should be negative. It is assumed that the cloud provider will not delete existing nodes when there is an option to just decrease the target. Implementation required.
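
A service implementing this RPC might enforce the constraints described above roughly as follows. This is a sketch under assumptions: `pendingGroup` is a hypothetical in-memory type, and a real implementation would query and update the cloud API instead of local fields.

```go
package main

import (
	"errors"
	"fmt"
)

// pendingGroup is a hypothetical in-memory node group.
type pendingGroup struct {
	targetSize int // requested number of nodes
	existing   int // nodes that already exist in the cloud
}

// DecreaseTargetSize lowers the target only as far as the number of
// existing nodes, so it can cancel pending requests but never deletes.
func (g *pendingGroup) DecreaseTargetSize(delta int) error {
	if delta >= 0 {
		return errors.New("size decrease must be negative")
	}
	newSize := g.targetSize + delta
	if newSize < g.existing {
		return fmt.Errorf("attempt to delete existing nodes: target %d, delta %d, existing %d",
			g.targetSize, delta, g.existing)
	}
	g.targetSize = newSize
	return nil
}

func main() {
	g := &pendingGroup{targetSize: 5, existing: 3}
	fmt.Println(g.DecreaseTargetSize(-2), g.targetSize) // cancels two pending nodes
	fmt.Println(g.DecreaseTargetSize(-1), g.targetSize) // refused: would remove a real node
}
```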

func (*NodeGroup) Delete ¶

func (n *NodeGroup) Delete() error

Delete deletes the node group on the cloud provider side. This will be executed only for autoprovisioned node groups, once their size drops to 0. Implementation optional.

func (*NodeGroup) DeleteNodes ¶

func (n *NodeGroup) DeleteNodes(nodes []*apiv1.Node) error

DeleteNodes deletes nodes from this node group (and also decreases the size of the node group accordingly). An error is returned either on failure or if a given node doesn't belong to this node group. This function should wait until the node group size is updated. Implementation required.

func (*NodeGroup) Exist ¶

func (n *NodeGroup) Exist() bool

Exist checks if the node group really exists on the cloud provider side. This makes it possible to tell a theoretical node group from a real one. Implementation required.

func (*NodeGroup) ForceDeleteNodes ¶

func (n *NodeGroup) ForceDeleteNodes(nodes []*apiv1.Node) error

ForceDeleteNodes deletes nodes from the group regardless of constraints.

func (*NodeGroup) GetOptions ¶

func (n *NodeGroup) GetOptions(defaults config.NodeGroupAutoscalingOptions) (*config.NodeGroupAutoscalingOptions, error)

GetOptions returns the NodeGroupAutoscalingOptions that should be used for this particular NodeGroup. Returning nil will result in using the default options.

func (*NodeGroup) Id ¶

func (n *NodeGroup) Id() string

Id returns a unique identifier of the node group.

func (*NodeGroup) IncreaseSize ¶

func (n *NodeGroup) IncreaseSize(delta int) error

IncreaseSize increases the size of the node group. To delete a node you need to explicitly name it and use DeleteNodes. This function should wait until the node group size is updated. Implementation required.
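
On the service side, an IncreaseSize handler commonly validates the delta against the group's maximum before touching the cloud API. The sketch below shows that check with a hypothetical `boundedGroup` type. The exact validation is an assumption modelled on common practice, not something this package mandates.

```go
package main

import (
	"errors"
	"fmt"
)

// boundedGroup is a hypothetical in-memory node group with size limits.
type boundedGroup struct {
	maxSize    int
	targetSize int
}

// IncreaseSize grows the target size by a positive delta, refusing any
// request that would exceed the configured maximum.
func (g *boundedGroup) IncreaseSize(delta int) error {
	if delta <= 0 {
		return errors.New("size increase must be positive")
	}
	if g.targetSize+delta > g.maxSize {
		return fmt.Errorf("size increase too large: desired %d, max %d",
			g.targetSize+delta, g.maxSize)
	}
	g.targetSize += delta
	return nil
}

func main() {
	g := &boundedGroup{maxSize: 5, targetSize: 3}
	fmt.Println(g.IncreaseSize(2), g.targetSize) // reaches the maximum
	fmt.Println(g.IncreaseSize(1), g.targetSize) // refused: above the maximum
}
```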

func (*NodeGroup) MaxSize ¶

func (n *NodeGroup) MaxSize() int

MaxSize returns maximum size of the node group.

func (*NodeGroup) MinSize ¶

func (n *NodeGroup) MinSize() int

MinSize returns minimum size of the node group.

func (*NodeGroup) Nodes ¶

func (n *NodeGroup) Nodes() ([]cloudprovider.Instance, error)

Nodes returns a list of all nodes that belong to this node group. It is required that Instance objects returned by this method have Id field set. Other fields are optional.

func (*NodeGroup) TargetSize ¶

func (n *NodeGroup) TargetSize() (int, error)

TargetSize returns the current target size of the node group. The number of nodes in Kubernetes may differ at the moment, but it should equal the target size once everything stabilizes (new nodes finish startup and registration or removed nodes are deleted completely). Implementation required.

func (*NodeGroup) TemplateNodeInfo ¶

func (n *NodeGroup) TemplateNodeInfo() (*framework.NodeInfo, error)

TemplateNodeInfo returns a framework.NodeInfo structure of an empty (as if just started) node. This will be used in scale-up simulations to predict what a new node would look like if a node group was expanded. The returned NodeInfo is expected to have a fully populated Node object, with all of the labels, capacity and allocatable information as well as all pods that are started on the node by default, using manifests (most likely only kube-proxy). Implementation optional.

Defining a generic `NodeInfo` for each potential provider is a fairly complex approach and does not cover all scenarios. For the sake of simplicity, the `nodeInfo` is defined as a Kubernetes `k8s.io.api.core.v1.Node` type, from which the system can still extract certain information about the node.

Directories ¶

Path	Synopsis
examples
external-grpc-cloud-provider-service
external-grpc-cloud-provider-service/wrapper
protos
