alicloud

package
v0.0.0-...-64ca097 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 23, 2025 License: Apache-2.0 Imports: 25 Imported by: 3

README

Cluster Autoscaler on AliCloud

The cluster autoscaler on AliCloud scales worker nodes within any specified autoscaling group. It will run as a Deployment in your cluster. This README will go over some of the necessary steps required to get the cluster autoscaler up and running.

Kubernetes Version

Cluster autoscaler must run on v1.9.3 or greater.

Instance Type Support

  • Standard Instancex86-Architecture,suitable for common scenes such as websites or api services.
  • GPU/FPGA InstanceHeterogeneous Computing,suitable for high performance computing.
  • Bare Metal InstanceBoth the elasticity of a virtual server and the high-performance and comprehensive features of a physical server.
  • Spot InstanceSpot instance are on-demand instances. They are designed to reduce your ECS costs in some cases.

ACS Console Deployment

doc: https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/auto-scaling-of-nodes

Custom Deployment

1.Prepare Identity authentication
Use access-key-id and access-key-secret
apiVersion: v1
kind: Secret
metadata:
  name: cloud-config
  namespace: kube-system
data:
  # insert your base64 encoded Alicloud access id and key here, ensure there's no trailing newline:
  # such as:  echo -n "your_access_key_id" | base64
  access-key-id: "<BASE64_ACCESS_KEY_ID>"
  access-key-secret: "<BASE64_ACCESS_KEY_SECRET>"
  region-id: "<BASE64_REGION_ID>"
Use STS with RAM Role
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "ess:Describe*",
        "ess:CreateScalingRule",
        "ess:ModifyScalingGroup",
        "ess:RemoveInstances",
        "ess:ExecuteScalingRule",
        "ess:ModifyScalingRule",
        "ess:DeleteScalingRule",
        "ess:DetachInstances",
        "ecs:DescribeInstanceTypes"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}
2.ASG Setup
  • create a Scaling Group in ESS(https://essnew.console.aliyun.com) with valid configurations.
  • create a Scaling Configuration for this Scaling Group with valid instanceType and User Data.In User Data,you can specific the script to initialize the environment and join this node to kubernetes cluster.If your Kubernetes cluster is hosted by ACS.you can use the attach script like this.
#!/bin/sh
# The token is generated by ACS console. https://www.alibabacloud.com/help/doc-detail/64983.htm?spm=a2c63.l28256.b99.33.46395ad54ozJFq
curl http://aliacs-k8s-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/public/pkg/run/attach/[kubernetes_cluster_version]/attach_node.sh | bash -s -- --openapi-token [token] --ess true 
3.cluster-autoscaler deployment
Use access-key-id and access-key-secret
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: admin
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/acs/autoscaler:v1.3.1.2
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=alicloud
            - --nodes=[min]:[max]:[ASG_ID]
          imagePullPolicy: "Always"
          env:
          - name: ACCESS_KEY_ID
            valueFrom:
              secretKeyRef:
                name: cloud-config
                key: access-key-id
          - name: ACCESS_KEY_SECRET
            valueFrom:
              secretKeyRef:
                name: cloud-config
                key: access-key-secret
          - name: REGION_ID
            valueFrom:
              secretKeyRef:
                name: cloud-config
                key: region-id
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
          imagePullPolicy: "Always"
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-certificates.crt"
Use STS with RAM Role
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: admin
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/acs/autoscaler:v1.3.1.2
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=alicloud
            - --nodes=[min]:[max]:[ASG_ID]
          imagePullPolicy: "Always"
Auto-Discovery Setup

Auto Discovery is not supported in AliCloud currently.

Common Notes and Gotchas:

  • The /etc/ssl/certs/ca-certificates.crt should exist by default on your ecs instance.
  • By default, cluster autoscaler will not terminate nodes running pods in the kube-system namespace. You can override this default behaviour by passing in the --skip-nodes-with-system-pods=false flag.
  • By default, cluster autoscaler will wait 10 minutes between scale down operations, you can adjust this using the --scale-down-delay flag. E.g. --scale-down-delay=5m to decrease the scale down delay to 5 minutes.
  • If you're running multiple ASGs, the --expander flag supports three options: random, most-pods and least-waste. random will expand a random ASG on scale up. most-pods will scale up the ASG that will schedule the most amount of pods. least-waste will expand the ASG that will waste the least amount of CPU/MEM resources. In the event of a tie, cluster-autoscaler will fall back to random.
  • If you're managing your own kubelets, they need to be started with the --provider-id flag.

Documentation

Index

Constants

View Source
const (
	// GPULabel is the label added to nodes with GPU resource.
	GPULabel = "aliyun.accelerator/nvidia_name"
)
View Source
const (

	//ResourceGPU GPU resource type
	ResourceGPU apiv1.ResourceName = "nvidia.com/gpu"
)

Variables

This section is empty.

Functions

func BuildAliCloudProvider

func BuildAliCloudProvider(manager *AliCloudManager, discoveryOpts cloudprovider.NodeGroupDiscoveryOptions, resourceLimiter *cloudprovider.ResourceLimiter) (cloudprovider.CloudProvider, error)

BuildAliCloudProvider builds CloudProvider implementation for AliCloud.

func BuildAlicloud

BuildAlicloud returns alicloud provider

Types

type AliCloudManager

type AliCloudManager struct {
	// contains filtered or unexported fields
}

AliCloudManager handles alicloud service communication.

func CreateAliCloudManager

func CreateAliCloudManager(configReader io.Reader) (*AliCloudManager, error)

CreateAliCloudManager constructs aliCloudManager object.

func (*AliCloudManager) DeleteInstances

func (m *AliCloudManager) DeleteInstances(instanceIds []string) error

DeleteInstances deletes the given instances. All instances must be controlled by the same ASG.

func (*AliCloudManager) GetAsgForInstance

func (m *AliCloudManager) GetAsgForInstance(instanceId string) (*Asg, error)

GetAsgForInstance returns AsgConfig of the given Instance

func (*AliCloudManager) GetAsgNodes

func (m *AliCloudManager) GetAsgNodes(sg *Asg) ([]string, error)

GetAsgNodes returns Asg nodes.

func (*AliCloudManager) GetAsgSize

func (m *AliCloudManager) GetAsgSize(asgConfig *Asg) (int64, error)

GetAsgSize gets ASG size.

func (*AliCloudManager) RegisterAsg

func (m *AliCloudManager) RegisterAsg(asg *Asg)

RegisterAsg registers asg in AliCloud Manager.

func (*AliCloudManager) SetAsgSize

func (m *AliCloudManager) SetAsgSize(asg *Asg, size int64) error

SetAsgSize sets ASG size.

type AliRef

type AliRef struct {
	ID     string
	Region string
}

AliRef contains a reference to ECS instance or .

type Asg

type Asg struct {
	// contains filtered or unexported fields
}

Asg implements NodeGroup interface.

func (*Asg) AtomicIncreaseSize

func (asg *Asg) AtomicIncreaseSize(delta int) error

AtomicIncreaseSize is not implemented.

func (*Asg) Autoprovisioned

func (asg *Asg) Autoprovisioned() bool

Autoprovisioned returns true if the node group is autoprovisioned.

func (*Asg) Belongs

func (asg *Asg) Belongs(node *apiv1.Node) (bool, error)

Belongs returns true if the given node belongs to the NodeGroup.

func (*Asg) Create

func (asg *Asg) Create() (cloudprovider.NodeGroup, error)

Create creates the node group on the cloud provider side.

func (*Asg) Debug

func (asg *Asg) Debug() string

Debug returns a debug string for the Asg.

func (*Asg) DecreaseTargetSize

func (asg *Asg) DecreaseTargetSize(delta int) error

DecreaseTargetSize decreases the target size of the node group. This function doesn't permit to delete any existing node and can be used only to reduce the request for new nodes that have not been yet fulfilled. Delta should be negative. It is assumed that cloud provider will not delete the existing nodes if the size when there is an option to just decrease the target.

func (*Asg) Delete

func (asg *Asg) Delete() error

Delete deletes the node group on the cloud provider side. This will be executed only for autoprovisioned node groups, once their size drops to 0.

func (*Asg) DeleteNodes

func (asg *Asg) DeleteNodes(nodes []*apiv1.Node) error

DeleteNodes deletes the nodes from the group.

func (*Asg) Exist

func (asg *Asg) Exist() bool

Exist checks if the node group really exists on the cloud provider side. Allows to tell the theoretical node group from the real one.

func (*Asg) ForceDeleteNodes

func (asg *Asg) ForceDeleteNodes(nodes []*apiv1.Node) error

ForceDeleteNodes deletes nodes from the group regardless of constraints.

func (*Asg) GetOptions

GetOptions returns NodeGroupAutoscalingOptions that should be used for this particular NodeGroup. Returning a nil will result in using default options.

func (*Asg) Id

func (asg *Asg) Id() string

Id returns asg id.

func (*Asg) IncreaseSize

func (asg *Asg) IncreaseSize(delta int) error

IncreaseSize increases Asg size

func (*Asg) MaxSize

func (asg *Asg) MaxSize() int

MaxSize returns maximum size of the node group.

func (*Asg) MinSize

func (asg *Asg) MinSize() int

MinSize returns minimum size of the node group.

func (*Asg) Nodes

func (asg *Asg) Nodes() ([]cloudprovider.Instance, error)

Nodes returns a list of all nodes that belong to this node group.

func (*Asg) RegionId

func (asg *Asg) RegionId() string

RegionId returns regionId of asg

func (*Asg) TargetSize

func (asg *Asg) TargetSize() (int, error)

TargetSize returns the current TARGET size of the node group. It is possible that the number is different from the number of nodes registered in Kubernetes.

func (*Asg) TemplateNodeInfo

func (asg *Asg) TemplateNodeInfo() (*framework.NodeInfo, error)

TemplateNodeInfo returns a node template for this node group.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL