package tencentcloud
v0.0.0-...-2d37aee
Warning

This package is not in the latest version of its module.

Published: Nov 18, 2024 License: Apache-2.0 Imports: 29 Imported by: 0

README

Cluster Autoscaler on TencentCloud

On TencentCloud, Cluster Autoscaler utilizes CVM Auto Scaling Groups to manage node groups. Cluster Autoscaler typically runs as a Deployment in your cluster.

Requirements

Cluster Autoscaler requires TKE v1.10.x or greater.

Permissions

CAM Policy

The following policy provides the minimum privileges necessary for Cluster Autoscaler to run:

{
    "version": "2.0",
    "statement": [
        {
            "effect": "allow",
            "action": [
                "as:ModifyAutoScalingGroup",
                "as:RemoveInstances",
                "as:DescribeAutoScalingGroups",
                "as:DescribeAutoScalingInstances",
                "as:DescribeLaunchConfigurations",
                "as:DescribeAutoScalingActivities",
                "cvm:DescribeZones",
                "cvm:DescribeInstanceTypeConfigs",
                "vpc:DescribeSubnets"
            ],
            "resource": [
                "*"
            ]
        }
    ]
}
Using TencentCloud Credentials

NOTICE: Make sure the access key you use has all of the permissions listed above.

apiVersion: v1
kind: Secret
metadata:
  name: tencentcloud-secret
  namespace: kube-system
type: Opaque
data:
  tencentcloud_secret_id: BASE64_OF_YOUR_TENCENTCLOUD_SECRET_ID
  tencentcloud_secret_key: BASE64_OF_YOUR_TENCENTCLOUD_SECRET_KEY
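
Every value under data must be base64-encoded. If you create the Secret with kubectl create secret generic tencentcloud-secret -n kube-system --from-literal=tencentcloud_secret_id=<SECRET_ID> --from-literal=tencentcloud_secret_key=<SECRET_KEY>, kubectl does the encoding for you. When writing the manifest by hand, the encoding can be sketched in Go as follows; the secretData helper is hypothetical and the credentials in main are placeholders.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// secretData builds the data map of the tencentcloud-secret Secret.
// Kubernetes expects every value under "data" to be base64-encoded.
func secretData(secretID, secretKey string) map[string]string {
	return map[string]string{
		"tencentcloud_secret_id":  base64.StdEncoding.EncodeToString([]byte(secretID)),
		"tencentcloud_secret_key": base64.StdEncoding.EncodeToString([]byte(secretKey)),
	}
}

func main() {
	// Placeholder credentials; substitute your own values.
	fmt.Println(secretData("AKIDexample", "examplekey"))
}
```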

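Refer to the Kubernetes documentation on Secrets for how to create a Secret manually. Then expose the credentials, region, and cluster ID to the Cluster Autoscaler container as environment variables: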

env:
  - name: SECRET_ID
    valueFrom:
      secretKeyRef:
        name: tencentcloud-secret
        key: tencentcloud_secret_id
  - name: SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: tencentcloud-secret
        key: tencentcloud_secret_key
  - name: REGION
    value: YOUR_TENCENTCLOUD_REGION
  - name: CLUSTER_ID
    value: YOUR_TKE_CLUSTER_ID

Setup

cluster-autoscaler deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      qcloud-app: cluster-autoscaler
  template:
    metadata:
      labels:
        qcloud-app: cluster-autoscaler
    spec:
      containers:
      - args:
        - --cloud-provider=tencentcloud
        - --v=4
        - --ok-total-unready-count=3
        - --cloud-config=/etc/kubernetes/qcloud.conf
        - --scale-down-utilization-threshold=0.8
        - --scale-down-enabled=true
        - --max-total-unready-percentage=33
        - --nodes=[min]:[max]:[ASG_ID]
        - --logtostderr
        - --kubeconfig=/kubeconfig/config
        command:
        - /cluster-autoscaler
        env:
        - name: SECRET_ID
          valueFrom:
            secretKeyRef:
              name: tencentcloud-secret
              key: tencentcloud_secret_id
        - name: SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: tencentcloud-secret
              key: tencentcloud_secret_key
        - name: REGION
          value: YOUR_TENCENTCLOUD_REGION
        image: ccr.ccs.tencentyun.com/tkeimages/cluster-autoscaler:v1.18.4-49692187a
        imagePullPolicy: Always
        name: cluster-autoscaler
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 256Mi
        volumeMounts:
        - mountPath: /etc/localtime
          name: tz-config
      hostAliases:
      - hostnames:
        - as.tencentcloudapi.com
        - cvm.tencentcloudapi.com
        - vpc.tencentcloudapi.com
        ip: 169.254.0.95
      restartPolicy: Always
      serviceAccount: kube-admin
      serviceAccountName: kube-admin
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
      volumes:
      - hostPath:
          path: /etc/localtime
          type: ""
        name: tz-config
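
The --nodes=[min]:[max]:[ASG_ID] argument in the Deployment uses placeholders; a filled-in value looks like --nodes=1:10:asg-xxxxxxxx, where asg-xxxxxxxx stands for your Auto Scaling Group ID. The Go sketch below shows how such a value splits into its three fields and which values would be rejected. nodeGroupSpec and parseNodesFlag are hypothetical helpers, not Cluster Autoscaler's own parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nodeGroupSpec is a hypothetical mirror of one --nodes=<min>:<max>:<ASG_ID> value.
type nodeGroupSpec struct {
	MinSize int
	MaxSize int
	AsgID   string
}

// parseNodesFlag splits a --nodes value into its three fields and checks the bounds.
func parseNodesFlag(value string) (nodeGroupSpec, error) {
	parts := strings.SplitN(value, ":", 3)
	if len(parts) != 3 {
		return nodeGroupSpec{}, fmt.Errorf("expected <min>:<max>:<ASG_ID>, got %q", value)
	}
	minSize, err := strconv.Atoi(parts[0])
	if err != nil {
		return nodeGroupSpec{}, fmt.Errorf("invalid min size %q: %w", parts[0], err)
	}
	maxSize, err := strconv.Atoi(parts[1])
	if err != nil {
		return nodeGroupSpec{}, fmt.Errorf("invalid max size %q: %w", parts[1], err)
	}
	if minSize < 0 || maxSize < minSize {
		return nodeGroupSpec{}, fmt.Errorf("need 0 <= min <= max, got min=%d max=%d", minSize, maxSize)
	}
	if parts[2] == "" {
		return nodeGroupSpec{}, fmt.Errorf("empty ASG ID in %q", value)
	}
	return nodeGroupSpec{MinSize: minSize, MaxSize: maxSize, AsgID: parts[2]}, nil
}

func main() {
	// "asg-xxxxxxxx" is a placeholder ASG ID.
	spec, err := parseNodesFlag("1:10:asg-xxxxxxxx")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", spec)
}
```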
Scaling up from 0 nodes

When scaling up from 0 nodes, the Cluster Autoscaler reads ASG tags to derive the specifications of the nodes in that ASG, i.e. their labels and taints. Note that it does not actually apply these labels or taints; they must be applied to the nodes when they are provisioned. The tags tell the Cluster Autoscaler whether pending pods could be scheduled on a new node spun up for a particular ASG, on the assumption that the ASG tags accurately reflect the labels and taints actually applied.

The following is only required if scaling up from 0 nodes. If a deployment has a nodeSelector, the Cluster Autoscaler requires the corresponding label tag on the ASG; otherwise no scaling will occur, because the Cluster Autoscaler does not know that the ASG's nodes carry that label. The tag has the format k8s.io/cluster-autoscaler/node-template/label/<label-name>: <label-value> or tencentcloud:<label-name>: <label-value>, where <label-name> is the name of the label and the value of each tag specifies the label value.

Example tags:

  • k8s.io/cluster-autoscaler/node-template/label/foo: bar
  • tencentcloud:foo:bar
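
A minimal Go sketch of how the two label tag forms above could be turned into node-template labels. It assumes the short form uses the tag key tencentcloud:<label-name> with the label value as the tag value, and it resolves conflicts in favour of the k8s.io/cluster-autoscaler form; both choices are assumptions for illustration, and labelsFromTags is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	labelTagPrefix        = "k8s.io/cluster-autoscaler/node-template/label/"
	tencentLabelTagPrefix = "tencentcloud:"
)

// labelsFromTags derives node-template labels from ASG tags given as key -> value.
// Assumption: the short form is key "tencentcloud:<label-name>", value = label value.
// The long form is applied second, so it wins if both set the same label.
func labelsFromTags(tags map[string]string) map[string]string {
	labels := make(map[string]string)
	for k, v := range tags {
		if name := strings.TrimPrefix(k, tencentLabelTagPrefix); name != k && name != "" {
			labels[name] = v
		}
	}
	for k, v := range tags {
		if name := strings.TrimPrefix(k, labelTagPrefix); name != k && name != "" {
			labels[name] = v
		}
	}
	return labels
}

func main() {
	tags := map[string]string{
		"k8s.io/cluster-autoscaler/node-template/label/foo": "bar",
		"tencentcloud:foo":  "ignored",
		"tencentcloud:tier": "batch",
		"owner":             "team-a",
	}
	fmt.Println(labelsFromTags(tags)) // prints map[foo:bar tier:batch]
}
```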

The following is only required if scaling up from 0 nodes. The Cluster Autoscaler requires the taint tag on the ASG; otherwise it may spin up tainted nodes on which the pending pods cannot actually run. The tag has the format k8s.io/cluster-autoscaler/node-template/taint/<taint-name>: <taint-value>:<taint-effect>, where <taint-name> is the name of the taint and the value of each tag specifies the taint value and effect in the format <taint-value>:<taint-effect>.

Example tags:

  • k8s.io/cluster-autoscaler/node-template/taint/dedicated: true:NoSchedule
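
The taint tag format can be parsed along these lines. This sketch splits the tag value at its last colon and accepts only the three Kubernetes taint effects (NoSchedule, PreferNoSchedule, NoExecute); the taint type and taintsFromTags are hypothetical stand-ins for the Kubernetes types the autoscaler actually works with.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const taintTagPrefix = "k8s.io/cluster-autoscaler/node-template/taint/"

// taint is a minimal stand-in for a Kubernetes taint (key, value, effect).
type taint struct {
	Key, Value, Effect string
}

// taintsFromTags parses tags of the form
// k8s.io/cluster-autoscaler/node-template/taint/<taint-name>: <taint-value>:<taint-effect>.
func taintsFromTags(tags map[string]string) ([]taint, error) {
	var taints []taint
	for k, v := range tags {
		name := strings.TrimPrefix(k, taintTagPrefix)
		if name == k || name == "" {
			continue // not a taint tag
		}
		i := strings.LastIndex(v, ":")
		if i < 0 {
			return nil, fmt.Errorf("tag %q: value %q is not <taint-value>:<taint-effect>", k, v)
		}
		effect := v[i+1:]
		switch effect {
		case "NoSchedule", "PreferNoSchedule", "NoExecute": // the Kubernetes taint effects
		default:
			return nil, fmt.Errorf("tag %q: unknown taint effect %q", k, effect)
		}
		taints = append(taints, taint{Key: name, Value: v[:i], Effect: effect})
	}
	// Map iteration order is random; sort for deterministic output.
	sort.Slice(taints, func(a, b int) bool { return taints[a].Key < taints[b].Key })
	return taints, nil
}

func main() {
	taints, err := taintsFromTags(map[string]string{
		"k8s.io/cluster-autoscaler/node-template/taint/dedicated": "true:NoSchedule",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", taints) // prints [{Key:dedicated Value:true Effect:NoSchedule}]
}
```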

From version 1.14, Cluster Autoscaler can also determine the resources provided by each Auto Scaling Group via tags. The tag is of the format k8s.io/cluster-autoscaler/node-template/resources/<resource-name>. <resource-name> is the name of the resource, such as ephemeral-storage. The value of each tag specifies the amount of resource provided. The units are identical to the units used in the resources field of a Pod specification.

Example tags:

  • k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage: 100G
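
Within Kubernetes, quantity strings such as 100G are normally parsed with resource.ParseQuantity from k8s.io/apimachinery/pkg/api/resource. To stay dependency-free, the hypothetical sketch below handles only integer quantities with a subset of suffixes (k, M, G, T, Ki, Mi, Gi, Ti) and rejects everything else, which is enough to show how a resource tag maps to a resource name and amount.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

const resourceTagPrefix = "k8s.io/cluster-autoscaler/node-template/resources/"

// multipliers covers a subset of Kubernetes quantity suffixes (decimal and binary).
var multipliers = map[string]int64{
	"":   1,
	"k":  1e3,
	"M":  1e6,
	"G":  1e9,
	"T":  1e12,
	"Ki": 1 << 10,
	"Mi": 1 << 20,
	"Gi": 1 << 30,
	"Ti": 1 << 40,
}

// parseIntQuantity converts an integer quantity such as "100G" or "8Gi" to a plain count.
// Fractions ("1.5G") and exponents ("1e3") are rejected by this simplified sketch.
func parseIntQuantity(s string) (int64, error) {
	i := 0
	for i < len(s) && s[i] >= '0' && s[i] <= '9' {
		i++
	}
	if i == 0 {
		return 0, fmt.Errorf("quantity %q must start with digits", s)
	}
	n, err := strconv.ParseInt(s[:i], 10, 64)
	if err != nil {
		return 0, err
	}
	mult, ok := multipliers[s[i:]]
	if !ok {
		return 0, fmt.Errorf("quantity %q: unsupported suffix %q", s, s[i:])
	}
	if n > math.MaxInt64/mult {
		return 0, fmt.Errorf("quantity %q overflows int64", s)
	}
	return n * mult, nil
}

// resourcesFromTags returns resource name -> amount for every resource tag.
func resourcesFromTags(tags map[string]string) (map[string]int64, error) {
	res := make(map[string]int64)
	for k, v := range tags {
		name := strings.TrimPrefix(k, resourceTagPrefix)
		if name == k || name == "" {
			continue // not a resource tag
		}
		amount, err := parseIntQuantity(v)
		if err != nil {
			return nil, fmt.Errorf("tag %q: %w", k, err)
		}
		res[name] = amount
	}
	return res, nil
}

func main() {
	res, err := resourcesFromTags(map[string]string{
		"k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage": "100G",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(res) // prints map[ephemeral-storage:100000000000]
}
```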

Documentation

Index

Constants

const (
	// ProviderName is the cloud provider name for Tencentcloud
	ProviderName = "tencentcloud"

	// GPULabel is the label added to nodes with GPU resource.
	GPULabel = "cloud.tencent.com/tke-accelerator"
)
const (
	ASHttpEndpoint  = "as.tencentcloudapi.com"
	VPCHttpEndpoint = "vpc.tencentcloudapi.com"
	CVMHttpEndpoint = "cvm.tencentcloudapi.com"
)

Tencent Cloud service HTTP endpoints.

const LabelAutoScalingGroupID = "cloud.tencent.com/auto-scaling-group-id"

LabelAutoScalingGroupID is the node label key that identifies the Auto Scaling Group a node belongs to.

Variables

This section is empty.

Functions

func BuildTencentCloudProvider

func BuildTencentCloudProvider(tencentcloudManager TencentcloudManager, discoveryOpts cloudprovider.NodeGroupDiscoveryOptions, resourceLimiter *cloudprovider.ResourceLimiter) (cloudprovider.CloudProvider, error)

BuildTencentCloudProvider builds the CloudProvider implementation for Tencentcloud.

func BuildTencentcloud

BuildTencentcloud returns the Tencentcloud cloud provider.

Types

type Asg

type Asg interface {
	cloudprovider.NodeGroup

	TencentcloudRef() TcRef
	GetScalingType() string
	SetScalingType(string)
}

Asg is a cloudprovider.NodeGroup with additional Tencentcloud-specific methods.

type CloudConfig

type CloudConfig struct {
	Region string `json:"region"`
}

CloudConfig represents the Tencentcloud configuration.

type CloudService

type CloudService interface {
	// FetchAsgInstances returns instances of the specified ASG.
	FetchAsgInstances(TcRef) ([]cloudprovider.Instance, error)
	// DeleteInstances removes the given instances from the specified ASG.
	DeleteInstances(Asg, []string) error
	// GetAsgRefByInstanceRef returns the ref of the ASG that the given instance belongs to.
	GetAsgRefByInstanceRef(TcRef) (*TcRef, error)
	// GetAutoScalingGroups queries and returns a set of ASGs.
	GetAutoScalingGroups([]string) ([]as.AutoScalingGroup, error)
	// GetAutoscalingConfigs queries and returns a set of ASG launch configurations.
	GetAutoscalingConfigs([]string) ([]as.LaunchConfiguration, error)
	// GetAutoScalingGroup returns the specified ASG.
	GetAutoScalingGroup(TcRef) (*as.AutoScalingGroup, error)
	// ResizeAsg sets the target size of the ASG.
	ResizeAsg(TcRef, uint64) error
	// GetAutoScalingInstances returns instances of the specified ASG.
	GetAutoScalingInstances(TcRef) ([]*as.Instance, error)
	// GetTencentcloudInstanceRef returns a Tencentcloud ref.
	GetTencentcloudInstanceRef(*as.Instance) (*TcRef, error)
	// GetInstanceInfoByType queries the CPU, memory, and GPU resources of the given instance type, which are used to generate the node template.
	GetInstanceInfoByType(string) (*InstanceInfo, error)
	// GetZoneBySubnetID returns the zone of the given subnet ID.
	GetZoneBySubnetID(string) (string, error)
	// GetZoneInfo invokes cvm.DescribeZones to query zone information.
	GetZoneInfo(string) (*cvm.ZoneInfo, error)
}

CloudService is used for communicating with Tencentcloud API.

func NewCloudService

func NewCloudService(cvmClient, vpcClient, asClient client.Client) CloudService

NewCloudService creates a caching CloudServiceImpl instance.

type CloudServiceImpl

type CloudServiceImpl struct {
	// contains filtered or unexported fields
}

CloudServiceImpl provides several utility methods over the Auto Scaling service provided by the Tencentcloud SDK.

func (*CloudServiceImpl) DeleteInstances

func (ts *CloudServiceImpl) DeleteInstances(asg Asg, instances []string) error

DeleteInstances removes the given instances from the specified ASG.

func (*CloudServiceImpl) FetchAsgInstances

func (ts *CloudServiceImpl) FetchAsgInstances(asgRef TcRef) ([]cloudprovider.Instance, error)

FetchAsgInstances returns instances of the specified ASG.

func (*CloudServiceImpl) GetAsgRefByInstanceRef

func (ts *CloudServiceImpl) GetAsgRefByInstanceRef(instanceRef TcRef) (*TcRef, error)

GetAsgRefByInstanceRef returns the ref of the ASG that the given instance belongs to.

func (*CloudServiceImpl) GetAutoScalingGroup

func (ts *CloudServiceImpl) GetAutoScalingGroup(asgRef TcRef) (*as.AutoScalingGroup, error)

GetAutoScalingGroup returns the specific ASG.

func (*CloudServiceImpl) GetAutoScalingGroups

func (ts *CloudServiceImpl) GetAutoScalingGroups(asgIds []string) ([]as.AutoScalingGroup, error)

GetAutoScalingGroups queries and returns a set of ASGs.

func (*CloudServiceImpl) GetAutoScalingInstances

func (ts *CloudServiceImpl) GetAutoScalingInstances(asgRef TcRef) ([]*as.Instance, error)

GetAutoScalingInstances returns instances of the specified ASG.

func (*CloudServiceImpl) GetAutoscalingConfigs

func (ts *CloudServiceImpl) GetAutoscalingConfigs(ascs []string) ([]as.LaunchConfiguration, error)

GetAutoscalingConfigs queries and returns a set of ASG launch configurations.

func (*CloudServiceImpl) GetInstanceInfoByType

func (ts *CloudServiceImpl) GetInstanceInfoByType(instanceType string) (*InstanceInfo, error)

GetInstanceInfoByType queries the CPU, memory, and GPU resources of the given instance type, which are used to generate the node template.

func (*CloudServiceImpl) GetTencentcloudInstanceRef

func (ts *CloudServiceImpl) GetTencentcloudInstanceRef(instance *as.Instance) (*TcRef, error)

GetTencentcloudInstanceRef returns a Tencentcloud ref.

func (*CloudServiceImpl) GetZoneBySubnetID

func (ts *CloudServiceImpl) GetZoneBySubnetID(subnetID string) (string, error)

GetZoneBySubnetID queries the availability zone that the subnet belongs to.

func (*CloudServiceImpl) GetZoneInfo

func (ts *CloudServiceImpl) GetZoneInfo(zone string) (*cvm.ZoneInfo, error)

GetZoneInfo invokes cvm.DescribeZones to query zone information. The returned zone information is cached.

func (*CloudServiceImpl) ResizeAsg

func (ts *CloudServiceImpl) ResizeAsg(ref TcRef, size uint64) error

ResizeAsg sets the target size of the ASG.

type InstanceInfo

type InstanceInfo struct {
	CPU            int64
	Memory         int64
	GPU            int64
	InstanceFamily string
	InstanceType   string
}

InstanceInfo holds the details of a CVM instance type.

type InstanceTemplate

type InstanceTemplate struct {
	InstanceType string
	Region       string
	Zone         string
	Cpu          int64
	Mem          int64
	Gpu          int64

	Tags []*as.Tag
}

InstanceTemplate represents a CVM instance template.

type NetworkExtendedResources

type NetworkExtendedResources struct {
	TKERouteENIIP int64
	TKEDirectENI  int64
}

NetworkExtendedResources represents network extended resources

type SubnetInfo

type SubnetInfo struct {
	SubnetID string
	Zone     string
	ZoneID   int
}

SubnetInfo holds the details of a subnet.

type TcRef

type TcRef struct {
	ID   string
	Zone string
}

TcRef contains a reference to some entity in Tencentcloud/TKE world.

func TcRefFromProviderID

func TcRefFromProviderID(id string) (TcRef, error)

TcRefFromProviderID creates a TcRef from a provider ID, which must be in the format qcloud:///100003/ins-3ven36lk.

func (TcRef) String

func (ref TcRef) String() string

func (TcRef) ToProviderID

func (ref TcRef) ToProviderID() string

ToProviderID converts the TcRef to the string format used as the ProviderID of a Node object.
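
The provider ID format from TcRefFromProviderID can be parsed and rebuilt as shown in this sketch. It assumes the middle segment of qcloud:///100003/ins-3ven36lk is stored in Zone and the last segment in ID, and that ToProviderID produces the same layout; the real implementation may treat the segments differently, so tcRefFromProviderID and toProviderID are hypothetical lowercase stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// TcRef mirrors the struct documented above.
type TcRef struct {
	ID   string
	Zone string
}

const providerIDPrefix = "qcloud:///"

// tcRefFromProviderID is a hypothetical parser for IDs like "qcloud:///100003/ins-3ven36lk".
// Assumption: the middle segment is the zone and the last segment is the instance ID.
func tcRefFromProviderID(id string) (TcRef, error) {
	rest := strings.TrimPrefix(id, providerIDPrefix)
	if rest == id {
		return TcRef{}, fmt.Errorf("provider ID %q does not start with %q", id, providerIDPrefix)
	}
	parts := strings.Split(rest, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return TcRef{}, fmt.Errorf("provider ID %q is not in the form %s<zone>/<instance-id>", id, providerIDPrefix)
	}
	return TcRef{Zone: parts[0], ID: parts[1]}, nil
}

// toProviderID is the inverse of tcRefFromProviderID under the same assumption.
func (ref TcRef) toProviderID() string {
	return providerIDPrefix + ref.Zone + "/" + ref.ID
}

func main() {
	ref, err := tcRefFromProviderID("qcloud:///100003/ins-3ven36lk")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v %s\n", ref, ref.toProviderID()) // prints {ID:ins-3ven36lk Zone:100003} qcloud:///100003/ins-3ven36lk
}
```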

type TencentcloudCache

type TencentcloudCache struct {
	// contains filtered or unexported fields
}

TencentcloudCache is used for caching cluster resources state.

It is needed to:
  • keep track of autoscaled ASGs in the cluster,
  • keep track of instances and which ASG they belong to,
  • limit repetitive Tencentcloud API calls.

Cached resources:
  1) ASG configuration,
  2) instance->ASG mapping,
  3) resource limits (self-imposed quotas),
  4) instance types.

How it works:
  • asgs (1), resource limits (3) and machine types (4) are only stored in this cache, not updated by it.
  • instanceRefToAsgRef (2) is based on registered asgs (1). For each asg, its instances are fetched from the Tencentcloud API using cloudService.
  • instanceRefToAsgRef (2) is NOT updated automatically when the asgs field (1) is updated. Calling RegenerateInstancesCache is required to sync it with the registered asgs.
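
The key point of the notes on how the cache works, that registering an ASG does not refresh the instance->ASG mapping until RegenerateInstancesCache runs, can be seen in this heavily simplified model. asgCache and its methods are hypothetical; they use plain string IDs instead of TcRef and Asg, and a fetch function in place of cloudService.

```go
package main

import (
	"fmt"
	"sync"
)

// asgCache is a hypothetical, much-simplified model of the bookkeeping described above:
// registered ASGs (1) and the instance -> ASG mapping (2), keyed by plain IDs.
type asgCache struct {
	mu             sync.Mutex
	asgs           map[string]bool                      // registered ASG IDs
	instanceToAsg  map[string]string                    // instance ID -> ASG ID
	fetchInstances func(asgID string) ([]string, error) // stands in for cloudService
}

func newAsgCache(fetch func(string) ([]string, error)) *asgCache {
	return &asgCache{asgs: map[string]bool{}, instanceToAsg: map[string]string{}, fetchInstances: fetch}
}

// registerAsg records an ASG and reports whether it was new.
// It deliberately does NOT touch instanceToAsg.
func (c *asgCache) registerAsg(asgID string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.asgs[asgID] {
		return false
	}
	c.asgs[asgID] = true
	return true
}

// regenerateInstancesCache rebuilds the instance -> ASG mapping from the registered ASGs.
// On error the previous mapping is kept.
func (c *asgCache) regenerateInstancesCache() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	fresh := make(map[string]string)
	for asgID := range c.asgs {
		instances, err := c.fetchInstances(asgID)
		if err != nil {
			return fmt.Errorf("fetch instances of %s: %w", asgID, err)
		}
		for _, ins := range instances {
			fresh[ins] = asgID
		}
	}
	c.instanceToAsg = fresh
	return nil
}

// findForInstance looks up the ASG of an instance in the (possibly stale) mapping.
func (c *asgCache) findForInstance(instanceID string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	asgID, ok := c.instanceToAsg[instanceID]
	return asgID, ok
}

func main() {
	c := newAsgCache(func(asgID string) ([]string, error) {
		return []string{"ins-" + asgID}, nil // placeholder instance IDs
	})
	c.registerAsg("asg-a")
	_, found := c.findForInstance("ins-asg-a")
	fmt.Println("before regenerate:", found) // prints before regenerate: false
	if err := c.regenerateInstancesCache(); err != nil {
		panic(err)
	}
	asgID, found := c.findForInstance("ins-asg-a")
	fmt.Println("after regenerate:", asgID, found) // prints after regenerate: asg-a true
}
```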

func NewTencentcloudCache

func NewTencentcloudCache(service CloudService) *TencentcloudCache

NewTencentcloudCache creates an empty TencentcloudCache.

func (*TencentcloudCache) FindForInstance

func (tc *TencentcloudCache) FindForInstance(instanceRef TcRef) (Asg, error)

FindForInstance returns Asg of the given Instance

func (*TencentcloudCache) GetAsgInstanceTemplate

func (tc *TencentcloudCache) GetAsgInstanceTemplate(ref TcRef) (*InstanceTemplate, bool)

GetAsgInstanceTemplate returns the cached InstanceTemplate for an Asg TcRef.

func (*TencentcloudCache) GetAsgTargetSize

func (tc *TencentcloudCache) GetAsgTargetSize(ref TcRef) (int64, bool)

GetAsgTargetSize returns the cached targetSize for a TencentcloudRef

func (*TencentcloudCache) GetAsgs

func (tc *TencentcloudCache) GetAsgs() []Asg

GetAsgs returns a copy of asgs list.

func (*TencentcloudCache) GetInstanceType

func (tc *TencentcloudCache) GetInstanceType(ref TcRef) string

GetInstanceType returns the instance type of the ASG.

func (*TencentcloudCache) GetResourceLimiter

func (tc *TencentcloudCache) GetResourceLimiter() (*cloudprovider.ResourceLimiter, error)

GetResourceLimiter returns resource limiter.

func (*TencentcloudCache) InvalidateAllAsgTargetSizes

func (tc *TencentcloudCache) InvalidateAllAsgTargetSizes()

InvalidateAllAsgTargetSizes clears the cached target sizes of all ASGs.

func (*TencentcloudCache) InvalidateAsgTargetSize

func (tc *TencentcloudCache) InvalidateAsgTargetSize(ref TcRef)

InvalidateAsgTargetSize clears the cached target size of the given ASG.

func (*TencentcloudCache) RegenerateAutoScalingGroupCache

func (tc *TencentcloudCache) RegenerateAutoScalingGroupCache() error

RegenerateAutoScalingGroupCache adds Tencentcloud ASG properties to the cache.

func (*TencentcloudCache) RegenerateInstanceCacheForAsg

func (tc *TencentcloudCache) RegenerateInstanceCacheForAsg(asgRef TcRef) error

RegenerateInstanceCacheForAsg triggers instance cache regeneration for a single ASG under lock.

func (*TencentcloudCache) RegenerateInstancesCache

func (tc *TencentcloudCache) RegenerateInstancesCache() error

RegenerateInstancesCache triggers instance cache regeneration under lock.

func (*TencentcloudCache) RegisterAsg

func (tc *TencentcloudCache) RegisterAsg(newAsg Asg) bool

RegisterAsg registers the ASG in the Tencentcloud manager.

func (*TencentcloudCache) SetAsgInstanceTemplate

func (tc *TencentcloudCache) SetAsgInstanceTemplate(ref TcRef, instanceTemplate *InstanceTemplate)

SetAsgInstanceTemplate sets the InstanceTemplate for an Asg TcRef.

func (*TencentcloudCache) SetAsgTargetSize

func (tc *TencentcloudCache) SetAsgTargetSize(ref TcRef, size int64)

SetAsgTargetSize sets targetSize for a TencentcloudRef

func (*TencentcloudCache) SetResourceLimiter

func (tc *TencentcloudCache) SetResourceLimiter(resourceLimiter *cloudprovider.ResourceLimiter)

SetResourceLimiter sets resource limiter.

func (*TencentcloudCache) UnregisterAsg

func (tc *TencentcloudCache) UnregisterAsg(toBeRemoved Asg) bool

UnregisterAsg returns true if the node group has been removed, and false if it was already missing from cache.

type TencentcloudManager

type TencentcloudManager interface {
	// Refresh triggers refresh of cached resources.
	Refresh() error
	// Cleanup cleans up open resources before the cloud provider is destroyed, e.g. goroutines.
	Cleanup() error

	// RegisterAsg registers an Asg.
	RegisterAsg(asg Asg)
	// GetAsgs returns list of registered Asgs.
	GetAsgs() []Asg
	// GetAsgNodes returns Asg nodes.
	GetAsgNodes(Asg Asg) ([]cloudprovider.Instance, error)
	// GetAsgForInstance returns Asg to which the given instance belongs.
	GetAsgForInstance(instance TcRef) (Asg, error)
	// GetAsgTemplateNode returns a template node for Asg.
	GetAsgTemplateNode(Asg Asg) (*apiv1.Node, error)
	// GetResourceLimiter returns resource limiter.
	GetResourceLimiter() (*cloudprovider.ResourceLimiter, error)
	// GetAsgSize gets Asg size.
	GetAsgSize(Asg Asg) (int64, error)

	// SetAsgSize sets Asg size.
	SetAsgSize(Asg Asg, size int64) error
	// DeleteInstances deletes the given instances. All instances must be controlled by the same Asg.
	DeleteInstances(instances []TcRef) error
}

TencentcloudManager handles Tencentcloud communication and data caching.

func CreateTencentcloudManager

func CreateTencentcloudManager(discoveryOpts cloudprovider.NodeGroupDiscoveryOptions, regional bool) (TencentcloudManager, error)

CreateTencentcloudManager constructs a tencentcloudManager object.

Directories

Path Synopsis
tencentcloud-sdk-go
as
cvm
vpc
