alibabacloud-erdma-controller
Kubernetes controller for AlibabaCloud ERDMA resources
Description
Dynamically configures ERDMA devices on Kubernetes nodes and automatically injects ERDMA-accelerated networking into Kubernetes Pods.
Getting Started
Prerequisites
- helm
- Kubernetes Cluster With AlibabaCloud ECS Nodes
- AlibabaCloud Linux 3 with Kernel Version >= 5.10.134-17
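The kernel requirement above can be checked on a node before installing. The following is a minimal sketch, assuming GNU `sort -V` is available for version ordering; the function name `kernel_at_least` is hypothetical and not part of this project.

```shell
#!/usr/bin/env bash
# Minimum kernel release, taken from the prerequisites list above.
REQUIRED_KERNEL="5.10.134-17"

# kernel_at_least CURRENT REQUIRED (hypothetical helper)
# Succeeds when CURRENT sorts at or above REQUIRED under `sort -V`.
kernel_at_least() {
  local current="$1" required="$2"
  # The lower of the two versions sorts first; the check passes
  # only when that lower version is the required one.
  [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)" = "$required" ]
}

if kernel_at_least "$(uname -r)" "$REQUIRED_KERNEL"; then
  echo "kernel $(uname -r) meets the >= $REQUIRED_KERNEL requirement"
else
  echo "kernel $(uname -r) is older than $REQUIRED_KERNEL"
fi
```

Run this on each ECS node (or through a node shell) rather than on the machine where helm is installed.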
To Deploy on the cluster
Create and authorize a RAM role and policy
alibabacloud-erdma-controller needs the following permissions:
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "ecs:DescribeInstances",
        "ecs:DescribeInstanceTypes",
        "ecs:DescribeNetworkInterfaces",
        "ecs:ModifyNetworkInterfaceAttribute",
        "ecs:CreateNetworkInterface",
        "ecs:AttachNetworkInterface"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}
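If the policy is maintained in a local file, a quick check can confirm that none of the six actions listed above was dropped. This is only a sketch: the helper name `missing_actions` and the file name `erdma-policy.json` are hypothetical, and the check is a plain text match that does not understand wildcards such as `ecs:*`.

```shell
#!/usr/bin/env bash
# Actions copied from the policy document above.
REQUIRED_ACTIONS="ecs:DescribeInstances ecs:DescribeInstanceTypes ecs:DescribeNetworkInterfaces ecs:ModifyNetworkInterfaceAttribute ecs:CreateNetworkInterface ecs:AttachNetworkInterface"

# missing_actions FILE (hypothetical helper)
# Prints every required action that does not appear, as a quoted
# JSON string, in FILE. Prints nothing when all are present.
missing_actions() {
  local file="$1" action
  for action in $REQUIRED_ACTIONS; do
    # Match the full quoted string so "ecs:DescribeInstances" does not
    # accidentally match "ecs:DescribeInstanceTypes".
    grep -qF "\"$action\"" "$file" || echo "$action"
  done
}

# Example usage (file name is an assumption):
#   missing_actions erdma-policy.json
```

Once the check prints nothing, the policy can be created in the RAM console, or, assuming the `aliyun` CLI is installed and configured, with `aliyun ram CreatePolicy --PolicyName <name> --PolicyDocument "$(cat erdma-policy.json)"`.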
Prepare configuration
Prepare a values.yaml file with the following content to authorize the controller to access the ERDMA API:
Use RRSA authorization
credentials:
  type: "oidc_role_arn"
serviceAccount:
  annotations:
    pod-identity.alibabacloud.com/role-name: {your-ram-role-name}
This requires the RRSA components to be configured on ACK; refer to the doc.
Use access_key authorization
credentials:
  type: "access_key"
  accessKeyID: "{access key}"
  accessKeySecret: "{access key secret}"
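To keep credentials out of shell history and version control, values.yaml can be generated from environment variables. The sketch below is not part of this project: the function `write_values` and the variable names `RAM_ROLE_NAME`, `ACCESS_KEY_ID`, and `ACCESS_KEY_SECRET` are hypothetical, and the YAML nesting is an assumption based on the two configurations above.

```shell
#!/usr/bin/env bash
# write_values MODE [FILE] (hypothetical helper)
# MODE is "rrsa" or "access_key"; FILE defaults to values.yaml.
write_values() {
  local mode="$1" out="${2:-values.yaml}"
  case "$mode" in
    rrsa)
      [ -n "${RAM_ROLE_NAME:-}" ] || { echo "RAM_ROLE_NAME is not set" >&2; return 1; }
      cat > "$out" <<EOF
credentials:
  type: "oidc_role_arn"
serviceAccount:
  annotations:
    pod-identity.alibabacloud.com/role-name: ${RAM_ROLE_NAME}
EOF
      ;;
    access_key)
      [ -n "${ACCESS_KEY_ID:-}" ] && [ -n "${ACCESS_KEY_SECRET:-}" ] || {
        echo "ACCESS_KEY_ID and ACCESS_KEY_SECRET must be set" >&2
        return 1
      }
      cat > "$out" <<EOF
credentials:
  type: "access_key"
  accessKeyID: "${ACCESS_KEY_ID}"
  accessKeySecret: "${ACCESS_KEY_SECRET}"
EOF
      # The file now holds a secret, so restrict it to the current user.
      chmod 600 "$out"
      ;;
    *)
      echo "usage: write_values rrsa|access_key [file]" >&2
      return 1
      ;;
  esac
}
```

For example, `RAM_ROLE_NAME=my-erdma-role write_values rrsa` writes the RRSA variant to values.yaml in the current directory.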
Install with helm:
helm install -f values.yaml --namespace kube-system alibaba-erdma-controller deploy/helm/
check status
check pods
kubectl get pod -n kube-system | grep erdma
check erdma devices
kubectl get erdmadevices
check device plugin
kubectl get node -o yaml | grep aliyun/erdma
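The pod check above can be turned into a pass/fail step for scripts. The sketch below filters `kubectl get pod` output for ERDMA pods that are not in the Running state; the function name `not_running_erdma_pods` is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, ...) printed with `--no-headers`.

```shell
#!/usr/bin/env bash
# not_running_erdma_pods (hypothetical helper)
# Reads `kubectl get pod --no-headers` output on stdin and prints the
# name of every pod whose name contains "erdma" and whose STATUS
# column (third field) is anything other than Running.
not_running_erdma_pods() {
  awk '$1 ~ /erdma/ && $3 != "Running" { print $1 }'
}

# Example usage:
#   kubectl get pod -n kube-system --no-headers | not_running_erdma_pods
```

Empty output means every ERDMA pod reported Running at the time of the query.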
Using ERDMA Accelerated Network
Pod Configurations to Enable ERDMA Accelerated Network
- Add the aliyun/erdma resource to the Pod spec to configure ERDMA devices for the Pod.
- Add the annotation network.alibabacloud.com/erdma-smcr: "true" to configure SMC-R for the Pod, which dynamically replaces TCP connections with ERDMA. This requires network.alibabacloud.com/erdma to be enabled first.
Example:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: erdma
  name: erdma
spec:
  replicas: 1
  selector:
    matchLabels:
      app: erdma
  template:
    metadata:
      labels:
        app: erdma
      annotations:
        network.alibabacloud.com/erdma-smcr: "true"
    spec:
      containers:
      - command:
        - sleep
        - "360000"
        image: registry.aliyuncs.com/wangbs/netdia:latest
        name: erdma
        resources:
          limits:
            aliyun/erdma: 1
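Before applying a manifest like the one above, a simple text check can catch the case where the SMC-R annotation is set but no ERDMA device is requested. This sketch assumes that "enabled first" means the Pod also requests the aliyun/erdma resource, as the example does; the function name `check_erdma_manifest` is hypothetical, and the check is a line match rather than a real YAML parse.

```shell
#!/usr/bin/env bash
# check_erdma_manifest FILE (hypothetical helper)
# Fails when FILE sets the erdma-smcr annotation to "true" but does
# not contain an aliyun/erdma resource entry.
check_erdma_manifest() {
  local file="$1"
  if grep -Eq '^[[:space:]]*network\.alibabacloud\.com/erdma-smcr:[[:space:]]*"true"' "$file" &&
     ! grep -Eq '^[[:space:]]*aliyun/erdma:' "$file"; then
    echo "$file: SMC-R annotation is set but no aliyun/erdma resource is requested" >&2
    return 1
  fi
  return 0
}

# Example usage (file name is an assumption):
#   check_erdma_manifest deployment.yaml && kubectl apply -f deployment.yaml
```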
To Uninstall
Uninstall the helm release
helm -n kube-system uninstall alibaba-erdma-controller
Build
Build Controller
docker build --tag registry.aliyuncs.com/erdma/controller:latest --target controller .
Build Agent
docker build --tag registry.aliyuncs.com/erdma/agent:latest --target agent .
Build SMCR_INIT
docker build --tag registry.aliyuncs.com/erdma/smcr_init:latest --target smcr_init .
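The three builds above differ only in the target name, so they can be driven from one loop. This is a sketch, assuming the image name always matches the Dockerfile target as in the commands above; `REGISTRY`, `TAG`, `DRY_RUN`, `build_cmd`, and `build_all` are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Registry prefix and tag, defaulting to the values used above.
REGISTRY="${REGISTRY:-registry.aliyuncs.com/erdma}"
TAG="${TAG:-latest}"

# build_cmd TARGET (hypothetical helper)
# Prints the docker build command for one Dockerfile target.
build_cmd() {
  local target="$1"
  printf 'docker build --tag %s/%s:%s --target %s .\n' "$REGISTRY" "$target" "$TAG" "$target"
}

# build_all (hypothetical helper)
# Builds controller, agent, and smcr_init in turn, stopping at the
# first failure. With DRY_RUN=1 it only prints the commands.
build_all() {
  local target
  for target in controller agent smcr_init; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      build_cmd "$target"
    else
      docker build --tag "$REGISTRY/$target:$TAG" --target "$target" . || return 1
    fi
  done
}
```

Run `DRY_RUN=1 build_all` from the repository root to preview the commands, then `build_all` to execute them.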
License
Copyright 2024.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.