Instance-per-Pod Admission Webhook
Instance-per-Pod Admission Webhook (IPP) creates an IaaS instance per Kubernetes Pod
to mitigate potential container breakout attacks.
Unlike Kata Containers, IPP can even mitigate CPU vulnerabilities when bare-metal instances (e.g. EC2 i3.metal) are used.
Supported clusters
Tested on Google Kubernetes Engine (GKE), but any cluster with the Cluster Autoscaler should work.
How it works
IPP Admission Webhook is implemented using Cluster Autoscaler, Tolerations, Node Affinity, and Pod Anti-Affinity.
See #2 for the design.
Getting started
Step 1
Create a GKE node pool with the following configuration (an example gcloud command is shown below):
- Enable autoscaling. The minimum number of nodes must be >= 1.
- Add node label: "ipp" = "true"
- Add node taint: "ipp" = "true" (NO_SCHEDULE mode)
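For example, such a node pool can be created with gcloud (a sketch; the pool name, cluster name, and maximum node count below are placeholders to adjust for your environment):
gcloud container node-pools create ipp-pool \
  --cluster=my-cluster \
  --enable-autoscaling --min-nodes=1 --max-nodes=10 \
  --node-labels=ipp=true \
  --node-taints=ipp=true:NoSchedule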
If you choose to use other label and taint names, you need to modify the YAML in Step 2 accordingly.
Non-GKE clusters should work as well, but they have not been tested.
Step 2
Install IPP Admission Webhook:
docker build -t $IMAGE . && docker push $IMAGE
./ipp.yaml.sh $IMAGE | kubectl apply -f -
You can review the YAML before running kubectl apply.
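For example, you can render the YAML to a file, inspect it, and then apply it:
./ipp.yaml.sh $IMAGE > ipp.yaml
# review ipp.yaml, then:
kubectl apply -f ipp.yaml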
Note that the YAML contains Secret resources.
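To confirm that the webhook is up, check the Deployment in the ipp-system namespace:
kubectl get deployments --namespace=ipp-system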
Step 3
Create Pods with various ipp-class labels, e.g.:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
  labels:
    app: foo
    ipp-class: class0
spec:
  selector:
    matchLabels:
      app: foo
  template:
    metadata:
      labels:
        app: foo
        ipp-class: class0
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
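Pods in other classes can be created the same way. As a sketch (the name bar and the class class1 are hypothetical), a second Deployment whose Pods will never share a node with the class0 Pods above:
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bar
  labels:
    app: bar
    ipp-class: class1
spec:
  selector:
    matchLabels:
      app: bar
  template:
    metadata:
      labels:
        app: bar
        ipp-class: class1
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
EOF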
IPP Admission Webhook automatically translates the Pod manifests as follows:
apiVersion: v1
kind: Pod
...
spec:
  tolerations:
  - effect: NoSchedule
    key: ipp
    operator: Equal
    value: "true"
...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: ipp
            operator: In
            values:
            - "true"
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: ipp-class
            operator: NotIn
            values:
            - class0
        topologyKey: kubernetes.io/hostname
...
Pods with different ipp-class label values are never colocated on the same node.
When the existing node set is not sufficient to satisfy the scheduling constraint, the Cluster Autoscaler automatically adds a node.
On GKE, creating a node takes about a minute.
The Cluster Autoscaler also automatically removes idle nodes.
On GKE, an idle node is removed when it has been idle for about 10 minutes.
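To confirm the placement, list the Pods together with the nodes they landed on; Pods with different ipp-class values should show different NODE entries:
kubectl get pods -o wide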
Troubleshooting
If it doesn't work as expected, check the log from the IPP Admission Webhook:
kubectl logs -f --namespace=ipp-system deployments/ipp
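If a Pod stays in Pending (e.g. while the Cluster Autoscaler is still provisioning a node), its scheduling events can be inspected as well; the Pod name here is a placeholder:
kubectl describe pod foo-xxxxx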
Uninstall
kubectl delete mutatingwebhookconfiguration ipp
kubectl delete namespace ipp-system
Caveats
Best-effort
IPP Admission Webhook does not provide any guarantee about the actual Pod scheduling.
Scheduling overhead
The current implementation of IPP Admission Webhook relies on Pod Anti-Affinity, which does not scale well. Quoting the Cluster Autoscaler FAQ:
"Unfortunately, the current implementation of the affinity predicate in scheduler is about 3 orders of magnitude slower than for all other predicates combined, and it makes CA hardly usable on big clusters."
https://github.com/kubernetes/autoscaler/blob/6ab78a85e19d55bd9c0ff1cb9f9f588a46522d6e/cluster-autoscaler/FAQ.md#what-are-the-service-level-objectives-for-cluster-autoscaler
For large clusters, we should also support an affinity-less mode, which would explicitly call the IaaS API to create and remove dedicated instances.
Actually, an early release of IPP Admission Webhook (v0.0.1) was implemented that way.
Node label
We should use a node-restriction.kubernetes.io/-prefixed label to prevent compromised nodes from modifying the label, but this seems to be unsupported on GKE.
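On clusters where the administrator can label nodes directly, such a label could be applied manually (a sketch; the node name is a placeholder, and the webhook YAML would need to be adjusted to match the new label key):
kubectl label node my-node node-restriction.kubernetes.io/ipp=true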