PMEM-CSI for Kubernetes
About
Note: This is Alpha code and not production ready.
Intel PMEM-CSI is a storage driver for container orchestrators like Kubernetes. It makes local persistent memory (PMEM) available as a filesystem volume to container applications.
It can currently utilize non-volatile memory devices that can be controlled via the libndctl utility library. In this readme, we use persistent memory to refer to a non-volatile dual in-line memory module (NVDIMM).
The PMEM-CSI driver follows the CSI specification by listening for API requests and provisioning volumes accordingly.
Design
Architecture and Operation
The PMEM-CSI driver can operate in two different device modes: LVM and direct. The table below gives an overview and comparison of the two modes; the following paragraphs explain them in more detail.
| | LVM | direct |
|---|---|---|
| Main advantage | avoids free space fragmentation¹ | simpler, somewhat faster, but free space may get fragmented¹ |
| What is served | LVM logical volume | pmem block device |
| Region affinity² | yes: one LVM volume group is created per region, and a volume has to be in one volume group | yes: a namespace can belong to one region only |
| Startup | two extra stages: pmem-ns-init (creates namespaces), pmem-vgm (creates volume groups) | no extra steps at startup |
| Namespace modes | fsdax and sector mode³ namespaces pre-created as pools | namespace created directly in the required mode, no need to pre-create pools |
| Limiting space usage | can leave part of the device unused during pool creation | no limits, namespaces are created on the device until it runs out of space |
| Name field in namespace | set to 'pmem-csi' to mark namespaces as "own" rather than "foreign" | set to the VolumeID, without own vs. foreign marking |
| Minimum volume size | 4 MB | 1 GB (see also alignment adjustment below) |
| Alignment requirements | LVM creation aligns the size up to the next 4 MB boundary | the driver aligns the size up to the next alignment boundary. The default alignment step is 1 GB. Devices in interleaved mode require a larger minimum because the size has to be at least one alignment step; that step is calculated as the interleave set size multiplied by 1 GB |
¹ Fragmented free space can develop through a series of creation and deletion operations after which the driver is no longer able to allocate a new namespace contiguously, even though the free capacity (i.e. the sum of the available free section sizes) suggests otherwise. The PMEM-CSI driver cannot de-fragment the space or combine smaller free blocks into one allocation. A simplified example: create a 63 GB namespace, create a 1 GB namespace, then delete the 63 GB namespace. The free capacity is 127 GB, but the driver fails to create a namespace bigger than 64 GB:
---------------------------------------------------------------------
| 63 GB free | 1GB used | 64 GB free |
---------------------------------------------------------------------
² Region affinity means that all parts of a provisioned file system are physically located on device(s) that belong to the same PMEM region. This is important on multi-socket systems where media access time may vary depending on where the storage device(s) are physically attached.
³ fsdax and sector refer to the modes of NVDIMM namespaces. See Persistent Memory Programming for details.
LVM device mode
The following diagram illustrates the operation in LVM device mode:
In LVM device mode, the PMEM-CSI driver uses LVM (Logical Volume Management) to avoid the risk of fragmentation. LVM logical volumes are served to satisfy API requests. One volume group is created per region, ensuring the region affinity of served volumes.
The driver consists of three separate binaries that form two initialization stages and a third API-serving stage.
During startup, the driver scans persistent memory for regions and namespaces, and tries to create more namespaces using all or part (selectable via an option) of the remaining available space. This first stage is performed by a separate binary, pmem-ns-init.
The second stage of initialization arranges the physical volumes provided by the namespaces into LVM volume groups. This is performed by a separate binary, pmem-vgm.
After these two initialization stages, the third binary, pmem-csi-driver, starts serving CSI API requests.
Namespace modes in LVM device mode
The PMEM-CSI driver can pre-create namespaces in two modes, forming corresponding LVM volume groups, to serve volumes based on fsdax or sector (alias safe) mode namespaces. The amount of space to be used is determined by two options given to pmem-ns-init, -useforfsdax and -useforsector. These options specify an integer limit as a percentage, which is applied separately in each region. The default values are useforfsdax=100 and useforsector=0. A CSI volume request can specify the namespace mode using the driver-specific parameter nsmode, which has a value of either "fsdax" (default) or "sector". A volume provisioned in fsdax mode will have the dax option added to its mount options.
Using limited amount of total space in LVM device mode
The PMEM-CSI driver can leave space on devices for others, and it recognizes "own" namespaces. Leaving space for others is achieved by specifying lower-than-100 values for the -useforfsdax and/or -useforsector options. The distinction between "own" and "foreign" namespaces is implemented by setting the Name field of a namespace to the static string "pmem-csi" during namespace creation. When adding physical volumes to volume groups, only those physical volumes that are based on namespaces named "pmem-csi" are considered.
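The marking can be inspected on a node with ndctl. This is only an illustration; the exact output format depends on the ndctl version:
$ ndctl list -N | grep -B 2 -A 2 '"name":"pmem-csi"'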
Direct device mode
The following diagram illustrates the operation in Direct device mode:
In direct device mode, the PMEM-CSI driver allocates namespaces directly from the storage device. This creates a risk of device space fragmentation, but reduces complexity and run-time overhead by avoiding an additional device mapping layer. Direct device mode also ensures the region affinity of served volumes, because a provisioned volume can belong to only one region.
In direct device mode, the two preparation stages used in LVM device mode are not needed.
Namespace modes in direct device mode
The PMEM-CSI driver creates a namespace directly in the mode requested by the volume creation request, thus bypassing the complexity of the pre-allocated pools used in LVM device mode.
Using limited amount of total space in direct device mode
In direct device mode, the driver does not attempt to limit space use. It also does not mark "own" namespaces; the Name field of a namespace is set to the VolumeID.
Driver modes
The PMEM-CSI driver supports running in different modes, selected by passing one of the options below to the driver's -mode command line option. In each mode, it starts a different set of gRPC (remote procedure call) servers on the given driver endpoint(s); example invocations are sketched after the list below.
- Controller: a single instance should run at cluster level. When the driver runs in Controller mode, it forwards pmem volume create/delete requests to the registered node controller servers running on the worker nodes. In this mode, the driver starts the Identity, Node Registry, and Master Controller servers described under Driver Components.
- Node: one instance should run on each worker node that has persistent memory devices installed. When the driver starts in this mode, it registers with the Controller driver running at the given -registryEndpoint. In this mode, the driver starts the Identity, Node Controller, and Node servers described under Driver Components.
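For illustration only, rough example invocations; the endpoint URL and lower-case mode values are assumptions, only the -mode and -registryEndpoint options are taken from this document:
$ pmem-csi-driver -mode=controller -registryEndpoint=tcp://0.0.0.0:10000
$ pmem-csi-driver -mode=node -registryEndpoint=tcp://pmem-csi-controller:10000
In a real deployment these processes are started by the Kubernetes deployment files described below, not by hand.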
Driver Components
Identity Server
This gRPC server operates on a given endpoint in all driver modes and implements the CSI Identity interface.
Node Registry Server
When the PMEM-CSI driver runs in Controller mode, it starts a gRPC server on a given endpoint (-registryEndpoint) and serves the RegistryServer interface. The driver(s) running in Node mode can register themselves with node-specific information such as the node id, the NodeControllerServer endpoint, and the available persistent memory capacity.
Master Controller Server
This gRPC server is started by the PMEM-CSI driver running in Controller mode and serves the Controller interface defined by the CSI specification. The server responds to CreateVolume(), DeleteVolume(), ControllerPublishVolume(), ControllerUnpublishVolume(), and ListVolumes() calls coming from external-provisioner and external-attacher sidecars. It forwards the publish and unpublish volume requests to the appropriate Node controller server running on a worker node that was registered with the driver.
Node Controller Server
This gRPC server is started by the PMEM-CSI driver running in Node mode and implements the ControllerPublishVolume and ControllerUnpublishVolume methods of the Controller service interface defined by the CSI specification. It serves the ControllerPublishVolume() and ControllerUnpublishVolume() requests coming from the Master Controller server and creates/deletes persistent memory devices.
Node Server
This gRPC server is started by the driver running in Node mode and implements the Node service interface defined in the CSI specification. It serves the NodeStageVolume(), NodeUnstageVolume(), NodePublishVolume(), and NodeUnpublishVolume() requests coming from the Container Orchestrator (CO).
Communication between components
The following diagram illustrates the communication channels between driver components:
Security
All PMEM-CSI specific communication shown in the above section between the Master Controller (RegistryServer, MasterControllerServer) and the Node Controllers (NodeControllerServer) is protected by mutual TLS. Both client and server must identify themselves and the certificate they present must be trusted. The common name in each certificate is used to identify the different components. The following common names have a special meaning:
- pmem-registry is used by the RegistryServer.
- pmem-node-controller is used by the NodeControllerServers.
The test/setup-ca-kubernetes.sh script shows how to generate certificates signed by the Kubernetes cluster root Certificate Authority, and the provided deployment files show how to use the generated certificates to set up the driver. The test cluster is set up using certificates created by that script. The test/setup-ca.sh script also shows how to generate self-signed certificates. These are just examples; administrators of a cluster must ensure that they choose key lengths and algorithms of sufficient strength for their purposes and manage certificate distribution.
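Whichever way the certificates are generated, the common name of a certificate can be checked with openssl; the file name here is just an example:
$ openssl x509 -in pmem-registry.pem -noout -subject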
A production deployment can improve upon that by using some other key delivery mechanism, like for example Vault.
Volume Persistency
In a typical CSI deployment, volumes are provided by a storage backend that is independent of a particular node. When a node goes offline, the volume can be mounted elsewhere. But PMEM volumes are local to the node and thus can only be used on the node where they were created. This means that applications using a PMEM volume cannot freely move between nodes. This limitation needs to be considered when designing and deploying applications that are to use local storage.
Below are the volume persistency models considered for implementation in PMEM-CSI to serve different application use cases:
- Persistent volumes: A volume gets created independently of the application, on some node where there is enough free space. Applications using such a volume are then forced to run on that node and cannot run when the node is down. Data is retained until the volume gets deleted.
- Ephemeral volumes: Each time an application starts to run on a node, a new volume is created for it on that node. When the application stops, the volume is deleted. The volume cannot be shared with other applications. Data on this volume is retained only while the application runs.
- Cache volumes: Volumes are pre-created on a certain set of nodes, each with its own local data. Applications are started on those nodes and then get to use the volume on their node. Data persists across application restarts. This is useful when the data is only cached information that can be discarded and reconstructed at any time and the application can reuse existing local data when restarting.
| Volume | Kubernetes | PMEM-CSI | Limitations |
|---|---|---|---|
| Persistent | supported | supported | topology aware scheduling¹ |
| Ephemeral | in design | in design | topology aware scheduling¹, resource constraints² |
| Cache | supported | supported | topology aware scheduling¹ |
¹ Topology aware scheduling ensures that an application runs on a node where the volume was created. For CSI-based drivers like PMEM-CSI, Kubernetes >= 1.13 is needed. On older Kubernetes releases, pods must be scheduled manually onto the right node(s).
² The upstream design for ephemeral volumes currently does not take resource constraints into account. If an application gets scheduled onto a node and then creating the ephemeral volume on that node fails, the application on the node cannot start until resources become available.
Usage on Kubernetes
Kubernetes cluster administrators can expose the above-mentioned volume persistency types to applications using StorageClass Parameters. An optional persistencyModel parameter differentiates how the provisioned volume can be used.
- No persistencyModel parameter: If no persistencyModel parameter is specified in the StorageClass, the volume is treated as a normal Kubernetes persistent volume. In this case, PMEM-CSI creates a PMEM volume on a node and the application that claims the volume is supposed to be scheduled onto that node by Kubernetes. The choice of node depends on the StorageClass volumeBindingMode. With volumeBindingMode: Immediate, PMEM-CSI chooses a node randomly; with volumeBindingMode: WaitForFirstConsumer (also known as late binding), Kubernetes first chooses a node for scheduling the application, and PMEM-CSI creates the volume on that node. Applications which claim a normal persistent volume have to use the ReadWriteOnce access mode in their accessModes list. This diagram illustrates how a normal persistent volume gets provisioned in Kubernetes using the PMEM-CSI driver.
- persistencyModel: cache: Volumes of this type should be used in combination with volumeBindingMode: Immediate. In this case, PMEM-CSI creates a set of PMEM volumes, each volume on a different node. The number of PMEM volumes to create can be specified with the cacheSize StorageClass parameter. Applications which claim a cache volume can use ReadWriteMany in their accessModes list. Check the provided cache StorageClass example; a sketch is also shown after the note below. This diagram illustrates how a cache volume gets provisioned in Kubernetes using the PMEM-CSI driver.
NOTE: Cache volumes are associated with a node, not a pod. Multiple pods using the same cache volume on the same node will not get their own instance but will end up sharing the same PMEM volume instead. Application deployment has to consider this and use available Kubernetes mechanisms like node anti-affinity. Check the provided cache application example.
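A minimal sketch of a cache StorageClass, assuming the driver name pmem-csi.intel.com and using only the parameters described above; see the provided cache StorageClass example for the authoritative version:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pmem-csi-sc-cache        # example name
provisioner: pmem-csi.intel.com  # assumed driver name
volumeBindingMode: Immediate
parameters:
  persistencyModel: cache
  cacheSize: "3"                 # example: provision the volume on three nodes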
WARNING: late binding (volumeBindingMode: WaitForFirstConsumer) has some caveats:
- Kubernetes does not consider available PMEM capacity on a node while scheduling the application. As a result, Kubernetes might select a node that does not have enough free PMEM space. In this case, volume creation fails and the pod is stuck until enough free space becomes available.
- Late binding only works reliably on Kubernetes >= 1.14. The external-provisioner v1.0.1 for Kubernetes 1.13 lacks the --strict-topology flag and might allow the PMEM-CSI driver to allocate the volume on a node that is not the one where the pod is about to start. When that happens, the pod is permanently stuck.
- A node is only chosen the first time a pod starts. After that it will always restart on that node, because that is where the persistent volume was created.
Prerequisites
Software required
Building of Docker images has been verified using docker-ce versions 18.06.1 and 19.03.0. The recommended minimum Linux kernel version for running the PMEM-CSI driver is 4.15. See Persistent Memory Programming for more details about supported kernel versions.
Hardware required
Persistent memory device(s) are required for operation. However, some development and testing can be done using QEMU-emulated persistent memory devices. See the "QEMU and Kubernetes" section for the commands that create such a virtual test cluster.
Persistent memory pre-provisioning
The PMEM-CSI driver needs pre-provisioned regions on the NVDIMM device(s). The PMEM-CSI driver itself intentionally leaves that to the administrator who then can decide how much and how PMEM is to be used for PMEM-CSI.
Beware that the PMEM-CSI driver will run without errors on a node where PMEM was not prepared for it. It will then report zero local storage for that node, something that currently is only visible in the log files.
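A hedged example of checking those logs, assuming a node pod name like the ones shown later in this README:
$ kubectl logs pmem-csi-node-8kmxf --all-containers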
When running the Kubernetes cluster and PMEM-CSI on bare metal, the ipmctl utility can be used to create regions. App Direct Mode has two configuration options: interleaved or non-interleaved. In the non-interleaved configuration, one region is created per NVDIMM, and a PMEM-CSI volume therefore cannot be larger than one NVDIMM.
Example of creating regions without interleaving, using all NVDIMMs:
# ipmctl create -goal PersistentMemoryType=AppDirectNotInterleaved
Alternatively, multiple NVDIMMs can be combined to form an interleaved set. This causes the data to be striped over multiple NVDIMM devices for improved read/write performance and allows one region (and thus a PMEM-CSI volume) to be larger than a single NVDIMM.
Example of creating regions in interleaved mode, using all NVDIMMs:
# ipmctl create -goal PersistentMemoryType=AppDirect
When running inside virtual machines, each virtual machine typically already gets access to one region and ipmctl is not needed inside the virtual machine. Instead, that region must be made available for use with PMEM-CSI because when the virtual machine comes up for the first time, the entire region is already allocated for use as a single block device:
# ndctl list -RN
{
"regions":[
{
"dev":"region0",
"size":34357641216,
"available_size":0,
"max_available_extent":0,
"type":"pmem",
"persistence_domain":"unknown",
"namespaces":[
{
"dev":"namespace0.0",
"mode":"raw",
"size":34357641216,
"sector_size":512,
"blockdev":"pmem0"
}
]
}
]
}
# ls -l /dev/pmem*
brw-rw---- 1 root disk 259, 0 Jun 4 16:41 /dev/pmem0
Labels must be initialized in such a region, which must be performed once after the first boot:
# ndctl disable-region region0
disabled 1 region
# ndctl init-labels nmem0
initialized 1 nmem
# ndctl enable-region region0
enabled 1 region
# ndctl list -RN
[
{
"dev":"region0",
"size":34357641216,
"available_size":34357641216,
"max_available_extent":34357641216,
"type":"pmem",
"iset_id":10248187106440278,
"persistence_domain":"unknown"
}
]
# ls -l /dev/pmem*
ls: cannot access '/dev/pmem*': No such file or directory
Supported Kubernetes versions
The PMEM-CSI driver implements CSI specification version 1.0.0, which is only supported by Kubernetes versions >= v1.13. The driver deployment in a Kubernetes cluster has been verified on:
| Branch | Kubernetes branch/version | Required alpha feature gates |
|---|---|---|
| devel | Kubernetes 1.13 | CSINodeInfo, CSIDriverRegistry |
| devel | Kubernetes 1.14 | |
| devel | Kubernetes 1.15 | |
Setup
Get source code
Use these commands:
mkdir -p $GOPATH/src/github.com/intel
git clone https://github.com/intel/pmem-csi $GOPATH/src/github.com/intel/pmem-csi
Build PMEM-CSI
- Use make build-images to produce Docker container images.
- Use make push-images to push Docker container images to a Docker image registry. The default is to push to a local Docker registry. Some other registry can be configured by setting the variables described in the test-config.sh file; see the configuration options section below. Alternatively, the registry can also be set with a make variable: make push-images REGISTRY_NAME=my-registry:5000
See the Makefile for additional make targets and possible make variables.
Run PMEM-CSI on Kubernetes
This section assumes that a Kubernetes cluster is already available with at least one node that has persistent memory device(s). For development or testing, it is also possible to use a cluster that runs on QEMU virtual machines, see the "QEMU and Kubernetes" section below.
- Make sure that the alpha feature gates CSINodeInfo and CSIDriverRegistry are enabled
The method to configure alpha feature gates may vary, depending on the Kubernetes deployment. It may not be necessary anymore when the feature has reached beta state, which depends on the Kubernetes version.
- Label the cluster nodes that provide persistent memory device(s)
$ kubectl label node <your node> storage=pmem
- Install add-on storage CRDs if using Kubernetes 1.13
If you are not using the test cluster described in Starting and stopping a test cluster where CRDs are installed automatically, you must install those manually. Kubernetes 1.14 and higher have those APIs built in and thus don't need these CRDs.
$ kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.13/cluster/addons/storage-crds/csidriver.yaml
$ kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.13/cluster/addons/storage-crds/csinodeinfo.yaml
- Set up certificates
Certificates are required as explained in Security.
If you are not using the test cluster described in Starting and stopping a test cluster, where certificates are created automatically, you must set up certificates manually. This can be done by running the ./test/setup-ca-kubernetes.sh script for your cluster. This script requires the "cfssl" tools, which can be downloaded.
These are the steps for manual set-up of certificates:
- Download cfssl tools
$ curl -L https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 -o _work/bin/cfssl --create-dirs
$ curl -L https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 -o _work/bin/cfssljson --create-dirs
$ chmod a+x _work/bin/cfssl _work/bin/cfssljson
- Run certificates set-up script
$ KUBECONFIG="<your cluster kubeconfig path>" PATH="$PATH:$PWD/_work/bin" ./test/setup-ca-kubernetes.sh
- Deploy the driver to Kubernetes
The deploy/kubernetes-<kubernetes version> directory contains pmem-csi*.yaml files which can be used to deploy the driver on that Kubernetes version. The files in the directory with the highest Kubernetes version might also work for more recent Kubernetes releases. All of these deployments use images published by Intel on Docker Hub.
For each Kubernetes version, four different deployment variants are provided:
- direct or lvm: one uses direct device mode, the other LVM device mode.
- testing: the variants with testing in the name enable debugging features and shouldn't be used in production.
For example, to deploy for production with LVM device mode onto Kubernetes 1.14, use:
$ kubectl create -f deploy/kubernetes-1.14/pmem-csi-lvm.yaml
These variants were generated with [kustomize](https://github.com/kubernetes-sigs/kustomize). It is possible to customize these variants further with kustomize, but one has to use a version which supports the --load_restrictor none parameter.
The Makefile can be used to build a suitable kustomize:
$ make kustomize # builds the binary and re-generates .yaml files
...
go get sigs.k8s.io/kustomize@e42933ec54ce9a65f65e125a1ccf482927f0e515 && \
go build -o /work/gopath/src/github.com/intel/pmem-csi/_work/kustomize-e42933ec54ce9a65f65e125a1ccf482927f0e515 sigs.k8s.io/kustomize
...
$ _work/kustomize build --load_restrictor none deploy/kustomize/kubernetes-1.14-lvm |
kubectl create -f -
- Wait until all pods reach 'Running' status
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pmem-csi-node-8kmxf 2/2 Running 0 3m15s
pmem-csi-node-bvx7m 2/2 Running 0 3m15s
pmem-csi-controller-0 2/2 Running 0 3m15s
pmem-csi-node-fbmpg 2/2 Running 0 3m15s
- Verify that the node labels have been configured correctly
$ kubectl get nodes --show-labels
The command output must indicate that every node with PMEM has these two labels:
pmem-csi.intel.com/node=<NODE-NAME>,storage=pmem
If storage=pmem is missing, label manually as described above. If pmem-csi.intel.com/node is missing, then double-check that the alpha feature gates are enabled, that the CSI driver is running on the node, and that the driver's log output doesn't contain errors.
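To list only the nodes that already carry both labels, a label selector can be used:
$ kubectl get nodes -l storage=pmem,pmem-csi.intel.com/node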
- Define two storage classes using the driver
$ kubectl create -f deploy/kubernetes-<kubernetes version>/pmem-storageclass-ext4.yaml
$ kubectl create -f deploy/kubernetes-<kubernetes version>/pmem-storageclass-xfs.yaml
- Provision two pmem-csi volumes
$ kubectl create -f deploy/kubernetes-<kubernetes version>/pmem-pvc.yaml
- Verify two Persistent Volume Claims have 'Bound' status
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pmem-csi-pvc-ext4 Bound pvc-f70f7b36-6b36-11e9-bf09-deadbeef0100 4Gi RWO pmem-csi-sc-ext4 16s
pmem-csi-pvc-xfs Bound pvc-f7101fd2-6b36-11e9-bf09-deadbeef0100 4Gi RWO pmem-csi-sc-xfs 16s
- Start two applications requesting one provisioned volume each
$ kubectl create -f deploy/kubernetes-<kubernetes version>/pmem-app.yaml
These applications use storage: pmem in their nodeSelector to ensure they are scheduled onto a node with a PMEM device. Each requests a mount of a volume, one with an ext4-formatted and the other with an xfs-formatted file system. A minimal sketch of such a pod spec is shown below.
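This abridged sketch only illustrates the relevant parts; the names are assumptions based on the PVC created above, and the real manifests are in deploy/kubernetes-<kubernetes version>/pmem-app.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: my-csi-app-1             # example name
spec:
  nodeSelector:
    storage: pmem                # run only on nodes labeled for PMEM
  containers:
  - name: app
    image: busybox               # example image
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pmem-csi-pvc-ext4   # PVC created in the previous step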
- Verify two application pods reach 'Running' status
$ kubectl get po my-csi-app-1 my-csi-app-2
NAME READY STATUS RESTARTS AGE
my-csi-app-1 1/1 Running 0 6m5s
NAME READY STATUS RESTARTS AGE
my-csi-app-2 1/1 Running 0 6m1s
- Check that applications have a pmem volume mounted with added dax option
$ kubectl exec my-csi-app-1 -- df /data
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/ndbus0region0fsdax/5ccaa889-551d-11e9-a584-928299ac4b17
4062912 16376 3820440 0% /data
$ kubectl exec my-csi-app-2 -- df /data
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/ndbus0region0fsdax/5cc9b19e-551d-11e9-a584-928299ac4b17
4184064 37264 4146800 1% /data
$ kubectl exec my-csi-app-1 -- mount |grep /data
/dev/ndbus0region0fsdax/5ccaa889-551d-11e9-a584-928299ac4b17 on /data type ext4 (rw,relatime,dax)
$ kubectl exec my-csi-app-2 -- mount |grep /data
/dev/ndbus0region0fsdax/5cc9b19e-551d-11e9-a584-928299ac4b17 on /data type xfs (rw,relatime,attr2,dax,inode64,noquota)
Note about using the sector mode
PMEM-CSI can create a namespace in the sector (alias safe) mode instead of the default fsdax mode. See Persistent Memory Programming for more details about namespace modes. The main difference in the PMEM-CSI context is that a sector-mode volume will not get the 'dax' mount option. The deployment examples do not describe sector mode in order to keep the number of combinations smaller. Here are the changes to be made to deploy volumes in sector mode:
- Add the nsmode: "sector" line to the parameters section of the StorageClass definition file pmem-storageclass-XYZ.yaml:
parameters:
csi.storage.k8s.io/fstype: ext4
eraseafter: "true"
nsmode: "sector" <-- add this
- Only if using LVM device mode: modify the pmem-ns-init options to create sector-mode pools in addition to fsdax-mode pools. Add the -useforfsdax and -useforsector options to the pmem-ns-init arguments in pmem-csi-lvm.yaml (select the percentage values that fit your needs):
initContainers:
- args:
- -v=3
- -useforfsdax=60 <-- add this
- -useforsector=40 <-- add this
command:
- /go/bin/pmem-ns-init
Automated testing
Unit testing and code quality
Use the make test command.
QEMU and Kubernetes
E2E testing relies on a cluster running inside multiple QEMU virtual machines deployed by GoVM. The same cluster can also be used interactively when real hardware is not available.
This is known to work on a Linux development host system. The user must be allowed to use Docker.
KVM must be enabled and the user must be allowed to use it. Usually this is done by adding the user to the kvm group. The "Install QEMU-KVM" section in the Clear Linux documentation contains further information about enabling KVM.
The clear-cloud image is downloaded automatically. By default, four different virtual machines are prepared. Each image is pre-configured with its own hostname and network settings. The images contain the latest Clear Linux OS and have the Kubernetes version supported by Clear Linux installed.
PMEM-CSI images must have been created and published in some Docker registry, as described earlier in build PMEM-CSI. In addition, that registry must be accessible from inside the cluster. That works for the default (a local registry in the build host) but may require setting additional configuration options for other scenarios.
Starting and stopping a test cluster
make start brings up a Kubernetes test cluster inside four QEMU virtual machines. The first node is the Kubernetes master without persistent memory. The other three nodes are worker nodes with one emulated 32 GB NVDIMM each. After the cluster has been formed, make start adds the storage=pmem label to the worker nodes and deploys the PMEM-CSI driver.
Once make start completes, the cluster is ready for interactive use via kubectl inside the virtual machine. Alternatively, you can also set KUBECONFIG as shown at the end of the make start output and use the kubectl binary on the host running the VMs.
When the cluster is already running, make start re-deploys the PMEM-CSI driver without recreating the virtual machines. kubectl apply is used for this, which may limit the kind of changes that can be made on-the-fly.
Use make stop to stop and remove the virtual machines.
make restart can be used to cleanly reboot all virtual machines. This is useful during development after a make push-images to ensure that the cluster runs those rebuilt images.
Running commands on test cluster nodes over ssh
make start generates ssh wrapper scripts _work/clear-govm/ssh.N for each test cluster node, which are handy for running a single command or starting an interactive shell. Examples:
- _work/clear-govm/ssh.0 kubectl get pods runs a kubectl command on the master node.
- _work/clear-govm/ssh.1 starts a shell on the first worker node.
Configuration options
Several aspects of the cluster and build setup can be configured by overriding the settings in the test-config.sh file. See that file for a description of all options. Options can be set as environment variables of make start on a case-by-case basis, or permanently by creating a file like test/test-config.d/my-config.sh (see the sketch below).
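A sketch of such a file, using only the two variables that appear elsewhere in this README; other variables are documented in test-config.sh itself:
# test/test-config.d/my-config.sh
TEST_DEVICEMODE=direct
CLUSTER=clear-govm-direct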
Multiple different clusters can be brought up in parallel by changing the default clear-govm cluster name via the CLUSTER env variable.
For example, this invocation sets up a cluster using the non-default direct device mode:
TEST_DEVICEMODE=direct CLUSTER=clear-govm-direct make start
Running E2E tests
make test_e2e runs csi-test sanity tests and some Kubernetes storage tests against the PMEM-CSI driver.
When ginkgo is installed, it can be used to run individual tests and to control additional aspects of the test run. For example, to run just the E2E provisioning test (create PVC, write data in one pod, read it in another) in verbose mode:
$ KUBECONFIG=$(pwd)/_work/clear-govm/kube.config REPO_ROOT=$(pwd) ginkgo -v -focus=pmem-csi.*should.provision.storage.with.defaults ./test/e2e/
Nov 26 11:21:28.805: INFO: The --provider flag is not set. Treating as a conformance test. Some tests may not be run.
Running Suite: PMEM E2E suite
=============================
Random Seed: 1543227683 - Will randomize all specs
Will run 1 of 61 specs
Nov 26 11:21:28.812: INFO: checking config
Nov 26 11:21:28.812: INFO: >>> kubeConfig: /nvme/gopath/src/github.com/intel/pmem-csi/_work/clear-govm/kube.config
Nov 26 11:21:28.817: INFO: Waiting up to 30m0s for all (but 0) nodes to be schedulable
...
Ran 1 of 61 Specs in 58.465 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 60 Skipped
PASS
Ginkgo ran 1 suite in 1m3.850672246s
Test Suite Passed
It is also possible to run just the sanity tests until one of them fails:
$ REPO_ROOT=`pwd` ginkgo '-focus=sanity' -failFast ./test/e2e/
...
Application examples
Information about specific usages of PMEM-CSI is described in separate documents:
- Deploying a Redis cluster through the redis-operator using QEMU-emulated persistent memory devices (examples/redis-operator.md).
- Installing Kubernetes and PMEM-CSI on Google Cloud machines (examples/gce.md).
Communication and contribution
Report a bug by filing a new issue.
Contribute by opening a pull request.
Learn about pull requests.
Reporting a Potential Security Vulnerability: If you have discovered a potential security vulnerability in PMEM-CSI, please send an e-mail to secure@intel.com. For issues related to Intel Products, please visit Intel Security Center.
It is important to include the following details:
- The projects and versions affected
- Detailed description of the vulnerability
- Information on known exploits
Vulnerability information is extremely sensitive. Please encrypt all security vulnerability reports using our PGP key.
A member of the Intel Product Security Team will review your e-mail and contact you to collaborate on resolving the issue. For more information on how Intel works to resolve security issues, see: vulnerability handling guidelines.