drupalSite-operator
Kubernetes operator that controls the main API of the Drupal service: the DrupalSite CRD.
For an introduction to the Operator pattern and how we use it, take a look at our presentation at KubeCon EU 2021!
This paper describes the use case served by the drupalsite-operator.
Flip through it to get some context!
Drupal service architecture
The Drupal service is designed around the concept of the DrupalSite.
The deployment looks like this:
The architecture description explains in more detail.
CRDs
A DrupalSite defines all the information the operator needs to instantiate a Drupal website integrated with the CERN environment.
Example:
apiVersion: drupal.webservices.cern.ch/v1alpha1
kind: DrupalSite
metadata:
  name: drupalsite-sample
spec:
  # URL to request in the route.
  # Recommended to set `<environmentName>-<projectname>.web.cern.ch`
  # or `<projectname>.web.cern.ch` if this is the "live" site
  siteUrl: mysite.web.cern.ch
  # Generates the image tags. Changing this triggers the upgrade workflow.
  version:
    name: "v8.9-1"
    releaseSpec: <see a sample in config/sample/...>
  configuration:
    # Name of the DrupalSite (in the same namespace) to clone from, typically the "live"/production website
    cloneFrom: "<myproductionsite>"
    # "standard", "critical" or "test"
    qosClass: "standard"
    databaseClass: "standard"
    diskSize: "5Gi"
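To make the shape of the spec easier to see, here is a minimal Go sketch that mirrors the sample as plain structs. The type names, the nesting under `configuration`, and the `recommendedSiteURL` helper are assumptions made for illustration; they are not the operator's actual API types. The helper only encodes the naming recommendation from the sample's comments.

```go
package main

import "fmt"

// Version mirrors the `version` block of the sample (hypothetical type).
type Version struct {
	Name        string `json:"name"`
	ReleaseSpec string `json:"releaseSpec"`
}

// Configuration mirrors the `configuration` block (hypothetical type).
type Configuration struct {
	CloneFrom     string `json:"cloneFrom,omitempty"`
	QoSClass      string `json:"qosClass"` // "standard", "critical" or "test"
	DatabaseClass string `json:"databaseClass"`
	DiskSize      string `json:"diskSize"`
}

// DrupalSiteSpec mirrors the `spec` block (hypothetical type).
type DrupalSiteSpec struct {
	SiteURL       string        `json:"siteUrl"`
	Version       Version       `json:"version"`
	Configuration Configuration `json:"configuration"`
}

// recommendedSiteURL follows the naming recommendation in the sample:
// an empty environment name is treated as the "live" site.
func recommendedSiteURL(environmentName, projectName string) string {
	if environmentName == "" {
		return projectName + ".web.cern.ch"
	}
	return environmentName + "-" + projectName + ".web.cern.ch"
}

// sampleSpec rebuilds the sample above; placeholders are kept verbatim.
func sampleSpec() DrupalSiteSpec {
	return DrupalSiteSpec{
		SiteURL: recommendedSiteURL("", "mysite"),
		Version: Version{
			Name:        "v8.9-1",
			ReleaseSpec: "<see a sample in config/sample/...>",
		},
		Configuration: Configuration{
			CloneFrom:     "<myproductionsite>",
			QoSClass:      "standard",
			DatabaseClass: "standard",
			DiskSize:      "5Gi",
		},
	}
}

func main() {
	fmt.Printf("%+v\n", sampleSpec())
	fmt.Println(recommendedSiteURL("dev", "mysite"))
}
```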
Controllers
The operator has three controllers (reconcilers) that perform different operations independently:
- DrupalSiteReconciler: ensures the required Kubernetes resources exist for a given CR
- DrupalSiteDBUpdateReconciler: performs database updates
- SupportedDrupalVersionsReconciler: manages supported custom Drupal versions
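The idea of independent reconcilers can be sketched with the standard library alone. This is a conceptual illustration, not the operator's code: the `Reconciler` interface, `stubReconciler`, and `runAll` are hypothetical, and the real controllers are presumably wired through the operator-sdk tooling rather than hand-written goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// Reconciler is a simplified, hypothetical stand-in for a controller.
type Reconciler interface {
	Name() string
	Reconcile(site string) error
}

// stubReconciler does no real work; a real controller would call the
// Kubernetes API inside Reconcile.
type stubReconciler struct{ name string }

func (r stubReconciler) Name() string                { return r.name }
func (r stubReconciler) Reconcile(site string) error { return nil }

// reconcilers lists the three controllers named in this section.
func reconcilers() []Reconciler {
	return []Reconciler{
		stubReconciler{"DrupalSiteReconciler"},
		stubReconciler{"DrupalSiteDBUpdateReconciler"},
		stubReconciler{"SupportedDrupalVersionsReconciler"},
	}
}

// runAll runs every reconciler in its own goroutine, so that a slow one
// does not block the others, and collects the results by name.
func runAll(rs []Reconciler, site string) map[string]error {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]error, len(rs))
	)
	for _, r := range rs {
		wg.Add(1)
		go func(r Reconciler) {
			defer wg.Done()
			err := r.Reconcile(site)
			mu.Lock()
			results[r.Name()] = err
			mu.Unlock()
		}(r)
	}
	wg.Wait()
	return results
}

func main() {
	for name, err := range runAll(reconcilers(), "drupalsite-sample") {
		fmt.Printf("%s: err=%v\n", name, err)
	}
}
```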
Running the operator
Deployment
The operator is packaged as a Helm chart.
However, we deploy the CRDs separately. Both must be deployed for the operator to function.
In our infrastructure, we deploy the operator and its CRDs with two separate ArgoCD Applications.
Configuration
When deploying the Helm chart, operator configuration is exposed as Helm values.
This reference is useful to run the operator locally.
cmdline arguments
| argument | example | description |
|---|---|---|
| sitebuilder-image | gitlab-registry.cern.ch/drupal/paas/cern-drupal-distribution/site-builder | The sitebuilder source image name |
| php-fpm-exporter-image | gitlab-registry.cern.ch/drupal/paas/php-fpm-prometheus-exporter:RELEASE.2021.06.02T09-41-38Z | The php-fpm-exporter source image name |
| velero-namespace | openshift-cern-drupal | The namespace of the Velero server to create backups |
| webdav-image | gitlab-registry.cern.ch/drupal/paas/sabredav/webdav:RELEASE-2021.10.07T13-46-43Z | The webdav source image name |
| parallel-thread-count | 5 | The number of threads used by the main controller of DrupalSite Operator |
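As a rough illustration of how such arguments can be consumed, the sketch below parses them with Go's standard `flag` package. Only the flag names and descriptions come from the table above. The `config` struct, the `parseArgs` function, the empty string defaults, and the default thread count of 1 are assumptions, and the operator may wire its flags differently.

```go
package main

import (
	"flag"
	"fmt"
)

// config collects the command-line arguments (hypothetical struct).
type config struct {
	SitebuilderImage    string
	PhpFpmExporterImage string
	VeleroNamespace     string
	WebdavImage         string
	ParallelThreadCount int
}

// parseArgs parses the documented arguments from args.
// Defaults are placeholders chosen for this sketch.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("drupalsite-operator", flag.ContinueOnError)
	fs.StringVar(&c.SitebuilderImage, "sitebuilder-image", "", "The sitebuilder source image name")
	fs.StringVar(&c.PhpFpmExporterImage, "php-fpm-exporter-image", "", "The php-fpm-exporter source image name")
	fs.StringVar(&c.VeleroNamespace, "velero-namespace", "", "The namespace of the Velero server to create backups")
	fs.StringVar(&c.WebdavImage, "webdav-image", "", "The webdav source image name")
	fs.IntVar(&c.ParallelThreadCount, "parallel-thread-count", 1, "The number of threads used by the main controller")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	return c, nil
}

func main() {
	// Example values taken from the table above.
	example := []string{
		"--velero-namespace=openshift-cern-drupal",
		"--parallel-thread-count=5",
	}
	c, err := parseArgs(example)
	if err != nil {
		fmt.Println("invalid arguments:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```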
Configmaps for each QoS class
The operator configures each website according to its QoS class with configmaps.
It reads the configmaps from /tmp/runtime-config.
In order to test locally, we must first copy them:
$ cp -r chart/drupalsite-operator/runtime-config/ /tmp/
Testing
This project uses envtest for basic integration tests by running a local control plane. The control plane spun up by envtest doesn't run any K8s controllers except for the controller under test. The tests for the drupalsite controller are located in controllers/drupalsite_controller_test.go.
To run these tests locally, use make test.
This project was generated with the operator-sdk and has been updated to operator-sdk-v1.3.
Managing sites with failed updates
Sometimes a site can't be checked for pending database updates, or can't run updates on its tables. In these cases, the Status on the DrupalSite reflects the failure; in particular, the DBUpdatesFailed status is set. The site then needs to be fixed manually with the following steps.
- Set the site's Spec.Version back to the Status.ReleaseID.Failsafe value
- This removes the failed statuses from the DrupalSite
- The site can now be updated to a new version that won't cause the failures
- If the failures repeat, the site has to be fixed manually and the failed status removed manually for the reconciliation to complete successfully
- To confirm that a site is fixed and has no errors, check that the Status.Release.Failsafe and Status.Release.Current values are the same, with no failed statuses
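The recovery logic above can be summarised in a short Go sketch. The `site` and `releaseID` types are simplified, hypothetical stand-ins for the real DrupalSite spec and status (the version is modelled as a plain string, and the failed status as a single boolean), and the version strings in `main` are made up for illustration.

```go
package main

import "fmt"

// releaseID holds the current and failsafe releases (hypothetical type).
type releaseID struct {
	Current  string
	Failsafe string
}

// site is a simplified stand-in for a DrupalSite's spec and status.
type site struct {
	SpecVersion     string
	Release         releaseID
	DBUpdatesFailed bool
}

// rollbackToFailsafe performs the first step: point the spec back at the
// last release known to work.
func rollbackToFailsafe(s *site) {
	s.SpecVersion = s.Release.Failsafe
}

// isFixed performs the final check: current and failsafe releases match
// and no failed status remains.
func isFixed(s site) bool {
	return s.Release.Current == s.Release.Failsafe && !s.DBUpdatesFailed
}

func main() {
	// A site whose update to "release-new" failed (made-up names).
	s := site{
		SpecVersion:     "release-new",
		Release:         releaseID{Current: "release-new", Failsafe: "release-old"},
		DBUpdatesFailed: true,
	}
	fmt.Println("fixed before rollback:", isFixed(s))

	rollbackToFailsafe(&s)
	fmt.Println("spec version after rollback:", s.SpecVersion)
}
```

In this sketch the status fields are only read, never written: updating them is left to the operator's reconciliation, as described in the steps above.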