# Dedicated Admin Operator

## Summary
The Dedicated Admin Operator was created for the OpenShift Dedicated platform to manage permissions (via Kubernetes RoleBindings) in all the projects/namespaces owned by the client. It monitors the creation of new namespaces and adds the proper permissions for the `dedicated-admins` group, a group of local admins (not cluster admins) managed by the client.
It contains the following components:
- Namespace controller: watches for new namespaces and guarantees that the proper RoleBindings are assigned to them.
- RoleBinding controller: watches for RoleBinding changes; if someone removes a dedicated-admin RoleBinding, the controller adds it back.
- Operator controller: watches the operator's namespace to install resources that cannot be installed by OLM (Service and ServiceMonitor).
To avoid granting admin permissions in infra/cluster-admin related namespaces, a blacklist is used to determine which namespaces are excluded from the RoleBinding assignment.
## Metrics

The Dedicated Admin Operator exposes the following Prometheus metrics:

- `dedicated_admin_blacklisted`: gauge of blacklisted namespaces
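As a quick sanity check that the metric is being served, something like the following should work; the Service name and port here are assumptions, so confirm them with `oc -n openshift-dedicated-admin get svc` first:

```
# Port-forward to the operator's metrics Service (name/port assumed) and grep for the gauge.
oc -n openshift-dedicated-admin port-forward svc/dedicated-admin-operator 8383:8383 &
curl -s http://localhost:8383/metrics | grep dedicated_admin_blacklisted
kill %1  # stop the background port-forward
```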
## On OLM and Bundling
OLM can deploy a resource that is bundled in a CatalogSource, but it will not update that resource and will not delete it if it is removed in a future version of the CatalogSource. There are two other options for managing these resources: 1) a controller in the operator code, or 2) managing them externally. Option #1 is a lot of work, though it does cover the case where a resource is deleted. Option #2 is less work but has a gap: fixing a broken config requires external action.
In July 2019 a PR switched to option #2, relying on Hive to manage the resources via a SelectorSyncSet. Hive will fix anything that breaks within 2 hours, or a human can force a sync by removing the related SyncSetInstance CR. This means no Go code is needed to manage the resources and the deployment is simpler, but it also means ALL resources move out of the bundle.

We can go back to bundling in the future once OLM manages bundled resources; it's causing pain now, and Hive will reconcile the resources moving forward.
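For reference, forcing that immediate resync looks roughly like the following. It runs against the Hive cluster, and the namespace and CR name are illustrative placeholders:

```
# On the Hive cluster: delete the SyncSetInstance for this cluster so Hive
# re-applies the dedicated-admin resources immediately (namespace/name are placeholders).
oc -n <cluster-namespace> get syncsetinstances
oc -n <cluster-namespace> delete syncsetinstance <dedicated-admin-syncsetinstance>
```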
## Building

### Dependencies

### Makefile
The following `make` targets are included:

- `clean` - remove any generated output
- `build` - run `docker build`
- `push` - run `docker push`
- `gocheck` - run `go vet`
- `gotest` - run `go test`
- `gobuild` - run `go build`
- `env` - export useful env vars for use in other scripts
The following variables (with defaults) are available for overriding by the user of `make`:

- `OPERATOR_NAME` - the name of the operator (dedicated-admin-operator)
- `OPERATOR_NAMESPACE` - the operator namespace (openshift-dedicated-admin)
- `IMAGE_REGISTRY` - target container registry (quay.io)
- `IMAGE_REPOSITORY` - target container repository ($USER)
- `IMAGE_NAME` - target image name ($OPERATOR_NAME)
- `ALLOW_DIRTY_CHECKOUT` - if a dirty local checkout is allowed (false)
Note that `IMAGE_REPOSITORY` defaults to the current user's name. The default behavior of `make build` and `make push` will therefore be to create images in the user's namespace. Automation would override this to push to an organization like this:

```
IMAGE_REGISTRY=quay.io IMAGE_REPOSITORY=openshift-sre make build push
```
For local testing you might want to build with a dirty checkout. Keep in mind that the version is based on the number of commits and the latest git hash, so this is not desired for any officially published image and can cause issues pulling the latest image in some scenarios if tags (based on version) are reused.

```
ALLOW_DIRTY_CHECKOUT=true make build
```
### Docker

The Dockerfile provided (in `build/Dockerfile`) takes advantage of the multi-stage feature, so docker version >= 17.05 is required. See `make build`.
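For reference, the equivalent of `make build` done by hand is roughly the following; the tag shown is illustrative, and the Makefile derives the real one from the variables above:

```
# Roughly what `make build` runs (image tag is illustrative).
docker build -f build/Dockerfile -t quay.io/$USER/dedicated-admin-operator:latest .
```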
### OLM

The OLM catalog source is not generated by this codebase, but the `make env` target is created to support this process. See osd-operators [subject to rename / moving].
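A typical way to consume those exported variables from another script is something like the following; this is illustrative, and the exact variables depend on what the `env` target prints:

```
# Source the variables printed by `make env` into the current shell, then use them.
eval "$(make env)"
echo "$OPERATOR_NAME $OPERATOR_NAMESPACE $IMAGE_NAME"
```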
## Testing, Manual
To test a new version of the operator in a cluster you need to:
- build a new image
- deploy the image to a registry that's available to the cluster
- deploy the updated operator to the cluster
- do validation (see the example after the deployment steps below)
The following steps make some assumptions:
- you can push images to a repository in quay.io called `$USER`
- you are logged into an OCP cluster with enough permissions to deploy the operator and resources
Furthermore, if you have installed this via OLM you'll need to remove it, otherwise OLM will replace your deployment:

```
# remove subscription and operatorgroup
oc -n openshift-dedicated-admin delete subscription dedicated-admin-operator
oc -n openshift-dedicated-admin delete operatorgroup dedicated-admin-operator
```
Build and deploy an updated version of the operator for test purposes with the following:

```
export IMAGE_REGISTRY=quay.io
export IMAGE_REPOSITORY=$USER

# build & push (with dirty checkout)
ALLOW_DIRTY_CHECKOUT=true make build push

# create deployment with correct image
sed "s|\(^[ ]*image:\).*|\1 $IMAGE_REGISTRY/$IMAGE_REPOSITORY/dedicated-admin-operator:latest|" manifests/10-dedicated-admin-operator.Deployment.yaml > /tmp/dedicated-admin-operator.Deployment.yaml

# deploy operator
find manifests/ -name '*Namespace*.yaml' -exec oc replace -f {} \;
find manifests/ -name '*Role*.yaml' -exec oc replace -f {} \;
find manifests/ -name '*Service.yaml' -exec oc apply -f {} \;
find manifests/ -name '*ServiceMonitor.yaml' -exec oc replace -f {} \;
find manifests/ -name '*Prometheus*.yaml' -exec oc replace -f {} \;
oc replace -f /tmp/dedicated-admin-operator.Deployment.yaml

# cleanup
unset IMAGE_REGISTRY
unset IMAGE_REPOSITORY
rm -f /tmp/dedicated-admin-operator.Deployment.yaml
```
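For the validation step, one simple check is to create a throwaway namespace and confirm the operator adds RoleBindings for the dedicated-admins group to it. The namespace name and grep pattern below are illustrative:

```
# Create a test namespace and confirm the operator adds RoleBindings for dedicated-admins.
oc new-project dedicated-admin-operator-test
sleep 5
oc -n dedicated-admin-operator-test get rolebindings | grep dedicated-admin

# Clean up.
oc delete project dedicated-admin-operator-test
```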
## Controllers

### Namespace Controller

Watch for the creation of new `Namespaces` that are not part of the blacklist. When discovered, create `RoleBindings` in that namespace to the `dedicated-admins` group for the following `ClusterRoles`:

- `admin`
- `dedicated-admins-project`
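The effect is roughly equivalent to running the following in each non-blacklisted namespace; the RoleBinding names shown here are illustrative, as the operator chooses its own names and ownership metadata:

```
# Hand-rolled equivalent of what the controller creates (RoleBinding names illustrative).
oc -n <namespace> create rolebinding dedicated-admins-admin \
  --clusterrole=admin --group=dedicated-admins
oc -n <namespace> create rolebinding dedicated-admins-project \
  --clusterrole=dedicated-admins-project --group=dedicated-admins
```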
### RoleBinding Controller

Watch for the deletion of `RoleBindings` owned by this operator. If a `RoleBinding` owned by this operator is deleted, it is recreated.
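A quick way to observe this is to delete one of the operator-owned RoleBindings in a non-blacklisted namespace and watch it come back; the names depend on what the operator created, so list them first:

```
# Delete an operator-owned RoleBinding and confirm the controller recreates it.
oc -n <namespace> get rolebindings | grep dedicated-admin
oc -n <namespace> delete rolebinding <operator-owned-rolebinding>
sleep 5
oc -n <namespace> get rolebindings | grep dedicated-admin
```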
### Operator Controller

OLM currently cannot support creation of arbitrary resources when an operator is installed. Therefore the following are created by this controller at startup:

- `Service` - for exposing metrics
- `ServiceMonitor` - for prometheus to scrape metrics from the `Service`
- `ClusterRoleBinding` - binds the `dedicated-admins-cluster` `ClusterRole` to the `dedicated-admins` `Group`
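To confirm these exist after the operator starts, a check along these lines should work; the exact ClusterRoleBinding name is an assumption, so grep for it rather than relying on a specific name:

```
# Verify the resources the operator controller manages at startup.
oc -n openshift-dedicated-admin get service,servicemonitor
oc get clusterrolebindings | grep dedicated-admins-cluster
```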
Note that the creation of `ClusterRoleBindings` is possible via a `ClusterServiceVersion` CR, which is used to deploy an operator, but it can only have a `ServiceAccount` as the subject. At this time you cannot create a `ClusterRoleBinding` to other subjects in a `ClusterServiceVersion`.
The namespaced resources are created in the operator's `Namespace`.