cosmos-operator

command module
v0.2.1
Published: Sep 28, 2022 License: Apache-2.0 Imports: 19 Imported by: 0

README

Cosmos Operator

Cosmos Operator manages custom resource definitions (CRDs) for full nodes (also known as RPC nodes) and, eventually, validator nodes for blockchains created with the Cosmos SDK.

The long-term vision of the Operator is to allow you to "configure it and forget it".

CosmosFullNode

The CosmosFullNode creates a highly available, fault-tolerant full node deployment.

The CosmosFullNode controller acts like a hybrid between a StatefulSet and a Deployment. Like a StatefulSet, each pod has a corresponding persistent volume to manage blockchain state and data. But you can also configure rolling updates, similar to a Deployment.

Additionally, because full node persistent data can be destroyed and recreated with little consequence, the controller cleans up PVCs. This differs from StatefulSets, which by default never delete PVCs. Deleting a CosmosFullNode also cleans up its PVCs.

Validators?

Coming soon!

Release Process

Releases should follow https://0ver.org.

  1. Create and push a git tag on branch main: git tag v0.X.X && git push --tags
  2. The tag triggers a CI/CD action that builds and pushes the docker image to ghcr.
  3. When complete, view the docker image in packages.

Best Practices

Resource Names

If you plan to have multiple network environments in the same cluster or namespace, append the network name and any other identifying information to the resource name.

Example:

apiVersion: cosmos.strange.love/v1
kind: CosmosFullNode
metadata:
  name: cosmoshub-mainnet-fullnode
spec:
  chain:
    network: mainnet # Should align with metadata.name above.

Like a StatefulSet, the Operator uses the .metadata.name of the CosmosFullNode to name resources it creates and manages.

Volumes, PVCs and StorageClass

Generally, Volumes are bound to a single Availability Zone (AZ). Therefore, use or define a StorageClass which has volumeBindingMode: WaitForFirstConsumer. This way, Kubernetes will not provision the volume until there is a pod ready to bind to it.

If you do not configure volumeBindingMode to wait, you risk the scheduler ignoring pod topology rules such as Affinity. For example, in GKE, volumes will be provisioned in random zones.

The Operator cannot define a StorageClass for you. Instead, you must configure the CRD with a pre-existing StorageClass.

Cloud providers generally provide default StorageClasses for you. Some of them, such as GKE's premium-rwo, set volumeBindingMode: WaitForFirstConsumer. To list the StorageClasses available in your cluster:

kubectl get storageclass

Additionally, Cosmos nodes require heavy disk IO. Therefore, choose a faster StorageClass such as GKE's premium-rwo.
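If none of the default classes fit, a custom StorageClass with the delayed binding mode described above could look like the following minimal sketch. The class name fast-wait-for-consumer is hypothetical, and the provisioner and parameters assume GKE's Persistent Disk CSI driver; substitute the values for your own cloud provider.

```yaml
# Sketch of a StorageClass that delays provisioning until a pod is scheduled.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-wait-for-consumer # hypothetical name
provisioner: pd.csi.storage.gke.io # assumption: GKE Persistent Disk CSI driver
parameters:
  type: pd-ssd # assumption: SSD-backed disks for heavy disk IO
volumeBindingMode: WaitForFirstConsumer # provision only once a pod needs the volume
allowVolumeExpansion: true # permits resizing PVCs that use this class
```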

Resizing Volumes

The StorageClass must support resizing. Most cloud providers (like GKE) support it.

To resize, update resources in the CRD like so:

resources:
  requests:
    storage: 100Gi # increase size here

You can only increase the storage (never decrease).

You must manually watch the PVC for the FileSystemResizePending condition. Then manually restart the pod associated with the PVC to complete resizing.

The above is a workaround; there is future work planned to allow the Operator to handle this scenario for you.
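For illustration, the condition to watch for appears under the PVC's status, for example in the output of kubectl get pvc <pvc name> -o yaml. The excerpt below is a sketch with made-up sizes, not output captured from a real cluster.

```yaml
# Illustrative excerpt of a PVC waiting for a pod restart to finish resizing.
spec:
  resources:
    requests:
      storage: 100Gi # the new, requested size
status:
  phase: Bound
  capacity:
    storage: 50Gi # still the old size until the filesystem resize completes
  conditions:
    - type: FileSystemResizePending
      status: "True"
```

Once the pod has been restarted and the filesystem resize finishes, the condition is removed and status.capacity matches the requested size.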

Updating Volumes

Most PVC fields are immutable (such as StorageClass), so once the Operator creates PVCs, immutable fields are not updated even if you change values in the CRD.

As mentioned in the above section, you can only update the storage size.

If you need to update an immutable field like the StorageClass, the workaround is to kubectl apply the updated CRD, then manually delete the affected PVCs and pods. The Operator will recreate them with the new configuration.

There is future work planned for the Operator to handle this scenario for you.

Pod Affinity

The Operator cannot assume your preferred topology. Therefore, set affinity appropriately to fit your use case.

For example, to encourage the scheduler to spread pods across nodes:

template:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - <name of crd>
            topologyKey: kubernetes.io/hostname
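
The same pattern can spread pods across availability zones instead of nodes. The sketch below only changes the topologyKey to the well-known zone label topology.kubernetes.io/zone. It assumes your cluster's nodes carry that label, which is typical for managed cloud clusters but worth verifying.

```yaml
template:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - <name of crd>
            topologyKey: topology.kubernetes.io/zone # spread across zones rather than hosts
```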

Using Volume Snapshots

TODO: How to use snapscheduler to create and restore from a kubernetes volume snapshot.

Getting Started

Run this command to set up your environment:

make tools

You’ll need a Kubernetes cluster to run against. You can use KIND to get a local cluster for testing, or run against a remote cluster. Note: Your controller will automatically use the current context in your kubeconfig file (i.e. whatever cluster kubectl cluster-info shows).

Running a Prerelease on the Cluster

  1. Authenticate with docker to push images to the repository.

Create a PAT on GitHub with package read and write permissions.

printenv GH_PAT | docker login ghcr.io -u <your GH username> --password-stdin

  2. Deploy a prerelease.

Warning: Make sure your kube context is set appropriately, so you don't install in the wrong cluster!

make deploy-prerelease

Uninstall CRDs

To delete the CRDs from the cluster:

make uninstall

Undeploy controller

Undeploy the controller from the cluster:

make undeploy

Contributing

// TODO(user): Add detailed information on how you would like others to contribute to this project

How it works

This project aims to follow the Kubernetes Operator pattern.

It uses Controllers, which provide a reconcile function responsible for synchronizing resources until the desired state is reached on the cluster.

Test It Out

  1. Install the CRDs into the cluster:
make install
  2. Run your controller (this will run in the foreground, so switch to a new terminal if you want to leave it running):
make run

NOTE: You can also run this in one step by running: make install run

Modifying the API definitions

If you are editing the API definitions, generate the manifests such as CRs or CRDs using:

make manifests

NOTE: Run make --help for more information on all potential make targets.

More information can be found via the Kubebuilder Documentation.

License

Copyright 2022 Strangelove Ventures LLC.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Documentation

There is no documentation for this package.

Directories

Path            Synopsis
api/v1          Package v1 contains API Schema definitions for the cosmos v1 API group +kubebuilder:object:generate=true +groupName=cosmos.strange.love
internal/kube   Package kube contains utility types and methods for managing kubernetes state.