Typesense Kubernetes Operator
The Typesense Kubernetes Operator is designed to manage the deployment and lifecycle of Typesense clusters within Kubernetes environments. The operator is developed in Go using the Operator SDK Framework, an open source toolkit to manage Kubernetes native applications, called Operators, in an effective, automated, and scalable way.
Description
Key features of Typesense Kubernetes Operator include:
- Custom Resource Management: Provides a Kubernetes-native interface to define and manage Typesense cluster configurations using a CRD named TypesenseCluster.
- Typesense Lifecycle Automation: Simplifies deploying, scaling, and managing Typesense clusters. Handles aspects like:
  - bootstrap Typesense's Admin API Key creation as a Secret,
  - deploy Typesense as a StatefulSet,
  - provision Typesense services (headless & discovery Services),
  - actively discover and update Typesense's nodes list (quorum configuration mounted as ConfigMap),
  - place claims for Typesense PersistentVolumes,
  - optionally expose Typesense API endpoint via an Ingress,
  - optionally provision one or multiple instances (one per target URL) of DocSearch as CronJobs.
- Raft Quorum Configuration & Recovery Automation:
  - Continuous active (re)discovery of the quorum configuration reacting to changes in ReplicaSet without the need of an additional sidecar container,
  - Automatic recovery of a cluster that has lost quorum without the need of manual intervention.
Background
Typesense uses raft in the background to establish its clusters. Raft is a consensus algorithm based on the paper "Raft: In Search of an Understandable Consensus Algorithm".
Raft nodes operate in one of three possible states: follower, candidate, or leader. Every new node always joins the quorum as a follower. Followers can receive log entries from the leader and participate in voting for electing a leader. If no log entries are received for a specified period of time, a follower transitions to the candidate state. As a candidate, the node can accept votes from its peer nodes. Upon receiving a majority of votes, the candidate becomes the leader of the quorum. The leader’s responsibilities include handling new log entries and replicating them to other nodes.
Another thing to consider is what happens when the node set changes, that is, when nodes join or leave the cluster.
If a quorum of nodes is available, raft can dynamically modify the node set without any issue (this happens every 30 seconds).
But if the cluster cannot form a quorum, problems start to pile up. A cluster with N nodes can tolerate a failure of at most (N-1)/2 nodes (rounded down) without losing its quorum. If more nodes than that fail, two events take place:
- raft declares the whole cluster as unavailable (no leader can be elected, no more log entries can be processed)
- the remaining nodes are restarted in bootstrap mode
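To make the threshold concrete, here is a minimal Go sketch of the arithmetic above. The function names are illustrative and not taken from the operator's code base; the only assumption is that (N-1)/2 is evaluated with integer division.

```go
package main

import "fmt"

// maxTolerableFailures returns how many nodes a cluster of n nodes can
// lose without losing quorum: (n-1)/2, using integer division.
func maxTolerableFailures(n int) int {
	if n < 1 {
		return 0
	}
	return (n - 1) / 2
}

// hasQuorum reports whether a cluster of n nodes still has quorum when
// only `available` of them are reachable.
func hasQuorum(n, available int) bool {
	return n > 0 && n-available <= maxTolerableFailures(n)
}

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("nodes=%d tolerated failures=%d\n", n, maxTolerableFailures(n))
	}
}
```

Note that an even-sized cluster tolerates no more failures than the next smaller odd-sized one, which is why odd replica counts are the usual choice.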
In a Kubernetes environment, the nodes are actually Pods, which are volatile by nature: their lifetime is ephemeral and they are subject to restarts. This puts raft consensus in a tough spot. The official Typesense documentation, on recovering a cluster that has lost quorum, explicitly states:
If a Typesense cluster loses more than (N-1)/2 nodes at the same time, the cluster becomes unstable because it loses quorum and the remaining node(s) cannot safely build consensus on which node is the leader. To avoid a potential split brain issue, Typesense then stops accepting writes and reads until some manual verification and intervention is done.
[!NOTE] Illustration's been taken from Free Gophers Pack
In production environments, manual intervention is sometimes impossible or undesirable, and downtime for a service like Typesense may be unacceptable. The Typesense Kubernetes Operator addresses both of these challenges.
Problem 1: Quorum reconfiguration
The Typesense Kubernetes Operator manages the entire lifecycle of Typesense Clusters within Kubernetes:
- A random token is generated and stored as a base64-encoded value in a new Secret. This token serves as the Admin API key for bootstrapping the Typesense cluster.
[!NOTE] You can alternatively provide your own Secret by setting the value of adminApiKey in the TypesenseCluster spec; it will be used instead. The data key name must always be typesense-api-key!
apiVersion: v1
kind: Secret
metadata:
  name: typesense-common-bootstrap-key
type: Opaque
data:
  typesense-api-key: SXdpVG9CcnFYTHZYeTJNMG1TS1hPaGt0dlFUY3VWUloxc1M5REtsRUNtMFFwQU93R1hoanVIVWJLQnE2ejdlSQ==
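As an illustration of the first step, the following Go sketch generates a random alphanumeric token and base64-encodes it the way a Secret's data field expects. It is not the operator's actual implementation: the function names, the alphabet, and the token length of 64 (chosen because the sample value above appears to decode to 64 characters) are assumptions.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"math/big"
)

// tokenAlphabet is an assumed character set for the generated key.
const tokenAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

// generateToken returns a cryptographically random token of the given length.
func generateToken(length int) (string, error) {
	limit := big.NewInt(int64(len(tokenAlphabet)))
	token := make([]byte, length)
	for i := range token {
		// rand.Int draws a uniform value in [0, limit).
		n, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		token[i] = tokenAlphabet[n.Int64()]
	}
	return string(token), nil
}

// encodeSecretValue base64-encodes a value for the data field of a Secret manifest.
func encodeSecretValue(value string) string {
	return base64.StdEncoding.EncodeToString([]byte(value))
}

func main() {
	token, err := generateToken(64)
	if err != nil {
		panic(err)
	}
	fmt.Println("typesense-api-key:", encodeSecretValue(token))
}
```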
- A ConfigMap is created, containing the endpoints of the cluster nodes as a single concatenated string in its data field. During each reconciliation loop, the operator identifies any changes in endpoints and updates the ConfigMap. This ConfigMap is mounted in every Pod at the path where raft expects the quorum configuration, ensuring the quorum configuration always stays up to date. The Fully Qualified Domain Name (FQDN) for each endpoint of the headless service adheres to the following naming convention:
{cluster-name}-sts-{pod-index}.{cluster-name}-sts-svc.{namespace}.svc.cluster.local:{peering-port}:{api-port}
[!IMPORTANT] This completely eliminates the need for a sidecar to translate the endpoints of the headless Service into Pod IP addresses. The FQDN of the endpoints automatically resolves to the new IP addresses, and raft will begin contacting these endpoints within its 30-second polling interval.
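The naming convention above can be sketched in Go as follows. The helper names are hypothetical, and joining the endpoints with commas is an assumption based on the format of Typesense's nodes file; the text above only states that they form a single concatenated string.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeEndpoint builds the endpoint of one Pod following the convention
// {cluster-name}-sts-{pod-index}.{cluster-name}-sts-svc.{namespace}.svc.cluster.local:{peering-port}:{api-port}
func nodeEndpoint(clusterName, namespace string, podIndex, peeringPort, apiPort int) string {
	return fmt.Sprintf("%s-sts-%d.%s-sts-svc.%s.svc.cluster.local:%d:%d",
		clusterName, podIndex, clusterName, namespace, peeringPort, apiPort)
}

// nodesList concatenates the endpoints of all replicas into one string,
// as it could be stored in the ConfigMap (comma separator is assumed).
func nodesList(clusterName, namespace string, replicas, peeringPort, apiPort int) string {
	if replicas < 0 {
		replicas = 0
	}
	endpoints := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		endpoints = append(endpoints, nodeEndpoint(clusterName, namespace, i, peeringPort, apiPort))
	}
	return strings.Join(endpoints, ",")
}

func main() {
	fmt.Println(nodesList("cluster-1", "default", 3, 8107, 8108))
}
```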
- Next, the reconciler creates a headless Service required for the StatefulSet, along with a standard Kubernetes Service of type ClusterIP. The latter exposes the REST/API endpoints of the Typesense cluster to external systems.
- A StatefulSet is then created. The quorum configuration stored in the ConfigMap is mounted as a volume in each Pod under /usr/share/typesense/nodelist. No Pod restart is necessary when the ConfigMap changes, as raft automatically detects and applies the updates.
- Optionally, an nginx:alpine workload is provisioned as a Deployment and published via an Ingress, in order to safely expose the Typesense REST/API endpoint outside the Kubernetes cluster only to selected referrers. The configuration of the nginx workload is stored in a ConfigMap.
- Optionally, one or more instances of DocSearch are deployed as distinct CronJobs (one per scraping target URL), which, based on user-defined schedules, periodically scrape the target sites and store the results in Typesense.
[!NOTE] The interval between reconciliation loops depends on the number of nodes. This approach ensures raft has sufficient "breathing room" to carry out its operations—such as leader election, log replication, and bootstrapping—before the next quorum health reconciliation begins.
- The controller assesses the quorum's health by probing each node at http://{nodeUrl}:{api-port}/health. Based on the results, it formulates an action plan for the next reconciliation loop. This process is detailed in the following section:
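A health probe of this kind could look roughly like the Go sketch below. The ok and resource_error fields come from the health responses described in this document; the type and function names, the two-second timeout, and the in-process test server used in main are assumptions made for the sake of a runnable example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// healthResponse models the body returned by the /health endpoint.
type healthResponse struct {
	OK            bool   `json:"ok"`
	ResourceError string `json:"resource_error,omitempty"`
}

// parseHealth decodes a /health response body.
func parseHealth(body []byte) (healthResponse, error) {
	var h healthResponse
	err := json.Unmarshal(body, &h)
	return h, err
}

// probe queries a node's health URL and decodes the answer.
func probe(client *http.Client, url string) (healthResponse, error) {
	resp, err := client.Get(url)
	if err != nil {
		return healthResponse{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return healthResponse{}, err
	}
	return parseHealth(body)
}

func main() {
	// Stand-in for a Typesense node that reports an out-of-memory condition.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"ok":false,"resource_error":"OUT_OF_MEMORY"}`)
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	h, err := probe(client, srv.URL+"/health")
	fmt.Println(h.OK, h.ResourceError, err)
}
```

A short client timeout matters here: an unreachable Pod should count as unhealthy quickly instead of stalling the whole reconciliation loop.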
Problem 2: Recovering a cluster that has lost quorum
During configuration changes, we cannot switch directly from the old configuration to the next, because conflicting majorities could arise. When that happens, no leader can be elected and eventually raft declares the whole cluster as unavailable, which leaves it in a hot loop. One way to solve this is to force a downgrade of the cluster to a single instance and then gradually introduce new nodes (by scaling up the StatefulSet). This approach avoids the need for manual intervention to recover a cluster that has lost quorum.
[!IMPORTANT] Scaling the StatefulSet down and subsequently up would typically be the manual intervention needed to recover a cluster that has lost its quorum. However, the controller automates this process, as long as the cause is not a memory or disk capacity issue, ensuring no service interruption and eliminating the need for any administrative action.
Left Path:
- The quorum reconciler probes each cluster node at http://{nodeUrl}:{api-port}/health. If every node responds with { ok: true }, the ConditionReady status of the TypesenseCluster custom resource is updated to QuorumReady, indicating that the cluster is fully healthy and operational.
  - If the cluster size matches the desired size defined in the TypesenseCluster custom resource (and was not downgraded during a previous loop; this scenario will be discussed later), the quorum reconciliation loop sets the ConditionReady status of the TypesenseCluster custom resource to QuorumReady, exits, and hands control back to the main controller loop.
  - If the cluster was downgraded to a single instance during a previous reconciliation loop, the quorum reconciliation loop sets the ConditionReady status of the TypesenseCluster custom resource to QuorumUpgraded. It then returns control to the main controller loop, which will attempt to restore the cluster to the desired size defined in the TypesenseCluster custom resource during the next reconciliation loop. Raft will then identify the new quorum configuration and elect a new leader.
  - If a node runs out of memory or disk, the health endpoint response will include an additional resource_error field, set to either OUT_OF_MEMORY or OUT_OF_DISK, depending on the issue. In this case, the quorum reconciler marks the ConditionReady status of the TypesenseCluster as QuorumNeedsIntervention, triggers a Kubernetes Event, and returns control to the main controller loop. In this scenario, manual intervention is required. You must adjust the resources in the PodSpec or the storage in the PersistentVolumeClaim of the StatefulSet to provide new memory limits or increased storage size. This can be done by modifying and re-applying the corresponding TypesenseCluster manifest.
Right Path:
- The quorum reconciler probes each node of the cluster at http://{nodeUrl}:{api-port}/health.
  - If the required number of nodes (at least (N-1)/2) return { ok: true }, the ConditionReady status of the TypesenseCluster custom resource is set to QuorumReady, indicating that the cluster is healthy and operational, even if some nodes are unavailable. Control is then returned to the main controller loop.
  - If the required number of nodes (at least (N-1)/2) return { ok: false }, the ConditionReady status of the TypesenseCluster custom resource is set to QuorumDowngrade, marking the cluster as unhealthy. As part of the mitigation plan, the cluster is scheduled for a downgrade to a single instance, with the intent to allow raft to automatically recover the quorum. The quorum reconciliation loop then returns control to the main controller loop.
  - In the next quorum reconciliation, the process will take the Left Path, which will eventually discover a healthy quorum, albeit with the wrong number of nodes; this leads to setting the ConditionReady condition of the TypesenseCluster to QuorumUpgraded. What happens next is already described in the Left Path.
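The two paths can be condensed into a small decision function. The Go sketch below is only an approximation under stated assumptions: the type and function names are hypothetical, the reason strings are copied from the text above, and quorum is treated as intact while no more than (N-1)/2 nodes are unhealthy, following the failure tolerance given in the Background section. The operator's real reconciler may differ in detail.

```go
package main

import "fmt"

// observation is a hypothetical summary of one quorum health check.
type observation struct {
	DesiredSize   int  // replicas requested in the TypesenseCluster
	CurrentSize   int  // replicas currently running
	Healthy       int  // nodes that answered { ok: true }
	ResourceError bool // any node reported OUT_OF_MEMORY or OUT_OF_DISK
}

// nextReason maps an observation to the reason set on ConditionReady.
func nextReason(o observation) string {
	if o.ResourceError {
		return "QuorumNeedsIntervention" // manual action required
	}
	tolerable := (o.CurrentSize - 1) / 2
	if o.CurrentSize-o.Healthy > tolerable {
		return "QuorumDowngrade" // quorum lost: schedule single instance
	}
	if o.CurrentSize < o.DesiredSize {
		return "QuorumUpgraded" // healthy but undersized: scale back up
	}
	return "QuorumReady"
}

func main() {
	steps := []observation{
		{DesiredSize: 3, CurrentSize: 3, Healthy: 1}, // quorum lost
		{DesiredSize: 3, CurrentSize: 1, Healthy: 1}, // after the downgrade
		{DesiredSize: 3, CurrentSize: 3, Healthy: 3}, // after scaling back up
	}
	for _, s := range steps {
		fmt.Println(nextReason(s))
	}
}
```

The three observations in main replay the recovery sequence described above: a lost quorum, the healthy single-instance cluster, and finally the cluster restored to its desired size.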
Custom Resource Definitions
TypesenseCluster
The Typesense Kubernetes Operator controls the lifecycle of multiple Typesense instances in the same Kubernetes cluster by introducing TypesenseCluster, a new Custom Resource Definition:
Spec
Name | Description | Optional | Default |
---|---|---|---|
image | Typesense image | | |
adminApiKey | Reference to the Secret to be used for bootstrap | X | |
replicas | Size of the cluster | | 1 |
apiPort | REST/API port | | 8108 |
peeringPort | Peering port | | 8107 |
resetPeersOnError | automatic reset of peers on error | | true |
corsDomains | domains that would be allowed for CORS calls | X | |
storage | check StorageSpec below | | |
ingress | check IngressSpec below | X | |
scrapers | array of DocSearchScraperSpec; check below | X | |
StorageSpec (optional)
Name | Description | Optional | Default |
---|---|---|---|
size | Size of the underlying PV | X | 100Mi |
storageClassName | StorageClass to be used | | standard |
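To show how the default values listed in the Spec and StorageSpec tables above could be applied, here is a minimal Go sketch. The struct and field names are hypothetical, trimmed-down stand-ins and not the operator's actual API types; in a Kubebuilder-based project such defaults are usually declared with +kubebuilder:default markers on the type definitions instead.

```go
package main

import "fmt"

// StorageSpec is a hypothetical mirror of the StorageSpec table.
type StorageSpec struct {
	Size             string
	StorageClassName string
}

// ClusterSpec is a hypothetical, trimmed-down mirror of the Spec table.
type ClusterSpec struct {
	Image             string
	Replicas          int
	APIPort           int
	PeeringPort       int
	ResetPeersOnError *bool // pointer, so "unset" can be told apart from "false"
	Storage           StorageSpec
}

// withDefaults fills every unset field with the default from the tables.
func withDefaults(spec ClusterSpec) ClusterSpec {
	if spec.Replicas == 0 {
		spec.Replicas = 1
	}
	if spec.APIPort == 0 {
		spec.APIPort = 8108
	}
	if spec.PeeringPort == 0 {
		spec.PeeringPort = 8107
	}
	if spec.ResetPeersOnError == nil {
		enabled := true
		spec.ResetPeersOnError = &enabled
	}
	if spec.Storage.Size == "" {
		spec.Storage.Size = "100Mi"
	}
	if spec.Storage.StorageClassName == "" {
		spec.Storage.StorageClassName = "standard"
	}
	return spec
}

func main() {
	spec := withDefaults(ClusterSpec{Image: "typesense/typesense:27.1", Replicas: 3})
	fmt.Printf("replicas=%d apiPort=%d peeringPort=%d resetPeersOnError=%t storage=%s/%s\n",
		spec.Replicas, spec.APIPort, spec.PeeringPort, *spec.ResetPeersOnError,
		spec.Storage.Size, spec.Storage.StorageClassName)
}
```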
IngressSpec (optional)
Name | Description | Optional | Default |
---|---|---|---|
referer | FQDN allowed to access reverse proxy | X | |
host | Ingress Host | | |
clusterIssuer | cert-manager ClusterIssuer | | |
ingressClassName | Ingress to be used | | |
annotations | User-Defined annotations | X | |
[!IMPORTANT] This feature requires the existence of cert-manager in the cluster, but does not actively enforce it with an error. If you are targeting Open Telekom Cloud, you might be interested in provisioning additionally the designated DNS solver webhook for Open Telekom Cloud. You can find it here.
DocSearchScraperSpec (optional)
Name | Description | Optional | Default |
---|---|---|---|
name | name of the scraper | | |
image | container image to use | | |
config | config to use | | |
schedule | cron expression; no timezone; no seconds | | |
[!CAUTION] The Typesense documentation, under Production Best Practices -> Configuration, states: "Typesense comes built-in with a high performance HTTP server that is used by likes of Fastly in their edge servers at scale. So Typesense can be directly exposed to incoming public-facing internet traffic, without the need to place it behind another web server like Nginx / Apache or your backend API."
Nevertheless, from this operator's perspective, it is highly recommended to always expose Typesense behind a reverse proxy (using the referer option).
Status
Condition | Value | Reason | Description |
---|---|---|---|
ConditionReady | true | QuorumReady | Cluster is Operational |
ConditionReady | false | QuorumNotReady | Cluster is not Operational |
ConditionReady | false | QuorumDegraded | Cluster is not Operational; Scheduled to Single-Instance |
ConditionReady | false | QuorumUpgraded | Cluster is Operational; Scheduled to Original Size |
ConditionReady | false | QuorumNeedsIntervention | Cluster is not Operational; Administrative Action Required |
Getting Started
You’ll need a Kubernetes cluster to run against. You can use KIND to get a local cluster for testing, or run against a remote cluster.
Note: Your controller will automatically use the current context in your kubeconfig file (i.e. whatever cluster kubectl cluster-info shows).
Deploy Using Helm
If you are deploying on a production environment, it is highly recommended to deploy the controller to the cluster using a Helm chart from its repo:
helm repo add typesense-operator https://akyriako.github.io/typesense-operator/
helm repo update
helm upgrade --install typesense-operator typesense-operator/typesense-operator -n typesense-system --create-namespace
Running on the cluster
Deploy from Sources
- Build and push your image to the location specified by IMG:
make docker-build docker-push IMG=<some-registry>/typesense-operator:<tag>
- Deploy the controller to the cluster with the image specified by IMG:
make deploy IMG=<some-registry>/typesense-operator:<tag>
- Install Instances of Custom Resources:
Provision one of the samples available in config/samples:
Suffix | Description | CSI Driver | Storage Class |
---|---|---|---|
 | Generic | | standard |
azure | Microsoft Azure | disk.csi.azure.com | managed-csi |
aws | AWS | ebs.csi.aws.com | gp2 |
opentelekomcloud | Open Telekom Cloud | disk.csi.everest.io, obs.csi.everest.io | csi-disk, csi-obs |
bm | Bare Metal | democratic-csi (iscsi/nfs) | iscsi, nfs |
kind | KiND | rancher.io/local-path | |
kubectl apply -f config/samples/ts_v1alpha1_typesensecluster_{{Suffix}}.yaml
e.g. for Open Telekom Cloud it would look like:
apiVersion: v1
kind: Secret
metadata:
  name: typesense-common-bootstrap-key
type: Opaque
data:
typesense-api-key: SXdpVG9CcnFYTHZYeTJNMG1TS1hPaGt0dlFUY3VWUloxc1M5REtsRUNtMFFwQU93R1hoanVIVWJLQnE2ejdlSQ==
---
apiVersion: ts.opentelekomcloud.com/v1alpha1
kind: TypesenseCluster
metadata:
name: cluster-1
spec:
image: typesense/typesense:27.1
replicas: 3
storage:
size: 100Mi
storageClassName: csi-disk
ingress:
host: ts.example.de
ingressClassName: nginx
clusterIssuer: opentelekomcloud-letsencrypt
adminApiKey:
name: typesense-common-bootstrap-key
scrapers:
- name: docusaurus-example-com
image: typesense/docsearch-scraper:0.11.0
config: "{\"index_name\":\"docusaurus-example\",\"start_urls\":[\"https://docusaurus.example.com/\"],\"sitemap_urls\":[\"https://docusaurus.example.com/sitemap.xml\"],\"sitemap_alternate_links\":true,\"stop_urls\":[\"/tests\"],\"selectors\":{\"lvl0\":{\"selector\":\"(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]\",\"type\":\"xpath\",\"global\":true,\"default_value\":\"Documentation\"},\"lvl1\":\"header h1\",\"lvl2\":\"article h2\",\"lvl3\":\"article h3\",\"lvl4\":\"article h4\",\"lvl5\":\"article h5, article td:first-child\",\"lvl6\":\"article h6\",\"text\":\"article p, article li, article td:last-child\"},\"strip_chars\":\" .,;:#\",\"custom_settings\":{\"separatorsToIndex\":\"_\",\"attributesForFaceting\":[\"language\",\"version\",\"type\",\"docusaurus_tag\"],\"attributesToRetrieve\":[\"hierarchy\",\"content\",\"anchor\",\"url\",\"url_without_anchor\",\"type\"]},\"conversation_id\":[\"833762294\"],\"nb_hits\":46250}"
schedule: '*/2 * * * *'
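The config field of a scraper is a JSON document embedded as an escaped string, which is tedious to write by hand. One possible way to produce it is sketched below in Go. The struct covers only a few of the keys visible in the sample above, the type and function names are hypothetical, and using strconv.Quote to obtain a double-quoted YAML scalar is an assumption that holds for plain ASCII content like this.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// scraperConfig models a small subset of the DocSearch scraper
// configuration keys that appear in the sample manifest.
type scraperConfig struct {
	IndexName   string   `json:"index_name"`
	StartURLs   []string `json:"start_urls"`
	SitemapURLs []string `json:"sitemap_urls,omitempty"`
	StopURLs    []string `json:"stop_urls,omitempty"`
}

// configString renders the configuration as compact JSON.
func configString(c scraperConfig) (string, error) {
	raw, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	cfg, err := configString(scraperConfig{
		IndexName:   "docusaurus-example",
		StartURLs:   []string{"https://docusaurus.example.com/"},
		SitemapURLs: []string{"https://docusaurus.example.com/sitemap.xml"},
		StopURLs:    []string{"/tests"},
	})
	if err != nil {
		panic(err)
	}
	// strconv.Quote escapes the inner double quotes for the manifest.
	fmt.Println("config:", strconv.Quote(cfg))
}
```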
Uninstall CRDs
To delete the CRDs from the cluster:
make uninstall
Undeploy controller
Undeploy the controller from the cluster:
make undeploy
How it works
This project aims to follow the Kubernetes Operator pattern.
It uses Controllers, which provide a reconcile function responsible for synchronizing resources until the desired state is reached on the cluster.
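The idea of a reconcile function can be illustrated with a deliberately simplified, standard-library-only Go sketch. It is not the operator's code: operators built with the Operator SDK implement controller-runtime's Reconcile method against the Kubernetes API, whereas the types and the one-replica-per-pass behaviour here are assumptions chosen to keep the example small.

```go
package main

import "fmt"

// clusterState is a toy stand-in for the state of a managed resource.
type clusterState struct {
	Replicas int
}

// reconcile compares the desired and the observed state and moves the
// observed state one step closer. It returns true when another pass is
// needed, similar to a controller asking to be requeued.
func reconcile(desired clusterState, observed *clusterState) bool {
	switch {
	case observed.Replicas < desired.Replicas:
		observed.Replicas++
	case observed.Replicas > desired.Replicas:
		observed.Replicas--
	}
	return observed.Replicas != desired.Replicas
}

func main() {
	desired := clusterState{Replicas: 3}
	observed := &clusterState{Replicas: 1}
	for pass := 1; ; pass++ {
		requeue := reconcile(desired, observed)
		fmt.Printf("pass %d: replicas=%d requeue=%t\n", pass, observed.Replicas, requeue)
		if !requeue {
			break
		}
	}
}
```

The function is idempotent once the desired state is reached: calling it again changes nothing and requests no further pass.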
Test It Out
- Install the CRDs into the cluster:
make install
- Run your controller (this will run in the foreground, so switch to a new terminal if you want to leave it running):
make run
NOTE: You can also run this in one step by running: make install run
Modifying the API definitions
If you are editing the API definitions, generate the manifests such as CRs or CRDs using:
make generate && make manifests
NOTE: Run make --help for more information on all potential make targets
More information can be found via the Kubebuilder Documentation
License
Copyright 2023.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.