Kubernetes Multi-Operator Stacks and Advanced Pod Scheduling

CloudARK
Nov 7, 2019

Kubernetes Operators enable running third-party software natively on Kubernetes. Various Operators are being built today for software such as MySQL, Postgres, Cassandra, Kafka, Prometheus, Moodle, WordPress, etc. Installing more than one Operator in a cluster is increasingly common. In this blog post we present the unique scheduling requirements that arise when using Kubernetes Custom Resources from multiple Operators to build platform stacks. We also provide suggestions to Operator developers on how to ensure that these requirements can be satisfied.

Example Multi-Operator setup

In order to present the scheduling requirements that may occur when using multiple Operators together, we use a sample use case: a basic Moodle platform stack on Kubernetes. Moodle is an eLearning software that uses MySQL as its backend database. For simplicity we will use only two Operators in the discussion below — the Moodle Operator and the MySQL Operator. The Moodle Operator defines the Moodle Custom Resource, and the MySQL Operator defines the MysqlCluster Custom Resource. A Moodle stack is formed from a Moodle Custom Resource instance and a MysqlCluster Custom Resource instance. The respective Operators internally create Kubernetes built-in resources such as Pods and Services as part of instantiating their Custom Resource instances.
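
For concreteness, such a stack could be declared as two Custom Resource instances along the following lines. This is only a sketch — the API groups and Spec fields shown are hypothetical and will differ across actual Operators:

```yaml
# Illustrative sketch only — API groups and field names are hypothetical;
# the real schemas depend on the specific Operators installed.
apiVersion: moodle.example.com/v1
kind: Moodle
metadata:
  name: moodle-stack-1
spec:
  mysqlServiceName: mysql-stack-1   # hypothetical reference to the database
---
apiVersion: mysql.example.com/v1
kind: MysqlCluster
metadata:
  name: mysql-stack-1
spec:
  replicas: 1
```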

Custom Resource Pod Scheduling — requirements and solutions

1. Atomic deployments: We want to ensure ‘atomicity’ of Moodle stack deployments, i.e. either all Pods of a stack are provisioned or none are. This is important from a Moodle service provider’s point of view, as it ensures that there are no ‘half deployments’ — situations where only Moodle Pods are running without MysqlCluster Pods, or vice versa.

The way to achieve atomic deployments is by leveraging Kubernetes Pod resource requests and resource limits. Kubernetes provides the requests and limits mechanism for specifying the CPU and memory needs of a Pod’s containers. A Pod in which every container has both requests and limits specified, with requests equal to limits, is given the Guaranteed Quality-of-Service (QoS) class. A Pod in which at least one container has a request or limit specified, but which does not meet the Guaranteed criteria, is given the Burstable QoS class. A Pod with no requests or limits specified is given the BestEffort QoS class. In order to achieve atomic deployments, application developers should be able to specify resource requests and resource limits in all their Custom Resource Specs (in this case — Moodle and MysqlCluster). The Custom Controllers need to be written to pass these values into the Pod Specs that they create as part of instantiating the Custom Resource instances. Doing this ensures that the Pods for both Custom Resources receive the Guaranteed quality-of-service class. This in turn ensures that Kubernetes will not evict one of the Pods of a Moodle stack under node resource pressure, which it would otherwise do for Burstable or BestEffort QoS Pods.
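
As a sketch, assuming the Moodle Custom Resource exposes a resources passthrough field (a hypothetical name), the Custom Controller would copy it into every container of the Pods it creates; setting requests equal to limits for both CPU and memory yields the Guaranteed QoS class:

```yaml
# Hypothetical 'resources' passthrough field on the Moodle Custom Resource;
# the Custom Controller copies it into every container of the Pods it creates.
apiVersion: moodle.example.com/v1
kind: Moodle
metadata:
  name: moodle-stack-1
spec:
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "500m"      # requests == limits for every container ...
      memory: "512Mi"  # ... gives the Pod the Guaranteed QoS class
```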

2. Co-location: We want to co-locate all Pods of a given Moodle stack on the same node. This requirement arises from the need to treat a Moodle stack as a single ‘unit’ with respect to node downtime. Without co-location, a Moodle stack may become ‘broken’ if, for example, the node on which the MysqlCluster Pods were running goes down while the node on which the Moodle Pods were running remains active.

There are two ways to meet this requirement — (a) Node affinity rules / Node selector; (b) Labels and Pod affinity rules. Both are sketched below.

(a) Kubernetes provides the node affinity / node selector mechanism, which identifies, through labels, the nodes on which a Pod may run. A set of labels is provided in the Pod Spec, and the scheduler matches them against the labels on the nodes when making the scheduling decision. In order to co-locate Moodle and MysqlCluster Pods on the same node, it should be possible to specify such node selector / node affinity labels through their Custom Resource Spec definitions; the Custom Controllers would then need to pass these labels on to the Pods that they create.
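
A minimal sketch, assuming both Custom Resources expose a nodeSelector passthrough field (hypothetical) and the target node has been labeled out-of-band, e.g. with kubectl label nodes node-a stacknode=stack-1:

```yaml
# Both Custom Resources of the stack carry the same selector; the controllers
# put it on their Pods, so all Pods of the stack land on the same node.
apiVersion: moodle.example.com/v1
kind: Moodle
metadata:
  name: moodle-stack-1
spec:
  nodeSelector:
    stacknode: stack-1
---
apiVersion: mysql.example.com/v1
kind: MysqlCluster
metadata:
  name: mysql-stack-1
spec:
  nodeSelector:
    stacknode: stack-1
```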

(b) Another way to co-locate a stack on a node is to use labels on Pods together with Pod affinity rules. Say the MysqlCluster Custom Resource provides a way to pass labels to be added to its Pods, and the Moodle Custom Resource provides a way to specify affinity rules. Then, to co-locate the Moodle and MySQL Pods, we could add a label through the MysqlCluster Custom Resource and specify a Pod affinity rule matching that label on the Moodle Custom Resource.
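
A sketch of this approach, assuming a podLabels passthrough on MysqlCluster and an affinity passthrough on Moodle (both field names hypothetical); topologyKey: kubernetes.io/hostname makes ‘co-located’ mean ‘on the same node’:

```yaml
apiVersion: mysql.example.com/v1
kind: MysqlCluster
metadata:
  name: mysql-stack-1
spec:
  podLabels:
    stack: moodle-stack-1        # controller copies this onto the MySQL Pods
---
apiVersion: moodle.example.com/v1
kind: Moodle
metadata:
  name: moodle-stack-1
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            stack: moodle-stack-1
        topologyKey: kubernetes.io/hostname   # same node as the MySQL Pods
```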

3. Node Separation: We may want to ensure that two Moodle stacks are never co-located on the same node. This requirement arises from the need to maintain primary and backup Moodle stacks. Ensuring that such stacks run on different nodes helps provide the intended availability guarantees to Moodle users.

This can be achieved by using Kubernetes Pod anti-affinity rules. Pod anti-affinity enables defining Pod scheduling rules based on the labels of other Pods already running on a node. You can use these rules to ensure that no two Moodle stacks get deployed on the same node. For this, an attribute needs to be provided in the Custom Resource Spec definition where such rules can be specified, and the Custom Controller needs to pass this information through to the Pod Spec when instantiating Custom Resource instances.
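
A sketch, again assuming hypothetical podLabels and affinity passthrough fields: every Moodle stack labels its Pods with a common app label plus its own stack name, and refuses nodes already running Pods of any other stack. The MysqlCluster half of the stack would carry the same labels and rule:

```yaml
apiVersion: moodle.example.com/v1
kind: Moodle
metadata:
  name: moodle-stack-2
spec:
  podLabels:
    app: moodle-stack
    stack: moodle-stack-2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["moodle-stack"]
          - key: stack
            operator: NotIn
            values: ["moodle-stack-2"]   # this stack's own Pods are allowed
        topologyKey: kubernetes.io/hostname
```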

4. Dedicated Nodes: There might be a need to provide a dedicated node to a particular Moodle stack, for example, to offer differentiated service to high-priority customers.

This can be achieved by adding a taint to a node and then adding tolerations for that taint to the Custom Resource Specs. For this, the Custom Resource Spec definitions need to provide a way to specify tolerations, and the Custom Controller needs to be written to pass this toleration information to the Pods that it instantiates. Note that taints and tolerations differ from Node affinity/nodeSelector: the taint keeps all other Pods off the node while the toleration lets the stack’s Pods on, whereas node affinity alone will not keep a node dedicated to a particular Moodle stack. In practice the two are combined — the taint reserves the node, and node affinity pulls the stack onto it.
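
A sketch, assuming a hypothetical tolerations passthrough field on the Custom Resource, with the node tainted and labeled out-of-band, e.g. kubectl taint nodes node-a dedicated=moodle-stack-1:NoSchedule and kubectl label nodes node-a dedicated=moodle-stack-1:

```yaml
# The taint keeps all other Pods off node-a; the toleration lets this stack
# on; the nodeSelector (matching the node label) pulls the stack onto it.
apiVersion: moodle.example.com/v1
kind: Moodle
metadata:
  name: moodle-stack-1
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "moodle-stack-1"
    effect: "NoSchedule"
  nodeSelector:
    dedicated: moodle-stack-1
```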

Summary

Kubernetes provides the following constructs that collectively determine on which worker node a Pod will be scheduled.

  • Pod’s resource requests and limits (CPU and memory)
  • Pod Node Affinity
  • Pod Affinity
  • Pod Anti-Affinity
  • Taints and Tolerations

These constructs are specified in a Pod’s Spec. However, when using Operators, application developers work with Custom Resources and not with Pods directly. Hence they cannot make use of the above constructs unless the Custom Resource Spec definitions allow specifying them. When developing your Kubernetes Operators, it will therefore help your users if you define your Custom Resource Specs such that they allow passing these scheduling cues to your Operator. This becomes even more important in multi-Operator environments, to ensure that platform stacks in which your Custom Resources are used can be (i) treated as atomic deployment units; (ii) co-located on a node when needed; (iii) deployed on nodes different from other stacks when needed; and (iv) deployed on a dedicated node in situations that demand such exclusive deployments.

Conclusion

Kubernetes provides several advanced Pod scheduling constructs. As an Operator developer, it is important to carefully evaluate these constructs when designing your Custom Resources and Custom Controllers. At CloudARK, we have developed guidelines to ensure consistency and usability of Operators in multi-Operator settings; the points above are part of these guidelines. Follow them when developing your Operators to ensure that they are ready to be deployed and used alongside other Operators in a cluster.

www.cloudark.io
