---
title: "Use Ansible to manage the QoS of your OpenShift workload"
date: 2019-02-06T00:00:00+02:00
opensource:
- OpenShift
- Ansible
---

As I was administering my OpenShift cluster, I found out that far too much memory was being requested. To preserve a good quality of service on my cluster, I had to tackle this issue.

Resource requests and limits in OpenShift (and Kubernetes in general) are the concepts that help define the quality of service of every running Pod. Requests and limits can target memory, CPU or both. When a Pod has a resource request, it is guaranteed to receive those resources, and when it has a resource limit, it cannot consume more than that amount.

Based on the requests and limits, OpenShift divides the workload into three classes of Quality of Service: Guaranteed, Burstable and Best Effort. When the requests are equal to the limits, the Pod has a "Guaranteed" QoS. When the requests are lower than the limits, the Pod has a "Burstable" QoS. And when no requests and no limits are set, the Pod has a "Best Effort" QoS.
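To make this more concrete, here is a minimal sketch of the `resources` stanza of a container spec for each case (the container name and image below are made up for the sake of the example):

```yaml
# Hypothetical container spec, for illustration only.
containers:
- name: my-app           # made-up name
  image: my-app:latest   # made-up image
  resources:
    requests:            # requests == limits for CPU and memory => "Guaranteed"
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 100m
      memory: 256Mi
# With requests lower than the limits (e.g. requests.memory: 128Mi, limits.memory: 256Mi),
# the Pod would be "Burstable"; with no resources stanza at all, "Best Effort".
```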
All of this holds as long as there are enough resources for every running Pod. But as soon as a resource shortage happens, OpenShift starts to throttle CPU and, if there is no more memory, to kill Pods. It does so by first killing the Pods that have the "Best Effort" QoS and, if the situation does not improve, it continues with the Pods that have the "Burstable" QoS. Since the Kubernetes scheduler uses the requests and limits to schedule Pods, you should not run into a situation where "Guaranteed" Pods need to be killed (hopefully).

**So, you definitely don't want to have all your eggs (Pods) in the same basket (class of QoS)!**

Back to the original issue: I needed to find out which Pods were part of the Burstable or Guaranteed QoS class and demote the less critical ones to the Best Effort class. I settled on an Ansible playbook to help me fix this.

The first step was discovering which Pods were part of the Burstable or Guaranteed QoS class. Since most Pods are created from a `Deployment`, `DeploymentConfig` or `StatefulSet`, I had to find out which of those objects had a `requests` or `limits` field. This first task was accomplished very easily with a first playbook:

```yaml
- name: List all DeploymentConfig having a request or limit set
  hosts: localhost
  gather_facts: no

  tasks:
  - name: Get a list of all DeploymentConfig on our OpenShift cluster
    command: oc get dc -o json --all-namespaces
    register: oc_get_dc
    changed_when: false

  - block:
    - debug:
        var: to_update
    vars:
      all_objects: '{{ (oc_get_dc.stdout|from_json)[''items''] }}'
      to_update: '{{ all_objects|json_query(json_query) }}'
      json_query: >
        [?
          spec.template.spec.containers[].resources.requests
          || spec.template.spec.containers[].resources.limits
        ].{
          name: metadata.name,
          namespace: metadata.namespace,
          kind: kind
        }
```

If you run it with `ansible-playbook /path/to/playbook.yaml`, you will get a list of all `DeploymentConfig` having requests or limits set:

```raw
PLAY [List all DeploymentConfig having a request or limit set] ***********************************************

TASK [Get a list of all DeploymentConfig on our OpenShift cluster] *******************************************
ok: [localhost]

TASK [debug] *************************************************************************************************
ok: [localhost] => {
    "to_update": [
        {
            "kind": "DeploymentConfig",
            "name": "router",
            "namespace": "default"
        },
        {
            "kind": "DeploymentConfig",
            "name": "docker-registry",
            "namespace": "default"
        },
...

PLAY RECAP ***************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0
```

I then extended the playbook to also find the `Deployment` and `StatefulSet` objects having requests or limits set.

```raw
  tasks:
  [...]

  - name: Get a list of all Deployment on our OpenShift cluster
    command: oc get deploy -o json --all-namespaces
    register: oc_get_deploy
    changed_when: false

  - name: Get a list of all StatefulSet on our OpenShift cluster
    command: oc get sts -o json --all-namespaces
    register: oc_get_sts
    changed_when: false

  - block:
    [...]
    vars:
      all_objects: >
        {{ (oc_get_dc.stdout|from_json)['items']
           + (oc_get_deploy.stdout|from_json)['items']
           + (oc_get_sts.stdout|from_json)['items'] }}
```

And last but not least, I added a call to the `oc set resources` command in order to bring those objects back to the Best Effort QoS class.

```raw
[...]
  - block:
    [...]

    - debug:
        msg: 'Will update {{ to_update|length }} objects'

    - pause:
        prompt: 'Proceed ?'

    - name: Change the QoS class to "Best Effort"
      command: >
        oc set resources {{ obj.kind }} {{ obj.name }}
        -n {{ obj.namespace }}
        --requests=cpu=0,memory=0
        --limits=cpu=0,memory=0
      loop: '{{ to_update }}'
      loop_control:
        loop_var: obj
```

Since I did not want all Pods to end up with the Best Effort QoS class, I added a blacklist of critical namespaces that should not be touched.

```raw
- name: Change the QoS class of commodity projects
  hosts: localhost
  gather_facts: no

  vars:
    namespace_blacklist:
    - default
    - openshift-sdn
    - openshift-monitoring
    - openshift-console
    - openshift-web-console

  tasks:
  [...]

  - name: Change the QoS class to "Best Effort"
    command: >
      oc set resources {{ obj.kind }} {{ obj.name }}
      -n {{ obj.namespace }}
      --requests=cpu=0,memory=0
      --limits=cpu=0,memory=0
    loop: '{{ to_update }}'
    loop_control:
      loop_var: obj
    when: obj.namespace not in namespace_blacklist
```

You can find the complete playbook [here](change-qos.yaml). Of course, it is very rough and would need more work to be used on a daily basis, but for a one-off run it is sufficient.
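If you want to double-check the result once the Pods have been redeployed, Kubernetes exposes the computed QoS class in the Pod status. A quick sketch, assuming a hypothetical `my-project` namespace:

```raw
# The QoS class computed by Kubernetes is exposed in the Pod status.
oc get pods -n my-project \
  -o custom-columns='NAME:.metadata.name,QOS:.status.qosClass'
```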