mustafayildirim · November 5, 2024 17:29 · Apr 7, 2020 · Jan 14, 2020 · Jan 14, 2020
diff --git a/prom-k8s-request-limits.md b/prom-k8s-request-limits.md
@@ -16,6 +16,7 @@ It returns a number between 0 and 1 so format the left Y axis as `percent (0.0-1
 Note that we added some filtering here to get rid of some noise: `name!~".*prometheus.*", image!="", container_name!="POD"`. The `name!~".*prometheus.*"` is just because we aren't interested in the CPU usage of all the prometheus exporters running in our k8s cluster.
 
 ![Screen Shot 2019-04-24 at 10 58 31](https://user-images.githubusercontent.com/8859277/56646508-fe030000-667f-11e9-94ae-3c29b5a939ae.png)
+(Title on this image is wrong)
 
 # CPU: show as cores with request/limit lines
 
@@ -48,4 +49,59 @@ We then add 2 series overrides to hide the request and limit in the tooltip and
 
 The result looks like this:
 
-![Screen Shot 2020-01-14 at 10 05 20](https://user-images.githubusercontent.com/8859277/72329714-8796aa00-36b5-11ea-88ee-919736cc1f1d.png)
+![Screen Shot 2020-01-14 at 10 05 20](https://user-images.githubusercontent.com/8859277/72329714-8796aa00-36b5-11ea-88ee-919736cc1f1d.png)
+
+# Queries to show memory and CPU as percentage of both request and limit
+
+Percentage of CPU request:
+
+```
+round(
+  100 *
+    sum(
+      rate(container_cpu_usage_seconds_total{container_name!="POD"}[5m])
+    ) by (pod, container, namespace, slave)
+      /
+    sum(
+      kube_pod_container_resource_requests_cpu_cores{container_name!="POD"}
+    ) by (pod, container, namespace, slave)
+)
+```
+
+Percentage of CPU limit:
+
+```
+round(
+  100 *
+    sum(
+      rate(container_cpu_usage_seconds_total{image!="", container_name!="POD"}[5m])
+    ) by (pod_name, container_name, namespace, slave)
+      /
+    sum(
+      container_spec_cpu_quota{image!="", container_name!="POD"} / container_spec_cpu_period{image!="", container_name!="POD"}
+    ) by (pod_name, container_name, namespace, slave)
+)
+```
+
+Percentage of memory request:
+
+```
+round(
+  100 *
+    sum(container_memory_working_set_bytes{image!="", container_name!="POD"}) by (container, pod, namespace, slave)
+      /
+    sum(kube_pod_container_resource_requests_memory_bytes{container_name!="POD"} > 0) by (container, pod, namespace, slave)
+)
+```
+
+
+Percentage of memory limit:
+
+```
+round(
+  100 *
+    sum(container_memory_working_set_bytes{image!="", container_name!="POD"}) by (container, pod_name, namespace, slave)
+      /
+    sum(container_spec_memory_limit_bytes{image!="", container_name!="POD"} > 0) by (container, pod_name, namespace, slave)
+)
+```
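The four percentage queries above share the same arithmetic: a usage value divided by a request or limit, multiplied by 100 and rounded. A minimal Python sketch of that arithmetic follows, under stated assumptions: the function names and sample numbers are hypothetical, `simple_rate` only approximates PromQL's `rate()` (no counter-reset handling or extrapolation), and `promql_round` mimics PromQL's `round()`, which resolves ties by rounding up.

```python
import math


def simple_rate(first, last):
    """Average per-second increase between two (timestamp_s, counter_value) samples.

    Simplified stand-in for PromQL rate(): ignores counter resets and extrapolation.
    """
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def cpu_limit_cores(quota_us, period_us):
    """CPU limit in cores, as in container_spec_cpu_quota / container_spec_cpu_period."""
    return quota_us / period_us


def promql_round(value):
    """Round to the nearest integer, with ties rounded up (PromQL round() behaviour)."""
    return math.floor(value + 0.5)


def percent_of(usage, reference):
    """Usage as a rounded percentage of a request or limit."""
    return promql_round(100 * usage / reference)


# Hypothetical samples: 75 CPU-seconds consumed over a 5 minute window.
usage_cores = simple_rate((0.0, 120.0), (300.0, 195.0))
limit_cores = cpu_limit_cores(50_000, 100_000)  # 50ms of quota per 100ms period
request_cores = 0.1  # hypothetical request of 100m

print(usage_cores)                             # cores in use
print(percent_of(usage_cores, limit_cores))    # percentage of limit
print(percent_of(usage_cores, request_cores))  # percentage of request, can exceed 100
```

With these made-up numbers the container uses 0.25 cores, which is 50% of its limit but 250% of its request. That gap is why a request percentage can climb past 100 while the limit percentage stays within range.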
diff --git a/prom-k8s-request-limits.md b/prom-k8s-request-limits.md
@@ -1,4 +1,25 @@
-I have actually switched our Grafana dashboards since my [last comment](https://github.com/google/cadvisor/issues/2026#issuecomment-486134079). Since some applications have a small request and large limit (to save money), then just showing a percentage of the request is sometimes not useful. It also can go over 100% which looks weird.
+# CPU: percentage of limit
+
+A lot of people land here when trying to find out how to calculate the CPU usage metric correctly in Prometheus, myself included! So I'll post what I ended up using, as I think it's still a little difficult to tie together all the snippets of info here and elsewhere.
+
+This is specific to k8s and containers that have CPU limits set.
+
+To show CPU usage as a percentage of the limit given to the container, this is the Prometheus query we used to create nice graphs in Grafana:
+
+```
+sum(rate(container_cpu_usage_seconds_total{name!~".*prometheus.*", image!="", container_name!="POD"}[5m])) by (pod_name, container_name) /
+sum(container_spec_cpu_quota{name!~".*prometheus.*", image!="", container_name!="POD"}/container_spec_cpu_period{name!~".*prometheus.*", image!="", container_name!="POD"}) by (pod_name, container_name)
+```
+
+It returns a number between 0 and 1 so format the left Y axis as `percent (0.0-1.0)` or multiply by 100 to get CPU usage percentage.
+
+Note that we added some filtering here to get rid of some noise: `name!~".*prometheus.*", image!="", container_name!="POD"`. The `name!~".*prometheus.*"` is just because we aren't interested in the CPU usage of all the prometheus exporters running in our k8s cluster.
+
+![Screen Shot 2019-04-24 at 10 58 31](https://user-images.githubusercontent.com/8859277/56646508-fe030000-667f-11e9-94ae-3c29b5a939ae.png)
+
+# CPU: show as cores with request/limit lines
+
+Since some applications have a small request and a large limit (to save money) or use an HPA, showing only a percentage of the limit is sometimes not useful.
 
 So what we do now is display the CPU usage in cores and then add a horizontal line for each of the request and limit. This shows more information and also shows the usage in the same metric that is used in k8s: CPU cores.
 

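To make the noise filter in the percentage-of-limit query concrete, here is a rough Python sketch of what the selector `name!~".*prometheus.*", image!="", container_name!="POD"` keeps. It is only an illustration: the `keep_series` helper and the sample label sets are hypothetical, and it relies on two PromQL behaviours, namely that regex matchers are fully anchored and that a missing label is treated as an empty string.

```python
import re


def keep_series(labels):
    """Return True if a series' labels pass the three matchers from the query."""
    name = labels.get("name", "")  # a missing label behaves like an empty string
    image = labels.get("image", "")
    container_name = labels.get("container_name", "")
    return (
        re.fullmatch(r".*prometheus.*", name) is None  # name!~".*prometheus.*"
        and image != ""                                 # image!=""
        and container_name != "POD"                     # container_name!="POD"
    )


# Hypothetical label sets, roughly as cAdvisor might expose them.
samples = [
    {"name": "k8s_app_web-1", "image": "nginx:1.17", "container_name": "app"},
    {"name": "k8s_POD_web-1", "image": "pause:3.1", "container_name": "POD"},
    {"name": "k8s_prometheus-node-exporter_x", "image": "exporter:1", "container_name": "exporter"},
    {"name": "", "image": "", "container_name": ""},  # series without an image label value
]

kept = [s["container_name"] for s in samples if keep_series(s)]
print(kept)
```

Only the `app` container survives here: the pause container is dropped by `container_name!="POD"`, the exporter by the `name` regex, and the series with an empty `image` by `image!=""`.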
diff --git a/prom-k8s-request-limits.md b/prom-k8s-request-limits.md
@@ -0,0 +1,30 @@
+I have actually switched our Grafana dashboards since my [last comment](https://github.com/google/cadvisor/issues/2026#issuecomment-486134079). Since some applications have a small request and large limit (to save money), then just showing a percentage of the request is sometimes not useful. It also can go over 100% which looks weird.
+
+So what we do now is display the CPU usage in cores and then add a horizontal line for each of the request and limit. This shows more information and also shows the usage in the same metric that is used in k8s: CPU cores.
+
+#### CPU usage
+
+Legend: `{{container_name}} in {{pod_name}}`
+Query: `sum(rate(container_cpu_usage_seconds_total{pod_name=~"deployment-name-[^-]*-[^-]*$", image!="", container_name!="POD"}[5m])) by (pod_name, container_name)`
+
+#### CPU limit
+
+Legend: `limit`
+Query: `sum(kube_pod_container_resource_limits_cpu_cores{pod=~"deployment-name-[^-]*-[^-]*$"}) by (pod)`
+
+#### CPU request
+
+Legend: `request`
+Query: `sum(kube_pod_container_resource_requests_cpu_cores{pod=~"deployment-name-[^-]*-[^-]*$"}) by (pod)`
+
+You will need to edit these 3 queries for your environment so that only pods from a single deployment are returned, e.g. replace `deployment-name`.
+
+The pod request/limit metrics come from [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics).
+
+We then add 2 series overrides to hide the request and limit in the tooltip and legend:
+
+![Screen Shot 2020-01-13 at 17 05 03](https://user-images.githubusercontent.com/8859277/72271285-e6611280-3626-11ea-87f5-b223697ddf5d.png)
+
+The result looks like this:
+
+![Screen Shot 2020-01-14 at 10 05 20](https://user-images.githubusercontent.com/8859277/72329714-8796aa00-36b5-11ea-88ee-919736cc1f1d.png)
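The `deployment-name-[^-]*-[^-]*$` pod matcher used in the three queries is easy to get wrong when substituting your own deployment name, so here is a small Python sketch for checking which pod names it would select. Assumptions: the helpers `pod_regex` and `matches` and all pod names are hypothetical, the deployment name contains only lowercase letters, digits and hyphens, and `re.fullmatch` stands in for PromQL's fully anchored regex matching.

```python
import re


def pod_regex(deployment):
    """Build the pod-name pattern used in the queries for one deployment.

    Deployment pods are named <deployment>-<replicaset-hash>-<pod-suffix>,
    so the pattern allows exactly two further hyphen-free segments.
    """
    return deployment + r"-[^-]*-[^-]*$"


def matches(deployment, pod):
    """True if the pod name would be selected by a pod=~ matcher with this pattern."""
    return re.fullmatch(pod_regex(deployment), pod) is not None


print(matches("web", "web-7d9c5b6f4-x2x9k"))         # pod of deployment "web"
print(matches("web", "web-canary-7d9c5b6f4-x2x9k"))  # pod of a different deployment, "web-canary"
print(matches("web", "my-web-7d9c5b6f4-x2x9k"))      # pod of a different deployment, "my-web"
```

Because `[^-]*` cannot cross a hyphen, a pattern built for `web` does not pick up pods from `web-canary`, and the anchoring keeps it from matching `my-web` pods.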