Revisions

  1. @max-rocket-internet max-rocket-internet revised this gist Apr 7, 2020. 1 changed file with 57 additions and 1 deletion.
    58 changes: 57 additions & 1 deletion prom-k8s-request-limits.md
    @@ -16,6 +16,7 @@ It returns a number between 0 and 1 so format the left Y axis as `percent (0.0-1
    Note that we added some filtering here to cut down on noise: `name!~".*prometheus.*", image!="", container_name!="POD"`. The `name!~".*prometheus.*"` matcher is there only because we aren't interested in the CPU usage of the Prometheus exporters running in our k8s cluster.

    ![Screen Shot 2019-04-24 at 10 58 31](https://user-images.githubusercontent.com/8859277/56646508-fe030000-667f-11e9-94ae-3c29b5a939ae.png)
    (Title on this image is wrong)

    # CPU: show as cores with request/limit lines

    @@ -48,4 +49,59 @@ We then add 2 series overrides to hide the request and limit in the tooltip and

    The result looks like this:

    ![Screen Shot 2020-01-14 at 10 05 20](https://user-images.githubusercontent.com/8859277/72329714-8796aa00-36b5-11ea-88ee-919736cc1f1d.png)

    # Queries to show memory and CPU as percentage of both request and limit

    Percentage of CPU request:

    ```
    round(
    100 *
    sum(
    rate(container_cpu_usage_seconds_total{container_name!="POD"}[5m])
    ) by (pod, container, namespace, slave)
    /
    sum(
    kube_pod_container_resource_requests_cpu_cores{container!="POD"}
    ) by (pod, container, namespace, slave)
    )
    ```

    Percentage of CPU limit:

    ```
    round(
    100 *
    sum(
    rate(container_cpu_usage_seconds_total{image!="", container_name!="POD"}[5m])
    ) by (pod_name, container_name, namespace, slave)
    /
    sum(
    container_spec_cpu_quota{image!="", container_name!="POD"} / container_spec_cpu_period{image!="", container_name!="POD"}
    ) by (pod_name, container_name, namespace, slave)
    )
    ```

    Percentage of memory request:

    ```
    round(
    100 *
    sum(container_memory_working_set_bytes{image!="", container_name!="POD"}) by (container, pod, namespace, slave)
    /
    sum(kube_pod_container_resource_requests_memory_bytes{container_name!="POD"} > 0) by (container, pod, namespace, slave)
    )
    ```


    Percentage of memory limit:

    ```
    round(
    100 *
    sum(container_memory_working_set_bytes{image!="", container_name!="POD"}) by (container, pod_name, namespace, slave)
    /
    sum(container_spec_memory_limit_bytes{image!="", container_name!="POD"} > 0) by (container, pod_name, namespace, slave)
    )
    ```
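All four percentage queries share the same arithmetic: usage divided by a request or limit, multiplied by 100 and rounded. The following Python sketch walks through that arithmetic on made-up numbers. The function names are hypothetical, and the `None` return is only an illustration of what the `> 0` filter in the two memory queries achieves (series reported as 0 are dropped instead of being divided by).

```python
import math
from typing import Optional

MIB = 2 ** 20  # bytes per mebibyte


def promql_round(value: float) -> float:
    """Round to the nearest integer with ties rounding up, like PromQL round()."""
    return float(math.floor(value + 0.5))


def percent_of(usage: float, reference: float) -> Optional[float]:
    """Return usage as a rounded percentage of a request or limit.

    Returns None when the reference is not positive, mirroring how the
    `> 0` filter removes such series before the division happens.
    """
    if reference <= 0:
        return None
    return promql_round(100 * usage / reference)


# Hypothetical container using 384 MiB with a 512 MiB memory limit.
memory_pct = percent_of(384 * MIB, 512 * MIB)

# Hypothetical container using 0.75 cores with a 0.5 core CPU request:
# a percentage of the request can exceed 100.
cpu_request_pct = percent_of(0.75, 0.5)

# A reference of 0 produces no result instead of a division by zero.
no_limit_pct = percent_of(384 * MIB, 0)

print(memory_pct, cpu_request_pct, no_limit_pct)
```

With these sample values the memory usage comes out at 75 percent of the limit and the CPU usage at 150 percent of the request, which is why the request-based panels can go above 100.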
  2. @max-rocket-internet max-rocket-internet revised this gist Jan 14, 2020. 1 changed file with 22 additions and 1 deletion.
    23 changes: 22 additions & 1 deletion prom-k8s-request-limits.md
    @@ -1,4 +1,25 @@
    I have actually switched our Grafana dashboards since my [last comment](https://github.com/google/cadvisor/issues/2026#issuecomment-486134079). Because some applications have a small request and a large limit (to save money), showing only a percentage of the request is sometimes not useful. It can also go over 100%, which looks odd.
    # CPU: percentage of limit

    A lot of people land here when trying to find out how to calculate the CPU usage metric correctly in Prometheus, myself included! So I'll post what I eventually ended up using, as I think it's still a little difficult to tie together all the snippets of info here and elsewhere.

    This is specific to k8s and containers that have CPU limits set.

    To show CPU usage as a percentage of the limit given to the container, this is the Prometheus query we used to create nice graphs in Grafana:

    ```
    sum(rate(container_cpu_usage_seconds_total{name!~".*prometheus.*", image!="", container_name!="POD"}[5m])) by (pod_name, container_name) /
    sum(container_spec_cpu_quota{name!~".*prometheus.*", image!="", container_name!="POD"}/container_spec_cpu_period{name!~".*prometheus.*", image!="", container_name!="POD"}) by (pod_name, container_name)
    ```

    It returns a number between 0 and 1 so format the left Y axis as `percent (0.0-1.0)` or multiply by 100 to get CPU usage percentage.
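As a worked example of the arithmetic in this query, here is a minimal Python sketch. It assumes the usual CFS semantics, where `container_spec_cpu_quota / container_spec_cpu_period` gives the CPU limit in cores. The function name and the sample numbers are made up for illustration and are not taken from our dashboards.

```python
def cpu_fraction_of_limit(usage_cores: float, quota_us: float, period_us: float) -> float:
    """Return CPU usage as a fraction of the container's CPU limit.

    usage_cores: per-second rate of container_cpu_usage_seconds_total,
                 i.e. CPU-seconds consumed per second (cores in use).
    quota_us, period_us: container_spec_cpu_quota and
                 container_spec_cpu_period, both in microseconds.
    """
    limit_cores = quota_us / period_us
    return usage_cores / limit_cores


# Hypothetical container with a 500m limit (50000us quota per 100000us
# period) that is currently using a quarter of a core.
fraction = cpu_fraction_of_limit(usage_cores=0.25, quota_us=50_000, period_us=100_000)

print(f"{fraction:.2f}")          # value for the "percent (0.0-1.0)" axis format
print(f"{fraction * 100:.0f}%")   # the same value after multiplying by 100
```

Here the container uses 0.25 of its 0.5 available cores, so the sketch prints `0.50` and `50%`.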

    Note that we added some filtering here to cut down on noise: `name!~".*prometheus.*", image!="", container_name!="POD"`. The `name!~".*prometheus.*"` matcher is there only because we aren't interested in the CPU usage of the Prometheus exporters running in our k8s cluster.

    ![Screen Shot 2019-04-24 at 10 58 31](https://user-images.githubusercontent.com/8859277/56646508-fe030000-667f-11e9-94ae-3c29b5a939ae.png)

    # CPU: show as cores with request/limit lines

    Because some applications have a small request and a large limit (to save money), or have an HPA, showing only a percentage of the limit is sometimes not useful.

    So what we do now is display the CPU usage in cores and add a horizontal line for each of the request and the limit. This shows more information and also shows the usage in the same unit that k8s uses: CPU cores.

  3. @max-rocket-internet max-rocket-internet created this gist Jan 14, 2020.
    30 changes: 30 additions & 0 deletions prom-k8s-request-limits.md
    @@ -0,0 +1,30 @@
    I have actually switched our Grafana dashboards since my [last comment](https://github.com/google/cadvisor/issues/2026#issuecomment-486134079). Because some applications have a small request and a large limit (to save money), showing only a percentage of the request is sometimes not useful. It can also go over 100%, which looks odd.

    So what we do now is display the CPU usage in cores and add a horizontal line for each of the request and the limit. This shows more information and also shows the usage in the same unit that k8s uses: CPU cores.

    #### CPU usage

    Legend: `{{container_name}} in {{pod_name}}`
    Query: `sum(rate(container_cpu_usage_seconds_total{pod_name=~"deployment-name-[^-]*-[^-]*$", image!="", container_name!="POD"}[5m])) by (pod_name, container_name)`

    #### CPU limit

    Legend: `limit`
    Query: `sum(kube_pod_container_resource_limits_cpu_cores{pod=~"deployment-name-[^-]*-[^-]*$"}) by (pod)`

    #### CPU request

    Legend: `request`
    Query: `sum(kube_pod_container_resource_requests_cpu_cores{pod=~"deployment-name-[^-]*-[^-]*$"}) by (pod)`

    You will need to edit these 3 queries for your environment so that only pods from a single deployment are returned, e.g. replace `deployment-name`.
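To check which pod names a pattern like this selects before putting it in a dashboard, a small Python sketch can help. It relies on the fact that Prometheus anchors label regexes at both ends, which `re.fullmatch` approximates here. The deployment name `web` and the pod names are hypothetical, and Python's `re` is only a stand-in for the RE2 syntax Prometheus uses (the two agree for a pattern this simple).

```python
import re

# Pattern from the queries, with the placeholder replaced by a
# hypothetical deployment called "web".
PATTERN = r"web-[^-]*-[^-]*$"


def matches_like_prometheus(pattern: str, value: str) -> bool:
    """Match the whole label value, as Prometheus does for =~ matchers."""
    return re.fullmatch(pattern, value) is not None


pods = [
    "web-5d4f8c7b9-x2k4q",         # <deployment>-<replicaset hash>-<pod suffix>
    "web-canary-5d4f8c7b9-x2k4q",  # a different deployment sharing the prefix
    "api-7c9d6b5f4-abcde",         # unrelated deployment
]

results = {pod: matches_like_prometheus(PATTERN, pod) for pod in pods}
for pod, matched in results.items():
    print(pod, matched)
```

Only the first pod matches: the two `[^-]*` groups allow exactly two dash-free segments after the deployment name, so a deployment such as `web-canary` is not picked up by accident.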

    The pod request/limit metrics come from [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics).

    We then add 2 series overrides to hide the request and limit in the tooltip and legend:

    ![Screen Shot 2020-01-13 at 17 05 03](https://user-images.githubusercontent.com/8859277/72271285-e6611280-3626-11ea-87f5-b223697ddf5d.png)

    The result looks like this:

    ![Screen Shot 2020-01-14 at 10 05 20](https://user-images.githubusercontent.com/8859277/72329714-8796aa00-36b5-11ea-88ee-919736cc1f1d.png)