# Resource Allocation and Scaling in Kubernetes

    ## Namespace limits

When you segregate your cluster into namespaces, you should protect against resource misuse.

You shouldn't allow users to consume more resources than what was agreed in advance.

Cluster administrators can set constraints that limit the number of objects or the amount of computing resources used in a project, using quotas and limit ranges.

    ### Namespaces have LimitRange

    Containers without limits can lead to resource contention with other containers and unoptimized consumption of computing resources.

Kubernetes has two features for constraining resource utilization: ResourceQuota and LimitRange.

    With the LimitRange object, you can define default values for resource requests and limits for individual containers inside namespaces.

    Any container created inside that namespace, without request and limit values explicitly specified, is assigned the default values.

    **Example Code:**

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-constraint
spec:
  limits:
  - default: # this section defines default limits
      cpu: 500m
    defaultRequest: # this section defines default requests
      cpu: 500m
    max: # max and min define the limit range
      cpu: "1"
    min:
      cpu: 100m
    type: Container
```
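
To see the defaults in action, here is a minimal sketch (the pod name and image are illustrative): a Pod created in this namespace without any `resources` section gets the default request and limit of `500m` CPU injected at admission time.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod # illustrative name
spec:
  containers:
  - name: app
    image: nginx # illustrative image
    # No resources specified: the LimitRange defaults
    # (requests.cpu: 500m, limits.cpu: 500m) are applied automatically.
```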
### Namespaces have ResourceQuotas

With ResourceQuotas, you can limit the total resource consumption of all Pods/containers inside a namespace.

Defining a resource quota for a namespace limits the total amount of CPU, memory, or storage resources that can be consumed by all containers belonging to that namespace.

You can also set quotas for other Kubernetes objects, such as the number of Pods in the current namespace.

If you're worried that someone could exploit your cluster and create 20,000 ConfigMaps, a ResourceQuota is how you prevent that.

**Example:**
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: configmap-quota
  namespace: my-namespace # Change to your namespace
spec:
  hard:
    configmaps: "10" # Maximum of 10 ConfigMaps allowed in the namespace
```
For `cpu` and `memory` limits:

```yaml
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-high
  spec:
    hard:
      cpu: "1000"
      memory: "200Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["high"]
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-medium
  spec:
    hard:
      cpu: "10"
      memory: "20Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["medium"]
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-low
  spec:
    hard:
      cpu: "5"
      memory: "10Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["low"]
```

## How does this impact your autoscaling?

If you provision a Deployment without any resource allocation, the `LimitRange` assigns the default values configured in the policy. This allows the HPA to calculate CPU and memory utilization metrics.

    **Best Practice:** Always define `requests` and `limits` to prevent a single pod from consuming excessive resources.
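
As a minimal sketch (the values are illustrative, not a recommendation), an explicit `resources` section inside a container spec looks like this:

```yaml
# Illustrative values only; tune per workload
resources:
  requests: # what the scheduler reserves; HPA utilization is computed against these
    cpu: 250m
    memory: 256Mi
  limits: # hard caps enforced at runtime
    cpu: 500m
    memory: 512Mi
```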

    ### How HPA Works Internally

For the HPA to work, you need metrics. For example, if you want to scale based on CPU and memory utilization, make sure the metrics server is already installed in the Kubernetes cluster.
By default, the HPA checks CPU and memory metrics.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

HPA determines the desired number of replicas using this formula (the result is rounded up to the nearest integer):

```shell
desiredReplicas = ceil[(currentMetricValue / targetMetricValue) * currentReplicas]
```

In the above example, `averageUtilization` is set to `50%`. Assume the current CPU utilization is `80%`; plugging into the formula:

```log
desiredReplicas = ceil[(80 / 50) * 2] = ceil(3.2) = 4
```

Since the HPA rounds up, `3.2` becomes `4` replicas.

    #### Scaling Behaviors & Considerations

There are other ways to configure the velocity of scaling. You can add a `behavior` section to the HPA manifest to use different scaling velocities, for example:

```yaml
behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 900
      periodSeconds: 60
  scaleDown:
    policies:
    - type: Pods
      value: 1
      periodSeconds: 600 # i.e., scale down one pod every 10 min
```

The `900` means that `9 times` the current number of pods can be added, effectively making the number of replicas 10 times the current size. All other parameters are left unspecified, so default values are used.
If the application is started with 1 pod, it will scale up through the following numbers of pods:

    ```log
    1 -> 10 -> 100 -> 1000
    ```

The scale-down, however, is gradual: one pod every 10 minutes.

**stabilizationWindowSeconds**: this value indicates how long the HPA controller should consider previous recommendations, to prevent flapping of the replica count.
This configuration lets you avoid false-positive signals when scaling up, and avoid scaling pods down too early when late load spikes are still expected.
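
As a minimal sketch, this is how a stabilization window fits into the `behavior` section (300 and 0 seconds happen to be the documented defaults for scale-down and scale-up, respectively):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300 # act on the highest recommendation seen in the last 5 min
  scaleUp:
    stabilizationWindowSeconds: 0   # scale up immediately on a new recommendation
```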

#### Using Custom Metrics

HPA supports custom and external metrics as well (data sources can be Prometheus, Datadog, AWS CloudWatch, etc.). If you have non-functional requirements, you can use these metrics to scale the application, for example `http_requests_per_second` (this metric may be available from your ingress).
    **Example:**

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: External
    external:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 100
```
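
Note that custom and external metrics are not available out of the box: they require a metrics adapter (for example, prometheus-adapter or a vendor's cluster agent) that serves the `custom.metrics.k8s.io` or `external.metrics.k8s.io` API.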

    ### How VPA works

The Vertical Pod Autoscaler (VPA) adjusts the resource `requests` (and proportionally the `limits`) of containers instead of changing the replica count. Its Recommender watches current and historical usage, the Updater evicts pods whose requests drift too far from the recommendation, and the Admission Controller injects the recommended values when the pods are recreated.
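
A minimal sketch of a VPA object (the target name is illustrative, and this assumes the VPA components are installed in the cluster):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-deployment-vpa # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment # illustrative target
  updatePolicy:
    updateMode: "Auto" # "Off" only records recommendations; "Initial" applies them at pod creation
```

Avoid running VPA in `Auto` mode together with an HPA that scales on CPU/memory for the same workload, as the two controllers will fight over the same signal.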

## Best Practices for HPA

1. Use the Right Metrics for Scaling your Application
   1. Default metrics: CPU, memory
   2. Custom metrics: HTTP request rate (RPS), failed requests per second
   3. External metrics: API call rate

2. Avoid Overly Aggressive Scaling

   You can use `stabilization windows` to prevent frequent scaling (flapping).

3. Combine HPA with Readiness & Liveness Probes

   New pods take time to reach the Ready state, so make sure your liveness and readiness probes are configured correctly.

4. Set Min and Max Replicas Properly

5. Scale Using Multiple Metrics

   You can also use tools like keda.sh (https://keda.sh/) for event-driven autoscaling; see the sketch after this list.
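
A minimal sketch of a KEDA `ScaledObject` (the deployment name, Prometheus address, and query are illustrative assumptions, and this requires KEDA to be installed in the cluster):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-app-scaler # illustrative name
spec:
  scaleTargetRef:
    name: web-app # illustrative Deployment
  minReplicaCount: 2
  maxReplicaCount: 15
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090 # illustrative address
      query: sum(rate(http_requests_total[2m]))        # illustrative query
      threshold: "100" # scale so that each replica handles ~100 req/s
```

Under the hood, KEDA creates and manages an HPA for the target workload, so the scaling mechanics described above still apply.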