rajivreddy created this gist on February 10, 2025.
# Resource Allocation and Scaling in Kubernetes

## Namespace limits

When you decide to segregate your cluster into namespaces, you should protect against resource misuse. You shouldn't allow your users to consume more resources than what you agreed in advance. Cluster administrators can set constraints to limit the number of objects or the amount of computing resources used in a project with quotas and limit ranges.

### Namespaces have LimitRange

Containers without limits can lead to resource contention with other containers and unoptimized consumption of computing resources. Kubernetes has two features for constraining resource utilisation: ResourceQuota and LimitRange.

With the LimitRange object, you can define default values for resource requests and limits for individual containers inside a namespace. Any container created inside that namespace without explicitly specified request and limit values is assigned the default values.

**Example Code:**

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-constraint
spec:
  limits:
  - default: # this section defines default limits
      cpu: 500m
    defaultRequest: # this section defines default requests
      cpu: 500m
    max: # max and min define the limit range
      cpu: "1"
    min:
      cpu: 100m
    type: Container
```

### Namespaces have ResourceQuotas

With ResourceQuotas, you can limit the total resource consumption of all Pods/containers inside a namespace. Defining a resource quota for a namespace limits the total amount of CPU, memory, or storage resources that can be consumed by all containers belonging to that namespace.
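As an illustration of the LimitRange behaviour described above, consider a hypothetical Pod created without any `resources` block in the same namespace as that LimitRange (the Pod name and image here are made up for the example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod # hypothetical Pod in the namespace holding the LimitRange
spec:
  containers:
  - name: app
    image: nginx # no resources block specified by the author
# At admission time, the container is mutated to effectively carry:
#   resources:
#     requests:
#       cpu: 500m  # from defaultRequest in the LimitRange
#     limits:
#       cpu: 500m  # from default in the LimitRange
```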
You can also set quotas for other Kubernetes objects, such as the number of Pods in a namespace. If you're worried that someone could exploit your cluster and create 20000 ConfigMaps, a ResourceQuota is how you prevent that.

**Example**

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: configmap-quota
  namespace: my-namespace # Change to your namespace
spec:
  hard:
    configmaps: "10" # Maximum of 10 ConfigMaps allowed in the namespace
```

For `cpu` and `memory` limits:

```yaml
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-high
  spec:
    hard:
      cpu: "1000"
      memory: "200Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["high"]
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-medium
  spec:
    hard:
      cpu: "10"
      memory: "20Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["medium"]
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-low
  spec:
    hard:
      cpu: "5"
      memory: "10Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["low"]
```

## How does this impact your AutoScaling

If you provision a Deployment without resource allocation, the `LimitRange` will assign the default values configured in the policy. This allows the HPA to calculate CPU and memory metrics.

**Best Practice:** Always define `requests` and `limits` to prevent a single pod from consuming excessive resources.

### How HPA Works Internally

To make HPA work you need metrics. For example, if you want to scale based on CPU and memory utilization, make sure the metrics server is already installed on the cluster.
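To build intuition for the ResourceQuota check described earlier, here is a minimal toy model in Python. This is not the real admission controller, only an illustration of the rule that current usage plus the new request must stay within the `hard` limits:

```python
def quota_allows(used, requested, hard):
    """Toy model of a ResourceQuota admission check: the request is
    admitted only if, for every resource named in `hard`, current usage
    plus the new request stays within the hard limit."""
    for resource, limit in hard.items():
        if used.get(resource, 0) + requested.get(resource, 0) > limit:
            return False
    return True

# A namespace with hard: configmaps: "10", as in the quota above
hard = {"configmaps": 10}

print(quota_allows({"configmaps": 9}, {"configmaps": 1}, hard))   # True  -> 10th ConfigMap admitted
print(quota_allows({"configmaps": 10}, {"configmaps": 1}, hard))  # False -> 11th ConfigMap rejected
```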
By default it checks CPU and memory metrics:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

HPA determines the desired number of replicas using this formula:

```shell
desiredReplicas = ceil[(currentMetricValue / targetMetricValue) * currentReplicas]
```

In the above example, `averageUtilization` is set to `50%`; assume the current CPU utilization is `80%`. Based on the formula:

```log
desiredReplicas = ceil[(80 / 50) * 2] = ceil[3.2] = 4
```

The raw value `3.2` is rounded up (HPA takes the ceiling), so it scales to `4` replicas.

#### Scaling Behaviors & Considerations

There are other ways to configure the velocity of scaling. You can add a `behavior` section to the HPA manifest to get different scaling velocities, for example:

```yaml
behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 900
      periodSeconds: 60
  scaleDown:
    policies:
    - type: Pods
      value: 1
      periodSeconds: 600 # i.e., scale down one pod every 10 min
```

The `900` implies that `9 times` the current number of pods can be added per period, effectively making the number of replicas 10 times the current size. All other parameters are left unspecified (default values are used).

If the application is started with 1 pod, it will scale up with the following number of pods:

```log
1 -> 10 -> 100 -> 1000
```

The scale-down, however, will be gradual: 1 pod every 10 minutes.

**stabilizationWindowSeconds** - this value indicates the amount of time the HPA controller should consider previous recommendations, to prevent flapping of the number of replicas.
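The replica calculation above can be sketched in a few lines of Python. The function names mirror the HPA fields but the code is only an illustration; note that the HPA rounds the raw value up (ceiling) and clamps the result to the `minReplicas`/`maxReplicas` range:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Illustrative version of the HPA formula:
    desiredReplicas = ceil[(currentMetricValue / targetMetricValue) * currentReplicas],
    clamped to the [minReplicas, maxReplicas] range."""
    raw = (current_metric / target_metric) * current_replicas
    return max(min_replicas, min(max_replicas, math.ceil(raw)))

# 2 replicas at 80% utilization against a 50% target -> ceil(3.2) = 4
print(desired_replicas(2, 80, 50, 2, 10))   # 4
# A large spike is capped by maxReplicas
print(desired_replicas(2, 500, 50, 2, 10))  # 10
```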
This configuration lets you avoid false-positive signals for scaling up (in scale-up mode) and avoids scaling pods down too early when late load spikes are expected (in scale-down mode).

#### Using Custom metrics

HPA supports custom and external metrics as well (data sources can be Prometheus, Datadog, AWS CloudWatch, etc.). If you have non-functional requirements, you can use these metrics to scale the application, for example `http_requests_per_second` (this metric can be exposed by your ingress).

**Example:**

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: External
    external:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 100
```

### How VPA works

The Vertical Pod Autoscaler (VPA) adjusts the CPU and memory `requests` (and, proportionally, the `limits`) of containers based on observed usage, instead of changing the replica count. It can run in `Off` mode (recommendations only), `Initial` mode (set resources at pod creation), or `Auto`/`Recreate` mode (evict and recreate pods with updated resources). Avoid combining VPA in `Auto` mode with an HPA that scales on the same CPU/memory metrics, as the two controllers will work against each other.

## Best Practices for HPA

1. Use the right metrics for scaling your application
   1. Default metrics: CPU, memory
   2. Custom metrics: HTTP request rate (RPS), failed requests per second
   3. External metrics: API call rate
2. Avoid overly aggressive scaling: you can use `stabilization windows` to prevent frequent scaling (flapping).
3. Combine HPA with readiness & liveness probes: new pods take time to reach the Ready state, so make sure your liveness and readiness probes are configured correctly.
4. Set min and max replicas properly.
5. Scale using multiple metrics.

You can use tools like keda.sh for event-driven autoscaling: https://keda.sh/
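As a sketch of the KEDA approach, a `ScaledObject` scaling the same hypothetical `web-app` Deployment on a Prometheus query might look like the following. The Prometheus address and query are assumptions for illustration, not part of the original example:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-app-scaler # hypothetical name
spec:
  scaleTargetRef:
    name: web-app # the Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 15
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090 # assumed Prometheus endpoint
      query: sum(rate(http_requests_total[2m]))        # assumed request-rate query
      threshold: "100"
```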