# Resource Allocation and Scaling in Kubernetes

    ## Namespace limits

When you segregate your cluster into namespaces, you should protect against resource misuse.

You shouldn't allow users to consume more resources than what was agreed in advance.

Cluster administrators can set constraints that limit the number of objects or the amount of computing resources used in a project, using quotas and limit ranges.

    ### Namespaces have LimitRange

    Containers without limits can lead to resource contention with other containers and unoptimized consumption of computing resources.

Kubernetes has two features for constraining resource utilization: ResourceQuota and LimitRange.

    With the LimitRange object, you can define default values for resource requests and limits for individual containers inside namespaces.

    Any container created inside that namespace, without request and limit values explicitly specified, is assigned the default values.

    **Example Code:**

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-constraint
spec:
  limits:
  - default: # this section defines default limits
      cpu: 500m
    defaultRequest: # this section defines default requests
      cpu: 500m
    max: # max and min define the limit range
      cpu: "1"
    min:
      cpu: 100m
    type: Container
```
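
To see the defaults in action, here is a minimal sketch (the pod name and image are illustrative): a Pod created in this namespace without any `resources` section gets the default request and limit of `500m` CPU injected at admission time.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod # illustrative name
spec:
  containers:
  - name: app
    image: nginx # illustrative image
    # No resources specified: the LimitRange defaults
    # (requests.cpu: 500m, limits.cpu: 500m) are applied automatically.
```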
### Namespaces have ResourceQuotas

With ResourceQuotas, you can limit the total resource consumption of all Pods/containers inside a namespace.

Defining a resource quota for a namespace limits the total amount of CPU, memory, or storage resources that can be consumed by all containers belonging to that namespace.

You can also set quotas for other Kubernetes objects, such as the number of Pods in the current namespace.

If you're worried that someone could exploit your cluster and create 20,000 ConfigMaps, a ResourceQuota is how you prevent that.

**Example:**
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: configmap-quota
  namespace: my-namespace # Change to your namespace
spec:
  hard:
    configmaps: "10" # Maximum of 10 ConfigMaps allowed in the namespace
```
For `cpu` and `memory` limits:

```yaml
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-high
  spec:
    hard:
      cpu: "1000"
      memory: "200Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["high"]
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-medium
  spec:
    hard:
      cpu: "10"
      memory: "20Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["medium"]
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: pods-low
  spec:
    hard:
      cpu: "5"
      memory: "10Gi"
      pods: "10"
    scopeSelector:
      matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["low"]
```

## How does this impact your autoscaling?

If you provision a Deployment without any resource allocation, the `LimitRange` assigns the default values configured in the policy. This allows the HPA to calculate CPU and memory utilization metrics.

    **Best Practice:** Always define `requests` and `limits` to prevent a single pod from consuming excessive resources.
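
As a minimal sketch (the values are illustrative, not a recommendation), an explicit `resources` section inside a container spec looks like this:

```yaml
# Illustrative values only; tune per workload
resources:
  requests: # what the scheduler reserves; HPA utilization is computed against these
    cpu: 250m
    memory: 256Mi
  limits: # hard caps enforced at runtime
    cpu: 500m
    memory: 512Mi
```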

    ### How HPA Works Internally

For the HPA to work, you need metrics. For example, if you want to scale based on CPU and memory utilization, make sure the metrics server is already installed in the Kubernetes cluster.
By default, the HPA checks CPU and memory metrics.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

HPA determines the desired number of replicas using this formula (the result is rounded up to the nearest integer):

```shell
desiredReplicas = ceil[(currentMetricValue / targetMetricValue) * currentReplicas]
```

In the above example, `averageUtilization` is set to `50%`. Assume the current CPU utilization is `80%`; plugging into the formula:

```log
desiredReplicas = ceil[(80 / 50) * 2] = ceil(3.2) = 4
```

Since the HPA rounds up, `3.2` becomes `4` replicas.

    #### Scaling Behaviors & Considerations

There are other ways to configure the velocity of scaling. You can add a `behavior` section to the HPA manifest to use different scaling velocities, for example:

```yaml
behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 900
      periodSeconds: 60
  scaleDown:
    policies:
    - type: Pods
      value: 1
      periodSeconds: 600 # i.e., scale down one pod every 10 min
```

The `900` means that `9 times` the current number of pods can be added, effectively making the number of replicas 10 times the current size. All other parameters are left unspecified, so default values are used.
If the application is started with 1 pod, it will scale up through the following numbers of pods:

    ```log
    1 -> 10 -> 100 -> 1000
    ```

The scale-down, however, is gradual: one pod every 10 minutes.

**stabilizationWindowSeconds**: this value indicates how long the HPA controller should consider previous recommendations, to prevent flapping of the replica count.
This configuration lets you avoid false-positive signals when scaling up, and avoid scaling pods down too early when late load spikes are still expected.
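
As a minimal sketch, this is how a stabilization window fits into the `behavior` section (300 and 0 seconds happen to be the documented defaults for scale-down and scale-up, respectively):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300 # act on the highest recommendation seen in the last 5 min
  scaleUp:
    stabilizationWindowSeconds: 0   # scale up immediately on a new recommendation
```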

#### Using Custom Metrics

HPA supports custom and external metrics as well (data sources can be Prometheus, Datadog, AWS CloudWatch, etc.). If you have non-functional requirements, you can use these metrics to scale the application, for example `http_requests_per_second` (this metric may be available from your ingress).
    **Example:**

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: External
    external:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 100
```
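
Note that custom and external metrics are not available out of the box: they require a metrics adapter (for example, prometheus-adapter or a vendor's cluster agent) that serves the `custom.metrics.k8s.io` or `external.metrics.k8s.io` API.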

    ### How VPA works

The Vertical Pod Autoscaler (VPA) adjusts the resource `requests` (and proportionally the `limits`) of containers instead of changing the replica count. Its Recommender watches current and historical usage, the Updater evicts pods whose requests drift too far from the recommendation, and the Admission Controller injects the recommended values when the pods are recreated.
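
A minimal sketch of a VPA object (the target name is illustrative, and this assumes the VPA components are installed in the cluster):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-deployment-vpa # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment # illustrative target
  updatePolicy:
    updateMode: "Auto" # "Off" only records recommendations; "Initial" applies them at pod creation
```

Avoid running VPA in `Auto` mode together with an HPA that scales on CPU/memory for the same workload, as the two controllers will fight over the same signal.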

## Best Practices for HPA

1. Use the Right Metrics for Scaling your Application
   1. Default metrics: CPU, memory
   2. Custom metrics: HTTP request rate (RPS), failed requests per second
   3. External metrics: API call rate

2. Avoid Overly Aggressive Scaling

   You can use `stabilization windows` to prevent frequent scaling (flapping).

3. Combine HPA with Readiness & Liveness Probes

   New pods take time to reach the Ready state, so make sure your liveness and readiness probes are configured correctly.

4. Set Min and Max Replicas Properly

5. Scale Using Multiple Metrics

   You can also use tools like keda.sh (https://keda.sh/) for event-driven autoscaling; see the sketch after this list.
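
A minimal sketch of a KEDA `ScaledObject` (the deployment name, Prometheus address, and query are illustrative assumptions, and this requires KEDA to be installed in the cluster):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-app-scaler # illustrative name
spec:
  scaleTargetRef:
    name: web-app # illustrative Deployment
  minReplicaCount: 2
  maxReplicaCount: 15
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090 # illustrative address
      query: sum(rate(http_requests_total[2m]))        # illustrative query
      threshold: "100" # scale so that each replica handles ~100 req/s
```

Under the hood, KEDA creates and manages an HPA for the target workload, so the scaling mechanics described above still apply.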