Punjab Cluster Resizing
The Punjab k8s cluster nodes frequently (about 80% of the time) run at 70 - 90% memory utilisation on the existing "m4.large" instance type, given the current load and the number of services deployed. We therefore need to resize and optimise the nodes: move to an instance type that offers more memory while retaining enough CPU to handle the peak load.
To achieve this optimisation, we recommend the following:
- Move from the "m4.large" instance type to "r5.large", which doubles the memory per node at almost the same price.
- Reduce the node count from 9 instances to 7, which would save about 2k/month (refer to the cost estimation).
- Move from the KOPS cluster to EKS. Eliminating the bastion and master nodes saves about 8k, which roughly offsets the EKS pricing.
- Move from "On-demand" to "Reserved Instance" pricing, which can save at least 30 - 40% of the overall cost annually.
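To make the reserved-instance estimate concrete, the following is a rough sketch of the arithmetic. It assumes the proposed fleet of 7 nodes at the on-demand rate of $0.13 per hour listed in the instance comparison in this document, and it applies the 30 - 40% range quoted above as-is. Actual reserved pricing depends on the term and payment option chosen, so these figures are indicative only. The function name `reserved_savings_range` is hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def reserved_savings_range(usd_per_hour, nodes, low=0.30, high=0.40):
    """Return (annual on-demand cost, low savings, high savings) in USD.

    `low` and `high` are the savings fractions assumed in this proposal,
    not figures taken from an AWS price list.
    """
    annual_on_demand = usd_per_hour * nodes * HOURS_PER_YEAR
    return annual_on_demand, annual_on_demand * low, annual_on_demand * high


# Assumption: 7 proposed nodes at $0.13 per hour on-demand.
annual, save_low, save_high = reserved_savings_range(0.13, 7)
print(f"Annual on-demand cost: ${annual:,.2f}")
print(f"Estimated savings:     ${save_low:,.2f} to ${save_high:,.2f}")
```

Under these assumptions the worker nodes cost roughly $7,970 per year on-demand, so the estimated saving falls between about $2,390 and $3,190 per year.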
Benefits of moving to EKS (available in Mumbai):
- IAM policy-based cluster access, which is more secure.
- Automated K8S upgrade management.
Current Cluster Size/Structure:
Current Cluster nodes utilisation:
Relative CPU & Memory consumption by the various PODs
- The PODs snapshot shows that es-cluster consumes almost 3 times as much memory as the other PODs.
- The jaeger and cloud-push-metrics PODs use the least memory.
- For the eGov PODs, we need to evaluate whether the memory consumption is really as expected.
- Fluentd & Logrotate can be replaced by better alternatives (Fluentbit & Kafka; Nithin has finished the POC).
Recommendation 1:
PunjabCluster | Instance | vCPU | ECU | Memory (GiB) | Storage | Price (per hour) | Total Nodes | Total vCPUs | Total Memory (GiB)
Existing | m4.large | 2 | 6.5 | 8 | EBS only | $0.105 | 9 | 18 | 72
Proposing | r5.large | 2 | 10 | 16 | EBS only | $0.13 | 7 | 14 | 112
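The totals in this comparison can be reproduced with a short calculation. The sketch below takes the per-node figures from the table and assumes 730 hours per month (a common averaging convention, not a figure from this document). The `Fleet` class is a hypothetical helper introduced only for illustration.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


@dataclass
class Fleet:
    vcpu_per_node: int
    memory_gib_per_node: int
    usd_per_hour: float
    nodes: int

    @property
    def total_vcpu(self) -> int:
        return self.vcpu_per_node * self.nodes

    @property
    def total_memory_gib(self) -> int:
        return self.memory_gib_per_node * self.nodes

    @property
    def monthly_cost_usd(self) -> float:
        return self.usd_per_hour * self.nodes * HOURS_PER_MONTH


existing = Fleet(vcpu_per_node=2, memory_gib_per_node=8, usd_per_hour=0.105, nodes=9)
proposed = Fleet(vcpu_per_node=2, memory_gib_per_node=16, usd_per_hour=0.13, nodes=7)

for name, fleet in (("existing", existing), ("proposed", proposed)):
    print(f"{name}: {fleet.total_vcpu} vCPU, {fleet.total_memory_gib} GiB, "
          f"${fleet.monthly_cost_usd:,.2f}/month on-demand")
```

The proposed layout raises total memory from 72 GiB to 112 GiB while total vCPUs drop from 18 to 14, so CPU headroom at peak load is worth confirming before the switch.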
PunjabCluster | Instance Type | No. of Instances | Cost per month
Existing KOPS | Master Nodes + Bastion | 4 | ₹ 7977.00
Proposing | EKS | 1 | ₹ 9897.00
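As a quick check on how closely the two control-plane options balance, the monthly difference can be computed directly from the figures above. This is a minimal sketch; the variable names are illustrative only.

```python
# Monthly control-plane costs in INR, taken from the table above.
kops_control_plane_inr = 7977.00  # 3 master nodes + bastion (4 instances)
eks_control_plane_inr = 9897.00   # managed EKS control plane

# Positive value means EKS costs more per month than the KOPS control plane.
monthly_delta_inr = eks_control_plane_inr - kops_control_plane_inr
print(f"EKS costs ₹ {monthly_delta_inr:.2f} more per month")
```

On these numbers EKS is ₹ 1920.00 per month more expensive than the current master and bastion nodes, so the move roughly balances rather than produces a net saving on its own.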