Preparation

Scale Down Nginx-Ingress Controller
- Scale down the nginx-ingress controller to zero replicas.
```
kubectl scale deployment nginx-ingress-controller --replicas=0 -n <namespace>
```
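If you want this step to wait until the controller pods have actually terminated, a minimal sketch like the following could be used. It reuses the deployment name from the command above. The function name scale_down_ingress is hypothetical, and the pod label selector you pass in is an assumption that must match your installation (check it with kubectl get pods -n <namespace> --show-labels).

```shell
# Hypothetical helper (not part of DIGIT-DevOps): scale the ingress controller
# to zero, then poll until no pods matching the given label selector remain.
# Usage: scale_down_ingress <namespace> <pod-label-selector>
scale_down_ingress() {
  local ns="$1" selector="$2" pods
  kubectl scale deployment nginx-ingress-controller --replicas=0 -n "$ns" || return 1
  while :; do
    # "-o name" prints one line per pod and nothing once all pods are gone;
    # a kubectl error aborts instead of being mistaken for "no pods left"
    pods=$(kubectl get pods -n "$ns" -l "$selector" -o name) || return 1
    [ -z "$pods" ] && break
    echo "waiting for ingress pods to terminate..."
    sleep 5
  done
  echo "ingress controller scaled down in namespace $ns"
}
```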

Monitor Kafka Lags

Monitor Kafka consumer lag until it reaches zero, so that no messages are left pending.

```
kafka-consumer-groups --bootstrap-server <kafka-broker> --describe --group <consumer-group>
```

If the latest monitoring setup is available, you can also use Kafka-UI to monitor consumer lag.

```
kubectl port-forward svc/kafka-ui 8080:8080 -n <namespace>
# visit http://localhost:8080/kafka-ui to access the dashboard
```
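Rather than re-running the describe command by hand, the lag check can be polled. The sketch below assumes the standard kafka-consumer-groups --describe table with a LAG column; total_lag and wait_for_zero_lag are hypothetical helper names, and the 30-second default interval is arbitrary.

```shell
# Hypothetical helper: read `kafka-consumer-groups --describe` output on stdin
# and print the sum of positive LAG values (messages not yet consumed).
# Rows whose LAG is "-" (no committed offset) are ignored. Exits non-zero
# if no LAG header was found, so an error message is not read as zero lag.
total_lag() {
  awk '
    # Remember which column holds LAG each time a header row appears
    { for (i = 1; i <= NF; i++) if ($i == "LAG") { col = i; next } }
    col && $col ~ /^[0-9]+$/ { sum += $col }
    END { if (col) print sum + 0; else exit 1 }
  '
}

# Hypothetical helper: poll one consumer group until its pending lag is zero.
# Usage: wait_for_zero_lag <kafka-broker> <consumer-group> [interval-seconds]
wait_for_zero_lag() {
  local broker="$1" group="$2" interval="${3:-30}" out lag
  while :; do
    # Abort on a CLI error rather than treating empty output as "zero lag"
    out=$(kafka-consumer-groups --bootstrap-server "$broker" --describe --group "$group") || return 1
    lag=$(printf '%s\n' "$out" | total_lag) || { echo "no LAG column in describe output" >&2; return 1; }
    echo "group=$group pending_lag=$lag"
    [ "$lag" -eq 0 ] && return 0
    sleep "$interval"
  done
}
```

Usage: wait_for_zero_lag <kafka-broker> <consumer-group>, repeated for each consumer group you need to drain.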

Scale Down Cluster Worker Nodes
- Scale down the worker nodes in the AWS Auto Scaling Groups to prevent any further activity on the cluster.
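One way to do this from the AWS CLI is sketched below. It records the group's current sizes in a local file so they can be restored after the upgrade, then sets the minimum, maximum and desired capacity to zero. The function name scale_down_asg and the default state-file name <asg-name>.sizes are hypothetical; the aws autoscaling subcommands are standard AWS CLI calls.

```shell
# Hypothetical helper: save an Auto Scaling Group's sizes, then scale it to zero.
# Usage: scale_down_asg <asg-name> [state-file]
scale_down_asg() {
  local asg="$1" state="${2:-$1.sizes}" sizes
  # Prints "MinSize<TAB>MaxSize<TAB>DesiredCapacity"; non-numeric output
  # (e.g. "None") means the group was not found
  sizes=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg" \
    --query 'AutoScalingGroups[0].[MinSize,MaxSize,DesiredCapacity]' \
    --output text) || return 1
  case "$sizes" in
    *[0-9]*) printf '%s\n' "$sizes" > "$state" ;;
    *) echo "Auto Scaling Group $asg not found" >&2; return 1 ;;
  esac
  aws autoscaling update-auto-scaling-group --auto-scaling-group-name "$asg" \
    --min-size 0 --max-size 0 --desired-capacity 0
}
```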
Backup EBS Volumes
- Take snapshots of the EBS volumes backing the Persistent Volumes (PVs) of the following components:
  - Kafka
  - Kafka - Infra (if available)
  - Zookeeper (if available)
  - Elasticsearch (data & master)
  - Elasticsearch - Infra (data & master) (if available)
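The EBS volume behind each PV can be looked up and snapshotted from the command line. The sketch below assumes the PVs are backed either by the in-tree awsElasticBlockStore plugin (volumeID in the form aws://<zone>/vol-...) or by the EBS CSI driver (csi.volumeHandle); pv_volume_id and snapshot_pv are hypothetical helper names.

```shell
# Hypothetical helper: print the EBS volume ID behind a PersistentVolume.
# Reads both the in-tree field and the CSI field; normally only one is set.
pv_volume_id() {
  local id
  id=$(kubectl get pv "$1" \
    -o jsonpath='{.spec.awsElasticBlockStore.volumeID}{.spec.csi.volumeHandle}') || return 1
  # Strip an "aws://<zone>/" prefix if present, leaving "vol-..."
  printf '%s\n' "${id##*/}"
}

# Hypothetical helper: snapshot the EBS volume behind a PV and print the snapshot ID.
snapshot_pv() {
  local pv="$1" vol
  vol=$(pv_volume_id "$pv") || return 1
  [ -n "$vol" ] || { echo "no EBS volume found for PV $pv" >&2; return 1; }
  aws ec2 create-snapshot --volume-id "$vol" \
    --description "Pre-EKS-upgrade backup of PV $pv" \
    --query SnapshotId --output text
}
```

Usage: list the PVs and their claims with kubectl get pv, run snapshot_pv <pv-name> for each volume listed above, and optionally block until a snapshot finishes with aws ec2 wait snapshot-completed --snapshot-ids <snapshot-id>.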

Upgrading

Clone the DIGIT-DevOps repository.

```
git clone https://github.com/egovernments/DIGIT-DevOps.git
```

Navigate to the cloned repository and check out the release-1.28-kubernetes branch.
```
cd DIGIT-DevOps 
git checkout release-1.28-kubernetes
```
Check that the correct AWS credentials are configured by running "aws configure list".
If they are not, run "aws configure" to configure the AWS CLI.
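To confirm the CLI is pointed at the right account before running Terraform, a small check such as the following could help. check_aws_account is a hypothetical name and the expected 12-digit account ID is a value you supply; aws sts get-caller-identity is a standard AWS CLI call.

```shell
# Hypothetical check: fail unless the AWS CLI is authenticated to the expected account.
# Usage: check_aws_account <expected-account-id>
check_aws_account() {
  local expected="$1" actual
  actual=$(aws sts get-caller-identity --query Account --output text) || return 1
  if [ "$actual" != "$expected" ]; then
    echo "AWS CLI is using account $actual, expected $expected" >&2
    return 1
  fi
  echo "AWS CLI authenticated to account $actual"
}
```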
Open the input.yaml file and fill in the inputs according to the regex given in the comments.
Go to infra-as-code/terraform/sample-aws and run the init.go script, which populates the Terraform files with the values from input.yaml.
```
cd infra-as-code/terraform/sample-aws 
go run ../scripts/init.go
```

Update the EKS version in the "kubernetes_version" variable in variables.tf, and update ami_id under "worker_groups" in the "eks" module in main.tf.
Note: the ami_id for the target EKS version can be fetched with the following command:

```
aws ssm get-parameters --region <region> --names /aws/service/eks/optimized-ami/<eks_version>/amazon-linux-2/recommended/image_id
```
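Because Amazon EKS upgrades the control plane one minor version at a time, it may help to check the running version before editing kubernetes_version, and to fetch the AMI ID as plain text rather than JSON. The helper names below are hypothetical; the --query expressions use the AWS CLI's standard JMESPath syntax, and versions are assumed to be in the "1.x" form EKS reports.

```shell
# Hypothetical helper: print the running control-plane version, e.g. "1.27"
eks_cluster_version() {
  aws eks describe-cluster --region "$1" --name "$2" \
    --query cluster.version --output text
}

# Hypothetical helper: succeed only if TARGET is exactly one minor version
# above CURRENT. Both arguments are "major.minor" strings such as "1.27".
is_next_minor() {
  local cur="$1" tgt="$2"
  [ "${cur%%.*}" = "${tgt%%.*}" ] && [ "${tgt#*.}" -eq $(( ${cur#*.} + 1 )) ]
}

# Hypothetical helper: print the recommended EKS-optimized Amazon Linux 2 AMI ID
# for a region and Kubernetes version, as plain text instead of JSON.
eks_ami_id() {
  aws ssm get-parameters --region "$1" \
    --names "/aws/service/eks/optimized-ami/$2/amazon-linux-2/recommended/image_id" \
    --query 'Parameters[0].Value' --output text
}
```

Usage: current=$(eks_cluster_version <region> <cluster-name>), then is_next_minor "$current" 1.28 && eks_ami_id <region> 1.28.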

Run the following Terraform commands to upgrade EKS.

```
terraform init    # initializes Terraform in the current working directory
terraform plan    # creates an execution plan (review it before running "apply")
terraform apply   # applies the changes in the execution plan
```
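A common refinement is to save the plan to a file and apply exactly that file, so the changes you reviewed are the changes that get applied. The sketch below does this with -out and -detailed-exitcode (exit code 0 means no changes, 1 an error, 2 pending changes); tf_upgrade and the plan filename are hypothetical.

```shell
# Hypothetical wrapper: init, plan to a file, confirm, then apply that saved plan.
# Usage: tf_upgrade [plan-file]   (run from infra-as-code/terraform/sample-aws)
tf_upgrade() {
  local plan="${1:-eks-upgrade.tfplan}" rc answer
  terraform init -input=false || return 1
  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
  terraform plan -input=false -detailed-exitcode -out="$plan"
  rc=$?
  case "$rc" in
    0) echo "No changes to apply."; return 0 ;;
    2) ;;
    *) return "$rc" ;;
  esac
  terraform show "$plan"   # print the saved plan for review
  printf 'Apply this plan? [y/N] '
  read -r answer
  [ "$answer" = "y" ] || { echo "Aborted."; return 1; }
  # Applying a saved plan file does not prompt for approval again
  terraform apply -input=false "$plan"
}
```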

Post Upgrade

Verify Kubeconfigs
- Confirm that both the admin and user kubeconfigs work as expected. If there are issues, fetch the latest admin kubeconfig with the command below and update the required roles for the user kubeconfig so that users retain access.
```
aws eks update-kubeconfig --region <region> --name <cluster-name>
```
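A quick way to test each kubeconfig is to ask the API server what it is allowed to do. The sketch below wraps kubectl auth can-i; check_kubeconfig is a hypothetical name, and the kubeconfig paths and permissions you check are placeholders for your own admin and user setups.

```shell
# Hypothetical helper: run `kubectl auth can-i` across all namespaces with a given kubeconfig.
# kubectl prints yes/no and exits 0 when the action is allowed, non-zero otherwise.
# Usage: check_kubeconfig <kubeconfig-file> <verb> <resource>
check_kubeconfig() {
  local cfg="$1"; shift
  kubectl --kubeconfig "$cfg" auth can-i "$@" --all-namespaces
}
```

For example, check_kubeconfig <admin-kubeconfig> '*' '*' should print yes for the admin kubeconfig, while check_kubeconfig <user-kubeconfig> get pods checks a typical read permission. If the cluster maps IAM roles through the aws-auth ConfigMap, the user's role mapping can be reviewed with kubectl get configmap aws-auth -n kube-system -o yaml.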
Scale Up Worker Nodes
- Scale up the worker nodes in the AWS Auto Scaling Groups and ensure they successfully join the EKS cluster.
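The scale-up can be scripted by restoring each group's sizes and then waiting until enough nodes report Ready. In the sketch below, scale_up_asg and wait_for_ready_nodes are hypothetical helpers; the min, max and desired values are whatever the group had before it was scaled down, and the 15-second poll interval and 60-attempt limit are arbitrary.

```shell
# Hypothetical helper: restore an Auto Scaling Group's sizes.
# Usage: scale_up_asg <asg-name> <min> <max> <desired>
scale_up_asg() {
  aws autoscaling update-auto-scaling-group --auto-scaling-group-name "$1" \
    --min-size "$2" --max-size "$3" --desired-capacity "$4"
}

# Hypothetical helper: wait until at least <count> nodes report STATUS "Ready".
# Usage: wait_for_ready_nodes <count> [max-attempts]
wait_for_ready_nodes() {
  local want="$1" tries="${2:-60}" ready
  while [ "$tries" -gt 0 ]; do
    # Column 2 of `kubectl get nodes` is STATUS; count exact "Ready" values
    ready=$(kubectl get nodes --no-headers 2>/dev/null | awk '$2 == "Ready" { n++ } END { print n + 0 }')
    echo "ready nodes: $ready/$want"
    [ "$ready" -ge "$want" ] && return 0
    tries=$((tries - 1))
    sleep 15
  done
  return 1
}
```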
Verify Pod Status
- Check that all pods are up and running.
```
kubectl get pods -A
```
  In case of an ImagePullBackOff error caused by an exceeded image pull limit, wait an additional 6-10 hours, after which the issue resolves on its own. Refer to the official documentation for more information.
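On a large cluster it can help to filter the listing down to pods that need attention. The sketch below reads the default kubectl get pods -A columns (NAMESPACE, NAME, READY, STATUS, ...) and prints pods that are not Completed and are either not Running or not fully ready; unhealthy_pods is a hypothetical helper name.

```shell
# Hypothetical filter: read `kubectl get pods -A` output on stdin and print
# "namespace/name STATUS READY" for pods that need attention.
unhealthy_pods() {
  awk 'NR > 1 {
    if ($4 == "Completed") next           # finished Job pods are expected
    split($3, r, "/")                     # READY is "ready/total", e.g. 0/1
    if ($4 != "Running" || r[1] != r[2]) print $1 "/" $2, $4, $3
  }'
}
```

Usage: kubectl get pods -A | unhealthy_pods. For a pod stuck in ImagePullBackOff, kubectl describe pod <pod> -n <namespace> shows the pull error in its events; if the images come from Docker Hub, a rate limit typically appears there as a "toomanyrequests" message.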

Check Kafka Consumers

Check the Kafka consumers for irregularities such as negative lag.

```
kafka-consumer-groups --bootstrap-server <kafka-broker> --describe --group <consumer-group>
```

If the latest monitoring setup is available, you can also use Kafka-UI to monitor consumer lag.

```
kubectl port-forward svc/kafka-ui 8080:8080 -n <namespace>
# visit http://localhost:8080/kafka-ui to access the dashboard
```

If any consumer shows negative lag, identify the affected topic within the consumer group and reset its offset to the latest offset.

```
kafka-consumer-groups --bootstrap-server <kafka-broker> --reset-offsets --group <consumer-group> --topic <topic-name> --to-latest --execute
```
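To find which topics need the reset, the describe output can be filtered for negative LAG values, and the reset can be previewed before it is executed. negative_lag and preview_reset_to_latest are hypothetical helper names; --dry-run is the kafka-consumer-groups option that prints the new offsets without committing them. Kafka only allows offsets to be reset while the consumer group has no active members, so the consuming service may need to be scaled down first.

```shell
# Hypothetical filter: read `kafka-consumer-groups --describe` output on stdin and
# print "GROUP TOPIC PARTITION LAG" for every partition whose LAG is negative.
negative_lag() {
  awk '
    # On each header row, map column names (GROUP, TOPIC, LAG, ...) to positions
    { for (i = 1; i <= NF; i++) if ($i == "LAG") { for (j = 1; j <= NF; j++) col[$j] = j; next } }
    ("LAG" in col) && $(col["LAG"]) ~ /^-[0-9]+$/ {
      print $(col["GROUP"]), $(col["TOPIC"]), $(col["PARTITION"]), $(col["LAG"])
    }
  '
}

# Hypothetical helper: show what a reset to the latest offset would do, without applying it.
# Usage: preview_reset_to_latest <kafka-broker> <consumer-group> <topic-name>
preview_reset_to_latest() {
  kafka-consumer-groups --bootstrap-server "$1" --reset-offsets \
    --group "$2" --topic "$3" --to-latest --dry-run
}
```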

Scale Up Nginx-Ingress Controller
- Scale the nginx-ingress controller back up to the required number of replicas (N).
```
kubectl scale deployment nginx-ingress-controller --replicas=N -n <namespace>
```
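To confirm the controller is actually serving again, the scale-up can be followed by a rollout status check, which waits until the requested replicas are available. scale_up_ingress is a hypothetical helper name and the five-minute timeout is arbitrary.

```shell
# Hypothetical helper: scale the ingress controller to N replicas and wait
# until the deployment reports them available (or the timeout expires).
# Usage: scale_up_ingress <namespace> <replicas>
scale_up_ingress() {
  local ns="$1" replicas="$2"
  kubectl scale deployment nginx-ingress-controller --replicas="$replicas" -n "$ns" || return 1
  kubectl rollout status deployment/nginx-ingress-controller -n "$ns" --timeout=5m
}
```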
Verify System Health
- Monitor the overall system health to ensure everything is functioning as expected.
