...
Scale Down Nginx-Ingress Controller
Scale down the nginx-ingress controller to zero replicas.
    kubectl scale deployment nginx-ingress-controller --replicas=0 -n <namespace>
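Because the controller has to be scaled back to its original size after the upgrade, it can help to record the current replica count before scaling down. A minimal sketch, assuming the deployment name used above; the helper name save_ingress_replicas and the default file name ingress-replicas.txt are illustrative and not part of the original procedure.

```shell
#!/bin/sh
# Sketch only: save_ingress_replicas and the default file name are illustrative assumptions.

# Write the current replica count of the ingress controller to a file.
# $1 = namespace, $2 = output file (optional)
save_ingress_replicas() {
  kubectl get deployment nginx-ingress-controller -n "$1" \
    -o jsonpath='{.spec.replicas}' > "${2:-ingress-replicas.txt}"
}

# Example (placeholder namespace):
#   save_ingress_replicas <namespace>
```

The saved value is the N to use when scaling the controller back up in the post-upgrade steps.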
Monitor Kafka Lags
Monitor Kafka consumer lags until they reach zero to ensure no pending messages.
    kafka-consumer-groups --bootstrap-server <kafka-broker> --describe --group <consumer-group>
If latest monitoring is available, use Kafka-UI to monitor Kafka consumer lags.
    kubectl port-forward svc/kafka-ui 8080:8080 -n <namespace>
    # visit http://localhost:8080/kafka-ui to access the dashboard
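Waiting for the lag to reach zero can also be scripted as a polling loop. The following is a minimal sketch, not part of the original procedure: the function names sum_lag and wait_for_zero_lag and the 30-second interval are illustrative assumptions, and it assumes the --describe output contains a header row with a LAG column.

```shell
#!/bin/sh
# Sketch only: sum_lag and wait_for_zero_lag are illustrative names.

# Sum the LAG column of `kafka-consumer-groups --describe` output read from stdin.
# The column is located by its header name because its position can differ between Kafka versions.
# Exits non-zero if no header row with a LAG column was seen.
sum_lag() {
  awk '
    # Skip every line until the header row containing LAG has been found
    lag_col == 0 {
      for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i
      next
    }
    # Add up rows with a non-negative integer lag; other values such as "-" are skipped
    $lag_col ~ /^[0-9]+$/ { total += $lag_col }
    END {
      if (lag_col == 0) exit 1
      print total + 0
    }
  '
}

# Poll one consumer group until its total lag is zero.
# $1 = bootstrap server, $2 = consumer group
wait_for_zero_lag() {
  broker=$1
  group=$2
  while :; do
    lag=$(kafka-consumer-groups --bootstrap-server "$broker" --describe --group "$group" | sum_lag) || {
      echo "could not read lag for group $group" >&2
      return 1
    }
    echo "total lag for $group: $lag"
    [ "$lag" -eq 0 ] && return 0
    sleep 30
  done
}

# Example (placeholders): wait_for_zero_lag <kafka-broker> <consumer-group>
```

The loop stops with an error instead of reporting zero when the tool prints no lag table, so a failed connection is not mistaken for an empty backlog.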
Scale Down Cluster Worker Nodes
Scale down the worker nodes from the AWS Auto Scaling Groups to prevent any further activity.
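If the AWS CLI is preferred over the console, the scale-down could look like the sketch below. The helper names show_asg_size and scale_asg are illustrative assumptions, and setting the minimum size together with the desired capacity is one possible approach rather than the documented procedure.

```shell
#!/bin/sh
# Sketch only: show_asg_size and scale_asg are illustrative helper names.

# Print the current min, desired and max size of one Auto Scaling Group,
# so the original values can be noted before changing them.
show_asg_size() {
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$1" \
    --query 'AutoScalingGroups[0].[MinSize,DesiredCapacity,MaxSize]' \
    --output text
}

# Set the minimum size and desired capacity of one Auto Scaling Group to the same value.
# $1 = Auto Scaling Group name, $2 = target capacity
scale_asg() {
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$1" \
    --min-size "$2" \
    --desired-capacity "$2"
}

# Example (placeholder group name):
#   show_asg_size <asg-name>
#   scale_asg <asg-name> 0
```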
Backup EBS Volumes
Take snapshots of the EBS volumes attached to Persistent Volumes (PVs).
Kafka
Kafka - Infra (if available)
Zookeeper (if available)
Elasticsearch (data & master)
Elasticsearch - Infra (data & master) (if available)
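The snapshot step can be sketched with kubectl and the AWS CLI as follows. This is an illustration under assumptions, not the documented procedure: the function names are hypothetical, and it assumes each PV stores its EBS volume either as an in-tree reference (aws://<zone>/vol-...) or as a CSI volume handle (vol-...).

```shell
#!/bin/sh
# Sketch only: all function names below are illustrative assumptions.

# Reduce a PV volume reference to a bare EBS volume ID.
# "aws://<zone>/vol-0abc" becomes "vol-0abc"; "vol-0abc" is returned unchanged.
ebs_volume_id() {
  printf '%s\n' "${1##*/}"
}

# Print "<pv-name> <claim-namespace>/<claim-name> <volume-id>" for every PV backed by an EBS volume.
list_pv_volumes() {
  kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.claimRef.namespace}/{.spec.claimRef.name}{"\t"}{.spec.awsElasticBlockStore.volumeID}{.spec.csi.volumeHandle}{"\n"}{end}' |
  while read -r pv claim handle; do
    vol=$(ebs_volume_id "$handle")
    case $vol in
      vol-*) printf '%s %s %s\n' "$pv" "$claim" "$vol" ;;  # skip PVs that are not EBS-backed
    esac
  done
}

# Create one snapshot per EBS-backed PV.
snapshot_pv_volumes() {
  list_pv_volumes | while read -r pv claim vol; do
    aws ec2 create-snapshot --volume-id "$vol" \
      --description "pre-upgrade snapshot of $pv ($claim)" </dev/null
  done
}

# Example: review the list first, then snapshot.
#   list_pv_volumes
#   snapshot_pv_volumes
```

The claim name in the listing makes it possible to confirm that the Kafka, Zookeeper and Elasticsearch volumes named above are all covered before any snapshot is created.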
Upgrading
Clone the DIGIT-DevOps repository.
    git clone https://github.com/egovernments/DIGIT-DevOps.git
Navigate to the cloned repository and check out the release-1.28-kubernetes branch.
    cd DIGIT-DevOps
    git checkout release-1.28-kubernetes
Check whether the correct AWS credentials are configured:
    aws configure list
If they are not, configure the AWS CLI:
    aws configure
Open the input.yaml file and fill in the inputs as per the regex mentioned in the comments.
Go to infra-as-code/terraform/sample-aws and run the init.go script to enrich the different files based on input.yaml.
    cd infra-as-code/terraform/sample-aws
    go run ../scripts/init.go
Update the EKS version under variable "kubernetes_version" in the variables.tf file, and update ami_id under module "eks" "worker_groups" in main.tf.
Note: the ami_id can be fetched using the command below.
    aws ssm get-parameters --region <region> --names /aws/service/eks/optimized-ami/<eks_version>/amazon-linux-2/recommended/image_id
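The command above returns a JSON document. To get only the AMI ID, for example to paste into main.tf, the lookup could be wrapped as in the sketch below. The function names are illustrative assumptions; the parameter path is the one given above, and the singular get-parameter call with --query is a variation on the original command rather than a replacement for it.

```shell
#!/bin/sh
# Sketch only: eks_ami_parameter and fetch_eks_ami_id are illustrative helper names.

# Build the SSM parameter name for the EKS-optimized Amazon Linux 2 AMI.
# $1 = EKS version, for example 1.28
eks_ami_parameter() {
  printf '/aws/service/eks/optimized-ami/%s/amazon-linux-2/recommended/image_id\n' "$1"
}

# Print only the AMI ID.
# $1 = AWS region, $2 = EKS version
fetch_eks_ami_id() {
  aws ssm get-parameter \
    --region "$1" \
    --name "$(eks_ami_parameter "$2")" \
    --query 'Parameter.Value' \
    --output text
}

# Example (placeholder region): fetch_eks_ami_id <region> 1.28
```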
Run the Terraform commands below to upgrade EKS.
    terraform init   # initializes Terraform in the current working directory
    terraform plan   # creates an execution plan (verify this before running "apply")
    terraform apply  # applies the changes in the execution plan
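One way to make sure that apply executes exactly the plan that was reviewed is to save the plan to a file and apply that file. A minimal sketch; the function names and the plan file name eks-upgrade.tfplan are illustrative assumptions, not part of the original procedure.

```shell
#!/bin/sh
# Sketch only: plan_upgrade, apply_upgrade and the plan file name are illustrative assumptions.
PLAN_FILE=${PLAN_FILE:-eks-upgrade.tfplan}

# Initialize Terraform and write the execution plan to a file for review.
plan_upgrade() {
  terraform init &&
  terraform plan -out="$PLAN_FILE"
}

# Apply the saved plan; refuse to run if no plan has been saved yet.
apply_upgrade() {
  if [ ! -f "$PLAN_FILE" ]; then
    echo "no saved plan at $PLAN_FILE, run plan_upgrade first" >&2
    return 1
  fi
  terraform apply "$PLAN_FILE"
}

# Example:
#   plan_upgrade      # review the printed plan
#   apply_upgrade
```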
Post Upgrade
Verify Kubeconfigs
Confirm that both admin and user kubeconfigs are working as expected. If issues are found, obtain the latest admin kubeconfig and update necessary roles for the user kubeconfig to ensure streamlined access.
    aws eks update-kubeconfig --region <region> --name <cluster-name>
Scale Up Worker Nodes
Scale up the worker nodes from the AWS Auto Scaling Groups and ensure they are successfully attached to the EKS cluster.
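Whether the new nodes have joined the cluster on the upgraded version can be checked from the node list. The filter below is a minimal sketch; the function name unhealthy_nodes is an illustrative assumption, and it relies on the default column layout of kubectl get nodes (NAME, STATUS, ROLES, AGE, VERSION).

```shell
#!/bin/sh
# Sketch only: unhealthy_nodes is an illustrative helper name.

# Read `kubectl get nodes --no-headers` output on stdin and print the nodes whose
# status does not start with "Ready" or whose kubelet version does not start with
# the expected prefix. Prints nothing when every node passes both checks.
# $1 = expected version prefix, for example v1.28
unhealthy_nodes() {
  awk -v want="$1" '$2 !~ /^Ready/ || index($5, want) != 1 { print $1, $2, $5 }'
}

# Example:
#   kubectl get nodes --no-headers | unhealthy_nodes v1.28
```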
Verify Pod Status
Check that all pods are up and running.
    kubectl get pods -A
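On a large cluster the full pod list is long, so it can help to show only the pods that need attention. A minimal sketch; the function name unhealthy_pods is an illustrative assumption, and it relies on the default column layout of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Sketch only: unhealthy_pods is an illustrative helper name.

# Read `kubectl get pods -A --no-headers` output on stdin and print every pod that is
# neither Completed nor Running with all of its containers ready.
unhealthy_pods() {
  awk '
    $4 == "Completed" { next }          # finished jobs are fine
    {
      split($3, ready, "/")             # READY column looks like "1/2"
      if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2, $3, $4
    }
  '
}

# Example:
#   kubectl get pods -A --no-headers | unhealthy_pods
```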
If pods show an ImagePullBackOff error because the image pull limit was exceeded, wait an additional 6-10 hours, after which the issue resolves on its own. Refer to the official documentation for more information.
Check Kafka Consumers
Verify Kafka consumers for any irregularities such as negative lag.
    kafka-consumer-groups --bootstrap-server <kafka-broker> --describe --group <consumer-group>
If latest monitoring is available, use Kafka-UI to monitor Kafka consumer lags.
    kubectl port-forward svc/kafka-ui 8080:8080 -n <namespace>
    # visit http://localhost:8080/kafka-ui to access the dashboard
If any consumer shows negative lag, identify the affected topic within the consumer group and reset its offset to LATEST.
    kafka-consumer-groups --bootstrap-server <kafka-broker> --reset-offsets --group <consumer-group> --topic <topic-name> --to-latest --execute
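Finding the affected topics and previewing the reset can be sketched as below. The function names are illustrative assumptions, and the filter assumes the --describe output has a header row with TOPIC, PARTITION and LAG columns. The preview uses the tool's --dry-run option in place of --execute, so no offsets are changed. Note that the tool only resets offsets for a consumer group that has no active members.

```shell
#!/bin/sh
# Sketch only: negative_lag_rows and preview_offset_reset are illustrative helper names.

# Read `kafka-consumer-groups --describe` output on stdin and print
# "<topic> <partition> <lag>" for every row whose lag is negative.
# Columns are located by header name because their positions can differ between Kafka versions.
negative_lag_rows() {
  awk '
    # Until the header row is found, look for the column positions and skip the line
    !lag {
      for (i = 1; i <= NF; i++) {
        if ($i == "LAG") lag = i
        if ($i == "TOPIC") topic = i
        if ($i == "PARTITION") part = i
      }
      next
    }
    $lag ~ /^-[0-9]+$/ { print $topic, $part, $lag }
  '
}

# Show what a reset to the latest offset would do, without applying it.
# $1 = bootstrap server, $2 = consumer group, $3 = topic
preview_offset_reset() {
  kafka-consumer-groups --bootstrap-server "$1" --reset-offsets \
    --group "$2" --topic "$3" --to-latest --dry-run
}

# Example (placeholders):
#   kafka-consumer-groups --bootstrap-server <kafka-broker> --describe --group <consumer-group> | negative_lag_rows
#   preview_offset_reset <kafka-broker> <consumer-group> <topic-name>
```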
Scale Up Nginx-Ingress Controller
Scale up the nginx-ingress controller to the required number of replicas (N).
    kubectl scale deployment nginx-ingress-controller --replicas=N -n <namespace>
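To confirm that the controller pods are actually serving before traffic is expected, the scale-up can be followed by a rollout check. A minimal sketch; the function name scale_ingress and the 300-second timeout are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: scale_ingress and the timeout value are illustrative assumptions.

# Scale the ingress controller and wait until the rollout has completed.
# $1 = namespace, $2 = replica count
scale_ingress() {
  kubectl scale deployment nginx-ingress-controller --replicas="$2" -n "$1" &&
  kubectl rollout status deployment/nginx-ingress-controller -n "$1" --timeout=300s
}

# Example (placeholders): scale_ingress <namespace> <N>
```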
Verify System Health
Monitor the overall system health to ensure everything is functioning as expected.
...