Preparation

  1. Scale Down Nginx-Ingress Controller

    • Scale down the nginx-ingress controller to zero replicas so that no new traffic enters the cluster during the upgrade.

      kubectl scale deployment nginx-ingress-controller --replicas=0 -n <namespace>
  2. Monitor Kafka Lags

    • Monitor Kafka consumer lag until it reaches zero for every consumer group, so that no messages are left pending.

      kafka-consumer-groups --bootstrap-server <kafka-broker> --describe --group <consumer-group>

      If the latest monitoring stack (Kafka-UI) is deployed, consumer lag can also be monitored from the Kafka-UI dashboard.

      kubectl port-forward svc/kafka-ui 8080:8080 -n <namespace>
      # visit http://localhost:8080/kafka-ui to access dashboard
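
      The following is a minimal polling sketch for this step. It assumes bash and that the `kafka-consumer-groups` CLI is on the PATH. Apache Kafka distributions ship the same tool as `kafka-consumer-groups.sh`. The helper names `total_lag` and `wait_for_zero_lag` are hypothetical. `total_lag` locates the LAG column from the header row, because the column layout has varied between Kafka versions. It ignores partitions that show `-` (no committed offset).

      ```bash
      # Sketch: helpers for waiting until a consumer group has no pending messages.
      # Assumes bash and `kafka-consumer-groups` on the PATH; function names are hypothetical.

      # Sum the positive LAG values in `kafka-consumer-groups --describe` output.
      # The LAG column is found from the header row; "-" (no committed offset) is ignored.
      # Exits non-zero if no header row is found, so empty output is never read as lag 0.
      total_lag() {
        awk '
          /LOG-END-OFFSET/ { for (i = 1; i <= NF; i++) if ($i == "LAG") col = i; next }
          col && $col ~ /^[0-9]+$/ { sum += $col }
          END { if (!col) exit 1; print sum + 0 }
        '
      }

      # Poll one consumer group every <interval> seconds (default 30) until its lag is zero.
      wait_for_zero_lag() {
        local broker=$1 group=$2 interval=${3:-30} out lag
        while true; do
          if ! out=$(kafka-consumer-groups --bootstrap-server "$broker" --describe --group "$group"); then
            echo "failed to describe group $group" >&2
            return 1
          fi
          if ! lag=$(printf '%s\n' "$out" | total_lag); then
            echo "could not parse lag for group $group" >&2
            return 1
          fi
          echo "$(date -u +%H:%M:%S) group=$group lag=$lag"
          (( lag == 0 )) && return 0
          sleep "$interval"
        done
      }
      ```

      Run it once per consumer group, for example `wait_for_zero_lag <kafka-broker> <consumer-group> 30`. It returns once that group's lag is zero. It fails instead of looping if the CLI errors or its output cannot be parsed.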
  3. Scale Down Cluster Worker Nodes

    • Scale down the worker nodes through their AWS Auto Scaling Groups (ASGs) to prevent any further activity in the cluster.
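
      Here is one way to do this with the AWS CLI. The sketch assumes the workers are self-managed node groups and that you know each ASG name. `<asg-name>` is a placeholder and the function names are hypothetical. Recording the current capacity first means the Post Upgrade scale-up becomes a straight restore. If the nodes belong to EKS managed node groups, `aws eks update-nodegroup-config` is generally the preferred way to resize them, rather than editing the ASG directly.

      ```bash
      # Sketch: record and zero out a worker ASG (function names are hypothetical).

      # Print "MinSize MaxSize DesiredCapacity" so the values can be restored after the upgrade.
      asg_capacity() {
        aws autoscaling describe-auto-scaling-groups \
          --auto-scaling-group-names "$1" \
          --query 'AutoScalingGroups[0].[MinSize,MaxSize,DesiredCapacity]' \
          --output text
      }

      # Scale the ASG to zero instances; MinSize is lowered too, since DesiredCapacity may not go below it.
      asg_scale_to_zero() {
        aws autoscaling update-auto-scaling-group \
          --auto-scaling-group-name "$1" \
          --min-size 0 \
          --desired-capacity 0
      }
      ```

      For each worker ASG, save the output of `asg_capacity <asg-name>` before calling `asg_scale_to_zero <asg-name>`.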

  4. Backup EBS Volumes

    • Take snapshots of the EBS volumes backing the following Persistent Volumes (PVs):

      • Kafka

      • Kafka - Infra (if available)

      • Zookeeper (if available)

      • Elasticsearch (data & master)

      • Elasticsearch - Infra (data & master) (if available)
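
      The following sketch locates and snapshots these volumes. It assumes the PVs are EBS-backed and were provisioned in one of two ways:
      - by the in-tree plugin (`spec.awsElasticBlockStore.volumeID`, which may carry an `aws://<az>/` prefix), or
      - by the EBS CSI driver (`spec.csi.volumeHandle`).
      The function names and the claim-name pattern in the usage line are hypothetical. Adjust the pattern to match your actual PVC names.

      ```bash
      # Sketch: snapshot the EBS volumes behind selected PVs (helper names are hypothetical).

      # In-tree PVs may store "aws://<az>/vol-...", CSI PVs store "vol-..."; keep only the volume ID.
      ebs_volume_id() {
        printf '%s\n' "${1##*/}"
      }

      # Print "<pv>\t<namespace>/<claim>\t<volume>" for every PV, where <volume> comes from
      # the in-tree EBS field or the CSI volume handle (whichever is set).
      list_ebs_pvs() {
        kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.claimRef.namespace}/{.spec.claimRef.name}{"\t"}{.spec.awsElasticBlockStore.volumeID}{.spec.csi.volumeHandle}{"\n"}{end}'
      }

      # Read list_ebs_pvs lines on stdin; snapshot volumes whose claim matches the regex <pattern>.
      snapshot_matching() {
        local pattern=$1 pv claim vol id
        while IFS=$'\t' read -r pv claim vol; do
          [[ "$claim" =~ $pattern ]] || continue
          id=$(ebs_volume_id "$vol")
          [[ "$id" == vol-* ]] || continue   # skip non-EBS volumes (e.g. EFS CSI handles)
          # </dev/null keeps the CLI from consuming the loop's stdin.
          aws ec2 create-snapshot --volume-id "$id" \
            --description "pre-upgrade backup of $claim ($pv)" </dev/null
        done
      }
      ```

      For example: `list_ebs_pvs | snapshot_matching 'kafka|zookeeper|elasticsearch'`. Review the plain `list_ebs_pvs` output first to confirm that every volume in the list above is matched.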

Post Upgrade

  1. Verify Kubeconfigs

    • Confirm that both the admin and user kubeconfigs work as expected. If issues are found, obtain the latest admin kubeconfig and update the necessary roles for the user kubeconfig so that user access keeps working.

      aws eks update-kubeconfig --region <region> --name <cluster-name>
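
      The verification itself can be done with a small helper, assuming one kubeconfig file per identity. It relies on `kubectl auth can-i --quiet`, which reports its result through the exit code. An unreachable API server or rejected credentials also count as a failure. The function name and the example permission checks are hypothetical.

      ```bash
      # Sketch: check that a kubeconfig authenticates and grants the expected permissions.
      # Each check is a space-separated `kubectl auth can-i` argument list, e.g. "get pods -n <namespace>".
      check_kubeconfig() {
        local cfg=$1 check status=0
        local -a args
        shift
        for check in "$@"; do
          read -ra args <<< "$check"   # split into words without glob expansion, so "* *" stays literal
          if kubectl --kubeconfig "$cfg" auth can-i "${args[@]}" --quiet; then
            echo "ok   $check"
          else
            echo "FAIL $check" >&2
            status=1
          fi
        done
        return "$status"
      }
      ```

      For the admin, run `check_kubeconfig <admin-kubeconfig> '* * --all-namespaces'`. For the user, run something like `check_kubeconfig <user-kubeconfig> 'get pods -n <namespace>' 'list deployments -n <namespace>'`.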
  2. Scale Up Worker Nodes

    • Scale the worker nodes back up through the AWS Auto Scaling Groups and ensure they successfully join the EKS cluster and report Ready.
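
      The following sketch covers the scale-up and the join check. It assumes the MinSize, MaxSize and DesiredCapacity values were recorded before scale-down; the function names are hypothetical. `kubectl wait --all` only covers nodes that already exist, so the helper first waits for the expected number of nodes to register.

      ```bash
      # Sketch: restore worker capacity and wait for the nodes to join (names are hypothetical).

      # Restore an ASG to its pre-upgrade capacity.
      asg_restore() {
        local asg=$1 min=$2 max=$3 desired=$4
        aws autoscaling update-auto-scaling-group \
          --auto-scaling-group-name "$asg" \
          --min-size "$min" --max-size "$max" --desired-capacity "$desired"
      }

      # Wait until at least <expected> nodes have registered, then until every node is Ready.
      wait_for_nodes() {
        local expected=$1 timeout=${2:-15m} count
        local deadline=$((SECONDS + 900))   # give up on registration after 15 minutes
        until count=$(kubectl get nodes --no-headers 2>/dev/null | wc -l) && (( count >= expected )); do
          if (( SECONDS >= deadline )); then
            echo "timed out: only $count of $expected nodes registered" >&2
            return 1
          fi
          echo "registered nodes: $count/$expected"
          sleep 20
        done
        kubectl wait --for=condition=Ready nodes --all --timeout="$timeout"
      }
      ```

      For example, run `asg_restore <asg-name> <min> <max> <desired>` for each ASG, then `wait_for_nodes <total-desired-nodes>`.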

  3. Verify Pod Status

    • Check that all pods are up and running.

      kubectl get pods -A

      If pods fail with ImagePullBackOff because the image pull rate limit was exceeded, wait an additional 6-10 hours; the issue resolves on its own once the limit resets. Refer to the official documentation for more information.
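
      `kubectl get pods -A` lists everything, which is hard to scan on a large cluster. The sketch below is a filter that keeps only pods whose STATUS is not Running or Completed, plus Running pods with unready containers. The function name is hypothetical. It relies on the default `-A --no-headers` column order: NAMESPACE, NAME, READY, STATUS. For any pod it reports, `kubectl describe pod <pod> -n <namespace>` shows the underlying error in the pod's events. This helps confirm whether an ImagePullBackOff is caused by the pull limit.

      ```bash
      # Sketch: reduce `kubectl get pods -A --no-headers` output to pods that need attention:
      # any STATUS other than Running/Completed, or Running pods with unready containers (e.g. 1/2).
      unhealthy_pods() {
        awk '{
          split($3, ready, "/")   # READY column, "<ready>/<total>"
          if (($4 != "Running" && $4 != "Completed") || ($4 == "Running" && ready[1] != ready[2]))
            print $1, $2, $3, $4
        }'
      }

      # Usage: kubectl get pods -A --no-headers | unhealthy_pods
      ```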

  4. Check Kafka Consumers

    • Check the Kafka consumer groups for irregularities such as negative lag.

      kafka-consumer-groups --bootstrap-server <kafka-broker> --describe --group <consumer-group>

      If the latest monitoring stack (Kafka-UI) is deployed, consumer lag can also be monitored from the Kafka-UI dashboard.

      kubectl port-forward svc/kafka-ui 8080:8080 -n <namespace>
      # visit http://localhost:8080/kafka-ui to access dashboard
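    • If any consumer group shows negative lag, identify the affected topic within that group and reset its offset to the latest (--to-latest). Offsets can only be reset while the group has no active members, so stop its consumers before running the command.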

      kafka-consumer-groups --bootstrap-server <kafka-broker> --reset-offsets --group <consumer-group> --topic <topic-name> --to-latest --execute
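
      The following sketch finds the affected topics from the `--describe` output and resets each one. It assumes bash and the same `kafka-consumer-groups` CLI used above. The function names and the `EXECUTE` variable are hypothetical. By default it passes `--dry-run`, so the planned offsets are printed but not applied. Set `EXECUTE=1` to pass `--execute` instead. The group's consumers must already be stopped.

      ```bash
      # Sketch: reset negative-lag topics to the latest offset (names are hypothetical).

      # Print the distinct topics that have a negative LAG in `kafka-consumer-groups --describe` output.
      negative_lag_topics() {
        awk '
          /LOG-END-OFFSET/ { for (i = 1; i <= NF; i++) { if ($i == "TOPIC") t = i; if ($i == "LAG") l = i }; next }
          l && $l ~ /^-[0-9]+$/ { print $t }
        ' | sort -u
      }

      # Reset every negative-lag topic of one group to the latest offset.
      # Uses --dry-run unless EXECUTE=1; the group's consumers must already be stopped.
      reset_negative_lag() {
        local broker=$1 group=$2 mode=--dry-run topic
        [[ "${EXECUTE:-0}" == 1 ]] && mode=--execute
        kafka-consumer-groups --bootstrap-server "$broker" --describe --group "$group" \
          | negative_lag_topics \
          | while read -r topic; do
              kafka-consumer-groups --bootstrap-server "$broker" --reset-offsets \
                --group "$group" --topic "$topic" --to-latest "$mode" </dev/null
            done
      }
      ```

      Run `reset_negative_lag <kafka-broker> <consumer-group>` to preview the changes. Then run `EXECUTE=1 reset_negative_lag <kafka-broker> <consumer-group>` to apply them.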
  5. Scale Up Nginx-Ingress Controller

    • Scale the nginx-ingress controller back up to the required number of replicas, N.

      kubectl scale deployment nginx-ingress-controller --replicas=N -n <namespace>
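
      To confirm the controller is actually serving before moving on, the scale-up can be paired with `kubectl rollout status`. That command blocks until the requested replicas are available or the timeout expires. This is a minimal sketch; the function name and the 5-minute timeout are assumptions.

      ```bash
      # Sketch: scale the controller up and block until the new replicas are available.
      scale_up_ingress() {
        local ns=$1 replicas=$2
        kubectl scale deployment nginx-ingress-controller --replicas="$replicas" -n "$ns" &&
          kubectl rollout status deployment/nginx-ingress-controller -n "$ns" --timeout=5m
      }
      ```

      For example: `scale_up_ingress <namespace> N`.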
  6. Verify System Health

    • Monitor the overall system health to ensure everything is functioning as expected.