Deployment best practices Checklist
Daily Tasks:
- Monitor the status of the environment and ensure every service is running (https://<domain_monitoring>/monitoring)
- Monitor the resource usage of the environments https://central-dashboard.digit.org/d/gzIcCaiVz/kubernetes-cluster-ram-and-cpu-utilization
- Monitor the alerts-overview dashboard and take appropriate action on critical and warning alerts https://central-dashboard.digit.org/d/smo98XK4z/alerts-overview?orgId=1&refresh=30s
- Keep track of all tasks by creating tickets
- Watch the Prometheus alerts in the Slack channel
- Monitor the Kafka consumer group lags https://<domain_name>/monitoring/d/N9uZBy8Wz/1-kubernetes-cluster-overview-kubrnettes?viewPanel=137&orgId=1
- In case of Kafka-related issues, troubleshoot them using the guide https://core.digit.org/guides/operations-guide/kafka-troubleshooting-guide
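As a rough illustration of what the consumer group lag panel reports, lag per partition is the log end offset minus the group's committed offset. The sketch below only shows that arithmetic on plain dictionaries. The function names, the topic name, and the threshold are hypothetical and are not part of DIGIT or of any Kafka client library. Treating a partition without a committed offset as offset 0 is a simplifying assumption.

```python
from typing import Dict, List, Tuple

Partition = Tuple[str, int]  # (topic, partition number)


def consumer_lag(
    end_offsets: Dict[Partition, int],
    committed: Dict[Partition, int],
) -> Dict[Partition, int]:
    """Return lag per partition: log end offset minus committed offset."""
    lag: Dict[Partition, int] = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts from 0.
        lag[partition] = max(end - committed.get(partition, 0), 0)
    return lag


def partitions_over_threshold(
    lag: Dict[Partition, int], threshold: int
) -> List[Partition]:
    """Return the partitions whose lag exceeds the threshold, sorted."""
    return sorted(p for p, value in lag.items() if value > threshold)


# Hypothetical offsets for a topic named "orders".
end_offsets = {("orders", 0): 120, ("orders", 1): 80}
committed = {("orders", 0): 100, ("orders", 1): 80}
lag = consumer_lag(end_offsets, committed)
lagging = partitions_over_threshold(lag, threshold=10)
```

A lag that keeps growing across checks usually means the consumers are not keeping up, which is the point at which the troubleshooting guide linked above becomes relevant.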
Weekly Tasks:
- Monitor the resource usage of the environments https://central-dashboard.digit.org/d/gzIcCaiVz/kubernetes-cluster-ram-and-cpu-utilization
- Monitor the alerts-overview dashboard and take appropriate action on critical and warning alerts https://central-dashboard.digit.org/d/smo98XK4z/alerts-overview?orgId=1&refresh=30s
- Monitor the Kubecost Dashboard https://central-dashboard.digit.org/d/JOUdHGZZz/kubecost-dashboard-for-grafana-cloud?orgId=1
- Cleanup logs
- Backup logs
- Weekly DB dump in case of SDC
- ES Data backup
- Publish the weekly summary report (come up with the format if one does not exist yet)
- Publish the Jira board status
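The log backup and cleanup items above could be combined into one step that archives old log files before deleting them. The following is a minimal sketch under stated assumptions, not the procedure used on any DIGIT environment: the function name, the `*.log` file pattern, the archive location, and the retention period are all hypothetical.

```python
import tarfile
import time
from pathlib import Path
from typing import List, Optional


def backup_and_cleanup_logs(
    log_dir: Path,
    archive_path: Path,
    max_age_days: int,
    now: Optional[float] = None,
) -> List[Path]:
    """Archive *.log files older than max_age_days, then delete them.

    Returns the list of files that were archived and removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    old_files = sorted(
        p for p in log_dir.glob("*.log")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
    if not old_files:
        return []
    # Back up first so nothing is deleted before it has been archived.
    with tarfile.open(str(archive_path), "w:gz") as archive:
        for path in old_files:
            archive.add(str(path), arcname=path.name)
    for path in old_files:
        path.unlink()
    return old_files
```

A call such as `backup_and_cleanup_logs(Path("/var/log/myapp"), Path("/backups/logs.tar.gz"), max_age_days=7)` would archive and remove logs older than a week. Both paths here are placeholders.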
Monthly Tasks:
- Publish the Jira Board status report
- Create a new Jira sprint and maintain it
- Publish environments resources status report
- Publish environments cost report
- If you have tackled a new problem, publish a troubleshooting document for it