...
- Monitor the status of the environment and ensure every service is running (https://<domain_monitoring>/monitoring)
- Monitor the RAM and CPU utilization of the environments https://central-dashboard.digit.org/d/gzIcCaiVz/kubernetes-cluster-ram-and-cpu-utilization
- Monitor the alerts-overview dashboard and take appropriate action on critical and warning alerts https://central-dashboard.digit.org/d/smo98XK4z/alerts-overview?orgId=1&refresh=30s
- Keep track of all tasks by creating tickets
- Watch the Prometheus alerts in the Slack channel
- Monitor the Kafka consumer group lags https://<domain_name>/monitoring/d/N9uZBy8Wz/1-kubernetes-cluster-overview-kubrnettes?viewPanel=137&orgId=1
- Troubleshoot Kafka-related issues using the Kafka troubleshooting guide https://core.digit.org/guides/operations-guide/kafka-troubleshooting-guide
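
To make the consumer group lag item above concrete, here is a minimal sketch of how lag is usually derived: for each partition, lag is the log end offset minus the offset the consumer group has committed. The function names, the threshold, and the topic name `example-topic` are hypothetical and only for illustration. The sketch does not talk to Kafka; it assumes the offsets were already collected elsewhere (for example from the dashboard panel or from the `kafka-consumer-groups.sh --describe` output).

```python
from typing import Dict, List, Tuple

# (topic, partition) -> offset
TopicPartition = Tuple[str, int]
Offsets = Dict[TopicPartition, int]


def compute_lag(end_offsets: Offsets, committed: Offsets) -> Dict[TopicPartition, int]:
    """Return per-partition lag: log end offset minus committed offset."""
    lag: Dict[TopicPartition, int] = {}
    for tp, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts as
        # fully lagging (committed offset treated as 0).
        done = committed.get(tp, 0)
        # Clamp at 0 so a stale end-offset reading never yields negative lag.
        lag[tp] = max(end - done, 0)
    return lag


def partitions_over_threshold(lag: Dict[TopicPartition, int], threshold: int) -> List[TopicPartition]:
    """Return the partitions whose lag is strictly above the threshold, sorted."""
    return sorted(tp for tp, n in lag.items() if n > threshold)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real cluster.
    end_offsets = {("example-topic", 0): 100, ("example-topic", 1): 50}
    committed = {("example-topic", 0): 90}

    lag = compute_lag(end_offsets, committed)
    for tp in partitions_over_threshold(lag, threshold=20):
        print(f"{tp[0]}[{tp[1]}] lag={lag[tp]}")
```

A lag that keeps growing on the same partitions over several checks is usually the signal to open the troubleshooting guide; a brief spike that drains on its own is often just a burst of traffic.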
...