DevOps Board
Objective
Ongoing DevOps epics and stories, covering enhancements to tools, infrastructure and process.
Requirements
| # | Requirement | User Story | Importance | Notes | |
|---|---|---|---|---|---|
| 1 | Azure-as-an-additional | Azure playground setup with all the capabilities for a seamless option to choose between AWS or Azure OPS-1 | SEVERE | - Deployment manifest changes for resources (S3, EBS, etc.) - Eng support: application-level changes from S3 to Azure Blob for FileStore, Telemetry, Logos | |
| 2 | GIT | Git branching strategy OPS-30 | SEVERE | | |
| 3 | Spinnaker | POC: Multicloud orchestration and deployment pipeline tool OPS-31 | MEDIUM | | |
| 4 | Backup | Encrypted logs in S3 OPS-2 | HIGH | | |
| 5 | Backup | Log backups retained for at least 6 months OPS-3 | HIGH | - Need to do a POC with CloudFront and Log Analytics before deciding on the new log lifecycle solution | |
| 6 | Capacity Planning | Cluster and app sizing determination OPS-4 | MEDIUM | - Need to have sizing templates with BaseMin, GoodToHave & OptToHave | |
| 7 | Infra | Node resizing/restructuring across all environments, including Punjab prod (upon customer approval) OPS-5 | BACKLOG | - Need to change the instance type from m4.large to m5.xlarge | |
| 8 | Infra | Need to have dashboards, monitoring & deployments for all environments OPS-6 | MEDIUM | - Need to evaluate a tool that is cloud-agnostic and all-in-one | |
| 9 | Infra | MultiCloud OPS-7 | BACKLOG | | |
| 10 | Kafka Improvement | Deploy HA Kafka and ZooKeeper cluster OPS-8 | SEVERE | - Need to make headless service configuration (Kafka Connect) | - Requests go to hardcoded individual nodes (like Kafka0, 1, 2) |
| 11 | Kafka Improvement | Use Kafka Connect to index instead of indexer OPS-9 | SEVERE | | |
| 12 | Kafka Improvement | Kafka partitioning and multi-consumer implementation OPS-10 | SEVERE | - 1-to-1 to 1-to-many | |
| 13 | Kube Upgrade | Upgrade Kubernetes to 1.11.6 for all environments (Dev, QA, PUAT, PPROD) OPS-11 | HIGH | - Kops upgrade - Manifest changes | POC Done |
| 14 | Kube Upgrade | Pod auto-scaling strategy OPS-12 | BACKLOG | | |
| 15 | Logging | Move logging from direct ELK to ELK via Kafka OPS-13 | SEVERE | - Nithin is working on this | |
| 16 | Logging | Request/Response event logging from Zuul OPS-14 | SEVERE | | |
| 17 | Logging | Log masking OPS-15 | SEVERE | - Need to get the list of fields from Dev before working on the POC | |
| 18 | Monitoring | Kafka monitoring & alerting OPS-16 | HIGH | | |
| 19 | Monitoring | Move telemetry to internal Kafka and ELK OPS-17 | SEVERE | - Nithin is working on this | |
| 20 | Monitoring | Prometheus or a better monitoring tool that proactively reports issues ahead of time OPS-18 | HIGH | | |
| 21 | Monitoring | Zuul/NGINX status code monitoring - new dashboard OPS-19 | HIGH | | |
| 22 | Monitoring | Error monitoring and configuration OPS-20 | HIGH | | |
| 23 | Monitoring | Health and readiness checks on all services OPS-21 | SEVERE | | |
| 24 | Monitoring | Intra-service traffic management gateway OPS-22 | HIGH | | |
| 25 | Monitoring | Monitoring dashboards at multiple levels - infra, IT & business OPS-23 | HIGH | | |
| 26 | Process | IAM user policy for whole infra OPS-24 | HIGH | - Only admins; team IAM users have access only to their respective S3 buckets | |
| 27 | RBAC | ACL on kubectl access (after 1.11.6 upgrade) OPS-25 | SEVERE | | |
| 28 | RBAC | ACL in Jenkins OPS-26 | SEVERE | | |
| 29 | Release Mgmt | Need a Helm-like deployment strategy to roll out and roll back releases with one chart or a single config OPS-27 | MEDIUM | | |
| 30 | TBD | DB masking and PII removal OPS-28 | MEDIUM | | |
| 31 | Kube Encrypt | Modify Kubernetes deployment encryption OPS-39 | | | |
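
Requirement 17 (log masking, OPS-15) depends on a list of fields from Dev. As a rough illustration of why that list is the key input, here is a minimal sketch that masks named fields in JSON log events. The field names, the mask string and the function names are all hypothetical placeholders, and where in the logging pipeline such a step would run is not decided on this page.

```python
import json

# Hypothetical field names; the real list is still to be supplied by Dev.
SENSITIVE_FIELDS = {"password", "mobileNumber", "email"}
MASK = "****"  # assumed placeholder value


def mask_event(event, sensitive_fields=SENSITIVE_FIELDS):
    """Return a copy of a JSON-like log event with sensitive values masked."""
    if isinstance(event, dict):
        return {
            key: MASK if key in sensitive_fields
            else mask_event(value, sensitive_fields)
            for key, value in event.items()
        }
    if isinstance(event, list):
        return [mask_event(item, sensitive_fields) for item in event]
    return event  # scalars are returned unchanged


def mask_line(line, sensitive_fields=SENSITIVE_FIELDS):
    """Mask one JSON log line; lines that are not valid JSON pass through."""
    try:
        event = json.loads(line)
    except ValueError:
        return line
    return json.dumps(mask_event(event, sensitive_fields))
```

Masking by field name keeps the rule set small and reviewable, but it only works for structured logs; free-text log lines would need a different approach, which the POC would have to cover.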
User interaction and design
Open Questions
| Question | Answer | Date Answered |
|---|---|---|
Out of Scope