Latest Monitoring Chart is available here.
    # Clone git repository
    git clone https://github.com/egovernments/DIGIT-DevOps.git
    cd DIGIT-DevOps

    # Checkout to "digit-lts-monitoring" branch
    git checkout digit-lts-monitoring
Helmfile
Update the environments as required: set the file paths of the environment and secrets files, and the namespace to deploy into.
    # deploy-as-code/helm/charts/monitoring/monitoring-helmfile.yaml
    environments:
      demo:
        values:
          - namespace: monitoring
          - ../../environments/egov-demo.yaml
          - ../../environments/egov-demo-secrets.yaml
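Once an environment is defined, the chart can be applied through Helmfile. The sketch below only assembles the command line. It assumes the `helmfile` binary is installed and that commands are run from the repository root, neither of which is stated above; `helmfile_command` is an illustrative helper name, not part of the repository.

```python
import shlex

# Path of the helmfile shown above, relative to the repository root (assumed working directory).
HELMFILE = "deploy-as-code/helm/charts/monitoring/monitoring-helmfile.yaml"


def helmfile_command(environment: str, action: str = "diff") -> list[str]:
    """Build a helmfile invocation for one environment (hypothetical helper)."""
    if action not in {"diff", "apply", "sync"}:
        raise ValueError(f"unsupported action: {action}")
    # -f selects the helmfile, -e selects the environment block (e.g. "demo").
    return ["helmfile", "-f", HELMFILE, "-e", environment, action]


if __name__ == "__main__":
    # Preview the changes first, then apply them.
    for action in ("diff", "apply"):
        print(shlex.join(helmfile_command("demo", action)))
```

Either list can be passed to `subprocess.run(..., check=True)` to execute it; previewing with `diff` before `apply` is a precaution, not a requirement of the chart.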
Environment Files
Grafana
GitHub OAuth App Creation
Follow the GitHub OAuth app creation steps and set the following values:
Homepage URL: https://<your_domain_name>
Authorization callback URL: https://<your_domain_name>/monitoring/login/github
Generate the Client ID & Client secret.
Update the Client ID & Client secret in the secrets config.
    # deploy-as-code/helm/environments/egov-demo-secrets.yaml
    cluster-configs:
      secrets:
        grafana:
          clientID: <OAuth-key>
          clientSecret: <OAuth-token>
Update the environment config to grant role-based access to specific GitHub organizations and teams.
    # deploy-as-code/helm/environments/egov-demo.yaml
    grafana:
      github:
        allowed_organizations: ["<organization>"]
        role_attribute_path: contains(groups[*], '@<organization>/<team>') && 'Viewer'
Note: Valid roles are None, Viewer, Editor, Admin or GrafanaAdmin.
Visit the official Grafana GitHub OAuth documentation for more information.
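When more than one team needs access, the `role_attribute_path` expression can chain several clauses with `||`, which JMESPath evaluates left to right. The following is a minimal sketch that builds such an expression; the helper name and the organization and team names are placeholders, not values from the DIGIT configuration.

```python
# Roles listed in the note above.
VALID_ROLES = {"None", "Viewer", "Editor", "Admin", "GrafanaAdmin"}


def role_attribute_path(organization: str, team_roles: list[tuple[str, str]]) -> str:
    """Join one JMESPath clause per (team, role) pair with `||`.

    The first clause whose team matches decides the role, so list the
    most privileged team first.
    """
    clauses = []
    for team, role in team_roles:
        if role not in VALID_ROLES:
            raise ValueError(f"invalid Grafana role: {role}")
        clauses.append(f"contains(groups[*], '@{organization}/{team}') && '{role}'")
    return " || ".join(clauses)


if __name__ == "__main__":
    # "my-org", "devops" and "developers" are placeholder names.
    print(role_attribute_path("my-org", [("devops", "Admin"), ("developers", "Viewer")]))
```

The printed string can be pasted as the value of `role_attribute_path`. A user who belongs to none of the listed teams gets no role from this expression.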
Loki Stack
Filesystem as storage
    # deploy-as-code/helm/environments/egov-demo.yaml
    loki:
      persistence:
        enabled: true
        accessModes:
          - ReadWriteOnce
        size: 10Gi
      serviceAccount:
        annotations: {}
      additionalConfigs:
        schema_config:
          configs:
            - from: 2020-10-24
              store: boltdb-shipper
              object_store: filesystem ## local filesystem as storage
              schema: v11
              index:
                prefix: index_
                period: 24h
        storage_config:
          boltdb_shipper:
            active_index_directory: /data/loki/index
            cache_location: /data/loki/index_cache
            shared_store: filesystem ## local filesystem as storage
            cache_ttl: 24h
          filesystem:
            directory: /data/loki/chunks
        compactor:
          working_directory: /data/loki/boltdb-shipper-compactor
          shared_store: filesystem ## local filesystem as storage
          retention_enabled: true
          compaction_interval: 168h ## compaction in hours
        table_manager:
          retention_deletes_enabled: true
          retention_period: 168h ## retention in hours
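The `compaction_interval` and `retention_period` values above are written in hours, so `168h` corresponds to 7 days. Below is a small sketch for deriving the value from a number of days. The check that the result is a multiple of the index `period` reflects a constraint described in Loki's table manager documentation and should be treated as an assumption for other Loki versions; `retention_period` is a hypothetical helper name.

```python
def retention_period(days: int, index_period_hours: int = 24) -> str:
    """Convert a retention window in days to the `<N>h` form used in the config."""
    if days <= 0:
        raise ValueError("days must be positive")
    hours = days * 24
    # Assumed constraint: retention should align with the index period (24h above).
    if hours % index_period_hours != 0:
        raise ValueError(f"{hours}h is not a multiple of the {index_period_hours}h index period")
    return f"{hours}h"


if __name__ == "__main__":
    print(retention_period(7))   # one week, the value used above
    print(retention_period(30))  # roughly one month
```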
S3 as storage
Caution: Use the sub claim instead of aud when setting up Web Identity (OIDC) IAM roles to ensure correct identity matching.
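To illustrate the caution above, the sketch below builds a trust policy whose condition is keyed on the `sub` claim, which for EKS service accounts has the form `system:serviceaccount:<namespace>:<service-account>`. The account ID, OIDC provider, and the service account name `loki` are placeholders chosen for the example and are not taken from the DIGIT chart.

```python
import json


def trust_policy(account_id: str, oidc_provider: str, namespace: str, service_account: str) -> dict:
    """Build an IAM trust policy that matches on the OIDC `sub` claim."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # `sub` identifies the exact Kubernetes service account.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # All four arguments are placeholders; substitute the values of your cluster.
    policy = trust_policy("111122223333", "oidc.eks.<region>.amazonaws.com/id/<id>", "monitoring", "loki")
    print(json.dumps(policy, indent=2))
```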
Create an AWS Web Identity (OIDC) IAM role with the following policy.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AccessToLokiBucket",
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:DeleteObject",
            "s3:ListBucket"
          ],
          "Resource": [
            "arn:aws:s3:::<s3-bucket>",
            "arn:aws:s3:::<s3-bucket>/*"
          ]
        }
      ]
    }
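The policy above can also be generated for a concrete bucket instead of being edited by hand. This is a minimal sketch using only the standard library; the function name and the bucket name `my-loki-bucket` are placeholders.

```python
import json


def loki_bucket_policy(bucket: str) -> dict:
    """Return the S3 access policy shown above for the given bucket name."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AccessToLokiBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                ],
                # ListBucket applies to the bucket ARN, object actions to "<bucket>/*".
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(loki_bucket_policy("my-loki-bucket"), indent=2))
```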
Update the S3 details & role ARN in the config below.
    # deploy-as-code/helm/environments/egov-demo.yaml
    loki:
      persistence:
        enabled: true
        accessModes:
          - ReadWriteOnce
        size: 10Gi
      serviceAccount:
        annotations:
          eks.amazonaws.com/role-arn: <s3-role-arn> ## AWS ARN of the S3 role
      additionalConfigs:
        schema_config:
          configs:
            - from: 2020-10-24
              store: boltdb-shipper
              object_store: s3 ## S3 as storage
              schema: v11
              index:
                prefix: index_
                period: 24h
        storage_config:
          boltdb_shipper:
            active_index_directory: /data/loki/index
            cache_location: /data/loki/index_cache
            shared_store: s3 ## S3 as storage
            cache_ttl: 24h
          aws:
            s3: s3://<region>/<s3-bucket> ## S3 region & bucket
        compactor:
          working_directory: /data/loki/boltdb-shipper-compactor
          shared_store: s3 ## S3 as storage
          retention_enabled: true
          compaction_interval: 168h ## compaction in hours
        table_manager:
          retention_deletes_enabled: true
          retention_period: 168h ## retention in hours
Note: Refer to the official Loki documentation for detailed configuration.
Prometheus
    # deploy-as-code/helm/environments/egov-demo.yaml
    prometheus:
      externalLabels:
        cluster: <cluster-name>
      additionalScrapeConfigs:
        - job_name: 'nginx-ingress-metrics'
          static_configs:
            - targets: [ 'nginx-ingress-controller-metrics.egov:10254' ]
        - job_name: 'blackbox'
          metrics_path: /probe
          params:
            module: [ http_2xx ]
          static_configs:
            - targets:
                - <list of urls to be monitored>
          relabel_configs:
            - source_labels: [ __address__ ]
              target_label: __param_target
            - source_labels: [ __param_target ]
              target_label: instance
            - target_label: __address__
              replacement: prometheus-blackbox-exporter:9115
        - job_name: 'blackbox_exporter'
          static_configs:
            - targets: [ 'prometheus-blackbox-exporter:9115' ]
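The three `relabel_configs` rules of the `blackbox` job are easier to follow when traced for a single target. The sketch below mimics their effect in plain Python; it is an illustration of the label flow, not Prometheus code, and `https://example.com` stands in for one of the monitored URLs.

```python
from urllib.parse import urlencode

EXPORTER = "prometheus-blackbox-exporter:9115"  # replacement value from the config above


def relabel_blackbox(target: str) -> dict:
    """Apply the three relabel rules, in order, to one static target."""
    labels = {"__address__": target}
    # Rule 1: copy the target URL into the `target` query parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the URL as the `instance` label so metrics stay attributable.
    labels["instance"] = labels["__param_target"]
    # Rule 3: point the actual scrape at the blackbox exporter.
    labels["__address__"] = EXPORTER
    return labels


def probe_url(labels: dict, module: str = "http_2xx") -> str:
    """Approximate the request Prometheus sends to the exporter."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}/probe?{query}"


if __name__ == "__main__":
    labels = relabel_blackbox("https://example.com")
    print(labels["instance"])
    print(probe_url(labels))
```

The net effect is that Prometheus scrapes the exporter, the exporter probes the URL, and the resulting series carry the URL as `instance`.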
Alerting
    # deploy-as-code/helm/environments/egov-demo.yaml
    prometheus:
      alertmanager:
        enabled: true
Note: This enables the Alertmanager that ships with the Prometheus Operator.
Slack Alerts
    # deploy-as-code/helm/environments/egov-demo-secrets.yaml
    cluster-configs:
      secrets:
        alertmanager:
          config:
            global:
              slack_api_url: https://hooks.slack.com
              resolve_timeout: 5m
            route:
              group_by: ['alertname']
              group_wait: 30s
              group_interval: 5m
              repeat_interval: 10m
              routes:
                - receiver: slack-notification
                  match_re: ## regex match; plain "match" compares the value literally
                    severity: "warning|critical"
                  continue: true
            receivers:
              - name: slack-notification
                slack_configs:
                  - channel: '<slack-channel>'
                    send_resolved: true
                    username: 'Alertmanager'
                    title: |
                      [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }}
                    text: |
                      {{ range .Alerts -}}
                      {{- "\n" -}}
                      *Alert:* {{ .Annotations.summary }}
                      {{ if .Labels.severity }}*Severity:* `{{ .Labels.severity }}`{{ end }}
                      *Cluster:* {{ .Labels.cluster }}
                      *Details:* {{ .Annotations.description }}
                      {{ end }}
Note: Generate a Slack Incoming Webhook, then update slack_api_url under the global config and <slack-channel> under the receivers config.
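One detail of the route worth checking is how the severity matcher is evaluated. In Alertmanager routes, `match` compares label values for exact equality, while `match_re` treats the value as an anchored regular expression, so an alternation such as `warning|critical` only means "either severity" under `match_re`. The sketch below imitates both behaviours with Python's `re` module; it is a simplified model, not Alertmanager's implementation.

```python
import re


def match(labels: dict, matchers: dict) -> bool:
    """Exact-equality matching, as done by a route's `match` block."""
    return all(labels.get(name, "") == value for name, value in matchers.items())


def match_re(labels: dict, matchers: dict) -> bool:
    """Anchored regex matching, as done by a route's `match_re` block."""
    return all(
        re.fullmatch(pattern, labels.get(name, "")) is not None
        for name, pattern in matchers.items()
    )


if __name__ == "__main__":
    matcher = {"severity": "warning|critical"}
    alert = {"alertname": "HighLatency", "severity": "critical"}  # example alert labels
    print("match:   ", match(alert, matcher))
    print("match_re:", match_re(alert, matcher))
```

With the example alert, only the regex variant routes the alert to the receiver.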
Email Alerts
    # deploy-as-code/helm/environments/egov-demo-secrets.yaml
    cluster-configs:
      secrets:
        alertmanager:
          config:
            global:
              resolve_timeout: 5m
            route:
              group_by: ['alertname']
              group_wait: 30s
              group_interval: 5m
              repeat_interval: 10m
              routes:
                - receiver: email-notification
                  match_re: ## regex match; plain "match" compares the value literally
                    severity: "warning|critical"
                  continue: true
            receivers:
              - name: email-notification
                email_configs:
                  - to: '<recipient-email-address>'
                    from: '<sender-email-address>'
                    smarthost: 'smtp.gmail.com:587'
                    auth_username: '<sender-email-address>'
                    auth_password: '<auth-token>'
                    send_resolved: true
                    headers:
                      subject: |
                        [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.cluster }} - {{ .CommonLabels.alertname }}
                    html: |
                      <html>
                        <head>
                          <title>Alert!</title>
                        </head>
                        <body>
                          {{ range .Alerts.Firing }}
                          <ul>
                            <li>
                              <b>Alert Name:</b> {{ .Labels.alertname }}
                            </li>
                            <li>
                              <b>Severity:</b> {{ if eq .Labels.severity "critical" }}<b style="color:red;">CRITICAL</b>{{ else if eq .Labels.severity "warning" }}<b style="color:orange;">WARNING</b>{{ else }}<b>{{ .Labels.severity | toUpper }}</b>{{ end }}
                            </li>
                            <li>
                              <b>Summary:-</b> {{ .Annotations.summary }}
                            </li>
                            <li>
                              <b>Cluster:-</b> {{ .Labels.cluster }}
                            </li>
                            <li>
                              <b>Details:</b>
                              <p style="margin-left: 20px;">
                                {{ .Annotations.description | replace "\n" "<br>" }}
                              </p>
                            </li>
                          </ul><br>
                          {{ end }}
                        </body>
                      </html>
Note: Follow this article to set up the SMTP server for Gmail.
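Before wiring the SMTP credentials into Alertmanager, it can help to confirm that they work on their own. The following is a minimal sketch using Python's standard `smtplib`; it assumes the same STARTTLS endpoint as the `smarthost` above, and the helper names and addresses are placeholders. Nothing is sent unless `send_test_message` is called explicitly.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Compose a plain-text message for checking the SMTP credentials."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "[TEST] Alertmanager SMTP check"
    msg.set_content("If you can read this, the SMTP credentials are valid.")
    return msg


def send_test_message(msg: EmailMessage, smarthost: str, username: str, password: str) -> None:
    """Send the message through a STARTTLS smarthost given as 'host:port'."""
    host, port = smarthost.rsplit(":", 1)
    with smtplib.SMTP(host, int(port), timeout=30) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)


if __name__ == "__main__":
    # Placeholder addresses; replace them and call send_test_message(...) to send for real.
    message = build_test_message("sender@example.com", "recipient@example.com")
    print(message["Subject"])
```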