/
Configuration

Configuration

 

Latest Monitoring Chart is available here.

# Clone git repository git clone https://github.com/egovernments/DIGIT-DevOps.git cd DIGIT-DevOps # Checkout to "digit-lts-monitoring" branch git checkout digit-lts-monitoring

Helmfile

Update the environments as required with their relevant file-paths of environment & secrets file and the namespace to be used.

In below config "demo" is the environment with default namespace being set & environment files being provided.

# deploy-as-code/helm/charts/monitoring/monitoring-helmfile.yaml environments: demo: values: - namespace: monitoring - ../../environments/egov-demo.yaml - ../../environments/egov-demo-secrets.yaml

 

Environment Files

Grafana

  1. GitHub OAuth App Creation

    • Follow the GitHub OAuth app

    • Homepage URL
      https://<your_domain_name>

    • Authorization callback URL
      https://<your_domain_name>/monitoring/login/github

    • Generate Client ID & Client secret

  2. Update Client ID & Client secret in secrets config.

    # deploy-as-code/helm/environments/egov-demo-secrets.yaml cluster-configs: secrets: grafana: clientID: <OAuth-key> clientSecret: <OAuth-token>
  3. Update environment config to allow GitHub organization & teams specific role-based access

    # deploy-as-code/helm/environments/egov-demo.yaml grafana: github: allowed_organizations: ["<organization>"] role_attribute_path: contains(groups[*], '@<organization>/<team>') && 'Viewer'

Note: Valid roles are None, Viewer, Editor, Admin or GrafanaAdmin
Visit official documentation for more information Grafana GitHub OAuth

 

Loki Stack

Filesystem as a storage

# deploy-as-code/helm/environments/egov-demo.yaml loki: persistence: enabled: true accessModes: - ReadWriteOnce size: 10Gi serviceAccount: annotations: {} additionalConfigs: schema_config: configs: - from: 2020-10-24 store: boltdb-shipper object_store: filesystem ## local filesystem as storage schema: v11 index: prefix: index_ period: 24h storage_config: boltdb_shipper: active_index_directory: /data/loki/index cache_location: /data/loki/index_cache shared_store: filesystem ## local filesystem as storage cache_ttl: 24h filesystem: directory: /data/loki/chunks compactor: working_directory: /data/loki/boltdb-shipper-compactor shared_store: filesystem ## local filesystem as storage retention_enabled: true compaction_interval: 168h ## compaction in hours table_manager: retention_deletes_enabled: true retention_period: 168h ## retention in hours

 

s3 as storage

Caution: Use the sub claim instead of aud when setting up Web Identity (OIDC) IAM roles to ensure correct identity matching.

  1. Create AWS Web Identity (OIDC) IAM role with following policy.

    { "Version": "2012-10-17", "Statement": [ { "Sid": "AccessToLokiBucket", "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::<s3-bucket>", "arn:aws:s3:::<s3-bucket>/*" ] } ] }

     

  2. Update s3 details & role ARN in below config.

    # deploy-as-code/helm/environments/egov-demo.yaml loki: persistence: enabled: true accessModes: - ReadWriteOnce size: 10Gi serviceAccount: annotations: eks.amazonaws.com/role-arn: <s3-role-arn> ## AWS arn for s3 role additionalConfigs: schema_config: configs: - from: 2020-10-24 store: boltdb-shipper object_store: s3 ## AWS s3 as storage schema: v11 index: prefix: index_ period: 24h storage_config: boltdb_shipper: active_index_directory: /data/loki/index cache_location: /data/loki/index_cache shared_store: s3 ## AWS s3 as storage cache_ttl: 24h aws: s3: s3://<region>/<s3-bucket> ## s3 region & bucket compactor: working_directory: /data/loki/boltdb-shipper-compactor shared_store: s3 ## AWS s3 as storage retention_enabled: true compaction_interval: 168h ## compaction in hours table_manager: retention_deletes_enabled: true retention_period: 168h ## retention in hours

Note: Refer to official docs for detailed configuration

 

Prometheus

# deploy-as-code/helm/environments/egov-demo.yaml prometheus: externalLabels: cluster: <cluster-name> ## provide cluster name additionalScrapeConfigs: - job_name: 'nginx-ingress-metrics' static_configs: - targets: [ 'nginx-ingress-controller-metrics.egov:10254' ] - job_name: 'blackbox' metrics_path: /probe params: module: [ http_2xx ] static_configs: - targets: - <list of urls to be monitored> ## add all URLs to monitor relabel_configs: - source_labels: [ __address__ ] target_label: __param_target - source_labels: [ __param_target ] target_label: instance - target_label: __address__ replacement: prometheus-blackbox-exporter:9115 - job_name: 'blackbox_exporter' static_configs: - targets: [ 'prometheus-blackbox-exporter:9115' ]

Alerting

# deploy-as-code/helm/environments/egov-demo.yaml prometheus: alertmanager: enabled: true

Note: Enable Alertmanager present under Prometheus Operator

Slack Alerts

# deploy-as-code/helm/environments/egov-demo-secrets.yaml cluster-configs: secrets: alertmanager: config: global: slack_api_url: https://hooks.slack.com ## slack webhook URL resolve_timeout: 5m route: group_by: ['alertname'] group_wait: 30s group_interval: 5m repeat_interval: 10m routes: - receiver: slack-notification match: severity: "warning|critical" continue: true receivers: - name: slack-notification slack_configs: - channel: '<slack-channel>' ## slack channel send_resolved: true username: 'Alertmanager' title: | [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} text: | {{ range .Alerts -}} {{- "\n" -}} *Alert:* {{ .Annotations.summary }} {{ if .Labels.severity }}*Severity:* `{{ .Labels.severity }}`{{ end }} *Cluster:* {{ .Labels.cluster }} *Details:* {{ .Annotations.description }} {{ end }}

Note: Generate Slack Incoming Webhook & update slack_api_url under global config & slack-channel under receivers config.

Email Alerts

# deploy-as-code/helm/environments/egov-demo-secrets.yaml cluster-configs: secrets: alertmanager: config: global: resolve_timeout: 5m route: group_by: ['alertname'] group_wait: 30s group_interval: 5m repeat_interval: 10m routes: - receiver: email-notification match: severity: "warning|critical" continue: true receivers: - name: email-notification email_configs: - to: '<recepient-email-address>' ## reciever's email id from: '<sender-email-address>' ## sender's email id smarthost: 'smtp.gmail.com:587' ## "" Update SMPT auth_username: '<sender-email-address>' ## configuration auth_password: '<auth-token>' ## as per the provider "" send_resolved: true headers: subject: | [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.cluster }} - {{ .CommonLabels.alertname }} html: | <html> <head> <title>Alert!</title> </head> <body> {{ range .Alerts.Firing }} <ul> <li> <b>Alert Name:</b> {{ .CommonLabels.alertname }} </li> <li> <b>Severity:</b> {{ if eq .Labels.severity "critical" }}<b style="color:red;">CRITICAL</b>{{ else if eq .Labels.severity "warning" }}<b style="color:orange;">WARNING</b>{{ else }}<b>{{ .Labels.severity | toUpper }}</b>{{ end }} </li> <li> <b>Summary:-</b> {{ .Annotations.summary }} </li> <li> <b>Cluster:-</b> Cluster </li> <li> <b>Details:</b> <p style="margin-left: 20px;"> {{ .Annotations.description | replace "\n" "<br>" }} </p> </li> </ul><br> {{ end }} </body></html>

Note: Follow this article in order to setup SMTP server for Gmails

 


 

Related content