The latest monitoring chart is available here.

# Clone git repository
git clone https://github.com/egovernments/DIGIT-DevOps.git
cd DIGIT-DevOps

# Check out the "digit-lts-monitoring" branch
git checkout digit-lts-monitoring
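
helmfile's apply and diff commands rely on the helm-diff plugin; if it is not already installed (an assumption about your setup), it can be added with:

# Install the helm-diff plugin used by "helmfile apply" / "helmfile diff"
helm plugin install https://github.com/databus23/helm-diff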

Helmfile

Update the environments section as required with the file paths of the environment and secrets files, and the namespace to be used.

In the config below, "demo" is the environment, with its namespace set to monitoring and its environment files listed. Once the entries are in place, the stack can be deployed with helmfile (see the sketch after the config).

# deploy-as-code/helm/charts/monitoring/monitoring-helmfile.yaml
environments:
  demo:
    values:
      - namespace: monitoring
      - ../../environments/egov-demo.yaml
      - ../../environments/egov-demo-secrets.yaml
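
With the environment entries in place, the monitoring stack can be deployed with the helmfile CLI; a minimal sketch, run from the repository root and assuming the current kube-context points at the target cluster:

# Deploy the monitoring stack for the "demo" environment
helmfile -f deploy-as-code/helm/charts/monitoring/monitoring-helmfile.yaml -e demo apply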

Environment Files

Grafana

  1. GitHub OAuth App Creation

    • Follow the GitHub OAuth app creation guide

    • Homepage URL
      https://<your_domain_name>

    • Authorization callback URL
      https://<your_domain_name>/monitoring/login/github

    • Generate Client ID & Client secret

  2. Update the Client ID & Client secret in the secrets config.

    # deploy-as-code/helm/environments/egov-demo-secrets.yaml
    cluster-configs:
      secrets:
        grafana:
          clientID: <OAuth-key>
          clientSecret: <OAuth-token>
  3. Update the environment config to allow GitHub organization- and team-specific role-based access

    # deploy-as-code/helm/environments/egov-demo.yaml
    grafana:
      github:
        allowed_organizations: ["<organization>"]
        role_attribute_path: contains(groups[*], '@<organization>/<team>') && 'Viewer'

Note: Valid roles are None, Viewer, Editor, Admin, and GrafanaAdmin.
Visit the official Grafana GitHub OAuth documentation for more information.
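
role_attribute_path is a JMESPath expression evaluated against the user's GitHub teams; a sketch of granting Admin to one team and Viewer to another, falling back to None (the team names are placeholders):

# deploy-as-code/helm/environments/egov-demo.yaml
grafana:
  github:
    allowed_organizations: ["<organization>"]
    role_attribute_path: >-
      contains(groups[*], '@<organization>/<admin-team>') && 'Admin' ||
      contains(groups[*], '@<organization>/<team>') && 'Viewer' || 'None'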

Loki Stack

Filesystem as storage

# deploy-as-code/helm/environments/egov-demo.yaml
loki:
  persistence:
    enabled: true
    accessModes:
      - ReadWriteOnce
    size: 10Gi
  serviceAccount:
    annotations: {}
  additionalConfigs:
    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: filesystem                 ## local filesystem as storage
          schema: v11
          index:
            prefix: index_
            period: 24h
    storage_config:
      boltdb_shipper:
        active_index_directory: /data/loki/index
        cache_location: /data/loki/index_cache
        shared_store: filesystem                   ## local filesystem as storage
        cache_ttl: 24h
      filesystem:
        directory: /data/loki/chunks
    compactor:
      working_directory: /data/loki/boltdb-shipper-compactor
      shared_store: filesystem                     ## local filesystem as storage
      retention_enabled: true
      compaction_interval: 168h                    ## compaction in hours
    table_manager:
      retention_deletes_enabled: true
      retention_period: 168h                       ## retention in hours
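
Once the stack is deployed, log ingestion can be sanity-checked against Loki's HTTP API; a sketch, assuming the Loki service is named loki and listens on port 3100 (adjust to your release):

# Port-forward the Loki service (service name/port are assumptions)
kubectl -n monitoring port-forward svc/loki 3100:3100

# List the labels Loki has indexed to confirm logs are being ingested
curl -s http://localhost:3100/loki/api/v1/labels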

S3 as storage

Caution: Use the sub claim instead of aud when setting up Web Identity (OIDC) IAM roles to ensure correct identity matching.
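
For reference, a trust-policy sketch that pins the role to the Loki service account through the sub claim (the account ID, OIDC provider ID, and service-account name are placeholders/assumptions for your cluster):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::<account-id>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc-id>"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.<region>.amazonaws.com/id/<oidc-id>:sub": "system:serviceaccount:monitoring:loki"
                }
            }
        }
    ]
}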

  1. Create an AWS Web Identity (OIDC) IAM role with the following policy.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AccessToLokiBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::<s3-bucket>",
                    "arn:aws:s3:::<s3-bucket>/*"
                ]
            }
        ]
    }
    

  2. Update the S3 details & role ARN in the config below.

    # deploy-as-code/helm/environments/egov-demo.yaml
    loki:
      persistence:
        enabled: true
        accessModes:
          - ReadWriteOnce
        size: 10Gi
      serviceAccount:
        annotations:
          eks.amazonaws.com/role-arn: <s3-role-arn>    ## AWS arn for s3 role 
      additionalConfigs:
        schema_config:
          configs:
            - from: 2020-10-24
              store: boltdb-shipper
              object_store: s3                         ## AWS s3 as storage
              schema: v11
              index:
                prefix: index_
                period: 24h
        storage_config:
          boltdb_shipper:
            active_index_directory: /data/loki/index
            cache_location: /data/loki/index_cache
            shared_store: s3                           ## AWS s3 as storage
            cache_ttl: 24h
          aws:
            s3: s3://<region>/<s3-bucket>              ## s3 region & bucket
        compactor:
          working_directory: /data/loki/boltdb-shipper-compactor
          shared_store: s3                             ## AWS s3 as storage
          retention_enabled: true
          compaction_interval: 168h                    ## compaction in hours
        table_manager:
          retention_deletes_enabled: true
          retention_period: 168h                       ## retention in hours
    

Note: Refer to the official Loki docs for detailed configuration.

Prometheus

# deploy-as-code/helm/environments/egov-demo.yaml
prometheus:
  externalLabels:
    cluster: <cluster-name>                          ## provide cluster name    
  additionalScrapeConfigs:
    - job_name: 'nginx-ingress-metrics'
      static_configs:
        - targets: [ 'nginx-ingress-controller-metrics.egov:10254' ]
    - job_name: 'blackbox'
      metrics_path: /probe
      params:
        module: [ http_2xx ]
      static_configs:
        - targets:
            - <list of urls to be monitored>         ## add all URLs to monitor
      relabel_configs:
        - source_labels: [ __address__ ]
          target_label: __param_target
        - source_labels: [ __param_target ]
          target_label: instance
        - target_label: __address__
          replacement: prometheus-blackbox-exporter:9115
    - job_name: 'blackbox_exporter'
      static_configs:
        - targets: [ 'prometheus-blackbox-exporter:9115' ]
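
Once applied, the blackbox probes can be verified from the Prometheus UI; a sketch, assuming the operator-created service is named prometheus-operated (adjust to your release):

# Port-forward the Prometheus UI (service name is an assumption)
kubectl -n monitoring port-forward svc/prometheus-operated 9090:9090

# Example PromQL: probed URLs currently reachable
probe_success{job="blackbox"} == 1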

Alerting

# deploy-as-code/helm/environments/egov-demo.yaml
prometheus:
  alertmanager:
    enabled: true

Note: This enables the Alertmanager bundled with the Prometheus Operator.
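
Custom alerts can be defined as PrometheusRule resources picked up by the operator; a minimal sketch that fires when a blackbox-probed URL is down (extra labels may be needed to match the operator's ruleSelector, which depends on your chart values):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-probe-alerts
  namespace: monitoring
spec:
  groups:
    - name: blackbox.rules
      rules:
        - alert: EndpointDown
          expr: probe_success == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Endpoint {{ $labels.instance }} is down"
            description: "Blackbox probe for {{ $labels.instance }} has been failing for 5 minutes."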

Slack Alerts

# deploy-as-code/helm/environments/egov-demo-secrets.yaml
cluster-configs:
  secrets:
    alertmanager:
      config:
        global:
          slack_api_url: https://hooks.slack.com     ## slack webhook URL
          resolve_timeout: 5m
        route:
          group_by: ['alertname']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 10m
          routes:
          - receiver: slack-notification
            match_re:
              severity: "warning|critical"
            continue: true
        receivers:
        - name: slack-notification
          slack_configs:
            - channel: '<slack-channel>'             ## slack channel
              send_resolved: true
              username: 'Alertmanager'
              title: |
                  [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }}
              text: |
                  {{ range .Alerts -}}
                  {{- "\n" -}}
                  *Alert:* {{ .Annotations.summary }}
                  {{ if .Labels.severity }}*Severity:* `{{ .Labels.severity }}`{{ end }}
                  *Cluster:* {{ .Labels.cluster }}
                  *Details:*
                  {{ .Annotations.description }}
                  {{ end }}

Note: Generate a Slack Incoming Webhook and update slack_api_url under the global config and the Slack channel under the receivers config.
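
The webhook itself can be tested independently of Alertmanager with curl; a sketch, with the webhook path left as a placeholder:

# Send a test message to the Slack Incoming Webhook
curl -X POST -H 'Content-type: application/json' \
  --data '{"text": "Alertmanager webhook test"}' \
  https://hooks.slack.com/services/<webhook-path>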

Email Alerts

# deploy-as-code/helm/environments/egov-demo-secrets.yaml
cluster-configs:
  secrets:
    alertmanager:
      config:
        global:
          resolve_timeout: 5m
        route:
          group_by: ['alertname']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 10m
          routes:
          - receiver: email-notification
            match_re:
              severity: "warning|critical"
            continue: true
        receivers:
        - name: email-notification
          email_configs:
            - to: '<recipient-email-address>'             ##  recipient's email id
              from: '<sender-email-address>'              ##  sender's email id
              smarthost: 'smtp.gmail.com:587'             ##  update the SMTP
              auth_username: '<sender-email-address>'     ##  configuration as per
              auth_password: '<auth-token>'               ##  the email provider
              send_resolved: true
              headers:
                subject: |
                  [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.cluster }} - {{ .CommonLabels.alertname }}
              html: |
                <html>
                <head>
                <title>Alert!</title>
                </head>
                <body>
                {{ range .Alerts.Firing }}
                <ul>
                <li> <b>Alert Name:</b> {{ .Labels.alertname }} </li>
                <li> <b>Severity:</b> {{ if eq .Labels.severity "critical" }}<b style="color:red;">CRITICAL</b>{{ else if eq .Labels.severity "warning" }}<b style="color:orange;">WARNING</b>{{ else }}<b>{{ .Labels.severity | toUpper }}</b>{{ end }} </li>
                <li> <b>Summary:</b> {{ .Annotations.summary }} </li>
                <li> <b>Cluster:</b> {{ .Labels.cluster }} </li>
                <li> <b>Details:</b>
                  <p style="margin-left: 20px;"> {{ .Annotations.description | reReplaceAll "\n" "<br>" }} </p>
                </li>
                </ul><br>
                {{ end }}
                </body></html>

Note: Follow this article to set up an SMTP server for Gmail.
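
Before rolling the configuration out, it can be validated with amtool (shipped with Alertmanager); a sketch, assuming the rendered config has been saved locally as alertmanager.yaml:

# Validate the Alertmanager configuration syntax and routes
amtool check-config alertmanager.yaml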

