...

Background of the problem:

  1. During the March 2019 peak traffic, PMIDC raised multiple issues related to long service response times and a few errors (caused by a combination of the asynchronous nature of our APIs and the UI not handling it well).
  2. DB CPU utilisation was being maxed out by long-running queries; alerts for this were received via AWS RDS SNS.

...

  1. Long-running queries were analyzed using Postgres tools and fixed by adding the necessary indices, first on UAT and then on PROD.
  2. Further, modules were analyzed and indices were added on all commonly searched columns.
  3. Monitoring was increased in an attempt to stay on top of such situations.
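
The analysis and indexing steps above could be sketched roughly as follows. This is a minimal illustration, not the procedure that was actually run: the source does not say which Postgres tools were used, so the use of the `pg_stat_activity` view is an assumption, and the helper names, the 30-second threshold, and the table and column names are all hypothetical.

```python
# Hypothetical helpers that only build SQL text; they do not connect to a database.
# Assumption: long-running queries are found via the pg_stat_activity view.

def long_running_query_sql(min_seconds: int = 30) -> str:
    """Return SQL listing active queries that have run longer than min_seconds."""
    if min_seconds < 0:
        raise ValueError("min_seconds must be non-negative")
    return (
        "SELECT pid, now() - query_start AS duration, state, query "
        "FROM pg_stat_activity "
        "WHERE state = 'active' "
        f"AND now() - query_start > interval '{int(min_seconds)} seconds' "
        "ORDER BY duration DESC;"
    )


def create_index_sql(table: str, column: str) -> str:
    """Return DDL for a single-column index on a commonly searched column.

    CONCURRENTLY builds the index without blocking writes on the table,
    which matters when the change is applied to a busy PROD database.
    """
    for name in (table, column):
        # Reject anything that is not a plain identifier to avoid SQL injection.
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name!r}")
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_{table}_{column} "
        f"ON {table} ({column});"
    )


if __name__ == "__main__":
    # Placeholder names for illustration only.
    print(long_running_query_sql(30))
    print(create_index_sql("bills", "consumer_code"))
```

The generated statements would be run on UAT first and, once the query plans are confirmed to use the new index, on PROD.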

Impact of the exercise:

      What went well:

  • API response times improved from ~20-40s to ~2s, which brought a significant performance improvement to each service and to the system as a whole; graphs attached below.
  • Random errors caused by slow asynchronous persistence of data were also resolved.
  • AWS RDS CPU utilisation no longer hit 100% and remained well within limits; graphs depicting max and average CPU usage attached below.

      What went wrong:

  • None 

...

...