...
- Long-running queries were analyzed using Postgres tools and fixed by adding the necessary indices, first on UAT and then on PROD.
- Further, modules were analyzed and indices were added on all commonly searched columns.
- Monitoring was increased in an attempt to stay on top of such situations.
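
The analysis and indexing steps above can be sketched roughly as follows. This is an illustrative sketch only: the notes do not say which Postgres tools were used, so the `pg_stat_statements` extension is an assumption here, and the table and column names (`orders`, `customer_id`) and both helper functions are hypothetical. The helpers only build SQL strings; running them against UAT or PROD is left to whatever client is already in use.

```python
import re

# Conservative pattern for unquoted Postgres identifiers (lowercase only),
# used to keep arbitrary text out of the generated SQL.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def slow_queries_sql(limit: int = 10) -> str:
    """Build a query listing the slowest statements by mean execution time.

    Assumes the pg_stat_statements extension is enabled. The column is named
    mean_exec_time on PostgreSQL 13+ (older versions call it mean_time).
    """
    return (
        "SELECT query, calls, mean_exec_time "
        "FROM pg_stat_statements "
        "ORDER BY mean_exec_time DESC "
        f"LIMIT {int(limit)};"
    )


def create_index_sql(table: str, column: str) -> str:
    """Build a CREATE INDEX statement for one commonly searched column.

    CONCURRENTLY avoids blocking writes while the index builds, but it
    cannot be run inside a transaction block.
    """
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    index_name = f"idx_{table}_{column}"
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name} "
        f"ON {table} ({column});"
    )


if __name__ == "__main__":
    # Hypothetical example: inspect slow queries, then index a hot column.
    print(slow_queries_sql(5))
    print(create_index_sql("orders", "customer_id"))
```

Prefixing a slow statement with `EXPLAIN (ANALYZE, BUFFERS)` before and after adding an index is one way to confirm that the planner actually uses it.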
Impact of the exercise:
What went well:
- DB query execution time improved, bringing API response times down from ~20-40s to ~2s. This resulted in a significant performance improvement for every service and for the system as a whole (graphs attached below).
- Random errors that occurred due to slow asynchronous persistence of data were also resolved.
- AWS RDS CPU utilisation no longer hit 100% and remained well within limits (graphs depicting max and average CPU usage attached below).
What went wrong:
- None
...