eGov ERP DevOps

Punjab W&S PostgreSQL DB performance monitoring

Configuration [as of 9 Jul 2020]:

PMIDC AWS RDS: db.m4.xlarge, 186 GiB storage.

Master DB: punjab-prod-rds

Replication DB: punjab-prod-read-rds


Timestamps:

[Dec 26, 2019 at 5:13 PM]: from client

Good work, and thanks, Ramki and Manju.


[Dec 26, 2019 at 3:03 PM]: from egov

All of these seem to be old issues, from Dec 4th and Dec 9th. Since then we have carried out the performance improvement activities we discussed earlier, and on the 24th we carried out some further activities to improve it.
I hope the performance is good now.

We observed that production RDS health was good today, and CPU utilization is now in a good range. I think the DB performance tuning is working well, and we can now expect good application performance.

As per our call, Moga reported performance issues with collection. So far today, 342 receipts have been created successfully across both Water and Sewerage.
The average time taken for each collection is 1 to 2 sec, and the maximum is 3 sec. These statistics are from the server logs.
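A minimal sketch of how average and maximum collection times like these could be derived from server logs. The log format, the `took_ms` field, and the sample lines are assumptions made for illustration; the actual server log layout is not shown in this thread.

```python
import re

# Hypothetical log lines: the real server log format is not shown in this thread.
SAMPLE_LOG = """\
2019-12-26 10:01:12 INFO receipt created service=WATER took_ms=1200
2019-12-26 10:03:47 INFO receipt created service=SEWERAGE took_ms=1800
2019-12-26 10:05:02 INFO receipt created service=WATER took_ms=3000
"""

# Assumed pattern for the hypothetical format above.
TOOK_MS = re.compile(r"receipt created .*?took_ms=(\d+)")


def collection_stats(log_text):
    """Return receipt count, average and maximum creation time in seconds."""
    millis = [int(m.group(1)) for m in TOOK_MS.finditer(log_text)]
    if not millis:
        return {"count": 0, "avg_sec": None, "max_sec": None}
    return {
        "count": len(millis),
        "avg_sec": sum(millis) / len(millis) / 1000,
        "max_sec": max(millis) / 1000,
    }


stats = collection_stats(SAMPLE_LOG)
print(stats)
```

Running the same function over each day's log would make it easier to compare collection times before and after a tuning activity.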


[Dec 26, 2019 at 2:53 PM]: from client

In most of the Punjab ULBs, the Water & Sewerage application is running slowly, mostly in the following areas:

  • In many cases at the time of collection, whether by cash or cheque, the application takes a long time to generate the receipt after the Pay button is clicked. Sometimes the receipt is not generated at all, which results in missing numbers in the receipt number series in the receipt register.

  • In many ULBs, Water & Sewerage demand or bill generation for a block/batch takes more than 5 hours after the command is given, even for 100-200 bills.

Some attachments regarding the errors are included here.


[Dec 26, 2019 at 10:09 AM]: from egov

We carried out some DB performance activities on 24/12/2019.
Please talk to the ULB staff and share the screens/features that are performing poorly.


[Dec 24, 2019 at 4:50 PM]: from egov

As we discussed over the phone, the application is not performing so badly. When I logged in and ran some bulk searches and reports, I got the results and the application did not hang.
We will still keep analyzing the issue; it would be helpful if we could get a list of the screens that are slow or hanging.


[Dec 24, 2019 at 2:55 PM]: from client

The problem also needs to be looked into at the query level; as Manjunath informed us, many queries are taking hours to execute.
There are system queries, and these should be checked.


[Dec 24, 2019 at 1:14 PM]: from egov

There is always high DB CPU utilization on the production server, and because of this application performance is degrading. The cause found for the high DB CPU utilization is too many long runn
I have implemented a database log analyser and shared it with the team. I also explained it to the PMIDC team last week.
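As a complement to log analysis, long-running queries can also be spotted live from PostgreSQL's `pg_stat_activity` view. The sketch below is illustrative only: the 5-minute threshold, the helper name `long_running`, and the sample rows are assumptions, not values taken from this environment.

```python
from datetime import timedelta

# pg_stat_activity and these columns are standard PostgreSQL; run this with any client.
ACTIVE_QUERIES_SQL = """
SELECT pid, usename, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;
"""


def long_running(rows, threshold=timedelta(minutes=5)):
    """Keep rows whose runtime meets the threshold, longest first."""
    flagged = [row for row in rows if row["runtime"] >= threshold]
    return sorted(flagged, key=lambda row: row["runtime"], reverse=True)


# Made-up rows in the shape the query above would return.
sample_rows = [
    {"pid": 101, "runtime": timedelta(seconds=2), "query": "SELECT 1"},
    {"pid": 102, "runtime": timedelta(minutes=45), "query": "SELECT ... (placeholder)"},
    {"pid": 103, "runtime": timedelta(minutes=7), "query": "SELECT ... (placeholder)"},
]

for row in long_running(sample_rows):
    print(row["pid"], row["runtime"])
```

The filtering is kept in plain Python so the threshold can be adjusted without touching the SQL; the same condition could equally be pushed into the `WHERE` clause.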


[Dec 20, 2019, 5:52 PM]

In discussions with PMIDC and Manjunath, it has been identified that some of the W&S SQL queries are taking 45 minutes to execute.

Please plan to either resolve these or discuss with engineering.

I think from a product perspective we need to resolve these issues across the platform.

One serious impact of this is almost 100% DB CPU utilization for days on end. This is quite serious.


[Dec 19, 2019, 5:35 PM]

We have recently been encountering high CPU utilization problems on the Punjab W&S production database, and these are causing application performance issues.

To analyse the cause at the database level, we are enabling a log analyser called pgBadger, which produces a DB analysis report. It can be configured to generate reports on a daily or weekly basis until the issue is resolved.

Please find the details below for accessing reports:
pgBadger: http://13.127.230.211/pgbadger/


[Dec 16, 2019, 5:34 PM]

We have recently been observing performance issues in the Punjab water and sewerage production application. On investigation, these were found to be due to high load on the database. To solve this, we have come up with the following solutions to try out.

Plan Of Action:

  1. Shifting elastic index collection data push jobs from master db to slave db.

  2. AWS RDS slave db instance upgrade - it will depend upon the observations made with point #1

  3. Routing all select queries to the slave db - it will depend upon the observations made with point #2

All of this tuning will happen at the infrastructure level, except point #3, which has to be done at the application level.

Production will have to be taken down for an estimated period during these activities. As per the plan, we will start with #1: to check the performance, we will shift all elastic index collection data push jobs from the master db to the slave db. We will perform this today and need production downtime of at most 1 hour, between 9 PM and 10 PM.
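Point #3 in the plan above (routing select queries to the slave db) could be sketched at the application level roughly as follows. This is an illustrative assumption, not the application's actual data-access code: the `route` function and the table names are hypothetical, and the two instance names come from the configuration at the top of this page rather than from real connection endpoints.

```python
# Instance names from the configuration above; real connection endpoints are not shown here.
MASTER_DB = "punjab-prod-rds"
REPLICA_DB = "punjab-prod-read-rds"


def route(sql):
    """Return the database a statement should be sent to."""
    words = sql.split()
    if not words:
        return MASTER_DB
    normalized = " ".join(words).upper()
    is_select = words[0].upper() == "SELECT"
    # SELECT ... FOR UPDATE takes row locks, so it must run on the master.
    takes_locks = "FOR UPDATE" in normalized
    return REPLICA_DB if is_select and not takes_locks else MASTER_DB


# Table names below are hypothetical.
print(route("SELECT * FROM receipts WHERE id = 1"))
print(route("UPDATE receipts SET status = 'PAID' WHERE id = 1"))
```

The routing is deliberately conservative: anything that is not a plain SELECT stays on the master. Because a read replica can lag behind the master, reads that must see a just-committed write (for example, showing a receipt immediately after payment) may still need to go to the master.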


[Dec 16, 2019, 12:35 PM]

ALARM: "[ALERT] Punjab Prod RDS monitoring" in Asia Pacific (Mumbai)


[Dec 16, 2019, 9:55 PM]

  1. Shifting elastic index collection data push jobs from master db to slave db. [DONE]

[Dec 16, 2019 at 1:01 PM]: from egov

We observed that the collection data push job to the elastic index was taking a long time and caused load on the DB. The job has now finished, so system performance should be fine.


[10 Dec 2019]

The master and slave database configuration is already in place in the Punjab ERP production environment. Earlier, scheduler jobs were set to run from the slave db, but this was stopped because of a frequently occurring database vacuum problem.

[Dec 10, 2019 at 11:14 AM]

The application is slow because the RDS instance is running at maximum utilization.
CPU utilization: 93.08%
Current activity: 120 connections
This is due to long-running queries in the background. I have informed the team to check the database issue that is hampering the application.


[10 Dec 2019 10 AM]

Slow performance of W&S Application: Reported by client.

  1. We have been getting many concerns related to the performance of the W&S application. You are requested to kindly look into the matter on high priority.

  2. Along with this, there are UI issues throughout each application; please assign someone specifically for UI issues.

  3. I believe Satyam had a call and shared the ad hoc queries causing the problem. Do let us know if stopping the queries does not resolve the issue. As a practice, please run such queries during non-working hours.

  4. I hope we will not need to run queries once the new dashboard is in place. W&S being a billing and collection application, we are asked for data for specific periods, which is not easy to extract from the dashboard.

In response: from eGov

Along with this, we are working on further performance tuning. We will share the solution and plan soon.

 
