Database Dump and Restore Approaches
Problem Statement : To take a database dump from one host and restore it to another host. Both the hosts may or may not be in the same cluster environment. An example is to take the database dump from the PROD environment and restore it to the QA environment.
Database Dump Approaches
Approach 1 : Connect to Pod, Take Dump and Copy to Local System
Prerequisites
Cluster environment should be configured on the Local System.
Database Credentials
Steps
Connect to Cluster and then Pod.
kubectl config use-context <context name>
kubectl exec -it <pod name> -- sh
Take dump.
pg_dump -Fp --no-acl --no-owner --no-privileges -h <host> -U <user> -W <database> > backup.sql
Copy dump to the local environment.
kubectl cp pod_name:/backup.sql.gz local-destination_path -n playground
Challenges
The Pod may get evicted while taking the dump.
Reasons
Since the dump is being taken after connecting to the pod, the pod should have enough space to store the dump.
If the pod is low on resources (as would be the case for a dump which is several GBs for example), it will simply get evicted.
Possible Solution
Taking compressed dump is one option, as this will take less space.
pg_dump -Fp --no-acl --no-owner --no-privileges -h <host> -U <user> -W <database> | gzip > backup.sql.gz
However, if the pod gets evicted in this case also, because of low resources, one should try other approaches as mentioned below.
Approach 2 : Connect to AWS RDS from Local System and Take Dump
Prerequisites
AWS CLI access
Connection to the DB Instance should be authorized
Database Credentials
Steps
Connect to the AWS RDS using AWS CLI to take the dump on Local System
pg_dump -Fp --no-acl --no-owner --no-privileges -h <host> -U <user> -W <database> > backup.sql
Challenges
The following error may be faced : Could not connect to server: Connection timed out
Reasons
The Database Instance may not be publicly accessible.
Connection from the device may not be authorized by the DB Instance’s Security Group.
Solution
Contact the Admin with Console Access to make this Database Instance publicily accessible /add your IP to the security group.
Approach 3 : Create/ Locate an existing DB Snapshot through AWS CLI
Prerequisites
AWS CLI access
Steps
A DB snapshot can be created using
aws rds create-db-snapshot --db-instance-identifier mydbinstance --db-snapshot-identifier
mydbsnapshot
The DB snapshot may be existing from before and can be located through
aws rds describe-db-snapshots
If the DB instance is known, the search can be narrowed down using :
aws rds describe-db-snapshots --db-instance-identifier db-instance-name
Approach 4 : Connect to Cluster, and Take Dump directly into Local System
Prerequisites
Cluster environment should be configured on the Local System.
Database Credentials
Steps
Connect to Cluster.
kubectl config use-context <context name>
Take dump from Pod directly to Local using the following command :
kubectl exec -it <podName> -n playground -- sh -c "export PGPASSWORD="<password>"; pg_dump -
h <hostName> -U <userName> -d <dbName>" > <dumpFileName>.sql
Dump Restore Approaches
If a .sql file is created on local system using one of the approaches (Approach 1,2 or 4), the dump restore can be done as follows
Approach 1 : Restore using pg_restore
Prerequisites
Cluster environment should be configured on the Local System.
Database Credentials
Postgres client version should be equal to or higher than the Postgres server version, for restore to work.
Steps
Connect to Cluster, where the dump needs to be restored.
kubectl config use-context <context name>
Restore dump to database host directly from the local dump as follows
kubectl exec -it <podName> -n playground -- sh -c "export PGPASSWORD="<password>"; pg_restore - h <hostName> -U <userName> -d <dbName> " < <dumpFileName>.sql
Challenges
A connection error may be thrown if the destination environment is not accessible.
Possible Solution
The Admin with Console Access can be requested to make the environment accessible for sometime, till the dump restore is over. Once the dump restore is done, the environment can be moved to its previous VPC settings.
However, if a DB snapshot was created as part of Approach 3, the snapshot can be restored as follows
Approach 2 : Restore DB Snapshot through AWS CLI
Prerequisites
AWS CLI access
Steps
The snapshot can be restored as follows :
aws rds restore-db-instance-from-db-snapshot --db-instance-identifier mynewdbinstance --db-
snapshot-identifier mydbsnapshot
Challenges
If a snapshot from one environment (says Prod) is to be restored to another environment (say QA), it must
be made available in the latter environment.
Possible Solution
Get the account ID of the destination environment using the following command :
aws sts get-caller-identity
The snapshot can then be added to the destination environment using
aws rds modify-db-snapshot-attribute --db-snapshot-identifier manual-snapshot --attribute-name
restore --values-to-add '["Account ID"]'
For this to work, the snapshot should be a manual snapshot. If it is an automated snapshot, it can copied
as a manual snapshot as follows :
aws rds copy-db-snapshot --source-db-snapshot-identifier automated-snapshot-name --target-db-
snapshot-identifier mydbsnapshotcopy --copy-tags
If the command for modifying db snapshot attribute throws an Invalid Customer ID error, contact Admin
with Console Access to copy the snapshot from one environment to another.