Steps for setting up the environment and running the script file to get a fresh copy of the required Datamart CSV file.
Steps for setting up the environment for running the script
(One Time Setup)
Install kubectl
Step 1: Go through the Kubernetes documentation to install and configure kubectl. The following links are useful:
Kubernetes Installation Doc
Kubernetes Ubuntu Installation
After installing, run the command below to check the version installed on your system:
Code Block |
---|
kubectl version |
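The same check can be scripted. The sketch below is a minimal example, assuming kubectl is expected on your PATH; the function name `kubectl_client_version` is made up for illustration. It uses `kubectl version --client`, which does not need a cluster connection, so it works before the config file is in place.

```python
import shutil
import subprocess


def kubectl_client_version():
    """Return kubectl's client version text, or None if kubectl is not on PATH."""
    if shutil.which("kubectl") is None:
        return None
    # --client skips the server lookup, so no cluster config is required yet.
    result = subprocess.run(
        ["kubectl", "version", "--client"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() or result.stderr.strip()
```

Calling `kubectl_client_version()` returns `None` when kubectl is not installed, which is a quick way to tell an installation problem apart from a cluster configuration problem.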
...
Go to the $HOME/.kube folder
Code Block |
---|
cd $HOME/.kube |
Open the config file and replace its content with the environment cluster config file. (The config file will be attached.)
Code Block |
---|
gedit config |
Copy the content of the provided config file, paste it into the opened config file, and save the file.
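If you prefer not to edit the file by hand, the replacement can be sketched in Python as below. This is only an illustration: the function name `install_kubeconfig` and the backup name `config.bak` are assumptions, not part of the original procedure.

```python
import shutil
from pathlib import Path


def install_kubeconfig(provided, kube_dir=None):
    """Copy the provided cluster config to <kube_dir>/config, backing up any existing file."""
    kube_dir = Path(kube_dir) if kube_dir else Path.home() / ".kube"
    kube_dir.mkdir(parents=True, exist_ok=True)
    target = kube_dir / "config"
    if target.exists():
        # Keep the previous config so it can be restored if needed.
        shutil.copy2(target, kube_dir / "config.bak")
    shutil.copy2(provided, target)
    return target
```

For example, `install_kubeconfig("/path/to/provided-config")` overwrites `$HOME/.kube/config` and leaves the old content in `$HOME/.kube/config.bak`.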
...
Code Block |
---|
kubectl exec --stdin --tty playground-584d866dcc-cr5zf -n playground -- /bin/bash |
(Replace the pod name depending on which data you want.
Refer to Table 1.2 for more information.)
3. Install Python and check to see if it installed correctly
Code Block |
---|
apt install python3.8 |
python3 --version |
4. Install pip and check to see if it installed correctly
Code Block |
---|
apt install python3-pip |
pip3 --version |
5. Install psycopg2, pandas, and requests
Code Block |
---|
apt-get update |
pip3 install psycopg2-binary pandas requests |
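To confirm that the packages are importable before running the script, a check along these lines can be run inside the pod. This is a minimal sketch; the helper name `missing_modules` is hypothetical. Note that the `psycopg2-binary` package is imported as `psycopg2`.

```python
import importlib.util

# Import names of the packages installed in step 5.
REQUIRED = ("psycopg2", "pandas", "requests")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means all three packages are available to the interpreter you ran it with.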
Note: If this doesn’t work, try this command
...
and then run the command from step 5 again.
Steps for running the script
(Every time you want a datamart with the latest data available in the pods)
1. Send the Python script to the pod
Code Block |
---|
tar cf - /home/priyanka/Desktop/mcollect.py | kubectl exec -i -n playground playground-584d866dcc-cr5zf -- tar xf - -C /tmp |
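With GNU tar, the command above stores the file under its full path minus the leading slash, so it is extracted in the pod as /tmp/home/priyanka/Desktop/mcollect.py rather than /tmp/mcollect.py. The sketch below shows one possible Python equivalent that stores only the bare file name. The function names are hypothetical, and the default namespace and destination are taken from the command above.

```python
import io
import subprocess
import tarfile
from pathlib import Path


def build_tar_bytes(script_path):
    """Pack one file into an in-memory tar archive, stored under its bare file name."""
    script_path = Path(script_path)
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        # arcname drops the directory part, so the file extracts as <dest>/<name>.
        archive.add(script_path, arcname=script_path.name)
    return buffer.getvalue()


def send_script_to_pod(script_path, pod, namespace="playground", dest="/tmp"):
    """Stream the archive to `tar xf -` inside the pod via kubectl exec."""
    command = ["kubectl", "exec", "-i", "-n", namespace, pod,
               "--", "tar", "xf", "-", "-C", dest]
    subprocess.run(command, input=build_tar_bytes(script_path), check=True)
```

For example, `send_script_to_pod("/home/priyanka/Desktop/mcollect.py", "playground-584d866dcc-cr5zf")` would place the script at /tmp/mcollect.py in the pod, assuming kubectl is configured and tar is available in the container.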
...
7. The reported CSV file is ready to use.
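Before sharing the file, it can help to confirm that it has a header row and a plausible number of records. The following is a small sketch using only the standard library; the function name `summarize_csv` is hypothetical, and it assumes the CSV is UTF-8 encoded with the column names in the first row.

```python
import csv


def summarize_csv(path):
    """Return (header, data_row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count
```

For example, `summarize_csv("mcollectDatamart.csv")` returns the column names and the number of data rows, which makes an empty or truncated export easy to spot.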
Jupyter vs Excel for Data Analysis
Jupyter | Excel |
Jupyter is command-based and takes some time to get used to. | Easy to use thanks to the graphical user interface (GUI). Learning formulas is fairly easy. |
Jupyter requires Python for data analysis, hence a steeper learning curve. | Little to no previous knowledge is required. |
Handles large amounts of data quickly, with the added benefit of easy access to databases such as Postgres and MySQL where the actual data is stored. | Excel can only handle a limited amount of data. Scaling becomes difficult and messy: more data means slower results. |
Summary: Python is harder to learn because you have to install several packages and set up the correct development environment on your computer. However, it gives a big leg up when working with big data and creating repeatable, automatable analyses and in-depth visualizations. | Summary: Excel is best for small, one-time analyses or for creating basic visualizations quickly. Its GUI makes it relatively easy to become an intermediate user without much experience. |
How to install and configure Jupyter to analyze the datamart
Watch this video
https://www.youtube.com/watch?v=Yg9AkozItTU
OR
Follow these steps ->
(One Time Setup)
Install Python and check to see if it installed correctly
...
Code Block |
---|
pip3 install notebook |
(Whenever you want to run Jupyter Notebook)
To run Jupyter Notebook
Code Block |
---|
jupyter notebook |
...
Select the notebook (e.g., sample.ipynb)
Opening an existing notebook
...
After being opened
...
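Once a notebook is open, a first cell along these lines can load a datamart and give a quick overview of its columns. This is a sketch only: it assumes pandas is installed on the machine running Jupyter and that the CSV sits in the notebook's working directory. The helper names are hypothetical.

```python
import pandas as pd


def load_datamart(path):
    """Read a datamart CSV into a DataFrame."""
    return pd.read_csv(path)


def column_overview(frame):
    """Return one row per column with its dtype and number of missing values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
    })
```

In a notebook cell, `frame = load_datamart("mcollectDatamart.csv")` followed by `column_overview(frame)` shows which columns were parsed as numbers or text and where values are missing.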
Table 1.1 - File Names for each Module
Module Name | Script File Name (With Links) | Datamart CSV File Name |
PT | | ptDatamart.csv |
W&S | | waterDatamart.csv, sewerageDatamart.csv |
PGR | | pgrDatamart.csv |
mCollect | | mcollectDatamart.csv |
TL | | tlDatamart.csv, tlrenewDatamart.csv |
Fire Noc | | FNDatamart.csv |
OBPS (Bpa) | | bpaDatamart.csv |
FSM | | fsmDatamart.csv |
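The file names in Table 1.1 can be kept in a small lookup to check that every expected CSV for a module has been produced. The sketch below copies the names from the table; the helper name `missing_datamarts` and the idea of checking a single output directory are assumptions.

```python
from pathlib import Path

# Datamart CSV file names per module, copied from Table 1.1.
DATAMART_FILES = {
    "PT": ["ptDatamart.csv"],
    "W&S": ["waterDatamart.csv", "sewerageDatamart.csv"],
    "PGR": ["pgrDatamart.csv"],
    "mCollect": ["mcollectDatamart.csv"],
    "TL": ["tlDatamart.csv", "tlrenewDatamart.csv"],
    "Fire Noc": ["FNDatamart.csv"],
    "OBPS (Bpa)": ["bpaDatamart.csv"],
    "FSM": ["fsmDatamart.csv"],
}


def missing_datamarts(module, directory):
    """Return the expected CSV names for a module that are absent from directory."""
    directory = Path(directory)
    return [name for name in DATAMART_FILES[module]
            if not (directory / name).is_file()]
```

For modules with two files, such as W&S and TL, this makes it easy to notice when only one of the pair was generated.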
Table 1.2 - Pod Names for each Module
Module Name | Pod Name | Description |
PT | playground-865db67c64-tfdrk | Punjab Prod Data in UAT Environment |
W&S | playground-584d866dcc-cr5zf | QA Data |
PGR | Local Data | Data Dump |
mCollect | playground-584d866dcc-cr5zf | QA Data |
TL | playground-584d866dcc-cr5zf | QA Data |
Fire Noc | playground-584d866dcc-cr5zf | QA Data |
OBPS (Bpa) | playground-584d866dcc-cr5zf | QA Data |
FSM | playground-584d866dcc-cr5zf | QA Data |
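Table 1.2 can likewise be turned into a lookup that builds the `kubectl exec` command for a module. This is a sketch under stated assumptions: the pod names are copied from the table and may change when pods are recreated, the `playground` namespace is taken from the command shown earlier and is assumed to apply to every pod, and the function name `exec_shell_command` is hypothetical. PGR is left out because it uses a local data dump rather than a pod.

```python
# Pod names per module, copied from Table 1.2 (PGR uses local data, so it has no pod).
POD_BY_MODULE = {
    "PT": "playground-865db67c64-tfdrk",
    "W&S": "playground-584d866dcc-cr5zf",
    "mCollect": "playground-584d866dcc-cr5zf",
    "TL": "playground-584d866dcc-cr5zf",
    "Fire Noc": "playground-584d866dcc-cr5zf",
    "OBPS (Bpa)": "playground-584d866dcc-cr5zf",
    "FSM": "playground-584d866dcc-cr5zf",
}


def exec_shell_command(module, namespace="playground"):
    """Build the kubectl command that opens a bash shell in the module's pod."""
    pod = POD_BY_MODULE.get(module)
    if pod is None:
        raise ValueError(f"No pod listed for module {module!r}")
    return ["kubectl", "exec", "--stdin", "--tty", pod,
            "-n", namespace, "--", "/bin/bash"]
```

The returned list can be passed to `subprocess.run` or joined with spaces to paste into a terminal.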