
Steps for setting up the environment and running the script files to get a fresh copy of the required Datamart CSV files.

Steps for setting up the environment for running the script 

(One Time Setup)

  1. Install kubectl
    Step 1: Go through the Kubernetes documentation to install and configure kubectl. The following links are useful:
    Kubernetes Installation Doc | Kubernetes Ubuntu Installation

After installing, run the command below to check the version installed on your system:

 kubectl version

Step 2: Install aws-iam-authenticator
https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html

Step 3: After installing, you need access to the environment cluster for the data you want.

  • Go to the $HOME/.kube folder

cd
cd .kube
  • Open the config file and replace its content with the environment cluster's config file. (The config file will be attached.)

gedit config
  • Paste the content of the provided config file into the open file and save it.
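
To verify that the new config works, and to list the pod names referenced in Table 1.2, you can run:

kubectl config current-context
kubectl get pods -n playground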

2. Exec into the pod

kubectl exec --stdin --tty playground-584d866dcc-cr5zf -n playground -- /bin/bash

(Replace the pod name depending on what data you want. Refer to Table 1.2 for more information.)

3. Install Python and check to see if it installed correctly

apt install python3.8
python3 --version

4. Install pip and check to see if it installed correctly

apt install python3-pip
pip3 --version

5. Install psycopg2 and Pandas

pip3 install psycopg2-binary pandas

Note: If this doesn't work, upgrade pip and then run the step 5 command again:

pip3 install --upgrade pip
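
To confirm that both libraries import correctly, a quick one-line check (it prints the installed pandas version) is:

python3 -c "import psycopg2, pandas; print(pandas.__version__)"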

Steps for running the script

(Every time you want a datamart with the latest data available in the pods)

  1. Send the Python script to the pod

tar cf - /home/priyanka/Desktop/mcollect.py | kubectl exec -i -n playground playground-584d866dcc-cr5zf -- tar xf - -C /tmp

Note: Replace the file path (/home/priyanka/Desktop/mcollect.py) with your own file path (/home/user_name/Desktop/script_name.py).

Note: Replace the pod name depending on what data you want. (Refer to Table 1.2 for more information on pod names.)
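
To confirm the script landed in the pod, you can list the target directory from outside the pod (pod name and path follow the examples above):

kubectl exec -n playground playground-584d866dcc-cr5zf -- ls /tmp/home/priyanka/Desktop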

2. Exec into the pod

kubectl exec --stdin --tty playground-584d866dcc-cr5zf -n playground -- /bin/bash

(Note: Replace the pod name depending on what data you want. Refer to Table 1.2 for more information.)

kubectl exec --stdin --tty <your_pod_name> -n playground -- /bin/bash


3. Move into the /tmp directory, then into the directory that mirrors your script's original path (tar preserves the path under /tmp)

cd /tmp
cd home/priyanka/Desktop

for example:

cd home/<your_username>/Desktop

4. List the files there

ls

(The Python script file should be listed here)

(Refer Table 1.1 for the list of script file names for each module)

5. Run the python script file

python3 ws.py

(The name of the Python script file changes depending on the module; refer to Table 1.1 for the list of script file names.)
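
The actual scripts are linked in Table 1.1; as a rough illustration of their shape only, the sketch below pulls rows from Postgres with psycopg2 and pandas and writes a CSV to /tmp. The host, credentials, and table name are placeholders, not the values used by the real module scripts.

import pandas as pd
import psycopg2

# Placeholder connection details; the real scripts use the
# environment-specific database for the pod they run in.
conn = psycopg2.connect(
    host="localhost",        # placeholder host
    port=5432,
    dbname="egov_db",        # hypothetical database name
    user="readonly_user",    # hypothetical user
    password="secret",       # hypothetical password
)

# Illustrative query only; each module script defines its own joins and columns.
df = pd.read_sql("SELECT * FROM eg_mcollect_challans", conn)  # hypothetical table
conn.close()

# Write the datamart CSV to /tmp so step 6 can copy it out with kubectl cp.
df.to_csv("/tmp/mcollectDatamart.csv", index=False)
print(f"Wrote {len(df)} rows to /tmp/mcollectDatamart.csv")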

6. Outside the pod shell, in your home directory, run this command to copy the CSV file(s) to your desired location

kubectl cp playground/playground-584d866dcc-cr5zf:/tmp/mcollectDatamart.csv /home/priyanka/Desktop/mcollectDatamart.csv

(Refer to Table 1.1 for the list of CSV file names for each module.)
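
The general form of the command is:

kubectl cp <namespace>/<pod_name>:<path_to_csv_in_pod> <local_destination_path>

For modules that produce two CSV files (W&S and TL, per Table 1.1), run the command once per file.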


7. The exported CSV file is ready to use.

Jupyter vs Excel for Data Analysis

Jupyter | Excel
Command-based; takes some time to get used to. | Easy to use thanks to the graphical user interface (GUI); learning formulas is fairly easy.
Requires the Python language for data analysis, hence a steeper learning curve. | Negligible previous knowledge is required.
Equipped to handle lots of data really quickly, with the bonus of easy access to databases like Postgres and MySQL where the actual data is stored. | Can only handle so much data; scalability becomes difficult and messy. More data = slower results.
Summary: Python is harder to learn because you have to download many packages and set up the correct development environment on your computer. However, it provides a big leg up when working with big data and creating repeatable, automatable analyses and in-depth visualizations. | Summary: Excel is best for small, one-time analyses or creating basic visualizations quickly. It is easy to become an intermediate user without much experience due to its GUI.

How to install and configure Jupyter to analyze the datamart

Watch this video

https://www.youtube.com/watch?v=Yg9AkozItTU

OR

Follow these steps ->

(One Time Setup)

  1. Install Python and check to see if it installed correctly

apt install python3.8
python3 --version


2. Install pip and check to see if it installed correctly

apt install python3-pip
pip3 --version

3. Install jupyter

pip3 install notebook
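
To confirm the install, check the version:

jupyter --version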

(Whenever you want to run Jupyter Notebook)

  1. To run Jupyter Notebook

jupyter notebook

2. To open a new notebook

New -> Python3 notebook

3. To open an existing notebook

Select File -> Open

Go to the directory where your sample notebook is.

Select that notebook (Ex: sample.ipynb)

[Screenshot: Opening an existing notebook]

[Screenshot: After being opened]
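
Once a notebook is open, a first cell for analyzing a datamart might look like the sketch below; the path assumes the CSV was copied to your Desktop as in step 6 (replace the username and file name for your module):

import pandas as pd

# Load the datamart CSV copied out of the pod in step 6 (path is an example).
df = pd.read_csv("/home/<your_username>/Desktop/mcollectDatamart.csv")
df.head()  # preview the first five rows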

Table 1.1 - File Names for each Module

Module Name | Script File Name (With Links) | Datamart CSV File Name(s)
PT          | pt.py                         | ptDatamart.csv
W&S         | ws.py                         | waterDatamart.csv, sewerageDatamart.csv
PGR         | pgr.py                        | pgrDatamart.csv
mCollect    | mcollect.py                   | mcollectDatamart.csv
TL          | tl.py                         | tlDatamart.csv, tlrenewDatamart.csv
Fire Noc    | fn.py                         | fnDatamart.csv
OBPS (Bpa)  | bpa.py                        | bpaDatamart.csv

Table 1.2 - Pod Names for each Module

Module Name | Pod Name                    | Description
PT          | playground-865db67c64-tfdrk | Punjab Prod Data in UAT Environment
W&S         | playground-584d866dcc-cr5zf | QA Data
PGR         | Local Data                  | Data Dump
mCollect    | playground-584d866dcc-cr5zf | QA Data
TL          | playground-584d866dcc-cr5zf | QA Data
Fire Noc    | playground-584d866dcc-cr5zf | QA Data
OBPS (Bpa)  | playground-584d866dcc-cr5zf | QA Data
