Mar 13-17, 2023 Vikrant Patil
All notes are available online at https://notes.pipal.in/2023/arcesium_finop_jan/
Please login to https://engage.pipal.in/ and launch jupyter lab
For today create a notebook with name module3-day5
notebook names are case sensitive. Make sure you give correct name
© Pipal Academy LLP
we will make use of third party library pypdf2
import PyPDF2
!pip install pypdf2
Requirement already satisfied: pypdf2 in /home/vikrant/usr/local/default/lib/python3.10/site-packages (3.0.1)
pdffileurl = "https://posoco.in/download/16-07-20_nldc_psp/?wpdmdl=30215"
import requests
def download(url, filename):
resp = requests.get(url)
with open(filename, "wb") as f:
f.write(resp.content)
download(pdffileurl, "data.pdf")
with open("data.pdf","rb") as f:
pdfreader = PyPDF2.PdfReader(f)
for page in pdfreader.pages:
print(page.extract_text()[:100])
print("="*20)
National Load Despatch Centre राष्ट्रीय भार प्रेषण केंद्र POWER SYSTEM OPERATION CORPORATION LI ==================== NR WR SR ER NER TOTAL 59882 41115 34238 21526 2730 159491 1114 0 0 0 6 1120 1398 998 807 447 48 3698 ==================== 16-Jul-2020 Sl NoVoltage Level Line Details Circuit Max Import (MW) Max Export (MW) Import (MU) Exp ====================
We are interested in second page
with open("data.pdf","rb") as f: # open the file in read,binary mode
pdfreader = PyPDF2.PdfReader(f)
page = pdfreader.pages[1]
print(page.extract_text(page.extract_text()))
NR WR SR ER NER TOTAL 59882 41115 34238 21526 2730 159491 1114 0 0 0 6 1120 1398 998 807 447 48 3698 355 33 77 149 29 643 11 49 128 - - 187 39.60 16.60 41.59 4.60 0.03 102 12.6 0.0 0.0 0.0 0.0 12.6 65470 43593 38117 21535 2827 160654 22:20 10:29 10:00 21:20 19:41 21:26 Region FVI < 49.7 49.7 - 49.8 49.8 - 49.9 < 49.9 49.9 - 50.05 > 50.05 All India 0.057 0.16 1.81 13.19 15.16 76.52 8.32 Max.Demand Shortage during Energy Met Drawal OD(+)/UD(-) Max OD Energy Region States Met during the day(MW)maximum Demand(MW)(MU)Schedule (MU)(MU) (MW)Shortage (MU) Punjab 11090 0 237.9 146.8 -1.8 49 0.0 Haryana 9388 0 209.4 152.8 0.7 325 1.9 Rajasthan 12087 0 262.4 119.7 5.4 809 0.0 Delhi 5726 0 118.6 102.8 -1.4 228 0.0 NR UP 22873 0 448.9 208.5 2.0 546 0.4 Uttarakhand 1899 0 42.8 20.7 0.8 111 0.0 HP 1366 0 28.6 -2.6 -0.2 91 0.0 J&K(UT) & Ladakh(UT) 2177 544 43.1 20.3 0.4 502 10.3 Chandigarh 295 0 6.0 5.9 0.2 61 0.0 Chhattisgarh 3685 0 86.9 36.8 0.8 468 0.0 Gujarat 13478 0 286.2 87.6 4.0 527 0.0 MP 9547 0 214.7 113.8 -3.8 198 0.0 WR Maharashtra 16964 0 365.1 138.1 -1.9 457 0.0 Goa 405 0 8.5 8.2 -0.2 33 0.0 DD 246 0 5.3 5.3 0.0 19 0.0 DNH 614 0 14.0 13.8 0.2 44 0.0 AMNSIL 777 0 17.1 4.2 0.7 272 0.0 Andhra Pradesh 6439 0 141.0 45.6 -1.3 607 0.0 Telangana 8614 0 167.3 81.6 -2.5 385 0.0 SR Karnataka 8486 0 155.1 51.1 -3.4 650 0.0 Kerala 3077 0 65.2 46.1 0.5 179 0.0 Tamil Nadu 12371 0 271.3 125.9 -3.7 573 0.0 Puducherry 349 0 7.5 7.5 -0.1 35 0.0 Bihar 5740 0 111.5 106.0 -0.3 386 0.0 DVC 2989 0 62.7 -42.6 -0.7 206 0.0 Jharkhand 1438 0 26.3 18.5 -1.0 124 0.0 ER Odisha 3983 0 82.2 -0.2 -0.2 325 0.0 West Bengal 7917 0 162.6 47.2 -0.8 303 0.0 Sikkim 100 0 1.4 1.5 -0.1 17 0.0 Arunachal Pradesh 120 3 2.0 1.8 0.2 40 0.0 Assam 1759 23 30.0 27.1 -0.1 135 0.0 Manipur 183 1 2.6 2.3 0.3 37 0.0 NER Meghalaya 307 2 5.3 -1.3 0.3 52 0.0 Mizoram 89 1 1.5 1.2 0.0 13 0.0 Nagaland 140 2 2.2 2.3 -0.2 23 0.0 Tripura 298 7 4.9 5.9 0.7 66 0.0 Bhutan Nepal Bangladesh 53.3 -1.5 -19.1 2337.0 -271.3 -1110.0 NR WR SR ER NER TOTAL 352.1 -295.4 95.0 -145.8 -6.0 0.0 359.2 -293.7 84.6 -152.6 -3.4 -6.0 7.1 1.6 -10.5 -6.9 2.6 -6.0 NR WR SR ER NER TOTAL 3838 14847 11792 3445 677 34598 9289 23225 14423 4892 47 51876 13127 38072 26215 8337 723 86473 NR WR SR ER NER All India 546 1080 370 482 7 2486 25 13 14 0 0 52 355 33 77 149 29 643 26 33 47 0 0 106 40 82 19 0 22 163 71 73 210 5 0 359 1063 1314 737 636 58 3809 6.71 5.54 28.51 0.73 0.05 9.43 42.55 10.54 45.35 24.19 49.63 29.09 1.068 1.102 Based on State Max Demands Diversity factor = Sum of regional or state maximum demands / All India maximum demand *Source: RLDCs for solar connected to ISTS; SLDCs for embedded solar. Limited visibility of embedded solar data. Executive Director-NLDCShare of RES in total generation (%) Share of Non-fossil fuel (Hydro,Nuclear and RES) in total generation(%) H. All India Demand Diversity Factor Based on Regional Max DemandsLignite Hydro Nuclear Gas, Naptha & Diesel RES (Wind, Solar, Biomass & Others) TotalState Sector Total G. Sourcewise generation (MU) CoalActual(MU) O/D/U/D(MU) F. Generation Outage(MW) Central SectorDay Peak (MW) E. Import/Export by Regions (in MU) - Import(+ve)/Export(-ve); OD(+)/UD(-) Schedule(MU)D. Transnational Exchanges (MU) - Import(+ve)/Export(-ve) Actual (MU)Energy Shortage (MU) Maximum Demand Met During the Day (MW) (From NLDC SCADA) Time Of Maximum Demand Met (From NLDC SCADA) B. Frequency Profile (%) C. Power Supply Position in StatesDemand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs) Peak Shortage (MW) Energy Met (MU) Hydro Gen (MU) Wind Gen (MU) Solar Gen (MU)*Report for previous day Date of Reporting: 16-Jul-2020 A. Power Supply Position at All India and Regional level
def extract_data(text):
lines = text.split("\n")
data_text = lines[:10]
headers = data_text[0].split()
data = [line.strip().split() for line in data_text[1:]]
return headers, data
def extract_data_from_pdf(pdffile):
with open("data.pdf","rb") as f:
pdfreader = PyPDF2.PdfReader(f)
page = pdfreader.pages[1]
text = page.extract_text()
return extract_data(text)
h, d = extract_data_from_pdf("data.pdf")
h
['NR', 'WR', 'SR', 'ER', 'NER', 'TOTAL']
d
[['59882', '41115', '34238', '21526', '2730', '159491'], ['1114', '0', '0', '0', '6', '1120'], ['1398', '998', '807', '447', '48', '3698'], ['355', '33', '77', '149', '29', '643'], ['11', '49', '128', '-', '-', '187'], ['39.60', '16.60', '41.59', '4.60', '0.03', '102'], ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'], ['65470', '43593', '38117', '21535', '2827', '160654'], ['22:20', '10:29', '10:00', '21:20', '19:41', '21:26']]
import pandas as pd
pd.DataFrame([{"a":1,"b":2}, {"a":2, "b":3}])
| a | b | |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 3 |
dirlist = []
for row in d:
dirlist.append(dict(zip(h, row)))
dirlist
[{'NR': '59882',
'WR': '41115',
'SR': '34238',
'ER': '21526',
'NER': '2730',
'TOTAL': '159491'},
{'NR': '1114', 'WR': '0', 'SR': '0', 'ER': '0', 'NER': '6', 'TOTAL': '1120'},
{'NR': '1398',
'WR': '998',
'SR': '807',
'ER': '447',
'NER': '48',
'TOTAL': '3698'},
{'NR': '355',
'WR': '33',
'SR': '77',
'ER': '149',
'NER': '29',
'TOTAL': '643'},
{'NR': '11', 'WR': '49', 'SR': '128', 'ER': '-', 'NER': '-', 'TOTAL': '187'},
{'NR': '39.60',
'WR': '16.60',
'SR': '41.59',
'ER': '4.60',
'NER': '0.03',
'TOTAL': '102'},
{'NR': '12.6',
'WR': '0.0',
'SR': '0.0',
'ER': '0.0',
'NER': '0.0',
'TOTAL': '12.6'},
{'NR': '65470',
'WR': '43593',
'SR': '38117',
'ER': '21535',
'NER': '2827',
'TOTAL': '160654'},
{'NR': '22:20',
'WR': '10:29',
'SR': '10:00',
'ER': '21:20',
'NER': '19:41',
'TOTAL': '21:26'}]
pd.DataFrame(dirlist)
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| 0 | 59882 | 41115 | 34238 | 21526 | 2730 | 159491 |
| 1 | 1114 | 0 | 0 | 0 | 6 | 1120 |
| 2 | 1398 | 998 | 807 | 447 | 48 | 3698 |
| 3 | 355 | 33 | 77 | 149 | 29 | 643 |
| 4 | 11 | 49 | 128 | - | - | 187 |
| 5 | 39.60 | 16.60 | 41.59 | 4.60 | 0.03 | 102 |
| 6 | 12.6 | 0.0 | 0.0 | 0.0 | 0.0 | 12.6 |
| 7 | 65470 | 43593 | 38117 | 21535 | 2827 | 160654 |
| 8 | 22:20 | 10:29 | 10:00 | 21:20 | 19:41 | 21:26 |
rowlabels = """Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs)
Peak Shortage (MW)
Energy Met (MU)
Hydro Gen (MU)
Wind Gen (MU)
Solar Gen (MU)*
Energy Shortage (MU)
Maximum Demand Met During the Day (MW) (From NLDC SCADA)
Time Of Maximum Demand Met (From NLDC SCADA)""".split("\n")
pd.DataFrame([dict(zip(h, row)) for row in d], index=rowlabels)
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs) | 59882 | 41115 | 34238 | 21526 | 2730 | 159491 |
| Peak Shortage (MW) | 1114 | 0 | 0 | 0 | 6 | 1120 |
| Energy Met (MU) | 1398 | 998 | 807 | 447 | 48 | 3698 |
| Hydro Gen (MU) | 355 | 33 | 77 | 149 | 29 | 643 |
| Wind Gen (MU) | 11 | 49 | 128 | - | - | 187 |
| Solar Gen (MU)* | 39.60 | 16.60 | 41.59 | 4.60 | 0.03 | 102 |
| Energy Shortage (MU) | 12.6 | 0.0 | 0.0 | 0.0 | 0.0 | 12.6 |
| Maximum Demand Met During the Day (MW) (From NLDC SCADA) | 65470 | 43593 | 38117 | 21535 | 2827 | 160654 |
| Time Of Maximum Demand Met (From NLDC SCADA) | 22:20 | 10:29 | 10:00 | 21:20 | 19:41 | 21:26 |
def extract_data(text):
lines = text.split("\n")
data_text = lines[:10]
headers = data_text[0].split()
data = [line.strip().split() for line in data_text[1:]]
return headers, data
def extract_data_from_pdf(pdffile):
with open("data.pdf","rb") as f:
pdfreader = PyPDF2.PdfReader(f)
page = pdfreader.pages[1]
text = page.extract_text()
h, d = extract_data(text)
rowlabels = """Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs)
Peak Shortage (MW)
Energy Met (MU)
Hydro Gen (MU)
Wind Gen (MU)
Solar Gen (MU)*
Energy Shortage (MU)
Maximum Demand Met During the Day (MW) (From NLDC SCADA)
Time Of Maximum Demand Met (From NLDC SCADA)""".split("\n")
return pd.DataFrame([dict(zip(h, row)) for row in d], index=rowlabels)
extract_data_from_pdf("data.pdf")
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs) | 59882 | 41115 | 34238 | 21526 | 2730 | 159491 |
| Peak Shortage (MW) | 1114 | 0 | 0 | 0 | 6 | 1120 |
| Energy Met (MU) | 1398 | 998 | 807 | 447 | 48 | 3698 |
| Hydro Gen (MU) | 355 | 33 | 77 | 149 | 29 | 643 |
| Wind Gen (MU) | 11 | 49 | 128 | - | - | 187 |
| Solar Gen (MU)* | 39.60 | 16.60 | 41.59 | 4.60 | 0.03 | 102 |
| Energy Shortage (MU) | 12.6 | 0.0 | 0.0 | 0.0 | 0.0 | 12.6 |
| Maximum Demand Met During the Day (MW) (From NLDC SCADA) | 65470 | 43593 | 38117 | 21535 | 2827 | 160654 |
| Time Of Maximum Demand Met (From NLDC SCADA) | 22:20 | 10:29 | 10:00 | 21:20 | 19:41 | 21:26 |
df = extract_data_from_pdf("data.pdf")
df.info()
<class 'pandas.core.frame.DataFrame'> Index: 9 entries, Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs) to Time Of Maximum Demand Met (From NLDC SCADA) Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 NR 9 non-null object 1 WR 9 non-null object 2 SR 9 non-null object 3 ER 9 non-null object 4 NER 9 non-null object 5 TOTAL 9 non-null object dtypes: object(6) memory usage: 504.0+ bytes
The dataframe is of not use because all numeric data is converted into text here!
def extract_data(text):
lines = text.split("\n")
data_text = lines[:10]
headers = data_text[0].split()
data = [line.strip().split() for line in data_text[1:]]
return headers, data
def extract_data_from_pdf(pdffile):
with open("data.pdf","rb") as f:
pdfreader = PyPDF2.PdfReader(f)
page = pdfreader.pages[1]
text = page.extract_text()
h, d = extract_data(text)
colnames = """Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs)
Peak Shortage (MW)
Energy Met (MU)
Hydro Gen (MU)
Wind Gen (MU)
Solar Gen (MU)*
Energy Shortage (MU)
Maximum Demand Met During the Day (MW) (From NLDC SCADA)
Time Of Maximum Demand Met (From NLDC SCADA)""".split("\n")
return pd.DataFrame(dict(zip(colnames, d)), index=h)
extract_data_from_pdf("data.pdf")
| Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs) | Peak Shortage (MW) | Energy Met (MU) | Hydro Gen (MU) | Wind Gen (MU) | Solar Gen (MU)* | Energy Shortage (MU) | Maximum Demand Met During the Day (MW) (From NLDC SCADA) | Time Of Maximum Demand Met (From NLDC SCADA) | |
|---|---|---|---|---|---|---|---|---|---|
| NR | 59882 | 1114 | 1398 | 355 | 11 | 39.60 | 12.6 | 65470 | 22:20 |
| WR | 41115 | 0 | 998 | 33 | 49 | 16.60 | 0.0 | 43593 | 10:29 |
| SR | 34238 | 0 | 807 | 77 | 128 | 41.59 | 0.0 | 38117 | 10:00 |
| ER | 21526 | 0 | 447 | 149 | - | 4.60 | 0.0 | 21535 | 21:20 |
| NER | 2730 | 6 | 48 | 29 | - | 0.03 | 0.0 | 2827 | 19:41 |
| TOTAL | 159491 | 1120 | 3698 | 643 | 187 | 102 | 12.6 | 160654 | 21:26 |
df = extract_data_from_pdf("data.pdf")
df.info()
<class 'pandas.core.frame.DataFrame'> Index: 6 entries, NR to TOTAL Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs) 6 non-null object 1 Peak Shortage (MW) 6 non-null object 2 Energy Met (MU) 6 non-null object 3 Hydro Gen (MU) 6 non-null object 4 Wind Gen (MU) 6 non-null object 5 Solar Gen (MU)* 6 non-null object 6 Energy Shortage (MU) 6 non-null object 7 Maximum Demand Met During the Day (MW) (From NLDC SCADA) 6 non-null object 8 Time Of Maximum Demand Met (From NLDC SCADA) 6 non-null object dtypes: object(9) memory usage: 480.0+ bytes
def removedash(row):
return [item.replace("-", "") for item in row]
def extract_data(text):
lines = text.split("\n")
data_text = lines[:10]
headers = data_text[0].split()
data = [removedash(line.split()) for line in data_text[1:]]
return headers, data
def extract_data_from_pdf(pdffile):
with open("data.pdf","rb") as f:
pdfreader = PyPDF2.PdfReader(f)
page = pdfreader.pages[1]
text = page.extract_text()
h, d = extract_data(text)
print(h, d)
colnames = """Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs)
Peak Shortage (MW)
Energy Met (MU)
Hydro Gen (MU)
Wind Gen (MU)
Solar Gen (MU)*
Energy Shortage (MU)
Maximum Demand Met During the Day (MW) (From NLDC SCADA)
Time Of Maximum Demand Met (From NLDC SCADA)""".split("\n")
data = pd.DataFrame(dict(zip(colnames, d)), index=h)
for c in colnames[:-1]:
data[c] = pd.to_numeric(data[c])
return data
df = extract_data_from_pdf("data.pdf")
df
['NR', 'WR', 'SR', 'ER', 'NER', 'TOTAL'] [['59882', '41115', '34238', '21526', '2730', '159491'], ['1114', '0', '0', '0', '6', '1120'], ['1398', '998', '807', '447', '48', '3698'], ['355', '33', '77', '149', '29', '643'], ['11', '49', '128', '', '', '187'], ['39.60', '16.60', '41.59', '4.60', '0.03', '102'], ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'], ['65470', '43593', '38117', '21535', '2827', '160654'], ['22:20', '10:29', '10:00', '21:20', '19:41', '21:26']]
| Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs) | Peak Shortage (MW) | Energy Met (MU) | Hydro Gen (MU) | Wind Gen (MU) | Solar Gen (MU)* | Energy Shortage (MU) | Maximum Demand Met During the Day (MW) (From NLDC SCADA) | Time Of Maximum Demand Met (From NLDC SCADA) | |
|---|---|---|---|---|---|---|---|---|---|
| NR | 59882 | 1114 | 1398 | 355 | 11.0 | 39.60 | 12.6 | 65470 | 22:20 |
| WR | 41115 | 0 | 998 | 33 | 49.0 | 16.60 | 0.0 | 43593 | 10:29 |
| SR | 34238 | 0 | 807 | 77 | 128.0 | 41.59 | 0.0 | 38117 | 10:00 |
| ER | 21526 | 0 | 447 | 149 | NaN | 4.60 | 0.0 | 21535 | 21:20 |
| NER | 2730 | 6 | 48 | 29 | NaN | 0.03 | 0.0 | 2827 | 19:41 |
| TOTAL | 159491 | 1120 | 3698 | 643 | 187.0 | 102.00 | 12.6 | 160654 | 21:26 |
df.info()
<class 'pandas.core.frame.DataFrame'> Index: 6 entries, NR to TOTAL Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs) 6 non-null int64 1 Peak Shortage (MW) 6 non-null int64 2 Energy Met (MU) 6 non-null int64 3 Hydro Gen (MU) 6 non-null int64 4 Wind Gen (MU) 4 non-null float64 5 Solar Gen (MU)* 6 non-null float64 6 Energy Shortage (MU) 6 non-null float64 7 Maximum Demand Met During the Day (MW) (From NLDC SCADA) 6 non-null int64 8 Time Of Maximum Demand Met (From NLDC SCADA) 6 non-null object dtypes: float64(3), int64(5), object(1) memory usage: 480.0+ bytes
%%file wc.py
import sys
def linecount(filename):
with open(filename) as f:
return len(f.readlines())
def wordcount(filename):
with open(filename) as f:
return len(f.read().split())
def charcount(filename):
with open(filename) as f:
return len(f.read())
if __name__ == "__main__":
filename = sys.argv[1]
print(linecount(filename), wordcount(filename), charcount(filename), filename)
Overwriting wc.py
!python wc.py poem.txt
5415.79s - pydevd: Sending message related to process being replaced timed-out after 5 seconds
21 144 857 poem.txt
import PyPDF2
import pandas as pd
def removedash(row):
return [item.replace("-", "") for item in row]
def extract_data(text):
lines = text.split("\n")
data_text = lines[:10]
headers = data_text[0].split()
data = [removedash(line.split()) for line in data_text[1:]]
return headers, data
def extract_data_from_pdf(pdffile):
with open("data.pdf","rb") as f:
pdfreader = PyPDF2.PdfReader(f)
page = pdfreader.pages[1]
text = page.extract_text()
h, d = extract_data(text)
print(h, d)
colnames = """Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs)
Peak Shortage (MW)
Energy Met (MU)
Hydro Gen (MU)
Wind Gen (MU)
Solar Gen (MU)*
Energy Shortage (MU)
Maximum Demand Met During the Day (MW) (From NLDC SCADA)
Time Of Maximum Demand Met (From NLDC SCADA)""".split("\n")
data = pd.DataFrame(dict(zip(colnames, d)), index=h)
for c in colnames[:-1]:
data[c] = pd.to_numeric(data[c])
return data
extract_data_from_pdf("data.pdf")
['NR', 'WR', 'SR', 'ER', 'NER', 'TOTAL'] [['59882', '41115', '34238', '21526', '2730', '159491'], ['1114', '0', '0', '0', '6', '1120'], ['1398', '998', '807', '447', '48', '3698'], ['355', '33', '77', '149', '29', '643'], ['11', '49', '128', '', '', '187'], ['39.60', '16.60', '41.59', '4.60', '0.03', '102'], ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'], ['65470', '43593', '38117', '21535', '2827', '160654'], ['22:20', '10:29', '10:00', '21:20', '19:41', '21:26']]
| Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs) | Peak Shortage (MW) | Energy Met (MU) | Hydro Gen (MU) | Wind Gen (MU) | Solar Gen (MU)* | Energy Shortage (MU) | Maximum Demand Met During the Day (MW) (From NLDC SCADA) | Time Of Maximum Demand Met (From NLDC SCADA) | |
|---|---|---|---|---|---|---|---|---|---|
| NR | 59882 | 1114 | 1398 | 355 | 11.0 | 39.60 | 12.6 | 65470 | 22:20 |
| WR | 41115 | 0 | 998 | 33 | 49.0 | 16.60 | 0.0 | 43593 | 10:29 |
| SR | 34238 | 0 | 807 | 77 | 128.0 | 41.59 | 0.0 | 38117 | 10:00 |
| ER | 21526 | 0 | 447 | 149 | NaN | 4.60 | 0.0 | 21535 | 21:20 |
| NER | 2730 | 6 | 48 | 29 | NaN | 0.03 | 0.0 | 2827 | 19:41 |
| TOTAL | 159491 | 1120 | 3698 | 643 | 187.0 | 102.00 | 12.6 | 160654 | 21:26 |
python -m pdb yourscript.py args1 args2
(pdb) h # will show help .. what commands are available
(pdb) l # show lines from file with line no.
(pdb) b # set a breakpoint b 12 ..set breakpoint at line 12. b filename:line
(pdb) c # will start execution and stop at first break point
(pdb) n # execute this line and go to next line and stop
(pdb) s # if there is function call on current line then go inside that function and stop
(pdb) p variable # will print current value of variable
names = ["aakash", "priya", "Aman"]
for i in range(3):
values[i] = i*i*2 # you can modify list without defining it!
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[4], line 2 1 for i in range(3): ----> 2 values[i] = i*i*2 NameError: name 'values' is not defined
values = [] # or define a empty list
for i in range(3):
values.append( i*i*2) # and keep appending
values = [0, 0, 0] # either initialize the list of required size
for i in range(3):
values[i] = i*i*2
def func():
x = 10 # this is local variable inside the function
x = 20
func()
print(x)
20
def appendzero(x):
x.append(0) # this will modify actual list
y = [1, 1, 1, 1]
appendzero(y)
print(y)
[1, 1, 1, 1, 0]
def appendzero_and_return(x):
x = x + [0] # local variable... will not affect original variable
return x
y = [1, 1, 1, 1]
appendzero_and_return(y)
[1, 1, 1, 1, 0]
print(y)
[1, 1, 1, 1]
def square(x):
print(x*x) # you will get None as return
x2 = square(5)
25
print(x2)
None
def square(x):
return x*x
x2 = square(5)
x2
25
for i in range(3):
print(i, end=",")
0,1,2,
for c in "text data":
print(c, end=",")
t,e,x,t, ,d,a,t,a,
for key in {"z":2, "x":4}:
print(key)
z x
d = {"z":2, "x":4}
for key, value in d.items():
print(key, value)
z 2 x 4
values = []
for i in range(3):
values.append( i*i*2)
[i*i*2 for i in range(3)]
[0, 2, 8]
[i*i*2 for i in range(10) if i%2==0]
[0, 8, 32, 72, 128]
{k:str(k) for k in range(5)}
{0: '0', 1: '1', 2: '2', 3: '3', 4: '4'}
pd.DataFrame({"col1":[1, 2, 3, 4],
"col2":[2, 2, 2, 2]})
| col1 | col2 | |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 2 |
| 2 | 3 | 2 |
| 3 | 4 | 2 |
pd.DataFrame([
{"col1":1, "col2":2},
{"col1":21, "col2":22},
{"col1":31, "col2":32}])
| col1 | col2 | |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 21 | 22 |
| 2 | 31 | 32 |
wallet = pd.read_csv("wallet.csv", index_col=0)
wallet
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| ... | ... | ... | ... | ... |
| 95 | 2021-07-19 13:53:28.377359 | Utility | Phone | 388.671213 |
| 96 | 2021-01-12 19:53:28.377359 | Books | Flipcart | 467.554562 |
| 97 | 2021-03-25 11:53:28.377359 | Utility | Phone | 320.789434 |
| 98 | 2021-05-13 15:53:28.377359 | Travel | Taxi | 442.096469 |
| 99 | 2020-10-11 16:53:28.377359 | Food | Hotel | 100.455501 |
100 rows × 4 columns
wallet.groupby("category").min()['debit'].loc['Utility']
103.68079074846584
Find total expenses on books
wallet.groupby("category").sum(numeric_only=True)['debit'].loc['Books']
4929.750393283798
wallet.groupby(['category', 'description']).sum(numeric_only=True)
| debit | ||
|---|---|---|
| category | description | |
| Books | Amazon | 1037.442802 |
| Amazon Kindle | 1389.052376 | |
| Flipcart | 2503.255216 | |
| Food | Hotel | 2752.174732 |
| Swiggy | 1936.495366 | |
| Zomato | 3592.519075 | |
| Music | Amazon | 1467.247766 |
| Netflix | 1546.567562 | |
| spotify | 1219.636541 | |
| Travel | Auto | 2210.428935 |
| Metro | 1216.463665 | |
| Taxi | 2626.039276 | |
| Utility | Electricity | 2885.064355 |
| Phone | 4677.202878 |
df = wallet.groupby(['category', 'description']).sum(numeric_only=True)
df
| debit | ||
|---|---|---|
| category | description | |
| Books | Amazon | 1037.442802 |
| Amazon Kindle | 1389.052376 | |
| Flipcart | 2503.255216 | |
| Food | Hotel | 2752.174732 |
| Swiggy | 1936.495366 | |
| Zomato | 3592.519075 | |
| Music | Amazon | 1467.247766 |
| Netflix | 1546.567562 | |
| spotify | 1219.636541 | |
| Travel | Auto | 2210.428935 |
| Metro | 1216.463665 | |
| Taxi | 2626.039276 | |
| Utility | Electricity | 2885.064355 |
| Phone | 4677.202878 |
df.loc[('Books','Amazon')]
debit 1037.442802 Name: (Books, Amazon), dtype: float64
df.reset_index()
| category | description | debit | |
|---|---|---|---|
| 0 | Books | Amazon | 1037.442802 |
| 1 | Books | Amazon Kindle | 1389.052376 |
| 2 | Books | Flipcart | 2503.255216 |
| 3 | Food | Hotel | 2752.174732 |
| 4 | Food | Swiggy | 1936.495366 |
| 5 | Food | Zomato | 3592.519075 |
| 6 | Music | Amazon | 1467.247766 |
| 7 | Music | Netflix | 1546.567562 |
| 8 | Music | spotify | 1219.636541 |
| 9 | Travel | Auto | 2210.428935 |
| 10 | Travel | Metro | 1216.463665 |
| 11 | Travel | Taxi | 2626.039276 |
| 12 | Utility | Electricity | 2885.064355 |
| 13 | Utility | Phone | 4677.202878 |
pd.read_html(url) # load tables on this page and return a list of tables