Aug 19-25, 2022 Vikrant Patil
All notes are available online at https://notes.pipal.in/2022/arcesium_finop_batch1/
Please accept the invitation that you have received in your email and login to
login to lab and create today's notebook module3-day4
© Pipal Academy LLP
we will try to read pdf files from python using a thrid party module PyPDF2
pip install PyPDF2
pdffileurl = "https://posoco.in/download/16-07-20_nldc_psp/?wpdmdl=30215"
import requests
def download(url, filename):
with open(filename, "wb") as f:
resp = requests.get(url)
f.write(resp.content)
/home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (5.0.0)/charset_normalizer (2.0.12) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
download(pdffileurl, "data.pdf")
import PyPDF2
with open("data.pdf", "rb") as f:
pdfreader = PyPDF2.PdfFileReader(f)
n = pdfreader.getNumPages()
for p in range(n):
page = pdfreader.getPage(p)
print(page.extractText()[:100])
print("=="*10)
National Load Despatch Centre राष्ट्रीय भार प्रेषण केंद्र POWER SYSTEM OPERATION CORPORATION LI ==================== NR WR SR ER NER TOTAL 59882 41115 34238 21526 2730 159491 1114 0 0 0 6 1120 1398 998 807 447 48 3698 ==================== 16-Jul-2020 Sl NoVoltage Level Line Details Circuit Max Import (MW) Max Export (MW) Import (MU) Exp ====================
with open("data.pdf", "rb") as f:
pdfreader = PyPDF2.PdfFileReader(f)
page = pdfreader.getPage(1)
print(page.extractText())
NR WR SR ER NER TOTAL 59882 41115 34238 21526 2730 159491 1114 0 0 0 6 1120 1398 998 807 447 48 3698 355 33 77 149 29 643 11 49 128 - - 187 39.60 16.60 41.59 4.60 0.03 102 12.6 0.0 0.0 0.0 0.0 12.6 65470 43593 38117 21535 2827 160654 22:20 10:29 10:00 21:20 19:41 21:26 Region FVI < 49.7 49.7 - 49.8 49.8 - 49.9 < 49.9 49.9 - 50.05 > 50.05 All India 0.057 0.16 1.81 13.19 15.16 76.52 8.32 Max.Demand Shortage during Energy Met Drawal OD(+)/UD(-) Max OD Energy Region States Met during the day(MW)maximum Demand(MW)(MU)Schedule (MU)(MU) (MW)Shortage (MU) Punjab 11090 0 237.9 146.8 -1.8 49 0.0 Haryana 9388 0 209.4 152.8 0.7 325 1.9 Rajasthan 12087 0 262.4 119.7 5.4 809 0.0 Delhi 5726 0 118.6 102.8 -1.4 228 0.0 NR UP 22873 0 448.9 208.5 2.0 546 0.4 Uttarakhand 1899 0 42.8 20.7 0.8 111 0.0 HP 1366 0 28.6 -2.6 -0.2 91 0.0 J&K(UT) & Ladakh(UT) 2177 544 43.1 20.3 0.4 502 10.3 Chandigarh 295 0 6.0 5.9 0.2 61 0.0 Chhattisgarh 3685 0 86.9 36.8 0.8 468 0.0 Gujarat 13478 0 286.2 87.6 4.0 527 0.0 MP 9547 0 214.7 113.8 -3.8 198 0.0 WR Maharashtra 16964 0 365.1 138.1 -1.9 457 0.0 Goa 405 0 8.5 8.2 -0.2 33 0.0 DD 246 0 5.3 5.3 0.0 19 0.0 DNH 614 0 14.0 13.8 0.2 44 0.0 AMNSIL 777 0 17.1 4.2 0.7 272 0.0 Andhra Pradesh 6439 0 141.0 45.6 -1.3 607 0.0 Telangana 8614 0 167.3 81.6 -2.5 385 0.0 SR Karnataka 8486 0 155.1 51.1 -3.4 650 0.0 Kerala 3077 0 65.2 46.1 0.5 179 0.0 Tamil Nadu 12371 0 271.3 125.9 -3.7 573 0.0 Puducherry 349 0 7.5 7.5 -0.1 35 0.0 Bihar 5740 0 111.5 106.0 -0.3 386 0.0 DVC 2989 0 62.7 -42.6 -0.7 206 0.0 Jharkhand 1438 0 26.3 18.5 -1.0 124 0.0 ER Odisha 3983 0 82.2 -0.2 -0.2 325 0.0 West Bengal 7917 0 162.6 47.2 -0.8 303 0.0 Sikkim 100 0 1.4 1.5 -0.1 17 0.0 Arunachal Pradesh 120 3 2.0 1.8 0.2 40 0.0 Assam 1759 23 30.0 27.1 -0.1 135 0.0 Manipur 183 1 2.6 2.3 0.3 37 0.0 NER Meghalaya 307 2 5.3 -1.3 0.3 52 0.0 Mizoram 89 1 1.5 1.2 0.0 13 0.0 Nagaland 140 2 2.2 2.3 -0.2 23 0.0 Tripura 298 7 4.9 5.9 0.7 66 0.0 Bhutan Nepal Bangladesh 53.3 -1.5 -19.1 2337.0 -271.3 -1110.0 NR WR SR ER NER TOTAL 352.1 -295.4 95.0 -145.8 -6.0 0.0 359.2 -293.7 84.6 -152.6 -3.4 -6.0 7.1 1.6 -10.5 -6.9 2.6 -6.0 NR WR SR ER NER TOTAL 3838 14847 11792 3445 677 34598 9289 23225 14423 4892 47 51876 13127 38072 26215 8337 723 86473 NR WR SR ER NER All India 546 1080 370 482 7 2486 25 13 14 0 0 52 355 33 77 149 29 643 26 33 47 0 0 106 40 82 19 0 22 163 71 73 210 5 0 359 1063 1314 737 636 58 3809 6.71 5.54 28.51 0.73 0.05 9.43 42.55 10.54 45.35 24.19 49.63 29.09 1.068 1.102 Based on State Max Demands Diversity factor = Sum of regional or state maximum demands / All India maximum demand *Source: RLDCs for solar connected to ISTS; SLDCs for embedded solar. Limited visibility of embedded solar data. Executive Director-NLDCShare of RES in total generation (%) Share of Non-fossil fuel (Hydro,Nuclear and RES) in total generation(%) H. All India Demand Diversity Factor Based on Regional Max DemandsLignite Hydro Nuclear Gas, Naptha & Diesel RES (Wind, Solar, Biomass & Others) TotalState Sector Total G. Sourcewise generation (MU) CoalActual(MU) O/D/U/D(MU) F. Generation Outage(MW) Central SectorDay Peak (MW) E. Import/Export by Regions (in MU) - Import(+ve)/Export(-ve); OD(+)/UD(-) Schedule(MU)D. Transnational Exchanges (MU) - Import(+ve)/Export(-ve) Actual (MU)Energy Shortage (MU) Maximum Demand Met During the Day (MW) (From NLDC SCADA) Time Of Maximum Demand Met (From NLDC SCADA) B. Frequency Profile (%) C. Power Supply Position in StatesDemand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs) Peak Shortage (MW) Energy Met (MU) Hydro Gen (MU) Wind Gen (MU) Solar Gen (MU)*Report for previous day Date of Reporting: 16-Jul-2020 A. Power Supply Position at All India and Regional level
def extract_tableA(pagetext):
lines = pagetext.split("\n")
tableA = lines[:9]
headers = tableA[0].split()
data = [line.strip().split() for line in tableA[1:]]
return headers, data
with open("data.pdf", "rb") as f:
pdfreader = PyPDF2.PdfFileReader(f)
page = pdfreader.getPage(1)
headers, data = extract_tableA(page.extractText())
headers
['NR', 'WR', 'SR', 'ER', 'NER', 'TOTAL']
data
[['59882', '41115', '34238', '21526', '2730', '159491'], ['1114', '0', '0', '0', '6', '1120'], ['1398', '998', '807', '447', '48', '3698'], ['355', '33', '77', '149', '29', '643'], ['11', '49', '128', '-', '-', '187'], ['39.60', '16.60', '41.59', '4.60', '0.03', '102'], ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'], ['65470', '43593', '38117', '21535', '2827', '160654']]
import csv
def extract_tableA(pagetext):
lines = pagetext.split("\n")
tableA = lines[:9]
headers = tableA[0].split()
data = [line.strip().split() for line in tableA[1:]]
return headers, data
def write_csv(headers, data, filename):
with open(filename, "w") as f:
csvf = csv.writer(f)
csvf.writerow(headers)
csvf.writerows(data)
with open("data.pdf", "rb") as f:
pdfreader = PyPDF2.PdfFileReader(f)
page = pdfreader.getPage(1)
headers, data = extract_tableA(page.extractText())
write_csv(headers, data, "tableA.csv")
!head tableA.csv
NR,WR,SR,ER,NER,TOTAL 59882,41115,34238,21526,2730,159491 1114,0,0,0,6,1120 1398,998,807,447,48,3698 355,33,77,149,29,643 11,49,128,-,-,187 39.60,16.60,41.59,4.60,0.03,102 12.6,0.0,0.0,0.0,0.0,12.6 65470,43593,38117,21535,2827,160654
import pandas as pd
pd.read_csv("tableA.csv")
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| 0 | 59882.0 | 41115.0 | 34238.00 | 21526 | 2730 | 159491.0 |
| 1 | 1114.0 | 0.0 | 0.00 | 0 | 6 | 1120.0 |
| 2 | 1398.0 | 998.0 | 807.00 | 447 | 48 | 3698.0 |
| 3 | 355.0 | 33.0 | 77.00 | 149 | 29 | 643.0 |
| 4 | 11.0 | 49.0 | 128.00 | - | - | 187.0 |
| 5 | 39.6 | 16.6 | 41.59 | 4.60 | 0.03 | 102.0 |
| 6 | 12.6 | 0.0 | 0.00 | 0.0 | 0.0 | 12.6 |
| 7 | 65470.0 | 43593.0 | 38117.00 | 21535 | 2827 | 160654.0 |
A = pd.read_csv("tableA.csv")
A.describe()
| NR | WR | SR | TOTAL | |
|---|---|---|---|---|
| count | 8.000000 | 8.000000 | 8.000000 | 8.000000 |
| mean | 16035.275000 | 10725.575000 | 9176.073750 | 40738.450000 |
| std | 28830.682422 | 19535.641314 | 16699.952904 | 73664.786038 |
| min | 11.000000 | 0.000000 | 0.000000 | 12.600000 |
| 25% | 32.850000 | 12.450000 | 31.192500 | 165.750000 |
| 50% | 734.500000 | 41.000000 | 102.500000 | 881.500000 |
| 75% | 16019.000000 | 11027.250000 | 9164.750000 | 42646.250000 |
| max | 65470.000000 | 43593.000000 | 38117.000000 | 160654.000000 |
A.ER
0 21526 1 0 2 447 3 149 4 - 5 4.60 6 0.0 7 21535 Name: ER, dtype: object
A.ER.str.replace("-","")
0 21526 1 0 2 447 3 149 4 5 4.60 6 0.0 7 21535 Name: ER, dtype: object
pd.to_numeric(A.ER.str.replace("-",""))
0 21526.0 1 0.0 2 447.0 3 149.0 4 NaN 5 4.6 6 0.0 7 21535.0 Name: ER, dtype: float64
A['ER'] = pd.to_numeric(A.ER.str.replace("-",""))
A['NER'] = pd.to_numeric(A.NER.str.replace("-",""))
A
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| 0 | 59882.0 | 41115.0 | 34238.00 | 21526.0 | 2730.00 | 159491.0 |
| 1 | 1114.0 | 0.0 | 0.00 | 0.0 | 6.00 | 1120.0 |
| 2 | 1398.0 | 998.0 | 807.00 | 447.0 | 48.00 | 3698.0 |
| 3 | 355.0 | 33.0 | 77.00 | 149.0 | 29.00 | 643.0 |
| 4 | 11.0 | 49.0 | 128.00 | NaN | NaN | 187.0 |
| 5 | 39.6 | 16.6 | 41.59 | 4.6 | 0.03 | 102.0 |
| 6 | 12.6 | 0.0 | 0.00 | 0.0 | 0.00 | 12.6 |
| 7 | 65470.0 | 43593.0 | 38117.00 | 21535.0 | 2827.00 | 160654.0 |
A.describe()
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| count | 8.000000 | 8.000000 | 8.000000 | 7.000000 | 7.000000 | 8.000000 |
| mean | 16035.275000 | 10725.575000 | 9176.073750 | 6237.371429 | 805.718571 | 40738.450000 |
| std | 28830.682422 | 19535.641314 | 16699.952904 | 10448.391549 | 1348.068562 | 73664.786038 |
| min | 11.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 12.600000 |
| 25% | 32.850000 | 12.450000 | 31.192500 | 2.300000 | 3.015000 | 165.750000 |
| 50% | 734.500000 | 41.000000 | 102.500000 | 149.000000 | 29.000000 | 881.500000 |
| 75% | 16019.000000 | 11027.250000 | 9164.750000 | 10986.500000 | 1389.000000 | 42646.250000 |
| max | 65470.000000 | 43593.000000 | 38117.000000 | 21535.000000 | 2827.000000 | 160654.000000 |
A1 = pd.read_csv("tableA.csv")
A1
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| 0 | 59882.0 | 41115.0 | 34238.00 | 21526 | 2730 | 159491.0 |
| 1 | 1114.0 | 0.0 | 0.00 | 0 | 6 | 1120.0 |
| 2 | 1398.0 | 998.0 | 807.00 | 447 | 48 | 3698.0 |
| 3 | 355.0 | 33.0 | 77.00 | 149 | 29 | 643.0 |
| 4 | 11.0 | 49.0 | 128.00 | - | - | 187.0 |
| 5 | 39.6 | 16.6 | 41.59 | 4.60 | 0.03 | 102.0 |
| 6 | 12.6 | 0.0 | 0.00 | 0.0 | 0.0 | 12.6 |
| 7 | 65470.0 | 43593.0 | 38117.00 | 21535 | 2827 | 160654.0 |
A1.ER.str.replace("-","0")
0 21526 1 0 2 447 3 149 4 0 5 4.60 6 0.0 7 21535 Name: ER, dtype: object
float("21526")
21526.0
def extract_tableA_csv(pagetext, filename):
lines = pagetext.split("\n")
tableA = lines[:9]
with open(filename, "w") as f:
f.write(",".join(tableA[0].strip().split()))
f.write("\n")
for line in tableA[1:]:
f.write(",".join(line.strip().split()).replace("-",""))
f.write("\n")
with open("data.pdf", "rb") as f:
pdfreader = PyPDF2.PdfFileReader(f)
page = pdfreader.getPage(1)
extract_tableA_csv(page.extractText(), "A.csv")
!cat A.csv
NR,WR,SR,ER,NER,TOTAL 59882,41115,34238,21526,2730,159491 1114,0,0,0,6,1120 1398,998,807,447,48,3698 355,33,77,149,29,643 11,49,128,,,187 39.60,16.60,41.59,4.60,0.03,102 12.6,0.0,0.0,0.0,0.0,12.6 65470,43593,38117,21535,2827,160654
pd.read_csv("A.csv")
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| 0 | 59882.0 | 41115.0 | 34238.00 | 21526.0 | 2730.00 | 159491.0 |
| 1 | 1114.0 | 0.0 | 0.00 | 0.0 | 6.00 | 1120.0 |
| 2 | 1398.0 | 998.0 | 807.00 | 447.0 | 48.00 | 3698.0 |
| 3 | 355.0 | 33.0 | 77.00 | 149.0 | 29.00 | 643.0 |
| 4 | 11.0 | 49.0 | 128.00 | NaN | NaN | 187.0 |
| 5 | 39.6 | 16.6 | 41.59 | 4.6 | 0.03 | 102.0 |
| 6 | 12.6 | 0.0 | 0.00 | 0.0 | 0.00 | 12.6 |
| 7 | 65470.0 | 43593.0 | 38117.00 | 21535.0 | 2827.00 | 160654.0 |
A = pd.read_csv("tableA.csv")
A
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| 0 | 59882.0 | 41115.0 | 34238.00 | 21526 | 2730 | 159491.0 |
| 1 | 1114.0 | 0.0 | 0.00 | 0 | 6 | 1120.0 |
| 2 | 1398.0 | 998.0 | 807.00 | 447 | 48 | 3698.0 |
| 3 | 355.0 | 33.0 | 77.00 | 149 | 29 | 643.0 |
| 4 | 11.0 | 49.0 | 128.00 | - | - | 187.0 |
| 5 | 39.6 | 16.6 | 41.59 | 4.60 | 0.03 | 102.0 |
| 6 | 12.6 | 0.0 | 0.00 | 0.0 | 0.0 | 12.6 |
| 7 | 65470.0 | 43593.0 | 38117.00 | 21535 | 2827 | 160654.0 |
A.ER.str.replace("-","") # possible only with string columns
0 21526 1 0 2 447 3 149 4 5 4.60 6 0.0 7 21535 Name: ER, dtype: object
A.NE.str.replace("11.0","11") # will not work with non string column
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Input In [38], in <cell line: 1>() ----> 1 A.NE.str.replace("11.0","11") File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/generic.py:5575, in NDFrame.__getattr__(self, name) 5568 if ( 5569 name not in self._internal_names_set 5570 and name not in self._metadata 5571 and name not in self._accessors 5572 and self._info_axis._can_hold_identifiers_and_holds_name(name) 5573 ): 5574 return self[name] -> 5575 return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'NE'
A['ER'] = pd.to_numeric(A.ER.str.replace("-",""))
A.ER
0 21526.0 1 0.0 2 447.0 3 149.0 4 NaN 5 4.6 6 0.0 7 21535.0 Name: ER, dtype: float64
pd.to_numeric(A.ER.str.replace("-","")) # not working because ER is no more a string column, it is now numeric column
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Input In [43], in <cell line: 1>() ----> 1 pd.to_numeric(A.ER.str.replace("-","")) File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/generic.py:5575, in NDFrame.__getattr__(self, name) 5568 if ( 5569 name not in self._internal_names_set 5570 and name not in self._metadata 5571 and name not in self._accessors 5572 and self._info_axis._can_hold_identifiers_and_holds_name(name) 5573 ): 5574 return self[name] -> 5575 return object.__getattribute__(self, name) File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/accessor.py:182, in CachedAccessor.__get__(self, obj, cls) 179 if obj is None: 180 # we're accessing the attribute of the class, i.e., Dataset.geo 181 return self._accessor --> 182 accessor_obj = self._accessor(obj) 183 # Replace the property with the accessor object. Inspired by: 184 # https://www.pydanny.com/cached-property.html 185 # We need to use object.__setattr__ because we overwrite __setattr__ on 186 # NDFrame 187 object.__setattr__(obj, self._name, accessor_obj) File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/strings/accessor.py:177, in StringMethods.__init__(self, data) 174 def __init__(self, data): 175 from pandas.core.arrays.string_ import StringDtype --> 177 self._inferred_dtype = self._validate(data) 178 self._is_categorical = is_categorical_dtype(data.dtype) 179 self._is_string = isinstance(data.dtype, StringDtype) File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/strings/accessor.py:231, in StringMethods._validate(data) 228 inferred_dtype = lib.infer_dtype(values, skipna=True) 230 if inferred_dtype not in allowed_types: --> 231 raise AttributeError("Can only use .str accessor with string values!") 232 return inferred_dtype AttributeError: Can only use .str accessor with string values!
import PyPDF2
x, y = 3 ,4
def foo():
return "hello", 4
foo()
('hello', 4)
a,b = foo()
import sqlite3 # this is for sqlite connector
conn = sqlite3.connect("a1.db") # these two statements will be different for different database
cur = conn.cursor()
cur.execute("create table person (name varchar(100), email varchar(100));")
cur.execute("insert into person (name, email) values ('alice', 'alice@example.com');")
conn.commit()
cur = conn.cursor()
results = cur.execute("select * from person")
results.fetchall()
[('alice', 'alice@example.com')]
def find_person(conn, email):
q = "select * from person where email='{}'".format(email) # not very nice way of doing!
print(q)
cur = conn.cursor()
results = cur.execute(q)
return results.fetchone()
find_person(conn, "alice@example.com")
select * from person where email='alice@example.com'
('alice', 'alice@example.com')
def find_person(conn, email):
q = "select * from person where email=?"
cur = conn.cursor()
results = cur.execute(q, (email,))
return results.fetchone()
find_person(conn, "alice@example.com")
('alice', 'alice@example.com')
def query(conn, querystring, params=()):
cur = conn.cursor()
result = cur.execute(querystring, params)
return result.fetchall()
query(conn, "select * from person where name=?", ("alice",))
[('alice', 'alice@example.com')]
query(conn, "select * from person")
[('alice', 'alice@example.com')]
conn.close() # after this any changes that your program has done will be saved to database
conn = sqlite3.connect("a.db")
conn.commit() # just save the chages but don't close the connection
conn.close()
import pandas as pd
A
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| 0 | 59882.0 | 41115.0 | 34238.00 | 21526.0 | 2730 | 159491.0 |
| 1 | 1114.0 | 0.0 | 0.00 | 0.0 | 6 | 1120.0 |
| 2 | 1398.0 | 998.0 | 807.00 | 447.0 | 48 | 3698.0 |
| 3 | 355.0 | 33.0 | 77.00 | 149.0 | 29 | 643.0 |
| 4 | 11.0 | 49.0 | 128.00 | NaN | - | 187.0 |
| 5 | 39.6 | 16.6 | 41.59 | 4.6 | 0.03 | 102.0 |
| 6 | 12.6 | 0.0 | 0.00 | 0.0 | 0.0 | 12.6 |
| 7 | 65470.0 | 43593.0 | 38117.00 | 21535.0 | 2827 | 160654.0 |
conn = sqlite3.connect("pdfdata") # the connection has to esatblished beforehand
## to_sql(tablename, con, if_exits=replace/append/error)
A.to_sql("tableA", con=conn, if_exists="replace") # be carefull , you should know what you are doing
8
r = conn.cursor().execute("select * from tableA")
for item in r:
print(item)
(0, 59882.0, 41115.0, 34238.0, 21526.0, '2730', 159491.0) (1, 1114.0, 0.0, 0.0, 0.0, '6', 1120.0) (2, 1398.0, 998.0, 807.0, 447.0, '48', 3698.0) (3, 355.0, 33.0, 77.0, 149.0, '29', 643.0) (4, 11.0, 49.0, 128.0, None, '-', 187.0) (5, 39.6, 16.6, 41.59, 4.6, '0.03', 102.0) (6, 12.6, 0.0, 0.0, 0.0, '0.0', 12.6) (7, 65470.0, 43593.0, 38117.0, 21535.0, '2827', 160654.0)
df = pd.read_sql_query("select * from tableA", con= conn)
df
| index | NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 59882.0 | 41115.0 | 34238.00 | 21526.0 | 2730 | 159491.0 |
| 1 | 1 | 1114.0 | 0.0 | 0.00 | 0.0 | 6 | 1120.0 |
| 2 | 2 | 1398.0 | 998.0 | 807.00 | 447.0 | 48 | 3698.0 |
| 3 | 3 | 355.0 | 33.0 | 77.00 | 149.0 | 29 | 643.0 |
| 4 | 4 | 11.0 | 49.0 | 128.00 | NaN | - | 187.0 |
| 5 | 5 | 39.6 | 16.6 | 41.59 | 4.6 | 0.03 | 102.0 |
| 6 | 6 | 12.6 | 0.0 | 0.00 | 0.0 | 0.0 | 12.6 |
| 7 | 7 | 65470.0 | 43593.0 | 38117.00 | 21535.0 | 2827 | 160654.0 |
df.describe()
| index | NR | WR | SR | ER | TOTAL | |
|---|---|---|---|---|---|---|
| count | 8.00000 | 8.000000 | 8.000000 | 8.000000 | 7.000000 | 8.000000 |
| mean | 3.50000 | 16035.275000 | 10725.575000 | 9176.073750 | 6237.371429 | 40738.450000 |
| std | 2.44949 | 28830.682422 | 19535.641314 | 16699.952904 | 10448.391549 | 73664.786038 |
| min | 0.00000 | 11.000000 | 0.000000 | 0.000000 | 0.000000 | 12.600000 |
| 25% | 1.75000 | 32.850000 | 12.450000 | 31.192500 | 2.300000 | 165.750000 |
| 50% | 3.50000 | 734.500000 | 41.000000 | 102.500000 | 149.000000 | 881.500000 |
| 75% | 5.25000 | 16019.000000 | 11027.250000 | 9164.750000 | 10986.500000 | 42646.250000 |
| max | 7.00000 | 65470.000000 | 43593.000000 | 38117.000000 | 21535.000000 | 160654.000000 |
query(conn, "select * from tableA where NER=?", ("-",))
[(4, 11.0, 49.0, 128.0, None, '-', 187.0)]
import pyodbc
cnxn = pyodbc.connect(r'Driver={SQL Server};Server=.\SQLEXPRESS;Database=myDB;Trusted_Connection=yes;')
cursor = cnxn.cursor()
cursor.execute("SELECT LastName FROM myContacts")
while 1:
row = cursor.fetchone()
if not row:
break
print(row.LastName)
cnxn.close()
--------------------------------------------------------------------------- ModuleNotFoundError Traceback (most recent call last) Input In [83], in <cell line: 1>() ----> 1 import pyodbc 2 cnxn = pyodbc.connect(r'Driver={SQL Server};Server=.\SQLEXPRESS;Database=myDB;Trusted_Connection=yes;') 3 cursor = cnxn.cursor() ModuleNotFoundError: No module named 'pyodbc'
headers, data
(['NR', 'WR', 'SR', 'ER', 'NER', 'TOTAL'], [['59882', '41115', '34238', '21526', '2730', '159491'], ['1114', '0', '0', '0', '6', '1120'], ['1398', '998', '807', '447', '48', '3698'], ['355', '33', '77', '149', '29', '643'], ['11', '49', '128', '-', '-', '187'], ['39.60', '16.60', '41.59', '4.60', '0.03', '102'], ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'], ['65470', '43593', '38117', '21535', '2827', '160654']])
pd.DataFrame({"col1":[1,2,3],
"col2":[2,3,4]})
| col1 | col2 | |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 3 |
| 2 | 3 | 4 |
headers, data
(['NR', 'WR', 'SR', 'ER', 'NER', 'TOTAL'], [['59882', '41115', '34238', '21526', '2730', '159491'], ['1114', '0', '0', '0', '6', '1120'], ['1398', '998', '807', '447', '48', '3698'], ['355', '33', '77', '149', '29', '643'], ['11', '49', '128', '-', '-', '187'], ['39.60', '16.60', '41.59', '4.60', '0.03', '102'], ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'], ['65470', '43593', '38117', '21535', '2827', '160654']])
dict(zip(header, data)) # i want columns not rows
{'NR': ['59882', '41115', '34238', '21526', '2730', '159491'],
'WR': ['1114', '0', '0', '0', '6', '1120'],
'SR': ['1398', '998', '807', '447', '48', '3698'],
'ER': ['355', '33', '77', '149', '29', '643'],
'NER': ['11', '49', '128', '-', '-', '187'],
'TOTAL': ['39.60', '16.60', '41.59', '4.60', '0.03', '102']}
pd.DataFrame(dict(zip(header, data))) # wrong
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| 0 | 59882 | 1114 | 1398 | 355 | 11 | 39.60 |
| 1 | 41115 | 0 | 998 | 33 | 49 | 16.60 |
| 2 | 34238 | 0 | 807 | 77 | 128 | 41.59 |
| 3 | 21526 | 0 | 447 | 149 | - | 4.60 |
| 4 | 2730 | 6 | 48 | 29 | - | 0.03 |
| 5 | 159491 | 1120 | 3698 | 643 | 187 | 102 |
data # collection of rows
[['59882', '41115', '34238', '21526', '2730', '159491'], ['1114', '0', '0', '0', '6', '1120'], ['1398', '998', '807', '447', '48', '3698'], ['355', '33', '77', '149', '29', '643'], ['11', '49', '128', '-', '-', '187'], ['39.60', '16.60', '41.59', '4.60', '0.03', '102'], ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'], ['65470', '43593', '38117', '21535', '2827', '160654']]
#colection of columns!
data[0][0] #
'59882'
data[1][0]
'1114'
data[2][0]
'1398'
data[3][0]
'355'
def column(data, columnnum):
rowcount = len(data)
return [data[i][columnnum] for i in range(rowcount)]
column(data, 0)
['59882', '1114', '1398', '355', '11', '39.60', '12.6', '65470']
data
[['59882', '41115', '34238', '21526', '2730', '159491'], ['1114', '0', '0', '0', '6', '1120'], ['1398', '998', '807', '447', '48', '3698'], ['355', '33', '77', '149', '29', '643'], ['11', '49', '128', '-', '-', '187'], ['39.60', '16.60', '41.59', '4.60', '0.03', '102'], ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'], ['65470', '43593', '38117', '21535', '2827', '160654']]
def column(data, columnnum):
return [row[columnnum] for row in data]
column(data, 0)
['59882', '1114', '1398', '355', '11', '39.60', '12.6', '65470']
def transpose(data):
## if data is collection of rows
## then transpose if nothing but collection of columns
columncount = len(data[0])
return [column(data, i) for i in range(columncount)]
transpose(data)
[['59882', '1114', '1398', '355', '11', '39.60', '12.6', '65470'], ['41115', '0', '998', '33', '49', '16.60', '0.0', '43593'], ['34238', '0', '807', '77', '128', '41.59', '0.0', '38117'], ['21526', '0', '447', '149', '-', '4.60', '0.0', '21535'], ['2730', '6', '48', '29', '-', '0.03', '0.0', '2827'], ['159491', '1120', '3698', '643', '187', '102', '12.6', '160654']]
data
[['59882', '41115', '34238', '21526', '2730', '159491'], ['1114', '0', '0', '0', '6', '1120'], ['1398', '998', '807', '447', '48', '3698'], ['355', '33', '77', '149', '29', '643'], ['11', '49', '128', '-', '-', '187'], ['39.60', '16.60', '41.59', '4.60', '0.03', '102'], ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'], ['65470', '43593', '38117', '21535', '2827', '160654']]
pd.DataFrame(dict(zip(headers, transpose(data))))
| NR | WR | SR | ER | NER | TOTAL | |
|---|---|---|---|---|---|---|
| 0 | 59882 | 41115 | 34238 | 21526 | 2730 | 159491 |
| 1 | 1114 | 0 | 0 | 0 | 6 | 1120 |
| 2 | 1398 | 998 | 807 | 447 | 48 | 3698 |
| 3 | 355 | 33 | 77 | 149 | 29 | 643 |
| 4 | 11 | 49 | 128 | - | - | 187 |
| 5 | 39.60 | 16.60 | 41.59 | 4.60 | 0.03 | 102 |
| 6 | 12.6 | 0.0 | 0.0 | 0.0 | 0.0 | 12.6 |
| 7 | 65470 | 43593 | 38117 | 21535 | 2827 | 160654 |