Python Virtual Training For Arcesium - Module III - Day 4¶

Aug 19-25, 2022 Vikrant Patil

All notes are available online at https://notes.pipal.in/2022/arcesium_finop_batch1/

Please accept the invitation that you have received in your email and login to

https://engage.pipal.in/

login to lab and create today's notebook module3-day4

© Pipal Academy LLP

Reading pdf files¶

we will try to read pdf files from python using a thrid party module PyPDF2

pip install PyPDF2

In [1]:
pdffileurl = "https://posoco.in/download/16-07-20_nldc_psp/?wpdmdl=30215"
In [2]:
import requests
def download(url, filename):
    with open(filename, "wb") as f:
        resp = requests.get(url)
        f.write(resp.content)
/home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (5.0.0)/charset_normalizer (2.0.12) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
In [3]:
download(pdffileurl, "data.pdf")
In [4]:
import PyPDF2
In [7]:
with open("data.pdf", "rb") as f:
    pdfreader = PyPDF2.PdfFileReader(f)
    n = pdfreader.getNumPages()
    for p in range(n):
        page = pdfreader.getPage(p)
        print(page.extractText()[:100])
        print("=="*10)
 
National Load Despatch Centre  
राष्ट्रीय भार प्रेषण केंद्र 
POWER SYSTEM OPERATION CORPORATION LI
====================
NR WR SR ER NER TOTAL
59882 41115 34238 21526 2730 159491
1114 0 0 0 6 1120
1398 998 807 447 48 3698
====================
16-Jul-2020
Sl 
NoVoltage Level Line Details Circuit Max Import (MW) Max Export (MW) Import (MU) Exp
====================
In [8]:
with open("data.pdf", "rb") as f:
    pdfreader = PyPDF2.PdfFileReader(f)
    page = pdfreader.getPage(1)
    print(page.extractText())
NR WR SR ER NER TOTAL
59882 41115 34238 21526 2730 159491
1114 0 0 0 6 1120
1398 998 807 447 48 3698
355 33 77 149 29 643
11 49 128 - - 187
39.60 16.60 41.59 4.60 0.03 102
12.6 0.0 0.0 0.0 0.0 12.6
65470 43593 38117 21535 2827 160654
22:20 10:29 10:00 21:20 19:41 21:26
Region FVI < 49.7 49.7 - 49.8 49.8 - 49.9 < 49.9 49.9 - 50.05 > 50.05
All India 0.057 0.16 1.81 13.19 15.16 76.52 8.32
Max.Demand Shortage during Energy Met Drawal OD(+)/UD(-) Max OD Energy
Region States Met during the 
day(MW)maximum 
Demand(MW)(MU)Schedule
(MU)(MU) (MW)Shortage 
(MU)
Punjab 11090 0 237.9 146.8 -1.8 49 0.0
Haryana 9388 0 209.4 152.8 0.7 325 1.9
Rajasthan 12087 0 262.4 119.7 5.4 809 0.0
Delhi 5726 0 118.6 102.8 -1.4 228 0.0
NR UP 22873 0 448.9 208.5 2.0 546 0.4
Uttarakhand 1899 0 42.8 20.7 0.8 111 0.0
HP 1366 0 28.6 -2.6 -0.2 91 0.0
J&K(UT) & Ladakh(UT) 2177 544 43.1 20.3 0.4 502 10.3
Chandigarh 295 0 6.0 5.9 0.2 61 0.0
Chhattisgarh 3685 0 86.9 36.8 0.8 468 0.0
Gujarat 13478 0 286.2 87.6 4.0 527 0.0
MP 9547 0 214.7 113.8 -3.8 198 0.0
WR Maharashtra 16964 0 365.1 138.1 -1.9 457 0.0
Goa 405 0 8.5 8.2 -0.2 33 0.0
DD 246 0 5.3 5.3 0.0 19 0.0
DNH 614 0 14.0 13.8 0.2 44 0.0
AMNSIL 777 0 17.1 4.2 0.7 272 0.0
Andhra Pradesh 6439 0 141.0 45.6 -1.3 607 0.0
Telangana 8614 0 167.3 81.6 -2.5 385 0.0
SR Karnataka 8486 0 155.1 51.1 -3.4 650 0.0
Kerala 3077 0 65.2 46.1 0.5 179 0.0
Tamil Nadu 12371 0 271.3 125.9 -3.7 573 0.0
Puducherry 349 0 7.5 7.5 -0.1 35 0.0
Bihar 5740 0 111.5 106.0 -0.3 386 0.0
DVC 2989 0 62.7 -42.6 -0.7 206 0.0
Jharkhand 1438 0 26.3 18.5 -1.0 124 0.0
ER Odisha 3983 0 82.2 -0.2 -0.2 325 0.0
West Bengal 7917 0 162.6 47.2 -0.8 303 0.0
Sikkim 100 0 1.4 1.5 -0.1 17 0.0
Arunachal Pradesh 120 3 2.0 1.8 0.2 40 0.0
Assam 1759 23 30.0 27.1 -0.1 135 0.0
Manipur 183 1 2.6 2.3 0.3 37 0.0
NER Meghalaya 307 2 5.3 -1.3 0.3 52 0.0
Mizoram 89 1 1.5 1.2 0.0 13 0.0
Nagaland 140 2 2.2 2.3 -0.2 23 0.0
Tripura 298 7 4.9 5.9 0.7 66 0.0
Bhutan Nepal Bangladesh
53.3 -1.5 -19.1
2337.0 -271.3 -1110.0
NR WR SR ER NER TOTAL
352.1 -295.4 95.0 -145.8 -6.0 0.0
359.2 -293.7 84.6 -152.6 -3.4 -6.0
7.1 1.6 -10.5 -6.9 2.6 -6.0
NR WR SR ER NER TOTAL
3838 14847 11792 3445 677 34598
9289 23225 14423 4892 47 51876
13127 38072 26215 8337 723 86473
NR WR SR ER NER All India
546 1080 370 482 7 2486
25 13 14 0 0 52
355 33 77 149 29 643
26 33 47 0 0 106
40 82 19 0 22 163
71 73 210 5 0 359
1063 1314 737 636 58 3809
6.71 5.54 28.51 0.73 0.05 9.43
42.55 10.54 45.35 24.19 49.63 29.09
1.068
1.102 Based on State Max Demands
Diversity factor = Sum of regional or state maximum demands / All India maximum demand
*Source: RLDCs for solar connected to ISTS; SLDCs for embedded solar. Limited visibility of embedded solar data.
Executive Director-NLDCShare of RES in total generation (%)
Share of Non-fossil fuel (Hydro,Nuclear and RES) in total generation(%)
H. All India Demand Diversity Factor
Based on Regional Max DemandsLignite
Hydro
Nuclear
Gas, Naptha & Diesel
RES (Wind, Solar, Biomass & Others)
TotalState Sector
Total
G. Sourcewise generation (MU)
CoalActual(MU)
O/D/U/D(MU)
F. Generation Outage(MW)
Central SectorDay Peak (MW)
E. Import/Export by Regions (in MU) - Import(+ve)/Export(-ve); OD(+)/UD(-)
Schedule(MU)D. Transnational Exchanges (MU) - Import(+ve)/Export(-ve)   
Actual (MU)Energy Shortage (MU)
Maximum Demand Met During the Day (MW) (From NLDC SCADA)
Time Of Maximum Demand Met (From NLDC SCADA)
B. Frequency Profile (%)
C. Power Supply Position in StatesDemand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs)
Peak Shortage (MW)
Energy Met (MU)
Hydro Gen (MU)
Wind Gen (MU)
Solar Gen (MU)*Report for previous day Date of Reporting: 16-Jul-2020
A. Power Supply Position at All India and Regional level

In [10]:
def extract_tableA(pagetext):
    lines = pagetext.split("\n")
    tableA = lines[:9]
    headers = tableA[0].split()
    data = [line.strip().split() for line in tableA[1:]]
    return headers, data



with open("data.pdf", "rb") as f:
    pdfreader = PyPDF2.PdfFileReader(f)
    page = pdfreader.getPage(1)
    headers, data = extract_tableA(page.extractText())
In [11]:
headers
Out[11]:
['NR', 'WR', 'SR', 'ER', 'NER', 'TOTAL']
In [12]:
data
Out[12]:
[['59882', '41115', '34238', '21526', '2730', '159491'],
 ['1114', '0', '0', '0', '6', '1120'],
 ['1398', '998', '807', '447', '48', '3698'],
 ['355', '33', '77', '149', '29', '643'],
 ['11', '49', '128', '-', '-', '187'],
 ['39.60', '16.60', '41.59', '4.60', '0.03', '102'],
 ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'],
 ['65470', '43593', '38117', '21535', '2827', '160654']]
In [13]:
import csv

def extract_tableA(pagetext):
    lines = pagetext.split("\n")
    tableA = lines[:9]
    headers = tableA[0].split()
    data = [line.strip().split() for line in tableA[1:]]
    return headers, data

def write_csv(headers, data, filename):
    with open(filename, "w") as f:
        csvf = csv.writer(f)
        csvf.writerow(headers)
        csvf.writerows(data)


with open("data.pdf", "rb") as f:
    pdfreader = PyPDF2.PdfFileReader(f)
    page = pdfreader.getPage(1)
    headers, data = extract_tableA(page.extractText())
    write_csv(headers, data, "tableA.csv")
In [14]:
!head tableA.csv
NR,WR,SR,ER,NER,TOTAL
59882,41115,34238,21526,2730,159491
1114,0,0,0,6,1120
1398,998,807,447,48,3698
355,33,77,149,29,643
11,49,128,-,-,187
39.60,16.60,41.59,4.60,0.03,102
12.6,0.0,0.0,0.0,0.0,12.6
65470,43593,38117,21535,2827,160654
In [15]:
import pandas as pd
In [16]:
pd.read_csv("tableA.csv")
Out[16]:
NR WR SR ER NER TOTAL
0 59882.0 41115.0 34238.00 21526 2730 159491.0
1 1114.0 0.0 0.00 0 6 1120.0
2 1398.0 998.0 807.00 447 48 3698.0
3 355.0 33.0 77.00 149 29 643.0
4 11.0 49.0 128.00 - - 187.0
5 39.6 16.6 41.59 4.60 0.03 102.0
6 12.6 0.0 0.00 0.0 0.0 12.6
7 65470.0 43593.0 38117.00 21535 2827 160654.0
In [17]:
A = pd.read_csv("tableA.csv")
In [18]:
A.describe()
Out[18]:
NR WR SR TOTAL
count 8.000000 8.000000 8.000000 8.000000
mean 16035.275000 10725.575000 9176.073750 40738.450000
std 28830.682422 19535.641314 16699.952904 73664.786038
min 11.000000 0.000000 0.000000 12.600000
25% 32.850000 12.450000 31.192500 165.750000
50% 734.500000 41.000000 102.500000 881.500000
75% 16019.000000 11027.250000 9164.750000 42646.250000
max 65470.000000 43593.000000 38117.000000 160654.000000
In [19]:
A.ER
Out[19]:
0    21526
1        0
2      447
3      149
4        -
5     4.60
6      0.0
7    21535
Name: ER, dtype: object
In [20]:
A.ER.str.replace("-","")
Out[20]:
0    21526
1        0
2      447
3      149
4         
5     4.60
6      0.0
7    21535
Name: ER, dtype: object
In [21]:
pd.to_numeric(A.ER.str.replace("-",""))
Out[21]:
0    21526.0
1        0.0
2      447.0
3      149.0
4        NaN
5        4.6
6        0.0
7    21535.0
Name: ER, dtype: float64
In [22]:
A['ER'] = pd.to_numeric(A.ER.str.replace("-",""))
In [23]:
A['NER'] = pd.to_numeric(A.NER.str.replace("-",""))
In [24]:
A
Out[24]:
NR WR SR ER NER TOTAL
0 59882.0 41115.0 34238.00 21526.0 2730.00 159491.0
1 1114.0 0.0 0.00 0.0 6.00 1120.0
2 1398.0 998.0 807.00 447.0 48.00 3698.0
3 355.0 33.0 77.00 149.0 29.00 643.0
4 11.0 49.0 128.00 NaN NaN 187.0
5 39.6 16.6 41.59 4.6 0.03 102.0
6 12.6 0.0 0.00 0.0 0.00 12.6
7 65470.0 43593.0 38117.00 21535.0 2827.00 160654.0
In [25]:
A.describe()
Out[25]:
NR WR SR ER NER TOTAL
count 8.000000 8.000000 8.000000 7.000000 7.000000 8.000000
mean 16035.275000 10725.575000 9176.073750 6237.371429 805.718571 40738.450000
std 28830.682422 19535.641314 16699.952904 10448.391549 1348.068562 73664.786038
min 11.000000 0.000000 0.000000 0.000000 0.000000 12.600000
25% 32.850000 12.450000 31.192500 2.300000 3.015000 165.750000
50% 734.500000 41.000000 102.500000 149.000000 29.000000 881.500000
75% 16019.000000 11027.250000 9164.750000 10986.500000 1389.000000 42646.250000
max 65470.000000 43593.000000 38117.000000 21535.000000 2827.000000 160654.000000
In [26]:
A1 = pd.read_csv("tableA.csv")
In [27]:
A1
Out[27]:
NR WR SR ER NER TOTAL
0 59882.0 41115.0 34238.00 21526 2730 159491.0
1 1114.0 0.0 0.00 0 6 1120.0
2 1398.0 998.0 807.00 447 48 3698.0
3 355.0 33.0 77.00 149 29 643.0
4 11.0 49.0 128.00 - - 187.0
5 39.6 16.6 41.59 4.60 0.03 102.0
6 12.6 0.0 0.00 0.0 0.0 12.6
7 65470.0 43593.0 38117.00 21535 2827 160654.0
In [28]:
A1.ER.str.replace("-","0")
Out[28]:
0    21526
1        0
2      447
3      149
4        0
5     4.60
6      0.0
7    21535
Name: ER, dtype: object
In [29]:
float("21526")
Out[29]:
21526.0
In [30]:
def extract_tableA_csv(pagetext, filename):
    lines = pagetext.split("\n")
    tableA = lines[:9]
    with open(filename, "w") as f:
        f.write(",".join(tableA[0].strip().split()))
        f.write("\n")
        for line in tableA[1:]:
            f.write(",".join(line.strip().split()).replace("-",""))
            f.write("\n")


with open("data.pdf", "rb") as f:
    pdfreader = PyPDF2.PdfFileReader(f)
    page = pdfreader.getPage(1)
    extract_tableA_csv(page.extractText(), "A.csv")
In [31]:
!cat A.csv
NR,WR,SR,ER,NER,TOTAL
59882,41115,34238,21526,2730,159491
1114,0,0,0,6,1120
1398,998,807,447,48,3698
355,33,77,149,29,643
11,49,128,,,187
39.60,16.60,41.59,4.60,0.03,102
12.6,0.0,0.0,0.0,0.0,12.6
65470,43593,38117,21535,2827,160654
In [32]:
pd.read_csv("A.csv")
Out[32]:
NR WR SR ER NER TOTAL
0 59882.0 41115.0 34238.00 21526.0 2730.00 159491.0
1 1114.0 0.0 0.00 0.0 6.00 1120.0
2 1398.0 998.0 807.00 447.0 48.00 3698.0
3 355.0 33.0 77.00 149.0 29.00 643.0
4 11.0 49.0 128.00 NaN NaN 187.0
5 39.6 16.6 41.59 4.6 0.03 102.0
6 12.6 0.0 0.00 0.0 0.00 12.6
7 65470.0 43593.0 38117.00 21535.0 2827.00 160654.0
In [33]:
A = pd.read_csv("tableA.csv")
In [34]:
A
Out[34]:
NR WR SR ER NER TOTAL
0 59882.0 41115.0 34238.00 21526 2730 159491.0
1 1114.0 0.0 0.00 0 6 1120.0
2 1398.0 998.0 807.00 447 48 3698.0
3 355.0 33.0 77.00 149 29 643.0
4 11.0 49.0 128.00 - - 187.0
5 39.6 16.6 41.59 4.60 0.03 102.0
6 12.6 0.0 0.00 0.0 0.0 12.6
7 65470.0 43593.0 38117.00 21535 2827 160654.0
In [36]:
A.ER.str.replace("-","") # possible only with string columns
Out[36]:
0    21526
1        0
2      447
3      149
4         
5     4.60
6      0.0
7    21535
Name: ER, dtype: object
In [38]:
A.NE.str.replace("11.0","11") # will not work with non string column
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [38], in <cell line: 1>()
----> 1 A.NE.str.replace("11.0","11")

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/generic.py:5575, in NDFrame.__getattr__(self, name)
   5568 if (
   5569     name not in self._internal_names_set
   5570     and name not in self._metadata
   5571     and name not in self._accessors
   5572     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5573 ):
   5574     return self[name]
-> 5575 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'NE'
In [40]:
A['ER'] = pd.to_numeric(A.ER.str.replace("-","")) 
In [41]:
A.ER
Out[41]:
0    21526.0
1        0.0
2      447.0
3      149.0
4        NaN
5        4.6
6        0.0
7    21535.0
Name: ER, dtype: float64
In [43]:
pd.to_numeric(A.ER.str.replace("-","")) # not working because ER is no more a string column, it is now numeric column
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [43], in <cell line: 1>()
----> 1 pd.to_numeric(A.ER.str.replace("-",""))

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/generic.py:5575, in NDFrame.__getattr__(self, name)
   5568 if (
   5569     name not in self._internal_names_set
   5570     and name not in self._metadata
   5571     and name not in self._accessors
   5572     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5573 ):
   5574     return self[name]
-> 5575 return object.__getattribute__(self, name)

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/accessor.py:182, in CachedAccessor.__get__(self, obj, cls)
    179 if obj is None:
    180     # we're accessing the attribute of the class, i.e., Dataset.geo
    181     return self._accessor
--> 182 accessor_obj = self._accessor(obj)
    183 # Replace the property with the accessor object. Inspired by:
    184 # https://www.pydanny.com/cached-property.html
    185 # We need to use object.__setattr__ because we overwrite __setattr__ on
    186 # NDFrame
    187 object.__setattr__(obj, self._name, accessor_obj)

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/strings/accessor.py:177, in StringMethods.__init__(self, data)
    174 def __init__(self, data):
    175     from pandas.core.arrays.string_ import StringDtype
--> 177     self._inferred_dtype = self._validate(data)
    178     self._is_categorical = is_categorical_dtype(data.dtype)
    179     self._is_string = isinstance(data.dtype, StringDtype)

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/strings/accessor.py:231, in StringMethods._validate(data)
    228 inferred_dtype = lib.infer_dtype(values, skipna=True)
    230 if inferred_dtype not in allowed_types:
--> 231     raise AttributeError("Can only use .str accessor with string values!")
    232 return inferred_dtype

AttributeError: Can only use .str accessor with string values!
In [44]:
import PyPDF2
In [45]:
x, y = 3 ,4
In [46]:
def foo():
    return "hello", 4
In [47]:
foo()
Out[47]:
('hello', 4)
In [48]:
a,b = foo() 

connecting to database from python¶

In [76]:
import sqlite3 # this is for sqlite connector
conn = sqlite3.connect("a1.db") # these two statements will be different for different database
cur = conn.cursor()
cur.execute("create table person (name varchar(100), email varchar(100));")
cur.execute("insert into person (name, email) values ('alice', 'alice@example.com');")
conn.commit()
In [77]:
cur = conn.cursor()
results = cur.execute("select * from person")
results.fetchall()
Out[77]:
[('alice', 'alice@example.com')]
In [54]:
def find_person(conn, email):
    q = "select * from person where email='{}'".format(email) # not very nice way of doing!
    print(q)
    cur = conn.cursor()
    results = cur.execute(q)
    return results.fetchone()
In [53]:
find_person(conn, "alice@example.com")
select * from person where email='alice@example.com'
Out[53]:
('alice', 'alice@example.com')
In [55]:
def find_person(conn, email):
    q = "select * from person where email=?"
    cur = conn.cursor()
    results = cur.execute(q, (email,))
    return results.fetchone()
In [56]:
find_person(conn, "alice@example.com")
Out[56]:
('alice', 'alice@example.com')
In [60]:
def query(conn, querystring, params=()):
    cur = conn.cursor()
    result = cur.execute(querystring, params)
    return result.fetchall()
In [61]:
query(conn, "select * from  person where name=?", ("alice",))
Out[61]:
[('alice', 'alice@example.com')]
In [62]:
query(conn, "select * from  person")
Out[62]:
[('alice', 'alice@example.com')]
In [63]:
conn.close() # after this any changes that your program has done will  be saved to database
In [64]:
conn = sqlite3.connect("a.db")
In [67]:
conn.commit() # just save the chages but don't close the connection
conn.close()

pandas and database¶

In [66]:
import pandas as pd

writing dataframe to database table¶

In [68]:
A
Out[68]:
NR WR SR ER NER TOTAL
0 59882.0 41115.0 34238.00 21526.0 2730 159491.0
1 1114.0 0.0 0.00 0.0 6 1120.0
2 1398.0 998.0 807.00 447.0 48 3698.0
3 355.0 33.0 77.00 149.0 29 643.0
4 11.0 49.0 128.00 NaN - 187.0
5 39.6 16.6 41.59 4.6 0.03 102.0
6 12.6 0.0 0.00 0.0 0.0 12.6
7 65470.0 43593.0 38117.00 21535.0 2827 160654.0
In [78]:
conn = sqlite3.connect("pdfdata") # the connection has to esatblished beforehand
In [70]:
## to_sql(tablename, con, if_exits=replace/append/error)
A.to_sql("tableA", con=conn, if_exists="replace") # be carefull , you should know what you are doing
Out[70]:
8
In [71]:
r = conn.cursor().execute("select * from tableA")
In [72]:
for item in r:
    print(item)
(0, 59882.0, 41115.0, 34238.0, 21526.0, '2730', 159491.0)
(1, 1114.0, 0.0, 0.0, 0.0, '6', 1120.0)
(2, 1398.0, 998.0, 807.0, 447.0, '48', 3698.0)
(3, 355.0, 33.0, 77.0, 149.0, '29', 643.0)
(4, 11.0, 49.0, 128.0, None, '-', 187.0)
(5, 39.6, 16.6, 41.59, 4.6, '0.03', 102.0)
(6, 12.6, 0.0, 0.0, 0.0, '0.0', 12.6)
(7, 65470.0, 43593.0, 38117.0, 21535.0, '2827', 160654.0)

reading data from databse table into a dataframe¶

In [80]:
df = pd.read_sql_query("select * from tableA", con= conn)
In [81]:
df
Out[81]:
index NR WR SR ER NER TOTAL
0 0 59882.0 41115.0 34238.00 21526.0 2730 159491.0
1 1 1114.0 0.0 0.00 0.0 6 1120.0
2 2 1398.0 998.0 807.00 447.0 48 3698.0
3 3 355.0 33.0 77.00 149.0 29 643.0
4 4 11.0 49.0 128.00 NaN - 187.0
5 5 39.6 16.6 41.59 4.6 0.03 102.0
6 6 12.6 0.0 0.00 0.0 0.0 12.6
7 7 65470.0 43593.0 38117.00 21535.0 2827 160654.0
In [82]:
df.describe()
Out[82]:
index NR WR SR ER TOTAL
count 8.00000 8.000000 8.000000 8.000000 7.000000 8.000000
mean 3.50000 16035.275000 10725.575000 9176.073750 6237.371429 40738.450000
std 2.44949 28830.682422 19535.641314 16699.952904 10448.391549 73664.786038
min 0.00000 11.000000 0.000000 0.000000 0.000000 12.600000
25% 1.75000 32.850000 12.450000 31.192500 2.300000 165.750000
50% 3.50000 734.500000 41.000000 102.500000 149.000000 881.500000
75% 5.25000 16019.000000 11027.250000 9164.750000 10986.500000 42646.250000
max 7.00000 65470.000000 43593.000000 38117.000000 21535.000000 160654.000000
In [84]:
query(conn, "select * from tableA where NER=?", ("-",))
Out[84]:
[(4, 11.0, 49.0, 128.0, None, '-', 187.0)]

microsoft sql database sample code¶

In [83]:
import pyodbc
cnxn = pyodbc.connect(r'Driver={SQL Server};Server=.\SQLEXPRESS;Database=myDB;Trusted_Connection=yes;')
cursor = cnxn.cursor()
cursor.execute("SELECT LastName FROM myContacts")
while 1:
    row = cursor.fetchone()
    if not row:
        break
    print(row.LastName)
cnxn.close()
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Input In [83], in <cell line: 1>()
----> 1 import pyodbc
      2 cnxn = pyodbc.connect(r'Driver={SQL Server};Server=.\SQLEXPRESS;Database=myDB;Trusted_Connection=yes;')
      3 cursor = cnxn.cursor()

ModuleNotFoundError: No module named 'pyodbc'

factoring your problem¶

In [86]:
headers, data
Out[86]:
(['NR', 'WR', 'SR', 'ER', 'NER', 'TOTAL'],
 [['59882', '41115', '34238', '21526', '2730', '159491'],
  ['1114', '0', '0', '0', '6', '1120'],
  ['1398', '998', '807', '447', '48', '3698'],
  ['355', '33', '77', '149', '29', '643'],
  ['11', '49', '128', '-', '-', '187'],
  ['39.60', '16.60', '41.59', '4.60', '0.03', '102'],
  ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'],
  ['65470', '43593', '38117', '21535', '2827', '160654']])
In [87]:
pd.DataFrame({"col1":[1,2,3],
             "col2":[2,3,4]})
Out[87]:
col1 col2
0 1 2
1 2 3
2 3 4
In [88]:
headers, data
Out[88]:
(['NR', 'WR', 'SR', 'ER', 'NER', 'TOTAL'],
 [['59882', '41115', '34238', '21526', '2730', '159491'],
  ['1114', '0', '0', '0', '6', '1120'],
  ['1398', '998', '807', '447', '48', '3698'],
  ['355', '33', '77', '149', '29', '643'],
  ['11', '49', '128', '-', '-', '187'],
  ['39.60', '16.60', '41.59', '4.60', '0.03', '102'],
  ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'],
  ['65470', '43593', '38117', '21535', '2827', '160654']])
In [91]:
dict(zip(header, data)) # i want columns not rows
Out[91]:
{'NR': ['59882', '41115', '34238', '21526', '2730', '159491'],
 'WR': ['1114', '0', '0', '0', '6', '1120'],
 'SR': ['1398', '998', '807', '447', '48', '3698'],
 'ER': ['355', '33', '77', '149', '29', '643'],
 'NER': ['11', '49', '128', '-', '-', '187'],
 'TOTAL': ['39.60', '16.60', '41.59', '4.60', '0.03', '102']}
In [93]:
pd.DataFrame(dict(zip(header, data))) # wrong
Out[93]:
NR WR SR ER NER TOTAL
0 59882 1114 1398 355 11 39.60
1 41115 0 998 33 49 16.60
2 34238 0 807 77 128 41.59
3 21526 0 447 149 - 4.60
4 2730 6 48 29 - 0.03
5 159491 1120 3698 643 187 102
In [97]:
data # collection of rows
Out[97]:
[['59882', '41115', '34238', '21526', '2730', '159491'],
 ['1114', '0', '0', '0', '6', '1120'],
 ['1398', '998', '807', '447', '48', '3698'],
 ['355', '33', '77', '149', '29', '643'],
 ['11', '49', '128', '-', '-', '187'],
 ['39.60', '16.60', '41.59', '4.60', '0.03', '102'],
 ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'],
 ['65470', '43593', '38117', '21535', '2827', '160654']]
In [96]:
#colection of columns!
In [98]:
data[0][0] # 
Out[98]:
'59882'
In [99]:
data[1][0]
Out[99]:
'1114'
In [100]:
data[2][0]
Out[100]:
'1398'
In [101]:
data[3][0]
Out[101]:
'355'
In [102]:
def column(data, columnnum):
    rowcount = len(data)
    return [data[i][columnnum] for i in range(rowcount)]
In [103]:
column(data, 0)
Out[103]:
['59882', '1114', '1398', '355', '11', '39.60', '12.6', '65470']
In [104]:
data
Out[104]:
[['59882', '41115', '34238', '21526', '2730', '159491'],
 ['1114', '0', '0', '0', '6', '1120'],
 ['1398', '998', '807', '447', '48', '3698'],
 ['355', '33', '77', '149', '29', '643'],
 ['11', '49', '128', '-', '-', '187'],
 ['39.60', '16.60', '41.59', '4.60', '0.03', '102'],
 ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'],
 ['65470', '43593', '38117', '21535', '2827', '160654']]
In [105]:
def column(data, columnnum):
    return [row[columnnum] for row in data]
In [106]:
column(data, 0)
Out[106]:
['59882', '1114', '1398', '355', '11', '39.60', '12.6', '65470']
In [107]:
def transpose(data):
    ## if data is collection of rows
    ## then transpose if nothing but collection of columns
    columncount = len(data[0])
    return [column(data, i) for i in range(columncount)]
In [108]:
transpose(data)
Out[108]:
[['59882', '1114', '1398', '355', '11', '39.60', '12.6', '65470'],
 ['41115', '0', '998', '33', '49', '16.60', '0.0', '43593'],
 ['34238', '0', '807', '77', '128', '41.59', '0.0', '38117'],
 ['21526', '0', '447', '149', '-', '4.60', '0.0', '21535'],
 ['2730', '6', '48', '29', '-', '0.03', '0.0', '2827'],
 ['159491', '1120', '3698', '643', '187', '102', '12.6', '160654']]
In [109]:
data
Out[109]:
[['59882', '41115', '34238', '21526', '2730', '159491'],
 ['1114', '0', '0', '0', '6', '1120'],
 ['1398', '998', '807', '447', '48', '3698'],
 ['355', '33', '77', '149', '29', '643'],
 ['11', '49', '128', '-', '-', '187'],
 ['39.60', '16.60', '41.59', '4.60', '0.03', '102'],
 ['12.6', '0.0', '0.0', '0.0', '0.0', '12.6'],
 ['65470', '43593', '38117', '21535', '2827', '160654']]
In [112]:
pd.DataFrame(dict(zip(headers, transpose(data))))
Out[112]:
NR WR SR ER NER TOTAL
0 59882 41115 34238 21526 2730 159491
1 1114 0 0 0 6 1120
2 1398 998 807 447 48 3698
3 355 33 77 149 29 643
4 11 49 128 - - 187
5 39.60 16.60 41.59 4.60 0.03 102
6 12.6 0.0 0.0 0.0 0.0 12.6
7 65470 43593 38117 21535 2827 160654
In [ ]: