Module 2 - Day 2

Log in to the Lab using your credentials. A notebook named 2-2.ipynb has already been created for you. Open it and use it for today’s training.

Shut down all previous notebooks.

indexdata = [('IBM', 'Monday', 111.71436961893693),
            ('IBM', 'Tuesday', 141.21220022208635),
            ('IBM', 'Wednesday', 112.40571010053796),
            ('IBM', 'Thursday', 137.54133351926248),
            ('IBM', 'Friday', 140.25154281801224),
            ('MICROSOFT', 'Monday', 235.0403622499107),
            ('MICROSOFT', 'Tuesday', 225.0206535036475),
            ('MICROSOFT', 'Wednesday', 216.10342426936444),
            ('MICROSOFT', 'Thursday', 200.38038844494193),
            ('MICROSOFT', 'Friday', 235.80850482793264),
            ('APPLE', 'Monday', 321.49182055844256),
            ('APPLE', 'Tuesday', 340.63612771662815),
            ('APPLE', 'Wednesday', 303.9065277507285),
            ('APPLE', 'Thursday', 338.1350605764038),
            ('APPLE', 'Friday', 318.3912296144338)]

problems

[item for item in indexdata if item[1]=="Monday"]
[('IBM', 'Monday', 111.71436961893693),
 ('MICROSOFT', 'Monday', 235.0403622499107),
 ('APPLE', 'Monday', 321.49182055844256)]
[(name, day, price) for name, day, price in indexdata if day=="Monday"]
[('IBM', 'Monday', 111.71436961893693),
 ('MICROSOFT', 'Monday', 235.0403622499107),
 ('APPLE', 'Monday', 321.49182055844256)]
[price for name, day, price in indexdata if name=="IBM"]
[111.71436961893693,
 141.21220022208635,
 112.40571010053796,
 137.54133351926248,
 140.25154281801224]
max([price for name, day, price in indexdata if name=="IBM"])
141.21220022208635
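The filters above return prices only. To also find out on which day a stock peaked, the same comprehension can be combined with max and a key function. This is a minimal sketch: the helper name best_day is hypothetical, and only a few rows of indexdata are repeated so the block runs on its own.

```python
# A few rows copied from indexdata above, so the block is self-contained.
sample = [('IBM', 'Monday', 111.71436961893693),
          ('IBM', 'Tuesday', 141.21220022208635),
          ('IBM', 'Wednesday', 112.40571010053796),
          ('APPLE', 'Monday', 321.49182055844256)]

def best_day(rows, symbol):
    """Return the (day, price) pair with the highest price for symbol."""
    candidates = [(day, price) for name, day, price in rows if name == symbol]
    # key tells max to compare the pairs by price, not by day name
    return max(candidates, key=lambda pair: pair[1])

print(best_day(sample, "IBM"))
```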

Reading files from Python

%%file zen.txt
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Writing zen.txt
with open("zen.txt") as filehandle:
    filedata = filehandle.read() # this reads the complete file
    print(filedata)
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
for i in range(5):
    x = i*i
    y = x + 2
    print(y)
2
3
6
11
18
with open("zen.txt") as filehandle:
    firstline = filehandle.readline() # this will read only one line
    secondline = filehandle.readline() # this will read next line

print(firstline)
print(secondline)
The Zen of Python, by Tim Peters


filehandle.readline()# file is closed after with block!
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[11], line 1
----> 1 filehandle.readline()

ValueError: I/O operation on closed file.
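The error above happens because the with block closes the file as soon as the block ends. A file object's closed attribute makes this visible. The sketch below writes its own small file (demo.txt in a temporary folder, a name chosen only for illustration) so it does not depend on zen.txt.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on zen.txt.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("first line\nsecond line\n")

with open(path) as filehandle:
    inside = filehandle.closed      # False: we are still inside the with block
    firstline = filehandle.readline()

outside = filehandle.closed         # True: leaving the with block closed the file
print(inside, outside, repr(firstline))
```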
with open("zen.txt") as filehandle:
    print(filehandle.read())
    print("Next print", filehandle.read()) # the second read() returns an empty string
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Next print 
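The second read() is empty because the file object remembers its position, and the first read() left it at the end of the file. If the contents are needed again, seek(0) moves the position back to the start. A minimal sketch, again using a temporary demo file rather than zen.txt:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("hello\nworld\n")

with open(path) as filehandle:
    first = filehandle.read()    # reads everything; the position is now at the end
    second = filehandle.read()   # nothing left to read, so this is ''
    filehandle.seek(0)           # move the position back to the start of the file
    third = filehandle.read()    # the full contents again

print(repr(first), repr(second), repr(third))
```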
with open("zen.txt") as f: 
    for line in f:
        print(line)
The Zen of Python, by Tim Peters



Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Flat is better than nested.

Sparse is better than dense.

Readability counts.

Special cases aren't special enough to break the rules.

Although practicality beats purity.

Errors should never pass silently.

Unless explicitly silenced.

In the face of ambiguity, refuse the temptation to guess.

There should be one-- and preferably only one --obvious way to do it.

Although that way may not be obvious at first unless you're Dutch.

Now is better than never.

Although never is often better than *right* now.

If the implementation is hard to explain, it's a bad idea.

If the implementation is easy to explain, it may be a good idea.

Namespaces are one honking great idea -- let's do more of those!
with open("zen.txt") as f: 
    for line in f:
        print(line, end="") # the line will have its own \n char at end
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

problem

  • Write a function print_with_linenums that takes a text file name or path as its argument and prints the contents of the file with a line number at the start of each line.
def print_with_linenums(filename):
    with open(filename) as handle:
        for linenum, line in enumerate(handle, start=1):
            print(linenum, line, end="")
print_with_linenums("zen.txt")
1 The Zen of Python, by Tim Peters
2 
3 Beautiful is better than ugly.
4 Explicit is better than implicit.
5 Simple is better than complex.
6 Complex is better than complicated.
7 Flat is better than nested.
8 Sparse is better than dense.
9 Readability counts.
10 Special cases aren't special enough to break the rules.
11 Although practicality beats purity.
12 Errors should never pass silently.
13 Unless explicitly silenced.
14 In the face of ambiguity, refuse the temptation to guess.
15 There should be one-- and preferably only one --obvious way to do it.
16 Although that way may not be obvious at first unless you're Dutch.
17 Now is better than never.
18 Although never is often better than *right* now.
19 If the implementation is hard to explain, it's a bad idea.
20 If the implementation is easy to explain, it may be a good idea.
21 Namespaces are one honking great idea -- let's do more of those!
!pwd # pwd is a Unix command that prints the current working directory
/opt/arcesium-python-2024-june
print_with_linenums("/opt/arcesium-python-2024-june/zen.txt") # this is absolute path
1 The Zen of Python, by Tim Peters
2 
3 Beautiful is better than ugly.
4 Explicit is better than implicit.
5 Simple is better than complex.
6 Complex is better than complicated.
7 Flat is better than nested.
8 Sparse is better than dense.
9 Readability counts.
10 Special cases aren't special enough to break the rules.
11 Although practicality beats purity.
12 Errors should never pass silently.
13 Unless explicitly silenced.
14 In the face of ambiguity, refuse the temptation to guess.
15 There should be one-- and preferably only one --obvious way to do it.
16 Although that way may not be obvious at first unless you're Dutch.
17 Now is better than never.
18 Although never is often better than *right* now.
19 If the implementation is hard to explain, it's a bad idea.
20 If the implementation is easy to explain, it may be a good idea.
21 Namespaces are one honking great idea -- let's do more of those!
print_with_linenums("zen.txt") # relative path to the same file
1 The Zen of Python, by Tim Peters
2 
3 Beautiful is better than ugly.
4 Explicit is better than implicit.
5 Simple is better than complex.
6 Complex is better than complicated.
7 Flat is better than nested.
8 Sparse is better than dense.
9 Readability counts.
10 Special cases aren't special enough to break the rules.
11 Although practicality beats purity.
12 Errors should never pass silently.
13 Unless explicitly silenced.
14 In the face of ambiguity, refuse the temptation to guess.
15 There should be one-- and preferably only one --obvious way to do it.
16 Although that way may not be obvious at first unless you're Dutch.
17 Now is better than never.
18 Although never is often better than *right* now.
19 If the implementation is hard to explain, it's a bad idea.
20 If the implementation is easy to explain, it may be a good idea.
21 Namespaces are one honking great idea -- let's do more of those!
print_with_linenums("testfolder/hello.txt") # relative path
1 hello world!
print_with_linenums("/opt/arcesium-python-2024-june/testfolder/hello.txt") # absolute path
1 hello world!
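A relative path is resolved against the current working directory (the one !pwd reports), which is why the relative and absolute calls above open the same file. A small sketch using os.path from the standard library; the file does not need to exist for these functions to work.

```python
import os

relative = os.path.join("testfolder", "hello.txt")
# abspath joins the current working directory with the relative path
absolute = os.path.abspath(relative)

print(os.getcwd())
print(absolute)
print(os.path.isabs(relative), os.path.isabs(absolute))
```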

Parsing data from file

%%file salary.txt
100000
121323
200000
340000
150000
Writing salary.txt
with open("salary.txt") as f:
    data = []
    for line in f:
        data.append(line)
data
['100000\n', '121323\n', '200000\n', '340000\n', '150000\n']
with open("salary.txt") as f:
    data = []
    for line in f:
        data.append(line.strip()) # strip removes leading and trailing whitespace, including the \n
data # data is text!
['100000', '121323', '200000', '340000', '150000']
sum(data)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[30], line 1
----> 1 sum(data)

TypeError: unsupported operand type(s) for +: 'int' and 'str'
def read_int_list(filename):
    with open(filename) as f:
        data = []
        for line in f:
            n = line.strip()
            n = int(n)
            data.append(n)

    return data
read_int_list("salary.txt")
[100000, 121323, 200000, 340000, 150000]
def read_int_list(filename):
    with open(filename) as f:
        return [int(line.strip()) for line in f]
read_int_list("salary.txt")
[100000, 121323, 200000, 340000, 150000]
salaries = read_int_list("salary.txt")
max(salaries)
340000
sum(salaries)
911323

problem

  • Write a function parse_row_as_ints that parses the integers from a single comma-separated row stored in a file.
%%file salary.csv
11111,22222,33333,40000,50000
Writing salary.csv

bonus problem

  • Parse tabular CSV data as a list of lists of integers (a 2D list). Write a function parseints_from_csv for this.
%%file tabular.csv
1,2,3,4,5
21,22,23,24,25,
31,32,33,34,35
Overwriting tabular.csv
[[1,2,3,4,5],
 [21,22,23,24,25],
 [31,32,33,34,35]]
"hello this is a statment".split(" ")
['hello', 'this', 'is', 'a', 'statment']
"121,232,23232".split(",")
['121', '232', '23232']
[int(token) for token in "121,232,23232".split(",")]
[121, 232, 23232]
f = open("salary.csv")
data = f.read()
data
'11111,22222,33333,40000,50000\n'
data.strip()
'11111,22222,33333,40000,50000'
data.strip().split(",")
['11111', '22222', '33333', '40000', '50000']
[ int(i) for i in data.strip().split(",")]
[11111, 22222, 33333, 40000, 50000]
f.close() # because we did not open the file using a with statement
def parse_row_as_ints(filename):
    with open(filename) as f:
        textlist = f.read().strip().split(",")
        return [int(t) for t in textlist]
parse_row_as_ints("salary.csv")
[11111, 22222, 33333, 40000, 50000]
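def square(nums):
    data = []
    for i in nums:
        data.append(i*i)
        return data # bug: return is inside the loop, so the function exits after the first item
square(range(5))
[0]
def square(nums):
    data = []
    for i in nums:
        data.append(i*i)
    return data # fixed: return runs only after the loop has finished
square(range(5))
[0, 1, 4, 9, 16]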
%%file tabular.csv
1,2,3,4,5
21,22,23,24,25,
31,32,33,34,35
Overwriting tabular.csv
def process_line(line):
    textlist = line.strip().split(",")
    return [int(t) for t in textlist]

def parseints_from_csv(filename):
    with open(filename) as f:
        rows = []
        for line in f:
            rows.append(process_line(line))
        return rows
parseints_from_csv("tabular.csv")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[63], line 1
----> 1 parseints_from_csv("tabular.csv")

Cell In[62], line 9, in parseints_from_csv(filename)
      7 rows = []
      8 for line in f:
----> 9     rows.append(process_line(line))
     10 return rows

Cell In[62], line 3, in process_line(line)
      1 def process_line(line):
      2     textlist = line.strip().split(",")
----> 3     return [int(t) for t in textlist]

Cell In[62], line 3, in <listcomp>(.0)
      1 def process_line(line):
      2     textlist = line.strip().split(",")
----> 3     return [int(t) for t in textlist]

ValueError: invalid literal for int() with base 10: ''
"21,22,23,24,25,".strip().split(",")
['21', '22', '23', '24', '25', '']
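The trailing comma produces an empty string as the last token, and int('') raises the ValueError seen above. One option, besides fixing the file, is to skip empty tokens while parsing. The sketch below is a hypothetical variant of process_line; note that silently dropping empty fields can hide real gaps in the data, so it is a judgment call whether this is appropriate.

```python
def process_line_lenient(line):
    """Parse comma-separated integers, ignoring empty fields such as the
    one produced by a trailing comma (hypothetical variant of process_line)."""
    tokens = line.strip().split(",")
    # keep only tokens that still contain something after stripping whitespace
    return [int(t) for t in tokens if t.strip()]

print(process_line_lenient("21,22,23,24,25,\n"))
```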
%%file tabular1.csv
1,2,3,4,5
21,22,23,24,25
31,32,33,34,35
Writing tabular1.csv
parseints_from_csv("tabular1.csv")
[[1, 2, 3, 4, 5], [21, 22, 23, 24, 25], [31, 32, 33, 34, 35]]
def process_line(line):
    textlist = line.strip().split(",")
    return [int(t) for t in textlist]

def parseints_from_csv(filename):
    with open(filename) as f:
        return [process_line(line) for line in f]
parseints_from_csv("tabular1.csv")
[[1, 2, 3, 4, 5], [21, 22, 23, 24, 25], [31, 32, 33, 34, 35]]
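Splitting on commas by hand works for simple files like this one. For comparison, the standard library's csv module does the splitting and also copes with quoted fields. The sketch below is an alternative, not the approach used in this session; the function name parseints_with_csv_module and the demo file are made up for illustration.

```python
import csv
import os
import tempfile

# Write a small demo file with the same contents as tabular1.csv.
path = os.path.join(tempfile.mkdtemp(), "tabular_demo.csv")
with open(path, "w") as out:
    out.write("1,2,3,4,5\n21,22,23,24,25\n31,32,33,34,35\n")

def parseints_with_csv_module(filename):
    # newline="" is what the csv documentation recommends when opening files
    with open(filename, newline="") as f:
        # csv.reader yields each row as a list of strings
        return [[int(cell) for cell in row] for row in csv.reader(f)]

table = parseints_with_csv_module(path)
print(table)
```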

Write text files using python

with open("out.txt", "w") as fhandle: # write mode
    fhandle.write("Hello there!")
    fhandle.write("is this second line?")
with open("out.txt", "w") as fhandle: # opening in "w" mode again overwrites the file!
    fhandle.write("Hello there!")
    fhandle.write("\n") # unless we write \n , it won't be there in the file!
    fhandle.write("is this second line?")
nums = [1, 2, 3, 4, 5]
def write_list_to_file(listdata, filename):
    with open(filename, "w") as f:
        for item in listdata:
            f.write(str(item))
            f.write("\n")
write_list_to_file(nums, "nums.txt")
!cat nums.txt
1
2
3
4
5
%%file cat.py
import sys

def print_file(filename):
    with open(filename) as f:
        for line in f:
            print(line, end="")

filename = sys.argv[1]
print_file(filename)
Writing cat.py
!python cat.py nums.txt
1
2
3
4
5
with open("nums.txt", "a") as f: # this will append to existing file
    f.write("6")
!python cat.py nums.txt
1
2
3
4
5
6
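The appended "6" lands on its own line only because nums.txt already ended with a newline, and the file now ends without one. When appending several items it is safer to write the "\n" each time. A minimal sketch on a temporary demo file (the file name is an assumption made for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "append_demo.txt")

with open(path, "w") as f:      # "w" creates the file or truncates an existing one
    f.write("1\n2\n")

with open(path, "a") as f:      # "a" keeps the existing contents and adds to the end
    f.write("3\n")
    f.write("4\n")

with open(path) as f:
    contents = f.read()

print(repr(contents))
```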

problem

  • Data is given as a list. Write it to a file with each item on its own row. Write a function write_column for this.

    >>> write_column(listdata, filename)
nums = [1, 2, 3, 43, 4, 6]

String formatting

x = 35
f"The value of x is {x}" # format string
'The value of x is 35'
"The value of x is " + str(x)
'The value of x is 35'
f"The value of x is {x}"
'The value of x is 35'
def process_item(item):
    return f"{item}\n"  # same as str(item) + "\n"

def write_column(data, filename):
    with open(filename, "w") as f:
        for item in data:
            f.write(process_item(item))
            
write_column(nums, "n.txt")
!python cat.py n.txt
1
2
3
43
4
6
data = [[1, 2, 3, 4],
        [21, 22, 23, 24],
        [31, 32, 33, 34],
        [41, 42, 43, 44]]
def process_row(row):
    textrow = [f"{item}" for item in row]
    return ",".join(textrow)

def write_csv(data, filename):
    with open(filename, "w") as f:
        for row in data:
            f.write(process_row(row))
            f.write("\n")
            
words = ["one", "two", "three"]
",".join(words)
'one,two,three'
write_csv(data, "csvdata.csv")
!python cat.py csvdata.csv
1,2,3,4
21,22,23,24
31,32,33,34
41,42,43,44
%%file stocks.csv
symbol,high,low,gain
IBM,123,122,3
AGG,232,232,0
CAC,231,215,-3
Writing stocks.csv
def process_remaining_csvdata(fhandle):
    return [line.strip().split(",") for line in fhandle]

with open("stocks.csv") as f:
    headers = f.readline().strip().split(",")
    data = process_remaining_csvdata(f)
headers
['symbol', 'high', 'low', 'gain']
data
[['IBM', '123', '122', '3'],
 ['AGG', '232', '232', '0'],
 ['CAC', '231', '215', '-3']]
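Every value read from the file is still text, including the numeric columns. One way to make the rows easier to work with is to pair each value with its header and convert the numeric columns to int. This is a sketch under the assumption that every column except the first holds integers; to_record is a hypothetical helper name.

```python
# Values copied from the output above, so the block runs on its own.
headers = ['symbol', 'high', 'low', 'gain']
data = [['IBM', '123', '122', '3'],
        ['AGG', '232', '232', '0'],
        ['CAC', '231', '215', '-3']]

def to_record(headers, row):
    """Pair each header with its value and convert all but the first column to int."""
    record = dict(zip(headers, row))
    for key in headers[1:]:         # assumption: only the first column is text
        record[key] = int(record[key])
    return record

records = [to_record(headers, row) for row in data]
print(records[0])
```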

String formatting - more

x, y = 10, 20
f"value x = {x} and value of y = {y}"
'value x = 10 and value of y = 20'
"value of x = {0} and value of y = {1}".format(30, 50)
'value of x = 30 and value of y = 50'
tables = [ [n*i for i in range(1, 11) ] for n in range(1, 6)]
tables
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
 [3, 6, 9, 12, 15, 18, 21, 24, 27, 30],
 [4, 8, 12, 16, 20, 24, 28, 32, 36, 40],
 [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]]
for t in tables[0]:
    print(t)
1
2
3
4
5
6
7
8
9
10
for t in tables[0]:
    print(f"{t:2d}")
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
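The :2d format specification right-aligns an integer in a field two characters wide. The same idea can line up whole rows of the multiplication tables. A minimal sketch; format_row and its width parameter are hypothetical names, and the width is passed into the format specification as a nested replacement field.

```python
tables = [[n * i for i in range(1, 11)] for n in range(1, 6)]

def format_row(row, width=3):
    """Right-align every number in a field of the given width."""
    # {value:{width}d} fills in the width first, then formats value as an integer
    return " ".join(f"{value:{width}d}" for value in row)

for row in tables:
    print(format_row(row))
```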