Module 2 - Day 2

Log in to the Lab using your credentials. A notebook named 2-2.ipynb has already been created for you. Open it and use it for today’s training.

Shut down all previous notebooks.

indexdata = [('IBM', 'Monday', 111.71436961893693),
            ('IBM', 'Tuesday', 141.21220022208635),
            ('IBM', 'Wednesday', 112.40571010053796),
            ('IBM', 'Thursday', 137.54133351926248),
            ('IBM', 'Friday', 140.25154281801224),
            ('MICROSOFT', 'Monday', 235.0403622499107),
            ('MICROSOFT', 'Tuesday', 225.0206535036475),
            ('MICROSOFT', 'Wednesday', 216.10342426936444),
            ('MICROSOFT', 'Thursday', 200.38038844494193),
            ('MICROSOFT', 'Friday', 235.80850482793264),
            ('APPLE', 'Monday', 321.49182055844256),
            ('APPLE', 'Tuesday', 340.63612771662815),
            ('APPLE', 'Wednesday', 303.9065277507285),
            ('APPLE', 'Thursday', 338.1350605764038),
            ('APPLE', 'Friday', 318.3912296144338)]

problems

Write a list comprehension to find the data for a given day (Monday)
Write a function to find the weekly maximum for a given symbol

[item for item in indexdata if item[1]=="Monday"]

[('IBM', 'Monday', 111.71436961893693),
 ('MICROSOFT', 'Monday', 235.0403622499107),
 ('APPLE', 'Monday', 321.49182055844256)]

[(name, day, price) for name, day, price in indexdata if day=="Monday"]

[('IBM', 'Monday', 111.71436961893693),
 ('MICROSOFT', 'Monday', 235.0403622499107),
 ('APPLE', 'Monday', 321.49182055844256)]

[price for name, day, price in indexdata if name=="IBM"]

[111.71436961893693,
 141.21220022208635,
 112.40571010053796,
 137.54133351926248,
 140.25154281801224]

max([price for name, day, price in indexdata if name=="IBM"])

141.21220022208635
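The second problem asks for a function rather than a one-off expression. A minimal sketch that wraps the expression above; the name weekly_max, its parameters, and the shortened sample data are illustrative choices, not part of the original notebook:

```python
# A shortened sample in the same (symbol, day, price) shape as indexdata.
sample = [('IBM', 'Monday', 111.7), ('IBM', 'Tuesday', 141.2),
          ('APPLE', 'Monday', 321.5), ('APPLE', 'Tuesday', 340.6)]

def weekly_max(data, symbol):
    """Return the highest price recorded for symbol in data."""
    # A generator expression avoids building an intermediate list.
    return max(price for name, day, price in data if name == symbol)

print(weekly_max(sample, "IBM"))
```

Note that max raises a ValueError when no row matches the symbol, because it is then given an empty sequence.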

Reading files from Python

%%file zen.txt
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Writing zen.txt

with open("zen.txt") as filehandle:
    filedata = filehandle.read() # this will read complete file
    print(filedata)

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

for i in range(5):
    x = i*i
    y = x + 2
    print(y)

with open("zen.txt") as filehandle:
    firstline = filehandle.readline() # this will read only one line
    secondline = filehandle.readline() # this will read next line

print(firstline)
print(secondline)

The Zen of Python, by Tim Peters

filehandle.readline() # the file is closed after the with block!

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[11], line 1
----> 1 filehandle.readline()

ValueError: I/O operation on closed file.

with open("zen.txt") as filehandle:
    print(filehandle.read())
    print("Next print", filehandle.read()) # the second read() returns an empty string

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Next print
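The second read() is empty because the file position is already at the end of the file. A small sketch of how the position can be moved back with seek(0); it writes its own throwaway file (a hypothetical demo.txt in a temporary folder) so it does not depend on zen.txt:

```python
import os
import tempfile

# Create a small throwaway file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("first\nsecond\n")

with open(path) as f:
    first_read = f.read()    # reads everything; the position is now at the end
    second_read = f.read()   # nothing left to read, so this is ''
    f.seek(0)                # move the position back to the start
    third_read = f.read()    # the full contents again

print(repr(second_read))
print(first_read == third_read)
```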

with open("zen.txt") as f: 
    for line in f:
        print(line)

The Zen of Python, by Tim Peters



Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Flat is better than nested.

Sparse is better than dense.

Readability counts.

Special cases aren't special enough to break the rules.

Although practicality beats purity.

Errors should never pass silently.

Unless explicitly silenced.

In the face of ambiguity, refuse the temptation to guess.

There should be one-- and preferably only one --obvious way to do it.

Although that way may not be obvious at first unless you're Dutch.

Now is better than never.

Although never is often better than *right* now.

If the implementation is hard to explain, it's a bad idea.

If the implementation is easy to explain, it may be a good idea.

Namespaces are one honking great idea -- let's do more of those!

with open("zen.txt") as f: 
    for line in f:
        print(line, end="") # the line will have its own \n char at end

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

problem

Write a function print_with_linenums that takes a text file name/path as its argument and prints the contents of the file with a line number at the start of each line

def print_with_linenums(filename):
    with open(filename) as handle:
        for linenum, line in enumerate(handle, start=1):
            print(linenum, line, end="")

print_with_linenums("zen.txt")

1 The Zen of Python, by Tim Peters
2 
3 Beautiful is better than ugly.
4 Explicit is better than implicit.
5 Simple is better than complex.
6 Complex is better than complicated.
7 Flat is better than nested.
8 Sparse is better than dense.
9 Readability counts.
10 Special cases aren't special enough to break the rules.
11 Although practicality beats purity.
12 Errors should never pass silently.
13 Unless explicitly silenced.
14 In the face of ambiguity, refuse the temptation to guess.
15 There should be one-- and preferably only one --obvious way to do it.
16 Although that way may not be obvious at first unless you're Dutch.
17 Now is better than never.
18 Although never is often better than *right* now.
19 If the implementation is hard to explain, it's a bad idea.
20 If the implementation is easy to explain, it may be a good idea.
21 Namespaces are one honking great idea -- let's do more of those!

!pwd # pwd is a Unix command

/opt/arcesium-python-2024-june

print_with_linenums("/opt/arcesium-python-2024-june/zen.txt") # this is an absolute path

1 The Zen of Python, by Tim Peters
2 
3 Beautiful is better than ugly.
4 Explicit is better than implicit.
5 Simple is better than complex.
6 Complex is better than complicated.
7 Flat is better than nested.
8 Sparse is better than dense.
9 Readability counts.
10 Special cases aren't special enough to break the rules.
11 Although practicality beats purity.
12 Errors should never pass silently.
13 Unless explicitly silenced.
14 In the face of ambiguity, refuse the temptation to guess.
15 There should be one-- and preferably only one --obvious way to do it.
16 Although that way may not be obvious at first unless you're Dutch.
17 Now is better than never.
18 Although never is often better than *right* now.
19 If the implementation is hard to explain, it's a bad idea.
20 If the implementation is easy to explain, it may be a good idea.
21 Namespaces are one honking great idea -- let's do more of those!

print_with_linenums("zen.txt") # relative path, resolved against the current directory

1 The Zen of Python, by Tim Peters
2 
3 Beautiful is better than ugly.
4 Explicit is better than implicit.
5 Simple is better than complex.
6 Complex is better than complicated.
7 Flat is better than nested.
8 Sparse is better than dense.
9 Readability counts.
10 Special cases aren't special enough to break the rules.
11 Although practicality beats purity.
12 Errors should never pass silently.
13 Unless explicitly silenced.
14 In the face of ambiguity, refuse the temptation to guess.
15 There should be one-- and preferably only one --obvious way to do it.
16 Although that way may not be obvious at first unless you're Dutch.
17 Now is better than never.
18 Although never is often better than *right* now.
19 If the implementation is hard to explain, it's a bad idea.
20 If the implementation is easy to explain, it may be a good idea.
21 Namespaces are one honking great idea -- let's do more of those!

print_with_linenums("testfolder/hello.txt") # relative path

1 hello world!

print_with_linenums("/opt/arcesium-python-2024-june/testfolder/hello.txt") # absolute path

1 hello world!
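A relative path is interpreted starting from the current working directory, which is why both calls above reach the same file. A rough sketch of that relationship using the standard pathlib module; the variable names are illustrative, and the file does not need to exist for this to run:

```python
import os
from pathlib import Path

relative = Path("testfolder/hello.txt")   # same relative path as above
absolute = Path.cwd() / relative          # what the relative path refers to

print(relative.is_absolute())
print(absolute.is_absolute())
print(os.getcwd())  # the same directory that !pwd reports
```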

Parsing data from file

%%file salary.txt
100000
121323
200000
340000
150000

Writing salary.txt

with open("salary.txt") as f:
    data = []
    for line in f:
        data.append(line)

data

['100000\n', '121323\n', '200000\n', '340000\n', '150000\n']

with open("salary.txt") as f:
    data = []
    for line in f:
        data.append(line.strip()) # strip removes leading and trailing whitespace, including the \n

data # data is text!

['100000', '121323', '200000', '340000', '150000']

sum(data)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[30], line 1
----> 1 sum(data)

TypeError: unsupported operand type(s) for +: 'int' and 'str'

def read_int_list(filename):
    with open(filename) as f:
        data = []
        for line in f:
            n = line.strip()
            n = int(n)
            data.append(n)

    return data

read_int_list("salary.txt")

[100000, 121323, 200000, 340000, 150000]

def read_int_list(filename):
    with open(filename) as f:
        return [int(line.strip()) for line in f]

read_int_list("salary.txt")

[100000, 121323, 200000, 340000, 150000]

salaries = read_int_list("salary.txt")

max(salaries)

sum(salaries)

problem

Parse the integers from a single row given in a file. Write a function parse_row_as_ints to do this

%%file salary.csv
11111,22222,33333,40000,50000

Writing salary.csv

bonus problem

Parse tabular CSV data as a list of lists of integers (a 2D list). Write a function parseints_from_csv for this

%%file tabular.csv
1,2,3,4,5
21,22,23,24,25,
31,32,33,34,35

Overwriting tabular.csv

[[1,2,3,4,5],
 [21,22,23,24,25],
 [31,32,33,34,35]]

"hello this is a statement".split(" ")

['hello', 'this', 'is', 'a', 'statement']

"121,232,23232".split(",")

['121', '232', '23232']

[int(token) for token in "121,232,23232".split(",")]

[121, 232, 23232]

f = open("salary.csv")

data = f.read()

data

'11111,22222,33333,40000,50000\n'

data.strip()

'11111,22222,33333,40000,50000'

data.strip().split(",")

['11111', '22222', '33333', '40000', '50000']

[ int(i) for i in data.strip().split(",")]

[11111, 22222, 33333, 40000, 50000]

f.close() # because we did not open the file using a with statement

def parse_row_as_ints(filename):
    with open(filename) as f:
        textlist = f.read().strip().split(",")
        return [int(t) for t in textlist]

parse_row_as_ints("salary.csv")

[11111, 22222, 33333, 40000, 50000]

def square(nums):
    data = []
    for i in nums:
        data.append(i*i)
        return data # return is inside the loop, so the function exits after the first item

square(range(5))

[0]

def square(nums):
    data = []
    for i in nums:
        data.append(i*i)
    return data # return runs after the loop has finished

square(range(5))

[0, 1, 4, 9, 16]

%%file tabular.csv
1,2,3,4,5
21,22,23,24,25,
31,32,33,34,35

Overwriting tabular.csv

def process_line(line):
    textlist = line.strip().split(",")
    return [int(t) for t in textlist]

def parseints_from_csv(filename):
    with open(filename) as f:
        rows = []
        for line in f:
            rows.append(process_line(line))
        return rows

parseints_from_csv("tabular.csv")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[63], line 1
----> 1 parseints_from_csv("tabular.csv")

Cell In[62], line 9, in parseints_from_csv(filename)
      7 rows = []
      8 for line in f:
----> 9     rows.append(process_line(line))
     10 return rows

Cell In[62], line 3, in process_line(line)
      1 def process_line(line):
      2     textlist = line.strip().split(",")
----> 3     return [int(t) for t in textlist]

Cell In[62], line 3, in <listcomp>(.0)
      1 def process_line(line):
      2     textlist = line.strip().split(",")
----> 3     return [int(t) for t in textlist]

ValueError: invalid literal for int() with base 10: ''

"21,22,23,24,25,".strip().split(",")

['21', '22', '23', '24', '25', '']
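The trailing comma produces an empty last token, and int('') is what raises the ValueError. One possible way to tolerate such rows is to skip empty tokens; this is a sketch with a hypothetical name (process_line_skip_empty), and skipping silently may hide genuinely malformed data:

```python
def process_line_skip_empty(line):
    tokens = line.strip().split(",")
    # Drop empty tokens such as the one produced by a trailing comma.
    return [int(t) for t in tokens if t.strip()]

print(process_line_skip_empty("21,22,23,24,25,\n"))
```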

%%file tabular1.csv
1,2,3,4,5
21,22,23,24,25
31,32,33,34,35

Writing tabular1.csv

parseints_from_csv("tabular1.csv")

[[1, 2, 3, 4, 5], [21, 22, 23, 24, 25], [31, 32, 33, 34, 35]]

def process_line(line):
    textlist = line.strip().split(",")
    return [int(t) for t in textlist]

def parseints_from_csv(filename):
    with open(filename) as f:
        return [process_line(line) for line in f]

parseints_from_csv("tabular1.csv")

[[1, 2, 3, 4, 5], [21, 22, 23, 24, 25], [31, 32, 33, 34, 35]]
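The same parsing can also be sketched with the standard library csv module instead of splitting by hand. The function name parseints_with_csv is illustrative, and io.StringIO stands in for an open file so the example is self-contained:

```python
import csv
import io

# Same content as tabular1.csv, held in memory for the sketch.
text = "1,2,3,4,5\n21,22,23,24,25\n31,32,33,34,35\n"

def parseints_with_csv(fhandle):
    # csv.reader yields each row as a list of strings.
    reader = csv.reader(fhandle)
    return [[int(cell) for cell in row] for row in reader]

rows = parseints_with_csv(io.StringIO(text))
print(rows)
```

With a real file, the csv documentation recommends opening it with newline="" before passing the handle to csv.reader.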

Writing text files using Python

with open("out.txt", "w") as fhandle: # write mode
    fhandle.write("Hello there!")
    fhandle.write("is this second line?")

with open("out.txt", "w") as fhandle: # writing it again will overwrite the file!
    fhandle.write("Hello there!")
    fhandle.write("\n") # unless we write \n, it won't be there in the file!
    fhandle.write("is this second line?")

nums = [1, 2, 3, 4, 5]

def write_list_to_file(listdata, filename):
    with open(filename, "w") as f:
        for item in listdata:
            f.write(str(item))
            f.write("\n")

write_list_to_file(nums, "nums.txt")

!cat nums.txt

%%file cat.py
import sys

def print_file(filename):
    with open(filename) as f:
        for line in f:
            print(line, end="")

filename = sys.argv[1]
print_file(filename)

Writing cat.py

!python cat.py nums.txt

with open("nums.txt", "a") as f: # this will append to existing file
    f.write("6")

!python cat.py nums.txt
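To make the difference between "w" and "a" visible, here is a small sketch that writes, appends, and then reads the file back. It uses a throwaway file (a hypothetical nums_demo.txt in a temporary folder) rather than nums.txt:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "nums_demo.txt")  # throwaway file

with open(path, "w") as f:      # "w" creates the file or truncates an existing one
    for n in [1, 2, 3]:
        f.write(str(n) + "\n")

with open(path, "a") as f:      # "a" keeps existing content and adds to the end
    f.write("4")                # no "\n" is written, so the file ends without one

with open(path) as f:
    contents = f.read()

print(repr(contents))
```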

problem

The data is given as a list. Write it to a file with each item on its own row. Write a function write_column for this
```
>>> write_column(listdata, filename)
```

nums = [1, 2, 3, 43, 4,6]

String formatting

x = 35

f"The value of x is {x}" # format string

'The value of x is 35'

"The value of x is " + str(x)

'The value of x is 35'

f"The value of x is {x}"

'The value of x is 35'

def process_item(item):
    return f"{item}\n"  # same text as str(item) + "\n"

def write_column(data, filename):
    with open(filename, "w") as f:
        for item in data:
            f.write(process_item(item))

write_column(nums, "n.txt")

!python cat.py n.txt

data = [[1, 2, 3, 4],
        [21, 22, 23, 24],
        [31, 32, 33, 34],
        [41, 42, 43, 44]]

def process_row(row):
    textrow = [f"{item}" for item in row]
    return ",".join(textrow)

def write_csv(data, filename):
    with open(filename, "w") as f:
        for row in data:
            f.write(process_row(row))
            f.write("\n")

words = ["one", "two", "three"]

",".join(words)

'one,two,three'

write_csv(data, "csvdata.csv")

!python cat.py csvdata.csv

1,2,3,4
21,22,23,24
31,32,33,34
41,42,43,44

%%file stocks.csv
symbol,high,low,gain
IBM,123,122,3
AGG,232,232,0
CAC,231,215,-3

Writing stocks.csv

def process_remaining_csvdata(fhandle):
    return [line.strip().split(",") for line in fhandle]

with open("stocks.csv") as f:
    headers = f.readline().strip().split(",")
    data = process_remaining_csvdata(f)

headers

['symbol', 'high', 'low', 'gain']

data

[['IBM', '123', '122', '3'],
 ['AGG', '232', '232', '0'],
 ['CAC', '231', '215', '-3']]
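Once the headers and rows are separated, each row can be paired with the headers to get a dictionary per record. A minimal sketch; the name to_record is illustrative, and treating every column after symbol as an integer is an assumption about this particular file:

```python
headers = ['symbol', 'high', 'low', 'gain']
data = [['IBM', '123', '122', '3'],
        ['AGG', '232', '232', '0'],
        ['CAC', '231', '215', '-3']]

def to_record(headers, row):
    # zip pairs each header with the value in the same position.
    record = dict(zip(headers, row))
    # Assumption: every column except the first holds an integer.
    for key in headers[1:]:
        record[key] = int(record[key])
    return record

records = [to_record(headers, row) for row in data]
print(records[2])
```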

String formatting - more

x, y = 10, 20

f"value x = {x} and value of y = {y}"

'value x = 10 and value of y = 20'

"value of x = {0} and value of y = {1}".format(30, 50)

'value of x = 30 and value of y = 50'

tables = [ [n*i for i in range(1, 11) ] for n in range(1, 6)]

tables

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
 [3, 6, 9, 12, 15, 18, 21, 24, 27, 30],
 [4, 8, 12, 16, 20, 24, 28, 32, 36, 40],
 [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]]

for t in tables[0]:
    print(t)

for t in tables[0]:
    print(f"{t:2d}")
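The same width specifier can be used to line up whole rows of the tables. A small sketch; the helper name format_row and the default width of 3 are illustrative choices:

```python
tables = [[n * i for i in range(1, 11)] for n in range(1, 6)]

def format_row(row, width=3):
    # {value:{width}d} right-aligns each integer in a field of `width` characters.
    return " ".join(f"{value:{width}d}" for value in row)

for row in tables:
    print(format_row(row))
```

Every number occupies the same number of characters, so the columns stay aligned even when one-digit and two-digit values are mixed.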