Python Virtual Training For Arcesium - Module II - Day 2

Dec 07-11, 2020 Vikrant Patil

These notes are available online at http://notes.pipal.in/2020/arcesium_finop_batch3/module2-day2.html

We will be using JupyterHub at http://lab.pipal.in for this training. Create a notebook named module2-day2.ipynb for today's session.

Problems

Write a function listpy (similar to os.listdir) which uses a list comprehension to list the .py files in a given directory:

    >>> listpy(os.getcwd())
    add.py
    add1.py
    add2.py
    hello.py

Find the sum of all multiples of 7 or 11 below 1000.
Given the string "abrakadabra", capitalize every alternate character. Can a list comprehension be used to do this?
Some records are stored in a database with timestamps, as shown below:

    records = [("2018-11-11 24:04","11803","16602"),
                ("2018-11-11 24:09","11782","16568"),
                ("2018-11-11 24:14","11741","16524"),
                ("2018-11-11 24:19","11756","16543"),
                ("2018-11-11 24:24","11741","16538"),
                ("2018-11-11 24:28","11722","16558"),
                ("2018-11-11 24:34","11716","16457"),
                ("2018-11-11 24:39","11724","16430"),
                ("2018-11-11 24:44","11723","16572"),
                ("2018-11-11 24:49","11739","16611"),
                ("2018-11-11 24:54","11740","16501"),
                ("2018-11-11 24:58","11743","16568"),
                ("2018-11-12 01:04","11754","16626")]

The timestamps above have been misprinted: instead of 11 Nov 24:04, it should be 12 Nov 00:04. Write a function to correct the records, using a list comprehension.

Write a function factors which finds all factors of a given number (including 1 and the number itself).
Write a function is_prime which checks whether a given number is prime, based on the fact that a prime number has exactly two factors: 1 and itself.
Write a list comprehension to generate prime numbers.

Bonus Problem

Implement the Excel function COUNTIFS as a Python function COUNTIFS(criteria_list, condition). The first argument is the list on which the count is performed. The second argument is the condition as a string, as in Excel.

        "<" --------- less than
        "<="--------- less than or equal to
        ">" --------- greater than
        ">="--------- greater than or equal to
        "<>"--------- not equal to

    Sample run is shown below:

    >>> a = [10,20,30,40,50,40,40,50]
    >>> COUNTIFS(a, "<40")
    3
    >>> COUNTIFS(a, ">=40")
    5
    >>> COUNTIFS(a, "40")
    3
    >>> COUNTIFS(a, "<>40")
    5
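
One possible way to approach the bonus problem is sketched below. It assumes the condition string is an optional operator from the table above followed by a number, and that a bare number means "equal to", as in the sample run. The helper names `_OPERATORS` and `_parse_condition` are made up for this sketch.

```python
import operator

# Two-character operators are listed first so that "<=" is not read as "<".
_OPERATORS = [
    ("<=", operator.le),
    (">=", operator.ge),
    ("<>", operator.ne),
    ("<", operator.lt),
    (">", operator.gt),
]

def _parse_condition(condition):
    """Split a condition such as ">=40" into (comparison function, number)."""
    condition = condition.strip()
    for symbol, func in _OPERATORS:
        if condition.startswith(symbol):
            return func, float(condition[len(symbol):])
    # Assumption: no operator means "equal to".
    return operator.eq, float(condition)

def COUNTIFS(criteria_list, condition):
    func, value = _parse_condition(condition)
    return len([item for item in criteria_list if func(item, value)])

a = [10, 20, 30, 40, 50, 40, 40, 50]
print(COUNTIFS(a, "<40"), COUNTIFS(a, ">=40"), COUNTIFS(a, "40"), COUNTIFS(a, "<>40"))
```

Only numeric comparisons are handled here; text criteria and wildcards, which Excel also accepts, are left out.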

List comprehension

names = ["akshat", "aman","avinash", "deekash", "shivam", "jyoti", "kunal"]

emails = []
domain = "arcesium.com"
for name in names:
    emails.append("@".join([name,domain]))

emails

['akshat@arcesium.com',
 'aman@arcesium.com',
 'avinash@arcesium.com',
 'deekash@arcesium.com',
 'shivam@arcesium.com',
 'jyoti@arcesium.com',
 'kunal@arcesium.com']

["@".join([name, domain]) for name in names]

['akshat@arcesium.com',
 'aman@arcesium.com',
 'avinash@arcesium.com',
 'deekash@arcesium.com',
 'shivam@arcesium.com',
 'jyoti@arcesium.com',
 'kunal@arcesium.com']

newlist = []
for item in oldlist:
    newlist.append(do_some_operation(item))

newlist = [do_some_operation(item) for item in oldlist]
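
The general pattern above is not runnable on its own because `oldlist` and `do_some_operation` are placeholders. A small sketch with made-up stand-ins (squaring numbers) shows that the loop and the comprehension build the same list:

```python
def do_some_operation(item):
    # Hypothetical operation chosen only for this sketch: square the number.
    return item * item

oldlist = [1, 2, 3, 4]

# Loop version: start with an empty list and append one result at a time.
newlist_loop = []
for item in oldlist:
    newlist_loop.append(do_some_operation(item))

# Comprehension version: the same steps written as one expression.
newlist = [do_some_operation(item) for item in oldlist]

print(newlist)
print(newlist == newlist_loop)
```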

names_with_a = []
for name in names:
    if name.startswith("a"):
        names_with_a.append(name)

names_with_a

['akshat', 'aman', 'avinash']

names_with_a = [name for name in names if name.startswith("a")]

names_with_a

['akshat', 'aman', 'avinash']

import os

def print_list(items):
    for item in items:
        print(item)

def listpy(dirpath):
    """
    filters files with extension .py
    """
    files = os.listdir(dirpath)
    pyfiles = [file for file in files if file.endswith(".py")]
    print_list(pyfiles)

listpy(".")

hello.py
addall.py
sample.py
add.py
mean.py
stats.py

listpy(os.getcwd())

hello.py
addall.py
sample.py
add.py
mean.py
stats.py

sum([2, 3, 4, 5])

14

sum([n for n in range(1, 1000) if n%7==0 or n%11==0])

110110

[print(file) for file in os.listdir() if file.endswith(".py")]

hello.py
addall.py
sample.py
add.py
mean.py
stats.py

[None, None, None, None, None, None]

x = print("hello")

hello

print(x)

None
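
The list of `None` values above appears because `print` returns `None` and the comprehension collects those return values. A small sketch of the alternative, building the list first and printing in a separate step, is given below; the file names are made up for illustration.

```python
files = ["hello.py", "notes.txt", "add.py"]  # made-up names for illustration

# Build the data with the comprehension, then print it in a separate step.
pyfiles = [name for name in files if name.endswith(".py")]
print("\n".join(pyfiles))

# A comprehension used only for printing collects the return values of print.
leftovers = [print(name) for name in pyfiles]
print(leftovers)
```

Keeping the comprehension free of side effects means its result is always useful data, and the printing can be changed without touching the filtering.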

text = "abrakadabra"

[c for i, c in enumerate(text) if i%2==1]

['b', 'a', 'a', 'a', 'r']

[c for i, c in enumerate(text) if i%2==0]

['a', 'r', 'k', 'd', 'b', 'a']

def transform(c, index):
    if index%2==0:
        return c.lower()
    else:
        return c.upper()

[transform(c, i) for i, c in enumerate(text)]

['a', 'B', 'r', 'A', 'k', 'A', 'd', 'A', 'b', 'R', 'a']

def transform(c, index):
    if index%2==0:
        return c.lower()
    else:
        return c.upper()

"".join([transform(c, i) for i, c in enumerate(text)])

'aBrAkAdAbRa'

"".join([c.lower() if i%2==0 else c.upper() for i,c in enumerate(text)])

'aBrAkAdAbRa'

x = "even" if len(names)%2==0 else "odd" # one liner if else

x

'odd'

len(names)

7

8%2 # remainder

0

8/2 # division

4.0
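
As the result `4.0` shows, `/` always produces a float in Python 3. A short sketch of the related operators, using 7 and 2 as arbitrary sample values:

```python
print(7 / 2)         # true division, the result is a float
print(7 // 2)        # floor division, the result is an int for int operands
print(7 % 2)         # remainder
print(divmod(7, 2))  # quotient and remainder together as a tuple
```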

records = [("2018-11-11 23:58","11803","16602"),
            ("2018-11-11 24:04","11803","16602"),
                ("2018-11-11 24:09","11782","16568"),
                ("2018-11-11 24:14","11741","16524"),
                ("2018-11-11 24:19","11756","16543"),
                ("2018-11-11 24:24","11741","16538"),
                ("2018-11-11 24:28","11722","16558"),
                ("2018-11-11 24:34","11716","16457"),
                ("2018-11-11 24:39","11724","16430"),
                ("2018-11-11 24:44","11723","16572"),
                ("2018-11-11 24:49","11739","16611"),
                ("2018-11-11 24:54","11740","16501"),
                ("2018-11-11 24:58","11743","16568"),
                ("2018-11-12 01:04","11754","16626")]

def increament(strnum):
    return str(int(strnum)+1)

def correct_time(date):    
    
    dt, t = date.split()
    if "24:" in t: # include the colon so that only the hour matches, not minutes such as "10:24"
        y , m , d_ = dt.split("-")
        d_ = increament(d_)
        return "-".join([y, m, d_]) + " " + t.replace("24:","00:")
    else:
        return date
    
[(correct_time(dt), x, y) for dt, x, y in records]

[('2018-11-11 23:58', '11803', '16602'),
 ('2018-11-12 00:04', '11803', '16602'),
 ('2018-11-12 00:09', '11782', '16568'),
 ('2018-11-12 00:14', '11741', '16524'),
 ('2018-11-12 00:19', '11756', '16543'),
 ('2018-11-12 00:24', '11741', '16538'),
 ('2018-11-12 00:28', '11722', '16558'),
 ('2018-11-12 00:34', '11716', '16457'),
 ('2018-11-12 00:39', '11724', '16430'),
 ('2018-11-12 00:44', '11723', '16572'),
 ('2018-11-12 00:49', '11739', '16611'),
 ('2018-11-12 00:54', '11740', '16501'),
 ('2018-11-12 00:58', '11743', '16568'),
 ('2018-11-12 01:04', '11754', '16626')]

"2018-11-11 24:09".split()

['2018-11-11', '24:09']

def correct_hour(h):
    hi = int(h)
    actual_hour = hi - 24
    return str(actual_hour).zfill(2) #1 -> 01
    
def correct_time(date):
    
    
    dt, t = date.split()
    h, min_ = t.split(":")
    if int(h)>=24:
        y , m , d_ = dt.split("-")
        d_ = increament(d_)
        return "-".join([y, m, d_]) + " " + correct_hour(h) + ":" + min_
    else:
        return date

rec = records + [("2018-11-11 25:04","11754","16626")]
print_list(rec)

print("*"*10)
[(correct_time(dt), x, y) for dt, x, y in rec]

('2018-11-11 23:58', '11803', '16602')
('2018-11-11 24:04', '11803', '16602')
('2018-11-11 24:09', '11782', '16568')
('2018-11-11 24:14', '11741', '16524')
('2018-11-11 24:19', '11756', '16543')
('2018-11-11 24:24', '11741', '16538')
('2018-11-11 24:28', '11722', '16558')
('2018-11-11 24:34', '11716', '16457')
('2018-11-11 24:39', '11724', '16430')
('2018-11-11 24:44', '11723', '16572')
('2018-11-11 24:49', '11739', '16611')
('2018-11-11 24:54', '11740', '16501')
('2018-11-11 24:58', '11743', '16568')
('2018-11-12 01:04', '11754', '16626')
('2018-11-11 25:04', '11754', '16626')
**********

[('2018-11-11 23:58', '11803', '16602'),
 ('2018-11-12 00:04', '11803', '16602'),
 ('2018-11-12 00:09', '11782', '16568'),
 ('2018-11-12 00:14', '11741', '16524'),
 ('2018-11-12 00:19', '11756', '16543'),
 ('2018-11-12 00:24', '11741', '16538'),
 ('2018-11-12 00:28', '11722', '16558'),
 ('2018-11-12 00:34', '11716', '16457'),
 ('2018-11-12 00:39', '11724', '16430'),
 ('2018-11-12 00:44', '11723', '16572'),
 ('2018-11-12 00:49', '11739', '16611'),
 ('2018-11-12 00:54', '11740', '16501'),
 ('2018-11-12 00:58', '11743', '16568'),
 ('2018-11-12 01:04', '11754', '16626'),
 ('2018-11-12 01:04', '11754', '16626')]
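
Both versions of `correct_time` increment the day as a plain number, so a record at the end of a month (for example `2018-11-30 24:04`) would become `2018-11-31`, and a single-digit result would lose its leading zero. A sketch of one way around this uses the standard `datetime` module. The name `correct_time_dt` is made up here, and the sketch assumes every stamp has the form `YYYY-MM-DD HH:MM`.

```python
from datetime import datetime, timedelta

def correct_time_dt(stamp):
    """Normalise a 'YYYY-MM-DD HH:MM' stamp whose hour may be 24 or more."""
    date_part, time_part = stamp.split()
    hours, minutes = time_part.split(":")
    # Parse only the date, then add the clock time as an offset so that
    # an hour of 24 or more rolls over into the next day, month or year.
    day = datetime.strptime(date_part, "%Y-%m-%d")
    fixed = day + timedelta(hours=int(hours), minutes=int(minutes))
    return fixed.strftime("%Y-%m-%d %H:%M")

print(correct_time_dt("2018-11-11 24:04"))
print(correct_time_dt("2018-11-30 25:09"))
print(correct_time_dt("2018-12-31 24:00"))
```

It can be used in the same list comprehension as before: `[(correct_time_dt(dt), x, y) for dt, x, y in records]`.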

def factors(n):
    return [f for f in range(1, n+1) if n%f==0]

factors(5)

[1, 5]

factors(10)

[1, 2, 5, 10]

factors(7)

[1, 7]

factors(11)

[1, 11]

def is_prime(p):
    return factors(p)==[1,p]

is_prime(49)

False

is_prime(47)

True

def primes(n):
    """
    generate prime numbers less than n
    """
    return [i for i in range(1, n) if is_prime(i)]


def print_list1(items):
    for item in items:
        print(item, end=" ")

print_list1(primes(100))

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
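
`factors(n)` tries every number from 1 to n, so `is_prime` does a lot of work for large numbers. A common alternative, which is not the approach used in these notes, is trial division up to the square root. The name `is_prime_fast` is made up for this sketch, and `math.isqrt` needs Python 3.8 or later.

```python
import math

def is_prime_fast(n):
    """Trial division: test divisors from 2 up to the integer square root of n."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

print([i for i in range(1, 30) if is_prime_fast(i)])
```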

Working with files

import this # prints the philosophy of Python

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

%%file zen.txt
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Writing zen.txt

filename = "zen.txt" # relative path..
with open(filename) as f:
    print(f.readline())
    print(f.readline())

# when you come out of the with block, the file is automatically closed.

The Zen of Python, by Tim Peters

with open(filename) as f:
    for line in f:
        print(line)

The Zen of Python, by Tim Peters


Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Flat is better than nested.

Sparse is better than dense.

Readability counts.

Special cases aren't special enough to break the rules.

Although practicality beats purity.

Errors should never pass silently.

Unless explicitly silenced.

In the face of ambiguity, refuse the temptation to guess.

There should be one-- and preferably only one --obvious way to do it.

Although that way may not be obvious at first unless you're Dutch.

Now is better than never.

Although never is often better than *right* now.

If the implementation is hard to explain, it's a bad idea.

If the implementation is easy to explain, it may be a good idea.

Namespaces are one honking great idea -- let's do more of those!

with open(filename) as f:
    for line in f:
        print(line, end="")

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
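
The blank lines in the first loop appear because each line read from the file still ends with its own newline and `print` adds another. Besides passing `end=""`, the newline can be removed before the line is used. The sketch below writes a small throwaway file (its path and contents are made up) so that it does not depend on zen.txt.

```python
import os
import tempfile

# Create a small sample file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("Beautiful is better than ugly.\n\nReadability counts.\n")

# Option 1: strip the trailing newline from each line while iterating.
with open(path) as f:
    stripped = [line.rstrip("\n") for line in f]

# Option 2: read everything and let splitlines drop the newlines.
with open(path) as f:
    split = f.read().splitlines()

print(stripped)
print(stripped == split)
```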

with open(filename) as f:
    for i , line in enumerate(f, start=1):
        print(i, line, end="")

1 The Zen of Python, by Tim Peters
2 
3 Beautiful is better than ugly.
4 Explicit is better than implicit.
5 Simple is better than complex.
6 Complex is better than complicated.
7 Flat is better than nested.
8 Sparse is better than dense.
9 Readability counts.
10 Special cases aren't special enough to break the rules.
11 Although practicality beats purity.
12 Errors should never pass silently.
13 Unless explicitly silenced.
14 In the face of ambiguity, refuse the temptation to guess.
15 There should be one-- and preferably only one --obvious way to do it.
16 Although that way may not be obvious at first unless you're Dutch.
17 Now is better than never.
18 Although never is often better than *right* now.
19 If the implementation is hard to explain, it's a bad idea.
20 If the implementation is easy to explain, it may be a good idea.
21 Namespaces are one honking great idea -- let's do more of those!

for index, p in enumerate(primes(50)):
    print(index, p)

0 2
1 3
2 5
3 7
4 11
5 13
6 17
7 19
8 23
9 29
10 31
11 37
12 41
13 43
14 47

p_ = primes(20)

p_

[2, 3, 5, 7, 11, 13, 17, 19]

for p in p_:
    print(p)

2
3
5
7
11
13
17
19

for i, p in enumerate(p_):
    print(i, p)

0 2
1 3
2 5
3 7
4 11
5 13
6 17
7 19

for i, c in enumerate("some text"):
    print(i, c)

0 s
1 o
2 m
3 e
4  
5 t
6 e
7 x
8 t

f = open(filename) # no data is loaded as of now

f.readline() # here data loading in memory starts

'The Zen of Python, by Tim Peters\n'

f.close() # the number of files a process can keep open is limited, so close a file whenever the work is done.

with open(filename) as f: # with block will make sure file is closed after the block is over
    f.readline()
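
That the with block really closes the file can be checked through the file object's `closed` attribute. This sketch uses a throwaway file with made-up contents.

```python
import os
import tempfile

# Create a throwaway file so the sketch does not depend on zen.txt.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

with open(path) as f:
    first = f.readline()
    print(f.closed)   # the file is still open inside the block

print(f.closed)       # the file is closed once the block has ended
print(repr(first))
```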

with open(filename) as file:
    filetext = file.read() # you can read the complete file with the read method

filetext

"The Zen of Python, by Tim Peters\n\nBeautiful is better than ugly.\nExplicit is better than implicit.\nSimple is better than complex.\nComplex is better than complicated.\nFlat is better than nested.\nSparse is better than dense.\nReadability counts.\nSpecial cases aren't special enough to break the rules.\nAlthough practicality beats purity.\nErrors should never pass silently.\nUnless explicitly silenced.\nIn the face of ambiguity, refuse the temptation to guess.\nThere should be one-- and preferably only one --obvious way to do it.\nAlthough that way may not be obvious at first unless you're Dutch.\nNow is better than never.\nAlthough never is often better than *right* now.\nIf the implementation is hard to explain, it's a bad idea.\nIf the implementation is easy to explain, it may be a good idea.\nNamespaces are one honking great idea -- let's do more of those!\n"

with open(filename) as f:
    lines = f.readlines()

lines

['The Zen of Python, by Tim Peters\n',
 '\n',
 'Beautiful is better than ugly.\n',
 'Explicit is better than implicit.\n',
 'Simple is better than complex.\n',
 'Complex is better than complicated.\n',
 'Flat is better than nested.\n',
 'Sparse is better than dense.\n',
 'Readability counts.\n',
 "Special cases aren't special enough to break the rules.\n",
 'Although practicality beats purity.\n',
 'Errors should never pass silently.\n',
 'Unless explicitly silenced.\n',
 'In the face of ambiguity, refuse the temptation to guess.\n',
 'There should be one-- and preferably only one --obvious way to do it.\n',
 "Although that way may not be obvious at first unless you're Dutch.\n",
 'Now is better than never.\n',
 'Although never is often better than *right* now.\n',
 "If the implementation is hard to explain, it's a bad idea.\n",
 'If the implementation is easy to explain, it may be a good idea.\n',
 "Namespaces are one honking great idea -- let's do more of those!\n"]

!ls # commands prefixed with ! are system commands

addall.py		  module1-day1.ipynb  module2-day1.ipynb
add.py			  module1-day2.html   module2-day2.html
hello.py		  module1-day2.ipynb  module2-day2.ipynb
Makefile		  module1-day3.html   module-day5.html
mean.py			  module1-day3.ipynb  push
module1-assignment.html   module1-day4.html   __pycache__
module1-assignment.ipynb  module1-day4.ipynb  sample.py
module1-day1_2.html	  module1-day5.html   stats.py
module1-day1_2.ipynb	  module1-day5.ipynb  Untitled.html
module1-day1.html	  module2-day1.html   zen.txt

!cat add.py

import sys

def add(x, y):
    return x+y


x, y = sys.argv[1:3] # start from 1 till 2
print(add(int(x), int(y)))

!wc zen.txt

 21 144 857 zen.txt

!wc /home/vikrant/Downloads/eMARC\ sample\ data.csv

  3559903  10679707 125208937 /home/vikrant/Downloads/eMARC sample data.csv

!head /home/vikrant/Downloads/eMARC\ sample\ data.csv

﻿cummulativekwh,event_time,deployment_id
2250.640136,6/4/2019 12:00:00 AM,D0065
2250.640136,6/4/2019 12:01:00 AM,D0065
2250.650146,6/4/2019 12:02:00 AM,D0065
2250.650146,6/4/2019 12:03:00 AM,D0065
2250.650146,6/4/2019 12:04:00 AM,D0065
2250.650146,6/4/2019 12:05:00 AM,D0065
2250.650146,6/4/2019 12:06:00 AM,D0065
2250.650146,6/4/2019 12:07:00 AM,D0065
2250.650146,6/4/2019 12:08:00 AM,D0065

!head zen.txt

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.

%%file head.py
import sys

def head(filename):
    with open(filename) as f:
        for i in range(5):
            print(f.readline(), end="")
            

filename = sys.argv[1] 
head(filename)

Writing head.py

!python head.py zen.txt

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.

import head

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-115-4686af9faae2> in <module>
----> 1 import head

~/trainings/2020/arcesium_finop_batch3/head.py in <module>
      8 
      9 filename = sys.argv[1]
---> 10 head(filename)

~/trainings/2020/arcesium_finop_batch3/head.py in head(filename)
      2 
      3 def head(filename):
----> 4     with open(filename) as f:
      5         for i in range(5):
      6             print(f.readline(), end="")

FileNotFoundError: [Errno 2] No such file or directory: '-f'

import math

math.pi

3.141592653589793

!python head.py zen.txt lksjad dkfjdsj

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.

%%file head1.py
import sys

def head(filename):
    with open(filename) as f:
        for i in range(5):
            print(f.readline(), end="")

print(__name__)

if __name__ == "__main__":
    filename = sys.argv[1] 
    head(filename)

Writing head1.py

!python head1.py zen.txt

__main__
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.

import head1 # when you import a file as a module, the variable __name__ is set to the module name

%%file head2.py
import sys

def head(filename):
    with open(filename) as f:
        for i in range(5):
            print(f.readline(), end="")

print(__name__)

if __name__ == "__main__":
    filename = sys.argv[1] 
    head(filename)

Writing head2.py

import head2

head2

head2.head("zen.txt")

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
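
The `if __name__ == "__main__":` guard in head1.py and head2.py works because Python sets `__name__` differently depending on how a file is loaded. The sketch below shows both cases with a tiny throwaway module. The module name `name_demo` is made up, and `runpy.run_path` is used here as a stand-in for running the file with `python name_demo.py`.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Write a tiny module into a temporary directory ("name_demo" is a made-up name).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "name_demo.py")
with open(path, "w") as f:
    f.write("seen_name = __name__\n")

# Importing the file sets __name__ to the module name.
sys.path.insert(0, workdir)
importlib.invalidate_caches()
name_demo = importlib.import_module("name_demo")
print(name_demo.seen_name)

# Executing it with run_name="__main__" is similar to running it as a script.
script_globals = runpy.run_path(path, run_name="__main__")
print(script_globals["seen_name"])
```

This is also why the earlier `import head` failed: without the guard, the import ran the module's top-level code, which read `sys.argv[1]` from the notebook process instead of from a command line.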

%%file head3.py
import sys

def head(filename):
    with open(filename) as f:
        for i in range(5):
            print(f.readline(), end="")

Writing head3.py

import head3

head3.head("zen.txt")

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.

!python head3.py zen.txt kfdshf kjhfds

!head -n 4 zen.txt

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.

%%file head5.py
import sys

def head(filename, n):
    with open(filename) as f:
        for i in range(n):
            print(f.readline(), end="")

if __name__ == "__main__":
    n = int(sys.argv[1])
    filename = sys.argv[2]
    head(filename, n)

Overwriting head5.py

!python head5.py 9 zen.txt

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
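
head5.py takes the line count as its first positional argument, while the unix command uses `head -n 4 zen.txt`. A sketch of how the standard `argparse` module could accept the same `-n` option is given below. The function name `parse_args` is made up, and the default of 10 lines mirrors the unix command.

```python
import argparse

def head(filename, n):
    with open(filename) as f:
        for i in range(n):
            print(f.readline(), end="")

def parse_args(argv=None):
    """Parse "-n COUNT filename"; argv=None means read sys.argv."""
    parser = argparse.ArgumentParser(description="Print the first lines of a file.")
    parser.add_argument("-n", type=int, default=10, help="number of lines to print")
    parser.add_argument("filename")
    return parser.parse_args(argv)

# Passing a list makes the parser easy to try without a command line.
args = parse_args(["-n", "4", "zen.txt"])
print(args.n, args.filename)
```

In a script, the last two lines would be replaced by a call to `parse_args()` followed by `head(args.filename, args.n)` under the `if __name__ == "__main__":` guard.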

Writing files

with open("data.txt", "w") as f:
    for word in "hello create some words out of this!".split():
        f.write(word)
        f.write("\n")

!python3 head5.py 7 data.txt

hello
create
some
words
out
of
this!

with open("data.txt", "w") as f: # w mode overwrites the file
    for word in "let me change the contents!".split():
        f.write(word)
        f.write("\n")

!cat data.txt

let
me
change
the
contents!

with open("data.txt", "a") as f: # a mode (append) adds data at the end
    for word in "this is additional data!".split():
        f.write(word)
        f.write("\n")

!cat data.txt

let
me
change
the
contents!
this
is
additional
data!
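
The two `f.write` calls per word can be shortened. The sketch below shows two alternatives under the same "w" and "a" modes: `print` with its `file` argument, and a single `write` of a joined string. It writes to a temporary path (made up for this sketch) instead of data.txt.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")

with open(path, "w") as f:
    for word in "let me change the contents!".split():
        print(word, file=f)   # print adds the newline itself

with open(path, "a") as f:
    # Join the words with newlines and write them in one call.
    f.write("\n".join("this is additional data!".split()) + "\n")

with open(path) as f:
    words = f.read().split()
print(words)
```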

Problems

Write a Python script cat.py which mimics the unix command cat. cat.py should print the contents of a file to standard output (using print).

!python3 cat.py zen.txt
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Write a script wc.py which mimics the unix command wc. It should print the line count, word count, character count and filename.

!python3 wc.py zen.txt
21 144 857 zen.txt
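
A minimal sketch of the counting part of wc.py is given below. It assumes a plain text file that fits in memory, counts lines as newline characters, and counts characters, not bytes. The sample file and its contents are made up.

```python
import os
import tempfile

def wc(filename):
    """Return (line count, word count, character count) for a text file."""
    with open(filename) as f:
        text = f.read()
    return text.count("\n"), len(text.split()), len(text)

# Try it on a small throwaway file.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("hello world\nsecond line here\n")

lines, words, chars = wc(path)
print(lines, words, chars, os.path.basename(path))
```

To turn this into a script, the filename would come from `sys.argv[1]` inside an `if __name__ == "__main__":` block, as in head5.py.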

Write a function csvparse which will load data from file and create a list as shown below.

%%file data.csv
symbol,day,price
IBM,Monday,111.23
IBM,Tuesday,112.54
APPLE,Monday,200.45
APPLE,Tuesday,205.54

    >>> csvparse("data.csv")
    [['IBM', 'Monday', 111.23], ['IBM', 'Tuesday', 112.54], ['APPLE', 'Monday', 200.45], ['APPLE', 'Tuesday', 205.54]]
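
One possible sketch of csvparse, in the list-comprehension style of these notes, is given below. It assumes the first line is a header, every row has exactly three comma-separated fields, and the last field is a number. The temporary file is made up so the sketch runs on its own.

```python
import os
import tempfile

def csvparse(filename):
    with open(filename) as f:
        # Drop the trailing newline and skip empty lines.
        lines = [line.strip() for line in f if line.strip()]
    rows = [line.split(",") for line in lines[1:]]  # lines[0] is the header
    return [[symbol, day, float(price)] for symbol, day, price in rows]

path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w") as f:
    f.write("symbol,day,price\nIBM,Monday,111.23\nAPPLE,Tuesday,205.54\n")

print(csvparse(path))
```

Fields containing commas or quotes are not handled by a plain `split`; the standard `csv` module covers those cases.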