Python Training at Symantec - Chennai -- Day 3

March 27-31, 2017
Anand Chitipothu

These notes are available online at https://notes.pipal.in/2017/symantec

© Pipal Academy LLP

Topics for today

  • Writing Custom Modules
  • Unit testing
  • Sorting Lists
  • String Formatting
  • Working with Files
  • Dictionaries

Writing Custom Modules

In [3]:
%%file mymodule.py
print("BEGIN mymodule")
x = 2

def add(a, b):
    return a+b

print(add(3, 4))
print("END mymodule")
Overwriting mymodule.py
In [4]:
!python mymodule.py
BEGIN mymodule
7
END mymodule
In [5]:
%%file a.py
import mymodule

print(mymodule.x)
print(mymodule.add(10, 20))
Writing a.py
In [6]:
!python a.py
BEGIN mymodule
7
END mymodule
2
30

The __name__ magic variable

In [7]:
%%file mymodule2.py
x = 2

def add(a, b):
    return a+b

print(add(3, 4))
print(__name__)
Writing mymodule2.py
In [8]:
!python mymodule2.py
7
__main__
In [9]:
# ask python to import mymodule2
!python -c "import mymodule2"
7
mymodule2

When the file is executed as a script, the value of __name__ is set to "__main__". When the file is imported as a module, __name__ is set to the module name.

In [10]:
%%file mymodule3.py
x = 2

def add(a, b):
    return a+b

if __name__ == "__main__":
    # run the following code only when this file is executed as a script.
    # ignore this when this file is imported as a module.
    print(add(3, 4))
Writing mymodule3.py
In [11]:
!python mymodule3.py
7
In [12]:
!python -c "import mymodule3; print(mymodule3.add(10, 20))"
30

Docstrings

In [13]:
%%file sq.py
"""The square module.

The long description of the module after one empty line.
"""
import sys

def square(n):
    """Computes square of a number.
    
        >>> square(4)
        16
    """
    return n*n

def main():
    n = int(sys.argv[1])
    print(square(n))

if __name__ == "__main__":
    main()
Writing sq.py
In [14]:
!python sq.py 3
9
In [15]:
help("sq")
Help on module sq:

NAME
    sq - The square module.

DESCRIPTION
    The long description of the module after one empty line.

FUNCTIONS
    main()
    
    square(n)
        Computes square of a number.
        
        >>> square(4)
        16

FILE
    /Users/anand/trainings/2017/symantec/sq.py


In [16]:
import sq
sq.square(123)
Out[16]:
15129
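
The >>> example in the docstring of square is more than documentation: the standard library doctest module can execute it and compare the result with the expected output. Here is a minimal sketch, assuming the same square function as in sq.py; the variable names test and result are only for illustration.

```python
import doctest

def square(n):
    """Computes square of a number.

        >>> square(4)
        16
    """
    return n*n

# Build a test from the docstring and run it.
# The dict supplies the names that the example is allowed to use.
test = doctest.DocTestParser().get_doctest(
    square.__doc__, {"square": square}, "square", None, 0)
result = doctest.DocTestRunner(verbose=False).run(test)
print(result)
```

From the command line, python -m doctest -v sq.py runs every docstring example in the file.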

Testing Python Programs

In [17]:
def square(x):
    return x*x
In [18]:
print(square(3))
9

With a plain print, someone has to check by eye whether the output is correct.

In [19]:
def square(x):
    return x*x

if square(3) == 9:
    print("OK")
OK

Python has an assert statement to make sure something is True.

In [22]:
def square(x):
    return x*x

def test_square():
    assert square(3) == 9
    assert square(-3) == 90
In [23]:
test_square()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-23-108216126958> in <module>()
----> 1 test_square()

<ipython-input-22-83bc75c406fe> in test_square()
      4 def test_square():
      5     assert square(3) == 9
----> 6     assert square(-3) == 90

AssertionError: 
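
The traceback above ends with a bare "AssertionError:" and no explanation. The assert statement accepts an optional second expression, which becomes the error message. Here is a minimal sketch that reuses the deliberately wrong expectation from the cell above; the message text is only an example.

```python
def square(x):
    return x*x

try:
    # the expression after the comma becomes the error message
    assert square(-3) == 90, "expected 90, got {}".format(square(-3))
except AssertionError as e:
    error_message = str(e)
    print("FAILED:", error_message)
```

The try/except is there only to capture the message. In a real test the assertion is left to fail so that the test runner can report it.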

Tools such as py.test make testing easier: py.test finds the functions whose names start with test_ and runs them.

In [26]:
%%file sq2.py

def square(x):
    return x*x

def sum_of_squares(x, y):
    return square(x) + square(y)

def test_square():
    assert square(3) == 9
    assert square(-3) == 9
    
def test_sum_of_squares():
    assert sum_of_squares(0, 0) == 0
    assert sum_of_squares(3, 4) == 25
Overwriting sq2.py
In [27]:
!py.test sq2.py
============================= test session starts ==============================
platform darwin -- Python 3.5.2, pytest-3.0.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/anand/trainings/2017/symantec, inifile: 
collected 2 items 

sq2.py ..

=========================== 2 passed in 0.06 seconds ===========================
In [28]:
!py.test -v sq2.py
============================= test session starts ==============================
platform darwin -- Python 3.5.2, pytest-3.0.2, py-1.4.31, pluggy-0.3.1 -- /Users/anand/pyenvs/python35/bin/python3.5
cachedir: .cache
rootdir: /Users/anand/trainings/2017/symantec, inifile: 
collected 2 items 

sq2.py::test_square PASSED
sq2.py::test_sum_of_squares PASSED

=========================== 2 passed in 0.08 seconds ===========================
In [29]:
# run only the tests that match the keyword "sum"
!py.test -v -k sum sq2.py
============================= test session starts ==============================
platform darwin -- Python 3.5.2, pytest-3.0.2, py-1.4.31, pluggy-0.3.1 -- /Users/anand/pyenvs/python35/bin/python3.5
cachedir: .cache
rootdir: /Users/anand/trainings/2017/symantec, inifile: 
collected 2 items 

sq2.py::test_sum_of_squares PASSED

============================== 1 tests deselected ==============================
==================== 1 passed, 1 deselected in 0.02 seconds ====================

The py.test utility can be installed using:

pip install pytest

You may have to use sudo for it.

Lists (Continued...)

Sorting Lists

In [33]:
names = ["alice", "dave", "bob", "charlie"]
In [34]:
names.sort() # sorts in-place
In [32]:
names
Out[32]:
['alice', 'bob', 'charlie', 'dave']
In [35]:
names = ["alice", "dave", "bob", "charlie"]
sorted(names)
Out[35]:
['alice', 'bob', 'charlie', 'dave']

The sorted function returns a new sorted list and does not modify the original list.

In [36]:
sorted_names = sorted(names)
print(sorted_names)
['alice', 'bob', 'charlie', 'dave']

How to sort these names by length?

In [38]:
# this is not what we want
sorted([len(name) for name in names])
Out[38]:
[3, 4, 5, 7]
In [39]:
sorted(names, key=len)
Out[39]:
['bob', 'dave', 'alice', 'charlie']

Let us say, we have records of students containing name and marks.

In [40]:
records = [
    ("A", 80),
    ("B", 37),    
    ("C", 98),
    ("D", 72)    
]

How to sort these records by marks?

In [44]:
def get_marks(record):
    print("get_marks", record)
    return record[1]
    
sorted(records, key=get_marks)
get_marks ('A', 80)
get_marks ('B', 37)
get_marks ('C', 98)
get_marks ('D', 72)
Out[44]:
[('B', 37), ('D', 72), ('A', 80), ('C', 98)]
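
The key function does not need a name of its own. As a sketch of two common alternatives to get_marks, a lambda or operator.itemgetter from the standard library can be passed as the key, and reverse=True sorts in descending order. The variable names by_marks and by_marks_desc are only for illustration.

```python
from operator import itemgetter

records = [("A", 80), ("B", 37), ("C", 98), ("D", 72)]

# lambda: a small unnamed function that returns the marks
by_marks = sorted(records, key=lambda record: record[1])

# itemgetter(1) builds a function that returns item 1 of its argument
by_marks_desc = sorted(records, key=itemgetter(1), reverse=True)

print(by_marks)
print(by_marks_desc)
```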

We can even use this to sort files by size.

In [45]:
import os
files = sorted(os.listdir("."), key=os.path.getsize)
for f in files:
    print(f)
push
args.py
echo.py
date.py
hello.py
square.py
a.py
echo2a.py
mymodule2.py
echo2.py
mymodule.py
.cache
notes.txt
mymodule3.py
__pycache__
sq2.py
sq.py
.ipynb_checkpoints
jhub
Readme.txt
Makefile
day4.ipynb
day5.ipynb
index.ipynb
A1-solutions.ipynb
Welcome.ipynb
A2-solutions.ipynb
day3.ipynb
day1.ipynb
day2.ipynb
day4.html
day5.html
index.html
Welcome.html
A1-solutions.html
A2-solutions.html
day3.html
day1.html
day2.html
In [47]:
ls -Sr | tail
day2.ipynb
day5.html
day4.html
index.html
Welcome.html
A1-solutions.html
A2-solutions.html
day3.html
day1.html
day2.html

Problem: Write a function isorted to sort given names ignoring the case.

>>> isorted(["A", "b", "d", "C"])
['A', 'b', 'C', 'd']
In [49]:
sorted(["A", "b", "d", "C"])
Out[49]:
['A', 'C', 'b', 'd']
In [55]:
def isorted(names):
    return sorted(names, key=ignorecase)

def ignorecase(name):
    print("ignorecase", name)
    # FIXME
    return name.upper()
In [57]:
isorted(["A", "b", "d", "C"])
ignorecase A
ignorecase b
ignorecase d
ignorecase C
Out[57]:
['A', 'b', 'C', 'd']
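
Since the key can be any function that takes one argument, the helper ignorecase is not strictly needed. A shorter sketch of the same idea passes the string method str.lower directly as the key:

```python
def isorted(names):
    # str.lower is called once per name to compute its sort key
    return sorted(names, key=str.lower)

print(isorted(["A", "b", "d", "C"]))
```

Comparing lower-case or upper-case keys gives the same order for plain English letters, so either works here.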

Strings

In [58]:
x = "hello"
In [59]:
for c in x:
    print(c)
h
e
l
l
o
In [60]:
x[0]
Out[60]:
'h'
In [61]:
x[1]
Out[61]:
'e'
In [62]:
x[:4]
Out[62]:
'hell'
In [63]:
max("helloworld")
Out[63]:
'w'
In [66]:
line = "one\n"
In [67]:
line.strip() # remove all whitespace on both the sides
Out[67]:
'one'
In [68]:
"  hello  \n".strip()
Out[68]:
'hello'
In [69]:
"  hello  \n".strip("\n") # strip only the new line character
Out[69]:
'  hello  '

Q: How to replace only the first space?

In [71]:
"1 2 3 4".replace(" ", "-")
Out[71]:
'1-2-3-4'
In [72]:
"1 2 3 4".replace(" ", "-", 1)
Out[72]:
'1-2 3 4'

String Formatting

In [74]:
name = "Python"
message = "Hello {}".format(name)
print(message)
Hello Python
In [75]:
"chapter {}: {}".format(1, "Getting Started")
Out[75]:
'chapter 1: Getting Started'

Sometimes we may want to use the same value multiple times in the pattern.

In [76]:
t = "chapter {}: {}\nContents of {} will come here."
print(t.format(1, "Getting Started", "Getting Started"))
chapter 1: Getting Started
Contents of Getting Started will come here.
In [77]:
t = "chapter {0}: {1}\nContents of {1} will come here."
print(t.format(1, "Getting Started"))
chapter 1: Getting Started
Contents of Getting Started will come here.
In [78]:
t = "chapter {number}: {title}\nContents of {title} will come here."
print(t.format(number=1, title="Getting Started"))
chapter 1: Getting Started
Contents of Getting Started will come here.

Let us look at another example.

In [81]:
def make_link(url):
    return '<a href="{url}">{url}</a>'.format(url=url)
In [82]:
make_link("https://www.google.com/")
Out[82]:
'<a href="https://www.google.com/">https://www.google.com/</a>'

Another example:

In [83]:
email = """
Dear {name},

As you've requested, we've reset your password. Please use the following link 
to reset your password.

http://mywebsite.com/reset-password?code={code}

Thanks,
Our Team
"""

def send_email(to, message):
    # TODO
    print(message)

message = email.format(name="Alice", code='123456789')
send_email("hello@example.com", message)
Dear Alice,

As you've requested, we've reset your password. Please use the following link 
to reset your password.

http://mywebsite.com/reset-password?code=123456789

Thanks,
Our Team

Working with Files

In [85]:
%%file three.txt
one
two
three
Writing three.txt
In [86]:
f = open("three.txt")
In [87]:
f.read()
Out[87]:
'one\ntwo\nthree'
In [88]:
print(open("three.txt").read())
one
two
three

Remember that the same file object can't be read again and again: once the contents are consumed, further reads return an empty string unless you rewind the file position.

In [89]:
f = open("three.txt")
In [90]:
f.read()
Out[90]:
'one\ntwo\nthree'
In [91]:
f.read()
Out[91]:
''
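
Rewinding the file position is done with the seek method of the file object. Here is a minimal sketch; it writes its own copy of three.txt into a temporary directory so that it runs on its own.

```python
import os
import tempfile

# write a small file so that the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "three.txt")
with open(path, "w") as f:
    f.write("one\ntwo\nthree")

with open(path) as f:
    first = f.read()    # reads everything; the position is now at the end
    second = f.read()   # nothing left to read
    f.seek(0)           # rewind to the beginning of the file
    third = f.read()    # reads everything again

print(repr(first), repr(second), repr(third))
```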

The read method also accepts a size argument to read only a small chunk.

In [92]:
f = open("three.txt")
In [93]:
f.read(5)
Out[93]:
'one\nt'
In [94]:
f.read()
Out[94]:
'wo\nthree'
In [95]:
f.read()
Out[95]:
''

The other common way to read a file is readlines.

In [96]:
open("three.txt").readlines()
Out[96]:
['one\n', 'two\n', 'three']
In [97]:
lines = open("three.txt").readlines()
In [98]:
for line in lines:
    print(line)
one

two

three

Why is the extra line coming between the lines?

That is because each line already has a newline character at the end and print adds another one.

We can solve it in two ways.

In [99]:
# remove the new line before printing.
for line in lines:
    print(line.strip("\n"))
one
two
three
In [100]:
# tell print to not add a new line
for line in lines:
    print(line, end="")
one
two
three
In [101]:
for n in [1, 2, 3, 4]:
    print(n, end="--")
1--2--3--4--

Problem: Write a program cat.py that takes a filename as command-line argument and prints all the contents of the file.

$ python cat.py three.txt
one
two
three
In [102]:
%%file cat.py
import sys
filename = sys.argv[1]
contents = open(filename).read()
print(contents)
Writing cat.py
In [103]:
!python cat.py three.txt
one
two
three

Example: Word Count

Let us implement the unix word count program in Python. The program should print the line count, word count and char count for given filename.

In [104]:
%%file numbers.txt
1 one
2 two
3 three
4 four
5 five
Writing numbers.txt
In [111]:
%%file wc.py
"""Program to find line count, word count and char count of given file.

USAGE: python wc.py filename
"""
import sys

def linecount(f):
    return len(open(f).readlines())

def wordcount(f):
    return len(open(f).read().split())

def charcount(f):
    return len(open(f).read())

def main():
    f = sys.argv[1]
    print(linecount(f), wordcount(f), charcount(f), f)
    
if __name__ == "__main__":
    main()
Overwriting wc.py
In [112]:
!python wc.py numbers.txt
5 10 33 numbers.txt
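
The wc.py above opens and reads the file three times, once per count. As a sketch of an alternative, the file can be read once and all three counts computed from the same lines. The function name wc is only for illustration, and the sample file is recreated in a temporary directory so that the block runs on its own.

```python
import os
import tempfile

def wc(filename):
    """Returns (line count, word count, char count), reading the file once."""
    with open(filename) as f:
        lines = f.readlines()
    contents = "".join(lines)
    return len(lines), len(contents.split()), len(contents)

# recreate numbers.txt so that the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
with open(path, "w") as f:
    f.write("1 one\n2 two\n3 three\n4 four\n5 five")

counts = wc(path)
print(counts)
```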

Writing to Files

A file can be opened in write mode by specifying "w" as second argument.

In [113]:
f = open("a.txt", "w")
f.write("one\n")
f.write("two\n")
f.close()
In [114]:
open("a.txt").read()
Out[114]:
'one\ntwo\n'

When a file is opened in write mode, it gets overwritten if it already exists.

Contents written to a file are buffered, so they may not reach the disk until the buffer fills up, flush() is called, or the file is closed. It is very important to close the file when writing.

To append to an existing file, open it in append ("a") mode.

In [115]:
f = open("a.txt", "a")
f.write("three\n")
f.close()
In [116]:
open("a.txt").read()
Out[116]:
'one\ntwo\nthree\n'

The with Statement

The with statement is handy when writing to files as it takes care of closing the file automatically.

In [118]:
with open("b.txt", "w") as f:
    f.write("one\n")
    f.write("two\n")    
# f gets closed automatically here    
In [119]:
open("b.txt").read()
Out[119]:
'one\ntwo\n'

Q: How to insert a line in the middle of a file?

The simple answer is that it is not possible.

You need to create a new file and copy into it the first part, the line to be inserted, and the last part. Once that is done, move the new file over the old file.
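
The steps described above can be sketched as follows. The function name insert_line and the ".tmp" suffix for the new file are assumptions made for this illustration. The sketch also assumes that every line of the file ends with a newline.

```python
import os
import tempfile

def insert_line(filename, index, new_line):
    """Inserts new_line before the line at position index (0-based)."""
    tmp = filename + ".tmp"  # hypothetical name for the new file
    with open(filename) as src, open(tmp, "w") as dest:
        for i, line in enumerate(src):
            if i == index:
                dest.write(new_line + "\n")
            dest.write(line)
    # move the new file over the old file
    os.replace(tmp, filename)

# try it on a small file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "a.txt")
with open(path, "w") as f:
    f.write("one\ntwo\nthree\n")

insert_line(path, 1, "one-and-a-half")
with open(path) as f:
    contents = f.read()
print(contents)
```

If index is beyond the last line, nothing is inserted and the file is rewritten unchanged.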

Problem: Write a program copyfile.py to copy the contents of one file to another. The program should accept the paths of the source file and the destination file and copy the source file into the destination.

$ python copyfile.py numbers.txt numbers2.txt


WARNING: don't call this file copy.py as it interferes with a standard library module with the same name.

In [124]:
%%file copyfile.py
"""Program to copy files.

USAGE: python copyfile.py src.txt dest.txt
"""
import sys 

def copyfile(src, dest):
    contents = open(src).read()
    with open(dest, "w") as f:
        f.write(contents)

def copyfile2(src, dest):
    with open(src) as f1, open(dest, "w") as f2:
        f2.write(f1.read())
        
def main():
    src = sys.argv[1]
    dest = sys.argv[2]    
    copyfile(src, dest)

if __name__ == "__main__":
    main()
Overwriting copyfile.py
In [122]:
!python copyfile.py three.txt 3.txt
In [123]:
!cat 3.txt
one
two
three
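
Both copyfile and copyfile2 read the whole source file into memory before writing it. For large files, one option is to copy in fixed-size chunks using read with a size argument. Here is a minimal sketch; the name copyfile_chunked and the default chunk size are assumptions for illustration, and it opens both files in binary mode so that it works for any kind of file.

```python
import os
import tempfile

def copyfile_chunked(src, dest, chunk_size=64 * 1024):
    with open(src, "rb") as f1, open(dest, "wb") as f2:
        while True:
            chunk = f1.read(chunk_size)
            if not chunk:
                # read returns an empty bytes object at the end of the file
                break
            f2.write(chunk)

# try it on a small file, with a tiny chunk size to exercise the loop
folder = tempfile.mkdtemp()
src = os.path.join(folder, "three.txt")
dest = os.path.join(folder, "3.txt")
with open(src, "w") as f:
    f.write("one\ntwo\nthree")

copyfile_chunked(src, dest, chunk_size=4)
with open(dest) as f:
    copied = f.read()
print(copied)
```

The standard library also provides shutil.copyfile(src, dest) for the same job.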

Working with binary files

To open a file in binary mode, use the mode "rb", "wb" or "ab" for read, write and append respectively.

When a file is opened in binary mode, read and readlines return bytes instead of strings.

In [125]:
open("a.txt", "r").read()
Out[125]:
'one\ntwo\nthree\n'
In [126]:
open("a.txt", "rb").read()
Out[126]:
b'one\ntwo\nthree\n'
In [128]:
f = open("binary.bin", "wb")
f.write(b"\x12\x98")
f.close()

When you are working with text files, it might be useful to specify the encoding.

In [133]:
f = open("tamil.txt", "w", encoding="utf-8")
f.write("\u0b85\u0b86")
f.close()
In [134]:
open("tamil.txt", encoding="utf-8").read()
Out[134]:
'அஆ'
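
Text mode and binary mode give two views of the same file: text mode decodes the bytes into characters using the encoding, while binary mode returns the raw bytes. Here is a small sketch that writes the same two Tamil characters to a temporary file and reads them back both ways.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tamil.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("\u0b85\u0b86")

with open(path, encoding="utf-8") as f:
    text = f.read()   # str: decoded characters
with open(path, "rb") as f:
    data = f.read()   # bytes: what is actually stored on disk

print(len(text), len(data))
print(data)
```

The file holds 2 characters but 6 bytes, because each of these characters takes 3 bytes in UTF-8.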

stdin, stdout and stderr

In [135]:
import sys
In [136]:
# prints to stdout
print("hello")
hello
In [137]:
# same as
print("hello", file=sys.stdout)
hello
In [139]:
# write to stderr
print("ERROR: unable to connect to database", file=sys.stderr)
ERROR: unable to connect to database

Example: Reading CSV files

In [142]:
%%file a.csv
A1,B1,C1
A2,B2,C2
A3,B3,C3
A4,B4,C4
Overwriting a.csv

The CSV format has a lot of special cases, and there are standard library modules to parse it. But we'll parse it by hand, just to get a feel for how easy such things are in Python.

In [143]:
open("a.csv").readlines()
Out[143]:
['A1,B1,C1\n', 'A2,B2,C2\n', 'A3,B3,C3\n', 'A4,B4,C4']
In [144]:
# remove the new line char
[line.strip("\n") for line in open("a.csv").readlines()]
Out[144]:
['A1,B1,C1', 'A2,B2,C2', 'A3,B3,C3', 'A4,B4,C4']
In [145]:
[line.strip("\n").split(",") for line in open("a.csv").readlines()]
Out[145]:
[['A1', 'B1', 'C1'],
 ['A2', 'B2', 'C2'],
 ['A3', 'B3', 'C3'],
 ['A4', 'B4', 'C4']]

To iterate over the lines of a file, we can just loop over the file instead of reading lines.

In [147]:
[line.strip("\n").split(",") for line in open("a.csv")]
Out[147]:
[['A1', 'B1', 'C1'],
 ['A2', 'B2', 'C2'],
 ['A3', 'B3', 'C3'],
 ['A4', 'B4', 'C4']]
In [148]:
def read_csv(filename):
    return [line.strip("\n").split(",") for line in open(filename)]
In [150]:
dataset = read_csv("a.csv")
In [151]:
dataset
Out[151]:
[['A1', 'B1', 'C1'],
 ['A2', 'B2', 'C2'],
 ['A3', 'B3', 'C3'],
 ['A4', 'B4', 'C4']]
In [152]:
# how to get the first row?
dataset[0]
Out[152]:
['A1', 'B1', 'C1']
In [153]:
# how to get first column?
[row[0] for row in dataset]
Out[153]:
['A1', 'A2', 'A3', 'A4']
In [154]:
def get_column(dataset, column_index):
    return [row[column_index] for row in dataset]
In [155]:
get_column(dataset, 0)
Out[155]:
['A1', 'A2', 'A3', 'A4']
In [156]:
get_column(dataset, 1)
Out[156]:
['B1', 'B2', 'B3', 'B4']
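
One of the special cases mentioned earlier is a field that itself contains a comma and is therefore quoted. Splitting on "," breaks such a field into pieces, while the standard library csv module handles it. Here is a minimal sketch; the sample data is made up for this illustration, and io.StringIO stands in for a file opened with open.

```python
import csv
import io

# made-up sample: the second field of the first row contains a comma
sample = 'A1,"B1, with comma",C1\nA2,B2,C2\n'

# parsing by hand splits the quoted field in two
naive = sample.split("\n")[0].split(",")

# csv.reader understands the quotes
rows = list(csv.reader(io.StringIO(sample)))

print(naive)
print(rows)
```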

Installing third-party modules

Python has a tool called pip to install third-party libraries.

A catalogue of third-party libraries, the Python Package Index (PyPI), is maintained at https://pypi.python.org/

Any of these packages can be installed using pip.

In [157]:
!pip install requests
Requirement already satisfied (use --upgrade to upgrade): requests in /Users/anand/pyenvs/python35/lib/python3.5/site-packages
You are using pip version 8.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.