Feb 20-22, 2017
Anand Chitipothu
These notes are available online at http://bit.ly/intuit17
© Pipal Academy LLP
%%file three.txt
one
two
three
f = open("three.txt")
f.read()
The read method reads the entire contents of the file. If you call read again, it returns an empty string because the file position is already at the end of the file.
f.read()
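A small sketch of the behaviour described above. It assumes a scratch file created in a temporary directory (the path is only for illustration) and uses seek(0) to move back to the start of the file so that it can be read again.

```python
import os
import tempfile

# Create a small scratch file to read from (illustrative path).
path = os.path.join(tempfile.mkdtemp(), "three.txt")
f = open(path, "w")
f.write("one\ntwo\nthree\n")
f.close()

f = open(path)
first = f.read()    # reads the whole file
second = f.read()   # already at the end of the file, so this is empty
f.seek(0)           # move the file position back to the beginning
third = f.read()    # reads the whole file again
f.close()

print(repr(first))
print(repr(second))
print(repr(third))
```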
open("three.txt").read()
print(open("three.txt").read())
We can also read the contents line by line.
lines = open("three.txt").readlines()
print(lines)
for line in lines:
    print(line)
Why are we getting those empty lines?
That is because the text has a new line and the print is adding another new line. To fix it, we can either remove the newline from the text or tell print not to add the new line.
for line in lines:
    print(line.strip("\n"))
for line in lines:
    print(line, end="")
Problem: Write a program cat.py that takes a filename as command-line argument and prints all the contents of that file.
$ python cat.py three.txt
one
two
three
Let us implement the unix word count program in Python. The program should print the line count, word count and char count for the given filename.
%%file numbers.txt
1 one
2 two
3 three
4 four
5 five
%%file wc.py
"""Program to compute line count, word count and char count of a file.
USAGE: python wc.py filename
"""
import sys

def linecount(f):
    return len(open(f).readlines())

def wordcount(f):
    return len(open(f).read().split())

def charcount(f):
    return len(open(f).read())

def main():
    f = sys.argv[1]
    print(linecount(f), wordcount(f), charcount(f), f)

if __name__ == "__main__":
    main()
!python wc.py numbers.txt
How do we find the Python file in the current directory that has the largest number of lines?
import os
files = [f for f in os.listdir(".") if f.endswith(".py")]
import wc
max(files, key=wc.linecount)
Files can be opened in write mode by specifying "w" as the second argument.
f = open("a.txt", "w")
f.write("one\n")
f.write("two\n")
f.close()
open("a.txt").read()
It is very important to close a file after writing. Written data may be held in a buffer, and it is guaranteed to reach the disk only when the file is flushed or closed.
Also, when an existing file is opened in write mode, its previous contents are discarded.
To append to an existing file, open it in append ("a") mode.
f = open("a.txt", "a")
f.write("three\n")
f.close()
open("a.txt").read()
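The difference between the two modes can be checked with a short sketch. It assumes a scratch file in a temporary directory (the path is only for illustration).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "a.txt")  # illustrative scratch file

f = open(path, "w")
f.write("one\n")
f.close()

# Opening in "w" mode again discards the previous contents.
f = open(path, "w")
f.write("two\n")
f.close()
after_write = open(path).read()

# Opening in "a" mode keeps the contents and adds to the end.
f = open(path, "a")
f.write("three\n")
f.close()
after_append = open(path).read()

print(repr(after_write))
print(repr(after_append))
```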
with statement
The with statement is handy when writing to files as it takes care of closing the file automatically.
with open("b.txt", "w") as f:
    f.write("one\n")
    f.write("two\n")
# file gets closed here automatically
open("b.txt").read()
Problem: Write a program copyfile.py to copy the contents of one file to another. The program should accept a source file and a destination file as command-line arguments and copy the source into the destination.
$ python copyfile.py src.txt dest.txt
WARNING: Don't call this file copy.py as it interferes with a standard library module with the same name.
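One possible approach to this problem, given only as a sketch. It assumes both files are text files, and the helper name copyfile and the scratch paths are chosen for illustration. In copyfile.py itself the two names would come from sys.argv[1] and sys.argv[2].

```python
import os
import tempfile

def copyfile(src, dest):
    """Copy the contents of the text file src into dest."""
    with open(src) as fin:
        contents = fin.read()
    with open(dest, "w") as fout:
        fout.write(contents)

# Demonstration with scratch files (illustrative paths).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "src.txt")
dest = os.path.join(tmpdir, "dest.txt")
with open(src, "w") as f:
    f.write("one\ntwo\n")

copyfile(src, dest)
copied = open(dest).read()
print(copied, end="")
```

This reads the whole source file into memory, which is fine for small files.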
To open a file in binary mode, use "rb", "wb", and "ab" for read, write and append modes respectively.
# read in text mode
open("a.txt", "r").read()
# read in binary mode
open("a.txt", "rb").read()
with open("binary.bin", "wb") as f:
    f.write(b"\x02\x03")
open("binary.bin", "rb").read()
with open("kannada.txt", "w", encoding='utf-8') as f:
    f.write("\u0c05\u0c06")
open("kannada.txt", encoding="utf-8").read()
open("kannada.txt", "rb").read()
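The relation between the text and the bytes seen above can also be explored without a file. This sketch uses str.encode and bytes.decode on the same two characters; the variable names are illustrative.

```python
text = "\u0c05\u0c06"            # a str with two characters
data = text.encode("utf-8")      # str -> bytes
roundtrip = data.decode("utf-8") # bytes -> str

# Each of these characters takes three bytes in UTF-8.
print(len(text), len(data))
print(data)
print(roundtrip == text)
```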
Every process has three special files already open: stdin, stdout and stderr, for standard input, output and error respectively.
The output of print usually goes to stdout.
print("helloworld")
import sys
sys.stdout.write("helloworld")
To display an error message, use stderr.
sys.stderr.write("Error: something went wrong")
print("Error: something went wrong", file=sys.stderr)
d = {"x": 1, "y": 2, "z": 3}
print(d)
d['x']
d['x'] = 11
d
d['a'] = 76
d
person = {"name": "Alice", "email": "alice@example.com"}
"name" in person
"phone" in person
"phone" not in person
person["phone"]
person.get("phone", "not-provided")
person.get("email", "not-provided")
Let us try to see a real world example.
The following is output of ifconfig. Let us try to model this data in Python.
en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ether 60:33:2b:83:02:27
inet6 fe80::6233:4bff:fe03:257%en1 prefixlen 64 scopeid 0x4
inet 192.168.237.32 netmask 0xfffffc00 broadcast 192.168.239.255
nd6 options=1<PERFORMNUD>
media: autoselect
status: active
en1 = {
    "ether": "60:33:2b:83:02:27",
    "ip": "192.168.237.32",
    "status": "active"
}
en0 = {
    "ether": "60:33:2b:83:02:28",
    "ip": "",
    "status": "inactive"
}
interfaces = {"en1": en1, "en0": en0}
interfaces
How to find the IP address of interface en1?
interfaces['en1']['ip']
d = {"x": 1, "y": 2, "z": 3}
Iterate over keys:
for k in d.keys():
    print(k)
Iterate over values:
for v in d.values():
    print(v)
Iterate over key-value pairs:
for k, v in d.items():
    print(k, v)
If we iterate over a dictionary, it goes over the keys.
for k in d:
    print(k)
Let us look at a simple example.
marks = {
    "english": 76,
    "maths": 56,
    "science": 73
}
for subject, score in marks.items():
    print(subject, score)
print("----")
print("Total", sum(marks.values()))
Write a program to compute frequency of words in a file.
%%file words.txt
five
five four
five four three
five four three two
five four three two one
%%file wordfreq.py
"""Program to compute frequency of words in a file.
USAGE: python wordfreq.py filename.txt
"""
import sys
def read_words(filename):
    return open(filename).read().split()

def wordfreq(words):
    """Computes the frequency of each word in the
    given list of words.
    """
    print(words)
    freq = {}
    print(freq)
    for w in words:
        # if w in freq:
        #     freq[w] = freq[w] + 1
        # else:
        #     freq[w] = 1
        freq[w] = freq.get(w, 0) + 1
        print(w, freq)
    return freq

def print_freq(freq):
    # TODO: improve this
    print(freq)

def main():
    filename = sys.argv[1]
    words = read_words(filename)
    freq = wordfreq(words)
    print_freq(freq)

if __name__ == "__main__":
    main()
!python wordfreq.py words.txt
d = {}
d
d['x'] = 1
d
The final result is:
freq = {'three': 3, 'two': 2, 'one': 1, 'five': 5, 'four': 4}
print(freq)
How to print this nicely, with one word and count in each line?
for w, count in freq.items():
    print(w, count)
Nice!
What if we want to order them by the count?
sorted(freq.items())
def get_value(item):
    #print(item)
    return item[1]
sorted(freq.items(), key=get_value, reverse=True)
for w, count in sorted(freq.items(), key=get_value, reverse=True):
    print(w, count)
We can also solve it in a slightly different way.
sorted(freq.keys(), key=freq.get)
sorted(freq.keys(), key=freq.get, reverse=True)
sorted(freq, key=freq.get, reverse=True)
for w in sorted(freq, key=freq.get, reverse=True):
    print(w, freq[w])
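For comparison, the standard library's collections.Counter can do both the counting and the ordering. This is an alternative technique, not the approach used in wordfreq.py. The word list below is written inline to match the contents of words.txt.

```python
from collections import Counter

# Same words as in words.txt, written inline for this sketch.
words = ("five five four five four three "
         "five four three two five four three two one").split()

freq = Counter(words)             # counts each word
ordered = freq.most_common()      # (word, count) pairs, highest count first

for w, count in ordered:
    print(w, count)
```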
Q: How to get all keys for a value?
d = {"x": 1, "y": 2, "z": 1}
[k for k in d if d[k] == 1]
x = [1, 2, 3, 4]
y = x
y.append(5)
print(x)
x = [1, 2, 3, 4]
y = x
y = [1, 2, 3]
print(x)
x = 1
y = x
y = 2
print(x)
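The three experiments above differ in whether an object is modified or a name is pointed at a new object. A small sketch using the is operator makes this visible. The name z and the use of list(x) to make a copy are additions for illustration.

```python
x = [1, 2, 3, 4]
y = x               # y is another name for the same list object
print(y is x)

y.append(5)         # modifies the one list that both names refer to
print(x)

z = list(x)         # z is a new list with the same items
z.append(6)         # modifies only the copy
print(z is x)
print(x)
print(z)
```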
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)
print(p.x, p.y)
isinstance(p, Point)
__init__ is a special method that is called to initialize a newly created object. Objects are created by calling the class like a function: the call creates the object and then initializes it by calling the __init__ method with the given arguments.
Defining __init__ is optional.
class Foo:
    pass
foo = Foo()
isinstance(foo, Foo)
isinstance(foo, Point)
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def getx(self):
        return self.x

p = Point(2, 3)
print(p.getx())
# calling p.getx() is equivalent to:
Point.getx(p)
p.z = 4
p.z
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def getx(self):
        return self.x

    def display(self):
        print(self.x, self.y)

    def add(self, p):
        x = self.x + p.x
        y = self.y + p.y
        return Point(x, y)

p1 = Point(1, 2)
p2 = Point(10, 20)
p3 = p1.add(p2)
p3.display()
Problem: Add a method double to the above Point class. It should return a new point with both x and y coordinates doubled.
>>> p = Point(2, 3)
>>> p2 = p.double()
>>> p2.display()
4 6
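One possible way to approach this problem, shown as a sketch. It follows the same pattern as the add method, returning a new Point and leaving the original unchanged. Only the methods needed for the example are repeated here.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def display(self):
        print(self.x, self.y)

    def double(self):
        # Build and return a new Point; self is not modified.
        return Point(2 * self.x, 2 * self.y)

p = Point(2, 3)
p2 = p.double()
p2.display()
```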
Let us say we want to model a bank account.
%%file bank0.py
balance = 0

def deposit(amount):
    global balance
    balance = balance + amount

def withdraw(amount):
    global balance
    balance = balance - amount

def get_balance():
    return balance

def main():
    deposit(100)
    withdraw(40)
    print(get_balance())
    deposit(20)
    print(get_balance())

if __name__ == "__main__":
    main()
!python bank0.py
One big limitation of this implementation is that it supports only one bank account. It is not possible to support multiple accounts.
Let us try to address that issue.
%%file bank1.py
"""Implementation of bank accounts with support for multiple accounts.
"""
def make_account():
    return {"balance": 0}

def deposit(account, amount):
    account["balance"] += amount

def withdraw(account, amount):
    account["balance"] -= amount

def get_balance(account):
    return account["balance"]

def main():
    a1 = make_account()
    a2 = make_account()
    deposit(a1, 100)
    deposit(a2, 50)
    print(get_balance(a1), get_balance(a2))
    withdraw(a1, 30)
    withdraw(a2, 20)
    print(get_balance(a1), get_balance(a2))

if __name__ == "__main__":
    main()
!python bank1.py
Now let us try doing the same using class.
%%file bank2.py
"""Class-based implementation of bank account.
"""
class BankAccount:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        self.balance -= amount

    def get_balance(self):
        return self.balance

def main():
    a1 = BankAccount()
    a2 = BankAccount()
    a1.deposit(100)
    a2.deposit(50)
    print(a1.get_balance(), a2.get_balance())
    a1.withdraw(30)
    a2.withdraw(20)
    print(a1.get_balance(), a2.get_balance())

if __name__ == "__main__":
    main()
!python bank2.py
p = Point(2, 3)
p.x
p.y
p.__dict__
p.z = 4
p.__dict__
Problem: Write a class Timer to measure the time taken by a task. The class should have start and stop methods and it should be able to find the time taken between them.
t = Timer()
t.start()
do_something()
t.stop()
print("Time taken:", t.get_time_taken())
Hint: see time.time()
import time
class Timer:
    def __init__(self):
        self.t_start = 0
        self.t_stop = 0

    def start(self):
        self.t_start = time.time()

    def stop(self):
        self.t_stop = time.time()

    def get_time_taken(self):
        return self.t_stop - self.t_start

def do_something(n):
    for i in range(n):
        for j in range(n):
            x = i*j

t = Timer()
t.start()
do_something(1000)
t.stop()
print(t.get_time_taken())
What are exceptions?
xx
1 + "2"
int("bad-value")
open("nofile.txt")
How to handle exceptions?
try:
    do_something()
except Exception1 as e:
    handle_exception1()
except Exception2 as e:
    handle_exception2()
finally:
    some_cleanup()
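The template above can be made concrete with a small runnable sketch. The function name divide and the choice of exceptions are for illustration only. The finally block runs whether or not an exception was raised.

```python
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        print("Cannot divide:", e)
        return None
    except TypeError as e:
        print("Bad operands:", e)
        return None
    finally:
        # Runs in every case, even when the try block returns.
        print("done with", repr(a), repr(b))

r1 = divide(6, 3)      # no exception
r2 = divide(1, 0)      # handled by the ZeroDivisionError clause
r3 = divide("a", 2)    # handled by the TypeError clause
print(r1, r2, r3)
```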
def safeint(strvalue, default):
    try:
        return int(strvalue)
    except ValueError:
        print("Invalid integer", repr(strvalue), file=sys.stderr)
        return default
safeint("bad-number", 0)
safeint("3", 0)
Let us look at a use case for this.
%%file sumfile.py
"""program to compute sum of all numbers in a file.
"""
import sys
filename = sys.argv[1]
lines = open(filename).readlines()
numbers = [int(line) for line in lines]
print(sum(numbers))
%%file ten.txt
1
2
3
4
5
6
7
8
9
10
!python sumfile.py ten.txt
What happens if we pass a file that has invalid numbers?
%%file num.txt
1
2
3
N/A
4
none
5
!python sumfile.py num.txt
How to make this program ignore the bad values?
%%file sumfile.py
"""program to compute sum of all numbers in a file.
"""
import sys
def safeint(strvalue, default):
    try:
        return int(strvalue)
    except ValueError:
        print("Invalid integer", repr(strvalue), file=sys.stderr)
        return default

filename = sys.argv[1]
lines = open(filename).readlines()
numbers = [safeint(line, 0) for line in lines]
print(sum(numbers))
!python sumfile.py num.txt
Q: How to use classes defined in other files?
import bank2
a1 = bank2.BankAccount()
a2 = bank2.BankAccount()
from bank2 import BankAccount
a1 = BankAccount()
a2 = BankAccount()
Professional command-line applications take various flags and display nice help messages.
Let us look at the grep command in Unix.
!grep --help
Let us understand different kinds of arguments that are commonly used:
-i, --ignore-case
    Perform case insensitive matching.
    By default, grep is case sensitive.
-c, --count
    Only a count of selected lines is written to
    standard output.
-C[num, --context=num]
    Print num lines of leading and trailing context
    surrounding each match.
-f file, --file=file
    Read one or more newline separated patterns from file.
pattern
    Pattern to look for
[file ...]
    zero or more filenames to search
There are different kinds of arguments here:
boolean flags (like -i)
flags that take an argument (like -f file)
positional argument with one value (like pattern)
positional arguments with more than one value (like file)
Also there are short flags (-i) and long flags (--ignore-case).
argparse module
%%file echo.py
import argparse

def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("message", help="message to display")
    p.add_argument("-r", "--repeats",
        type=int,
        default=1,
        help="number of times to display the message")
    return p.parse_args()

def main():
    args = parse_args()
    print(args)
    for i in range(args.repeats):
        print(args.message)

if __name__ == "__main__":
    main()
!python echo.py hello
!python echo.py
!python echo.py --help
!python echo.py -r 4 hello
!python echo.py --repeats 4 hello
Let us look at boolean flags and positional arguments with multiple values.
%%file echo2.py
import argparse

def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("message", nargs="+", help="message to display")
    p.add_argument("-r", "--repeats",
        type=int,
        default=1,
        help="number of times to display the message")
    p.add_argument("-u", "--upper-case",
        default=False,
        action="store_true",
        help="convert the message to upper case")
    return p.parse_args()

def main():
    args = parse_args()
    print(args)
    message = " ".join(args.message)
    if args.upper_case:
        message = message.upper()
    for i in range(args.repeats):
        print(message)

if __name__ == "__main__":
    main()
!python echo2.py hello world
!python echo2.py -u -r 3 hello world
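The four kinds of arguments seen in the grep help can all be declared with argparse. The following is a sketch of a hypothetical grep-like interface (the program name minigrep is made up). It only parses the arguments and does no searching. Passing a list to parse_args makes it possible to try the parser without a command line.

```python
import argparse

def build_parser():
    # Hypothetical grep-like interface, for illustration only.
    p = argparse.ArgumentParser(prog="minigrep")
    # boolean flag
    p.add_argument("-i", "--ignore-case", action="store_true",
                   help="perform case insensitive matching")
    # flag that takes an argument
    p.add_argument("-C", "--context", type=int, default=0,
                   help="number of lines of context to print")
    # positional argument with one value
    p.add_argument("pattern", help="pattern to look for")
    # positional argument with zero or more values
    p.add_argument("files", nargs="*", help="files to search")
    return p

args = build_parser().parse_args(["-i", "-C", "2", "py", "a.txt", "b.txt"])
print(args.ignore_case, args.context, args.pattern, args.files)
```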
from urllib.request import urlopen
response = urlopen("http://httpbin.org/html")
response
contents = response.read()
contents[:100]
The contents of an HTTP response are always bytes.
html = contents.decode("utf-8")
print(html[:400])
Q: How to find the status code?
response.status
The third-party library requests makes it very easy to work with HTTP requests.
Install it using:
pip3 install requests
import requests
response = requests.get("http://httpbin.org/html")
print(response.text[:400])
Q: How to look at the response headers?
response.headers
Q: How to pass query parameters?
response = requests.get("http://httpbin.org/get")
print(response.text)
response = requests.get("http://httpbin.org/get", params={"query": "python class", "page": "2"})
print(response.text)
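Query parameters end up in the URL as a query string. The standard library's urllib.parse.urlencode builds a string of that form, which gives an idea of what is sent. This sketch only illustrates the encoding and makes no network request; the variable names are illustrative.

```python
from urllib.parse import urlencode

params = {"query": "python class", "page": "2"}
query_string = urlencode(params)   # spaces are encoded as "+"
url = "http://httpbin.org/get?" + query_string
print(url)
```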
Q: How to send POST data?
response = requests.post("http://httpbin.org/post", data={"query": "python class", "page": "2"})
print(response.text)
response = requests.post("http://httpbin.org/post", data='plain text payload')
print(response.text)
How to add headers?
response = requests.post("http://httpbin.org/post",
    data='plain text payload',
    headers={"X-Test-Header": "1234"})
print(response.text)
import requests
response = requests.get("http://httpbin.org/get")
print(response.text)
d = response.json()
print(d)
d['origin']
import requests
url = "https://api.github.com/orgs/intuit/repos"
repos = requests.get(url).json()
type(repos)
for repo in repos:
print(repo['full_name'], repo['forks'])
Let us find the top 5 popular repos by the number of forks.
def get_forks(repo):
    return repo['forks']
repos = sorted(repos, key=get_forks, reverse=True)[:5]
for repo in repos:
    print(repo['full_name'], repo['forks'])
Now let us find out who commits to these repos.
def get_top_contributors(reponame):
    url = "https://api.github.com/repos/{}/stats/contributors".format(reponame)
    print(url)
    contributors = requests.get(url).json()
    contributors = sorted(contributors, key=lambda c: c['total'], reverse=True)
    for c in contributors[:5]:
        print(c['author']['login'], c['total'])
get_top_contributors("intuit/wasabi")
Q: How to pass a username and password?
username = "xxx"
password = open("secret-file.txt").read().strip()
response = requests.get(url, auth=(username, password))
Q: How to read config files?
There are a couple of different config file formats.
%%file test.conf
[auth]
username = guest
password = secret
import configparser
p = configparser.ConfigParser()
p.read("test.conf")
p.get("auth", "username")
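A few more ConfigParser operations, as a sketch. It loads the same settings from a string with read_string so that no file is needed. The option name token is made up to show the fallback argument.

```python
import configparser

p = configparser.ConfigParser()
p.read_string("""
[auth]
username = guest
password = secret
""")

sections = p.sections()                 # names of all sections
username = p.get("auth", "username")    # same call as with a file
password = p["auth"]["password"]        # dictionary-style access
# "token" is not in the config, so the fallback value is returned.
token = p.get("auth", "token", fallback="not-provided")

print(sections, username, password, token)
```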
import re
m = re.match("ab+", "abbb")
m.group()
Patterns:
c - that character
. - any character
[abcd] - one of them
[a-z] - one of them in the range
[^abcd] - any one other than these
x* - zero or more x (x could be any of the above)
x+ - one or more x
x? - zero or one x
(x) - match x and also remember it
\d - any digit
\s - any space
^ - beginning of the string
$ - end of the string
text = "10 apples and 20 mangos"
Extract all numbers from the text
re.findall(r"\d+", text)
re.sub(r"\d+", "x", text)
re.sub(r"(\d+[^\d]*)\d+", r"\1x", text)
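A few more of the patterns listed above, tried on the same text. This is a small sketch with illustrative variable names. Raw strings (r"...") are used so that backslashes reach the re module unchanged.

```python
import re

text = "10 apples and 20 mangos"

# [a-z]+ : one or more lowercase letters
words = re.findall(r"[a-z]+", text)

# (x) remembers what it matched; with two groups findall returns pairs
pairs = re.findall(r"(\d+) ([a-z]+)", text)

# ^ anchors the match to the beginning, \s matches the space after the number
m = re.match(r"^(\d+)\s", text)
first_number = m.group(1)

# $ anchors the match to the end of the string
plural = re.sub(r"s$", "es", text)

print(words)
print(pairs)
print(first_number)
print(plural)
```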
Please take some time to let me know how you felt about the course by filling in the form below.