Python Training at Intuit Bangalore - Day 3

Feb 20-22, 2017
Anand Chitipothu

These notes are available online at http://bit.ly/intuit17

© Pipal Academy LLP


Writing Beautiful Code

Working with Files

In [1]:
%%file three.txt
one
two
three
Writing three.txt
In [2]:
f = open("three.txt")
In [3]:
f.read()
Out[3]:
'one\ntwo\nthree'

The read method reads the entire contents of the file. If you call read again, it returns an empty string because the file position is already at the end of the file.

In [4]:
f.read()
Out[4]:
''
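
To read the contents again without reopening the file, the position can be moved back to the start with seek. The following is a minimal sketch; it writes its own copy of three.txt into a temporary directory (an assumption made only so that the example runs on its own).

```python
import os
import tempfile

# Scratch copy of three.txt, created here so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "three.txt")
f = open(path, "w")
f.write("one\ntwo\nthree")
f.close()

f = open(path)
first = f.read()    # reads everything; the position is now at the end
second = f.read()   # nothing left to read, so this is ''
f.seek(0)           # move the position back to the beginning
third = f.read()    # reads everything again
f.close()

print(repr(first))
print(repr(second))
print(repr(third))
```
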
In [5]:
open("three.txt").read()
Out[5]:
'one\ntwo\nthree'
In [6]:
print(open("three.txt").read())
one
two
three

We can also read the contents line by line.

In [7]:
lines = open("three.txt").readlines()
In [8]:
print(lines)
['one\n', 'two\n', 'three']
In [9]:
for line in lines:
    print(line)
one

two

three

Why are we getting those empty lines?

That is because each line already ends with a newline and print adds another one. To fix it, we can either strip the newline from the line or tell print not to add one.

In [10]:
for line in lines:
    print(line.strip("\n"))
one
two
three
In [11]:
for line in lines:
    print(line, end="")
one
two
three

Problem: Write a program cat.py that takes a filename as a command-line argument and prints the contents of that file.

$ python cat.py three.txt
one
two
three

Example: Word Count

Let us implement the Unix word count program (wc) in Python. The program should print the line count, word count and character count for the given filename.

In [12]:
%%file numbers.txt
1 one
2 two
3 three
4 four
5 five
Writing numbers.txt
In [20]:
%%file wc.py
"""Program to compute line count, word count and char count of a file.

USAGE: python wc.py filename
"""
import sys

def linecount(f):
    return len(open(f).readlines())

def wordcount(f):
    return len(open(f).read().split())

def charcount(f):
    return len(open(f).read())

def main():
    f = sys.argv[1]
    print(linecount(f), wordcount(f), charcount(f), f)
    
if __name__ == "__main__":
    main()
Overwriting wc.py
In [21]:
!python wc.py numbers.txt
5 10 34 numbers.txt

How do we find the Python file in the current directory that has the largest number of lines?

In [22]:
import os

files = [f for f in os.listdir(".") if f.endswith(".py")]
In [24]:
import wc
In [25]:
max(files, key=wc.linecount)
Out[25]:
'square.py'

Writing to Files

Files can be opened in write mode by specifying "w" as the second argument.

In [27]:
f = open("a.txt", "w")
f.write("one\n")
f.write("two\n")
f.close()
In [28]:
open("a.txt").read()
Out[28]:
'one\ntwo\n'

It is very important to close a file after writing. Writes are buffered, so the contents are guaranteed to reach the disk only after the file is flushed or closed.

Also, when a file is opened in write mode, it gets completely overwritten.
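
The overwriting behaviour can be checked directly. This is a small sketch, assuming a scratch file in a temporary directory (the path is made up for illustration); it writes a file, reopens it in write mode, and reads back what is left.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "a.txt")

f = open(path, "w")
f.write("one\ntwo\n")
f.close()

# Opening in "w" mode again empties the file before anything is written.
f = open(path, "w")
f.write("replaced\n")
f.close()

f = open(path)
contents = f.read()
f.close()

print(repr(contents))   # the earlier lines are gone
```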

To append to an existing file, open it in append ("a") mode.

In [29]:
f = open("a.txt", "a")
f.write("three\n")
f.close()
In [30]:
open("a.txt").read()
Out[30]:
'one\ntwo\nthree\n'

The with statement

The with statement is handy when writing to files as it takes care of closing the files automatically.

In [31]:
with open("b.txt", "w") as f:
    f.write("one\n")
    f.write("two\n")    
# file gets closed here automatically    
In [32]:
open("b.txt").read()
Out[32]:
'one\ntwo\n'
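
One way to see what the with statement does is to look at the closed attribute of the file object. The sketch below assumes a scratch file in a temporary directory and a simulated failure (both are illustrative, not part of the original notes); it shows that the file is closed when the block ends, even if the block raises an exception.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "b.txt")

with open(path, "w") as f:
    f.write("one\n")
    inside = f.closed      # False: the file is still open inside the block

outside = f.closed         # True: leaving the block closed the file

# The file is closed even when the block raises an exception.
try:
    with open(path, "a") as g:
        g.write("two\n")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
closed_after_error = g.closed

print(inside, outside, closed_after_error)
```
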

Problem: Write a program copyfile.py to copy the contents of one file to another. The program should accept a source file and a destination file as command-line arguments and copy the source into the destination.

$ python copyfile.py src.txt dest.txt

WARNING: Don't call this file copy.py as it interferes with a standard library module with the same name.

Working with binary files

To open a file in binary mode, use "rb", "wb", and "ab" for read, write and append modes respectively.

In [34]:
# read in text mode
open("a.txt", "r").read()
Out[34]:
'one\ntwo\nthree\n'
In [35]:
# read in binary mode
open("a.txt", "rb").read()
Out[35]:
b'one\ntwo\nthree\n'
In [36]:
with open("binary.bin", "wb") as f:
    f.write(b"\x02\x03")
In [37]:
open("binary.bin", "rb").read()
Out[37]:
b'\x02\x03'
In [40]:
with open("kannada.txt", "w", encoding='utf-8') as f:
    f.write("\u0c05\u0c06")
In [41]:
open("kannada.txt", encoding="utf-8").read()
Out[41]:
'అఆ'
In [42]:
open("kannada.txt", "rb").read()
Out[42]:
b'\xe0\xb0\x85\xe0\xb0\x86'
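
The relationship between the text and the bytes shown above can be reproduced without a file. A minimal sketch: encode turns a str into bytes using the given encoding, and decode goes the other way.

```python
text = "\u0c05\u0c06"               # the same two characters written above

encoded = text.encode("utf-8")      # str -> bytes
decoded = encoded.decode("utf-8")   # bytes -> str

print(encoded)
print(len(text), len(encoded))      # 2 characters, 6 bytes
print(decoded == text)
```

Each of these characters takes three bytes in UTF-8, which is why the binary read returns six bytes for two characters.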

stdin, stdout and stderr

Every process starts with three special files already open: stdin, stdout and stderr, for standard input, output and error respectively.

The output of print usually goes to stdout.

In [43]:
print("helloworld")
helloworld
In [44]:
import sys
sys.stdout.write("helloworld")
helloworld

To display error messages, write to stderr.

In [45]:
sys.stderr.write("Error: something went wrong")
Error: something went wrong
In [46]:
print("Error: something went wrong", file=sys.stderr)
Error: something went wrong
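
In a notebook both streams show up in the same place, so the difference is easy to miss. The sketch below captures the two streams separately with contextlib.redirect_stdout and contextlib.redirect_stderr; the report function and its messages are made up for illustration.

```python
import contextlib
import io
import sys

def report():
    # Hypothetical function: normal output and a diagnostic message.
    print("result: 42")                                  # goes to stdout
    print("Warning: using defaults", file=sys.stderr)    # goes to stderr

out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    report()

captured_out = out.getvalue()
captured_err = err.getvalue()
print(repr(captured_out))
print(repr(captured_err))
```

Keeping diagnostics on stderr means that redirecting the output of a program to a file (python script.py > out.txt) saves only the results, while the messages still appear on the terminal.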

Dictionaries

In [47]:
d = {"x": 1, "y": 2, "z": 3}
In [48]:
print(d)
{'y': 2, 'z': 3, 'x': 1}
In [49]:
d['x']
Out[49]:
1
In [50]:
d['x'] = 11
In [51]:
d
Out[51]:
{'x': 11, 'y': 2, 'z': 3}
In [52]:
d['a'] = 76
In [53]:
d
Out[53]:
{'a': 76, 'x': 11, 'y': 2, 'z': 3}
In [54]:
person = {"name": "Alice", "email": "alice@example.com"}
In [55]:
"name" in person
Out[55]:
True
In [56]:
"phone" in person
Out[56]:
False
In [57]:
"phone" not in person
Out[57]:
True
In [58]:
person["phone"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-58-23f40ea4f1ec> in <module>()
----> 1 person["phone"]

KeyError: 'phone'
In [60]:
person.get("phone", "not-provided")
Out[60]:
'not-provided'
In [61]:
person.get("email", "not-provided")
Out[61]:
'alice@example.com'

Let us look at a real-world example.

The following is the output of ifconfig. Let us try to model this data in Python.

en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    ether 60:33:2b:83:02:27
    inet6 fe80::6233:4bff:fe03:257%en1 prefixlen 64 scopeid 0x4
    inet 192.168.237.32 netmask 0xfffffc00 broadcast 192.168.239.255
    nd6 options=1<PERFORMNUD>
    media: autoselect
    status: active
In [62]:
en1 = {
    "ether": "60:33:2b:83:02:27",
    "ip": "192.168.237.32",
    "status": "active"
}

en0 = {
    "ether": "60:33:2b:83:02:28",
    "ip": "",
    "status": "inactive"
}
interfaces = {"en1": en1, "en0": en0}
In [63]:
interfaces
Out[63]:
{'en0': {'ether': '60:33:2b:83:02:28', 'ip': '', 'status': 'inactive'},
 'en1': {'ether': '60:33:2b:83:02:27',
  'ip': '192.168.237.32',
  'status': 'active'}}

How do we find the IP address of interface en1?

In [64]:
interfaces['en1']['ip']
Out[64]:
'192.168.237.32'

Iterating over dictionaries

In [65]:
d = {"x": 1, "y": 2, "z": 3}

Iterate over keys:

In [66]:
for k in d.keys():
    print(k)
y
z
x

Iterate over values:

In [67]:
for v in d.values():
    print(v)
2
3
1

Iterate over key-value pairs:

In [68]:
for k, v in d.items():
    print(k, v)
y 2
z 3
x 1

If we iterate over a dictionary, it goes over the keys.

In [69]:
for k in d:
    print(k)
y
z
x

Let us look at a simple example.

In [70]:
marks = {
    "english": 76,
    "maths": 56,
    "science": 73
}    
In [72]:
for subject, score in marks.items():
    print(subject, score)
print("----")
print("Total", sum(marks.values()))
english 76
science 73
maths 56
----
Total 205
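
The same kind of iteration works on nested dictionaries such as the interfaces dictionary modeled earlier. A small sketch, repeating that data so it runs on its own; the interface name en5 is a made-up example of a missing key.

```python
interfaces = {
    "en1": {"ether": "60:33:2b:83:02:27", "ip": "192.168.237.32", "status": "active"},
    "en0": {"ether": "60:33:2b:83:02:28", "ip": "", "status": "inactive"},
}

# Names of all interfaces whose status is "active".
active = [name for name, info in interfaces.items()
          if info["status"] == "active"]

# Look up a nested value without a KeyError for an unknown interface
# ("en5" is hypothetical and not present in the data).
missing_ip = interfaces.get("en5", {}).get("ip", "not-found")

print(active)
print(missing_ip)
```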

Example: Word Frequency

Write a program to compute frequency of words in a file.

In [73]:
%%file words.txt
five
five four
five four three
five four three two
five four three two one
Writing words.txt
In [84]:
%%file wordfreq.py
"""Program to compute frequency of words in a file.

USAGE: python wordfreq.py filename.txt
"""
import sys

def read_words(filename):
    return open(filename).read().split()

def wordfreq(words):
    """Computes the frequency of each word in the 
    given list of words.
    """
    print(words)
    freq = {}
    print(freq)
    for w in words:
        # if w in freq:
        #     freq[w] = freq[w] + 1
        # else:
        #     freq[w] = 1
        freq[w] = freq.get(w, 0) + 1
        print(w, freq)            
    return freq

def print_freq(freq):
    # TODO: improve this
    print(freq)

def main():
    filename = sys.argv[1]
    words = read_words(filename)
    freq = wordfreq(words)
    print_freq(freq)
    
if __name__ == "__main__":
    main()
Overwriting wordfreq.py
In [85]:
!python wordfreq.py words.txt
['five', 'five', 'four', 'five', 'four', 'three', 'five', 'four', 'three', 'two', 'five', 'four', 'three', 'two', 'one']
{}
five {'five': 1}
five {'five': 2}
four {'five': 2, 'four': 1}
five {'five': 3, 'four': 1}
four {'five': 3, 'four': 2}
three {'three': 1, 'five': 3, 'four': 2}
five {'three': 1, 'five': 4, 'four': 2}
four {'three': 1, 'five': 4, 'four': 3}
three {'three': 2, 'five': 4, 'four': 3}
two {'three': 2, 'two': 1, 'five': 4, 'four': 3}
five {'three': 2, 'two': 1, 'five': 5, 'four': 3}
four {'three': 2, 'two': 1, 'five': 5, 'four': 4}
three {'three': 3, 'two': 1, 'five': 5, 'four': 4}
two {'three': 3, 'two': 2, 'five': 5, 'four': 4}
one {'three': 3, 'two': 2, 'one': 1, 'five': 5, 'four': 4}
{'three': 3, 'two': 2, 'one': 1, 'five': 5, 'four': 4}
In [80]:
d = {}
In [81]:
d
Out[81]:
{}
In [82]:
d['x'] = 1
In [83]:
d
Out[83]:
{'x': 1}

The final result is:

In [86]:
freq = {'three': 3, 'two': 2, 'one': 1, 'five': 5, 'four': 4}
In [87]:
print(freq)
{'three': 3, 'one': 1, 'two': 2, 'five': 5, 'four': 4}

How to print this nicely, with one word and its count on each line?

In [88]:
for w, count in freq.items():
    print(w, count)
three 3
one 1
two 2
five 5
four 4

Nice!

What if we want to order them by the count?

In [90]:
sorted(freq.items())
Out[90]:
[('five', 5), ('four', 4), ('one', 1), ('three', 3), ('two', 2)]
In [95]:
def get_value(item):
    #print(item)
    return item[1]
    
sorted(freq.items(), key=get_value, reverse=True)
Out[95]:
[('five', 5), ('four', 4), ('three', 3), ('two', 2), ('one', 1)]
In [96]:
for w, count in sorted(freq.items(), key=get_value, reverse=True):
    print(w, count)
five 5
four 4
three 3
two 2
one 1

We can also solve it in a slightly different way.

In [97]:
sorted(freq.keys(), key=freq.get)
Out[97]:
['one', 'two', 'three', 'four', 'five']
In [98]:
sorted(freq.keys(), key=freq.get, reverse=True)
Out[98]:
['five', 'four', 'three', 'two', 'one']
In [99]:
sorted(freq, key=freq.get, reverse=True)
Out[99]:
['five', 'four', 'three', 'two', 'one']
In [100]:
for w in sorted(freq, key=freq.get, reverse=True):
    print(w, freq[w])
five 5
four 4
three 3
two 2
one 1
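
For comparison, the standard library has collections.Counter, which does the counting and the ordering by count in one place. This is an alternative to the hand-written wordfreq function above, not a replacement for understanding it; the word list is the contents of words.txt typed inline so the sketch runs on its own.

```python
from collections import Counter

# The words from words.txt, inlined for illustration.
words = ("five five four five four three five four three two "
         "five four three two one").split()

freq = Counter(words)    # a dict subclass that maps each word to its count

# most_common() returns (word, count) pairs, highest count first.
for w, count in freq.most_common():
    print(w, count)
```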

Q: How to get all keys for a value?

In [101]:
d = {"x": 1, "y": 2, "z": 1}
In [102]:
[k for k in d if d[k] == 1]
Out[102]:
['z', 'x']

Issues of mutability

In [104]:
x = [1, 2, 3, 4]
y = x
y.append(5)
print(x)
[1, 2, 3, 4, 5]
In [105]:
x = [1, 2, 3, 4]
y = x
y = [1, 2, 3]
print(x)
[1, 2, 3, 4]
In [106]:
x = 1
y = x
y = 2
print(x)
1
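
In the cells above, y = x makes y another name for the same object. Calling y.append changes that shared list, so x sees the change, while y = [1, 2, 3] or y = 2 only points the name y at a different object and leaves x alone. The sketch below checks this with the is operator and shows one way to avoid the sharing, by copying the list first.

```python
x = [1, 2, 3, 4]
y = x                # y is another name for the same list object
same = y is x        # True: both names refer to one object

z = x.copy()         # z is a new list with the same elements
z.append(5)          # changes only the copy

y.append(6)          # changes the list that x also refers to

print(same, z is x)
print(x)
print(z)
```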

Classes

In [112]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
In [113]:
p = Point(3, 4)
print(p.x, p.y)
3 4
In [109]:
isinstance(p, Point)
Out[109]:
True

__init__ is a special method that initializes a newly created object. Objects are created by calling the class like a function: the call creates the object and then initializes it by calling the __init__ method.

Defining __init__ is optional.

In [114]:
class Foo:
    pass

foo = Foo()
In [115]:
isinstance(foo, Foo)
Out[115]:
True
In [116]:
isinstance(foo, Point)
Out[116]:
False
In [117]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def getx(self):
        return self.x
In [118]:
p = Point(2, 3)
print(p.getx())
2
In [119]:
# calling p.getx() is equivalent to:
Point.getx(p)
Out[119]:
2
In [120]:
p.z = 4
In [121]:
p.z
Out[121]:
4
In [123]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def getx(self):
        return self.x
    
    def display(self):
        print(self.x, self.y)
        
    def add(self, p):
        x = self.x + p.x
        y = self.y + p.y
        return Point(x, y)
    
p1 = Point(1, 2)
p2 = Point(10, 20)
p3 = p1.add(p2)
p3.display()
11 22

Problem: Add a method double to the above Point class. It should return a new point with both x and y coordinates doubled.

>>> p = Point(2, 3)
>>> p2 = p.double()
>>> p2.display()
4 6

Why do we need classes?

Let us say we want to model a bank account.

In [124]:
%%file bank0.py

balance = 0

def deposit(amount):
    global balance
    balance = balance + amount

def withdraw(amount):
    global balance
    balance = balance - amount

def get_balance():
    return balance

def main():
    deposit(100)
    withdraw(40)
    print(get_balance())
    
    deposit(20)
    print(get_balance())

if __name__ == "__main__":
    main()
Writing bank0.py
In [125]:
!python bank0.py
60
80

One big limitation of this implementation is that it supports only one bank account. It is not possible to support multiple accounts.

Let us try to address that issue.

In [129]:
%%file bank1.py
"""Implementation of bank accounts with support for multiple accounts.
"""

def make_account():
    return {"balance": 0}

def deposit(account, amount):
    account["balance"] += amount

def withdraw(account, amount):
    account["balance"] -= amount
    
def get_balance(account):
    return account["balance"]

def main():
    a1 = make_account()
    a2 = make_account()
    
    deposit(a1, 100)
    deposit(a2, 50)
    print(get_balance(a1), get_balance(a2))
    
    withdraw(a1, 30)
    withdraw(a2, 20)
    print(get_balance(a1), get_balance(a2))

if __name__ == "__main__":
    main()
Overwriting bank1.py
In [130]:
!python bank1.py
100 50
70 30

Now let us try doing the same using a class.

In [136]:
%%file bank2.py
"""Class-based implementation of bank account.
"""

class BankAccount:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        self.balance -= amount

    def get_balance(self):
        return self.balance

def main():
    a1 = BankAccount()
    a2 = BankAccount()
    
    a1.deposit(100)
    a2.deposit(50)
    print(a1.get_balance(), a2.get_balance())
    
    a1.withdraw(30)
    a2.withdraw(20)
    print(a1.get_balance(), a2.get_balance())

if __name__ == "__main__":
    main()
Overwriting bank2.py
In [135]:
!python bank2.py
100 50
70 30
In [137]:
p = Point(2, 3)
In [138]:
p.x
Out[138]:
2
In [139]:
p.y
Out[139]:
3
In [140]:
p.__dict__
Out[140]:
{'x': 2, 'y': 3}
In [141]:
p.z = 4
In [142]:
p.__dict__
Out[142]:
{'x': 2, 'y': 3, 'z': 4}

Problem: Write a class Timer to measure the time taken by a task. The class should have start and stop methods and it should be able to find the time elapsed between them.

t = Timer()
t.start()
do_something()
t.stop()
print("Time taken:", t.get_time_taken())

Hint: see time.time()

In [152]:
import time

class Timer:
    def __init__(self):
        self.t_start = 0
        self.t_stop = 0
    
    def start(self):
        self.t_start = time.time()
    
    def stop(self):
        self.t_stop = time.time()
    
    def get_time_taken(self):
        return self.t_stop - self.t_start
    
def do_something(n):
    for i in range(n):
        for j in range(n):
            x = i*j

t = Timer()
t.start()
do_something(1000)
t.stop()
print(t.get_time_taken())
0.09971284866333008

Exception Handling

What are exceptions?

In [153]:
xx
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-153-102f5037fe64> in <module>()
----> 1 xx

NameError: name 'xx' is not defined
In [154]:
1 + "2"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-154-b88986c5ffd8> in <module>()
----> 1 1 + "2"

TypeError: unsupported operand type(s) for +: 'int' and 'str'
In [155]:
int("bad-value")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-155-e43f0ab836f6> in <module>()
----> 1 int("bad-value")

ValueError: invalid literal for int() with base 10: 'bad-value'
In [156]:
open("nofile.txt")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-156-dba134cf36ca> in <module>()
----> 1 open("nofile.txt")

FileNotFoundError: [Errno 2] No such file or directory: 'nofile.txt'

How to handle exceptions?

try:
    do_something()
except Exception1 as e:
    handle_exception1()
except Exception2 as e:
    handle_exception2()
finally:
    some_cleanup()
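
The template above uses placeholder names. A runnable sketch of the same shape follows, with a made-up parse function that records which blocks run; it shows that only the matching except block runs and that finally runs in every case.

```python
def parse(value):
    # Hypothetical helper that records the order in which the blocks run.
    steps = []
    try:
        steps.append("try")
        number = int(value)
    except ValueError:
        steps.append("except ValueError")
        number = None
    except TypeError:
        steps.append("except TypeError")
        number = None
    finally:
        steps.append("finally")    # runs whether or not an exception occurred
    return number, steps

print(parse("12"))
print(parse("bad-value"))
print(parse(None))
```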

Example: safeint

In [162]:
def safeint(strvalue, default):
    try:
        return int(strvalue)
    except ValueError:
        print("Invalid integer", repr(strvalue), file=sys.stderr)
        return default
In [163]:
safeint("bad-number", 0)
Invalid integer 'bad-number'
Out[163]:
0
In [159]:
safeint("3", 0)
Out[159]:
3

Let us look at a use case for this.

In [164]:
%%file sumfile.py
"""program to compute sum of all numbers in a file.
"""
import sys
filename = sys.argv[1]
lines = open(filename).readlines()
numbers = [int(line) for line in lines]
print(sum(numbers))
Writing sumfile.py
In [165]:
%%file ten.txt
1
2
3
4
5
6
7
8
9
10
Writing ten.txt
In [166]:
!python sumfile.py ten.txt
55

What happens if we pass a file that has invalid numbers?

In [167]:
%%file num.txt
1
2
3
N/A
4
none
5
Writing num.txt
In [168]:
!python sumfile.py num.txt
Traceback (most recent call last):
  File "sumfile.py", line 6, in <module>
    numbers = [int(line) for line in lines]
  File "sumfile.py", line 6, in <listcomp>
    numbers = [int(line) for line in lines]
ValueError: invalid literal for int() with base 10: 'N/A\n'

How to make this program ignore the bad values?

In [169]:
%%file sumfile.py
"""program to compute sum of all numbers in a file.
"""
import sys

def safeint(strvalue, default):
    try:
        return int(strvalue)
    except ValueError:
        print("Invalid integer", repr(strvalue), file=sys.stderr)
        return default

filename = sys.argv[1]
lines = open(filename).readlines()
numbers = [safeint(line, 0) for line in lines]
print(sum(numbers))
Overwriting sumfile.py
In [170]:
!python sumfile.py num.txt
Invalid integer 'N/A\n'
Invalid integer 'none\n'
15

Q: How to use classes defined in other files?

In [171]:
import bank2
a1 = bank2.BankAccount()
a2 = bank2.BankAccount()
In [172]:
from bank2 import BankAccount
a1 = BankAccount()
a2 = BankAccount()

Writing Command-line Applications

Professional command-line applications take various flags and display helpful usage messages.

Let us look at the grep command in Unix.

In [173]:
!grep --help
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
	[-e pattern] [-f file] [--binary-files=value] [--color=when]
	[--context[=num]] [--directories=action] [--label] [--line-buffered]
	[--null] [pattern] [file ...]

Let us understand the different kinds of arguments that are commonly used:

 -i, --ignore-case
         Perform case insensitive matching.  
         By default, grep is case sensitive.

 -c, --count
         Only a count of selected lines is written to 
         standard output. 

 -C[num], --context[=num]
         Print num lines of leading and trailing context 
         surrounding each match.

 -f file, --file=file
         Read one or more newline separated patterns from file. 

pattern 
        Pattern to look for

[file ...]     
        zero or more filenames to search

There are different kinds of arguments here:

boolean flags (like -i)
flags that take an argument (like -f file)
positional arguments with one value (like pattern)
positional arguments with zero or more values (like file)

Also, there are short flags (-i) and long flags (--ignore-case).

The argparse module

In [189]:
%%file echo.py
import argparse

def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("message", help="message to display")
    p.add_argument("-r", "--repeats", 
                   type=int,
                   default=1,
                   help="number of times to display the message")
    return p.parse_args()
    
def main():
    args = parse_args()
    print(args)
    for i in range(args.repeats):
        print(args.message)

if __name__ == "__main__":
    main()
Overwriting echo.py
In [190]:
!python echo.py hello
Namespace(message='hello', repeats=1)
hello
In [185]:
!python echo.py 
usage: echo.py [-h] [-r REPEATS] message
echo.py: error: the following arguments are required: message
In [186]:
!python echo.py --help
usage: echo.py [-h] [-r REPEATS] message

positional arguments:
  message               message to display

optional arguments:
  -h, --help            show this help message and exit
  -r REPEATS, --repeats REPEATS
                        number of times to display the message
In [187]:
!python echo.py -r 4 hello
Namespace(message='hello', repeats=4)
hello
hello
hello
hello
In [188]:
!python echo.py --repeats 4 hello
Namespace(message='hello', repeats=4)
hello
hello
hello
hello

Let us look at boolean flags and positional arguments with multiple values.

In [206]:
%%file echo2.py
import argparse

def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("message", nargs="+", help="message to display")
    p.add_argument("-r", "--repeats", 
                   type=int,
                   default=1,
                   help="number of times to display the message")
    p.add_argument("-u", "--upper-case", 
                   default=False,
                   action="store_true",
                   help="convert the message to upper case")
    return p.parse_args()
    
def main():
    args = parse_args()
    print(args)
    message = " ".join(args.message)
    if args.upper_case:
        message = message.upper()
    for i in range(args.repeats):
        print(message)

if __name__ == "__main__":
    main()
Overwriting echo2.py
In [207]:
!python echo2.py hello world
Namespace(message=['hello', 'world'], repeats=1, upper_case=False)
hello world
In [208]:
!python echo2.py -u -r 3 hello world
Namespace(message=['hello', 'world'], repeats=3, upper_case=True)
HELLO WORLD
HELLO WORLD
HELLO WORLD
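
parse_args reads sys.argv by default, but it also accepts an explicit list of arguments, which is handy for trying out a parser in a notebook or a test. The sketch below rebuilds the echo2.py parser in a helper named build_parser (a name chosen here for illustration) and parses two argument lists.

```python
import argparse

def build_parser():
    # Same arguments as echo2.py above.
    p = argparse.ArgumentParser()
    p.add_argument("message", nargs="+", help="message to display")
    p.add_argument("-r", "--repeats", type=int, default=1,
                   help="number of times to display the message")
    p.add_argument("-u", "--upper-case", action="store_true",
                   help="convert the message to upper case")
    return p

# Pass the arguments as a list instead of reading sys.argv.
args = build_parser().parse_args(["-u", "-r", "3", "hello", "world"])
print(args.message, args.repeats, args.upper_case)

# With only the positional argument, the defaults apply.
defaults = build_parser().parse_args(["hello"])
print(defaults.message, defaults.repeats, defaults.upper_case)
```

Note that the flag --upper-case becomes the attribute upper_case, because a hyphen is not valid in a Python attribute name.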

Downloading stuff from the web

In [209]:
from urllib.request import urlopen
In [210]:
response = urlopen("http://httpbin.org/html")
In [211]:
response
Out[211]:
<http.client.HTTPResponse at 0x10eb15940>
In [212]:
contents = response.read()
In [213]:
contents[:100]
Out[213]:
b'<!DOCTYPE html>\n<html>\n  <head>\n  </head>\n  <body>\n      <h1>Herman Melville - Moby-Dick</h1>\n\n     '

The body of an HTTP response is always bytes. Decode it to get text.

In [214]:
html = contents.decode("utf-8")
In [215]:
print(html[:400])
<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
      <h1>Herman Melville - Moby-Dick</h1>

      <div>
        <p>
          Availing himself of the mild, summer-cool weather that now reigned in these latitudes, and in preparation for the peculiarly active pursuits shortly to be anticipated, Perth, the begrimed, blistered old blacksmith, had not removed his portable forge to the hold again, af

Q: How to find the status code?

In [216]:
response.status
Out[216]:
200

The third-party library requests makes it very easy to work with HTTP requests.

Install it using:

pip3 install requests
In [217]:
import requests
In [219]:
response = requests.get("http://httpbin.org/html")
print(response.text[:400])
<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
      <h1>Herman Melville - Moby-Dick</h1>

      <div>
        <p>
          Availing himself of the mild, summer-cool weather that now reigned in these latitudes, and in preparation for the peculiarly active pursuits shortly to be anticipated, Perth, the begrimed, blistered old blacksmith, had not removed his portable forge to the hold again, af

Q: How to look at the response headers?

In [220]:
response.headers
Out[220]:
{'Server': 'nginx', 'Content-Length': '3741', 'Content-Type': 'text/html; charset=utf-8', 'Connection': 'keep-alive', 'Date': 'Wed, 22 Feb 2017 09:29:07 GMT', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}

Q: How to pass query parameters?

In [221]:
response = requests.get("http://httpbin.org/get")
print(response.text)
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.11.1"
  }, 
  "origin": "103.15.250.13", 
  "url": "http://httpbin.org/get"
}

In [223]:
response = requests.get("http://httpbin.org/get", params={"query": "python class", "page": "2"})
print(response.text)
{
  "args": {
    "page": "2", 
    "query": "python class"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.11.1"
  }, 
  "origin": "103.15.250.13", 
  "url": "http://httpbin.org/get?query=python+class&page=2"
}
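
The query string in the url field above can be reproduced with the standard library, which helps when checking what is actually sent. This is a sketch using urllib.parse; it builds the URL by hand and makes no network request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

params = {"query": "python class", "page": "2"}

query_string = urlencode(params)      # spaces are encoded as '+'
url = "http://httpbin.org/get?" + query_string

# Going the other way: split the URL and decode its query string.
decoded = parse_qs(urlsplit(url).query)

print(url)
print(decoded)
```

parse_qs returns a list for each key because a key may appear more than once in a query string.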

Q: How to send POST data?

In [225]:
response = requests.post("http://httpbin.org/post", data={"query": "python class", "page": "2"})
print(response.text)
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "page": "2", 
    "query": "python class"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "25", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.11.1"
  }, 
  "json": null, 
  "origin": "103.15.250.13", 
  "url": "http://httpbin.org/post"
}

In [226]:
response = requests.post("http://httpbin.org/post", data='plain text payload')
print(response.text)
{
  "args": {}, 
  "data": "plain text payload", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "18", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.11.1"
  }, 
  "json": null, 
  "origin": "103.15.250.13", 
  "url": "http://httpbin.org/post"
}

Q: How to add headers?

In [228]:
response = requests.post("http://httpbin.org/post", 
                         data='plain text payload',
                        headers={"X-Test-Header": "1234"})
print(response.text)
{
  "args": {}, 
  "data": "plain text payload", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "18", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.11.1", 
    "X-Test-Header": "1234"
  }, 
  "json": null, 
  "origin": "103.15.250.13", 
  "url": "http://httpbin.org/post"
}

Working with JSON APIs

In [229]:
import requests
response = requests.get("http://httpbin.org/get")
print(response.text)
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.11.1"
  }, 
  "origin": "103.15.250.13", 
  "url": "http://httpbin.org/get"
}

In [231]:
d = response.json()
In [232]:
print(d)
{'origin': '103.15.250.13', 'url': 'http://httpbin.org/get', 'headers': {'User-Agent': 'python-requests/2.11.1', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'Accept': '*/*'}, 'args': {}}
In [234]:
d['origin']
Out[234]:
'103.15.250.13'
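
Parsing the response body as JSON is roughly what the standard library json module does for any string. A small sketch without a network request; the JSON text is a shortened copy of the response above, and the second and third values are made up for illustration.

```python
import json

# Shortened copy of the response body shown above.
text = '{"args": {}, "origin": "103.15.250.13", "url": "http://httpbin.org/get"}'

d = json.loads(text)               # JSON text -> Python dict
print(type(d).__name__, d["origin"])

# JSON null/true/false become Python None/True/False.
flags = json.loads('{"json": null, "ok": true}')
print(flags)

# json.dumps goes the other way: Python object -> JSON text.
dumped = json.dumps({"page": 2, "tags": ["python", "class"]}, sort_keys=True)
print(dumped)
```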
In [235]:
import requests
In [236]:
url = "https://api.github.com/orgs/intuit/repos"
In [237]:
repos = requests.get(url).json()
In [241]:
type(repos)
Out[241]:
list
In [242]:
for repo in repos:
    print(repo['full_name'], repo['forks'])
intuit/heirloom 8
intuit/simple_deploy 17
intuit/sdbport 4
intuit/WeakForwarder 4
intuit/AutoRemoveObserver 10
intuit/LocationManager 291
intuit/rego 10
intuit/xhr-xdr-adapter 14
intuit/cfn-clone 3
intuit/AnimationEngine 73
intuit/istanbul-cobertura-badger 16
intuit/homebrew-cfn-clone 1
intuit/GroupedArray 20
intuit/node-pom-parser 2
intuit/sdp 95
intuit/Tank 14
intuit/ami-query 3
intuit/aws_account_utils 20
intuit/intuit-developer-nodejs 5
intuit/spring-config-client-fallback 1
intuit/Autumn 7
intuit/wasabi 34
intuit/AnimatedFormFieldTableViewCell 2
intuit/destructive_socks5_proxy 1
intuit/filemerge 1
intuit/ssp 7
intuit/karate 11

Let us find the 5 most popular repos by number of forks.

In [245]:
def get_forks(repo):
    return repo['forks']

repos = sorted(repos, key=get_forks, reverse=True)[:5]
In [246]:
for repo in repos:
    print(repo['full_name'], repo['forks'])
intuit/LocationManager 291
intuit/sdp 95
intuit/AnimationEngine 73
intuit/wasabi 34
intuit/GroupedArray 20

Now let us find out who commits to these repos.

In [251]:
def get_top_contributors(reponame):
    url = "https://api.github.com/repos/{}/stats/contributors".format(reponame)
    print(url)
    contributors = requests.get(url).json()
    contributors = sorted(contributors, key=lambda c: c['total'], reverse=True)
    for c in contributors[:5]:
        print(c['author']['login'], c['total'])
In [253]:
get_top_contributors("intuit/wasabi")
https://api.github.com/repos/intuit/wasabi/stats/contributors
jwtodd 108
longdogz 76
shoeffner 72
AndreaSuckro 58
jcwuzoegiver 29

Q: How to pass a username and password?

username = "xxx"
password = open("secret-file.txt").read().strip()
response = requests.get(url, auth=(username, password))

Q: How to read config files?

There are a couple of different config file formats. The configparser module in the standard library reads INI-style files like the one below.

In [259]:
%%file test.conf
[auth]
username = guest
password = secret
Overwriting test.conf
In [260]:
import configparser
p = configparser.ConfigParser()
p.read("test.conf")
Out[260]:
['test.conf']
In [261]:
p.get("auth", "username")
Out[261]:
'guest'
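
configparser returns every value as a string unless asked otherwise. The sketch below shows the typed getters and the fallback argument; the [server] section and its options are hypothetical additions for illustration, and read_string is used instead of read so that no file is needed.

```python
import configparser

# The [auth] section matches test.conf above; [server] is made up.
conf_text = """
[auth]
username = guest
password = secret

[server]
port = 8080
debug = yes
"""

p = configparser.ConfigParser()
p.read_string(conf_text)            # like read(), but takes the text directly

print(p.sections())
print(p.get("auth", "username"))
print(p.getint("server", "port"))          # converted to int
print(p.getboolean("server", "debug"))     # yes/no, true/false, on/off, 1/0
print(p.get("server", "host", fallback="localhost"))   # option not present
```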

Regular Expressions

In [262]:
import re
In [264]:
m = re.match("ab+", "abbb")
In [266]:
m.group()
Out[266]:
'abbb'

Patterns:

c - that character
. - any character
[abcd] - one of them
[a-z] - one of them in the range
[^abcd] - any one other than these
x* - zero or more x (x could be any of the above)
x+ - one or more x
x? - zero or one x
(x) - match x and also remember it

\d - any digit
\s - any space

^ - beginning of the string
$ - end of the string
In [267]:
text = "10 apples and 20 mangos"

Extract all numbers from the text

In [269]:
re.findall(r"\d+", text)
Out[269]:
['10', '20']
In [270]:
re.sub(r"\d+", "x", text)
Out[270]:
'x apples and x mangos'
In [278]:
re.sub(r"(\d+[^\d]*)\d+", r"\1x", text)
Out[278]:
'10 apples and x mangos'
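
The (x) pattern from the list above remembers the part it matched, and those parts can be read back with group. A short sketch on the same text, which also shows the difference between re.match (matches only at the beginning of the string) and re.search (looks anywhere in the string).

```python
import re

text = "10 apples and 20 mangos"

# Each pair of parentheses remembers what it matched.
m = re.search(r"(\d+) ([a-z]+) and (\d+) ([a-z]+)", text)
groups = (m.group(1), m.group(2), m.group(3), m.group(4))
print(groups)

# With groups in the pattern, findall returns tuples of the groups.
pairs = re.findall(r"(\d+) ([a-z]+)", text)
print(pairs)

# re.match only matches at the beginning; re.search scans the whole string.
at_start = re.match(r"\d+ mangos", text)
anywhere = re.search(r"\d+ mangos", text)
print(at_start)
print(anywhere.group())
```

The patterns are written as raw strings (r"...") so that backslashes such as \d reach the re module unchanged.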

Feedback

Please take some time to let me know how you felt about the course by filling in the form below.

View Feedback Form
