Advanced Python Training at Arcesium - Day 2

Nov 15-17, 2017 Vikrant Patil

These notes are available online at http://notes.pipal.in/2017/arcesium-oct-advpython/day2.html

© Pipal Academy LLP

Iterators and Generators

In [1]:
nums = list(range(5))
In [2]:
for n in nums:
    print(n)
0
1
2
3
4
In [3]:
for c in "string":
    print(c)
s
t
r
i
n
g
In [4]:
for key in {"x":1,"y":2}:
    print(key)
x
y
In [5]:
for line in open("data.csv"):
    print(repr(line))
'A1,B1,C1\n'
'A2,B2,C2\n'
'A3,B3,C3\n'
'A4,B4,C4'

The Iteration protocol

In [6]:
items = [1,2,3]
In [7]:
itr = iter(items)
In [8]:
next(itr)
Out[8]:
1
In [9]:
next(itr)
Out[9]:
2
In [10]:
next(itr)
Out[10]:
3
In [11]:
next(itr)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-11-94b7b2f7f392> in <module>()
----> 1 next(itr)

StopIteration: 
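
The cells above can be read as a manual version of a for loop. The sketch below spells out roughly what `for x in iterable` does with `iter` and `next`; the helper name `manual_for` is made up for illustration and is not part of the course material.

```python
def manual_for(iterable, action):
    """Roughly what `for x in iterable: action(x)` does behind the scenes."""
    it = iter(iterable)            # ask the iterable for an iterator
    while True:
        try:
            x = next(it)           # fetch the next value
        except StopIteration:      # raised when the iterator is exhausted
            break
        action(x)


collected = []
manual_for([1, 2, 3], collected.append)
print(collected)  # [1, 2, 3]
```

The loop ends quietly because the for statement catches StopIteration itself; only a direct call to next() lets it surface as in In [11].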

Generators

In [12]:
def square(numbers):
    for n in numbers:
        yield n*n
In [13]:
sq5 = square(range(1,6))
In [14]:
sq5
Out[14]:
<generator object square at 0x7f632c212b48>
In [15]:
for i in sq5:
    print(i)
1
4
9
16
25
In [16]:
sq4 = square(range(1,4))
In [17]:
next(sq4)
Out[17]:
1
In [18]:
range(5)
Out[18]:
range(0, 5)
In [22]:
def square(numbers):
    print("Begin squares")
    for i in numbers:
        print("Computing square of ",i)
        yield i*i
        print("After yield")
        
    print("Finish square")
In [23]:
sq4 = square(range(1,4))
In [24]:
next(sq4)
Begin squares
Computing square of  1
Out[24]:
1
In [25]:
next(sq4)
After yield
Computing square of  2
Out[25]:
4
In [26]:
next(sq4)
After yield
Computing square of  3
Out[26]:
9
In [27]:
next(sq4)
After yield
Finish square
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-27-b9eab369b80c> in <module>()
----> 1 next(sq4)

StopIteration: 
In [28]:
for s in square(range(1,5)):
    print(s)
Begin squares
Computing square of  1
1
After yield
Computing square of  2
4
After yield
Computing square of  3
9
After yield
Computing square of  4
16
After yield
Finish square
In [32]:
def f():
    for i in range(1000):
        if i ==13:
            return
        yield i*i
In [33]:
for s in f():
    print(s)
0
1
4
9
16
25
36
49
64
81
100
121
144
In [34]:
def f():
    for i in range(1000):
        if i ==3:
            return
        yield i*i
In [35]:
g = f()
In [36]:
next(g)
Out[36]:
0
In [37]:
next(g)
Out[37]:
1
In [38]:
next(g)
Out[38]:
4
In [39]:
next(g)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-39-5f315c5de15b> in <module>()
----> 1 next(g)

StopIteration: 
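
For comparison, here is a sketch of the same sequence written as a class that implements the iteration protocol by hand. The class name `Squares` and its `limit` parameter are illustrative assumptions; the point is only that a generator function saves writing `__iter__`, `__next__` and the explicit `raise StopIteration`.

```python
class Squares:
    """Iterator over squares of 0 .. limit-1, written without yield."""

    def __init__(self, limit):
        self.i = 0
        self.limit = limit

    def __iter__(self):
        # an iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.i >= self.limit:
            raise StopIteration    # plays the role of `return` in the generator
        value = self.i * self.i
        self.i += 1
        return value


print(list(Squares(3)))  # [0, 1, 4], the same values f() produced above
```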

problem: Write a generator function countdown that takes a number n as argument and generates all numbers from n down to 0.

>>> for i in countdown(3):
...     print(i)
3
2
1
0

problem: Write a generator triangular that takes a number n as argument and generates the first n triangular numbers. The nth triangular number is the sum of the first n natural numbers.

>>> for t in triangular(5):
...     print(t, end=",")
1,3,6,10,15,

Bonus problem: Remove duplicates from a sequence while maintaining order. Can the same function be used to remove duplicate lines from a file?

>>> for item in consumedup([3,5,3,4,5,6,7,8,8,9]):
...     print(item, end=",")
3,5,4,6,7,8,9,
In [40]:
x = set()
In [41]:
def countdown(n):
    while n>=0:
        yield n
        n -= 1
In [42]:
for i in countdown(3):
    print(i)
3
2
1
0
In [43]:
def triangular(n):
    for i in range(1, n+1):
        yield sum(range(1,i+1))
In [44]:
for t in triangular(5):
    print(t, end=",")
1,3,6,10,15,
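
The solution above recomputes sum(range(1, i+1)) for every i. As an alternative sketch (the names `triangular_running` and `triangular_acc` are made up here), a running total gives the same numbers with one addition per step, and `itertools.accumulate` does the same thing in the standard library.

```python
from itertools import accumulate


def triangular_running(n):
    """Yield the first n triangular numbers using a running total."""
    total = 0
    for i in range(1, n + 1):
        total += i          # nth triangular number = previous one + n
        yield total


def triangular_acc(n):
    """Same sequence, built from itertools.accumulate."""
    return accumulate(range(1, n + 1))


print(list(triangular_running(5)))  # [1, 3, 6, 10, 15]
print(list(triangular_acc(5)))      # [1, 3, 6, 10, 15]
```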
In [45]:
def consumedup(seq):
    seen = set()
    for item in seq:
        if item not in seen:
            yield item
            seen.add(item)
            
In [47]:
g = consumedup([3,5,3,4,5,6,7,8,8,9])
In [48]:
for item in g:
    print(item, end=",")
3,5,4,6,7,8,9,
In [51]:
%%file duplicatelines.txt
Saving file at /day2.ipynb
Saving file at /day2.ipynb
Saving file at /day2.ipynb
hello
hello
x
Overwriting duplicatelines.txt
In [52]:
for line in consumedup(open("duplicatelines.txt")):
    print(line, end="")
Saving file at /day2.ipynb
hello
x

Generator Expressions

In [53]:
[n*n for n in range(1,11)] # list comprehension
Out[53]:
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
In [56]:
s = (n*n for n in range(1,11)) # generator expression
In [55]:
s
Out[55]:
<generator object <genexpr> at 0x7f632c134308>
In [57]:
sum(s)
Out[57]:
385
In [58]:
max(s)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-58-9b5ef623e450> in <module>()
----> 1 max(s)

ValueError: max() arg is an empty sequence
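
The ValueError appears because sum(s) already consumed the generator, so nothing was left for max. A small sketch of the two usual ways around this (variable names are illustrative): pass a default to max, or keep the values in a list when they are needed more than once.

```python
squares = (n * n for n in range(1, 11))
print(sum(squares))                 # 385, consumes every value
print(max(squares, default=None))   # None, the generator is already empty

# A list can be traversed as many times as needed.
values = [n * n for n in range(1, 11)]
print(sum(values), max(values))     # 385 100
```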
In [59]:
sum((x*x for x in range(1000000)))
Out[59]:
333332833333500000
In [64]:
sum(x*x for x in range(1000000))  # when a generator expression is the only argument
                                  # to a function, you can skip the extra parentheses
Out[64]:
333332833333500000
In [62]:
g = consumedup(x*x for x in range(1,5)) 
In [63]:
for i in g:
    print(i, end=",")
1,4,9,16,
In [66]:
def ones():
    count = 0
    while True:
        if count >=3:
            DOOM
        yield 1
        count += 1
In [67]:
one = ones()
next(one)
next(one)
next(one)
next(one)
next(one)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-67-5acdb7be04d5> in <module>()
      3 next(one)
      4 next(one)
----> 5 next(one)
      6 next(one)

<ipython-input-66-f88b46c71a5d> in ones()
      3     while True:
      4         if count >=3:
----> 5             DOOM
      6         yield 1
      7         count += 1

NameError: name 'DOOM' is not defined
In [68]:
next(one)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-68-d9ee1c3c65c2> in <module>()
----> 1 next(one)

StopIteration: 

What is the advantage of using generators/iterators?

  • Very useful for huge data: only part of the data is loaded into memory at a time
  • You can build lazy data-processing pipelines
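
A rough way to see the memory point is to compare the size of a list with the size of an equivalent generator object. This is only a sketch: sys.getsizeof reports the size of the container itself, not of the numbers it refers to, and the exact figures vary between Python versions, so no numbers are shown.

```python
import sys

as_list = [n * n for n in range(100000)]   # all values exist at once
as_gen = (n * n for n in range(100000))    # values are produced on demand

print(sys.getsizeof(as_list))  # grows with the number of elements
print(sys.getsizeof(as_gen))   # small, independent of the range size
```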

Example: Building data pipelines

In [69]:
import os
def find(root):
    for path, dirnames, filenames in os.walk(root):
        for f in filenames:
            yield os.path.join(path, f)
In [70]:
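from itertools import islice
def take(n, seq):
    # islice stops cleanly if seq has fewer than n items; calling next()
    # inside a generator expression raises RuntimeError on Python 3.7+
    return list(islice(seq, n))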
In [71]:
def integers():
    """
    generates infinite sequence of natural numbers
    """
    i = 1
    while True:
        yield i
        i += 1
        
def squares(numbers):
    return (n*n for n in numbers)
In [72]:
take(10, squares(integers()))
Out[72]:
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
In [76]:
def grep(pattern, seq):
    return (x for x in seq if pattern in x)
In [77]:
files = find(".")
pyfiles = grep(".py", files)
print(take(10, pyfiles))
['./module.py', './trace.py', './fib.py', './cmdline.py', './commands.py', './fib1.py', './module1.py', './memoize.py', './sum.py', './__pycache__/module.cpython-36.pyc']
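
Note that the last entry in the output above is a .pyc file: grep tests whether ".py" occurs anywhere in the path. If only Python sources are wanted, a suffix test is one option. The helper name `grep_suffix` below is hypothetical and the paths are sample data.

```python
def grep_suffix(suffix, paths):
    """Keep only the paths that end with the given suffix (lazy, like grep)."""
    return (p for p in paths if p.endswith(suffix))


paths = ["./module.py", "./__pycache__/module.cpython-36.pyc", "./notes.txt"]
print(list(grep_suffix(".py", paths)))  # ['./module.py']
```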
In [80]:
def count(seq):
    return sum(1 for item in seq)
In [81]:
count(range(100))
Out[81]:
100
In [82]:
def readlines(filenames):
    """
    returns iterator over lines of all files
    """
    for f in filenames:
        with open(f) as fh:  # close each file once its lines are consumed
            for line in fh:
                yield line

How many lines of Python code have we written during this course?

In [83]:
files = find(".")
pyfiles = grep(".py", files)
lines = readlines(pyfiles)
print(count(lines))
144

How many Python functions have we written?

In [85]:
files = find(".")
pyfiles = grep(".py", files)
lines = readlines(pyfiles)
functions = grep("def " ,lines)
print(count(functions))
19

problem: Write a function get_paragraphs to split the given text into paragraphs. Paragraphs are separated by an empty line. The function should take a sequence of lines as argument and return a sequence of paragraphs. For sample input, see http://anandology.com/tmp/pg1342.txt. Once the function is there, we should be able to find:

  • the number of paragraphs
  • the longest paragraph
In [86]:
x = [1,2,3,4]
In [87]:
itr = iter(x)
In [88]:
type(itr)
Out[88]:
list_iterator
In [89]:
y = (i for i in range(5))
In [90]:
type(y)
Out[90]:
generator
In [91]:
type(range(1,2))
Out[91]:
range
In [92]:
r = range(1,2)
In [95]:
type(range)
Out[95]:
type
In [96]:
range.__class__
Out[96]:
type
In [99]:
next(y)
Out[99]:
0
In [100]:
if "":
    print("x")
In [101]:
def get_paragraphs(seq):
    paragraphs = []
    for line in seq:
        if line.strip()=="" and paragraphs:
            yield "".join(paragraphs)
            paragraphs = []
        paragraphs.append(line)
    
    if paragraphs:
        yield "".join(paragraphs)
In [103]:
count(get_paragraphs(["A\nB", "\n", "A\n","B\n","\n" ,"A\n", "A\n"]))
Out[103]:
3
In [104]:
max(get_paragraphs(["A\nB", "\n", "A\n","B\n","\n" ,"A\n", "A\n"]), key=len)
Out[104]:
'\nA\nB\n'
In [107]:
g = get_paragraphs(open("pg1342.txt"))
In [108]:
count(g)
Out[108]:
2395
In [109]:
g = get_paragraphs(open("pg1342.txt"))
In [110]:
max(g, key=len)
Out[110]:
'\n"By this time, my dearest sister, you have received my hurried letter; I\nwish this may be more intelligible, but though not confined for time, my\nhead is so bewildered that I cannot answer for being coherent. Dearest\nLizzy, I hardly know what I would write, but I have bad news for you,\nand it cannot be delayed. Imprudent as the marriage between Mr. Wickham\nand our poor Lydia would be, we are now anxious to be assured it has\ntaken place, for there is but too much reason to fear they are not gone\nto Scotland. Colonel Forster came yesterday, having left Brighton the\nday before, not many hours after the express. Though Lydia\'s short\nletter to Mrs. F. gave them to understand that they were going to Gretna\nGreen, something was dropped by Denny expressing his belief that W.\nnever intended to go there, or to marry Lydia at all, which was\nrepeated to Colonel F., who, instantly taking the alarm, set off from B.\nintending to trace their route. He did trace them easily to Clapham,\nbut no further; for on entering that place, they removed into a hackney\ncoach, and dismissed the chaise that brought them from Epsom. All that\nis known after this is, that they were seen to continue the London road.\nI know not what to think. After making every possible inquiry on that\nside London, Colonel F. came on into Hertfordshire, anxiously renewing\nthem at all the turnpikes, and at the inns in Barnet and Hatfield, but\nwithout any success--no such people had been seen to pass through. With\nthe kindest concern he came on to Longbourn, and broke his apprehensions\nto us in a manner most creditable to his heart. I am sincerely grieved\nfor him and Mrs. F., but no one can throw any blame on them. Our\ndistress, my dear Lizzy, is very great. My father and mother believe the\nworst, but I cannot think so ill of him. 
Many circumstances might make\nit more eligible for them to be married privately in town than to pursue\ntheir first plan; and even if _he_ could form such a design against a\nyoung woman of Lydia\'s connections, which is not likely, can I suppose\nher so lost to everything? Impossible! I grieve to find, however, that\nColonel F. is not disposed to depend upon their marriage; he shook his\nhead when I expressed my hopes, and said he feared W. was not a man to\nbe trusted. My poor mother is really ill, and keeps her room. Could she\nexert herself, it would be better; but this is not to be expected. And\nas to my father, I never in my life saw him so affected. Poor Kitty has\nanger for having concealed their attachment; but as it was a matter of\nconfidence, one cannot wonder. I am truly glad, dearest Lizzy, that you\nhave been spared something of these distressing scenes; but now, as the\nfirst shock is over, shall I own that I long for your return? I am not\nso selfish, however, as to press for it, if inconvenient. Adieu! I\ntake up my pen again to do what I have just told you I would not; but\ncircumstances are such that I cannot help earnestly begging you all to\ncome here as soon as possible. I know my dear uncle and aunt so well,\nthat I am not afraid of requesting it, though I have still something\nmore to ask of the former. My father is going to London with Colonel\nForster instantly, to try to discover her. What he means to do I am sure\nI know not; but his excessive distress will not allow him to pursue any\nmeasure in the best and safest way, and Colonel Forster is obliged to\nbe at Brighton again to-morrow evening. In such an exigence, my\nuncle\'s advice and assistance would be everything in the world; he will\nimmediately comprehend what I must feel, and I rely upon his goodness."\n'
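
The results above start with '\n' because get_paragraphs keeps the blank separator line as the first line of the next paragraph, and runs of blank lines produce paragraphs that contain only '\n'. The variant below is a sketch of one way to drop the separators; the name `split_paragraphs` is made up here, and counts obtained with it will generally differ from the ones shown above.

```python
def split_paragraphs(lines):
    """Yield paragraphs as strings, skipping the blank separator lines."""
    current = []
    for line in lines:
        if line.strip():            # non-blank line: part of the paragraph
            current.append(line)
        elif current:               # blank line ends a non-empty paragraph
            yield "".join(current)
            current = []
    if current:                     # last paragraph may lack a trailing blank
        yield "".join(current)


sample = ["A\n", "B\n", "\n", "\n", "C\n"]
print(list(split_paragraphs(sample)))  # ['A\nB\n', 'C\n']
```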

Working with XML

In [111]:
import requests
url = "http://www.thehindu.com/"
response = requests.get(url, params = {"service":"rss"})
In [112]:
xmltext = response.text
In [113]:
xmltext[:100]
Out[113]:
'<?xml version="1.0" encoding="UTF-8"?>\n<rss version="2.0">\n<channel>\n<title>The Hindu - Home</title>'
In [115]:
from xml.etree import ElementTree as et
In [116]:
root = et.fromstring(xmltext)
In [118]:
items = root.findall(".//item")
In [119]:
len(items)
Out[119]:
426
In [120]:
items[0]
Out[120]:
<Element 'item' at 0x7f63240a7f48>
In [121]:
print(et.tostring(items[0]).decode())
<item>
<title>A fruit forest at home</title>
<author>Anasuya Menon</author>
<category>Life &amp; Style</category>
<link>http://www.thehindu.com/life-and-style/manoj-kumar-ibs-concept-fruitful-future-is-about-creating-fruit-forests/article20466856.ece?utm_source=RSS_Feed&amp;utm_medium=RSS&amp;utm_campaign=RSS_Syndication</link>
<description>
Manoj Kumar IB&#8217;s &#8216;Fruitful Future&#8217; concept is about creating fruit forests even in limited spaces 
</description>
<pubDate>Thu, 16 Nov 2017 12:34:36 +0530</pubDate>
</item>

In [122]:
for item in items[:10]:
    print(item.findtext("title"))
    print(item.findtext("link"))
    print("-"*50)
A fruit forest at home
http://www.thehindu.com/life-and-style/manoj-kumar-ibs-concept-fruitful-future-is-about-creating-fruit-forests/article20466856.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
BJP legislators stage protest 
http://www.thehindu.com/news/national/karnataka/bjp-legislators-stage-protest/article20466838.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 New India Assurance stock up 3% on strong Sep quarter earnings 
http://www.thehindu.com/business/new-india-assurance-stock-up-3-on-strong-sep-quarter-earnings/article20466714.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Doctors’ strike: Karnataka HC to pass order if no solution found 
http://www.thehindu.com/news/national/karnataka/doctors-strike-karnataka-hc-to-pass-order-if-no-solution-found/article20466339.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Patients left in lurch as private doctors begin indefinite strike against KPME Bill
http://www.thehindu.com/news/national/karnataka/patients-left-in-lurch-as-private-doctors-begin-indefinite-strike-against-kpme-bill/article20466193.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
Indira Gandhi stood up for refugees: Antony
http://www.thehindu.com/news/national/indira-gandhi-stood-up-for-refugees-antony/article20465664.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Review: Young dancers take the stage at Soorya’s ‘Parampara’ festival
http://www.thehindu.com/entertainment/dance/delightful-performances-by-young-classical-dancers-at-sooryas-parampara-dance-festival/article20464680.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Review: ‘Marattam’ honours veteran Kathakali artiste Sadanam Krishnankutty
http://www.thehindu.com/entertainment/theatre/marattam-in-kochi-to-honour-kathakali-artiste-sadanam-krishnankutty/article20453174.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Remembering Kathakali musician Kalamandalam Sankaran Embranthiri 
http://www.thehindu.com/entertainment/theatre/tribute-to-kathakali-musician-kalamandalam-sankaran-embranthiri-on-his-death-anniversary/article20451623.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Indian student shot dead at grocery store in US
http://www.thehindu.com/news/international/indian-student-shot-dead-at-grocery-store-in-us/article20465380.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
In [123]:
from xml.dom.minidom import parseString
In [124]:
root = parseString(xmltext)
In [125]:
items = root.getElementsByTagName("item")
In [126]:
len(items)
Out[126]:
426
In [127]:
item = items[0]
In [130]:
title = item.getElementsByTagName("title")[0]
In [131]:
title.firstChild.data
Out[131]:
'A fruit forest at home'
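
To try the ElementTree calls without a network connection, the sketch below parses a small hand-written feed and turns each item into a dict. The XML text and the helper name `parse_items` are assumptions made for illustration; only `fromstring`, `iter` and `findtext` come from the library.

```python
from xml.etree import ElementTree as et

sample = """<rss version="2.0"><channel>
<title>Sample feed</title>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""


def parse_items(xmltext):
    """Yield one dict per <item> element with its title and link text."""
    root = et.fromstring(xmltext)
    for item in root.iter("item"):
        yield {"title": item.findtext("title"),
               "link": item.findtext("link")}


for entry in parse_items(sample):
    print(entry["title"], entry["link"])
```

Because parse_items is a generator, it fits into the same kind of pipeline as find, grep and take from the earlier section.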

JSON

In [132]:
import json
In [136]:
j = json.loads('{"a":2,"l":["a","b","c"]}')
In [137]:
type(j)
Out[137]:
dict
In [138]:
j['a']
Out[138]:
2
In [139]:
j['l']
Out[139]:
['a', 'b', 'c']
In [140]:
d = {"service":"rss", "x":[1,2,3,4,5]}
In [141]:
json.dumps(d)
Out[141]:
'{"service": "rss", "x": [1, 2, 3, 4, 5]}'
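
A short sketch of a round trip through JSON, using the same dictionary as above. `indent` and `sort_keys` are standard arguments of json.dumps; the tuple example shows that JSON has only one array type, so tuples come back as lists.

```python
import json

d = {"service": "rss", "x": [1, 2, 3, 4, 5]}

text = json.dumps(d, indent=2, sort_keys=True)  # human-readable form
print(text)
print(json.loads(text) == d)                    # True, nothing lost on the way

roundtrip = json.loads(json.dumps({"t": (1, 2)}))
print(roundtrip)                                # {'t': [1, 2]}
```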

Find the distance between two cities using the Google API

In [144]:
import requests
def distance(origin, dest):
    url = "https://maps.googleapis.com/maps/api/distancematrix/json"
    response = requests.get(url, params={"units":"metric",
                                         "origins":origin,
                                         "destinations":dest
                                        })
    data = response.json()
    return data['rows'][0]['elements'][0]['distance']['text']
In [143]:
distance("hyderabad", "mumbai")  # run with an earlier version of distance() that returned the whole response
Out[143]:
{'destination_addresses': ['Mumbai, Maharashtra, India'],
 'origin_addresses': ['Hyderabad, Telangana, India'],
 'rows': [{'elements': [{'distance': {'text': '709 km', 'value': 709450},
     'duration': {'text': '13 hours 18 mins', 'value': 47866},
     'status': 'OK'}]}],
 'status': 'OK'}
In [145]:
distance("hyderabad", "mumbai")
Out[145]:
'709 km'
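
The response shown in Out[143] carries a 'status' field at the top level and inside each element. The sketch below checks both before indexing, using a copy of that response as sample data so it runs offline. The helper name `extract_distance` is hypothetical, and which other status values the service can return is not covered in these notes.

```python
def extract_distance(data):
    """Return the distance text from a decoded response, or raise ValueError."""
    if data.get("status") != "OK":
        raise ValueError("request failed: %s" % data.get("status"))
    element = data["rows"][0]["elements"][0]
    if element.get("status") != "OK":
        raise ValueError("no result: %s" % element.get("status"))
    return element["distance"]["text"]


# Sample data copied from Out[143] above.
sample = {
    "destination_addresses": ["Mumbai, Maharashtra, India"],
    "origin_addresses": ["Hyderabad, Telangana, India"],
    "rows": [{"elements": [{"distance": {"text": "709 km", "value": 709450},
                            "duration": {"text": "13 hours 18 mins",
                                         "value": 47866},
                            "status": "OK"}]}],
    "status": "OK",
}
print(extract_distance(sample))  # 709 km
```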

Numpy

In [146]:
import numpy as np
In [147]:
x = np.arange(32)
In [148]:
x
Out[148]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])
In [149]:
x.reshape(4,8)
Out[149]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31]])
In [150]:
y = np.arange(64).reshape(4,2,8)
In [151]:
y
Out[151]:
array([[[ 0,  1,  2,  3,  4,  5,  6,  7],
        [ 8,  9, 10, 11, 12, 13, 14, 15]],

       [[16, 17, 18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29, 30, 31]],

       [[32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47]],

       [[48, 49, 50, 51, 52, 53, 54, 55],
        [56, 57, 58, 59, 60, 61, 62, 63]]])
In [152]:
len(y[-1][-1])
Out[152]:
8
In [153]:
len(y[-1])
Out[153]:
2
In [154]:
len(y)
Out[154]:
4
In [155]:
y.shape
Out[155]:
(4, 2, 8)
In [156]:
l,w,h = y.shape
In [157]:
l,w,h
Out[157]:
(4, 2, 8)
In [158]:
y.dtype
Out[158]:
dtype('int64')
In [159]:
y.size
Out[159]:
64
In [160]:
y.itemsize
Out[160]:
8

problem: Create a 2D array of size 5x6

Other ways of creating arrays

In [161]:
np.random.random(50).reshape(5,10)
Out[161]:
array([[ 0.04499068,  0.50619285,  0.44488286,  0.82907944,  0.90284184,
         0.52232663,  0.39199814,  0.70911043,  0.84078058,  0.20210877],
       [ 0.6784636 ,  0.074447  ,  0.05762705,  0.65109205,  0.61762302,
         0.07111594,  0.5603616 ,  0.13030784,  0.15284609,  0.32295726],
       [ 0.27873428,  0.16703084,  0.6833295 ,  0.23503493,  0.35724634,
         0.78948851,  0.80452339,  0.96852529,  0.86675047,  0.49045919],
       [ 0.93068683,  0.75820907,  0.59659381,  0.50088148,  0.78470971,
         0.52832485,  0.10246332,  0.76045816,  0.16626284,  0.00690344],
       [ 0.77779312,  0.4413533 ,  0.950841  ,  0.51995188,  0.35815909,
         0.65010685,  0.05154926,  0.03869555,  0.43416689,  0.19653547]])
In [162]:
np.linspace(1.0, 10, 15)
Out[162]:
array([  1.        ,   1.64285714,   2.28571429,   2.92857143,
         3.57142857,   4.21428571,   4.85714286,   5.5       ,
         6.14285714,   6.78571429,   7.42857143,   8.07142857,
         8.71428571,   9.35714286,  10.        ])
In [163]:
np.zeros(20).reshape(4,5)
Out[163]:
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])
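
A few possible answers to the 5x6 exercise, as a sketch. Both np.zeros and np.full accept the shape as a tuple, so the separate reshape step is optional; the fill value 7 is arbitrary.

```python
import numpy as np

a = np.arange(30).reshape(5, 6)   # 0..29 laid out in 5 rows of 6
b = np.zeros((5, 6))              # shape given directly as a tuple
c = np.full((5, 6), 7)            # every element set to 7

print(a.shape, b.shape, c.shape)  # (5, 6) (5, 6) (5, 6)
```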

Vector operations

In [164]:
x = np.arange(10)
In [165]:
x + 10
Out[165]:
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
In [166]:
x * 2
Out[166]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
In [167]:
x + x
Out[167]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
In [168]:
x * x 
Out[168]:
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
In [169]:
x ** 3
Out[169]:
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])

Playing with images

In [170]:
from scipy import misc
In [171]:
face = misc.face(gray=True)  # newer SciPy releases provide this as scipy.datasets.face
In [172]:
type(face)
Out[172]:
numpy.ndarray
In [173]:
face.ndim
Out[173]:
2
In [174]:
face.shape
Out[174]:
(768, 1024)
In [175]:
face.dtype
Out[175]:
dtype('uint8')
In [176]:
face[1][:10]
Out[176]:
array([ 83, 104, 123, 130, 134, 141, 145, 144, 157, 147], dtype=uint8)
In [177]:
#show images from matplotlib in the same HTML page
%matplotlib inline
In [178]:
import matplotlib.pyplot as plt
In [180]:
plt.imshow(face, cmap=plt.cm.gray)
Out[180]:
<matplotlib.image.AxesImage at 0x7f62fcc756a0>
In [181]:
import matplotlib.pyplot as plt
def imshow(img):
    plt.imshow(img, cmap=plt.cm.gray)
    plt.show()
In [182]:
negface = 255 - face
In [183]:
imshow(negface)
In [184]:
face[-1][:10]
Out[184]:
array([ 94, 106, 119, 127, 131, 134, 135, 136, 134, 139], dtype=uint8)
In [185]:
negface[-1][:10]
Out[185]:
array([161, 149, 136, 128, 124, 121, 120, 119, 121, 116], dtype=uint8)

Transpose

In [186]:
x = np.arange(20).reshape(4,5)
In [187]:
x[1][2]
Out[187]:
7
In [188]:
x[1,2]
Out[188]:
7
In [189]:
x[1,:] # row at index 1 (the second row)
Out[189]:
array([5, 6, 7, 8, 9])
In [193]:
x[:,0] # 0th column
Out[193]:
array([ 0,  5, 10, 15])
In [194]:
x[:,:2] # first two columns
Out[194]:
array([[ 0,  1],
       [ 5,  6],
       [10, 11],
       [15, 16]])
In [195]:
x.transpose()
Out[195]:
array([[ 0,  5, 10, 15],
       [ 1,  6, 11, 16],
       [ 2,  7, 12, 17],
       [ 3,  8, 13, 18],
       [ 4,  9, 14, 19]])
In [198]:
x.transpose().shape
Out[198]:
(5, 4)
In [199]:
x.shape
Out[199]:
(4, 5)
In [200]:
facet = face.transpose()
In [201]:
imshow(facet)
In [202]:
face.mean()
Out[202]:
113.48026784261067
In [203]:
x = np.arange(10)
In [204]:
x < 5
Out[204]:
array([ True,  True,  True,  True,  True, False, False, False, False, False], dtype=bool)
In [205]:
x[x<5]
Out[205]:
array([0, 1, 2, 3, 4])
In [206]:
a = x < 5
In [207]:
a
Out[207]:
array([ True,  True,  True,  True,  True, False, False, False, False, False], dtype=bool)
In [208]:
a.sum()
Out[208]:
5

problem: Convert the face image to a black-and-white image (instead of grayscale)

In [209]:
facebw = face > 127
In [210]:
imshow(facebw)
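
The comparison face > 127 gives a boolean array, which imshow can display directly. If an 8-bit image with values 0 and 255 is needed instead, the mask can be converted as in this sketch; the small array `gray` stands in for the face image.

```python
import numpy as np

gray = np.array([[10, 200],
                 [127, 128]], dtype=np.uint8)

mask = gray > 127                    # boolean array, True where bright
bw = mask.astype(np.uint8) * 255     # True -> 255, False -> 0

print(mask.dtype, bw.dtype)
print(bw.tolist())                   # [[0, 255], [0, 255]]
```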
In [211]:
x = np.arange(10000).reshape(100,100)
In [212]:
imshow(x)
In [213]:
x[:,50] = 9999
x[50,:] = 9999
In [215]:
x = np.zeros_like(face)
In [216]:
imshow(x)
In [217]:
x[::10,:] = 255
In [218]:
imshow(x)
In [219]:
x = np.zeros(10000).reshape(100,100)
In [220]:
x[::5,:] = 255
In [221]:
imshow(x)
In [222]:
x[:,::5] = 255
In [223]:
imshow(x)
In [224]:
mesh = np.zeros_like(face)
mesh[::50,:]= 255
mesh[:,::50]= 255
In [225]:
imshow(mesh)
In [226]:
imshow(0.5*face + 0.5*mesh)
In [227]:
x = list(range(10))
In [228]:
x
Out[228]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [229]:
x[2:]
Out[229]:
[2, 3, 4, 5, 6, 7, 8, 9]
In [230]:
x[:3]
Out[230]:
[0, 1, 2]
In [231]:
x[::2]
Out[231]:
[0, 2, 4, 6, 8]
In [232]:
x[::3]
Out[232]:
[0, 3, 6, 9]
In [233]:
x[::4]
Out[233]:
[0, 4, 8]
In [235]:
x = np.zeros(10000).reshape(100,100)
In [236]:
x[:,::10] = 255 #columns at interval of 10
In [237]:
x[::10,:] = 255 #rows at interval of 10
In [238]:
imshow(x)
In [239]:
face2 = face + mesh
imshow(face2)
In [240]:
imshow(face)
In [241]:
face2[:,50]
Out[241]:
array([162, 136,  82,  68, 114, 145, 132, 116,  92,  90, 118, 140, 129,
        89,  77, 119, 154, 152, 136, 113, 117, 150, 161, 141, 116, 123,
       149, 159, 118,  63,  47,  59,  91,  99,  65,  34,  53, 114, 159,
       153, 122,  98,  65,  52,  62,  77, 108, 146, 143,  79,  41,  57,
        90, 114, 111,  89,  81,  93,  87,  80,  83,  74,  65,  73,  94,
       120, 149, 164, 150, 108,  68,  54,  75,  82,  76,  70,  75,  73,
        61,  57,  77, 126, 155, 162, 183, 177, 105,  26,  69, 141, 176,
       146, 106,  73,  51,  44,  52,  61,  52,  65, 118, 144, 131, 118,
        76,  55,  83, 129, 142, 145, 135, 107,  71,  53,  94, 144, 122,
        78,  71,  81,  81, 112, 135, 132, 117, 107, 114, 130, 128,  84,
        67, 110, 163, 178, 160, 144, 173, 179, 178, 165, 149, 144, 150,
       158, 161, 146, 122, 115, 140, 165, 164, 152, 155, 175, 180, 170,
       167, 161, 150, 141, 146, 146, 138, 135, 141, 135, 128, 133, 153,
       173, 181, 169, 148, 125, 105,  97, 143, 159, 160, 154, 157, 161,
       169, 184, 174, 166, 144, 133, 145, 157, 153, 145, 156, 166, 174,
       188, 179, 133,  99, 106, 145, 183, 171, 107,  61,  58,  75,  90,
        86,  81, 102, 135, 134, 100,  62,  40,  37,  75, 129, 160, 143,
       103, 101, 131, 175, 180, 175, 167, 165, 161, 154, 151, 126, 110,
       113, 138, 162, 175, 169, 147, 115, 108, 112, 121, 124, 134, 159,
       180, 180, 172, 175, 169, 135,  95,  63,  41,  58,  94, 131, 133,
       115, 112, 107,  84,  70, 114, 136,  99,  52,  59, 105, 139, 164,
       141, 107,  79,  67,  60,  61,  69,  71,  65,  63,  71,  87, 103,
       118, 127, 116,  94,  70,  52,  57,  96, 131, 133,  92,  78,  61,
        54,  59,  64,  63,  58,  65,  65,  64,  65,  59,  51,  63,  89,
       134, 163, 192, 204, 205, 206, 209, 211, 203, 185, 161, 147, 145,
       140, 125, 110, 123, 159, 170, 140, 121, 133, 153, 163, 139, 123,
       108,  98,  90,  87,  87,  81,  77, 108, 140, 139, 137, 142, 121,
       107,  99,  94,  83,  69,  63,  68,  71,  71,  72,  73,  78,  81,
        84,  87,  90,  92,  96,  95,  93,  91,  85,  77,  72,  70, 122,
       168, 200, 191, 177, 181, 180, 169, 138, 100,  75,  94, 140, 171,
       141,  82,  81,  88,  91,  91,  83,  66,  71,  96, 102, 117, 122,
       131, 151, 167, 169, 171, 148, 131, 124, 117, 101,  92, 101, 114,
       114, 119, 130, 145, 157, 163, 165, 168, 168, 169, 174, 177, 179,
       181, 182, 182, 180, 176, 177, 177, 173, 170, 160, 146, 121, 100,
        81,  78,  91, 107, 116, 119, 119, 120, 124, 126, 128, 133, 139,
       142, 139, 132, 124, 118, 112, 109, 109, 112, 120, 116, 111, 105,
        99,  94,  97, 103, 116, 119, 123, 124, 123, 123, 130, 139, 151,
       157, 160, 157, 154, 156, 160, 163, 160, 157, 151, 135, 119, 113,
       110, 102,  99, 114, 113,  85,  54,  35,  25,  21,  24,  25,  26,
        24,  24,  25,  26,  27,  32,  33,  34,  35,  37,  41,  46,  50,
        57,  60,  63,  66,  68,  70,  70,  69,  53,  47,  32,  29,  28,
         5,  12,  60, 123, 126, 108,  85,  93, 122, 134, 132, 126, 130,
       135, 140, 147, 152, 154, 152, 147, 149, 151, 150, 150, 151, 147,
       141, 140, 145, 152, 157, 151, 134, 132, 147, 154, 161, 163, 159,
       154, 154, 156, 158, 161, 162, 163, 166, 168, 172, 175, 177, 178,
       179, 181, 183, 184, 184, 185, 184, 187, 186, 185, 182, 178, 171,
       164, 159, 141, 136, 131, 126, 125, 124, 127, 128, 144, 149, 149,
       151, 160, 164, 168, 175, 180, 179, 179, 179, 177, 173, 173, 176,
       178, 179, 181, 182, 182, 181, 180, 179, 182, 182, 181, 179, 176,
       174, 174, 174, 169, 168, 166, 164, 162, 160, 155, 154, 155, 156,
       158, 163, 164, 162, 160, 159, 158, 157, 157, 159, 158, 155, 154,
       155, 164, 167, 172, 172, 172, 183, 193, 193, 181, 168, 159, 156,
       150, 148, 145, 139, 136, 128, 126, 131, 136, 134, 134, 137, 136,
       140, 145, 148, 149, 152, 157, 162, 157, 158, 157, 156, 154, 155,
       156, 158, 157, 158, 158, 157, 156, 155, 154, 155, 159, 159, 157,
       154, 151, 149, 149, 150, 154, 156, 156, 152, 150, 152, 155, 157,
       158, 157, 156, 156, 157, 160, 164, 166, 173, 175, 178, 180, 182,
       185, 188, 190, 194, 197, 199, 198, 196, 193, 194, 195, 198, 198,
       198, 198, 199, 201, 202, 203, 197, 197, 198, 199, 200, 201, 201, 201], dtype=uint8)
In [242]:
face[:,50]
Out[242]:
array([163, 137,  83,  69, 115, 146, 133, 117,  93,  91, 119, 141, 130,
        90,  78, 120, 155, 153, 137, 114, 118, 151, 162, 142, 117, 124,
       150, 160, 119,  64,  48,  60,  92, 100,  66,  35,  54, 115, 160,
       154, 123,  99,  66,  53,  63,  78, 109, 147, 144,  80,  42,  58,
        91, 115, 112,  90,  82,  94,  88,  81,  84,  75,  66,  74,  95,
       121, 150, 165, 151, 109,  69,  55,  76,  83,  77,  71,  76,  74,
        62,  58,  78, 127, 156, 163, 184, 178, 106,  27,  70, 142, 177,
       147, 107,  74,  52,  45,  53,  62,  53,  66, 119, 145, 132, 119,
        77,  56,  84, 130, 143, 146, 136, 108,  72,  54,  95, 145, 123,
        79,  72,  82,  82, 113, 136, 133, 118, 108, 115, 131, 129,  85,
        68, 111, 164, 179, 161, 145, 174, 180, 179, 166, 150, 145, 151,
       159, 162, 147, 123, 116, 141, 166, 165, 153, 156, 176, 181, 171,
       168, 162, 151, 142, 147, 147, 139, 136, 142, 136, 129, 134, 154,
       174, 182, 170, 149, 126, 106,  98, 144, 160, 161, 155, 158, 162,
       170, 185, 175, 167, 145, 134, 146, 158, 154, 146, 157, 167, 175,
       189, 180, 134, 100, 107, 146, 184, 172, 108,  62,  59,  76,  91,
        87,  82, 103, 136, 135, 101,  63,  41,  38,  76, 130, 161, 144,
       104, 102, 132, 176, 181, 176, 168, 166, 162, 155, 152, 127, 111,
       114, 139, 163, 176, 170, 148, 116, 109, 113, 122, 125, 135, 160,
       181, 181, 173, 176, 170, 136,  96,  64,  42,  59,  95, 132, 134,
       116, 113, 108,  85,  71, 115, 137, 100,  53,  60, 106, 140, 165,
       142, 108,  80,  68,  61,  62,  70,  72,  66,  64,  72,  88, 104,
       119, 128, 117,  95,  71,  53,  58,  97, 132, 134,  93,  79,  62,
        55,  60,  65,  64,  59,  66,  66,  65,  66,  60,  52,  64,  90,
       135, 164, 193, 205, 206, 207, 210, 212, 204, 186, 162, 148, 146,
       141, 126, 111, 124, 160, 171, 141, 122, 134, 154, 164, 140, 124,
       109,  99,  91,  88,  88,  82,  78, 109, 141, 140, 138, 143, 122,
       108, 100,  95,  84,  70,  64,  69,  72,  72,  73,  74,  79,  82,
        85,  88,  91,  93,  97,  96,  94,  92,  86,  78,  73,  71, 123,
       169, 201, 192, 178, 182, 181, 170, 139, 101,  76,  95, 141, 172,
       142,  83,  82,  89,  92,  92,  84,  67,  72,  97, 103, 118, 123,
       132, 152, 168, 170, 172, 149, 132, 125, 118, 102,  93, 102, 115,
       115, 120, 131, 146, 158, 164, 166, 169, 169, 170, 175, 178, 180,
       182, 183, 183, 181, 177, 178, 178, 174, 171, 161, 147, 122, 101,
        82,  79,  92, 108, 117, 120, 120, 121, 125, 127, 129, 134, 140,
       143, 140, 133, 125, 119, 113, 110, 110, 113, 121, 117, 112, 106,
       100,  95,  98, 104, 117, 120, 124, 125, 124, 124, 131, 140, 152,
       158, 161, 158, 155, 157, 161, 164, 161, 158, 152, 136, 120, 114,
       111, 103, 100, 115, 114,  86,  55,  36,  26,  22,  25,  26,  27,
        25,  25,  26,  27,  28,  33,  34,  35,  36,  38,  42,  47,  51,
        58,  61,  64,  67,  69,  71,  71,  70,  54,  48,  33,  30,  29,
         6,  13,  61, 124, 127, 109,  86,  94, 123, 135, 133, 127, 131,
       136, 141, 148, 153, 155, 153, 148, 150, 152, 151, 151, 152, 148,
       142, 141, 146, 153, 158, 152, 135, 133, 148, 155, 162, 164, 160,
       155, 155, 157, 159, 162, 163, 164, 167, 169, 173, 176, 178, 179,
       180, 182, 184, 185, 185, 186, 185, 188, 187, 186, 183, 179, 172,
       165, 160, 142, 137, 132, 127, 126, 125, 128, 129, 145, 150, 150,
       152, 161, 165, 169, 176, 181, 180, 180, 180, 178, 174, 174, 177,
       179, 180, 182, 183, 183, 182, 181, 180, 183, 183, 182, 180, 177,
       175, 175, 175, 170, 169, 167, 165, 163, 161, 156, 155, 156, 157,
       159, 164, 165, 163, 161, 160, 159, 158, 158, 160, 159, 156, 155,
       156, 165, 168, 173, 173, 173, 184, 194, 194, 182, 169, 160, 157,
       151, 149, 146, 140, 137, 129, 127, 132, 137, 135, 135, 138, 137,
       141, 146, 149, 150, 153, 158, 163, 158, 159, 158, 157, 155, 156,
       157, 159, 158, 159, 159, 158, 157, 156, 155, 156, 160, 160, 158,
       155, 152, 150, 150, 151, 155, 157, 157, 153, 151, 153, 156, 158,
       159, 158, 157, 157, 158, 161, 165, 167, 174, 176, 179, 181, 183,
       186, 189, 191, 195, 198, 200, 199, 197, 194, 195, 196, 199, 199,
       199, 199, 200, 202, 203, 204, 198, 198, 199, 200, 201, 202, 202, 202], dtype=uint8)
In [243]:
x = face2 - face
In [244]:
imshow(x)

problem: Swap parts of the image. Split the image into four quadrants laid out as

AB
CD

and rearrange them so that the result is

AC
BD
In [245]:
face2 = face.copy()
In [246]:
h, w = face2.shape
In [247]:
TR = face2[:h//2,w//2:].copy()
BL = face2[h//2:,:w//2].copy()
In [248]:
imshow(face2)
imshow(TR)
imshow(BL)
In [249]:
face2[:h//2,w//2:] = BL
face2[h//2:,:w//2] = TR
imshow(face2)
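The quadrant swap above can be wrapped in a small helper. This is a minimal sketch: the name swap_quadrants and the 4x4 demo array are made up for illustration, and cropping to even dimensions is an added assumption so that both quadrants always have the same shape.

```python
import numpy as np

def swap_quadrants(img):
    """Return a copy of a 2-D array with top-right and bottom-left quadrants exchanged."""
    h, w = img.shape
    h2, w2 = h // 2, w // 2
    # Assumption: crop to even dimensions so the two quadrants match in shape
    out = img[:2 * h2, :2 * w2].copy()
    top_right = out[:h2, w2:].copy()   # save B before overwriting it
    out[:h2, w2:] = out[h2:, :w2]      # C goes to the top-right
    out[h2:, :w2] = top_right          # B goes to the bottom-left
    return out

# Hypothetical stand-in for the image: quadrants A=1, B=2, C=3, D=4
demo = np.array([[1, 1, 2, 2],
                 [1, 1, 2, 2],
                 [3, 3, 4, 4],
                 [3, 3, 4, 4]])
swapped = swap_quadrants(demo)
print(swapped)
```

The helper works on a copy, so the input array is left unchanged, just as face is left unchanged in the cells above.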
In [250]:
f2 = np.rot90(face)
imshow(f2)
In [251]:
imshow(np.roll(face, 400))
In [252]:
imshow(np.flip(face, 1))
In [255]:
thumb = face[::4,::4]
In [256]:
imshow(thumb)
In [257]:
thumb.shape
Out[257]:
(192, 256)
In [258]:
t = np.hstack([thumb, thumb, thumb, thumb])
v = np.vstack([t,t,t,t])
imshow(v)
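The hstack/vstack tiling above can also be written with np.tile. A small sketch on a made-up 2x3 array (the name small is only for illustration) showing that both approaches give the same result:

```python
import numpy as np

# Hypothetical stand-in for the thumbnail
small = np.arange(6).reshape(2, 3)

# Same construction as in the notebook: four copies across, then four down
row = np.hstack([small] * 4)
stacked = np.vstack([row] * 4)

# np.tile repeats the array 4 times along each axis in one call
tiled = np.tile(small, (4, 4))

print(tiled.shape)
print(np.array_equal(stacked, tiled))
```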

Matplotlib

In [259]:
import numpy as np
import matplotlib.pyplot as plt
In [260]:
X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
In [261]:
X.shape
Out[261]:
(256,)
In [262]:
C = np.cos(X)
In [263]:
S = np.sin(X)
In [264]:
plt.plot(X,C, label="cos(x)")
plt.plot(X,S, label="sin(x)")
plt.legend()
plt.show()
In [266]:
T = np.tan(X)
T
Out[266]:
array([  1.22464680e-16,   2.46449301e-02,   4.93198157e-02,
         7.40547582e-02,   9.88801519e-02,   1.23826835e-01,
         1.48926244e-01,   1.74210575e-01,   1.99712954e-01,
         2.25467616e-01,   2.51510096e-01,   2.77877435e-01,
         3.04608405e-01,   3.31743753e-01,   3.59326465e-01,
         3.87402064e-01,   4.16018933e-01,   4.45228685e-01,
         4.75086564e-01,   5.05651907e-01,   5.36988659e-01,
         5.69165959e-01,   6.02258806e-01,   6.36348824e-01,
         6.71525130e-01,   7.07885343e-01,   7.45536747e-01,
         7.84597640e-01,   8.25198908e-01,   8.67485872e-01,
         9.11620453e-01,   9.57783740e-01,   1.00617904e+00,
         1.05703550e+00,   1.11061251e+00,   1.16720497e+00,
         1.22714971e+00,   1.29083333e+00,   1.35870197e+00,
         1.43127325e+00,   1.50915142e+00,   1.59304642e+00,
         1.68379814e+00,   1.78240780e+00,   1.89007882e+00,
         2.00827073e+00,   2.13877135e+00,   2.28379480e+00,
         2.44611689e+00,   2.62926545e+00,   2.83779394e+00,
         3.07768354e+00,   3.35695082e+00,   3.68659441e+00,
         4.08212426e+00,   4.56613958e+00,   5.17290256e+00,
         5.95697769e+00,   7.01088586e+00,   8.50505855e+00,
         1.07917187e+01,   1.47354103e+01,   2.31767738e+01,
         5.41065205e+01,  -1.62335989e+02,  -3.24573411e+01,
        -1.80190765e+01,  -1.24608370e+01,  -9.51436445e+00,
        -7.68721487e+00,  -6.44210712e+00,  -5.53818992e+00,
        -4.85138736e+00,  -4.31127708e+00,  -3.87491778e+00,
        -3.51463544e+00,  -3.21179171e+00,  -2.95337427e+00,
        -2.73002271e+00,  -2.53483125e+00,  -2.36259336e+00,
        -2.20930931e+00,  -2.07185542e+00,  -1.94775662e+00,
        -1.83502616e+00,  -1.73205081e+00,  -1.63750682e+00,
        -1.55029770e+00,  -1.46950733e+00,  -1.39436424e+00,
        -1.32421401e+00,  -1.25849780e+00,  -1.19673541e+00,
        -1.13851183e+00,  -1.08346641e+00,  -1.03128418e+00,
        -9.81688718e-01,  -9.34436362e-01,  -8.89311374e-01,
        -8.46121975e-01,  -8.04697006e-01,  -7.64883142e-01,
        -7.26542528e-01,  -6.89550784e-01,  -6.53795302e-01,
        -6.19173786e-01,  -5.85593003e-01,  -5.52967699e-01,
        -5.21219665e-01,  -4.90276921e-01,  -4.60073002e-01,
        -4.30546337e-01,  -4.01639694e-01,  -3.73299701e-01,
        -3.45476407e-01,  -3.18122901e-01,  -2.91194969e-01,
        -2.64650778e-01,  -2.38450601e-01,  -2.12556562e-01,
        -1.86932397e-01,  -1.61543248e-01,  -1.36355456e-01,
        -1.11336383e-01,  -8.64542334e-02,  -6.16778888e-02,
        -3.69767523e-02,  -1.23205945e-02,   1.23205945e-02,
         3.69767523e-02,   6.16778888e-02,   8.64542334e-02,
         1.11336383e-01,   1.36355456e-01,   1.61543248e-01,
         1.86932397e-01,   2.12556562e-01,   2.38450601e-01,
         2.64650778e-01,   2.91194969e-01,   3.18122901e-01,
         3.45476407e-01,   3.73299701e-01,   4.01639694e-01,
         4.30546337e-01,   4.60073002e-01,   4.90276921e-01,
         5.21219665e-01,   5.52967699e-01,   5.85593003e-01,
         6.19173786e-01,   6.53795302e-01,   6.89550784e-01,
         7.26542528e-01,   7.64883142e-01,   8.04697006e-01,
         8.46121975e-01,   8.89311374e-01,   9.34436362e-01,
         9.81688718e-01,   1.03128418e+00,   1.08346641e+00,
         1.13851183e+00,   1.19673541e+00,   1.25849780e+00,
         1.32421401e+00,   1.39436424e+00,   1.46950733e+00,
         1.55029770e+00,   1.63750682e+00,   1.73205081e+00,
         1.83502616e+00,   1.94775662e+00,   2.07185542e+00,
         2.20930931e+00,   2.36259336e+00,   2.53483125e+00,
         2.73002271e+00,   2.95337427e+00,   3.21179171e+00,
         3.51463544e+00,   3.87491778e+00,   4.31127708e+00,
         4.85138736e+00,   5.53818992e+00,   6.44210712e+00,
         7.68721487e+00,   9.51436445e+00,   1.24608370e+01,
         1.80190765e+01,   3.24573411e+01,   1.62335989e+02,
        -5.41065205e+01,  -2.31767738e+01,  -1.47354103e+01,
        -1.07917187e+01,  -8.50505855e+00,  -7.01088586e+00,
        -5.95697769e+00,  -5.17290256e+00,  -4.56613958e+00,
        -4.08212426e+00,  -3.68659441e+00,  -3.35695082e+00,
        -3.07768354e+00,  -2.83779394e+00,  -2.62926545e+00,
        -2.44611689e+00,  -2.28379480e+00,  -2.13877135e+00,
        -2.00827073e+00,  -1.89007882e+00,  -1.78240780e+00,
        -1.68379814e+00,  -1.59304642e+00,  -1.50915142e+00,
        -1.43127325e+00,  -1.35870197e+00,  -1.29083333e+00,
        -1.22714971e+00,  -1.16720497e+00,  -1.11061251e+00,
        -1.05703550e+00,  -1.00617904e+00,  -9.57783740e-01,
        -9.11620453e-01,  -8.67485872e-01,  -8.25198908e-01,
        -7.84597640e-01,  -7.45536747e-01,  -7.07885343e-01,
        -6.71525130e-01,  -6.36348824e-01,  -6.02258806e-01,
        -5.69165959e-01,  -5.36988659e-01,  -5.05651907e-01,
        -4.75086564e-01,  -4.45228685e-01,  -4.16018933e-01,
        -3.87402064e-01,  -3.59326465e-01,  -3.31743753e-01,
        -3.04608405e-01,  -2.77877435e-01,  -2.51510096e-01,
        -2.25467616e-01,  -1.99712954e-01,  -1.74210575e-01,
        -1.48926244e-01,  -1.23826835e-01,  -9.88801519e-02,
        -7.40547582e-02,  -4.93198157e-02,  -2.46449301e-02,
        -1.22464680e-16])
In [267]:
plt.plot(X,T, label="tan(x)")
plt.legend()
plt.show()
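The tan(x) plot joins the large positive and negative values on either side of each asymptote with a near-vertical line. One possible way to avoid that, sketched below, is to replace large values with NaN before plotting, since matplotlib leaves a gap at NaN points. The cut-off of 10 is an arbitrary choice for illustration.

```python
import numpy as np

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
T = np.tan(X)

limit = 10.0  # arbitrary cut-off, chosen only for this sketch
# Keep values within the limit, replace the rest with NaN
T_masked = np.where(np.abs(T) > limit, np.nan, T)

# Plotting is then the same as before, e.g.:
#   plt.plot(X, T_masked, label="tan(x)")
print(np.nanmax(np.abs(T_masked)))
```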
In [268]:
n = 1024
X = np.random.normal(0,1, n)
Y = np.random.normal(0,1, n)
plt.scatter(X,Y)
plt.show()

Example: temperature and rainfall data

Download data from http://notes.pipal.in/2017/arcesium-oct-advpython/HYDERABAD-weather.csv

In [269]:
import csv
In [270]:
with open("HYDERABAD-weather.csv", newline="") as f:
    data = list(csv.reader(f))
In [271]:
data[:3]
Out[271]:
[['', 'city', 'month', 'year', 'maxtemp', 'mintemp', 'rainfall'],
 ['0', 'HYDERABAD', 'January', '1951', '29.0', '14.8', '0.0'],
 ['1', 'HYDERABAD', 'January', '1952', '29.1', '13.6', '0.0']]
In [272]:
data = data[1:] # skip header
In [273]:
tmax = [float(row[4]) for row in data]
tmin = [float(row[5]) for row in data]
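Picking columns by position (row[4], row[5]) works, but csv.DictReader lets the header row name the fields. A minimal sketch, using an in-memory copy of the first lines shown above instead of the real file (the variable sample is only for illustration):

```python
import csv
import io

# First three lines of the file as shown above, held in memory for the sketch
sample = io.StringIO(
    ",city,month,year,maxtemp,mintemp,rainfall\n"
    "0,HYDERABAD,January,1951,29.0,14.8,0.0\n"
    "1,HYDERABAD,January,1952,29.1,13.6,0.0\n"
)

# DictReader consumes the header itself, so there is no header row to skip
rows = list(csv.DictReader(sample))
tmax = [float(row["maxtemp"]) for row in rows]
tmin = [float(row["mintemp"]) for row in rows]
print(tmax, tmin)
```

With the real file, the StringIO object would be replaced by the file object from open("HYDERABAD-weather.csv", newline="").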
In [274]:
plt.scatter(tmin, tmax)
Out[274]:
<matplotlib.collections.PathCollection at 0x7f62fc2a72e8>
In [275]:
rainfall = [float(row[6]) for row in data]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-275-5b1992869e33> in <module>()
----> 1 rainfall = [float(row[6]) for row in data]

<ipython-input-275-5b1992869e33> in <listcomp>(.0)
----> 1 rainfall = [float(row[6]) for row in data]

ValueError: could not convert string to float: 
In [276]:
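def safefloat(value):
    """Convert value to float; report unparseable values and fall back to 0.0."""
    try:
        return float(value)
    except ValueError:
        print("bad value: %r" % value)
        # Note: substituting 0.0 counts a missing reading as "no rain"
        return 0.0
rainfall = [safefloat(row[6]) for row in data]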
bad value: ''
In [277]:
plt.scatter(tmax, rainfall)
Out[277]:
<matplotlib.collections.PathCollection at 0x7f62fc0c0b00>
In [278]:
n = 10
X = np.arange(n)
Y = np.random.normal(0,100,n)
plt.bar(X,Y)
Out[278]:
<Container object of 10 artists>

problem: Using the dataset above, plot a bar chart of the average rainfall per month.

In [279]:
data[:2]
Out[279]:
[['0', 'HYDERABAD', 'January', '1951', '29.0', '14.8', '0.0'],
 ['1', 'HYDERABAD', 'January', '1952', '29.1', '13.6', '0.0']]
In [280]:
data[:10]
Out[280]:
[['0', 'HYDERABAD', 'January', '1951', '29.0', '14.8', '0.0'],
 ['1', 'HYDERABAD', 'January', '1952', '29.1', '13.6', '0.0'],
 ['2', 'HYDERABAD', 'January', '1953', '28.6', '14.6', '3.5'],
 ['3', 'HYDERABAD', 'January', '1954', '28.2', '13.9', '0.0'],
 ['4', 'HYDERABAD', 'January', '1955', '28.0', '14.7', '0.0'],
 ['5', 'HYDERABAD', 'January', '1956', '28.1', '14.2', '0.0'],
 ['6', 'HYDERABAD', 'January', '1957', '29.0', '14.5', '0.0'],
 ['7', 'HYDERABAD', 'January', '1958', '28.9', '14.5', '0.0'],
 ['8', 'HYDERABAD', 'January', '1959', '28.7', '15.5', '0.0'],
 ['9', 'HYDERABAD', 'January', '1960', '28.4', '17.0', '0.0']]
In [281]:
months = np.array([row[2] for row in data])
In [283]:
rainfall = np.array([safefloat(row[-1]) for row in data])
bad value: ''
In [284]:
rainfall[months == "January"].mean()
Out[284]:
13.177999999999997
In [285]:
import datetime
list_of_months = [datetime.date(2000, i+1, 1).strftime("%B") for i in range(12)]
In [286]:
list_of_months
Out[286]:
['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']
In [287]:
def get_mean_rainfall(month):
    return rainfall[months == month].mean()
In [288]:
mean_rainfall = [get_mean_rainfall(m) for m in list_of_months]
In [289]:
mean_rainfall
Out[289]:
[13.177999999999997,
 7.9400000000000004,
 15.264000000000001,
 20.23469387755102,
 35.713999999999999,
 103.75399999999999,
 169.86000000000001,
 178.69,
 158.292,
 97.158000000000015,
 21.971999999999998,
 5.9120000000000008]
In [290]:
plt.bar(range(12), mean_rainfall)
Out[290]:
<Container object of 12 artists>
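safefloat replaces the one empty rainfall field with 0.0, which lowers the mean of whichever month that row belongs to. The pandas section further down reports 22.42 for November against 21.97 here, which is consistent with pandas skipping that row rather than counting it as zero. A possible alternative, sketched below with made-up values, is to store NaN and average with np.nanmean. The helper name float_or_nan is hypothetical.

```python
import numpy as np

def float_or_nan(value):
    """Convert value to float, using NaN for anything that cannot be parsed."""
    try:
        return float(value)
    except ValueError:
        return float("nan")

raw = ["10.0", "", "20.0"]  # made-up readings, one of them missing

as_zero = np.array([float(v) if v else 0.0 for v in raw])
as_nan = np.array([float_or_nan(v) for v in raw])

print(as_zero.mean())      # 10.0: the missing reading is averaged in as 0.0
print(np.nanmean(as_nan))  # 15.0: the missing reading is ignored
```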
In [292]:
x = np.arange(10)
In [293]:
x > 5
Out[293]:
array([False, False, False, False, False, False,  True,  True,  True,  True], dtype=bool)
In [294]:
x = np.arange(3)
In [295]:
x
Out[295]:
array([0, 1, 2])
In [297]:
x[np.array([True, False, True, True])]
/home/vikrant/usr/local/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 3 but corresponding boolean dimension is 4
  """Entry point for launching an IPython kernel.
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-297-9ce7fde9f5fa> in <module>()
----> 1 x[np.array([True, False, True, True])]

IndexError: index 3 is out of bounds for axis 1 with size 3
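The error above comes from a boolean mask with four entries applied to an array of three. A short sketch of the working case: the mask needs one entry per element, which is automatic when the mask is derived from the array itself.

```python
import numpy as np

x = np.arange(3)

# A hand-written mask must have exactly one entry per element of x
mask = np.array([True, False, True])
picked = x[mask]
print(picked)   # [0 2]

# A mask computed from x always has the right length
evens = x[x % 2 == 0]
print(evens)    # [0 2]
```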

Pandas

In [299]:
import pandas as pd
import numpy as np
%matplotlib inline
In [300]:
x = pd.Series(range(10))
In [301]:
x
Out[301]:
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64
In [302]:
s = pd.Series(np.random.randn(5), index=['a','b','c','d','e'])
In [303]:
s
Out[303]:
a    0.543000
b    0.167640
c   -0.155607
d    0.723278
e    0.068404
dtype: float64
In [304]:
d = {'a':0, 'b':1, 'c':2}
s = pd.Series(d)
In [305]:
s
Out[305]:
a    0
b    1
c    2
dtype: int64
In [306]:
pd.Series(d, index=['b','c','d','a'])
Out[306]:
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
In [307]:
s = pd.Series(np.random.randn(5), index=['a','b','c','d','e'])
In [308]:
s[0]
Out[308]:
-1.0835475405050732
In [309]:
s[:3]
Out[309]:
a   -1.083548
b    1.650536
c    0.538694
dtype: float64
In [310]:
s['a']
Out[310]:
-1.0835475405050732
In [311]:
s[1:4:2]
Out[311]:
b    1.650536
d    0.307735
dtype: float64
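In the cells above, s[0] and s['a'] return the same element because an integer key on a label-indexed Series is treated as a position. Newer pandas releases warn about that fallback, so a more explicit spelling may be preferable: .iloc for positions and .loc for labels. A minimal sketch with made-up values:

```python
import pandas as pd

# Made-up values; the index labels match the notebook
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=['a', 'b', 'c', 'd', 'e'])

first = s.iloc[0]             # by position
by_label = s.loc['a']         # by label
pos_slice = s.iloc[1:4:2]     # positions 1 and 3, end excluded
label_slice = s.loc['b':'d']  # label slices include both end points

print(first, by_label)
print(list(pos_slice.index), list(label_slice.index))
```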
In [312]:
s[s > s.median()]
Out[312]:
b    1.650536
c    0.538694
dtype: float64
In [313]:
s.mean()
Out[313]:
0.2347597680155867
In [314]:
np.exp(s)
Out[314]:
a    0.338393
b    5.209773
c    1.713767
d    1.360340
e    0.786928
dtype: float64
In [315]:
'e' in s
Out[315]:
True
In [316]:
s['a']
Out[316]:
-1.0835475405050732
In [317]:
'z' in s
Out[317]:
False
In [318]:
s + s
Out[318]:
a   -2.167095
b    3.301073
c    1.077388
d    0.615470
e   -0.479237
dtype: float64
In [319]:
s * s
Out[319]:
a    1.174075
b    2.724270
c    0.290191
d    0.094701
e    0.057417
dtype: float64
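s + s and s * s combine elements with the same label. When the two Series have different indexes, pandas aligns on the labels and fills the unmatched ones with NaN. A small sketch with made-up values (the names a and b are only for illustration):

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels present in only one operand ('a' and 'd') become NaN
total = a + b
print(total)

# add() with fill_value treats a missing label as 0 instead
filled = a.add(b, fill_value=0)
print(filled)
```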
In [320]:
data = [["A",1], ["B", 2], ["c",3], ["D",4]]
In [321]:
pd.DataFrame(data)
Out[321]:
   0  1
0  A  1
1  B  2
2  c  3
3  D  4
In [322]:
d = {"one":[1. , 2. , 3., 4.],
     "two":[4. ,3., 2. , 1.]
    }
df = pd.DataFrame(d, index=['a','b','c','d'])
In [323]:
df
Out[323]:
   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0
In [324]:
df['one']
Out[324]:
a    1.0
b    2.0
c    3.0
d    4.0
Name: one, dtype: float64
In [325]:
df['one']['a']
Out[325]:
1.0
In [326]:
df.columns
Out[326]:
Index(['one', 'two'], dtype='object')
In [327]:
df.columns = ["column1", "column2"]
In [328]:
df
Out[328]:
   column1  column2
a      1.0      4.0
b      2.0      3.0
c      3.0      2.0
d      4.0      1.0
In [329]:
df2 = df.set_index("column2")
In [330]:
df2
Out[330]:
         column1
column2
4.0          1.0
3.0          2.0
2.0          3.0
1.0          4.0
In [331]:
df2['column1'][4.0]
Out[331]:
1.0
In [332]:
df.to_csv("df.csv")
In [334]:
!cat df.csv
,column1,column2
a,1.0,4.0
b,2.0,3.0
c,3.0,2.0
d,4.0,1.0
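The first column of df.csv has an empty header because it holds the index. Reading the file back with index_col=0 restores it as the index. A minimal round-trip sketch, using an in-memory buffer instead of a file on disk:

```python
import io
import pandas as pd

df = pd.DataFrame({"column1": [1., 2., 3., 4.],
                   "column2": [4., 3., 2., 1.]},
                  index=['a', 'b', 'c', 'd'])

# Write to an in-memory buffer (stands in for df.csv)
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# index_col=0 turns the unnamed first column back into the index
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```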

Playing with weather dataset

In [335]:
df = pd.read_csv("HYDERABAD-weather.csv", index_col=0)
In [336]:
df
Out[336]:
          city     month  year  maxtemp  mintemp  rainfall
0    HYDERABAD   January  1951     29.0     14.8       0.0
1    HYDERABAD   January  1952     29.1     13.6       0.0
2    HYDERABAD   January  1953     28.6     14.6       3.5
3    HYDERABAD   January  1954     28.2     13.9       0.0
4    HYDERABAD   January  1955     28.0     14.7       0.0
5    HYDERABAD   January  1956     28.1     14.2       0.0
6    HYDERABAD   January  1957     29.0     14.5       0.0
7    HYDERABAD   January  1958     28.9     14.5       0.0
8    HYDERABAD   January  1959     28.7     15.5       0.0
9    HYDERABAD   January  1960     28.4     17.0       0.0
10   HYDERABAD   January  1961     28.4     15.6       0.4
11   HYDERABAD   January  1962     27.5     12.7       0.0
12   HYDERABAD   January  1963     26.7     13.2       0.0
13   HYDERABAD   January  1964     29.9     14.4       0.0
14   HYDERABAD   January  1965     28.3     14.2       1.0
15   HYDERABAD   January  1966     28.8     16.5       3.9
16   HYDERABAD   January  1967     29.2     14.6       0.0
17   HYDERABAD   January  1968     28.3     13.3       7.8
18   HYDERABAD   January  1969     29.3     14.1       7.3
19   HYDERABAD   January  1970     28.9     15.2       5.6
20   HYDERABAD   January  1971     28.8     15.0       2.4
21   HYDERABAD   January  1972     28.1     13.5       0.0
22   HYDERABAD   January  1973     30.6     16.1       0.0
23   HYDERABAD   January  1974     29.1     13.4       0.0
24   HYDERABAD   January  1975     27.5     14.1      50.9
25   HYDERABAD   January  1976     26.5     13.2       0.0
26   HYDERABAD   January  1977     29.1     14.0       0.0
27   HYDERABAD   January  1978     28.4     16.5       5.4
28   HYDERABAD   January  1979     28.9     17.3       0.0
29   HYDERABAD   January  1980     29.7     16.8       0.0
...        ...       ...   ...      ...      ...       ...
569  HYDERABAD  December  1971     26.9     12.5       0.0
570  HYDERABAD  December  1972     28.2     16.9       3.0
571  HYDERABAD  December  1973     27.2     14.8       0.3
572  HYDERABAD  December  1974     26.9     12.4       0.0
573  HYDERABAD  December  1975     26.6     11.6       0.0
574  HYDERABAD  December  1976     28.5     15.5       0.0
575  HYDERABAD  December  1977     28.0     13.8       1.5
576  HYDERABAD  December  1978     27.6     16.5       0.0
577  HYDERABAD  December  1979     28.2     16.4       0.0
578  HYDERABAD  December  1980     28.7     15.6       3.7
579  HYDERABAD  December  1981     27.6     15.7       0.0
580  HYDERABAD  December  1982     28.3     15.0       0.0
581  HYDERABAD  December  1983     26.9     15.5      12.3
582  HYDERABAD  December  1984     29.8     15.5       0.0
583  HYDERABAD  December  1985     29.4     15.9       4.7
584  HYDERABAD  December  1986     28.8     17.1      15.5
585  HYDERABAD  December  1987     27.7     16.5       2.8
586  HYDERABAD  December  1988     27.8     15.4      13.3
587  HYDERABAD  December  1989     27.5     15.8       1.6
588  HYDERABAD  December  1990     27.9     16.8       0.0
589  HYDERABAD  December  1991     28.1     14.9       0.3
590  HYDERABAD  December  1992     27.1     13.8       0.0
591  HYDERABAD  December  1993     27.1     13.2      34.9
592  HYDERABAD  December  1994     27.9     12.0       0.0
593  HYDERABAD  December  1995     28.9     15.9       0.0
594  HYDERABAD  December  1996     28.3     14.9       0.0
595  HYDERABAD  December  1997     28.7     19.2      40.6
596  HYDERABAD  December  1998     28.7     12.8       0.0
597  HYDERABAD  December  1999     29.0     14.2       0.0
598  HYDERABAD  December  2000     29.6     13.3       1.0

599 rows × 6 columns

In [337]:
df.head()
Out[337]:
        city    month  year  maxtemp  mintemp  rainfall
0  HYDERABAD  January  1951     29.0     14.8       0.0
1  HYDERABAD  January  1952     29.1     13.6       0.0
2  HYDERABAD  January  1953     28.6     14.6       3.5
3  HYDERABAD  January  1954     28.2     13.9       0.0
4  HYDERABAD  January  1955     28.0     14.7       0.0
In [338]:
df.tail()
Out[338]:
          city     month  year  maxtemp  mintemp  rainfall
594  HYDERABAD  December  1996     28.3     14.9       0.0
595  HYDERABAD  December  1997     28.7     19.2      40.6
596  HYDERABAD  December  1998     28.7     12.8       0.0
597  HYDERABAD  December  1999     29.0     14.2       0.0
598  HYDERABAD  December  2000     29.6     13.3       1.0
In [339]:
df.plot("maxtemp", "mintemp", kind="scatter")
Out[339]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f62ebf8e7f0>
In [340]:
mean = df.groupby("year").mean()
In [341]:
mean
Out[341]:
        maxtemp    mintemp    rainfall
year
1951  32.666667  20.233333   58.975000
1952  31.975000  19.891667   46.741667
1953  32.183333  20.266667   74.245455
1954  31.525000  19.875000   70.366667
1955  30.883333  19.725000   92.775000
1956  30.783333  19.791667   64.941667
1957  31.533333  20.016667   66.783333
1958  31.733333  20.475000   76.216667
1959  31.900000  20.358333   64.825000
1960  31.841667  20.416667   57.775000
1961  31.258333  20.225000   68.400000
1962  30.418182  19.490909  107.118182
1963  31.133333  19.308333   69.125000
1964  32.150000  19.658333   58.400000
1965  32.600000  19.541667   67.441667
1966  32.666667  20.550000   55.358333
1967  32.625000  19.483333   69.383333
1968  32.416667  19.200000   53.250000
1969  32.575000  20.408333   53.100000
1970  32.041667  19.825000   95.566667
1971  31.975000  20.008333   55.433333
1972  32.633333  21.008333   42.958333
1973  32.425000  21.200000   73.183333
1974  32.175000  20.016667   56.283333
1975  31.266667  20.258333  115.291667
1976  32.050000  20.441667   66.075000
1977  32.258333  20.775000   45.300000
1978  31.575000  21.108333   93.116667
1979  32.400000  21.616667   58.650000
1980  32.816667  21.425000   49.325000
1981  32.108333  20.750000   82.750000
1982  32.241667  21.133333   63.891667
1983  32.541667  21.000000  110.025000
1984  32.483333  21.033333   64.083333
1985  32.875000  20.975000   31.116667
1986  32.800000  21.441667   51.775000
1987  32.433333  21.200000   80.250000
1988  32.525000  21.400000   76.458333
1989  32.325000  20.825000   83.883333
1990  31.541667  20.991667   76.666667
1991  32.450000  21.416667   64.200000
1992  32.683333  20.650000   63.716667
1993  32.733333  20.516667   60.458333
1994  32.225000  20.516667   68.325000
1995  32.183333  20.916667  101.991667
1996  32.633333  20.958333   80.958333
1997  32.616667  21.025000   63.750000
1998  33.125000  21.683333   78.516667
1999  32.608333  20.341667   47.008333
2000  32.583333  20.391667   87.066667
In [342]:
mean.plot()
Out[342]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f62eb7d66d8>
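In the output above, the text columns city and month were dropped from the yearly mean automatically. Depending on the pandas version, groupby(...).mean() may instead raise an error when non-numeric columns are present, so selecting the numeric columns explicitly is a safer spelling. A minimal sketch on a tiny made-up frame with the same column names (the values are not from the dataset):

```python
import pandas as pd

# Made-up rows; only the column names follow the weather dataset
df = pd.DataFrame({
    "city": ["HYDERABAD"] * 4,
    "month": ["January", "February", "January", "February"],
    "year": [1951, 1951, 1952, 1952],
    "maxtemp": [29.0, 31.0, 28.0, 32.0],
    "mintemp": [14.0, 16.0, 13.0, 17.0],
    "rainfall": [0.0, 4.0, 2.0, 6.0],
})

# Pick the numeric columns before aggregating
numeric_cols = ["maxtemp", "mintemp", "rainfall"]
mean = df.groupby("year")[numeric_cols].mean()
print(mean)
```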
In [343]:
bymonth = df.groupby("month").mean()
In [344]:
bymonth
Out[344]:
                 year    maxtemp    mintemp    rainfall
month
April      1975.77551  37.863265  24.273469   20.234694
August     1975.50000  29.786000  22.086000  178.690000
December   1975.50000  28.004000  14.526000    5.912000
February   1975.50000  31.932000  17.556000    7.940000
January    1975.50000  28.760000  15.214000   13.178000
July       1975.50000  30.754000  22.560000  169.860000
June       1975.50000  34.528000  23.976000  103.754000
March      1975.50000  35.444000  20.798000   15.264000
May        1975.50000  38.996000  26.160000   35.714000
November   1975.50000  29.016000  16.862000   22.420408
October    1975.50000  30.582000  20.306000   97.158000
September  1975.50000  30.452000  21.962000  158.292000
In [345]:
del bymonth['year']
In [346]:
bymonth
Out[346]:
             maxtemp    mintemp    rainfall
month
April      37.863265  24.273469   20.234694
August     29.786000  22.086000  178.690000
December   28.004000  14.526000    5.912000
February   31.932000  17.556000    7.940000
January    28.760000  15.214000   13.178000
July       30.754000  22.560000  169.860000
June       34.528000  23.976000  103.754000
March      35.444000  20.798000   15.264000
May        38.996000  26.160000   35.714000
November   29.016000  16.862000   22.420408
October    30.582000  20.306000   97.158000
September  30.452000  21.962000  158.292000
In [347]:
bymonth.index
Out[347]:
Index(['April', 'August', 'December', 'February', 'January', 'July', 'June',
       'March', 'May', 'November', 'October', 'September'],
      dtype='object', name='month')
In [348]:
newindex = [list_of_months.index(month) for month in bymonth.index]
In [349]:
newindex
Out[349]:
[3, 7, 11, 1, 0, 6, 5, 2, 4, 10, 9, 8]
In [350]:
bymonth['m'] = newindex
In [351]:
bymonth
Out[351]:
             maxtemp    mintemp    rainfall   m
month
April      37.863265  24.273469   20.234694   3
August     29.786000  22.086000  178.690000   7
December   28.004000  14.526000    5.912000  11
February   31.932000  17.556000    7.940000   1
January    28.760000  15.214000   13.178000   0
July       30.754000  22.560000  169.860000   6
June       34.528000  23.976000  103.754000   5
March      35.444000  20.798000   15.264000   2
May        38.996000  26.160000   35.714000   4
November   29.016000  16.862000   22.420408  10
October    30.582000  20.306000   97.158000   9
September  30.452000  21.962000  158.292000   8
In [352]:
bymonth2 = bymonth.set_index("m")
In [353]:
bymonth2
Out[353]:
      maxtemp    mintemp    rainfall
m
3   37.863265  24.273469   20.234694
7   29.786000  22.086000  178.690000
11  28.004000  14.526000    5.912000
1   31.932000  17.556000    7.940000
0   28.760000  15.214000   13.178000
6   30.754000  22.560000  169.860000
5   34.528000  23.976000  103.754000
2   35.444000  20.798000   15.264000
4   38.996000  26.160000   35.714000
10  29.016000  16.862000   22.420408
9   30.582000  20.306000   97.158000
8   30.452000  21.962000  158.292000
In [354]:
bymonth
Out[354]:
             maxtemp    mintemp    rainfall   m
month
April      37.863265  24.273469   20.234694   3
August     29.786000  22.086000  178.690000   7
December   28.004000  14.526000    5.912000  11
February   31.932000  17.556000    7.940000   1
January    28.760000  15.214000   13.178000   0
July       30.754000  22.560000  169.860000   6
June       34.528000  23.976000  103.754000   5
March      35.444000  20.798000   15.264000   2
May        38.996000  26.160000   35.714000   4
November   29.016000  16.862000   22.420408  10
October    30.582000  20.306000   97.158000   9
September  30.452000  21.962000  158.292000   8
In [355]:
bymonth2.sort_index().plot()
Out[355]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f62eb755630>
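The steps above put the months in calendar order by adding a numeric column, re-indexing on it and sorting. A possible shortcut, sketched below, is reindex with the month names already in the desired order, which also keeps the names as the index. The rainfall figures are the monthly means shown above; the month list is written out literally so the sketch does not depend on the locale.

```python
import pandas as pd

list_of_months = ["January", "February", "March", "April", "May", "June",
                  "July", "August", "September", "October", "November", "December"]

# Monthly mean rainfall in the alphabetical order that groupby produced
bymonth = pd.DataFrame(
    {"rainfall": [20.234694, 178.69, 5.912, 7.94, 13.178, 169.86,
                  103.754, 15.264, 35.714, 22.420408, 97.158, 158.292]},
    index=pd.Index(["April", "August", "December", "February", "January", "July",
                    "June", "March", "May", "November", "October", "September"],
                   name="month"),
)

# reindex returns the rows in the order of the labels passed in
ordered = bymonth.reindex(list_of_months)
print(ordered)
```

Calling ordered.plot() would then show the months along the x axis in calendar order.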