Advanced Python Training at Arcesium - Day 2

Nov 15-17, 2017 Vikrant Patil

These notes are available online at http://notes.pipal.in/2017/arcesium-oct-advpython/day2.html

Iterators and Generators

nums = list(range(5))

for n in nums:
    print(n)

for c in "string":
    print(c)

s
t
r
i
n
g

for key in {"x":1,"y":2}:
    print(key)

x
y

for line in open("data.csv"):
    print(repr(line))

'A1,B1,C1\n'
'A2,B2,C2\n'
'A3,B3,C3\n'
'A4,B4,C4'

The Iteration protocol

items = [1,2,3]

itr = iter(items)

next(itr)

1

next(itr)

2

next(itr)

3

next(itr)

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-11-94b7b2f7f392> in <module>()
----> 1 next(itr)

StopIteration:
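
The for loop drives this protocol for you. A rough sketch of what a for loop does with iter and next is given below; the helper name manual_for is made up for illustration and is not part of the course code.

```python
def manual_for(iterable, action):
    """Apply action to every item, roughly the way a for loop would."""
    itr = iter(iterable)          # ask the iterable for an iterator
    while True:
        try:
            item = next(itr)      # fetch the next item
        except StopIteration:     # raised once the iterator is exhausted
            break
        action(item)

collected = []
manual_for([1, 2, 3], collected.append)
print(collected)  # [1, 2, 3]
```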

Generators

def square(numbers):
    for n in numbers:
        yield n*n

sq5 = square(range(1,6))

sq5

<generator object square at 0x7f632c212b48>

for i in sq5:
    print(i)

1
4
9
16
25
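
A generator function is a compact way to write an iterator. For comparison, a hand-written class that behaves roughly like square could look like the sketch below. The class name SquareIterator is an illustrative assumption, not something from the course.

```python
class SquareIterator:
    """Iterator that yields the square of each number it is given."""

    def __init__(self, numbers):
        self._it = iter(numbers)

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        n = next(self._it)   # StopIteration propagates when numbers run out
        return n * n

print(list(SquareIterator(range(1, 6))))  # [1, 4, 9, 16, 25]
```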

sq4 = square(range(1,4))

next(sq4)

1

range(5)

range(0, 5)

def square(numbers):
    print("Begin squares")
    for i in numbers:
        print("Computing square of ",i)
        yield i*i
        print("After yield")
        
    print("Finish square")

sq4 = square(range(1,4))

next(sq4)

Begin squares
Computing square of  1

1

next(sq4)

After yield
Computing square of  2

4

next(sq4)

After yield
Computing square of  3

9

next(sq4)

After yield
Finish square

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-27-b9eab369b80c> in <module>()
----> 1 next(sq4)

StopIteration:

for s in square(range(1,5)):
    print(s)

Begin squares
Computing square of  1
1
After yield
Computing square of  2
4
After yield
Computing square of  3
9
After yield
Computing square of  4
16
After yield
Finish square

def f():
    for i in range(1000):
        if i ==13:
            return
        yield i*i

for s in f():
    print(s)

0
1
4
9
16
25
36
49
64
81
100
121
144

def f():
    for i in range(1000):
        if i ==3:
            return
        yield i*i

g = f()

next(g)

0

next(g)

1

next(g)

4

next(g)

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-39-5f315c5de15b> in <module>()
----> 1 next(g)

StopIteration:

problem: Write a generator function countdown that takes a number n as argument and generates all numbers from n down to 0.

>>> for i in countdown(3):
...     print(i)
3
2
1
0

problem: Write a generator triangular that takes a number n as argument and generates the sequence of the first n triangular numbers. The nth triangular number is the sum of the first n natural numbers.

>>> for t in triangular(5):
...     print(t, end=",")
1,3,6,10,15,

Bonus problem: Remove duplicates from a sequence while maintaining order. Can the same function be used to remove duplicate lines from a file?

>>> for item in consumedup([3,5,3,4,5,6,7,8,8,9]):
...     print(item, end=",")
3,5,4,6,7,8,9,

x = set()

def countdown(n):
    while n>=0:
        yield n
        n -= 1

for i in countdown(3):
    print(i)

3
2
1
0

def triangular(n):
    for i in range(1, n+1):
        yield sum(range(1,i+1))

for t in triangular(5):
    print(t, end=",")

1,3,6,10,15,
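
The solution above recomputes the whole sum for every term. A possible alternative keeps a running total instead; the name triangular_running is made up here, and itertools.accumulate is shown as a standard-library shortcut for the same idea.

```python
from itertools import accumulate

def triangular_running(n):
    """Yield the first n triangular numbers using a running total."""
    total = 0
    for i in range(1, n + 1):
        total += i        # add the next natural number
        yield total

print(list(triangular_running(5)))     # [1, 3, 6, 10, 15]
print(list(accumulate(range(1, 6))))   # [1, 3, 6, 10, 15]
```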

def consumedup(seq):
    seen = set()
    for item in seq:
        if item not in seen:
            yield item
            seen.add(item)

g = consumedup([3,5,3,4,5,6,7,8,8,9])

for item in g:
    print(item, end=",")

3,5,4,6,7,8,9,

%%file duplicatelines.txt
Saving file at /day2.ipynb
Saving file at /day2.ipynb
Saving file at /day2.ipynb
hello
hello
x

Overwriting duplicatelines.txt

for line in consumedup(open("duplicatelines.txt")):
    print(line, end="")

Saving file at /day2.ipynb
hello
x

Generator Expressions

[n*n for n in range(1,11)] # list comprehension

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

s = (n*n for n in range(1,11)) # generator expression

s

<generator object <genexpr> at 0x7f632c134308>

sum(s)

385

max(s)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-58-9b5ef623e450> in <module>()
----> 1 max(s)

ValueError: max() arg is an empty sequence
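
The error comes from the generator having been consumed by sum already: a generator can be traversed only once. A small sketch of the behaviour and two ways around it:

```python
s = (n * n for n in range(1, 11))
total = sum(s)          # consumes every item
leftover = list(s)      # nothing is left for a second pass
print(total, leftover)  # 385 []

# A list can be traversed as many times as needed
squares = [n * n for n in range(1, 11)]
print(sum(squares), max(squares))  # 385 100

# max() accepts a default for the empty case instead of raising
print(max(s, default=None))  # None
```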

sum((x*x for x in range(1000000)))

333332833333500000

sum(x*x for x in range(1000000))  # when a generator expression is the only argument
                                  # to a function, you can skip the extra parentheses

333332833333500000

g = consumedup(x*x for x in range(1,5))

for i in g:
    print(i, end=",")

1,4,9,16,

def ones():
    count = 0
    while True:
        if count >=3:
            DOOM
        yield 1
        count += 1

one = ones()
next(one)
next(one)
next(one)
next(one)
next(one)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-67-5acdb7be04d5> in <module>()
      3 next(one)
      4 next(one)
----> 5 next(one)
      6 next(one)

<ipython-input-66-f88b46c71a5d> in ones()
      3     while True:
      4         if count >=3:
----> 5             DOOM
      6         yield 1
      7         count += 1

NameError: name 'DOOM' is not defined

next(one)

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-68-d9ee1c3c65c2> in <module>()
----> 1 next(one)

StopIteration:

What is the advantage of using generators/iterators?

Very useful for huge data: only part of the data is loaded in memory at a time
You can build lazy data-processing pipelines
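
A quick way to see the memory point is to compare the size of a list with the size of an equivalent generator object. This is only a sketch: sys.getsizeof reports the size of the container object itself, not of the integers it refers to, and the exact numbers vary between Python versions.

```python
import sys

n = 100000
as_list = [x * x for x in range(n)]   # all items exist at once
as_gen = (x * x for x in range(n))    # items are produced on demand

print(sys.getsizeof(as_list))   # grows with n
print(sys.getsizeof(as_gen))    # small, independent of n
print(sum(as_gen) == sum(as_list))  # True
```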

Example: Building data pipelines

import os
def find(root):
    for path, dirnames, filenames in os.walk(root):
        for f in filenames:
            yield os.path.join(path, f)

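from itertools import islice

def take(n, seq):
    # islice stops quietly if seq has fewer than n items; next() inside a
    # generator expression would raise RuntimeError on Python 3.7+ instead
    return list(islice(seq, n))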

def integers():
    """
    generates infinite sequence of natural numbers
    """
    i = 1
    while True:
        yield i
        i += 1
        
def squares(numbers):
    return (n*n for n in numbers)

take(10, squares(integers()))

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

def grep(pattern, seq):
    return (x for x in seq if pattern in x)

files = find(".")
pyfiles = grep(".py", files)
print(take(10, pyfiles))

['./module.py', './trace.py', './fib.py', './cmdline.py', './commands.py', './fib1.py', './module1.py', './memoize.py', './sum.py', './__pycache__/module.cpython-36.pyc']

def count(seq):
    return sum(1 for item in seq)

count(range(100))

100

def readlines(filenames):
    """
    returns iterator over lines of all files
    """
    for f in filenames:
        for line in open(f):
            yield line
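
The readlines above leaves closing the files to the garbage collector. A possible variant that closes each file as soon as its lines are used up is sketched below; the name readlines_closing and the temporary demo files are assumptions made for this example.

```python
import os
import tempfile

def readlines_closing(filenames):
    """Yield lines from every file, closing each file when it is exhausted."""
    for name in filenames:
        with open(name) as f:
            yield from f

# Small demo with two temporary files
tmpdir = tempfile.mkdtemp()
paths = []
for idx, text in enumerate(["a\nb\n", "c\n"]):
    path = os.path.join(tmpdir, "f%d.txt" % idx)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

print(list(readlines_closing(paths)))  # ['a\n', 'b\n', 'c\n']
```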

How many lines of Python code have we written during this course?

files = find(".")
pyfiles = grep(".py", files)
lines = readlines(pyfiles)
print(count(lines))

144

How many Python functions have we written?

files = find(".")
pyfiles = grep(".py", files)
lines = readlines(pyfiles)
functions = grep("def " ,lines)
print(count(functions))

19

problem: Write a function get_paragraphs to split given text into paragraphs. Paragraphs are separated by an empty line. The function should take a sequence of lines as argument and return a sequence of paragraphs. For sample input, see http://anandology.com/tmp/pg1342.txt. Once the function is written, we should be able to find:

the number of paragraphs
the longest paragraph

x = [1,2,3,4]

itr = iter(x)

type(itr)

list_iterator

y = (i for i in range(5))

type(y)

generator

type(range(1,2))

range

r = range(1,2)

type(range)

type

range.__class__

type

next(y)

0

if "":
    print("x")

def get_paragraphs(seq):
    paragraphs = []
    for line in seq:
        if line.strip()=="" and paragraphs:
            yield "".join(paragraphs)
            paragraphs = []
        paragraphs.append(line)
    
    if paragraphs:
        yield "".join(paragraphs)

count(get_paragraphs(["A\nB", "\n", "A\n","B\n","\n" ,"A\n", "A\n"]))

3

max(get_paragraphs(["A\nB", "\n", "A\n","B\n","\n" ,"A\n", "A\n"]), key=len)

'\nA\nB\n'
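
Note the leading '\n' in the result: with get_paragraphs as written, the blank separator line is carried into the start of the next paragraph. If that is not wanted, one possible variant drops blank lines entirely. The name get_paragraphs_clean is made up here, and counts computed with it may differ from the ones shown in these notes.

```python
def get_paragraphs_clean(lines):
    """Yield paragraphs from a sequence of lines, skipping blank separator lines."""
    current = []
    for line in lines:
        if line.strip() == "":
            if current:               # a blank line ends the current paragraph
                yield "".join(current)
                current = []
        else:
            current.append(line)
    if current:                       # last paragraph without a trailing blank line
        yield "".join(current)

sample = ["A\nB", "\n", "A\n", "B\n", "\n", "A\n", "A\n"]
print(list(get_paragraphs_clean(sample)))  # ['A\nB', 'A\nB\n', 'A\nA\n']
```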

g = get_paragraphs(open("pg1342.txt"))

count(g)

2395

g = get_paragraphs(open("pg1342.txt"))

max(g, key=len)

'\n"By this time, my dearest sister, you have received my hurried letter; I\nwish this may be more intelligible, but though not confined for time, my\nhead is so bewildered that I cannot answer for being coherent. Dearest\nLizzy, I hardly know what I would write, but I have bad news for you,\nand it cannot be delayed. Imprudent as the marriage between Mr. Wickham\nand our poor Lydia would be, we are now anxious to be assured it has\ntaken place, for there is but too much reason to fear they are not gone\nto Scotland. Colonel Forster came yesterday, having left Brighton the\nday before, not many hours after the express. Though Lydia\'s short\nletter to Mrs. F. gave them to understand that they were going to Gretna\nGreen, something was dropped by Denny expressing his belief that W.\nnever intended to go there, or to marry Lydia at all, which was\nrepeated to Colonel F., who, instantly taking the alarm, set off from B.\nintending to trace their route. He did trace them easily to Clapham,\nbut no further; for on entering that place, they removed into a hackney\ncoach, and dismissed the chaise that brought them from Epsom. All that\nis known after this is, that they were seen to continue the London road.\nI know not what to think. After making every possible inquiry on that\nside London, Colonel F. came on into Hertfordshire, anxiously renewing\nthem at all the turnpikes, and at the inns in Barnet and Hatfield, but\nwithout any success--no such people had been seen to pass through. With\nthe kindest concern he came on to Longbourn, and broke his apprehensions\nto us in a manner most creditable to his heart. I am sincerely grieved\nfor him and Mrs. F., but no one can throw any blame on them. Our\ndistress, my dear Lizzy, is very great. My father and mother believe the\nworst, but I cannot think so ill of him. 
Many circumstances might make\nit more eligible for them to be married privately in town than to pursue\ntheir first plan; and even if _he_ could form such a design against a\nyoung woman of Lydia\'s connections, which is not likely, can I suppose\nher so lost to everything? Impossible! I grieve to find, however, that\nColonel F. is not disposed to depend upon their marriage; he shook his\nhead when I expressed my hopes, and said he feared W. was not a man to\nbe trusted. My poor mother is really ill, and keeps her room. Could she\nexert herself, it would be better; but this is not to be expected. And\nas to my father, I never in my life saw him so affected. Poor Kitty has\nanger for having concealed their attachment; but as it was a matter of\nconfidence, one cannot wonder. I am truly glad, dearest Lizzy, that you\nhave been spared something of these distressing scenes; but now, as the\nfirst shock is over, shall I own that I long for your return? I am not\nso selfish, however, as to press for it, if inconvenient. Adieu! I\ntake up my pen again to do what I have just told you I would not; but\ncircumstances are such that I cannot help earnestly begging you all to\ncome here as soon as possible. I know my dear uncle and aunt so well,\nthat I am not afraid of requesting it, though I have still something\nmore to ask of the former. My father is going to London with Colonel\nForster instantly, to try to discover her. What he means to do I am sure\nI know not; but his excessive distress will not allow him to pursue any\nmeasure in the best and safest way, and Colonel Forster is obliged to\nbe at Brighton again to-morrow evening. In such an exigence, my\nuncle\'s advice and assistance would be everything in the world; he will\nimmediately comprehend what I must feel, and I rely upon his goodness."\n'

Working with XML

import requests
url = "http://www.thehindu.com/"
response = requests.get(url, params = {"service":"rss"})

xmltext = response.text

xmltext[:100]

'<?xml version="1.0" encoding="UTF-8"?>\n<rss version="2.0">\n<channel>\n<title>The Hindu - Home</title>'

from xml.etree import ElementTree as et

root = et.fromstring(xmltext)

items = root.findall(".//item")

len(items)

426

items[0]

<Element 'item' at 0x7f63240a7f48>

print(et.tostring(items[0]).decode())

<item>
<title>A fruit forest at home</title>
<author>Anasuya Menon</author>
<category>Life &amp; Style</category>
<link>http://www.thehindu.com/life-and-style/manoj-kumar-ibs-concept-fruitful-future-is-about-creating-fruit-forests/article20466856.ece?utm_source=RSS_Feed&amp;utm_medium=RSS&amp;utm_campaign=RSS_Syndication</link>
<description>
Manoj Kumar IB&#8217;s &#8216;Fruitful Future&#8217; concept is about creating fruit forests even in limited spaces 
</description>
<pubDate>Thu, 16 Nov 2017 12:34:36 +0530</pubDate>
</item>

for item in items[:10]:
    print(item.findtext("title"))
    print(item.findtext("link"))
    print("-"*50)

A fruit forest at home
http://www.thehindu.com/life-and-style/manoj-kumar-ibs-concept-fruitful-future-is-about-creating-fruit-forests/article20466856.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
BJP legislators stage protest 
http://www.thehindu.com/news/national/karnataka/bjp-legislators-stage-protest/article20466838.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 New India Assurance stock up 3% on strong Sep quarter earnings 
http://www.thehindu.com/business/new-india-assurance-stock-up-3-on-strong-sep-quarter-earnings/article20466714.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Doctors’ strike: Karnataka HC to pass order if no solution found 
http://www.thehindu.com/news/national/karnataka/doctors-strike-karnataka-hc-to-pass-order-if-no-solution-found/article20466339.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Patients left in lurch as private doctors begin indefinite strike against KPME Bill
http://www.thehindu.com/news/national/karnataka/patients-left-in-lurch-as-private-doctors-begin-indefinite-strike-against-kpme-bill/article20466193.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
Indira Gandhi stood up for refugees: Antony
http://www.thehindu.com/news/national/indira-gandhi-stood-up-for-refugees-antony/article20465664.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Review: Young dancers take the stage at Soorya’s ‘Parampara’ festival
http://www.thehindu.com/entertainment/dance/delightful-performances-by-young-classical-dancers-at-sooryas-parampara-dance-festival/article20464680.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Review: ‘Marattam’ honours veteran Kathakali artiste Sadanam Krishnankutty
http://www.thehindu.com/entertainment/theatre/marattam-in-kochi-to-honour-kathakali-artiste-sadanam-krishnankutty/article20453174.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Remembering Kathakali musician Kalamandalam Sankaran Embranthiri 
http://www.thehindu.com/entertainment/theatre/tribute-to-kathakali-musician-kalamandalam-sankaran-embranthiri-on-his-death-anniversary/article20451623.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
 Indian student shot dead at grocery store in US
http://www.thehindu.com/news/international/indian-student-shot-dead-at-grocery-store-in-us/article20465380.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
--------------------------------------------------
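
The same findall and findtext calls can be tried without network access. The sketch below parses a small hand-written RSS-like string; the feed content and example.com links are invented for illustration.

```python
from xml.etree import ElementTree as et

sample = """<rss version="2.0">
<channel>
<title>Sample feed</title>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel>
</rss>"""

root = et.fromstring(sample)
# ".//item" finds item elements at any depth below the root
pairs = [(item.findtext("title"), item.findtext("link"))
         for item in root.findall(".//item")]
print(pairs)
```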

from xml.dom.minidom import parseString

root = parseString(xmltext)

items = root.getElementsByTagName("item")

len(items)

426

item = items[0]

title = item.getElementsByTagName("title")[0]

title.firstChild.data

'A fruit forest at home'

JSON

import json

j = json.loads('{"a":2,"l":["a","b","c"]}')

type(j)

dict

j['a']

2

j['l']

['a', 'b', 'c']

d = {"service":"rss", "x":[1,2,3,4,5]}

json.dumps(d)

'{"service": "rss", "x": [1, 2, 3, 4, 5]}'
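
dumps and loads are inverses for data made of dicts, lists, strings and numbers. A short sketch of a round trip, with the optional indent and sort_keys arguments for readable output:

```python
import json

d = {"service": "rss", "x": [1, 2, 3, 4, 5]}
text = json.dumps(d, indent=2, sort_keys=True)   # pretty-printed, keys sorted
print(text)
print(json.loads(text) == d)   # True

# JSON has no tuple type, so tuples come back as lists
roundtrip = json.loads(json.dumps({"t": (1, 2)}))
print(roundtrip)   # {'t': [1, 2]}
```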

Find the distance between two cities using the Google API

import requests
def distance(origin, dest):
    url = "https://maps.googleapis.com/maps/api/distancematrix/json"
    response = requests.get(url, params={"units":"metric",
                                         "origins":origin,
                                         "destinations":dest
                                        })
    data = response.json()
    return data['rows'][0]['elements'][0]['distance']['text']

distance("hyderabad", "mumbai")

{'destination_addresses': ['Mumbai, Maharashtra, India'],
 'origin_addresses': ['Hyderabad, Telangana, India'],
 'rows': [{'elements': [{'distance': {'text': '709 km', 'value': 709450},
     'duration': {'text': '13 hours 18 mins', 'value': 47866},
     'status': 'OK'}]}],
 'status': 'OK'}

distance("hyderabad", "mumbai")

'709 km'
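
The indexing in distance assumes that the request succeeded. A more defensive extraction can be tried offline against the response shown above. The helper name extract_distance and the non-OK status value used in the second call are illustrative assumptions.

```python
sample = {
    'destination_addresses': ['Mumbai, Maharashtra, India'],
    'origin_addresses': ['Hyderabad, Telangana, India'],
    'rows': [{'elements': [{'distance': {'text': '709 km', 'value': 709450},
                            'duration': {'text': '13 hours 18 mins', 'value': 47866},
                            'status': 'OK'}]}],
    'status': 'OK',
}

def extract_distance(data):
    """Return the distance text, or None if the response does not look successful."""
    if data.get('status') != 'OK':
        return None
    element = data['rows'][0]['elements'][0]
    if element.get('status') != 'OK':
        return None
    return element['distance']['text']

print(extract_distance(sample))                 # 709 km
print(extract_distance({'status': 'NOT_OK'}))   # None (hypothetical failure value)
```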

Numpy

import numpy as np

x = np.arange(32)

x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])

x.reshape(4,8)

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31]])

y = np.arange(64).reshape(4,2,8)

y

array([[[ 0,  1,  2,  3,  4,  5,  6,  7],
        [ 8,  9, 10, 11, 12, 13, 14, 15]],

       [[16, 17, 18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29, 30, 31]],

       [[32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47]],

       [[48, 49, 50, 51, 52, 53, 54, 55],
        [56, 57, 58, 59, 60, 61, 62, 63]]])

len(y[-1][-1])

8

len(y[-1])

2

len(y)

4

y.shape

(4, 2, 8)

l,w,h = y.shape

l,w,h

(4, 2, 8)

y.dtype

dtype('int64')

y.size

64

y.itemsize

8
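
These attributes fit together: the memory used by the array data is size times itemsize, which numpy also exposes as nbytes. A small sketch, with the dtype fixed to int64 so the numbers do not depend on the platform:

```python
import numpy as np

y = np.arange(64, dtype=np.int64).reshape(4, 2, 8)
print(y.ndim)                # 3
print(y.size * y.itemsize)   # 512
print(y.nbytes)              # 512
```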

problem: create a 2D array of size 5x6
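
Two possible ways to do this, sketched with the functions already used in these notes:

```python
import numpy as np

a = np.arange(30).reshape(5, 6)   # 0..29 laid out in 5 rows of 6
b = np.zeros((5, 6))              # a shape tuple can be passed directly
print(a.shape, b.shape)           # (5, 6) (5, 6)
```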

other ways of creating arrays

np.random.random(50).reshape(5,10)

array([[ 0.04499068,  0.50619285,  0.44488286,  0.82907944,  0.90284184,
         0.52232663,  0.39199814,  0.70911043,  0.84078058,  0.20210877],
       [ 0.6784636 ,  0.074447  ,  0.05762705,  0.65109205,  0.61762302,
         0.07111594,  0.5603616 ,  0.13030784,  0.15284609,  0.32295726],
       [ 0.27873428,  0.16703084,  0.6833295 ,  0.23503493,  0.35724634,
         0.78948851,  0.80452339,  0.96852529,  0.86675047,  0.49045919],
       [ 0.93068683,  0.75820907,  0.59659381,  0.50088148,  0.78470971,
         0.52832485,  0.10246332,  0.76045816,  0.16626284,  0.00690344],
       [ 0.77779312,  0.4413533 ,  0.950841  ,  0.51995188,  0.35815909,
         0.65010685,  0.05154926,  0.03869555,  0.43416689,  0.19653547]])

np.linspace(1.0, 10, 15)

array([  1.        ,   1.64285714,   2.28571429,   2.92857143,
         3.57142857,   4.21428571,   4.85714286,   5.5       ,
         6.14285714,   6.78571429,   7.42857143,   8.07142857,
         8.71428571,   9.35714286,  10.        ])

np.zeros(20).reshape(4,5)

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

vector operations

x = np.arange(10)

x + 10

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

x * 2

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

x + x

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

x * x

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

x ** 3

array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
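
These operators work element by element, which is different from how the same operators behave on plain lists. A short comparison sketch:

```python
import numpy as np

x = np.arange(10)
squares_list = [v * v for v in range(10)]    # the explicit-loop equivalent of x * x
print((x * x).tolist() == squares_list)      # True

# On a list, * repeats the list; on an array, it multiplies each element
print([0, 1, 2] * 2)             # [0, 1, 2, 0, 1, 2]
print(np.array([0, 1, 2]) * 2)   # [0 2 4]
```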

Playing with images

# note: recent SciPy releases provide this image as scipy.datasets.face instead
from scipy import misc

face = misc.face(gray=True)

type(face)

numpy.ndarray

face.ndim

2

face.shape

(768, 1024)

face.dtype

dtype('uint8')

face[1][:10]

array([ 83, 104, 123, 130, 134, 141, 145, 144, 157, 147], dtype=uint8)

#show images from matplotlib in the same HTML page
%matplotlib inline

import matplotlib.pyplot as plt

plt.imshow(face, cmap=plt.cm.gray)

<matplotlib.image.AxesImage at 0x7f62fcc756a0>

import matplotlib.pyplot as plt
def imshow(img):
    plt.imshow(img, cmap=plt.cm.gray)
    plt.show()

negface = 255 - face

imshow(negface)

face[-1][:10]

array([ 94, 106, 119, 127, 131, 134, 135, 136, 134, 139], dtype=uint8)

negface[-1][:10]

array([161, 149, 136, 128, 124, 121, 120, 119, 121, 116], dtype=uint8)

Transpose

x = np.arange(20).reshape(4,5)

x[1][2]

7

x[1,2]

7

x[1,:] # row at index 1

array([5, 6, 7, 8, 9])

x[:,0] # 0th column

array([ 0,  5, 10, 15])

x[:,:2] # first two columns

array([[ 0,  1],
       [ 5,  6],
       [10, 11],
       [15, 16]])

x.transpose()

array([[ 0,  5, 10, 15],
       [ 1,  6, 11, 16],
       [ 2,  7, 12, 17],
       [ 3,  8, 13, 18],
       [ 4,  9, 14, 19]])

x.transpose().shape

(5, 4)

x.shape

(4, 5)

facet = face.transpose()

imshow(facet)

face.mean()

113.48026784261067

x = np.arange(10)

x < 5

array([ True,  True,  True,  True,  True, False, False, False, False, False], dtype=bool)

x[x<5]

array([0, 1, 2, 3, 4])

a = x < 5

a

array([ True,  True,  True,  True,  True, False, False, False, False, False], dtype=bool)

a.sum()

5

problem: Convert the face image to a black and white image (instead of grayscale)

facebw = face > 127

imshow(facebw)
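
The comparison produces a boolean array, which imshow can display directly. The same thresholding can be checked on a tiny array without the image; the pixel values below are invented for illustration.

```python
import numpy as np

img = np.array([[0, 100, 200],
                [127, 128, 255]], dtype=np.uint8)
mask = img > 127                     # True where the pixel is brighter than 127
bw = mask.astype(np.uint8) * 255     # 0 for black, 255 for white
print(mask.sum())                    # 3
print(bw)
```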

x = np.arange(10000).reshape(100,100)

imshow(x)

x[:,50] = 9999
x[50,:] = 9999

x = np.zeros_like(face)

imshow(x)

x[::10,:] = 255

imshow(x)

x = np.zeros(10000).reshape(100,100)

x[::5,:] = 255

imshow(x)

x[:,::5] = 255

imshow(x)

mesh = np.zeros_like(face)
mesh[::50,:]= 255
mesh[:,::50]= 255

imshow(mesh)

imshow(0.5*face + 0.5*mesh)

x = list(range(10))

x

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

x[2:]

[2, 3, 4, 5, 6, 7, 8, 9]

x[:3]

[0, 1, 2]

x[::2]

[0, 2, 4, 6, 8]

x[::3]

[0, 3, 6, 9]

x[::4]

[0, 4, 8]

x = np.zeros(10000).reshape(100,100)

x[:,::10] = 255 #columns at interval of 10

x[::10,:] = 255 #rows at interval of 10

imshow(x)

face2 = face + mesh
imshow(face2)

imshow(face)

face2[:,50]

array([162, 136,  82,  68, 114, 145, 132, 116,  92,  90, 118, 140, 129,
        89,  77, 119, 154, 152, 136, 113, 117, 150, 161, 141, 116, 123,
       149, 159, 118,  63,  47,  59,  91,  99,  65,  34,  53, 114, 159,
       153, 122,  98,  65,  52,  62,  77, 108, 146, 143,  79,  41,  57,
        90, 114, 111,  89,  81,  93,  87,  80,  83,  74,  65,  73,  94,
       120, 149, 164, 150, 108,  68,  54,  75,  82,  76,  70,  75,  73,
        61,  57,  77, 126, 155, 162, 183, 177, 105,  26,  69, 141, 176,
       146, 106,  73,  51,  44,  52,  61,  52,  65, 118, 144, 131, 118,
        76,  55,  83, 129, 142, 145, 135, 107,  71,  53,  94, 144, 122,
        78,  71,  81,  81, 112, 135, 132, 117, 107, 114, 130, 128,  84,
        67, 110, 163, 178, 160, 144, 173, 179, 178, 165, 149, 144, 150,
       158, 161, 146, 122, 115, 140, 165, 164, 152, 155, 175, 180, 170,
       167, 161, 150, 141, 146, 146, 138, 135, 141, 135, 128, 133, 153,
       173, 181, 169, 148, 125, 105,  97, 143, 159, 160, 154, 157, 161,
       169, 184, 174, 166, 144, 133, 145, 157, 153, 145, 156, 166, 174,
       188, 179, 133,  99, 106, 145, 183, 171, 107,  61,  58,  75,  90,
        86,  81, 102, 135, 134, 100,  62,  40,  37,  75, 129, 160, 143,
       103, 101, 131, 175, 180, 175, 167, 165, 161, 154, 151, 126, 110,
       113, 138, 162, 175, 169, 147, 115, 108, 112, 121, 124, 134, 159,
       180, 180, 172, 175, 169, 135,  95,  63,  41,  58,  94, 131, 133,
       115, 112, 107,  84,  70, 114, 136,  99,  52,  59, 105, 139, 164,
       141, 107,  79,  67,  60,  61,  69,  71,  65,  63,  71,  87, 103,
       118, 127, 116,  94,  70,  52,  57,  96, 131, 133,  92,  78,  61,
        54,  59,  64,  63,  58,  65,  65,  64,  65,  59,  51,  63,  89,
       134, 163, 192, 204, 205, 206, 209, 211, 203, 185, 161, 147, 145,
       140, 125, 110, 123, 159, 170, 140, 121, 133, 153, 163, 139, 123,
       108,  98,  90,  87,  87,  81,  77, 108, 140, 139, 137, 142, 121,
       107,  99,  94,  83,  69,  63,  68,  71,  71,  72,  73,  78,  81,
        84,  87,  90,  92,  96,  95,  93,  91,  85,  77,  72,  70, 122,
       168, 200, 191, 177, 181, 180, 169, 138, 100,  75,  94, 140, 171,
       141,  82,  81,  88,  91,  91,  83,  66,  71,  96, 102, 117, 122,
       131, 151, 167, 169, 171, 148, 131, 124, 117, 101,  92, 101, 114,
       114, 119, 130, 145, 157, 163, 165, 168, 168, 169, 174, 177, 179,
       181, 182, 182, 180, 176, 177, 177, 173, 170, 160, 146, 121, 100,
        81,  78,  91, 107, 116, 119, 119, 120, 124, 126, 128, 133, 139,
       142, 139, 132, 124, 118, 112, 109, 109, 112, 120, 116, 111, 105,
        99,  94,  97, 103, 116, 119, 123, 124, 123, 123, 130, 139, 151,
       157, 160, 157, 154, 156, 160, 163, 160, 157, 151, 135, 119, 113,
       110, 102,  99, 114, 113,  85,  54,  35,  25,  21,  24,  25,  26,
        24,  24,  25,  26,  27,  32,  33,  34,  35,  37,  41,  46,  50,
        57,  60,  63,  66,  68,  70,  70,  69,  53,  47,  32,  29,  28,
         5,  12,  60, 123, 126, 108,  85,  93, 122, 134, 132, 126, 130,
       135, 140, 147, 152, 154, 152, 147, 149, 151, 150, 150, 151, 147,
       141, 140, 145, 152, 157, 151, 134, 132, 147, 154, 161, 163, 159,
       154, 154, 156, 158, 161, 162, 163, 166, 168, 172, 175, 177, 178,
       179, 181, 183, 184, 184, 185, 184, 187, 186, 185, 182, 178, 171,
       164, 159, 141, 136, 131, 126, 125, 124, 127, 128, 144, 149, 149,
       151, 160, 164, 168, 175, 180, 179, 179, 179, 177, 173, 173, 176,
       178, 179, 181, 182, 182, 181, 180, 179, 182, 182, 181, 179, 176,
       174, 174, 174, 169, 168, 166, 164, 162, 160, 155, 154, 155, 156,
       158, 163, 164, 162, 160, 159, 158, 157, 157, 159, 158, 155, 154,
       155, 164, 167, 172, 172, 172, 183, 193, 193, 181, 168, 159, 156,
       150, 148, 145, 139, 136, 128, 126, 131, 136, 134, 134, 137, 136,
       140, 145, 148, 149, 152, 157, 162, 157, 158, 157, 156, 154, 155,
       156, 158, 157, 158, 158, 157, 156, 155, 154, 155, 159, 159, 157,
       154, 151, 149, 149, 150, 154, 156, 156, 152, 150, 152, 155, 157,
       158, 157, 156, 156, 157, 160, 164, 166, 173, 175, 178, 180, 182,
       185, 188, 190, 194, 197, 199, 198, 196, 193, 194, 195, 198, 198,
       198, 198, 199, 201, 202, 203, 197, 197, 198, 199, 200, 201, 201, 201], dtype=uint8)

face[:,50]

array([163, 137,  83,  69, 115, 146, 133, 117,  93,  91, 119, 141, 130,
        90,  78, 120, 155, 153, 137, 114, 118, 151, 162, 142, 117, 124,
       150, 160, 119,  64,  48,  60,  92, 100,  66,  35,  54, 115, 160,
       154, 123,  99,  66,  53,  63,  78, 109, 147, 144,  80,  42,  58,
        91, 115, 112,  90,  82,  94,  88,  81,  84,  75,  66,  74,  95,
       121, 150, 165, 151, 109,  69,  55,  76,  83,  77,  71,  76,  74,
        62,  58,  78, 127, 156, 163, 184, 178, 106,  27,  70, 142, 177,
       147, 107,  74,  52,  45,  53,  62,  53,  66, 119, 145, 132, 119,
        77,  56,  84, 130, 143, 146, 136, 108,  72,  54,  95, 145, 123,
        79,  72,  82,  82, 113, 136, 133, 118, 108, 115, 131, 129,  85,
        68, 111, 164, 179, 161, 145, 174, 180, 179, 166, 150, 145, 151,
       159, 162, 147, 123, 116, 141, 166, 165, 153, 156, 176, 181, 171,
       168, 162, 151, 142, 147, 147, 139, 136, 142, 136, 129, 134, 154,
       174, 182, 170, 149, 126, 106,  98, 144, 160, 161, 155, 158, 162,
       170, 185, 175, 167, 145, 134, 146, 158, 154, 146, 157, 167, 175,
       189, 180, 134, 100, 107, 146, 184, 172, 108,  62,  59,  76,  91,
        87,  82, 103, 136, 135, 101,  63,  41,  38,  76, 130, 161, 144,
       104, 102, 132, 176, 181, 176, 168, 166, 162, 155, 152, 127, 111,
       114, 139, 163, 176, 170, 148, 116, 109, 113, 122, 125, 135, 160,
       181, 181, 173, 176, 170, 136,  96,  64,  42,  59,  95, 132, 134,
       116, 113, 108,  85,  71, 115, 137, 100,  53,  60, 106, 140, 165,
       142, 108,  80,  68,  61,  62,  70,  72,  66,  64,  72,  88, 104,
       119, 128, 117,  95,  71,  53,  58,  97, 132, 134,  93,  79,  62,
        55,  60,  65,  64,  59,  66,  66,  65,  66,  60,  52,  64,  90,
       135, 164, 193, 205, 206, 207, 210, 212, 204, 186, 162, 148, 146,
       141, 126, 111, 124, 160, 171, 141, 122, 134, 154, 164, 140, 124,
       109,  99,  91,  88,  88,  82,  78, 109, 141, 140, 138, 143, 122,
       108, 100,  95,  84,  70,  64,  69,  72,  72,  73,  74,  79,  82,
        85,  88,  91,  93,  97,  96,  94,  92,  86,  78,  73,  71, 123,
       169, 201, 192, 178, 182, 181, 170, 139, 101,  76,  95, 141, 172,
       142,  83,  82,  89,  92,  92,  84,  67,  72,  97, 103, 118, 123,
       132, 152, 168, 170, 172, 149, 132, 125, 118, 102,  93, 102, 115,
       115, 120, 131, 146, 158, 164, 166, 169, 169, 170, 175, 178, 180,
       182, 183, 183, 181, 177, 178, 178, 174, 171, 161, 147, 122, 101,
        82,  79,  92, 108, 117, 120, 120, 121, 125, 127, 129, 134, 140,
       143, 140, 133, 125, 119, 113, 110, 110, 113, 121, 117, 112, 106,
       100,  95,  98, 104, 117, 120, 124, 125, 124, 124, 131, 140, 152,
       158, 161, 158, 155, 157, 161, 164, 161, 158, 152, 136, 120, 114,
       111, 103, 100, 115, 114,  86,  55,  36,  26,  22,  25,  26,  27,
        25,  25,  26,  27,  28,  33,  34,  35,  36,  38,  42,  47,  51,
        58,  61,  64,  67,  69,  71,  71,  70,  54,  48,  33,  30,  29,
         6,  13,  61, 124, 127, 109,  86,  94, 123, 135, 133, 127, 131,
       136, 141, 148, 153, 155, 153, 148, 150, 152, 151, 151, 152, 148,
       142, 141, 146, 153, 158, 152, 135, 133, 148, 155, 162, 164, 160,
       155, 155, 157, 159, 162, 163, 164, 167, 169, 173, 176, 178, 179,
       180, 182, 184, 185, 185, 186, 185, 188, 187, 186, 183, 179, 172,
       165, 160, 142, 137, 132, 127, 126, 125, 128, 129, 145, 150, 150,
       152, 161, 165, 169, 176, 181, 180, 180, 180, 178, 174, 174, 177,
       179, 180, 182, 183, 183, 182, 181, 180, 183, 183, 182, 180, 177,
       175, 175, 175, 170, 169, 167, 165, 163, 161, 156, 155, 156, 157,
       159, 164, 165, 163, 161, 160, 159, 158, 158, 160, 159, 156, 155,
       156, 165, 168, 173, 173, 173, 184, 194, 194, 182, 169, 160, 157,
       151, 149, 146, 140, 137, 129, 127, 132, 137, 135, 135, 138, 137,
       141, 146, 149, 150, 153, 158, 163, 158, 159, 158, 157, 155, 156,
       157, 159, 158, 159, 159, 158, 157, 156, 155, 156, 160, 160, 158,
       155, 152, 150, 150, 151, 155, 157, 157, 153, 151, 153, 156, 158,
       159, 158, 157, 157, 158, 161, 165, 167, 174, 176, 179, 181, 183,
       186, 189, 191, 195, 198, 200, 199, 197, 194, 195, 196, 199, 199,
       199, 199, 200, 202, 203, 204, 198, 198, 199, 200, 201, 202, 202, 202], dtype=uint8)

x = face2 - face

imshow(x)
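
The values in face2[:,50] are one less than those in face[:,50] because uint8 arithmetic wraps around modulo 256: adding 255 from the mesh line is the same as subtracting 1. The sketch below reproduces this on three values taken from the output above and shows one possible way to saturate at 255 instead (widen the dtype, add, then clip).

```python
import numpy as np

a = np.array([163, 137, 83], dtype=np.uint8)   # first values of face[:,50]
b = np.full(3, 255, dtype=np.uint8)            # a mesh line

wrapped = a + b                  # uint8 + uint8 wraps around modulo 256
print(wrapped)                   # [162 136  82]

wide = a.astype(np.int16) + b.astype(np.int16)     # room for values above 255
clipped = np.clip(wide, 0, 255).astype(np.uint8)   # saturate instead of wrapping
print(clipped)                   # [255 255 255]
```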

problem: Try to swap parts of the image. Split the image into four parts like

AB
CD

It should become

AC
BD

face2 = face.copy()

h, w = face2.shape

TR = face2[:h//2,w//2:].copy()
BL = face2[h//2:,:w//2].copy()

imshow(face2)
imshow(TR)
imshow(BL)

face2[:h//2,w//2:] = BL
face2[h//2:,:w//2] = TR
imshow(face2)
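
The same slicing can be verified on a small array where each quadrant holds a distinct number (1 for A, 2 for B, 3 for C, 4 for D). This is a sketch with made-up data and a made-up name, grid.

```python
import numpy as np

grid = np.array([[1, 1, 2, 2],
                 [1, 1, 2, 2],
                 [3, 3, 4, 4],
                 [3, 3, 4, 4]])
h, w = grid.shape
swapped = grid.copy()
# Reading from grid (not swapped) avoids overwriting a quadrant before it is copied
swapped[:h//2, w//2:] = grid[h//2:, :w//2]   # bottom-left goes to top-right
swapped[h//2:, :w//2] = grid[:h//2, w//2:]   # top-right goes to bottom-left
print(swapped)
```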

f2 = np.rot90(face)
imshow(f2)

imshow(np.roll(face, 400))

imshow(np.flip(face, 1))

thumb = face[::4,::4]

imshow(thumb)

thumb.shape

(192, 256)

t = np.hstack([thumb, thumb, thumb, thumb])
v = np.vstack([t,t,t,t])
imshow(v)
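
For this kind of repetition numpy also has np.tile, which should give the same result as the hstack/vstack combination. A sketch on a small stand-in array:

```python
import numpy as np

thumb = np.arange(6).reshape(2, 3)            # stand-in for the thumbnail
t = np.hstack([thumb, thumb, thumb, thumb])   # 4 copies side by side
v = np.vstack([t, t, t, t])                   # 4 such rows stacked
tiled = np.tile(thumb, (4, 4))                # 4 repetitions along each axis
print(v.shape)                                # (8, 12)
print(np.array_equal(v, tiled))               # True
```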

Matplotlib

import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)

X.shape

(256,)

C = np.cos(X)

S = np.sin(X)

plt.plot(X,C, label="cos(x)")
plt.plot(X,S, label="sin(x)")
plt.legend()
plt.show()

T = np.tan(X)
T

array([  1.22464680e-16,   2.46449301e-02,   4.93198157e-02,
         7.40547582e-02,   9.88801519e-02,   1.23826835e-01,
         1.48926244e-01,   1.74210575e-01,   1.99712954e-01,
         2.25467616e-01,   2.51510096e-01,   2.77877435e-01,
         3.04608405e-01,   3.31743753e-01,   3.59326465e-01,
         3.87402064e-01,   4.16018933e-01,   4.45228685e-01,
         4.75086564e-01,   5.05651907e-01,   5.36988659e-01,
         5.69165959e-01,   6.02258806e-01,   6.36348824e-01,
         6.71525130e-01,   7.07885343e-01,   7.45536747e-01,
         7.84597640e-01,   8.25198908e-01,   8.67485872e-01,
         9.11620453e-01,   9.57783740e-01,   1.00617904e+00,
         1.05703550e+00,   1.11061251e+00,   1.16720497e+00,
         1.22714971e+00,   1.29083333e+00,   1.35870197e+00,
         1.43127325e+00,   1.50915142e+00,   1.59304642e+00,
         1.68379814e+00,   1.78240780e+00,   1.89007882e+00,
         2.00827073e+00,   2.13877135e+00,   2.28379480e+00,
         2.44611689e+00,   2.62926545e+00,   2.83779394e+00,
         3.07768354e+00,   3.35695082e+00,   3.68659441e+00,
         4.08212426e+00,   4.56613958e+00,   5.17290256e+00,
         5.95697769e+00,   7.01088586e+00,   8.50505855e+00,
         1.07917187e+01,   1.47354103e+01,   2.31767738e+01,
         5.41065205e+01,  -1.62335989e+02,  -3.24573411e+01,
        -1.80190765e+01,  -1.24608370e+01,  -9.51436445e+00,
        -7.68721487e+00,  -6.44210712e+00,  -5.53818992e+00,
        -4.85138736e+00,  -4.31127708e+00,  -3.87491778e+00,
        -3.51463544e+00,  -3.21179171e+00,  -2.95337427e+00,
        -2.73002271e+00,  -2.53483125e+00,  -2.36259336e+00,
        -2.20930931e+00,  -2.07185542e+00,  -1.94775662e+00,
        -1.83502616e+00,  -1.73205081e+00,  -1.63750682e+00,
        -1.55029770e+00,  -1.46950733e+00,  -1.39436424e+00,
        -1.32421401e+00,  -1.25849780e+00,  -1.19673541e+00,
        -1.13851183e+00,  -1.08346641e+00,  -1.03128418e+00,
        -9.81688718e-01,  -9.34436362e-01,  -8.89311374e-01,
        -8.46121975e-01,  -8.04697006e-01,  -7.64883142e-01,
        -7.26542528e-01,  -6.89550784e-01,  -6.53795302e-01,
        -6.19173786e-01,  -5.85593003e-01,  -5.52967699e-01,
        -5.21219665e-01,  -4.90276921e-01,  -4.60073002e-01,
        -4.30546337e-01,  -4.01639694e-01,  -3.73299701e-01,
        -3.45476407e-01,  -3.18122901e-01,  -2.91194969e-01,
        -2.64650778e-01,  -2.38450601e-01,  -2.12556562e-01,
        -1.86932397e-01,  -1.61543248e-01,  -1.36355456e-01,
        -1.11336383e-01,  -8.64542334e-02,  -6.16778888e-02,
        -3.69767523e-02,  -1.23205945e-02,   1.23205945e-02,
         3.69767523e-02,   6.16778888e-02,   8.64542334e-02,
         1.11336383e-01,   1.36355456e-01,   1.61543248e-01,
         1.86932397e-01,   2.12556562e-01,   2.38450601e-01,
         2.64650778e-01,   2.91194969e-01,   3.18122901e-01,
         3.45476407e-01,   3.73299701e-01,   4.01639694e-01,
         4.30546337e-01,   4.60073002e-01,   4.90276921e-01,
         5.21219665e-01,   5.52967699e-01,   5.85593003e-01,
         6.19173786e-01,   6.53795302e-01,   6.89550784e-01,
         7.26542528e-01,   7.64883142e-01,   8.04697006e-01,
         8.46121975e-01,   8.89311374e-01,   9.34436362e-01,
         9.81688718e-01,   1.03128418e+00,   1.08346641e+00,
         1.13851183e+00,   1.19673541e+00,   1.25849780e+00,
         1.32421401e+00,   1.39436424e+00,   1.46950733e+00,
         1.55029770e+00,   1.63750682e+00,   1.73205081e+00,
         1.83502616e+00,   1.94775662e+00,   2.07185542e+00,
         2.20930931e+00,   2.36259336e+00,   2.53483125e+00,
         2.73002271e+00,   2.95337427e+00,   3.21179171e+00,
         3.51463544e+00,   3.87491778e+00,   4.31127708e+00,
         4.85138736e+00,   5.53818992e+00,   6.44210712e+00,
         7.68721487e+00,   9.51436445e+00,   1.24608370e+01,
         1.80190765e+01,   3.24573411e+01,   1.62335989e+02,
        -5.41065205e+01,  -2.31767738e+01,  -1.47354103e+01,
        -1.07917187e+01,  -8.50505855e+00,  -7.01088586e+00,
        -5.95697769e+00,  -5.17290256e+00,  -4.56613958e+00,
        -4.08212426e+00,  -3.68659441e+00,  -3.35695082e+00,
        -3.07768354e+00,  -2.83779394e+00,  -2.62926545e+00,
        -2.44611689e+00,  -2.28379480e+00,  -2.13877135e+00,
        -2.00827073e+00,  -1.89007882e+00,  -1.78240780e+00,
        -1.68379814e+00,  -1.59304642e+00,  -1.50915142e+00,
        -1.43127325e+00,  -1.35870197e+00,  -1.29083333e+00,
        -1.22714971e+00,  -1.16720497e+00,  -1.11061251e+00,
        -1.05703550e+00,  -1.00617904e+00,  -9.57783740e-01,
        -9.11620453e-01,  -8.67485872e-01,  -8.25198908e-01,
        -7.84597640e-01,  -7.45536747e-01,  -7.07885343e-01,
        -6.71525130e-01,  -6.36348824e-01,  -6.02258806e-01,
        -5.69165959e-01,  -5.36988659e-01,  -5.05651907e-01,
        -4.75086564e-01,  -4.45228685e-01,  -4.16018933e-01,
        -3.87402064e-01,  -3.59326465e-01,  -3.31743753e-01,
        -3.04608405e-01,  -2.77877435e-01,  -2.51510096e-01,
        -2.25467616e-01,  -1.99712954e-01,  -1.74210575e-01,
        -1.48926244e-01,  -1.23826835e-01,  -9.88801519e-02,
        -7.40547582e-02,  -4.93198157e-02,  -2.46449301e-02,
        -1.22464680e-16])

plt.plot(X,T, label="tan(x)")
plt.legend()
plt.show()
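The values of T jump from large positive to large negative around -pi/2 and pi/2, and plt.plot joins those neighbouring samples with a near-vertical line. One way to hide the jump, sketched here, is to replace the large values with NaN before plotting, since matplotlib leaves a gap where data is NaN. The cutoff of 10 is an arbitrary choice for illustration.

```python
import numpy as np

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
T = np.tan(X)

# Work on a copy so the original values stay available.
T_masked = T.copy()

# Arbitrary cutoff: anything beyond +/-10 is treated as "near the asymptote".
T_masked[np.abs(T) > 10] = np.nan

print(np.isnan(T_masked).sum(), "samples masked")
```

Passing T_masked instead of T to plt.plot(X, T_masked, label="tan(x)") then draws three separate branches.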

n = 1024
X = np.random.normal(0,1, n)
Y = np.random.normal(0,1, n)
plt.scatter(X,Y)
plt.show()

Example: temperature and rainfall data¶

Download data from http://notes.pipal.in/2017/arcesium-oct-advpython/HYDERABAD-weather.csv

import csv

with open("HYDERABAD-weather.csv") as f:
    data = list(csv.reader(f))

data[:3]

[['', 'city', 'month', 'year', 'maxtemp', 'mintemp', 'rainfall'],
 ['0', 'HYDERABAD', 'January', '1951', '29.0', '14.8', '0.0'],
 ['1', 'HYDERABAD', 'January', '1952', '29.1', '13.6', '0.0']]

data = data[1:] # skip header

tmax = [float(row[4]) for row in data]
tmin = [float(row[5]) for row in data]

plt.scatter(tmin, tmax)

<matplotlib.collections.PathCollection at 0x7f62fc2a72e8>

rainfall = [float(row[6]) for row in data]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-275-5b1992869e33> in <module>()
----> 1 rainfall = [float(row[6]) for row in data]

<ipython-input-275-5b1992869e33> in <listcomp>(.0)
----> 1 rainfall = [float(row[6]) for row in data]

ValueError: could not convert string to float:

def safefloat(value):
    # Convert to float; report values that cannot be parsed and use 0.0 instead.
    try:
        return float(value)
    except ValueError:
        print("bad value: %r"% value)
        return 0.0

rainfall = [safefloat(row[6]) for row in data]

bad value: ''
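Substituting 0.0 means the missing reading is counted as a month with no rain, which lowers any average computed later. A possible alternative, sketched below, is to return NaN and use np.nanmean, which skips NaN entries. The name safefloat_nan and the sample readings are made up for illustration.

```python
import numpy as np

def safefloat_nan(value):
    """Like safefloat, but mark unparseable values as missing (NaN)."""
    try:
        return float(value)
    except ValueError:
        return float("nan")

raw = ["10.0", "", "20.0"]   # made-up readings; '' is a missing value

# Behaviour of the 0.0 substitution, written out inline for comparison.
with_zero = np.array([0.0 if v == "" else float(v) for v in raw])
with_nan = np.array([safefloat_nan(v) for v in raw])

print(with_zero.mean())       # 10.0: the gap is counted as zero rainfall
print(np.nanmean(with_nan))   # 15.0: the gap is skipped
```

In the results further down, the November mean is about 21.97 with the 0.0 substitution and 22.420408 in the pandas groupby output, which is consistent with pandas skipping the missing value instead of counting it as zero.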

plt.scatter(tmax, rainfall)

<matplotlib.collections.PathCollection at 0x7f62fc0c0b00>

n = 10
X = np.arange(n)
Y = np.random.normal(0,100,n)
plt.bar(X,Y)

<Container object of 10 artists>

Problem: Using the above dataset, plot a bar chart of the average rainfall per month.

data[:2]

[['0', 'HYDERABAD', 'January', '1951', '29.0', '14.8', '0.0'],
 ['1', 'HYDERABAD', 'January', '1952', '29.1', '13.6', '0.0']]

data[:10]

[['0', 'HYDERABAD', 'January', '1951', '29.0', '14.8', '0.0'],
 ['1', 'HYDERABAD', 'January', '1952', '29.1', '13.6', '0.0'],
 ['2', 'HYDERABAD', 'January', '1953', '28.6', '14.6', '3.5'],
 ['3', 'HYDERABAD', 'January', '1954', '28.2', '13.9', '0.0'],
 ['4', 'HYDERABAD', 'January', '1955', '28.0', '14.7', '0.0'],
 ['5', 'HYDERABAD', 'January', '1956', '28.1', '14.2', '0.0'],
 ['6', 'HYDERABAD', 'January', '1957', '29.0', '14.5', '0.0'],
 ['7', 'HYDERABAD', 'January', '1958', '28.9', '14.5', '0.0'],
 ['8', 'HYDERABAD', 'January', '1959', '28.7', '15.5', '0.0'],
 ['9', 'HYDERABAD', 'January', '1960', '28.4', '17.0', '0.0']]

months = np.array([row[2] for row in data])

rainfall = np.array([safefloat(row[-1]) for row in data])

bad value: ''

rainfall[months == "January"].mean()

13.177999999999997

import datetime
list_of_months = [datetime.date(2000, i+1, 1).strftime("%B") for i in range(12)]

list_of_months

['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']

def get_mean_rainfall(month):
    return rainfall[months == month].mean()

mean_rainfall = [get_mean_rainfall(m) for m in list_of_months]

mean_rainfall

[13.177999999999997,
 7.9400000000000004,
 15.264000000000001,
 20.23469387755102,
 35.713999999999999,
 103.75399999999999,
 169.86000000000001,
 178.69,
 158.292,
 97.158000000000015,
 21.971999999999998,
 5.9120000000000008]

plt.bar(range(12), mean_rainfall)

<Container object of 12 artists>

x = np.arange(10)

x > 5

array([False, False, False, False, False, False,  True,  True,  True,  True], dtype=bool)

x = np.arange(3)

x

array([0, 1, 2])

x[np.array([True, False, True, True])]

/home/vikrant/usr/local/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 3 but corresponding boolean dimension is 4
  """Entry point for launching an IPython kernel.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-297-9ce7fde9f5fa> in <module>()
----> 1 x[np.array([True, False, True, True])]

IndexError: index 3 is out of bounds for axis 1 with size 3
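The error above comes from a mask with four entries applied to an array with three. Boolean indexing needs one True/False per element, which is why rainfall[months == "January"] works: both arrays were built from the same rows. A small sketch of the matching and the mismatched case:

```python
import numpy as np

x = np.arange(3)

good_mask = np.array([True, False, True])   # same length as x
print(x[good_mask])                         # [0 2]

bad_mask = np.array([True, False, True, True])  # one entry too many
try:
    x[bad_mask]
    mismatch_error = None
except IndexError as exc:
    # The exact message differs between NumPy versions.
    mismatch_error = exc
    print("IndexError:", exc)
```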

Pandas¶

import pandas as pd
import numpy as np
%matplotlib inline

x = pd.Series(range(10))

x

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64

s = pd.Series(np.random.randn(5), index=['a','b','c','d','e'])

s

a    0.543000
b    0.167640
c   -0.155607
d    0.723278
e    0.068404
dtype: float64

d = {'a':0, 'b':1, 'c':2}
s = pd.Series(d)

s

a    0
b    1
c    2
dtype: int64

pd.Series(d, index=['b','c','d','a'])

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
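The label 'd' has no entry in the dictionary, so pandas fills it with NaN, and because NaN is a float the whole Series becomes float64. A short sketch of how the gap can be found and dropped afterwards:

```python
import pandas as pd

d = {'a': 0, 'b': 1, 'c': 2}
s = pd.Series(d, index=['b', 'c', 'd', 'a'])

# isna() marks the labels that had no value in the dictionary.
missing = s.isna()
print(missing)

# Dropping the gap makes it possible to go back to integers.
clean = s.dropna().astype(int)
print(clean)
```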

s = pd.Series(np.random.randn(5), index=['a','b','c','d','e'])

s[0]

-1.0835475405050732

s[:3]

a   -1.083548
b    1.650536
c    0.538694
dtype: float64

s['a']

-1.0835475405050732

s[1:4:2]

b    1.650536
d    0.307735
dtype: float64
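The lookups above mix two styles: s[0] and s[:3] go by position, s['a'] goes by label. Newer pandas versions warn about integer lookups such as s[0] on a Series with a non-integer index, so it may be safer to state the intent with .iloc (position) and .loc (label). The values below are made up so the results are easy to read.

```python
import pandas as pd

# Made-up values in place of the random ones used above.
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0],
              index=['a', 'b', 'c', 'd', 'e'])

first_by_position = s.iloc[0]    # position 0
first_by_label = s.loc['a']      # label 'a'
every_other = s.iloc[1:4:2]      # positions 1 and 3
label_slice = s.loc['a':'c']     # label slices include the end label

print(first_by_position, first_by_label)
print(every_other)
print(label_slice)
```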

s[s > s.median()]

b    1.650536
c    0.538694
dtype: float64

s.mean()

0.2347597680155867

np.exp(s)

a    0.338393
b    5.209773
c    1.713767
d    1.360340
e    0.786928
dtype: float64

'e' in s

True

s['a']

-1.0835475405050732

'z' in s

False

s + s

a   -2.167095
b    3.301073
c    1.077388
d    0.615470
e   -0.479237
dtype: float64

s * s

a    1.174075
b    2.724270
c    0.290191
d    0.094701
e    0.057417
dtype: float64

data = [["A",1], ["B", 2], ["c",3], ["D",4]]

pd.DataFrame(data)

d = {"one":[1. , 2. , 3., 4.],
     "two":[4. ,3., 2. , 1.]
    }
df = pd.DataFrame(d, index=['a','b','c','d'])

df

df['one']

a    1.0
b    2.0
c    3.0
d    4.0
Name: one, dtype: float64

df['one']['a']

1.0

df.columns

Index(['one', 'two'], dtype='object')

df.columns = ["column1", "column2"]

df

df2 = df.set_index("column2")

df2

df2['column1'][4.0]

1.0
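df2['column1'][4.0] first selects a column and then a row from the resulting Series. The same cell can be reached in a single step with .loc[row, column], which is usually the recommended form, especially when assigning values. A sketch that rebuilds the frame from above:

```python
import pandas as pd

df = pd.DataFrame({"column1": [1., 2., 3., 4.],
                   "column2": [4., 3., 2., 1.]},
                  index=['a', 'b', 'c', 'd'])
df2 = df.set_index("column2")

chained = df2['column1'][4.0]       # column first, then row label
direct = df2.loc[4.0, 'column1']    # row label and column in one lookup

print(chained, direct)
```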

df.to_csv("df.csv")

!cat df.csv

,column1,column2
a,1.0,4.0
b,2.0,3.0
c,3.0,2.0
d,4.0,1.0
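The first header field in df.csv is empty because the index has no name. When reading the file back, index_col=0 tells pandas to use that first column as the index again. The sketch below reads the same text from a string via io.StringIO so that it does not depend on the file being present.

```python
import io
import pandas as pd

# Same content as df.csv above.
csv_text = """,column1,column2
a,1.0,4.0
b,2.0,3.0
c,3.0,2.0
d,4.0,1.0
"""

# index_col=0 turns the unnamed first column back into the index.
df_back = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_back)
```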

Playing with weather dataset¶

df = pd.read_csv("HYDERABAD-weather.csv", index_col=0)

df

df.head()

df.tail()

df.plot("maxtemp", "mintemp", kind="scatter")

<matplotlib.axes._subplots.AxesSubplot at 0x7f62ebf8e7f0>

mean = df.groupby("year").mean()

mean

mean.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x7f62eb7d66d8>
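The frame also contains the text columns city and month. The pandas version used in these notes leaves them out of the mean silently, but newer releases may raise a TypeError instead. Two ways to make the intent explicit are sketched below on a few made-up rows: select the numeric columns before aggregating, or pass numeric_only=True.

```python
import pandas as pd

# Made-up rows with the same kind of columns as the weather data.
df = pd.DataFrame({
    "city": ["HYDERABAD"] * 4,
    "month": ["January", "February", "January", "February"],
    "year": [1951, 1951, 1952, 1952],
    "maxtemp": [29.0, 31.0, 29.1, 32.1],
    "rainfall": [0.0, 2.0, 0.0, 4.0],
})

# Option 1: pick the numeric columns explicitly.
by_year = df.groupby("year")[["maxtemp", "rainfall"]].mean()

# Option 2: ask mean() to ignore non-numeric columns.
by_year_numeric = df.groupby("year").mean(numeric_only=True)

print(by_year)
```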

bymonth = df.groupby("month").mean()

bymonth

del bymonth['year']

bymonth

bymonth.index

Index(['April', 'August', 'December', 'February', 'January', 'July', 'June',
       'March', 'May', 'November', 'October', 'September'],
      dtype='object', name='month')

newindex = [list_of_months.index(month) for month in bymonth.index]

newindex

[3, 7, 11, 1, 0, 6, 5, 2, 4, 10, 9, 8]

bymonth['m'] = newindex

bymonth

bymonth2 = bymonth.set_index("m")

bymonth2

bymonth

bymonth2.sort_index().plot()

<matplotlib.axes._subplots.AxesSubplot at 0x7f62eb755630>
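The extra column m is needed because groupby sorts the month names alphabetically. A shorter route, sketched here, is to reindex the result with the month names in calendar order, which also keeps the names as row labels. The rainfall numbers are the monthly means from the bymonth output; building the month list with calendar.month_name is an assumption and relies on an English locale, like strftime("%B") above.

```python
import calendar
import pandas as pd

# 'January' ... 'December' (entry 0 of month_name is an empty string).
list_of_months = list(calendar.month_name)[1:]

# Monthly rainfall means in alphabetical order, as groupby returns them.
bymonth = pd.DataFrame(
    {"rainfall": [20.234694, 178.69, 5.912, 7.94, 13.178, 169.86,
                  103.754, 15.264, 35.714, 22.420408, 97.158, 158.292]},
    index=pd.Index(sorted(list_of_months), name="month"),
)

# Put the rows into calendar order.
ordered = bymonth.reindex(list_of_months)
print(ordered)
```

Calling ordered.plot() should then show the months in calendar order along the x axis.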

Output of mean (df.groupby("year").mean()):

year	maxtemp	mintemp	rainfall
1951	32.666667	20.233333	58.975000
1952	31.975000	19.891667	46.741667
1953	32.183333	20.266667	74.245455
1954	31.525000	19.875000	70.366667
1955	30.883333	19.725000	92.775000
1956	30.783333	19.791667	64.941667
1957	31.533333	20.016667	66.783333
1958	31.733333	20.475000	76.216667
1959	31.900000	20.358333	64.825000
1960	31.841667	20.416667	57.775000
1961	31.258333	20.225000	68.400000
1962	30.418182	19.490909	107.118182
1963	31.133333	19.308333	69.125000
1964	32.150000	19.658333	58.400000
1965	32.600000	19.541667	67.441667
1966	32.666667	20.550000	55.358333
1967	32.625000	19.483333	69.383333
1968	32.416667	19.200000	53.250000
1969	32.575000	20.408333	53.100000
1970	32.041667	19.825000	95.566667
1971	31.975000	20.008333	55.433333
1972	32.633333	21.008333	42.958333
1973	32.425000	21.200000	73.183333
1974	32.175000	20.016667	56.283333
1975	31.266667	20.258333	115.291667
1976	32.050000	20.441667	66.075000
1977	32.258333	20.775000	45.300000
1978	31.575000	21.108333	93.116667
1979	32.400000	21.616667	58.650000
1980	32.816667	21.425000	49.325000
1981	32.108333	20.750000	82.750000
1982	32.241667	21.133333	63.891667
1983	32.541667	21.000000	110.025000
1984	32.483333	21.033333	64.083333
1985	32.875000	20.975000	31.116667
1986	32.800000	21.441667	51.775000
1987	32.433333	21.200000	80.250000
1988	32.525000	21.400000	76.458333
1989	32.325000	20.825000	83.883333
1990	31.541667	20.991667	76.666667
1991	32.450000	21.416667	64.200000
1992	32.683333	20.650000	63.716667
1993	32.733333	20.516667	60.458333
1994	32.225000	20.516667	68.325000
1995	32.183333	20.916667	101.991667
1996	32.633333	20.958333	80.958333
1997	32.616667	21.025000	63.750000
1998	33.125000	21.683333	78.516667
1999	32.608333	20.341667	47.008333
2000	32.583333	20.391667	87.066667

Output of bymonth (df.groupby("month").mean()):

month	year	maxtemp	mintemp	rainfall
April	1975.77551	37.863265	24.273469	20.234694
August	1975.50000	29.786000	22.086000	178.690000
December	1975.50000	28.004000	14.526000	5.912000
February	1975.50000	31.932000	17.556000	7.940000
January	1975.50000	28.760000	15.214000	13.178000
July	1975.50000	30.754000	22.560000	169.860000
June	1975.50000	34.528000	23.976000	103.754000
March	1975.50000	35.444000	20.798000	15.264000
May	1975.50000	38.996000	26.160000	35.714000
November	1975.50000	29.016000	16.862000	22.420408
October	1975.50000	30.582000	20.306000	97.158000
September	1975.50000	30.452000	21.962000	158.292000

Output of bymonth after del bymonth['year']:

month	maxtemp	mintemp	rainfall
April	37.863265	24.273469	20.234694
August	29.786000	22.086000	178.690000
December	28.004000	14.526000	5.912000
February	31.932000	17.556000	7.940000
January	28.760000	15.214000	13.178000
July	30.754000	22.560000	169.860000
June	34.528000	23.976000	103.754000
March	35.444000	20.798000	15.264000
May	38.996000	26.160000	35.714000
November	29.016000	16.862000	22.420408
October	30.582000	20.306000	97.158000
September	30.452000	21.962000	158.292000

Output of bymonth after adding column m:

month	maxtemp	mintemp	rainfall	m
April	37.863265	24.273469	20.234694	3
August	29.786000	22.086000	178.690000	7
December	28.004000	14.526000	5.912000	11
February	31.932000	17.556000	7.940000	1
January	28.760000	15.214000	13.178000	0
July	30.754000	22.560000	169.860000	6
June	34.528000	23.976000	103.754000	5
March	35.444000	20.798000	15.264000	2
May	38.996000	26.160000	35.714000	4
November	29.016000	16.862000	22.420408	10
October	30.582000	20.306000	97.158000	9
September	30.452000	21.962000	158.292000	8

Output of bymonth2:

m	maxtemp	mintemp	rainfall
3	37.863265	24.273469	20.234694
7	29.786000	22.086000	178.690000
11	28.004000	14.526000	5.912000
1	31.932000	17.556000	7.940000
0	28.760000	15.214000	13.178000
6	30.754000	22.560000	169.860000
5	34.528000	23.976000	103.754000
2	35.444000	20.798000	15.264000
4	38.996000	26.160000	35.714000
10	29.016000	16.862000	22.420408
9	30.582000	20.306000	97.158000
8	30.452000	21.962000	158.292000

Output of df (truncated display):

	city	month	year	maxtemp	mintemp	rainfall
0	HYDERABAD	January	1951	29.0	14.8	0.0
1	HYDERABAD	January	1952	29.1	13.6	0.0
2	HYDERABAD	January	1953	28.6	14.6	3.5
3	HYDERABAD	January	1954	28.2	13.9	0.0
4	HYDERABAD	January	1955	28.0	14.7	0.0
5	HYDERABAD	January	1956	28.1	14.2	0.0
6	HYDERABAD	January	1957	29.0	14.5	0.0
7	HYDERABAD	January	1958	28.9	14.5	0.0
8	HYDERABAD	January	1959	28.7	15.5	0.0
9	HYDERABAD	January	1960	28.4	17.0	0.0
10	HYDERABAD	January	1961	28.4	15.6	0.4
11	HYDERABAD	January	1962	27.5	12.7	0.0
12	HYDERABAD	January	1963	26.7	13.2	0.0
13	HYDERABAD	January	1964	29.9	14.4	0.0
14	HYDERABAD	January	1965	28.3	14.2	1.0
15	HYDERABAD	January	1966	28.8	16.5	3.9
16	HYDERABAD	January	1967	29.2	14.6	0.0
17	HYDERABAD	January	1968	28.3	13.3	7.8
18	HYDERABAD	January	1969	29.3	14.1	7.3
19	HYDERABAD	January	1970	28.9	15.2	5.6
20	HYDERABAD	January	1971	28.8	15.0	2.4
21	HYDERABAD	January	1972	28.1	13.5	0.0
22	HYDERABAD	January	1973	30.6	16.1	0.0
23	HYDERABAD	January	1974	29.1	13.4	0.0
24	HYDERABAD	January	1975	27.5	14.1	50.9
25	HYDERABAD	January	1976	26.5	13.2	0.0
26	HYDERABAD	January	1977	29.1	14.0	0.0
27	HYDERABAD	January	1978	28.4	16.5	5.4
28	HYDERABAD	January	1979	28.9	17.3	0.0
29	HYDERABAD	January	1980	29.7	16.8	0.0
...	...	...	...	...	...	...
569	HYDERABAD	December	1971	26.9	12.5	0.0
570	HYDERABAD	December	1972	28.2	16.9	3.0
571	HYDERABAD	December	1973	27.2	14.8	0.3
572	HYDERABAD	December	1974	26.9	12.4	0.0
573	HYDERABAD	December	1975	26.6	11.6	0.0
574	HYDERABAD	December	1976	28.5	15.5	0.0
575	HYDERABAD	December	1977	28.0	13.8	1.5
576	HYDERABAD	December	1978	27.6	16.5	0.0
577	HYDERABAD	December	1979	28.2	16.4	0.0
578	HYDERABAD	December	1980	28.7	15.6	3.7
579	HYDERABAD	December	1981	27.6	15.7	0.0
580	HYDERABAD	December	1982	28.3	15.0	0.0
581	HYDERABAD	December	1983	26.9	15.5	12.3
582	HYDERABAD	December	1984	29.8	15.5	0.0
583	HYDERABAD	December	1985	29.4	15.9	4.7
584	HYDERABAD	December	1986	28.8	17.1	15.5
585	HYDERABAD	December	1987	27.7	16.5	2.8
586	HYDERABAD	December	1988	27.8	15.4	13.3
587	HYDERABAD	December	1989	27.5	15.8	1.6
588	HYDERABAD	December	1990	27.9	16.8	0.0
589	HYDERABAD	December	1991	28.1	14.9	0.3
590	HYDERABAD	December	1992	27.1	13.8	0.0
591	HYDERABAD	December	1993	27.1	13.2	34.9
592	HYDERABAD	December	1994	27.9	12.0	0.0
593	HYDERABAD	December	1995	28.9	15.9	0.0
594	HYDERABAD	December	1996	28.3	14.9	0.0
595	HYDERABAD	December	1997	28.7	19.2	40.6
596	HYDERABAD	December	1998	28.7	12.8	0.0
597	HYDERABAD	December	1999	29.0	14.2	0.0
598	HYDERABAD	December	2000	29.6	13.3	1.0