Advanced Python Training at Arcesium - Day 2

Sep 25-27, 2019 Vikrant Patil

These notes are available online at http://notes.pipal.in/2019/arcesium_advanced_sep/day2.html

We will be using Python 3 from the Anaconda distribution for this training. You can download it from

https://www.anaconda.com/download/

Understanding Iterations

for i in [1, 2, 3, 4]:
    print(i)

1
2
3
4

for i in range(3):
    print(i)

0
1
2

for key in {"one":1, "two":2}:
    print(key)

one
two

The iteration protocol

items = [1, 2, 3, 4, 5]

itr_items = iter(items)

itr_items

<list_iterator at 0x7f442b929320>

next(itr_items)

1

next(itr_items)

2

next(itr_items)

3

next(itr_items)

4

next(itr_items)

5

next(itr_items)

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-13-8f3d308df4eb> in <module>
----> 1 next(itr_items)

StopIteration:
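A `for` loop is built on exactly this protocol: it calls `iter()` once, then calls `next()` until `StopIteration` is raised, and it swallows that exception quietly. As a rough sketch (the helper `manual_for` is an illustrative name, not a built-in), the loop machinery behaves like this:

```python
def manual_for(iterable, action):
    """Illustrative helper: roughly what `for x in iterable: action(x)` does."""
    it = iter(iterable)           # ask the iterable for a fresh iterator
    while True:
        try:
            x = next(it)          # fetch the next item
        except StopIteration:     # exhausted: end the loop quietly
            break
        action(x)                 # the body of the loop

seen = []
manual_for([1, 2, 3, 4, 5], seen.append)
print(seen)   # [1, 2, 3, 4, 5]
```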

Generators

def squares(numbers):
    for n in numbers:
        yield n*n

squares

<function __main__.squares(numbers)>

sqr = squares(range(5))

sqr

<generator object squares at 0x7f442bd50f68>

for s in sqr:
    print(s)

0
1
4
9
16

for s in sqr:
    print(s)   # prints nothing: sqr was exhausted by the previous loop

def squares(numbers):
    print("Begin squares")
    for n in numbers:
        print("Computing square of", n)
        yield n*n
        print("Back to squares")
    print("Finished squares")

sq4 = squares(range(1, 5))

sq4

<generator object squares at 0x7f4448017e60>

next(sq4)

Begin squares
Computing square of 1

1

next(sq4)

Back to squares
Computing square of 2

4

next(sq4)

Back to squares
Computing square of 3

9

next(sq4)

Back to squares
Computing square of 4

16

next(sq4)

Back to squares
Finished squares

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-28-e55b3354c5d2> in <module>
----> 1 next(sq4)

StopIteration:

Problems

Write a generator countdown(n) that counts down from n to 1.

>>> for i in countdown(4):
...     print(i, end=",")
4,3,2,1,

Is it possible to know the length of a sequence generated by a generator object?
Can we write a generator ones that generates an infinite sequence of ones?
How do we work with infinite sequences?
Write a term generator for yesterday's pi series (piseries). How can we use this generator to compute the sum?
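One possible answer to the first problem, as a sketch (not the only way to write it): keep a counter in the generator's local state and yield it until it reaches zero.

```python
def countdown(n):
    """Yield n, n-1, ..., 1, one value per next() call."""
    while n > 0:
        yield n
        n -= 1

for i in countdown(4):
    print(i, end=",")   # 4,3,2,1,
```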

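def foo(n):
    # n is not used: this generator always stops after yielding 4
    i = 1
    while True:
        print("Before yield")
        yield i
        print("after yield")

        if i == 4:
            return i + 1   # the return value ends up in StopIteration.value
        i += 1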

for i in foo(50):
    print(i, "="*5)

Before yield
1 =====
after yield
Before yield
2 =====
after yield
Before yield
3 =====
after yield
Before yield
4 =====
after yield

def ones():
    while True:
        yield 1

def take(seq, n):
    return [next(seq) for i in range(n)]

infinite = ones()

take(infinite, 10)

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
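The standard library already has tools for infinite streams. A sketch using `itertools`: `itertools.repeat(1)` plays the role of `ones()`, and `itertools.islice` gives a drop-in version of `take` that also accepts lists and simply returns fewer items when the input is shorter than `n` (the `take` above would let a `StopIteration` escape in that case).

```python
from itertools import islice, repeat

def take(seq, n):
    """Return the first n items of any iterable as a list (fewer if it runs out)."""
    return list(islice(seq, n))

infinite = repeat(1)          # itertools' infinite stream of ones
print(take(infinite, 10))     # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(take([1, 2], 5))        # [1, 2]
```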

f = foo(4)

x = next(f)

Before yield

x

1

x=next(f)

after yield
Before yield

x

2

x = next(f)

after yield
Before yield

x

3

x = next(f)

after yield
Before yield

x

4

x = next(f)

after yield

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-61-dc64844b6993> in <module>
----> 1 x = next(f)

StopIteration: 5

x

4

f = foo(3)

next(f)

Before yield

1

next(f)

after yield
Before yield

2

next(f)

after yield
Before yield

3

f()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-68-c43e34e6d405> in <module>
----> 1 f()

TypeError: 'generator' object is not callable

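try:
    next(f)   # by now f has yielded 4, so this call reaches `return i+1`
except StopIteration as s:
    print(s.value)   # the generator's return value is carried by the exception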

after yield
5
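A generator's `return` value is not lost: it rides along in `StopIteration.value`, as the `5` above shows, although a plain `for` loop discards it. The sketch below collects it programmatically and also shows `yield from`, which hands the inner generator's return value back to the outer generator. `counter_with_return`, `drain` and `wrapper` are illustrative names, not part of the training code.

```python
def counter_with_return():
    """Same shape as foo() above, without the prints or the unused parameter."""
    i = 1
    while True:
        yield i
        if i == 4:
            return i + 1          # delivered to the caller as StopIteration(5)
        i += 1

def drain(gen):
    """Illustrative helper: collect every yielded value and the return value."""
    values = []
    while True:
        try:
            values.append(next(gen))
        except StopIteration as stop:
            return values, stop.value   # .value holds what `return` produced

def wrapper():
    # `yield from` re-yields 1..4 and evaluates to the inner return value
    result = yield from counter_with_return()
    yield result

print(drain(counter_with_return()))   # ([1, 2, 3, 4], 5)
print(list(wrapper()))                # [1, 2, 3, 4, 5]
```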

def piseries():
    n = 1
    while True:
        yield 8/((4*n-3)*(4*n-1))
        n += 1
        
def pi(n):
    series = piseries()
    return sum(take(series, n))

pi(10000)

3.1415426535898203
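Why `piseries` sums to π: each term splits into partial fractions, and the resulting alternating sum is 4 times the Leibniz series for π/4.

```latex
\frac{8}{(4n-3)(4n-1)} = \frac{4}{4n-3} - \frac{4}{4n-1}
\quad\Longrightarrow\quad
\sum_{n=1}^{\infty} \frac{8}{(4n-3)(4n-1)}
  = 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots\right)
  = 4 \cdot \frac{\pi}{4} = \pi
```

Convergence is slow: the terms behave roughly like 1/(2n²), so the tail after N terms is roughly 1/(2N). For N = 10000 that is about 5e-5, which matches the gap between `pi(10000)` above and π.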

len(piseries())

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-76-d5cbcacc3e4a> in <module>
----> 1 len(piseries())

TypeError: object of type 'generator' has no len()

take(piseries(), 10)

[2.6666666666666665,
 0.22857142857142856,
 0.08080808080808081,
 0.041025641025641026,
 0.02476780185758514,
 0.016563146997929608,
 0.011851851851851851,
 0.008898776418242492,
 0.006926406926406926,
 0.005544005544005544]

Building a data pipeline

import os

def find(root):
    # yield the path of every file below root, one at a time
    for path, dirnames, filenames in os.walk(root):
        for f in filenames:
            yield os.path.join(path, f)

def grep(pattern , seq):
    return (x for x in seq if pattern in x)

files = find("/home/vikrant/trainings")
pyfiles = grep(".py", files)
print(take(pyfiles, 5))

['/home/vikrant/trainings/2018/vmware-pune-jan-python/echo.py', '/home/vikrant/trainings/2018/vmware-pune-jan-python/module1.py', '/home/vikrant/trainings/2018/vmware-pune-jan-python/yes.py', '/home/vikrant/trainings/2018/vmware-pune-jan-python/wc.py', '/home/vikrant/trainings/2018/vmware-pune-jan-python/hello2.py']
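Each stage in the pipeline is lazy: no file is listed until someone asks for a result. The in-memory sketch below makes that visible; `source` and `log` are illustrative names standing in for `find()`, and `grep` is repeated from above so the block runs on its own.

```python
log = []

def source(items):
    """Illustrative stand-in for find(): logs each item as it is produced."""
    for item in items:
        log.append("produced " + item)
        yield item

def grep(pattern, seq):
    # same as grep() above
    return (x for x in seq if pattern in x)

names = ["a.py", "b.txt", "c.py"]
pipeline = grep(".py", source(names))   # builds the pipeline; nothing runs yet
print(log)                              # []
first = next(pipeline)                  # pulls items only until one matches
print(first, log)                       # a.py ['produced a.py']
```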

x = (i*i for i in range(5))

x

<generator object <genexpr> at 0x7f442be0e0f8>

next(x)

0

next(x)

1

for i in x:
    print(i)

4
9
16

def count(seq):
    return sum(1 for i in seq)  
    
def count(seq):
    i = 0
    for x in seq:
        i += 1
    return i

def count(seq):
    return sum((1 for i in seq))

files = find("/home/vikrant/trainings")
pyfiles = grep(".py", files)
print(count(pyfiles))

847
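`count` has to consume its input to count it, so an iterator can be counted only once. That is why the cells above rebuild `files` and `pyfiles` before every count. A small in-memory illustration:

```python
def count(seq):
    return sum(1 for _ in seq)

items = iter(["a.py", "b.py", "c.py"])
print(count(items))   # 3
print(count(items))   # 0, the first count used the iterator up
```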

def readlines(filenames):
    for file in filenames:
        with open(file) as f:
            for line in f:
                yield line

f = open("day1.html")

f.close()

files = find(".")
pyfiles = grep(".", files)   # nearly every file matches, binary files included
lines = readlines(pyfiles)
funcs = grep("def", lines)
count(funcs)

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-132-b01479fda144> in <module>
      3 lines = readlines(pyfiles)
      4 funcs = grep("def", lines)
----> 5 count(funcs)

<ipython-input-93-b1a97f114a99> in count(seq)
      9 
     10 def count(seq):
---> 11     return sum((1 for i in seq))

<ipython-input-93-b1a97f114a99> in <genexpr>(.0)
      9 
     10 def count(seq):
---> 11     return sum((1 for i in seq))

<ipython-input-81-da1dc4d4db3e> in <genexpr>(.0)
      1 def grep(pattern , seq):
----> 2     return (x for x in seq if pattern in x)

<ipython-input-129-a0ccabbf27c8> in readlines(filenames)
      2     for file in filenames:
      3         with open(file) as f:
----> 4             for line in f:
      5                 yield line

~/anaconda3/envs/vis/lib/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfb in position 0: invalid start byte

def antigrep(pattern,  seq):
    return (x for x in seq if pattern not in x)

files = find("/home/vikrant/trainings/")
pyfiles = grep(".py", files)
pyfiles = antigrep(".pyc", pyfiles)
pyfiles = antigrep("~", pyfiles)
#print(take(pyfiles, 50))
lines = readlines(pyfiles)
funcs = grep("def", lines)
count(funcs)

870
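The `".py"` substring test also matches `.pyc` files and editor backups ending in `~`, which is why two `antigrep` stages were needed. A hedged alternative sketch: filter on the suffix with `str.endswith`, and open files with an explicit encoding and `errors="replace"` so a stray non-UTF-8 file cannot crash the whole pipeline. The names `with_suffix` and `readlines_lenient` are ours, the code assumes the files are meant to be UTF-8, and the trade-off is that undecodable bytes are silently replaced rather than reported.

```python
import os

def find(root):
    # same as find() above: yield the path of every file below root
    for path, dirnames, filenames in os.walk(root):
        for f in filenames:
            yield os.path.join(path, f)

def with_suffix(suffix, seq):
    """Keep only names that end with suffix, so ".py" skips "x.pyc" and "x.py~"."""
    return (x for x in seq if x.endswith(suffix))

def readlines_lenient(filenames):
    """Like readlines() above, but never raises UnicodeDecodeError."""
    for name in filenames:
        # errors="replace" turns undecodable bytes into U+FFFD instead of raising
        with open(name, encoding="utf-8", errors="replace") as f:
            yield from f
```

Usage would look like `lines = readlines_lenient(with_suffix(".py", find(".")))`, followed by the same `grep("def", lines)` stage as before.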

Problem

Write a function get_paragraphs that splits a text into paragraphs. The function should take a sequence of lines and return a sequence of paragraphs. An empty line marks the end of the previous paragraph.

How many paragraphs are there?
Which is the longest paragraph?

https://ia802902.us.archive.org/4/items/prideandprejudic01342gut/pandp12.txt

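import requests

def wget(url, filename):
    """Download url and save the response body as text in filename."""
    resp = requests.get(url)
    resp.raise_for_status()   # fail loudly on 404/500 instead of saving an error page
    with open(filename, "w") as f:
        f.write(resp.text)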

url="https://ia802902.us.archive.org/4/items/prideandprejudic01342gut/pandp12.txt"
wget(url, "pandp.txt")

!tail pandp.txt

def get_paragraphs(lines):
    paragraph = []
    for line in lines:
        if line.strip() !="":
            paragraph.append(line.strip())
        elif paragraph:
            yield "\n".join(paragraph)
            paragraph = []
    if paragraph:
        yield "\n".join(paragraph)
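For comparison, here is a sketch of the same idea using `itertools.groupby` from the standard library (a swapped-in technique, not the version developed in class; `get_paragraphs_groupby` is a hypothetical name). It groups consecutive lines by whether they are blank and keeps only the non-blank runs; like `get_paragraphs`, it strips each line and joins a paragraph's lines with a newline.

```python
from itertools import groupby

def get_paragraphs_groupby(lines):
    """Alternative sketch: group consecutive lines by "is this line non-blank?"."""
    for is_text, group in groupby(lines, key=lambda line: line.strip() != ""):
        if is_text:                       # skip the runs of blank lines
            yield "\n".join(line.strip() for line in group)

text = ["It is a truth\n", "universally acknowledged.\n", "\n", "\n", "Chapter 2\n"]
print(list(get_paragraphs_groupby(text)))
# ['It is a truth\nuniversally acknowledged.', 'Chapter 2']
```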

lines = readlines(["pandp.txt"])

paras = get_paragraphs(lines)
count(paras)

2202

lines = readlines(["pandp.txt"])

paras = get_paragraphs(lines)
max(paras, key=len)

'"By this time, my dearest sister, you have received my hurried\nletter; I wish this may be more intelligible, but though not\nconfined for time, my head is so bewildered that I cannot answer\nfor being coherent.  Dearest Lizzy, I hardly know what I would\nwrite, but I have bad news for you, and it cannot be delayed.\nImprudent as the marriage between Mr. Wickham and our poor\nLydia would be, we are now anxious to be assured it has taken\nplace, for there is but too much reason to fear they are not gone\nto Scotland.  Colonel Forster came yesterday, having left\nBrighton the day before, not many hours after the express.\nThough Lydia\'s short letter to Mrs. F. gave them to understand\nthat they were going to Gretna Green, something was dropped\nby Denny expressing his belief that W. never intended to go\nthere, or to marry Lydia at all, which was repeated to Colonel\nF., who, instantly taking the alarm, set off from B. intending to\ntrace their route.  He did trace them easily to Clapham, but no\nfurther; for on entering that place, they removed into a hackney\ncoach, and dismissed the chaise that brought them from Epsom.\nAll that is known after this is, that they were seen to continue\nthe London road.  I know not what to think.  After making every\npossible inquiry on that side London, Colonel F. came on into\nHertfordshire, anxiously renewing them at all the turnpikes, and\nat the inns in Barnet and Hatfield, but without any success--no\nsuch people had been seen to pass through.  With the kindest\nconcern he came on to Longbourn, and broke his apprehensions\nto us in a manner most creditable to his heart.  I am sincerely\ngrieved for him and Mrs. F., but no one can throw any blame\non them.  Our distress, my dear Lizzy, is very great.  
My father\nand mother believe the worst, but I cannot think so ill of him.\nMany circumstances might make it more eligible for them to be\nmarried privately in town than to pursue their first plan;\nand even if _he_ could form such a design against a young woman\nof Lydia\'s connections, which is not likely, can I suppose her\nso lost to everything?  Impossible!  I grieve to find, however,\nthat Colonel F. is not disposed to depend upon their marriage;\nhe shook his head when I expressed my hopes, and said he feared\nW. was not a man to be trusted.  My poor mother is really ill,\nand keeps her room.  Could she exert herself, it would be better;\nbut this is not to be expected.  And as to my father, I never in\nmy life saw him so affected.  Poor Kitty has anger for having\nconcealed their attachment; but as it was a matter of confidence,\none cannot wonder.  I am truly glad, dearest Lizzy, that you\nhave been spared something of these distressing scenes; but\nnow, as the first shock is over, shall I own that I long for\nyour return?  I am not so selfish, however, as to press for it,\nif inconvenient.  Adieu!  I take up my pen again to do what I\nhave just told you I would not; but circumstances are such that\nI cannot help earnestly begging you all to come here as soon as\npossible.  I know my dear uncle and aunt so well, that I am not\nafraid of requesting it, though I have still something more to\nask of the former.  My father is going to London with Colonel\nForster instantly, to try to discover her.  What he means to do\nI am sure I know not; but his excessive distress will not allow\nhim to pursue any measure in the best and safest way, and\nColonel Forster is obliged to be at Brighton again to-morrow\nevening.  In such an exigence, my uncle\'s advice and assistance\nwould be everything in the world; he will immediately comprehend\nwhat I must feel, and I rely upon his goodness."'

'Please read the "legal small print," and other information about the\neBook and Project Gutenberg at the bottom of this file.  Included is\nimportant information about your specific rights and restrictions in\nhow the file may be used.  You can also find out about how to make a\ndonation to Project Gutenberg, and how to get involved.'

!wc pandp.txt

 14583 123882 717331 pandp.txt

NumPy

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])

a

array([1, 2, 3, 4, 5, 6, 7, 8])

a.shape

(8,)

a.ndim

1

a100 = np.arange(100).reshape(10, 10)

a100

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

a100.shape

(10, 10)

a100.ndim

2

a100[0]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

a100[0][0]

0

a100[:,0]

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

a100[0,:]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

a100[1,:]

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
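Indexing with `a100[0][0]` and `a100[0, 0]` gives the same element, but the comma form does it in a single indexing step instead of first building the row. An integer index drops that axis, while a slice of length one keeps it. A short sketch on a fresh array (`grid` is just a local name):

```python
import numpy as np

grid = np.arange(100).reshape(10, 10)

# grid[i][j] first builds row i, then indexes it; grid[i, j] does it in one step
assert grid[3][7] == grid[3, 7] == 37

row = grid[1, :]      # second row:    [10, 11, ..., 19]
col = grid[:, 1]      # second column: [1, 11, ..., 91]
block = grid[:, 1:2]  # same values as col, but still 2-D
print(row.shape, col.shape, block.shape)   # (10,) (10,) (10, 1)
```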

np.zeros(100).reshape(5,20)

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.]])

np.zeros_like(a100)

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

np.ones_like(a100)

array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

np.asarray([1,2,3,4,5,6,7])

array([1, 2, 3, 4, 5, 6, 7])

np.asarray(np.zeros(20).reshape(5,4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
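`np.asarray` only converts when it has to: given something that is already an ndarray of the requested dtype, it returns the very same object, whereas `np.array` makes a copy by default. A quick check:

```python
import numpy as np

z = np.zeros(20).reshape(5, 4)

same = np.asarray(z)   # already an ndarray of the right dtype: no copy made
fresh = np.array(z)    # np.array copies by default

print(same is z)       # True
print(fresh is z)      # False
```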

empty = np.empty(1000).reshape(10,10,10)

empty.ndim

3

empty.shape

(10, 10, 10)

np.empty_like(a100)

array([[     94617717583296,      94617681087424,                 747,
                         -1,      94617681101796,                   0,
        2314885530449486370, 8463501003136704544, 7863396448688172658,
        8511923646844464745],
       [8028899027463793761, 8674250015740290926, 2314885435961013097,
        8241956891972870176,  753071342771072353, 2314885530818453536,
        3255307777713450285, 2314885530817015085, 6926647712756539424,
        8027138984375840097],
       [7308339910637985906, 8242558622326875508, 3347140372856074345,
        2314885530818447882, 7800551649446141984, 2314885530817032051,
        3255307777712594976, 2314885530450210093, 7956010480644923424,
        7021800531646357548],
       [2308669228312243833, 5629534856563138592, 2314885437726225519,
        3255307721659916320, 2314885530817015085, 2338328528344326176,
        8295742012915741545, 6926663087888362849, 3348814942099825774,
        8458358425123711341],
       [7958552634295722100, 8386104240768098419, 7665811700586146162,
        2314885530454877029, 8243109553820934176, 8246760951645216869,
        2332986006060300641, 7309940760864190327, 2336912048571708788,
        3348814942099825774],
       [2314885530818447882, 7813865618884861984, 2314885530817033061,
        3255307777712594976, 2314885530450210093, 2323362894317625376,
        3346295663229739128, 7937726737527628141, 7306930285241773680,
        8315177829594706216],
       [3184933515446739304,  754388376050280756, 2314885530818453536,
        6568632450806997357, 3251634305119559771, 2318283077582859313,
        2314861394126908704, 2314885530818453536, 2331492554444382240,
        3185501927436465197],
       [3251634305221074976, 2314885530450156855, 2314885530818453536,
        4047927215829032992, 3251647555043336236, 6715202589269176369,
        2314885530817014109, 8655986920857280544, 2308703012608634158,
        3251634253311516704],
       [2314885530817016113, 8655986920857280544,  732169364633644334,
        2314885530818453536, 6568632450806997357, 3251634305254629467,
        2318280895957576761, 2308703239953002797, 4476613351956291616,
        7956010263077994046],
       [2314885530449948968, 8247323938940526624, 3687639246629795945,
        2314885530817014877, 2314885530818453536, 6716886986237419552,
        2314885530818447916, 2314885530818453536, 6727587505645494304,
             94613835221545]])
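The numbers printed by `np.empty_like(a100)` are whatever bytes happened to be in that memory, and they will differ from run to run: `np.empty` allocates without initialising. It is only safe when every element is written before it is read. A sketch of the two usual patterns:

```python
import numpy as np

buf = np.empty((3, 4))     # memory is allocated but NOT initialised
buf.fill(7.0)              # so write every element before reading any of them

filled = np.full((3, 4), 7.0)        # or allocate and fill in one call
print(np.array_equal(buf, filled))   # True
```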

a100[:5, :5]

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

a100[5:,5:]

array([[55, 56, 57, 58, 59],
       [65, 66, 67, 68, 69],
       [75, 76, 77, 78, 79],
       [85, 86, 87, 88, 89],
       [95, 96, 97, 98, 99]])

subview = a100[:5,:5]

subview

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

subview[0,0] = -1

a100

array([[-1,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

scopy = subview.copy()

scopy[0,0]=0

subview

array([[-1,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

scopy

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
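Basic slicing returns a view onto the same buffer, which is why writing `-1` into `subview` changed `a100`, while `.copy()` (and boolean or integer-array indexing) produces independent data. `np.shares_memory` makes the relationship explicit; a sketch on a fresh array:

```python
import numpy as np

grid = np.arange(100).reshape(10, 10)
view = grid[:5, :5]            # basic slicing: a view onto grid's buffer
independent = view.copy()      # a separate buffer with the same values
picked = grid[grid > 90]       # boolean indexing always returns a copy

print(np.shares_memory(grid, view))          # True
print(np.shares_memory(grid, independent))   # False
print(np.shares_memory(grid, picked))        # False
```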

d = np.array(range(10))

d

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

d > 3

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

d[d>3]

array([4, 5, 6, 7, 8, 9])
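Masks can be combined element-wise with `&` (and), `|` (or) and `~` (not). Python's `and`/`or` do not work here (they raise `ValueError` for arrays with more than one element), and each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `==`.

```python
import numpy as np

nums = np.arange(10)

print(nums[(nums > 3) & (nums % 2 == 0)])   # [4 6 8]
print(nums[(nums < 2) | (nums > 7)])        # [0 1 8 9]
print(nums[~(nums > 3)])                    # [0 1 2 3]
```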

d - 2

array([-2, -1,  0,  1,  2,  3,  4,  5,  6,  7])

d2 = d*2

d2

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

d + d2

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

d * d2

array([  0,   2,   8,  18,  32,  50,  72,  98, 128, 162])

np.exp(d)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

a100.max()

99

a100.std()

28.883384496973342

a100.cumsum()

array([  -1,    0,    2,    5,    9,   14,   20,   27,   35,   44,   54,
         65,   77,   90,  104,  119,  135,  152,  170,  189,  209,  230,
        252,  275,  299,  324,  350,  377,  405,  434,  464,  495,  527,
        560,  594,  629,  665,  702,  740,  779,  819,  860,  902,  945,
        989, 1034, 1080, 1127, 1175, 1224, 1274, 1325, 1377, 1430, 1484,
       1539, 1595, 1652, 1710, 1769, 1829, 1890, 1952, 2015, 2079, 2144,
       2210, 2277, 2345, 2414, 2484, 2555, 2627, 2700, 2774, 2849, 2925,
       3002, 3080, 3159, 3239, 3320, 3402, 3485, 3569, 3654, 3740, 3827,
       3915, 4004, 4094, 4185, 4277, 4370, 4464, 4559, 4655, 4752, 4850,
       4949])
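Reductions such as `max`, `std` and `cumsum` work on the flattened array by default, which is why `a100.cumsum()` returned 100 running totals. Passing `axis` reduces along one dimension only:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0, 1, 2],
                                 #  [3, 4, 5]]
print(m.sum())           # 15, all elements
print(m.sum(axis=0))     # [3 5 7], collapse rows: one value per column
print(m.sum(axis=1))     # [ 3 12], collapse columns: one value per row
print(m.cumsum(axis=1))  # running total along each row
```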

help(np.arange)

Help on built-in function arange in module numpy:

arange(...)
    arange([start,] stop[, step,], dtype=None)
    
    Return evenly spaced values within a given interval.
    
    Values are generated within the half-open interval ``[start, stop)``
    (in other words, the interval including `start` but excluding `stop`).
    For integer arguments the function is equivalent to the Python built-in
    `range` function, but returns an ndarray rather than a list.
    
    When using a non-integer step, such as 0.1, the results will often not
    be consistent.  It is better to use `numpy.linspace` for these cases.
    
    Parameters
    ----------
    start : number, optional
        Start of interval.  The interval includes this value.  The default
        start value is 0.
    stop : number
        End of interval.  The interval does not include this value, except
        in some cases where `step` is not an integer and floating point
        round-off affects the length of `out`.
    step : number, optional
        Spacing between values.  For any output `out`, this is the distance
        between two adjacent values, ``out[i+1] - out[i]``.  The default
        step size is 1.  If `step` is specified as a position argument,
        `start` must also be given.
    dtype : dtype
        The type of the output array.  If `dtype` is not given, infer the data
        type from the other input arguments.
    
    Returns
    -------
    arange : ndarray
        Array of evenly spaced values.
    
        For floating point arguments, the length of the result is
        ``ceil((stop - start)/step)``.  Because of floating point overflow,
        this rule may result in the last element of `out` being greater
        than `stop`.
    
    See Also
    --------
    linspace : Evenly spaced numbers with careful handling of endpoints.
    ogrid: Arrays of evenly spaced numbers in N-dimensions.
    mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions.
    
    Examples
    --------
    >>> np.arange(3)
    array([0, 1, 2])
    >>> np.arange(3.0)
    array([ 0.,  1.,  2.])
    >>> np.arange(3,7)
    array([3, 4, 5, 6])
    >>> np.arange(3,7,2)
    array([3, 5])

from scipy.misc import face   # SciPy >= 1.10: from scipy.datasets import face

image = face(gray=True)

image

array([[114, 130, 145, ..., 119, 129, 137],
       [ 83, 104, 123, ..., 118, 134, 146],
       [ 68,  88, 109, ..., 119, 134, 145],
       ...,
       [ 98, 103, 116, ..., 144, 143, 143],
       [ 94, 104, 120, ..., 143, 142, 142],
       [ 94, 106, 119, ..., 142, 141, 140]], dtype=uint8)

from matplotlib import pyplot as plt

%matplotlib inline

def imshow(img):
    plt.imshow(img, cmap=plt.cm.gray)
    plt.show()

imshow(image)

negate = 255 - image

imshow(negate)
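`255 - image` is safe because the result always stays in 0..255, but `image` has dtype `uint8`, and arithmetic that leaves that range wraps around modulo 256 instead of saturating. A small sketch with hand-picked pixel values, plus one common fix (widen the type, clip, convert back):

```python
import numpy as np

pixels = np.array([0, 100, 250], dtype=np.uint8)

print(255 - pixels)            # [255 155   5], never leaves 0..255
print(pixels + np.uint8(10))   # [ 10 110   4], 250 + 10 wraps around to 4

# Brighten safely: widen to int16, clip to the valid range, convert back.
brighter = np.clip(pixels.astype(np.int16) + 10, 0, 255).astype(np.uint8)
print(brighter)                # [ 10 110 255]
```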

thumb = image[::3, ::3]

thumb.shape

(256, 342)

image.shape

(768, 1024)

imshow(thumb)

plain = np.zeros_like(thumb)

imshow(plain)

plain[::10, :] = 255
plain[:,::10] = 255

imshow(plain)

plain

array([[255, 255, 255, ..., 255, 255, 255],
       [255,   0,   0, ...,   0, 255,   0],
       [255,   0,   0, ...,   0, 255,   0],
       ...,
       [255,   0,   0, ...,   0, 255,   0],
       [255,   0,   0, ...,   0, 255,   0],
       [255,   0,   0, ...,   0, 255,   0]], dtype=uint8)

plain[:21,:21]

array([[255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
        255, 255, 255, 255, 255, 255, 255, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
        255, 255, 255, 255, 255, 255, 255, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 255],
       [255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
        255, 255, 255, 255, 255, 255, 255, 255]], dtype=uint8)

p = np.zeros(100).reshape(10,10)

p[::3, :] = 255
p[:,::3] = 255
imshow(p)

imshow(thumb*0.5 + plain*0.5)

imshow(thumb)

def swapcorners(img):
    imglike = img.copy()
    h, w = img.shape
    q1 = img[:h//2, :w//2].copy()
    q4 = img[h//2:, w//2:].copy()
    
    imglike[:h//2, :w//2] = q4
    imglike[h//2:, w//2:] = q1
    return imglike

imshow(swapcorners(thumb))

5/3

1.6666666666666667

5//3

1

thumb = image[::10, ::10]

hthumb = np.hstack([thumb, thumb, thumb])
vthumb = np.vstack([hthumb, hthumb, hthumb])
imshow(vthumb)

imshow(np.flip(thumb))

np.roll?

imshow(np.roll(image, 300, 0))
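`np.flip` with no `axis` reverses every axis, so on an image it rotates the picture by 180 degrees, and `np.roll(image, 300, 0)` shifts rows down by 300 with the bottom rows wrapping around to the top. The same operations on tiny arrays make the behaviour easy to check:

```python
import numpy as np

r = np.array([1, 2, 3, 4, 5])
print(np.roll(r, 2))          # [4 5 1 2 3], items pushed off the end wrap to the front

m = np.array([[1, 2],
              [3, 4]])
print(np.flip(m))             # no axis: reverses every axis, values [[4, 3], [2, 1]]
print(np.flip(m, axis=0))     # rows only, same as m[::-1]
print(np.roll(m, 1, axis=0))  # shift rows down by one, wrapping the last row to the top
```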

Matplotlib

Download data from http://notes.pipal.in/2019/arcesium_advanced_sep/HYDERABAD-weather.csv

url = "http://notes.pipal.in/2019/arcesium_advanced_sep/HYDERABAD-weather.csv"
wget(url, "HYDERABAD-weather.csv")

!tail HYDERABAD-weather.csv

589,HYDERABAD,December,1991,28.1,14.9,0.3
590,HYDERABAD,December,1992,27.1,13.8,0.0
591,HYDERABAD,December,1993,27.1,13.2,34.9
592,HYDERABAD,December,1994,27.9,12.0,0.0
593,HYDERABAD,December,1995,28.9,15.9,0.0
594,HYDERABAD,December,1996,28.3,14.9,0.0
595,HYDERABAD,December,1997,28.7,19.2,40.6
596,HYDERABAD,December,1998,28.7,12.8,0.0
597,HYDERABAD,December,1999,29.0,14.2,0.0
598,HYDERABAD,December,2000,29.6,13.3,1.0

import csv

with open("HYDERABAD-weather.csv") as f:
    data = list(csv.reader(f))

type(data)

list

data[0]

['', 'city', 'month', 'year', 'maxtemp', 'mintemp', 'rainfall']

d = data[1:]

d[:3]

[['0', 'HYDERABAD', 'January', '1951', '29.0', '14.8', '0.0'],
 ['1', 'HYDERABAD', 'January', '1952', '29.1', '13.6', '0.0'],
 ['2', 'HYDERABAD', 'January', '1953', '28.6', '14.6', '3.5']]

def floatcolumn(data, n):
    return [float(row[n]) for row in data]
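Column positions such as `4` and `6` are easy to mix up. `csv.DictReader` keys each row by the header instead. The sketch below runs on an in-memory copy of the first two data rows shown above (so it needs no file); `csv_text`, `rows` and `max_temps` are illustrative names.

```python
import csv
import io

# A two-row stand-in for HYDERABAD-weather.csv (same header as data[0] above)
csv_text = io.StringIO(
    ",city,month,year,maxtemp,mintemp,rainfall\n"
    "0,HYDERABAD,January,1951,29.0,14.8,0.0\n"
    "1,HYDERABAD,January,1952,29.1,13.6,0.0\n"
)

rows = list(csv.DictReader(csv_text))   # each row becomes a dict keyed by the header
max_temps = [float(row["maxtemp"]) for row in rows]
print(max_temps)   # [29.0, 29.1]
```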

maxtemp = floatcolumn(d, 4)

mintemp = floatcolumn(d, 5)

rainfall = floatcolumn(d, 6)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-254-82f2c8fccab4> in <module>
----> 1 rainfall = floatcolumn(d, 6)

<ipython-input-251-44e06c6d288b> in floatcolumn(data, n)
      1 def floatcolumn(data, n):
----> 2     return [float(row[n]) for row in data]

<ipython-input-251-44e06c6d288b> in <listcomp>(.0)
      1 def floatcolumn(data, n):
----> 2     return [float(row[n]) for row in data]

ValueError: could not convert string to float:

def float_(sf):
    """Convert sf to float; report and return 0 for cells that are not numbers (e.g. '')."""
    try:
        return float(sf)
    except ValueError as e:
        print(e)
        return 0

def floatcolumn(data, n):
    return [float_(row[n]) for row in data]

rainfall = floatcolumn(d, 6)

could not convert string to float:
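`float_` turns every unparsable cell into `0`, so a missing rainfall reading is counted as a dry month and pulls averages down. A hedged alternative, if that bias matters: record missing cells as NaN and use NaN-aware reductions such as `np.nanmean`. `float_or_nan` is a hypothetical helper, not part of the training code:

```python
import math
import numpy as np

def float_or_nan(text):
    """Hypothetical variant of float_: mark unparsable cells as missing (NaN)."""
    try:
        return float(text)
    except ValueError:
        return math.nan

values = np.array([float_or_nan(s) for s in ["10.0", "", "20.0"]])
print(values.mean())        # nan, one missing value poisons the plain mean
print(np.nanmean(values))   # 15.0, NaNs are ignored
```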

plt.scatter(rainfall, maxtemp)

<matplotlib.collections.PathCollection at 0x7f440b442160>

plt.scatter(rainfall, mintemp)

<matplotlib.collections.PathCollection at 0x7f440b2e3f60>

data[0]

['', 'city', 'month', 'year', 'maxtemp', 'mintemp', 'rainfall']

d[:2]

[['0', 'HYDERABAD', 'January', '1951', '29.0', '14.8', '0.0'],
 ['1', 'HYDERABAD', 'January', '1952', '29.1', '13.6', '0.0']]

year = [int(row[3]) for row in d]

d[:6]

[['0', 'HYDERABAD', 'January', '1951', '29.0', '14.8', '0.0'],
 ['1', 'HYDERABAD', 'January', '1952', '29.1', '13.6', '0.0'],
 ['2', 'HYDERABAD', 'January', '1953', '28.6', '14.6', '3.5'],
 ['3', 'HYDERABAD', 'January', '1954', '28.2', '13.9', '0.0'],
 ['4', 'HYDERABAD', 'January', '1955', '28.0', '14.7', '0.0'],
 ['5', 'HYDERABAD', 'January', '1956', '28.1', '14.2', '0.0']]

d[-6:]

[['593', 'HYDERABAD', 'December', '1995', '28.9', '15.9', '0.0'],
 ['594', 'HYDERABAD', 'December', '1996', '28.3', '14.9', '0.0'],
 ['595', 'HYDERABAD', 'December', '1997', '28.7', '19.2', '40.6'],
 ['596', 'HYDERABAD', 'December', '1998', '28.7', '12.8', '0.0'],
 ['597', 'HYDERABAD', 'December', '1999', '29.0', '14.2', '0.0'],
 ['598', 'HYDERABAD', 'December', '2000', '29.6', '13.3', '1.0']]

len(set(year))

50

ra = np.array(rainfall)

sorteddata = sorted(d, key=lambda r:r[3])

rainfall = floatcolumn(sorteddata, 6)

could not convert string to float:

year = [int(row[3]) for row in sorteddata]

plt.plot(year, rainfall)

[<matplotlib.lines.Line2D at 0x7f440b3a0d30>]

import random 
plt.bar(range(12), [random.random() for i in range(12)])

<BarContainer object of 12 artists>

months = np.array([row[2] for row in d])

rainfall = np.array(floatcolumn(d, 6))

could not convert string to float:

rainfall[months=="January"].mean()

13.177999999999997

import datetime

def get_mean_rainfall(rainfall, months, month):
    return rainfall[months==month].mean()

dt = datetime.datetime(2019, 9, 26)   # a new name, so the weather rows in d are not overwritten

dt.strftime("%B")

'September'

help(dt.strftime)

Help on built-in function strftime:

strftime(...) method of datetime.datetime instance
    format -> strftime() style string.

mnames = [datetime.datetime(2010, i+1, 1).strftime("%B") for i in range(12)]

mnames

['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']
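The standard library's `calendar` module also provides the month names; like `strftime("%B")`, they follow the current locale (English in the default C locale). A sketch:

```python
import calendar

# calendar.month_name[0] is an empty string, so slice it off
mnames = list(calendar.month_name)[1:]
print(mnames[:3])   # ['January', 'February', 'March'] in an English/C locale
```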

rainfall_ = [get_mean_rainfall(rainfall, months, m) for m in mnames]

rainfall_

[13.177999999999997,
 7.94,
 15.264000000000001,
 20.23469387755102,
 35.714,
 103.75399999999999,
 169.86,
 178.69,
 158.292,
 97.15800000000002,
 21.971999999999998,
 5.912000000000001]

plt.bar(mnames, rainfall_)

<BarContainer object of 12 artists>

plt.bar(range(12), rainfall_)

<BarContainer object of 12 artists>

import altair as alt

%%file sample.txt
area,sales,profit
North,5,2
East,25,8
West,15,6
South,20,5
Central,10,3

Writing sample.txt

import pandas as pd

sample = pd.read_csv("sample.txt")

alt.Chart(sample).mark_point()

<VegaLite 3 object>

If you see this message, it means the renderer has not been properly enabled
for the frontend that you are using. For more information, see
https://altair-viz.github.io/user_guide/troubleshooting.html

alt.renderers.enable('notebook')

RendererRegistry.enable('notebook')

alt.Chart(sample).mark_point()

alt.Chart(sample).mark_point().encode(y="area")

alt.Chart(sample).mark_point().encode(
    y = "area",
    x = "sales"
)

base = alt.Chart(sample).mark_point().encode(
    y = "area",
    x = "sales"
)

base.mark_bar()

base.mark_line()

base.mark_circle()

base = alt.Chart(sample).mark_bar().encode(
    y = "area",
    x = "sales",
    color = "area"
)

base

base.encode(size="profit")

base.encode(size="profit").mark_circle()

print(base.to_json())

{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.4.0.json",
  "config": {
    "mark": {
      "tooltip": null
    },
    "view": {
      "height": 300,
      "width": 400
    }
  },
  "data": {
    "name": "data-e9a1bf97bac3c6f8642dc2ef7d8e4b49"
  },
  "datasets": {
    "data-e9a1bf97bac3c6f8642dc2ef7d8e4b49": [
      {
        "area": "North",
        "profit": 2,
        "sales": 5
      },
      {
        "area": "East",
        "profit": 8,
        "sales": 25
      },
      {
        "area": "West",
        "profit": 6,
        "sales": 15
      },
      {
        "area": "South",
        "profit": 5,
        "sales": 20
      },
      {
        "area": "Central",
        "profit": 3,
        "sales": 10
      }
    ]
  },
  "encoding": {
    "color": {
      "field": "area",
      "type": "nominal"
    },
    "x": {
      "field": "sales",
      "type": "quantitative"
    },
    "y": {
      "field": "area",
      "type": "nominal"
    }
  },
  "mark": "bar"
}

s = base.encode(size="profit").mark_circle()

s.save("sample.html")

!less sample.html

<!DOCTYPE html>
<html>
<head>
  <style>
    .vega-actions a {
        margin-right: 12px;
        color: #757575;
        font-weight: normal;
        font-size: 13px;
    }
    .error {
        color: red;
    }
  </style>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega@5"></script>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-lite@3.4.0"></script>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-embed@4"></script>
</head>
<body>
  <div id="vis"></div>

sales = alt.Chart(sample).mark_bar().encode(
    alt.Y("area"),
    alt.X("sales"),
    alt.Color(value="green")
)
sales

profit = alt.Chart(sample).mark_bar().encode(
    alt.Y("area"),
    alt.X("profit"),
    alt.Color(value="firebrick")
)
profit

sales + profit

pandas¶

sample

Series¶

area = pd.Series(['North','East','West','South','Central'])

area

0      North
1       East
2       West
3      South
4    Central
dtype: object

sales = pd.Series([5,25,15,20,10], index=area)

sales

North       5
East       25
West       15
South      20
Central    10
dtype: int64

sales['North']

5

sales[0]

5

sales.reindex(index=sorted(area))

Central    10
East       25
North       5
South      20
West       15
dtype: int64

sales

North       5
East       25
West       15
South      20
Central    10
dtype: int64

sales[sales > 10]

East     25
West     15
South    20
dtype: int64

sales

North       5
East       25
West       15
South      20
Central    10
dtype: int64

sales[-1]

10

profit = pd.Series([2, 8, 6, 5, 3])

profit[-1]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-330-ed2cbc3f45cb> in <module>
----> 1 profit[-1]

~/anaconda3/envs/vis/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
   1066         key = com.apply_if_callable(key, self)
   1067         try:
-> 1068             result = self.index.get_value(self, key)
   1069 
   1070             if not is_scalar(result):

~/anaconda3/envs/vis/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4728         k = self._convert_scalar_indexer(k, kind="getitem")
   4729         try:
-> 4730             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4731         except KeyError as e1:
   4732             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: -1
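`profit[-1]` fails because `profit` has an integer index, so `[]` looks up the *label* `-1`, which does not exist. `sales[-1]` only worked because its string labels let pandas fall back to position, and newer pandas releases deprecate that fallback. The following sketch uses the explicit accessors, which avoid the ambiguity: `.iloc` is always positional and `.loc` is always label-based.

```python
import pandas as pd

sales = pd.Series([5, 25, 15, 20, 10],
                  index=['North', 'East', 'West', 'South', 'Central'])
profit = pd.Series([2, 8, 6, 5, 3])   # default integer index 0..4

# .iloc is positional, so -1 means "last element" for both Series
print(sales.iloc[-1])     # 10
print(profit.iloc[-1])    # 3

# .loc looks up labels
print(sales.loc['North'])  # 5
print(profit.loc[4])       # 3
```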

profit.mean()

4.8

profit.std()

2.3874672772626644
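`Series.std()` returns the *sample* standard deviation, dividing by n - 1 (`ddof=1`). For `profit`, the squared deviations from 4.8 sum to 22.8, so the result is sqrt(22.8 / 4). NumPy's `np.std` defaults to `ddof=0`, so the two can disagree. The following sketch cross-checks both conventions against the `statistics` module:

```python
import statistics
import pandas as pd

profit = pd.Series([2, 8, 6, 5, 3])

# Sample standard deviation (divide by n - 1): sqrt(22.8 / 4)
print(profit.std())                       # about 2.3875
print(statistics.stdev(profit.tolist()))  # same value

# Population standard deviation (divide by n): sqrt(22.8 / 5)
print(profit.std(ddof=0))                  # about 2.1354
print(statistics.pstdev(profit.tolist()))  # same value
```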

df = pd.DataFrame({"sales":[20,23,12,6,25],
                   "profit":[5,2,7,1,8]
                  }, index=['North','East','West','South','Central']
                 )

df

df['sales']

North      20
East       23
West       12
South       6
Central    25
Name: sales, dtype: int64

df.sales

North      20
East       23
West       12
South       6
Central    25
Name: sales, dtype: int64

df.head()

weather = pd.read_csv("HYDERABAD-weather.csv")

weather.head()

df.loc['North']

sales     20
profit     5
Name: North, dtype: int64

df.iloc[3]

sales     6
profit    1
Name: South, dtype: int64

weather.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 599 entries, 0 to 598
Data columns (total 7 columns):
Unnamed: 0    599 non-null int64
city          599 non-null object
month         599 non-null object
year          599 non-null int64
maxtemp       599 non-null float64
mintemp       599 non-null float64
rainfall      598 non-null float64
dtypes: float64(3), int64(2), object(2)
memory usage: 32.9+ KB
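The `Unnamed: 0` column usually means the CSV's first header field is empty. `DataFrame.to_csv` writes files this way when it saves the index. Passing `index_col=0` to `read_csv` turns that column back into the index. The following sketch assumes `HYDERABAD-weather.csv` has that layout and inlines three rows matching `weather.head()` instead of reading the real file:

```python
import io
import pandas as pd

# Assumed file layout: empty first header field, first column is the saved index
csv_text = """,city,month,year,maxtemp,mintemp,rainfall
0,HYDERABAD,January,1951,29.0,14.8,0.0
1,HYDERABAD,January,1952,29.1,13.6,0.0
2,HYDERABAD,January,1953,28.6,14.6,3.5
"""

plain = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))    # first entry is 'Unnamed: 0'
print(list(indexed.columns))  # ['city', 'month', 'year', 'maxtemp', 'mintemp', 'rainfall']
```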

weather.plot("maxtemp", "rainfall", kind="scatter")

<matplotlib.axes._subplots.AxesSubplot at 0x7f4405302668>

weather.groupby('year').mean(numeric_only=True)

weather.groupby("month").mean(numeric_only=True)

weather

groupbymonth = weather.groupby('month').mean(numeric_only=True)

groupbymonth

del groupbymonth['Unnamed: 0']
del groupbymonth['year']

groupbymonth

groupbymonth.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x7f4403a9bf60>
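Instead of deleting `Unnamed: 0` and `year` after the fact, you can pick the measurement columns before aggregating. The following sketch uses a small frame with invented values and the same column names as `weather`:

```python
import pandas as pd

# Invented rows with the same columns as the weather data
toy_weather = pd.DataFrame({
    "city": ["HYDERABAD"] * 4,
    "month": ["January", "February", "January", "February"],
    "year": [1951, 1951, 1952, 1952],
    "maxtemp": [29.0, 31.0, 29.2, 32.0],
    "mintemp": [14.8, 17.0, 13.6, 18.0],
    "rainfall": [0.0, 4.0, 2.0, 6.0],
})

# Select only the columns to average, then aggregate per month
toy_monthly = toy_weather.groupby("month")[["maxtemp", "mintemp", "rainfall"]].mean()
print(toy_monthly)
```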

groupbymonth

groupbymonth.index

Index(['April', 'August', 'December', 'February', 'January', 'July', 'June',
       'March', 'May', 'November', 'October', 'September'],
      dtype='object', name='month')

array(['January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'May', 'May', 'May', 'May',
       'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May',
       'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May',
       'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May',
       'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May',
       'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May',
       'May', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December'], dtype='<U9')

mnames

['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']

groupbymonth.reindex(index=mnames).plot()

<matplotlib.axes._subplots.AxesSubplot at 0x7f441d0cb630>
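`reindex(index=mnames)` fixes the alphabetical order that `groupby` produces. An alternative is an ordered categorical index, which sorts by calendar position. Unlike `reindex`, it only reorders existing rows and does not add NaN rows for missing months. The following sketch uses three rainfall values copied from the monthly table:

```python
import pandas as pd

calendar_order = ["January", "February", "March", "April", "May", "June", "July",
                  "August", "September", "October", "November", "December"]

# Rows in alphabetical order, as groupby returns them
monthly = pd.DataFrame({"rainfall": [20.234694, 7.94, 13.178]},
                       index=["April", "February", "January"])

# An ordered categorical index sorts by category position, not alphabetically
monthly.index = pd.CategoricalIndex(monthly.index, categories=calendar_order, ordered=True)
monthly = monthly.sort_index()
print(list(monthly.index))   # ['January', 'February', 'April']
```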

decdata = groupbymonth.loc['December']

decdata

maxtemp     28.004
mintemp     14.526
rainfall     5.912
Name: December, dtype: float64

decdata['maxtemp']

28.003999999999998

dict(decdata)

{'maxtemp': 28.003999999999998,
 'mintemp': 14.525999999999996,
 'rainfall': 5.912000000000001}

groupbymonth.plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0x7f4405d722b0>

XML¶

import requests

url = "http://www.thehindu.com/"

response = requests.get(url, params={"service":"rss"})

xmltext = response.text

print(xmltext[:1200])

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <title>The Hindu - Home</title>
        <link>https://www.thehindu.com/</link>
        <description>Default RSS Feed</description>
        <language>en-us</language>
        <copyright>Copyright 2019 The Hindu</copyright>
        <item>
            <title><![CDATA[PMC has enough liquidity, depositors’ money fully safe, claims suspended MD ]]></title>
            <author><![CDATA[PTI]]></author>
            <category><![CDATA[National]]></category>
            <link>https://www.thehindu.com/news/national/pmc-has-enough-liquidity-depositors-money-fully-safe-claims-suspended-md/article29519311.ece</link>
            <description><![CDATA[
                He states that the present crisis was solely due to large account - HDIL. 
            ]]></description>
            <pubDate><![CDATA[Thu, 26 Sep 2019 16:54:58 +0530]]></pubDate>
        </item>
        <item>
            <title><![CDATA[Students should start social media campaign demanding declaration of climate emergency: Anbumani Ramadoss]]></title>
            <author><![CDATA[Special Correspondent]]></author>
            <catego

from xml.etree import ElementTree as et

root = et.fromstring(xmltext)

items = root.findall(".//item")

len(items)

100

type(items)

list

items[0]

<Element 'item' at 0x7f44034e4098>

print(et.tostring(items[0]).decode())

<item>
            <title>PMC has enough liquidity, depositors&#8217; money fully safe, claims suspended MD </title>
            <author>PTI</author>
            <category>National</category>
            <link>https://www.thehindu.com/news/national/pmc-has-enough-liquidity-depositors-money-fully-safe-claims-suspended-md/article29519311.ece</link>
            <description>
                He states that the present crisis was solely due to large account - HDIL. 
            </description>
            <pubDate>Thu, 26 Sep 2019 16:54:58 +0530</pubDate>
        </item>

for item in items[:10]:
    print(item.findtext("title"))
    print(item.findtext("link"))
    print("-"*30)

PMC has enough liquidity, depositors’ money fully safe, claims suspended MD 
https://www.thehindu.com/news/national/pmc-has-enough-liquidity-depositors-money-fully-safe-claims-suspended-md/article29519311.ece
------------------------------
Students should start social media campaign demanding declaration of climate emergency: Anbumani Ramadoss
https://www.thehindu.com/news/cities/chennai/students-should-start-social-media-campaign-demanding-declaration-of-climate-emergency-anbumani-ramadoss/article29519727.ece
------------------------------
Sivakarthikeyan convinced me to do ‘Namma Veettu Pillai’: Pandiraj
https://www.thehindu.com/entertainment/movies/return-of-the-sibilings/article29519721.ece
------------------------------
Invis Multimedia has been providing digital content for Kerala Tourism for 20 years
https://www.thehindu.com/life-and-style/travel/mr-hari-managing-director-of-invis-multimedia-the-leader-in-creating-digital-content-for-kerala-tourism-for-nearly-20-years-explains-what-makes-them-click-with-tourists/article29519685.ece
------------------------------
On a musical stage with AI 
https://www.thehindu.com/entertainment/music/the-opening-ceremony-of-23rd-world-congress-on-information-technology-wcit-at-republic-square-in-yerevan-armenia-has-a-unique-musical-performance/article29519671.ece
------------------------------
Visakhapatnam’s hidden ecological hotspots
https://www.thehindu.com/life-and-style/travel/visakhapatnams-hidden-ecological-hotspots/article29519572.ece
------------------------------
Muruga in all aspects 
https://www.thehindu.com/entertainment/dance/muruga-in-all-aspects/article29519482.ece
------------------------------
Former French President Jacques Chirac, who stood up to U.S., dies at 86 
https://www.thehindu.com/news/international/ex-french-president-chirac-who-stood-up-to-us-dies-at-86/article29519446.ece
------------------------------
Gold, silver plunges on weak global trend 
https://www.thehindu.com/business/markets/gold-silver-plunges-on-weak-global-trend/article29519436.ece
------------------------------
Raga Pravesam - a journey through raga and tala
https://www.thehindu.com/entertainment/music/raga-pravesam-a-journey-through-raga-and-tala/article29519412.ece
------------------------------
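ElementTree turns CDATA sections into ordinary text, which is why `findtext("title")` above returns plain headlines. The following self-contained sketch runs on a hand-written two-item feed (the example.com links are placeholders). It also shows `findtext`'s `default` argument for a child element that is missing:

```python
from xml.etree import ElementTree as et

# A tiny hand-written RSS document, not the real feed
toy_rss = """<rss version="2.0"><channel>
  <item><title><![CDATA[First headline]]></title><link>https://example.com/1</link></item>
  <item><title>Second headline</title><link>https://example.com/2</link></item>
</channel></rss>"""

toy_root = et.fromstring(toy_rss)

# ".//item" matches item elements at any depth; findtext reads the first matching child
headlines = [(item.findtext("title"), item.findtext("link"))
             for item in toy_root.findall(".//item")]
for title, link in headlines:
    print(title, link)

# A missing child gives None unless a default is supplied
print(toy_root.find(".//item").findtext("author", default="unknown"))  # unknown
```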

from xml.dom.minidom import parseString

root = parseString(xmltext)

root

<xml.dom.minidom.Document at 0x7f440313ca08>

items = root.getElementsByTagName("item")

type(items)

xml.dom.minicompat.NodeList

for item in items[:10]:
    title = item.getElementsByTagName("title")[0]
    link = item.getElementsByTagName("link")[0]
    print(title.firstChild.data)
    print(link.firstChild.data)
    print("-"*30)

PMC has enough liquidity, depositors’ money fully safe, claims suspended MD 
https://www.thehindu.com/news/national/pmc-has-enough-liquidity-depositors-money-fully-safe-claims-suspended-md/article29519311.ece
------------------------------
Students should start social media campaign demanding declaration of climate emergency: Anbumani Ramadoss
https://www.thehindu.com/news/cities/chennai/students-should-start-social-media-campaign-demanding-declaration-of-climate-emergency-anbumani-ramadoss/article29519727.ece
------------------------------
Sivakarthikeyan convinced me to do ‘Namma Veettu Pillai’: Pandiraj
https://www.thehindu.com/entertainment/movies/return-of-the-sibilings/article29519721.ece
------------------------------
Invis Multimedia has been providing digital content for Kerala Tourism for 20 years
https://www.thehindu.com/life-and-style/travel/mr-hari-managing-director-of-invis-multimedia-the-leader-in-creating-digital-content-for-kerala-tourism-for-nearly-20-years-explains-what-makes-them-click-with-tourists/article29519685.ece
------------------------------
On a musical stage with AI 
https://www.thehindu.com/entertainment/music/the-opening-ceremony-of-23rd-world-congress-on-information-technology-wcit-at-republic-square-in-yerevan-armenia-has-a-unique-musical-performance/article29519671.ece
------------------------------
Visakhapatnam’s hidden ecological hotspots
https://www.thehindu.com/life-and-style/travel/visakhapatnams-hidden-ecological-hotspots/article29519572.ece
------------------------------
Muruga in all aspects 
https://www.thehindu.com/entertainment/dance/muruga-in-all-aspects/article29519482.ece
------------------------------
Former French President Jacques Chirac, who stood up to U.S., dies at 86 
https://www.thehindu.com/news/international/ex-french-president-chirac-who-stood-up-to-us-dies-at-86/article29519446.ece
------------------------------
Gold, silver plunges on weak global trend 
https://www.thehindu.com/business/markets/gold-silver-plunges-on-weak-global-trend/article29519436.ece
------------------------------
Raga Pravesam - a journey through raga and tala
https://www.thehindu.com/entertainment/music/raga-pravesam-a-journey-through-raga-and-tala/article29519412.ece
------------------------------

json¶

import json

decdata = dict(decdata)

decdata

{'maxtemp': 28.003999999999998,
 'mintemp': 14.525999999999996,
 'rainfall': 5.912000000000001}

s = json.dumps(decdata)

s

'{"maxtemp": 28.003999999999998, "mintemp": 14.525999999999996, "rainfall": 5.912000000000001}'

json.loads(s)

{'maxtemp': 28.003999999999998,
 'mintemp': 14.525999999999996,
 'rainfall': 5.912000000000001}
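`json.dumps` has options for readable output. Types that JSON does not know about, such as `datetime`, raise `TypeError` unless a `default=` hook converts them. The following sketch uses rounded December values and a hypothetical `record` dict:

```python
import datetime
import json

stats = {"maxtemp": 28.004, "mintemp": 14.526, "rainfall": 5.912}

# indent and sort_keys only change the text layout, not the data
print(json.dumps(stats, indent=2, sort_keys=True))

# For plain JSON types, loads(dumps(x)) gives back an equal object
print(json.loads(json.dumps(stats)) == stats)   # True

# default=str converts anything json cannot handle (here a datetime) to a string
record = {"city": "HYDERABAD", "fetched": datetime.datetime(2019, 9, 26)}
print(json.dumps(record, default=str))
```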

url = "https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=MSFT&interval=5min&outputsize=full&apikey=demo"

print(url)

https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=MSFT&interval=5min&outputsize=full&apikey=demo
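Writing the query string by hand is easy to get wrong. `urllib.parse.urlencode` builds the same URL from a dict and escapes values when needed. `requests.get(base, params=params)` does the same encoding, as the `params={"service": "rss"}` call in the XML section shows.

```python
from urllib.parse import urlencode

base = "https://www.alphavantage.co/query"
params = {
    "function": "TIME_SERIES_INTRADAY",
    "symbol": "MSFT",
    "interval": "5min",
    "outputsize": "full",
    "apikey": "demo",
}

# urlencode keeps the dict's insertion order and joins pairs with "&"
query_url = base + "?" + urlencode(params)
print(query_url)
```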

resp = requests.get(url)
data = resp.json()

pd.DataFrame(data['Time Series (5min)']).transpose()
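In the output of the cell above, the numbers keep their trailing zeros (`139.6800`). This suggests that the API returns prices as strings, so the transposed frame holds text and its index holds timestamp strings. The following sketch converts both. It uses two entries copied from that output, and the field names (`"1. open"` and so on) follow the displayed columns.

```python
import pandas as pd

# Two entries in the shape of data['Time Series (5min)'], copied from the output
series = {
    "2019-09-25 16:00:00": {"1. open": "139.6800", "2. high": "139.6900",
                            "3. low": "139.2800", "4. close": "139.3600",
                            "5. volume": "1085867"},
    "2019-09-25 15:55:00": {"1. open": "139.8400", "2. high": "139.8800",
                            "3. low": "139.6900", "4. close": "139.6900",
                            "5. volume": "480398"},
}

# orient="index" makes the outer keys rows, like DataFrame(...).transpose()
prices = pd.DataFrame.from_dict(series, orient="index").astype(float)
prices.index = pd.to_datetime(prices.index)
prices = prices.sort_index()   # oldest timestamp first
print(prices["4. close"])
```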

url = "https://api.github.com/orgs/{}/repos".format("google")

url

'https://api.github.com/orgs/google/repos'

repos = requests.get(url).json()

type(repos)

list

repos[0]

{'id': 1936771,
 'node_id': 'MDEwOlJlcG9zaXRvcnkxOTM2Nzcx',
 'name': 'truth',
 'full_name': 'google/truth',
 'private': False,
 'owner': {'login': 'google',
  'id': 1342004,
  'node_id': 'MDEyOk9yZ2FuaXphdGlvbjEzNDIwMDQ=',
  'avatar_url': 'https://avatars1.githubusercontent.com/u/1342004?v=4',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/google',
  'html_url': 'https://github.com/google',
  'followers_url': 'https://api.github.com/users/google/followers',
  'following_url': 'https://api.github.com/users/google/following{/other_user}',
  'gists_url': 'https://api.github.com/users/google/gists{/gist_id}',
  'starred_url': 'https://api.github.com/users/google/starred{/owner}{/repo}',
  'subscriptions_url': 'https://api.github.com/users/google/subscriptions',
  'organizations_url': 'https://api.github.com/users/google/orgs',
  'repos_url': 'https://api.github.com/users/google/repos',
  'events_url': 'https://api.github.com/users/google/events{/privacy}',
  'received_events_url': 'https://api.github.com/users/google/received_events',
  'type': 'Organization',
  'site_admin': False},
 'html_url': 'https://github.com/google/truth',
 'description': 'Fluent assertions for Java and Android',
 'fork': False,
 'url': 'https://api.github.com/repos/google/truth',
 'forks_url': 'https://api.github.com/repos/google/truth/forks',
 'keys_url': 'https://api.github.com/repos/google/truth/keys{/key_id}',
 'collaborators_url': 'https://api.github.com/repos/google/truth/collaborators{/collaborator}',
 'teams_url': 'https://api.github.com/repos/google/truth/teams',
 'hooks_url': 'https://api.github.com/repos/google/truth/hooks',
 'issue_events_url': 'https://api.github.com/repos/google/truth/issues/events{/number}',
 'events_url': 'https://api.github.com/repos/google/truth/events',
 'assignees_url': 'https://api.github.com/repos/google/truth/assignees{/user}',
 'branches_url': 'https://api.github.com/repos/google/truth/branches{/branch}',
 'tags_url': 'https://api.github.com/repos/google/truth/tags',
 'blobs_url': 'https://api.github.com/repos/google/truth/git/blobs{/sha}',
 'git_tags_url': 'https://api.github.com/repos/google/truth/git/tags{/sha}',
 'git_refs_url': 'https://api.github.com/repos/google/truth/git/refs{/sha}',
 'trees_url': 'https://api.github.com/repos/google/truth/git/trees{/sha}',
 'statuses_url': 'https://api.github.com/repos/google/truth/statuses/{sha}',
 'languages_url': 'https://api.github.com/repos/google/truth/languages',
 'stargazers_url': 'https://api.github.com/repos/google/truth/stargazers',
 'contributors_url': 'https://api.github.com/repos/google/truth/contributors',
 'subscribers_url': 'https://api.github.com/repos/google/truth/subscribers',
 'subscription_url': 'https://api.github.com/repos/google/truth/subscription',
 'commits_url': 'https://api.github.com/repos/google/truth/commits{/sha}',
 'git_commits_url': 'https://api.github.com/repos/google/truth/git/commits{/sha}',
 'comments_url': 'https://api.github.com/repos/google/truth/comments{/number}',
 'issue_comment_url': 'https://api.github.com/repos/google/truth/issues/comments{/number}',
 'contents_url': 'https://api.github.com/repos/google/truth/contents/{+path}',
 'compare_url': 'https://api.github.com/repos/google/truth/compare/{base}...{head}',
 'merges_url': 'https://api.github.com/repos/google/truth/merges',
 'archive_url': 'https://api.github.com/repos/google/truth/{archive_format}{/ref}',
 'downloads_url': 'https://api.github.com/repos/google/truth/downloads',
 'issues_url': 'https://api.github.com/repos/google/truth/issues{/number}',
 'pulls_url': 'https://api.github.com/repos/google/truth/pulls{/number}',
 'milestones_url': 'https://api.github.com/repos/google/truth/milestones{/number}',
 'notifications_url': 'https://api.github.com/repos/google/truth/notifications{?since,all,participating}',
 'labels_url': 'https://api.github.com/repos/google/truth/labels{/name}',
 'releases_url': 'https://api.github.com/repos/google/truth/releases{/id}',
 'deployments_url': 'https://api.github.com/repos/google/truth/deployments',
 'created_at': '2011-06-22T18:55:12Z',
 'updated_at': '2019-09-25T19:48:40Z',
 'pushed_at': '2019-09-23T18:35:43Z',
 'git_url': 'git://github.com/google/truth.git',
 'ssh_url': 'git@github.com:google/truth.git',
 'clone_url': 'https://github.com/google/truth.git',
 'svn_url': 'https://github.com/google/truth',
 'homepage': 'https://truth.dev/',
 'size': 29473,
 'stargazers_count': 1892,
 'watchers_count': 1892,
 'language': 'Java',
 'has_issues': True,
 'has_projects': True,
 'has_downloads': True,
 'has_wiki': True,
 'has_pages': True,
 'forks_count': 198,
 'mirror_url': None,
 'archived': False,
 'disabled': False,
 'open_issues_count': 61,
 'license': {'key': 'apache-2.0',
  'name': 'Apache License 2.0',
  'spdx_id': 'Apache-2.0',
  'url': 'https://api.github.com/licenses/apache-2.0',
  'node_id': 'MDc6TGljZW5zZTI='},
 'forks': 198,
 'open_issues': 61,
 'watchers': 1892,
 'default_branch': 'master',
 'permissions': {'admin': False, 'push': False, 'pull': True}}

r = repos[0]

r['forks']

198

r['owner']['id']

1342004

for r in sorted(repos, key=lambda r:r['owner']['id'], reverse=True)[:20]:
    print(r['owner']['id'], r['forks'])

1342004 198
1342004 17
1342004 34
1342004 0
1342004 17
1342004 80
1342004 9
1342004 10
1342004 11
1342004 12
1342004 21
1342004 48
1342004 15
1342004 12
1342004 58
1342004 218
1342004 102
1342004 22
1342004 0
1342004 8

for r in sorted(repos, key=lambda r:r['forks'], reverse=True)[:10]:
    print(r['full_name'], r['forks'])

google/dagger 1710
google/traceur-compiler 603
google/ios-webkit-debug-proxy 379
google/tracing-framework 218
google/truth 198
google/namebench 102
google/googletv-android-samples 80
google/libcxx 58
google/cpp-netlib 58
google/module-server 48
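When only the top few items are needed, `heapq.nlargest` accepts the same `key` function as `sorted` and avoids sorting the whole list. The following sketch uses a handful of records with the `full_name` and `forks` keys and values from the output above:

```python
import heapq

# A few records in the shape of the GitHub response (only the keys used here)
toy_repos = [
    {"full_name": "google/truth", "forks": 198},
    {"full_name": "google/dagger", "forks": 1710},
    {"full_name": "google/namebench", "forks": 102},
    {"full_name": "google/traceur-compiler", "forks": 603},
]

# Same result as sorted(..., reverse=True)[:2]
for r in heapq.nlargest(2, toy_repos, key=lambda r: r["forks"]):
    print(r["full_name"], r["forks"])
```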

pd.read_json("https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=MSFT&interval=5min&outputsize=full&apikey=demo")

Flask¶

%%file flask_app.py

from flask import Flask, render_template

app = Flask(__name__)

@app.route("/hello/<name>")
def hellourl(name="Flask"):
    return render_template("hello.html", name=name)

@app.route("/")
def index():
    return "This is index page of flask app"

if __name__ == "__main__":
    app.run()

Overwriting flask_app.py

!python flask_app.py

 * Serving Flask app "flask_app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [26/Sep/2019 18:56:34] "GET /hello/Python HTTP/1.1" 200 -
^C

!mkdir templates

%%file templates/hello.html
<!doctype html>
<title>Hello from Flask</title>
{% if name %}
   <h1>Hello {{ name }} </h1>
{% else %}
   <h1>Hello, World!</h1>
{% endif %}

Writing templates/hello.html
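Flask's test client can exercise the routes without starting the development server and pressing CTRL+C. The following sketch mirrors `flask_app.py`. It inlines the `hello.html` markup with `render_template_string` so that it runs without a `templates/` folder; that substitution is the only change.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Same markup as templates/hello.html, inlined for a self-contained example
HELLO = """<!doctype html>
<title>Hello from Flask</title>
{% if name %}<h1>Hello {{ name }}</h1>{% else %}<h1>Hello, World!</h1>{% endif %}"""

@app.route("/hello/<name>")
def hellourl(name):
    return render_template_string(HELLO, name=name)

@app.route("/")
def index():
    return "This is index page of flask app"

# The test client sends requests straight to the app; no server process is started
client = app.test_client()
resp = client.get("/hello/Python")
print(resp.status_code)            # 200
print(resp.get_data(as_text=True))
```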

	Unnamed: 0	maxtemp	mintemp	rainfall
year
1951	274.333333	32.666667	20.233333	58.975000
1952	275.333333	31.975000	19.891667	46.741667
1953	276.333333	32.183333	20.266667	74.245455
1954	277.333333	31.525000	19.875000	70.366667
1955	278.333333	30.883333	19.725000	92.775000
1956	279.333333	30.783333	19.791667	64.941667
1957	280.333333	31.533333	20.016667	66.783333
1958	281.333333	31.733333	20.475000	76.216667
1959	282.333333	31.900000	20.358333	64.825000
1960	283.333333	31.841667	20.416667	57.775000
1961	284.333333	31.258333	20.225000	68.400000
1962	296.636364	30.418182	19.490909	107.118182
1963	286.250000	31.133333	19.308333	69.125000
1964	287.250000	32.150000	19.658333	58.400000
1965	288.250000	32.600000	19.541667	67.441667
1966	289.250000	32.666667	20.550000	55.358333
1967	290.250000	32.625000	19.483333	69.383333
1968	291.250000	32.416667	19.200000	53.250000
1969	292.250000	32.575000	20.408333	53.100000
1970	293.250000	32.041667	19.825000	95.566667
1971	294.250000	31.975000	20.008333	55.433333
1972	295.250000	32.633333	21.008333	42.958333
1973	296.250000	32.425000	21.200000	73.183333
1974	297.250000	32.175000	20.016667	56.283333
1975	298.250000	31.266667	20.258333	115.291667
1976	299.250000	32.050000	20.441667	66.075000
1977	300.250000	32.258333	20.775000	45.300000
1978	301.250000	31.575000	21.108333	93.116667
1979	302.250000	32.400000	21.616667	58.650000
1980	303.250000	32.816667	21.425000	49.325000
1981	304.250000	32.108333	20.750000	82.750000
1982	305.250000	32.241667	21.133333	63.891667
1983	306.250000	32.541667	21.000000	110.025000
1984	307.250000	32.483333	21.033333	64.083333
1985	308.250000	32.875000	20.975000	31.116667
1986	309.250000	32.800000	21.441667	51.775000
1987	310.250000	32.433333	21.200000	80.250000
1988	311.250000	32.525000	21.400000	76.458333
1989	312.250000	32.325000	20.825000	83.883333
1990	313.250000	31.541667	20.991667	76.666667
1991	314.250000	32.450000	21.416667	64.200000
1992	315.250000	32.683333	20.650000	63.716667
1993	316.250000	32.733333	20.516667	60.458333
1994	317.250000	32.225000	20.516667	68.325000
1995	318.250000	32.183333	20.916667	101.991667
1996	319.250000	32.633333	20.958333	80.958333
1997	320.250000	32.616667	21.025000	63.750000
1998	321.250000	33.125000	21.683333	78.516667
1999	322.250000	32.608333	20.341667	47.008333
2000	323.250000	32.583333	20.391667	87.066667

	Unnamed: 0	year	maxtemp	mintemp	rainfall
month
April	174.0	1975.77551	37.863265	24.273469	20.234694
August	373.5	1975.50000	29.786000	22.086000	178.690000
December	573.5	1975.50000	28.004000	14.526000	5.912000
February	74.5	1975.50000	31.932000	17.556000	7.940000
January	24.5	1975.50000	28.760000	15.214000	13.178000
July	323.5	1975.50000	30.754000	22.560000	169.860000
June	273.5	1975.50000	34.528000	23.976000	103.754000
March	124.5	1975.50000	35.444000	20.798000	15.264000
May	223.5	1975.50000	38.996000	26.160000	35.714000
November	523.5	1975.50000	29.016000	16.862000	22.420408
October	473.5	1975.50000	30.582000	20.306000	97.158000
September	423.5	1975.50000	30.452000	21.962000	158.292000

	maxtemp	mintemp	rainfall
month
April	37.863265	24.273469	20.234694
August	29.786000	22.086000	178.690000
December	28.004000	14.526000	5.912000
February	31.932000	17.556000	7.940000
January	28.760000	15.214000	13.178000
July	30.754000	22.560000	169.860000
June	34.528000	23.976000	103.754000
March	35.444000	20.798000	15.264000
May	38.996000	26.160000	35.714000
November	29.016000	16.862000	22.420408
October	30.582000	20.306000	97.158000
September	30.452000	21.962000	158.292000
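
This version keeps only the three weather columns. In the previous table, the averages of `year` and of the leftover `Unnamed: 0` index column carry no useful information, so it is better to leave them out.

The cell behind this table is not in these notes, so the sketch below uses made-up rows. It shows two ways to get such a table: select the columns before aggregating, or aggregate and then select. Selecting first avoids computing averages that are thrown away straight after.

```python
import pandas as pd

# Made-up rows, including a leftover "Unnamed: 0" column like the one in the data
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "month": ["January", "January", "February", "February"],
    "year": [1951, 1952, 1951, 1952],
    "maxtemp": [29.0, 29.1, 32.0, 31.0],
    "mintemp": [14.8, 13.6, 17.0, 16.0],
    "rainfall": [0.0, 3.5, 4.0, 8.0],
})
cols = ["maxtemp", "mintemp", "rainfall"]

# Option 1: choose the columns first, then average only those
by_month_a = df.groupby("month")[cols].mean()

# Option 2: average every numeric column, then keep the ones of interest
by_month_b = df.groupby("month").mean(numeric_only=True)[cols]

print(by_month_a)
```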

	Unnamed: 0	city	month	year	maxtemp	mintemp	rainfall
0	0	HYDERABAD	January	1951	29.0	14.8	0.0
1	1	HYDERABAD	January	1952	29.1	13.6	0.0
2	2	HYDERABAD	January	1953	28.6	14.6	3.5
3	3	HYDERABAD	January	1954	28.2	13.9	0.0
4	4	HYDERABAD	January	1955	28.0	14.7	0.0
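
The `Unnamed: 0` column here is an old row index. It typically shows up when a DataFrame is saved with `to_csv`, which writes the index by default, and then read back with a plain `read_csv`. The first header field is empty, so pandas invents a name for it.

There are two ways to avoid it:

- Pass `index_col=0` to `read_csv` to use that column as the index.
- Write the file with `to_csv(..., index=False)` so the index is never saved.

A minimal sketch follows. It puts the first three rows shown above into an in-memory string, since the actual file name is not given in these notes.

```python
import io

import pandas as pd

# First rows of the table above, in the layout a CSV written with its index would have
csv_text = """,city,month,year,maxtemp,mintemp,rainfall
0,HYDERABAD,January,1951,29.0,14.8,0.0
1,HYDERABAD,January,1952,29.1,13.6,0.0
2,HYDERABAD,January,1953,28.6,14.6,3.5
"""

plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())
# ['Unnamed: 0', 'city', 'month', 'year', 'maxtemp', 'mintemp', 'rainfall']

# Use the first column as the row index instead of keeping it as data
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())
# ['city', 'month', 'year', 'maxtemp', 'mintemp', 'rainfall']
```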

	1. open	2. high	3. low	4. close	5. volume
2019-09-25 16:00:00	139.6800	139.6900	139.2800	139.3600	1085867
2019-09-25 15:55:00	139.8400	139.8800	139.6900	139.6900	480398
2019-09-25 15:50:00	139.6450	139.8400	139.5500	139.8400	392727
2019-09-25 15:45:00	139.5750	139.6900	139.5400	139.6450	241860
2019-09-25 15:40:00	139.5500	139.6000	139.4950	139.5800	182013
...	...	...	...	...	...
2019-09-05 09:55:00	139.1101	139.3900	139.0860	139.3600	395149
2019-09-05 09:50:00	138.9900	139.2300	138.9100	139.1200	601935
2019-09-05 09:45:00	139.0150	139.2200	138.8700	138.9900	673859
2019-09-05 09:40:00	138.9700	139.1100	138.9100	139.0100	510635
2019-09-05 09:35:00	138.9000	139.0800	138.7900	138.8760	1382341
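
The trailing zeros in prices such as `139.6800` suggest that these values are still strings, as they arrive in the JSON response. The quoted values in the next table point the same way. Before doing arithmetic on them, it is worth converting them to numbers. The `1. ` style prefixes also make the column names awkward to type.

The sketch below uses two rows copied from the table above. The short column names (`open`, `close`, ...) are our own choice, not something the data source provides.

```python
import pandas as pd

# Two rows copied from the table above; every value is still a string
raw = pd.DataFrame(
    {
        "1. open": ["139.6800", "139.8400"],
        "2. high": ["139.6900", "139.8800"],
        "3. low": ["139.2800", "139.6900"],
        "4. close": ["139.3600", "139.6900"],
        "5. volume": ["1085867", "480398"],
    },
    index=["2019-09-25 16:00:00", "2019-09-25 15:55:00"],
)

prices = raw.astype(float)
# "1. open" -> "open": keep the text after the first space
prices.columns = [c.split(" ", 1)[1] for c in prices.columns]
prices["volume"] = prices["volume"].astype(int)  # volumes are whole numbers
prices.index = pd.to_datetime(prices.index)      # text timestamps -> datetimes
prices = prices.sort_index()                     # oldest bar first
print(prices)
```

The rows come newest first, so `sort_index()` puts them in time order. That is usually what rolling calculations and plots expect.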

	Meta Data	Time Series (5min)
1. Information	Intraday (5min) open, high, low, close prices ...	NaN
2. Symbol	MSFT	NaN
3. Last Refreshed	2019-09-25 16:00:00	NaN
4. Interval	5min	NaN
5. Output Size	Full size	NaN
...	...	...
2019-09-05 09:55:00	NaN	{'1. open': '139.1101', '2. high': '139.3900',...
2019-09-05 09:50:00	NaN	{'1. open': '138.9900', '2. high': '139.2300',...
2019-09-05 09:45:00	NaN	{'1. open': '139.0150', '2. high': '139.2200',...
2019-09-05 09:40:00	NaN	{'1. open': '138.9700', '2. high': '139.1100',...
2019-09-05 09:35:00	NaN	{'1. open': '138.9000', '2. high': '139.0800',...
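
This table looks like the result of handing the whole JSON response to `pd.DataFrame`:

- Each top-level key (`Meta Data`, `Time Series (5min)`) becomes a column.
- The row labels are the union of the keys inside them.
- Every cell where a column has no entry for that label is `NaN`.
- Each cell under `Time Series (5min)` holds a whole dict for one bar.

Building the frame from the time series part alone with `DataFrame.from_dict(..., orient="index")` gives one row per timestamp and one column per field, the same shape as the earlier price table. The dict below is a hypothetical, cut-down stand-in for the API response, with values taken from the rows above.

```python
import pandas as pd

# Hypothetical, cut-down response with the same two top-level keys as above
data = {
    "Meta Data": {
        "2. Symbol": "MSFT",
        "4. Interval": "5min",
    },
    "Time Series (5min)": {
        "2019-09-05 09:40:00": {"1. open": "138.9700", "4. close": "139.0100"},
        "2019-09-05 09:35:00": {"1. open": "138.9000", "4. close": "138.8760"},
    },
}

# Whole response: rows are the union of the inner keys, NaN where a column has no entry
messy = pd.DataFrame(data)
print(messy)

# Time series only: one row per timestamp, one column per field (values still strings)
bars = pd.DataFrame.from_dict(data["Time Series (5min)"], orient="index")
print(bars)
```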