Advanced Python Training at Arcesium - Day 2

Sep 25-27, 2019 Vikrant Patil

These notes are available online at http://notes.pipal.in/2020/arcesium_advanced_feb/day2.html

© Pipal Academy LLP


We will be using Python 3.7 from Anaconda for this training. You can download it from

https://www.anaconda.com/download/

Understanding iterations

In [1]:
for n in [2, 3, 4, 5, 6, 7]:
    print(n)
2
3
4
5
6
7
In [2]:
for c in "This is a string to test for loop":
    print(c, end=",")
T,h,i,s, ,i,s, ,a, ,s,t,r,i,n,g, ,t,o, ,t,e,s,t, ,f,o,r, ,l,o,o,p,
In [3]:
for item in {"a":True, "b":False}:
    print(item)
a
b

The iteration protocol

In [6]:
items = [1, 2, 3, 4, 5]
In [7]:
itr_items =  iter(items)
In [8]:
itr_items
Out[8]:
<list_iterator at 0x7f5783d17a20>
In [9]:
next(itr_items)
Out[9]:
1
In [10]:
next(itr_items)
Out[10]:
2
In [11]:
next(itr_items)
Out[11]:
3
In [12]:
next(itr_items)
Out[12]:
4
In [13]:
next(itr_items)
Out[13]:
5
In [14]:
next(itr_items)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-14-8f3d308df4eb> in <module>
----> 1 next(itr_items)

StopIteration: 
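A for loop is roughly equivalent to calling iter() once and then calling next() repeatedly until StopIteration is raised. A minimal sketch of that equivalence; the helper name `manual_for` is hypothetical and only for illustration:

```python
def manual_for(iterable, action):
    """Mimic `for item in iterable: action(item)` using the iteration protocol."""
    iterator = iter(iterable)      # ask the object for an iterator
    while True:
        try:
            item = next(iterator)  # fetch the next item
        except StopIteration:      # raised once the iterator is exhausted
            break
        action(item)

collected = []
manual_for([1, 2, 3, 4, 5], collected.append)
print(collected)  # [1, 2, 3, 4, 5]
```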

generators

In [15]:
def squares(numbers):
    for n in numbers:
        yield n*n
In [16]:
squares
Out[16]:
<function __main__.squares(numbers)>
In [17]:
s =squares([2,3,4])
In [18]:
s
Out[18]:
<generator object squares at 0x7f5783c88750>
In [19]:
for i in s:
    print(i)
4
9
16
In [20]:
def squares(numbers):
    print("Begin squares")
    for n in numbers:
        print("Computing square of ", n)
        yield n*n
        print("Back to squares")
    print("Finished squares")
In [21]:
sqrs = squares([4,5,6])
In [22]:
sqrs
Out[22]:
<generator object squares at 0x7f5783c887c8>
In [23]:
next(sqrs)
Begin squares
Computing square of  4
Out[23]:
16
In [24]:
next(sqrs)
Back to squares
Computing square of  5
Out[24]:
25
In [25]:
next(sqrs)
Back to squares
Computing square of  6
Out[25]:
36
In [26]:
next(sqrs)
Back to squares
Finished squares
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-26-6a5cc5500491> in <module>
----> 1 next(sqrs)

StopIteration: 
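The generator above does the bookkeeping that an iterator class would otherwise do by hand. For comparison, a sketch of a class-based iterator with the same behaviour as `squares` (the class name `Squares` is hypothetical):

```python
class Squares:
    """Iterator that yields n*n for each n in numbers, like the squares generator."""

    def __init__(self, numbers):
        self._it = iter(numbers)

    def __iter__(self):
        # an iterator returns itself from __iter__
        return self

    def __next__(self):
        n = next(self._it)  # lets StopIteration propagate when numbers run out
        return n * n

print(list(Squares([4, 5, 6])))  # [16, 25, 36]
```

The generator version needs no explicit state or `__next__`: execution simply pauses at each `yield`.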

problems

  • Write a generator coutdown which works exactly opposite of range!
  • Can we write a term generator for y'days piseries? How can we use this generator to sum pi?
  • is it possible to write infinite sequence generator? write infinite fib series generator.
  • How will we work with inifinite sequences? can you write a function called take which takes only n items from given sequence.
    >>> ones = infiniteones()
    >>> take(ones, 5)
    [1, 1, 1, 1, 1]
In [27]:
def hold():
    print("Enter ....")
    yield 1
    print("After 1")
    print("Going 2")
    yield 2
    print("After 2")
    print("Going 3")
    yield 3
    print("After 3")
    print("Stopping....")
In [28]:
h = hold()
In [29]:
next(h)
Enter ....
Out[29]:
1
In [30]:
next(h)
After 1
Going 2
Out[30]:
2
In [31]:
next(h)
After 2
Going 3
Out[31]:
3
In [32]:
next(h)
After 3
Stopping....
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-32-31146b9ab14d> in <module>
----> 1 next(h)

StopIteration: 
In [33]:
def foo(x):
    if x:
        return 1
    else:
        yield 0
In [39]:
f = foo(True)
In [40]:
f
Out[40]:
<generator object foo at 0x7f5783c885e8>
In [41]:
next(f)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-41-aff1dd02a623> in <module>
----> 1 next(f)

StopIteration: 1
In [42]:
f = foo(True)
In [43]:
x = next(f)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-43-dc64844b6993> in <module>
----> 1 x = next(f)

StopIteration: 1
In [44]:
print(x)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-44-fc17d851ef81> in <module>
----> 1 print(x)

NameError: name 'x' is not defined
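When a generator executes `return value`, next() does not return that value; it is attached to the StopIteration exception instead, which is why `x` was never assigned above. A sketch of two ways to recover it (the `delegate` function is hypothetical):

```python
def foo(x):
    if x:
        return 1
    else:
        yield 0

# Option 1: read the value off the StopIteration exception
try:
    next(foo(True))
except StopIteration as exc:
    result = exc.value
print(result)  # 1

# Option 2: `yield from` evaluates to the sub-generator's return value
def delegate(x):
    value = yield from foo(x)
    print("sub-generator returned", value)

print(list(delegate(True)))   # prints "sub-generator returned 1", then []
print(list(delegate(False)))  # prints "sub-generator returned None", then [0]
```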
In [45]:
def loop():
    n = 0
    while True:
        yield n
        
        if n == 3:
            return 
        n += 1
In [46]:
l = loop()
In [47]:
next(l)
Out[47]:
0
In [48]:
next(l)
Out[48]:
1
In [49]:
next(l)
Out[49]:
2
In [50]:
next(l)
Out[50]:
3
In [51]:
next(l)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-51-cdc8a39da60d> in <module>
----> 1 next(l)

StopIteration: 
In [52]:
help(max)
Help on built-in function max in module builtins:

max(...)
    max(iterable, *[, default=obj, key=func]) -> value
    max(arg1, arg2, *args, *[, key=func]) -> value
    
    With a single iterable argument, return its biggest item. The
    default keyword-only argument specifies an object to return if
    the provided iterable is empty.
    With two or more arguments, return the largest argument.
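Since max accepts any iterable, it can consume a generator directly; `key` chooses what to compare and `default` avoids a ValueError on an empty iterable. A short sketch (countdown is redefined here so the block runs on its own):

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

print(max(countdown(4)))                    # 4
print(max(countdown(4), key=lambda v: -v))  # 1: largest by negated value
print(max(countdown(0), default=None))      # None: empty generator, no ValueError

words = (w for w in "generators are lazy iterators".split())
print(max(words, key=len))                  # generators
```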

In [53]:
def countdown(n):
    while n > 0:
        yield n 
        n -= 1
In [54]:
for i in countdown(4):
    print(i, end=",")
4,3,2,1,
In [55]:
def piseries():
    n = 1
    while True:
        yield 8/((4*n-3)*(4*n-1))
        n += 1
In [56]:
def take(seq, n):
    return [next(seq) for _ in range(n)]
In [57]:
take(piseries(), 5)
Out[57]:
[2.6666666666666665,
 0.22857142857142856,
 0.08080808080808081,
 0.041025641025641026,
 0.02476780185758514]
In [58]:
sum(take(piseries(), 1000))
Out[58]:
3.141092653621038
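The `take` above calls next() exactly n times, so on a finite iterator shorter than n it lets StopIteration escape. An alternative sketch (not the version written in class) uses itertools.islice, which simply stops early and also accepts plain lists:

```python
from itertools import islice

def take(seq, n):
    """Return up to n items from any iterable, including infinite generators."""
    return list(islice(seq, n))

def piseries():
    n = 1
    while True:
        yield 8 / ((4*n - 3) * (4*n - 1))
        n += 1

print(take(iter([1, 2, 3]), 5))     # [1, 2, 3]: stops early, no exception
print(sum(take(piseries(), 1000)))  # same partial sum as computed above
```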
In [59]:
def fibseries():
    cur, next_ = 1, 1
    while True:
        yield cur
        cur , next_ = next_, cur+next_
In [60]:
take(fibseries(), 10)
Out[60]:
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
In [61]:
cntd3 = countdown(3)
In [63]:
for i in cntd3:
    print(i, end=",")
3,2,1,
In [64]:
for i in cntd3:
    print(i, end=",")
In [65]:
c3 = countdown(3)
In [66]:
copyc3 = c3
In [67]:
for i in c3:
    print(i, end=",")
3,2,1,
In [68]:
for i in copyc3:
    print(i, end=",")
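A generator object can be consumed only once, and `copyc3 = c3` just binds a second name to the same object, so both loops share one exhausted generator. If repeated iteration is needed, one option is a class whose `__iter__` returns a fresh generator each time; another is itertools.tee. A sketch (the class name `Countdown` is hypothetical):

```python
from itertools import tee

class Countdown:
    """Re-iterable countdown: every for loop gets a brand-new generator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        n = self.n
        while n > 0:
            yield n   # __iter__ is itself a generator function
            n -= 1

c3 = Countdown(3)
print(list(c3), list(c3))  # [3, 2, 1] [3, 2, 1]

def countdown(n):
    while n > 0:
        yield n
        n -= 1

# two independent iterators over one generator (tee buffers items internally)
first, second = tee(countdown(3))
print(list(first), list(second))  # [3, 2, 1] [3, 2, 1]
```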

Building a data pipeline using generators

In [73]:
import os

def take(seq, n):
    return [next(seq) for _ in range(n)]

def find(root):
    for path, dirnames, filenames in os.walk(root):
        for f in filenames:
            yield os.path.join(path, f)
            
def grep(pattern, seq):
    return (x for x in seq if pattern in x) # this is called a generator expression
In [74]:
files = find("/home/vikrant/trainings")
pyfiles = grep(".py", files)
print(take(pyfiles, 5))
['/home/vikrant/trainings/2018/vmware-advanced-apr/bank1.py', '/home/vikrant/trainings/2018/vmware-advanced-apr/bank0.py', '/home/vikrant/trainings/2018/vmware-advanced-apr/commands.py', '/home/vikrant/trainings/2018/vmware-advanced-apr/sockets.py~', '/home/vikrant/trainings/2018/vmware-advanced-apr/memoize.py']
In [78]:
def readlines(filenames):
    for file in filenames:
        with open(file) as f:
            yield from f

def count(seq):
    return sum(1 for item in seq) # this is also a generator expression
In [81]:
files = find("/home/vikrant/trainings/")
csvfiles = grep(".csv", files)
lines = readlines(csvfiles)
count(lines)
Out[81]:
6813
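Each stage only pulls data when the final consumer (`count`) asks for it. A self-contained sketch that runs the same find, grep, readlines, count chain on a throwaway directory created with tempfile; the file names and contents are made up for illustration:

```python
import os
import tempfile

def find(root):
    for path, dirnames, filenames in os.walk(root):
        for f in filenames:
            yield os.path.join(path, f)

def grep(pattern, seq):
    return (x for x in seq if pattern in x)

def readlines(filenames):
    for file in filenames:
        with open(file) as f:
            yield from f

def count(seq):
    return sum(1 for _ in seq)

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    samples = {
        "a.csv": "x,y\n1,2\n",                          # 2 lines
        os.path.join("sub", "b.csv"): "x,y\n3,4\n5,6\n",  # 3 lines
        "notes.txt": "ignore me\n",                     # filtered out by grep
    }
    for name, text in samples.items():
        with open(os.path.join(root, name), "w") as f:
            f.write(text)
    # the whole pipeline runs here, inside the with block, while the files exist
    total = count(readlines(grep(".csv", find(root))))

print(total)  # 5
```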
In [82]:
import re
def grep(pattern, seq):
    p = re.compile(pattern)
    return (x for x in seq if p.match(x)) # this is also a generator expression
In [88]:
files = find("/home/vikrant/trainings/")
pyfiles = grep(r"[\w\/]+\.py", files)
take(pyfiles, 5)
Out[88]:
['/home/vikrant/trainings/nakul/bank1.py',
 '/home/vikrant/trainings/nakul/bank0.py',
 '/home/vikrant/trainings/nakul/bank2.py',
 '/home/vikrant/trainings/nakul/functions4.py',
 '/home/vikrant/trainings/nakul/functions.py']
In [85]:
import re
In [87]:
pattern = re.compile(r"\w+.py")
pattern.match("/vikrant/trainings/hello.py")
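The cell above produces no output because `match` returned None: `re.match` only tries to match at the start of the string, and `\w` does not match the leading `/`. `re.search` scans the whole string. Note also that an unescaped `.` matches any character. A short comparison:

```python
import re

path = "/vikrant/trainings/hello.py"
loose = re.compile(r"\w+.py")     # '.' here matches any character
strict = re.compile(r"\w+\.py$")  # escaped dot, anchored at the end

print(loose.match(path))             # None: match() starts at position 0, where '/' is not \w
print(strict.search(path).group())   # hello.py
print(bool(loose.search("happy")))   # True: "ha" + any char "p" + "py"
print(bool(strict.search("happy")))  # False
```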
In [89]:
files = find("/home/vikrant/trainings/")
pyfiles = grep(r"[\w\/]+\.py", files)
lines = readlines(pyfiles)
funcs = grep(r"def .*", lines)
count(funcs)
Out[89]:
199
In [90]:
files = find("/home/vikrant/trainings/")
pyfiles = grep(r"[\w\/]+\.py", files)
lines = readlines(pyfiles)
funcs = grep(r"def .*", lines)
In [91]:
next(funcs)
Out[91]:
'def make_account():\n'
In [93]:
count(funcs)
Out[93]:
198

problem

https://ia802902.us.archive.org/4/items/prideandprejudic01342gut/pandp12.txt

  • Write a function get_paragraphs to split the text in the above text file into paragraphs. An empty line marks the end of a paragraph.
    • How many paragraphs are there?
    • Which is the longest paragraph?
In [94]:
import requests

def wget(url, filename):
    resp = requests.get(url)
    with open(filename, "w") as f:
        f.write(resp.text)
In [95]:
novelurl = "https://ia802902.us.archive.org/4/items/prideandprejudic01342gut/pandp12.txt"
wget(novelurl, "pandp.txt")
In [96]:
!tail pandp.txt









In [97]:
!head pandp.txt









In [119]:
def get_paragraphs(lines):
    para = []
    for line in lines:
        if line.strip() !="":
            para.append(line.strip())
        elif para:
            yield "\n".join(para)
            para = []
    if para:
        yield "\n".join(para)
        
def get_paragraphs_(lines):
    para = ""
    for line in lines:
        line = line.strip()
        if line == "":
            if para == "":
                continue
            else:
                yield para
                para = ""
        else:
            # avoid a leading "\n" before the first line of each paragraph
            para = line if para == "" else para + "\n" + line
    if para:
        yield para
In [120]:
lines = readlines(["pandp.txt"])
count(get_paragraphs(lines))
Out[120]:
2202
In [121]:
!wc pandp.txt
 14583 123882 717331 pandp.txt
In [123]:
def test_get_paragraphs(func):
    def append_newl(items):
        return [item+"\n" for item in items]
    lines = [""]
    assert count(func(append_newl(lines))) == 0
    lines = ["A","B","","","C"]
    assert count(func(append_newl(lines))) == 2
    lines = ["A","","B","","","C","D"]
    assert count(func(append_newl(lines))) == 3
    assert max(func(append_newl(lines)), key=len)=="C\nD"
    
#test_get_paragraphs(get_paragraphs_)
test_get_paragraphs(get_paragraphs)
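An alternative sketch (not the version developed above) groups consecutive lines by whether they are blank using itertools.groupby. It keeps the same contract, stripped lines joined with "\n", and satisfies the same test cases:

```python
from itertools import groupby

def get_paragraphs_groupby(lines):
    """Yield paragraphs: runs of non-blank lines, each stripped and joined by '\\n'."""
    stripped = (line.strip() for line in lines)
    for is_text, group in groupby(stripped, key=bool):
        if is_text:  # skip the runs of blank lines between paragraphs
            yield "\n".join(group)

sample = ["A\n", "B\n", "\n", "\n", "C\n", "D\n"]
print(list(get_paragraphs_groupby(sample)))  # ['A\nB', 'C\nD']
```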

numpy

!python -m pip install numpy

In [124]:
import numpy as np
In [125]:
a = np.array([1,2,3,4,5])
In [126]:
a
Out[126]:
array([1, 2, 3, 4, 5])
In [127]:
a.shape
Out[127]:
(5,)
In [128]:
a.ndim
Out[128]:
1
In [129]:
a100 = np.arange(100).reshape(10,10)
In [130]:
a100
Out[130]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
In [131]:
a100.shape
Out[131]:
(10, 10)
In [132]:
a100.ndim
Out[132]:
2
In [133]:
a100.dtype
Out[133]:
dtype('int64')
In [134]:
a100[0]
Out[134]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [135]:
a100[-1]
Out[135]:
array([90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
In [137]:
a100[:,0]
Out[137]:
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
In [138]:
a100[1,:]
Out[138]:
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
In [139]:
a100
Out[139]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

various ways to create arrays

In [140]:
np.zeros(100).reshape(20,5)
Out[140]:
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
In [141]:
z = _
In [143]:
z.dtype
Out[143]:
dtype('float64')
In [145]:
help(np.zeros)
Help on built-in function zeros in module numpy:

zeros(...)
    zeros(shape, dtype=float, order='C')
    
    Return a new array of given shape and type, filled with zeros.
    
    Parameters
    ----------
    shape : int or tuple of ints
        Shape of the new array, e.g., ``(2, 3)`` or ``2``.
    dtype : data-type, optional
        The desired data-type for the array, e.g., `numpy.int8`.  Default is
        `numpy.float64`.
    order : {'C', 'F'}, optional, default: 'C'
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in
        memory.
    
    Returns
    -------
    out : ndarray
        Array of zeros with the given shape, dtype, and order.
    
    See Also
    --------
    zeros_like : Return an array of zeros with shape and type of input.
    empty : Return a new uninitialized array.
    ones : Return a new array setting values to one.
    full : Return a new array of given shape filled with value.
    
    Examples
    --------
    >>> np.zeros(5)
    array([ 0.,  0.,  0.,  0.,  0.])
    
    >>> np.zeros((5,), dtype=int)
    array([0, 0, 0, 0, 0])
    
    >>> np.zeros((2, 1))
    array([[ 0.],
           [ 0.]])
    
    >>> s = (2,2)
    >>> np.zeros(s)
    array([[ 0.,  0.],
           [ 0.,  0.]])
    
    >>> np.zeros((2,), dtype=[('x', 'i4'), ('y', 'i4')]) # custom dtype
    array([(0, 0), (0, 0)],
          dtype=[('x', '<i4'), ('y', '<i4')])

In [146]:
np.zeros(10, dtype=np.int16)
Out[146]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)
In [147]:
np.zeros_like(a100)
Out[147]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
In [148]:
np.ones_like(a100)
Out[148]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
In [149]:
np.asarray([1, 2, 3, 4, 5])
Out[149]:
array([1, 2, 3, 4, 5])
In [150]:
np.empty(100).reshape(25,4)
Out[150]:
array([[            nan, 0.00000000e+000, 4.94065646e-324,
        0.00000000e+000],
       [4.44659081e-323, 6.91758644e-310, 4.65489148e-310,
                    nan],
       [0.00000000e+000,             nan, 6.91758594e-310,
        4.94065646e-324],
       [3.55727265e-322, 3.45845952e-323, 6.91758644e-310,
        4.65489148e-310],
       [0.00000000e+000, 4.94065646e-324, 4.94065646e-324,
        6.91758594e-310],
       [4.94065646e-324, 7.11454530e-322, 1.48219694e-323,
        6.91758644e-310],
       [4.65489148e-310, 0.00000000e+000, 4.94065646e-324,
        4.94065646e-324],
       [0.00000000e+000, 4.94065646e-324, 1.06718180e-321,
        4.44659081e-323],
       [6.91758644e-310, 4.65489148e-310, 0.00000000e+000,
        1.97626258e-323],
       [4.94065646e-324, 4.94065646e-323, 6.91749408e-310,
        1.42290906e-321],
       [4.44659081e-323, 6.91758644e-310, 4.65489148e-310,
        3.95252517e-323],
       [1.97626258e-323, 4.94065646e-324, 0.00000000e+000,
        4.94065646e-324],
       [1.77863633e-321, 4.44659081e-323, 6.91758644e-310,
        4.65489148e-310],
       [0.00000000e+000, 2.96439388e-323, 1.48219694e-323,
        0.00000000e+000],
       [4.94065646e-324, 2.13436359e-321, 5.43472210e-323,
        6.91758644e-310],
       [4.65489148e-310, 9.38724727e-323, 2.96439388e-323,
        1.48219694e-323],
       [0.00000000e+000, 4.94065646e-324, 2.49009086e-321,
        4.94065646e-323],
       [6.91758644e-310, 4.65489148e-310, 0.00000000e+000,
        0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 4.94065646e-324,
        0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
        6.91758655e-310],
       [0.00000000e+000, 2.13436359e-321, 9.88131292e-324,
        6.91758655e-310],
       [4.65489169e-310,             nan, 1.28457068e-322,
                    nan],
       [6.91758655e-310, 4.65480997e-310, 3.59679790e-321,
        4.44659081e-323],
       [6.91758655e-310, 4.65489169e-310, 4.94065646e-324,
                    nan]])
In [152]:
np.empty_like(range(10))
Out[152]:
array([140013499548336, 140013499126056, 140013499126224, 140013499126280,
       140013499127960, 140013499127064, 140013499127344, 140013499128072,
       140013499128240, 140013499127400])
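np.empty and np.empty_like only allocate memory without initialising it, so the values above are whatever bytes happened to be there. When a definite fill value is needed, np.zeros, np.ones and np.full are the safe choices. A small sketch, also passing the shape as a tuple instead of calling reshape:

```python
import numpy as np

zeros = np.zeros((2, 3), dtype=np.int16)  # shape given directly as a tuple
ones = np.ones((2, 3))                    # float64 by default
sevens = np.full((2, 3), 7)               # every element set to 7
buf = np.empty((2, 3))                    # uninitialised: overwrite before reading
buf[...] = 0.5                            # fill it explicitly

print(zeros.dtype, ones.dtype)            # int16 float64
print(sevens)
print(buf)
```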

Access patterns

In [153]:
a100 = np.arange(100).reshape(10,10)
In [154]:
a100[:5, :5]
Out[154]:
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
In [155]:
a100[5:,5:]
Out[155]:
array([[55, 56, 57, 58, 59],
       [65, 66, 67, 68, 69],
       [75, 76, 77, 78, 79],
       [85, 86, 87, 88, 89],
       [95, 96, 97, 98, 99]])
In [156]:
a100[:5,5:]
Out[156]:
array([[ 5,  6,  7,  8,  9],
       [15, 16, 17, 18, 19],
       [25, 26, 27, 28, 29],
       [35, 36, 37, 38, 39],
       [45, 46, 47, 48, 49]])
In [157]:
subview = a100[:5, :5]
In [158]:
subview
Out[158]:
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
In [159]:
subview.shape
Out[159]:
(5, 5)
In [160]:
subview[0,0]= -1
In [161]:
subview
Out[161]:
array([[-1,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
In [162]:
type(subview)
Out[162]:
numpy.ndarray
In [163]:
a100
Out[163]:
array([[-1,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
In [164]:
type(a100)
Out[164]:
numpy.ndarray
In [165]:
copy_subview = subview.copy()
In [166]:
copy_subview
Out[166]:
array([[-1,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
In [167]:
copy_subview[0,0] = 0
In [168]:
copy_subview
Out[168]:
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
In [169]:
subview
Out[169]:
array([[-1,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
In [170]:
a100
Out[170]:
array([[-1,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
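Basic slicing returns a view on the same memory, which is why changing `subview` changed `a100`, while `.copy()` gives an independent buffer. np.shares_memory can confirm which case applies. A sketch that also shows fancy (integer-list) indexing, which always returns a copy:

```python
import numpy as np

a100 = np.arange(100).reshape(10, 10)
subview = a100[:5, :5]         # basic slicing -> view on the same buffer
copied = a100[:5, :5].copy()   # explicit copy -> independent buffer
fancy = a100[[0, 1], :]        # fancy indexing -> always a copy

print(np.shares_memory(a100, subview))  # True
print(np.shares_memory(a100, copied))   # False
print(np.shares_memory(a100, fancy))    # False
```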

operations

In [171]:
a = np.array(range(10))
In [172]:
a
Out[172]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [173]:
a > 4
Out[173]:
array([False, False, False, False, False,  True,  True,  True,  True,
        True])
In [174]:
a[a>3]
Out[174]:
array([4, 5, 6, 7, 8, 9])
In [175]:
a + 4
Out[175]:
array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13])
In [176]:
a - 5
Out[176]:
array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])
In [177]:
a * 2
Out[177]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
In [178]:
a **2
Out[178]:
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
In [180]:
a + a
Out[180]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
In [181]:
a*a
Out[181]:
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
In [182]:
np.exp(a)
Out[182]:
array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])
In [183]:
a100
Out[183]:
array([[-1,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
In [184]:
a100.max()
Out[184]:
99
In [185]:
a100.min()
Out[185]:
-1
In [186]:
a100.std()
Out[186]:
28.883384496973342
In [187]:
a100.sum()
Out[187]:
4949
In [188]:
a100.cumsum()
Out[188]:
array([  -1,    0,    2,    5,    9,   14,   20,   27,   35,   44,   54,
         65,   77,   90,  104,  119,  135,  152,  170,  189,  209,  230,
        252,  275,  299,  324,  350,  377,  405,  434,  464,  495,  527,
        560,  594,  629,  665,  702,  740,  779,  819,  860,  902,  945,
        989, 1034, 1080, 1127, 1175, 1224, 1274, 1325, 1377, 1430, 1484,
       1539, 1595, 1652, 1710, 1769, 1829, 1890, 1952, 2015, 2079, 2144,
       2210, 2277, 2345, 2414, 2484, 2555, 2627, 2700, 2774, 2849, 2925,
       3002, 3080, 3159, 3239, 3320, 3402, 3485, 3569, 3654, 3740, 3827,
       3915, 4004, 4094, 4185, 4277, 4370, 4464, 4559, 4655, 4752, 4850,
       4949])
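The operators above act element-wise with no Python loop, comparisons yield boolean arrays that can be combined with `&`, `|` and `~` (parentheses are required because of operator precedence), and reductions such as sum accept an `axis`. A sketch comparing a vectorised expression with its plain-Python equivalent:

```python
import numpy as np

a = np.arange(10)

# vectorised: square the elements that are even and greater than 2
mask = (a % 2 == 0) & (a > 2)
vectorised = a[mask] ** 2

# equivalent plain-Python loop
looped = [x * x for x in range(10) if x % 2 == 0 and x > 2]

print(vectorised)  # [16 36 64]
print(looped)      # [16, 36, 64]

m = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
print(m.sum(axis=0))            # column sums: 3, 5, 7
print(m.sum(axis=1))            # row sums: 3, 12
```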
In [189]:
from scipy.misc import face  # in SciPy >= 1.10 use: from scipy.datasets import face (needs the pooch package)
In [190]:
image = face(gray=True)
In [191]:
image
Out[191]:
array([[114, 130, 145, ..., 119, 129, 137],
       [ 83, 104, 123, ..., 118, 134, 146],
       [ 68,  88, 109, ..., 119, 134, 145],
       ...,
       [ 98, 103, 116, ..., 144, 143, 143],
       [ 94, 104, 120, ..., 143, 142, 142],
       [ 94, 106, 119, ..., 142, 141, 140]], dtype=uint8)
In [192]:
from matplotlib import pyplot as plt
In [193]:
def imshow(img):
    plt.imshow(img, cmap=plt.cm.gray)
    plt.show()
In [194]:
%matplotlib inline
In [195]:
imshow(a100)
In [196]:
imshow(image)
In [197]:
image
Out[197]:
array([[114, 130, 145, ..., 119, 129, 137],
       [ 83, 104, 123, ..., 118, 134, 146],
       [ 68,  88, 109, ..., 119, 134, 145],
       ...,
       [ 98, 103, 116, ..., 144, 143, 143],
       [ 94, 104, 120, ..., 143, 142, 142],
       [ 94, 106, 119, ..., 142, 141, 140]], dtype=uint8)
In [198]:
negate = 255 - image
In [199]:
imshow(negate)
In [200]:
thumbnail = image[::3,::3]
In [201]:
imshow(thumbnail)
In [202]:
imshow(image[::5,::5])
In [203]:
imshow(image[::20,::20])
In [204]:
plain = np.zeros_like(thumbnail)
In [205]:
imshow(plain)
In [206]:
plain[::10,:] = 255
plain[:, ::10] = 255
In [207]:
imshow(plain)
In [209]:
plain[:12,:12]
Out[209]:
array([[255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0],
       [255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255,   0]],
      dtype=uint8)
In [210]:
small = np.zeros(100).reshape(10,10)
In [211]:
small[::3,:] = 255
small[:,::3] = 255
imshow(small)
In [212]:
imshow(thumbnail + plain)
In [220]:
imshow(0.75*thumbnail + 0.25*plain)
In [219]:
imshow((0.5*thumbnail + 0.5*plain)*2)
In [217]:
imshow(a100+200)
In [221]:
imshow(np.maximum(thumbnail, plain))
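`thumbnail` and `plain` are uint8 arrays, so `thumbnail + plain` is computed modulo 256: sums above 255 wrap around instead of saturating. Multiplying by a float such as 0.75 converts the result to float64, so the weighted blends do not wrap. One way to get a saturating sum is to widen the type first and clip; a sketch on small arrays:

```python
import numpy as np

x = np.array([100, 200, 250], dtype=np.uint8)
y = np.array([100, 100, 100], dtype=np.uint8)

wrapped = x + y  # uint8 arithmetic wraps modulo 256
saturated = np.clip(x.astype(np.int16) + y, 0, 255).astype(np.uint8)

print(wrapped)    # [200  44  94]
print(saturated)  # [200 255 255]
```

np.maximum, used in the last cell, is another way to overlay the grid without overflow, since it never produces values outside the inputs' range.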

problem

  • Swap the top-left and bottom-right corners (quadrants) of this image.
In [224]:
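def swapcorners(img):
    imglike = img.copy()
    h, w = img.shape
    qh, qw = h // 2, w // 2              # quadrant size; a middle row/column stays put if h or w is odd
    q1 = img[:qh, :qw].copy()            # top-left quadrant
    q2 = img[h - qh:, w - qw:].copy()    # bottom-right quadrant, same shape as q1

    imglike[:qh, :qw] = q2
    imglike[h - qh:, w - qw:] = q1

    return imglike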
In [225]:
imshow(swapcorners(thumbnail))
In [226]:
thumb = image[::10, ::10]
In [227]:
hthumb = np.hstack([thumb, thumb, thumb])
vthumb = np.vstack([hthumb, hthumb, hthumb])
imshow(vthumb)
In [228]:
imshow(np.flip(thumb))
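The 3 x 3 mosaic built with hstack/vstack can also be produced with np.tile, which repeats an array along each axis. Note also that np.flip without an `axis` reverses every axis. A sketch checking both on a small stand-in array (not the actual image):

```python
import numpy as np

thumb = np.arange(6, dtype=np.uint8).reshape(2, 3)  # stand-in for the image thumbnail

hthumb = np.hstack([thumb, thumb, thumb])
vthumb = np.vstack([hthumb, hthumb, hthumb])
tiled = np.tile(thumb, (3, 3))  # 3 copies down, 3 copies across

print(tiled.shape)                    # (6, 9)
print(np.array_equal(vthumb, tiled))  # True

# np.flip(thumb) reverses both axes; axis=0 only turns it upside down
print(np.array_equal(np.flip(thumb), thumb[::-1, ::-1]))    # True
print(np.array_equal(np.flip(thumb, axis=0), thumb[::-1]))  # True
```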
In [229]:
url = "https://notes.pipal.in/2020/arcesium_advanced_feb/HYDERABAD-weather.csv"
wget(url, "HYDERABAD-weather.csv")
In [230]:
!tail -n 5 HYDERABAD-weather.csv
594,HYDERABAD,December,1996,28.3,14.9,0.0
595,HYDERABAD,December,1997,28.7,19.2,40.6
596,HYDERABAD,December,1998,28.7,12.8,0.0
597,HYDERABAD,December,1999,29.0,14.2,0.0
598,HYDERABAD,December,2000,29.6,13.3,1.0
In [231]:
import csv
In [232]:
with open("HYDERABAD-weather.csv") as f:
    data = list(csv.reader(f))
In [233]:
type(data)
Out[233]:
list
In [234]:
data[0]
Out[234]:
['', 'city', 'month', 'year', 'maxtemp', 'mintemp', 'rainfall']
In [235]:
numeric_data = data[1:]
In [236]:
data[0]
Out[236]:
['', 'city', 'month', 'year', 'maxtemp', 'mintemp', 'rainfall']
In [237]:
numeric_data[:3]
Out[237]:
[['0', 'HYDERABAD', 'January', '1951', '29.0', '14.8', '0.0'],
 ['1', 'HYDERABAD', 'January', '1952', '29.1', '13.6', '0.0'],
 ['2', 'HYDERABAD', 'January', '1953', '28.6', '14.6', '3.5']]
In [238]:
def floatcolumn(matrix, colnum):
    return [float(row[colnum]) for row in matrix]
In [239]:
maxtemp = floatcolumn(numeric_data, 4)
In [240]:
mintemp = floatcolumn(numeric_data, 5)
In [241]:
rainfall = floatcolumn(numeric_data, 6)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-241-ed8819080945> in <module>
----> 1 rainfall = floatcolumn(numeric_data, 6)

<ipython-input-238-435c5f6a0ecd> in floatcolumn(matrix, colnum)
      1 def floatcolumn(matrix, colnum):
----> 2     return [float(row[colnum]) for row in matrix]

<ipython-input-238-435c5f6a0ecd> in <listcomp>(.0)
      1 def floatcolumn(matrix, colnum):
----> 2     return [float(row[colnum]) for row in matrix]

ValueError: could not convert string to float: 
In [242]:
def parsefloat(sf):
    try:
        return float(sf)
    except ValueError as v:
        print(v)
        return 0

def floatcolumn(matrix, colnum):
    return [parsefloat(row[colnum]) for row in matrix]
In [243]:
rainfall = floatcolumn(numeric_data, 6)
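`parsefloat` substitutes 0 for blank cells, which silently pulls averages down. A sketch of an alternative that records missing values as NaN and uses NumPy's NaN-aware reductions; the rows below are modelled on the CSV, with one rainfall cell blanked purely for illustration:

```python
import math
import numpy as np

def parsefloat(sf):
    """Convert a CSV cell to float, using NaN (not 0) for blank or invalid cells."""
    try:
        return float(sf)
    except ValueError:
        return math.nan

rows = [["0", "HYDERABAD", "January", "1951", "29.0", "14.8", "0.0"],
        ["1", "HYDERABAD", "January", "1952", "29.1", "13.6", ""],   # blanked for illustration
        ["2", "HYDERABAD", "January", "1953", "28.6", "14.6", "3.5"]]

rainfall = np.array([parsefloat(row[6]) for row in rows])
print(rainfall)
print(np.mean(rainfall))     # nan: a single missing value poisons the plain mean
print(np.nanmean(rainfall))  # 1.75: mean of the two known values
```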
could not convert string to float: 
In [244]:
plt.scatter(rainfall, maxtemp)
Out[244]:
<matplotlib.collections.PathCollection at 0x7f575da2bf98>
In [245]:
plt.scatter(rainfall, mintemp)
Out[245]:
<matplotlib.collections.PathCollection at 0x7f575d8e6978>
In [248]:
year = [int(row[3]) for row in numeric_data]
In [249]:
numeric_data[:6]
Out[249]:
[['0', 'HYDERABAD', 'January', '1951', '29.0', '14.8', '0.0'],
 ['1', 'HYDERABAD', 'January', '1952', '29.1', '13.6', '0.0'],
 ['2', 'HYDERABAD', 'January', '1953', '28.6', '14.6', '3.5'],
 ['3', 'HYDERABAD', 'January', '1954', '28.2', '13.9', '0.0'],
 ['4', 'HYDERABAD', 'January', '1955', '28.0', '14.7', '0.0'],
 ['5', 'HYDERABAD', 'January', '1956', '28.1', '14.2', '0.0']]
In [250]:
numeric_data[-6:]
Out[250]:
[['593', 'HYDERABAD', 'December', '1995', '28.9', '15.9', '0.0'],
 ['594', 'HYDERABAD', 'December', '1996', '28.3', '14.9', '0.0'],
 ['595', 'HYDERABAD', 'December', '1997', '28.7', '19.2', '40.6'],
 ['596', 'HYDERABAD', 'December', '1998', '28.7', '12.8', '0.0'],
 ['597', 'HYDERABAD', 'December', '1999', '29.0', '14.2', '0.0'],
 ['598', 'HYDERABAD', 'December', '2000', '29.6', '13.3', '1.0']]
In [252]:
plt.plot(year, rainfall)
Out[252]:
[<matplotlib.lines.Line2D at 0x7f575d8b8b70>]
In [253]:
sorted_data = sorted(numeric_data, key= lambda r:r[3])
In [254]:
year = [int(row[3]) for row in sorted_data]
In [255]:
rainfall = floatcolumn(sorted_data, 6)
could not convert string to float: 
In [256]:
plt.plot(year, rainfall)
Out[256]:
[<matplotlib.lines.Line2D at 0x7f575d7f68d0>]
In [258]:
a100.mean()
Out[258]:
49.49
In [272]:
months = np.array([row[2] for row in numeric_data])
In [273]:
months
Out[273]:
array(['January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'January', 'January', 'January', 'January',
       'January', 'January', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'February', 'February', 'February',
       'February', 'February', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'March', 'March', 'March',
       'March', 'March', 'March', 'March', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'April', 'April', 'April',
       'April', 'April', 'April', 'April', 'May', 'May', 'May', 'May',
       'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May',
       'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May',
       'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May',
       'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May',
       'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May', 'May',
       'May', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'June', 'June', 'June', 'June', 'June',
       'June', 'June', 'June', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'July', 'July', 'July',
       'July', 'July', 'July', 'July', 'July', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'August', 'August', 'August', 'August', 'August', 'August',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'September', 'September', 'September', 'September', 'September',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'October', 'October', 'October', 'October',
       'October', 'October', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'November', 'November', 'November',
       'November', 'November', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December', 'December', 'December', 'December',
       'December', 'December'], dtype='<U9')
In [274]:
rainfall = np.array(floatcolumn(numeric_data, 6))
could not convert string to float: 
In [275]:
rainfall[:5]
Out[275]:
array([0. , 0. , 3.5, 0. , 0. ])
In [276]:
a = np.array(range(5))
In [277]:
b = np.array(['a','b','a','b','c'])
In [278]:
b=="a"
Out[278]:
array([ True, False,  True, False, False])
In [279]:
a[b=="a"]
Out[279]:
array([0, 2])
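Boolean masks can be combined element by element. The sketch below reuses the toy arrays `a` and `b` from the cells above. Use `&`, `|` and `~` rather than `and`, `or`, `not`, and wrap each comparison in parentheses, because `&` and `|` bind more tightly than `==`.

```python
import numpy as np

a = np.array(range(5))
b = np.array(['a', 'b', 'a', 'b', 'c'])

# Parentheses are required: & and | bind more tightly than ==
a_or_c = a[(b == "a") | (b == "c")]     # array([0, 2, 4])
not_a = a[~(b == "a")]                  # array([1, 3, 4])
a_and_big = a[(b == "a") & (a > 0)]     # array([2])
```

Writing `(b == "a") or (b == "c")` instead raises a `ValueError`, because Python's `or` asks for the truth value of a whole array.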
In [280]:
rainfall[months=="March"].mean()
Out[280]:
15.264000000000001
In [271]:
months=="March"
Out[271]:
False
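The `False` above comes from a cell with a lower execution number (271) than its neighbours, so it most likely ran while `months` was still a plain Python list. A list compared with a string gives a single `False`; only a NumPy array compares element by element. A small sketch with made-up values:

```python
import numpy as np

months_list = ["January", "March", "March"]   # hypothetical sample values
months_arr = np.array(months_list)

list_result = months_list == "March"    # False: the whole list is compared with one string
array_result = months_arr == "March"    # array([False,  True,  True])

# An element-wise test on a plain list needs a comprehension
list_mask = [m == "March" for m in months_list]   # [False, True, True]
```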
In [281]:
def get_mean_rainfall(rainfall, months, month):
    return rainfall[months==month].mean()
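`get_mean_rainfall` scans the whole array once per month. An alternative sketch (not the approach used in these notes) computes every group mean in one pass with `np.unique(..., return_inverse=True)` and `np.bincount`. The function name `mean_by_group` and the arrays are hypothetical stand-ins for `rainfall` and `months`.

```python
import numpy as np

def mean_by_group(values, labels):
    """Return (sorted unique labels, mean of values for each label)."""
    groups, idx = np.unique(labels, return_inverse=True)  # idx[i] = group number of labels[i]
    sums = np.bincount(idx, weights=values)                 # total of values per group
    counts = np.bincount(idx)                               # number of entries per group
    return groups, sums / counts

labels = np.array(["Jan", "Feb", "Jan", "Feb", "Mar"])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
groups, means = mean_by_group(values, labels)
# groups -> ['Feb', 'Jan', 'Mar'], means -> [3.0, 2.0, 5.0]
```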
In [282]:
import datetime
In [283]:
set(months)
Out[283]:
{'April',
 'August',
 'December',
 'February',
 'January',
 'July',
 'June',
 'March',
 'May',
 'November',
 'October',
 'September'}
In [285]:
uniqmonths = list(set(months))
rainfall_ = [get_mean_rainfall(rainfall, months, month) for month in uniqmonths]
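`set(months)` has no defined order, so the bars in the next plot are not necessarily in calendar order. One way to fix this, sketched below, is to walk the standard library's `calendar.month_name` (English names under Python's default locale) and keep only the months present in the data. The arrays are small hypothetical stand-ins for the real columns.

```python
import calendar
import numpy as np

def get_mean_rainfall(rainfall, months, month):
    return rainfall[months == month].mean()

# calendar.month_name[0] is an empty string, so skip it
calendar_order = list(calendar.month_name)[1:]

# Hypothetical sample data standing in for the real columns
months = np.array(["March", "January", "January", "February"])
rainfall = np.array([1.0, 0.0, 2.0, 3.5])

present = set(months)
ordered_months = [m for m in calendar_order if m in present]
ordered_means = [get_mean_rainfall(rainfall, months, m) for m in ordered_months]
```

`plt.bar(ordered_months, ordered_means)` then draws the bars from January onwards.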
In [286]:
plt.bar(uniqmonths, rainfall_)
Out[286]:
<BarContainer object of 12 artists>
In [287]:
import altair as alt
import pandas as pd
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-287-364103eee631> in <module>
----> 1 import altair as alt
      2 import pandas as pd

ModuleNotFoundError: No module named 'altair'
In [288]:
!python -m pip install altair
Collecting altair
  Downloading https://files.pythonhosted.org/packages/a8/07/d8acf03571db619ff117df5730dd5c0b1ad0822aa02ad1084d73e2659442/altair-4.0.1-py3-none-any.whl (708kB)
     |████████████████████████████████| 716kB 579kB/s eta 0:00:01
Requirement already satisfied: toolz in /home/vikrant/anaconda3/lib/python3.7/site-packages (from altair) (0.10.0)
Requirement already satisfied: pandas in /home/vikrant/anaconda3/lib/python3.7/site-packages (from altair) (0.24.2)
Requirement already satisfied: jinja2 in /home/vikrant/anaconda3/lib/python3.7/site-packages (from altair) (2.10.1)
Requirement already satisfied: entrypoints in /home/vikrant/anaconda3/lib/python3.7/site-packages (from altair) (0.3)
Requirement already satisfied: jsonschema in /home/vikrant/anaconda3/lib/python3.7/site-packages (from altair) (3.0.1)
Requirement already satisfied: numpy in /home/vikrant/anaconda3/lib/python3.7/site-packages (from altair) (1.16.4)
Requirement already satisfied: python-dateutil>=2.5.0 in /home/vikrant/anaconda3/lib/python3.7/site-packages (from pandas->altair) (2.8.0)
Requirement already satisfied: pytz>=2011k in /home/vikrant/anaconda3/lib/python3.7/site-packages (from pandas->altair) (2019.1)
Requirement already satisfied: MarkupSafe>=0.23 in /home/vikrant/anaconda3/lib/python3.7/site-packages (from jinja2->altair) (1.1.1)
Requirement already satisfied: attrs>=17.4.0 in /home/vikrant/anaconda3/lib/python3.7/site-packages (from jsonschema->altair) (19.1.0)
Requirement already satisfied: pyrsistent>=0.14.0 in /home/vikrant/anaconda3/lib/python3.7/site-packages (from jsonschema->altair) (0.14.11)
Requirement already satisfied: setuptools in /home/vikrant/anaconda3/lib/python3.7/site-packages (from jsonschema->altair) (41.0.1)
Requirement already satisfied: six>=1.11.0 in /home/vikrant/anaconda3/lib/python3.7/site-packages (from jsonschema->altair) (1.12.0)
Installing collected packages: altair
Successfully installed altair-4.0.1
In [289]:
import altair as alt
import pandas as pd
In [290]:
%%file sales.txt
area,sales,profit
North,5,2
East,25,8
West,15,6
South,20,5
Central,10,3
Writing sales.txt
In [291]:
sales = pd.read_csv("sales.txt")
In [292]:
sales
Out[292]:
      area  sales  profit
0    North      5       2
1     East     25       8
2     West     15       6
3    South     20       5
4  Central     10       3
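`read_csv` infers a type for each column: `sales` and `profit` become integers while `area` stays text. Altair uses these dtypes to infer encoding types, which is why the `to_json()` output later in this section reports `sales` as `quantitative` and `area` as `nominal`. A sketch that checks the inferred types, reading the same rows from an in-memory string instead of `sales.txt`:

```python
import io
import pandas as pd

# Same rows as sales.txt, read from an in-memory buffer instead of a file
csv_text = """area,sales,profit
North,5,2
East,25,8
West,15,6
South,20,5
Central,10,3
"""
sales = pd.read_csv(io.StringIO(csv_text))

# True for the columns pandas parsed as numbers
is_numeric = {col: pd.api.types.is_numeric_dtype(sales[col]) for col in sales.columns}
print(is_numeric)   # {'area': False, 'sales': True, 'profit': True}
```

To state the type explicitly instead of relying on inference, Altair's shorthand appends a type code to the field name, e.g. `encode(x="sales:Q", y="area:N")`.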
In [293]:
alt.Chart(sales).mark_point()
Out[293]:
In [295]:
alt.Chart(sales).mark_point().encode(y="area")
Out[295]:
In [296]:
alt.Chart(sales).mark_point().encode(
    x="sales", 
    y="area")
Out[296]:
In [297]:
alt.Chart(sales).mark_bar().encode(
    x="sales", 
    y="area")
Out[297]:
In [298]:
alt.Chart(sales).mark_line().encode(
    x="sales", 
    y="area")
Out[298]:
In [299]:
base = alt.Chart(sales).mark_bar().encode(
        x="sales", 
        y="area")
In [300]:
base.mark_circle()
Out[300]:
In [301]:
base.encode(color="area")
Out[301]:
In [302]:
base.encode(color="area", size="profit")
Out[302]:
In [303]:
base.encode(color="area", size="profit").mark_circle()
Out[303]:
In [305]:
base.to_json()
Out[305]:
'{\n  "$schema": "https://vega.github.io/schema/vega-lite/v4.0.2.json",\n  "config": {\n    "view": {\n      "continuousHeight": 300,\n      "continuousWidth": 400\n    }\n  },\n  "data": {\n    "name": "data-e9a1bf97bac3c6f8642dc2ef7d8e4b49"\n  },\n  "datasets": {\n    "data-e9a1bf97bac3c6f8642dc2ef7d8e4b49": [\n      {\n        "area": "North",\n        "profit": 2,\n        "sales": 5\n      },\n      {\n        "area": "East",\n        "profit": 8,\n        "sales": 25\n      },\n      {\n        "area": "West",\n        "profit": 6,\n        "sales": 15\n      },\n      {\n        "area": "South",\n        "profit": 5,\n        "sales": 20\n      },\n      {\n        "area": "Central",\n        "profit": 3,\n        "sales": 10\n      }\n    ]\n  },\n  "encoding": {\n    "x": {\n      "field": "sales",\n      "type": "quantitative"\n    },\n    "y": {\n      "field": "area",\n      "type": "nominal"\n    }\n  },\n  "mark": "bar"\n}'

JSON

In [306]:
import json
In [307]:
d = {"a": 28.565,
     "b": 30,
     "c": [1, 2, 3]}
In [308]:
d
Out[308]:
{'a': 28.565, 'b': 30, 'c': [1, 2, 3]}
In [309]:
json.dumps(d)
Out[309]:
'{"a": 28.565, "b": 30, "c": [1, 2, 3]}'
In [310]:
jsondata = '{"a": 28.565, "b": 30, "c": [1, 2, 3]}'
In [311]:
json.loads(jsondata)
Out[311]:
{'a': 28.565, 'b': 30, 'c': [1, 2, 3]}
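JSON has fewer types than Python, so a round trip is not always exact. A short sketch with made-up values: tuples come back as lists and non-string dictionary keys come back as strings. `sort_keys` and `indent` only change the layout of the text.

```python
import json

original = {"b": (1, 2), "a": 28.565}
text = json.dumps(original, sort_keys=True)   # '{"a": 28.565, "b": [1, 2]}'
restored = json.loads(text)                   # {'a': 28.565, 'b': [1, 2]}, no longer equal to original

int_keys = json.loads(json.dumps({1: "one"}))  # {'1': 'one'}: keys are always strings in JSON

pretty = json.dumps({"a": 1, "c": [1, 2]}, indent=2)  # multi-line, two-space indentation
print(pretty)
```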
In [312]:
url = "https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=MSFT&interval=5min&outputsize=full&apikey=demo"
In [313]:
import requests

resp = requests.get(url)
data = resp.json()
In [314]:
type(data)
Out[314]:
dict
In [315]:
data.keys()
Out[315]:
dict_keys(['Meta Data', 'Time Series (5min)'])
In [316]:
data['Meta Data']
Out[316]:
{'1. Information': 'Intraday (5min) open, high, low, close prices and volume',
 '2. Symbol': 'MSFT',
 '3. Last Refreshed': '2020-02-14 16:00:00',
 '4. Interval': '5min',
 '5. Output Size': 'Full size',
 '6. Time Zone': 'US/Eastern'}
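Instead of writing the query string by hand, the parameters can be kept in a dictionary, as the RSS example later in these notes does with `params=`. The sketch below uses the standard library's `urllib.parse.urlencode` to show the string such a dictionary encodes to; `requests.get(base, params=params)` performs equivalent encoding. No request is sent here.

```python
from urllib.parse import urlencode

base = "https://www.alphavantage.co/query"
params = {
    "function": "TIME_SERIES_INTRADAY",
    "symbol": "MSFT",
    "interval": "5min",
    "outputsize": "full",
    "apikey": "demo",   # the demo key from the URL above
}

url = base + "?" + urlencode(params)
```

With requests this becomes `resp = requests.get(base, params=params)`; calling `resp.raise_for_status()` before `resp.json()` turns an HTTP error status into an exception.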
In [317]:
pd.DataFrame(data['Time Series (5min)'])
Out[317]:
2020-02-14 16:00:00 2020-02-14 15:55:00 2020-02-14 15:50:00 2020-02-14 15:45:00 2020-02-14 15:40:00 2020-02-14 15:35:00 2020-02-14 15:30:00 2020-02-14 15:25:00 2020-02-14 15:20:00 2020-02-14 15:15:00 ... 2020-01-27 10:20:00 2020-01-27 10:15:00 2020-01-27 10:10:00 2020-01-27 10:05:00 2020-01-27 10:00:00 2020-01-27 09:55:00 2020-01-27 09:50:00 2020-01-27 09:45:00 2020-01-27 09:40:00 2020-01-27 09:35:00
1. open 185.1200 184.9000 184.7100 184.7200 184.6000 184.5100 184.6500 184.4150 184.4900 184.6303 ... 162.0900 162.3843 161.7700 162.0400 162.1000 161.8400 162.2300 161.9100 161.4525 160.3600
2. high 185.4200 185.1950 184.7950 184.7500 184.7300 184.6500 184.6500 184.6700 184.5500 184.7000 ... 162.3300 162.6074 162.3825 162.1100 162.1100 162.1900 162.2592 162.3200 162.1185 161.6600
3. low 185.0500 184.7950 184.7050 184.6750 184.5900 184.4496 184.4900 184.3800 184.3300 184.4200 ... 161.9501 162.0969 161.7450 161.7350 161.6300 161.8400 161.7900 161.9100 161.4200 160.2100
4. close 185.3400 185.1300 184.7950 184.7050 184.7300 184.5900 184.5100 184.6598 184.4300 184.4900 ... 162.0000 162.1143 162.3700 161.7655 162.0400 162.1600 161.8100 162.2700 161.9250 161.4910
5. volume 1362354 856529 347054 184177 223800 166058 135481 161852 179103 256761 ... 406067 342023 591544 409539 529315 408370 565614 524571 776577 3311154

5 rows × 1169 columns

In [318]:
pd.DataFrame(data['Time Series (5min)']).transpose()
Out[318]:
1. open 2. high 3. low 4. close 5. volume
2020-02-14 16:00:00 185.1200 185.4200 185.0500 185.3400 1362354
2020-02-14 15:55:00 184.9000 185.1950 184.7950 185.1300 856529
2020-02-14 15:50:00 184.7100 184.7950 184.7050 184.7950 347054
2020-02-14 15:45:00 184.7200 184.7500 184.6750 184.7050 184177
2020-02-14 15:40:00 184.6000 184.7300 184.5900 184.7300 223800
2020-02-14 15:35:00 184.5100 184.6500 184.4496 184.5900 166058
2020-02-14 15:30:00 184.6500 184.6500 184.4900 184.5100 135481
2020-02-14 15:25:00 184.4150 184.6700 184.3800 184.6598 161852
2020-02-14 15:20:00 184.4900 184.5500 184.3300 184.4300 179103
2020-02-14 15:15:00 184.6303 184.7000 184.4200 184.4900 256761
2020-02-14 15:10:00 184.6300 184.6400 184.5900 184.6400 135195
2020-02-14 15:05:00 184.7550 184.8050 184.5800 184.6200 167408
2020-02-14 15:00:00 184.7400 184.7850 184.7050 184.7500 152411
2020-02-14 14:55:00 184.6900 184.7500 184.6900 184.7451 106061
2020-02-14 14:50:00 184.7700 184.7900 184.6250 184.7050 164644
2020-02-14 14:45:00 184.7564 184.7750 184.6400 184.7750 125630
2020-02-14 14:40:00 184.7800 184.8100 184.7500 184.7500 196640
2020-02-14 14:35:00 184.5200 184.7900 184.5200 184.7900 203463
2020-02-14 14:30:00 184.3900 184.5851 184.3800 184.5200 136376
2020-02-14 14:25:00 184.1400 184.3954 184.1350 184.3818 134295
2020-02-14 14:20:00 184.3550 184.3550 184.0700 184.1400 194770
2020-02-14 14:15:00 184.5800 184.6100 184.3501 184.3569 148356
2020-02-14 14:10:00 184.4600 184.5900 184.4400 184.5800 157224
2020-02-14 14:05:00 184.2380 184.5300 184.1950 184.4546 168193
2020-02-14 14:00:00 184.3500 184.3800 184.1900 184.2300 124547
2020-02-14 13:55:00 184.3400 184.4800 184.2700 184.3500 123391
2020-02-14 13:50:00 184.5750 184.5779 184.2900 184.3350 179815
2020-02-14 13:45:00 184.5350 184.6000 184.4800 184.5800 114433
2020-02-14 13:40:00 184.4400 184.5600 184.4200 184.5350 175102
2020-02-14 13:35:00 184.2101 184.4700 184.1900 184.4500 124787
... ... ... ... ... ...
2020-01-27 12:00:00 162.9200 163.0500 162.8300 162.8800 212821
2020-01-27 11:55:00 162.8550 162.9750 162.7800 162.9200 229113
2020-01-27 11:50:00 163.0571 163.0571 162.8500 162.8550 244961
2020-01-27 11:45:00 162.7541 163.0750 162.7000 163.0500 418908
2020-01-27 11:40:00 162.5850 162.9700 162.5800 162.7550 312796
2020-01-27 11:35:00 162.6259 162.9012 162.5509 162.5959 221288
2020-01-27 11:30:00 162.3700 162.7500 162.2800 162.6150 261139
2020-01-27 11:25:00 162.2500 162.6000 162.2000 162.3900 251578
2020-01-27 11:20:00 162.3000 162.4200 162.2400 162.2500 213573
2020-01-27 11:15:00 162.3200 162.4852 162.2500 162.3000 177814
2020-01-27 11:10:00 162.4900 162.5150 162.2400 162.3600 237192
2020-01-27 11:05:00 162.2694 162.5250 162.1600 162.4900 213333
2020-01-27 11:00:00 161.9100 162.2800 161.8800 162.2700 282084
2020-01-27 10:55:00 162.1750 162.2050 161.9100 161.9200 247940
2020-01-27 10:50:00 162.1400 162.2450 162.1000 162.1750 257260
2020-01-27 10:45:00 162.3300 162.3300 162.0995 162.1400 244488
2020-01-27 10:40:00 162.1500 162.4300 162.1200 162.3200 203881
2020-01-27 10:35:00 162.6300 162.6600 162.1100 162.1594 316466
2020-01-27 10:30:00 162.4565 162.8659 162.4255 162.6415 472640
2020-01-27 10:25:00 161.9950 162.4600 161.9950 162.4400 358133
2020-01-27 10:20:00 162.0900 162.3300 161.9501 162.0000 406067
2020-01-27 10:15:00 162.3843 162.6074 162.0969 162.1143 342023
2020-01-27 10:10:00 161.7700 162.3825 161.7450 162.3700 591544
2020-01-27 10:05:00 162.0400 162.1100 161.7350 161.7655 409539
2020-01-27 10:00:00 162.1000 162.1100 161.6300 162.0400 529315
2020-01-27 09:55:00 161.8400 162.1900 161.8400 162.1600 408370
2020-01-27 09:50:00 162.2300 162.2592 161.7900 161.8100 565614
2020-01-27 09:45:00 161.9100 162.3200 161.9100 162.2700 524571
2020-01-27 09:40:00 161.4525 162.1185 161.4200 161.9250 776577
2020-01-27 09:35:00 160.3600 161.6600 160.2100 161.4910 3311154

1169 rows × 5 columns
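The values in this table appear to be strings (note trailing zeros such as `185.1200`, and the untransposed frame mixing prices and volumes in one column), and the index holds timestamp strings, newest first. A sketch of one way to get numeric columns and a sorted `DatetimeIndex`, using two rows copied from the table above as stand-in data. `DataFrame.from_dict(..., orient="index")` puts the timestamps on the rows directly, giving the same layout as `.transpose()`.

```python
import pandas as pd

# Two rows copied from the table above, shaped like data['Time Series (5min)']
# (timestamp -> field -> value, with the values stored as strings)
series = {
    "2020-02-14 16:00:00": {"1. open": "185.1200", "2. high": "185.4200",
                            "3. low": "185.0500", "4. close": "185.3400",
                            "5. volume": "1362354"},
    "2020-02-14 15:55:00": {"1. open": "184.9000", "2. high": "185.1950",
                            "3. low": "184.7950", "4. close": "185.1300",
                            "5. volume": "856529"},
}

df = pd.DataFrame.from_dict(series, orient="index")   # timestamps become the rows
df = df.apply(pd.to_numeric)                          # "185.1200" -> 185.12, "856529" -> 856529
df.index = pd.to_datetime(df.index)                   # string labels -> Timestamps
df = df.sort_index()                                  # oldest row first
df.columns = [name.split(". ", 1)[1] for name in df.columns]  # "4. close" -> "close"
```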

XML

In [319]:
url = "http://www.thehindu.com"
In [320]:
resp = requests.get(url, params={"service":"rss"})
In [321]:
xmltext = resp.text
In [323]:
print(xmltext[:1300])
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <title>The Hindu - Home</title>
        <link>https://www.thehindu.com/</link>
        <description>Default RSS Feed</description>
        <language>en-us</language>
        <copyright>Copyright 2020 The Hindu</copyright>
        <item>
            <title><![CDATA[26 teams appointed to conduct eye tests for 2.6 lakh people: Collector]]></title>
            <author><![CDATA[Staff Reporter]]></author>
            <category><![CDATA[Andhra Pradesh]]></category>
            <link>https://www.thehindu.com/news/national/andhra-pradesh/26-teams-appointed-to-conduct-eye-tests-for-26-lakh-people-collector/article30852050.ece</link>
            <description><![CDATA[
                ‘Initially, the teams will complete medical tests in nine selected mandals’
            ]]></description>
            <pubDate><![CDATA[Tue, 18 Feb 2020 18:06:58 +0530]]></pubDate>
        </item>
        <item>
            <title><![CDATA[Envoy Sun Weidong says China will win battle against coronavirus]]></title>
            <author><![CDATA[PTI]]></author>
            <category><![CDATA[International]]></category>
            <link>https://www.thehindu.com/news/international/envoy-sun-weidong-says-china-will
In [324]:
from xml.etree import ElementTree as et
In [325]:
root = et.fromstring(xmltext)
In [326]:
items = root.findall(".//item")
In [327]:
type(items)
Out[327]:
list
In [328]:
len(items)
Out[328]:
100
In [329]:
items[0]
Out[329]:
<Element 'item' at 0x7f574fd92958>
In [331]:
print(et.tostring(items[0]).decode())
<item>
            <title>26 teams appointed to conduct eye tests for 2.6 lakh people: Collector</title>
            <author>Staff Reporter</author>
            <category>Andhra Pradesh</category>
            <link>https://www.thehindu.com/news/national/andhra-pradesh/26-teams-appointed-to-conduct-eye-tests-for-26-lakh-people-collector/article30852050.ece</link>
            <description>
                &#8216;Initially, the teams will complete medical tests in nine selected mandals&#8217;
            </description>
            <pubDate>Tue, 18 Feb 2020 18:06:58 +0530</pubDate>
        </item>
        
In [332]:
for item in items[:10]:
    print(item.findtext("title"))
    print(item.findtext("link"))
    print("*"*30)
26 teams appointed to conduct eye tests for 2.6 lakh people: Collector
https://www.thehindu.com/news/national/andhra-pradesh/26-teams-appointed-to-conduct-eye-tests-for-26-lakh-people-collector/article30852050.ece
******************************
Envoy Sun Weidong says China will win battle against coronavirus
https://www.thehindu.com/news/international/envoy-sun-weidong-says-china-will-win-battle-against-coronavirus/article30852046.ece
******************************
Health officers inspect roadside eateries 
https://www.thehindu.com/news/national/karnataka/health-officers-inspect-roadside-eateries/article30852028.ece
******************************
Setting up Water Front Management Authority for Yamuna not possible, DDA tells NGT 
https://www.thehindu.com/news/national/setting-up-water-front-management-authority-for-yamuna-not-possible-dda-tells-ngt/article30852025.ece
******************************
Adventures await
https://www.thehindu.com/life-and-style/motoring/adventures-await/article30852018.ece
******************************
In the Ennore-Pulicat wetlands, livelihoods depend heavily on the area’s biodiversity 
https://www.thehindu.com/news/cities/chennai/in-the-ennore-pulicat-wetlands-livelihoods-depend-heavily-on-the-areas-biodiversity/article30851988.ece
******************************
RTC will operate 120 special buses to Siva temples for Maha Sivaratri: official
https://www.thehindu.com/news/national/andhra-pradesh/rtc-will-operate-120-special-buses-to-siva-temples-for-maha-sivaratri-official/article30851938.ece
******************************
Cosmic Ray, Caracas, Star Superior, Amalfi Sunrise, Code Of Honour and Adjudicate excel 
https://www.thehindu.com/sport/races/cosmic-ray-caracas-star-superior-amalfi-sunrise-code-of-honour-and-adjudicate-excel/article30851879.ece
******************************
Water supply snapped to tax evaders’ buildings, houses
https://www.thehindu.com/news/cities/Madurai/water-supply-snapped-to-tax-evaders-buildings-houses/article30851865.ece
******************************
TDB decries delay in payment of wages to workers 
https://www.thehindu.com/news/national/kerala/tdb-decries-delay-in-payment-of-wages-to-workers/article30851852.ece
******************************
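The loop above can be wrapped in a small reusable function. In this sketch the function name `headlines` and the two-item feed are hypothetical. `itertools.islice` stops after `limit` items, and `findtext(..., default="")` returns an empty string instead of `None` when an item has no `<link>`. CDATA content comes back as ordinary text, as the printed titles above show.

```python
from itertools import islice
from xml.etree import ElementTree as et

def headlines(xmltext, limit=10):
    """Return up to `limit` (title, link) pairs from an RSS document."""
    root = et.fromstring(xmltext)
    pairs = []
    for item in islice(root.iterfind(".//item"), limit):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        pairs.append((title, link))
    return pairs

# Hypothetical two-item feed; the second item has no <link>
sample = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
  <item><title><![CDATA[First headline ]]></title><link>https://example.com/1</link></item>
  <item><title>Second headline</title></item>
</channel></rss>"""

for title, link in headlines(sample):
    print(title, link or "(no link)")
```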
In [333]:
from xml.dom.minidom import parseString
In [334]:
root = parseString(xmltext)
items = root.getElementsByTagName("item")
In [336]:
for item in items[:10]:
    title = item.getElementsByTagName("title")[0]
    link = item.getElementsByTagName("link")[0]
    print(title.firstChild.data)
    print(link.firstChild.data)
    print("*"*30)
26 teams appointed to conduct eye tests for 2.6 lakh people: Collector
https://www.thehindu.com/news/national/andhra-pradesh/26-teams-appointed-to-conduct-eye-tests-for-26-lakh-people-collector/article30852050.ece
******************************
Envoy Sun Weidong says China will win battle against coronavirus
https://www.thehindu.com/news/international/envoy-sun-weidong-says-china-will-win-battle-against-coronavirus/article30852046.ece
******************************
Health officers inspect roadside eateries 
https://www.thehindu.com/news/national/karnataka/health-officers-inspect-roadside-eateries/article30852028.ece
******************************
Setting up Water Front Management Authority for Yamuna not possible, DDA tells NGT 
https://www.thehindu.com/news/national/setting-up-water-front-management-authority-for-yamuna-not-possible-dda-tells-ngt/article30852025.ece
******************************
Adventures await
https://www.thehindu.com/life-and-style/motoring/adventures-await/article30852018.ece
******************************
In the Ennore-Pulicat wetlands, livelihoods depend heavily on the area’s biodiversity 
https://www.thehindu.com/news/cities/chennai/in-the-ennore-pulicat-wetlands-livelihoods-depend-heavily-on-the-areas-biodiversity/article30851988.ece
******************************
RTC will operate 120 special buses to Siva temples for Maha Sivaratri: official
https://www.thehindu.com/news/national/andhra-pradesh/rtc-will-operate-120-special-buses-to-siva-temples-for-maha-sivaratri-official/article30851938.ece
******************************
Cosmic Ray, Caracas, Star Superior, Amalfi Sunrise, Code Of Honour and Adjudicate excel 
https://www.thehindu.com/sport/races/cosmic-ray-caracas-star-superior-amalfi-sunrise-code-of-honour-and-adjudicate-excel/article30851879.ece
******************************
Water supply snapped to tax evaders’ buildings, houses
https://www.thehindu.com/news/cities/Madurai/water-supply-snapped-to-tax-evaders-buildings-houses/article30851865.ece
******************************
TDB decries delay in payment of wages to workers 
https://www.thehindu.com/news/national/kerala/tdb-decries-delay-in-payment-of-wages-to-workers/article30851852.ece
******************************
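`title.firstChild.data` assumes the element has a text child. For an empty element such as `<link></link>`, `firstChild` is `None` and `.data` raises `AttributeError`. A defensive sketch (the helper name `text_of` is hypothetical) that joins all text and CDATA children instead:

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def text_of(element):
    """Concatenate the text and CDATA children of a DOM element ('' if none)."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )

# Hypothetical item with a CDATA title and an empty <link>
doc = parseString("<item><title><![CDATA[First headline]]></title><link></link></item>")
title = doc.getElementsByTagName("title")[0]
link = doc.getElementsByTagName("link")[0]

print(text_of(title))        # First headline
print(repr(text_of(link)))   # '' (link.firstChild is None here)
```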

Database

In [337]:
import sqlite3
In [338]:
conn = sqlite3.connect("data.db")
In [339]:
cur = conn.cursor()
In [340]:
cur.execute("create table person (name varchar(100), email varchar(100))")
Out[340]:
<sqlite3.Cursor at 0x7f57505f3a40>
In [341]:
cur.execute("insert into person (name, email) values('alice', 'alice@wonder.land')")
Out[341]:
<sqlite3.Cursor at 0x7f57505f3a40>
In [342]:
cur = cur.execute("select * from person")
In [343]:
cur.fetchall()
Out[343]:
[('alice', 'alice@wonder.land')]
In [352]:
def find(conn, email):
    # Unsafe: email is pasted directly into the SQL text
    q = "select * from person where email='{}'".format(email)
    print(q)
    cur = conn.cursor()
    return cur.execute(q).fetchall()
In [353]:
find(conn, "alice@wonder.land")
select * from person where email='alice@wonder.land'
Out[353]:
[('alice', 'alice@wonder.land')]
In [350]:
def find_(conn, email):
    # Safe: sqlite3 binds email to the ? placeholder as a plain value
    q = "select * from person where email=?"
    cur = conn.cursor()
    return cur.execute(q, (email,)).fetchall()
In [351]:
find_(conn, "alice@wonder.land")
Out[351]:
[('alice', 'alice@wonder.land')]
In [354]:
conn.commit()
In [355]:
conn.close()
In [356]:
conn = sqlite3.connect("data.db")
In [357]:
find(conn, "*")
select * from person where email='*'
Out[357]:
[]
In [358]:
find_(conn, "alice@wonder.land")
Out[358]:
[('alice', 'alice@wonder.land')]
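`find(conn, "*")` returns nothing, which does not yet show why the `format` version is dangerous. The sketch below uses a throwaway in-memory database and a classic injection string. Pasted into the SQL text, it turns the `where` clause into a condition that is always true; the `?` placeholder treats the same string as an ordinary value that matches no email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway database, not data.db
conn.execute("create table person (name varchar(100), email varchar(100))")
conn.executemany("insert into person values (?, ?)",
                 [("alice", "alice@wonder.land"), ("alex", "alex@zoo.in")])

def find_unsafe(conn, email):
    # email becomes part of the SQL text
    q = "select * from person where email='{}'".format(email)
    return conn.execute(q).fetchall()

def find_safe(conn, email):
    # email is bound to the ? placeholder and compared as a plain value
    return conn.execute("select * from person where email=?", (email,)).fetchall()

payload = "x' or '1'='1"
leaked = find_unsafe(conn, payload)   # where email='x' or '1'='1' -> every row
safe = find_safe(conn, payload)       # no email equals that string -> []
```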
In [359]:
records = [
    ("alex", "alex@zoo.in"),
    ("Elsa", "elsa@frozen.mov"),
    ("ELisa", "elisa@hacker.hack")
]
In [360]:
cur = conn.cursor()
cur.executemany("insert into person values(?,?)", records)
Out[360]:
<sqlite3.Cursor at 0x7f57505f3570>
In [361]:
cur.execute("select * from person").fetchall()
Out[361]:
[('alice', 'alice@wonder.land'),
 ('alex', 'alex@zoo.in'),
 ('Elsa', 'elsa@frozen.mov'),
 ('ELisa', 'elisa@hacker.hack')]
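The rows added with `executemany` above are not saved to `data.db` until `conn.commit()` is called; closing the connection first discards them. Using the connection as a context manager commits when the block finishes normally and rolls back when it raises, but it does not close the connection. A sketch on an in-memory database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table person (name varchar(100), email varchar(100))")

# Block finishes normally: the insert is committed
with conn:
    conn.execute("insert into person values (?, ?)", ("alex", "alex@zoo.in"))

# Block raises: the insert is rolled back
try:
    with conn:
        conn.execute("insert into person values (?, ?)", ("Elsa", "elsa@frozen.mov"))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

rows = conn.execute("select * from person").fetchall()   # [('alex', 'alex@zoo.in')]
conn.close()   # `with conn` does not close the connection
```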

To manage database tables as classes, use an ORM (Object Relational Mapping) library such as sqlalchemy. More details are available on the library's homepage.

In [ ]: