Python Virtual Training For Arcesium - Module III - Day 2

Dec 17-23, 2020 Vikrant Patil

These notes are available online at http://notes.pipal.in/2020/arcesium_finop_batch3/module3-day2.html

© Pipal Academy LLP

Day 1 | Day 2 | Day 3 | Day 4 | Day 5

We will be using jupyter hub from http://lab.pipal.in for this training. Create a notebook with name module3-day2.ipynb for today's session. Before you start shutdown all kernels except today's notebook.

In [2]:
import pandas as pd
In [3]:
wallet = pd.read_csv("wallet.csv")
In [4]:
wallet
Out[4]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

In [5]:
wallet[wallet.category=="Music"]
Out[5]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
12 12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
23 23 2020-12-11 10:53:28.377359 Music Netflix 354.940241
37 37 2021-04-18 16:53:28.377359 Music Amazon 266.069078
49 49 2021-03-16 09:53:28.377359 Music spotify 232.303402
54 54 2020-11-16 10:53:28.377359 Music spotify 160.817543
65 65 2020-10-23 18:53:28.377359 Music Netflix 188.748743
67 67 2021-07-31 14:53:28.377359 Music Netflix 324.786917
76 76 2020-11-17 09:53:28.377359 Music Netflix 197.534600
78 78 2020-09-09 16:53:28.377359 Music spotify 415.372894
79 79 2021-08-17 09:53:28.377359 Music Netflix 321.763416
82 82 2020-09-22 09:53:28.377359 Music spotify 411.142701
84 84 2020-09-21 10:53:28.377359 Music Netflix 158.793646
85 85 2021-01-12 09:53:28.377359 Music Amazon 130.374908
89 89 2021-04-12 14:53:28.377359 Music Amazon 218.487173
90 90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
In [6]:
wallet[wallet.debit > 200]
Out[6]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
94 94 2021-01-04 13:53:28.377359 Utility Phone 431.185537
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469

78 rows × 5 columns

In [7]:
wallet[wallet.category=="Music"][wallet.debit>300] # category -> Music and amount > 300
<ipython-input-7-72b0c702cf8b>:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  wallet[wallet.category=="Music"][wallet.debit>300] # category -> Music and amount > 300
Out[7]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
12 12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
23 23 2020-12-11 10:53:28.377359 Music Netflix 354.940241
67 67 2021-07-31 14:53:28.377359 Music Netflix 324.786917
78 78 2020-09-09 16:53:28.377359 Music spotify 415.372894
79 79 2021-08-17 09:53:28.377359 Music Netflix 321.763416
82 82 2020-09-22 09:53:28.377359 Music spotify 411.142701
In [8]:
music = wallet[wallet.category=="Music"]
In [9]:
music
Out[9]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
12 12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
23 23 2020-12-11 10:53:28.377359 Music Netflix 354.940241
37 37 2021-04-18 16:53:28.377359 Music Amazon 266.069078
49 49 2021-03-16 09:53:28.377359 Music spotify 232.303402
54 54 2020-11-16 10:53:28.377359 Music spotify 160.817543
65 65 2020-10-23 18:53:28.377359 Music Netflix 188.748743
67 67 2021-07-31 14:53:28.377359 Music Netflix 324.786917
76 76 2020-11-17 09:53:28.377359 Music Netflix 197.534600
78 78 2020-09-09 16:53:28.377359 Music spotify 415.372894
79 79 2021-08-17 09:53:28.377359 Music Netflix 321.763416
82 82 2020-09-22 09:53:28.377359 Music spotify 411.142701
84 84 2020-09-21 10:53:28.377359 Music Netflix 158.793646
85 85 2021-01-12 09:53:28.377359 Music Amazon 130.374908
89 89 2021-04-12 14:53:28.377359 Music Amazon 218.487173
90 90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
In [10]:
music[music.debit>300]
Out[10]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
12 12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
23 23 2020-12-11 10:53:28.377359 Music Netflix 354.940241
67 67 2021-07-31 14:53:28.377359 Music Netflix 324.786917
78 78 2020-09-09 16:53:28.377359 Music spotify 415.372894
79 79 2021-08-17 09:53:28.377359 Music Netflix 321.763416
82 82 2020-09-22 09:53:28.377359 Music spotify 411.142701
In [12]:
wallet[(wallet.description=="Amazon") & (wallet.category=='Music')]
Out[12]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
12 12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
37 37 2021-04-18 16:53:28.377359 Music Amazon 266.069078
85 85 2021-01-12 09:53:28.377359 Music Amazon 130.374908
89 89 2021-04-12 14:53:28.377359 Music Amazon 218.487173
90 90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
In [13]:
wallet[(wallet.description=="Netflix") & (wallet.category=='Music')]
Out[13]:
Unnamed: 0 date category description debit
23 23 2020-12-11 10:53:28.377359 Music Netflix 354.940241
65 65 2020-10-23 18:53:28.377359 Music Netflix 188.748743
67 67 2021-07-31 14:53:28.377359 Music Netflix 324.786917
76 76 2020-11-17 09:53:28.377359 Music Netflix 197.534600
79 79 2021-08-17 09:53:28.377359 Music Netflix 321.763416
84 84 2020-09-21 10:53:28.377359 Music Netflix 158.793646
In [14]:
wallet[(wallet.description=="Netflix") | (wallet.description=='Amazon')]
Out[14]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
12 12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
23 23 2020-12-11 10:53:28.377359 Music Netflix 354.940241
24 24 2021-05-31 11:53:28.377359 Books Amazon 498.100496
37 37 2021-04-18 16:53:28.377359 Music Amazon 266.069078
65 65 2020-10-23 18:53:28.377359 Music Netflix 188.748743
67 67 2021-07-31 14:53:28.377359 Music Netflix 324.786917
72 72 2021-06-30 18:53:28.377359 Books Amazon 294.662869
76 76 2020-11-17 09:53:28.377359 Music Netflix 197.534600
79 79 2021-08-17 09:53:28.377359 Music Netflix 321.763416
84 84 2020-09-21 10:53:28.377359 Music Netflix 158.793646
85 85 2021-01-12 09:53:28.377359 Music Amazon 130.374908
89 89 2021-04-12 14:53:28.377359 Music Amazon 218.487173
90 90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
In [15]:
def total_expenditure(wallet, category):
    df_category = wallet[wallet.category==category]
    debit = df_category.debit
    return debit.sum()
In [16]:
total_expenditure(wallet, "Books")
Out[16]:
4929.750393283798
In [17]:
total_expenditure(wallet, "Music")
Out[17]:
4233.451868232711
In [18]:
total_expenditure(wallet, "Food")
Out[18]:
8281.189172581233
In [19]:
wallet.category.unique()
Out[19]:
array(['Music', 'Food', 'Books', 'Utility', 'Travel'], dtype=object)
In [20]:
for cat in wallet.category.unique():
    print("Expenditure for {} is".format(cat),total_expenditure(wallet, cat))
Expenditure for Music is 4233.451868232711
Expenditure for Food is 8281.189172581233
Expenditure for Books is 4929.750393283798
Expenditure for Utility is 7562.267232638567
Expenditure for Travel is 6052.931876440963
In [22]:
wallet.groupby("category").mean() # average for every numeric column grouped by category
Out[22]:
Unnamed: 0 debit
category
Books 43.142857 352.125028
Food 50.962963 306.710710
Music 60.625000 264.590742
Travel 45.062500 378.308242
Utility 47.370370 280.083972
In [23]:
wallet.groupby("category").sum()
Out[23]:
Unnamed: 0 debit
category
Books 604 4929.750393
Food 1376 8281.189173
Music 970 4233.451868
Travel 721 6052.931876
Utility 1279 7562.267233
In [24]:
wallet.groupby("category").max()
Out[24]:
Unnamed: 0 date description debit
category
Books 96 2021-06-30 18:53:28.377359 Flipcart 498.100496
Food 99 2021-08-24 17:53:28.377359 Zomato 489.143483
Music 90 2021-08-17 09:53:28.377359 spotify 421.207327
Travel 98 2021-08-15 17:53:28.377359 Taxi 494.124399
Utility 97 2021-08-23 17:53:28.377359 Phone 499.858182
In [27]:
wallet.date
Out[27]:
0     2021-03-07 14:53:28.377359
1     2020-10-08 09:53:28.377359
2     2021-02-23 09:53:28.377359
3     2020-11-01 14:53:28.377359
4     2021-06-05 13:53:28.377359
                 ...            
95    2021-07-19 13:53:28.377359
96    2021-01-12 19:53:28.377359
97    2021-03-25 11:53:28.377359
98    2021-05-13 15:53:28.377359
99    2020-10-11 16:53:28.377359
Name: date, Length: 100, dtype: object
In [28]:
import datetime
In [29]:
today = datetime.datetime.now()
In [30]:
today
Out[30]:
datetime.datetime(2021, 1, 12, 11, 22, 43, 36486)
In [31]:
tommorrow = today + datetime.timedelta(days=1)
In [32]:
tommorrow
Out[32]:
datetime.datetime(2021, 1, 13, 11, 22, 43, 36486)
In [33]:
tommorrow > today
Out[33]:
True
In [34]:
today > tommorrow
Out[34]:
False
In [35]:
pd.to_datetime(wallet.date)
Out[35]:
0    2021-03-07 14:53:28.377359
1    2020-10-08 09:53:28.377359
2    2021-02-23 09:53:28.377359
3    2020-11-01 14:53:28.377359
4    2021-06-05 13:53:28.377359
                ...            
95   2021-07-19 13:53:28.377359
96   2021-01-12 19:53:28.377359
97   2021-03-25 11:53:28.377359
98   2021-05-13 15:53:28.377359
99   2020-10-11 16:53:28.377359
Name: date, Length: 100, dtype: datetime64[ns]
In [37]:
wallet.date
Out[37]:
0     2021-03-07 14:53:28.377359
1     2020-10-08 09:53:28.377359
2     2021-02-23 09:53:28.377359
3     2020-11-01 14:53:28.377359
4     2021-06-05 13:53:28.377359
                 ...            
95    2021-07-19 13:53:28.377359
96    2021-01-12 19:53:28.377359
97    2021-03-25 11:53:28.377359
98    2021-05-13 15:53:28.377359
99    2020-10-11 16:53:28.377359
Name: date, Length: 100, dtype: object
In [38]:
wallet['date'] = pd.to_datetime(wallet.date)
In [39]:
wallet.date
Out[39]:
0    2021-03-07 14:53:28.377359
1    2020-10-08 09:53:28.377359
2    2021-02-23 09:53:28.377359
3    2020-11-01 14:53:28.377359
4    2021-06-05 13:53:28.377359
                ...            
95   2021-07-19 13:53:28.377359
96   2021-01-12 19:53:28.377359
97   2021-03-25 11:53:28.377359
98   2021-05-13 15:53:28.377359
99   2020-10-11 16:53:28.377359
Name: date, Length: 100, dtype: datetime64[ns]
In [42]:
wallet.date.min()
Out[42]:
Timestamp('2020-08-26 09:53:28.377359')
In [43]:
wallet.date.max()
Out[43]:
Timestamp('2021-08-24 17:53:28.377359')
In [44]:
today
Out[44]:
datetime.datetime(2021, 1, 12, 11, 22, 43, 36486)
In [45]:
wallet[wallet.date < today]
Out[45]:
Unnamed: 0 date category description debit
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
9 9 2020-11-29 14:53:28.377359 Travel Auto 443.618884
20 20 2020-10-12 18:53:28.377359 Travel Auto 365.921808
21 21 2021-01-04 19:53:28.377359 Travel Metro 329.097372
23 23 2020-12-11 10:53:28.377359 Music Netflix 354.940241
26 26 2020-08-26 15:53:28.377359 Books Amazon Kindle 138.806578
28 28 2020-12-14 15:53:28.377359 Utility Phone 358.459933
30 30 2020-09-15 18:53:28.377359 Food Swiggy 203.529240
31 31 2020-09-25 11:53:28.377359 Books Flipcart 246.503527
36 36 2020-12-10 10:53:28.377359 Travel Auto 472.941439
43 43 2020-09-21 12:53:28.377359 Utility Phone 486.033933
44 44 2020-12-26 19:53:28.377359 Utility Electricity 257.927593
47 47 2020-10-28 10:53:28.377359 Books Flipcart 310.408610
50 50 2020-12-24 11:53:28.377359 Food Zomato 463.001875
51 51 2020-12-22 17:53:28.377359 Food Zomato 331.227023
54 54 2020-11-16 10:53:28.377359 Music spotify 160.817543
59 59 2020-08-28 11:53:28.377359 Food Swiggy 208.232912
65 65 2020-10-23 18:53:28.377359 Music Netflix 188.748743
66 66 2020-10-09 16:53:28.377359 Food Swiggy 263.957770
68 68 2020-08-26 09:53:28.377359 Travel Taxi 279.147884
69 69 2020-10-10 15:53:28.377359 Utility Electricity 300.524620
76 76 2020-11-17 09:53:28.377359 Music Netflix 197.534600
78 78 2020-09-09 16:53:28.377359 Music spotify 415.372894
81 81 2020-10-29 16:53:28.377359 Food Hotel 470.080995
82 82 2020-09-22 09:53:28.377359 Music spotify 411.142701
84 84 2020-09-21 10:53:28.377359 Music Netflix 158.793646
85 85 2021-01-12 09:53:28.377359 Music Amazon 130.374908
90 90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
93 93 2021-01-11 09:53:28.377359 Utility Electricity 111.720809
94 94 2021-01-04 13:53:28.377359 Utility Phone 431.185537
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501
In [46]:
wallet.to_csv("wallet_dataframe.csv")
In [47]:
!head wallet_dataframe.csv
,Unnamed: 0,date,category,description,debit
0,0,2021-03-07 14:53:28.377359,Music,Amazon,421.2073272347991
1,1,2020-10-08 09:53:28.377359,Food,Swiggy,328.4400802428426
2,2,2021-02-23 09:53:28.377359,Books,Amazon,244.67943701511356
3,3,2020-11-01 14:53:28.377359,Utility,Phone,222.75631758052768
4,4,2021-06-05 13:53:28.377359,Books,Flipcart,494.1284923793595
5,5,2021-07-28 19:53:28.377359,Utility,Electricity,219.9417113096841
6,6,2021-04-16 11:53:28.377359,Books,Amazon Kindle,270.32259514795845
7,7,2021-02-15 10:53:28.377359,Food,Zomato,457.1831036346536
8,8,2021-08-10 19:53:28.377359,Utility,Phone,151.4963725994779
In [51]:
df = wallet[['date','category','description','debit']]
In [52]:
df.to_csv("wallet_dataframe1.csv")
In [53]:
!head wallet_dataframe1.csv
,date,category,description,debit
0,2021-03-07 14:53:28.377359,Music,Amazon,421.2073272347991
1,2020-10-08 09:53:28.377359,Food,Swiggy,328.4400802428426
2,2021-02-23 09:53:28.377359,Books,Amazon,244.67943701511356
3,2020-11-01 14:53:28.377359,Utility,Phone,222.75631758052768
4,2021-06-05 13:53:28.377359,Books,Flipcart,494.1284923793595
5,2021-07-28 19:53:28.377359,Utility,Electricity,219.9417113096841
6,2021-04-16 11:53:28.377359,Books,Amazon Kindle,270.32259514795845
7,2021-02-15 10:53:28.377359,Food,Zomato,457.1831036346536
8,2021-08-10 19:53:28.377359,Utility,Phone,151.4963725994779
In [54]:
df.to_csv("wallet_dataframe1.csv", index=False)
In [55]:
!head wallet_dataframe1.csv
date,category,description,debit
2021-03-07 14:53:28.377359,Music,Amazon,421.2073272347991
2020-10-08 09:53:28.377359,Food,Swiggy,328.4400802428426
2021-02-23 09:53:28.377359,Books,Amazon,244.67943701511356
2020-11-01 14:53:28.377359,Utility,Phone,222.75631758052768
2021-06-05 13:53:28.377359,Books,Flipcart,494.1284923793595
2021-07-28 19:53:28.377359,Utility,Electricity,219.9417113096841
2021-04-16 11:53:28.377359,Books,Amazon Kindle,270.32259514795845
2021-02-15 10:53:28.377359,Food,Zomato,457.1831036346536
2021-08-10 19:53:28.377359,Utility,Phone,151.4963725994779
In [ ]:
help(df.to_excel)
In [56]:
pd.read_csv("https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv")
Out[56]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

Take home assignment

  • look at help of DataFrame.to_excel and write wallet dataset into an excel workbook in a sheet with name 'wallet'

Combining dataframes

In [57]:
df1 = pd.DataFrame(
    {"a":[1, 2, 3, 4, 5],
    "b":[43, 54, 23, 65, 34],
    "labels" : ['x','y','z','m','n']}
)
In [58]:
df1
Out[58]:
a b labels
0 1 43 x
1 2 54 y
2 3 23 z
3 4 65 m
4 5 34 n
In [59]:
df2 = pd.DataFrame(
    {"c": [1, 2, 3, 4],
    'd': [34,56, 56, 45],
    'labels' : ['x','y','z','m']}
)
In [60]:
df2
Out[60]:
c d labels
0 1 34 x
1 2 56 y
2 3 56 z
3 4 45 m
In [61]:
pd.merge(df2, df1, on='labels')
Out[61]:
c d labels a b
0 1 34 x 1 43
1 2 56 y 2 54
2 3 56 z 3 23
3 4 45 m 4 65
In [64]:
stocks = pd.DataFrame(
    {
    "value": [123, 130, 140, 150],
    "high":[125, 135, 145, 155],
    "low":[120, 125, 138, 149],
    "volume":[10, 101, 10, 20]},
    index = ['APPLE','AT&T','AGILENT','HP']
)
In [65]:
stocks
Out[65]:
value high low volume
APPLE 123 125 120 10
AT&T 130 135 125 101
AGILENT 140 145 138 10
HP 150 155 149 20
In [66]:
stocks1 = pd.DataFrame(
    {
    "value": [123, 130],
    "high":[125, 135],
    "low":[120, 125],
    "volume":[10, 101]},
    index = ['XEROX','TESLA']
)
In [68]:
stocks1
Out[68]:
value high low volume
XEROX 123 125 120 10
TESLA 130 135 125 101
In [69]:
pd.concat([stocks, stocks1])
Out[69]:
value high low volume
APPLE 123 125 120 10
AT&T 130 135 125 101
AGILENT 140 145 138 10
HP 150 155 149 20
XEROX 123 125 120 10
TESLA 130 135 125 101
In [70]:
dfj1 = pd.DataFrame(
    {"a":[1,2,3,4,5],
    "b":[32, 34, 23, 45 ,12]},
    index = ['x','y','z','m','n']
)

dfj2 = pd.DataFrame(
    {"c":[5, 6, 7, 8],
    "d":[65,656,67,67]},
    index = ['x','y','z','m',]
)
In [71]:
dfj1
Out[71]:
a b
x 1 32
y 2 34
z 3 23
m 4 45
n 5 12
In [72]:
dfj2
Out[72]:
c d
x 5 65
y 6 656
z 7 67
m 8 67
In [73]:
dfj1.join(dfj2)
Out[73]:
a b c d
x 1 32 5.0 65.0
y 2 34 6.0 656.0
z 3 23 7.0 67.0
m 4 45 8.0 67.0
n 5 12 NaN NaN

str operations

In [74]:
wallet = pd.read_csv("https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv")
In [76]:
wallet.date
Out[76]:
0     2021-03-07 14:53:28.377359
1     2020-10-08 09:53:28.377359
2     2021-02-23 09:53:28.377359
3     2020-11-01 14:53:28.377359
4     2021-06-05 13:53:28.377359
                 ...            
95    2021-07-19 13:53:28.377359
96    2021-01-12 19:53:28.377359
97    2021-03-25 11:53:28.377359
98    2021-05-13 15:53:28.377359
99    2020-10-11 16:53:28.377359
Name: date, Length: 100, dtype: object
In [77]:
s = "2021-03-07 14:53:28.377359"
In [79]:
date, timestamp = s.split()
In [80]:
date
Out[80]:
'2021-03-07'
In [81]:
date.split("-")[0]
Out[81]:
'2021'
In [84]:
wallet.date.str.split()
Out[84]:
0     [2021-03-07, 14:53:28.377359]
1     [2020-10-08, 09:53:28.377359]
2     [2021-02-23, 09:53:28.377359]
3     [2020-11-01, 14:53:28.377359]
4     [2021-06-05, 13:53:28.377359]
                  ...              
95    [2021-07-19, 13:53:28.377359]
96    [2021-01-12, 19:53:28.377359]
97    [2021-03-25, 11:53:28.377359]
98    [2021-05-13, 15:53:28.377359]
99    [2020-10-11, 16:53:28.377359]
Name: date, Length: 100, dtype: object
In [86]:
date_ = wallet.date.str.split(expand=True)[0]
In [90]:
date_df = date_.str.split("-", expand=True)
In [94]:
date_with_columns = date_df.rename(columns={0:'year',1:'month',2:'day'})
In [95]:
date_with_columns
Out[95]:
year month day
0 2021 03 07
1 2020 10 08
2 2021 02 23
3 2020 11 01
4 2021 06 05
... ... ... ...
95 2021 07 19
96 2021 01 12
97 2021 03 25
98 2021 05 13
99 2020 10 11

100 rows × 3 columns

In [96]:
wallet
Out[96]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

In [97]:
wallet['year'] = date_with_columns['year']
In [99]:
wallet.groupby('year').sum()
Out[99]:
Unnamed: 0 debit
year
2020 1413 8240.900169
2021 3537 22818.690374
In [100]:
wallet.groupby(['category','year']).sum()
Out[100]:
Unnamed: 0 debit
category year
Books 2020 104 695.718715
2021 500 4234.031678
Food 2020 437 2368.925397
2021 939 5912.263775
Music 2020 552 1988.923644
2021 418 2244.528224
Travel 2020 133 1561.630016
2021 588 4491.301860
Utility 2020 187 1625.702397
2021 1092 5936.564836
In [104]:
columns_of_insterest = ['date','category','description','debit', 'year']
In [105]:
w = wallet[columns_of_insterest]
In [106]:
w.groupby(['category','year']).sum()
Out[106]:
debit
category year
Books 2020 695.718715
2021 4234.031678
Food 2020 2368.925397
2021 5912.263775
Music 2020 1988.923644
2021 2244.528224
Travel 2020 1561.630016
2021 4491.301860
Utility 2020 1625.702397
2021 5936.564836
In [107]:
wallet.groupby(['category','year']).sum()[['debit']] # this is going to give me dataframe
Out[107]:
debit
category year
Books 2020 695.718715
2021 4234.031678
Food 2020 2368.925397
2021 5912.263775
Music 2020 1988.923644
2021 2244.528224
Travel 2020 1561.630016
2021 4491.301860
Utility 2020 1625.702397
2021 5936.564836
In [109]:
wallet.groupby(['category','year']).sum()['debit'] # this will give series
Out[109]:
category  year
Books     2020     695.718715
          2021    4234.031678
Food      2020    2368.925397
          2021    5912.263775
Music     2020    1988.923644
          2021    2244.528224
Travel    2020    1561.630016
          2021    4491.301860
Utility   2020    1625.702397
          2021    5936.564836
Name: debit, dtype: float64
In [110]:
d = wallet.groupby(['category','year']).sum()[['debit']] # this is going to give me dataframe
In [111]:
type(d)
Out[111]:
pandas.core.frame.DataFrame
In [112]:
s = wallet.groupby(['category','year']).sum()['debit'] # this will give series
In [113]:
type(s)
Out[113]:
pandas.core.series.Series
In [114]:
wallet['debit']
Out[114]:
0     421.207327
1     328.440080
2     244.679437
3     222.756318
4     494.128492
         ...    
95    388.671213
96    467.554562
97    320.789434
98    442.096469
99    100.455501
Name: debit, Length: 100, dtype: float64
In [115]:
wallet[['debit']]
Out[115]:
debit
0 421.207327
1 328.440080
2 244.679437
3 222.756318
4 494.128492
... ...
95 388.671213
96 467.554562
97 320.789434
98 442.096469
99 100.455501

100 rows × 1 columns

Working with virtual environment

To create virtual environment on your system (windows)

open cmd

then run

python -m venv NAME_OF_VIRTUAL_ENV

this will create a folder with name NAME_OF_VIRTUAL_ENV for unix/linux/mac

+ NAME_OF_VIRTUAL_ENV
  |
  +-bin
  +-include
  +-lib
  +-lib64
  +-pyenv.cfg

For windows following structure will get created

+ NAME_OF_VIRTUAL_ENV
  |
  +-Include
  +-Lib
  +-Scripts
  +-pyvenv.cfg

to activate environment on windows

c:\Users\vik> NAME_OF_VIRTUAL_ENV\Scripts\activate.bat
(NAME_OF_VIRTUAL_ENV) c:\Users\vik>

to deactivate environment on windows

(NAME_OF_VIRTUAL_ENV) c:\Users\vik> NAME_OF_VIRTUAL_ENV\Scripts\deactivate.bat
c:\Users\vik>

to activate on linux/max/unix

source NAME_OF_VIRTUAL_ENV/bin/activate
(NAME_OF_VIRTUAL_ENV) $

installing packages in virtual environment

  1. Activate environment

    c:\Users\vik> NAME_OF_VIRTUAL_ENV\Scripts\activate.bat
    (NAME_OF_VIRTUAL_ENV) c:\Users\vik>
  2. fire pip command to install packages

(NAME_OF_VIRTUAL_ENV) c:\Users\vik> pip install pandas
  1. TO check what all packages are installed in this virtual environment use pip freeze command
(NAME_OF_VIRTUAL_ENV) c:\Users\vik> pip freeze

requirements.txt

this is a text file with one entry on every line which corresponds to package name (and version if required)

In [116]:
%%file requirements.txt
jupyter
pandas
requests
xlrd
Writing requirements.txt

Home assignment

  1. create one virtual environment
  2. Activate the environment
  3. use above requirements.txt for installing packages in that environment
  4. launch jupyter from cmd using jupyter lab
In [ ]: