Python Virtual Training For Arcesium - Module III - Day 1¶

Aug 19-25, 2022 Vikrant Patil

All notes are available online at https://notes.pipal.in/2022/arcesium_finop_batch1/

Please accept the invitation that you have received in your email and login to

https://engage.pipal.in/

login to lab and create today's notebook module3-day1

© Pipal Academy LLP

Pandas - Spreadsheet of python¶

In [1]:
import pandas as pd
In [2]:
!pip install pandas
Requirement already satisfied: pandas in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (1.4.2)
Requirement already satisfied: pytz>=2020.1 in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (from pandas) (2022.1)
Requirement already satisfied: numpy>=1.21.0 in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (from pandas) (1.22.3)
Requirement already satisfied: python-dateutil>=2.8.1 in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (from pandas) (2.8.2)
Requirement already satisfied: six>=1.5 in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)

[notice] A new release of pip available: 22.1.2 -> 22.2.2
[notice] To update, run: pip install --upgrade pip
In [3]:
import pandas as pd
In [4]:
!python -m pip install pandas
Requirement already satisfied: pandas in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (1.4.2)
Requirement already satisfied: pytz>=2020.1 in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (from pandas) (2022.1)
Requirement already satisfied: numpy>=1.21.0 in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (from pandas) (1.22.3)
Requirement already satisfied: python-dateutil>=2.8.1 in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (from pandas) (2.8.2)
Requirement already satisfied: six>=1.5 in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)

[notice] A new release of pip available: 22.1.2 -> 22.2.2
[notice] To update, run: pip install --upgrade pip
In [1]:
import pandas as pd
In [2]:
data = pd.read_csv("https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv")
In [3]:
data
Out[3]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

In [4]:
pd.read_csv("wallet.csv") # you can give filepath or url where the file is stored
Out[4]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 4 columns

In [5]:
url = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv"
In [6]:
pd.read_csv(url, columns=['date','category','description','debit'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 pd.read_csv(url, columns=['date','category','description','debit'])

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

TypeError: read_csv() got an unexpected keyword argument 'columns'
In [7]:
pd.read_csv(url, usecols=['date','category','description','debit'])
Out[7]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 4 columns

In [9]:
pd.read_csv(url, usecols=['date','category'])
Out[9]:
date category
0 2021-03-07 14:53:28.377359 Music
1 2020-10-08 09:53:28.377359 Food
2 2021-02-23 09:53:28.377359 Books
3 2020-11-01 14:53:28.377359 Utility
4 2021-06-05 13:53:28.377359 Books
... ... ...
95 2021-07-19 13:53:28.377359 Utility
96 2021-01-12 19:53:28.377359 Books
97 2021-03-25 11:53:28.377359 Utility
98 2021-05-13 15:53:28.377359 Travel
99 2020-10-11 16:53:28.377359 Food

100 rows × 2 columns

In [10]:
wallet = pd.read_csv(url, usecols=['date','category','description','debit'])
In [11]:
wallet
Out[11]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 4 columns

In [13]:
wallet.head() # show first five lines
Out[13]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
In [14]:
wallet.tail() # show last five lines
Out[14]:
date category description debit
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501
In [16]:
wallet.describe() # describe numeric columns by giving basic statistical information about it
Out[16]:
debit
count 100.000000
mean 310.595905
std 121.178218
min 100.455501
25% 216.429447
50% 318.110776
75% 424.109079
max 499.858182
In [17]:
wallet.tail(10)
Out[17]:
date category description debit
90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
91 2021-01-22 17:53:28.377359 Food Hotel 232.663468
92 2021-01-12 19:53:28.377359 Travel Taxi 356.842638
93 2021-01-11 09:53:28.377359 Utility Electricity 111.720809
94 2021-01-04 13:53:28.377359 Utility Phone 431.185537
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501
In [18]:
help(wallet.head)
Help on method head in module pandas.core.generic:

head(n: 'int' = 5) -> 'NDFrameT' method of pandas.core.frame.DataFrame instance
    Return the first `n` rows.
    
    This function returns the first `n` rows for the object based
    on position. It is useful for quickly testing if your object
    has the right type of data in it.
    
    For negative values of `n`, this function returns all rows except
    the last `n` rows, equivalent to ``df[:-n]``.
    
    Parameters
    ----------
    n : int, default 5
        Number of rows to select.
    
    Returns
    -------
    same type as caller
        The first `n` rows of the caller object.
    
    See Also
    --------
    DataFrame.tail: Returns the last `n` rows.
    
    Examples
    --------
    >>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
    ...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
    >>> df
          animal
    0  alligator
    1        bee
    2     falcon
    3       lion
    4     monkey
    5     parrot
    6      shark
    7      whale
    8      zebra
    
    Viewing the first 5 lines
    
    >>> df.head()
          animal
    0  alligator
    1        bee
    2     falcon
    3       lion
    4     monkey
    
    Viewing the first `n` lines (three in this case)
    
    >>> df.head(3)
          animal
    0  alligator
    1        bee
    2     falcon
    
    For negative values of `n`
    
    >>> df.head(-3)
          animal
    0  alligator
    1        bee
    2     falcon
    3       lion
    4     monkey
    5     parrot

problem

  • Try reading data from "https://www.moneycontrol.com/markets/indian-indices/" using pd.read_html
  • Read wallet data from excel file using pd.read_excel. excel file is stored at this url "https://github.com/vikipedia/python-trainings/raw/master/online_course/source/module2/wallet.xlsx"
In [19]:
!pip install lxml
Requirement already satisfied: lxml in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (4.9.1)

[notice] A new release of pip available: 22.1.2 -> 22.2.2
[notice] To update, run: pip install --upgrade pip
In [21]:
money_control = pd.read_html("https://www.moneycontrol.com/markets/indian-indices/")
In [22]:
type(money_control)
Out[22]:
list
In [23]:
len(money_control)
Out[23]:
7
In [24]:
money_control[0]
Out[24]:
Stock Name Sector LTP Change %Chg
0 No Data No Data No Data No Data No Data
In [25]:
money_control[1]
Out[25]:
Stock Name Sector LTP Change %Chg
0 No Data No Data No Data No Data No Data
In [26]:
money_control[2]
Out[26]:
Stock Name Sector LTP Change %Chg
0 IndusInd Bank Banks - Private Sector 1074.05 -31.05 -2.81
1 BPCL Refineries 339.45 -7.60 -2.19
2 Power Grid Corp Power - Generation & Distribution 227.15 -4.20 -1.82
3 Apollo Hospital Hospitals & Medical Services 4190.20 -76.45 -1.79
4 Coal India Mining & Minerals 218.50 -3.25 -1.47
5 HUL Personal Care 2645.50 -36.55 -1.36
6 Bajaj Finserv Finance - Investments 16580.95 -224.95 -1.34
7 Sun Pharma Pharmaceuticals 904.70 -12.05 -1.31
8 HDFC Life Life & Health Insurance 579.35 -7.40 -1.26
9 Bajaj Finance Finance - NBFC 7394.80 -93.45 -1.25
10 ICICI Bank Banks - Private Sector 874.60 -10.50 -1.19
11 TATA Cons. Prod Plantations - Tea & Coffee 798.50 -9.35 -1.16
12 NTPC Power - Generation & Distribution 159.00 -1.80 -1.12
13 Tata Motors Auto - LCVs & HCVs 479.40 -5.40 -1.11
14 SBI Banks - Public Sector 526.50 -5.85 -1.10
15 HDFC Bank Banks - Private Sector 1495.60 -16.10 -1.07
16 Asian Paints Paints & Varnishes 3497.90 -37.45 -1.06
17 Maruti Suzuki Auto - Cars & Jeeps 8855.00 -92.70 -1.04
18 Hindalco Iron & Steel 433.50 -4.55 -1.04
19 Britannia Food Processing 3656.70 -36.35 -0.98
20 Hero Motocorp Auto - 2 & 3 Wheelers 2882.00 -26.20 -0.90
21 Cipla Pharmaceuticals 1023.00 -9.20 -0.89
22 HDFC Finance - Housing 2476.65 -21.30 -0.85
23 Tata Steel Iron & Steel 111.60 -0.95 -0.84
24 JSW Steel Steel - Large 668.50 -4.90 -0.73
25 Shree Cements Cement - Major 21474.85 -147.65 -0.68
26 Divis Labs Pharmaceuticals 3720.00 -23.60 -0.63
27 Axis Bank Banks - Private Sector 760.50 -4.55 -0.59
28 UPL Chemicals 769.60 -4.35 -0.56
29 Dr Reddys Labs Pharmaceuticals 4202.00 -23.10 -0.55
30 Reliance Refineries 2647.55 -13.75 -0.52
31 Grasim Diversified 1605.35 -8.40 -0.52
32 ITC Diversified 314.75 -0.90 -0.29
33 SBI Life Insura Diversified 1320.55 -3.65 -0.28
34 M&M Auto - Cars & Jeeps 1257.80 -3.35 -0.27
35 Titan Company Miscellaneous 2475.15 -3.55 -0.14
36 UltraTechCement Cement - Major 6696.00 -7.95 -0.12
37 Nestle Food Processing 19554.30 -3.35 -0.02
In [27]:
df = money_control[2]
In [28]:
df.head()
Out[28]:
Stock Name Sector LTP Change %Chg
0 IndusInd Bank Banks - Private Sector 1074.05 -31.05 -2.81
1 BPCL Refineries 339.45 -7.60 -2.19
2 Power Grid Corp Power - Generation & Distribution 227.15 -4.20 -1.82
3 Apollo Hospital Hospitals & Medical Services 4190.20 -76.45 -1.79
4 Coal India Mining & Minerals 218.50 -3.25 -1.47
In [29]:
df.describe()
Out[29]:
LTP Change %Chg
count 38.000000 38.000000 38.000000
mean 3391.881579 -27.044737 -0.969737
std 5157.015842 45.436558 0.574073
min 111.600000 -224.950000 -2.810000
25% 601.637500 -25.550000 -1.235000
50% 1289.175000 -8.800000 -1.010000
75% 3617.000000 -4.400000 -0.567500
max 21474.850000 -0.900000 -0.020000
In [31]:
len(money_control)
Out[31]:
7
In [32]:
money_control
Out[32]:
[  Stock Name   Sector      LTP   Change     %Chg
 0    No Data  No Data  No Data  No Data  No Data,
   Stock Name   Sector      LTP   Change     %Chg
 0    No Data  No Data  No Data  No Data  No Data,
          Stock Name                             Sector       LTP  Change  %Chg
 0     IndusInd Bank             Banks - Private Sector   1074.05  -31.05 -2.81
 1              BPCL                         Refineries    339.45   -7.60 -2.19
 2   Power Grid Corp  Power - Generation & Distribution    227.15   -4.20 -1.82
 3   Apollo Hospital       Hospitals & Medical Services   4190.20  -76.45 -1.79
 4        Coal India                  Mining & Minerals    218.50   -3.25 -1.47
 5               HUL                      Personal Care   2645.50  -36.55 -1.36
 6     Bajaj Finserv              Finance - Investments  16580.95 -224.95 -1.34
 7        Sun Pharma                    Pharmaceuticals    904.70  -12.05 -1.31
 8         HDFC Life            Life & Health Insurance    579.35   -7.40 -1.26
 9     Bajaj Finance                     Finance - NBFC   7394.80  -93.45 -1.25
 10       ICICI Bank             Banks - Private Sector    874.60  -10.50 -1.19
 11  TATA Cons. Prod         Plantations - Tea & Coffee    798.50   -9.35 -1.16
 12             NTPC  Power - Generation & Distribution    159.00   -1.80 -1.12
 13      Tata Motors                 Auto - LCVs & HCVs    479.40   -5.40 -1.11
 14              SBI              Banks - Public Sector    526.50   -5.85 -1.10
 15        HDFC Bank             Banks - Private Sector   1495.60  -16.10 -1.07
 16     Asian Paints                 Paints & Varnishes   3497.90  -37.45 -1.06
 17    Maruti Suzuki                Auto - Cars & Jeeps   8855.00  -92.70 -1.04
 18         Hindalco                       Iron & Steel    433.50   -4.55 -1.04
 19        Britannia                    Food Processing   3656.70  -36.35 -0.98
 20    Hero Motocorp              Auto - 2 & 3 Wheelers   2882.00  -26.20 -0.90
 21            Cipla                    Pharmaceuticals   1023.00   -9.20 -0.89
 22             HDFC                  Finance - Housing   2476.65  -21.30 -0.85
 23       Tata Steel                       Iron & Steel    111.60   -0.95 -0.84
 24        JSW Steel                      Steel - Large    668.50   -4.90 -0.73
 25    Shree Cements                     Cement - Major  21474.85 -147.65 -0.68
 26       Divis Labs                    Pharmaceuticals   3720.00  -23.60 -0.63
 27        Axis Bank             Banks - Private Sector    760.50   -4.55 -0.59
 28              UPL                          Chemicals    769.60   -4.35 -0.56
 29   Dr Reddys Labs                    Pharmaceuticals   4202.00  -23.10 -0.55
 30         Reliance                         Refineries   2647.55  -13.75 -0.52
 31           Grasim                        Diversified   1605.35   -8.40 -0.52
 32              ITC                        Diversified    314.75   -0.90 -0.29
 33  SBI Life Insura                        Diversified   1320.55   -3.65 -0.28
 34              M&M                Auto - Cars & Jeeps   1257.80   -3.35 -0.27
 35    Titan Company                      Miscellaneous   2475.15   -3.55 -0.14
 36  UltraTechCement                     Cement - Major   6696.00   -7.95 -0.12
 37           Nestle                    Food Processing  19554.30   -3.35 -0.02,
         Stock Name                        Sector      LTP  Change  %Chg
 0    Tech Mahindra          Computers - Software  1124.90   19.75  1.79
 1           Larsen      Infrastructure - General  1924.05   29.50  1.56
 2          Infosys          Computers - Software  1599.80   16.85  1.06
 3            Wipro          Computers - Software   438.60    3.55  0.82
 4             ONGC  Oil Drilling And Exploration   136.60    1.05  0.77
 5         HCL Tech          Computers - Software   979.50    6.00  0.62
 6    Eicher Motors            Auto - LCVs & HCVs  3431.55   20.15  0.59
 7              TCS          Computers - Software  3398.80   17.55  0.52
 8   Kotak Mahindra        Banks - Private Sector  1905.95    1.45  0.08
 9    Bharti Airtel  Telecommunications - Service   733.60    0.40  0.05
 10      Bajaj Auto         Auto - 2 & 3 Wheelers  4062.15    1.50  0.04,
     Stock Name                    Sector    LTP  Change  %Chg
 0  Adani Ports  Infrastructure - General  867.8   34.75  4.17,
   Stock Name   Sector      LTP   Change     %Chg
 0    No Data  No Data  No Data  No Data  No Data,
     0   1   2   3   4   5   6   7   8   9
 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
 4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
 5 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
 6 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
 7 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
 8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
 9 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]
In [33]:
money_control[3]
Out[33]:
Stock Name Sector LTP Change %Chg
0 Tech Mahindra Computers - Software 1124.90 19.75 1.79
1 Larsen Infrastructure - General 1924.05 29.50 1.56
2 Infosys Computers - Software 1599.80 16.85 1.06
3 Wipro Computers - Software 438.60 3.55 0.82
4 ONGC Oil Drilling And Exploration 136.60 1.05 0.77
5 HCL Tech Computers - Software 979.50 6.00 0.62
6 Eicher Motors Auto - LCVs & HCVs 3431.55 20.15 0.59
7 TCS Computers - Software 3398.80 17.55 0.52
8 Kotak Mahindra Banks - Private Sector 1905.95 1.45 0.08
9 Bharti Airtel Telecommunications - Service 733.60 0.40 0.05
10 Bajaj Auto Auto - 2 & 3 Wheelers 4062.15 1.50 0.04
In [34]:
money_control[4]
Out[34]:
Stock Name Sector LTP Change %Chg
0 Adani Ports Infrastructure - General 867.8 34.75 4.17
In [35]:
money_control[5]
Out[35]:
Stock Name Sector LTP Change %Chg
0 No Data No Data No Data No Data No Data
In [36]:
df
Out[36]:
Stock Name Sector LTP Change %Chg
0 IndusInd Bank Banks - Private Sector 1074.05 -31.05 -2.81
1 BPCL Refineries 339.45 -7.60 -2.19
2 Power Grid Corp Power - Generation & Distribution 227.15 -4.20 -1.82
3 Apollo Hospital Hospitals & Medical Services 4190.20 -76.45 -1.79
4 Coal India Mining & Minerals 218.50 -3.25 -1.47
5 HUL Personal Care 2645.50 -36.55 -1.36
6 Bajaj Finserv Finance - Investments 16580.95 -224.95 -1.34
7 Sun Pharma Pharmaceuticals 904.70 -12.05 -1.31
8 HDFC Life Life & Health Insurance 579.35 -7.40 -1.26
9 Bajaj Finance Finance - NBFC 7394.80 -93.45 -1.25
10 ICICI Bank Banks - Private Sector 874.60 -10.50 -1.19
11 TATA Cons. Prod Plantations - Tea & Coffee 798.50 -9.35 -1.16
12 NTPC Power - Generation & Distribution 159.00 -1.80 -1.12
13 Tata Motors Auto - LCVs & HCVs 479.40 -5.40 -1.11
14 SBI Banks - Public Sector 526.50 -5.85 -1.10
15 HDFC Bank Banks - Private Sector 1495.60 -16.10 -1.07
16 Asian Paints Paints & Varnishes 3497.90 -37.45 -1.06
17 Maruti Suzuki Auto - Cars & Jeeps 8855.00 -92.70 -1.04
18 Hindalco Iron & Steel 433.50 -4.55 -1.04
19 Britannia Food Processing 3656.70 -36.35 -0.98
20 Hero Motocorp Auto - 2 & 3 Wheelers 2882.00 -26.20 -0.90
21 Cipla Pharmaceuticals 1023.00 -9.20 -0.89
22 HDFC Finance - Housing 2476.65 -21.30 -0.85
23 Tata Steel Iron & Steel 111.60 -0.95 -0.84
24 JSW Steel Steel - Large 668.50 -4.90 -0.73
25 Shree Cements Cement - Major 21474.85 -147.65 -0.68
26 Divis Labs Pharmaceuticals 3720.00 -23.60 -0.63
27 Axis Bank Banks - Private Sector 760.50 -4.55 -0.59
28 UPL Chemicals 769.60 -4.35 -0.56
29 Dr Reddys Labs Pharmaceuticals 4202.00 -23.10 -0.55
30 Reliance Refineries 2647.55 -13.75 -0.52
31 Grasim Diversified 1605.35 -8.40 -0.52
32 ITC Diversified 314.75 -0.90 -0.29
33 SBI Life Insura Diversified 1320.55 -3.65 -0.28
34 M&M Auto - Cars & Jeeps 1257.80 -3.35 -0.27
35 Titan Company Miscellaneous 2475.15 -3.55 -0.14
36 UltraTechCement Cement - Major 6696.00 -7.95 -0.12
37 Nestle Food Processing 19554.30 -3.35 -0.02
In [37]:
df.describe()
Out[37]:
LTP Change %Chg
count 38.000000 38.000000 38.000000
mean 3391.881579 -27.044737 -0.969737
std 5157.015842 45.436558 0.574073
min 111.600000 -224.950000 -2.810000
25% 601.637500 -25.550000 -1.235000
50% 1289.175000 -8.800000 -1.010000
75% 3617.000000 -4.400000 -0.567500
max 21474.850000 -0.900000 -0.020000

Basic data strutcures of pandas Series¶

In [40]:
df['LTP'] # access LTP column from dataframe
Out[40]:
0      1074.05
1       339.45
2       227.15
3      4190.20
4       218.50
5      2645.50
6     16580.95
7       904.70
8       579.35
9      7394.80
10      874.60
11      798.50
12      159.00
13      479.40
14      526.50
15     1495.60
16     3497.90
17     8855.00
18      433.50
19     3656.70
20     2882.00
21     1023.00
22     2476.65
23      111.60
24      668.50
25    21474.85
26     3720.00
27      760.50
28      769.60
29     4202.00
30     2647.55
31     1605.35
32      314.75
33     1320.55
34     1257.80
35     2475.15
36     6696.00
37    19554.30
Name: LTP, dtype: float64
In [41]:
ltp  = df['LTP']
In [42]:
type(ltp)
Out[42]:
pandas.core.series.Series
In [43]:
df
Out[43]:
Stock Name Sector LTP Change %Chg
0 IndusInd Bank Banks - Private Sector 1074.05 -31.05 -2.81
1 BPCL Refineries 339.45 -7.60 -2.19
2 Power Grid Corp Power - Generation & Distribution 227.15 -4.20 -1.82
3 Apollo Hospital Hospitals & Medical Services 4190.20 -76.45 -1.79
4 Coal India Mining & Minerals 218.50 -3.25 -1.47
5 HUL Personal Care 2645.50 -36.55 -1.36
6 Bajaj Finserv Finance - Investments 16580.95 -224.95 -1.34
7 Sun Pharma Pharmaceuticals 904.70 -12.05 -1.31
8 HDFC Life Life & Health Insurance 579.35 -7.40 -1.26
9 Bajaj Finance Finance - NBFC 7394.80 -93.45 -1.25
10 ICICI Bank Banks - Private Sector 874.60 -10.50 -1.19
11 TATA Cons. Prod Plantations - Tea & Coffee 798.50 -9.35 -1.16
12 NTPC Power - Generation & Distribution 159.00 -1.80 -1.12
13 Tata Motors Auto - LCVs & HCVs 479.40 -5.40 -1.11
14 SBI Banks - Public Sector 526.50 -5.85 -1.10
15 HDFC Bank Banks - Private Sector 1495.60 -16.10 -1.07
16 Asian Paints Paints & Varnishes 3497.90 -37.45 -1.06
17 Maruti Suzuki Auto - Cars & Jeeps 8855.00 -92.70 -1.04
18 Hindalco Iron & Steel 433.50 -4.55 -1.04
19 Britannia Food Processing 3656.70 -36.35 -0.98
20 Hero Motocorp Auto - 2 & 3 Wheelers 2882.00 -26.20 -0.90
21 Cipla Pharmaceuticals 1023.00 -9.20 -0.89
22 HDFC Finance - Housing 2476.65 -21.30 -0.85
23 Tata Steel Iron & Steel 111.60 -0.95 -0.84
24 JSW Steel Steel - Large 668.50 -4.90 -0.73
25 Shree Cements Cement - Major 21474.85 -147.65 -0.68
26 Divis Labs Pharmaceuticals 3720.00 -23.60 -0.63
27 Axis Bank Banks - Private Sector 760.50 -4.55 -0.59
28 UPL Chemicals 769.60 -4.35 -0.56
29 Dr Reddys Labs Pharmaceuticals 4202.00 -23.10 -0.55
30 Reliance Refineries 2647.55 -13.75 -0.52
31 Grasim Diversified 1605.35 -8.40 -0.52
32 ITC Diversified 314.75 -0.90 -0.29
33 SBI Life Insura Diversified 1320.55 -3.65 -0.28
34 M&M Auto - Cars & Jeeps 1257.80 -3.35 -0.27
35 Titan Company Miscellaneous 2475.15 -3.55 -0.14
36 UltraTechCement Cement - Major 6696.00 -7.95 -0.12
37 Nestle Food Processing 19554.30 -3.35 -0.02
In [45]:
pd.Series([1, 2, 3, 4]) # it will make aseries of type integer
Out[45]:
0    1
1    2
2    3
3    4
dtype: int64
In [47]:
pd.Series([45.6, 20.1, 30.2, 4.5]) # a series of type float
Out[47]:
0    45.6
1    20.1
2    30.2
3     4.5
dtype: float64
In [48]:
pd.Series(["a","b","v","f"])
Out[48]:
0    a
1    b
2    v
3    f
dtype: object
In [49]:
ltp
Out[49]:
0      1074.05
1       339.45
2       227.15
3      4190.20
4       218.50
5      2645.50
6     16580.95
7       904.70
8       579.35
9      7394.80
10      874.60
11      798.50
12      159.00
13      479.40
14      526.50
15     1495.60
16     3497.90
17     8855.00
18      433.50
19     3656.70
20     2882.00
21     1023.00
22     2476.65
23      111.60
24      668.50
25    21474.85
26     3720.00
27      760.50
28      769.60
29     4202.00
30     2647.55
31     1605.35
32      314.75
33     1320.55
34     1257.80
35     2475.15
36     6696.00
37    19554.30
Name: LTP, dtype: float64
In [51]:
values = pd.Series([12.0,23.3,43.6], index=['a','b','c'])
In [52]:
values
Out[52]:
a    12.0
b    23.3
c    43.6
dtype: float64
In [54]:
values['a'] # access by label
Out[54]:
12.0
In [56]:
values[0] # access by index
Out[56]:
12.0
In [57]:
labels = ["APPLE", "AT&T", "IBM", "NIKE"]
value = pd.Series([234.5, 221.3, 125.3, 100.5], index=labels)
high = pd.Series([240, 225, 130, 110.0], index=labels)
low = pd.Series([230.0,220.0,120.0,100.0], index=labels)
volume = pd.Series([100,200,300,400], index=labels)
stocks = pd.DataFrame({ "value":value,
                       "high": high,
                       "low": low,
                       "volume": volume})
In [58]:
stocks
Out[58]:
value high low volume
APPLE 234.5 240.0 230.0 100
AT&T 221.3 225.0 220.0 200
IBM 125.3 130.0 120.0 300
NIKE 100.5 110.0 100.0 400
In [59]:

In [60]:
stocks1
Out[60]:
value high low volume
APPLE 234.5 240.0 230.0 100
AT&T 221.3 225.0 220.0 200
IBM 125.3 130.0 120.0 300
NIKE 100.5 110.0 100.0 400
In [61]:
pd.DataFrame({"value":[234.5, 221.3, 125.3, 100.5],
                       "high": [240, 225, 130, 110.0],
                       "low": [230.0,220.0,120.0,100.0],
                       "volume": [100,200,300,400]},
                       )
Out[61]:
value high low volume
0 234.5 240.0 230.0 100
1 221.3 225.0 220.0 200
2 125.3 130.0 120.0 300
3 100.5 110.0 100.0 400
In [63]:
stocks.value # if column does not have space in it then we can also acess column like
Out[63]:
APPLE    234.5
AT&T     221.3
IBM      125.3
NIKE     100.5
Name: value, dtype: float64
In [65]:
stocks['value']
Out[65]:
APPLE    234.5
AT&T     221.3
IBM      125.3
NIKE     100.5
Name: value, dtype: float64
In [66]:
stocks.value.std()
Out[66]:
67.37863162754198
In [67]:
ltp
Out[67]:
0      1074.05
1       339.45
2       227.15
3      4190.20
4       218.50
5      2645.50
6     16580.95
7       904.70
8       579.35
9      7394.80
10      874.60
11      798.50
12      159.00
13      479.40
14      526.50
15     1495.60
16     3497.90
17     8855.00
18      433.50
19     3656.70
20     2882.00
21     1023.00
22     2476.65
23      111.60
24      668.50
25    21474.85
26     3720.00
27      760.50
28      769.60
29     4202.00
30     2647.55
31     1605.35
32      314.75
33     1320.55
34     1257.80
35     2475.15
36     6696.00
37    19554.30
Name: LTP, dtype: float64
In [68]:
ltp.min()
Out[68]:
111.6
In [69]:
ltp.max()
Out[69]:
21474.85
In [70]:
ltp.mean()
Out[70]:
3391.881578947369
In [71]:
ltp.describe()
Out[71]:
count       38.000000
mean      3391.881579
std       5157.015842
min        111.600000
25%        601.637500
50%       1289.175000
75%       3617.000000
max      21474.850000
Name: LTP, dtype: float64
In [72]:
stocks.value
Out[72]:
APPLE    234.5
AT&T     221.3
IBM      125.3
NIKE     100.5
Name: value, dtype: float64
In [73]:
stocks.value['APPLE']
Out[73]:
234.5
In [74]:
stocks
Out[74]:
value high low volume
APPLE 234.5 240.0 230.0 100
AT&T 221.3 225.0 220.0 200
IBM 125.3 130.0 120.0 300
NIKE 100.5 110.0 100.0 400
In [75]:
stocks['APPLE'] # this will not row! on dataframe this is a way to access column
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/indexes/base.py:3621, in Index.get_loc(self, key, method, tolerance)
   3620 try:
-> 3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/_libs/index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'APPLE'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Input In [75], in <cell line: 1>()
----> 1 stocks['APPLE']

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/frame.py:3505, in DataFrame.__getitem__(self, key)
   3503 if self.columns.nlevels > 1:
   3504     return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
   3506 if is_integer(indexer):
   3507     indexer = [indexer]

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/indexes/base.py:3623, in Index.get_loc(self, key, method, tolerance)
   3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:
-> 3623     raise KeyError(key) from err
   3624 except TypeError:
   3625     # If we have a listlike key, _check_indexing_error will raise
   3626     #  InvalidIndexError. Otherwise we fall through and re-raise
   3627     #  the TypeError.
   3628     self._check_indexing_error(key)

KeyError: 'APPLE'
In [76]:
stocks.loc['APPLE',:] # row location
Out[76]:
value     234.5
high      240.0
low       230.0
volume    100.0
Name: APPLE, dtype: float64
In [77]:
stocks
Out[77]:
value high low volume
APPLE 234.5 240.0 230.0 100
AT&T 221.3 225.0 220.0 200
IBM 125.3 130.0 120.0 300
NIKE 100.5 110.0 100.0 400
In [80]:
stocks.loc['APPLE'] # just take a row
Out[80]:
value     234.5
high      240.0
low       230.0
volume    100.0
Name: APPLE, dtype: float64
In [81]:
stocks.loc[['APPLE','IBM']]
Out[81]:
value high low volume
APPLE 234.5 240.0 230.0 100
IBM 125.3 130.0 120.0 300
In [82]:
stocks.loc[['APPLE','IBM'],'value']
Out[82]:
APPLE    234.5
IBM      125.3
Name: value, dtype: float64
In [84]:
stocks.loc['APPLE', 'volume']
Out[84]:
100
In [85]:
stocks.loc['APPLE',:]
Out[85]:
value     234.5
high      240.0
low       230.0
volume    100.0
Name: APPLE, dtype: float64
In [87]:
stocks.iloc[0,0] # 0th row and 0th column
Out[87]:
234.5
In [88]:
stocks.iloc[0:3,:]
Out[88]:
value high low volume
APPLE 234.5 240.0 230.0 100
AT&T 221.3 225.0 220.0 200
IBM 125.3 130.0 120.0 300
In [89]:
stocks.tail()# there is this round bracket ..which says this is a method
Out[89]:
value high low volume
APPLE 234.5 240.0 230.0 100
AT&T 221.3 225.0 220.0 200
IBM 125.3 130.0 120.0 300
NIKE 100.5 110.0 100.0 400
In [90]:
stocks.loc()# not to be called like this
Out[90]:
<pandas.core.indexing._LocIndexer at 0x7f1edf929850>
In [91]:
stocks.loc['IBM']
Out[91]:
value     125.3
high      130.0
low       120.0
volume    300.0
Name: IBM, dtype: float64
In [92]:
value
Out[92]:
APPLE    234.5
AT&T     221.3
IBM      125.3
NIKE     100.5
dtype: float64
In [93]:
value*2
Out[93]:
APPLE    469.0
AT&T     442.6
IBM      250.6
NIKE     201.0
dtype: float64
In [94]:
value
Out[94]:
APPLE    234.5
AT&T     221.3
IBM      125.3
NIKE     100.5
dtype: float64
In [95]:
volume
Out[95]:
APPLE    100
AT&T     200
IBM      300
NIKE     400
dtype: int64
In [96]:
value*volume
Out[96]:
APPLE    23450.0
AT&T     44260.0
IBM      37590.0
NIKE     40200.0
dtype: float64
In [98]:
s1 = pd.Series([1,2,3,4], index=['a','b','c','d'])
s2 = pd.Series([4,3,2,1], index=['d','c','b','a'])
In [99]:
s1
Out[99]:
a    1
b    2
c    3
d    4
dtype: int64
In [100]:
s2
Out[100]:
d    4
c    3
b    2
a    1
dtype: int64
In [101]:
s1 + s2
Out[101]:
a    2
b    4
c    6
d    8
dtype: int64
In [102]:
s1*s2
Out[102]:
a     1
b     4
c     9
d    16
dtype: int64
In [103]:
s1 = pd.Series([1,2,3,4], index=['a','b','c','d'])
s2 = pd.Series([3,2,1], index=['c','b','a'])
In [104]:
s1
Out[104]:
a    1
b    2
c    3
d    4
dtype: int64
In [105]:
s2
Out[105]:
c    3
b    2
a    1
dtype: int64
In [106]:
s1+s2
Out[106]:
a    2.0
b    4.0
c    6.0
d    NaN
dtype: float64
In [107]:
s1
Out[107]:
a    1
b    2
c    3
d    4
dtype: int64
In [108]:
s1 + s2
Out[108]:
a    2.0
b    4.0
c    6.0
d    NaN
dtype: float64

Selecting or Filtering¶

In [109]:
wallet
Out[109]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 4 columns

In [110]:
s1
Out[110]:
a    1
b    2
c    3
d    4
dtype: int64
In [111]:
s1  + 1
Out[111]:
a    2
b    3
c    4
d    5
dtype: int64
In [113]:
s1 > 2 # boolean searies
Out[113]:
a    False
b    False
c     True
d     True
dtype: bool
In [114]:
s1[s1>2]
Out[114]:
c    3
d    4
dtype: int64
In [116]:
wallet.debit[wallet.debit>400]
Out[116]:
0     421.207327
4     494.128492
7     457.183104
9     443.618884
11    434.495468
14    485.297743
18    433.822404
22    489.143483
24    498.100496
25    483.315864
33    449.248030
34    499.858182
35    441.602143
36    472.941439
38    494.124399
41    425.187191
43    486.033933
50    463.001875
52    403.610070
55    423.749708
62    497.770860
75    425.625291
77    482.152343
78    415.372894
81    470.080995
82    411.142701
83    451.584407
94    431.185537
96    467.554562
98    442.096469
Name: debit, dtype: float64
In [117]:
wallet[wallet.debit>450]
Out[117]:
date category description debit
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
7 2021-02-15 10:53:28.377359 Food Zomato 457.183104
14 2021-06-09 13:53:28.377359 Travel Taxi 485.297743
22 2021-06-24 15:53:28.377359 Food Zomato 489.143483
24 2021-05-31 11:53:28.377359 Books Amazon 498.100496
25 2021-05-21 14:53:28.377359 Food Hotel 483.315864
34 2021-05-14 10:53:28.377359 Utility Phone 499.858182
36 2020-12-10 10:53:28.377359 Travel Auto 472.941439
38 2021-08-15 10:53:28.377359 Travel Auto 494.124399
43 2020-09-21 12:53:28.377359 Utility Phone 486.033933
50 2020-12-24 11:53:28.377359 Food Zomato 463.001875
62 2021-01-27 19:53:28.377359 Books Amazon Kindle 497.770860
77 2021-01-18 14:53:28.377359 Books Amazon Kindle 482.152343
81 2020-10-29 16:53:28.377359 Food Hotel 470.080995
83 2021-03-18 09:53:28.377359 Books Flipcart 451.584407
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
In [118]:
wallet[wallet.category == 'Books']
Out[118]:
date category description debit
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
6 2021-04-16 11:53:28.377359 Books Amazon Kindle 270.322595
19 2021-05-16 10:53:28.377359 Books Flipcart 109.325909
24 2021-05-31 11:53:28.377359 Books Amazon 498.100496
26 2020-08-26 15:53:28.377359 Books Amazon Kindle 138.806578
31 2020-09-25 11:53:28.377359 Books Flipcart 246.503527
47 2020-10-28 10:53:28.377359 Books Flipcart 310.408610
55 2021-01-21 19:53:28.377359 Books Flipcart 423.749708
62 2021-01-27 19:53:28.377359 Books Amazon Kindle 497.770860
72 2021-06-30 18:53:28.377359 Books Amazon 294.662869
77 2021-01-18 14:53:28.377359 Books Amazon Kindle 482.152343
83 2021-03-18 09:53:28.377359 Books Flipcart 451.584407
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
In [119]:
def expenses_for_category(walletdata, category):
    return walletdata[walletdata.category==category]
In [120]:
expenses_for_category(wallet, "Music")
Out[120]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
23 2020-12-11 10:53:28.377359 Music Netflix 354.940241
37 2021-04-18 16:53:28.377359 Music Amazon 266.069078
49 2021-03-16 09:53:28.377359 Music spotify 232.303402
54 2020-11-16 10:53:28.377359 Music spotify 160.817543
65 2020-10-23 18:53:28.377359 Music Netflix 188.748743
67 2021-07-31 14:53:28.377359 Music Netflix 324.786917
76 2020-11-17 09:53:28.377359 Music Netflix 197.534600
78 2020-09-09 16:53:28.377359 Music spotify 415.372894
79 2021-08-17 09:53:28.377359 Music Netflix 321.763416
82 2020-09-22 09:53:28.377359 Music spotify 411.142701
84 2020-09-21 10:53:28.377359 Music Netflix 158.793646
85 2021-01-12 09:53:28.377359 Music Amazon 130.374908
89 2021-04-12 14:53:28.377359 Music Amazon 218.487173
90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
In [122]:
print(url)
https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv
In [121]:
wallet = pd.read_csv(url, usecols=['date','category','description','debit'])
In [123]:
music = expenses_for_category(wallet, "Music")
In [124]:
music
Out[124]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
23 2020-12-11 10:53:28.377359 Music Netflix 354.940241
37 2021-04-18 16:53:28.377359 Music Amazon 266.069078
49 2021-03-16 09:53:28.377359 Music spotify 232.303402
54 2020-11-16 10:53:28.377359 Music spotify 160.817543
65 2020-10-23 18:53:28.377359 Music Netflix 188.748743
67 2021-07-31 14:53:28.377359 Music Netflix 324.786917
76 2020-11-17 09:53:28.377359 Music Netflix 197.534600
78 2020-09-09 16:53:28.377359 Music spotify 415.372894
79 2021-08-17 09:53:28.377359 Music Netflix 321.763416
82 2020-09-22 09:53:28.377359 Music spotify 411.142701
84 2020-09-21 10:53:28.377359 Music Netflix 158.793646
85 2021-01-12 09:53:28.377359 Music Amazon 130.374908
89 2021-04-12 14:53:28.377359 Music Amazon 218.487173
90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
In [126]:
music.debit.sum()
Out[126]:
4233.45186823271
In [127]:
def total_expenses_for_category(walletdata, category):
    df_category = walletdata[walletdata.category==category]
    return df_category.debit.sum()
In [128]:
total_expenses_for_category(wallet, "Music")
Out[128]:
4233.45186823271
In [129]:
total_expenses_for_category(wallet, "Books")
Out[129]:
4929.750393283798
In [131]:
wallet[wallet.description == 'Amazon']
Out[131]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
24 2021-05-31 11:53:28.377359 Books Amazon 498.100496
37 2021-04-18 16:53:28.377359 Music Amazon 266.069078
72 2021-06-30 18:53:28.377359 Books Amazon 294.662869
85 2021-01-12 09:53:28.377359 Music Amazon 130.374908
89 2021-04-12 14:53:28.377359 Music Amazon 218.487173
90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
In [133]:
wallet[ (wallet.description == 'Amazon') & (wallet.category=='Music')]
Out[133]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
37 2021-04-18 16:53:28.377359 Music Amazon 266.069078
85 2021-01-12 09:53:28.377359 Music Amazon 130.374908
89 2021-04-12 14:53:28.377359 Music Amazon 218.487173
90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
In [135]:
wallet.query("description == 'Amazon' & category=='Music'")
Out[135]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
37 2021-04-18 16:53:28.377359 Music Amazon 266.069078
85 2021-01-12 09:53:28.377359 Music Amazon 130.374908
89 2021-04-12 14:53:28.377359 Music Amazon 218.487173
90 2020-12-01 14:53:28.377359 Music Amazon 101.573276
In [136]:
wallet[stocks.value>300] # the filter that you create should have same index as dataframe
/tmp/ipykernel_19555/494991953.py:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  wallet[stocks.value>300] # the filter that you create should have same index as dataframe
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
Input In [136], in <cell line: 1>()
----> 1 wallet[stocks.value>300]

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/frame.py:3496, in DataFrame.__getitem__(self, key)
   3494 # Do we have a (boolean) 1d indexer?
   3495 if com.is_bool_indexer(key):
-> 3496     return self._getitem_bool_array(key)
   3498 # We are left with two options: a single key, and a collection of keys,
   3499 # We interpret tuples as collections only for non-MultiIndex
   3500 is_single_key = isinstance(key, tuple) or not is_list_like(key)

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/frame.py:3549, in DataFrame._getitem_bool_array(self, key)
   3543     raise ValueError(
   3544         f"Item wrong length {len(key)} instead of {len(self.index)}."
   3545     )
   3547 # check_bool_indexer will throw exception if Series key cannot
   3548 # be reindexed to match DataFrame rows
-> 3549 key = check_bool_indexer(self.index, key)
   3550 indexer = key.nonzero()[0]
   3551 return self._take_with_is_copy(indexer, axis=0)

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/indexing.py:2383, in check_bool_indexer(index, key)
   2381     mask = isna(result._values)
   2382     if mask.any():
-> 2383         raise IndexingError(
   2384             "Unalignable boolean Series provided as "
   2385             "indexer (index of the boolean Series and of "
   2386             "the indexed object do not match)."
   2387         )
   2388     return result.astype(bool)._values
   2389 if is_object_dtype(key):
   2390     # key might be object-dtype bool, check_array_indexer needs bool array

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
In [137]:
wallet[wallet.debit>300]
Out[137]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
7 2021-02-15 10:53:28.377359 Food Zomato 457.183104
9 2020-11-29 14:53:28.377359 Travel Auto 443.618884
10 2021-06-15 13:53:28.377359 Travel Metro 328.175421
11 2021-07-24 13:53:28.377359 Food Zomato 434.495468
12 2021-07-24 14:53:28.377359 Music Amazon 329.536003
14 2021-06-09 13:53:28.377359 Travel Taxi 485.297743
16 2021-03-05 19:53:28.377359 Utility Phone 390.316876
17 2021-04-17 18:53:28.377359 Utility Electricity 316.878675
18 2021-05-08 15:53:28.377359 Travel Auto 433.822404
20 2020-10-12 18:53:28.377359 Travel Auto 365.921808
21 2021-01-04 19:53:28.377359 Travel Metro 329.097372
22 2021-06-24 15:53:28.377359 Food Zomato 489.143483
23 2020-12-11 10:53:28.377359 Music Netflix 354.940241
24 2021-05-31 11:53:28.377359 Books Amazon 498.100496
25 2021-05-21 14:53:28.377359 Food Hotel 483.315864
28 2020-12-14 15:53:28.377359 Utility Phone 358.459933
32 2021-06-23 11:53:28.377359 Food Zomato 345.030436
33 2021-05-14 18:53:28.377359 Food Hotel 449.248030
34 2021-05-14 10:53:28.377359 Utility Phone 499.858182
35 2021-02-18 18:53:28.377359 Travel Metro 441.602143
36 2020-12-10 10:53:28.377359 Travel Auto 472.941439
38 2021-08-15 10:53:28.377359 Travel Auto 494.124399
41 2021-02-20 19:53:28.377359 Utility Phone 425.187191
43 2020-09-21 12:53:28.377359 Utility Phone 486.033933
46 2021-05-15 15:53:28.377359 Utility Electricity 359.324972
47 2020-10-28 10:53:28.377359 Books Flipcart 310.408610
48 2021-08-23 17:53:28.377359 Utility Electricity 310.058410
50 2020-12-24 11:53:28.377359 Food Zomato 463.001875
51 2020-12-22 17:53:28.377359 Food Zomato 331.227023
52 2021-03-26 09:53:28.377359 Travel Taxi 403.610070
55 2021-01-21 19:53:28.377359 Books Flipcart 423.749708
56 2021-05-19 18:53:28.377359 Utility Phone 319.342876
60 2021-01-17 11:53:28.377359 Utility Electricity 382.519510
62 2021-01-27 19:53:28.377359 Books Amazon Kindle 497.770860
63 2021-05-10 11:53:28.377359 Travel Taxi 355.989050
67 2021-07-31 14:53:28.377359 Music Netflix 324.786917
69 2020-10-10 15:53:28.377359 Utility Electricity 300.524620
74 2021-03-20 11:53:28.377359 Travel Taxi 303.055421
75 2021-03-03 12:53:28.377359 Food Hotel 425.625291
77 2021-01-18 14:53:28.377359 Books Amazon Kindle 482.152343
78 2020-09-09 16:53:28.377359 Music spotify 415.372894
79 2021-08-17 09:53:28.377359 Music Netflix 321.763416
81 2020-10-29 16:53:28.377359 Food Hotel 470.080995
82 2020-09-22 09:53:28.377359 Music spotify 411.142701
83 2021-03-18 09:53:28.377359 Books Flipcart 451.584407
87 2021-05-19 15:53:28.377359 Food Zomato 378.820641
92 2021-01-12 19:53:28.377359 Travel Taxi 356.842638
94 2021-01-04 13:53:28.377359 Utility Phone 431.185537
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
In [141]:
greater_than_475 = wallet.debit > 475 
In [142]:
greater_than_475
Out[142]:
0     False
1     False
2     False
3     False
4      True
      ...  
95    False
96    False
97    False
98    False
99    False
Name: debit, Length: 100, dtype: bool
In [143]:
wallet[greater_than_475] # all rows with True will be kept and row with False will be removed
Out[143]:
date category description debit
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
14 2021-06-09 13:53:28.377359 Travel Taxi 485.297743
22 2021-06-24 15:53:28.377359 Food Zomato 489.143483
24 2021-05-31 11:53:28.377359 Books Amazon 498.100496
25 2021-05-21 14:53:28.377359 Food Hotel 483.315864
34 2021-05-14 10:53:28.377359 Utility Phone 499.858182
38 2021-08-15 10:53:28.377359 Travel Auto 494.124399
43 2020-09-21 12:53:28.377359 Utility Phone 486.033933
62 2021-01-27 19:53:28.377359 Books Amazon Kindle 497.770860
77 2021-01-18 14:53:28.377359 Books Amazon Kindle 482.152343
In [144]:
wallet.category[greater_than_475]
Out[144]:
4       Books
14     Travel
22       Food
24      Books
25       Food
34    Utility
38     Travel
43    Utility
62      Books
77      Books
Name: category, dtype: object
filter = series_f > 454

dataframe[filter]
series1[filter] 

dataframe and series1 should have same index as in series_f
In [145]:
stocks
Out[145]:
value high low volume
APPLE 234.5 240.0 230.0 100
AT&T 221.3 225.0 220.0 200
IBM 125.3 130.0 120.0 300
NIKE 100.5 110.0 100.0 400
In [146]:
f = stocks.value > 200
In [147]:
f
Out[147]:
APPLE     True
AT&T      True
IBM      False
NIKE     False
Name: value, dtype: bool
In [148]:
stocks[f]
Out[148]:
value high low volume
APPLE 234.5 240.0 230.0 100
AT&T 221.3 225.0 220.0 200
In [149]:
value
Out[149]:
APPLE    234.5
AT&T     221.3
IBM      125.3
NIKE     100.5
dtype: float64
In [150]:
value[f]
Out[150]:
APPLE    234.5
AT&T     221.3
dtype: float64
In [151]:
wallet[f]
/tmp/ipykernel_19555/2995545018.py:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  wallet[f]
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
Input In [151], in <cell line: 1>()
----> 1 wallet[f]

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/frame.py:3496, in DataFrame.__getitem__(self, key)
   3494 # Do we have a (boolean) 1d indexer?
   3495 if com.is_bool_indexer(key):
-> 3496     return self._getitem_bool_array(key)
   3498 # We are left with two options: a single key, and a collection of keys,
   3499 # We interpret tuples as collections only for non-MultiIndex
   3500 is_single_key = isinstance(key, tuple) or not is_list_like(key)

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/frame.py:3549, in DataFrame._getitem_bool_array(self, key)
   3543     raise ValueError(
   3544         f"Item wrong length {len(key)} instead of {len(self.index)}."
   3545     )
   3547 # check_bool_indexer will throw exception if Series key cannot
   3548 # be reindexed to match DataFrame rows
-> 3549 key = check_bool_indexer(self.index, key)
   3550 indexer = key.nonzero()[0]
   3551 return self._take_with_is_copy(indexer, axis=0)

File ~/usr/local/jupyter-py3.10/lib/python3.10/site-packages/pandas/core/indexing.py:2383, in check_bool_indexer(index, key)
   2381     mask = isna(result._values)
   2382     if mask.any():
-> 2383         raise IndexingError(
   2384             "Unalignable boolean Series provided as "
   2385             "indexer (index of the boolean Series and of "
   2386             "the indexed object do not match)."
   2387         )
   2388     return result.astype(bool)._values
   2389 if is_object_dtype(key):
   2390     # key might be object-dtype bool, check_array_indexer needs bool array

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
In [152]:
s1
Out[152]:
a    1
b    2
c    3
d    4
dtype: int64
In [153]:
s2
Out[153]:
c    3
b    2
a    1
dtype: int64
In [154]:
s1[s2]
Out[154]:
d    4
c    3
b    2
dtype: int64
In [155]:
x = 9
In [156]:
def check(x):
    if x:
        print(x, "True")
    else:
        print(x, "False")
In [157]:
check(9)
9 True
In [158]:
check(0)
0 False
In [159]:
check([])
[] False
In [160]:
check([1, 2, 3])
[1, 2, 3] True
In [161]:
s1
Out[161]:
a    1
b    2
c    3
d    4
dtype: int64
In [162]:
a = pd.Series([1, 2, 3, 4, 5])
b = pd.Series([1, 1, 1, 0, 1])
In [163]:
a
Out[163]:
0    1
1    2
2    3
3    4
4    5
dtype: int64
In [164]:
b
Out[164]:
0    1
1    1
2    1
3    0
4    1
dtype: int64
In [167]:
a[b] # I don't know what this is doing!
Out[167]:
1    2
1    2
1    2
0    1
1    2
dtype: int64
In [168]:
print(url)
https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv
In [169]:
import math as m
In [170]:
m.pi
Out[170]:
3.141592653589793
In [171]:
!pip install openpyxl
Requirement already satisfied: openpyxl in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (3.0.10)
Requirement already satisfied: et-xmlfile in /home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages (from openpyxl) (1.1.0)

[notice] A new release of pip available: 22.1.2 -> 22.2.2
[notice] To update, run: pip install --upgrade pip
In [172]:
pd.read_excel("https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.xlsx")
Out[172]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

Grouping¶

In [173]:
wallet
Out[173]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 4 columns

In [174]:
categorygroup = wallet.groupby('category')
In [176]:
categorygroup.sum()
Out[176]:
debit
category
Books 4929.750393
Food 8281.189173
Music 4233.451868
Travel 6052.931876
Utility 7562.267233
In [177]:
wallet.groupby('category').sum()
Out[177]:
debit
category
Books 4929.750393
Food 8281.189173
Music 4233.451868
Travel 6052.931876
Utility 7562.267233
In [178]:
wallet.groupby('category').mean()
Out[178]:
debit
category
Books 352.125028
Food 306.710710
Music 264.590742
Travel 378.308242
Utility 280.083972
In [180]:
df = pd.DataFrame({"A":list("XYZ"*10),
                  "N1":range(200, 230),
                  "N2":range(30)})
In [181]:
df
Out[181]:
A N1 N2
0 X 200 0
1 Y 201 1
2 Z 202 2
3 X 203 3
4 Y 204 4
5 Z 205 5
6 X 206 6
7 Y 207 7
8 Z 208 8
9 X 209 9
10 Y 210 10
11 Z 211 11
12 X 212 12
13 Y 213 13
14 Z 214 14
15 X 215 15
16 Y 216 16
17 Z 217 17
18 X 218 18
19 Y 219 19
20 Z 220 20
21 X 221 21
22 Y 222 22
23 Z 223 23
24 X 224 24
25 Y 225 25
26 Z 226 26
27 X 227 27
28 Y 228 28
29 Z 229 29
In [183]:
df.groupby("A")['N1'].sum()
Out[183]:
A
X    2135
Y    2145
Z    2155
Name: N1, dtype: int64
In [186]:
wallet[['date','debit']]
Out[186]:
date debit
0 2021-03-07 14:53:28.377359 421.207327
1 2020-10-08 09:53:28.377359 328.440080
2 2021-02-23 09:53:28.377359 244.679437
3 2020-11-01 14:53:28.377359 222.756318
4 2021-06-05 13:53:28.377359 494.128492
... ... ...
95 2021-07-19 13:53:28.377359 388.671213
96 2021-01-12 19:53:28.377359 467.554562
97 2021-03-25 11:53:28.377359 320.789434
98 2021-05-13 15:53:28.377359 442.096469
99 2020-10-11 16:53:28.377359 100.455501

100 rows × 2 columns

In [188]:
df.groupby("A")[['N1',"N2"]].sum() # take only specified columns
Out[188]:
N1 N2
A
X 2135 135
Y 2145 145
Z 2155 155
In [190]:
wallet.groupby('category')
Out[190]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f1edcf7b970>
In [191]:
wallet = pd.read_csv(url)
In [193]:
wallet
Out[193]:
Unnamed: 0 date category description debit
0 0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ... ...
95 95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 5 columns

In [192]:
wallet.groupby('category').sum()
Out[192]:
Unnamed: 0 debit
category
Books 604 4929.750393
Food 1376 8281.189173
Music 970 4233.451868
Travel 721 6052.931876
Utility 1279 7562.267233
In [194]:
wallet.groupby('category').sum().loc['Music']
Out[194]:
Unnamed: 0     970.000000
debit         4233.451868
Name: Music, dtype: float64
In [195]:
musicdf = wallet[wallet.category=='Music']
musicdf.groupby('category').sum()
Out[195]:
Unnamed: 0 debit
category
Music 970 4233.451868
In [196]:
wallet = wallet[["date","category","description","debit"]]
In [197]:
wallet
Out[197]:
date category description debit
0 2021-03-07 14:53:28.377359 Music Amazon 421.207327
1 2020-10-08 09:53:28.377359 Food Swiggy 328.440080
2 2021-02-23 09:53:28.377359 Books Amazon 244.679437
3 2020-11-01 14:53:28.377359 Utility Phone 222.756318
4 2021-06-05 13:53:28.377359 Books Flipcart 494.128492
... ... ... ... ...
95 2021-07-19 13:53:28.377359 Utility Phone 388.671213
96 2021-01-12 19:53:28.377359 Books Flipcart 467.554562
97 2021-03-25 11:53:28.377359 Utility Phone 320.789434
98 2021-05-13 15:53:28.377359 Travel Taxi 442.096469
99 2020-10-11 16:53:28.377359 Food Hotel 100.455501

100 rows × 4 columns

In [198]:
wallet.groupby(['category', 'description']).sum()
Out[198]:
debit
category description
Books Amazon 1037.442802
Amazon Kindle 1389.052376
Flipcart 2503.255216
Food Hotel 2752.174732
Swiggy 1936.495366
Zomato 3592.519075
Music Amazon 1467.247766
Netflix 1546.567562
spotify 1219.636541
Travel Auto 2210.428935
Metro 1216.463665
Taxi 2626.039276
Utility Electricity 2885.064355
Phone 4677.202878
In [199]:
wallet.groupby(['category', 'description']).sum().loc[('Books','Flipcart')]
Out[199]:
debit    2503.255216
Name: (Books, Flipcart), dtype: float64
In [202]:
wallet.groupby('description').max()
Out[202]:
date category debit
description
Amazon 2021-07-24 14:53:28.377359 Music 498.100496
Amazon Kindle 2021-04-16 11:53:28.377359 Books 497.770860
Auto 2021-08-15 10:53:28.377359 Travel 494.124399
Electricity 2021-08-23 17:53:28.377359 Utility 382.519510
Flipcart 2021-06-05 13:53:28.377359 Books 494.128492
Hotel 2021-08-22 17:53:28.377359 Food 483.315864
Metro 2021-08-15 17:53:28.377359 Travel 441.602143
Netflix 2021-08-17 09:53:28.377359 Music 354.940241
Phone 2021-08-17 13:53:28.377359 Utility 499.858182
Swiggy 2021-07-19 12:53:28.377359 Food 328.440080
Taxi 2021-06-09 13:53:28.377359 Travel 485.297743
Zomato 2021-08-24 17:53:28.377359 Food 489.143483
spotify 2021-03-16 09:53:28.377359 Music 415.372894
In [204]:
wallet.groupby(['category','description']).max()
Out[204]:
date debit
category description
Books Amazon 2021-06-30 18:53:28.377359 498.100496
Amazon Kindle 2021-04-16 11:53:28.377359 497.770860
Flipcart 2021-06-05 13:53:28.377359 494.128492
Food Hotel 2021-08-22 17:53:28.377359 483.315864
Swiggy 2021-07-19 12:53:28.377359 328.440080
Zomato 2021-08-24 17:53:28.377359 489.143483
Music Amazon 2021-07-24 14:53:28.377359 421.207327
Netflix 2021-08-17 09:53:28.377359 354.940241
spotify 2021-03-16 09:53:28.377359 415.372894
Travel Auto 2021-08-15 10:53:28.377359 494.124399
Metro 2021-08-15 17:53:28.377359 441.602143
Taxi 2021-06-09 13:53:28.377359 485.297743
Utility Electricity 2021-08-23 17:53:28.377359 382.519510
Phone 2021-08-17 13:53:28.377359 499.858182
In [206]:
wallet['custom_category'] = "Not_a_book"
In [209]:
f = (wallet.description == 'Amazon') | (wallet.description == 'Flipcart')
wallet.custom_category[f] = "book"
/tmp/ipykernel_19555/3913929795.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  wallet.custom_category[f] = "book"
In [211]:
wallet.groupby(['category','custom_category']).sum()
Out[211]:
debit
category custom_category
Books Not_a_book 1389.052376
book 3540.698017
Food Not_a_book 8281.189173
Music Not_a_book 2766.204103
book 1467.247766
Travel Not_a_book 6052.931876
Utility Not_a_book 7562.267233
In [212]:
wallet = wallet[["date","category","description","debit"]]
In [214]:
wallet[f].groupby(['category', 'description']).sum()
Out[214]:
debit
category description
Books Amazon 1037.442802
Flipcart 2503.255216
Music Amazon 1467.247766
In [215]:
df1 = wallet[f].groupby(['category', 'description']).sum()
In [216]:
df1
Out[216]:
debit
category description
Books Amazon 1037.442802
Flipcart 2503.255216
Music Amazon 1467.247766
In [218]:
df1.reset_index().groupby('category').sum()
Out[218]:
debit
category
Books 3540.698017
Music 1467.247766
In [219]:
stocks
Out[219]:
value high low volume
APPLE 234.5 240.0 230.0 100
AT&T 221.3 225.0 220.0 200
IBM 125.3 130.0 120.0 300
NIKE 100.5 110.0 100.0 400
In [220]:
stocks.to_csv("stocks_dataframe.csv")
In [221]:
!cat stocks_dataframe.csv
,value,high,low,volume
APPLE,234.5,240.0,230.0,100
AT&T,221.3,225.0,220.0,200
IBM,125.3,130.0,120.0,300
NIKE,100.5,110.0,100.0,400
In [222]:
stocks
Out[222]:
value high low volume
APPLE 234.5 240.0 230.0 100
AT&T 221.3 225.0 220.0 200
IBM 125.3 130.0 120.0 300
NIKE 100.5 110.0 100.0 400
In [223]:
stocks.index.name = "ticker"
In [224]:
stocks
Out[224]:
value high low volume
ticker
APPLE 234.5 240.0 230.0 100
AT&T 221.3 225.0 220.0 200
IBM 125.3 130.0 120.0 300
NIKE 100.5 110.0 100.0 400
In [225]:
stocks.to_csv("stocks_dataframe1.csv")
In [226]:
!cat stocks_dataframe1.csv
ticker,value,high,low,volume
APPLE,234.5,240.0,230.0,100
AT&T,221.3,225.0,220.0,200
IBM,125.3,130.0,120.0,300
NIKE,100.5,110.0,100.0,400

Combining two dataframes¶

In [227]:
labels = ["APPLE","AT&T","IBM","NIKE"]
value = pd.Series([234.5, 221.6, 125.7, 100.5], index=labels)
high = pd.Series([240.32, 222.5, 127.3, 105.0], index=labels)
low = pd.Series([233.0, 220.0, 123.0, 104.0], index=labels)
volume = pd.Series([100, 200, 50, 1000], index=labels)
stocks = pd.DataFrame({"value":value, "high":high, "low":low, "volume":volume})

stocks1 = pd.DataFrame({
        "value" : [125, 500.0, 300.4, 423.9],
        "low" : [125.0, 490.0, 299.5, 421.1],
        "high" : [130.0, 500.0, 305.0, 425.5],
        "volume" : [123, 50, 100, 80]
        },
        index = ["BELL","XEROX","FORD","TESLA"]
)
In [228]:
stocks
Out[228]:
value high low volume
APPLE 234.5 240.32 233.0 100
AT&T 221.6 222.50 220.0 200
IBM 125.7 127.30 123.0 50
NIKE 100.5 105.00 104.0 1000
In [229]:
stocks1
Out[229]:
value low high volume
BELL 125.0 125.0 130.0 123
XEROX 500.0 490.0 500.0 50
FORD 300.4 299.5 305.0 100
TESLA 423.9 421.1 425.5 80
In [230]:
pd.concat([stocks, stocks1])
Out[230]:
value high low volume
APPLE 234.5 240.32 233.0 100
AT&T 221.6 222.50 220.0 200
IBM 125.7 127.30 123.0 50
NIKE 100.5 105.00 104.0 1000
BELL 125.0 130.00 125.0 123
XEROX 500.0 500.00 490.0 50
FORD 300.4 305.00 299.5 100
TESLA 423.9 425.50 421.1 80
In [231]:
labels = ["APPLE","AT&T","IBM","NIKE"]
value = pd.Series([234.5, 221.6, 125.7, 100.5], index=labels)
high = pd.Series([240.32, 222.5, 127.3, 105.0], index=labels)
low = pd.Series([233.0, 220.0, 123.0, 104.0], index=labels)
volume = pd.Series([100, 200, 50, 1000], index=labels)
stocks = pd.DataFrame({"value":value, "high":high, "low":low, "volume":volume})

stocks1 = pd.DataFrame({
        "value" : [125, 500.0, 300.4, 423.9],
        "low" : [125.0, 490.0, 299.5, 421.1],
        "high" : [130.0, 500.0, 305.0, 425.5],
        },
        index = ["BELL","XEROX","FORD","TESLA"]
)
In [232]:
stocks
Out[232]:
value high low volume
APPLE 234.5 240.32 233.0 100
AT&T 221.6 222.50 220.0 200
IBM 125.7 127.30 123.0 50
NIKE 100.5 105.00 104.0 1000
In [233]:
stocks1
Out[233]:
value low high
BELL 125.0 125.0 130.0
XEROX 500.0 490.0 500.0
FORD 300.4 299.5 305.0
TESLA 423.9 421.1 425.5
In [234]:
pd.concat([stocks, stocks1])
Out[234]:
value high low volume
APPLE 234.5 240.32 233.0 100.0
AT&T 221.6 222.50 220.0 200.0
IBM 125.7 127.30 123.0 50.0
NIKE 100.5 105.00 104.0 1000.0
BELL 125.0 130.00 125.0 NaN
XEROX 500.0 500.00 490.0 NaN
FORD 300.4 305.00 299.5 NaN
TESLA 423.9 425.50 421.1 NaN
In [235]:
combined_data = pd.concat([stocks, stocks1])
In [236]:
combined_data
Out[236]:
value high low volume
APPLE 234.5 240.32 233.0 100.0
AT&T 221.6 222.50 220.0 200.0
IBM 125.7 127.30 123.0 50.0
NIKE 100.5 105.00 104.0 1000.0
BELL 125.0 130.00 125.0 NaN
XEROX 500.0 500.00 490.0 NaN
FORD 300.4 305.00 299.5 NaN
TESLA 423.9 425.50 421.1 NaN
In [237]:
combined_data.describe()
Out[237]:
value high low volume
count 8.000000 8.000000 8.000000 4.000000
mean 253.950000 256.952500 251.950000 337.500000
std 146.203918 145.046197 143.205168 446.047456
min 100.500000 105.000000 104.000000 50.000000
25% 125.525000 129.325000 124.500000 87.500000
50% 228.050000 231.410000 226.500000 150.000000
75% 331.275000 335.125000 329.900000 400.000000
max 500.000000 500.000000 490.000000 1000.000000
In [240]:
combined_data.isna().sum() # seeing how many missing values are there in a dataframe
Out[240]:
value     0
high      0
low       0
volume    4
dtype: int64

ANother way of combining dataframe¶

In [241]:
df1 = pd.DataFrame(
{
"a":[1,2, 3, 4, 5],
"b":[34, 56, 76, 87, 9],
"labels": ["x","y","z","m","n"]
}
)

df2 = pd.DataFrame(
{
"c":[1,2, 3, 4],
"d":[34, 56, 76, 87],
"labels":["x","y","z","m"]}
)
In [242]:
df1
Out[242]:
a b labels
0 1 34 x
1 2 56 y
2 3 76 z
3 4 87 m
4 5 9 n
In [243]:
df2
Out[243]:
c d labels
0 1 34 x
1 2 56 y
2 3 76 z
3 4 87 m
In [244]:
pd.merge(df2, df1)
Out[244]:
c d labels a b
0 1 34 x 1 34
1 2 56 y 2 56
2 3 76 z 3 76
3 4 87 m 4 87
In [245]:
pd.merge(df2, df1, on='labels')
Out[245]:
c d labels a b
0 1 34 x 1 34
1 2 56 y 2 56
2 3 76 z 3 76
3 4 87 m 4 87
In [ ]: