Oct 11-15, 2021 Vikrant Patil
These notes are available online at https://notes.pipal.in/2021/arcesium_finop_batch2/
© Pipal Academy LLP
Day 1 | Day 2 | Day 3 | Day 4 | Day 5
We will be using jupyter hub from https://lab2.pipal.in for this training.
create a notebook with name module3-day1
import pandas # if this does not work , install pandas using following command
!python3 -m pip install pandas
Requirement already satisfied: pandas in /home/vikrant/anaconda3/lib/python3.8/site-packages (1.1.3) Requirement already satisfied: pytz>=2017.2 in /home/vikrant/anaconda3/lib/python3.8/site-packages (from pandas) (2020.1) Requirement already satisfied: numpy>=1.15.4 in /home/vikrant/anaconda3/lib/python3.8/site-packages (from pandas) (1.19.2) Requirement already satisfied: python-dateutil>=2.7.3 in /home/vikrant/anaconda3/lib/python3.8/site-packages (from pandas) (2.8.1) Requirement already satisfied: six>=1.5 in /home/vikrant/anaconda3/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
# restart the kernel to get the newly installed library into this interpreter
import pandas as pd
%%file download.py
import sys
import requests
def download(url, filename):
resp = requests.get(url)
with open(filename, "w") as f:
f.write(resp.text)
if __name__ == "__main__":
url = sys.argv[1]
filename = sys.argv[2]
download(url, filename)
Writing download.py
!python3 download.py https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv wallet.csv
!python3 head.py wallet.csv
,date,category,description,debit 0,2021-03-07 14:53:28.377359,Music,Amazon,421.2073272347991 1,2020-10-08 09:53:28.377359,Food,Swiggy,328.4400802428426 2,2021-02-23 09:53:28.377359,Books,Amazon,244.67943701511354 3,2020-11-01 14:53:28.377359,Utility,Phone,222.7563175805277
wallet = pd.read_csv("wallet.csv")
wallet # dataframe
| Unnamed: 0 | date | category | description | debit | |
|---|---|---|---|---|---|
| 0 | 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| ... | ... | ... | ... | ... | ... |
| 95 | 95 | 2021-07-19 13:53:28.377359 | Utility | Phone | 388.671213 |
| 96 | 96 | 2021-01-12 19:53:28.377359 | Books | Flipcart | 467.554562 |
| 97 | 97 | 2021-03-25 11:53:28.377359 | Utility | Phone | 320.789434 |
| 98 | 98 | 2021-05-13 15:53:28.377359 | Travel | Taxi | 442.096469 |
| 99 | 99 | 2020-10-11 16:53:28.377359 | Food | Hotel | 100.455501 |
100 rows × 5 columns
url = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv"
pd.read_csv(url) # you can also read data directly from website
| Unnamed: 0 | date | category | description | debit | |
|---|---|---|---|---|---|
| 0 | 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| ... | ... | ... | ... | ... | ... |
| 95 | 95 | 2021-07-19 13:53:28.377359 | Utility | Phone | 388.671213 |
| 96 | 96 | 2021-01-12 19:53:28.377359 | Books | Flipcart | 467.554562 |
| 97 | 97 | 2021-03-25 11:53:28.377359 | Utility | Phone | 320.789434 |
| 98 | 98 | 2021-05-13 15:53:28.377359 | Travel | Taxi | 442.096469 |
| 99 | 99 | 2020-10-11 16:53:28.377359 | Food | Hotel | 100.455501 |
100 rows × 5 columns
excelurl = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.xlsx"
pd.read_excel(excelurl)
| Unnamed: 0 | date | category | description | debit | |
|---|---|---|---|---|---|
| 0 | 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| ... | ... | ... | ... | ... | ... |
| 95 | 95 | 2021-07-19 13:53:28.377359 | Utility | Phone | 388.671213 |
| 96 | 96 | 2021-01-12 19:53:28.377359 | Books | Flipcart | 467.554562 |
| 97 | 97 | 2021-03-25 11:53:28.377359 | Utility | Phone | 320.789434 |
| 98 | 98 | 2021-05-13 15:53:28.377359 | Travel | Taxi | 442.096469 |
| 99 | 99 | 2020-10-11 16:53:28.377359 | Food | Hotel | 100.455501 |
100 rows × 5 columns
!python3 -m pip install openpyxl lxml # after this restart the kernel
Requirement already satisfied: openpyxl in /home/vikrant/anaconda3/lib/python3.8/site-packages (3.0.5) Requirement already satisfied: lxml in /home/vikrant/anaconda3/lib/python3.8/site-packages (4.6.1) Requirement already satisfied: et-xmlfile in /home/vikrant/anaconda3/lib/python3.8/site-packages (from openpyxl) (1.0.1) Requirement already satisfied: jdcal in /home/vikrant/anaconda3/lib/python3.8/site-packages (from openpyxl) (1.4.1)
wallet = pd.read_csv("wallet.csv")
wallet
| Unnamed: 0 | date | category | description | debit | |
|---|---|---|---|---|---|
| 0 | 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| ... | ... | ... | ... | ... | ... |
| 95 | 95 | 2021-07-19 13:53:28.377359 | Utility | Phone | 388.671213 |
| 96 | 96 | 2021-01-12 19:53:28.377359 | Books | Flipcart | 467.554562 |
| 97 | 97 | 2021-03-25 11:53:28.377359 | Utility | Phone | 320.789434 |
| 98 | 98 | 2021-05-13 15:53:28.377359 | Travel | Taxi | 442.096469 |
| 99 | 99 | 2020-10-11 16:53:28.377359 | Food | Hotel | 100.455501 |
100 rows × 5 columns
!python3 head.py wallet.csv
,date,category,description,debit 0,2021-03-07 14:53:28.377359,Music,Amazon,421.2073272347991 1,2020-10-08 09:53:28.377359,Food,Swiggy,328.4400802428426 2,2021-02-23 09:53:28.377359,Books,Amazon,244.67943701511354 3,2020-11-01 14:53:28.377359,Utility,Phone,222.7563175805277
pd.read_csv("wallet.csv", index_col=0)
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| ... | ... | ... | ... | ... |
| 95 | 2021-07-19 13:53:28.377359 | Utility | Phone | 388.671213 |
| 96 | 2021-01-12 19:53:28.377359 | Books | Flipcart | 467.554562 |
| 97 | 2021-03-25 11:53:28.377359 | Utility | Phone | 320.789434 |
| 98 | 2021-05-13 15:53:28.377359 | Travel | Taxi | 442.096469 |
| 99 | 2020-10-11 16:53:28.377359 | Food | Hotel | 100.455501 |
100 rows × 4 columns
problems
!ls indexdata1.xlsx
indexdata1.xlsx
pd.read_excel("indexdata1.xlsx", sheet_name="prices", usecols=[1,2,3,4])
| Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | |
|---|---|---|---|---|
| 0 | IBM | Monday | 111.3430 | 1000 |
| 1 | IBM | Tuesday | 112.3232 | 3000 |
| 2 | MICROSOFT | Monday | 234.4540 | 5000 |
| 3 | MICROSOFT | Tuesday | 343.5400 | 0 |
| 4 | APPLE | Monday | 454.5000 | 10000 |
| 5 | APPLE | Tuesday | 465.5000 | 4323 |
pd.read_excel("indexdata1.xlsx", sheet_name="prices", usecols=[1,2,3,4], names=['symbol','day','value','volume'])
| symbol | day | value | volume | |
|---|---|---|---|---|
| 0 | IBM | Monday | 111.3430 | 1000 |
| 1 | IBM | Tuesday | 112.3232 | 3000 |
| 2 | MICROSOFT | Monday | 234.4540 | 5000 |
| 3 | MICROSOFT | Tuesday | 343.5400 | 0 |
| 4 | APPLE | Monday | 454.5000 | 10000 |
| 5 | APPLE | Tuesday | 465.5000 | 4323 |
import xlsxwriter
data = [['IBM', 'Monday', 111.343, 1000],
['IBM', 'Tuesday', 112.3232, 3000],
['MICROSOFT', 'Monday', 234.454, 5000],
['MICROSOFT', 'Tuesday', 343.54, 0],
['APPLE', 'Monday', 454.5, 10000],
['APPLE', 'Tuesday', 465.5, 4323]]
w = xlsxwriter.Workbook("indexdata1.xlsx")
worksheet = w.add_worksheet("prices")
for r, row in enumerate(data, start=1):
for c, entry in enumerate(row, start=1):
worksheet.write(r, c, entry)
w.close()
pd.read_excel("indexdata1.xlsx", sheet_name="prices", usecols=[1,2,3,4], names=['symbol','day','value','volume'])
| symbol | day | value | volume | |
|---|---|---|---|---|
| 0 | IBM | Monday | 111.3430 | 1000 |
| 1 | IBM | Tuesday | 112.3232 | 3000 |
| 2 | MICROSOFT | Monday | 234.4540 | 5000 |
| 3 | MICROSOFT | Tuesday | 343.5400 | 0 |
| 4 | APPLE | Monday | 454.5000 | 10000 |
| 5 | APPLE | Tuesday | 465.5000 | 4323 |
results = pd.read_html("https://www.moneycontrol.com/markets/indian-indices/")
type(results) # type will tell the type of object
list
len(results)
7
results[0]
| Stock Name | Sector | LTP | Change | %Chg | |
|---|---|---|---|---|---|
| 0 | No Data | No Data | No Data | No Data | No Data |
results[1]
| Stock Name | Sector | LTP | Change | %Chg | |
|---|---|---|---|---|---|
| 0 | No Data | No Data | No Data | No Data | No Data |
results[2]
| Stock Name | Sector | LTP | Change | %Chg | |
|---|---|---|---|---|---|
| 0 | ICICI Bank | Banks - Private Sector | 827.60 | -14.10 | -1.68 |
| 1 | Axis Bank | Banks - Private Sector | 831.80 | -13.30 | -1.57 |
| 2 | IndusInd Bank | Banks - Private Sector | 1164.40 | -13.35 | -1.13 |
| 3 | Power Grid Corp | Power - Generation & Distribution | 190.80 | -1.85 | -0.96 |
| 4 | Kotak Mahindra | Banks - Private Sector | 2134.55 | -19.95 | -0.93 |
| 5 | HUL | Personal Care | 2435.20 | -20.65 | -0.84 |
| 6 | HDFC Bank | Banks - Private Sector | 1652.00 | -5.00 | -0.30 |
| 7 | Adani Ports | Infrastructure - General | 750.10 | -1.95 | -0.26 |
| 8 | Infosys | Computers - Software | 1703.70 | -3.85 | -0.23 |
| 9 | Coal India | Mining & Minerals | 173.75 | -0.35 | -0.20 |
| 10 | Asian Paints | Paints & Varnishes | 2913.00 | -5.05 | -0.17 |
| 11 | TCS | Computers - Software | 3488.95 | -4.00 | -0.11 |
| 12 | HCL Tech | Computers - Software | 1164.15 | -1.25 | -0.11 |
| 13 | Wipro | Computers - Software | 667.30 | -0.40 | -0.06 |
| 14 | Dr Reddys Labs | Pharmaceuticals | 4673.40 | -2.75 | -0.06 |
| 15 | Bajaj Finserv | Finance - Investments | 18085.00 | -2.15 | -0.01 |
results[3]
| Stock Name | Sector | LTP | Change | %Chg | |
|---|---|---|---|---|---|
| 0 | Tata Steel | Steel - Large | 1322.55 | 29.25 | 2.26 |
| 1 | Hindalco | Aluminium | 483.75 | 10.50 | 2.22 |
| 2 | UPL | Chemicals | 709.25 | 13.95 | 2.01 |
| 3 | Bajaj Finance | Finance - Leasing & Hire Purchase | 7796.20 | 149.55 | 1.96 |
| 4 | Eicher Motors | Auto - LCVs & HCVs | 2607.60 | 45.85 | 1.79 |
| 5 | SBI Life Insura | Diversified | 1149.55 | 19.85 | 1.76 |
| 6 | Grasim | Diversified | 1740.05 | 29.40 | 1.72 |
| 7 | UltraTechCement | Cement - Major | 7270.00 | 120.20 | 1.68 |
| 8 | ONGC | Oil Drilling And Exploration | 163.85 | 2.45 | 1.52 |
| 9 | Titan Company | Miscellaneous | 2413.85 | 34.70 | 1.46 |
| 10 | JSW Steel | Steel - Large | 682.80 | 8.55 | 1.27 |
| 11 | IOC | Refineries | 131.95 | 1.50 | 1.15 |
| 12 | HDFC Life | Miscellaneous | 691.05 | 7.80 | 1.14 |
| 13 | Shree Cements | Cement - Major | 27543.75 | 268.75 | 0.99 |
| 14 | SBI | Banks - Public Sector | 511.20 | 4.70 | 0.93 |
| 15 | Reliance | Refineries | 2618.50 | 16.70 | 0.64 |
| 16 | HDFC | Finance - Housing | 2913.75 | 17.60 | 0.61 |
| 17 | Nestle | Food Processing | 18796.45 | 100.20 | 0.54 |
| 18 | ITC | Cigarettes | 234.65 | 1.25 | 0.54 |
| 19 | Cipla | Pharmaceuticals | 906.75 | 4.90 | 0.54 |
| 20 | Divis Labs | Pharmaceuticals | 5019.00 | 23.05 | 0.46 |
| 21 | TATA Cons. Prod | Plantations - Tea & Coffee | 792.25 | 3.30 | 0.42 |
| 22 | BPCL | Refineries | 433.40 | 1.70 | 0.39 |
| 23 | Bajaj Auto | Auto - 2 & 3 Wheelers | 3778.25 | 14.50 | 0.39 |
| 24 | Maruti Suzuki | Auto - Cars & Jeeps | 7288.50 | 27.90 | 0.38 |
| 25 | Sun Pharma | Pharmaceuticals | 815.00 | 1.80 | 0.22 |
| 26 | NTPC | Power - Generation & Distribution | 144.20 | 0.30 | 0.21 |
| 27 | Hero Motocorp | Auto - 2 & 3 Wheelers | 2691.10 | 4.30 | 0.16 |
| 28 | Larsen | Infrastructure - General | 1786.65 | 2.50 | 0.14 |
| 29 | Bharti Airtel | Telecommunications - Service | 692.10 | 1.00 | 0.14 |
| 30 | Britannia | Food Processing | 3634.65 | 3.60 | 0.10 |
| 31 | M&M | Auto - Cars & Jeeps | 891.00 | 0.30 | 0.03 |
!python3 -m pip install html5lib
Requirement already satisfied: html5lib in /home/vikrant/anaconda3/lib/python3.8/site-packages (1.1) Requirement already satisfied: webencodings in /home/vikrant/anaconda3/lib/python3.8/site-packages (from html5lib) (0.5.1) Requirement already satisfied: six>=1.9 in /home/vikrant/anaconda3/lib/python3.8/site-packages (from html5lib) (1.15.0)
!python3 -m pip install bs4
Collecting bs4 Downloading bs4-0.0.1.tar.gz (1.1 kB) Requirement already satisfied: beautifulsoup4 in /home/vikrant/anaconda3/lib/python3.8/site-packages (from bs4) (4.9.3) Requirement already satisfied: soupsieve>1.2; python_version >= "3.0" in /home/vikrant/anaconda3/lib/python3.8/site-packages (from beautifulsoup4->bs4) (2.0.1) Building wheels for collected packages: bs4 Building wheel for bs4 (setup.py) ... done Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1273 sha256=8fdc02f3b5a62ff110a2c1e7c5c7e651ca289aafa91b0674fd28a57cff755098 Stored in directory: /home/vikrant/.cache/pip/wheels/75/78/21/68b124549c9bdc94f822c02fb9aa3578a669843f9767776bca Successfully built bs4 Installing collected packages: bs4 Successfully installed bs4-0.0.1
wallet = pd.read_csv("wallet.csv", index_col=0)
wallet
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| ... | ... | ... | ... | ... |
| 95 | 2021-07-19 13:53:28.377359 | Utility | Phone | 388.671213 |
| 96 | 2021-01-12 19:53:28.377359 | Books | Flipcart | 467.554562 |
| 97 | 2021-03-25 11:53:28.377359 | Utility | Phone | 320.789434 |
| 98 | 2021-05-13 15:53:28.377359 | Travel | Taxi | 442.096469 |
| 99 | 2020-10-11 16:53:28.377359 | Food | Hotel | 100.455501 |
100 rows × 4 columns
wallet['date'] # columns from dataframe can be accessed like dictionary
0 2021-03-07 14:53:28.377359
1 2020-10-08 09:53:28.377359
2 2021-02-23 09:53:28.377359
3 2020-11-01 14:53:28.377359
4 2021-06-05 13:53:28.377359
...
95 2021-07-19 13:53:28.377359
96 2021-01-12 19:53:28.377359
97 2021-03-25 11:53:28.377359
98 2021-05-13 15:53:28.377359
99 2020-10-11 16:53:28.377359
Name: date, Length: 100, dtype: object
wallet.date # or column can be accessed like an attribute only if column name does not have space in it
0 2021-03-07 14:53:28.377359
1 2020-10-08 09:53:28.377359
2 2021-02-23 09:53:28.377359
3 2020-11-01 14:53:28.377359
4 2021-06-05 13:53:28.377359
...
95 2021-07-19 13:53:28.377359
96 2021-01-12 19:53:28.377359
97 2021-03-25 11:53:28.377359
98 2021-05-13 15:53:28.377359
99 2020-10-11 16:53:28.377359
Name: date, Length: 100, dtype: object
wallet.debit # this is series
0 421.207327
1 328.440080
2 244.679437
3 222.756318
4 494.128492
...
95 388.671213
96 467.554562
97 320.789434
98 442.096469
99 100.455501
Name: debit, Length: 100, dtype: float64
s1 = pd.Series([421, 328, 123, 234])
s1
0 421 1 328 2 123 3 234 dtype: int64
s1[0]
421
s1[3]
234
[i for i in s1]
[421, 328, 123, 234]
stocks = pd.Series([421, 328, 123, 234], index=['APPLE','AT&T','IBM','NIKE'])
stocks
APPLE 421 AT&T 328 IBM 123 NIKE 234 dtype: int64
s1 # if no index is given by default it is taken as row number
0 421 1 328 2 123 3 234 dtype: int64
stocks['APPLE']
421
stocks[0]
421
labels = ['APPLE','AT&T','IBM','NIKE']
value = pd.Series([421.0, 328.4, 123.6, 234], index=labels)
high = pd.Series([425, 350, 130, 250.4], index=labels)
low = pd.Series([420, 300, 120, 230.5], index=labels)
volume = pd.Series([100, 200, 300, 400], index=labels)
stocks = pd.DataFrame({
"value":value,
"high":high,
"low": low,
"volume": volume
})
stocks
| value | high | low | volume | |
|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 |
| IBM | 123.6 | 130.0 | 120.0 | 300 |
| NIKE | 234.0 | 250.4 | 230.5 | 400 |
pd.DataFrame({
"value":[421.0, 328.4, 123.6, 234],
"high": [425, 350, 130, 250.4],
"low":[420, 300, 120, 230.5],
"volume":[100, 200, 300, 400]},
index= labels)
| value | high | low | volume | |
|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 |
| IBM | 123.6 | 130.0 | 120.0 | 300 |
| NIKE | 234.0 | 250.4 | 230.5 | 400 |
stocks
| value | high | low | volume | |
|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 |
| IBM | 123.6 | 130.0 | 120.0 | 300 |
| NIKE | 234.0 | 250.4 | 230.5 | 400 |
pd.DataFrame({
"value":[421.0, 328.4, 123.6, 234],
"high": [425, 350, 130],
"low":[420, 300, 120, 230.5],
"volume":[100, 200, 300, 400]},
index= labels)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-34-788c00da388c> in <module> ----> 1 pd.DataFrame({ 2 "value":[421.0, 328.4, 123.6, 234], 3 "high": [425, 350, 130], 4 "low":[420, 300, 120, 230.5], 5 "volume":[100, 200, 300, 400]}, ~/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy) 466 467 elif isinstance(data, dict): --> 468 mgr = init_dict(data, index, columns, dtype=dtype) 469 elif isinstance(data, ma.MaskedArray): 470 import numpy.ma.mrecords as mrecords ~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype) 281 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays 282 ] --> 283 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype) 284 285 ~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity) 91 axes = [columns, index] 92 ---> 93 return create_block_manager_from_arrays(arrays, arr_names, axes) 94 95 ~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in create_block_manager_from_arrays(arrays, names, axes) 1686 return mgr 1687 except ValueError as e: -> 1688 raise construction_error(len(arrays), arrays[0].shape, axes, e) 1689 1690 ~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in create_block_manager_from_arrays(arrays, names, axes) 1681 1682 try: -> 1683 blocks = form_blocks(arrays, names, axes) 1684 mgr = BlockManager(blocks, axes) 1685 mgr._consolidate_inplace() ~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in form_blocks(arrays, names, axes) 1751 1752 if len(items_dict["IntBlock"]): -> 1753 int_blocks = _multi_blockify(items_dict["IntBlock"]) 1754 blocks.extend(int_blocks) 1755 ~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in _multi_blockify(tuples, dtype) 1833 for dtype, tup_block in grouper: 1834 -> 1835 values, placement = _stack_arrays(list(tup_block), dtype) 1836 1837 block = make_block(values, placement=placement) ~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in _stack_arrays(tuples, dtype) 1863 stacked = np.empty(shape, dtype=dtype) 1864 for i, arr in enumerate(arrays): -> 1865 stacked[i] = _asarray_compat(arr) 1866 1867 return stacked, placement ValueError: could not broadcast input array from shape (4) into shape (3)
labels = ['APPLE','AT&T','IBM','NIKE']
value = pd.Series([421.0, 328.4, 123.6], index=['APPLE','AT&T','IBM'])
high = pd.Series([425, 350, 130, 250.4], index=labels)
low = pd.Series([420, 300, 120, 230.5], index=labels)
volume = pd.Series([100, 200, 300, 400], index=labels)
stocks_ = pd.DataFrame({
"value":value,
"high":high,
"low": low,
"volume": volume
})
stocks_ # for NIKE.. value column has NaN ..
| value | high | low | volume | |
|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 |
| IBM | 123.6 | 130.0 | 120.0 | 300 |
| NIKE | NaN | 250.4 | 230.5 | 400 |
%%file empty_data.csv
symbol,value_,volume,some column
IBM,233,100,
APPLE,,100,
AT&T,677,,
GOODTOHAVE,,,
Overwriting empty_data.csv
df = pd.read_csv("empty_data.csv")
df
| symbol | value_ | volume | some column | |
|---|---|---|---|---|
| 0 | IBM | 233.0 | 100.0 | NaN |
| 1 | APPLE | NaN | 100.0 | NaN |
| 2 | AT&T | 677.0 | NaN | NaN |
| 3 | GOODTOHAVE | NaN | NaN | NaN |
df.symbol
0 IBM 1 APPLE 2 AT&T 3 GOODTOHAVE Name: symbol, dtype: object
df.symbol
0 IBM 1 APPLE 2 AT&T 3 GOODTOHAVE Name: symbol, dtype: object
df.value_
0 233.0 1 NaN 2 677.0 3 NaN Name: value_, dtype: float64
df.volume
0 100.0 1 100.0 2 NaN 3 NaN Name: volume, dtype: float64
df.some column
File "<ipython-input-71-2f42fa2104c3>", line 1 df.some column ^ SyntaxError: invalid syntax
df['some column']
0 NaN 1 NaN 2 NaN 3 NaN Name: some column, dtype: float64
stocks_
| value | high | low | volume | |
|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 |
| IBM | 123.6 | 130.0 | 120.0 | 300 |
| NIKE | NaN | 250.4 | 230.5 | 400 |
value = stocks_.value
value
APPLE 421.0 AT&T 328.4 IBM 123.6 NIKE NaN Name: value, dtype: float64
stocks_
| value | high | low | volume | |
|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 |
| IBM | 123.6 | 130.0 | 120.0 | 300 |
| NIKE | NaN | 250.4 | 230.5 | 400 |
value['APPLE']
421.0
value[0]
421.0
value[-1]
nan
s1
0 421 1 328 2 123 3 234 dtype: int64
s1[-1]
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) ~/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance) 354 try: --> 355 return self._range.index(new_key) 356 except ValueError as err: ValueError: -1 is not in range The above exception was the direct cause of the following exception: KeyError Traceback (most recent call last) <ipython-input-82-cd4e1c954467> in <module> ----> 1 s1[-1] ~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key) 880 881 elif key_is_scalar: --> 882 return self._get_value(key) 883 884 if is_hashable(key): ~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_value(self, label, takeable) 987 988 # Similar to Index.get_value, but we do not fall back to positional --> 989 loc = self.index.get_loc(label) 990 return self.index._get_values_for_loc(self, loc, label) 991 ~/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance) 355 return self._range.index(new_key) 356 except ValueError as err: --> 357 raise KeyError(key) from err 358 raise KeyError(key) 359 return super().get_loc(key, method=method, tolerance=tolerance) KeyError: -1
value['APPLE']
421.0
value[3]
nan
stocks_.value
APPLE 421.0 AT&T 328.4 IBM 123.6 NIKE NaN Name: value, dtype: float64
stocks_['value']
APPLE 421.0 AT&T 328.4 IBM 123.6 NIKE NaN Name: value, dtype: float64
stocks_
| value | high | low | volume | |
|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 |
| IBM | 123.6 | 130.0 | 120.0 | 300 |
| NIKE | NaN | 250.4 | 230.5 | 400 |
stocks_.loc[['APPLE','AT&T'],["high","volume"]]
| high | volume | |
|---|---|---|
| APPLE | 425.0 | 100 |
| AT&T | 350.0 | 200 |
stocks_[['high','low']] # subset columns
| high | low | |
|---|---|---|
| APPLE | 425.0 | 420.0 |
| AT&T | 350.0 | 300.0 |
| IBM | 130.0 | 120.0 |
| NIKE | 250.4 | 230.5 |
stocks_.loc['APPLE'] # this will return series
value 421.0 high 425.0 low 420.0 volume 100.0 Name: APPLE, dtype: float64
stocks_.loc[['APPLE','IBM']] # this will return DataFrame
| value | high | low | volume | |
|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 |
| IBM | 123.6 | 130.0 | 120.0 | 300 |
value = stocks_.value
value['IBM']
123.6
s1
0 421 1 328 2 123 3 234 dtype: int64
s1*5
0 2105 1 1640 2 615 3 1170 dtype: int64
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([-1, 1, -1, 1])
s1*s2
0 -1 1 2 2 -3 3 4 dtype: int64
s1
0 1 1 2 2 3 3 4 dtype: int64
s2
0 -1 1 1 2 -1 3 1 dtype: int64
s1 + s2
0 0 1 3 2 2 3 5 dtype: int64
s1.sum()
10
wallet.debit
0 421.207327
1 328.440080
2 244.679437
3 222.756318
4 494.128492
...
95 388.671213
96 467.554562
97 320.789434
98 442.096469
99 100.455501
Name: debit, Length: 100, dtype: float64
wallet.debit.sum()
31059.590543177277
wallet.debit.std()
121.17821796983658
wallet.debit.mean()
310.59590543177285
wallet.debit.quantile()
318.1107758465628
wallet
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| ... | ... | ... | ... | ... |
| 95 | 2021-07-19 13:53:28.377359 | Utility | Phone | 388.671213 |
| 96 | 2021-01-12 19:53:28.377359 | Books | Flipcart | 467.554562 |
| 97 | 2021-03-25 11:53:28.377359 | Utility | Phone | 320.789434 |
| 98 | 2021-05-13 15:53:28.377359 | Travel | Taxi | 442.096469 |
| 99 | 2020-10-11 16:53:28.377359 | Food | Hotel | 100.455501 |
100 rows × 4 columns
wallet.head()
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
head10 = wallet.head(10)
head10
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| 5 | 2021-07-28 19:53:28.377359 | Utility | Electricity | 219.941711 |
| 6 | 2021-04-16 11:53:28.377359 | Books | Amazon Kindle | 270.322595 |
| 7 | 2021-02-15 10:53:28.377359 | Food | Zomato | 457.183104 |
| 8 | 2021-08-10 19:53:28.377359 | Utility | Phone | 151.496373 |
| 9 | 2020-11-29 14:53:28.377359 | Travel | Auto | 443.618884 |
head10.describe() # show basic stats for all numeric columns
| debit | |
|---|---|
| count | 10.000000 |
| mean | 325.377432 |
| std | 120.392439 |
| min | 151.496373 |
| 25% | 228.237097 |
| 50% | 299.381338 |
| 75% | 438.015995 |
| max | 494.128492 |
wallet.describe()
| debit | |
|---|---|
| count | 100.000000 |
| mean | 310.595905 |
| std | 121.178218 |
| min | 100.455501 |
| 25% | 216.429447 |
| 50% | 318.110776 |
| 75% | 424.109079 |
| max | 499.858182 |
wallet.sum()
date 2021-03-07 14:53:28.3773592020-10-08 09:53:28.... category MusicFoodBooksUtilityBooksUtilityBooksFoodUtil... description AmazonSwiggyAmazonPhoneFlipcartElectricityAmaz... debit 31059.6 dtype: object
wallet.mean()
debit 310.595905 dtype: float64
wallet.tail()
| date | category | description | debit | |
|---|---|---|---|---|
| 95 | 2021-07-19 13:53:28.377359 | Utility | Phone | 388.671213 |
| 96 | 2021-01-12 19:53:28.377359 | Books | Flipcart | 467.554562 |
| 97 | 2021-03-25 11:53:28.377359 | Utility | Phone | 320.789434 |
| 98 | 2021-05-13 15:53:28.377359 | Travel | Taxi | 442.096469 |
| 99 | 2020-10-11 16:53:28.377359 | Food | Hotel | 100.455501 |
head10
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| 5 | 2021-07-28 19:53:28.377359 | Utility | Electricity | 219.941711 |
| 6 | 2021-04-16 11:53:28.377359 | Books | Amazon Kindle | 270.322595 |
| 7 | 2021-02-15 10:53:28.377359 | Food | Zomato | 457.183104 |
| 8 | 2021-08-10 19:53:28.377359 | Utility | Phone | 151.496373 |
| 9 | 2020-11-29 14:53:28.377359 | Travel | Auto | 443.618884 |
debit10 = head10.debit
debit10
0 421.207327 1 328.440080 2 244.679437 3 222.756318 4 494.128492 5 219.941711 6 270.322595 7 457.183104 8 151.496373 9 443.618884 Name: debit, dtype: float64
debit10 > 300
0 True 1 True 2 False 3 False 4 True 5 False 6 False 7 True 8 False 9 True Name: debit, dtype: bool
debit10[debit10 > 300]
0 421.207327 1 328.440080 4 494.128492 7 457.183104 9 443.618884 Name: debit, dtype: float64
stocks_[stocks_.value<300]
| value | high | low | volume | |
|---|---|---|---|---|
| IBM | 123.6 | 130.0 | 120.0 | 300 |
head10
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| 5 | 2021-07-28 19:53:28.377359 | Utility | Electricity | 219.941711 |
| 6 | 2021-04-16 11:53:28.377359 | Books | Amazon Kindle | 270.322595 |
| 7 | 2021-02-15 10:53:28.377359 | Food | Zomato | 457.183104 |
| 8 | 2021-08-10 19:53:28.377359 | Utility | Phone | 151.496373 |
| 9 | 2020-11-29 14:53:28.377359 | Travel | Auto | 443.618884 |
head10.description=="Amazon"
0 True 1 False 2 True 3 False 4 False 5 False 6 False 7 False 8 False 9 False Name: description, dtype: bool
head10[head10.description=="Amazon"]
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
wallet
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| ... | ... | ... | ... | ... |
| 95 | 2021-07-19 13:53:28.377359 | Utility | Phone | 388.671213 |
| 96 | 2021-01-12 19:53:28.377359 | Books | Flipcart | 467.554562 |
| 97 | 2021-03-25 11:53:28.377359 | Utility | Phone | 320.789434 |
| 98 | 2021-05-13 15:53:28.377359 | Travel | Taxi | 442.096469 |
| 99 | 2020-10-11 16:53:28.377359 | Food | Hotel | 100.455501 |
100 rows × 4 columns
amazon = wallet[wallet.description=="Amazon"]
amazon
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 12 | 2021-07-24 14:53:28.377359 | Music | Amazon | 329.536003 |
| 24 | 2021-05-31 11:53:28.377359 | Books | Amazon | 498.100496 |
| 37 | 2021-04-18 16:53:28.377359 | Music | Amazon | 266.069078 |
| 72 | 2021-06-30 18:53:28.377359 | Books | Amazon | 294.662869 |
| 85 | 2021-01-12 09:53:28.377359 | Music | Amazon | 130.374908 |
| 89 | 2021-04-12 14:53:28.377359 | Music | Amazon | 218.487173 |
| 90 | 2020-12-01 14:53:28.377359 | Music | Amazon | 101.573276 |
amazon.sum()
date 2021-03-07 14:53:28.3773592021-02-23 09:53:28.... category MusicBooksMusicBooksMusicBooksMusicMusicMusic description AmazonAmazonAmazonAmazonAmazonAmazonAmazonAmaz... debit 2504.69 dtype: object
music = wallet[wallet.category=="Music"]
music
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 12 | 2021-07-24 14:53:28.377359 | Music | Amazon | 329.536003 |
| 23 | 2020-12-11 10:53:28.377359 | Music | Netflix | 354.940241 |
| 37 | 2021-04-18 16:53:28.377359 | Music | Amazon | 266.069078 |
| 49 | 2021-03-16 09:53:28.377359 | Music | spotify | 232.303402 |
| 54 | 2020-11-16 10:53:28.377359 | Music | spotify | 160.817543 |
| 65 | 2020-10-23 18:53:28.377359 | Music | Netflix | 188.748743 |
| 67 | 2021-07-31 14:53:28.377359 | Music | Netflix | 324.786917 |
| 76 | 2020-11-17 09:53:28.377359 | Music | Netflix | 197.534600 |
| 78 | 2020-09-09 16:53:28.377359 | Music | spotify | 415.372894 |
| 79 | 2021-08-17 09:53:28.377359 | Music | Netflix | 321.763416 |
| 82 | 2020-09-22 09:53:28.377359 | Music | spotify | 411.142701 |
| 84 | 2020-09-21 10:53:28.377359 | Music | Netflix | 158.793646 |
| 85 | 2021-01-12 09:53:28.377359 | Music | Amazon | 130.374908 |
| 89 | 2021-04-12 14:53:28.377359 | Music | Amazon | 218.487173 |
| 90 | 2020-12-01 14:53:28.377359 | Music | Amazon | 101.573276 |
music.sum()
date 2021-03-07 14:53:28.3773592021-07-24 14:53:28.... category MusicMusicMusicMusicMusicMusicMusicMusicMusicM... description AmazonAmazonNetflixAmazonspotifyspotifyNetflix... debit 4233.45 dtype: object
netflix_music = music[music.description=="Netflix"]
netflix_music
| date | category | description | debit | |
|---|---|---|---|---|
| 23 | 2020-12-11 10:53:28.377359 | Music | Netflix | 354.940241 |
| 65 | 2020-10-23 18:53:28.377359 | Music | Netflix | 188.748743 |
| 67 | 2021-07-31 14:53:28.377359 | Music | Netflix | 324.786917 |
| 76 | 2020-11-17 09:53:28.377359 | Music | Netflix | 197.534600 |
| 79 | 2021-08-17 09:53:28.377359 | Music | Netflix | 321.763416 |
| 84 | 2020-09-21 10:53:28.377359 | Music | Netflix | 158.793646 |
date = wallet.date
date
0 2021-03-07 14:53:28.377359
1 2020-10-08 09:53:28.377359
2 2021-02-23 09:53:28.377359
3 2020-11-01 14:53:28.377359
4 2021-06-05 13:53:28.377359
...
95 2021-07-19 13:53:28.377359
96 2021-01-12 19:53:28.377359
97 2021-03-25 11:53:28.377359
98 2021-05-13 15:53:28.377359
99 2020-10-11 16:53:28.377359
Name: date, Length: 100, dtype: object
date.str.split() #all the string opeartions will work
0 [2021-03-07, 14:53:28.377359]
1 [2020-10-08, 09:53:28.377359]
2 [2021-02-23, 09:53:28.377359]
3 [2020-11-01, 14:53:28.377359]
4 [2021-06-05, 13:53:28.377359]
...
95 [2021-07-19, 13:53:28.377359]
96 [2021-01-12, 19:53:28.377359]
97 [2021-03-25, 11:53:28.377359]
98 [2021-05-13, 15:53:28.377359]
99 [2020-10-11, 16:53:28.377359]
Name: date, Length: 100, dtype: object
date_time = date.str.split(expand=True)
date_time
| 0 | 1 | |
|---|---|---|
| 0 | 2021-03-07 | 14:53:28.377359 |
| 1 | 2020-10-08 | 09:53:28.377359 |
| 2 | 2021-02-23 | 09:53:28.377359 |
| 3 | 2020-11-01 | 14:53:28.377359 |
| 4 | 2021-06-05 | 13:53:28.377359 |
| ... | ... | ... |
| 95 | 2021-07-19 | 13:53:28.377359 |
| 96 | 2021-01-12 | 19:53:28.377359 |
| 97 | 2021-03-25 | 11:53:28.377359 |
| 98 | 2021-05-13 | 15:53:28.377359 |
| 99 | 2020-10-11 | 16:53:28.377359 |
100 rows × 2 columns
new_date_time = date_time.rename(columns={0:"date", 1:"timestamp"})
new_date_time
| date | timestamp | |
|---|---|---|
| 0 | 2021-03-07 | 14:53:28.377359 |
| 1 | 2020-10-08 | 09:53:28.377359 |
| 2 | 2021-02-23 | 09:53:28.377359 |
| 3 | 2020-11-01 | 14:53:28.377359 |
| 4 | 2021-06-05 | 13:53:28.377359 |
| ... | ... | ... |
| 95 | 2021-07-19 | 13:53:28.377359 |
| 96 | 2021-01-12 | 19:53:28.377359 |
| 97 | 2021-03-25 | 11:53:28.377359 |
| 98 | 2021-05-13 | 15:53:28.377359 |
| 99 | 2020-10-11 | 16:53:28.377359 |
100 rows × 2 columns
date_time # it has not changed
| 0 | 1 | |
|---|---|---|
| 0 | 2021-03-07 | 14:53:28.377359 |
| 1 | 2020-10-08 | 09:53:28.377359 |
| 2 | 2021-02-23 | 09:53:28.377359 |
| 3 | 2020-11-01 | 14:53:28.377359 |
| 4 | 2021-06-05 | 13:53:28.377359 |
| ... | ... | ... |
| 95 | 2021-07-19 | 13:53:28.377359 |
| 96 | 2021-01-12 | 19:53:28.377359 |
| 97 | 2021-03-25 | 11:53:28.377359 |
| 98 | 2021-05-13 | 15:53:28.377359 |
| 99 | 2020-10-11 | 16:53:28.377359 |
100 rows × 2 columns
wallet
| date | category | description | debit | |
|---|---|---|---|---|
| 0 | 2021-03-07 14:53:28.377359 | Music | Amazon | 421.207327 |
| 1 | 2020-10-08 09:53:28.377359 | Food | Swiggy | 328.440080 |
| 2 | 2021-02-23 09:53:28.377359 | Books | Amazon | 244.679437 |
| 3 | 2020-11-01 14:53:28.377359 | Utility | Phone | 222.756318 |
| 4 | 2021-06-05 13:53:28.377359 | Books | Flipcart | 494.128492 |
| ... | ... | ... | ... | ... |
| 95 | 2021-07-19 13:53:28.377359 | Utility | Phone | 388.671213 |
| 96 | 2021-01-12 19:53:28.377359 | Books | Flipcart | 467.554562 |
| 97 | 2021-03-25 11:53:28.377359 | Utility | Phone | 320.789434 |
| 98 | 2021-05-13 15:53:28.377359 | Travel | Taxi | 442.096469 |
| 99 | 2020-10-11 16:53:28.377359 | Food | Hotel | 100.455501 |
100 rows × 4 columns
del wallet['date'] # delete old date column
wallet
| category | description | debit | |
|---|---|---|---|
| 0 | Music | Amazon | 421.207327 |
| 1 | Food | Swiggy | 328.440080 |
| 2 | Books | Amazon | 244.679437 |
| 3 | Utility | Phone | 222.756318 |
| 4 | Books | Flipcart | 494.128492 |
| ... | ... | ... | ... |
| 95 | Utility | Phone | 388.671213 |
| 96 | Books | Flipcart | 467.554562 |
| 97 | Utility | Phone | 320.789434 |
| 98 | Travel | Taxi | 442.096469 |
| 99 | Food | Hotel | 100.455501 |
100 rows × 3 columns
wallet['date'] = new_date_time.date
wallet
| category | description | debit | date | |
|---|---|---|---|---|
| 0 | Music | Amazon | 421.207327 | 2021-03-07 |
| 1 | Food | Swiggy | 328.440080 | 2020-10-08 |
| 2 | Books | Amazon | 244.679437 | 2021-02-23 |
| 3 | Utility | Phone | 222.756318 | 2020-11-01 |
| 4 | Books | Flipcart | 494.128492 | 2021-06-05 |
| ... | ... | ... | ... | ... |
| 95 | Utility | Phone | 388.671213 | 2021-07-19 |
| 96 | Books | Flipcart | 467.554562 | 2021-01-12 |
| 97 | Utility | Phone | 320.789434 | 2021-03-25 |
| 98 | Travel | Taxi | 442.096469 | 2021-05-13 |
| 99 | Food | Hotel | 100.455501 | 2020-10-11 |
100 rows × 4 columns
wallet['timestamp'] = new_date_time.timestamp
wallet
| category | description | debit | date | timestamp | |
|---|---|---|---|---|---|
| 0 | Music | Amazon | 421.207327 | 2021-03-07 | 14:53:28.377359 |
| 1 | Food | Swiggy | 328.440080 | 2020-10-08 | 09:53:28.377359 |
| 2 | Books | Amazon | 244.679437 | 2021-02-23 | 09:53:28.377359 |
| 3 | Utility | Phone | 222.756318 | 2020-11-01 | 14:53:28.377359 |
| 4 | Books | Flipcart | 494.128492 | 2021-06-05 | 13:53:28.377359 |
| ... | ... | ... | ... | ... | ... |
| 95 | Utility | Phone | 388.671213 | 2021-07-19 | 13:53:28.377359 |
| 96 | Books | Flipcart | 467.554562 | 2021-01-12 | 19:53:28.377359 |
| 97 | Utility | Phone | 320.789434 | 2021-03-25 | 11:53:28.377359 |
| 98 | Travel | Taxi | 442.096469 | 2021-05-13 | 15:53:28.377359 |
| 99 | Food | Hotel | 100.455501 | 2020-10-11 | 16:53:28.377359 |
100 rows × 5 columns
pd.to_datetime(new_date_time.date) # this will convert str date into actual datetime object
0 2021-03-07
1 2020-10-08
2 2021-02-23
3 2020-11-01
4 2021-06-05
...
95 2021-07-19
96 2021-01-12
97 2021-03-25
98 2021-05-13
99 2020-10-11
Name: date, Length: 100, dtype: datetime64[ns]
wallet['date'] = pd.to_datetime(new_date_time.date)
wallet
| category | description | debit | date | timestamp | |
|---|---|---|---|---|---|
| 0 | Music | Amazon | 421.207327 | 2021-03-07 | 14:53:28.377359 |
| 1 | Food | Swiggy | 328.440080 | 2020-10-08 | 09:53:28.377359 |
| 2 | Books | Amazon | 244.679437 | 2021-02-23 | 09:53:28.377359 |
| 3 | Utility | Phone | 222.756318 | 2020-11-01 | 14:53:28.377359 |
| 4 | Books | Flipcart | 494.128492 | 2021-06-05 | 13:53:28.377359 |
| ... | ... | ... | ... | ... | ... |
| 95 | Utility | Phone | 388.671213 | 2021-07-19 | 13:53:28.377359 |
| 96 | Books | Flipcart | 467.554562 | 2021-01-12 | 19:53:28.377359 |
| 97 | Utility | Phone | 320.789434 | 2021-03-25 | 11:53:28.377359 |
| 98 | Travel | Taxi | 442.096469 | 2021-05-13 | 15:53:28.377359 |
| 99 | Food | Hotel | 100.455501 | 2020-10-11 | 16:53:28.377359 |
100 rows × 5 columns
import datetime
date1 = datetime.datetime(2021,3, 1)
transactions_later_date1 = wallet[wallet.date > date1]
transactions_later_date1.shape
(53, 5)
import datetime
date1 = datetime.datetime(2020,12, 31)
transactions_later_date1 = wallet[wallet.date > date1]
transactions_later_date1
| category | description | debit | date | timestamp | |
|---|---|---|---|---|---|
| 0 | Music | Amazon | 421.207327 | 2021-03-07 | 14:53:28.377359 |
| 2 | Books | Amazon | 244.679437 | 2021-02-23 | 09:53:28.377359 |
| 4 | Books | Flipcart | 494.128492 | 2021-06-05 | 13:53:28.377359 |
| 5 | Utility | Electricity | 219.941711 | 2021-07-28 | 19:53:28.377359 |
| 6 | Books | Amazon Kindle | 270.322595 | 2021-04-16 | 11:53:28.377359 |
| ... | ... | ... | ... | ... | ... |
| 94 | Utility | Phone | 431.185537 | 2021-01-04 | 13:53:28.377359 |
| 95 | Utility | Phone | 388.671213 | 2021-07-19 | 13:53:28.377359 |
| 96 | Books | Flipcart | 467.554562 | 2021-01-12 | 19:53:28.377359 |
| 97 | Utility | Phone | 320.789434 | 2021-03-25 | 11:53:28.377359 |
| 98 | Travel | Taxi | 442.096469 | 2021-05-13 | 15:53:28.377359 |
72 rows × 5 columns
import datetime
date1 = datetime.datetime(2020,12, 31)
transactions_later_date1 = wallet[wallet.date <= date1]
transactions_later_date1.shape
(28, 5)
stocks1 = stocks_.copy()
stocks1
| value | high | low | volume | |
|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 |
| IBM | 123.6 | 130.0 | 120.0 | 300 |
| NIKE | NaN | 250.4 | 230.5 | 400 |
stocks2 = stocks_.copy()
pd.concat([stocks1, stocks2])
| value | high | low | volume | |
|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 |
| IBM | 123.6 | 130.0 | 120.0 | 300 |
| NIKE | NaN | 250.4 | 230.5 | 400 |
| APPLE | 421.0 | 425.0 | 420.0 | 100 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 |
| IBM | 123.6 | 130.0 | 120.0 | 300 |
| NIKE | NaN | 250.4 | 230.5 | 400 |
stocks3 = stocks_.copy()
stocks3['dummy'] = range(4)
stocks3
| value | high | low | volume | dummy | |
|---|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 | 0 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 | 1 |
| IBM | 123.6 | 130.0 | 120.0 | 300 | 2 |
| NIKE | NaN | 250.4 | 230.5 | 400 | 3 |
pd.concat([stocks1, stocks2, stocks3])
| value | high | low | volume | dummy | |
|---|---|---|---|---|---|
| APPLE | 421.0 | 425.0 | 420.0 | 100 | NaN |
| AT&T | 328.4 | 350.0 | 300.0 | 200 | NaN |
| IBM | 123.6 | 130.0 | 120.0 | 300 | NaN |
| NIKE | NaN | 250.4 | 230.5 | 400 | NaN |
| APPLE | 421.0 | 425.0 | 420.0 | 100 | NaN |
| AT&T | 328.4 | 350.0 | 300.0 | 200 | NaN |
| IBM | 123.6 | 130.0 | 120.0 | 300 | NaN |
| NIKE | NaN | 250.4 | 230.5 | 400 | NaN |
| APPLE | 421.0 | 425.0 | 420.0 | 100 | 0.0 |
| AT&T | 328.4 | 350.0 | 300.0 | 200 | 1.0 |
| IBM | 123.6 | 130.0 | 120.0 | 300 | 2.0 |
| NIKE | NaN | 250.4 | 230.5 | 400 | 3.0 |
concatdf = pd.concat([stocks1, stocks2, stocks3])
concatdf.value['APPLE']
APPLE 421.0 APPLE 421.0 APPLE 421.0 Name: value, dtype: float64
this works on a column!
df1 = pd.DataFrame({
"a":[1, 2, 3, 4, 5],
"b": [43, 54, 65, 77,54],
"labels":['x','y','z','m','n']})
df1
| a | b | labels | |
|---|---|---|---|
| 0 | 1 | 43 | x |
| 1 | 2 | 54 | y |
| 2 | 3 | 65 | z |
| 3 | 4 | 77 | m |
| 4 | 5 | 54 | n |
df2 = pd.DataFrame({
"c":[11, 22, 33, 44],
"d":[112, 221, 331, 443],
"labels":['x','y','z','m']
})
df2
| c | d | labels | |
|---|---|---|---|
| 0 | 11 | 112 | x |
| 1 | 22 | 221 | y |
| 2 | 33 | 331 | z |
| 3 | 44 | 443 | m |
pd.merge(df2, df1, on='labels')
| c | d | labels | a | b | |
|---|---|---|---|---|---|
| 0 | 11 | 112 | x | 1 | 43 |
| 1 | 22 | 221 | y | 2 | 54 |
| 2 | 33 | 331 | z | 3 | 65 |
| 3 | 44 | 443 | m | 4 | 77 |
dfj1 = pd.DataFrame({
"a":[1, 2, 3, 4, 5],
"b" : [32,43, 45, 65, 76]},
index= ['x','y','z','m','n']
)
dfj1
| a | b | |
|---|---|---|
| x | 1 | 32 |
| y | 2 | 43 |
| z | 3 | 45 |
| m | 4 | 65 |
| n | 5 | 76 |
dfj2 = pd.DataFrame({
"c":[11, 21, 31, 41],
"d" : [321,431, 451, 651]},
index= ['x','y','z','m']
)
dfj2
| c | d | |
|---|---|---|
| x | 11 | 321 |
| y | 21 | 431 |
| z | 31 | 451 |
| m | 41 | 651 |
dfj1.join(dfj2)
| a | b | c | d | |
|---|---|---|---|---|
| x | 1 | 32 | 11.0 | 321.0 |
| y | 2 | 43 | 21.0 | 431.0 |
| z | 3 | 45 | 31.0 | 451.0 |
| m | 4 | 65 | 41.0 | 651.0 |
| n | 5 | 76 | NaN | NaN |
dfj2.join(dfj1)
| c | d | a | b | |
|---|---|---|---|---|
| x | 11 | 321 | 1 | 32 |
| y | 21 | 431 | 2 | 43 |
| z | 31 | 451 | 3 | 45 |
| m | 41 | 651 | 4 | 65 |
categorygorup = wallet.groupby("category") # you give column name/s
categorygorup.sum() # sum of all numeric columns for every category
| debit | |
|---|---|
| category | |
| Books | 4929.750393 |
| Food | 8281.189173 |
| Music | 4233.451868 |
| Travel | 6052.931876 |
| Utility | 7562.267233 |
categorygorup.mean()
| debit | |
|---|---|
| category | |
| Books | 352.125028 |
| Food | 306.710710 |
| Music | 264.590742 |
| Travel | 378.308242 |
| Utility | 280.083972 |
categorygorup.first() # give only first transaction in every category
| description | debit | date | timestamp | |
|---|---|---|---|---|
| category | ||||
| Books | Amazon | 244.679437 | 2021-02-23 | 09:53:28.377359 |
| Food | Swiggy | 328.440080 | 2020-10-08 | 09:53:28.377359 |
| Music | Amazon | 421.207327 | 2021-03-07 | 14:53:28.377359 |
| Travel | Auto | 443.618884 | 2020-11-29 | 14:53:28.377359 |
| Utility | Phone | 222.756318 | 2020-11-01 | 14:53:28.377359 |
categorygorup.max(numeric_only=True)
| debit | |
|---|---|
| category | |
| Books | 498.100496 |
| Food | 489.143483 |
| Music | 421.207327 |
| Travel | 494.124399 |
| Utility | 499.858182 |
categorygorup.min(numeric_only=True)
| debit | |
|---|---|
| category | |
| Books | 109.325909 |
| Food | 100.455501 |
| Music | 101.573276 |
| Travel | 117.588729 |
| Utility | 103.680791 |
mixed_group = wallet.groupby(['category','description'])
mixed_group.sum()
| debit | ||
|---|---|---|
| category | description | |
| Books | Amazon | 1037.442802 |
| Amazon Kindle | 1389.052376 | |
| Flipcart | 2503.255216 | |
| Food | Hotel | 2752.174732 |
| Swiggy | 1936.495366 | |
| Zomato | 3592.519075 | |
| Music | Amazon | 1467.247766 |
| Netflix | 1546.567562 | |
| spotify | 1219.636541 | |
| Travel | Auto | 2210.428935 |
| Metro | 1216.463665 | |
| Taxi | 2626.039276 | |
| Utility | Electricity | 2885.064355 |
| Phone | 4677.202878 |
wallet
| category | description | debit | date | timestamp | |
|---|---|---|---|---|---|
| 0 | Music | Amazon | 421.207327 | 2021-03-07 | 14:53:28.377359 |
| 1 | Food | Swiggy | 328.440080 | 2020-10-08 | 09:53:28.377359 |
| 2 | Books | Amazon | 244.679437 | 2021-02-23 | 09:53:28.377359 |
| 3 | Utility | Phone | 222.756318 | 2020-11-01 | 14:53:28.377359 |
| 4 | Books | Flipcart | 494.128492 | 2021-06-05 | 13:53:28.377359 |
| ... | ... | ... | ... | ... | ... |
| 95 | Utility | Phone | 388.671213 | 2021-07-19 | 13:53:28.377359 |
| 96 | Books | Flipcart | 467.554562 | 2021-01-12 | 19:53:28.377359 |
| 97 | Utility | Phone | 320.789434 | 2021-03-25 | 11:53:28.377359 |
| 98 | Travel | Taxi | 442.096469 | 2021-05-13 | 15:53:28.377359 |
| 99 | Food | Hotel | 100.455501 | 2020-10-11 | 16:53:28.377359 |
100 rows × 5 columns
pd.to_datetime() # it will convert series into date time...
pd.to_numeric() # if pandas by default loads any column as string ..becuase some spaces given in csv file
catgprysum = categorygorup.sum()
catgprysum
| debit | |
|---|---|
| category | |
| Books | 4929.750393 |
| Food | 8281.189173 |
| Music | 4233.451868 |
| Travel | 6052.931876 |
| Utility | 7562.267233 |
catgprysum.to_csv("sum_by_cats.csv")
!python3 cat.py sum_by_cats.csv
category,debit Books,4929.750393283797 Food,8281.189172581233 Music,4233.451868232711 Travel,6052.931876440963 Utility,7562.267232638568
Writing Excel
writer = pd.ExcelWriter("catgory_sum.xlsx", engine="xlsxwriter")
catgprysum.to_excel(writer, sheet_name="category-sum")
writer.save()
!ls catgory_sum.xlsx
catgory_sum.xlsx