Aug 19-25, 2022 Vikrant Patil
All notes are available online at https://notes.pipal.in/2022/arcesium_finop_batch1/
Please accept the invitation that you have received in your email, then log in to the lab and create today's notebook, module3-day3.
© Pipal Academy LLP
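The HTTP protocol defines several request methods; the four most commonly used are GET, POST, PUT and DELETE.
If I search for some text in a search engine, the search text shows up in the URL as a query parameter, after the ? in the URL.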
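Query parameters are appended to the URL after a `?` as `key=value` pairs joined by `&`; `requests` does the same thing with its `params=` argument. A minimal sketch using the standard library's `urllib.parse.urlencode`; the search URL and the parameter names `q` and `page` are hypothetical placeholders, not any real search engine's API.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint and parameter names, for illustration only
base_url = "https://www.example.com/search"
query = {"q": "python requests tutorial", "page": 2}

# urlencode joins key=value pairs with '&' and quotes special characters
# (by default a space becomes '+')
url = base_url + "?" + urlencode(query)
print(url)
# https://www.example.com/search?q=python+requests+tutorial&page=2
```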
import requests
alphavantageurl = "https://www.alphavantage.co/query"
API_KEY = "UKVFE0JLE0TBPDEF"
params = {
"function":"TIME_SERIES_INTRADAY",
"symbol":"AAPL",
"interval":"15min",
"apikey":API_KEY
}
resp = requests.get(alphavantageurl, params=params)
resp.status_code
200
data = resp.json()
type(data)
dict
data.keys()
dict_keys(['Meta Data', 'Time Series (15min)'])
data['Meta Data']
{'1. Information': 'Intraday (15min) open, high, low, close prices and volume',
'2. Symbol': 'AAPL',
'3. Last Refreshed': '2022-08-22 20:00:00',
'4. Interval': '15min',
'5. Output Size': 'Compact',
'6. Time Zone': 'US/Eastern'}
type(data['Time Series (15min)'])
dict
len(data['Time Series (15min)'])
100
data['Time Series (15min)'].keys()
dict_keys(['2022-08-22 20:00:00', '2022-08-22 19:45:00', '2022-08-22 19:30:00', '2022-08-22 19:15:00', '2022-08-22 19:00:00', '2022-08-22 18:45:00', '2022-08-22 18:30:00', '2022-08-22 18:15:00', '2022-08-22 18:00:00', '2022-08-22 17:45:00', '2022-08-22 17:30:00', '2022-08-22 17:15:00', '2022-08-22 17:00:00', '2022-08-22 16:45:00', '2022-08-22 16:30:00', '2022-08-22 16:15:00', '2022-08-22 16:00:00', '2022-08-22 15:45:00', '2022-08-22 15:30:00', '2022-08-22 15:15:00', '2022-08-22 15:00:00', '2022-08-22 14:45:00', '2022-08-22 14:30:00', '2022-08-22 14:15:00', '2022-08-22 14:00:00', '2022-08-22 13:45:00', '2022-08-22 13:30:00', '2022-08-22 13:15:00', '2022-08-22 13:00:00', '2022-08-22 12:45:00', '2022-08-22 12:30:00', '2022-08-22 12:15:00', '2022-08-22 12:00:00', '2022-08-22 11:45:00', '2022-08-22 11:30:00', '2022-08-22 11:15:00', '2022-08-22 11:00:00', '2022-08-22 10:45:00', '2022-08-22 10:30:00', '2022-08-22 10:15:00', '2022-08-22 10:00:00', '2022-08-22 09:45:00', '2022-08-22 09:30:00', '2022-08-22 09:15:00', '2022-08-22 09:00:00', '2022-08-22 08:45:00', '2022-08-22 08:30:00', '2022-08-22 08:15:00', '2022-08-22 08:00:00', '2022-08-22 07:45:00', '2022-08-22 07:30:00', '2022-08-22 07:15:00', '2022-08-22 07:00:00', '2022-08-22 06:45:00', '2022-08-22 06:30:00', '2022-08-22 06:15:00', '2022-08-22 06:00:00', '2022-08-22 05:45:00', '2022-08-22 05:30:00', '2022-08-22 05:15:00', '2022-08-22 05:00:00', '2022-08-22 04:45:00', '2022-08-22 04:30:00', '2022-08-22 04:15:00', '2022-08-19 20:00:00', '2022-08-19 19:45:00', '2022-08-19 19:30:00', '2022-08-19 19:15:00', '2022-08-19 19:00:00', '2022-08-19 18:45:00', '2022-08-19 18:30:00', '2022-08-19 18:15:00', '2022-08-19 18:00:00', '2022-08-19 17:45:00', '2022-08-19 17:30:00', '2022-08-19 17:15:00', '2022-08-19 17:00:00', '2022-08-19 16:45:00', '2022-08-19 16:30:00', '2022-08-19 16:15:00', '2022-08-19 16:00:00', '2022-08-19 15:45:00', '2022-08-19 15:30:00', '2022-08-19 15:15:00', '2022-08-19 15:00:00', '2022-08-19 14:45:00', 
'2022-08-19 14:30:00', '2022-08-19 14:15:00', '2022-08-19 14:00:00', '2022-08-19 13:45:00', '2022-08-19 13:30:00', '2022-08-19 13:15:00', '2022-08-19 13:00:00', '2022-08-19 12:45:00', '2022-08-19 12:30:00', '2022-08-19 12:15:00', '2022-08-19 12:00:00', '2022-08-19 11:45:00', '2022-08-19 11:30:00', '2022-08-19 11:15:00'])
data['Time Series (15min)']['2022-08-22 20:00:00']
{'1. open': '167.8800',
'2. high': '167.9900',
'3. low': '167.8600',
'4. close': '167.9800',
'5. volume': '20165'}
import pandas as pd
pd.DataFrame(data['Time Series (15min)'])
| | 2022-08-22 20:00:00 | 2022-08-22 19:45:00 | 2022-08-22 19:30:00 | 2022-08-22 19:15:00 | 2022-08-22 19:00:00 | 2022-08-22 18:45:00 | 2022-08-22 18:30:00 | 2022-08-22 18:15:00 | 2022-08-22 18:00:00 | 2022-08-22 17:45:00 | ... | 2022-08-19 13:30:00 | 2022-08-19 13:15:00 | 2022-08-19 13:00:00 | 2022-08-19 12:45:00 | 2022-08-19 12:30:00 | 2022-08-19 12:15:00 | 2022-08-19 12:00:00 | 2022-08-19 11:45:00 | 2022-08-19 11:30:00 | 2022-08-19 11:15:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. open | 167.8800 | 167.8900 | 167.8500 | 167.7500 | 167.7600 | 167.6700 | 167.7000 | 167.7700 | 167.7300 | 167.7500 | ... | 172.4200 | 172.7400 | 172.6500 | 172.1600 | 172.1374 | 172.1600 | 172.1100 | 172.0100 | 171.8719 | 171.8350 |
| 2. high | 167.9900 | 167.9100 | 167.8800 | 167.8500 | 167.8100 | 167.7700 | 167.7000 | 167.7800 | 167.7692 | 167.7500 | ... | 172.4800 | 173.0300 | 172.9400 | 172.6700 | 172.3500 | 172.1800 | 172.4480 | 172.2500 | 172.1600 | 172.1600 |
| 3. low | 167.8600 | 167.8500 | 167.8200 | 167.7500 | 167.7500 | 167.6500 | 167.6000 | 167.6800 | 167.7300 | 167.5700 | ... | 172.0150 | 172.3000 | 172.5000 | 172.0600 | 172.0700 | 171.8600 | 172.0800 | 171.8600 | 171.7850 | 171.6400 |
| 4. close | 167.9800 | 167.8800 | 167.8600 | 167.8100 | 167.8100 | 167.7500 | 167.6700 | 167.6800 | 167.7692 | 167.7300 | ... | 172.0200 | 172.4250 | 172.7400 | 172.6600 | 172.1600 | 172.1312 | 172.1700 | 172.1100 | 172.0050 | 171.8750 |
| 5. volume | 20165 | 8805 | 7143 | 4649 | 6759 | 13170 | 5764 | 14911 | 7538 | 5932 | ... | 1242545 | 1804845 | 1834057 | 1452497 | 1160098 | 1378402 | 1701756 | 1714674 | 1805881 | 2337331 |
5 rows × 100 columns
time_series_data = pd.DataFrame(data['Time Series (15min)']).transpose()
time_series_data
| | 1. open | 2. high | 3. low | 4. close | 5. volume |
|---|---|---|---|---|---|
| 2022-08-22 20:00:00 | 167.8800 | 167.9900 | 167.8600 | 167.9800 | 20165 |
| 2022-08-22 19:45:00 | 167.8900 | 167.9100 | 167.8500 | 167.8800 | 8805 |
| 2022-08-22 19:30:00 | 167.8500 | 167.8800 | 167.8200 | 167.8600 | 7143 |
| 2022-08-22 19:15:00 | 167.7500 | 167.8500 | 167.7500 | 167.8100 | 4649 |
| 2022-08-22 19:00:00 | 167.7600 | 167.8100 | 167.7500 | 167.8100 | 6759 |
| ... | ... | ... | ... | ... | ... |
| 2022-08-19 12:15:00 | 172.1600 | 172.1800 | 171.8600 | 172.1312 | 1378402 |
| 2022-08-19 12:00:00 | 172.1100 | 172.4480 | 172.0800 | 172.1700 | 1701756 |
| 2022-08-19 11:45:00 | 172.0100 | 172.2500 | 171.8600 | 172.1100 | 1714674 |
| 2022-08-19 11:30:00 | 171.8719 | 172.1600 | 171.7850 | 172.0050 | 1805881 |
| 2022-08-19 11:15:00 | 171.8350 | 172.1600 | 171.6400 | 171.8750 | 2337331 |
100 rows × 5 columns
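Every value in the JSON response is a string (for example '167.8800'), and the rows come newest first. A minimal sketch of turning that into a numeric, time-ordered DataFrame; the small `sample` dict reuses three rows from the output above in place of the full response, and the shortened column names are our own choice.

```python
import pandas as pd

# Three rows copied from data['Time Series (15min)'] above
sample = {
    "2022-08-22 20:00:00": {"1. open": "167.8800", "2. high": "167.9900",
                            "3. low": "167.8600", "4. close": "167.9800",
                            "5. volume": "20165"},
    "2022-08-22 19:45:00": {"1. open": "167.8900", "2. high": "167.9100",
                            "3. low": "167.8500", "4. close": "167.8800",
                            "5. volume": "8805"},
    "2022-08-22 19:30:00": {"1. open": "167.8500", "2. high": "167.8800",
                            "3. low": "167.8200", "4. close": "167.8600",
                            "5. volume": "7143"},
}

# orient="index" makes the outer keys (timestamps) the rows, like .transpose()
df = pd.DataFrame.from_dict(sample, orient="index")

# Drop the "1. ", "2. " ... prefixes from the column names
df.columns = [c.split(". ", 1)[1] for c in df.columns]

# Convert the string values to numbers and the index to real timestamps
df = df.astype({"open": float, "high": float, "low": float,
                "close": float, "volume": int})
df.index = pd.to_datetime(df.index)
df = df.sort_index()  # oldest row first

print(df)
```

The same steps apply unchanged to the full data['Time Series (15min)'] dictionary.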
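Problem: write a function download_alpha(ticker, interval, filename) that downloads intraday data for the given ticker and interval from Alpha Vantage and saves it to filename.
def download_alpha(ticker, interval, filename):
    pass
download_alpha("AAPL", "15min", "apple_15min.csv")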
def download_alpha(ticker, interval, filename):
    alphavantageurl = "https://www.alphavantage.co/query"
    API_KEY = "UKVFE0JLE0TBPDEF"
    params = {"function":"TIME_SERIES_INTRADAY",
              "symbol":ticker,
              "apikey":API_KEY,
              "datatype": "csv",
              "interval":interval}
    resp = requests.get(alphavantageurl, params=params)
    if resp.status_code == 200:
        with open(filename, "w") as f:
            f.write(resp.text)
    else:
        raise Exception("Data download failed") # instead of printing, raise an exception
45 + "r"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 45 + "r"
TypeError: unsupported operand type(s) for +: 'int' and 'str'
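Raising an exception, rather than printing a message, lets the caller decide what to do when a download fails, in the same way Python raised TypeError above. A minimal sketch; check_status is a hypothetical helper, and status codes are passed in directly so that no network call is needed.

```python
def check_status(status_code):
    """Hypothetical helper: raise an exception instead of printing when a request fails."""
    if status_code != 200:
        raise Exception(f"Data download failed with status {status_code}")

results = []
for code in (200, 404):  # pretend status codes, no network call
    try:
        check_status(code)
        results.append((code, "ok"))
    except Exception as e:  # the caller decides how to handle the failure
        results.append((code, str(e)))
print(results)

# Built-in errors, like the TypeError above, are caught the same way
try:
    45 + "r"
except TypeError as e:
    type_error_message = str(e)
print(type_error_message)
```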
download_alpha("AAPL", "15min","apple_15min.csv")
!head apple_15min.csv
timestamp,open,high,low,close,volume
2022-08-22 20:00:00,167.8800,167.9900,167.8600,167.9800,20165
2022-08-22 19:45:00,167.8900,167.9100,167.8500,167.8800,8805
2022-08-22 19:30:00,167.8500,167.8800,167.8200,167.8600,7143
2022-08-22 19:15:00,167.7500,167.8500,167.7500,167.8100,4649
2022-08-22 19:00:00,167.7600,167.8100,167.7500,167.8100,6759
2022-08-22 18:45:00,167.6700,167.7700,167.6500,167.7500,13170
2022-08-22 18:30:00,167.7000,167.7000,167.6000,167.6700,5764
2022-08-22 18:15:00,167.7700,167.7800,167.6800,167.6800,14911
2022-08-22 18:00:00,167.7300,167.7692,167.7300,167.7692,7538
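Because the request asked for datatype csv, the saved file is plain CSV with a header row, so it can be read back with the standard csv module (or with pd.read_csv). A minimal sketch; to keep it runnable offline, the first three data rows from the head output above are held in a string instead of reading apple_15min.csv.

```python
import csv
import io

# First lines of apple_15min.csv, copied from the head output above
text = """timestamp,open,high,low,close,volume
2022-08-22 20:00:00,167.8800,167.9900,167.8600,167.9800,20165
2022-08-22 19:45:00,167.8900,167.9100,167.8500,167.8800,8805
2022-08-22 19:30:00,167.8500,167.8800,167.8200,167.8600,7143
"""

# DictReader uses the header row as keys; every value is still a string
rows = list(csv.DictReader(io.StringIO(text)))
closes = [float(r["close"]) for r in rows]
total_volume = sum(int(r["volume"]) for r in rows)
print(rows[0]["timestamp"], closes, total_volume)
```

With the real file, replace io.StringIO(text) with open("apple_15min.csv", newline="") inside a with block.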
%%file download_data.py
import requests
import typer
def download_alpha(ticker:str, interval:str, filename:str):
    alphavantageurl = "https://www.alphavantage.co/query"
    API_KEY = "UKVFE0JLE0TBPDEF"
    params = {"function":"TIME_SERIES_INTRADAY",
              "symbol":ticker,
              "apikey":API_KEY,
              "datatype": "csv",
              "interval":interval}
    resp = requests.get(alphavantageurl, params=params)
    if resp.status_code == 200:
        with open(filename, "w") as f:
            f.write(resp.text)
    else:
        raise Exception("Data download failed") # instead of printing, raise an exception
if __name__ == "__main__":
    typer.run(download_alpha)
Writing download_data.py
!python download_data.py --help
/home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (5.0.0)/charset_normalizer (2.0.12) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
Usage: download_data.py [OPTIONS] TICKER INTERVAL FILENAME
Arguments:
  TICKER    [required]
  INTERVAL  [required]
  FILENAME  [required]
Options:
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.
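typer.run builds the command-line interface from the function signature: each parameter without a default becomes a required positional argument, which is why TICKER, INTERVAL and FILENAME appear in the help. For comparison, a sketch of a similar interface with the standard library's argparse (a swapped-in technique, not what these notes use); the download_alpha call is left commented out so the sketch runs without network access.

```python
import argparse

def build_parser():
    # One positional argument per parameter of download_alpha
    parser = argparse.ArgumentParser(description="Download intraday data from Alpha Vantage")
    parser.add_argument("ticker", help="stock symbol, e.g. IBM")
    parser.add_argument("interval", help="bar interval, e.g. 5min")
    parser.add_argument("filename", help="output CSV file")
    return parser

# In a script, parse_args() with no arguments reads sys.argv;
# an explicit list is passed here so the example runs on its own
args = build_parser().parse_args(["IBM", "5min", "ibm_5min.csv"])
print(args.ticker, args.interval, args.filename)
# download_alpha(args.ticker, args.interval, args.filename)
```

typer saves writing the add_argument calls by reading the type hints and parameter names directly.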
!python download_data.py "IBM" "5min" "ibm_5min.csv"
/home/vikrant/usr/local/jupyter-py3.10/lib/python3.10/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (5.0.0)/charset_normalizer (2.0.12) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
!head ibm_5min.csv
timestamp,open,high,low,close,volume
2022-08-22 20:00:00,135.7200,135.7200,135.7200,135.7200,159
2022-08-22 19:55:00,135.7300,135.7300,135.7300,135.7300,100
2022-08-22 18:15:00,135.9500,136.0500,135.9400,136.0500,925
2022-08-22 17:55:00,135.8600,135.9000,135.8600,135.9000,1101
2022-08-22 17:35:00,135.8000,135.8000,135.8000,135.8000,500
2022-08-22 16:40:00,135.5570,135.5570,135.5500,135.5500,662
2022-08-22 16:30:00,135.5500,135.5500,135.5500,135.5500,245
2022-08-22 16:25:00,135.7500,135.7500,135.7500,135.7500,101
2022-08-22 16:20:00,135.5500,135.5500,135.5500,135.5500,11368
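def download_big(url, filename, chunksize=1024):
    # stream=True defers downloading the body, so iter_content fetches it
    # piece by piece instead of loading the whole file into memory first
    resp = requests.get(url, stream=True)
    resp.raise_for_status()  # raise an exception on 4xx/5xx responses
    with open(filename, "wb") as f:
        for chunk in resp.iter_content(chunk_size=chunksize):
            f.write(chunk)
            print(".", end="")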
excelurl = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.xlsx"
download_big(excelurl, "excel_data.xlsx")
...........
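The loop in download_big writes the body one chunk at a time, so a large file never has to be held in memory all at once (each dot above is one chunk). The same idea can be sketched offline; copy_in_chunks is a hypothetical helper, and io.BytesIO stands in for both the HTTP response and the output file.

```python
import io

def copy_in_chunks(src, dst, chunksize=1024):
    """Copy src to dst chunksize bytes at a time; return the number of chunks written."""
    n_chunks = 0
    while True:
        chunk = src.read(chunksize)
        if not chunk:  # empty bytes means end of stream
            break
        dst.write(chunk)
        n_chunks += 1
    return n_chunks

source = io.BytesIO(b"x" * 2500)  # pretend this is a 2500-byte download
target = io.BytesIO()
n = copy_in_chunks(source, target, chunksize=1024)
print(n, len(target.getvalue()))  # 3 chunks: 1024 + 1024 + 452 bytes
```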
user = "vikipedia"
pass_ = open("/tmp/pass.txt").read().strip()
resp = requests.get("https://api.github.com/user", auth=(user, pass_)) # simple user/password (HTTP Basic) authentication
resp.status_code
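401
401 means Unauthorized. GitHub's REST API stopped accepting account passwords for basic authentication in 2020; a personal access token has to be used in place of the password.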
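With auth=(user, password), requests uses HTTP Basic authentication: it sends an Authorization header containing "Basic " followed by the base64 encoding of "user:password". A sketch of how that header value is formed, with made-up credentials; base64 is an encoding, not encryption, which is why such requests should go over HTTPS.

```python
import base64

def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

# Made-up credentials, for illustration only
header = basic_auth_header("alice", "secret")
print(header)  # Basic YWxpY2U6c2VjcmV0

# Anyone who sees the header can decode it back
print(base64.b64decode(header.split(" ", 1)[1]).decode())  # alice:secret
```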
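For your case (Kerberos authentication), use the requests-kerberos package:
pip install requests requests-kerberos
from requests_kerberos import HTTPKerberosAuth, OPTIONAL
kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
response = requests.get(request_url, auth=kerberos_auth, params=params) # request_url, params: your service's URL and query parameters
response.json()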
url = "http://www.thehindu.com"
resp = requests.get(url, params={"service":"rss"})
resp.status_code
200
xmltext = resp.text
print(xmltext[:1000])
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title> The Hindu - Home </title>
<link> https://www.thehindu.com/ </link>
<description> RSS Feed </description>
<language>en-us</language>
<copyright>Copyright 2022 The Hindu</copyright>
<item>
<title>
<![CDATA[Rupee falls 4 paise to 79.88 against U.S. dollar in early trade ]]>
</title>
<author>
<![CDATA[PTI ]]>
</author>
<category>
<![CDATA[Markets]]>
</category>
<link>https://www.thehindu.com/business/markets/rupee-falls-4-paise-to-7988-against-us-dollar-in-early-trade/article65800436.ece
</link>
<description>
<![CDATA[The rupee opened at 79.85 against the dollar, then fell to 79.88, registering a decline of 4 paise over the last close]]>
</description>
<pubDate>
<![CDATA[Tue, 23 Aug 2022 11:25:15 +0530]]>
</pubDate>
</item>
<item>
<title>
<![CDATA[BJP MLA T. Raja Singh arrested by Hyd
from xml.etree import ElementTree as et
root = et.fromstring(xmltext)
items = root.findall(".//item")
print(et.tostring(items[0]).decode())
<item>
<title>
Rupee falls 4 paise to 79.88 against U.S. dollar in early trade
</title>
<author>
PTI
</author>
<category>
Markets
</category>
<link>https://www.thehindu.com/business/markets/rupee-falls-4-paise-to-7988-against-us-dollar-in-early-trade/article65800436.ece
</link>
<description>
The rupee opened at 79.85 against the dollar, then fell to 79.88, registering a decline of 4 paise over the last close
</description>
<pubDate>
Tue, 23 Aug 2022 11:25:15 +0530
</pubDate>
</item>
for item in items[:5]:
    print(item.findtext("title").strip())
    print(item.findtext("link").strip())
    print(item.findtext("author").strip())
    print("="*25)
Rupee falls 4 paise to 79.88 against U.S. dollar in early trade
https://www.thehindu.com/business/markets/rupee-falls-4-paise-to-7988-against-us-dollar-in-early-trade/article65800436.ece
PTI
=========================
BJP MLA T. Raja Singh arrested by Hyderabad police
https://www.thehindu.com/news/national/telangana/bjp-mla-t-raja-singh-arrested-by-hyderabad-police/article65800422.ece
B.Pradeep
=========================
BSF recovers cache of arms near Indo-Pak border in Punjab
https://www.thehindu.com/news/national/bsf-recovers-cache-of-arms-near-indo-pak-border-in-punjab/article65800420.ece
PTI
=========================
Top news developments in Karnataka on August 23, 2022
https://www.thehindu.com/news/national/karnataka/top-news-developments-in-karnataka-on-august-23-2022/article65800284.ece
Karnataka Bureau
=========================
Google Doodle pays tribute to Indian physicist and meteorologist Anna Mani
https://www.thehindu.com/sci-tech/science/google-doodle-pays-tribute-to-indian-physicist-and-meteorologist-anna-mani/article65800385.ece
The Hindu Bureau
=========================
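Each item element in the feed has children such as title, link and author, and findtext returns the text of the first matching child (CDATA content included). A self-contained sketch that parses a trimmed-down feed string, with one item copied from the output above, instead of downloading it; parse_items is a hypothetical helper and returning dictionaries is our own choice.

```python
from xml.etree import ElementTree as et

# A trimmed RSS document with one <item>, based on the feed output above
xmltext = """<rss version="2.0">
<channel>
<title> The Hindu - Home </title>
<item>
<title><![CDATA[Rupee falls 4 paise to 79.88 against U.S. dollar in early trade ]]></title>
<author><![CDATA[PTI ]]></author>
<link>https://www.thehindu.com/business/markets/rupee-falls-4-paise-to-7988-against-us-dollar-in-early-trade/article65800436.ece</link>
</item>
</channel>
</rss>"""

def parse_items(xml_string):
    """Return a list of dicts with the title, link and author of each <item>."""
    root = et.fromstring(xml_string)
    items = []
    for item in root.findall(".//item"):
        # default="" avoids calling .strip() on None when a child is missing
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "author": item.findtext("author", default="").strip(),
        })
    return items

for entry in parse_items(xmltext):
    print(entry["title"], "|", entry["author"])
```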
resp = requests.post("https://httpbin.org/post", data={"input1":"x","input2":"y"})
resp.json()
{'args': {},
'data': '',
'files': {},
'form': {'input1': 'x', 'input2': 'y'},
'headers': {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate',
'Content-Length': '17',
'Content-Type': 'application/x-www-form-urlencoded',
'Host': 'httpbin.org',
'User-Agent': 'python-requests/2.27.1',
'X-Amzn-Trace-Id': 'Root=1-630472f3-07d1a0306fd233143683ee3e'},
'json': None,
'origin': '152.57.196.198',
'url': 'https://httpbin.org/post'}
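For a POST with data= given as a dict, the fields travel in the request body, form-encoded like a query string; that is why httpbin reports them under form, with Content-Type application/x-www-form-urlencoded and Content-Length 17. A standard-library sketch that builds (but does not send) an equivalent request with urllib.request.Request.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {"input1": "x", "input2": "y"}
body = urlencode(form).encode("ascii")  # b'input1=x&input2=y'
print(body, len(body))  # 17 bytes, matching Content-Length above

# Supplying data= makes urllib treat this as a POST; nothing is sent here
req = Request("https://httpbin.org/post", data=body,
              headers={"Content-Type": "application/x-www-form-urlencoded"})
print(req.get_method())  # POST
```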
resp = requests.get("https://httpbin.org/get", params={"param1":"x","param2":"y"})
resp.json()
{'args': {'param1': 'x', 'param2': 'y'},
'headers': {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate',
'Host': 'httpbin.org',
'User-Agent': 'python-requests/2.27.1',
'X-Amzn-Trace-Id': 'Root=1-63047348-688d409a2f7d0c9a4afedc60'},
'origin': '152.57.196.198',
'url': 'https://httpbin.org/get?param1=x&param2=y'}
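For GET, the same key/value pairs go into the URL instead of the body, and httpbin reports the parsed query string under args. A sketch of that parsing step with urllib.parse; parse_qs returns a list per key because a parameter may repeat, so single values are unpacked here.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://httpbin.org/get?param1=x&param2=y"
query = urlsplit(url).query          # 'param1=x&param2=y'
parsed = parse_qs(query)             # {'param1': ['x'], 'param2': ['y']}
args = {k: v[0] for k, v in parsed.items()}
print(args)                          # {'param1': 'x', 'param2': 'y'}
```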
Selenium lets you launch a browser from a Python program and interact with a website as a user would, for example typing into fields and clicking.
pip install selenium
More detailed documentation of Selenium
%%file search_python_docs.py
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()  # opens a Firefox window (Firefox must be installed)
driver.get("http://www.python.org")
elem = driver.find_element(By.NAME, "q")  # the input field whose name attribute is "q"
elem.clear()
elem.send_keys("python docs")
elem.send_keys(Keys.RETURN)
driver.close()
Writing search_python_docs.py
import re
pattern = re.compile(r"^$") # ^ means start of line/text, $ means end of line/text
linewith_single_char = re.compile(r"^.$") # . means any char!
datelikepattern = re.compile(r"^\d{4,4}-\d{2,2}-\d{2,2}") # \d means digit, {m,n} means at least m and at most n repetitions
lines = """line1
1
2
some dsjhdkjs kdjhfkds
2021-10-25
sadk ksjdksa
kjs"""
for line in lines.split("\n"):
    if pattern.match(line):
        print("found empty line")
found empty line
found empty line
for line in lines.split("\n"):
    if datelikepattern.match(line):
        print(line)
2021-10-25
for line in lines.split("\n"):
    if linewith_single_char.match(line):
        print(line)
1
2
lines = """line1
1
2
some dsjhdkjs kdjhfkds
2021-10-25
sadk ksjdksa
kjs,
Total = (2323.45)
"""
search_total = re.compile(r"Total += *\(\d+\.?\d*\)")
searchitem = None
for line in lines.split("\n"):
    if search_total.match(line):
        searchitem = line
searchitem.split("=")[1].strip().replace("(","").replace(")","")
'2323.45'
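Instead of splitting on = and stripping the brackets by hand, a capture group (...) can pull the number out in one step. A sketch of that alternative; the pattern is our own and assumes the total always appears as Total = (digits.digits) at the start of a line.

```python
import re

text = """kjs,
Total = (2323.45)
"""

# \s* allows optional spaces; \( and \) are literal brackets, while the
# unescaped group (...) captures the number; re.MULTILINE lets ^ match
# at the start of every line
total_pattern = re.compile(r"^Total\s*=\s*\((\d+(?:\.\d+)?)\)", re.MULTILINE)
m = total_pattern.search(text)
if m:
    total = float(m.group(1))
    print(m.group(1), total)  # 2323.45 2323.45
```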
Problem: print every line of wallet.csv whose timestamp falls between 2 pm and 3 pm.
!cat wallet.csv
,date,category,description,debit
0,2021-03-07 14:53:28.377359,Music,Amazon,421.2073272347991
1,2020-10-08 09:53:28.377359,Food,Swiggy,328.4400802428426
2,2021-02-23 09:53:28.377359,Books,Amazon,244.67943701511354
3,2020-11-01 14:53:28.377359,Utility,Phone,222.7563175805277
4,2021-06-05 13:53:28.377359,Books,Flipcart,494.1284923793595
5,2021-07-28 19:53:28.377359,Utility,Electricity,219.94171130968408
6,2021-04-16 11:53:28.377359,Books,Amazon Kindle,270.32259514795845
7,2021-02-15 10:53:28.377359,Food,Zomato,457.1831036346536
8,2021-08-10 19:53:28.377359,Utility,Phone,151.49637259947792
9,2020-11-29 14:53:28.377359,Travel,Auto,443.61888423247854
10,2021-06-15 13:53:28.377359,Travel,Metro,328.1754210974373
11,2021-07-24 13:53:28.377359,Food,Zomato,434.4954675355444
12,2021-07-24 14:53:28.377359,Music,Amazon,329.5360031897569
13,2021-06-06 10:53:28.377359,Utility,Phone,154.0449491816659
14,2021-06-09 13:53:28.377359,Travel,Taxi,485.2977429821982
15,2021-08-24 17:53:28.377359,Food,Zomato,262.9439932340398
16,2021-03-05 19:53:28.377359,Utility,Phone,390.31687619327926
17,2021-04-17 18:53:28.377359,Utility,Electricity,316.8786754246636
18,2021-05-08 15:53:28.377359,Travel,Auto,433.82240427779357
19,2021-05-16 10:53:28.377359,Books,Flipcart,109.32590886550067
20,2020-10-12 18:53:28.377359,Travel,Auto,365.92180825376613
21,2021-01-04 19:53:28.377359,Travel,Metro,329.09737150258513
22,2021-06-24 15:53:28.377359,Food,Zomato,489.1434830522253
23,2020-12-11 10:53:28.377359,Music,Netflix,354.94024099198157
24,2021-05-31 11:53:28.377359,Books,Amazon,498.10049550461065
25,2021-05-21 14:53:28.377359,Food,Hotel,483.315863517772
26,2020-08-26 15:53:28.377359,Books,Amazon Kindle,138.806577801854
27,2021-05-01 15:53:28.377359,Utility,Electricity,103.68079074846585
28,2020-12-14 15:53:28.377359,Utility,Phone,358.4599327957656
29,2021-06-20 10:53:28.377359,Utility,Electricity,184.5577284049955
30,2020-09-15 18:53:28.377359,Food,Swiggy,203.5292397894327
31,2020-09-25 11:53:28.377359,Books,Flipcart,246.50352738452796
32,2021-06-23 11:53:28.377359,Food,Zomato,345.03043608141513
33,2021-05-14 18:53:28.377359,Food,Hotel,449.24802955761743
34,2021-05-14 10:53:28.377359,Utility,Phone,499.8581815222449
35,2021-02-18 18:53:28.377359,Travel,Metro,441.6021430011205
36,2020-12-10 10:53:28.377359,Travel,Auto,472.94143917262176
37,2021-04-18 16:53:28.377359,Music,Amazon,266.0690783774673
38,2021-08-15 10:53:28.377359,Travel,Auto,494.1243994056571
39,2021-05-17 17:53:28.377359,Food,Swiggy,112.33316019807455
40,2021-07-19 12:53:28.377359,Food,Swiggy,291.54598801930536
41,2021-02-20 19:53:28.377359,Utility,Phone,425.18719068071806
42,2021-08-22 17:53:28.377359,Food,Hotel,210.25626950078572
43,2020-09-21 12:53:28.377359,Utility,Phone,486.03393276160733
44,2020-12-26 19:53:28.377359,Utility,Electricity,257.92759337085425
45,2021-05-27 16:53:28.377359,Utility,Electricity,154.74287259516655
46,2021-05-15 15:53:28.377359,Utility,Electricity,359.3249716537848
47,2020-10-28 10:53:28.377359,Books,Flipcart,310.408610004679
48,2021-08-23 17:53:28.377359,Utility,Electricity,310.05840961423314
49,2021-03-16 09:53:28.377359,Music,spotify,232.30340219121138
50,2020-12-24 11:53:28.377359,Food,Zomato,463.00187492635547
51,2020-12-22 17:53:28.377359,Food,Zomato,331.22702332837093
52,2021-03-26 09:53:28.377359,Travel,Taxi,403.6100701341934
53,2021-01-27 09:53:28.377359,Utility,Electricity,183.1866624101276
54,2020-11-16 10:53:28.377359,Music,spotify,160.81754340768396
55,2021-01-21 19:53:28.377359,Books,Flipcart,423.74970808720553
56,2021-05-19 18:53:28.377359,Utility,Phone,319.3428762684619
57,2021-07-15 15:53:28.377359,Utility,Phone,279.6090437716363
58,2021-05-20 10:53:28.377359,Food,Hotel,255.8710346734312
59,2020-08-28 11:53:28.377359,Food,Swiggy,208.2329120852039
60,2021-01-17 11:53:28.377359,Utility,Electricity,382.5195101154448
61,2021-02-25 13:53:28.377359,Food,Hotel,124.65827844174062
62,2021-01-27 19:53:28.377359,Books,Amazon Kindle,497.7708601564023
63,2021-05-10 11:53:28.377359,Travel,Taxi,355.9890502253258
64,2021-01-31 14:53:28.377359,Food,Zomato,232.2223798622789
65,2020-10-23 18:53:28.377359,Music,Netflix,188.7487426895118
66,2020-10-09 16:53:28.377359,Food,Swiggy,263.9577700340145
67,2021-07-31 14:53:28.377359,Music,Netflix,324.786916846731
68,2020-08-26 09:53:28.377359,Travel,Taxi,279.1478844739421
69,2020-10-10 15:53:28.377359,Utility,Electricity,300.52462041935115
70,2021-08-17 13:53:28.377359,Utility,Phone,125.22977317126336
71,2021-03-30 12:53:28.377359,Food,Swiggy,245.36050838040904
72,2021-06-30 18:53:28.377359,Books,Amazon,294.66286899004876
73,2021-08-15 17:53:28.377359,Travel,Metro,117.58872931045573
74,2021-03-20 11:53:28.377359,Travel,Taxi,303.05542098520453
75,2021-03-03 12:53:28.377359,Food,Hotel,425.6252909948148
76,2020-11-17 09:53:28.377359,Music,Netflix,197.5346000167895
77,2021-01-18 14:53:28.377359,Books,Amazon Kindle,482.1523430204321
78,2020-09-09 16:53:28.377359,Music,spotify,415.3728938035302
79,2021-08-17 09:53:28.377359,Music,Netflix,321.7634156544651
80,2021-02-17 09:53:28.377359,Food,Swiggy,283.09570727160764
81,2020-10-29 16:53:28.377359,Food,Hotel,470.08099539923614
82,2020-09-22 09:53:28.377359,Music,spotify,411.14270120842224
83,2021-03-18 09:53:28.377359,Books,Flipcart,451.5844070294999
84,2020-09-21 10:53:28.377359,Music,Netflix,158.7936457269333
85,2021-01-12 09:53:28.377359,Music,Amazon,130.37490757527
86,2021-05-07 16:53:28.377359,Food,Zomato,198.450671792638
87,2021-05-19 15:53:28.377359,Food,Zomato,378.82064134052473
88,2021-04-18 09:53:28.377359,Utility,Phone,124.2212478444578
89,2021-04-12 14:53:28.377359,Music,Amazon,218.487173429263
90,2020-12-01 14:53:28.377359,Music,Amazon,101.57327588889417
91,2021-01-22 17:53:28.377359,Food,Hotel,232.66346838787223
92,2021-01-12 19:53:28.377359,Travel,Taxi,356.8426379886326
93,2021-01-11 09:53:28.377359,Utility,Electricity,111.72080867898062
94,2021-01-04 13:53:28.377359,Utility,Phone,431.1855366816298
95,2021-07-19 13:53:28.377359,Utility,Phone,388.6712132388421
96,2021-01-12 19:53:28.377359,Books,Flipcart,467.5545618966052
97,2021-03-25 11:53:28.377359,Utility,Phone,320.78943360123816
98,2021-05-13 15:53:28.377359,Travel,Taxi,442.0964693975505
99,2020-10-11 16:53:28.377359,Food,Hotel,100.45550129902665
csvurl = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv"
download_big(csvurl, "wallet.csv")
.......
def lines_between2_3_pm(filename):
    datepattern = re.compile(r"\d{1,2},\d{4,4}-\d{2,2}-\d{2,2} 14:.+")  # hour 14 covers 2:00 pm to 2:59 pm
    with open(filename) as f:
        for line in f:
            if datepattern.match(line.strip()):
                print(line, end="")
lines_between2_3_pm("wallet.csv")
0,2021-03-07 14:53:28.377359,Music,Amazon,421.2073272347991
3,2020-11-01 14:53:28.377359,Utility,Phone,222.7563175805277
9,2020-11-29 14:53:28.377359,Travel,Auto,443.61888423247854
12,2021-07-24 14:53:28.377359,Music,Amazon,329.5360031897569
25,2021-05-21 14:53:28.377359,Food,Hotel,483.315863517772
64,2021-01-31 14:53:28.377359,Food,Zomato,232.2223798622789
67,2021-07-31 14:53:28.377359,Music,Netflix,324.786916846731
77,2021-01-18 14:53:28.377359,Books,Amazon Kindle,482.1523430204321
89,2021-04-12 14:53:28.377359,Music,Amazon,218.487173429263
90,2020-12-01 14:53:28.377359,Music,Amazon,101.57327588889417
l = "0,2021-03-07 14:53:28.377359,Music,Amazon,421.2073272347991"
datepattern = re.compile(r"\d{1,2},\d{4,4}-\d{2,2}-\d{2,2} 14:.+")
datepattern.match(l)
<re.Match object; span=(0, 59), match='0,2021-03-07 14:53:28.377359,Music,Amazon,421.207>
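A regular expression works here, but it depends on the exact layout of each line; for example, the \d{1,2} part assumes the row index has at most two digits. An alternative sketch (not the method used above) parses the timestamp with the csv and datetime modules and checks the hour; a few rows copied from wallet.csv stand in for the file, and rows_between_2_and_3_pm is a hypothetical name.

```python
import csv
import io
from datetime import datetime

# A few rows copied from wallet.csv above
text = """,date,category,description,debit
0,2021-03-07 14:53:28.377359,Music,Amazon,421.2073272347991
1,2020-10-08 09:53:28.377359,Food,Swiggy,328.4400802428426
3,2020-11-01 14:53:28.377359,Utility,Phone,222.7563175805277
"""

def rows_between_2_and_3_pm(f):
    """Yield rows whose timestamp falls in the 14:00-14:59 hour."""
    for row in csv.DictReader(f):
        ts = datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S.%f")
        if ts.hour == 14:
            yield row

# The first header field is empty, so the row index is stored under the key ""
for row in rows_between_2_and_3_pm(io.StringIO(text)):
    print(row[""], row["date"], row["description"])
```

With the real file, pass open("wallet.csv", newline="") instead of io.StringIO(text).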
More help on regular expressions can be found here
r"\d{4,5}" # 4 or 5 consecutive digits
'\\d{4,5}'
r"3\d{3,3}" # any four-digit number that starts with 3
'3\\d{3,3}'
r"\d\d\d\d" # 4 digits!
'\\d\\d\\d\\d'
^ - start of line/text
$ - end of line/text
\d - a digit
\s - a whitespace character
. - any character except newline
+ - 1 or more of whatever precedes it
? - 0 or 1 of whatever precedes it
* - 0 or more occurrences of whatever precedes it
{m,n} - at least m and at most n occurrences of whatever precedes it
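A few quick checks of the symbols listed above, using re.fullmatch so that the whole string must match the pattern; the example strings are made up.

```python
import re

# (pattern, text, whether the whole text should match)
checks = [
    (r"\d{4,5}", "2021", True),    # 4 digits
    (r"\d{4,5}", "123", False),    # too few digits
    (r"3\d{3}", "3141", True),     # \d{3} is shorthand for \d{3,3}
    (r"3\d{3}", "4141", False),    # does not start with 3
    (r"colou?r", "color", True),   # ? : zero or one 'u'
    (r"ab+c", "ac", False),        # + : at least one 'b'
    (r"ab*c", "ac", True),         # * : zero or more 'b'
    (r"a\sb", "a b", True),        # \s : one whitespace character
    (r"a.c", "abc", True),         # . : any single character
]

for pattern, text, expected in checks:
    result = re.fullmatch(pattern, text) is not None
    print(f"{pattern!r:12} {text!r:8} -> {result}")
    assert result == expected
```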