Mar 13-17, 2023 Vikrant Patil
All notes are available online at https://notes.pipal.in/2023/arcesium_finop_jan/
Please login to https://engage.pipal.in/ and launch jupyter lab
For today create a notebook with name module3-day2
Notebook names are case-sensitive, so make sure you give the correct name.
© Pipal Academy LLP
HTTP protocol
import requests
!pip install requests
Requirement already satisfied: requests in /home/vikrant/usr/local/default/lib/python3.10/site-packages (2.28.1)
Requirement already satisfied: idna<4,>=2.5 in /home/vikrant/usr/local/default/lib/python3.10/site-packages (from requests) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in /home/vikrant/usr/local/default/lib/python3.10/site-packages (from requests) (2022.9.24)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/vikrant/usr/local/default/lib/python3.10/site-packages (from requests) (1.26.13)
Requirement already satisfied: charset-normalizer<3,>=2 in /home/vikrant/usr/local/default/lib/python3.10/site-packages (from requests) (2.1.1)
[notice] A new release of pip available: 22.3.1 -> 23.0.1
[notice] To update, run: pip install --upgrade pip
url = "https://www.python.org/"
response = requests.get(url)
response # the number shown here is the HTTP response status code
<Response [200]>
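The number in the response above is the HTTP status code. As a minimal sketch, the standard library's http.HTTPStatus can translate a code into its standard phrase. The helper names describe_status and is_success are illustrative and not part of requests; on a real response the code is available as response.status_code.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short label such as '200 OK' for a known status code."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    return f"{status.value} {status.phrase}"

def is_success(code):
    """Codes in the 2xx range mean the request succeeded."""
    return 200 <= code < 300

for code in (200, 401, 404):
    outcome = "success" if is_success(code) else "failure"
    print(describe_status(code), "->", outcome)
```

Checking the status code before reading response.text or calling response.json() avoids processing an error page as if it were data.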
print(response.text[:600]) # We access this if data is text/html
<!doctype html>
<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr"> <!--<![endif]-->
<head>
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-TF35YF9CVH"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js'
with open("python.org.html", "w") as f:
    f.write(response.text)
url = "https://www.python.org/search/"
params = {"q": "pandas"}
r = requests.get(url, params=params)
r
<Response [200]>
with open("python.org-search.html", "w") as f:
    f.write(r.text)
We will download stock data from alphavantage.co
url = 'https://www.alphavantage.co/query'
API_KEY = "UKVFE0JLE0TBPDEF"
params = {"function":"TIME_SERIES_INTRADAY",
"symbol":"IBM",
"interval":"5min",
"apikey":API_KEY}
r = requests.get(url, params=params)
data = r.json() # this will return data in python dictionaries/lists/basic datatypes
r
<Response [200]>
type(data)
dict
data.keys()
dict_keys(['Meta Data', 'Time Series (5min)'])
data['Meta Data']
{'1. Information': 'Intraday (5min) open, high, low, close prices and volume',
'2. Symbol': 'IBM',
'3. Last Refreshed': '2023-03-13 20:00:00',
'4. Interval': '5min',
'5. Output Size': 'Compact',
'6. Time Zone': 'US/Eastern'}
len(data['Time Series (5min)'])
100
import pandas as pd
import json
pd.read_json(json.dumps(data['Time Series (5min)'])) # this expects json string
| | 2023-03-13 20:00:00 | 2023-03-13 19:30:00 | 2023-03-13 18:55:00 | 2023-03-13 18:50:00 | 2023-03-13 18:10:00 | 2023-03-13 17:30:00 | 2023-03-13 17:00:00 | 2023-03-13 16:25:00 | 2023-03-13 16:15:00 | 2023-03-13 16:05:00 | ... | 2023-03-13 09:10:00 | 2023-03-13 09:05:00 | 2023-03-13 09:00:00 | 2023-03-13 08:55:00 | 2023-03-13 08:50:00 | 2023-03-13 08:20:00 | 2023-03-13 08:15:00 | 2023-03-13 08:10:00 | 2023-03-13 08:05:00 | 2023-03-13 07:15:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. open | 125.7101 | 125.71 | 125.68 | 125.62 | 125.58 | 126.1694 | 125.61 | 125.58 | 125.23 | 125.58 | ... | 125 | 125.1 | 125.2 | 124.7971 | 125 | 125 | 125.00 | 125.4 | 125.71 | 125.71 |
| 2. high | 125.7101 | 125.71 | 125.68 | 125.62 | 125.58 | 126.1694 | 125.61 | 125.58 | 125.23 | 125.58 | ... | 125 | 125.1 | 125.2 | 124.7971 | 125 | 125 | 125.01 | 125.4 | 125.71 | 125.71 |
| 3. low | 125.7101 | 125.71 | 125.68 | 125.62 | 125.58 | 126.1694 | 125.61 | 125.58 | 125.23 | 125.58 | ... | 125 | 125.1 | 125.2 | 124.7971 | 125 | 125 | 125.00 | 125.2 | 125.50 | 125.71 |
| 4. close | 125.7101 | 125.71 | 125.68 | 125.62 | 125.58 | 126.1694 | 125.61 | 125.58 | 125.23 | 125.58 | ... | 125 | 125.1 | 125.2 | 124.7971 | 125 | 125 | 125.00 | 125.2 | 125.50 | 125.71 |
| 5. volume | 385.0000 | 100.00 | 101.00 | 352.00 | 1444.00 | 365.0000 | 496.00 | 2653.00 | 210.00 | 111201.00 | ... | 201 | 160.0 | 120.0 | 100.0000 | 1693 | 501 | 731.00 | 755.0 | 865.00 | 335.00 |
5 rows × 100 columns
JSON is a plain-text data format that supports a few basic data types (numbers, strings, booleans, null, lists and dictionaries).
json.dumps([1, 2, 3, 4]) # dump python data as json string
'[1, 2, 3, 4]'
json.dumps(params)
'{"function": "TIME_SERIES_INTRADAY", "symbol": "IBM", "interval": "5min", "apikey": "UKVFE0JLE0TBPDEF"}'
xjson = input("Please input a list of integers: ")
xjson
'[1, 2, 3, 4, 5]'
json.loads(xjson)
[1, 2, 3, 4, 5]
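The round trip between Python data and JSON text is not always exact, because JSON has fewer types than Python. The sketch below uses a made-up record (the field names are illustrative) to show what json.dumps and json.loads do with a tuple, True and None.

```python
import json

# A made-up record mixing several Python types.
record = {"symbol": "IBM", "prices": (125.71, 125.68), "active": True, "note": None}

text = json.dumps(record)    # Python data -> JSON string
print(text)

restored = json.loads(text)  # JSON string -> Python data
print(restored)

# The tuple came back as a list, so the round trip is not an exact copy.
print(restored == record)
```

In the JSON text, True becomes true and None becomes null; both are restored on loading, while the tuple comes back as a list.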
df = pd.read_json(json.dumps(data['Time Series (5min)']))
daily_IBM = df.transpose()
daily_IBM
| | 1. open | 2. high | 3. low | 4. close | 5. volume |
|---|---|---|---|---|---|
| 2023-03-13 20:00:00 | 125.7101 | 125.7101 | 125.7101 | 125.7101 | 385.0 |
| 2023-03-13 19:30:00 | 125.7100 | 125.7100 | 125.7100 | 125.7100 | 100.0 |
| 2023-03-13 18:55:00 | 125.6800 | 125.6800 | 125.6800 | 125.6800 | 101.0 |
| 2023-03-13 18:50:00 | 125.6200 | 125.6200 | 125.6200 | 125.6200 | 352.0 |
| 2023-03-13 18:10:00 | 125.5800 | 125.5800 | 125.5800 | 125.5800 | 1444.0 |
| ... | ... | ... | ... | ... | ... |
| 2023-03-13 08:20:00 | 125.0000 | 125.0000 | 125.0000 | 125.0000 | 501.0 |
| 2023-03-13 08:15:00 | 125.0000 | 125.0100 | 125.0000 | 125.0000 | 731.0 |
| 2023-03-13 08:10:00 | 125.4000 | 125.4000 | 125.2000 | 125.2000 | 755.0 |
| 2023-03-13 08:05:00 | 125.7100 | 125.7100 | 125.5000 | 125.5000 | 865.0 |
| 2023-03-13 07:15:00 | 125.7100 | 125.7100 | 125.7100 | 125.7100 | 335.0 |
100 rows × 5 columns
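To make the shape of the time-series data explicit, here is a plain-Python sketch of the same reshaping: one row per timestamp, with numeric values. It assumes the raw values arrive as strings. The sample dictionary is a hand-made excerpt using two rows from the table above, and to_rows is an illustrative helper name.

```python
# Hand-made excerpt shaped like data['Time Series (5min)'];
# the values are assumed to be strings in the raw JSON.
sample = {
    "2023-03-13 08:10:00": {"1. open": "125.4", "2. high": "125.4",
                            "3. low": "125.2", "4. close": "125.2", "5. volume": "755"},
    "2023-03-13 08:05:00": {"1. open": "125.71", "2. high": "125.71",
                            "3. low": "125.50", "4. close": "125.50", "5. volume": "865"},
}

def to_rows(series):
    """Turn {timestamp: {field: value}} into a list of row dictionaries."""
    rows = []
    for timestamp in sorted(series):          # oldest first
        row = {"timestamp": timestamp}
        for key, value in series[timestamp].items():
            name = key.split(". ", 1)[1]      # "1. open" -> "open"
            row[name] = float(value)          # convert text to a number
            # volume is converted to float as well, for simplicity
        rows.append(row)
    return rows

rows = to_rows(sample)
for row in rows:
    print(row)
```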
r = requests.get("https://api.github.com/events")
r
<Response [200]>
githubdata = r.json() # although the method is named json, it actually returns Python data
# the site originally responded with JSON text, but requests converts it
# into Python data
type(githubdata)
list
len(githubdata)
30
githubdata[0]
{'id': '27698359707',
'type': 'PushEvent',
'actor': {'id': 76208813,
'login': 'NaveendraKumar',
'display_login': 'NaveendraKumar',
'gravatar_id': '',
'url': 'https://api.github.com/users/NaveendraKumar',
'avatar_url': 'https://avatars.githubusercontent.com/u/76208813?'},
'repo': {'id': 613300826,
'name': 'NaveendraKumar/Qt_Git_Jenkins_Config',
'url': 'https://api.github.com/repos/NaveendraKumar/Qt_Git_Jenkins_Config'},
'payload': {'repository_id': 613300826,
'push_id': 12931866936,
'size': 1,
'distinct_size': 1,
'ref': 'refs/heads/master',
'head': '6b07253a4b03413275ffffb83de6148fb77679fa',
'before': '72e40bf859576e3279be47b0293b28c75d1eae01',
'commits': [{'sha': '6b07253a4b03413275ffffb83de6148fb77679fa',
'author': {'email': 'naveendrakumar37@gmail.com',
'name': 'NaveendraKumar'},
'message': 'New Rectangle added in project',
'distinct': True,
'url': 'https://api.github.com/repos/NaveendraKumar/Qt_Git_Jenkins_Config/commits/6b07253a4b03413275ffffb83de6148fb77679fa'}]},
'public': True,
'created_at': '2023-03-14T05:23:06Z'}
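Since the events arrive as a list of nested dictionaries, ordinary Python tools are enough to summarise them. The sketch below works on a small hand-made list: the first entry is trimmed from the event shown above, and the other two are invented for illustration.

```python
from collections import Counter

# First entry trimmed from the real event above; the other two are made up.
events = [
    {"type": "PushEvent", "actor": {"login": "NaveendraKumar"},
     "repo": {"name": "NaveendraKumar/Qt_Git_Jenkins_Config"}},
    {"type": "WatchEvent", "actor": {"login": "user-a"}, "repo": {"name": "org/repo-a"}},
    {"type": "PushEvent", "actor": {"login": "user-b"}, "repo": {"name": "org/repo-b"}},
]

# Count how many events there are of each type.
counts = Counter(event["type"] for event in events)
print(counts)

# Collect the logins of everyone who pushed.
pushers = [event["actor"]["login"] for event in events if event["type"] == "PushEvent"]
print(pushers)
```

The same two expressions should work unchanged on githubdata, assuming every event carries the type and actor keys seen in the example.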
Some websites need a username and password to access data.
%%file /tmp/pass.txt
GhjgGf&(3jd
Writing /tmp/pass.txt
user = "someusername"
pass_ = open("/tmp/pass.txt").read().strip()
resp = requests.get("https://api.github.com/user", auth=(user, pass_))
resp # error! 401 Unauthorized: the server rejected these credentials
<Response [401]>
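Passing a (user, password) tuple as auth selects HTTP Basic authentication. As a rough sketch of what that scheme puts on the wire, the credentials are joined with a colon and base64-encoded into an Authorization header. The function name basic_auth_header and the password below are illustrative only.

```python
import base64

def basic_auth_header(user, password):
    """Build the Authorization header used by the HTTP Basic scheme."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}

header = basic_auth_header("someusername", "not-a-real-password")
print(header)

# base64 is an encoding, not encryption: anyone can reverse it.
decoded = base64.b64decode(header["Authorization"].split()[1]).decode("utf-8")
print(decoded)
```

Because the header can be decoded by anyone who sees it, Basic authentication should only be sent over https.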
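# Kerberos authentication needs the separate requests-kerberos package
from requests_kerberos import HTTPKerberosAuth, OPTIONAL
kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
# request_url is a placeholder for the URL of a Kerberos-protected service
r = requests.get(request_url, auth=kerberos_auth)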
url = 'https://www.alphavantage.co/query'
API_KEY = "UKVFE0JLE0TBPDEF"
params = {"function":"TIME_SERIES_INTRADAY",
"symbol":"IBM",
"interval":"5min",
"apikey":API_KEY}
r = requests.get(url, params=params)
data = r.json()
url = "https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=IBM&interval=5min&apikey=UKVFE0JLE0TBPDEF"
r = requests.get(url)
data = r.json()
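The two requests above ask for the same resource: a params dictionary is simply encoded into the query string after the question mark. A minimal sketch with the standard library's urlencode shows the encoding. YOUR_API_KEY is a placeholder, and the search term in the second part is an invented example.

```python
from urllib.parse import urlencode

url = "https://www.alphavantage.co/query"
params = {"function": "TIME_SERIES_INTRADAY",
          "symbol": "IBM",
          "interval": "5min",
          "apikey": "YOUR_API_KEY"}   # placeholder, not a real key

# key=value pairs joined with "&", appended after "?"
full_url = f"{url}?{urlencode(params)}"
print(full_url)

# Special characters are escaped automatically, e.g. a space becomes "+".
print(urlencode({"q": "pandas dataframe"}))
```

Letting the library build the query string is less error-prone than pasting values into the URL by hand, especially when values contain spaces or symbols.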
def get_api_key():
    return "UKVFE0JLE0TBPDEF"

def get_daily_time_series(ticker, interval="5min"):
    url = 'https://www.alphavantage.co/query'
    API_KEY = get_api_key()
    params = {"function":"TIME_SERIES_INTRADAY",
              "symbol":ticker,
              "interval":interval,
              "apikey":API_KEY}
    r = requests.get(url, params=params)
    data = r.json()
    return pd.DataFrame(data[f'Time Series ({interval})']).transpose()
get_daily_time_series("APLE", "30min")
| | 1. open | 2. high | 3. low | 4. close | 5. volume |
|---|---|---|---|---|---|
| 2023-03-13 16:30:00 | 15.3300 | 15.3300 | 15.3300 | 15.3300 | 22111 |
| 2023-03-13 16:00:00 | 15.3400 | 15.3550 | 15.3025 | 15.3200 | 482534 |
| 2023-03-13 15:30:00 | 15.4200 | 15.4450 | 15.2800 | 15.3350 | 421490 |
| 2023-03-13 15:00:00 | 15.4500 | 15.5000 | 15.4000 | 15.4300 | 84357 |
| 2023-03-13 14:30:00 | 15.4200 | 15.4800 | 15.3600 | 15.4500 | 201028 |
| ... | ... | ... | ... | ... | ... |
| 2023-03-03 15:00:00 | 17.0800 | 17.1000 | 17.0150 | 17.0200 | 82605 |
| 2023-03-03 14:30:00 | 17.0600 | 17.0800 | 17.0400 | 17.0750 | 58178 |
| 2023-03-03 14:00:00 | 17.0400 | 17.0700 | 17.0350 | 17.0600 | 35509 |
| 2023-03-03 13:30:00 | 17.0150 | 17.0600 | 16.9800 | 17.0350 | 59608 |
| 2023-03-03 13:00:00 | 16.9850 | 17.0300 | 16.9800 | 17.0150 | 35553 |
100 rows × 5 columns
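The function above looks up the 'Time Series (...)' key directly, so it fails with a bare KeyError whenever the response does not contain it. The sketch below is one possible guard. It assumes, without confirmation from these notes, that a failed call still returns JSON but with different keys; extract_time_series and both sample payloads are illustrative.

```python
def extract_time_series(payload, interval="5min"):
    """Return the time-series part of a payload, or raise a readable error."""
    key = f"Time Series ({interval})"
    if key not in payload:
        raise ValueError(f"missing {key!r}; response keys were {sorted(payload)}")
    return payload[key]

# Hand-made payloads: one shaped like a good response, one hypothetical failure.
good = {"Meta Data": {},
        "Time Series (30min)": {"2023-03-13 16:30:00": {"4. close": "15.3300"}}}
bad = {"Error Message": "hypothetical error text"}

series = extract_time_series(good, "30min")
print(series)

try:
    extract_time_series(bad, "30min")
except ValueError as exc:
    message = str(exc)
    print(message)
```

Listing the keys that were actually received makes it much quicker to see whether the problem is a wrong ticker, a wrong interval, or something else.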
r = requests.post("http://httpbin.org/post", data={"value1":"A", "value2":"B"})
r.json()
{'args': {},
'data': '',
'files': {},
'form': {'value1': 'A', 'value2': 'B'},
'headers': {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate',
'Content-Length': '17',
'Content-Type': 'application/x-www-form-urlencoded',
'Host': 'httpbin.org',
'User-Agent': 'python-requests/2.28.1',
'X-Amzn-Trace-Id': 'Root=1-64100b61-1c10c28f6694520a5f66b50d'},
'json': None,
'origin': '157.33.224.254',
'url': 'http://httpbin.org/post'}
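The echoed headers show how the data dictionary was sent: as a URL-encoded form body. A small sketch with the standard library reproduces that body and its length, which matches the Content-Length of 17 reported above. The JSON variant is shown only for comparison.

```python
import json
from urllib.parse import urlencode

payload = {"value1": "A", "value2": "B"}

# Form encoding, as announced by application/x-www-form-urlencoded
form_body = urlencode(payload)
print(form_body, len(form_body))

# The same data written as JSON text, for comparison
json_body = json.dumps(payload)
print(json_body)
```

With requests, passing the dictionary as json= instead of data= sends a JSON body; httpbin would then be expected to echo it under 'json' rather than 'form'.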
def download(url, filename):
    resp = requests.get(url)
    with open(filename, "wb") as f: # "wb": write raw bytes
        f.write(resp.content)
download("https://www.python.org/ftp/python/3.10.10/Python-3.10.10.tgz", "Python-source.tgz")
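The download function above holds the whole file in memory before writing it, which is fine for small files but wasteful for large archives. The sketch below copies the response to disk in chunks instead. It uses the standard library's urllib rather than requests so that it runs without extra packages; download_streaming is an illustrative name. With requests, the analogous approach is to pass stream=True and loop over iter_content.

```python
import shutil
import urllib.request

def download_streaming(url, filename, chunk_size=64 * 1024):
    """Copy the body at url into filename, chunk_size bytes at a time."""
    with urllib.request.urlopen(url) as resp, open(filename, "wb") as f:
        # copyfileobj reads and writes in chunks, so memory use stays small
        shutil.copyfileobj(resp, f, chunk_size)
    return filename

# Example call (needs network access):
# download_streaming("https://www.python.org/ftp/python/3.10.10/Python-3.10.10.tgz",
#                    "Python-source.tgz")
```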
Problem
Write a function download_notes that downloads your training notes. It takes the module name and the day as parameters.
>>> download_notes("module1", "day1")
downloaded .. module1-day1.html
def download_all_notes():
    for m in range(1, 4): # modules 1 to 3
        for d in range(1, 6): # days 1 to 5
            download_notes(f"module{m}", f"day{d}")

def download_notes(module, day):
    filename = f"{module}-{day}.html"
    # the f prefix is required, otherwise {filename} is sent literally
    url = f"https://notes.pipal.in/2023/arcesium_finop_jan/{filename}"
    download(url, filename)
download_notes("module1", "day5")
!ls module1-day5.html
module1-day5.html
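A common pitfall when building such URLs is forgetting the f prefix, in which case the braces are kept literally and the request goes to the wrong address. The sketch below builds the note URLs without downloading anything, so the result can be inspected first. notes_url is an illustrative helper, and the base URL is the one given at the top of these notes.

```python
BASE_URL = "https://notes.pipal.in/2023/arcesium_finop_jan"

def notes_url(module, day):
    """Build the URL of one notes page, e.g. module1-day5.html."""
    filename = f"{module}-{day}.html"
    return f"{BASE_URL}/{filename}"

print(notes_url("module1", "day5"))

# Without the f prefix nothing is substituted:
filename = "module1-day5.html"
wrong = "https://notes.pipal.in/2023/arcesium_finop_jan/{filename}"
print(wrong)

# All combinations for modules 1 to 3 and days 1 to 5
all_urls = [notes_url(f"module{m}", f"day{d}")
            for m in range(1, 4) for d in range(1, 6)]
print(len(all_urls))
```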
Create a virtual environment for Selenium:
python3 -m venv selenium
Download geckodriver and extract it into the selenium/bin/ folder on Linux/macOS, or into selenium\Scripts on Windows.
Activate the virtual environment. On Linux/macOS:
source selenium/bin/activate
On Windows:
selenium\Scripts\activate.bat
Then install Selenium:
pip install selenium
%%file download_with_selenium.py
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
driver = webdriver.Firefox() # launch a browser
driver.get("http://www.python.org") # go to this url
assert "Python" in driver.title # make sure the site is loaded
time.sleep(5)
elem = driver.find_element(By.NAME, "q")
elem.clear()
elem.send_keys("pandas")
elem.send_keys(Keys.RETURN)
time.sleep(10)
assert "No results found." not in driver.page_source
driver.close()
Overwriting download_with_selenium.py
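The script above pauses for a fixed time with time.sleep, which either wastes time or is too short on a slow connection. A common alternative is to poll until a condition holds. Selenium ships its own WebDriverWait for this; the sketch below is only a plain-Python illustration of the same idea, and wait_until is an illustrative name, not a Selenium function.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns True or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)   # short pause before checking again

# Stand-in for a page that becomes ready on the third check.
checks = {"count": 0}

def page_ready():
    checks["count"] += 1
    return checks["count"] >= 3

print(wait_until(page_ready, timeout=2, interval=0.01))
```

In the Selenium script, the condition could be something like lambda: "Python" in driver.title, so the script continues as soon as the page has loaded.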