Sep 13-17, 2021 Vikrant Patil
These notes are available online at https://notes.pipal.in/2021/arcesium_finop_batch1/
© Pipal Academy LLP
We will be using jupyter hub from https://lab.pipal.in for this training.
Log in to the hub and create a notebook named module3-day3.
Different projects often require Python packages whose versions conflict with each other. In such cases, how do you work on two different projects on the same machine? If we install the packages for one project, they will conflict with the packages of the other project. A virtual environment helps here: it lets us keep a separate set of Python packages for each project. An added advantage is that it does not affect the system Python's packages. Using the venv module, we create one virtual environment per project. All requirements for the project are installed in that virtual environment and not in the system Python's packages. Let's take some examples.
Conflicting Requirements
Suppose we have two projects, datascraping and analytics. The datascraping project requires the following packages:
requests==2.24.0
openpyxl==2.4.8
and the analytics project needs the following packages:
pandas==1.1.2
openpyxl==3.0.5
requests==2.24.0
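The two requirement lists above can also be compared by a small script. This is only a sketch: the helper names parse_pins and find_conflicts are made up for illustration, and it assumes every line has the simple name==version form used above.

```python
# Requirement lists copied from the two projects above.
datascraping = ["requests==2.24.0", "openpyxl==2.4.8"]
analytics = ["pandas==1.1.2", "openpyxl==3.0.5", "requests==2.24.0"]


def parse_pins(lines):
    """Turn ['name==version', ...] into {'name': 'version', ...}."""
    pins = {}
    for line in lines:
        name, version = line.split("==")
        pins[name.strip()] = version.strip()
    return pins


def find_conflicts(reqs_a, reqs_b):
    """Return {package: (version_a, version_b)} for packages pinned differently."""
    a, b = parse_pins(reqs_a), parse_pins(reqs_b)
    return {name: (a[name], b[name])
            for name in a.keys() & b.keys()
            if a[name] != b[name]}


print(find_conflicts(datascraping, analytics))
# {'openpyxl': ('2.4.8', '3.0.5')}
```

requests is pinned to the same version in both projects, so only openpyxl is reported.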
Here is the conflicting requirement: one project needs openpyxl version 2.4.8 and the other needs 3.0.5.
Creating a venv
To create a virtual environment on your system, you need Python 3.5 or later. Python comes with a package called venv (virtual environment). For older Python versions, virtualenv was a separate application. We are going to work with the virtual environment support that comes with Python 3. The steps are given below. Open a terminal on Linux/Mac or a cmd terminal on Windows. At the prompt, type the following command to create a virtual environment named env1:
python -m venv env1
This will create a folder named env1 in the current directory. On Linux it will have the following contents:
+-env1
|
+-bin
+-include
+-lib
+-lib64
+-pyvenv.cfg
On a Windows system it will have the following contents:
+-env1
|
+-Include
+-Lib
+-Scripts
+-pyvenv.cfg
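The same folder can also be created from Python itself through the standard library venv module. The sketch below is not part of the original notes: create_env is a made-up helper, it works in a temporary folder so nothing is left behind, and it skips installing pip to stay fast.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_env(parent_dir, name="env1"):
    """Create a virtual environment called `name` inside `parent_dir`."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps this sketch fast; `python -m venv` installs pip by default.
    # Symlinks are used everywhere except windows, like the command line tool does.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_env(tmp)
    # Lists the top level entries: bin (Scripts on windows), include, lib, pyvenv.cfg
    print(sorted(p.name for p in env_dir.iterdir()))
```

For day to day work the command line form python -m venv env1 is simpler; the function form is handy mostly in setup scripts.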
To activate the virtual environment on Linux, run the following command in a terminal:
bash$ source env1/bin/activate
(env1) bash$ # you can see the env1 environment activated as change in prompt
To activate the virtual environment on Windows, run the following command in a Windows cmd terminal:
C:\Users\vik> env1\Scripts\activate.bat
(env1) C:\Users\vik>
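Besides the changed prompt, a Python program can check for itself whether it is running inside a virtual environment created by venv. A minimal sketch; the function name in_virtual_environment is made up for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the python installation it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
print("packages get installed under:", sys.prefix)
```

Running this before and after activation shows the prefix switching from the system python to the env1 folder.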
Installing packages in virtual environment
Once the virtual environment is created and activated, we are ready to use it. To install packages in this active virtual environment use pip install:
pip install typer
Collecting typer
Using cached https://files.pythonhosted.org/packages/90/34/d138832f6945432c638f32137e6c79a3b682f06a63c488dcfaca6b166c64/typer-0.3.2-py3-none-any.whl
Collecting click<7.2.0,>=7.1.1 (from typer)
Using cached https://files.pythonhosted.org/packages/d2/3d/fa76db83bf75c4f8d338c2fd15c8d33fdd7ad23a9b5e57eb6c5de26b430e/click-7.1.2-py2.py3-none-any.whl
Installing collected packages: click, typer
Successfully installed click-7.1.2 typer-0.3.2
To check which packages are installed:
pip list
Package Version
---------- -------
click 7.1.2
pip 19.2.3
setuptools 41.2.0
typer 0.3.2
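The same information is available from inside Python through the standard library importlib.metadata module (Python 3.8 and newer). This is a rough sketch of what pip list reports, not how pip implements it; installed_packages is a made-up helper name.

```python
from importlib import metadata  # standard library from python 3.8


def installed_packages():
    """Return a sorted list of (name, version) for the current environment."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no name
            found.add((name, dist.metadata.get("Version")))
    return sorted(found, key=lambda item: item[0].lower())


for name, version in installed_packages():
    print(f"{name}=={version}")
```

The output depends on the environment it runs in, so inside env1 it should list click, typer, pip and setuptools.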
requirements.txt
If we want to replicate exactly the same virtual environment on another machine, we need a list of packages that pip can understand. This format is called a requirements file. It can be generated using
pip freeze
click==7.1.2
typer==0.3.2
The output can be saved to a file named requirements.txt. This file can be used in another virtual environment to recreate the same environment. For example, let's use the requirements above to create another environment named env1copy:
bash$ python -m venv env1copy
bash$ source env1copy/bin/activate
(env1copy) bash$ pip install -r env1/requirements.txt
Collecting click==7.1.2 (from -r env1/requirements.txt (line 1))
Using cached https://files.pythonhosted.org/packages/d2/3d/fa76db83bf75c4f8d338c2fd15c8d33fdd7ad23a9b5e57eb6c5de26b430e/click-7.1.2-py2.py3-none-any.whl
Collecting typer==0.3.2 (from -r env1/requirements.txt (line 2))
Using cached https://files.pythonhosted.org/packages/90/34/d138832f6945432c638f32137e6c79a3b682f06a63c488dcfaca6b166c64/typer-0.3.2-py3-none-any.whl
Installing collected packages: click, typer
Successfully installed click-7.1.2 typer-0.3.2
You can check the installed packages:
pip freeze
click==7.1.2
typer==0.3.2
Summary
import requests
!cat download.py
import typer
import requests

app = typer.Typer()

@app.command()
def download(url, filename):
    resp = requests.get(url)
    with open(filename, "w") as f:
        f.write(resp.text)

if __name__ == "__main__":
    app()
def download(url, filename):
    resp = requests.get(url)
    with open(filename, "w") as f:
        f.write(resp.text)
HTTP - protocol
GET
Here are sample searches on search engines
geturl = "https://httpbin.org/get"
params = {"key1": "value1",
          "key2": "value2",
          "myparam": "myvalue"}
resp = requests.get(geturl, params=params)
resp.status_code # any status in the 200 range means the request succeeded
200
resp.json()
{'args': {'key1': 'value1', 'key2': 'value2', 'myparam': 'myvalue'},
'headers': {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate',
'Host': 'httpbin.org',
'User-Agent': 'python-requests/2.24.0',
'X-Amzn-Trace-Id': 'Root=1-617104f8-2b35be82368b6cf424c7368b'},
'origin': '157.33.213.172',
'url': 'https://httpbin.org/get?key1=value1&key2=value2&myparam=myvalue'}
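The 'url' entry in the response shows what happened to params: the dictionary was encoded into a query string and appended to the URL. The standard library can reproduce this without making any request. A small sketch using urllib.parse, with the same values as above:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

geturl = "https://httpbin.org/get"
params = {"key1": "value1", "key2": "value2", "myparam": "myvalue"}

# Build the same URL that shows up under 'url' in the response above.
full_url = geturl + "?" + urlencode(params)
print(full_url)
# https://httpbin.org/get?key1=value1&key2=value2&myparam=myvalue

# Going the other way: pull the parameters back out of a URL.
query = urlsplit(full_url).query
print(parse_qs(query))
# {'key1': ['value1'], 'key2': ['value2'], 'myparam': ['myvalue']}

# Characters that are not allowed in a URL get escaped on the way.
print(urlencode({"q": "python docs"}))
# q=python+docs
```

This is why it is safer to pass params as a dictionary than to paste values into the URL by hand.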
pydata = {"name": "vikrant",
          "message": "Hello World!",
          "nums": [1, 2, 3, 4, 5]}
pydata
{'name': 'vikrant', 'message': 'Hello World!', 'nums': [1, 2, 3, 4, 5]}
import json
jsondata = json.dumps(pydata)
jsondata
'{"name": "vikrant", "message": "Hello World!", "nums": [1, 2, 3, 4, 5]}'
json.loads(jsondata)
{'name': 'vikrant', 'message': 'Hello World!', 'nums': [1, 2, 3, 4, 5]}
"<head>Hello</head>" #this is not json data
'{"x":1,"y":[1, 2, 3, 4]}' # this is json
'{"x":1,"y":[1, 2, 3, 4]}'
data = json.loads('{"x":1,"y":[1, 2, 3, 4]}')
data
{'x': 1, 'y': [1, 2, 3, 4]}
data['x']
1
data['y']
[1, 2, 3, 4]
items = input("enter python list")
enter python list[1, 2, 3, 4, "hello"]
items
'[1, 2, 3, 4, "hello"]'
json.loads(items)
[1, 2, 3, 4, 'hello']
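When the text comes from a user or from a web server, it may not be valid JSON at all, and json.loads then raises json.JSONDecodeError. A minimal sketch of handling that; parse_json is a made-up helper name.

```python
import json


def parse_json(text):
    """Return (True, value) for valid JSON text, (False, error message) otherwise."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        return False, str(err)


print(parse_json('{"x":1,"y":[1, 2, 3, 4]}'))   # valid JSON
print(parse_json("<head>Hello</head>"))         # HTML, not JSON
print(parse_json("[1, 2, 3, 4, 'hello']"))      # single quotes are not valid JSON
```

The last case is easy to trip over: a list typed the Python way with single quotes is rejected, because JSON strings must use double quotes.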
resp = requests.get("https://notes.pipal.in/2021/arcesium_finop_batch1/module3-day2.html")
print(resp.text[:100])
<!DOCTYPE html> <html> <head><meta charset="utf-8" /> <meta name="viewport" content="width=device-wi
import pandas as pd
alphavantageurl = "https://www.alphavantage.co/query"
API_KEY = "UKVFE0JLE0TBPDEF"
params = {
    "function": "TIME_SERIES_INTRADAY",
    "symbol": "IBM",
    "interval": "30min",
    "apikey": API_KEY}
resp = requests.get(alphavantageurl, params=params)
resp.status_code
200
resp.json().keys()
dict_keys(['Meta Data', 'Time Series (30min)'])
params = {
    "function": "TIME_SERIES_INTRADAY",
    "symbol": "IBM",
    "interval": "30min",
    "datatype": "csv",
    "apikey": API_KEY}
resp = requests.get(alphavantageurl, params=params)
resp.status_code # make sure it returns 200, or any other status that starts with 2
200
resp.content # if the returned content is binary (for example a pdf file),
# write it to a file opened in binary mode
resp.text # if we know the returned data is text, we can access it as text
with open("ibm30min.csv", "w") as f:
    f.write(resp.text)
pd.read_csv("ibm30min.csv")
| | timestamp | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 0 | 2021-10-20 20:00:00 | 135.75 | 135.93 | 135.70 | 135.85 | 15537 |
| 1 | 2021-10-20 19:30:00 | 135.94 | 135.98 | 135.66 | 135.66 | 7833 |
| 2 | 2021-10-20 19:00:00 | 135.99 | 136.05 | 135.94 | 135.95 | 20941 |
| 3 | 2021-10-20 18:30:00 | 135.65 | 136.00 | 135.65 | 135.99 | 40075 |
| 4 | 2021-10-20 18:00:00 | 135.27 | 136.40 | 135.00 | 135.74 | 100970 |
| ... | ... | ... | ... | ... | ... | ... |
| 95 | 2021-10-15 10:00:00 | 143.39 | 144.28 | 142.79 | 144.09 | 524512 |
| 96 | 2021-10-15 09:30:00 | 143.70 | 143.83 | 143.24 | 143.24 | 1559 |
| 97 | 2021-10-15 09:00:00 | 143.90 | 143.90 | 143.70 | 143.70 | 656 |
| 98 | 2021-10-15 08:30:00 | 143.51 | 143.81 | 143.39 | 143.39 | 1225 |
| 99 | 2021-10-15 07:30:00 | 143.54 | 143.54 | 143.52 | 143.52 | 249 |
100 rows × 6 columns
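The CSV text does not have to go through a file before it is parsed. The sketch below uses only the standard library and a three row sample typed in from the table above; the exact text the API sends back is an assumption here, and read_rows is a made-up helper name.

```python
import csv
import io

# First rows of the table above, written as CSV text (assumed layout).
sample_csv = """timestamp,open,high,low,close,volume
2021-10-20 20:00:00,135.75,135.93,135.70,135.85,15537
2021-10-20 19:30:00,135.94,135.98,135.66,135.66,7833
2021-10-20 19:00:00,135.99,136.05,135.94,135.95,20941
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts with the numeric columns converted."""
    rows = []
    # io.StringIO makes a string behave like an open text file.
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "timestamp": row["timestamp"],
            "open": float(row["open"]),
            "high": float(row["high"]),
            "low": float(row["low"]),
            "close": float(row["close"]),
            "volume": int(row["volume"]),
        })
    return rows


rows = read_rows(sample_csv)
print(len(rows), "rows")
print("highest high:", max(r["high"] for r in rows))
```

With pandas the same idea is a one liner, pd.read_csv(io.StringIO(resp.text)), which skips the intermediate ibm30min.csv file.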
simple username/password authentication
user = "username"
pass_ = open("/tmp/pass.txt").read().strip()
resp = requests.get(url, params = {}, auth=(user, pass_))
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-49-960592ad30a6> in <module>
      1 user = "username"
----> 2 pass_ = open("/tmp/pass.txt").read().strip()
      3
      4 resp = requests.get(url, params = {}, auth=(user, pass_))

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pass.txt'
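Passing auth=(user, pass_) makes requests use HTTP basic authentication, which sends the credentials in an Authorization header. The sketch below builds such a header by hand to show what travels over the wire; both helper names are made up, the password is a placeholder, and UTF-8 is assumed for the encoding.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of the Authorization header for HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def decode_basic_auth(header):
    """Recover (scheme, user, password) from a basic auth header value."""
    scheme, token = header.split(" ", 1)
    user, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return scheme, user, password


header = basic_auth_header("username", "not-a-real-password")
print(header)
print(decode_basic_auth(header))
```

Since anyone can decode the header, basic authentication is only an encoding and not encryption. It should be used over https URLs, and the password is better kept out of the notebook, as done above by reading it from a file.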
posturl = "https://httpbin.org/post"
resp = requests.post(posturl, data={"input1":"x", "input2":"y"})
resp.json()
{'args': {},
'data': '',
'files': {},
'form': {'input1': 'x', 'input2': 'y'},
'headers': {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate',
'Content-Length': '17',
'Content-Type': 'application/x-www-form-urlencoded',
'Host': 'httpbin.org',
'User-Agent': 'python-requests/2.24.0',
'X-Amzn-Trace-Id': 'Root=1-61711115-290dde9323eb3d3e4693c515'},
'json': None,
'origin': '157.33.213.172',
'url': 'https://httpbin.org/post'}
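Unlike GET, the data of a POST request travels in the body of the request and not in the URL. With data=... the body is form encoded, which explains the Content-Type and the Content-Length of 17 in the response above. A small sketch that rebuilds that body with the standard library:

```python
import json
from urllib.parse import urlencode

form = {"input1": "x", "input2": "y"}

# Body sent for requests.post(posturl, data=form)
form_body = urlencode(form)
print(form_body, len(form_body))
# input1=x&input2=y 17

# For comparison, the same data written as a JSON document.
json_body = json.dumps(form)
print(json_body)
# {"input1": "x", "input2": "y"}
```

requests also accepts a json=... argument in place of data=..., which sends the JSON form of the body. httpbin would then report it under 'json' instead of 'form'.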
url = "http://www.thehindu.com/"
resp = requests.get(url, params={"service":"rss"})
resp.status_code
200
xmltext = resp.text
print(xmltext[:1500])
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>The Hindu - Home</title>
<link>https://www.thehindu.com/</link>
<description>Default RSS Feed</description>
<language>en-us</language>
<copyright>Copyright 2021 The Hindu</copyright>
<item>
<title><![CDATA[Martyrs remembered on Police Commemoration Day ]]></title>
<author><![CDATA[Special Correspondent]]></author>
<category><![CDATA[Chennai]]></category>
<link>https://www.thehindu.com/news/cities/chennai/martyrs-remembered-on-police-commemoration-day/article37104221.ece</link>
<description><![CDATA[
Police personnel, defence staff and retired officers, led by TN’s DGP C. Sylendra Babu, paid homage to colleagues who lost their lives in the line of duty
]]></description>
<pubDate><![CDATA[Thu, 21 Oct 2021 12:20:43 +0530]]></pubDate>
</item>
<item>
<title><![CDATA[Gauri Lankesh murder | Supreme Court sets aside Karnataka High Court order quashing charge sheet against accused for KCOCA offences]]></title>
<author><![CDATA[PTI]]></author>
<category><![CDATA[Other States]]></category>
<link>https://www.thehindu.com/news/national/other-states/gauri-lankesh-murder-supreme-court-sets-aside-karnataka-high-court-order-quashing-charge-sheet-against-accused-for-kcoca-offences/art
from xml.etree import ElementTree as et
root = et.fromstring(xmltext)
items = root.findall(".//item")
items[0]
<Element 'item' at 0x7f406b1af450>
print(et.tostring(items[0]).decode())
<item>
<title>Martyrs remembered on Police Commemoration Day </title>
<author>Special Correspondent</author>
<category>Chennai</category>
<link>https://www.thehindu.com/news/cities/chennai/martyrs-remembered-on-police-commemoration-day/article37104221.ece</link>
<description>
Police personnel, defence staff and retired officers, led by TN’s DGP C. Sylendra Babu, paid homage to colleagues who lost their lives in the line of duty
</description>
<pubDate>Thu, 21 Oct 2021 12:20:43 +0530</pubDate>
</item>
for item in items[:5]:
    print(item.findtext("title"))
    print(item.findtext("link"))
    print(item.findtext("author"))
    print("="*30)
Martyrs remembered on Police Commemoration Day
https://www.thehindu.com/news/cities/chennai/martyrs-remembered-on-police-commemoration-day/article37104221.ece
Special Correspondent
==============================
Gauri Lankesh murder | Supreme Court sets aside Karnataka High Court order quashing charge sheet against accused for KCOCA offences
https://www.thehindu.com/news/national/other-states/gauri-lankesh-murder-supreme-court-sets-aside-karnataka-high-court-order-quashing-charge-sheet-against-accused-for-kcoca-offences/article37104180.ece
PTI
==============================
Emily Blunt joins Christopher Nolan’s ‘Oppenheimer’
https://www.thehindu.com/entertainment/movies/emily-blunt-joins-christopher-nolans-oppenheimer/article37104178.ece
PTI
==============================
At least 3 dead in apparent gas explosion in north China
https://www.thehindu.com/news/international/gas-explosion-in-china/article37104092.ece
AP
==============================
IAF plane crashes at Bhind in Madhya Pradesh; pilot ejects safely
https://www.thehindu.com/news/national/other-states/iaf-plane-crashes-at-bhind-in-mp-pilot-ejects-safely/article37103960.ece
Special Correspondent
==============================
def get_top_five_news():
    url = "http://www.thehindu.com/"
    resp = requests.get(url, params={"service":"rss"})
    xmltext = resp.text
    root = et.fromstring(xmltext)
    items = root.findall(".//item")
    for item in items[:5]:
        print(item.findtext("title"))
        print(item.findtext("link"))
        print("="*30)
get_top_five_news()
Shahid Kapoor to play a paratrooper in his next film ‘Bull’
https://www.thehindu.com/entertainment/movies/shahid-kapoor-to-play-a-paratrooper-in-his-next-film-bull/article37104279.ece
==============================
India now has 'protective shield' of 100 crore vaccine doses against COVID-19 pandemic: PM Modi
https://www.thehindu.com/news/national/india-now-has-protective-shield-of-100-crore-vaccine-doses-against-covid-19-pandemic-pm-modi/article37104272.ece
==============================
Martyrs remembered on Police Commemoration Day
https://www.thehindu.com/news/cities/chennai/martyrs-remembered-on-police-commemoration-day/article37104221.ece
==============================
Gauri Lankesh murder | Supreme Court sets aside Karnataka High Court order quashing charge sheet against accused for KCOCA offences
https://www.thehindu.com/news/national/other-states/gauri-lankesh-murder-supreme-court-sets-aside-karnataka-high-court-order-quashing-charge-sheet-against-accused-for-kcoca-offences/article37104180.ece
==============================
Emily Blunt joins Christopher Nolan’s ‘Oppenheimer’
https://www.thehindu.com/entertainment/movies/emily-blunt-joins-christopher-nolans-oppenheimer/article37104178.ece
==============================
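A function that prints is hard to reuse. A variation is to separate parsing from downloading and return the items as data, which also lets us try the parser on a small XML sample without any network access. This is a sketch: the sample feed and the name parse_items are made up for illustration, with the same structure as the feed above.

```python
from xml.etree import ElementTree as et

# Tiny made-up feed with the same structure as the RSS shown earlier.
sample_rss = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Sample Feed</title>
<item>
<title><![CDATA[First headline ]]></title>
<link>https://example.com/first</link>
</item>
<item>
<title><![CDATA[Second headline]]></title>
<link>https://example.com/second</link>
</item>
</channel>
</rss>
"""


def parse_items(xmltext, limit=5):
    """Return up to `limit` feed items as dicts with title and link."""
    root = et.fromstring(xmltext)
    news = []
    for item in root.findall(".//item")[:limit]:
        news.append({
            # findtext returns None for a missing tag, hence the `or ""`
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return news


for news in parse_items(sample_rss):
    print(news["title"], "->", news["link"])
```

With the real feed, the call would be parse_items(resp.text), and the returned list can be printed, filtered or loaded into pandas as needed.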
for line in xmltext.split("\n")[:100]:
    if line.strip().startswith("<title>"):
        print(line)
<title>The Hindu - Home</title>
<title><![CDATA[Martyrs remembered on Police Commemoration Day ]]></title>
<title><![CDATA[Gauri Lankesh murder | Supreme Court sets aside Karnataka High Court order quashing charge sheet against accused for KCOCA offences]]></title>
<title><![CDATA[Emily Blunt joins Christopher Nolan’s ‘Oppenheimer’]]></title>
<title><![CDATA[At least 3 dead in apparent gas explosion in north China]]></title>
<title><![CDATA[IAF plane crashes at Bhind in Madhya Pradesh; pilot ejects safely]]></title>
<title><![CDATA[100 heritage monuments to be lit in tricolour to mark 100 crore Covid vaccination feat]]></title>
<title><![CDATA[Rare coin made in Colonial New England could fetch $3,00,000]]></title>
<title><![CDATA[Retail investors flocks to YouTube for stock trading advice in S.Korea]]></title>
<title><![CDATA[PayPal in $45 bln bid for Pinterest]]></title>
<title><![CDATA[Alphabet's Wing project unveils new drone delivery model in Texas]]></title>
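Selenium can launch a browser from a Python program and actually click on websites!
To start working with selenium, two things are required: the selenium package, installed with the command below, and a driver for the browser being automated (geckodriver for Firefox).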
pip install selenium
More detailed documentation and examples
Here is sample code that can be run from the activated environment:
%%file search_python.py
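from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://www.python.org")
# find_element_by_name is deprecated in selenium 4; find_element(By.NAME, ...) is the supported form
elem = driver.find_element(By.NAME, "q")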
elem.clear()
elem.send_keys("python docs")
elem.send_keys(Keys.RETURN)
driver.close()
Writing search_python.py