Module 3 - Day 2

Virtual Environment

Different projects often require Python packages whose versions conflict with each other. In such cases, how do you work on two different projects on the same machine? If we install the packages for one project, they will conflict with those of the other project. A virtual environment solves this: it gives each project its own separate set of Python packages. An added advantage is that it does not affect the system Python's packages. Using the venv module, we create one virtual environment per project. All requirements of the project are installed into that virtual environment and not into the system Python's packages. Let's take some examples.

Conflicting Requirements

Suppose we have two projects, datascraping and analytics. The datascraping project requires the following packages:

  requests==2.24.0
  openpyxl==2.4.8

and the analytics project needs the following packages:

  pandas==1.1.2
  openpyxl==3.0.5
  requests==2.24.0

Here is the conflicting requirement: one project needs openpyxl version 2.4.8 and the other needs 3.0.5.
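The conflict above can also be found programmatically. The following is a minimal sketch, assuming every requirement is pinned with `==`; the helper names `parse_pins` and `find_conflicts` are made up for this illustration and are not part of pip.

```python
def parse_pins(lines):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name.strip().lower()] = version.strip()
    return pins


def find_conflicts(first, second):
    """Return {name: (version_in_first, version_in_second)} for mismatched pins."""
    a, b = parse_pins(first), parse_pins(second)
    return {name: (a[name], b[name])
            for name in sorted(a.keys() & b.keys())
            if a[name] != b[name]}


datascraping = ["requests==2.24.0", "openpyxl==2.4.8"]
analytics = ["pandas==1.1.2", "openpyxl==3.0.5", "requests==2.24.0"]

conflicts = find_conflicts(datascraping, analytics)
print(conflicts)  # {'openpyxl': ('2.4.8', '3.0.5')}
```

Only openpyxl is reported, because requests is pinned to the same version in both projects and pandas is needed by only one of them.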

Creating venv

To create a virtual environment on your system, you need Python 3.5 or newer. Python 3 ships with a module called venv (virtual environment). For older versions of Python, virtualenv was a separate application. We are going to work with the virtual environment support that comes with Python 3. The steps are given below. Open a terminal on Linux/Mac or a cmd terminal on Windows. At the prompt, type the following command to create a virtual environment named env1:

  python -m venv env1

This will create a folder named env1 in the current directory. On Linux it will have the following contents:

  +-env1
    |
    +-bin
    +-include
    +-lib
    +-lib64
    +-pyvenv.cfg

On a Windows system it will have the following contents:

  +-env1
    |
    +-Include
    +-Lib
    +-Scripts
    +-pyvenv.cfg
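The same environment can also be created from inside Python, since venv is an ordinary standard-library module. This is a small sketch; the wrapper name `create_env` is hypothetical, while `venv.create` is the standard-library call doing the work.

```python
import venv
from pathlib import Path


def create_env(path, with_pip=True):
    """Create a virtual environment at `path` and return its directory."""
    env_dir = Path(path)
    # with_pip=True also installs pip into the new environment,
    # which is what `python -m venv` does by default.
    venv.create(str(env_dir), with_pip=with_pip)
    return env_dir
```

Calling `create_env("env1")` should leave the same kind of folder in the current directory as the command shown earlier.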

To activate the virtual environment on Linux, run the following command in a terminal:

  bash$ source env1/bin/activate
  (env1) bash$ # you can see the env1 environment activated as change in prompt

To activate the virtual environment on Windows, run the following command in a cmd terminal:

  C:\Users\vik> env1\Scripts\activate.bat
  (env1) C:\Users\vik>
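Besides the changed prompt, Python itself can tell whether it is running inside a virtual environment. A minimal sketch follows; the function name `in_virtualenv` is made up for this example, and the check assumes an environment created with the venv module.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```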

Installing packages in virtual environment

Once the virtual environment is created and activated, we are ready to use it. To install packages into the active virtual environment, use pip install:

  pip install typer
  Collecting typer
    Using cached https://files.pythonhosted.org/packages/90/34/d138832f6945432c638f32137e6c79a3b682f06a63c488dcfaca6b166c64/typer-0.3.2-py3-none-any.whl
  Collecting click<7.2.0,>=7.1.1 (from typer)
    Using cached https://files.pythonhosted.org/packages/d2/3d/fa76db83bf75c4f8d338c2fd15c8d33fdd7ad23a9b5e57eb6c5de26b430e/click-7.1.2-py2.py3-none-any.whl
  Installing collected packages: click, typer
  Successfully installed click-7.1.2 typer-0.3.2

To check which packages are installed:

  pip list
  Package    Version
  ---------- -------
  click      7.1.2
  pip        19.2.3
  setuptools 41.2.0
  typer      0.3.2
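The same listing can be obtained from within Python. The sketch below uses importlib.metadata from the standard library, so it assumes Python 3.8 or newer; the helper name `installed_packages` is hypothetical.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) for every installed distribution."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken metadata
            found.add((name, dist.version))
    return sorted(found, key=lambda item: item[0].lower())


for name, version in installed_packages():
    print(f"{name:<20} {version}")
```

Run with the interpreter of an active virtual environment, this reports the packages visible to that environment.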

requirements.txt

If we want to replicate exactly the same virtual environment on another machine, we need a list of packages in a format that pip understands. This format is called a requirements file. It can be generated using:

  pip freeze
  click==7.1.2
  typer==0.3.2

The output can be saved to a file named requirements.txt. This file can be used in another virtual environment to recreate the same set of packages. For example, let's use the requirements above to create another environment named env1copy:

  bash$ python -m venv env1copy
  bash$ source env1copy/bin/activate
  (env1copy) bash$ pip install -r env1/requirements.txt
  Collecting click==7.1.2 (from -r env1/requirements.txt (line 1))
    Using cached https://files.pythonhosted.org/packages/d2/3d/fa76db83bf75c4f8d338c2fd15c8d33fdd7ad23a9b5e57eb6c5de26b430e/click-7.1.2-py2.py3-none-any.whl
  Collecting typer==0.3.2 (from -r env1/requirements.txt (line 2))
    Using cached https://files.pythonhosted.org/packages/90/34/d138832f6945432c638f32137e6c79a3b682f06a63c488dcfaca6b166c64/typer-0.3.2-py3-none-any.whl
  Installing collected packages: click, typer
  Successfully installed click-7.1.2 typer-0.3.2

You can check the installed packages:

  pip freeze
  click==7.1.2
  typer==0.3.2
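Whether an environment really matches a requirements file can be verified from Python as well. This is a rough sketch that only understands `name==version` lines, the form that pip freeze printed above; the function name `check_requirements` is an assumption of this example.

```python
from importlib import metadata


def check_requirements(lines):
    """Compare 'name==version' pins against the active environment.

    Returns {name: (wanted, installed)}; installed is None when the
    package is missing. An empty dict means every pin is satisfied.
    """
    problems = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, wanted = line.partition("==")
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


print(check_requirements(["click==7.1.2", "typer==0.3.2"]))
```

Inside env1copy the call is expected to print an empty dict; in any other environment it lists the packages that are missing or have a different version.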

Download using selenium

Create a virtual environment for selenium and install the package in it:

  pip install selenium

Also copy geckodriver into the bin (Linux/Mac) or Scripts (Windows) folder of your virtual environment.
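Activating an environment puts its bin or Scripts folder at the front of PATH, which is why a driver copied there can be found. A quick way to confirm this is sketched below; `find_driver` is a hypothetical helper around the standard `shutil.which`.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of the driver executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("geckodriver not found on PATH")
else:
    print("geckodriver found at", path)
```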

%%file lanunch_python_website.py
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()  # launches Firefox
driver.get("https://www.python.org")  # asks the browser to open the given URL
time.sleep(5)

searchbox = driver.find_element(By.NAME, "q")
searchbox.clear()
searchbox.send_keys("pandas")
searchbox.send_keys(Keys.RETURN)
time.sleep(20)
driver.close()
Writing lanunch_python_website.py

json

import json

JSON data is a collection of lists, dicts, floats, ints and strings.

person = {"name" : "vikrant",
         "place": "Maharashtra"}
person
{'name': 'vikrant', 'place': 'Maharashtra'}
json.dumps(person)
'{"name": "vikrant", "place": "Maharashtra"}'
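The reverse direction is handled by `json.loads`. A short round-trip sketch, reusing the same `person` dict:

```python
import json

person = {"name": "vikrant", "place": "Maharashtra"}

text = json.dumps(person)      # Python dict -> JSON string
restored = json.loads(text)    # JSON string -> Python dict

print(type(text).__name__)     # str
print(restored == person)      # True

# indent= produces a human-readable layout, handy when saving to a file
print(json.dumps(person, indent=2))
```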
import requests
url = 'https://www.alphavantage.co/query'
params = {'function' : 'TIME_SERIES_INTRADAY',
         'symbol': 'IBM',
         'interval': '15min',
         'apikey':'UKVFE0JLE0TBPDEF'}

response  = requests.get(url, params=params)
d = response.json() # will convert json data to python lists/dicts
type(d)
dict
d.keys()
dict_keys(['Meta Data', 'Time Series (15min)'])
d['Meta Data']
{'1. Information': 'Intraday (15min) open, high, low, close prices and volume',
 '2. Symbol': 'IBM',
 '3. Last Refreshed': '2024-03-11 19:45:00',
 '4. Interval': '15min',
 '5. Output Size': 'Compact',
 '6. Time Zone': 'US/Eastern'}
type(d['Time Series (15min)'])
dict
import pandas as pd
from io import StringIO
# read_json expects a file-like object, so wrap the JSON string in StringIO
pd.read_json(StringIO(json.dumps(d['Time Series (15min)'])))
2024-03-11 19:45:00 2024-03-11 19:30:00 2024-03-11 19:15:00 2024-03-11 19:00:00 2024-03-11 18:45:00 2024-03-11 18:30:00 2024-03-11 18:15:00 2024-03-11 18:00:00 2024-03-11 17:45:00 2024-03-11 17:30:00 ... 2024-03-08 13:00:00 2024-03-08 12:45:00 2024-03-08 12:30:00 2024-03-08 12:15:00 2024-03-08 12:00:00 2024-03-08 11:45:00 2024-03-08 11:30:00 2024-03-08 11:15:00 2024-03-08 11:00:00 2024-03-08 10:45:00
1. open 192.89 193.00 192.5 191.73 192.80 191.73 193.00 192 192.46 192.39 ... 195.130 195.71 195.70 195.43 196.295 196.42 196.460 196.74 197.540 197.215
2. high 192.89 193.00 193.0 192.97 193.00 193.00 193.00 193 192.46 192.47 ... 195.510 195.71 196.01 195.72 196.295 196.52 196.790 196.83 197.583 197.770
3. low 192.31 192.15 192.5 191.73 192.11 191.73 192.22 192 192.00 192.00 ... 194.960 194.97 195.63 195.42 195.340 196.18 196.245 196.39 196.510 197.140
4. close 192.50 192.16 193.0 192.00 192.28 193.00 192.40 193 192.20 192.39 ... 195.365 195.14 195.69 195.70 195.390 196.32 196.410 196.50 196.740 197.552
5. volume 178.00 287.00 3.0 425001.00 412.00 424898.00 24.00 1867 95.00 391.00 ... 76776.000 119221.00 91038.00 173257.00 189651.000 92136.00 112172.000 115824.00 131936.000 111196.000

5 rows × 100 columns
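The nested dict can also be processed without pandas. The sketch below pulls out the closing prices; the function name `closing_prices` is made up, and the sample holds just two entries copied from the output above, written as strings on the assumption that this is how the API delivers the values.

```python
def closing_prices(series):
    """Return [(timestamp, close)] sorted from oldest to newest."""
    # Timestamps are 'YYYY-MM-DD HH:MM:SS' strings, so sorting them as
    # text also sorts them chronologically.
    return [(stamp, float(series[stamp]["4. close"])) for stamp in sorted(series)]


# Hypothetical stand-in for d['Time Series (15min)'] with two entries.
sample = {
    "2024-03-11 19:45:00": {"1. open": "192.89", "2. high": "192.89",
                            "3. low": "192.31", "4. close": "192.50",
                            "5. volume": "178"},
    "2024-03-11 19:30:00": {"1. open": "193.00", "2. high": "193.00",
                            "3. low": "192.15", "4. close": "192.16",
                            "5. volume": "287"},
}

for stamp, close in closing_prices(sample):
    print(stamp, close)
```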

r = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=IBM&interval=5min&apikey=demo")
r.json().keys()
dict_keys(['Meta Data', 'Time Series (5min)'])
r = requests.get("https://www.python.org")
r
<Response [200]>
print(r.text[:1000])
<!doctype html>
<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->
<!--[if IE 7]>      <html class="no-js ie7 lt-ie8 lt-ie9">          <![endif]-->
<!--[if IE 8]>      <html class="no-js ie8 lt-ie9">                 <![endif]-->
<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr">  <!--<![endif]-->

<head>
    <!-- Google tag (gtag.js) -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-TF35YF9CVH"></script>
    <script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());
      gtag('config', 'G-TF35YF9CVH');
    </script>

    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">

    <link rel="prefetch" href="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">
    <link rel="prefetch" href="//ajax.googleapis.com/ajax/libs/jqueryui/1.12.1/jquery-ui.min.js">

    <meta name="application-name" content="Python.org">
filepath = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv"
r = requests.get(filepath)
r
<Response [200]>
with open("wallet.csv", "w") as f:
    f.write(r.text)
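def downloadfile(url, filepath):
    """Download url and save the raw bytes to filepath."""
    r = requests.get(url)
    r.raise_for_status()  # stop here on an HTTP error instead of saving an error page
    with open(filepath, "wb") as f:
        f.write(r.content)
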
csvurl = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.csv"
excelurl = "https://raw.githubusercontent.com/vikipedia/python-trainings/master/online_course/source/module2/wallet.xlsx"
downloadfile(csvurl, "x.csv")
downloadfile(excelurl, "x.xlsx")
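For comparison, here is a variant that uses only the standard library instead of requests. It is a sketch rather than a replacement for the function above; the name `download_with_urllib` is hypothetical, and urllib is swapped in purely to show that the same idea works without third-party packages.

```python
import shutil
import urllib.request


def download_with_urllib(url, filepath):
    """Copy the bytes behind `url` into `filepath` in chunks."""
    # The URL is opened first, so no file is created if the request fails.
    with urllib.request.urlopen(url) as response, open(filepath, "wb") as f:
        shutil.copyfileobj(response, f)
    return filepath
```

Because the data is copied in chunks, this approach also suits files that are too large to hold in memory at once.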

Problem

Write a function download_notes to scrape live notes from https://notes.arcesium-lab.pipal.in. The URL pattern is as follows:

  • notes for module 1 day 1 -> https://notes.arcesium-lab.pipal.in/1-1.html
  • notes for module 3 day 2 -> https://notes.arcesium-lab.pipal.in/3-2.html

There are 3 modules with 5 days each.
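As a starting point, the URL pattern can be turned into a list of addresses. This sketch covers only that first step and leaves the downloading to the exercise; the names `BASE_URL` and `notes_urls` are assumptions made for illustration.

```python
BASE_URL = "https://notes.arcesium-lab.pipal.in"


def notes_urls(modules=3, days=5):
    """Build the notes URL for every module/day combination."""
    return [f"{BASE_URL}/{m}-{d}.html"
            for m in range(1, modules + 1)
            for d in range(1, days + 1)]


urls = notes_urls()
print(len(urls))   # 15
print(urls[0])     # https://notes.arcesium-lab.pipal.in/1-1.html
```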