Python Virtual Training For Arcesium - Module III - Day 4

Mar 13-17, 2023 Vikrant Patil

All notes are available online at https://notes.pipal.in/2023/arcesium_finop_jan/

Please login to https://engage.pipal.in/ and launch jupyter lab

For today create a notebook with name module3-day4

Notebook names are case sensitive, so make sure you type the name exactly as shown.

© Pipal Academy LLP

In [1]:
%%file download_with_selenium.py
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By

driver = webdriver.Firefox() # launch a browser

driver.get("http://www.python.org") # go to this url

assert "Python" in driver.title # make sure the site is loaded
time.sleep(5)

elem = driver.find_element(By.NAME, "q")

elem.clear()
elem.send_keys("pandas") # type into the search box; username and password fields are filled the same way
elem.send_keys(Keys.RETURN)
time.sleep(10)
assert "No results found." not in driver.page_source
driver.close()
Overwriting download_with_selenium.py

Debugging strategies

In [2]:
!pip install pandas
Requirement already satisfied: pandas in /home/vikrant/usr/local/default/lib/python3.10/site-packages (1.5.2)
Requirement already satisfied: python-dateutil>=2.8.1 in /home/vikrant/usr/local/default/lib/python3.10/site-packages (from pandas) (2.8.2)
Requirement already satisfied: numpy>=1.21.0 in /home/vikrant/usr/local/default/lib/python3.10/site-packages (from pandas) (1.23.5)
Requirement already satisfied: pytz>=2020.1 in /home/vikrant/usr/local/default/lib/python3.10/site-packages (from pandas) (2022.6)
Requirement already satisfied: six>=1.5 in /home/vikrant/usr/local/default/lib/python3.10/site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)

Installing dependencies in system python

If the python command is available in your command prompt (cmd):

python -m pip install selenium pandas # this will make sure that these packages will be available in system IDLE
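
To confirm which interpreter a command actually uses, and whether the packages are visible to it, a small check like the following can help. This is only a minimal sketch; the helper name `is_importable` is hypothetical and not part of any library.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the running interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Path of the interpreter running this code. Packages installed with
# `python -m pip install ...` go into whichever interpreter `python` refers to.
print(sys.executable)

for name in ["selenium", "pandas"]:
    status = "available" if is_importable(name) else "missing"
    print(name, status)
```

Running the same snippet in IDLE and in the notebook shows whether the two are using the same Python installation.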

Giving your code to others

  1. Make a folder
    • scripts (myprogram.py)
    • requirements.txt
    • anything additional like geckodriver
    • README.txt
  2. Make a Python package
In [7]:
%%file download_with_selenium.py
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import typer

def search(keyword: str):
    """Load python.org in a browser and search for the given keyword.
    """
    driver = webdriver.Firefox() # launch a browser

    driver.get("http://www.python.org") # go to this url

    assert "Python" in driver.title # make sure the site is loaded
    time.sleep(5)

    elem = driver.find_element(By.NAME, "q")

    elem.clear()
    elem.send_keys(keyword) # type into the search box; username and password fields are filled the same way
    elem.send_keys(Keys.RETURN)
    time.sleep(10)
    assert "No results found." not in driver.page_source
    driver.close()
    
    
if __name__ == "__main__":
    typer.run(search)
Overwriting download_with_selenium.py
In [5]:
%%file requirements.txt
selenium
typer
Overwriting requirements.txt
In [6]:
%%file README.txt
1. Create a virtual environment and install the packages listed in requirements.txt into it.
2. Activate the environment and run the program from the command line, for example: python download_with_selenium.py pandas
Writing README.txt
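
Before handing the folder over, it can be useful to check that everything named in requirements.txt is really installed in the environment. The sketch below assumes the file contains bare package names, one per line, with no version specifiers; the function name `missing_requirements` is hypothetical.

```python
from importlib import metadata
from pathlib import Path


def missing_requirements(path="requirements.txt"):
    """Return the names listed in the file that are not installed."""
    missing = []
    for raw in Path(path).read_text().splitlines():
        name = raw.strip()
        if not name or name.startswith("#"):
            continue  # skip blank lines and comments
        try:
            metadata.version(name)  # raises if the distribution is not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Only run the check when a requirements.txt exists in the current folder.
if Path("requirements.txt").exists():
    print(missing_requirements())
```

An empty list means every listed package was found in the active environment.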

Making a python package

search_on_pythonorg
  |
  |--setup.py
  |--requirements.txt
  +-A
    |
    |--__init__.py
    |-- download_with_selenium.py
    +-B
      |
      |-- __init__.py
      |-- stats.py
  • create a folder search_on_pythonorg
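
The `%%file` cells that follow write into `search_on_pythonorg`, `search_on_pythonorg/A` and `search_on_pythonorg/A/B`, so those folders have to exist first. One way to create them from the notebook is sketched here; the helper name `create_package_dirs` is hypothetical.

```python
from pathlib import Path


def create_package_dirs(base="."):
    """Create search_on_pythonorg/A/B under base and return the root folder."""
    root = Path(base) / "search_on_pythonorg"
    # parents=True also creates search_on_pythonorg and A;
    # exist_ok=True makes the call safe to repeat.
    (root / "A" / "B").mkdir(parents=True, exist_ok=True)
    return root


root = create_package_dirs()
# List the sub-folders that now exist under the package root.
print(sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_dir()))
```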
In [14]:
%%file search_on_pythonorg/setup.py
from setuptools import setup  # distutils ignores install_requires and was removed in Python 3.12

setup(
    name = "search_on_pythonorg",
    version = "1.0",
    description = "A sample package to demonstrate python packaging",
    author = "Vikrant",
    author_email = "vikrant@python.training.org",
    url = "https://searchonpythonorg.com",
    packages = ['A',"A.B"],
    install_requires = [
        'selenium',
        'typer',
    ]
)
Overwriting search_on_pythonorg/setup.py
In [9]:
%%file search_on_pythonorg/A/__init__.py
Writing search_on_pythonorg/A/__init__.py
In [10]:
%%file search_on_pythonorg/A/download_with_selenium.py
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import typer

def search(keyword: str):
    """Load python.org in a browser and search for the given keyword.
    """
    driver = webdriver.Firefox() # launch a browser

    driver.get("http://www.python.org") # go to this url

    assert "Python" in driver.title # make sure the site is loaded
    time.sleep(5)

    elem = driver.find_element(By.NAME, "q")

    elem.clear()
    elem.send_keys(keyword) # type into the search box; username and password fields are filled the same way
    elem.send_keys(Keys.RETURN)
    time.sleep(10)
    assert "No results found." not in driver.page_source
    driver.close()
    
    
if __name__ == "__main__":
    typer.run(search)
Writing search_on_pythonorg/A/download_with_selenium.py
In [11]:
%%file search_on_pythonorg/A/B/__init__.py
Writing search_on_pythonorg/A/B/__init__.py
In [12]:
%%file search_on_pythonorg/A/B/stats.py

def mean(nums):
    pass


def std(nums):
    pass


def median(nums):
    pass
Writing search_on_pythonorg/A/B/stats.py
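
The functions in stats.py are left as placeholders, since the point here is the package layout. If you want the module to do something, one possible way to fill them in is sketched below. It assumes non-empty lists of numbers and uses the population standard deviation (dividing by n); these choices are assumptions, not part of the original notes.

```python
import math


def mean(nums):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(nums) / len(nums)


def std(nums):
    """Population standard deviation (divides by n, not n - 1)."""
    m = mean(nums)
    return math.sqrt(sum((x - m) ** 2 for x in nums) / len(nums))


def median(nums):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(nums)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(mean([1, 2, 3, 4]), median([1, 2, 3, 4]), std([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.5 2.5 2.0
```

Once the package is installed, these would be reachable as `from A.B import stats`.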

Regular expressions

In [15]:
import re
In [16]:
re.compile("..")  # "." matches any single character, so ".." matches any two characters
re.compile("^")   # start of line
re.compile("$")   # end of line
Out[16]:
re.compile(r'$', re.UNICODE)
In [17]:
lines = """jhdksjah
kjhfkjdshf

kjhkhdsf
wqiuen,wqeiwq,iuwyewq
23kliuasj


"""
In [18]:
empty_line = re.compile("^$")
In [19]:
count = 0
for line in lines.split("\n"):
    if empty_line.match(line):
        count += 1
        
In [20]:
count
Out[20]:
4
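
The text above has only three visibly blank lines, yet the count is 4. The extra one comes from `split("\n")`, which returns an empty string after the final newline. A small sketch on a shorter string (the string `text` is made up for illustration) shows the difference between `split("\n")` and `splitlines()`:

```python
import re

empty_line = re.compile("^$")
text = "a\n\nb\n"

print(text.split("\n"))    # ['a', '', 'b', '']
print(text.splitlines())   # ['a', '', 'b']

# Count the empty lines produced by each way of splitting.
count_split = sum(1 for line in text.split("\n") if empty_line.match(line))
count_lines = sum(1 for line in text.splitlines() if empty_line.match(line))
print(count_split, count_lines)  # 2 1
```

Use `splitlines()` when the trailing newline should not count as an extra empty line.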
In [42]:
lines = """jhdksjah
kjhfkjdshf

kjhkhdsf
wqiuen,wqeiwq,iuwyewq
23kliuasj


23

fg

fgfhhg

"""
In [23]:
two_char = re.compile("^..$")
for line in lines.split("\n"):
    if two_char.match(line):
        print(line)        
23
fg
In [32]:
lines = """jhdksjah
kjhfkjdshf

kjhkhdsf
wqiuen,wqeiwq,iuwyewq
23kliuasj


23

fg34

fgfhhg

"""
In [35]:
two_digits = re.compile(r"\d\d") # two digits; with match() the line must start with them
In [34]:
for line in lines.split("\n"):
    if two_digits.match(line):
        print(line)
23kliuasj
23
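
The line `fg34` is not printed because `match` only tries the pattern at the start of the string. To find the pattern anywhere in the string, `search` can be used instead. A minimal sketch of the difference:

```python
import re

two_digits = re.compile(r"\d\d")

# match() succeeds only if the pattern matches at the start of the string.
print(two_digits.match("fg34"))            # None

# search() scans the string and returns the first place the pattern matches.
print(two_digits.search("fg34").group())   # 34
```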
In [36]:
one_or_more_digits = re.compile(r".*\d+") # + means one or more, * means zero or more; the leading .* lets the digits appear anywhere in the line
In [37]:
for line in lines.split("\n"):
    if one_or_more_digits.match(line):
        print(line)
23kliuasj
23
fg34
In [39]:
lines = """fkjdslkf
kdsjflkdsjfAsdlk
llk lkjsds kjlk
ljflkjf


Total = Rs 222.0
"""
In [40]:
search_total = re.compile(r"Total *= *Rs +\d+\.\d+")
In [41]:
for line in lines.split("\n"):
    if search_total.match(line):
        print(line)
Total = Rs 222.0
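
Usually the goal is to pull the amount out of the line, not just to find the line. Wrapping part of the pattern in parentheses creates a group that can be read back from the match object. A small sketch, where the name `total_pattern` is chosen here for illustration:

```python
import re

# Same pattern as above, with the number wrapped in a capturing group.
total_pattern = re.compile(r"Total *= *Rs +(\d+\.\d+)")

m = total_pattern.match("Total = Rs 222.0")
if m:
    amount = float(m.group(1))  # group(1) is the text matched inside the parentheses
    print(amount)  # 222.0
```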

Meta characters and their meanings in a regular expression pattern:

  • ^ start of line
  • $ end of line
  • . any char
  • \d digit
  • + one or more occurrences of the previous char/pattern
  • * zero or more occurrences of the previous char/pattern
  • ? zero or one occurrence of the previous char/pattern
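
The `?` character is the only one in this list not used in the examples above. A short sketch with made-up words shows it making the previous character optional:

```python
import re

# "u?" means the letter u may appear once or not at all.
optional_u = re.compile(r"^colou?r$")

words = ["color", "colour", "colouur", "colr"]
matched = [w for w in words if optional_u.match(w)]
print(matched)  # ['color', 'colour']
```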