Assignment 08

Solutions to Assignment 08.

Make Slug

Write a function make_slug that takes a title as its argument and converts it into a clean name that can be used as part of a URL. Make sure the slug is in lower case.

The function should replace each occurrence of a non-alphanumeric character with a hyphen (-). When there are multiple such characters in a row, they should be converted to a single hyphen. Also, there should not be any hyphens at the beginning or the end of the slug.

>>> make_slug("Advanced python")
'advanced-python'
>>> make_slug("Hello, World!")
'hello-world'
>>> make_slug("1 + 2 = 3 !")
'1-2-3'
>>> make_slug("https://google.com/")
'https-google-com'

Solution
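A minimal sketch, assuming a regular-expression approach and assuming "alphanumeric" means ASCII letters and digits (so accented letters such as é would also become hyphens). It is one possible approach, not necessarily the intended one.

```python
import re


def make_slug(title):
    """Convert title into a lower-case, hyphen-separated URL slug."""
    # Collapse every run of characters that are not ASCII letters or digits
    # into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title)
    # Drop hyphens left at either end, then lower-case the result.
    return slug.strip("-").lower()
```

Because the pattern uses `+`, a run such as ", " or "://" becomes one hyphen, and `strip("-")` removes the hyphen produced by trailing punctuation like "!" or "/".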

Disk Usage

Get the disk usage of the computer using the df command.

The Unix command df prints the disk usage of every mounted filesystem on the computer.

$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
overlay         81106868 7855408  73235076  10% /
tmpfs              65536       0     65536   0% /dev
shm                65536       0     65536   0% /dev/shm
/dev/vda1       81106868 7855408  73235076  10% /home
tmpfs            2009932       0   2009932   0% /proc/acpi
tmpfs            2009932       0   2009932   0% /proc/scsi
tmpfs            2009932       0   2009932   0% /sys/firmware

Write a function df that invokes the df command, parses its output and returns the result as a list of Python dictionaries, one per filesystem.

>>> df()
[
    {'filesystem': 'overlay', 'blocks': 81106868, 'used': 7855576, 'available': 73234908, 'percent_used': 10, 'path': '/'},
    {'filesystem': 'tmpfs', 'blocks': 65536, 'used': 0, 'available': 65536, 'percent_used': 0, 'path': '/dev'},
    {'filesystem': 'shm', 'blocks': 65536, 'used': 0, 'available': 65536, 'percent_used': 0, 'path': '/dev/shm'},
    {'filesystem': '/dev/vda1', 'blocks': 81106868, 'used': 7855576, 'available': 73234908, 'percent_used': 10, 'path': '/home'},
    {'filesystem': 'tmpfs', 'blocks': 2009932, 'used': 0, 'available': 2009932, 'percent_used': 0, 'path': '/proc/acpi'},
    {'filesystem': 'tmpfs', 'blocks': 2009932, 'used': 0, 'available': 2009932, 'percent_used': 0, 'path': '/proc/scsi'},
    {'filesystem': 'tmpfs', 'blocks': 2009932, 'used': 0, 'available': 2009932, 'percent_used': 0, 'path': '/sys/firmware'}
]

Please note that the values of blocks, used, available and percent_used are integers.

Hints:

You can use the os.popen function to run a command and read its output as if it were a file.

>>> os.popen("seq 3").readlines()
['1\n', '2\n', '3\n']

In the above example, we are executing the command seq 3 and reading the output in Python.
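On Python 3.7 and later, subprocess.run is a commonly recommended alternative to os.popen. A minimal sketch, using a hypothetical helper named run_lines:

```python
import subprocess


def run_lines(cmd):
    """Run cmd (a list of arguments) and return its stdout as a list of lines."""
    # capture_output=True collects stdout/stderr; text=True decodes them to str.
    # check=True raises CalledProcessError if the command exits with a non-zero status.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # splitlines() drops the trailing "\n" that readlines() would keep.
    return result.stdout.splitlines()
```

For example, run_lines(["seq", "3"]) should return ['1', '2', '3'] on a system that has seq.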

Solution

import os


def parse_line(line):
    # maxsplit=5 keeps a mount path that contains spaces in one piece.
    fs, blocks, used, available, percent, path = line.strip().split(None, 5)
    return {
        "filesystem": fs,
        "blocks": int(blocks),
        "used": int(used),
        "available": int(available),
        "percent_used": int(percent.strip("%")),
        "path": path
    }


def df():
    lines = os.popen("df").readlines()
    # The first line is the column header; every other line describes one filesystem.
    return [parse_line(line) for line in lines[1:]]
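Separating the parsing from running the command makes it possible to check the parser against captured text, such as the sample df output above, without depending on the local machine. A sketch with a hypothetical function parse_df_output and a hypothetical SAMPLE string:

```python
def parse_df_output(text):
    """Parse the text printed by df into a list of dicts, skipping the header line."""
    rows = []
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue  # ignore blank lines
        # maxsplit=5 keeps a mount path containing spaces in one piece.
        fs, blocks, used, available, percent, path = line.split(None, 5)
        rows.append({
            "filesystem": fs,
            "blocks": int(blocks),
            "used": int(used),
            "available": int(available),
            "percent_used": int(percent.rstrip("%")),
            "path": path,
        })
    return rows


# The first three lines of the sample df output shown earlier.
SAMPLE = """\
Filesystem     1K-blocks    Used Available Use% Mounted on
overlay         81106868 7855408  73235076  10% /
tmpfs              65536       0     65536   0% /dev
"""
```

With this split, df() could simply return parse_df_output(os.popen("df").read()).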

The dig tool

The dig command is a popular tool for querying DNS name servers. Your task is to expose the dig command in Python as a function dig.

The dig command

Let’s look at the output of the dig command to understand how it works.

$ dig amazon.com

; <<>> DiG 9.18.1-1ubuntu1.2-Ubuntu <<>> amazon.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49697
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;amazon.com.            IN  A

;; ANSWER SECTION:
amazon.com.     128 IN  A   205.251.242.103
amazon.com.     128 IN  A   54.239.28.85
amazon.com.     128 IN  A   52.94.236.248

;; Query time: 16 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Wed Jan 18 13:45:30 IST 2023
;; MSG SIZE  rcvd: 87

The output is too verbose. We can silence everything else and show only the answer section using:

$ dig +noall +answer amazon.com
amazon.com.     128 IN  A   205.251.242.103
amazon.com.     128 IN  A   54.239.28.85
amazon.com.     128 IN  A   52.94.236.248

We used the dig command to query the name server to look up amazon.com. By default, it queries for records of type A, which stands for IPv4 address.

The DNS server responded with three records matching the domain name amazon.com. Each entry contains five fields: the domain name, the TTL (time to live, in seconds), the class, the record type and the content.

See how to read dig output on wizard zines for more details.

Advanced Usage

Record Type

We can optionally pass the record type to dig to query other types of DNS records, such as MX (mail exchange), TXT (text), NS (name server), etc.

The following command queries for records of type MX.

$ dig +noall +answer amazon.com mx
amazon.com.     766 IN  MX  5 amazon-smtp.amazon.com.

For MX records, the content contains the priority (preference value) and the mail server name. There can be more than one entry.
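If the MX content is needed programmatically, it can be split into its two parts. A sketch with hypothetical helpers parse_mx and sort_mx, assuming the content has exactly the two fields shown above:

```python
def parse_mx(content):
    """Split MX content such as '5 amazon-smtp.amazon.com.' into (priority, host)."""
    priority, host = content.split()
    return int(priority), host


def sort_mx(contents):
    """Order MX contents by preference; servers with lower values are tried first."""
    # Tuples sort by their first element, the integer priority.
    return sorted(parse_mx(c) for c in contents)
```

Converting the priority to an integer matters for sorting: as strings, "20" would sort before "5".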

The following command queries for records of type NS.

$ dig +noall +answer amazon.com ns
amazon.com.     1626    IN  NS  pdns6.ultradns.co.uk.
amazon.com.     1626    IN  NS  ns3.p31.dynect.net.
amazon.com.     1626    IN  NS  ns4.p31.dynect.net.
amazon.com.     1626    IN  NS  ns1.p31.dynect.net.
amazon.com.     1626    IN  NS  ns2.p31.dynect.net.
amazon.com.     1626    IN  NS  pdns1.ultradns.net.

Server

By default, the dig command queries the DNS server configured on the system. However, we can explicitly specify a server to query by prefixing it with @.

$ dig +noall +answer amazon.com @8.8.8.8
amazon.com.     128 IN  A   205.251.242.103
amazon.com.     128 IN  A   54.239.28.85
amazon.com.     128 IN  A   52.94.236.248

8.8.8.8 is Google's public DNS server. In normal times the output looks exactly the same, but querying various well-known DNS servers is very handy for troubleshooting DNS issues with your domain.

There are some hobby DNS servers like dns.toys that provide useful utilities over DNS.

$ dig +noall +answer mumbai.weather @dns.toys
mumbai.         1   IN  TXT "Mumbai (IN)" "30.40C (86.72F)" "32.10% hu." "clearsky_day" "14:30, Wed"
mumbai.         1   IN  TXT "Mumbai (IN)" "27.90C (82.22F)" "42.80% hu." "clearsky_day" "16:30, Wed"
mumbai.         1   IN  TXT "Mumbai (IN)" "23.00C (73.40F)" "71.40% hu." "clearsky_night" "18:30, Wed"
mumbai.         1   IN  TXT "Mumbai (IN)" "20.40C (68.72F)" "90.50% hu." "clearsky_night" "20:30, Wed"
mumbai.         1   IN  TXT "Mumbai (IN)" "18.80C (65.84F)" "90.40% hu." "clearsky_night" "22:30, Wed"
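The content of a TXT record is a sequence of double-quoted strings. A sketch of splitting it with the standard shlex module (parse_txt is a hypothetical name); this is an approximation, since shlex does not understand DNS-specific escapes such as \DDD decimal codes.

```python
import shlex


def parse_txt(content):
    """Split TXT record content into its individual quoted strings."""
    # shlex.split removes the surrounding double quotes and keeps the spaces
    # inside each quoted string together.
    return shlex.split(content)
```

For example, the first dns.toys record above yields the five strings 'Mumbai (IN)', '30.40C (86.72F)', '32.10% hu.', 'clearsky_day' and '14:30, Wed'.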

Summary of Usage

The dig command is used as follows:

dig options domain-name record-type @server

The options, record-type and server are optional.
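Because the record type and server are optional, the argument list passed to dig changes from call to call. A sketch of building it with a hypothetical helper build_dig_command; passing a list (rather than a single shell string) to subprocess avoids shell quoting problems.

```python
def build_dig_command(domain, record_type=None, server=None):
    """Build the argument list for: dig +noall +answer domain [record-type] [@server]."""
    cmd = ["dig", "+noall", "+answer", domain]
    if record_type:
        cmd.append(record_type)
    if server:
        # dig expects the server to be prefixed with "@".
        cmd.append("@" + server)
    return cmd
```

For example, build_dig_command("amazon.com", "mx", "8.8.8.8") gives ['dig', '+noall', '+answer', 'amazon.com', 'mx', '@8.8.8.8'].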

The Python API

Your task is to implement a Python function dig that calls the dig command with appropriate arguments, parses the output and returns the result as a Python data structure.

The function takes the domain name as an argument, along with two optional arguments, record_type and server.

Sample Usage:

>>> dig("amazon.com")
[
    {"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "205.251.242.103"},
    {"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "54.239.28.85"},
    {"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "52.94.236.248"}
]

>>> dig("amazon.com", record_type="MX")
[
    {"name": "amazon.com.", "ttl": 498, "class": "IN", "record_type": "MX", "content": "5 amazon-smtp.amazon.com."}
]

>>> dig("amazon.com", record_type="MX", server="8.8.8.8")
[
    {"name": "amazon.com.", "ttl": 498, "class": "IN", "record_type": "MX", "content": "5 amazon-smtp.amazon.com."}
]

>>> dig("mumbai.weather", server="dns.toys")
[
    {"name": "mumbai.", "ttl": 1, "class": "IN", "record_type": "TXT", "content": '"Mumbai (IN)" "30.40C (86.72F)" "32.10% hu." "clearsky_day" "14:30, Wed"'},
    ...
]

Solution

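import subprocess


def parse_record(line):
    # Split into at most 5 fields so that content with spaces (MX, TXT) stays whole.
    name, ttl, klass, record_type, content = line.strip().split(None, 4)
    return {
        "name": name,
        "ttl": int(ttl),
        "class": klass,
        "record_type": record_type,
        "content": content
    }


def dig(name, record_type="A", server=None):
    """Python interface to the dig command.
    """
    cmd = ["dig", "+noall", "+answer", name, record_type]
    if server:
        cmd.append(f"@{server}")
    # subprocess.run waits for dig to finish and collects its output.
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Skip blank lines and lines starting with ";", which dig uses for comments and diagnostics.
    return [parse_record(line) for line in result.stdout.splitlines()
            if line.strip() and not line.startswith(";")]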

Run-length Decoding

Run-length encoding (RLE) is a form of lossless data compression that represents each run of consecutive occurrences of the same value by the run's length and the value. It was used in early graphics file formats for compressing black-and-white images.

If we represent white and black pixels as W and B, the sequence WWWBBW can be run-length encoded as 3W2B1W.

Similarly, the data 12W1B12W would be decoded as a sequence of twelve Ws, one B and twelve Ws.

Write functions encode and decode to run-length encode an image and to decode it back.

>>> encode('WWWBBW')
'3W2B1W'
>>> encode('WWWWWWWWWWWWBWWWWWWWWWWWW')
'12W1B12W'

>>> decode('3W2B1W')
'WWWBBW'
>>> decode('12W1B12W')
'WWWWWWWWWWWWBWWWWWWWWWWWW'

Assume that the data is in the correct format.

Hints

You may be able to use regular expressions. See re.findall.


Solution

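import re


def decode(data):
    # Each pair is a count (one or more digits) followed by a single B or W.
    pairs = re.findall(r"(\d+)([BW])", data)
    return "".join(c * int(count) for count, c in pairs)


def encode(image):
    # Each chunk is a maximal run of Ws or of Bs.
    chunks = re.findall(r"W+|B+", image)
    return "".join(f"{len(x)}{x[0]}" for x in chunks)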
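An alternative sketch that is not limited to W and B: itertools.groupby yields each run of equal symbols directly. It assumes the symbols themselves are never digits, otherwise the encoded counts would be ambiguous.

```python
import re
from itertools import groupby


def encode(data):
    """Run-length encode any string, e.g. 'WWWBBW' -> '3W2B1W'."""
    # groupby yields (symbol, iterator over the run) for each run of equal symbols.
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(data))


def decode(data):
    """Reverse encode(); assumes the encoded symbols are never digits."""
    # Each pair is a count (one or more digits) followed by one non-digit symbol.
    return "".join(symbol * int(count) for count, symbol in re.findall(r"(\d+)(\D)", data))
```

A quick round-trip check: decode(encode(s)) should equal s for any string s without digits.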

Two Column Layout

Write a program twocol.py to arrange text into 2 columns (like you see in newspapers). The program should take a filename as input, paginate the text to show a fixed number of lines per page and arrange it in two columns.

The program takes the name of the file to print as an argument; the number of lines per page and the column width are specified using optional flags. The default number of lines per page is 20 and the default column width is 30.

For every page, there will be a header showing the page number, followed by an empty line. There will also be a bottom margin of 2 empty lines on every page.
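Since each page holds two columns, one page fits 2 × lines-per-page wrapped lines. A sketch of the resulting page count, using a hypothetical helper page_count that takes the number of lines after wrapping:

```python
def page_count(num_lines, lines_per_page):
    """Number of pages needed when each page holds two columns of lines_per_page lines."""
    per_page = 2 * lines_per_page
    # Ceiling division: a partially filled last page still counts as a page.
    return -(-num_lines // per_page)
```

This agrees with the samples: 10 lines with -l5 fit on 1 page, 10 lines with -l3 need 2 pages, and 50 lines with the default of 20 need 2 pages.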

The following is the expected usage of the program.

usage: twocol.py [-h] [-l LINES_PER_PAGE] [-w COLUMN_WIDTH] filename

positional arguments:
  filename              file to format in two columns

options:
  -h, --help            show this help message and exit
  -l LINES_PER_PAGE, --lines-per-page LINES_PER_PAGE
                        number of lines to include per page (default: 20)
  -w COLUMN_WIDTH, --column-width COLUMN_WIDTH
                        maximum number of characters in each column (default: 30)

Three sample files are provided for you to try.

  • files/10.txt - a file with numbers from 1 to 10
  • files/50.txt - a file with numbers from 1 to 50
  • files/tryst.txt - a long text file

Here is the expected output for some sample usage.

$ python twocol.py -w3 -l5 files/10.txt
Page 1

1   | 6
2   | 7
3   | 8
4   | 9
5   | 10


$ python twocol.py -w3 -l3 files/10.txt
Page 1

1   | 4
2   | 5
3   | 6


Page 2

7   | 10
8   |
9   |


$ python twocol.py files/50.txt
Page 1

1                              | 21
2                              | 22
3                              | 23
4                              | 24
5                              | 25
6                              | 26
7                              | 27
8                              | 28
9                              | 29
10                             | 30
11                             | 31
12                             | 32
13                             | 33
14                             | 34
15                             | 35
16                             | 36
17                             | 37
18                             | 38
19                             | 39
20                             | 40


Page 2

41                             |
42                             |
43                             |
44                             |
45                             |
46                             |
47                             |
48                             |
49                             |
50                             |

$ python twocol.py files/tryst.txt
Page 1

Long years ago... we made a    | pledge of dedication to the
tryst with destiny, and now    | service of India and her
the time comes when we shall   | people and to the still larger
redeem our pledge, not wholly  | cause of humanity.
or in full measure, but very   |
substantially.                 | At the dawn of history India
                               | started on her unending quest,
At the stroke of the midnight  | and trackless centuries are
hour, when the world sleeps,   | filled with her striving and
India will awake to life and   | the grandeur of her success
freedom. A moment comes, which | and her failures. Through good
comes, but rarely in history,  | and ill fortune alike she has
when we step out from the old  | never lost sight of that quest
to the new, when an age ends,  | or forgotten the ideals which
and when the soul of a nation, | gave her strength. We end
long suppressed, finds         | today a period of ill fortune
utterance.                     | and India discovers herself
                               | again.
It is fitting that at this     |
solemn moment we take the      | The achievement we celebrate


Page 2

today is but a step, an        | continue even now.
opening of opportunity, to the | Nevertheless, the past is over
greater triumphs and           | and it is the future that
achievements that await us.    | beckons to us now.
Are we brave enough and wise   |
enough to grasp this           | That future is not one of ease
opportunity and accept the     | or resting but of incessant
challenge of the future?       | striving so that we may fulfil
                               | the pledges we have so often
Freedom and power bring        | taken and the one we shall
responsibility. The            | take today. The service of
responsibility rests upon this | India means the service of the
Assembly, a sovereign body     | millions who suffer. It means
representing the sovereign     | the ending of poverty and
people of India. Before the    | ignorance and disease and
birth of freedom we have       | inequality of opportunity.
endured all the pains of       |
labour and our hearts are      | The ambition of the greatest
heavy with the memory of this  | man of our generation has been
sorrow. Some of those pains    | to wipe every tear from every


Page 3

eye. That may be beyond us,    | that can no longer be split
but as long as there are tears | into isolated fragments.
and suffering, so long our
work will not be over.

And so we have to labour and
to work, and work hard, to
give reality to our dreams.
Those dreams are for India,
but they are also for the
world, for all the nations and
peoples are too closely knit
together today for anyone of
them to imagine that it can
live apart.

Peace has been said to be
indivisible; so is freedom, so
is prosperity now, and so also
is disaster in this one world

Hint: You may want to use the textwrap module for wrapping the text, although doing it without that module is not very hard.
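A small illustration of textwrap.wrap, which breaks text at whitespace into a list of lines no longer than the given width (the text here is taken from the opening of files/tryst.txt as shown in the sample output):

```python
import textwrap

text = ("Long years ago... we made a tryst with destiny, and now the time "
        "comes when we shall redeem our pledge")

# Each element of lines is at most 30 characters long.
lines = textwrap.wrap(text, width=30)
for line in lines:
    print(line)
```

The first three wrapped lines match the left column of the sample output: "Long years ago... we made a", "tryst with destiny, and now" and "the time comes when we shall".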

Solution

import argparse
import textwrap


def paginate(text, width, rows):
    """Wrap text to the given width and split it into pages of rows lines."""
    lines = []
    for line in text.splitlines():
        if not line.strip():
            # Keep blank lines so paragraphs stay separated.
            lines.append("")
        chunk = textwrap.wrap(line, width)
        lines.extend(chunk)
    return [lines[i:i+rows] for i in range(0, len(lines), rows)]


def twocolumn(lines, width, rows):
    """Lay out up to 2*rows lines as rows lines of two side-by-side columns."""
    # Pad with empty strings so the right column always has rows entries.
    lines = lines + [''] * rows
    a = lines[:rows]
    b = lines[rows:]

    lines = []
    for left, right in zip(a, b):
        left = left.ljust(width)
        right = right.ljust(width)
        line = left + " | " + right
        lines.append(line)
    return lines


def print_page(page, index):
    print("Page", index)
    print()
    for line in page:
        print(line)
    # Bottom margin of two empty lines.
    print()
    print()


def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("-l", "--lines-per-page", type=int, default=20, help="number of lines to include per page (default: %(default)d)")
    p.add_argument("-w", "--column-width", type=int, default=30, help="maximum number of characters in each column (default: %(default)d)")
    p.add_argument("filename", help="file to format in two columns")
    return p.parse_args()


def main():
    args = parse_args()
    width = args.column_width
    rows = args.lines_per_page
    with open(args.filename) as f:
        text = f.read()
    # Each page holds two columns, i.e. 2 * rows wrapped lines.
    pages = paginate(text, width=width, rows=2*rows)

    for i, p in enumerate(pages):
        p2 = twocolumn(p, width=width, rows=rows)
        print_page(p2, index=i+1)


if __name__ == "__main__":
    main()