Preface

Problem Set 0

Solutions to all problems in Problem Set 0.

Problem 1.1

Print Five

Write a python program to print numbers from 1 to 5.

Solution

print(1)
print(2)
print(3)
print(4)
print(5)

Discussion

This problem lets participants explore the interface for solving and submitting problems.

The problem as such is very simple. You just need to print the numbers 1 to 5, with multiple print statements.

print(1)
print(2)
print(3)
print(4)
print(5)

If you know how to use a for loop in Python, you can do that in a loop by providing the numbers 1, 2, 3, 4, and 5 as a list.

for n in [1, 2, 3, 4, 5]:
    print(n)

Or you could use the range function to create the sequence of numbers instead of creating the list manually.

for n in range(1, 6):
    print(n)

The call range(n) gives n numbers from 0 to n-1. If you want numbers from 1 to n, you need to use range(1, n+1).
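
As a quick check of the two forms of range described above, this small sketch (the variable names are illustrative only) converts each range to a list so that the numbers are visible:

```python
n = 5

# range(n) counts from 0 up to n-1.
zero_based = list(range(n))

# range(1, n+1) counts from 1 up to n.
one_based = list(range(1, n + 1))

print(zero_based)  # [0, 1, 2, 3, 4]
print(one_based)   # [1, 2, 3, 4, 5]
```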

Problem 1.2

Product of Numbers

Compute the product of the list of numbers mentioned on the top of the program using a for loop and print the result.

Solution

# Do not change this line
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Compute the product of the list of numbers mentioned above using
# a for loop and print the result.
# Your code below this line

result = 1
for n in numbers:
    result *= n

print(result)

Problem 1.3

Product of Even Numbers

Compute the product of even numbers among the list of numbers mentioned on the top of the program and print the result.

You can check whether a number n is even by testing n % 2 == 0. For example:

n = 10
if n % 2 == 0:
    print("n is even")
else:
    print("n is odd")

Solution

# Do not change this line
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Compute the product of even numbers among the list of numbers
# mentioned above and print the result.
# Your code below this line

result = 1
for n in numbers:
    if n % 2 == 0:
        result *= n

print(result)

Problem 1.4

Square

Write a function square to compute the square of a number.

>>> square(4)
16

Solution

def square(n):
    return n*n

Problem 1.5

Mean

Write a function mean to compute the arithmetic mean of a list of numbers.

The arithmetic mean of N numbers is computed by adding all the N numbers and dividing the total by N.

The function takes the list of numbers as argument and returns the mean.

>>> mean([1, 2, 3, 4, 5])
3.0

Solution

def mean(numbers):
    return sum(numbers)/len(numbers)

Discussion

The solution is quite straightforward. We use the built-in function sum to compute the total and divide it by the count of numbers, which is computed using the built-in function len.

def mean(numbers):
    return sum(numbers) / len(numbers)

You could also write that in multiple lines, if you find that more comfortable.

def mean(numbers):
    total = sum(numbers)
    n = len(numbers)
    return total / n

Problem Set 1

Solutions to all problems in Problem Set 1.

Problem 2.1

Add Two Numbers

Write a program add.py that takes two numbers as command-line arguments and prints their sum.

$ python add.py 3 4
7
$ python add.py 10 20
30

Solution

import sys

a = int(sys.argv[1])
b = int(sys.argv[2])
print(a+b)
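
Command-line arguments always arrive as strings, which is why the solution calls int before adding. A minimal sketch, using a hypothetical list args in place of sys.argv[1:], shows the difference:

```python
# Hypothetical stand-in for sys.argv[1:] when running: python add.py 3 4
args = ["3", "4"]

# Adding the raw strings concatenates them.
as_text = args[0] + args[1]

# Converting to int first gives the arithmetic sum.
as_numbers = int(args[0]) + int(args[1])

print(as_text)     # 34
print(as_numbers)  # 7
```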

Problem 2.2

Make Header

Write a program header.py that takes a word as a command-line argument and prints it as a header, as shown below, with the word converted to upper case.

$ python header.py python
======
PYTHON
======

$ python header.py python-foundation-course
========================
PYTHON-FOUNDATION-COURSE
========================

Hint:

>>> name = 'python'
>>> name.upper()
'PYTHON'

Solution

import sys

word = sys.argv[1].upper()
n = len(word)
print("=" * n)
print(word)
print("=" * n)

Discussion

The program gets the word to print as a command-line argument, so we need to use the sys module to access the arguments.

import sys

The word will be the first command-line argument and we need to convert it into upper case.

word = sys.argv[1].upper()

Now we need to print a line above and below the word. The line is made of = characters, and we use as many of them as there are characters in the word, so that the line aligns with the word. For that we need to find the length of the word.

n = len(word)

Now, let's print the line above the word.

print("=" * n)

Followed by the word itself.

print(word)

And the line below the word.

print("=" * n)
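
The steps above can also be collected into a function that returns the header as a string, which makes it easy to try without command-line arguments. This is only a sketch; the name make_header is made up for illustration and is not part of the problem.

```python
def make_header(word):
    """Return the three-line header for word as a single string."""
    word = word.upper()
    # One = character for every character in the word.
    line = "=" * len(word)
    return "\n".join([line, word, line])

print(make_header("python"))
```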

Problem 2.3

Text in a Box

Write a program box.py that takes a word as a command-line argument and prints the word in a box as shown below.

$ python box.py python
+--------+
| python |
+--------+

Please note that there should be exactly one space on either side of the text in the box.

Solution

import sys

def box(word):
    n = len(word) + 2
    line = "-" * n
    header = f"+{line}+"
    print(header)
    print(f"| {word} |")
    print(header)

word = sys.argv[1]
box(word)

Problem 2.4

Upper Case

Write a program uppercase.py that takes a filename as a command-line argument and prints all the contents of the file in uppercase.

$ cat files/five.txt
One
Two
Three
Four
Five

$ python uppercase.py files/five.txt
ONE
TWO
THREE
FOUR
FIVE

Solution

import sys

path = sys.argv[1]
contents = open(path).read()
print(contents.upper())

Problem 2.5

Sort a file

Write a program sort.py that takes a filename as a command-line argument and prints the lines in that file in sorted order.

There are two sample files files/names.txt and files/blake.txt to try your program against.

$ cat files/names.txt
bob
dave
charlie
alice

$ python sort.py files/names.txt
alice
bob
charlie
dave


$ cat files/blake.txt
Piping down the valleys wild 
Piping songs of pleasant glee 
On a cloud I saw a child. 
And he laughing said to me. 

Pipe a song about a Lamb; 
So I piped with merry chear, 
Piper pipe that song again— 
So I piped, he wept to hear.

$ python sort.py files/blake.txt

And he laughing said to me.
On a cloud I saw a child.
Pipe a song about a Lamb;
Piper pipe that song again—
Piping down the valleys wild
Piping songs of pleasant glee
So I piped with merry chear,
So I piped, he wept to hear.

Solution

import sys
path = sys.argv[1]

lines = open(path).readlines()

for line in sorted(lines):
    print(line, end="")
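
The blank line in files/blake.txt comes first in the sorted output because a line holding only a newline compares smaller than any line that starts with a letter. A small sketch, with made-up lines in place of the file contents:

```python
# Made-up lines standing in for the result of readlines().
lines = ["bob\n", "\n", "alice\n"]

# The newline character sorts before every letter.
ordered = sorted(lines)
print(ordered)  # ['\n', 'alice\n', 'bob\n']
```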

Problem 2.6

Reverse Lines

Write a program reverse.py that takes a filename as a command-line argument and prints all the lines in that file in reverse order.

$ cat files/five.txt
one
two
three
four
five and the last line!

$ python reverse.py files/five.txt
five and the last line!
four
three
two
one

Solution

import sys

filename = sys.argv[1]

lines = open(filename).readlines()[::-1]
for line in lines:
    print(line, end="")

Problem Set 2

Solutions to all problems in Problem Set 2.

Problem 3.1

Sum File

Write a program sumfile.py that takes a filename as argument and prints the sum of all numbers in the file. It is assumed that the file contains one number per line.

$ python sumfile.py files/ten.txt
55

Solution

import sys
filename = sys.argv[1]
numbers = [int(line) for line in open(filename)]
print(sum(numbers))

Problem 3.2

Grep Command

Implement the Unix command grep in Python.

Write a program grep.py that takes a pattern and a file as command-line arguments and prints all the lines in the file that contain that pattern.

The pattern could be any text and there is no need to support regular expressions.

$ cat files/zen.txt
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

$ python grep.py never files/zen.txt
Errors should never pass silently.
Now is better than never.
Although never is often better than *right* now.

$ python grep.py the files/zen.txt
Special cases aren't special enough to break the rules.
In the face of ambiguity, refuse the temptation to guess.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.

Solution

import sys

pat = sys.argv[1]
filename = sys.argv[2]

for line in open(filename):
    if pat in line:
        print(line, end="")

Problem 3.3

Center Align

Write a program center_align.py to center align all lines in the given file.

$ cat poem.txt
There was an Old Man with a beard
Who said, "It is just as I feared!
Two Owls and a Hen,
Four Larks and a Wren,
Have all built their nests in my beard!"

$ python center_align.py poem.txt
   There was an Old Man with a beard
   Who said, "It is just as I feared!
          Two Owls and a Hen,
         Four Larks and a Wren,
Have all built their nests in my beard!"

Hint:

>>> "hello".center(7)
' hello '
>>> "hello".center(9)
'  hello  '

The built-in function max can take a list of numbers and return the maximum value out of them.

>>> max([1, 5, 2, 7, 4])
7

Solution

import sys
filename = sys.argv[1]
lines = open(filename).readlines()
if lines:
    n = max(len(line.strip()) for line in lines)

    for line in lines:
        print(line.strip().center(n))
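
The same two steps, finding the longest line and centering every line to that width, can be tried on a list of strings. This is a minimal sketch with made-up lines in place of the file:

```python
# Made-up lines standing in for the result of readlines().
lines = ["ab\n", "abcdef\n", "abcd\n"]

# Width of the longest line, ignoring the trailing newline.
width = max(len(line.strip()) for line in lines)

# Every line is padded on both sides up to that width.
centered = [line.strip().center(width) for line in lines]

print(width)     # 6
print(centered)  # ['  ab  ', 'abcdef', ' abcd ']
```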

Problem 3.4

Random DNA Sequence

Write a function random_dna_sequence to generate a DNA sequence of given length by taking nucleotides selected at random.

>>> random_dna_sequence(4)
'TCAC'

>>> random_dna_sequence(10)
'GCTGCTGGCA'

Hint: Use the random.choice function from the random module, which selects a random element from a list or string.

>>> random.choice("ABCD")
'B'
>>> random.choice("ABCD")
'C'
>>> random.choice("ABCD")
'C'
>>> random.choice("ABCD")
'D'

Solution

import random

def random_dna_sequence(n):
    return "".join(random.choice("ATGC") for i in range(n))
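
Because the output is random, the examples in the problem statement will not reproduce exactly. One way to check the function, sketched here under the assumption that only the length and the alphabet matter, is to test those two properties:

```python
import random

def random_dna_sequence(n):
    return "".join(random.choice("ATGC") for i in range(n))

seq = random_dna_sequence(10)
print(seq)

# The length matches the request and every character is a nucleotide.
print(len(seq) == 10)
print(set(seq) <= set("ATGC"))
```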

Problem Set 3

Solutions to all problems in Problem Set 3.

Problem 4.1

Group

Write a function group that takes a list of values and splits it into smaller lists of the given size.

>>> group([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

>>> group([1, 2, 3, 4, 5, 6, 7, 8, 9], 4)
[[1, 2, 3, 4], [5, 6, 7, 8], [9]]

Solution

def group(values, n):
    return [values[i:i+n] for i in range(0, len(values), n)]
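
To see why the one-line solution works, it may help to look at the start indices that range produces and the slices taken from them. A short sketch using the second example from the problem:

```python
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
n = 4

# Start index of each chunk: every n-th position.
starts = list(range(0, len(values), n))

# Slicing past the end is safe, so the last chunk is simply shorter.
chunks = [values[i:i+n] for i in starts]

print(starts)  # [0, 4, 8]
print(chunks)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```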

Problem 4.2

Sum of Arguments

Write a program sum.py that takes one or more numbers as command-line arguments and prints their sum.

$ python sum.py 1 2 3 4 5
15

Solution

import sys

args = sys.argv[1:]
numbers = [int(a) for a in args]
print(sum(numbers))

Problem 4.3

Paste

Write a program paste.py that takes two files as command-line arguments, concatenates the corresponding lines in those two files with a tab character, and prints the result.

For example, if the first file files/a.txt has the following contents:

A
B
C
D

and the second file files/b.txt has the following:

1
2
3
4

The output should be:

$ python paste.py files/a.txt files/b.txt
A       1
B       2
C       3
D       4

Note that the first line is "A\t1".

For simplicity, assume that both files have exactly the same number of lines.

Hint:

You can use the strip method on a string to remove the new line character.

>>> "a\n".strip("\n")
'a'

Solution

import sys

f1 = sys.argv[1]
f2 = sys.argv[2]

for left, right in zip(open(f1), open(f2)):
    left = left.strip("\n")
    right = right.strip("\n")
    print(f"{left}\t{right}")
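
The pairing done by zip can be tried on two lists of strings. This is a minimal sketch with made-up lines in place of the two open files:

```python
# Made-up lines standing in for the two open files.
left_lines = ["A\n", "B\n"]
right_lines = ["1\n", "2\n"]

pasted = []
# zip yields the first lines together, then the second lines, and so on.
for left, right in zip(left_lines, right_lines):
    pasted.append(left.strip("\n") + "\t" + right.strip("\n"))

print(pasted)  # ['A\t1', 'B\t2']
```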

Problem Set 4

Solutions to all problems in Problem Set 4.

Problem 5.1

Sequence of Numbers

Write a program seq.py that takes a number n as argument and prints numbers from 1 to n. It should support the following two flags.

-s START --start START
    print numbers from START to n instead of 1 to n

-r --reverse
    print the numbers in the reverse order

The program should also print an appropriate help message when used with the -h or --help flags.

Use the standard library module argparse for doing this. You may want to check out the argparse tutorial to learn how to use that module.

Expected Output:

$ python seq.py 5
1
2
3
4
5

$ python seq.py -s 3 5
3
4
5

$ python seq.py -r -s 3 5
5
4
3

Solution

import argparse

p = argparse.ArgumentParser()
p.add_argument("n", type=int)
p.add_argument("-s", "--start", type=int, default=1)
p.add_argument("-r", "--reverse", action="store_true", default=False)
args = p.parse_args()

numbers = list(range(args.start, args.n+1))
if args.reverse:
    numbers = numbers[::-1]

for n in numbers:
    print(n)
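
The parser can be tried without a terminal by passing the arguments to parse_args as a list. The sketch below reuses the parser from the solution; the argument lists are just examples.

```python
import argparse

p = argparse.ArgumentParser()
p.add_argument("n", type=int)
p.add_argument("-s", "--start", type=int, default=1)
p.add_argument("-r", "--reverse", action="store_true", default=False)

# Equivalent to running: python seq.py -r -s 3 5
args = p.parse_args(["-r", "-s", "3", "5"])
print(args.n, args.start, args.reverse)  # 5 3 True

# Flags that are not given fall back to their defaults.
defaults = p.parse_args(["5"])
print(defaults.start, defaults.reverse)  # 1 False
```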

Problem 5.2

Skip lines in a file

Write a program skip.py to print the contents of a file after skipping the first few lines.

The program takes an optional flag -n to indicate the number of lines to skip, which defaults to 5 when not specified.

The program takes a filename as argument and prints the contents of this file after skipping the number of lines specified by -n.

$ python skip.py files/ten.txt
6
7
8
9
10

$ python skip.py -n 8 files/ten.txt
9
10

Hint: Use argparse module.

Solution

import argparse

p = argparse.ArgumentParser()
p.add_argument("-n", type=int, default=5, help="number of lines to skip")
p.add_argument("filename", help="name of the file to consider")
args = p.parse_args()

lines = open(args.filename).readlines()

for line in lines[args.n:]:
    print(line, end="")

Problem 5.3

Split a File

Write a program split.py that splits a large file into multiple smaller files. The program should take a filename and the number of lines as arguments and write multiple small files, each containing the specified number of lines (the last one may have fewer lines).

$ python split.py files/100.txt 30
writing files/100-part1.txt
writing files/100-part2.txt
writing files/100-part3.txt
writing files/100-part4.txt

Solution

"""Program to split a file into smaller parts.

It takes a filename and the number of lines in each part as
command-line arguments and splits the file into smaller parts
with each file having no more than the specified number of lines.

USAGE:
    $ python split.py large-file.txt 100
    writing large-file-part1.txt
    writing large-file-part2.txt
    ...
"""
import sys

def group(values, n):
    return [values[i:i+n] for i in range(0, len(values), n)]

def splitfile(filename, chunk_size):
    lines = open(filename).readlines()
    return group(lines, chunk_size)

def write_lines(filename, lines):
    """Write a list of lines to a file.
    """
    print("writing", filename)
    with open(filename, "w") as f:
        f.writelines(lines)

def generate_part_filename(filename, index):
    """Generates a new filename by adding index as suffix to the filename.

        >>> generate_part_filename("a.txt", 1)
        'a-part1.txt'
    """
    nameparts = filename.split(".", 1)
    if len(nameparts) == 2:
        name, ext = nameparts
        ext = "." + ext
    else:
        name = filename
        ext = ""

    return f"{name}-part{index}{ext}"

def write_small_files(filename, file_chunks):
    for i, chunk in enumerate(file_chunks, start=1):
        new_filename = generate_part_filename(filename, i)
        write_lines(new_filename, chunk)

def main():
    filename = sys.argv[1]
    chunk_size = int(sys.argv[2])
    file_chunks = splitfile(filename, chunk_size)
    write_small_files(filename, file_chunks)

if __name__ == "__main__":
    main()

Problem 5.4

Longest Argument

Write a program longest.py that takes one or more words as command-line arguments and prints the longest word.

$ python longest.py joy of programming
programming

$ python longest.py this too shall pass
shall

Hint: You can use list slicing to get all the arguments. For example sys.argv[1:] will give you all arguments other than the program name.

Solution

import sys
args = sys.argv[1:]
print(max(args, key=len))
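
The key argument tells max to compare the words by their length instead of alphabetically. A small sketch, with a list of words in place of sys.argv[1:]:

```python
words = ["this", "too", "shall", "pass"]
longest = max(words, key=len)
print(longest)  # shall

# When several words share the greatest length, max returns the first one.
tied = ["joy", "fun", "sun"]
first_of_tied = max(tied, key=len)
print(first_of_tied)  # joy
```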

Problem Set 5

Solutions to all problems in Problem Set 5.

Problem 6.1

Find Duplicates

Write a function dups that takes a list of values as argument and finds all the elements that appear more than once in the list.

>>> dups([1, 2, 1, 3, 2, 5])
[1, 2]
>>> dups([1, 2, 3, 4, 5])
[]
>>> dups([1, 1, 1, 1])
[1]

Solution

def dups(numbers):
    seen = []
    result = []
    for n in numbers:
        if n in seen and n not in result:
            result.append(n)
        seen.append(n)
    return result

Problem 6.2

Count File Extensions

Write a program extcount.py to count the number of files per extension. The program should take the path to a directory as argument and print the count and extension for each available extension, sorted by the count. Files without any extension should be ignored.

$ python extcount.py files
4 py
3 txt
2 csv
1 yml

$ python extcount.py files/data
2 txt
1 csv

Solution

import sys
import os
from collections import Counter

path = sys.argv[1]
filenames = os.listdir(path)
extensions = [f.split(".")[-1] for f in filenames if "." in f]

for ext, count in Counter(extensions).most_common():
    print(count, ext)
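
The counting step can be tried on a plain list of names. This is a minimal sketch; the filenames are made up and stand in for the result of os.listdir.

```python
from collections import Counter

# Made-up filenames standing in for os.listdir(path).
filenames = ["a.py", "b.py", "notes.txt", "README", "data.tar.gz"]

# Keep the text after the last dot; names without a dot are skipped.
extensions = [f.split(".")[-1] for f in filenames if "." in f]

# most_common lists the extensions from the highest count to the lowest.
counts = Counter(extensions).most_common()

print(counts)  # [('py', 2), ('txt', 1), ('gz', 1)]
```

Note that a name with several dots, such as data.tar.gz, is counted under its last part only.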

Problem 6.3

Line with Most Words

Write a program line_with_most_words.py that takes a filename as a command-line argument and prints the line with the most words from the file.

$ cat files/words.txt
one
one two
one two three
one two three four
one two three four five
two three four five
three four five
four five
five
one-two-three-four-five-six-seven

$ python line_with_most_words.py files/words.txt
one two three four five

Solution

import sys
filename = sys.argv[1]
lines = open(filename).readlines()

def word_count(line):
    return len(line.split())

longest = max(lines, key=word_count)
print(longest.strip("\n"))

Problem 6.4

Reverse Words

Write a function reverse_words that takes a sentence and returns a new sentence with all the words in reverse order.

>>> reverse_words("joy of programming")
'programming of joy'

>>> reverse_words("less is more")
'more is less'

>>> reverse_words("road goes ever on and on")
'on and on ever goes road'

Please note that only the order of the words in the sentence is reversed, not the letters in each word.

Solution

def reverse_words(sentence):
    words = sentence.split()
    return " ".join(words[::-1])

Problem 6.5

Sales by Day

Write a program sales.py to compute the total sale amount per day, given a text file with details of transactions with three columns Order Id, Date and Amount.

Here is a sample input file.

$ cat files/orders.txt
1001 2023-01-01 100
1002 2023-01-01 50
1003 2023-01-02 50
1004 2023-01-02 150
1005 2023-01-01 25

The file contains multiple transactions, one per row. Each row has three fields, Order Id, Date and Amount, separated by a space.

Your program should take the order file as a command-line argument and print the total sale amount per day. The output should be sorted by date.

$ python sales.py files/orders.txt
2023-01-01 175
2023-01-02 200

Solution

import sys

filename = sys.argv[1]

sales = {}
for line in open(filename):
    order_id, date, amount = line.strip().split()
    sales[date] = sales.get(date, 0) + int(amount)

for date in sorted(sales):
    print(date, sales[date])
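
The running total per date relies on the get method of dictionaries, which returns a default value when the key is not present yet. A small sketch with made-up rows in the same format as files/orders.txt:

```python
# Made-up rows in the same format as files/orders.txt.
rows = ["1001 2023-01-01 100", "1002 2023-01-02 50", "1003 2023-01-01 25"]

sales = {}
for row in rows:
    order_id, date, amount = row.split()
    # get returns 0 the first time a date is seen.
    sales[date] = sales.get(date, 0) + int(amount)

print(sales)  # {'2023-01-01': 125, '2023-01-02': 50}
```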

Problem 6.6

Uniq Command

Write a program uniq.py that takes a filename as argument and prints the lines to the output, ignoring identical adjacent input lines.

The command should support the following command-line flags.

 -d      Only output lines that are repeated in the input.

 -u      Only output lines that are not repeated in the input.

 -c      Precede each output line with the count of the number of times
        the line occurred in the input, followed by a single space.

The program should also print an appropriate help message when used with the -h or --help flags.

Use the standard library module argparse for doing this. You may want to check out the argparse tutorial to learn how to use that module.

A sample input file files/animals.txt with the following content is provided along with this problem.

cat
cat
cat
dog
dog
cat
rat

Expected Output

$ python uniq.py files/animals.txt
cat
dog
cat
rat

$ python uniq.py -d files/animals.txt
cat
dog

$ python uniq.py -u files/animals.txt
cat
rat

$ python uniq.py -c files/animals.txt
3 cat
2 dog
1 cat
1 rat

Hints

Look at the uniq command of Unix.

Solution

import argparse
import itertools

def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("filename")
    p.add_argument("-u", default=False, action="store_true")
    p.add_argument("-d", default=False, action="store_true")
    p.add_argument("-c", default=False, action="store_true")

    return p.parse_args()


def uniq(lines, unique=False, dups=False, count=False):
    for line, chunk in itertools.groupby(lines):
        n = len(list(chunk))
        if unique and n > 1:
            continue
        if dups and n == 1:
            continue

        if count:
            print(n, line, end="")
        else:
            print(line, end="")

def main():
    args = parse_args()
    f = open(args.filename)
    uniq(f, unique=args.u, dups=args.d, count=args.c)

if __name__ == "__main__":
    main()
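
The solution depends on how itertools.groupby treats its input: it groups only adjacent equal items, which is exactly the behaviour uniq needs. A short sketch using the contents of files/animals.txt as a list:

```python
import itertools

lines = ["cat", "cat", "cat", "dog", "dog", "cat", "rat"]

# groupby starts a new group whenever the value changes, so the
# second run of "cat" is counted separately from the first.
runs = [(line, len(list(chunk))) for line, chunk in itertools.groupby(lines)]

print(runs)  # [('cat', 3), ('dog', 2), ('cat', 1), ('rat', 1)]
```

The counts match the output of the -c flag shown in the expected output.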

Problem 6.7

Disk Usage

Get the disk usage of the computer using the df command.

The Unix command df prints the disk usage of every mounted filesystem on the computer.

$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
overlay         81106868 7855408  73235076  10% /
tmpfs              65536       0     65536   0% /dev
shm                65536       0     65536   0% /dev/shm
/dev/vda1       81106868 7855408  73235076  10% /home
tmpfs            2009932       0   2009932   0% /proc/acpi
tmpfs            2009932       0   2009932   0% /proc/scsi
tmpfs            2009932       0   2009932   0% /sys/firmware

Write a function df that invokes the command df, parses the output and returns the output as Python values.

>>> df()
[
    {'filesystem': 'overlay', 'blocks': 81106868, 'used': 7855576, 'available': 73234908, 'percent_used': 10, 'path': '/'},
    {'filesystem': 'tmpfs', 'blocks': 65536, 'used': 0, 'available': 65536, 'percent_used': 0, 'path': '/dev'},
    {'filesystem': 'shm', 'blocks': 65536, 'used': 0, 'available': 65536, 'percent_used': 0, 'path': '/dev/shm'},
    {'filesystem': '/dev/vda1', 'blocks': 81106868, 'used': 7855576, 'available': 73234908, 'percent_used': 10, 'path': '/home'},
    {'filesystem': 'tmpfs', 'blocks': 2009932, 'used': 0, 'available': 2009932, 'percent_used': 0, 'path': '/proc/acpi'},
    {'filesystem': 'tmpfs', 'blocks': 2009932, 'used': 0, 'available': 2009932, 'percent_used': 0, 'path': '/proc/scsi'},
    {'filesystem': 'tmpfs', 'blocks': 2009932, 'used': 0, 'available': 2009932, 'percent_used': 0, 'path': '/sys/firmware'}
]

Please note that the values of blocks, used, available and percent_used are integers.

Hints:

You can use the os.popen function to run a command and read its output as a file.

>>> os.popen("seq 3").readlines()
['1\n', '2\n', '3\n']

In the above example, we are executing the command seq 3 and reading the output in Python.

Solution

import os


def parse_line(line):
    fs, blocks, used, available, percent, path = line.strip().split()
    return {
        "filesystem": fs,
        "blocks": int(blocks),
        "used": int(used),
        "available": int(available),
        "percent_used": int(percent.strip("%")),
        "path": path
    }

def df():
    lines = os.popen("df").readlines()
    return [parse_line(line) for line in lines[1:]]
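
Since the output of df differs from machine to machine, parse_line is easier to check against a fixed line. The sketch below copies parse_line from the solution and feeds it one line taken from the sample output in the problem statement.

```python
def parse_line(line):
    fs, blocks, used, available, percent, path = line.strip().split()
    return {
        "filesystem": fs,
        "blocks": int(blocks),
        "used": int(used),
        "available": int(available),
        "percent_used": int(percent.strip("%")),
        "path": path
    }

# One line from the sample df output, including its trailing newline.
sample = "/dev/vda1       81106868 7855408  73235076  10% /home\n"
row = parse_line(sample)

print(row["filesystem"], row["percent_used"], row["path"])  # /dev/vda1 10 /home
```

One limitation of this approach: a mount path that contains spaces would split into more than six fields, and the unpacking in parse_line would fail.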

Problem 6.8

Transcribe DNA into RNA

Transcription is a process in the nucleus that copies the DNA sequence into messenger RNA (mRNA). The mRNA is a molecule similar to DNA with 4 nucleotide bases, but with the base Uracil (U) in place of Thymine (T).

Write a function transcribe that takes a DNA sequence as a string and transcribes it into an RNA sequence.

>>> transcribe('ATGGCT')
'AUGGCU'

>>> transcribe('AAGGCC')
'AAGGCC'

Solution

# Write a function `transcribe` that takes a DNA sequence as a string 
# and transcribe it into RNA sequence.

def transcribe(dna):
    return dna.replace("T", "U")

Problem 6.9

Summarize FASTA file

Write a program fasta_summary.py to summarize a FASTA file by printing the length of the sequence and the description for every record in the file.

The program is expected to take a FASTA file as a command-line argument and print the summary.

$ cat files/sample1.fasta
> SEQUENCE.1
AAGGTTCC
> SEQUENCE.2
AGTC
AGTC
AGTC
AGTC

Here is what is expected when the program is called with the above file as argument.

$ python fasta_summary.py files/sample1.fasta
8 SEQUENCE.1
16 SEQUENCE.2

Hint:

You can read a FASTA file using the SeqIO.parse function from Biopython.

Solution

import sys

def read_fasta_file(filename):
    description = ""
    seq = ""
    records = []
    for line in open(filename):
        if line.startswith(">"):
            if description:
                records.append((description, seq))
            description = line[1:].strip()
            seq = ""
        else:
            seq = seq + line.strip()

    if description:
        records.append((description, seq))
    return records

filename = sys.argv[1]
for desc, seq in read_fasta_file(filename):
    print(len(seq), desc)

# import re
# text = open(filename).read()
# rx = re.compile(r"> (.*)\n((?:[^>].*\n?)+)", re.M)
# for desc, seq in rx.findall(text):
#     print(len(seq), desc)

Problem 6.10

Translate RNA

Write a function translate that takes an RNA sequence as a string and translates it into an amino acid sequence according to the codon table.

In the translation process, the RNA sequence is treated as a sequence of codons with 3 nucleotides in each codon. Each codon translates to one amino acid, with the exception of UAG, UGA, and UAA, which are known as stop codons. Each amino acid is represented as a single letter and a stop codon is represented with a * character.

You can refer to DNA and RNA codon tables on Wikipedia for the codon table.

Your translate function should take an RNA sequence as input and return a sequence of amino acids.

>>> translate("AUGAUCUCG")
'MIS'

In the above example, the RNA translation works as follows:

AUG -> M (Methionine)
AUC -> I (Isoleucine)
UCG -> S (Serine)
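
The first step of the translation, cutting the RNA string into codons of 3 nucleotides, can be sketched with slicing:

```python
rna = "AUGAUCUCG"

# Take a slice of 3 characters starting at every third position.
codons = [rna[i:i+3] for i in range(0, len(rna), 3)]

print(codons)  # ['AUG', 'AUC', 'UCG']
```

Each codon can then be looked up in the codon table to get its amino acid.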

Here are some more examples:

>>> translate("AUGAUCUCGUAA")
'MIS*'
>>> translate("GUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")
'VAIVMGR*KGAR*'

Please note that you are expected to implement this without using any other modules like Biopython.

Hint

You could make the codon table as a dictionary.

codon_table = {
 'AAA': 'K',
 'AAU': 'N',
 'AAG': 'K',
 'AAC': 'N',
 'AUA': 'I',
 'AUU': 'I',
 'AUG': 'M',
 'AUC': 'I',
 'AGA': 'R',
 'AGU': 'S',
 'AGG': 'R',
 'AGC': 'S',
 'ACA': 'T',
 'ACU': 'T',
 'ACG': 'T',
 'ACC': 'T',
 'UAA': '*',
 'UAU': 'Y',
 'UAG': '*',
 'UAC': 'Y',
 'UUA': 'L',
 'UUU': 'F',
 'UUG': 'L',
 'UUC': 'F',
 'UGA': '*',
 'UGU': 'C',
 'UGG': 'W',
 'UGC': 'C',
 'UCA': 'S',
 'UCU': 'S',
 'UCG': 'S',
 'UCC': 'S',
 'GAA': 'E',
 'GAU': 'D',
 'GAG': 'E',
 'GAC': 'D',
 'GUA': 'V',
 'GUU': 'V',
 'GUG': 'V',
 'GUC': 'V',
 'GGA': 'G',
 'GGU': 'G',
 'GGG': 'G',
 'GGC': 'G',
 'GCA': 'A',
 'GCU': 'A',
 'GCG': 'A',
 'GCC': 'A',
 'CAA': 'Q',
 'CAU': 'H',
 'CAG': 'Q',
 'CAC': 'H',
 'CUA': 'L',
 'CUU': 'L',
 'CUG': 'L',
 'CUC': 'L',
 'CGA': 'R',
 'CGU': 'R',
 'CGG': 'R',
 'CGC': 'R',
 'CCA': 'P',
 'CCU': 'P',
 'CCG': 'P',
 'CCC': 'P'
}

Solution

codon_table = {
 'AAA': 'K',
 'AAU': 'N',
 'AAG': 'K',
 'AAC': 'N',
 'AUA': 'I',
 'AUU': 'I',
 'AUG': 'M',
 'AUC': 'I',
 'AGA': 'R',
 'AGU': 'S',
 'AGG': 'R',
 'AGC': 'S',
 'ACA': 'T',
 'ACU': 'T',
 'ACG': 'T',
 'ACC': 'T',
 'UAA': '*',
 'UAU': 'Y',
 'UAG': '*',
 'UAC': 'Y',
 'UUA': 'L',
 'UUU': 'F',
 'UUG': 'L',
 'UUC': 'F',
 'UGA': '*',
 'UGU': 'C',
 'UGG': 'W',
 'UGC': 'C',
 'UCA': 'S',
 'UCU': 'S',
 'UCG': 'S',
 'UCC': 'S',
 'GAA': 'E',
 'GAU': 'D',
 'GAG': 'E',
 'GAC': 'D',
 'GUA': 'V',
 'GUU': 'V',
 'GUG': 'V',
 'GUC': 'V',
 'GGA': 'G',
 'GGU': 'G',
 'GGG': 'G',
 'GGC': 'G',
 'GCA': 'A',
 'GCU': 'A',
 'GCG': 'A',
 'GCC': 'A',
 'CAA': 'Q',
 'CAU': 'H',
 'CAG': 'Q',
 'CAC': 'H',
 'CUA': 'L',
 'CUU': 'L',
 'CUG': 'L',
 'CUC': 'L',
 'CGA': 'R',
 'CGU': 'R',
 'CGG': 'R',
 'CGC': 'R',
 'CCA': 'P',
 'CCU': 'P',
 'CCG': 'P',
 'CCC': 'P'
}

def group(values, n):
    return [values[i:i+n] for i in range(0, len(values), n)]

def translate(seq):
    result = [codon_table[codon] for codon in group(seq, 3)]
    return "".join(result)

Problem Set 6

Solutions to all problems in Problem Set 6.

Problem 7.1

Exchange Rate

Write a command-line program to find the currency exchange rate using the exchangerate.host API (please look at the "Convert currency" section of the API).

The program should take the currency symbol and a date as arguments and print the exchange rate of that symbol against USD on the specified date.

$ python exchange_rate.py INR 2021-01-01
73.092243

The base currency can be changed from USD to another currency using the optional flag -b or --base.

$ python exchange_rate.py -b EUR INR 2021-01-01
88.995799

The program should provide --help option to display the available options and usage.

Hint: Use argparse module to parse the arguments.

Solution

import argparse
import requests

def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("-b", "--base", help="base currency", default="USD")
    p.add_argument("target_currency", help="target currency")
    p.add_argument("date", help="date")
    return p.parse_args()

def exchange_rate(base, target, date):
    params = {
        "from": base,
        "to": target,
        "date": date
    }
    url = 'https://api.exchangerate.host/convert'
    response = requests.get(url, params=params)
    data = response.json()
    return data['result']

def main():
    args = parse_args()
    d = exchange_rate(args.base, args.target_currency, args.date)
    print(d)

if __name__ == "__main__":
    main()

Problem 7.2

FASTQ to FASTA converter

Write a program fastq_to_fasta.py that takes the path to a FASTQ file as a command-line argument and converts it into a FASTA file. The result will be written to a separate file with the same name, but with the .fasta extension.

Please note that your program should also support fastq files with multiple records.

$ python fastq_to_fasta.py files/sample1.fastq
converted files/sample1.fastq to files/sample1.fasta

$ python fastq_to_fasta.py files/sample2.fastq
converted files/sample2.fastq to files/sample2.fasta

Hints:

Use SeqIO from Biopython to read fastq files and write fasta files.

Solution

import sys
from pathlib import Path
from Bio import SeqIO

filename = sys.argv[1]
dest = Path(filename).with_suffix(".fasta")

records = SeqIO.parse(filename, "fastq")
SeqIO.write(records, dest, "fasta")

# Report the conversion, as shown in the expected output.
print(f"converted {filename} to {dest}")

Problem 7.3

Genbank to FASTA

Write a program genbank_to_fasta.py to convert a file in Genbank format to FASTA format.

The program should take a genbank file as a command-line argument and write it as a FASTA file. The optional flag -o or --output can be used to specify the destination path. If it is not specified, the destination path is constructed from the source file path by replacing the extension with .fasta.

$ python genbank_to_fasta.py files/ls_orchid.gbk
converted files/ls_orchid.gbk to files/ls_orchid.fasta

$ python genbank_to_fasta.py files/ls_orchid.gbk -o output.fasta
converted files/ls_orchid.gbk to output.fasta

$ python genbank_to_fasta.py --help

usage: genbank_to_fasta.py [-h] [-o] filename

...

Solution

from Bio import SeqIO
import argparse
from pathlib import Path

p = argparse.ArgumentParser()
p.add_argument("filename", help="the genbank file to convert")
p.add_argument("-o", "--output", help="output file")
args = p.parse_args()

destination = args.output or Path(args.filename).with_suffix(".fasta")

records = SeqIO.parse(args.filename, "genbank")
SeqIO.write(records, destination, "fasta")

print(f"converted {args.filename} to {destination}")
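
The default destination in the solution relies on with_suffix from pathlib. The following sketch isolates that step so that it can be tried without any GenBank file. The function name default_destination is hypothetical, and PurePosixPath is used here only to keep the printed paths the same on every operating system.

```python
from pathlib import PurePosixPath

def default_destination(filename, output=None):
    """Return output if given, else filename with its extension replaced by .fasta."""
    return output or str(PurePosixPath(filename).with_suffix(".fasta"))

print(default_destination("files/ls_orchid.gbk"))                  # files/ls_orchid.fasta
print(default_destination("files/ls_orchid.gbk", "output.fasta"))  # output.fasta
print(default_destination("archive.tar.gz"))                       # archive.tar.fasta
```

Note that with_suffix replaces only the last extension, as the third call shows.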

Problem 7.4

Sort Sequences by GC Fraction

Sort sequences in a FASTA file by their GC fraction.

The program should take a fasta file as argument, sort the sequences in that file using their GC fraction and write them in FASTA format to file gcsort.fasta.

The program should also support the following flags.

-o, --output to specify the file to write the output instead of writing to gcsort.fasta.

-r, --reverse to sort the sequences in the reverse order (descending order of GC fraction).

$ cat files/multi.fasta
> Seq1
AGTAGG
> Seq2
AGTCAGTC
AGTCATA
> Seq3
AGTCAGTCGC

$ python gcsort.py files/multi.fasta
Wrote the sorted sequences to gcsort.fasta

$ cat gcsort.fasta
> Seq2
AGTCAGTCAGTCATA
> Seq1
AGTAGG
> Seq3
AGTCAGTCGC

The -r option is used to sort in the reverse order.

$ python gcsort.py -r files/multi.fasta
Wrote the sorted sequences to gcsort.fasta

$ cat gcsort.fasta
> Seq3
AGTCAGTCGC
> Seq1
AGTAGG
> Seq2
AGTCAGTCAGTCATA

The -o flag is used to specify the output path.

$ python gcsort.py files/multi.fasta -o a.fasta
Wrote the sorted sequences to a.fasta

$ cat a.fasta
> Seq2
AGTCAGTCAGTCATA
> Seq1
AGTAGG
> Seq3
AGTCAGTCGC

Solution

import argparse
from Bio import SeqIO
from Bio.SeqUtils import gc_fraction

p = argparse.ArgumentParser()
p.add_argument("filename", help="FASTA file to sort")
p.add_argument("-r", "--reverse", help="sort in reverse order", action="store_true", default=False)
p.add_argument("-o", "--output", help="path of the output file", default="gcsort.fasta")
args = p.parse_args()

def gcsort(source_path, dest_path, reverse=False):
    records = SeqIO.parse(source_path, "fasta")
    # Sort by the GC fraction of each record's sequence
    records = sorted(records, key=lambda record: gc_fraction(record.seq), reverse=reverse)
    SeqIO.write(records, dest_path, "fasta")
    print("Wrote the sorted sequences to", dest_path)

gcsort(args.filename, args.output, reverse=args.reverse)
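
To check the ordering shown in the problem statement by hand, here is a small sketch that computes the GC fraction directly. The function simple_gc_fraction is a simplified stand-in written for this illustration. It only counts G and C over the full length and does not handle ambiguous nucleotide codes, so it is not a replacement for gc_fraction from Biopython.

```python
def simple_gc_fraction(sequence):
    """Fraction of G and C letters in the sequence (0.0 for an empty one)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

# The three sequences from files/multi.fasta in the problem statement
sequences = {
    "Seq1": "AGTAGG",
    "Seq2": "AGTCAGTCAGTCATA",
    "Seq3": "AGTCAGTCGC",
}

for name in sorted(sequences, key=lambda n: simple_gc_fraction(sequences[n])):
    print(name, round(simple_gc_fraction(sequences[name]), 2))
```

This prints Seq2 (0.4), Seq1 (0.5) and Seq3 (0.6), in that order, matching the sample gcsort.fasta.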

Problem 7.5

The dig tool

The dig command is a popular tool for querying DNS name servers. Your task is to expose the dig command in Python as a function dig.

The dig command

Let's look at the output of the dig command to understand how it works.

$ dig amazon.com

; <<>> DiG 9.18.1-1ubuntu1.2-Ubuntu <<>> amazon.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49697
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;amazon.com.			IN	A

;; ANSWER SECTION:
amazon.com.		128	IN	A	205.251.242.103
amazon.com.		128	IN	A	54.239.28.85
amazon.com.		128	IN	A	52.94.236.248

;; Query time: 16 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Wed Jan 18 13:45:30 IST 2023
;; MSG SIZE  rcvd: 87

The output is too verbose. We can silence everything else and show only the answer section using:

$ dig +noall +answer amazon.com
amazon.com.		128	IN	A	205.251.242.103
amazon.com.		128	IN	A	54.239.28.85
amazon.com.		128	IN	A	52.94.236.248

We used the dig command to query the name server for amazon.com. By default, it queries for records of type A, which stands for IPv4 address.

The DNS server responded with three records matching the domain name amazon.com. Each entry contains 5 fields, namely the domain name, TTL, class, record type and content.

See how to read dig output on wizard zines for more details.

Advanced Usage

Record Type

We can optionally pass the record type to dig to query other types of DNS records like MX (mail exchange), TXT (text notes), NS (name server) etc.

The following command queries for records of type MX.

$ dig +noall +answer amazon.com mx
amazon.com.		766	IN	MX	5 amazon-smtp.amazon.com.

In case of MX records, the content contains the priority and the mail server name. There could be more than one entry.
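
As a small illustration of that structure, the sketch below splits the MX line shown above into its five fields and then splits the content into priority and mail server. The helper parse_mx_content is hypothetical and not part of the required dig function. It assumes the content always has exactly two whitespace-separated parts.

```python
def parse_mx_content(content):
    """Split MX record content like '5 amazon-smtp.amazon.com.' into its parts."""
    priority, server = content.split()
    return {"priority": int(priority), "server": server}

# The MX line from the dig output above (fields are separated by tabs)
line = "amazon.com.\t\t766\tIN\tMX\t5 amazon-smtp.amazon.com."

# Split on whitespace at most 4 times so the content stays in one piece
name, ttl, klass, record_type, content = line.split(None, 4)
print(content)                    # 5 amazon-smtp.amazon.com.
print(parse_mx_content(content))  # {'priority': 5, 'server': 'amazon-smtp.amazon.com.'}
```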

The following command queries for records of type NS.

$ dig +noall +answer amazon.com ns
amazon.com.		1626	IN	NS	pdns6.ultradns.co.uk.
amazon.com.		1626	IN	NS	ns3.p31.dynect.net.
amazon.com.		1626	IN	NS	ns4.p31.dynect.net.
amazon.com.		1626	IN	NS	ns1.p31.dynect.net.
amazon.com.		1626	IN	NS	ns2.p31.dynect.net.
amazon.com.		1626	IN	NS	pdns1.ultradns.net.

Server

By default, the dig command queries the DNS server configured in the system. However, we can explicitly pass a server to query it.

$ dig +noall +answer amazon.com @8.8.8.8
amazon.com.		128	IN	A	205.251.242.103
amazon.com.		128	IN	A	54.239.28.85
amazon.com.		128	IN	A	52.94.236.248

8.8.8.8 is Google's public DNS server. The output normally looks exactly the same, but querying various well-known DNS servers is very handy for troubleshooting DNS issues with your domain.

There are some hobby DNS servers like dns.toys that provide useful utilities over DNS.

$ dig +noall +answer mumbai.weather @dns.toys
mumbai.			1	IN	TXT	"Mumbai (IN)" "30.40C (86.72F)" "32.10% hu." "clearsky_day" "14:30, Wed"
mumbai.			1	IN	TXT	"Mumbai (IN)" "27.90C (82.22F)" "42.80% hu." "clearsky_day" "16:30, Wed"
mumbai.			1	IN	TXT	"Mumbai (IN)" "23.00C (73.40F)" "71.40% hu." "clearsky_night" "18:30, Wed"
mumbai.			1	IN	TXT	"Mumbai (IN)" "20.40C (68.72F)" "90.50% hu." "clearsky_night" "20:30, Wed"
mumbai.			1	IN	TXT	"Mumbai (IN)" "18.80C (65.84F)" "90.40% hu." "clearsky_night" "22:30, Wed"

Summary of Usage

The dig command is used as follows:

dig options domain-name record-type @server

The options, record-type and server are optional.

The Python API

Your task is to implement a Python function dig that calls the dig command with appropriate arguments, parses the output and returns the result as a Python data structure.

The function takes the domain name as an argument and two optional arguments, record_type and server.

Sample Usage:

>>> dig("amazon.com")
[
    {"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "205.251.242.103"},
    {"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "54.239.28.85"},
    {"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "52.94.236.248"}
]

>>> dig("amazon.com", record_type="MX")
[
    {"name": "amazon.com.", "ttl": 498, "class": "IN", "record_type": "MX", "content": "5 amazon-smtp.amazon.com."}
]

>>> dig("amazon.com", record_type="MX", server="8.8.8.8")
[
    {"name": "amazon.com.", "ttl": 498, "class": "IN", "record_type": "MX", "content": "5 amazon-smtp.amazon.com."}
]

>>> dig("mumbai.weather", server="dns.toys")
[
    {"name": "mumbai.", "ttl": 1, "class": "IN", "record_type": "TXT", "content": '"Mumbai (IN)" "30.40C (86.72F)" "32.10% hu." "clearsky_day" "14:30, Wed"'},
    ...
]

Solution

import subprocess

def parse_record(line):
    name, ttl, klass, record_type, content = line.strip().split(None, 4)
    return {
        "name": name,
        "ttl": int(ttl),
        "record_type": record_type,
        "class": klass,
        "content": content
    }

def dig(name, record_type="A", server=None):
    """Python interface to the dig command.
    """
    cmd = ["dig", "+noall", "+answer", name, record_type]
    if server:
        cmd.append(f"@{server}")
    # Run dig and wait for it to finish, capturing its output as text
    p = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
    # Skip blank lines so that parse_record sees only actual records
    return [parse_record(line) for line in p.stdout.splitlines() if line.strip()]
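
Since dig returns a plain list of dictionaries, the result can be processed with ordinary list operations. The sketch below uses the sample result from the problem statement instead of a live query, so the actual addresses and TTL values you get will differ.

```python
# Sample result copied from the problem statement (not a live query)
records = [
    {"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "205.251.242.103"},
    {"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "54.239.28.85"},
    {"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "52.94.236.248"},
]

# Collect just the IP addresses from the A records
addresses = [r["content"] for r in records if r["record_type"] == "A"]
print(addresses)
```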