Preface
Problem Set 0
Solutions to all problems in the Problem Set 0.
Problem 1.1
Print Five
Write a Python program to print the numbers from 1 to 5.
Solution
print(1)
print(2)
print(3)
print(4)
print(5)
Discussion
This problem is meant to let participants explore the interface for solving and submitting problems.
The problem itself is very simple. You just need to print the numbers 1 to 5 with multiple print statements.
print(1)
print(2)
print(3)
print(4)
print(5)
If you know how to use a for loop in Python, you can do that in a loop by providing the numbers 1, 2, 3, 4, and 5 as a list.
for n in [1, 2, 3, 4, 5]:
    print(n)
Or you could use the range function to create the sequence of numbers instead of creating the list manually.
for n in range(1, 6):
    print(n)
The call range(n) gives n numbers from 0 to n-1. If you want numbers from 1 to n, you need to use range(1, n+1).
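The difference between range(n) and range(1, n+1) is easy to check directly. The following is a minimal sketch; the helper name first_n is made up for illustration and is not part of the problem set.

```python
def first_n(n):
    """Return the numbers 1..n as a list, built with range(1, n + 1)."""
    return list(range(1, n + 1))

# range(5) starts at 0 and stops before 5.
print(list(range(5)))
# range(1, 6) starts at 1 and stops before 6.
print(first_n(5))
```

Wrapping range in list is only needed to look at the numbers; a for loop can iterate over the range directly.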
Problem 1.2
Product of Numbers
Compute the product of the list of numbers mentioned on the top of the program using a for loop and print the result.
Solution
# Do not change this line
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Compute the product of the list of numbers mentioned above using
# a for loop and print the result.
# Your code below this line
result = 1
for n in numbers:
    result *= n
print(result)
Problem 1.3
Product of Even Numbers
Compute the product of even numbers among the list of numbers mentioned on the top of the program and print the result.
You can check whether a number n is even by testing n % 2 == 0. For example:
n = 10
if n % 2 == 0:
    print("n is even")
else:
    print("n is odd")
Solution
# Do not change this line
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Compute the product of even numbers among the list of numbers
# mentioned above and print the result.
# Your code below this line
result = 1
for n in numbers:
    if n % 2 == 0:
        result *= n
print(result)
Problem 1.4
Square
Write a function square to compute the square of a number.
>>> square(4)
16
Solution
def square(n):
    return n*n
Problem 1.5
Mean
Write a function mean to compute the arithmetic mean of a list of numbers.
The arithmetic mean of N numbers is computed by adding all the N numbers and dividing the total by N.
The function takes the list of numbers as argument and returns the mean.
>>> mean([1, 2, 3, 4, 5])
3.0
Solution
def mean(numbers):
    return sum(numbers)/len(numbers)
Discussion
The solution is quite straightforward. We use the built-in function sum to compute the total and divide it by the count of numbers, which is computed using the built-in function len.
def mean(numbers):
    return sum(numbers) / len(numbers)
You could also write that in multiple lines, if you find that more comfortable.
def mean(numbers):
    total = sum(numbers)
    n = len(numbers)
    return total / n
Problem Set 1
Solutions to all problems in the Problem Set 1.
Problem 2.1
Add Two Numbers
Write a program add.py that takes two numbers as command-line arguments and prints their sum.
$ python add.py 3 4
7
$ python add.py 10 20
30
Solution
import sys
a = int(sys.argv[1])
b = int(sys.argv[2])
print(a+b)
Problem 2.2
Make Header
Write a program header.py that takes a word as a command-line argument and prints it as a header, as shown below, with the word converted to upper case.
$ python header.py python
======
PYTHON
======
$ python header.py python-foundation-course
========================
PYTHON-FOUNDATION-COURSE
========================
Hint:
>>> name = 'python'
>>> name.upper()
'PYTHON'
Solution
import sys
word = sys.argv[1].upper()
n = len(word)
print("=" * n)
print(word)
print("=" * n)
Discussion
The program gets the word to print as a command-line argument, so we need to use the sys module to access the arguments.
import sys
The word will be the first command-line argument and we need to convert it into upper case.
word = sys.argv[1].upper()
Now we need to print a line above and below the word. The line is made of = characters, and we use as many of them as there are characters in the word so that it aligns with the word. For that we need to find the length of the word.
n = len(word)
Now, let's print the line above the word.
print("=" * n)
Followed by the word itself.
print(word)
And the line below the word.
print("=" * n)
Problem 2.3
Text in a Box
Write a program box.py that takes a word as a command-line argument and prints the word in a box as shown below.
$ python box.py python
+--------+
| python |
+--------+
Please note that there should be exactly one space on either side of the text in the box.
Solution
import sys
def box(word):
    n = len(word) + 2
    line = "-" * n
    header = f"+{line}+"
    print(header)
    print(f"| {word} |")
    print(header)
word = sys.argv[1]
box(word)
Problem 2.4
Upper Case
Write a program uppercase.py that takes a filename as command-line argument and prints all the contents of the file in uppercase.
$ cat files/five.txt
One
Two
Three
Four
Five
$ python uppercase.py files/five.txt
ONE
TWO
THREE
FOUR
FIVE
Solution
import sys
path = sys.argv[1]
contents = open(path).read()
print(contents.upper())
Problem 2.5
Sort a file
Write a program sort.py that takes a filename as a command-line argument and prints the lines in that file in sorted order.
There are two sample files, files/names.txt and files/blake.txt, to try your program against.
$ cat files/names.txt
bob
dave
charlie
alice
$ python sort.py files/names.txt
alice
bob
charlie
dave
$ cat files/blake.txt
Piping down the valleys wild
Piping songs of pleasant glee
On a cloud I saw a child.
And he laughing said to me.
Pipe a song about a Lamb;
So I piped with merry chear,
Piper pipe that song again—
So I piped, he wept to hear.
$ python sort.py files/blake.txt
And he laughing said to me.
On a cloud I saw a child.
Pipe a song about a Lamb;
Piper pipe that song again—
Piping down the valleys wild
Piping songs of pleasant glee
So I piped with merry chear,
So I piped, he wept to hear.
Solution
import sys
path = sys.argv[1]
lines = open(path).readlines()
for line in sorted(lines):
    print(line, end="")
Problem 2.6
Reverse Lines
Write a program reverse.py that takes a filename as a command-line argument and prints all the lines in that file in reverse order.
$ cat files/five.txt
one
two
three
four
five and the last line!
$ python reverse.py files/five.txt
five and the last line!
four
three
two
one
Solution
import sys
filename = sys.argv[1]
lines = open(filename).readlines()[::-1]
for line in lines:
    print(line, end="")
Problem Set 2
Solutions to all problems in the Problem Set 2.
Problem 3.1
Sum File
Write a program sumfile.py that takes a filename as argument and prints the sum of all numbers in the file. It is assumed that the file contains one number per line.
$ python sumfile.py files/ten.txt
55
Solution
import sys
filename = sys.argv[1]
numbers = [int(line) for line in open(filename)]
print(sum(numbers))
Problem 3.2
Grep Command
Implement Unix command grep in Python.
Write a program grep.py that takes a pattern and a file as command-line arguments and prints all the lines in the file that contain that pattern.
The pattern could be any text and there is no need to support regular expressions.
$ cat files/zen.txt
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
$ python grep.py never files/zen.txt
Errors should never pass silently.
Now is better than never.
Although never is often better than *right* now.
$ grep the files/zen.txt
Special cases aren't special enough to break the rules.
In the face of ambiguity, refuse the temptation to guess.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Solution
import sys
pat = sys.argv[1]
filename = sys.argv[2]
for line in open(filename):
    if pat in line:
        print(line, end="")
Problem 3.3
Center Align
Write a program center_align.py to center align all lines in the given file.
$ cat poem.txt
There was an Old Man with a beard
Who said, "It is just as I feared!
Two Owls and a Hen,
Four Larks and a Wren,
Have all built their nests in my beard!"
$ python center_align.py poem.txt
   There was an Old Man with a beard
   Who said, "It is just as I feared!
          Two Owls and a Hen,
         Four Larks and a Wren,
Have all built their nests in my beard!"
Hint:
>>> "hello".center(7)
" hello "
>>> "hello".center(9)
"  hello  "
The built-in function max can take a list of numbers and return the maximum value out of them.
>>> max([1, 5, 2, 7, 4])
7
Solution
import sys
filename = sys.argv[1]
lines = open(filename).readlines()
if lines:
    n = max(len(line.strip()) for line in lines)
    for line in lines:
        print(line.strip().center(n))
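The same idea can be packaged as a function that works on a list of lines, which makes it easy to try without a file. This is only a sketch; the name center_lines is an assumption and does not appear in the problem statement.

```python
def center_lines(lines):
    """Center every line to the width of the longest stripped line."""
    stripped = [line.strip() for line in lines]
    if not stripped:
        # max() fails on an empty sequence, so handle an empty file here.
        return []
    width = max(len(line) for line in stripped)
    return [line.center(width) for line in stripped]

for line in center_lines(["a\n", "abc\n", "abcde\n"]):
    print(repr(line))
```

Stripping each line first matters: otherwise the trailing newline and any existing indentation would count towards the width.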
Problem 3.4
Random DNA Sequence
Write a function random_dna_sequence to generate a DNA sequence of a given length by picking nucleotides at random.
>>> random_dna_sequence(4)
'TCAC'
>>> random_dna_sequence(10)
'GCTGCTGGCA'
Hint: Use the random.choice function from the random module, which selects a random element from a list or string.
>>> random.choice("ABCD")
'B'
>>> random.choice("ABCD")
'C'
>>> random.choice("ABCD")
'C'
>>> random.choice("ABCD")
'D'
Solution
import random
def random_dna_sequence(n):
    return "".join(random.choice("ATGC") for i in range(n))
Problem Set 3
Solutions to all problems in the Problem Set 3.
Problem 4.1
Group
Write a function group that takes a list of values and splits it into smaller lists of a given size.
>>> group([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> group([1, 2, 3, 4, 5, 6, 7, 8, 9], 4)
[[1, 2, 3, 4], [5, 6, 7, 8], [9]]
Solution
def group(values, n):
    return [values[i:i+n] for i in range(0, len(values), n)]
Problem 4.2
Sum of Arguments
Write a program sum.py that takes one or more numbers as command-line arguments and prints their sum.
$ python sum.py 1 2 3 4 5
15
Solution
import sys
args = sys.argv[1:]
numbers = [int(a) for a in args]
print(sum(numbers))
Problem 4.3
Paste
Write a program paste.py that takes two files as command-line arguments, concatenates the corresponding lines in those two files with a tab character, and prints the result.
For example, if the first file files/a.txt has the following contents:
A
B
C
D
and the second file files/b.txt has the following:
1
2
3
4
The output should be:
$ python paste.py files/a.txt files/b.txt
A 1
B 2
C 3
D 4
Note that the first line is "A\t1".
For simplicity, assume that both files have exactly the same number of lines.
Hint:
You can use the strip method on a string to remove the newline character.
>>> "a\n".strip("\n")
"a"
Solution
import sys
f1 = sys.argv[1]
f2 = sys.argv[2]
for left, right in zip(open(f1), open(f2)):
    left = left.strip("\n")
    right = right.strip("\n")
    print(f"{left}\t{right}")
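The pairing done by zip can be tried on plain lists of lines, without any files. The following is a small sketch; the function name paste_lines is hypothetical.

```python
def paste_lines(left_lines, right_lines):
    """Join corresponding lines with a tab, dropping trailing newlines."""
    result = []
    # zip stops at the end of the shorter input.
    for left, right in zip(left_lines, right_lines):
        result.append(left.strip("\n") + "\t" + right.strip("\n"))
    return result

for row in paste_lines(["A\n", "B\n"], ["1\n", "2\n"]):
    print(row)
```

Because zip stops at the shorter input, extra lines in the longer file would be dropped silently, which is why the problem assumes both files have the same number of lines.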
Problem Set 4
Solutions to all problems in the Problem Set 4.
Problem 5.1
Sequence of Numbers
Write a program seq.py that takes a number n as argument and prints numbers from 1 to n. It should support the following two flags.
-s START --start START
    print numbers from START to n instead of 1 to n
-r --reverse
    print the numbers in the reverse order
The program should also print an appropriate help message when used with the -h or --help flags.
Use the standard library module argparse for doing this. You may want to check out the argparse tutorial to learn how to use that module.
Expected Output:
$ python seq.py 5
1
2
3
4
5
$ python seq.py -s 3 5
3
4
5
$ python seq.py -r -s 3 5
5
4
3
Solution
import argparse
p = argparse.ArgumentParser()
p.add_argument("n", type=int)
p.add_argument("-s", "--start", type=int, default=1)
p.add_argument("-r", "--reverse", action="store_true", default=False)
args = p.parse_args()
numbers = list(range(args.start, args.n+1))
if args.reverse:
    numbers = numbers[::-1]
for n in numbers:
    print(n)
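One way to experiment with the parser without running the program from a shell is to pass the arguments to parse_args as a list. The sketch below restructures the solution into functions for that purpose; the names build_parser and sequence are assumptions made for illustration.

```python
import argparse

def build_parser():
    """Create the same parser as the solution above."""
    p = argparse.ArgumentParser()
    p.add_argument("n", type=int)
    p.add_argument("-s", "--start", type=int, default=1)
    p.add_argument("-r", "--reverse", action="store_true")
    return p

def sequence(argv):
    """Return the numbers seq.py would print for the given argument list."""
    args = build_parser().parse_args(argv)
    numbers = list(range(args.start, args.n + 1))
    if args.reverse:
        numbers.reverse()
    return numbers

print(sequence(["-r", "-s", "3", "5"]))
```

When parse_args is called without arguments, as in the solution, it reads sys.argv[1:] instead.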
Problem 5.2
Skip lines in a file
Write a program skip.py to print the contents of a file after skipping the first few lines.
The program takes an optional flag -n to indicate the number of lines to skip, which defaults to 5 when not specified.
The program takes a filename as argument and prints the contents of this file after skipping the number of lines specified by -n.
$ python skip.py files/ten.txt
6
7
8
9
10
$ python skip.py -n 8 files/ten.txt
9
10
Hint: Use the argparse module.
Solution
import argparse
p = argparse.ArgumentParser()
p.add_argument("-n", type=int, default=5, help="number of lines to skip")
p.add_argument("filename", help="name of the file to consider")
args = p.parse_args()
lines = open(args.filename).readlines()
for line in lines[args.n:]:
    print(line, end="")
Problem 5.3
Split a File
Write a program split.py that splits a large file into multiple smaller files. The program should take a filename and the number of lines as arguments and write multiple small files, each containing the specified number of lines (the last one may have fewer lines).
$ python split.py files/100.txt 30
writing files/100-part1.txt
writing files/100-part2.txt
writing files/100-part3.txt
writing files/100-part4.txt
Solution
"""Program to split a file into smaller parts.
It takes a filename and the number of lines in each part as
command-line arguments and splits the file into smaller parts
with each file having no more than the specified number of lines.
USAGE:
$ python split.py large-file.txt 100
writing large-file-part1.txt
writing large-file-part2.txt
...
"""
import sys
def group(values, n):
    return [values[i:i+n] for i in range(0, len(values), n)]
def splitfile(filename, chunk_size):
    lines = open(filename).readlines()
    return group(lines, chunk_size)
def write_lines(filename, lines):
    """Write a list of lines to a file.
    """
    print("writing", filename)
    with open(filename, "w") as f:
        f.writelines(lines)
def generate_part_filename(filename, index):
    """Generates a new filename by adding index as suffix to the filename.
    >>> generate_part_filename("a.txt", 1)
    "a-part1.txt"
    """
    nameparts = filename.split(".", 1)
    if len(nameparts) == 2:
        name, ext = nameparts
        ext = "." + ext
    else:
        name = filename
        ext = ""
    return f"{name}-part{index}{ext}"
def write_small_files(filename, file_chunks):
    for i, chunk in enumerate(file_chunks, start=1):
        new_filename = generate_part_filename(filename, i)
        write_lines(new_filename, chunk)
def main():
    filename = sys.argv[1]
    chunk_size = int(sys.argv[2])
    file_chunks = splitfile(filename, chunk_size)
    write_small_files(filename, file_chunks)
if __name__ == "__main__":
    main()
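The part filename could also be built with pathlib, which only looks at the last component of the path when separating the name from the extension. This is an alternative sketch, not the solution above; the function name part_filename is hypothetical, and it assumes paths written with forward slashes.

```python
from pathlib import PurePosixPath

def part_filename(filename, index):
    """Insert -part<index> before the last extension of the file name."""
    path = PurePosixPath(filename)
    # stem is the final component without its last suffix; suffix is ".txt" or "".
    return str(path.with_name(f"{path.stem}-part{index}{path.suffix}"))

print(part_filename("files/100.txt", 1))
```

Unlike splitting the whole string on the first dot, this keeps a dot in a directory name (for example v1.2/data.txt) out of the extension. For a name with two extensions such as a.tar.gz the two approaches give different results, since pathlib treats only .gz as the suffix.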
Problem 5.4
Longest Argument
Write a program longest.py that takes one or more words as command-line arguments and prints the longest word.
$ python longest.py joy of programming
programming
$ python longest.py this too shall pass
shall
Hint: You can use list slicing to get all the arguments. For example, sys.argv[1:] will give you all arguments other than the program name.
Solution
import sys
args = sys.argv[1:]
print(max(args, key=len))
Problem Set 5
Solutions to all problems in the Problem Set 5.
Problem 6.1
Find Duplicates
Write a function dups that takes a list of values as argument and finds all the elements that appear more than once in the list.
>>> dups([1, 2, 1, 3, 2, 5])
[1, 2]
>>> dups([1, 2, 3, 4, 5])
[]
>>> dups([1, 1, 1, 1])
[1]
Solution
def dups(numbers):
    seen = []
    result = []
    for n in numbers:
        if n in seen and n not in result:
            result.append(n)
        seen.append(n)
    return result
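For longer lists, a counting approach avoids the repeated list lookups. The following is an alternative sketch using collections.Counter; the name dups_counter is made up here, and it assumes the values are hashable.

```python
from collections import Counter

def dups_counter(values):
    """Return the values that occur more than once, in order of first appearance."""
    counts = Counter(values)
    # Counter keeps keys in the order they were first seen.
    return [value for value, count in counts.items() if count > 1]

print(dups_counter([1, 2, 1, 3, 2, 5]))
```

The result order can differ from the solution above: this version orders duplicates by where each value first appears, while the loop-based version orders them by where each value is first repeated.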
Problem 6.2
Count File Extensions
Write a program extcount.py to count the number of files per extension. The program should take the path to a directory as argument and print the count and extension for each available extension, sorted by count. Files without any extension should be ignored.
$ python extcount.py files
4 py
3 txt
2 csv
1 yml
$ python extcount.py files/data
2 txt
1 csv
Solution
import sys
import os
from collections import Counter
path = sys.argv[1]
filenames = os.listdir(path)
extensions = [f.split(".")[-1] for f in filenames if "." in f]
for ext, count in Counter(extensions).most_common():
    print(count, ext)
Problem 6.3
Line with Most Words
Write a program line_with_most_words.py that takes a filename as a command-line argument and prints the line with the most words in the file.
$ cat files/words.txt
one
one two
one two three
one two three four
one two three four five
two three four five
three four five
four five
five
one-two-three-four-five-six-seven
$ python line_with_most_words.py files/words.txt
one two three four five
Solution
import sys
filename = sys.argv[1]
lines = open(filename).readlines()
def word_count(line):
    return len(line.split())
longest = max(lines, key=word_count)
print(longest.strip("\n"))
Problem 6.4
Reverse Words
Write a function reverse_words that takes a sentence and returns a new sentence with all the words in reverse order.
>>> reverse_words("joy of programming")
'programming of joy'
>>> reverse_words("less is more")
'more is less'
>>> reverse_words("road goes ever on and on")
'on and on ever goes road'
Please note that only the order of the words in the sentence is reversed, not the letters in each word.
Solution
def reverse_words(sentence):
    words = sentence.split()
    return " ".join(words[::-1])
Problem 6.5
Sales by Day
Write a program sales.py to compute the total sale amount per day, given a text file with details of transactions with three columns: Order Id, Date and Amount.
Here is a sample input file.
$ cat files/orders.txt
1001 2023-01-01 100
1002 2023-01-01 50
1003 2023-01-02 50
1004 2023-01-02 150
1005 2023-01-01 25
The file contains multiple transactions, one per row. Each row has three fields, Order Id, Date and Amount, separated by a space.
Your program should take the order file as a command-line argument and print the total sale amount per day. The output should be sorted by date.
$ python sales.py files/orders.txt
2023-01-01 175
2023-01-02 200
Solution
import sys
filename = sys.argv[1]
sales = {}
for line in open(filename):
    order_id, date, amount = line.strip().split()
    sales[date] = sales.get(date, 0) + int(amount)
for date in sorted(sales):
    print(date, sales[date])
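The aggregation step can be separated from the file handling so that it can be tried on a list of lines. This is a sketch under that assumption; the function name sales_by_day is hypothetical, and it uses defaultdict in place of the dict.get call above.

```python
from collections import defaultdict

def sales_by_day(lines):
    """Return (date, total) pairs sorted by date."""
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            # Skip blank lines instead of failing on the unpacking below.
            continue
        order_id, date, amount = line.split()
        totals[date] += int(amount)
    return sorted(totals.items())

orders = [
    "1001 2023-01-01 100\n",
    "1002 2023-01-01 50\n",
    "1003 2023-01-02 50\n",
    "1004 2023-01-02 150\n",
    "1005 2023-01-01 25\n",
]
for date, total in sales_by_day(orders):
    print(date, total)
```

Sorting the date strings works here because dates in YYYY-MM-DD form sort the same way as text and as dates.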
Problem 6.6
Uniq Command
Write a program uniq.py that takes a filename as argument and prints the lines to the output, ignoring identical adjacent input lines.
The command should support the following command-line flags.
-d Only output lines that are repeated in the input.
-u Only output lines that are not repeated in the input.
-c Precede each output line with the count of the number of times
   the line occurred in the input, followed by a single space.
The program should also print an appropriate help message when used with the -h or --help flags.
Use the standard library module argparse for doing this. You may want to check out the argparse tutorial to learn how to use that module.
A sample input file files/animals.txt with the following content is provided along with this problem.
cat
cat
cat
dog
dog
cat
rat
Expected Output
$ python uniq.py files/animals.txt
cat
dog
cat
rat
$ python uniq.py -d files/animals.txt
cat
dog
$ python uniq.py -u files/animals.txt
cat
rat
$ python uniq.py -c files/animals.txt
3 cat
2 dog
1 cat
1 rat
Hints
Look at the uniq command of Unix.
Solution
import argparse
import itertools
def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("filename")
    p.add_argument("-u", default=False, action="store_true")
    p.add_argument("-d", default=False, action="store_true")
    p.add_argument("-c", default=False, action="store_true")
    return p.parse_args()
def uniq(lines, unique=False, dups=False, count=False):
    for line, chunk in itertools.groupby(lines):
        n = len(list(chunk))
        if unique and n > 1:
            continue
        if dups and n == 1:
            continue
        if count:
            print(n, line, end="")
        else:
            print(line, end="")
def main():
    args = parse_args()
    f = open(args.filename)
    uniq(f, unique=args.u, dups=args.d, count=args.c)
if __name__ == "__main__":
    main()
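The solution relies on itertools.groupby, which groups only adjacent equal items rather than all equal items in the input. A minimal sketch of that behaviour, using the animals from the sample file (the helper name run_lengths is an assumption):

```python
import itertools

def run_lengths(lines):
    """Return (line, count) pairs for each run of identical adjacent lines."""
    return [(line, len(list(chunk))) for line, chunk in itertools.groupby(lines)]

animals = ["cat", "cat", "cat", "dog", "dog", "cat", "rat"]
for line, count in run_lengths(animals):
    print(count, line)
```

Note that cat shows up twice in the result, once for each run, which matches the expected output of the -c flag.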
Problem 6.7
Disk Usage
Get the disk usage of the computer using the df command.
The Unix command df prints the disk usage of every mounted filesystem on the computer.
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
overlay 81106868 7855408 73235076 10% /
tmpfs 65536 0 65536 0% /dev
shm 65536 0 65536 0% /dev/shm
/dev/vda1 81106868 7855408 73235076 10% /home
tmpfs 2009932 0 2009932 0% /proc/acpi
tmpfs 2009932 0 2009932 0% /proc/scsi
tmpfs 2009932 0 2009932 0% /sys/firmware
Write a function df that invokes the command df, parses the output and returns it as Python values.
>>> df()
[
{'filesystem': 'overlay', 'blocks': 81106868, 'used': 7855576, 'available': 73234908, 'percent_used': 10, 'path': '/'},
{'filesystem': 'tmpfs', 'blocks': 65536, 'used': 0, 'available': 65536, 'percent_used': 0, 'path': '/dev'},
{'filesystem': 'shm', 'blocks': 65536, 'used': 0, 'available': 65536, 'percent_used': 0, 'path': '/dev/shm'},
{'filesystem': '/dev/vda1', 'blocks': 81106868, 'used': 7855576, 'available': 73234908, 'percent_used': 10, 'path': '/home'},
{'filesystem': 'tmpfs', 'blocks': 2009932, 'used': 0, 'available': 2009932, 'percent_used': 0, 'path': '/proc/acpi'},
{'filesystem': 'tmpfs', 'blocks': 2009932, 'used': 0, 'available': 2009932, 'percent_used': 0, 'path': '/proc/scsi'},
{'filesystem': 'tmpfs', 'blocks': 2009932, 'used': 0, 'available': 2009932, 'percent_used': 0, 'path': '/sys/firmware'}
]
Please note that the values of blocks, used, available and percent_used are integers.
Hints:
You can use the os.popen function to run a command and read its output as a file.
>>> os.popen("seq 3").readlines()
['1\n', '2\n', '3\n']
In the above example, we are executing the command seq 3 and reading the output in Python.
Solution
import os
def parse_line(line):
    fs, blocks, used, available, percent, path = line.strip().split()
    return {
        "filesystem": fs,
        "blocks": int(blocks),
        "used": int(used),
        "available": int(available),
        "percent_used": int(percent.strip("%")),
        "path": path
    }
def df():
    lines = os.popen("df").readlines()
    return [parse_line(line) for line in lines[1:]]
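The parsing step can be checked on a single line of df output without running the command. The sketch below is a variant of parse_line under a different, made-up name (parse_df_line); limiting the split to six fields is an assumption added here so that a mount path containing spaces stays in one piece.

```python
def parse_df_line(line):
    """Parse one data row of df output into a dictionary."""
    # Split on whitespace into at most 6 fields; the rest of the line is the path.
    fs, blocks, used, available, percent, path = line.strip().split(None, 5)
    return {
        "filesystem": fs,
        "blocks": int(blocks),
        "used": int(used),
        "available": int(available),
        "percent_used": int(percent.strip("%")),
        "path": path,
    }

sample = "overlay         81106868 7855408  73235076  10% /\n"
print(parse_df_line(sample))
```

The header row is not handled here; as in the solution, it has to be skipped before parsing.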
Problem 6.8
Transcribe DNA into RNA
Transcription is a process in the nucleus that copies the DNA sequence into messenger RNA (mRNA). The mRNA is a molecule similar to DNA with 4 nucleotide bases, but with the base Uracil (U) in place of Thymine (T).
Write a function transcribe that takes a DNA sequence as a string and transcribes it into an RNA sequence.
>>> transcribe('ATGGCT')
'AUGGCU'
>>> transcribe('AAGGCC')
'AAGGCC'
Solution
# Write a function `transcribe` that takes a DNA sequence as a string
# and transcribe it into RNA sequence.
def transcribe(dna):
    return dna.replace("T", "U")
Problem 6.9
Summarize FASTA file
Write a program fasta_summary.py to summarize a FASTA file by printing the length of the sequence and the description for every record in the file.
The program is expected to take a FASTA file as a command-line argument and print the summary.
$ cat files/sample1.fasta
> SEQUENCE.1
AAGGTTCC
> SEQUENCE.2
AGTC
AGTC
AGTC
AGTC
Here is what is expected when the program is called with the above file as argument.
$ python fasta_summary.py files/sample1.fasta
8 SEQUENCE.1
16 SEQUENCE.2
Hint:
You can read a FASTA file using the SeqIO.parse function from Biopython.
Solution
import sys
def read_fasta_file(filename):
    description = ""
    seq = ""
    records = []
    for line in open(filename):
        if line.startswith(">"):
            if description:
                records.append((description, seq))
            description = line[1:].strip()
            seq = ""
        else:
            seq = seq + line.strip()
    if description:
        records.append((description, seq))
    return records
filename = sys.argv[1]
for desc, seq in read_fasta_file(filename):
    print(len(seq), desc)
# import re
# text = open(filename).read()
# rx = re.compile(r"> (.*)\n((?:[^>].*\n?)+)", re.M)
# for desc, seq in rx.findall(text):
# print(len(seq), desc)
Problem 6.10
Translate RNA
Write a function translate that takes an RNA sequence as a string and translates it into an amino acid sequence according to the codon table.
In the translation process, the RNA sequence is treated as a sequence of codons with 3 nucleotides in each codon. Each codon translates to one amino acid, with the exception of UAG, UGA, and UAA, which are known as stop codons. Each amino acid is represented as a single letter and a stop codon is represented with a * character.
You can refer to DNA and RNA codon tables on Wikipedia for the codon table.
Your translate function should take an RNA sequence as input and return a sequence of amino acids.
>>> translate("AUGAUCUCG")
'MIS'
In the above example, the RNA translation works as follows:
AUG -> M (Methionine)
AUC -> I (Isoleucine)
UCG -> S (Serine)
Here are some more examples:
>>> translate("AUGAUCUCGUAA")
'MIS*'
>>> translate("GUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")
'VAIVMGR*KGAR*'
Please note that you are expected to implement this without using any other modules like Biopython.
Hint
You could make the codon table as a dictionary.
codon_table = {
'AAA': 'K',
'AAU': 'N',
'AAG': 'K',
'AAC': 'N',
'AUA': 'I',
'AUU': 'I',
'AUG': 'M',
'AUC': 'I',
'AGA': 'R',
'AGU': 'S',
'AGG': 'R',
'AGC': 'S',
'ACA': 'T',
'ACU': 'T',
'ACG': 'T',
'ACC': 'T',
'UAA': '*',
'UAU': 'Y',
'UAG': '*',
'UAC': 'Y',
'UUA': 'L',
'UUU': 'F',
'UUG': 'L',
'UUC': 'F',
'UGA': '*',
'UGU': 'C',
'UGG': 'W',
'UGC': 'C',
'UCA': 'S',
'UCU': 'S',
'UCG': 'S',
'UCC': 'S',
'GAA': 'E',
'GAU': 'D',
'GAG': 'E',
'GAC': 'D',
'GUA': 'V',
'GUU': 'V',
'GUG': 'V',
'GUC': 'V',
'GGA': 'G',
'GGU': 'G',
'GGG': 'G',
'GGC': 'G',
'GCA': 'A',
'GCU': 'A',
'GCG': 'A',
'GCC': 'A',
'CAA': 'Q',
'CAU': 'H',
'CAG': 'Q',
'CAC': 'H',
'CUA': 'L',
'CUU': 'L',
'CUG': 'L',
'CUC': 'L',
'CGA': 'R',
'CGU': 'R',
'CGG': 'R',
'CGC': 'R',
'CCA': 'P',
'CCU': 'P',
'CCG': 'P',
'CCC': 'P'
}
Solution
codon_table = {
'AAA': 'K',
'AAU': 'N',
'AAG': 'K',
'AAC': 'N',
'AUA': 'I',
'AUU': 'I',
'AUG': 'M',
'AUC': 'I',
'AGA': 'R',
'AGU': 'S',
'AGG': 'R',
'AGC': 'S',
'ACA': 'T',
'ACU': 'T',
'ACG': 'T',
'ACC': 'T',
'UAA': '*',
'UAU': 'Y',
'UAG': '*',
'UAC': 'Y',
'UUA': 'L',
'UUU': 'F',
'UUG': 'L',
'UUC': 'F',
'UGA': '*',
'UGU': 'C',
'UGG': 'W',
'UGC': 'C',
'UCA': 'S',
'UCU': 'S',
'UCG': 'S',
'UCC': 'S',
'GAA': 'E',
'GAU': 'D',
'GAG': 'E',
'GAC': 'D',
'GUA': 'V',
'GUU': 'V',
'GUG': 'V',
'GUC': 'V',
'GGA': 'G',
'GGU': 'G',
'GGG': 'G',
'GGC': 'G',
'GCA': 'A',
'GCU': 'A',
'GCG': 'A',
'GCC': 'A',
'CAA': 'Q',
'CAU': 'H',
'CAG': 'Q',
'CAC': 'H',
'CUA': 'L',
'CUU': 'L',
'CUG': 'L',
'CUC': 'L',
'CGA': 'R',
'CGU': 'R',
'CGG': 'R',
'CGC': 'R',
'CCA': 'P',
'CCU': 'P',
'CCG': 'P',
'CCC': 'P'
}
def group(values, n):
    return [values[i:i+n] for i in range(0, len(values), n)]
def translate(seq):
    result = [codon_table[codon] for codon in group(seq, 3)]
    return "".join(result)
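The 64-entry table can also be generated instead of typed out. The sketch below pairs every codon, enumerated in U, C, A, G order, with the matching letter of a 64-character amino acid string in the same order. Ignoring a trailing incomplete codon is an assumption made here; the solution above would raise a KeyError in that case.

```python
from itertools import product

BASES = "UCAG"
# One letter per codon, in the order UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_table = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq):
    """Translate an RNA sequence, ignoring any incomplete codon at the end."""
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return "".join(codon_table[codon] for codon in codons)

print(translate("AUGAUCUCGUAA"))
```

The generated dictionary holds the same 64 codon-to-letter pairs as the hand-written table in the hint.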
Problem Set 6
Solutions to all problems in the Problem Set 6.
Problem 7.1
Exchange Rate
Write a command-line program to find the currency exchange rate using exchangerate.host API (Please look at the "Convert currency" section of the API).
The program should take the currency symbol and a date as arguments and print the exchange rate of that symbol against USD on the specified date.
$ python exchange_rate.py INR 2021-01-01
73.092243
The base currency can be changed from USD to another currency using the optional flag -b or --base.
$ python exchange_rate.py -b EUR INR 2021-01-01
88.995799
The program should provide a --help option to display the available options and usage.
Hint: Use the argparse module to parse the arguments.
Solution
import argparse
import requests
def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("-b", "--base", help="base currency", default="USD")
    p.add_argument("target_currency", help="target currency")
    p.add_argument("date", help="date")
    return p.parse_args()
def exchange_rate(base, target, date):
    params = {
        "from": base,
        "to": target,
        "date": date
    }
    url = 'https://api.exchangerate.host/convert'
    response = requests.get(url, params=params)
    data = response.json()
    return data['result']
def main():
    args = parse_args()
    d = exchange_rate(args.base, args.target_currency, args.date)
    print(d)
if __name__ == "__main__":
    main()
Problem 7.2
FASTQ to FASTA converter
Write a program fastq_to_fasta.py that takes the path to a FASTQ file as a command-line argument and converts it into a FASTA file. The result will be written to a separate file with the same name, but with the .fasta extension.
Please note that your program should also support FASTQ files with multiple records.
$ python fastq_to_fasta.py files/sample1.fastq
converted files/sample1.fastq to files/sample1.fasta
$ python fastq_to_fasta.py files/sample2.fastq
converted files/sample2.fastq to files/sample2.fasta
Hints:
Use SeqIO from Biopython to read FASTQ files and write FASTA files.
Solution
import sys
from pathlib import Path
from Bio import SeqIO
filename = sys.argv[1]
dest = Path(filename).with_suffix(".fasta")
records = SeqIO.parse(filename, "fastq")
SeqIO.write(records, dest, "fasta")
print(f"converted {filename} to {dest}")
Problem 7.3
Genbank to FASTA
Write a program genbank_to_fasta.py to convert a file in GenBank format to FASTA format.
The program should take a GenBank file as a command-line argument and write it as a FASTA file. The optional flag -o or --output can be used to specify the destination path. If it is not specified, the destination path is constructed from the source file path by replacing the extension with .fasta.
$ python genbank_to_fasta.py files/ls_orchid.gbk
converted files/ls_orchid.gbk to files/ls_orchid.fasta
$ python genbank_to_fasta.py files/ls_orchid.gbk -o output.fasta
converted files/ls_orchid.gbk to output.fasta
$ python genbank_to_fasta.py --help
usage: genbank_to_fasta.py [-h] [-o] filename
...
Solution
from Bio import SeqIO
import argparse
from pathlib import Path
p = argparse.ArgumentParser()
p.add_argument("filename", help="the genbank file to convert")
p.add_argument("-o", "--output", help="output file")
args = p.parse_args()
destination = args.output or Path(args.filename).with_suffix(".fasta")
records = SeqIO.parse(args.filename, "genbank")
SeqIO.write(records, destination, "fasta")
print(f"converted {args.filename} to {destination}")
Problem 7.4
Sort Sequences by GC Fraction
Sort sequences in a FASTA file by their GC fraction.
The program should take a FASTA file as argument, sort the sequences in that file by their GC fraction and write them in FASTA format to the file gcsort.fasta.
The program should also support the following flags.
-o, --output to specify the file to write the output to instead of gcsort.fasta.
-r, --reverse to sort the sequences in the reverse order (descending order of GC fraction).
$ cat files/multi.fasta
> Seq1
AGTAGG
> Seq2
AGTCAGTC
AGTCATA
> Seq3
AGTCAGTCGC
$ python gcsort.py files/multi.fasta
Wrote the sorted sequences to gcsort.fasta
$ cat gcsort.fasta
> Seq2
AGTCAGTCAGTCATA
> Seq1
AGTAGG
> Seq3
AGTCAGTCGC
The -r option is used to sort in the reverse order.
$ python gcsort.py -r files/multi.fasta
Wrote the sorted sequences to gcsort.fasta
$ cat gcsort.fasta
> Seq3
AGTCAGTCGC
> Seq1
AGTAGG
> Seq2
AGTCAGTCAGTCATA
The -o flag is used to specify the output path.
$ python gcsort.py files/multi.fasta -o a.fasta
Wrote the sorted sequences to a.fasta
$ cat a.fasta
> Seq2
AGTCAGTCAGTCATA
> Seq1
AGTAGG
> Seq3
AGTCAGTCGC
Solution
import argparse
from Bio import SeqIO
from Bio.SeqUtils import gc_fraction

p = argparse.ArgumentParser()
p.add_argument("filename", help="FASTA file to sort")
p.add_argument("-r", "--reverse", help="sort in reverse order", action="store_true", default=False)
p.add_argument("-o", "--output", help="path of the output file", default="gcsort.fasta")
args = p.parse_args()

def gcsort(source_path, dest_path, reverse=False):
    records = SeqIO.parse(source_path, "fasta")
    # Sort by the GC fraction of the sequence of each record
    records = sorted(records, key=lambda record: gc_fraction(record.seq), reverse=reverse)
    SeqIO.write(records, dest_path, "fasta")
    print("Wrote the sorted sequences to", dest_path)

gcsort(args.filename, args.output, reverse=args.reverse)
Problem 7.5
The dig tool
The dig command is a popular tool for querying DNS name servers. Your task is to expose the dig command in Python as a function dig.
The dig command
Let's look at the output of the dig command to understand how it works.
$ dig amazon.com
; <<>> DiG 9.18.1-1ubuntu1.2-Ubuntu <<>> amazon.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49697
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;amazon.com. IN A
;; ANSWER SECTION:
amazon.com. 128 IN A 205.251.242.103
amazon.com. 128 IN A 54.239.28.85
amazon.com. 128 IN A 52.94.236.248
;; Query time: 16 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Wed Jan 18 13:45:30 IST 2023
;; MSG SIZE rcvd: 87
The output is too verbose. We can silence everything else and show only the answer section using:
$ dig +noall +answer amazon.com
amazon.com. 128 IN A 205.251.242.103
amazon.com. 128 IN A 54.239.28.85
amazon.com. 128 IN A 52.94.236.248
We used the dig command to query the name server for amazon.com. By default, it queries for records of type A, which hold IPv4 addresses.
The DNS server responded with three records matching the domain name amazon.com
. Each entry contains 5 fields, namely the domain name, TTL, class, record type and content.
See how to read dig output on wizard zines for more details.
Advanced Usage
Record Type
We can optionally pass the record type to dig to query for other types of DNS records like MX (mail exchange), TXT (text notes), NS (name server) etc.
The following command queries for records of type MX.
$ dig +noall +answer amazon.com mx
amazon.com. 766 IN MX 5 amazon-smtp.amazon.com.
In case of MX records, the content contains the priority and the mail server name. There could be more than one entry.
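As a small illustration of that structure, the content of an MX record can be split into its two parts. This is only a sketch: the function name parse_mx_content is hypothetical, and it assumes the content is exactly a priority followed by a host name, as in the output above.

```python
def parse_mx_content(content):
    """Split MX record content into (priority, mail server name)."""
    priority, host = content.split()
    return int(priority), host

priority, host = parse_mx_content("5 amazon-smtp.amazon.com.")
print(priority, host)
```

When there is more than one entry, mail is generally delivered to the server with the lowest priority value first.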
The following command queries for records of type NS.
$ dig +noall +answer amazon.com ns
amazon.com. 1626 IN NS pdns6.ultradns.co.uk.
amazon.com. 1626 IN NS ns3.p31.dynect.net.
amazon.com. 1626 IN NS ns4.p31.dynect.net.
amazon.com. 1626 IN NS ns1.p31.dynect.net.
amazon.com. 1626 IN NS ns2.p31.dynect.net.
amazon.com. 1626 IN NS pdns1.ultradns.net.
Server
By default, the dig command queries the DNS server configured in the system. However, we can explicitly specify the server to query by prefixing it with @.
$ dig +noall +answer amazon.com @8.8.8.8
amazon.com. 128 IN A 205.251.242.103
amazon.com. 128 IN A 54.239.28.85
amazon.com. 128 IN A 52.94.236.248
8.8.8.8 is Google's public DNS server. Normally the output is the same as that from the default server, but querying several well-known DNS servers is very handy when troubleshooting DNS issues with your domain.
There are some hobby DNS servers like dns.toys that provide useful utilities over DNS.
$ dig +noall +answer mumbai.weather @dns.toys
mumbai. 1 IN TXT "Mumbai (IN)" "30.40C (86.72F)" "32.10% hu." "clearsky_day" "14:30, Wed"
mumbai. 1 IN TXT "Mumbai (IN)" "27.90C (82.22F)" "42.80% hu." "clearsky_day" "16:30, Wed"
mumbai. 1 IN TXT "Mumbai (IN)" "23.00C (73.40F)" "71.40% hu." "clearsky_night" "18:30, Wed"
mumbai. 1 IN TXT "Mumbai (IN)" "20.40C (68.72F)" "90.50% hu." "clearsky_night" "20:30, Wed"
mumbai. 1 IN TXT "Mumbai (IN)" "18.80C (65.84F)" "90.40% hu." "clearsky_night" "22:30, Wed"
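The content of a TXT record is a series of quoted strings. If the individual strings are needed, one possible approach is to split the content with shlex.split from the standard library. This is a sketch under the assumption that the quoting in the dig output is simple, as in the sample above; it is not guaranteed to handle every escaping style that dig may produce.

```python
import shlex

# Content of the first TXT record from the output above
content = '"Mumbai (IN)" "30.40C (86.72F)" "32.10% hu." "clearsky_day" "14:30, Wed"'

# shlex.split understands double-quoted strings and keeps the spaces inside them
parts = shlex.split(content)
for part in parts:
    print(part)
```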
Summary of Usage
The dig command is used as follows:
dig options domain-name record-type @server
The options, record-type and server are optional.
The Python API
Your task is to implement a Python function dig that calls the dig command with appropriate arguments, parses the output and returns the result as a Python data structure.
The function takes the domain name as argument and two optional arguments, record_type and server.
Sample Usage:
>>> dig("amazon.com")
[
{"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "205.251.242.103"},
{"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "54.239.28.85"},
{"name": "amazon.com.", "ttl": 128, "class": "IN", "record_type": "A", "content": "52.94.236.248"}
]
>>> dig("amazon.com", record_type="MX")
[
{"name": "amazon.com.", "ttl": 498, "class": "IN", "record_type": "MX", "content": "5 amazon-smtp.amazon.com."}
]
>>> dig("amazon.com", record_type="MX", server="8.8.8.8")
[
{"name": "amazon.com.", "ttl": 498, "class": "IN", "record_type": "MX", "content": "5 amazon-smtp.amazon.com."}
]
>>> dig("mumbai.weather", server="dns.toys")
[
{"name": "mumbai.", "ttl": 1, "class": "IN", "record_type": "TXT", "content": '"Mumbai (IN)" "30.40C (86.72F)" "32.10% hu." "clearsky_day" "14:30, Wed"'},
...
]
Solution
import subprocess

def parse_record(line):
    # Split into at most 5 fields, as the content may contain spaces
    name, ttl, klass, record_type, content = line.strip().split(None, 4)
    return {
        "name": name,
        "ttl": int(ttl),
        "record_type": record_type,
        "class": klass,
        "content": content
    }

def dig(name, record_type="A", server=None):
    """Python interface to the dig command.
    """
    cmd = ["dig", "+noall", "+answer", name, record_type]
    if server:
        cmd.append(f"@{server}")
    # Run dig and wait for it to finish; check=True raises an error if dig fails
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [parse_record(line) for line in result.stdout.splitlines() if line.strip()]
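The parsing logic can be checked without running dig or touching the network by feeding it text copied from the examples above. The sketch below repeats parse_record so that it runs on its own; parse_answer is a hypothetical helper, not part of the problem statement, and it assumes that blank lines and lines starting with ; should be skipped.

```python
def parse_record(line):
    # Split into at most 5 fields, as the content may contain spaces
    name, ttl, klass, record_type, content = line.strip().split(None, 4)
    return {
        "name": name,
        "ttl": int(ttl),
        "record_type": record_type,
        "class": klass,
        "content": content
    }

def parse_answer(text):
    """Parse text in the format printed by dig +noall +answer."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith(";"):
            continue
        records.append(parse_record(line))
    return records

sample = """\
amazon.com. 128 IN A 205.251.242.103
amazon.com. 766 IN MX 5 amazon-smtp.amazon.com.
"""

records = parse_answer(sample)
for record in records:
    print(record)
```

Limiting the split to five fields is what keeps the MX content "5 amazon-smtp.amazon.com." together as a single value.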