Problem 7.3

GenBank to FASTA

Write a program genbank_to_fasta.py to convert a file in GenBank format to FASTA format.

The program should take a GenBank file as a command-line argument and write its records to a FASTA file. The optional flag -o (or --output) specifies the destination path. If it is omitted, the destination path is derived from the source file path by replacing its extension with .fasta.

$ python genbank_to_fasta.py files/ls_orchid.gbk
converted files/ls_orchid.gbk to files/ls_orchid.fasta

$ python genbank_to_fasta.py files/ls_orchid.gbk -o output.fasta
converted files/ls_orchid.gbk to output.fasta
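
The default destination in the first example can be derived with pathlib. The following is a minimal sketch, not the only way to do it; default_destination is a hypothetical helper name introduced here for illustration.

```python
from pathlib import Path


def default_destination(source):
    """Return the source path with its last extension replaced by .fasta.

    default_destination is a hypothetical helper, not part of the exercise.
    """
    return Path(source).with_suffix(".fasta")


# The directory part is kept; only the extension changes.
print(default_destination("files/ls_orchid.gbk"))

# Only the final suffix is replaced, so a double extension keeps its first part.
print(default_destination("archive.gbk.gz"))
```

Note that with_suffix replaces only the last suffix, so a name such as archive.gbk.gz becomes archive.gbk.fasta rather than archive.fasta.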

$ python genbank_to_fasta.py --help

usage: genbank_to_fasta.py [-h] [-o OUTPUT] filename

...

Solution

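import argparse
from pathlib import Path

from Bio import SeqIO

# Command-line interface: one positional source file, one optional destination.
p = argparse.ArgumentParser()
p.add_argument("filename", help="the GenBank file to convert")
p.add_argument("-o", "--output", help="output file")
args = p.parse_args()

# Use the given output path, or replace the source extension with .fasta.
destination = args.output or Path(args.filename).with_suffix(".fasta")

# SeqIO.parse yields records lazily; SeqIO.write consumes them one by one.
records = SeqIO.parse(args.filename, "genbank")
SeqIO.write(records, destination, "fasta")

print(f"converted {args.filename} to {destination}")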
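
The argument handling can be checked on its own, without reading or writing any sequence files, by passing an explicit list to parse_args instead of relying on the real command line. This is a sketch under that assumption; build_parser is a hypothetical wrapper around the same argparse calls used in the solution, and the explicit prog name is added here only so the usage message does not depend on how the script is run.

```python
import argparse


def build_parser():
    """Build the same parser as the solution (build_parser is a hypothetical name)."""
    parser = argparse.ArgumentParser(prog="genbank_to_fasta.py")
    parser.add_argument("filename", help="the GenBank file to convert")
    parser.add_argument("-o", "--output", help="output file")
    return parser


parser = build_parser()

# Without -o, args.output is None, so the solution falls back to the derived path.
args = parser.parse_args(["files/ls_orchid.gbk"])
print(args.filename, args.output)

# With -o (or --output), the given destination is used as-is.
args = parser.parse_args(["files/ls_orchid.gbk", "-o", "output.fasta"])
print(args.filename, args.output)
```

Because args.output defaults to None, the expression args.output or Path(...) in the solution picks the derived path only when the flag is absent.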