Assignment 05
Solutions to Assignment 05.
Unique Items
Write a function unique that removes duplicates from a list while preserving the order in which items first appear in the original list.
>>> unique([1, 1, 2, 3, 1, 2, 3, 2, 4])
[1, 2, 3, 4]
Solution
def unique(items):
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen

Flatten the list
Write a function flatten which combines nested lists into a single list. It takes a list of lists as an argument and returns a list of all elements from nested lists.
>>> flatten([[1,2,3],[4,5],[6,7,8,9]])
[1, 2, 3, 4, 5, 6, 7, 8, 9]
Solution
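def flatten(items):
    # Concatenate the sublists, starting from an empty list.
    # Passing start as a keyword needs Python 3.8 or newer.
    return sum(items, start=[])

Count File Extensions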
Write a program extcount.py to count the number of files per extension. The program should take the path to a directory as an argument and print the count and extension for each extension found, sorted by count in descending order. Files without any extension should be ignored.
$ python extcount.py files/extcount
4 py
3 txt
2 csv
1 yml
$ python extcount.py files/extcount/data
4 txt
2 csv
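The rule that files without an extension are ignored can be sketched with os.path.splitext from the standard library. The helper name extension is hypothetical, and treating dot-files such as .bashrc as having no extension is an assumption; the problem statement does not say how they should be counted.

```python
import os.path

def extension(filename):
    """Return the extension without the leading dot, or None if there is none."""
    # splitext returns (root, ext); ext is "" when there is no extension
    # and keeps the leading dot otherwise, so strip it with [1:].
    return os.path.splitext(filename)[1][1:] or None

for name in ["a.py", "archive.tar.gz", "README", ".bashrc"]:
    print(name, extension(name))
```

Only the last component counts, so archive.tar.gz is reported as gz, while README and .bashrc both give None and would be skipped.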
Solution
import sys
import os
from collections import Counter

path = sys.argv[1]
filenames = os.listdir(path)
extensions = [f.split(".")[-1] for f in filenames if "." in f]
for ext, count in Counter(extensions).most_common():
    print(count, ext)

Sales by Day
Write a program sales.py to compute the total sale amount per day, given a text file with details of transactions with three columns Order Id, Date and Amount.
Here is a sample input file.
$ cat files/orders.txt
1001 2023-01-01 100
1002 2023-01-01 50
1003 2023-01-02 50
1004 2023-01-02 150
1005 2023-01-01 25
The file contains multiple transactions, one per row. Each row has three fields, Order Id, Date and Amount, separated by a space.
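As a small illustration of the row format, one row can be split into its three fields like this. The variable names are only for illustration.

```python
row = "1001 2023-01-01 100"

# split() with no arguments splits on whitespace and drops the trailing newline
order_id, date, amount = row.split()
amount = int(amount)  # the amount arrives as text and must be converted before adding

print(order_id, date, amount)
```

Because the dates are written as YYYY-MM-DD, sorting them as plain strings also sorts them chronologically.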
Your program should take the order file as a command-line argument and print the total sale amount per day. The output should be sorted by date.
$ python sales.py files/orders.txt
2023-01-01 175
2023-01-02 200
Please note that you are expected to parse the file yourself and not use pandas.
Solution
import sys

filename = sys.argv[1]
sales = {}
# Open the file in a with block so it is closed when reading is done.
with open(filename) as f:
    for line in f:
        order_id, date, amount = line.strip().split()
        sales[date] = sales.get(date, 0) + int(amount)
for date in sorted(sales):
    print(date, sales[date])

Uniq Command
Write a program uniq.py that takes a filename as an argument and prints its lines to the output, collapsing identical adjacent input lines into one.
The command should support the following command-line flags.
-d Only output lines that are repeated in the input.
-u Only output lines that are not repeated in the input.
-c Precede each output line with the count of the number of times the line occurred in the input, followed by a single space.
The program should also print an appropriate help message when used with the -h or --help flags.
Use the standard library module argparse for this. You may want to check out the argparse tutorial to learn how to use that module.
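A minimal sketch of how the three flags and the help message might be declared with argparse. The function name build_parser and the help strings are assumptions that paraphrase the flag descriptions above. argparse adds -h and --help on its own, so no extra code is needed for them.

```python
import argparse

def build_parser():
    # Hypothetical helper; the help strings paraphrase the flag descriptions.
    p = argparse.ArgumentParser(
        description="Print lines, collapsing identical adjacent lines.")
    p.add_argument("filename", help="path to the input file")
    p.add_argument("-d", action="store_true",
                   help="only output lines that are repeated in the input")
    p.add_argument("-u", action="store_true",
                   help="only output lines that are not repeated in the input")
    p.add_argument("-c", action="store_true",
                   help="precede each output line with its count")
    return p

# Passing a list to parse_args makes it easy to try the parser without a shell.
args = build_parser().parse_args(["-c", "files/animals.txt"])
print(args.filename, args.c, args.d, args.u)
```

With action="store_true" each flag defaults to False and becomes True when it is given on the command line.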
A sample input file files/animals.txt with the following content is provided along with this problem.
cat
cat
cat
dog
dog
cat
rat
Expected Output
$ python uniq.py files/animals.txt
cat
dog
cat
rat
$ python uniq.py -d files/animals.txt
cat
dog
$ python uniq.py -u files/animals.txt
cat
rat
$ python uniq.py -c files/animals.txt
3 cat
2 dog
1 cat
1 rat
Hints
Look at the Unix uniq command.
Solution
import argparse

def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("filename")
    p.add_argument("-u", default=False, action="store_true")
    p.add_argument("-d", default=False, action="store_true")
    p.add_argument("-c", default=False, action="store_true")
    return p.parse_args()

def count_lines(lines):
    """Counts runs of identical adjacent lines.

    Returns a list of (count, line) pairs, one for each run.
    """
    result = []
    seen = []
    for line in lines:
        # A different line ends the current run, so record it and start a new one.
        if seen and seen[-1] != line:
            result.append((len(seen), seen[0]))
            seen = []
        seen.append(line)
    # Record the final run, if the input was not empty.
    if seen:
        result.append((len(seen), seen[0]))
    return result

# def count_lines_with_itertools(lines):
#     # count_lines can be implemented more simply with itertools.
#     # groupby yields an iterator per run, so turn it into a list to count it.
#     import itertools
#     return [(len(list(chunk)), line) for line, chunk in itertools.groupby(lines)]

def uniq(lines, unique=False, dups=False, count=False):
    for n, line in count_lines(lines):
        if unique and n > 1:
            continue
        if dups and n == 1:
            continue
        if count:
            print(n, line, end="")
        else:
            print(line, end="")

def main():
    args = parse_args()
    # Open the file in a with block so it is closed after processing.
    with open(args.filename) as f:
        uniq(f, unique=args.u, dups=args.d, count=args.c)

if __name__ == "__main__":
    main()
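
To check the filtering logic against the expected output without reading a file or capturing printed text, one option is a variant that returns the output lines as a list. This is only a sketch: the name uniq_lines is hypothetical, and it uses itertools.groupby in place of count_lines.

```python
from itertools import groupby

def uniq_lines(lines, unique=False, dups=False, count=False):
    """Return the output lines as a list instead of printing them."""
    out = []
    for line, chunk in groupby(lines):
        n = sum(1 for _ in chunk)  # length of this run of identical lines
        if unique and n > 1:
            continue
        if dups and n == 1:
            continue
        text = line.rstrip("\n")
        out.append(f"{n} {text}" if count else text)
    return out

# Same content as files/animals.txt, as it would be read line by line.
animals = ["cat\n", "cat\n", "cat\n", "dog\n", "dog\n", "cat\n", "rat\n"]
print(uniq_lines(animals))               # ['cat', 'dog', 'cat', 'rat']
print(uniq_lines(animals, dups=True))    # ['cat', 'dog']
print(uniq_lines(animals, unique=True))  # ['cat', 'rat']
print(uniq_lines(animals, count=True))   # ['3 cat', '2 dog', '1 cat', '1 rat']
```

Returning a list keeps the filtering separate from the printing, which makes each flag easy to compare with the expected output shown earlier.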