Assignment 05

Solutions to Assignment 05.

Unique Items

Write a function unique that removes duplicates from a list while preserving the order in which the items first appear in the original list.

>>> unique([1, 1, 2, 3, 1, 2, 3, 2, 4])
[1, 2, 3, 4]

Solution

def unique(items):
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen
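
The `item not in seen` check scans a list, so the solution above slows down on long inputs. If every item is hashable (an assumption the problem does not state), a dictionary can do the de-duplication in a single pass. A minimal sketch, with `unique_hashable` as a hypothetical name:

```python
def unique_hashable(items):
    """Remove duplicates while preserving first-seen order.

    Hypothetical variant of unique; assumes every item is hashable.
    """
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


print(unique_hashable([1, 1, 2, 3, 1, 2, 3, 2, 4]))  # [1, 2, 3, 4]
```

The list-based solution is still the one to use when items may be unhashable, for example a list of lists.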

Flatten the list

Write a function flatten which combines nested lists into a single list. It takes a list of lists as an argument and returns a list of all elements from nested lists.

>>> flatten([[1,2,3],[4,5],[6,7,8,9]])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Solution

def flatten(items):
    return sum(items, start=[])
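
Using `sum` with lists builds a new intermediate list at every step, which gets expensive when there are many nested lists. One possible alternative is `itertools.chain.from_iterable`, sketched here under the hypothetical name `flatten_chain`:

```python
from itertools import chain


def flatten_chain(items):
    """Flatten one level of nesting (hypothetical variant of flatten)."""
    # chain.from_iterable walks each inner list in turn without copying
    return list(chain.from_iterable(items))


print(flatten_chain([[1, 2, 3], [4, 5], [6, 7, 8, 9]]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```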

Count File Extensions

Write a program extcount.py to count the number of files per extension. The program should take the path to a directory as an argument and print the count and extension for each extension found, sorted by count in descending order. Files without an extension should be ignored.

$ python extcount.py files/extcount
4 py
3 txt
2 csv
1 yml

$ python extcount.py files/extcount/data
4 txt
2 csv

Solution

import sys
import os
from collections import Counter

path = sys.argv[1]
filenames = os.listdir(path)
extensions = [f.split(".")[-1] for f in filenames if "." in f]

for ext, count in Counter(extensions).most_common():
    print(count, ext)
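
The solution above treats every name containing a dot as having an extension, so a dotfile such as .gitignore or a subdirectory with a dot in its name would be counted too. Whether that matters depends on the input. A stricter sketch, assuming only regular files should be counted and dotfiles have no extension, could use `os.path.splitext`. The function name `count_extensions` is hypothetical.

```python
import os
from collections import Counter


def count_extensions(path):
    """Return a Counter mapping extension -> number of regular files in path."""
    counts = Counter()
    for name in os.listdir(path):
        # Assumption: subdirectories are not files and should be skipped.
        if not os.path.isfile(os.path.join(path, name)):
            continue
        # splitext treats a leading dot as part of the name,
        # so ".gitignore" yields an empty extension.
        ext = os.path.splitext(name)[1].lstrip(".")
        if ext:
            counts[ext] += 1
    return counts
```

Looping over `count_extensions(path).most_common()` and printing each count and extension gives the same output format as the solution above.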

Sales by Day

Write a program sales.py to compute the total sale amount per day, given a text file with details of transactions with three columns Order Id, Date and Amount.

Here is a sample input file.

$ cat files/orders.txt
1001 2023-01-01 100
1002 2023-01-01 50
1003 2023-01-02 50
1004 2023-01-02 150
1005 2023-01-01 25

The file contains multiple transactions, one per row. Each row has three fields, Order Id, Date and Amount, separated by a space.

Your program should take the order file as a command-line argument and print the total sale amount per day. The output should be sorted by date.

$ python sales.py files/orders.txt
2023-01-01 175
2023-01-02 200

Please note that you are expected to parse the file yourself and not use pandas.

Solution

import sys

filename = sys.argv[1]

sales = {}
# the with statement closes the file once all lines are read
with open(filename) as f:
    for line in f:
        order_id, date, amount = line.split()
        sales[date] = sales.get(date, 0) + int(amount)

for date in sorted(sales):
    print(date, sales[date])
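
To check the aggregation without a file on disk, the same logic can be wrapped in a function that accepts any iterable of lines. This is a sketch: the name `sales_by_day` is hypothetical, and skipping blank lines is an added assumption about the input.

```python
from collections import defaultdict


def sales_by_day(lines):
    """Return a dict of date -> total amount, ordered by date."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        order_id, date, amount = fields
        totals[date] += int(amount)
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings
    return dict(sorted(totals.items()))


sample = [
    "1001 2023-01-01 100\n",
    "1002 2023-01-01 50\n",
    "1003 2023-01-02 50\n",
    "1004 2023-01-02 150\n",
    "1005 2023-01-01 25\n",
]
for date, total in sales_by_day(sample).items():
    print(date, total)
# 2023-01-01 175
# 2023-01-02 200
```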

Uniq Command

Write a program uniq.py that takes a filename as an argument and prints its lines to the output, collapsing identical adjacent input lines into one.

The command should support the following command-line flags.

 -d      Only output lines that are repeated in the input.

 -u      Only output lines that are not repeated in the input.

 -c      Precede each output line with the count of the number of times
        the line occurred in the input, followed by a single space.

The program should also print an appropriate help message when used with the -h or --help flag.

Use the standard library module argparse for doing this. You may want to check out the argparse tutorial to learn how to use that module.

A sample input file files/animals.txt with the following content is provided along with this problem.

cat
cat
cat
dog
dog
cat
rat

Expected Output

$ python uniq.py files/animals.txt
cat
dog
cat
rat

$ python uniq.py -d files/animals.txt
cat
dog

$ python uniq.py -u files/animals.txt
cat
rat

$ python uniq.py -c files/animals.txt
3 cat
2 dog
1 cat
1 rat

Hints

Look at the Unix uniq command.

Solution

import argparse

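def parse_args():
    # argparse adds the -h/--help flag automatically; the help strings
    # below are what it prints for each argument
    p = argparse.ArgumentParser(description="Print lines of a file, ignoring identical adjacent lines.")
    p.add_argument("filename", help="file to read")
    p.add_argument("-u", action="store_true", help="only output lines that are not repeated in the input")
    p.add_argument("-d", action="store_true", help="only output lines that are repeated in the input")
    p.add_argument("-c", action="store_true", help="precede each output line with its count")

    return p.parse_args()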

def count_lines(lines):
    """Counts runs of consecutive identical lines.

    Returns a list of (count, line) pairs, one for each run of identical lines.
    """
    result = []
    seen = []
    for line in lines:
        if seen and seen[-1] != line:
            result.append((len(seen), seen[0]))
            seen = []
        seen.append(line)

    if seen:
        result.append((len(seen), seen[0]))

    return result

# def count_lines_with_itertools(lines):
#     # count_lines can be implemented easily with itertools
#     import itertools
#     # groupby yields an iterator per run, so materialize it to get its length
#     return [(len(list(chunk)), line) for line, chunk in itertools.groupby(lines)]

def uniq(lines, unique=False, dups=False, count=False):
    for n, line in count_lines(lines):
        if unique and n > 1:
            continue
        if dups and n == 1:
            continue

        if count:
            print(n, line, end="")
        else:
            print(line, end="")

def main():
    args = parse_args()
    # the with statement closes the file when uniq is done
    with open(args.filename) as f:
        uniq(f, unique=args.u, dups=args.d, count=args.c)

if __name__ == "__main__":
    main()
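
The filtering rules can be checked against the expected output without going through the command line. The sketch below is not part of the solution above: `uniq_lines` is a hypothetical helper that counts runs with `itertools.groupby` and returns the selected lines as a list instead of printing them.

```python
from itertools import groupby


def uniq_lines(lines, unique=False, dups=False, count=False):
    """Return the lines uniq would print, as a list of strings."""
    out = []
    # groupby yields one (line, run) pair per run of adjacent equal lines
    for line, run in groupby(lines):
        n = sum(1 for _ in run)  # length of the run
        if unique and n > 1:
            continue
        if dups and n == 1:
            continue
        out.append(f"{n} {line}" if count else line)
    return out


animals = ["cat", "cat", "cat", "dog", "dog", "cat", "rat"]
print(uniq_lines(animals))              # ['cat', 'dog', 'cat', 'rat']
print(uniq_lines(animals, dups=True))   # ['cat', 'dog']
print(uniq_lines(animals, unique=True)) # ['cat', 'rat']
print(uniq_lines(animals, count=True))  # ['3 cat', '2 dog', '1 cat', '1 rat']
```

Returning a list rather than printing makes each flag easy to assert on in a test.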