CSCI 233 Python Exercise 5

Today, you will complete and polish the program you started in last week’s lab. This handout gives instructions for doing so, followed by some additional information that will be helpful along the way. Once you’ve finished, there is another (shorter) programming exercise.

Finishing the words program

Complete the words.py program from Exercise 4, before noon on Tuesday, Feb. 16.

Be sure that you do the following:

Before you turn in your program, write your reflections on the exercise in a file notes4.txt in the same directory as words.py. (You can use IDLE to write that file.) This replaces the previous instructions for a triple-quoted string of commentary at the end of your Python file. In that file, briefly comment on what you have learned, what difficulties you had, and anything else that is significant.

To turn in your program, make sure you are in the directory with your file words.py, then execute the command

/cslab/class/csci233/bin/handin P4 words.py notes4.txt

Be sure that the command prints “Successful submission”.

If you fix a problem or otherwise improve your program, you can turn it in again by re-executing the same command.

Program vs. module

As we have seen in class, a Python file that you write can be either imported by another Python file or run as a command. To be well-behaved in both uses, it should do nothing but make definitions when imported, and execute some code only when run as a command.

To test whether the file is serving as a top-level program, check the predefined variable __name__: it is a string holding the module’s name when the file is imported, but is equal to "__main__" when the file is run.

It is a good idea to wrap up the “action” for the command in a single function, and a good convention is to name that function main() or _main().

This choice highlights one more detail about importing a module: if you use

from module import *

it will add all of the names from the module to the current (global) set of names, except for the ones that begin with an underscore. So prefixing the name of the main function with an underscore keeps it from adding clutter to the set of names in a module that imports it.

Working with a dictionary

In class, we started working with a dictionary to count the number of occurrences of each value in a sequence. Now you will create a simple program, occurs.py, that does this in a nice fashion: it reads a sequence of lines from its input, computes the number of occurrences of each line, and prints them in order from most to least frequent.

Note that a program like this might be used in conjunction with the one you’ve just finished: words.py could be used to turn input into the one-word-per-line form that occurs.py expects as input. You could do that as a series of Unix commands, as in

python words.py text1.txt text2.txt > tmp.txt
python occurs.py < tmp.txt

Here is what we had in class:

def counts(seq):
    """Returns a dictionary with count of occurrences of each
    value appearing in seq.
    """
    d = dict()
    for i in seq:
        d[i] = d.get(i, 0) + 1
    return d
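
As a quick check of how this function behaves, here is a small sketch that applies it to a short, made-up list of words. It assumes Python 3 for the print() call, and the sample data is invented for illustration:

```python
def counts(seq):
    """Returns a dictionary with count of occurrences of each
    value appearing in seq.
    """
    d = dict()
    for i in seq:
        # d.get(i, 0) is the count so far, or 0 if i has not been seen yet.
        d[i] = d.get(i, 0) + 1
    return d


# Made-up sample data: "the" appears 3 times, "cat" twice, "hat" once.
words = ["the", "cat", "the", "hat", "the", "cat"]
result = counts(words)
print(result["the"], result["cat"], result["hat"])  # prints: 3 2 1
```

Note that the dictionary by itself makes no promise about frequency order, which is why the ordering needs separate work.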

Create occurs.py and fill in the pieces needed to read and write data. Then work out how to print the results in the right order. Finally, make your program take an (optional) argument that gives the maximum number of words to print, so that only the most frequent are printed.

Standard output and standard error

The kind of multi-stage processing we noted with our example

python words.py text1.txt text2.txt > tmp.txt
python occurs.py < tmp.txt

is so useful that Unix provides a pipe to connect the output of one program to the input of another:

python words.py text1.txt text2.txt | python occurs.py

It is because of the possibility of redirecting output—especially into a pipeline—that you should try to maintain the distinction between standard output and standard error: the former gets “real” output, while error messages, warnings, etc., get printed to the latter. IDLE highlights the difference by printing standard output in blue and standard error in red.

The two output streams have the names sys.stdout and sys.stderr. There are two ways that you can direct output to a particular stream:
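
As a sketch of two common approaches, assuming Python 3 (the handout does not say which version the course uses, and the function names below are hypothetical): one is to pass the stream to print() through its file argument, and the other is to call the stream's write() method directly.

```python
import sys


def warn_with_print(message):
    """Send a warning to standard error using print's file argument."""
    print("warning:", message, file=sys.stderr)


def warn_with_write(message):
    """Send a warning to standard error using the stream's write method.

    Unlike print, write does not add a newline, so we add one ourselves.
    """
    sys.stderr.write("warning: " + message + "\n")


# "Real" output still goes to standard output, the default for print.
print("the 3")
warn_with_print("skipping an empty line")
warn_with_write("skipping an empty line")
```

If this were run with its output redirected into a file or a pipe, only "the 3" would be redirected; the two warnings would still appear on the terminal.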

You may wish to revisit your words.py to clean up this distinction there.

Handing it in

Write up your reflections on this program in notes5.txt. Turn in this program with

/cslab/class/csci233/bin/handin P5 occurs.py notes5.txt

or, if you’ve modified words.py and want to include that,

/cslab/class/csci233/bin/handin P5 occurs.py words.py notes5.txt

This is due before lab on February 18.