CSCI 233 Python Exercise 4

Today’s lab focuses on writing programs that process files and convert data.

Getting ready

Go to your directory for this class, make a new directory lab4 for today’s lab, and change your working directory to it.

Copy into that directory the stats module that we’ve written in class, along with another module that I’m providing. You can do that at the command line with

cp /cslab/class/csci233/lab4/*.py .

(Notice the ‘.’ (dot) as the second argument to cp; it stands for the current directory.) If you list the directory, you should find two Python files, stats.py and smartinput.py.

First steps

Fire up IDLE and start editing a new file, words.py. Fill in the standard start-of-file comments.

Your first task is to write a function that will break an input file into words and print out those words, one per line. To get started, do something simpler using familiar Python: write a function that will read a line from the keyboard and print out the words in that line, one per line. You should be able to do this fairly easily using some methods on strings (str).

At first, simply treat whitespace as marking the word boundaries. Once you are confident that is working, get a tiny bit fancier and see if you can strip off any leading and trailing punctuation from your words, too. (Leave internal punctuation marks, such as hyphens and apostrophes.)
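Here is a minimal sketch of the word-splitting step. It assumes Python 3, and the function name words_in is chosen here for illustration. It splits on whitespace with str.split(), then removes leading and trailing characters drawn from string.punctuation with str.strip(). That constant covers ASCII punctuation only, so curly quotes such as ’ would survive. Tokens that were nothing but punctuation are dropped.

```python
import string

def words_in(line):
    """Return the words of line: split on whitespace, then strip leading and
    trailing ASCII punctuation. Internal hyphens and apostrophes are kept."""
    words = []
    for token in line.split():   # split() with no argument splits on runs of whitespace
        word = token.strip(string.punctuation)
        if word:                 # a token like "--" strips down to "" and is skipped
            words.append(word)
    return words

# Demo on a fixed string; in the exercise the line would come from the keyboard.
for word in words_in('"Well," she said -- it\'s a well-known fact.'):
    print(word)
```

This prints Well, she, said, it's, a, well-known, and fact, one per line.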

A tip: When you finish a working stage, make a copy of your file. That way you can back up to something known if you make a mess later on. It’s a good idea to fill in a comment that describes the current state before you do this. One (fairly safe) way to make a copy is to go to the (Unix) shell and issue a command like

cp -i words.py words1.py

Increase the number each time. The ‘-i’ tells the copy command to warn you before it overwrites a file that already exists.

You should be able to test your function from within IDLE.

Make it a command

Add the appropriate code so that you can run your file as a command. That way, when you run the module under IDLE, it will process your input. You should also now be able to run your program at the Unix shell with the command

python words.py

Be sure to include the appropriate test (if __name__ == "__main__":) so that your module can still be imported politely, without running the program. Your program should prompt in a reasonable fashion.
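Below is one way words.py might look once it runs as a command. It is a sketch that assumes Python 3, so it uses input() where Python 2 would use raw_input(). The names words_in and main are illustrative. The if __name__ == "__main__": test is the standard Python idiom. When the file is run directly, __name__ is "__main__". When it is imported, __name__ is the module's own name and main() is not called.

```python
"""words.py: print the words of one line typed at the keyboard, one per line."""
import string

def words_in(line):
    """Whitespace-separated words with leading/trailing ASCII punctuation removed."""
    stripped = (token.strip(string.punctuation) for token in line.split())
    return [word for word in stripped if word]

def main():
    try:
        line = input("Type a line of text: ")
    except EOFError:
        # The user typed control-D instead of a line: nothing to do.
        return
    for word in words_in(line):
        print(word)

# True when the file is run as a program (python words.py, or Run Module in IDLE);
# "import words" skips this, so importing the module has no side effects.
if __name__ == "__main__":
    main()
```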

Iterate

Now change it so that, instead of reading one line, it reads a series of lines, continuing until end-of-file. You signal end-of-file from the keyboard by typing control-D. Caution: if you type control-D at the Unix shell prompt, your terminal window will disappear.

You may want to change the way your program prompts. It might, for example, print instructions before reading the first line, but read the lines without printing a prompt.

Again, be sure that your program works whether run within IDLE or from the Unix shell.
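This sketch reads lines until end-of-file. It assumes Python 3, where input() raises EOFError at end-of-file; Python 2's raw_input() behaves the same way. read_lines is a hypothetical helper that turns that exception into the end of a loop. Instructions print once, and the lines themselves are read without a prompt.

```python
import string

def words_in(line):
    """Whitespace-separated words with leading/trailing ASCII punctuation removed."""
    stripped = (token.strip(string.punctuation) for token in line.split())
    return [word for word in stripped if word]

def read_lines():
    """Yield lines from standard input until end-of-file (control-D at a terminal)."""
    while True:
        try:
            line = input()      # no prompt: the instructions are printed once, up front
        except EOFError:        # input() raises EOFError at end-of-file
            return
        yield line

def main():
    print("Type your text; press control-D at the start of a line to finish.")
    for line in read_lines():
        for word in words_in(line):
            print(word)

if __name__ == "__main__":
    main()
```

Because input() reads standard input, the same loop also works when input is redirected from a file.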

Make it friendly to redirection

Recall from class that the Unix shell allows you to redirect a program’s input or output to come from (go to) a file. So if you have a file data.txt full of text lying about (or create one with IDLE, which will be happiest if you end its name with “.txt”), you should be able to run

python words.py < data.txt

to see a nice list of the words from that file.

One unpleasant thing about doing that is that your prompts still print. This is what the module smartinput that I’ve provided is for. It defines two functions, smart_input() and smart_print(), that are like raw_input() and the print statement, but don’t print anything when you redirect input. So fix your program to use those functions appropriately; verify that it works correctly no matter how you invoke it.
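The smartinput source isn't shown here, so the sketch below only guesses at the idea. It is not that module's code. The names input_is_redirected, quiet_print, and quiet_input are hypothetical, and the sketch assumes Python 3. The one real piece is the file method isatty(), which reports whether a stream is connected to a terminal. When you run python words.py < data.txt, sys.stdin is the file rather than a terminal, so isatty() returns False.

```python
import sys

def input_is_redirected():
    """True when standard input is not a terminal, e.g. python words.py < data.txt."""
    return not sys.stdin.isatty()

def quiet_print(*args):
    """Like print(*args), but silent when input is redirected (hypothetical helper)."""
    if not input_is_redirected():
        print(*args)

def quiet_input(prompt=""):
    """Like input(prompt), but with the prompt suppressed when input is redirected."""
    if input_is_redirected():
        return input()          # input() with no argument writes nothing to stdout
    return input(prompt)
```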

Make an extra copy of words.py now, as words-part1.py.

Messing with files

Many useful commands take optional command-line arguments to indicate files from which input should be taken. Our goal is to make your program do this, too. When it is run with no arguments, it should read from the keyboard; when you run it with arguments, as in

python words.py data1.txt data2.txt

it should read and process those files, in order.

To start, you might want to make it work with a single filename.

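Recall that after you open a file, you can read it a line at a time with the method readline(). There are two details you need to know about this method: at end-of-file it returns the empty string '', and every line it returns, including a blank one, ends with the newline character '\n' (except possibly the last line, if the file doesn’t end with a newline). So a blank line comes back as '\n', not ''.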
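Here is a minimal sketch of the single-filename step using readline(). It assumes Python 3. words_in_file is a hypothetical name, and it takes an already-open file so that it is easy to test. Punctuation stripping is left out to keep the focus on the read loop. The loop stops only on '', so blank lines ('\n') are read and skipped, not mistaken for end-of-file.

```python
import sys

def words_in_file(f):
    """Collect the whitespace-separated words of an open file, using readline()."""
    words = []
    while True:
        line = f.readline()
        if line == "":               # "" means end-of-file; a blank line is "\n"
            break
        words.extend(line.split())   # split() also discards the trailing "\n"
    return words

# Only runs when a filename is given, e.g. python words.py data.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:     # sys.argv[1] is the first argument after the script name
        for word in words_in_file(f):
            print(word)
```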

The other way of iterating over a file is to treat it as a sequence, using a for statement. That doesn’t work on the keyboard (sys.stdin), but I’ve included a function stdin_as_file() in the smartinput module that gives you an object that you can safely use to iterate on standard input.
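The sketch below puts the pieces together for any number of filenames, using for loops over files. It assumes Python 3, and its function names are illustrative. With no arguments it iterates over sys.stdin directly. In the lab you would iterate over the object returned by smartinput's stdin-as-file function instead, as the comment notes.

```python
import string
import sys

def words_in(line):
    """Whitespace-separated words with leading/trailing ASCII punctuation removed."""
    stripped = (token.strip(string.punctuation) for token in line.split())
    return [word for word in stripped if word]

def print_words(lines):
    """Print the words from any iterable of lines (an open file, sys.stdin, a list)."""
    for line in lines:
        for word in words_in(line):
            print(word)

def main(filenames):
    if not filenames:
        # No arguments: read standard input (keyboard or redirected file).
        # In the lab, use the object from smartinput's stdin-as-file function here.
        print_words(sys.stdin)
    for name in filenames:           # process the files in the order given
        with open(name) as f:        # the with-block closes each file when done
            print_words(f)           # a for loop over a file yields one line at a time

if __name__ == "__main__":
    main(sys.argv[1:])               # sys.argv[0] is the script name itself
```

Run it as python words.py data1.txt data2.txt. The standard library's fileinput module packages the same pattern: fileinput.input() iterates over the lines of the files named in sys.argv[1:], or over standard input if there are none. Writing the loop yourself is the point of this exercise, though.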

Finishing up

Go back and look closely at your files words.py and words-part1.py to polish them.

Add a triple-quoted string to the end with your comments on what you’ve learned today.

Print out the two files to hand in:

a2ps words-part1.py words.py

Hand in a printed copy of your program. You may want to print a copy for yourself, too.