CSCI 233 Python Exercise 6

Today, you will be working on completing the second program from last week’s lab. Complete occurs.py and submit it as P5 before 5:00 pm on Tuesday, Feb. 23.

Working with dictionaries and lists

Basics

Recall from class that a dictionary (Python type dict) is a type that provides a mapping from one set of values (the keys) to another (the values). Another way to think of a dictionary is as a collection of (key, value) pairs, stored so that, given a key, it is easy to find the corresponding value.

Dictionaries use much of the same syntax as lists, with the key as the subscript. Thus subscriptina allows a value to be stored into, accessed, and removed from a dictionary d with

    d["word"] = "definition"
    print d["word"]
    del d["word"]

A dictionary is not ordered in any meaningful way; so they do not support the slicing that lists do.

Unlike a list, which is dense—its subscripts form a compact range—a dictionary is sparse. To test whether there is an entry for a given key, use the in operator, so that

k in d

is true if and only if key k is present in the dictionary d. The absence can be checked with

k not in d

It is so often useful to get the value for a key, if there is one, or some particular value if it is not present, that there is a method get() for this purpose:

d.get(k, v)

A similar method setdefault() can be used to insert the default value for the key.

Not all types can be used as keys. The technical name for the requirement on a key type is that it be hashable–the property needed to make efficient lookup possible. In practice, one requirement for hashable is that the type be immutable. So we can use strings, integers, and tupes for keys, but not lists or dictionaries.

Writing out dictionaries

Python will print a dictionary uses braces, with a colon separating each key and value, as in

{ 1:2, "word":"definition", "meaning":42 }

Note that, as this example shows, a dictionary need not be homogeneous in either the key or value type.

Extracting contents

A dictionary is stored for quick access, but the order in which key-value pairs are stored is not meaningful to us. You can extract the whole of a dictionary as a list using one of three methods:

    d.keys()
    d.values()
    d.items()

While the order is hard to predict, if you make calls to keys() and values() without making any changes to the dictionary, the two lists will be in corresponding order. A dictionary emphitem is a key-value pair, as a tuple. The ability to do multiple assignment from a tuple gives rise to a very common way of iterating over a dictionary:

for k,v in d.items():
do something with k and v

Iterators

The methods mentioned above build a list containing an entry for each item in the dictionary. If the dictionary were very large (or the computer’s memory small!), that might be wasteful when all you want to do is iterate over the list.

Python provides types known as iterators for this purpose. An iterator can be thought of as an object the represents a sequence, with the promise to produce each element of the sequence as needed. That may be more efficient than producing the entire list at one time.

    d.iterkeys()
    d.itervalues()
    d.iteritems()

are just like the previous methods, except that they return an iterator instead of an actual list.

Sorting, default and keyword parameters

In computing, we use the verb sort to mean “put into (ascending) order”. The list type has a standard method sort() that rearranges the list in this fashion. So, for example

>>> print x
[4, 2, 3]
>>> x.sort()
>>> print x
[2, 3, 4]

Help on method descriptor:

sort(…)
L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*

The parameters listed here are different from what we’ve seen before. First, they are optional–you can call the sort() method without specifying any of them. What this description shows you is the default value that each formal parameter is assigned if there is no corresponding actual parameter.

The third parameter, reverse, says whether to sort backwards. Since we know what the defaults for the other two parameters are (even if we don’t know what they mean), we can set just the third with a call like:

>>> x.sort(None,None,True)
>>> print x
[4, 3, 2]

But that is awkward: we have to remember what all the parameters are, what order they appear in, and their default values. Python makes this easier by allowing us to specify parameters by keyword, as in

>>> x.sort()
>>> print x
[2, 3, 4]
>>> x.sort(reverse=True)
>>> print x
[4, 3, 2]

When there is more than one optional parameter, as in this case, it is usual to specify values by keyword.

Sorting, with functions as parameters

The other two parameters have the default value None, which is a very common choice for a default. We’ll ignore cmp for the time being, but the key parameter expects to be passed a function. That function should be one that, when called on an item in the list, returns the value that should be used in comparisons for that item. This makes it possible to sort lists of tuples, lists, etc., in interesting ways. Given the list

>>> y = [(1, 12), (2, 10), (2, 8) ]

>>> y.sort()
>>> print y
[(1, 12), (2, 8), (2, 10)]

We could instead sort by the second element of each tuple by defining a function and passing it for key:

>>> def f(i):
return i[1]
>>> y.sort(key=f)
>>> print y
[(2, 8), (2, 10), (1, 12)]

There are lots of ways to use something like this. For example, if we had a list of strings that we wanted to sort in a case-insensitive manner, we could do it thus

>>> a = ["These", "are", "words"]
>>> a.sort()
>>> print a
[’These’, ’are’, ’words’]
>>> def g(s):
return s.lower()
>>> a.sort(key=g)
>>> print a
[’are’, ’These’, ’words’]

The sort() method works only on a list, and it changes the list on which it is invoked—that is what is meant by the “in place” in the description. There is also generic function sorted() that works on any iterable sequence and produces a new sequence (list) that is in sorted order. If you pass a list to sorted(), the original list is not changed. For example

>>> print a
[’are’, ’These’, ’words’]
>>> b = sorted(a)
>>> print b
[’These’, ’are’, ’words’]
>>> print a
[’are’, ’These’, ’words’]

Turning it in

Review the section on “Standard output and standard error” from last week’s lab handout. Be sure that both occurs.py and words.py (from last week) are well-behaved when used in a pipeline.

As before, write your reflections on the exercise in a file notes5.txt in the same directory as occurs.py.

To turn in your program, make sure you are in the directory with your file occurs.py, then execute the command

/cslab/class/csci233/bin/handin P5 occurs.py notes5.txt

/cslab/class/csci233/bin/handin P5 occurs.py words.py notes5.txt