Elaboration and hints on assignments

CLRS naming scheme. Note that CLRS has "exercises" at the end of each section and "problems" at the end of each chapter. The exercises are enumerated as chapter.section-exercise, for example 2.3-7. Problems are enumerated as chapter-problem, for example 2-3. Problems often have sub-parts, for example 2-3c. "Daily work" will come mainly from the exercises, and "homework" will come mainly from the problems, but there will be some crossover as well.

"Complete" problems. For problems that are designated as "complete", your submission should include

a code solution, with unit tests
a proof of the solution's correctness
an analysis of the solution's efficiency

Usually I will provide some guidance in the form of code stubs and a unit test or two (but you should write more unit tests), what part of the solution's correctness to focus on (ie, "state and prove an invariant for the loop that..."), and the important issues in the algorithm's efficiency.

You may ignore the code stubs that I provide and code your solution from scratch in any (reasonable) programming language you like. Actually, in principle I would recommend doing this. The practical liability, however, is that it makes the submissions much harder to grade. If you code your solution from scratch, please document it thoroughly, make clear how to run it, make your assumptions about the input explicit, etc. (I talk more about this below under "Hints on coding up solutions.")

Turning in HW assignments. Please turn in your solutions electronically to /cslab/class/cs445/(your id)/(assignment id) where assignment id is in the form hw-(month)-(day), referring to the date it was assigned. For example, the homework assigned on Sept 5 should be turned in to hw-9-5.

Please turn in proofs etc as a typed (or typeset) pdf. Turn in the source code for your solutions to problems designated as "complete."

Hints on coding up solutions. Some of the problems in the book that involve devising an algorithm are not specified completely enough to implement in a real programming language without filling in a few details. This is one of the main reasons I provide code stubs, so we can be on the same page in our assumptions about how the input data and results are represented. This is especially true for problems that take a list or array or set but don't specify which. Sometimes the data is complicated enough that we make a little class for it.

Here's how I structure my code for problems in this course. My Java solution to Exercise 2.3-7 is in a project folder called c2s3e7j, that is, "Chapter 2 Section 3 Exercise 7 Java." That folder contains a package folder called c2s3e7. In that package I have two files: FindPairSum.java, which contains the solution to the problem (as a static method in the FindPairSum class), and TestFindPairSum, which contains the JUnit tests. I have a Python solution set up similarly, except it has only one file: c2s3e7p/c2s3e7/findPairSum.py contains both the solution (as a standalone function) and the PyUnit testcase.

Practice test-infected programming! You start by writing the stub for the method (or more generally, unit) that you will test; then writing the unit tests; then confirming that the unit tests fail; and finally writing the solution so that the unit tests pass.

Of course doing this in Eclipse will be very convenient. But it's also good to have a commandline option available (it's what I use when grading, for example). To run the tests in c2s3e7.TestFindPairSum, do

java -cp .:/usr/share/java/junit4.jar org.junit.runner.JUnitCore c2s3e7.TestFindPairSum

For Python it's a bit easier:

python -m unittest c2s3e7.findPairSum

Assignments

Assignments (both daily and HW) are named by the date they are assigned (not due).

Daily Aug 29. Skim the first chapter of the book to become familiar with their terminology. Read Chapter 2 carefully--the thrust of the content should be review, but pay attention to the details. Do 2.2-(2 & 4) and 2.3-(3 & 6).

Daily Aug 31. Do 2.3-7 and 2-3.c.

For 2.3-7, practice what I'm going to call a complete solution: implement your solution in a programming language of your choice with unit tests; state any invariants for the loops, prove them, and use the proof(s) to argue for the algorithm's correctness; analyze the algorithm's efficiency carefully.

I'm providing stubs and unit tests for 2.3-7 in both Java and Python at ~tvandrun/Public/cs445/c2s3e7j and ~tvandrun/Public/cs445/c2s3e7p.

Make sure you read all of Problem 2-3 for context. But I'm asking you to work through only part c. It doesn't explicitly say you should prove the loop invariant (it says "use"), but that is implied. Then read Section 3.1 carefully, noting the differences among the asymptotic categories and notations as best you can.

The following files show what we did in class, including 2.3-7 as an example of a complete solution.

inclass-sept5.pdf
inclass-sept5.tex

Daily Sept 5. Read Sec 3.(1&2); do 3.1-(4 & 5) and 3-1.d.

In reading Section 3.1, do your best to understand the differences among the asymptotic categories and notations.

HW Sept 5. Please make sure you read all the details about homework at the beginning of this page.

Do 2-2, and 3-1(b,e). For 2-2 don't overlook part d on the top of pg 41.

Also do the following problem based on Exercise 4.4.9 from Anany Levitin, Introduction to the Design and Analysis of Algorithms, Third Edition, Pearson, 2012; pg 157:

Given a list (or array) of length n containing all integers from 0 to n inclusive except for one in increasing order, find the missing number. For example, in the list 0 1 2 3 5 6 7 8, the number 4 is missing. Corner cases: 1 2 3 4 5 is lacking 0, and 0 1 2 3 4 5 6 is lacking 7.

Make your algorithm as efficient as possible. Give a complete answer: Implement this in a programming language of your choice plus unit tests; state and prove a loop invariant, and use that to argue for your solution's correctness; analyze the efficiency.

You can find a stub and some unit tests for a Python solution at ~tvandrun/Public/cs445/supp1p, and for a Java solution at ~tvandrun/Public/cs445/supp1j.
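When writing unit tests for this problem it can help to have an obviously correct, if slow, reference to compare against. The sketch below is such a reference, assuming the input is a Python list; it is a plain linear scan, deliberately not the efficient algorithm the assignment asks for, and the function name is hypothetical.

```python
def missing_number_reference(sorted_values):
    """Slow reference answer for tests: the one value in 0..n absent from the list."""
    n = len(sorted_values)
    present = set(sorted_values)
    # Exactly one of the n + 1 candidates is absent from a valid input.
    return next(c for c in range(n + 1) if c not in present)


# The example and the two corner cases from the problem statement.
examples = {
    (0, 1, 2, 3, 5, 6, 7, 8): 4,
    (1, 2, 3, 4, 5): 0,
    (0, 1, 2, 3, 4, 5, 6): 7,
}
for values, expected in examples.items():
    assert missing_number_reference(list(values)) == expected
```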

Due Sept 17

Daily Sept 7. Read Sec 4.(1-3); do 3-4.(a, b, c).

In reading about Strassen's method, don't get bogged down in the details of the matrix multiplication. If you're getting tired by pg 79 you can skim the rest of that section. I'd rather you give more attention to Section 4.3.

Daily Sep 10. Read Sec 4.(4&5).

Daily Sep 12. Do 4.5-1 and 4-1(a,b). The master method (pg 94) looks nasty, but we'll break it down in class.

Daily Sep 14. Read Sec 7.(1-3); do 7.1-(2-4) and 7.2-(3 & 4).

Daily Sep 17. Read Sec 8.(1-4); do 8.1-(1,3,4). Section 8.1 is the main thing we'll be looking at; read it carefully. Sections 8.(2-4) should be review. Judge for yourself how carefully to read them---but you should know the stuff.

HW Sep 17. Do Problems 4-(2 & 5) and 7-(1 & 4).

For Problem 4-5.a, I think the way the problem is stated in the book is misleading. The word "necessarily" is meaningless, and they shouldn't anthropomorphize the chips by talking about them "conspiring". I would rewrite it this way:

Show that if it is not known that more than n/2 chips are good---that is, it's possible that half or more are bad---then the professor cannot determine which chips are good using any strategy based on this kind of pairwise test.

Give an argument why good and bad chips are indistinguishable. I used graph theory ideas to do this in my solution, but there are other ways.

For Problem 4-5.b, describe a strategy that takes n chips with majority good, makes ⌊n/2⌋ comparisons, and results in a subset of no more than n/2 chips also of majority good. Then give a proof that your strategy maintains a majority good.

Give a "complete" solution for Problem 4-5.c, that is, code up a solution in a language of your choice, demonstrate correctness with JUnit tests, give a correctness proof, and an analysis. (The problem as stated in the text essentially asks for the correctness proof.) You can find my stub and some JUnit tests at ~tvandrun/Public/cs445/c4p5j. In particular, I've made some classes to represent the chips, good and bad, being careful to make the difference between the two opaque to the code using the algorithm. Chip.java contains the code for the chips. The stub for the algorithm is in Diogenes.java.

For 7-1, here is a lemma that makes several parts easier:

At the end of the first iteration of the while loop, i = p.

Proof. By assignment, i is initially p-1. On the first iteration of the second repeat-until loop of the first iteration of the while loop, i is then incremented to p. Since A[i] = A[p] = x, the guard on the second repeat-until loop fails. With no other changes to i, i = p at the end of that first iteration of the while loop.
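The lemma can also be checked empirically. The sketch below transcribes HOARE-PARTITION from the problem statement into Python (0-indexed) and records the value of i at the end of the first pass of the while loop. The function name and the recording variable are illustrative additions, not part of the book's pseudocode.

```python
def hoare_partition_trace(a, p, r):
    """Run HOARE-PARTITION on a[p..r]; return (j, value of i after the first pass)."""
    x = a[p]
    i, j = p - 1, r + 1
    i_after_first = None
    while True:
        # repeat j = j - 1 until A[j] <= x
        j -= 1
        while a[j] > x:
            j -= 1
        # repeat i = i + 1 until A[i] >= x
        i += 1
        while a[i] < x:
            i += 1
        if i_after_first is None:
            i_after_first = i  # i is not changed again during this pass
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j, i_after_first


data = [13, 19, 9, 5, 12, 8, 7, 4, 11, 2, 6, 21]
j, i_first = hoare_partition_trace(data, 0, len(data) - 1)
assert i_first == 0  # i = p, as the lemma states
```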

The significance of Problem 7-4 will be clearer to those who have taken Programming Language Concepts. If you haven't, you may want to discuss the problem with someone who has.

Problem 7-4.a is interesting (ie, complicated) because it involves both recursion and a loop.

Due Sept 26

Daily Sep 19. Do the following problem:

You are playing a computer game in which the hero must pass through a series of rooms and halls collecting treasure. There are 2n rooms (in pairs) and n-1 halls interspersed between the pairs. Each room has a one-way door to the next hall, and each hall has two one-way doors to the rooms of the next pair. The hero must, therefore, pass through exactly one room in each pair. The area looks something like

T_3,0 T_3,1

P₂

T_2,0 T_2,1

P₁

T_1,0 T_1,1

P₀

T_0,0 T_0,1

Each room has a certain amount of treasure, T_i,j. Halls do not have treasure, but they each have a guardian who demands payment to let the hero cross diagonally through the hall. So, to move from T_i,0 to T_i+1,0 is free, but to move from T_i,0 to T_i+1,1 costs P_i.

Devise and implement an algorithm to find the route that yields the most treasure. Analyze its efficiency.

A stub plus two unit tests in python can be found at ~tvandrun/Public/cs445/suppdynp/suppdyn/hero.py.
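To make the rules concrete, here is a small sketch that scores one given route; it does not search for the best route. It assumes the payment is deducted from the treasure collected, that the hall between pair i and pair i+1 is P_i (as in the diagram), and that a route is a list with one column (0 or 1) per pair. The function name and the sample numbers are made up for illustration and are not taken from hero.py.

```python
def route_value(treasure, tolls, route):
    """Treasure collected minus tolls paid along one fixed route."""
    total = treasure[0][route[0]]
    for i in range(1, len(route)):
        if route[i] != route[i - 1]:
            total -= tolls[i - 1]  # crossed diagonally through the hall below pair i
        total += treasure[i][route[i]]
    return total


treasure = [[1, 2], [3, 10], [4, 1]]  # treasure[i][j] plays the role of T_i,j
tolls = [5, 2]                        # tolls[i] plays the role of P_i
straight = route_value(treasure, tolls, [0, 0, 0])  # 1 + 3 + 4
zigzag = route_value(treasure, tolls, [0, 1, 0])    # 1 - 5 + 10 - 2 + 4
```

A helper like this is handy in unit tests: for small inputs, every route can be scored by hand or by enumeration and compared with what the algorithm returns.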

Daily Sept 19. Do the following problem (based on a problem by Susanne Hambrusch, 1998):

A lumberjack has a k-yard long log of wood he wants cut at n specific places j₁, j₂, ... j_n, represented as the distance of that cut point from one end of the log. (We can also consider the ends as trivial "cut points" j₀ = 0 and j_n+1 = k.) The sawmill charges $x to cut a log that is x yards long (regardless of where that cut is). The sawmill also allows the customer to specify the ordering and location of the cuts.

For example, if k = 20 and we want cuts at 3 yards, 6 yards, and 10 yards from the left end, then if we cut them from left to right the cost would be

20 + (20-3) + (20-6) = 20 + 17 + 14 = 51

But making the same cuts from right to left would cost

20 + 10 + 6 = 36

Devise and implement an algorithm to minimize the cost, and analyze its running time.

A stub plus two unit tests in python can be found at ~tvandrun/Public/cs445/suppdynp/suppdyn/sawmill.py.
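The two totals in the example can be reproduced with a short sketch that prices one given ordering of the cuts; it does not look for the cheapest ordering. The function name is hypothetical and is not taken from sawmill.py.

```python
import bisect


def cutting_cost(length, cuts_in_order):
    """Cost of making the given cuts, in the given order, on a log `length` yards long."""
    boundaries = [0, length]  # sorted ends of the pieces that exist so far
    total = 0
    for cut in cuts_in_order:
        k = bisect.bisect_left(boundaries, cut)
        total += boundaries[k] - boundaries[k - 1]  # length of the piece being cut
        boundaries.insert(k, cut)
    return total


left_to_right = cutting_cost(20, [3, 6, 10])  # 20 + 17 + 14
right_to_left = cutting_cost(20, [10, 6, 3])  # 20 + 10 + 6
```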

HW Sep 24. Do Problems 8-4, 15-(4 & 6). Each of them is intended to be "complete": implement a solution, write unit tests for it, prove (something about) its correctness, and analyze its efficiency. But here are specific ways to adapt the idea of a "complete solution" to these particular problems:

For problem 8-4, I have provided a code base in Java that you may complete: ~tvandrun/Public/cs445/c8p4j. It has a class c8p4.Jug with nested classes Jug.Blue and Jug.Red set up so that you can compare a red jug with a blue jug, but not two jugs of the same color. The class c8p4.TestJugSort is set up so that you can easily add test cases that will apply to both parts (a) and (c).
For problem 8-4(a), you may finish the stub JugSort.jugSelectionSort(). State an invariant for each loop, but you do not need to write proofs for them. You may explain the Θ(n²) comparisons by comparing your solution to a known sorting algorithm rather than analyzing it from scratch.
For problem 8-4(b), use Theorem 8.1 or its proof.
For problem 8-4(c), don't worry about making the algorithm "randomized," just shoot for the expected case O(n lg n) number of comparisons. You may finish the stub JugSort.jugQuickSort(). State an invariant for each loop, but you do not need to write proofs for them. You may explain the number of comparisons by comparing your solution to a known sorting algorithm rather than analyzing it from scratch.

For problem 15-4, I have provided a code base in Java that you may complete: ~tvandrun/Public/cs445/c15p4j. Specifically, your solution would complete the stub c15p4.NeatPrint.neatPrint(). This method takes a string (presumably long and with no newlines) and a line length. It returns a string like the one given but with newlines inserted.
You can test this informally using the main method of NeatPrint. For example,
java c15p4.NeatPrint gettysburg 50
will print the contents of the file gettysburg to the terminal with lines of length 50. Formal testing would be done with c15p4.TestNeatPrint, which you can easily add test cases to. However, you'll need to know ahead of time what the minimum penalty is for the given text and line length. So for writing your own testcases, I suggest working out a few small examples by hand (which is a good practice for unit testing anyway).
Instead of writing and proving invariants for your loops, state the recursive characterization of your solution, explaining what the variables mean as explicitly as possible. In your code, document the tables you use, again explaining what they mean as explicitly as possible and linking them to your formal recursive characterization.
The analysis of its runtime shouldn't be difficult.

For problem 15-6, I have provided a code base in Java that you may complete: ~tvandrun/Public/cs445/c15p6j. It includes an implementation of the company structure as described in the second paragraph of the problem: The class c15p6.PersonnelTree is used to represent a person, and a person has a link to one peer and one subordinate. To make things easier, I have provided a way to iterate over a person's subordinates, a way to retrieve the person's conviviality, and a factory method which takes a string representing the company tree/forest. For example, PersonnelTree.factory("(18(6(2)(3))(5(8)(1(10)(12))))") produces the corporate structure

[Figure: the corporate structure tree]

In that case the most convivial guest list is

[Figure: the most convivial guest list]

The class c15p6.PersonnelTree also has a public instance variable aux which you can use to annotate the nodes in the tree.
You may complete the stub c15p6.PartyPlanner.makeGuestList().
As with the previous problem, instead of loop invariants, describe the recursive characterization carefully and indicate the linkages between that characterization and the tables in your code. Write unit tests for small examples you can figure out by hand. The analysis will probably not be difficult.

Due Oct 3

Daily Sept 26. Read Sec 16.3. The premise is review, since the Huffman encoding is covered in DMFP. Our focus will be on the greedy choice property and other aspects of correctness and efficiency. Do Exercises 16.3-(2 & 4).

HW Oct 10. Do Problems 16-2 and 17-2. These are in the spirit of "complete" problems, but I'll specify below what to prove etc. Make sure you read the problems in the book before reading my elaborations, since I'm going to assume you have a basic idea of the premise of each problem.

For Problem 16-2 I have provided starter code, and one JUnit test for each part, found at ~tvandrun/Public/cs445/c16p2j. Note that the output of a solution to part a is much simpler than that of part b. My stub for part a is void; the assumption is that the method will merely rearrange the given array of tasks into an optimal order. Part b somehow must construct a schedule indicating what portion of which tasks to run in what order. The suggestion implied by my stub is to return an ordered collection (such as an ArrayList) of "schedule units", each of which indicates a task to run and the length of time given to that task before it is preempted.

The constraint for part a is merely that all tasks are executed, which doesn't require any enforcement. The constraint for part b is that all tasks are completed and that no task is executed before it is released. Do not assume that it is possible to schedule the tasks in such a way that the processor is always busy. For example, if the tasks have running times 3, 4, 9, and 1 but release times 0, 5, 6, and 7, respectively, then after the 3-cycle task is finished executing, the other tasks haven't even been released yet, and so the schedule would need to include some idle time until another task is ready. (Obviously you want your schedule to include as little idle time as possible).
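One way to sanity-check a part b schedule in a unit test is to replay it. The sketch below is in Python rather than Java and assumes a representation that is only one reading of the "schedule units" idea: a list of (task index, duration) pairs run in order, with the processor idling whenever the next unit's task has not been released yet. All names are hypothetical, and the schedule replayed is just the example above run in the order given, with no claim that it is optimal.

```python
def completion_times(units, run_times, release_times):
    """Replay (task, duration) units in order and return {task: completion time}."""
    clock = 0
    remaining = list(run_times)
    finished = {}
    for task, duration in units:
        if duration <= 0 or duration > remaining[task]:
            raise ValueError(f"bad duration {duration} for task {task}")
        # Assumption: the processor simply idles until this task is released.
        clock = max(clock, release_times[task])
        clock += duration
        remaining[task] -= duration
        if remaining[task] == 0:
            finished[task] = clock
    if any(remaining):
        raise ValueError("some task is never completed")
    return finished


# Running times 3, 4, 9, 1 with release times 0, 5, 6, 7, no preemption.
times = completion_times([(0, 3), (1, 4), (2, 9), (3, 1)], [3, 4, 9, 1], [0, 5, 6, 7])
average = sum(times.values()) / len(times)
```

In this replay the first task finishes at time 3 and the processor sits idle until time 5, when the second task is released.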

For the correctness proof in each part, explain what the subproblem is and what the greedy choice is, and then prove that the problem has the greedy choice property. Recall the structure of a proof like that: suppose a solution for a given subproblem exists that doesn't use the greedy choice; construct a solution based on that supposed one but that does use the greedy choice; show that your constructed solution is as good as or better than the supposed solution. You should also demonstrate correctness with simple JUnit tests, and you should analyze the running time of your solutions.

For Problem 17-2, I have provided starter code found at ~tvandrun/Public/cs445/c17p2j. The class DynamicBinarySearchSet is set up to implement the same Set interface as we used in CSCI 345. I have also provided the whole suite of testcases for sets from that class, which you can use through DBSSTest. You do not need to write your own testcases. My apologies that those testcases are not separated into testing the different parts of this problem. Also, you won't be able to test part a much until part b is also done. Notice also that n is stored in a byte, so there can be at most 8 arrays.

For part a, implement the helper method search(), which containsKey() and other methods use. Write an invariant for each loop (in the code is fine, I've provided some examples of [what I consider] good invariants in the given code). Find the amortized time of your implementation, and show how you determined it.

For part b, implement the helper method add(), for which I suggest you write the helper method merge(), whose stub is provided. Write an invariant for each loop you write. Find the amortized time of your implementation, and show how you determined it.

For part c, describe a strategy for deletion. You do not need to implement delete() (I haven't gotten around to it myself), but if you want to just for fun, you can test it with DBSSRTest. Find the amortized time of your strategy, and show how you determined it.

Finally, implement the iterator. I suggest the iterator's state have variables i and j where i, j is the location of the next item to return, unless the iteration is finished, in which case i is 8. In other words, I highly recommend you set things up so that hasNext() is a one-liner and that next() advances the state of the iterator to prepare for the following call to next().
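The shape of that iterator can be sketched in Python, although the starter code is Java. The class and method names below are hypothetical, a plain list of lists stands in for the set's arrays (with an empty list for an unused array), and the number of arrays is taken from the list instead of being fixed at 8.

```python
class ArraysIterator:
    """Iterate over a list of arrays, keeping (i, j) at the next item to return."""

    def __init__(self, arrays):
        self.arrays = arrays
        self.i, self.j = 0, 0
        self._skip_empty()

    def _skip_empty(self):
        # Move i forward to the next non-empty array, or past the end if none is left.
        while self.i < len(self.arrays) and not self.arrays[self.i]:
            self.i += 1

    def has_next(self):
        return self.i < len(self.arrays)  # the one-liner

    def next(self):
        if not self.has_next():
            raise StopIteration
        item = self.arrays[self.i][self.j]
        # Advance now, so the state is ready for the following call.
        self.j += 1
        if self.j == len(self.arrays[self.i]):
            self.i, self.j = self.i + 1, 0
            self._skip_empty()
        return item


it = ArraysIterator([[5], [], [1, 3, 7, 9]])
items = []
while it.has_next():
    items.append(it.next())
```

The design choice is that all the searching for the next position happens in the constructor and in next(), so has_next() never has to look ahead.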

Due Fri, Oct 19.

B Tree exercise. Instead of homework on B trees, we are turning this into an in-class activity in which we will complete an implementation of a B tree class, focusing on insertion. Find the starter code at ~tvandrun/Public/cs445/btreej. This exercise will feel very much like a CSCI 345 project.

Specifically,

Finish the method BNode.split(), which copies about half the keys and values from a node to a new sibling.
Finish the method Leaf.insertNonFull(), which inserts a new key into a leaf node on which the method is called, with the precondition that the node is not full and thus could take this key/value without being split.
Finish the method Internal.splitChild().
Finish the method Internal.insertNonFull() which inserts a new key and value into a subtree rooted at the node on which the method is called, with the precondition that the node is not full.
Finish the method BTreeMap.iterator(). There is more than one way to do this, but the recommended way (which the given code provides a context for) is to maintain a "breadcrumb" stack indicating the trace of nodes on the path to the current position.

If you have time after writing these, then write test cases that exercise the code on large amounts of data. Extra credit if you write legitimate test cases that break my solution.

Turn in your BTreeMap.java file to /cslab/class/cs445/(your user id)/btree. This will count towards your participation grade (not towards homework).

HW Oct 19.

Do Problem 30-1. Starter code in Python can be found at ~tvandrun/Public/cs445/c30p1p. I have a fair amount of explanation for the parts of this problem and a few hints, but I put them in this separate document. Please read the problem as it appears in the book first, then read my hints and elaboration. Also, if you want some help working through the fairly dense algebra, please ask.

Then implement Graham's scan (CLRS pg 1031) in the code base found in ~tvandrun/Public/cs445/c33examplesj, or implement it from scratch in a language of your choice. In the given code base, your task is to implement the initialization of the algorithm in GrahamScanner.reset() and one iteration of the main loop in GrahamScanner.actionPerformed().

Due Wed, Oct 31.

HW Nov 12.

Do problems 3.1.10.(a & b) and 4.1.(8-10) in Lewis and Papadimitriou.

For 3.1.10, notice that the preamble to this problem gives a definition of regular grammar in terms of a special case of a context free grammar. We already know what a regular language is. For part b, you need to prove that a language is regular (old definition) if and only if there exists a regular grammar (new definition) for it. Keep in mind what we already know---if a language is regular, we know there is a RE and DFA and NFA for it; you can pick whichever is most useful. Also, Example 3.1.5 (pg 119) can be used as a hint---to an extent. First, Example 3.1.5 constructs a grammar in which the rules have exactly one terminal followed by exactly one nonterminal; regular grammars described in this exercise can have any number of terminals, followed by a single nonterminal. Second, Example 3.1.5 is not worked out as fully as this exercise demands. I will grade your submission mostly on the structure of your proof, but do the best you can on the details also.

Due Mon, Nov 19.

HW Nov 14. Do problems 5.4.2.(c-i) in Lewis and Papadimitriou. These are difficult. I want to incentivize your writing good, complete proofs, yet make allowance for the fact that not all of you will get all of them, and at the same time disincentivize wild guessing and hand-waving. Here are some principles to guide you.

Notice that you are not told that these problems are undecidable. In fact, some are decidable [unless I'm mistaken...]. So your first task for each one is to determine whether it's decidable or not.
The text says "Explain your answers carefully." I say, "Prove your answers."
The best answer is first of all correct (decidable vs undecidable), and also has a complete, formal proof. The "long answers" given on the slides from class are examples of what I consider to be complete, formal proofs of undecidability. The elements of such a proof are,
- Identifying the known undecidable problem that you will reduce to the given problem.
- Supposing (and naming) a machine that decides the given problem (which supposition you will contradict at the end of the proof).
- Describing the machine that decides the problem being reduced, using the supposed machine as a component. The most important part of this is how the input to the machine you're building is transformed into the input of the machine you've assumed exists---this transformation is the function τ in the definition of a reduction. Especially make sure that you keep the inputs and outputs to the various machines straight, and that you identify them clearly.
- Verifying that the machine you constructed accepts a string iff the supposed machine accepts the transformed string. This is straightforward---in my proofs, I just say "Note that by how we defined..."
- Verifying that the machine you constructed decides the problem already known to be undecidable.
- Concluding that the given problem is undecidable.
- Feel free to include a diagram. I like diagrams.
A proof that a problem is decidable is simply an algorithm that decides it. You may describe the algorithm in any reasonable way: As a Turing machine, in pseudo-code, etc.
If you are unable to write a good proof, then the next best thing is a correct answer with an informal proof. The "short answers" in the slides are examples of informal proofs, consisting in a sketch of the reduction. Diagrams can help informal proofs, too.
Less---but some---credit will be given for incorrect answers accompanied by proof attempts that show progress in understanding the material. If you are doubtful about your answer, you are encouraged to identify it as uncertain and to note that you've made your best attempt.
Little credit, if any, will be given for vague answers passed off as legitimate proofs.

Due Wed, Dec 5.

HW Dec 5. Do Problems 7.3.4 (f & h) in Lewis and Papadimitriou and Exercise 34.5-2 in CLRS.

The book instructions for LP 7.3.4 say "prove that it is NP-complete by showing that it is the generalization of an NP-complete problem. Give the appropriate parameter restriction in each case." That seems to suggest a brief answer, "it's just like this other known NP-complete problem, just change this or that parameter to..." But that is not my intention with this assignment. You should do a complete NP-completeness proof for each of these (where "each" = parts f and h, which are assigned; trying the others wouldn't be a bad studying strategy, though). That means, prove that the problem is in class NP, then that the problem is NP-hard by showing a reduction. Of course, you may use a generalization or specialization of the problem for the reduction. That just makes the problem a bit easier. You still must do the proofs completely. Same thing for the other problem assigned, CLRS 34.5-2.

Clarification on LP 7.3.4.f: The phrase "two nodes 1 and n" means two distinct nodes. The phrase "not repeating any node twice" should be read as "not repeating any node" or "not visiting any node twice." (Taken literally, "not repeating any node twice" means "not visiting any node three times." I do not think that's what the authors intended.)

CLRS 34.5-2 is actually the same problem as LP 7.3.4.e, just stated a little more clearly and with a different hint. Note that what CLRS calls "3-CNF-SAT" is what LP calls "3-SAT".

Due Wed, Dec 12.

Thomas VanDrunen

Last modified: Wed Dec 5 10:40:36 CST 2018