Project 4: A population simulation

The goal of this project is to practice working with pointers and dynamic memory. It also reviews linked lists, putting them now in the context of C. The guts of this project are provided for you, but they depend on a linked list struct whose functions you must write.

1. Introduction

Suppose we are interested in dropping 300 19-year-olds onto an isolated island and observing the growth and other trends of the population over the course of 500 years, experimenting with various constraints on how the society they form works. Since time, cost, and ethics prevent us from running these experiments on real people, we will simulate this scenario in a computer program.

We will model the population and the people in the population with a collection of objects. For every "year" of the experiment, certain individuals of the population will give birth to new individuals, certain individuals will get married, and certain individuals will die. In this way the population will change over time (hopefully grow), and these things depend on rates of marriage, birth, and death. (We are assuming only married females give birth, and there is no divorce, though there is the possibility of remarriage if a spouse dies. Also, an individual must have lived a certain number of years before becoming eligible for marriage.)

The experiment this program performs in particular is tracing surnames. The population begins with 60 surnames represented equally. At the end of 500 years, how many surnames have died out, how many people have each surname, and what are the relative proportions of males and females for each surname?

Make a directory for this project and copy the given code.

cp /homes/tvandrun/Public/cs245/proj4/* .

2. Inspecting the given code

The first thing to do is to become familiar with the given code. Start with the application population.c.

Before the main function there are a bunch of variables declared (if you declare a variable outside of any function in C, then it is globally scoped). Here is what they mean:

Every year, 30% of adults who are single get married, assuming there are enough of each gender to go around.
Every year, 16% of married females give birth. Of those births, 15% are twin births, all others singletons.
Every year, 1% of the population dies "prematurely"-- that is, in addition to those who die because they reach a maximum age.
A person must be 18 years of age to be considered an adult-- after that, he or she may become married.
Any person reaching 40 years of age dies. (Ok, that sounds a little young for a maximum age, but the reason for that age is that a person is no longer interesting for population growth after he or she is no longer of child-bearing age. So, if you prefer, you may think of this as though persons reaching 40 don't die, they merely stop having children and are no longer counted in the census. And their spouses may remarry.)

(After you get the program working, you can experiment with these settings, although I think I've gotten them in a sweet spot. If the population grows much faster you will see a significant decrease in performance and may run out of memory.)
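To make the settings above concrete, here is a rough sketch of what file-scope declarations for them could look like, plus two small helpers showing how an age falls out of a birth year. The variable and function names here are made up for illustration; the ones in population.c may well differ.

```c
/* Hypothetical names; check population.c for the real ones. */
/* Declared outside any function, so every function in the file can see them. */
double marriageRate       = 0.30; /* fraction of single adults who marry each year */
double birthRate          = 0.16; /* fraction of married females who give birth each year */
double twinRate           = 0.15; /* fraction of births that are twin births */
double prematureDeathRate = 0.01; /* fraction of the population dying early each year */
int    adultAge           = 18;   /* minimum age for marriage */
int    maxAge             = 40;   /* age at which a person is removed */

/* Age is the current year minus the birth year,
   so someone born in year -19 is 19 years old in year 0. */
int ageIn(int birthYear, int currentYear) {
    return currentYear - birthYear;
}

/* Nonzero if the person is old enough to marry. */
int isAdult(int birthYear, int currentYear) {
    return ageIn(birthYear, currentYear) >= adultAge;
}

/* Nonzero if the person has reached the maximum age. */
int hasReachedMaxAge(int birthYear, int currentYear) {
    return ageIn(birthYear, currentYear) >= maxAge;
}
```

With these assumptions, isAdult(-19, 0) is true for every member of the starting population.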

You'll also see a collection of surnames stored in an array.

The entire population is modeled by a personList. We populate it initially with 300 people, 5 for each of the 60 available surnames. Since we are going to start the simulation at year 0, we give each of them a birth year of -19. Note that we have a function add() for personList.

The big loop represents the years. First we handle births for that year. This requires getting a list of all married females. Notice that we have a function for finding all such persons from a personList, and it returns them in another, smaller personList. Then for each iteration of the inner loop (for births), a random female is chosen from that list; take note that there is a function for picking an element at random from a personList. Make sure you can follow all of the code in the "birth" section, except for the part about how a child's surname is chosen.
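The idea of picking an element at random from a linked list can be sketched as follows: choose an index below the list's size, then walk that many links from the head. The struct layouts and the names getRandomSketch() and easyRandStandIn() are assumptions made for this example; the real layouts are in the given headers, and the real random helper is the provided easyRand().

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed, simplified layouts; see person.h, personNode.h and personList.h. */
typedef struct person { int givenName; } person;
typedef struct personNode { person *item; struct personNode *next; } personNode;
typedef struct personList { personNode *head; int size; } personList;

/* Stand-in for the provided easyRand(): an integer from 0 (inclusive)
   to max (exclusive). rand() % max is only roughly uniform. */
static int easyRandStandIn(int max) {
    return rand() % max;
}

/* Return, without removing, a randomly chosen person,
   or NULL if the list is empty. */
person *getRandomSketch(const personList *list) {
    if (list == NULL || list->size == 0) {
        return NULL;
    }
    int steps = easyRandStandIn(list->size); /* index of the chosen element */
    personNode *current = list->head;
    while (steps > 0) {                       /* walk to that index */
        current = current->next;
        steps--;
    }
    return current->item;
}
```

Every index is as likely as the random helper makes it, so the fairness of the choice depends entirely on that helper. The walk also relies on size being accurate, which is one reason to maintain it carefully.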

Next, the section on marriages. We use functions to find all the males and females in a personList who are eligible for getting married this year. Follow that code carefully.

The section on premature deaths shows that we need a removePerson() function for personLists. More interesting, though, are the "old age" deaths. We want to give the function a birth year and have it remove all persons with birth years earlier than that. However, we also want to update the spouses of all people that have died so that they now have null spouses and will appear as single. To do this, we need the function removeAllAge() to return all the items it removed, as a separate personList. Oh, but also, we want to iterate through that list, and so we would like to turn it into an array. Hence we need a toArray() function. Also, removing a person from a list implies that the corresponding person node should be deallocated, but not necessarily that the person should be deallocated. In the death sections, we do want to deallocate the person who dies, and that is done with a destroyPerson() function, which does the deallocation.
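The old-age flow described above (remove, collect, convert to an array, widow the spouses, deallocate) might look roughly like the sketch below. Everything in it is an assumption for illustration: the struct layouts are simplified, the function names carry a Sketch suffix because the real signatures are in the given headers, and a plain free() stands in for destroyPerson().

```c
#include <stdlib.h>

typedef struct person { int birthYear; struct person *spouse; } person;
typedef struct personNode { person *item; struct personNode *next; } personNode;
typedef struct personList { personNode *head; int size; } personList;

/* Unlink every node whose person was born before cutoffYear and move it
   onto a new list, which is returned. Nodes are reused, not reallocated. */
personList *removeAllAgeSketch(personList *list, int cutoffYear) {
    personList *removed = malloc(sizeof *removed);
    if (removed == NULL) return NULL;
    removed->head = NULL;
    removed->size = 0;
    personNode **link = &list->head;    /* the pointer leading to the current node */
    while (*link != NULL) {
        personNode *node = *link;
        if (node->item->birthYear < cutoffYear) {
            *link = node->next;          /* link stays put, so runs of removals work */
            node->next = removed->head;
            removed->head = node;
            list->size--;
            removed->size++;
        } else {
            link = &node->next;
        }
    }
    return removed;
}

/* Copy the person pointers into a new array; the size field says how big. */
person **toArraySketch(const personList *list) {
    person **array = malloc((list->size > 0 ? list->size : 1) * sizeof *array);
    if (array == NULL) return NULL;
    int i = 0;
    for (personNode *n = list->head; n != NULL; n = n->next) array[i++] = n->item;
    return array;
}

/* Returns the number of people who died of old age, or -1 on allocation failure. */
int handleOldAgeDeaths(personList *population, int cutoffYear) {
    personList *dead = removeAllAgeSketch(population, cutoffYear);
    if (dead == NULL) return -1;
    person **deadArray = toArraySketch(dead);
    int count = (deadArray == NULL) ? -1 : dead->size;
    for (int i = 0; i < count; i++) {
        if (deadArray[i]->spouse != NULL) {
            deadArray[i]->spouse->spouse = NULL;  /* the survivor is single again */
        }
        free(deadArray[i]);                       /* stand-in for destroyPerson() */
    }
    free(deadArray);
    while (dead->head != NULL) {                  /* free the nodes, then the list */
        personNode *next = dead->head->next;
        free(dead->head);
        dead->head = next;
    }
    free(dead);
    return count;
}
```

Using a pointer to the link (rather than a pointer to the node) is one way to avoid special-casing the head; keeping a "previous" pointer works too. Clearing the survivor's spouse pointer before freeing the dead person also keeps things safe when both spouses die in the same year.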

Then look at person.h. Values of this struct ("instances") will model individuals in the population. While a surname is a String, a given name is just a number serving as a unique identifier. A person has a birth year used to determine the person's age, and a person has a reference to a spouse. Finally, a person has a personList representing the children of this person. Glance at person.c and notice there are two functions for you to write.

personList.c is where you will do most of your work. Look at personList.h. You'll see that the list struct itself is mainly a placeholder for the head node. See also personNode.h, and make sure you understand the difference between a person and a person node.
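As a way of picturing the difference between a person and a person node, here is a sketch of how the three structs could be laid out. The field names are guesses based on the description above, and the isFemale flag in particular is an assumption, since the description does not say how gender is stored. Trust the given headers over this sketch.

```c
#include <stddef.h>

struct personList; /* forward declaration: a person holds a list of children */

/* A person: the actual individual, allocated once. */
typedef struct person {
    const char *surname;         /* shared string, never owned by the person */
    int givenName;               /* a number serving as a unique identifier */
    int birthYear;
    int isFemale;                /* assumed representation of gender */
    struct person *spouse;       /* NULL when single */
    struct personList *children;
} person;

/* A node: one link in one list. It points at a person but does not own it. */
typedef struct personNode {
    person *item;
    struct personNode *next;
} personNode;

/* A list: little more than the head node and a count. */
typedef struct personList {
    personNode *head;
    int size;
} personList;

/* Nonzero if some node in the list points at p. */
int contains(const personList *list, const person *p) {
    for (const personNode *n = list->head; n != NULL; n = n->next) {
        if (n->item == p) {
            return 1;
        }
    }
    return 0;
}
```

The point to notice is that one person can be referred to by several nodes at once: a child sits in the population list and in each parent's list of children, through three different nodes that all point at the same person.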

3. Your tasks

Almost all of your work will be in personList.c, with a little in person.c and population.c. Compile as you go along (write your own makefile).

First, in person.c:

Write makePerson(), the analogue to a constructor. Allocate the struct and initialize the fields. The spouse is initially NULL; use makeList() (which you will write later) for the list of children.
Write destroyPerson() to undo what you did in makePerson(), nulling things out as appropriate for safety.
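The make/destroy pairing can be sketched like this, as a pattern rather than a finished solution. The struct layouts, the parameter list of makePersonSketch(), and the Sketch-suffixed names are all assumptions; match the prototypes in person.h instead.

```c
#include <stdlib.h>

typedef struct personNode { struct person *item; struct personNode *next; } personNode;
typedef struct personList { personNode *head; int size; } personList;
typedef struct person {
    const char *surname;
    int givenName;
    int birthYear;
    struct person *spouse;
    personList *children;
} person;

/* Allocate an empty list. */
personList *makeListSketch(void) {
    personList *list = malloc(sizeof *list);
    if (list != NULL) {
        list->head = NULL;
        list->size = 0;
    }
    return list;
}

/* Free the nodes and the list itself, but not the people they point at. */
void destroyListSketch(personList *list) {
    if (list == NULL) return;
    personNode *current = list->head;
    while (current != NULL) {
        personNode *next = current->next;
        current->item = NULL;
        free(current);
        current = next;
    }
    list->head = NULL;
    list->size = 0;
    free(list);
}

/* Allocate a person and set every field; returns NULL if allocation fails. */
person *makePersonSketch(const char *surname, int givenName, int birthYear) {
    person *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->surname = surname;        /* shared, not copied */
    p->givenName = givenName;
    p->birthYear = birthYear;
    p->spouse = NULL;
    p->children = makeListSketch();
    if (p->children == NULL) {   /* undo the partial construction */
        free(p);
        return NULL;
    }
    return p;
}

/* Undo makePersonSketch(): free what it allocated, and nothing else. */
void destroyPersonSketch(person *p) {
    if (p == NULL) return;
    destroyListSketch(p->children); /* the children themselves live on */
    p->children = NULL;
    p->spouse = NULL;
    p->surname = NULL;              /* the surname is never freed */
    free(p);
}
```

The rule of thumb is that a destroy function frees exactly what its make function allocated. Here that is the person struct and its list of children, not the spouse, the children, or the surname.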

Then, in personList.c:

makeList(), as you did with makePerson().
add() -- The catch here is that a personList should act like a mathematical set: it should not contain any item more than once. Hence you should write this function in such a way that it will add the given item only if it is not already present in the list. Also, make sure you maintain the size variable.
removePerson() -- The tricky part here is covering every case: the item is not in the list to begin with, head is null, or head contains the item to be removed. Also, maintain the size variable.
removeAllAge() -- This is the hardest function. Make sure you consider the case where there are several items in a row that need to be removed. Moreover, you need to store all the items you're removing in a new personList and return that list. Also, maintain the size variable.
getRandom() -- I have provided a function easyRand() to give a random integer between 0 (inclusive) and a given max (exclusive). Use it to retrieve (but not remove) a random element from the list. Each element should be equally likely to be chosen.
toArray() -- Make an array, fill it with the items in the list, and return it. Hint: How do you know how big to make the array?
marriedFemales(), marriableFemales(), and marriableMales() -- These are grouped together because they are very similar. In fact, you'll feel like you are writing pretty much the same method three times. That should make you feel like there is a better way to do this. More on that later.
destroyList()---undo what you did in makeList(). Make sure you deallocate all the nodes and the list, but not the people.
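For the set-like add() and the edge cases of removePerson() in the list above, one possible shape is sketched here. The layouts are simplified, the Sketch-suffixed names and the int return values are assumptions for illustration (the real functions may return nothing), and inserting at the head is just one choice.

```c
#include <stdlib.h>

typedef struct person { int givenName; } person;
typedef struct personNode { person *item; struct personNode *next; } personNode;
typedef struct personList { personNode *head; int size; } personList;

/* Add p only if it is not already present. Returns 1 if added, 0 otherwise. */
int addSketch(personList *list, person *p) {
    for (personNode *n = list->head; n != NULL; n = n->next) {
        if (n->item == p) {
            return 0;               /* already in the set: do nothing */
        }
    }
    personNode *node = malloc(sizeof *node);
    if (node == NULL) {
        return 0;                   /* allocation failed: list unchanged */
    }
    node->item = p;
    node->next = list->head;        /* insert at the front */
    list->head = node;
    list->size++;
    return 1;
}

/* Remove p's node if present. Returns 1 if removed, 0 otherwise.
   The node is freed; the person is not. */
int removePersonSketch(personList *list, person *p) {
    personNode **link = &list->head;      /* pointer to the link being examined */
    while (*link != NULL && (*link)->item != p) {
        link = &(*link)->next;
    }
    if (*link == NULL) {
        return 0;                   /* empty list, or p was not in it */
    }
    personNode *doomed = *link;
    *link = doomed->next;           /* works for the head and for inner nodes */
    free(doomed);
    list->size--;
    return 1;
}
```

Membership is tested by comparing pointers, which fits a list that holds references to people rather than copies of them. Walking with a pointer to the link folds the "head is null" and "head holds the item" cases into the general one; handling them with explicit if statements is just as valid.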

The trickiest part for you at this point is writing and using the make and destroy functions correctly. You need to think about when to deallocate a list, a node, and a person---don't leave something around that should be deallocated, and don't deallocate anything at the wrong time. (Don't ever deallocate a surname!)

At this point you can test the running of the program (see next section), but there still is one more (small) task for you to do (see section 5).

4. Running the program

Ok, try it out. Brace yourself for segmentation faults.

When your program is working, you will notice the program slowing down near the end. Even with all the randomization, you will almost certainly end up with statistics at the end reading between 78% and 80% of adults being married, a population of between 12000 and 22000, and about 2.4 children for each person having children. Here's a typescript of the output of my solution at this point. If your numbers are different from these, there is probably something wrong.

5. Running an experiment

At the end of the simulation, we report on the size of the population for each surname and give a breakdown of the male/female split for each surname. Currently, children are given surnames in the "patriarchal" manner-- each child gets his or her father's surname. You'll notice that while most surnames that are still in use after 500 years have pretty even gender splits, a fair number of surnames will have died off.

Marilyn Vos Savant has proposed that male children should take their father's surname and female children should take their mother's (premarital) surname (see http://www.parade.com/articles/editions/2007/edition_11-25-2007/Ask_Marilyn). Apart from whatever other societal value that might have, for our interest it would have the effect of reducing the number of surnames that die out. Under the patriarchal system, a surname may die out not only if it fails to reproduce at all, but also if it merely reproduces only females.

A downside to Vos Savant's proposal is that it could lead to some surnames becoming lopsided in terms of gender balance. If a surname produced more females than males in one generation, it would be likely to produce an even greater percentage of females in the following generation. Eventually some surnames would be exclusively or almost exclusively male, others exclusively or almost exclusively female.

The variable namingPolicy is used to indicate how surnames are passed on. 0 indicates the patriarchal way, 1 indicates Vos Savant's proposal, and 2 indicates the opposite, giving male children their mother's surname, and female children their father's surname.

Your task is to rewrite the function chooseChildName() at the end of population.c so that it picks an appropriate surname for a child with a given father, mother, and gender, based on the current naming policy. Suggestion: write a switch statement on the variable namingPolicy. The function should return some default value like Xxxxxxxx, which you will see only if you have done something wrong.
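The suggested switch could take roughly the following shape. This is a sketch under several assumptions: the simplified person struct, the parameter types, and the idea that a person's surname field never changes at marriage (so the mother's field still holds her premarital surname) are all guesses, and the real signature of chooseChildName() is the one in population.c.

```c
/* Assumed, simplified layout: only the field this sketch needs. */
typedef struct person {
    const char *surname; /* assumed never to change at marriage */
} person;

/* 0: father's surname; 1: Vos Savant's proposal; 2: the opposite of 1. */
int namingPolicy = 0;

/* Pick a surname for a child according to the current naming policy. */
const char *chooseChildNameSketch(const person *father, const person *mother,
                                  int childIsFemale) {
    switch (namingPolicy) {
    case 0:  /* every child takes the father's surname */
        return father->surname;
    case 1:  /* sons take the father's surname, daughters the mother's */
        return childIsFemale ? mother->surname : father->surname;
    case 2:  /* sons take the mother's surname, daughters the father's */
        return childIsFemale ? father->surname : mother->surname;
    default: /* should be seen only if the policy is set incorrectly */
        return "Xxxxxxxx";
    }
}
```

Returning the parent's string pointer, rather than a copy, matches the rule that surnames are never deallocated.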

Then run the program several times, experimenting with these policies. The -s flag will run Vos Savant's proposal, and -as will run the third option. Does the third option keep the number of dead surnames low while also preserving gender balance?

6. Turn in

The duplicate code in marriedFemales(), marriableFemales(), and marriableMales(), as well as the switch statement in chooseChildName() should leave you wanting a better way. How would you do this in Java? If you are in or have taken CSCI 243, how would you do this in ML? What sort of features would C need to do this better in C? Write your response in a file called something like COMMENTS.

Turn in your code by copying your directory (eg, proj4) including your COMMENTS file and makefile to a turn-in directory I've made for you.

cp -r proj4 /cslab.all/ubuntu/cs245/turnin/(your user id)

DUE: Monday, Apr 2, at 5:00 PM. (Project 5 will be assigned before this project is due, and it also will be due on Apr 2. I recommend having project 5 finished around March 19 and project 6 finished by March 26.)

Thomas VanDrunen

Last modified: Mon Mar 12 15:19:37 CDT 2012