Data modeling project

Your work in this class will include a project in which you use the ML programming language to model real-world data or phenomena. You will pick some domain of information, write a set of ML datatypes that capture that information, and write a set of ML functions that operate on that information.

As a quick example, in Sections 1.10 and 2.3 of the textbook, a series of datatypes is used to model meals at a restaurant:

datatype bread = White | MultiGrain | Rye | Kaiser;

datatype spread = Mayo | Mustard;

datatype vegetable = Cucumber | Lettuce | Tomato;

datatype deliMeat = Ham | Turkey | RoastBeef | ExtraVeg of vegetable ;

datatype noodle = Spaghetti | Penne | Fusilli | Gemelli | Farfalle;

datatype sauce = Pesto | Marinara | Creamy;

datatype protein = MeatBalls | Sausage | Chicken | Tofu;


datatype entree = Sandwich of bread * spread * vegetable * deliMeat
                | Pasta of noodle * sauce * protein;

datatype salad = Caesar | Garden;

datatype side = Fries | Chips | CarrotSticks | GarlicBread | Salad of salad;

datatype beverage = Water | Coffee | Pop | Lemonade | IceTea;

datatype meal = Meal of entree * side * beverage;

Operations on datatypes like these could include calculating something from a value (say, the calories or price of a meal), comparing two values (if a sandwich could have an indefinite number of layers, then we could check which of two sandwiches had more stuff), or modifying a value (for example, replacing meat with something plant-based to make a meal vegetarian).
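To make the three kinds of operations concrete, here is a minimal sketch in ML. It is only an illustration, not code from the textbook: the prices are made up, and the function names (mealPrice, vegetarianMeal, hasMoreLayers) and the layeredSandwich datatype are hypothetical. The meal datatypes are repeated so that the block compiles on its own.

```sml
(* Datatypes repeated from the restaurant example so this block is self-contained. *)
datatype bread = White | MultiGrain | Rye | Kaiser;
datatype spread = Mayo | Mustard;
datatype vegetable = Cucumber | Lettuce | Tomato;
datatype deliMeat = Ham | Turkey | RoastBeef | ExtraVeg of vegetable;
datatype noodle = Spaghetti | Penne | Fusilli | Gemelli | Farfalle;
datatype sauce = Pesto | Marinara | Creamy;
datatype protein = MeatBalls | Sausage | Chicken | Tofu;
datatype entree = Sandwich of bread * spread * vegetable * deliMeat
                | Pasta of noodle * sauce * protein;
datatype salad = Caesar | Garden;
datatype side = Fries | Chips | CarrotSticks | GarlicBread | Salad of salad;
datatype beverage = Water | Coffee | Pop | Lemonade | IceTea;
datatype meal = Meal of entree * side * beverage;

(* Assumption: prices in cents, invented purely for illustration. *)
fun entreePrice (Sandwich _) = 650
  | entreePrice (Pasta _) = 800;

fun sidePrice (Salad _) = 300
  | sidePrice _ = 200;

fun beveragePrice Water = 0
  | beveragePrice _ = 150;

(* Calculating something from a value: the total price of a meal. *)
fun mealPrice (Meal (e, s, b)) = entreePrice e + sidePrice s + beveragePrice b;

(* Modifying a value: replace any meat with a plant-based option.
   A sandwich that already has ExtraVeg is returned unchanged; otherwise
   the meat is replaced by a second helping of the sandwich's vegetable. *)
fun vegetarianEntree (e as Sandwich (_, _, _, ExtraVeg _)) = e
  | vegetarianEntree (Sandwich (b, s, v, _)) = Sandwich (b, s, v, ExtraVeg v)
  | vegetarianEntree (Pasta (n, s, _)) = Pasta (n, s, Tofu);

fun vegetarianMeal (Meal (e, s, b)) = Meal (vegetarianEntree e, s, b);

(* Comparing two values: a hypothetical sandwich with any number of
   vegetable layers, compared by how many layers each one has. *)
datatype layeredSandwich = Layered of bread * vegetable list;

fun hasMoreLayers (Layered (_, xs), Layered (_, ys)) = length xs > length ys;
```

Under these made-up prices, mealPrice (Meal (Sandwich (Rye, Mustard, Tomato, Ham), Salad Garden, Water)) evaluates to 950, and vegetarianMeal applied to the same meal replaces Ham with ExtraVeg Tomato.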

The textbook contains many and diverse examples of datatypes modeling real-world data and phenomena, including

There are others in Chapters 8 and 10, not covered in class.

You are permitted to expand on one of these (as long as you take it in a new, interesting direction), but you are encouraged to come up with your own domain of information, something that interests you and that you already have knowledge about. To get you started thinking, here are some projects that have been done in previous semesters:

(For what it is worth, some of the best projects have been the music theory ones and the chemistry ones.)

Ideally you should work on this with one partner (team of two), but I will also allow solo projects and teams of three. Work on this project will be spread throughout the semester so that you can improve your model and make it more sophisticated as you learn more ML. This shouldn't be a reason to put it off, however. For most projects, nearly everything you need to know will be covered in the first few weeks of the class, so most teams will be able to do the bulk of the work early in the semester.

Your final submission will consist of four parts:

The schedule for this work will be


Thomas VanDrunen
Last modified: Thu Jan 23 10:23:52 CST 2020