In-class activity: Stylometry

This activity will guide you through some experiments using the stylo package for R. You may, to a certain extent, go off script to explore a question different from the ones listed here, as long as it is in the same spirit. Please keep a "notebook" sort of file recording what you work on, including the experiments you use to investigate the questions and the results you find.

We will be working on this in class on Wednesday as well. I might add some exercises to this by then, and you may need to put in some time outside of class to complete this. When you are finished, please turn in your record of experiments (and perhaps some stylo-generated charts as well) to the turn-in folder indicated at the end (not by email).

Find corpora for these exercises at ~/tvandrun/Public/cs384/sty-aa.

1. The Federalist papers

In the folder federalist, you will find each of the Federalist papers as a separate file. See if you can classify them by author. There is also a single file, federalist-all, that contains the entire collection along with the answers (sort of... for some papers it says "HAMILTON OR MADISON"). You could do rolling stylometry on that file.
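
If you want to see what is going on underneath when you classify texts by author, here is a minimal Python sketch of Burrows's Delta, one of the distance measures stylo offers. It is an illustration only, not stylo's code: the function names, the simple letters-only tokenizer, and the use of the population standard deviation are all assumptions made for this sketch. Each text is reduced to z-scored relative frequencies of the most frequent words in the corpus, and the Delta between two texts is the mean absolute difference of those z-scores.

```python
import re
import statistics
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def zscore_profiles(texts, n_mfw=100):
    """Map each text name to z-scored frequencies of the corpus's top words.

    texts is a dict {name: full text}; every text is assumed non-empty.
    """
    tokens = {name: tokenize(body) for name, body in texts.items()}

    # Most frequent words, counted over the whole corpus pooled together.
    pooled = Counter()
    for toks in tokens.values():
        pooled.update(toks)
    vocab = [word for word, _ in pooled.most_common(n_mfw)]

    # Relative frequency of each vocabulary word within each text.
    freqs = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        freqs[name] = [counts[word] / len(toks) for word in vocab]

    # Standardize each word's frequency across the texts (assumption:
    # population standard deviation; words with no variation score 0).
    profiles = {name: [] for name in texts}
    for i in range(len(vocab)):
        column = [freqs[name][i] for name in texts]
        mean = statistics.mean(column)
        sd = statistics.pstdev(column)
        for name in texts:
            z = (freqs[name][i] - mean) / sd if sd > 0 else 0.0
            profiles[name].append(z)
    return profiles


def delta(profile_a, profile_b):
    """Burrows's Delta: mean absolute difference between z-score profiles."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b)) / len(profile_a)


def nearest_candidate(texts, disputed, candidates, n_mfw=100):
    """Return the candidate whose profile has the smallest Delta to disputed."""
    profiles = zscore_profiles(texts, n_mfw)
    return min(candidates, key=lambda c: delta(profiles[disputed], profiles[c]))
```

To try it, build a dictionary from file names to file contents and call nearest_candidate with one of the disputed papers and a list of papers of known authorship. Note that the z-scores depend on which texts are in the dictionary, so adding or removing texts changes every distance.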

2. The Oz books

After L. Frank Baum's death, other authors continued the Oz books. In the folder oz, you can find a whole bunch of Oz books written by Baum himself and by two other authors. Can stylo correctly distinguish the authors? There is one catch: at least one of the books was apparently finished by a non-Baum author using Baum's notes or unfinished draft. Is there evidence of any book showing the fingerprints of more than one author?
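
One way to look for more than one author within a single book is to slide a window across the text and attribute each slice separately, which is the idea behind stylo's rolling functions (rolling.classify and rolling.delta). The Python sketch below illustrates the idea only and is not how stylo implements it: the ten-word MARKERS list, the function names, and the plain Manhattan distance are assumptions chosen to keep the example short.

```python
import re
from collections import Counter

# Hypothetical marker-word list for illustration; real studies use many more.
MARKERS = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "but"]


def tokenize(text):
    """Lowercase the text and keep runs of letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def profile(tokens):
    """Relative frequency of each marker word in a list of tokens."""
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid dividing by zero on an empty slice
    return [counts[word] / total for word in MARKERS]


def manhattan(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def windows(tokens, size, step):
    """Yield (start, slice) pairs; consecutive slices overlap by size - step."""
    last_start = max(len(tokens) - size, 0)
    for start in range(0, last_start + 1, step):
        yield start, tokens[start:start + size]


def rolling_attribution(text, references, size=5000, step=500):
    """Label each window of text with the nearest reference author.

    references is a dict {author: sample text by that author}.
    Returns a list of (window start token index, author) pairs.
    """
    ref_profiles = {name: profile(tokenize(body))
                    for name, body in references.items()}
    results = []
    for start, chunk in windows(tokenize(text), size, step):
        p = profile(chunk)
        best = min(ref_profiles, key=lambda name: manhattan(p, ref_profiles[name]))
        results.append((start, best))
    return results
```

A run of windows that switches from one label to another partway through a book is the kind of evidence the question above asks about. Smaller windows locate a change more precisely but give noisier labels, so try more than one window size and record each in your notebook.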

3. The Presbyterian Conflict

The book The Presbyterian Conflict, about a split in the Presbyterian Church, USA in the 1920s and 30s, is of disputed authorship. Some have suspected that the author under whose name it was published made use of a stolen manuscript by another author who was deceased at the time of its publication. The folder presconflict contains the text of that work and works by some authors who I think may be fair points of comparison (not necessarily candidate authors for this text). What evidence can you find of stylistic similarity? In the folder presconflict-chapters, each chapter of the work is stored in a separate file. Are there measurable stylistic differences among chapters?

In /cslab/class/cs384/[your login id]/ica-stylo, please turn in your "notebook" (that is, a record of what you did to address these questions, the results you got, and the conclusions you reached), along with graphics if appropriate.


Thomas VanDrunen
Last modified: Wed Dec 6 11:21:18 CST 2017