This activity will guide you through some experiments using the stylo package for R. You may, to a certain extent, go off script to explore a question different from the ones listed here, as long as it is in the same spirit. Please keep track of what you do in a "notebook" sort of file as you work, including what experiments you use to investigate the questions and what results you find.
We will be working on this in class on Wednesday as well. I might add some exercises to this by then, and you may need to put in some time outside of class to complete this. When you are finished, please turn in your record of experiments (and perhaps some stylo-generated charts as well) to the turn-in folder indicated at the end (not by email).
Find corpora for these exercises at ~/tvandrun/Public/cs384/sty-aa.
In the folder federalist, you will find each of the Federalist
papers as a separate file.
See if you can classify them by author.
There is also a single file, federalist-all, that contains the
entire collection along with the answers (sort of: for some papers
it says "HAMILTON OR MADISON").
You could do rolling stylometry on that file.
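If it helps to see what "rolling" means before trying it in stylo, here is a minimal sketch of the idea in plain Python. It is not stylo and does not reproduce what stylo computes. It only slides an overlapping window across a text and reports how often a few chosen words occur in each window. The function names, the window and step sizes, the two marker words, and the toy text are all assumptions made up for illustration.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rolling_profiles(tokens, words, window, step):
    """Yield (start_index, {word: relative frequency}) for each full window."""
    for start in range(0, len(tokens) - window + 1, step):
        counts = Counter(tokens[start:start + window])
        yield start, {w: counts[w] / window for w in words}

# Toy text (invented): the first half uses "upon", the second half "while".
sample = "upon the whole " * 20 + "while the whole " * 20
profiles = list(
    rolling_profiles(tokenize(sample), ["upon", "while"], window=30, step=15)
)
for start, profile in profiles:
    print(start, profile)
```

A shift in the per-window frequencies as the window moves through the text is the kind of signal a rolling analysis looks for. On real text you would want much larger windows and many more words than this toy uses.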
After L. Frank Baum's death, other authors continued the
Oz books.
In the folder oz, you can find a whole bunch of
Oz books written by Baum himself and also by two other authors.
Can stylo correctly distinguish the authors?
There is one catch: at least one of the books was apparently
finished by a non-Baum author using Baum's notes or
unfinished draft.
Is there evidence of any book showing the fingerprints of more
than one author?
The book The Presbyterian Conflict, about a
split in the Presbyterian Church, USA in the 1920s and 30s, is of disputed
authorship.
Some have suspected that the author under whose name it was
published made use of a stolen manuscript by another author
who was deceased at the time of its publication.
The folder presconflict contains the text of
that work and works by some authors I think may be fair
points of comparison (not necessarily candidate authors
for this text).
What evidence can you find of stylistic similarity?
In the folder presconflict-chapters, each
chapter in the work is stored in a separate file.
Are there measurable stylistic differences among chapters?
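One way to think about "measurable differences" is as a distance between word-frequency profiles. The sketch below, in plain Python rather than stylo, computes a Delta-style distance: the mean absolute difference of z-scored relative frequencies over a small word list. Treat it as an illustration of the general idea only. It may not match whatever distance your stylo settings use. The function names, the word list, and the three toy "chapters" are invented placeholders, not text from the corpus.

```python
from collections import Counter
from statistics import mean, pstdev
import re

def rel_freqs(text, vocab):
    """Relative frequency of each vocab word among the text's word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]

def delta_matrix(texts, vocab):
    """Return {(a, b): mean absolute difference of z-scored frequencies}."""
    names = sorted(texts)
    freqs = {n: rel_freqs(texts[n], vocab) for n in names}
    z = {n: [] for n in names}
    for i in range(len(vocab)):
        column = [freqs[n][i] for n in names]
        mu, sigma = mean(column), pstdev(column)
        for n in names:
            # A word with no variation across texts contributes nothing.
            z[n].append((freqs[n][i] - mu) / sigma if sigma else 0.0)
    return {(a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
            for a in names for b in names if a < b}

# Toy stand-ins for chapter files (invented strings, not corpus text).
chapters = {
    "ch1": "the church and the board of the assembly",
    "ch2": "the assembly and the church of the board",
    "ch3": "it was to be that it was not to be so",
}
distances = delta_matrix(chapters, vocab=["the", "and", "of", "to", "it"])
for pair, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, round(d, 3))
```

Here ch1 and ch2 use the listed words at identical rates, so their distance is zero, while ch3 sits far from both. With real chapters the interesting question is whether some chapters cluster apart from the rest.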
Please turn in your "notebook" (that is, a record of what you did to
address these questions, with what results and to what
conclusions), along with graphics if appropriate, to
/cslab/class/cs384/[your login id]/ica-stylo.