The goal of this project is to understand grammars and syntactic parsing by implementing the CKY algorithm. By the time you're done with this, you'll be certifiably mid-level competent in dynamic programming (the "mid-level" part is because you still don't have experience devising dynamic programming solutions, just implementing known ones.)
CKY at its heart is not a complicated algorithm, but since it is an "iterate through the diagonals" dynamic programming algorithm, it requires careful thinking about table indices, and its loops are deeply nested.
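To make the "iterate through the diagonals" order concrete, here is a minimal sketch of the index pattern only. It assumes a convention in which cell (i, j) covers words i through j-1; the given code may index its table differently, and the name diagonal_cells is hypothetical.

```python
def diagonal_cells(n):
    """Yield (i, j, splits) for each span of an n-word sentence, shortest spans first.

    Assumed convention: cell (i, j) covers words i..j-1, and splits lists
    every k with i < k < j at which the span can be cut in two.
    """
    for length in range(1, n + 1):          # one diagonal per span length
        for i in range(0, n - length + 1):  # start position of the span
            j = i + length                  # end position (exclusive)
            yield i, j, list(range(i + 1, j))


# Show the visiting order for a three-word sentence.
for i, j, splits in diagonal_cells(3):
    print((i, j), splits)
```

Visiting the cells in this order means that whenever a span is cut at k, both halves (i, k) and (k, j) are shorter spans and so have already been filled in.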
Get the given code for this project from
/homes/tvandrun/Public/cs384/proj5
All of your work will be done in the file parser/cky_parser.py.
It won't involve a large amount of code: I removed a mere 28 lines of code from my solution in preparing the given code. However, that code will be very dense; my solution includes nested loops seven levels deep (the outer two of which are still in the given code).
The grammar for the language, which already is embedded into the given code, is found on the handout given in class Nov 17, or here:
Take note of what is and is not in the grammar. For example, it has no possessive pronouns (her would always be interpreted as a personal pronoun). Of course, if you want to add to the grammar, go ahead...
Use the program by giving it a sentence at the command line:
$ python parser/cky_parser.py "the cat chased the dog in the kitchen"
(Sentence (NounPhrase (ConcNP (CNPA (Det the) (Nominal (Noun cat))))) (VerbPhrase (VPA (VPB (Verb chased) (NounPhrase (ConcNP (CNPA (Det the) (Nominal (Noun dog)))))) (PrepPhrase (Prep in) (NounPhrase (ConcNP (CNPA (Det the) (Nominal (Noun kitchen)))))))))

$ python parser/cky_parser.py "she knew that he knew that she knew that he knew that she knew that he knew that she knew that he loved her"
(Sentence (NounPhrase (ConcNP (CNPA (Pronoun she)))) (VerbPhrase (VPA (VPB (Verb knew) (NounPhrase (AbsNP (That that) (Sentence (NounPhrase (ConcNP (CNPA (Pronoun he)))) (VerbPhrase (VPA (VPB (Verb knew) (NounPhrase (AbsNP (That that) (Sentence (NounPhrase (ConcNP (CNPA (Pronoun she)))) (VerbPhrase (VPA (VPB (Verb knew) (NounPhrase (AbsNP (That that) (Sentence (NounPhrase (ConcNP (CNPA (Pronoun he)))) (VerbPhrase (VPA (VPB (Verb knew) (NounPhrase (AbsNP (That that) (Sentence (NounPhrase (ConcNP (CNPA (Pronoun she)))) (VerbPhrase (VPA (VPB (Verb knew) (NounPhrase (AbsNP (That that) (Sentence (NounPhrase (ConcNP (CNPA (Pronoun he)))) (VerbPhrase (VPA (VPB (Verb knew) (NounPhrase (AbsNP (That that) (Sentence (NounPhrase (ConcNP (CNPA (Pronoun she)))) (VerbPhrase (VPA (VPB (Verb knew) (NounPhrase (AbsNP (That that) (Sentence (NounPhrase (ConcNP (CNPA (Pronoun he)))) (VerbPhrase (VPA (VPB (Verb loved) (NounPhrase (ConcNP (CNPA (Pronoun her))))))))))))))))))))))))))))))))))))))))))))))))))

$ python parser/cky_parser.py "that he loved her troubled her"
(Sentence (NounPhrase (AbsNP (That that) (Sentence (NounPhrase (ConcNP (CNPA (Pronoun he)))) (VerbPhrase (VPA (VPB (Verb loved) (NounPhrase (ConcNP (CNPA (Pronoun her)))))))))) (VerbPhrase (VPA (VPB (Verb troubled) (NounPhrase (ConcNP (CNPA (Pronoun her))))))))
If there is more than one way to parse a sentence, all parses will be listed. The program will also output a file tree0.dot (and tree1.dot, etc., if there is more than one parse) containing a description of the parse tree readable by the dot program, and it will call dot to produce tree0.png, which can be viewed using your favorite image viewer (on the lab machines, that's eom).
Some info about the given code:
The table is made up of TableEntrys. Each TableEntry is a collection of TableRecords.
A TableRecord is the result of a parse (essentially it is a subtree in the parse tree), and contains information about the grammar production used in the parse. It also contains string representations of all of the parses that make that result.
TableEntry has methods to support adding new records of various kinds, retrieving records, retrieving the non-terminals of the records, and calculating the "closure" of the records, that is, generating new records that result from unit parses. (For example, if a subsequence can be parsed as a VPB, then we also want records for VPA and VerbPhrase.)
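The closure idea can be sketched on bare non-terminal names. This is not the TableEntry method itself, which works with records rather than names; the function unit_closure and the shape of toy_units (child NT mapped to the set of parent NTs that rewrite directly to it) are assumptions made for illustration, and the real units dict may be organized differently.

```python
def unit_closure(nonterminals, units):
    """Return the given NTs plus every NT reachable from them via unit productions.

    Assumption: units maps a child NT to the set of parent NTs that
    rewrite directly to it (e.g. VPA -> VPB is stored under the key 'VPB').
    """
    closed = set(nonterminals)
    worklist = list(nonterminals)
    while worklist:
        child = worklist.pop()
        for parent in units.get(child, ()):
            if parent not in closed:      # avoid revisiting, so cycles terminate
                closed.add(parent)
                worklist.append(parent)
    return closed


# Toy fragment matching the example in the text: VPB lifts to VPA, then to VerbPhrase.
toy_units = {"VPB": {"VPA"}, "VPA": {"VerbPhrase"}}
print(sorted(unit_closure({"VPB"}, toy_units)))
```

The worklist matters because unit productions chain: VerbPhrase is only discovered after VPA has been added.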
duals is a dict standing for the dual productions in the grammar. It maps pairs of NTs to NTs; for example, looking up ('NounPhrase', 'VerbPhrase') will get Sentence.
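As a small illustration of how a pair-keyed dict like duals can be used, the sketch below looks up every (left, right) combination of non-terminals from two neighboring spans. The dict toy_duals holds only three productions read off the sample parses above, and the helper combine is hypothetical; the real code works with TableRecords rather than bare names.

```python
# A toy fragment of a pair-keyed production table (not the full grammar).
toy_duals = {
    ("NounPhrase", "VerbPhrase"): "Sentence",
    ("Verb", "NounPhrase"): "VPB",
    ("Det", "Nominal"): "CNPA",
}


def combine(left_nts, right_nts, duals):
    """Return the NTs that can be built from a left NT followed by a right NT."""
    found = set()
    for left in left_nts:
        for right in right_nts:
            parent = duals.get((left, right))  # None when no production matches
            if parent is not None:
                found.add(parent)
    return found


print(combine({"NounPhrase", "Pronoun"}, {"VerbPhrase", "VPA"}, toy_duals))
```

Note that the order of the pair matters: ('VerbPhrase', 'NounPhrase') is a different key from ('NounPhrase', 'VerbPhrase').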
Similarly, units is a dict for the unit productions, and vocab is a dict from individual pronouns, prepositions, articles, etc., to their respective POSs. (For nouns, verbs, adjectives, and adverbs, you'll need to use WordNet.)
Turn in your code (I don't think any write-up is necessary, since there are no decisions to be made) by copying your cky_parser.py to
/cslab/class/cs384/(your login id)/proj5
Please turn in just that file, not the whole parser folder.
DUE: Midnight between Friday, Dec 8 and Saturday, Dec 9 (i.e., the last day of classes). Note that project 6 will also be due then.