CSCI 384. Computational Linguistics

Fall 2023
Thomas VanDrunen



Meeting time: MWF 12:55pm-2:05pm.
Meeting place: Meyer 131

Office hours: Schedule through Calendly
Contact: 163 Mey Sci; 752-5692; Thomas.VanDrunen@wheaton.edu
http://cs.wheaton.edu/~tvandrun/cs384


Syllabus

Course textbook: Jurafsky and Martin, Speech and Language Processing, 3e

Programming assignment guide

My Calendly page (for office hours)



Final exam: Tues, Dec 12, 10:30am-12:30pm


Moon's day | Woden's day | Frigga's day

Aug 21

NO CLASS

Aug 23

Introduction
Slides

Aug 25

LAB: Python warm-up and NLTK

Aug 28

Regular expressions
Slides

Aug 30

LAB: Chatbot

Sept 1

Edit distance
Slides

Sept 4

NO CLASS

Sept 6

Information theory
Slides

Sept 8

LAB: Autoregressive text generation

Sept 11

Ngrams, language statistics
Slides

Sept 13

Language models, smoothing

Sept 15

Linear interpolation

Sept 18

Finish linear interpolation; begin parts of speech
Slides

Sept 20

Introduction to hidden Markov models
Slides

Sept 22

HMMs

Sept 25

More on HMMs

Sept 27

LAB: HMMs on character-level states

Sept 29

Parsing
Slides

Oct 2

LAB: Recursive descent

Oct 4

CKY parsing
Slides

Oct 6

LAB: Spelling correction

Oct 9

Review

Oct 11

MIDTERM

Oct 13

ML bootcamp
Slides

Oct 16

NO CLASS

Oct 18

NO CLASS

Oct 20

Bag-of-words model; text classification
Slides

Oct 23

Naive Bayes classification
Slides

Oct 25

LAB: NBC and sentiment analysis

Oct 27

Finish NBC
Slides

Oct 30

Stylometry and authorship attribution
Slides

Nov 1

LAB: Stylometry

Nov 3

Applied stylometry

Nov 6

Vector semantics and embeddings
Slides

Nov 8

No class (assignment work day)

Nov 10

Word2Vec
Slides

Nov 13

More Word2Vec
Slides
Results

Nov 15

Neural nets
Slides

Nov 17

LAB: Neural net language models

Nov 20

RNNs and LSTMs
Slides

Nov 22

NO CLASS

Nov 24

NO CLASS

Nov 27

Machine translation
Slides

Nov 29

LAB: Machine translation

Dec 1

Text generation
Slides

Dec 4

Large language models

Dec 6

Ethical questions

Dec 8

Review