Bring a laptop. We will download a number of free packages and use them to solve practical problems in natural language processing. We will start with NLTK (Natural Language Toolkit). We will then do many of the same things (computing concordances and word associations) with standard Unix tools, following Unix for Poets. Concordances are closely related to suffix arrays, which we will compute with just a few lines of C code. Finally, we will download R (a statistics package) and use it to work through the example in the classic paper on Latent Semantic Indexing. R will be used to introduce SVD (singular value decomposition), clustering, logistic regression, data visualization and more.
Ken Church is currently the Chief Scientist at HLTCOE (Human Language Technology Center of Excellence) at Johns Hopkins University. He was previously at Microsoft Research and AT&T Labs-Research. He has worked on many topics in computational linguistics, including web search, language modeling, text analysis, spelling correction, word-sense disambiguation, terminology, translation, lexicography, compression, speech (recognition and synthesis) and more. He is an AT&T Fellow, the Vice President of the ACL (Association for Computational Linguistics), and the President of SIGDAT (the special interest group that runs EMNLP). Additional information is available at http://www.cs.jhu.edu/~kchurch/.