
Week 5: Text Mining

The lecture and seminar will be led by Chris Huyck.
Text mining is data mining applied to natural language. Natural languages are the languages people speak, such as English, French, or Urdu; they contrast with formal languages, which include HTML, C++, and ASCII. A number of techniques can be used to help understand natural language and to write programs that extract information from it.
Before the lecture, you should read K. Church and L. Rau (1995), Commercial Applications of Natural Language Processing, Communications of the ACM 38:11, pp. 71-79. Additionally, James Allen's book Natural Language Understanding is a great introduction to the theory and practice of text mining; it is one of the recommended readings.
The lecture notes and seminar notes are also on this site.