What is this course about?
This course introduces core concepts and methods in text mining, focusing on how large-scale textual data can be represented, modeled, and analyzed using machine learning techniques. The course begins with a review of fundamental machine learning concepts and progresses to text representation methods, ranging from sparse models to dense and contextual representations, including word embeddings and transformer-based language models. Building on these representations, students will study text classification, semi-supervised and multi-task learning, and domain adaptation techniques. The course also covers modern search systems, including lexical, neural, and LLM-enhanced retrieval, with an emphasis on applying text mining techniques to real-world problems.
Resources: There is no required textbook for this class. The slides are mostly self-contained. You can refer to the following book for further reading:
Jurafsky and Martin, Speech and Language Processing
Prerequisites: Students are expected to have the following background:
Familiarity with basic programming (Python 3)
Basic knowledge of linear algebra and probability
Grading
Programming assignments: 20%
Midterm exam: 40%
Final exam: 40%
Previous offerings
2025 Fall