What is this course about?
This course focuses on big data analytics with an emphasis on text data, which constitutes a large portion of the data generated daily in modern systems. The course addresses three central questions: how to represent large-scale data, how to learn from it, and how to apply it to real-world problems such as search and recommendation. Students will study core methods for text representation, including sparse and dense models, and learn how these representations are used in text classification, graph-based learning, and modern search and retrieval pipelines.
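To make the sparse/dense distinction concrete, here is a minimal sketch on a toy corpus. It is not course-provided code and assumes scikit-learn is installed; TF-IDF stands in for sparse bag-of-words vectors, and truncated SVD stands in for learned dense embeddings.

```python
# Sketch: sparse vs. dense document representations (assumes scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [
    "users search for products",
    "the system recommends products to users",
    "text classification assigns labels to documents",
    "graphs connect users and documents",
]

# Sparse representation: TF-IDF bag-of-words, one dimension per vocabulary term.
vectorizer = TfidfVectorizer()
sparse_vectors = vectorizer.fit_transform(corpus)   # scipy sparse matrix
print(sparse_vectors.shape)                          # (4 documents, vocabulary size)

# Dense representation: project into a low-dimensional space
# (truncated SVD / LSA used here as a stand-in for learned embeddings).
svd = TruncatedSVD(n_components=2)
dense_vectors = svd.fit_transform(sparse_vectors)    # dense numpy array
print(dense_vectors.shape)                           # (4 documents, 2 dimensions)
```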
Resources: There is no required textbook for this class. The slides are mostly self-contained. You may refer to the following books for further reading:
Jurafsky and Martin, Speech and Language Processing
Hamilton, Graph Representation Learning
Prerequisites: Students are expected to have the following background:
Familiarity with basic programming (Python 3)
Basic knowledge of linear algebra and probability
Introductory understanding of machine learning concepts
Grading
Attendance: 10%
Up to five absences carry no penalty. Each absence beyond five results in a 1% deduction.
Programming assignments: 20%
Project: 30%
Final exam: 40%
Previous offerings
2025 Fall: Special talk (Keunchan Park, NAVER), Final exam