Lecture 8: Text Data Processing (Part II)

(Last updated: Feb 29, 2024)

This lecture introduces the theory behind text data processing, including preprocessing (tokenization, lemmatization, and part-of-speech tagging), word embeddings, topic modeling, sequence-to-sequence modeling, and the attention mechanism.
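
To make the preprocessing steps concrete, here is a minimal sketch using spaCy (the choice of library is an assumption, not something the lecture prescribes); it tokenizes a sentence, lemmatizes each token, and tags its part of speech.

```python
# Minimal preprocessing sketch (assumes spaCy and the small English model
# are installed: `pip install spacy` and `python -m spacy download en_core_web_sm`).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cats are sitting on the mats.")

for token in doc:
    # token.text   -> the token produced by tokenization
    # token.lemma_ -> its lemma (lemmatization)
    # token.pos_   -> its part-of-speech tag
    print(token.text, token.lemma_, token.pos_)
```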

Preparation

Watch the following videos to understand some math concepts that will be used in this lecture:

Materials

Additional Resources

A video that explains the attention mechanism:

The following paper explains topic modeling and the intuitions behind it:
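
To give a rough idea of what a topic model produces, the sketch below fits LDA to a toy corpus with gensim; the library choice, the toy documents, and the hyperparameters are all illustrative assumptions, not part of the paper.

```python
# Minimal LDA sketch on a toy corpus (assumes gensim is installed).
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["cat", "dog", "pet", "animal"],
    ["stock", "market", "price", "trade"],
    ["dog", "animal", "vet"],
    ["price", "stock", "bank"],
]

dictionary = Dictionary(docs)                   # word <-> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]  # bag-of-words counts per document

lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=2, passes=10, random_state=0)

# Each topic is a weighted mixture of words.
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```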

The following paper explains how to train word embeddings using Word2Vec:
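
As a complement to the paper above, here is a minimal sketch of training word vectors on a toy corpus with gensim (version 4.x assumed); the library and all hyperparameter values are illustrative assumptions, since the paper describes the algorithm rather than a particular implementation.

```python
# Minimal Word2Vec (skip-gram) sketch on a toy corpus (assumes gensim >= 4.0).
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sits", "on", "the", "mat"],
    ["the", "dog", "sits", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

model = Word2Vec(sentences=sentences, vector_size=50, window=2,
                 min_count=1, sg=1, seed=0, workers=1)

print(model.wv["cat"][:5])              # first few dimensions of the "cat" vector
print(model.wv.most_similar("cat", topn=3))  # nearest neighbors in embedding space
```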

The following paper explains how to use the attention mechanism for document classification:
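
As a rough illustration of the idea, the NumPy sketch below computes additive attention over word-level hidden states and pools them into a single document vector for classification; the shapes and the randomly initialized parameters are illustrative assumptions, not the paper's actual setup.

```python
# Minimal attention-pooling sketch in NumPy.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Hidden states for a document of 6 words, each a 4-dim vector
# (in practice these would come from a recurrent or Transformer encoder).
H = rng.normal(size=(6, 4))

# Attention parameters: a projection matrix, a bias, and a context vector.
W = rng.normal(size=(4, 4))
b = np.zeros(4)
u = rng.normal(size=4)

scores = np.tanh(H @ W + b) @ u   # one relevance score per word
alpha = softmax(scores)           # attention weights, sum to 1
doc_vector = alpha @ H            # weighted sum -> document representation

print("attention weights:", np.round(alpha, 3))
print("document vector:", np.round(doc_vector, 3))
```

The document vector would then be fed to a classifier layer; the attention weights indicate which words contributed most to the prediction.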

The Hugging Face website below documents a list of state-of-the-art Transformer-based models:
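
For a first taste of these models, the sketch below runs a pretrained text classifier through the `pipeline` API (assuming the `transformers` package is installed; a default pretrained model is downloaded on first use).

```python
# Minimal Transformer inference sketch (assumes `pip install transformers`).
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Attention-based models make text classification easier."))
```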