Lecture 8: Text Data Processing (Part II)
(Last updated: Feb 29, 2024)
This lecture introduces the theory of text data processing, including preprocessing (tokenization, lemmatization, POS tagging), word embeddings, topic modeling, sequence-to-sequence modeling, and the attention mechanism.
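As a concrete illustration of the preprocessing steps listed above, here is a minimal sketch that performs tokenization, lemmatization, and POS tagging in one pass. The choice of spaCy and its small English model (en_core_web_sm) is an assumption for illustration only and is not prescribed by the lecture materials.

```python
# Minimal preprocessing sketch (assumption: spaCy with its small English model,
# installed via `pip install spacy` and `python -m spacy download en_core_web_sm`).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cats were sitting on the mats.")

# Each token carries its surface form, lemma, and part-of-speech tag.
for token in doc:
    print(f"{token.text:10s} lemma={token.lemma_:10s} pos={token.pos_}")
```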
Preparation
Watch the following videos to understand some math concepts that will be used in this lecture:
Materials
Additional Resources
A video that explains the attention mechanism (a minimal code sketch of scaled dot-product attention follows this list):
The following paper explains topic modeling and the intuition behind it:
The following paper explains how to train word embeddings using Word2Vec:
The following paper explains how to use the attention mechanism for document classification:
The Hugging Face website below documents a list of state-of-the-art Transformer-based models:
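For readers who want to see the attention mechanism in code before watching the video or reading the paper above, the following is a minimal sketch of scaled dot-product attention in NumPy. The array shapes and example values are illustrative assumptions and are not taken from the lecture materials.

```python
# Minimal sketch of scaled dot-product attention (illustrative only; shapes and
# values are assumptions, not taken from the lecture materials).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights                     # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries, key dimension 4
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 5))   # 3 values, value dimension 5
output, attn = scaled_dot_product_attention(Q, K, V)
print(output.shape, attn.shape)  # (2, 5) (2, 3)
```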