Text Data Processing

(Last updated: Jan 29, 2024) [1]

All the content in this repository is licensed under CC BY 4.0. This module is about processing text data and has the following learning goals:

  • Goal 1: Apply the methods of text processing to large amounts of text data.

  • Goal 2: Preprocess text data, including tokenization, part-of-speech tagging, stemming/lemmatization, and stopword removal.

  • Goal 3: Extract features from text data, including topic modeling and word embeddings.

  • Goal 4: Reflect on how to select and tune the text processing pipeline for topic classification.

[1] Credit: this teaching material was created by Robert van Straten under the supervision of Yen-Chia Hsu.