Lecture 11: Multimodal Data Processing
(Last updated: Feb 3, 2025)
This lecture gives an overview of machine learning tasks that involve multimodal data.
Check the GenAI usage policy if you are using the course materials with GenAI for self-study and fact-checking.
Preparation
Read the required course readings.
Lecture
Below are the slides:
Required Course Readings
- Sections 5.5.2 (Recurrent neural networks) and 5.5.3 (Transformers) in the book Computer Vision: Algorithms and Applications (Szeliski, 2022).
Optional Course Readings
- Liang, P. P., Zadeh, A., & Morency, L. P. (2024). Foundations & trends in multimodal machine learning: Principles, challenges, and open questions. ACM Computing Surveys.
- Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., … & Sutskever, I. (2021, July). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning.