Lecture 2: Data Science Fundamentals
(Last updated: Jan 27, 2026)
This lecture recaps the fundamentals of data science, such as table operations, classification, and regression.
Check the GenAI usage policy if you are using the course materials with GenAI for self-study and fact-checking.
Preparation
Read the required course readings.
Lecture
Below are the slides:
- Slides for Lecture 2-1: Data Science Fundamentals (Preprocessing)
- Slides for Lecture 2-2: Data Science Fundamentals (Modeling)
Below is the link to the online notebook:
Follow the steps on the notebook page to set up the notebook.
Required Course Readings
- The following sections in the book An Introduction to Statistical Learning (James et al., 2013):
- 2.2.1 (Measuring the Quality of Fit)
- 3.1.1 (Estimating the Coefficients)
- 3.1.3 (Assessing the Accuracy of the Model)
- 9.1.1 (What Is a Hyperplane?)
- 9.1.2 (Classification Using a Separating Hyperplane)
Optional Course Readings
- Section 5.3 (Hyperparameters and Validation Sets, including 5.3.1) in the book Deep Learning (Goodfellow et al., 2016)
- Section 4.5.1 (Rosenblatt’s Perceptron Learning Algorithm) in the book The Elements of Statistical Learning (Hastie et al., 2009)
Additional Resources
Below are websites for data visualization inspiration:
- Seaborn: Statistical Data Visualization
- Exploratory Data Analysis by the US EPA
- Examples of Data Exploration by the Statistics Netherlands
- Examples of Data Visualization
Below are interesting data science case studies:
The textbook below contains more information about how to select models:
- Section 11.8 (Comparing Different Models) in the book Introduction to Statistics and Data Analysis
The websites below contain exercises for Python pandas: