# Lecture 2: Data Science Fundamentals

(Last updated: Jan 23, 2024)

This lecture recaps the fundamentals of data science, such as table operations, classification, and regression.
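As a rough illustration of the three topics named above, here is a minimal sketch using only the Python standard library. The toy table, its column names (`city`, `hour`, `reports`), and the helper functions `group_mean`, `fit_line`, and `classify` are all hypothetical and are not taken from the lecture slides. The classifier is a simple threshold rule standing in for a learned model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy table: each dict is one row.
rows = [
    {"city": "A", "hour": 1, "reports": 3},
    {"city": "A", "hour": 2, "reports": 5},
    {"city": "B", "hour": 1, "reports": 2},
    {"city": "B", "hour": 2, "reports": 4},
    {"city": "B", "hour": 3, "reports": 6},
]


def group_mean(table, key, value):
    """Table operation: group rows by `key` and average `value` per group."""
    groups = defaultdict(list)
    for row in table:
        groups[row[key]].append(row[value])
    return {k: mean(v) for k, v in groups.items()}


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def classify(reports, threshold=4):
    """Classification: a fixed threshold rule (assumed, not learned from data)."""
    return "high" if reports >= threshold else "low"


# Table operation: average number of reports per city.
per_city = group_mean(rows, "city", "reports")

# Regression: fit reports against hour for city B only (a filter, then a fit).
city_b = [r for r in rows if r["city"] == "B"]
slope, intercept = fit_line(
    [r["hour"] for r in city_b], [r["reports"] for r in city_b]
)

# Classification: label every row with the threshold rule.
labels = [classify(r["reports"]) for r in rows]
```

In practice, libraries such as pandas and scikit-learn would replace these hand-written helpers; the sketch only shows what each operation computes.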

## Preparation

Read the Smell Pittsburgh paper.

- This project is an example of a data science pipeline.
- Reading this work will help you get a basic understanding of data science pipelines.
- You do not need to understand every technique in the paper; some may be too advanced at this stage. Focus on getting the big picture.

## Materials

- Slides for Lecture 2-1: Data Science Fundamentals (Pipeline)
- Slides for Lecture 2-2: Data Science Fundamentals (Modeling)

## Additional Resources

The paper below studies various data science pipelines at different scales, which can give you a good understanding of common data science practices:

Below are websites for data visualization inspiration:

- Seaborn: Statistical Data Visualization
- Exploratory Data Analysis by the US EPA
- Examples of Data Exploration by Statistics Netherlands
- Examples of Data Visualization

Below are interesting data science case studies:

The textbook below contains more information about how to select models:

- Section 11.8, Comparing Different Models, in the book Introduction to Statistics and Data Analysis

The websites below contain exercises for Python pandas: