Lecture 2: Data Science Fundamentals
(Last updated: Jan 23, 2024)
This lecture recaps the fundamentals of data science, such as table operations, classification, and regression.
Preparation
Read the Smell Pittsburgh paper.
- This project is an example of a data science pipeline.
- Reading this work will help you get a basic understanding of data science pipelines.
- You do not need to understand every technique in the paper. Some of them may be too advanced at this stage, so focus on getting the big picture.
Materials
- Slides for Lecture 2-1: Data Science Fundamentals (Pipeline)
- Slides for Lecture 2-2: Data Science Fundamentals (Modeling)
Additional Resources
The paper below studies various data science pipelines at different scales, which can give you a good understanding of common data science practices:
The websites below offer inspiration for data visualization:
- Seaborn: Statistical Data Visualization
- Exploratory Data Analysis by the US EPA
- Examples of Data Exploration by Statistics Netherlands
- Examples of Data Visualization
Below are interesting data science case studies:
The textbook below contains more information about how to select models:
- Section 11.8, "Comparing Different Models," in the book Introduction to Statistics and Data Analysis
The websites below contain exercises for Python pandas: