Lecture 2: Data Science Fundamentals
(Last updated: Feb 20, 2023)
Preparation
Read the Smell Pittsburgh paper.
This project is an example of a data science pipeline.
Reading this work will help you get a basic understanding of data science pipelines.
You do not need to understand every technique in the paper, since some of them are advanced. Focus on getting the big picture.
Materials
Note
The slides were updated after the lecture to correct some errors, so they differ from the ones used in the lecture.
Additional Resources
The paper below studies various data science pipelines at different scales, which can give you a good understanding of common data science practices:
The websites below offer inspiration for data visualization:
Below are interesting data science case studies:
The textbook below contains more information about how to select models:
Section 11.8, "Comparing Different Models," in the book Introduction to Statistics and Data Analysis
The websites below contain exercises for Python pandas: