Lecture 2: Data Science Fundamentals

(Last updated: Feb 20, 2023)

Preparation

Read the Smell Pittsburgh paper.

  • This project is an example of a data science pipeline.

  • Reading this work will help you get a basic understanding of data science pipelines.

  • You do not need to understand every technique in the paper. Some of them may be difficult at this stage, so focus on getting the big picture.

Materials

Note

The slides were updated to correct some errors, so they differ from the ones used in the lecture.

Additional Resources

The paper below studies various data science pipelines at different scales, which can give you a good understanding of common data science practices:

Below are websites for data visualization inspiration:

Below are interesting data science case studies:

The textbook below contains more information about how to select models:

The websites below contain exercises for Python pandas: