Lecture 12: Human-Centered Data Science

(Last updated: Mar 22, 2023)


This lecture is fully remote, which means that you do not need to come to the classroom physically. Please use the Zoom link on Canvas to attend this lecture.


There is no preparation for this lecture.


This remote guest lecture is given by Dr. Jie Yang, an assistant professor at TU Delft EWI. Below is the abstract for this lecture:

Human Computation is a new, evolving research field focusing on harnessing human intelligence to solve computational problems that are beyond the scope of existing artificial intelligence algorithms. A closely related topic is Crowdsourcing, which stresses the participation of the public in exchange for monetary rewards. With the growth of the Web, human computation systems today can leverage an unprecedented number of people for task execution. This lecture introduces the basic concepts of human computation and crowdsourcing. We take a look at the history of human computation evolving from an idea inspired by CAPTCHA and ReCAPTCHA to a scientific area. We introduce the distinctive features of human computation and the key problems in human computation. The latter includes the design of human computation tasks and algorithms, output aggregation, task routing, etc. The lecture also addresses the connections of human computation and crowdsourcing to the Web and their critical roles in AI.

Additional Resources

Recommended readings for this lecture include “Human Computation” by Law and von Ahn and several representative papers that introduce human computation to other subfields of computer science. In Databases, Franklin et al. demonstrate CrowdDB, a human-in-the-loop database system that leverages crowdsourcing to answer open-ended queries. In Information Extraction, Demartini et al. show with their crowdsourcing system ZenCrowd that human intelligence can substantially improve the quality of entity linking. In Information Retrieval, Bozzon et al. propose CrowdSearcher, which integrates search systems with social systems so that people can be consulted to retrieve personalized, opinionated information. In Machine Learning, the main application is crowdsourced training data creation. An important example is the ImageNet dataset for computer vision, which is often viewed as the catalyst for the AI boom we are experiencing today.

Recommended readings:

  • E. Law and L. von Ahn. “Human computation.” Synthesis Lectures on Artificial Intelligence and Machine Learning 5.3 (2011): 1-121.

  • M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. “CrowdDB: answering queries with crowdsourcing.” In SIGMOD, pp. 61-72. 2011.

  • G. Demartini, D. E. Difallah, and P. Cudré-Mauroux. “ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking.” In WWW. 2012.

  • A. Bozzon, M. Brambilla, and S. Ceri. “Answering search queries with crowdsearcher.” In WWW, pp. 1009-1018. 2012.

  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. “Imagenet: A large-scale hierarchical image database.” In CVPR, pp. 248-255. 2009.