The proliferation of large and complex datasets has challenged universities to keep up with the demand for graduates trained in both the statistical and the computational skills required to effectively plan, acquire, manage, and analyze such data and to communicate the findings. To meet this demand, it is increasingly important to attract students to data science early on and to give them a solid introduction to the field. We present a case study of an introductory undergraduate course in data science designed to address these needs. Offered at Duke University, this course has no prerequisites and serves a wide audience of aspiring statistics and data science majors as well as humanities, social sciences, and natural sciences students. We discuss the unique set of challenges posed by offering such a course and, in light of these challenges, present a detailed discussion of the course's pedagogical design elements, content, structure, computational infrastructure, and assessment methodology. We also offer a repository containing all of the open-source teaching materials, along with supplemental materials and the R code for reproducing the figures found in the paper.