In the data science courses at the University of British Columbia, we define data science as the study, development and practice of reproducible and auditable processes to obtain insight from data. While reproducibility is core to our definition, most data science learners enter the field with other aspects of data science in mind, for example predictive modelling, which is often one of the most interesting topics to novices. This fact, along with the highly technical nature of the industry-standard reproducibility tools currently employed in data science, presents out-of-the-gate challenges in teaching reproducibility in the data science classroom. Put simply, students are not as intrinsically motivated to learn this topic, and it is not an easy one for them to learn. What can a data science educator do? Over several iterations of teaching courses focused on reproducible data science tools and workflows, we have found that providing extra motivation, guided instruction and lots of practice is key to effectively teaching this challenging yet important subject. Here we present examples of how we deeply motivate, effectively guide and provide ample practice opportunities to data science students to engage them in learning about this topic.