Peer grading systems aggregate noisy reports from multiple students to approximate a true grade as closely as possible. Most current systems take either the mean or the median of the reported grades; others aim to estimate students' grading accuracy under a probabilistic model. This paper extends the state of the art in the latter approach in three key ways: (1) recognizing that students can behave strategically (e.g., reporting grades close to the class average without doing the work); (2) appropriately handling the censored data that arise from discrete-valued grading rubrics; and (3) using mixed integer programming to improve the interpretability of the grades assigned to students. We show how to make Bayesian inference practical in this model and evaluate our approach on both synthetic data and real-world data obtained by using our implemented system in four large classes. These extensive experiments show that grade aggregation using our model accurately estimates true grades, students' likelihood of submitting uninformative grades, and the variation in their inherent grading error; we also characterize our models' robustness.
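As a rough illustration of two ideas mentioned in the abstract, the sketch below contrasts the mean and median baselines and shows one common way to treat a discrete rubric grade as a censored observation of a latent continuous grade. It is not the paper's model: the Gaussian noise assumption, the 0 to 5 rubric with unit spacing, and the function names `normal_cdf` and `censored_likelihood` are all hypothetical choices made for this example.

```python
import math
from statistics import mean, median


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def censored_likelihood(reported, true_grade, sigma, step=1.0, lo=0.0, hi=5.0):
    """Probability of observing the rubric value `reported`.

    Assumption (not from the paper): the grader perceives a latent grade
    drawn from N(true_grade, sigma^2), rounds it to the nearest rubric
    value on a grid with spacing `step`, and clips it to [lo, hi].
    The observation is therefore interval-censored.
    """
    # The lowest and highest rubric values absorb the whole tail.
    lower = -math.inf if reported <= lo else reported - step / 2.0
    upper = math.inf if reported >= hi else reported + step / 2.0
    return normal_cdf(upper, true_grade, sigma) - normal_cdf(lower, true_grade, sigma)


# Baseline aggregation: three careful reports and one outlying report.
reports = [4, 4, 5, 1]
mean_grade = mean(reports)      # pulled toward the outlier
median_grade = median(reports)  # less sensitive to the outlier

# Censored likelihoods over the whole rubric form a proper distribution.
rubric = [0, 1, 2, 3, 4, 5]
probs = [censored_likelihood(g, true_grade=3.6, sigma=0.8) for g in rubric]
total_prob = sum(probs)
most_likely = rubric[probs.index(max(probs))]
```

Integrating the noise density over each rubric interval, rather than evaluating it at the reported value, keeps the probabilities over the rubric summing to one and avoids treating a rounded grade as if it were an exact continuous measurement.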