This paper studies grading algorithms for randomized exams. In a randomized exam, each student is asked a small number of questions drawn at random from a large question bank. The predominant grading rule is simple averaging, i.e., grading each student by the average score on the questions that student was asked. Simple averaging is fair ex-ante, in expectation over the random draw of questions, but not fair ex-post, on the realized questions. The fair grading problem is to estimate the average grade each student would earn on the full question bank. The maximum-likelihood estimator for the Bradley-Terry-Luce model on the bipartite student-question graph is shown to be consistent with high probability when the number of questions asked to each student is at least the cube of the logarithm of the number of students. In an empirical study on exam data and in simulations, our algorithm based on the maximum-likelihood estimator significantly outperforms simple averaging in prediction accuracy and ex-post fairness, even with small class and exam sizes.
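To make the contrast between the two grading rules concrete, the following is a minimal sketch, not the paper's implementation. It assumes binary (correct/incorrect) scores and the bipartite Bradley-Terry-Luce form P(correct) = sigmoid(ability - difficulty). The function names, the count-weighted ridge penalty (added here only to keep estimates finite for students who answer everything correctly or incorrectly), the gradient-ascent solver, and the choice to grade by the mean predicted score over the whole bank are all illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simple_average(responses, n_students):
    """Grade each student by the mean score on the questions they were asked."""
    totals, counts = [0.0] * n_students, [0] * n_students
    for i, _j, y in responses:
        totals[i] += y
        counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def fit_btl(responses, n_students, n_questions, steps=500, lr=1.0, ridge=1e-3):
    """Approximate the penalized MLE by gradient ascent with per-parameter scaling.

    responses: list of (student, question, y), y = 1 if correct, else 0.
    Assumed model: P(y = 1) = sigmoid(ability[student] - difficulty[question]).
    """
    ability, difficulty = [0.0] * n_students, [0.0] * n_questions
    n_a, n_d = [0] * n_students, [0] * n_questions
    for i, j, _y in responses:
        n_a[i] += 1
        n_d[j] += 1
    for _ in range(steps):
        g_a, g_d = [0.0] * n_students, [0.0] * n_questions
        for i, j, y in responses:
            resid = y - sigmoid(ability[i] - difficulty[j])
            g_a[i] += resid  # d(log-likelihood)/d(ability[i])
            g_d[j] -= resid  # d(log-likelihood)/d(difficulty[j])
        for i in range(n_students):
            ability[i] += lr * (g_a[i] / max(n_a[i], 1) - ridge * ability[i])
        for j in range(n_questions):
            difficulty[j] += lr * (g_d[j] / max(n_d[j], 1) - ridge * difficulty[j])
    return ability, difficulty


def bank_grades(ability, difficulty):
    """Mean predicted score of each student over the full question bank."""
    m = len(difficulty)
    return [sum(sigmoid(a - d) for d in difficulty) / m for a in ability]


def rmse(xs, ys):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


if __name__ == "__main__":
    # Synthetic data only: parameters are drawn from a hypothetical ground truth.
    rng = random.Random(0)
    n_students, n_questions, k = 30, 40, 8
    true_a = [rng.gauss(0, 1) for _ in range(n_students)]
    true_d = [rng.gauss(0, 1) for _ in range(n_questions)]
    responses = []
    for i in range(n_students):
        for j in rng.sample(range(n_questions), k):  # k random questions each
            y = 1 if rng.random() < sigmoid(true_a[i] - true_d[j]) else 0
            responses.append((i, j, y))
    target = bank_grades(true_a, true_d)  # expected grade on the full bank
    ability, difficulty = fit_btl(responses, n_students, n_questions)
    print("RMSE, simple averaging:", rmse(simple_average(responses, n_students), target))
    print("RMSE, BTL-based grade: ", rmse(bank_grades(ability, difficulty), target))
```

The script prints the root-mean-square error of each rule against the simulated full-bank grade. The outcome on a single synthetic draw is only illustrative and says nothing about the empirical results reported in the paper.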