Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a computer-based international mathematics and science assessment. We compare the classification accuracy of convolutional and feedforward approaches. Our results show that convolutional neural networks (CNNs) outperform feedforward neural networks in both loss and accuracy. The CNN models classified up to 97.71% of the image responses into the appropriate scoring category, an accuracy comparable to, if not higher than, that of typical human raters. These findings were further strengthened by the observation that the most accurate CNN models correctly classified some image responses that had been incorrectly scored by the human raters. As an additional innovation, we outline a method to select human-rated responses for the training sample based on an application of the expected response function derived from item response theory. This paper argues that CNN-based automated scoring of image responses is a highly accurate procedure that could potentially replace second human raters in large-scale assessments, reducing their workload and cost, while improving the validity and comparability of scoring complex constructed-response items.
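The abstract does not state which IRT model or selection criterion is used, so the following is only a minimal sketch under explicit assumptions. It assumes polytomous items follow the generalized partial credit model (GPCM) with discrimination `a`, step difficulties `b_v`, and a person ability θ estimated elsewhere. Under that model the expected response function is E(X | θ) = Σ_k k · P(X = k | θ). The function names and the tolerance-based filter in `select_training_responses` are hypothetical illustrations, not the authors' procedure.

```python
import math
from typing import List, Sequence, Tuple


def gpcm_probs(theta: float, a: float, steps: Sequence[float]) -> List[float]:
    """Category probabilities P(X = k | theta) under the GPCM (assumed model).

    steps[v - 1] is the step difficulty b_v for v = 1..m; category 0 has an
    empty sum, so its logit is 0.
    """
    logits = [0.0]
    for b in steps:
        # Cumulative sum of a * (theta - b_v) up to category k.
        logits.append(logits[-1] + a * (theta - b))
    # Softmax with max-subtraction for numerical stability.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta: float, a: float, steps: Sequence[float]) -> float:
    """Expected response function E(X | theta) = sum_k k * P(X = k | theta)."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps)))


def select_training_responses(
    responses: Sequence[Tuple[float, int]],
    a: float,
    steps: Sequence[float],
    tol: float = 0.5,
) -> List[Tuple[float, int]]:
    """Hypothetical rule: keep (theta, human_score) pairs whose human score
    lies within `tol` of the model-expected score, treating them as
    more trustworthy training labels."""
    return [
        (theta, score)
        for theta, score in responses
        if abs(score - expected_score(theta, a, steps)) <= tol
    ]


if __name__ == "__main__":
    # Dichotomous item (one step at b = 0): a high-ability student scored 0
    # is flagged as inconsistent with the model and left out.
    data = [(2.0, 1), (2.0, 0), (-2.0, 0)]
    print(select_training_responses(data, a=1.0, steps=[0.0]))
```

Filtering on agreement with the expected score is just one way such a function could be applied; the actual criterion, tolerance, and IRT model in the study may differ.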