Understanding how automated grading systems evaluate essays remains a significant challenge for educators and students, especially when large language models function as black boxes. We introduce EssayCBM, a rubric-aligned framework that prioritizes interpretability in essay assessment. Instead of predicting grades directly from text, EssayCBM evaluates eight writing concepts, such as Thesis Clarity and Evidence Use, through dedicated prediction heads on an encoder. These concept scores form a transparent bottleneck, and a lightweight network computes the final grade from the concept scores alone. Instructors can adjust concept predictions and instantly view the updated grade, enabling accountable human-in-the-loop evaluation. EssayCBM matches the performance of black-box models while offering actionable, concept-level feedback through an intuitive web interface.
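
The bottleneck and the instructor adjustment described above can be illustrated with a minimal sketch. This is not the authors' implementation: the encoder and its prediction heads are replaced by fixed stand-in scores, the grade network is assumed to be a simple linear layer with equal weights, and only the first two concept names come from the abstract (the other six are placeholders). The function names `grade_from_concepts` and `intervene` are hypothetical.

```python
from typing import Dict

# Only "thesis_clarity" and "evidence_use" are named in the abstract;
# the remaining six names are placeholders for illustration.
CONCEPTS = ["thesis_clarity", "evidence_use"] + [f"concept_{i}" for i in range(3, 9)]

# Assumed grade head: a linear layer with equal weights and zero bias.
WEIGHTS: Dict[str, float] = {name: 1.0 / len(CONCEPTS) for name in CONCEPTS}
BIAS = 0.0


def grade_from_concepts(scores: Dict[str, float]) -> float:
    """Compute the grade from concept scores only (the bottleneck).

    The essay text is never an input here, so every change in the grade
    can be traced to a change in one or more concept scores.
    """
    return BIAS + sum(WEIGHTS[name] * scores[name] for name in CONCEPTS)


def intervene(scores: Dict[str, float], overrides: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of the scores with instructor overrides applied."""
    unknown = set(overrides) - set(CONCEPTS)
    if unknown:
        raise KeyError(f"unknown concepts: {sorted(unknown)}")
    return {**scores, **overrides}


# Stand-in for the outputs of the encoder's eight prediction heads.
predicted = {name: 0.5 for name in CONCEPTS}

before = grade_from_concepts(predicted)
# An instructor decides the thesis is clearer than the model predicted.
after = grade_from_concepts(intervene(predicted, {"thesis_clarity": 1.0}))

print(f"grade before intervention: {before:.4f}")
print(f"grade after intervention:  {after:.4f}")
```

Because the grade depends on the concept scores alone, recomputing it after an override is a single cheap function call, which is what makes an instant update in an interface plausible.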


