Open-ended responses are central to learning, yet automated scoring often conflates what students wrote with how teachers grade. We present an analytics-first framework that separates content signals from rater tendencies, making grading judgments visible and auditable. Using de-identified ASSISTments mathematics responses, we model teacher grading histories as dynamic priors and derive text representations from sentence embeddings, applying centering and residualization to mitigate prompt and teacher confounds. Temporally validated linear models quantify the contribution of each signal, and a projection surfaces model disagreements for qualitative inspection. Results show that teacher priors heavily influence grade predictions: the strongest performance comes from combining priors with content embeddings (AUC ≈ 0.815), while content-only models remain above chance but substantially weaker (AUC ≈ 0.626). Adjusting for rater effects sharpens the residual content representation, retaining more informative embedding dimensions and revealing cases where semantic evidence supports understanding rather than surface-level differences in how students respond. The contribution is a practical pipeline that turns embeddings from mere features into learning analytics for reflection, enabling teachers and researchers to examine where grading practices align (or conflict) with evidence of student reasoning and learning.
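To make the described pipeline concrete, the sketch below illustrates the centering/residualization of embedding features by prompt and by teacher, followed by a temporal train/test split comparing prior-only, content-only, and combined linear models. This is a minimal sketch, not the authors' implementation: all column names (emb_*, teacher_id, prompt_id, teacher_prior, label, timestamp) are hypothetical, and synthetic data stands in for the de-identified ASSISTments records.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic stand-in data so the sketch runs end to end
# (real inputs would be graded ASSISTments responses).
rng = np.random.default_rng(0)
n, k = 500, 8
df = pd.DataFrame(rng.normal(size=(n, k)),
                  columns=[f"emb_{i}" for i in range(k)])
df["teacher_id"] = rng.integers(0, 10, n)      # rater identifier
df["prompt_id"] = rng.integers(0, 20, n)       # item identifier
df["teacher_prior"] = rng.uniform(0, 1, n)     # running rate of full credit awarded
df["label"] = rng.integers(0, 2, n)            # 1 = full credit, 0 = otherwise
df["timestamp"] = np.arange(n)                 # ordering for temporal validation

emb_cols = [c for c in df.columns if c.startswith("emb_")]

def center_by_group(frame, cols, group_col):
    """Subtract each group's mean embedding (centering by prompt or teacher)."""
    return frame[cols] - frame.groupby(group_col)[cols].transform("mean")

# Center out prompt effects, then teacher effects, leaving residual content signal.
df[emb_cols] = center_by_group(df, emb_cols, "prompt_id")
df[emb_cols] = center_by_group(df, emb_cols, "teacher_id")

# Temporal validation: train on earlier responses, test on later ones.
df = df.sort_values("timestamp")
cut = int(0.8 * len(df))
train, test = df.iloc[:cut], df.iloc[cut:]

# Compare the three feature sets discussed in the abstract.
for name, cols in [("prior-only", ["teacher_prior"]),
                   ("content-only", emb_cols),
                   ("prior+content", ["teacher_prior"] + emb_cols)]:
    model = LogisticRegression(max_iter=1000).fit(train[cols], train["label"])
    auc = roc_auc_score(test["label"], model.predict_proba(test[cols])[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```

On the synthetic data the AUCs hover near chance by construction; the point of the sketch is the ordering of steps, with group-wise centering applied before the temporal split so rater and prompt tendencies are removed from the content representation rather than absorbed by the model.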