While recent studies have examined the learning impact of large language models (LLMs) in educational contexts, the affective dynamics of LLM-mediated tutoring remain insufficiently understood. This work introduces the first ensemble-LLM framework for large-scale affect sensing in tutoring dialogues, advancing the conversation on responsible pathways for integrating generative AI into education by attending to learners' evolving affective states. To this end, we analyze 16,986 conversational turns, collected over two semesters, between PyTutor, an LLM-powered AI tutor, and 261 undergraduate learners across three U.S. institutions. To investigate learners' emotional experiences, we generate zero-shot affect annotations from three frontier LLMs (Gemini, GPT-4o, Claude), including scalar ratings of valence, arousal, and learning-helpfulness, along with free-text emotion labels. These estimates are fused through rank-weighted intra-model pooling and plurality consensus across models to produce robust emotion profiles. Our analysis shows that during interaction with the AI tutor, students typically report mildly positive affect and moderate arousal. Yet learning is not uniformly smooth: confusion and curiosity are frequent companions to problem solving, and frustration, while less common, still surfaces in ways that can derail progress. Emotional states are short-lived: positive moments last slightly longer than neutral or negative ones, but they are fragile and easily disrupted. Encouragingly, negative emotions often resolve quickly, sometimes rebounding directly into positive states. Neutral moments frequently act as turning points, more often steering students upward than downward, suggesting opportunities for tutors to intervene at precisely these junctures.
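The two-stage fusion of free-text emotion labels described above can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, the tie policy, or the data layout, so everything here is an assumption made for illustration: each model is assumed to return a ranked list of labels for a turn, rank r receives weight 1/r, and a tie across models yields no consensus. The function names (`pool_within_model`, `plurality_consensus`, `fuse_turn`) and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def pool_within_model(ranked_labels):
    """Pool one model's ranked emotion labels into a single label.

    Assumption: the label at rank r (1-based) gets weight 1/r. The
    actual weighting used in the paper is not given in the abstract.
    """
    scores = defaultdict(float)
    for rank, label in enumerate(ranked_labels, start=1):
        scores[label.strip().lower()] += 1.0 / rank
    # Highest pooled weight wins; ties break alphabetically for determinism.
    return min(scores, key=lambda lab: (-scores[lab], lab))


def plurality_consensus(per_model_labels):
    """Return the label chosen by the most models, or None if there is no winner."""
    counts = Counter(per_model_labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no plurality; a real pipeline would need a tie policy
    return counts[0][0]


def fuse_turn(annotations):
    """Fuse one turn; `annotations` maps a model name to its ranked label list."""
    pooled = [pool_within_model(labels) for labels in annotations.values() if labels]
    return plurality_consensus(pooled)


# Hypothetical annotations for a single conversational turn.
example = {
    "gemini": ["confusion", "curiosity", "confusion"],
    "gpt-4o": ["curiosity", "confusion", "confusion", "confusion"],
    "claude": ["frustration", "confusion"],
}
fused = fuse_turn(example)
print(fused)
```

In this made-up example, pooling overturns the top-ranked label for the second model, because the repeated lower-ranked label accumulates more weight (1/2 + 1/3 + 1/4) than the single first-ranked one. Two of the three models then agree, which gives the plurality label for the turn.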