Large Language Models (LLMs), such as ChatGPT, are quickly advancing AI to the frontiers of practical consumer use and leading industries to re-evaluate how they allocate resources for content production. Authoring open educational resources and hint content within adaptive tutoring systems is labor-intensive. Should LLMs like ChatGPT produce educational content on par with human-authored content, the implications would be significant for further scaling of computer tutoring system approaches. In this paper, we conduct the first learning gain evaluation of ChatGPT by comparing the efficacy of its hints with hints authored by human tutors, with 77 participants across two algebra topic areas: Elementary Algebra and Intermediate Algebra. We find that 70% of hints produced by ChatGPT passed our manual quality checks and that both the human and ChatGPT conditions produced positive learning gains. However, gains were statistically significant only for human tutor-created hints. Learning gains from human-created hints were substantially and statistically significantly higher than those from ChatGPT hints in both topic areas, though ChatGPT participants in the Intermediate Algebra experiment were near ceiling and not even with the control at pre-test. We discuss the limitations of our study and suggest several future directions for the field. Problem and hint content used in the experiment is provided for replicability.