ChatGPT is a groundbreaking ``chatbot'': an AI interface built on a large language model that was trained on an enormous corpus of human text to emulate human conversation. Beyond its ability to converse plausibly, it has attracted attention for competently answering questions from the bar exam and from MBA coursework, and for providing useful assistance in writing computer code. These apparent abilities have prompted discussion of ChatGPT both as a threat to the integrity of higher education and, conversely, as a powerful teaching tool. In this work we present a preliminary analysis of how two versions of ChatGPT (ChatGPT3.5 and ChatGPT4) fare in the field of first-semester university physics, using a modified version of the Force Concept Inventory (FCI) to assess whether they can give correct responses to conceptual physics questions about kinematics and Newtonian dynamics. We demonstrate that, by some measures, ChatGPT3.5 can match or exceed the median performance of a university student who has completed one semester of college physics, though its performance is notably uneven and the results are nuanced. By these same measures, we find that ChatGPT4's performance is approaching the point of being indistinguishable from that of an expert physicist when it comes to introductory mechanics topics. After the completion of our work, we became aware of Ref. [1], which preceded us to publication and which presents an extensive analysis of the abilities of ChatGPT3.5 in a physics class, including a different modified version of the FCI. We view this work as confirming that portion of their results and extending the analysis to ChatGPT4, which shows rapid and notable improvement in most, but not all, respects.