Over the past decade, predictive language modeling for code has proven to be a valuable tool for enabling new forms of automation for developers. More recently, we have seen the advent of general-purpose "large language models", based on neural transformer architectures, that have been trained on massive datasets of human-written text spanning code and natural language. However, despite the demonstrated representational power of such models, interacting with them has historically been constrained to specific task settings, limiting their general applicability. Many of these limitations were recently overcome with the introduction of ChatGPT, a language model created by OpenAI and trained to operate as a conversational agent, enabling it to answer questions and respond to a wide variety of commands from end users. The introduction of models such as ChatGPT has already spurred fervent discussion among educators, ranging from fear that students could use these AI tools to circumvent learning, to excitement about the new types of learning opportunities that they might unlock. However, given the nascent nature of these tools, we currently lack fundamental knowledge of how well they perform in different educational settings, and of the potential promise (or danger) that they might pose to traditional forms of instruction. As such, in this paper, we examine how well ChatGPT performs when tasked with answering common questions in a popular software testing curriculum. Our findings indicate that ChatGPT can provide correct or partially correct answers in 44% of cases, provide correct or partially correct explanations of answers in 57% of cases, and that prompting the tool in a shared question context leads to a marginally higher rate of correct answers. Based on these findings, we discuss the potential promise and dangers of the use of ChatGPT by students and instructors.