Over the past decade, predictive language modeling for code has proven to be a valuable tool for enabling new forms of automation for developers. More recently, we have seen the advent of general-purpose "large language models", based on neural transformer architectures, that have been trained on massive datasets of human-written text spanning code and natural language. However, despite the demonstrated representational power of such models, interacting with them has historically been constrained to specific task settings, limiting their general applicability. Many of these limitations were recently overcome with the introduction of ChatGPT, a language model created by OpenAI and trained to operate as a conversational agent, enabling it to answer questions and respond to a wide variety of commands from end users. The introduction of models such as ChatGPT has already spurred fervent discussion among educators, ranging from fear that students could use these AI tools to circumvent learning to excitement about the new types of learning opportunities that they might unlock. However, given the nascent nature of these tools, we currently lack fundamental knowledge of how well they perform in different educational settings and of the potential promise (or danger) that they might pose to traditional forms of instruction. As such, in this poster, we examine how well ChatGPT performs when tasked with answering common questions in a popular software testing curriculum. Our findings indicate that ChatGPT can provide correct or partially correct answers in 44% of cases and correct or partially correct explanations of answers in 57% of cases, and that prompting the tool in a shared question context leads to a marginally higher rate of correct answers. Based on these findings, we discuss the potential promise and dangers related to the use of ChatGPT by students and instructors.