Over the past decade, predictive language modeling for code has proven to be a valuable tool for enabling new forms of automation for developers. More recently, we have seen the advent of general-purpose "large language models", based on neural transformer architectures, that have been trained on massive datasets of human-written text spanning code and natural language. However, despite the demonstrated representational power of such models, interacting with them has historically been constrained to specific task settings, limiting their general applicability. Many of these limitations were recently overcome with the introduction of ChatGPT, a language model created by OpenAI and trained to operate as a conversational agent, enabling it to answer questions and respond to a wide variety of commands from end users. The introduction of models such as ChatGPT has already spurred fervent discussion among educators, ranging from fear that students could use these AI tools to circumvent learning to excitement about the new types of learning opportunities that they might unlock. However, given the nascent nature of these tools, we currently lack fundamental knowledge of how well they perform in different educational settings, and of the potential promise (or danger) that they might hold for traditional forms of instruction. As such, in this paper, we examine how well ChatGPT performs when tasked with answering common questions in a popular software testing curriculum. Our findings indicate that ChatGPT provides correct or partially correct answers in 55.6% of cases, provides correct or partially correct explanations of answers in 53.0% of cases, and that prompting the tool in a shared question context leads to a marginally higher rate of correct responses. Based on these findings, we discuss the potential promises and perils related to the use of ChatGPT by students and instructors.