We asked ChatGPT to participate in an undergraduate computer science exam on ``Algorithms and Data Structures''. The program was evaluated on the entire exam as posed to the students. We hand-copied its answers onto an exam sheet, which was subsequently graded in a blind setup alongside those of 200 participating students. We find that ChatGPT narrowly passed the exam, obtaining 20.5 out of 40 points. This impressive performance indicates that ChatGPT can indeed succeed in challenging tasks like university exams. At the same time, the questions in our exam are structurally similar to those of other exams, solved homework problems, and teaching materials that can be found online and might have been part of ChatGPT's training data. Therefore, it would be inadequate to conclude from this experiment that ChatGPT has any understanding of computer science. We also assess the improvements brought by GPT-4. We find that GPT-4 would have obtained about 17\% more exam points than GPT-3.5, reaching the performance of the average student. The transcripts of our conversations with ChatGPT are available at \url{https://github.com/tml-tuebingen/chatgpt-algorithm-exam}, and the entire graded exam is in the appendix of this paper.
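As a rough illustration of what this relative improvement means in absolute terms (a back-of-the-envelope estimate, assuming the approximately 17\% gain applies directly to ChatGPT's 20.5-point score):
\[
20.5 \text{ points} \times 1.17 \approx 24 \text{ out of } 40 \text{ points},
\]
which corresponds to the level we describe as the performance of the average student.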