In the summer of 2020, OpenAI released its GPT-3 autoregressive language model to much fanfare. While the model has shown promise on tasks in several areas, it has not always been clear when the results were cherry-picked and when they were the unvarnished output. We were particularly interested in what benefits GPT-3 could bring to the SemEval 2021 MeasEval task: identifying measurements and their associated attributes in scientific literature. We had already experimented with multi-turn question answering as a solution to this task, and we wanted to see whether we could use GPT-3's few-shot learning capabilities to more easily develop a solution that would outperform our prior work. Unfortunately, we were not successful in that effort. This paper discusses the approach we took, the challenges we encountered, and the results we observed. Some of the problems were simply due to the current state of the art: for example, the limits on prompt and completion size restricted how much training signal we could provide. Others are more fundamental: we are unaware of generative models that excel at retaining factual information, and the impact of changes to a prompt is unpredictable, making it hard to reliably improve performance.