Recently, a boom of papers has shown extraordinary progress in few-shot learning with various prompt-based models. Such success can give the impression that prompts help models learn faster in the same way that humans learn faster when given task instructions expressed in natural language. In this study, we experiment with over 30 prompts manually written for natural language inference (NLI). We find that models learn just as fast with many prompts that are intentionally irrelevant or even pathologically misleading as they do with instructively "good" prompts. Additionally, we find that model performance depends more on the choice of the LM target words (i.e., the "verbalizer" that maps the LM's vocabulary predictions to class labels) than on the text of the prompt itself. In sum, we find little evidence that existing prompt-based models truly understand the meaning of their prompts.
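To make the prompt/verbalizer distinction concrete, here is a minimal sketch of how prompt-based NLI classification typically works, using a HuggingFace fill-mask pipeline. The template wording, the `True`/`False` verbalizer words, and the model choice below are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: a prompt template plus a "verbalizer" turns NLI into
# masked-word prediction. Template text and verbalizer words are illustrative.
from transformers import pipeline

# One hand-written template: premise and hypothesis wrapped in natural language.
TEMPLATE = "{premise} Question: {hypothesis} True or False? Answer: [MASK]."

# The verbalizer maps the LM's predicted word back to an NLI class label.
VERBALIZER = {"True": "entailment", "False": "non-entailment"}

fill_mask = pipeline("fill-mask", model="roberta-base")

def classify(premise: str, hypothesis: str) -> str:
    """Score only the verbalizer words and return the best-scoring class label."""
    prompt = TEMPLATE.format(premise=premise, hypothesis=hypothesis)
    prompt = prompt.replace("[MASK]", fill_mask.tokenizer.mask_token)
    # Restrict the LM's prediction to the verbalizer words.
    candidates = fill_mask(prompt, targets=list(VERBALIZER))
    best = max(candidates, key=lambda c: c["score"])
    return VERBALIZER[best["token_str"].strip()]

print(classify("A man is sleeping.", "A person is awake."))
```

In this framing, changing the sentence around `[MASK]` changes the prompt, while changing the `True`/`False` words changes the verbalizer; the paper's finding is that the latter choice matters more for performance than the former.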