Recently, a boom of papers has shown extraordinary progress in zero-shot and few-shot learning with various prompt-based models. It is commonly argued that prompts help models to learn faster in the same way that humans learn faster when provided with task instructions expressed in natural language. In this study, we experiment with over 30 prompt templates manually written for natural language inference (NLI). We find that models learn just as fast with many prompts that are intentionally irrelevant or even pathologically misleading as they do with instructively "good" prompts. Further, such patterns hold even for models as large as 175 billion parameters (Brown et al., 2020) as well as the recently proposed instruction-tuned models which are trained on hundreds of prompts (Sanh et al., 2022). That is, instruction-tuned models often produce good predictions with irrelevant and misleading prompts even at zero shots. In sum, notwithstanding prompt-based models' impressive improvement, we find evidence of serious limitations that question the degree to which such improvement is derived from models understanding task instructions in ways analogous to humans' use of task instructions.
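For concreteness, the sketch below illustrates what the three template categories named above (instructive, irrelevant, misleading) might look like for an NLI example. The specific template strings, the TEMPLATES mapping, and the build_prompt helper are hypothetical illustrations for exposition, not the actual templates used in the study.

```python
# Illustrative NLI prompt templates in the spirit of the three categories
# discussed in the abstract. These strings are hypothetical examples, not
# the templates evaluated in the paper.

TEMPLATES = {
    # Instructive: states the NLI task in plain language.
    "instructive": "{premise}\nDoes this mean that \"{hypothesis}\" is true? Yes or no?",
    # Irrelevant: fluent text with no connection to the NLI task.
    "irrelevant": "{premise}\nRecite the first law of robotics. \"{hypothesis}\"? Yes or no?",
    # Misleading: instructs the model to do a different task entirely.
    "misleading": "{premise}\nIs this a news headline? \"{hypothesis}\"? Yes or no?",
}

def build_prompt(category: str, premise: str, hypothesis: str) -> str:
    """Fill one template with a premise/hypothesis pair."""
    return TEMPLATES[category].format(premise=premise, hypothesis=hypothesis)

if __name__ == "__main__":
    for name in TEMPLATES:
        print(f"--- {name} ---")
        print(build_prompt(name, "A dog is running in the park.", "An animal is outside."))
```

Under the paper's framing, a model that genuinely understands instructions should learn markedly faster from the "instructive" template than from the other two; the finding reported above is that learning speed is often similar across all three categories.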