In this paper, we propose a theoretical framework to explain the efficacy of prompt learning in zero/few-shot scenarios. First, we prove that the conventional pre-training and fine-tuning paradigm fails in few-shot scenarios because it overfits to the unrepresentative labelled data. We then formalize the assumption that prompt learning is more effective because it allows the pre-trained language model, built upon massive text corpora, together with domain-related human knowledge, to participate more in prediction, thereby reducing the impact of the limited label information provided by the small training set. We further hypothesize that language discrepancy can measure the quality of prompting. Comprehensive experiments are performed to verify these assumptions. More remarkably, inspired by the theoretical framework, we propose an annotation-agnostic template selection method based on perplexity, which enables us to ``forecast'' prompting performance in advance. This approach is especially encouraging because existing work still relies on a development set to evaluate templates post hoc. Experiments show that this method yields significant prediction gains over state-of-the-art zero-shot methods.
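To make the annotation-agnostic selection idea concrete, the sketch below ranks candidate prompt templates by perplexity, preferring the template the language model finds most fluent. This is a minimal illustration, not the paper's implementation: the per-token probabilities are hypothetical stand-ins for what a real pre-trained language model would assign to each filled-in template.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability of the tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def rank_templates(template_probs):
    """Rank candidate templates by perplexity, lowest (most fluent) first."""
    return sorted(template_probs, key=lambda t: perplexity(template_probs[t]))

# Hypothetical per-token probabilities a language model might assign to each
# candidate template (illustrative numbers, not real model output).
candidates = {
    "It was [MASK].": [0.30, 0.25, 0.40, 0.35],
    "All in all, it was [MASK].": [0.45, 0.50, 0.40, 0.55],
    "Just [MASK]!": [0.10, 0.15, 0.12, 0.08],
}

ranking = rank_templates(candidates)  # best (lowest-perplexity) template first
```

Because the ranking depends only on the language model's own probabilities, no labelled development set is needed, which is exactly what lets the method forecast prompting performance in advance.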