Discrete prompts have been used for fine-tuning pre-trained language models on diverse NLP tasks. In particular, automatic methods that generate discrete prompts from a small set of training instances have reported superior performance. However, a closer look at the learnt prompts reveals that they contain noisy and counter-intuitive lexical constructs that would not be encountered in manually-written prompts. This raises an important yet understudied question regarding the robustness of automatically learnt discrete prompts when used in downstream tasks. To address this question, we conduct a systematic study of the robustness of prompts learnt with AutoPrompt by applying carefully designed perturbations and measuring the resulting performance on two Natural Language Inference (NLI) datasets. Our experimental results show that, although discrete prompt-based methods remain relatively robust against perturbations to the NLI inputs, they are highly sensitive to other types of perturbations such as shuffling and deletion of prompt tokens. Moreover, they generalize poorly across different NLI datasets. We hope our findings will inspire future work on robust discrete prompt learning.
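For concreteness, the following is a minimal Python sketch of the two prompt-token perturbations mentioned above, shuffling and deletion. The function name `perturb_prompt_tokens` and the example trigger tokens are hypothetical illustrations, not code or prompts from the paper.

```python
import random


def perturb_prompt_tokens(prompt_tokens, mode="shuffle", delete_frac=0.25, seed=0):
    """Apply a simple perturbation to a list of learnt prompt tokens.

    mode="shuffle" randomly reorders the tokens; mode="delete" drops a
    random fraction of them. Both are illustrative sketches of the kinds
    of prompt-level perturbations studied, not the paper's exact setup.
    """
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    if mode == "shuffle":
        rng.shuffle(tokens)
    elif mode == "delete":
        # Keep at least one token so the prompt is never empty.
        keep = max(1, int(len(tokens) * (1 - delete_frac)))
        kept_idx = sorted(rng.sample(range(len(tokens)), keep))
        tokens = [tokens[i] for i in kept_idx]
    return tokens


# Hypothetical AutoPrompt-style trigger sequence for an NLI template.
trigger = ["atmosphere", "alot", "dialogue", "Clone", "totally", "[MASK]"]
print(perturb_prompt_tokens(trigger, mode="shuffle"))
print(perturb_prompt_tokens(trigger, mode="delete", delete_frac=0.33))
```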