Large-scale pre-trained language models have contributed significantly to natural language processing by demonstrating remarkable abilities as few-shot learners. However, their effectiveness depends mainly on scaling model parameters and on prompt design, which hinders their deployment in most real-world applications. This study proposes a novel pluggable, extensible, and efficient approach named DifferentiAble pRompT (DART), which can convert small language models into better few-shot learners without any prompt engineering. The main principle behind this approach is to reformulate the downstream natural language processing task as the task of a pre-trained language model and to differentially optimize the prompt template and the target label via backpropagation. Furthermore, the proposed approach can be (i) plugged into any pre-trained language model and (ii) extended to a wide range of classification tasks. A comprehensive evaluation of standard NLP tasks demonstrates that the proposed approach achieves better few-shot performance. Code is available at https://github.com/zjunlp/DART.
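As a rough illustration of this principle, the sketch below treats the prompt template and the class labels as continuous embeddings and optimizes them by backpropagation through a BERT-style masked language model. It is a minimal sketch under stated assumptions, not the DART implementation from the repository above: the model name, the template length, and the way the [MASK] hidden state is scored against label embeddings are illustrative choices, and the language model weights are simply left out of the optimizer for brevity.

```python
# Minimal sketch of differentiable prompt/label optimization (illustrative only).
import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"            # any masked LM would do (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
mlm = AutoModelForMaskedLM.from_pretrained(model_name)
hidden = mlm.config.hidden_size
word_emb = mlm.get_input_embeddings()       # embedding lookup for the real tokens

n_template = 3                              # number of trainable template "tokens"
n_classes = 2                               # e.g. positive / negative sentiment
template_emb = nn.Parameter(torch.randn(n_template, hidden) * 0.02)
label_emb = nn.Parameter(torch.randn(n_classes, hidden) * 0.02)

def prompt_loss(sentence: str, label: int) -> torch.Tensor:
    """Score the [MASK] position against the trainable label embeddings."""
    enc = tokenizer(sentence + " " + tokenizer.mask_token, return_tensors="pt")
    inputs_embeds = word_emb(enc["input_ids"])                       # (1, L, H)
    mask_pos = int((enc["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1])
    # Insert the differentiable template embeddings just before [MASK].
    inputs_embeds = torch.cat(
        [inputs_embeds[:, :mask_pos],
         template_emb.unsqueeze(0),
         inputs_embeds[:, mask_pos:]], dim=1)
    attn = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)
    out = mlm(inputs_embeds=inputs_embeds, attention_mask=attn,
              output_hidden_states=True)
    # The [MASK] token shifted forward by the inserted template length.
    h_mask = out.hidden_states[-1][0, mask_pos + n_template]         # (H,)
    logits = h_mask @ label_emb.T                                    # (n_classes,)
    return nn.functional.cross_entropy(logits.unsqueeze(0),
                                       torch.tensor([label]))

# Only the template and label embeddings are updated in this sketch.
optimizer = torch.optim.AdamW([template_emb, label_emb], lr=1e-3)
loss = prompt_loss("the movie was wonderful", label=1)
loss.backward()
optimizer.step()
```

The point of the sketch is that nothing in the prompt is hand-crafted: both the template and the label representations are parameters of the objective, so standard gradient descent replaces manual prompt engineering.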