Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM -- i.e., leaving its weights untouched -- still often underperform fine-tuning approaches, which modify these weights in a task-dependent way. These, in turn, suffer from forgetfulness and compromise versatility, suggesting a tradeoff between performance and versatility. The main message of this paper is that current frozen-model techniques such as prompt tuning are only the tip of the iceberg, and that more powerful methods for leveraging frozen LMs can do just as well as fine-tuning in challenging domains without sacrificing the underlying model's versatility. To demonstrate this, we introduce three novel methods for leveraging frozen models: input-dependent prompt tuning, frozen readers, and recursive LMs, each of which vastly improves on current frozen-model approaches. Indeed, some of our methods even outperform fine-tuning approaches in domains currently dominated by the latter. The computational cost of each method is higher than that of existing frozen-model methods, but still negligible relative to a single pass through a huge frozen LM. Each of these methods constitutes a meaningful contribution in its own right, but by presenting these contributions together we aim to convince the reader of a broader message that goes beyond the details of any given method: that frozen models have untapped potential and that fine-tuning is often unnecessary.