Few-shot learning allows pre-trained language models to adapt to downstream tasks using only a limited number of training examples. However, practical applicability is limited when all model parameters must be optimized. In this work we present a technique for parameter-efficient few-shot learning that adopts a strict definition of parameter efficiency. Our training method combines 1) intermediate training by reformulating natural language tasks as entailment tasks \cite{wang_entailment_2021} and 2) differentiable optimization of template and label tokens \cite{zhang_differentiable_2021}. We quantify the trade-off between parameter efficiency and performance in the few-shot regime and propose a simple, model-agnostic approach that can be extended to any task. By achieving competitive performance while optimizing only 3\% of a model's parameters and allowing for batched inference, we enable more efficient practical deployment of models.
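To make the parameter-efficiency constraint concrete, the following is a minimal sketch (not the authors' implementation) of the general setting the abstract describes: the pretrained backbone is frozen and only a small set of continuous template/label token embeddings remains trainable, so the trainable fraction of parameters can be reported directly. The class and function names here (\texttt{PromptedEncoder}, \texttt{trainable\_fraction}) are illustrative assumptions, not names from the paper.

\begin{verbatim}
# Illustrative sketch only: freeze the pretrained backbone and train a small
# block of continuous template/label token embeddings, then report the
# fraction of parameters that actually receive gradients.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):  # hypothetical wrapper, not the paper's code
    def __init__(self, backbone: nn.Module, hidden_size: int,
                 n_prompt_tokens: int = 20):
        super().__init__()
        self.backbone = backbone
        # The learned template/label tokens are the only trainable parameters.
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, hidden_size) * 0.02)
        for p in self.backbone.parameters():
            p.requires_grad = False  # backbone stays frozen

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the learned prompt embeddings to each input sequence.
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.backbone(torch.cat([prompt, token_embeds], dim=1))

def trainable_fraction(model: nn.Module) -> float:
    # Ratio of trainable parameters to total parameters (e.g. ~0.03 for 3%).
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total
\end{verbatim}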