Parameter-efficient fine-tuning methods (PEFTs) offer the promise of adapting large pre-trained models while only tuning a small number of parameters. They have been shown to be competitive with full model fine-tuning for many downstream tasks. However, prior work indicates that PEFTs may not work as well for machine translation (MT), and there is no comprehensive study showing when PEFTs work for MT. We conduct a comprehensive empirical study of PEFTs for MT, considering (1) various parameter budgets, (2) a diverse set of language pairs, and (3) different pre-trained models. We find that 'adapters', in which small feed-forward networks are added after every layer, are indeed on par with full model fine-tuning when the parameter budget corresponds to 10% of total model parameters. Nevertheless, as the number of tuned parameters decreases, the performance of PEFTs degrades. The magnitude of this degradation depends on the language pair, with PEFTs particularly struggling for distantly related language pairs. We find that using PEFTs with a larger pre-trained model outperforms full fine-tuning with a smaller model, and for smaller training data sizes, PEFTs outperform full fine-tuning for the same pre-trained model.
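To make the adapter setup concrete, below is a minimal sketch of a bottleneck adapter block in PyTorch: a small feed-forward network (down-projection, non-linearity, up-projection) with a residual connection, inserted after a layer of the frozen pre-trained model. The hidden and bottleneck sizes and the module name `Adapter` are illustrative assumptions, not values or code from the paper.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add."""
    def __init__(self, hidden_size: int, bottleneck_size: int):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the pre-trained representation;
        # only the adapter's own weights are trained during fine-tuning,
        # while the base model's parameters stay frozen.
        return x + self.up(self.act(self.down(x)))

# Illustrative sizes (assumed, not from the paper): the bottleneck size
# controls the parameter budget of the adapter.
hidden, bottleneck = 1024, 64
adapter = Adapter(hidden, bottleneck)
x = torch.randn(2, 16, hidden)   # (batch, sequence length, hidden size)
print(adapter(x).shape)          # torch.Size([2, 16, 1024])
```

Shrinking or growing the bottleneck size is the knob that trades parameter budget against performance, which is the axis the study varies.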