Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists of freezing the pretrained parameters of a model and injecting lightweight modules between layers, resulting in the addition of only a small number of task-specific trainable parameters. While adapter tuning has been investigated for multilingual neural machine translation, this paper proposes a comprehensive analysis of adapters for multilingual speech translation (ST). Starting from different pre-trained models (a multilingual ST model trained on parallel data or a multilingual BART (mBART) trained on non-parallel multilingual data), we show that adapters can be used to: (a) efficiently specialize ST to specific language pairs with a low extra cost in terms of parameters, and (b) transfer from an automatic speech recognition (ASR) task and an mBART pre-trained model to a multilingual ST task. Experiments show that adapter tuning offers competitive results to full fine-tuning, while being much more parameter-efficient.
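To make the adapter-tuning idea concrete, here is a minimal sketch (assuming a PyTorch Transformer backbone) of a bottleneck adapter and of how the pretrained parameters are frozen while only the adapters remain trainable. The class names, the bottleneck dimension, and the helper function are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: layer norm, down-projection, non-linearity, up-projection, residual."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.layer_norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.activation = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pretrained representation intact.
        return x + self.up(self.activation(self.down(self.layer_norm(x))))


def add_adapters(model: nn.Module, layer_ids: list, d_model: int) -> nn.ModuleDict:
    """Freeze the pretrained model and attach one trainable adapter per selected layer."""
    for p in model.parameters():
        p.requires_grad = False  # freeze all pretrained parameters
    # Only these adapter parameters are updated during task-specific training.
    return nn.ModuleDict({str(i): Adapter(d_model) for i in layer_ids})
```

In such a setup, specializing the model to a new language pair only requires training (and storing) the small adapter modules, which is what makes the approach parameter-efficient compared to full fine-tuning.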