Recent work in multilingual translation has advanced translation quality beyond bilingual baselines by using deep transformer models with increased capacity. However, the extra latency and memory costs introduced by this approach may make it unacceptable for efficiency-constrained applications. It has recently been shown for bilingual translation that a deep encoder with a shallow decoder (DESD) can reduce inference latency while maintaining translation quality, so we study similar speed-accuracy trade-offs for multilingual translation. We find that for many-to-one translation this approach does let us increase decoding speed without sacrificing quality, but for one-to-many translation, shallow decoders cause a clear quality drop. To mitigate this drop, we propose a deep encoder with multiple shallow decoders (DEMSD), where each shallow decoder is responsible for a disjoint subset of target languages. Specifically, a DEMSD model with 2-layer decoders obtains an average 1.8x speedup over a standard transformer model with no drop in translation quality.
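The routing idea behind DEMSD can be illustrated with a minimal Python sketch. This is not the paper's implementation; all names (`make_decoder_map`, `translate`, the toy encoder/decoder stand-ins, and the language groups) are hypothetical. It only shows the core constraint: one shared deep encoder, and each target language assigned to exactly one shallow decoder from a disjoint partition.

```python
# Hypothetical sketch of DEMSD-style routing: a shared deep encoder and
# several shallow decoders, each owning a disjoint subset of target languages.

def make_decoder_map(language_groups):
    """Map each target language to the index of its dedicated shallow decoder.

    language_groups: list of language-code lists; the subsets must be disjoint.
    """
    mapping = {}
    for idx, group in enumerate(language_groups):
        for lang in group:
            assert lang not in mapping, f"{lang} assigned to two decoders"
            mapping[lang] = idx
    return mapping

def translate(src_tokens, tgt_lang, encoder, decoders, decoder_map):
    """Encode once with the deep encoder, then decode with the shallow
    decoder responsible for tgt_lang."""
    hidden = encoder(src_tokens)
    return decoders[decoder_map[tgt_lang]](hidden, tgt_lang)

# Toy stand-ins for the encoder and the shallow decoders (illustrative only).
encoder = lambda toks: f"enc({toks})"
decoders = [
    lambda h, lang: f"dec0({h},{lang})",
    lambda h, lang: f"dec1({h},{lang})",
]
decoder_map = make_decoder_map([["de", "fr"], ["ja", "zh"]])
```

Because each decoder serves only a subset of languages, it can stay shallow (e.g. 2 layers) while the shared encoder carries most of the model capacity, which is where the reported latency savings come from.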