This paper addresses the task of contextual translation using multi-segment models. Specifically, we show that increasing model capacity further pushes the limits of this approach and that deeper models are better suited to capturing context dependencies. Furthermore, improvements observed with larger models can be transferred to smaller models using knowledge distillation. Our experiments show that this approach achieves competitive performance across several languages and benchmarks, without additional language-specific tuning or task-specific architectures.
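As a minimal illustration of the distillation step mentioned above, the sketch below shows a standard token-level knowledge-distillation loss in PyTorch, where a smaller student model is trained to match a larger teacher's softened output distribution. The function name, shapes, and temperature handling are illustrative assumptions, not the paper's exact training objective.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between the student's and the teacher's softened
    output distributions. Logits have shape (batch, seq_len, vocab).
    Hypothetical sketch; not the paper's exact objective."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)

# Toy usage: random logits standing in for a large teacher and a small student.
batch, seq_len, vocab = 2, 8, 100
teacher_logits = torch.randn(batch, seq_len, vocab)
student_logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits, temperature=2.0)
loss.backward()
print(loss.item())
```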