In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign for the Multilingual Speech Translation shared task. Our system is built by leveraging transfer learning across modalities, tasks, and languages. First, we leverage general-purpose multilingual modules pretrained on large amounts of unlabelled and labelled data. We then enable knowledge transfer from the text task to the speech task by training the two tasks jointly. Finally, our multilingual model is finetuned on speech translation task-specific data to achieve the best translation results. Experimental results show that our system outperforms the reported systems, including both end-to-end and cascaded approaches, by a large margin. In some translation directions, our speech translation results evaluated on the public Multilingual TEDx test set are even comparable to those of a strong text-to-text translation system that uses the oracle speech transcripts as input.