This paper describes TSMind, the joint submission of Alibaba and Soochow University to the WMT 2022 Shared Task on Translation Suggestion (TS). We participate in the English-German and English-Chinese tasks. We follow the recently successful paradigm of fine-tuning large-scale pre-trained models on the downstream task. As pre-trained models, we choose FAIR's WMT19 English-German news translation system for English-German and MBART50 for English-Chinese. Given the task's limits on training data, we follow the data augmentation strategies proposed by WeTS to boost the performance of our TS model. Unlike WeTS, we additionally use a dual conditional cross-entropy model and a GPT-2 language model to filter the augmented data. On the final leaderboard, our submissions rank first in three of the four language directions of the Naive TS task of the WMT22 Translation Suggestion task.
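The filtering step can be illustrated with a minimal sketch. It assumes the standard dual conditional cross-entropy formulation, in which a sentence pair is penalized both when the forward and backward translation models disagree and when either finds the pair unlikely. The abstract does not say how the two filters are combined, so the names `ScoredPair`, `dual_xent_score`, and `filter_pairs`, the thresholds, and the rule that a pair must pass both filters are all hypothetical. The cross-entropy and perplexity values are assumed to be computed beforehand by the translation models and by GPT-2.

```python
import math
from dataclasses import dataclass


@dataclass
class ScoredPair:
    source: str
    target: str
    h_fwd: float   # length-normalized cross-entropy of target given source
    h_bwd: float   # length-normalized cross-entropy of source given target
    lm_ppl: float  # language-model perplexity of the target (e.g. from GPT-2)


def dual_xent_score(h_fwd: float, h_bwd: float) -> float:
    """Map the two cross-entropies to a score in (0, 1]; higher is better.

    The penalty grows when the two models disagree (absolute difference)
    and when both assign the pair a high cross-entropy (average).
    """
    penalty = abs(h_fwd - h_bwd) + 0.5 * (h_fwd + h_bwd)
    return math.exp(-penalty)


def filter_pairs(pairs, min_dual_score=0.1, max_lm_ppl=200.0):
    """Keep pairs that pass both filters. Thresholds are illustrative only."""
    return [
        p for p in pairs
        if dual_xent_score(p.h_fwd, p.h_bwd) >= min_dual_score
        and p.lm_ppl <= max_lm_ppl
    ]
```

In this sketch, a pair on which the two translation models disagree strongly is dropped even if its target side is fluent, and a disfluent target is dropped even if the two models agree.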