We describe JD Explore Academy's submission to the WMT 2022 shared general translation task. We participated in all high-resource tracks and one medium-resource track, covering Chinese-English, German-English, Czech-English, Russian-English, and Japanese-English. We push the limits of our previous work -- bidirectional training for translation -- by scaling up two main factors, i.e., language pairs and model sizes, yielding the \textbf{Vega-MT} system. As for language pairs, we scale the "bidirectional" setting up to the "multidirectional" setting, covering all participating languages, to exploit the common knowledge across languages and transfer it to downstream bilingual tasks. As for model sizes, we scale Transformer-Big up to an extremely large model with nearly 4.7 billion parameters, to fully enhance the capacity of Vega-MT. We also adopt data augmentation strategies, e.g., cycle translation for monolingual data and bidirectional self-training for bilingual and monolingual data, to comprehensively exploit both kinds of data. To adapt Vega-MT to the general-domain test set, we design a generalization tuning procedure. Based on the official automatic scores of constrained systems, in terms of sacreBLEU (shown in Figure 1), we achieved 1st place on Zh-En (33.5), En-Zh (49.7), De-En (33.7), En-De (37.8), Cs-En (54.9), En-Cs (41.4), and En-Ru (32.7); 2nd place on Ru-En (45.1) and Ja-En (25.6); and 3rd place on En-Ja (41.5). In terms of COMET, we achieved 1st place on Zh-En (45.1), En-Zh (61.7), De-En (58.0), En-De (63.2), Cs-En (74.7), Ru-En (64.9), En-Ru (69.6), and En-Ja (65.1), and 2nd place on En-Cs (95.3) and Ja-En (40.6). Models will be released to facilitate the MT community through GitHub and the OmniForce Platform.
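To make the "bidirectional to multidirectional" scaling concrete, the sketch below shows one common way to construct such training data: every bilingual sentence pair is emitted in both directions, with a target-language tag prepended to the input so a single model covers all participating directions. This is a minimal illustration under assumed conventions (the tag format and the function \texttt{multidirectional\_examples} are hypothetical), not the exact Vega-MT pipeline, which may differ in tagging and data-sampling details.

\begin{verbatim}
# Minimal sketch of multidirectional training-data construction,
# assuming a target-language-tag setup (illustrative only; the
# actual Vega-MT pipeline may tag and sample data differently).
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str, str, str]  # (src_lang, tgt_lang, src_text, tgt_text)

def multidirectional_examples(corpora: Iterable[Pair]) -> Iterator[Tuple[str, str]]:
    """Yield (input, output) examples covering both directions of every
    language pair, so one model is trained on all directions jointly."""
    for src_lang, tgt_lang, src_text, tgt_text in corpora:
        # Forward direction: prepend the target-language tag to the source.
        yield (f"<2{tgt_lang}> {src_text}", tgt_text)
        # Reverse direction: swap source and target (the "bidirectional" idea,
        # applied across all participating language pairs).
        yield (f"<2{src_lang}> {tgt_text}", src_text)

# Example: one Zh-En sentence pair yields a Zh->En and an En->Zh example.
corpus = [("zh", "en", "你好,世界", "Hello, world")]
for inp, out in multidirectional_examples(corpus):
    print(inp, "=>", out)
\end{verbatim}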