Multimodal Machine Translation (MMT) enriches the source text with visual information for translation. It has gained popularity in recent years, and several pipelines have been proposed for it. Yet, the task lacks high-quality datasets that illustrate the contribution of the visual modality to translation systems. In this paper, we describe our system, submitted under the team name Volta, for the Multimodal Translation Task of WAT 2021 from English to Hindi. We also participate in the text-only subtask of the same language pair, for which we use mBART, a pretrained multilingual sequence-to-sequence model. For multimodal translation, we propose to enhance the textual input by bringing the visual information into the textual domain through object tags extracted from the image. We also explore the robustness of our system by systematically degrading the source text. Finally, we achieve BLEU scores of 44.6 and 51.6 on the test set and challenge set, respectively, of the multimodal task.
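The enrichment step described above, which moves visual information into the textual domain via object tags, can be sketched as follows. This is a minimal illustration only: the separator token, tag ordering, and formatting here are assumptions for demonstration, not the exact input scheme used by the system.

```python
def enrich_with_object_tags(source_text, object_tags, sep=" ## "):
    """Append object tags detected in the image to the source sentence,
    so that a text-only MT model can consume the visual context.

    `sep` is a hypothetical separator token; the actual system may use
    a different input format.
    """
    # De-duplicate tags while preserving detection order.
    unique_tags = " ".join(dict.fromkeys(object_tags))
    return source_text + sep + unique_tags


enriched = enrich_with_object_tags("A man rides a horse.",
                                   ["man", "horse", "horse", "field"])
print(enriched)  # → A man rides a horse. ## man horse field
```

The enriched string can then be fed to a standard sequence-to-sequence model such as mBART in place of the original source sentence.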