Translating e-commerce product descriptions, a.k.a. product-oriented machine translation (PMT), is essential for serving e-shoppers all over the world. However, due to its domain specificity, the PMT task is more challenging than traditional machine translation problems. Firstly, product descriptions contain many specialized jargon expressions, which are ambiguous to translate without the product image. Secondly, product descriptions are related to the image in more complicated ways than standard image descriptions, involving various visual aspects such as objects, shapes, colors, or even subjective styles. Moreover, existing PMT datasets are too small in scale to support research. In this paper, we first construct a large-scale bilingual product description dataset called Fashion-MMT, which contains over 114k noisy and 40k manually cleaned description translations with multiple product images. To effectively learn semantic alignments among product images and bilingual texts in translation, we design a unified product-oriented cross-modal cross-lingual model (\upoc~) for pre-training and fine-tuning. Experiments on the Fashion-MMT and Multi30k datasets show that our model significantly outperforms state-of-the-art models even when they are pre-trained on the same dataset. It is also shown to benefit more from large-scale noisy data, further improving translation quality. We will release the dataset and codes at https://github.com/syuqings/Fashion-MMT.