Bilingual terminologies are important machine translation resources in the field of e-commerce and are usually either manually translated or automatically extracted from parallel data. Human translation is costly, and e-commerce parallel corpora are very scarce. However, comparable data in different languages within the same commodity domain is abundant. In this paper, we propose a novel framework for extracting e-commerce bilingual terminologies from comparable data. Benefiting from cross-lingual pre-training in the e-commerce domain, our framework can make full use of the deep semantic relationship between a source-side terminology and a target-side sentence to extract the corresponding target terminology. Experimental results on various language pairs show that our approaches achieve significantly better performance than various strong baselines.