Query translation plays a crucial role in cross-language information retrieval (CLIR) and faces three main challenges: 1) translation adequacy; 2) the lack of in-domain parallel training data; and 3) the requirement of low latency. Because of these constraints, existing CLIR systems mainly rely on statistical machine translation (SMT) rather than the more advanced neural machine translation (NMT), which limits further improvements in both translation and retrieval quality. In this paper, we investigate how to integrate a neural query translation model into a CLIR system. Specifically, we propose a novel data augmentation method that extracts query translation pairs from user clickthrough data, thereby alleviating the domain-adaptation problem in NMT. We then introduce an asynchronous strategy that combines the real-time responsiveness of SMT with the accuracy of NMT. Experimental results show that the proposed approach yields better retrieval quality than strong baselines and can be readily deployed in a real-world CLIR system, namely the Aliexpress e-commerce search engine. Readers can examine and test their own cases on our website: https://aliexpress.com .
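One way to read the asynchronous strategy is sketched below. This is an illustration under stated assumptions, not the system described in the paper: the abstract does not specify the mechanism, so the cache-and-refresh design, the class name `AsyncQueryTranslator`, and the stub translators `smt_stub` and `nmt_stub` are all hypothetical. The idea shown is that a query is answered immediately by the fast translator, while the slower translator runs in the background and its output is served for later occurrences of the same query.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class AsyncQueryTranslator:
    """Hypothetical sketch: answer with a fast translator now, and cache
    the output of a slower, more accurate translator for later requests."""

    def __init__(self, fast: Callable[[str], str],
                 slow: Callable[[str], str], workers: int = 2) -> None:
        self._fast = fast                      # stands in for SMT (low latency)
        self._slow = slow                      # stands in for NMT (higher quality)
        self._cache: Dict[str, str] = {}       # query -> slow-model translation
        self._pending: Dict[str, Future] = {}  # queries being refreshed
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def translate(self, query: str) -> str:
        with self._lock:
            cached = self._cache.get(query)
            if cached is not None:
                return cached                  # slow-model result is ready
            if query not in self._pending:     # schedule at most one refresh
                self._pending[query] = self._pool.submit(self._refresh, query)
        return self._fast(query)               # never block on the slow model

    def _refresh(self, query: str) -> None:
        better = self._slow(query)             # runs off the request path
        with self._lock:
            self._cache[query] = better
            self._pending.pop(query, None)

    def wait_idle(self) -> None:
        """Block until all scheduled refreshes finish (useful in tests)."""
        with self._lock:
            pending = list(self._pending.values())
        for future in pending:
            future.result()


# Stub translators, assumptions for illustration only.
def smt_stub(query: str) -> str:
    return "smt:" + query


def nmt_stub(query: str) -> str:
    return "nmt:" + query


translator = AsyncQueryTranslator(smt_stub, nmt_stub)
first = translator.translate("red dress")    # served by the fast stub
translator.wait_idle()
second = translator.translate("red dress")   # served from the cache
```

In this sketch the request path never waits for the slow model, so latency is bounded by the fast translator, and repeated queries gradually move to the higher-quality translation. A production system would also need cache eviction and error handling, which are omitted here.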