Mixed Attention Transformer for Leveraging Word-Level Knowledge to Neural Cross-Lingual Information Retrieval

Pretrained contextualized representations have brought great success to many downstream tasks, including document ranking. The multilingual versions of such pretrained representations make it possible to learn many languages jointly with a single model. Although large gains are expected from such joint training, in cross-lingual information retrieval (CLIR) the models trained under a multilingual setting do not reach the same level of performance as those trained under a monolingual setting. We hypothesize that the performance drop is due to the translation gap between queries and documents. In the monolingual retrieval task, the query and the documents share the same lexical inputs, so it is easier for the model to identify the query terms that occur in documents. In multilingual pretrained models, however, the words of different languages are projected into the same hyperspace, and the model tends to translate query terms into related terms, i.e., terms that appear in a similar context, in addition to or sometimes instead of synonyms in the target language. This property makes it difficult for the model to connect terms that co-occur in both query and document. To address this issue, we propose a novel Mixed Attention Transformer (MAT) that incorporates external word-level knowledge, such as a dictionary or translation table. We design a sandwich-like architecture to embed MAT into recent transformer-based deep neural models. By encoding the translation knowledge into an attention matrix, the model with MAT is able to focus on the mutually translated words in the input sequence. Experimental results demonstrate the effectiveness of the external knowledge and a significant improvement of the MAT-embedded neural reranking model on the CLIR task.
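To make the idea of encoding translation knowledge into an attention matrix more concrete, the following is a minimal sketch in plain Python. It is not the formulation used in the paper, which the abstract does not specify. The translation table, the function names `translation_attention` and `apply_attention`, the symmetric treatment of translation pairs, and the fallback to self-attention for tokens without a known translation are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical translation table: source word -> {target word: probability}.
# In practice this could come from a dictionary or a learned translation table.
TRANSLATION_TABLE: Dict[str, Dict[str, float]] = {
    "house": {"maison": 0.9, "domicile": 0.1},
    "red": {"rouge": 1.0},
}


def translation_attention(
    tokens: List[str], table: Dict[str, Dict[str, float]]
) -> List[List[float]]:
    """Build a row-normalized attention matrix over a query+document sequence.

    Entry [i][j] is non-zero when tokens i and j are listed as translations
    of each other in `table` (assumption: the relation is treated as
    symmetric). A token with no translation in the sequence attends only
    to itself.
    """
    n = len(tokens)
    weights = [[0.0] * n for _ in range(n)]
    for i, src in enumerate(tokens):
        for j, tgt in enumerate(tokens):
            p = table.get(src, {}).get(tgt, 0.0)
            if p > 0.0:
                weights[i][j] = max(weights[i][j], p)
                weights[j][i] = max(weights[j][i], p)
    for i in range(n):
        total = sum(weights[i])
        if total == 0.0:
            weights[i][i] = 1.0  # fallback: plain self-attention
        else:
            weights[i] = [w / total for w in weights[i]]
    return weights


def apply_attention(
    weights: List[List[float]], values: List[List[float]]
) -> List[List[float]]:
    """Mix value vectors with the attention weights (a matrix product)."""
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]


# English query followed by a French document, joined by a separator token.
tokens = ["red", "house", "[SEP]", "la", "maison", "rouge"]
attention = translation_attention(tokens, TRANSLATION_TABLE)
```

In this sketch, the row for `red` places all of its weight on `rouge`, and the row for `house` places all of its weight on `maison`, so mixing value vectors with `apply_attention` moves each query term toward its translation in the document. A real model would presumably combine such a matrix with learned attention rather than use it alone.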
