Local Self-Attention over Long Text for Efficient Document Retrieval

Neural networks, particularly Transformer-based architectures, have achieved significant performance improvements on several retrieval benchmarks. When the items being retrieved are documents, the time and memory cost of employing Transformers over a full sequence of document terms can be prohibitive. A popular strategy is to consider only the first n terms of the document. This can, however, result in a biased system that under-retrieves longer documents. In this work, we propose a local self-attention mechanism that considers a moving window over the document terms and, for each term, attends only to other terms in the same window. This local attention incurs a fraction of the compute and memory cost of attention over the whole document. The windowed approach also leads to more compact packing of padded documents in minibatches, resulting in additional savings. We also employ a learned saturation function and a two-stage pooling strategy to identify relevant regions of the document. With these changes, the Transformer-Kernel pooling model can efficiently elicit relevance information from documents with thousands of tokens. We benchmark our proposed modifications on the document ranking task from the TREC 2019 Deep Learning track and observe significant improvements in retrieval quality, as well as increased retrieval of longer documents, at a moderate increase in compute and memory costs.
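
The abstract describes local self-attention only at a high level. The following is a minimal sketch of windowed attention, assuming non-overlapping windows of a fixed size `window` and omitting the learned query/key/value projections, multiple heads, and the saturation and pooling stages; the "moving window" in the actual model may overlap or stitch windows, which this sketch does not attempt. All function names are hypothetical. It illustrates why the cost drops: full attention scores all n² query-key pairs, whereas windowed attention scores at most n·w.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention.

    Assumption: inputs are used directly as Q, K and V (no learned projections).
    """
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def local_self_attention(x: List[Vector], window: int) -> List[Vector]:
    """Hypothetical windowed attention: split the term sequence into consecutive,
    non-overlapping windows and let each term attend only to terms in its own window."""
    out: List[Vector] = []
    for start in range(0, len(x), window):
        chunk = x[start:start + window]
        out.extend(attend(chunk, chunk, chunk))
    return out


def attention_score_count(n: int, window: int) -> int:
    """Number of query-key scores computed by local_self_attention for n terms.
    Full attention over the same sequence would compute n * n scores."""
    return sum(min(window, n - s) ** 2 for s in range(0, n, window))


if __name__ == "__main__":
    n, w = 2000, 50
    print("full attention scores: ", n * n)
    print("local attention scores:", attention_score_count(n, w))
```

For a 2,000-term document and w = 50, full attention scores 4,000,000 pairs while the windowed version scores 100,000, i.e. n/w = 40 times fewer. The trade-off is that a term can no longer attend directly to terms outside its window.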

翻译：在几个检索基准方面,特别是以变压器为基础的神经网络,取得了显著的绩效改进。当检索的项目是文件时,使用变压器完成整个系列文件术语的时间和记忆成本可能令人望而却步。流行的战略只考虑文件的第一个 n 条件。然而,这可能导致一个有偏颇的系统,在检索较长的文件时,这种系统会造成偏差。在这项工作中,我们提议一个地方自我注意,认为一个移动窗口超过文件条件,每个术语只涉及同一窗口中的其他术语。这种地方注意力占整个文件关注的计算和记忆成本的一小部分。这种视窗办法还导致在微型桶中将添加的文件进行更为紧凑的包装,从而节省更多的费用。我们还采用一个学习的饱和功能和两阶段集战略来确定文件的有关区域。在进行这些修改时,变压器-凯尔的集合模型能够有效地从文件中以数千种物证征到相关的信息。我们提议的对文件的排序工作进行了基准,从TREC 2019 深学习轨道上对文件的排序任务作了调整,并观察在检索质量方面有了显著改进,因为检索成本在适度增加。