Transformer-based rankers have shown state-of-the-art performance. However, their self-attention operation is mostly unable to process long sequences. One of the common approaches to train these rankers is to heuristically select some segments of each document, such as the first segment, as training data. However, these segments may not contain the query-related parts of documents. To address this problem, we propose query-driven segment selection from long documents to build training data. The segment selector provides relevant samples with more accurate labels and non-relevant samples that are harder to predict. The experimental results show that a basic BERT-based ranker trained with the proposed segment selector significantly outperforms one trained on heuristically selected segments, and performs on par with the state-of-the-art model with localized self-attention that can process longer input sequences. Our findings open up a new direction for designing efficient transformer-based rankers.
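To make the idea concrete, below is a minimal sketch of query-driven segment selection. It is not the paper's implementation: the segmentation parameters and the lexical-overlap scorer are illustrative stand-ins for whatever relevance scorer the selector actually uses, and all function names are hypothetical. The key point it shows is that the same selection rule serves both cases: the most query-related segment of a relevant document yields a more accurately labeled positive, while the most query-related segment of a non-relevant document yields a harder negative.

```python
from typing import List


def segment(doc_tokens: List[str], size: int = 128, stride: int = 64) -> List[List[str]]:
    """Split a tokenized document into overlapping fixed-size segments.

    size/stride are illustrative; a short document yields a single segment.
    """
    last_start = max(1, len(doc_tokens) - size + 1)
    return [doc_tokens[i:i + size] for i in range(0, last_start, stride)]


def query_overlap_score(query_tokens: List[str], seg: List[str]) -> float:
    """Fraction of distinct query terms that appear in the segment.

    A simple lexical proxy (assumption) for the segment selector's
    query-segment relevance score.
    """
    seg_set = set(seg)
    query_set = set(query_tokens)
    return sum(t in seg_set for t in query_set) / max(1, len(query_set))


def select_training_segment(query_tokens: List[str], doc_tokens: List[str]) -> List[str]:
    """Pick the single most query-related segment of a document for training.

    For a relevant document this gives a positive sample whose label is more
    likely correct; for a non-relevant document it gives a hard negative.
    """
    segments = segment(doc_tokens)
    return max(segments, key=lambda s: query_overlap_score(query_tokens, s))


# Usage: one (query, segment, label) training instance per document.
query = "transformer rankers long documents".split()
doc = ("introductory boilerplate " * 40 + "segment selection for transformer "
       "rankers handles long documents " + "unrelated appendix " * 40).split()
print(" ".join(select_training_segment(query, doc)[:12]))
```

In contrast, the heuristic baseline in the abstract would always return the first segment, which here contains only boilerplate and no query-related text.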