Pre-trained transformers have achieved success in many NLP tasks. One thread of work focuses on training bi-encoder models (i.e., dense retrievers) to effectively encode sentences or passages into single dense vectors for efficient approximate nearest neighbor (ANN) search. However, recent work has demonstrated that transformers pre-trained with masked language modeling (MLM) cannot effectively aggregate text information into a single dense vector due to the task mismatch between pre-training and fine-tuning. Therefore, computationally expensive techniques have been adopted to train dense retrievers, such as large batch sizes, knowledge distillation, or post pre-training. In this work, we present a simple approach to effectively aggregate the textual representations of a pre-trained transformer into a dense vector. Extensive experiments show that our approach improves the robustness of the single-vector approach under both in-domain and zero-shot evaluations without any computationally expensive training techniques. Our work demonstrates that MLM pre-trained transformers can be used to effectively encode text information into a single vector for dense retrieval. Code is available at: https://github.com/castorini/dhr
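To make the bi-encoder setup concrete, the sketch below encodes queries and passages into single dense vectors with an off-the-shelf MLM pre-trained transformer and scores them by dot product. It uses plain [CLS] pooling purely for illustration, not the aggregation approach proposed in this work; the backbone model and pooling choice are assumptions.

```python
# Minimal bi-encoder (dense retriever) sketch: encode texts into single dense
# vectors and score query-passage pairs by dot product, as would be done
# before building an ANN index. [CLS] pooling and bert-base-uncased are
# illustrative assumptions, not the method proposed in the paper.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

@torch.no_grad()
def encode(texts):
    # Tokenize a batch of texts and take the [CLS] token's final hidden state
    # as a single-vector representation of each text.
    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=128, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state  # (batch, seq_len, dim)
    return hidden[:, 0]                           # (batch, dim) [CLS] vectors

queries = ["what is dense retrieval?"]
passages = ["Dense retrieval encodes queries and passages into vectors "
            "and searches with approximate nearest neighbors.",
            "The weather today is sunny with a light breeze."]

q_vecs = encode(queries)
p_vecs = encode(passages)

# Relevance scores: dot product between query and passage vectors.
scores = q_vecs @ p_vecs.T
print(scores)  # the on-topic passage should receive the higher score
```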