Pre-trained Transformer language models (LMs) have become the go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient text comparison and retrieval. However, dense encoders require large amounts of data and sophisticated techniques to train effectively, and they suffer in low-data situations. This paper finds that a key reason is that standard LMs' internal attention structure is not ready-to-use for dense encoders, which need to aggregate text information into the dense representation. We propose to pre-train towards the dense encoder with a novel Transformer architecture, Condenser, where LM prediction CONditions on DENSE Representation. Our experiments show that Condenser improves over standard LMs by large margins on various text retrieval and similarity tasks.
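To make the architectural idea concrete, below is a minimal sketch of a model in which masked-token prediction is conditioned on the dense [CLS] vector, assuming a PyTorch-style implementation. The layer counts, hidden size, class name `CondenserSketch`, and the use of `nn.TransformerEncoderLayer` are illustrative assumptions rather than the paper's exact configuration; the sketch only shows the core mechanism of routing sequence-level information through the dense representation.

```python
import torch
import torch.nn as nn


class CondenserSketch(nn.Module):
    """Illustrative sketch: an MLM head that conditions on the dense CLS vector."""

    def __init__(self, vocab_size=30522, d_model=768, n_heads=12,
                 n_early=6, n_late=6, n_head_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        def layer():
            return nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)

        self.early = nn.ModuleList([layer() for _ in range(n_early)])
        self.late = nn.ModuleList([layer() for _ in range(n_late)])
        # Head: a short Transformer stack that predicts masked tokens from
        # early-layer token states plus the late-layer CLS vector.
        self.head = nn.ModuleList([layer() for _ in range(n_head_layers)])
        self.mlm = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        h = self.embed(input_ids)            # [B, L, d]
        for blk in self.early:
            h = blk(h)
        h_early = h                          # early token states, kept aside
        for blk in self.late:
            h = blk(h)
        cls_dense = h[:, :1]                 # [B, 1, d] dense representation
        # Head input: late CLS + early token states, so the only path to
        # sequence-level information is through the dense vector.
        head_in = torch.cat([cls_dense, h_early[:, 1:]], dim=1)
        for blk in self.head:
            head_in = blk(head_in)
        return self.mlm(head_in)             # MLM logits conditioned on CLS


# Usage (hypothetical): logits = CondenserSketch()(torch.randint(0, 30522, (2, 128)))
```

Because the head never sees the late-layer token states, the backbone is pushed to pack the information needed for masked-token prediction into the dense [CLS] vector, which is the behavior a dense encoder relies on at fine-tuning time.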