Transformer Memory as a Differentiable Search Index

Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

From arXiv, NeurIPS 2022

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model. To this end, we introduce the Differentiable Search Index (DSI), a new paradigm that learns a text-to-text model that maps string queries directly to relevant docids; in other words, a DSI model answers queries directly using only its parameters, dramatically simplifying the whole retrieval process. We study variations in how documents and their identifiers are represented, variations in training procedures, and the interplay between models and corpus sizes. Experiments demonstrate that given appropriate design choices, DSI significantly outperforms strong baselines such as dual encoder models. Moreover, DSI demonstrates strong generalization capabilities, outperforming a BM25 baseline in a zero-shot setup.
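As a rough illustration of the text-to-text framing described in the abstract, the sketch below builds the two kinds of string pairs such a model could be trained on: document text mapped to its docid (indexing) and query text mapped to a relevant docid (retrieval). The `index:` and `query:` prefixes, the `max_doc_tokens` truncation, the `Seq2SeqExample` type, and the toy corpus are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seq2SeqExample:
    source: str  # input string fed to the text-to-text model
    target: str  # output string the model learns to generate (a docid)


def build_examples(corpus, labeled_queries, max_doc_tokens=32):
    """Build document->docid and query->docid training pairs.

    The task prefixes and the whitespace truncation are hypothetical
    choices for this sketch, not details taken from the paper.
    """
    examples = []
    # Indexing pairs: teach the model which docid belongs to which text.
    for docid, text in corpus.items():
        truncated = " ".join(text.split()[:max_doc_tokens])
        examples.append(Seq2SeqExample(f"index: {truncated}", docid))
    # Retrieval pairs: teach the model to answer a query with a docid.
    for query, docid in labeled_queries:
        examples.append(Seq2SeqExample(f"query: {query}", docid))
    return examples


# Toy data, invented for this example.
corpus = {
    "0": "Transformers encode a corpus in their parameters",
    "1": "BM25 is a sparse lexical retrieval baseline",
}
labeled_queries = [("what is bm25", "1")]

examples = build_examples(corpus, labeled_queries)
for ex in examples:
    print(f"{ex.source!r} -> {ex.target!r}")
```

Both kinds of pairs share one output space (docid strings), so a single sequence-to-sequence model could in principle be trained on their union and then queried by generating a docid for a new query string.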

