In this paper we improve the zero-shot generalization ability of language models via Mixture-Of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the option to "plug in" new memory at inference time. We develop a joint learning mechanism that trains the augmentation component with latent labels derived from the end retrieval task, paired with hard negatives from the memory mixture. We instantiate the model in a zero-shot dense retrieval setting by augmenting a strong T5-based retriever with MoMA. The resulting model obtains strong zero-shot retrieval accuracy on the eighteen tasks included in the standard BEIR benchmark, and it outperforms systems that seek generalization from increased model parameters and computation steps. Our analysis further illustrates the necessity of augmenting with mixture-of-memory for robust generalization, the benefits of augmentation learning, and how MoMA utilizes the plug-in memory at inference time without changing its parameters. We plan to open-source our code.
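To make the plug-in idea concrete, the following is a minimal sketch of retrieving augmentation documents from a mixture of memories and adding a new memory at inference time. It is an illustration under stated assumptions, not the authors' implementation: the class `MemoryMixture`, its methods `plug_in` and `retrieve`, the corpus names, and the hand-written two-dimensional vectors are all hypothetical. In MoMA the embeddings would come from the T5-based retriever, and the augmentation component would be trained jointly; here a plain dot product over fixed vectors stands in for both.

```python
from dataclasses import dataclass, field

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity, standing in for a learned dense-retrieval score."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class MemoryMixture:
    """Hypothetical container for several external memories (corpora)."""

    # Maps a memory name to its (document text, embedding) pairs.
    memories: dict[str, list[tuple[str, Vector]]] = field(default_factory=dict)

    def plug_in(self, name: str, docs: list[tuple[str, Vector]]) -> None:
        """Add a memory. Only the index changes; no model parameter is touched."""
        self.memories[name] = list(docs)

    def retrieve(self, query_vec: Vector, k: int) -> list[tuple[float, str, str]]:
        """Return the top-k (score, memory name, text) triples over all memories."""
        scored = [
            (dot(query_vec, vec), name, text)
            for name, docs in self.memories.items()
            for text, vec in docs
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]


# Memories assumed to be available during training (toy data).
mixture = MemoryMixture()
mixture.plug_in(
    "source_corpus",
    [("doc about cats", [1.0, 0.0]), ("doc about dogs", [0.8, 0.2])],
)
mixture.plug_in("wiki", [("doc about physics", [0.0, 1.0])])

query = [0.1, 0.9]
before = mixture.retrieve(query, k=2)

# At inference time, plug in a new target-domain memory without retraining.
mixture.plug_in("target_corpus", [("doc about quantum optics", [0.05, 1.0])])
after = mixture.retrieve(query, k=2)

print("before plug-in:", [(name, text) for _, name, text in before])
print("after plug-in: ", [(name, text) for _, name, text in after])
```

In this toy setup the same query and the same scoring function return a different augmentation set once the target memory is plugged in, which is the behavior the abstract describes as using plug-in memory without changing parameters.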