Recent work has remarkably improved language models (LMs) by equipping them with a non-parametric memory component. However, most existing approaches only introduce memories at testing time, or represent them using a separately trained encoder, resulting in suboptimal training of the language model. In this work, we present TRIME, a novel yet simple training approach designed for training LMs with memory augmentation. Our approach uses a training objective that directly takes in-batch examples as accessible memory. We also present new methods for memory construction and data batching, which are used to adapt to different sets of memories (local, long-term, and external memory) at testing time. We evaluate TRIME on multiple language modeling and machine translation benchmarks and show that it achieves significant improvements across all the settings. Concretely, TRIME reduces perplexity on WIKITEXT-103 from 18.70 to 15.37 by effectively leveraging a large memory set from the training corpus. Compared to standard LM training, TRIME adds negligible computational overhead and is compatible with different neural architectures, making it a versatile solution for training memory-augmented LMs.
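To make the training objective concrete, below is a minimal NumPy sketch of a loss in the spirit described above: the probability of the target token pools the usual output-embedding logit with logits against in-batch memory representations whose target word matches. The function name, toy dimensions, and exact pooling scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def in_batch_memory_loss(ctx, target_id, out_emb, mem_keys, mem_ids):
    """Negative log-likelihood where in-batch memories act as extra
    positive logits for the target word.

    ctx      : (d,)   context vector for the current position
    target_id: int    index of the gold next token
    out_emb  : (V, d) output word embeddings
    mem_keys : (M, d) context vectors of in-batch memory slots
    mem_ids  : (M,)   target word id stored in each memory slot
    """
    vocab_logits = out_emb @ ctx            # (V,) standard softmax logits
    mem_logits = mem_keys @ ctx             # (M,) similarity to memories
    # Numerator pools the target embedding logit with matching memories.
    pos = np.concatenate(
        [[vocab_logits[target_id]], mem_logits[mem_ids == target_id]]
    )
    # Denominator normalizes over vocabulary and all memory slots.
    all_logits = np.concatenate([vocab_logits, mem_logits])
    m = all_logits.max()                    # stabilize the log-sum-exp
    log_z = m + np.log(np.exp(all_logits - m).sum())
    log_num = m + np.log(np.exp(pos - m).sum())
    return -(log_num - log_z)
```

With no memory slots, this reduces to the standard cross-entropy loss; adding an in-batch memory whose stored target matches the gold token strictly lowers the loss, which is what lets memory retrieval be learned during training rather than bolted on at test time.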