Recent work has improved language models remarkably by equipping them with a non-parametric memory component. However, most existing approaches only introduce memories at test time, or represent them using a separately trained encoder, resulting in sub-optimal training of the language model. In this work, we present TRIME, a novel yet simple training approach for language models with memory augmentation. Our approach uses a training objective that directly takes in-batch examples as accessible memory. We also present new methods for memory construction and data batching, which are used to adapt to different sets of memories (local, long-term, and external memory) at test time. We evaluate our approach on multiple language modeling and machine translation benchmarks. We find that simply replacing the vanilla language modeling objective with ours greatly reduces perplexity, without modifying the model architecture or incorporating extra context (e.g., 18.70 $\to$ 17.76 on WikiText-103). We further augment language models with long-range contexts and external knowledge and demonstrate significant gains over previous memory-augmented approaches.
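To make the idea of "in-batch examples as accessible memory" concrete, the following is a minimal sketch, not the authors' implementation, of a loss that aggregates probability mass over both the output token embedding and in-batch hidden states whose target token matches. Names such as `hidden`, `targets`, `token_emb`, and `tau` are illustrative placeholders assumed for this sketch, and the exact memory construction and batching strategies described in the abstract are omitted.

```python
# Hedged sketch of an in-batch-memory training objective (PyTorch).
import torch
import torch.nn.functional as F

def in_batch_memory_loss(hidden, targets, token_emb, tau=1.0):
    """
    hidden:    (N, d) contextual representations for N token positions in a batch
    targets:   (N,)   next-token ids for those positions
    token_emb: (V, d) output token embedding matrix
    """
    # Standard output-embedding logits, as in a vanilla LM softmax.
    vocab_logits = hidden @ token_emb.t()                       # (N, V)

    # Similarities to other in-batch positions serve as memory logits.
    mem_logits = (hidden @ hidden.t()) / tau                    # (N, N)
    mem_logits.fill_diagonal_(float("-inf"))                    # exclude self

    # A memory "hit" is any other position whose target token matches ours.
    hit = targets.unsqueeze(0) == targets.unsqueeze(1)          # (N, N)
    hit.fill_diagonal_(False)

    # Positive mass: the gold-token vocab logit plus logits of matching memories.
    gold = vocab_logits.gather(1, targets.unsqueeze(1))         # (N, 1)
    pos = torch.cat([gold, mem_logits.masked_fill(~hit, float("-inf"))], dim=1)

    # Normalizer runs over the full vocabulary plus all in-batch memories.
    all_logits = torch.cat([vocab_logits, mem_logits], dim=1)   # (N, V + N)

    loss = -(torch.logsumexp(pos, dim=1) - torch.logsumexp(all_logits, dim=1))
    return loss.mean()
```

When a position has no matching in-batch memory, the positive term reduces to the gold vocab logit, so the loss degrades gracefully toward the vanilla language modeling objective.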