This technical report presents the application of a recurrent memory mechanism to extend the context length of BERT, one of the most effective Transformer-based models in natural language processing. By leveraging the Recurrent Memory Transformer architecture, we have successfully increased the model's effective context length to an unprecedented two million tokens while maintaining high memory retrieval accuracy. Our method allows the model to store and process both local and global information and, through recurrence, enables information to flow between segments of the input sequence. Our experiments demonstrate the effectiveness of the approach, which holds significant potential for improving long-term dependency handling in natural language understanding and generation tasks and for enabling large-scale context processing in memory-intensive applications.
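To make the segment-level recurrence concrete, the snippet below gives a minimal, simplified sketch of the idea: a long input is split into segments, a small set of learned memory embeddings is prepended to each segment, the segment and memory are processed jointly by a Transformer encoder, and the updated memory states are carried over to the next segment. This is an illustration only, not the report's implementation; the class name `RecurrentMemoryEncoder`, the layer sizes, the number of memory tokens, and the prepend-only memory placement are assumptions made for clarity.

```python
import torch
import torch.nn as nn


class RecurrentMemoryEncoder(nn.Module):
    """Hypothetical sketch of segment-level recurrence with memory tokens."""

    def __init__(self, d_model=768, n_heads=12, n_layers=2, num_mem_tokens=10):
        super().__init__()
        # Learned memory token embeddings (assumed initialization scheme).
        self.memory = nn.Parameter(torch.randn(num_mem_tokens, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.num_mem_tokens = num_mem_tokens

    def forward(self, segments):
        # segments: list of already-embedded tensors of shape [batch, seg_len, d_model]
        batch = segments[0].size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)  # initial memory state
        outputs = []
        for seg in segments:
            # Prepend the current memory to the segment and encode them jointly.
            hidden = self.encoder(torch.cat([mem, seg], dim=1))
            # The updated memory states are passed on to the next segment (recurrence).
            mem = hidden[:, : self.num_mem_tokens, :]
            outputs.append(hidden[:, self.num_mem_tokens :, :])
        return torch.cat(outputs, dim=1), mem


# Toy usage: an embedded input of 4 * 512 tokens processed as four 512-token segments.
x = torch.randn(2, 4 * 512, 768)
segments = list(torch.split(x, 512, dim=1))
model = RecurrentMemoryEncoder()
out, final_mem = model(segments)
```

In this sketch the memory tokens are the only channel through which information crosses segment boundaries, which is what allows the effective context to grow with the number of segments while each forward pass attends only over one segment plus the memory.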