This study investigates small-scale pretraining for Small Language Models (SLMs) to enable efficient use of limited data and compute, improve accessibility in low-resource settings, and reduce costs. To enhance long-context extrapolation in compact models, we focus on Infini-attention, which builds a compressive memory from past segments while preserving local attention. We conduct an empirical study using 300M-parameter LLaMA models pretrained with Infini-attention. The model trains stably and outperforms the baseline on long-context retrieval. We identify the balance factor as a key determinant of model performance, and we find that retrieval accuracy drops as the memory is compressed repeatedly over long sequences. Even so, Infini-attention still effectively compensates for the SLM's limited parameter count: despite performance degradation at a context length of 16,384 tokens, the Infini-attention model achieves up to 31% higher accuracy than the baseline. Our findings suggest that robust long-context capability in SLMs benefits from architectural memory mechanisms such as Infini-attention.
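To make the mechanism concrete, the following is a minimal PyTorch sketch of a single Infini-attention segment step under simplifying assumptions (one head, simplified linear-attention read/write); the function and variable names (`infini_attention_segment`, `elu_plus_one`, `beta`) are illustrative, not the authors' implementation. The balance factor discussed above corresponds to the learned gating scalar `beta` that blends memory retrieval with local attention.

```python
# Minimal, hypothetical sketch of Infini-attention's segment-level memory and gating.
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Nonlinearity used for the linear-attention-style memory read/write.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, memory, norm, beta):
    """One segment step (single head, no batch dimension).
    q, k, v : (seg_len, d) projections for the current segment
    memory  : (d, d)   compressive memory accumulated from past segments
    norm    : (d,)     normalization term for the memory
    beta    : ()       learned balance factor (gating scalar)
    """
    d = q.size(-1)

    # 1) Local causal attention within the current segment.
    scores = (q @ k.t()) / d**0.5
    mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    a_local = scores.softmax(dim=-1) @ v

    # 2) Retrieve from the compressive memory (linear-attention read).
    sigma_q = elu_plus_one(q)
    a_mem = (sigma_q @ memory) / (sigma_q @ norm).clamp_min(1e-6).unsqueeze(-1)

    # 3) Blend memory retrieval and local attention via the balance factor.
    gate = torch.sigmoid(beta)
    out = gate * a_mem + (1.0 - gate) * a_local

    # 4) Write the current segment into the memory and pass it to the next segment.
    sigma_k = elu_plus_one(k)
    memory = memory + sigma_k.t() @ v
    norm = norm + sigma_k.sum(dim=0)
    return out, memory, norm
```

Repeated application of step 4 over many segments is the repeated compression referred to above: each write folds another segment into the fixed-size `memory`, which is the plausible source of the retrieval degradation we observe at long context lengths.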