Large language models (LLMs) have achieved state-of-the-art performance in natural language processing; however, their high computational cost remains a major bottleneck. In this study, we target computational efficiency by focusing on a matrix-multiplication-free language model (MatMul-free LM) and further reducing its training cost through an architecture inspired by reservoir computing. Specifically, we partially fix and share the weights of selected layers in the MatMul-free LM and insert reservoir layers to obtain rich dynamic representations without additional training overhead. In addition, several operations are combined to reduce memory accesses. Experimental results show that the proposed architecture reduces the number of parameters by up to 19%, training time by 9.9%, and inference time by 8.0%, while maintaining performance comparable to the baseline model.
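As a rough illustration of the core idea, and not the paper's implementation, the following PyTorch sketch freezes and shares one block's weights across depth and interleaves fixed random "reservoir" projections that contribute representations without adding trainable parameters. All module names, dimensions, and the alternation pattern are hypothetical assumptions for this sketch, not details taken from the MatMul-free LM.

```python
# Minimal sketch (assumed structure, not the authors' code): share a frozen block
# across layers and insert untrained reservoir projections between blocks.
import torch
import torch.nn as nn


class ReservoirLayer(nn.Module):
    """Fixed random projection with a nonlinearity; its weights are never trained."""
    def __init__(self, dim: int):
        super().__init__()
        w = torch.randn(dim, dim) / dim ** 0.5
        self.register_buffer("weight", w)  # buffer => excluded from gradient updates

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(x @ self.weight)


class TinyBlock(nn.Module):
    """Stand-in for one decoder block of the base language model (hypothetical)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))


dim, depth = 64, 4
shared_block = TinyBlock(dim)            # one block reused at several depths (shared weights)
for p in shared_block.parameters():      # partially fix: freeze the shared block's weights
    p.requires_grad = False

layers = nn.ModuleList()
for i in range(depth):
    # Alternate between a freshly trainable block and the frozen, shared block.
    layers.append(shared_block if i % 2 else TinyBlock(dim))
    layers.append(ReservoirLayer(dim))   # rich dynamics at zero training cost

x = torch.randn(2, 16, dim)              # (batch, sequence, hidden)
for layer in layers:
    x = layer(x)

trainable = sum(p.numel() for p in layers.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```

Because the reservoir weights live in buffers and the shared block is frozen, only the remaining blocks contribute trainable parameters, which is the mechanism behind the reported parameter and training-time reductions.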