In this thesis, we introduce Greenformers, a collection of efficiency methods for the recently renowned transformer model based on a low-rank approximation approach. The development trend of deep learning tends toward increasingly complex and larger models. Although this leads to better and more accurate predictions, the resulting models become ever more costly, requiring weeks of training with a huge amount of GPU resources. In particular, the size and computational cost of transformer-based models have grown tremendously since their debut in 2017, from ~100 million parameters to ~1.6 trillion parameters in early 2021. Such computationally hungry models also incur a substantial cost to the environment, with carbon footprints reaching an alarming level. Some of these models are so massive that it is impossible to run them without a GPU cluster. Greenformers improves the efficiency of transformer models by applying low-rank approximation approaches. Specifically, we propose a low-rank factorization approach, called the Low-Rank Transformer (LRT), to improve the efficiency of the transformer model. We further compare our model with an existing low-rank factorization approach called Linformer. Based on our analysis, the Low-Rank Transformer model is suitable for improving both the time and memory efficiency of processing short-sequence (<= 512) input data, while the Linformer model is suitable for improving the efficiency of processing long-sequence (>= 512) input data. We also show that the Low-Rank Transformer is more suitable for on-device deployment, as it significantly reduces the model size. Additionally, we estimate that applying LRT to the existing BERT-base model can reduce the computational, economic, and environmental costs of developing such a model by more than 30%.
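To illustrate the core idea behind low-rank factorization, the sketch below shows a dense projection layer replaced by two smaller projections. This is a minimal, hypothetical example, not the thesis's actual Low-Rank Transformer implementation: the class name LowRankLinear, the submodule names, and the rank value are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """A dense layer factorized into two low-rank projections.

    Illustrative sketch: replaces a d_in x d_out weight matrix W with
    the product U @ V, where U is d_in x r and V is r x d_out. For
    r << min(d_in, d_out), parameters and multiply-adds drop from
    d_in * d_out to r * (d_in + d_out).
    """
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # x -> xU
        self.up = nn.Linear(rank, d_out)               # xU -> xUV + b

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))

# Example: factorizing a 768x768 projection (BERT-base hidden width)
# at an assumed rank of 64 cuts its parameters from ~590K to ~99K,
# roughly a 6x reduction for that layer.
layer = LowRankLinear(768, 768, rank=64)
x = torch.randn(2, 512, 768)  # (batch, sequence length, hidden size)
print(layer(x).shape)         # torch.Size([2, 512, 768])
```

Because both projections are ordinary linear layers, the factorized module is a drop-in replacement for the original one and trains end-to-end with standard optimizers; the rank hyperparameter trades approximation quality against compute and memory savings.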