Transformers deliver promising accuracy and have become widely used in domains such as natural language processing and computer vision. However, their massive number of model parameters, and the resulting memory and computation requirements, make them unsuitable for resource-constrained, low-power devices. Even on high-performance, specialized devices, memory bandwidth can become a performance-limiting bottleneck. In this paper, we present a performance analysis of state-of-the-art vision transformers on several devices. We propose reducing the overall memory footprint and memory transfers by clustering the model parameters. We show that using only 64 clusters to represent the model parameters reduces the data transferred from main memory by more than 4x and achieves up to a 22% speedup and 39% energy savings on mobile devices, with less than 0.1% accuracy loss.
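One common way to realize this kind of parameter clustering is k-means weight sharing: each weight is replaced by the index of its nearest centroid in a small codebook, so only the compact indices and the codebook need to be fetched from memory. The sketch below illustrates the idea under the assumption that plain k-means is used; the function names (`cluster_weights`, `reconstruct`) and the use of scikit-learn are illustrative and not necessarily the paper's implementation.

```python
# Minimal sketch of weight clustering, assuming plain k-means.
import numpy as np
from sklearn.cluster import KMeans

def cluster_weights(weights: np.ndarray, n_clusters: int = 64):
    """Cluster a weight tensor into n_clusters shared values.

    Returns a small codebook of centroids and a per-weight index map.
    With 64 clusters each index fits in 6 bits (stored in uint8 here
    for simplicity), versus 32 bits per FP32 weight.
    """
    flat = weights.reshape(-1, 1).astype(np.float32)
    km = KMeans(n_clusters=n_clusters, n_init=1, random_state=0).fit(flat)
    codebook = km.cluster_centers_.ravel()   # 64 shared float values
    indices = km.labels_.astype(np.uint8)    # one small index per weight
    return codebook, indices.reshape(weights.shape)

def reconstruct(codebook: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Rebuild the approximate weights via codebook lookup."""
    return codebook[indices]

# Example: cluster a random layer and check the approximation error.
w = np.random.randn(512, 512).astype(np.float32)
codebook, idx = cluster_weights(w)
w_hat = reconstruct(codebook, idx)
print("mean abs error:", np.abs(w - w_hat).mean())
```

With 64 clusters, each index needs only 6 bits instead of 32 bits per FP32 weight; after accounting for index packing and codebook overhead, this is consistent with the more-than-4x reduction in main-memory data transfer reported above.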