The Vision Transformer (ViT) has emerged as a competitive alternative to convolutional neural networks for various computer vision applications. Specifically, ViT multi-head attention layers make it possible to embed information globally across the entire image. Nevertheless, computing and storing the attention matrices incurs a cost that grows quadratically with the number of patches, which limits the achievable efficiency and scalability of ViTs and prohibits more extensive real-world ViT applications on resource-constrained devices. Sparse attention has been shown to be a promising direction for improving hardware acceleration efficiency for NLP models. However, a systematic counterpart approach is still missing for accelerating ViT models. To close this gap, we propose a first-of-its-kind algorithm-hardware co-designed framework, dubbed ViTALiTy, for boosting the inference efficiency of ViTs. Unlike sparsity-based Transformer accelerators for NLP, ViTALiTy unifies both low-rank and sparse components of the attention in ViTs. At the algorithm level, we approximate the dot-product softmax operation via first-order Taylor attention with row-mean centering as the low-rank component, which linearizes the cost of attention blocks, and we further boost the accuracy by incorporating a sparsity-based regularization. At the hardware level, we develop a dedicated accelerator that better leverages the workload and pipeline resulting from ViTALiTy's linear Taylor attention, which requires the execution of only the low-rank component, to further boost the hardware efficiency. Extensive experiments and ablation studies validate that ViTALiTy offers boosted end-to-end efficiency (e.g., $3\times$ faster and $3\times$ more energy-efficient) under comparable accuracy, with respect to the state-of-the-art solution.
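To make the linearization concrete, the following is a minimal sketch of first-order Taylor attention with mean-centered keys. It is not the authors' implementation. It assumes the approximation $\exp(x) \approx 1 + x$ applied to the scaled scores, and it assumes that centering is done by subtracting the mean key, so that every row of the score matrix sums to zero and the softmax denominator reduces to the number of tokens $n$. The function names are hypothetical, and plain Python lists stand in for tensors. The sparse component and the regularization are not modeled.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def center_keys(k):
    """Subtract the mean key from every key (assumed form of the centering).

    After this step each row of Q @ K_centered^T sums to zero.
    """
    n = len(k)
    mean = [sum(col) / n for col in zip(*k)]
    return [[x - m for x, m in zip(row, mean)] for row in k]


def taylor_attention_quadratic(q, k, v):
    """Reference version: builds the n x n Taylor score matrix explicitly."""
    d = len(q[0])
    scores = matmul(q, transpose(center_keys(k)))  # n x n
    # First-order Taylor expansion of exp(s / sqrt(d)).
    weights = [[1.0 + s / math.sqrt(d) for s in row] for row in scores]
    # Row normalization, the counterpart of the softmax denominator.
    weights = [[w / sum(row) for w in row] for row in weights]
    return matmul(weights, v)


def taylor_attention_linear(q, k, v):
    """Same output without the n x n matrix: K^T V is computed first."""
    n, d = len(q), len(q[0])
    kv = matmul(transpose(center_keys(k)), v)  # d x d_v, independent of n^2
    v_sum = [sum(col) for col in zip(*v)]      # column sums of V
    qkv = matmul(q, kv)                        # n x d_v
    scale = 1.0 / math.sqrt(d)
    # Each row sum of the Taylor weights equals n, so the normalizer is n.
    return [[(s + scale * x) / n for s, x in zip(v_sum, row)] for row in qkv]


if __name__ == "__main__":
    q = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
    k = [[0.2, 0.1], [-0.3, 0.4], [0.5, 0.0]]
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(taylor_attention_linear(q, k, v))
```

The two functions agree up to floating-point error. The linear version reorders the product as $Q(\hat{K}^\top V)$, so its cost grows linearly with the number of patches rather than quadratically. Note that $1 + x$ can turn negative for strongly negative scores, so the weights in this sketch are not guaranteed to form a probability distribution.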