Many state-of-the-art deep learning models for computer vision tasks are based on the transformer architecture. Such models can be computationally expensive and are typically configured statically to match the deployment scenario. However, in real-time applications, the resources available for each inference can vary considerably and fall below what state-of-the-art models require. Dynamic models can adapt execution to meet the resource constraints of real-time applications. While prior dynamic-inference work primarily minimized resource utilization for less complex input images, we adapt vision transformers to meet dynamic system resource constraints, independent of the input image. We find that, unlike early transformer models, recent state-of-the-art vision transformers rely heavily on convolution layers. We show that pretrained models are fairly resilient to skipping computation in the convolution and self-attention layers, enabling us to create a low-overhead system for dynamic real-time inference without extra training. Finally, we explore compute organization and memory sizes to find settings that execute dynamic vision transformers efficiently. We find that wider vector sizes yield a better energy-accuracy tradeoff across dynamic configurations despite limiting the granularity of dynamic execution, whereas scaling accelerator resources for larger models does not significantly improve the latency-area-energy tradeoff. With our dynamic inference approach, our accelerator saves 20% of execution time and 30% of energy with a 4% accuracy drop on the pretrained SegFormer B2 model, and saves 57% of execution time with a 4.5% accuracy drop for the ResNet-50 backbone with the Once-For-All approach.
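Below is a minimal PyTorch sketch of the kind of layer skipping the abstract refers to: bypassing sublayers of a pretrained transformer block at inference time, relying on the residual connections so the skipped computation degrades accuracy gracefully instead of breaking the model. The block structure, the skip_attn/skip_mlp flags, and the budget-to-skip mapping are illustrative assumptions, not the paper's SegFormer-based implementation (which also involves convolutional components).

# Illustrative sketch only: per-inference sublayer skipping in a pretrained
# transformer block. Residual connections make a skipped sublayer act as an
# identity, so no retraining is assumed for moderate skip rates.
import torch
import torch.nn as nn

class SkippableBlock(nn.Module):
    """Transformer encoder block whose attention and feed-forward sublayers
    can be bypassed at run time to meet a compute budget (hypothetical)."""

    def __init__(self, dim: int, num_heads: int = 4, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x, skip_attn: bool = False, skip_mlp: bool = False):
        if not skip_attn:
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
        if not skip_mlp:
            x = x + self.mlp(self.norm2(x))
        return x

# Example: derive a skip pattern from a resource budget supplied at run time.
blocks = nn.ModuleList(SkippableBlock(dim=64) for _ in range(4))
x = torch.randn(1, 196, 64)           # (batch, tokens, channels)
budget = 0.5                          # fraction of blocks we can afford
n_keep = int(budget * len(blocks))    # keep early blocks, skip the rest
with torch.no_grad():
    for i, blk in enumerate(blocks):
        x = blk(x, skip_attn=(i >= n_keep), skip_mlp=(i >= n_keep))

Because the skip decision depends only on the budget supplied by the system, not on the input image, the same pretrained weights serve every operating point.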