With the growing adoption of deep learning for on-device TinyML applications, there has been an ever-increasing demand for more efficient neural network backbones optimized for the edge. Recently, the introduction of attention condenser networks has resulted in low-footprint, highly efficient, self-attention neural networks that strike a strong balance between accuracy and speed. In this study, we introduce a new, faster attention condenser design called double-condensing attention condensers, which enables more condensed feature embedding. We further employ a machine-driven design exploration strategy that imposes best-practices design constraints for greater efficiency and robustness to produce the macro-micro architecture constructs of the backbone. The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor when compared to several other state-of-the-art efficient backbones (>10X faster than FB-Net C at higher accuracy and speed), while having a small model size (>1.47X smaller than OFA-62 at higher speed and similar accuracy) and strong accuracy (1.1% higher top-1 accuracy than MobileViT XS on ImageNet at higher speed). These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.
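To make the attention-condenser idea concrete, the following is a minimal NumPy sketch of the general mechanism the abstract refers to: spatially condense the input, compute a self-attention map in the condensed embedding space, expand it back, and use it to selectively weight the input features. The function name, pooling choices, and weight shapes here are illustrative assumptions, not the paper's exact double-condensing design.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_condenser(x, w_embed, w_attend):
    """Illustrative attention-condenser-style block (not the paper's exact design).

    x        : input feature map of shape (C, H, W), with H and W even
    w_embed  : (M, C) channel-mixing weights into a condensed embedding
    w_attend : (C, M) weights projecting the embedding to per-channel attention
    """
    C, H, W = x.shape
    # Condense: 2x2 average pooling halves the spatial resolution.
    cond = x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))
    # Condensed embedding: 1x1-conv-style channel mixing into M channels.
    emb = np.tensordot(w_embed, cond, axes=(1, 0))          # (M, H/2, W/2)
    # Self-attention values computed in the condensed space.
    att = sigmoid(np.tensordot(w_attend, emb, axes=(1, 0)))  # (C, H/2, W/2)
    # Expand back to the input resolution (nearest-neighbour upsampling).
    att_full = att.repeat(2, axis=1).repeat(2, axis=2)       # (C, H, W)
    # Selectively attend to the input features.
    return x * att_full
```

The efficiency gain comes from computing attention on the condensed (lower-resolution, lower-dimensional) embedding rather than on the full feature map; a "double-condensing" variant would, per the abstract, condense the embedding further still.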