As deep learning advances, edge devices and lightweight neural networks are becoming increasingly important. To reduce latency on AI accelerators, it is essential not only to reduce FLOPs but also to improve hardware efficiency. We propose arithmetic intensity balancing convolution (ABConv) to address the problem that, for convolutions with a small spatial size, the overall arithmetic intensity is limited by the small weight arithmetic intensity. ABConv raises the maximum bound of the overall arithmetic intensity and significantly reduces latency without sacrificing accuracy. We measured the latency and hardware performance of ABConv on the Arm Ethos-U65 NPU in various configurations, and used it to replace parts of MobileNetV1 and ResNet50 for image classification on CIFAR100.
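To make the motivating bottleneck concrete, here is a minimal sketch (not the paper's code) of how weight and overall arithmetic intensity can be estimated for a standard convolution. It assumes int8 operands (1 byte each), stride 1 with same padding, and counts one MAC as 2 FLOPs; all function and variable names are illustrative.

```python
# Hypothetical sketch: arithmetic intensity (FLOPs per byte of memory traffic)
# for a standard convolution, assuming 1-byte operands and 2 FLOPs per MAC.
def conv_arithmetic_intensity(h, w, c_in, c_out, k):
    macs = h * w * c_out * c_in * k * k       # output spatial size is h * w
    flops = 2 * macs
    weight_bytes = c_out * c_in * k * k       # each weight fetched once
    act_bytes = h * w * c_in + h * w * c_out  # input + output activations
    weight_ai = flops / weight_bytes          # equals 2 * h * w: each weight is
                                              # reused once per output position
    overall_ai = flops / (weight_bytes + act_bytes)
    return weight_ai, overall_ai

# A late-stage 7x7 feature map: weight arithmetic intensity is only 2 * 49 = 98,
# so weight traffic dominates and caps the overall arithmetic intensity.
weight_ai, overall_ai = conv_arithmetic_intensity(7, 7, 512, 512, 3)
print(weight_ai, overall_ai)
```

Under these assumptions the weight arithmetic intensity depends only on the output spatial size, which is why layers with small feature maps (where channel counts are large but h and w are small) become weight-traffic bound; ABConv targets exactly this regime.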