To design fast neural networks, many works have focused on reducing the number of floating-point operations (FLOPs). We observe, however, that such a reduction in FLOPs does not necessarily lead to a similar reduction in latency. This mainly stems from inefficiently low floating-point operations per second (FLOPS). To achieve faster networks, we revisit popular operators and demonstrate that such low FLOPS is mainly due to the operators' frequent memory access, especially in the depthwise convolution. We therefore propose a novel partial convolution (PConv) that extracts spatial features more efficiently by cutting down redundant computation and memory access simultaneously. Building upon PConv, we further propose FasterNet, a new family of neural networks that attains substantially higher running speed than others on a wide range of devices, without compromising accuracy on various vision tasks. For example, on ImageNet-1k, our tiny FasterNet-T0 is $2.8\times$, $3.3\times$, and $2.4\times$ faster than MobileViT-XXS on GPU, CPU, and ARM processors, respectively, while being $2.9\%$ more accurate. Our large FasterNet-L achieves an impressive $83.5\%$ top-1 accuracy, on par with the emerging Swin-B, while offering $36\%$ higher inference throughput on GPU and saving $37\%$ of compute time on CPU. Code is available at \url{https://github.com/JierunChen/FasterNet}.
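To make the idea concrete, below is a minimal PyTorch sketch of a partial convolution under a split-and-concatenate reading of the abstract: a regular convolution is applied to only a fraction of the input channels, while the remaining channels pass through untouched, so both computation and memory access scale with the convolved fraction rather than the full width. The module and parameter names (`PartialConv`, `n_div`) and the 1/4 partial ratio are illustrative assumptions; the authors' implementation lives in the linked repository.

```python
import torch
import torch.nn as nn


class PartialConv(nn.Module):
    """Illustrative partial convolution (PConv) sketch.

    A regular k x k convolution runs on only the first dim // n_div
    channels; the other channels are passed through unchanged, cutting
    redundant computation and memory access simultaneously.
    """

    def __init__(self, dim: int, n_div: int = 4, kernel_size: int = 3):
        super().__init__()
        self.dim_conv = dim // n_div              # channels that are convolved
        self.dim_untouched = dim - self.dim_conv  # channels left as identity
        self.conv = nn.Conv2d(
            self.dim_conv, self.dim_conv, kernel_size,
            padding=kernel_size // 2, bias=False,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split along channels, convolve only the first part, then
        # concatenate the untouched part back on.
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_untouched], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)


# Usage: a 64-channel feature map where only 16 channels are convolved.
x = torch.randn(1, 64, 56, 56)
y = PartialConv(64)(x)
assert y.shape == x.shape
```

With a partial ratio of 1/4 as above, the spatial convolution touches only a quarter of the channels, so its FLOPs and memory traffic drop to roughly 1/16 and 1/4 of a regular convolution's, respectively.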