Inspired by the long-range modeling ability of ViTs, large-kernel convolutions have recently been widely studied and adopted to enlarge the receptive field and improve model performance, as in the remarkable ConvNeXt, which employs 7x7 depthwise convolution. Although such a depthwise operator consumes only a few FLOPs, it largely harms model efficiency on powerful computing devices due to high memory access costs. For example, ConvNeXt-T has FLOPs similar to ResNet-50 but achieves only about 60% of its throughput when trained on A100 GPUs with full precision. Although reducing the kernel size of ConvNeXt can improve speed, it results in significant performance degradation. It remains unclear how to speed up large-kernel-based CNN models while preserving their performance. To tackle this issue, inspired by Inceptions, we propose to decompose large-kernel depthwise convolution into four parallel branches along the channel dimension, i.e., a small square kernel, two orthogonal band kernels, and an identity mapping. With this new Inception depthwise convolution, we build a series of networks, namely InceptionNeXt, which not only enjoy high throughputs but also maintain competitive performance. For instance, InceptionNeXt-T achieves 1.6x higher training throughput than ConvNeXt-T and attains a 0.2% top-1 accuracy improvement on ImageNet-1K. We anticipate InceptionNeXt can serve as an economical baseline for future architecture design to reduce carbon footprint. Code is available at https://github.com/sail-sg/inceptionnext.
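To make the decomposition concrete, below is a minimal PyTorch-style sketch of an Inception depthwise convolution with four parallel branches split along the channel dimension. The module name `InceptionDWConv2d`, the 3x3 square kernel, the 11-tap band kernels, and the 1/8-of-channels-per-conv-branch ratio are illustrative assumptions for this sketch; the linked repository is the authoritative implementation.

```python
import torch
import torch.nn as nn


class InceptionDWConv2d(nn.Module):
    """Sketch: decompose a large-kernel depthwise conv into four parallel
    branches along the channel dimension: identity mapping, a small square
    kernel, and two orthogonal band kernels (1xk and kx1)."""

    def __init__(self, channels, square_kernel=3, band_kernel=11, branch_ratio=0.125):
        super().__init__()
        gc = int(channels * branch_ratio)  # channels assigned to each conv branch (assumed ratio)
        self.split_sizes = (channels - 3 * gc, gc, gc, gc)
        # small square depthwise kernel
        self.dwconv_hw = nn.Conv2d(gc, gc, square_kernel,
                                   padding=square_kernel // 2, groups=gc)
        # horizontal band kernel (1 x k)
        self.dwconv_w = nn.Conv2d(gc, gc, (1, band_kernel),
                                  padding=(0, band_kernel // 2), groups=gc)
        # vertical band kernel (k x 1)
        self.dwconv_h = nn.Conv2d(gc, gc, (band_kernel, 1),
                                  padding=(band_kernel // 2, 0), groups=gc)

    def forward(self, x):
        # split channels into identity / square / horizontal-band / vertical-band groups
        x_id, x_hw, x_w, x_h = torch.split(x, self.split_sizes, dim=1)
        # identity branch passes through untouched; the others use cheap depthwise convs
        return torch.cat(
            (x_id, self.dwconv_hw(x_hw), self.dwconv_w(x_w), self.dwconv_h(x_h)),
            dim=1,
        )


if __name__ == "__main__":
    # quick shape check: output keeps the input resolution and channel count
    y = InceptionDWConv2d(64)(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 64, 56, 56])
```

The key design point is that only a fraction of the channels pass through any convolution at all, and none of the branches uses a full kxk large kernel, which is what reduces memory access cost relative to a 7x7 (or larger) depthwise convolution over all channels.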