Existing works often focus on reducing architectural redundancy to accelerate image classification but ignore the spatial redundancy of the input image. This paper proposes an efficient image classification pipeline to address this problem. We first pinpoint task-aware regions of the input image with a lightweight patch proposal network called AnchorNet. We then feed these localized semantic patches, which carry much less spatial redundancy, into a general classification network. Unlike popular deep CNN designs, we carefully design the receptive field of AnchorNet without intermediate convolutional padding. This ensures an exact mapping from each high-level spatial location to a specific input image patch, so the contribution of each patch is interpretable. Moreover, AnchorNet is compatible with any downstream architecture. Experimental results on ImageNet show that our method outperforms SOTA dynamic inference methods at a lower inference cost. Our code is available at https://github.com/winycg/AnchorNet.
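The exact location-to-patch mapping follows from standard receptive-field arithmetic: with no padding, output index `i` along one spatial axis sees the input interval starting at `i` times the cumulative stride, with width equal to the receptive field. A minimal sketch (not the authors' code; the layer configuration is a hypothetical example):

```python
def patch_for_output_index(layers, i):
    """Given padding-free conv/pool layers as (kernel_size, stride) pairs,
    return (left_edge, width) of the input patch seen by output index i
    along one spatial axis."""
    rf, jump = 1, 1  # receptive field size, cumulative stride
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1) input steps
        jump *= s             # strides compose multiplicatively
    return i * jump, rf       # no padding => left edge is exactly i * jump

# Two unpadded 3x3 convolutions with stride 2: output 0 sees input [0, 7).
print(patch_for_output_index([(3, 2), (3, 2)], 0))  # (0, 7)
```

Because padding would shift some output locations partially outside the input, omitting it keeps this mapping exact and makes each high-level activation attributable to one concrete image patch.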