Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference

Modern convolutional neural networks apply the same operations on every pixel in an image. However, not all image regions are equally important. To address this inefficiency, we propose a method to dynamically apply convolutions conditioned on the input image. We introduce a residual block where a small gating branch learns which spatial positions should be evaluated. These discrete gating decisions are trained end-to-end using the Gumbel-Softmax trick, in combination with a sparsity criterion. Our experiments on CIFAR, ImageNet and MPII show that our method has better focus on the region of interest and better accuracy than existing methods, at a lower computational complexity. Moreover, we provide an efficient CUDA implementation of our dynamic convolutions using a gather-scatter approach, achieving a significant improvement in inference speed with MobileNetV2 residual blocks. On human pose estimation, a task that is inherently spatially sparse, the processing speed is increased by 60% with no loss in accuracy.
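The gating and gather-scatter ideas described above can be illustrated with a minimal pure-Python sketch. It is not the authors' CUDA implementation: the function names (`gumbel_gate`, `sparsity_loss`, `dynamic_residual`) are hypothetical, a pointwise function stands in for the convolutions of the residual block, and the squared-error form of the sparsity criterion is an assumption, since the abstract does not specify it.

```python
import math
import random


def gumbel_gate(logit, rng):
    """Hard binary gate for one spatial position (forward pass only).

    For two classes, Gumbel-Softmax reduces to a sigmoid over the logit
    plus Logistic(0, 1) noise (the difference of two Gumbel samples).
    During training, the soft sigmoid value would carry the gradient;
    here only the hard decision is computed.
    """
    u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
    noise = math.log(u) - math.log(1.0 - u)
    return 1 if logit + noise > 0.0 else 0


def sparsity_loss(mask, target):
    """Assumed criterion: squared gap between executed fraction and target."""
    total = sum(len(row) for row in mask)
    executed = sum(sum(row) for row in mask)
    return (executed / total - target) ** 2


def dynamic_residual(x, mask, f):
    """Apply the residual function f only where mask == 1.

    x    : H x W grid of floats (one channel, for brevity)
    mask : H x W grid of 0/1 gating decisions
    f    : pointwise stand-in for the block's convolutions
    """
    # Gather: compact the active positions into a dense list.
    active = [(i, j) for i, row in enumerate(mask)
              for j, m in enumerate(row) if m]
    gathered = [x[i][j] for i, j in active]
    # Compute on the compacted data only.
    residuals = [f(v) for v in gathered]
    # Scatter: add results back; skipped positions keep the identity path.
    out = [row[:] for row in x]
    for (i, j), r in zip(active, residuals):
        out[i][j] += r
    return out, len(active)


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [[4.0, -4.0], [-4.0, 4.0]]  # hypothetical gating-branch outputs
    mask = [[gumbel_gate(l, rng) for l in row] for row in logits]
    x = [[1.0, 2.0], [3.0, 4.0]]
    out, n_active = dynamic_residual(x, mask, lambda v: 10.0 * v)
    print(mask, out, n_active, sparsity_loss(mask, target=0.5))
```

In this sketch the cost of the residual function scales with the number of active positions rather than with the full image size, which is the source of the speedup the abstract reports. A real implementation would also need to handle the neighbouring pixels that spatial convolutions read.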
