In convolutional neural networks, convolutions are conventionally performed with a square kernel that has a fixed N $\times$ N receptive field (RF). However, what matters most to the network is the effective receptive field (ERF), which indicates the extent to which input pixels contribute to an output pixel. Inspired by the observation that ERFs typically exhibit a Gaussian distribution, we propose a Gaussian Mask convolutional kernel (GMConv). Specifically, GMConv uses a Gaussian function to generate a concentrically symmetric mask that is placed over the kernel to refine the RF. GMConv can directly replace the standard convolutions in existing CNNs and can be trained end-to-end with standard back-propagation. We evaluate our approach through extensive experiments on image classification and object detection. Across several tasks and standard base models, our approach compares favorably with standard convolution. For instance, using GMConv in AlexNet and ResNet-50 boosts the top-1 accuracy on ImageNet classification by 0.98% and 0.85%, respectively.
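To make the masking idea concrete, here is a minimal NumPy sketch of one plausible form of a Gaussian-masked kernel. The abstract does not give the exact parameterization. The following are assumptions for illustration: the mask is an isotropic Gaussian with a single width `sigma`, centered on the kernel with a peak value of 1, and multiplied element-wise with the kernel weights. The helper names `gaussian_mask` and `gm_conv2d` are hypothetical. Only the forward pass is shown. In a framework with automatic differentiation, `sigma` could be registered as a learnable parameter so that it is trained by back-propagation along with the weights.

```python
import numpy as np

def gaussian_mask(n: int, sigma: float) -> np.ndarray:
    """Hypothetical concentrically symmetric Gaussian mask for an n x n kernel.

    Assumption: isotropic Gaussian centered on the kernel, peak value 1.
    """
    c = (n - 1) / 2.0                      # kernel center (works for odd n)
    coords = np.arange(n) - c              # offsets from the center
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    return np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))

def gm_conv2d(x: np.ndarray, kernel: np.ndarray, sigma: float) -> np.ndarray:
    """Valid 2-D cross-correlation (the CNN 'convolution') with a masked kernel."""
    n = kernel.shape[0]
    masked = kernel * gaussian_mask(n, sigma)   # mask refines the fixed N x N RF
    h, w = x.shape
    out = np.empty((h - n + 1, w - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + n, j:j + n] * masked)
    return out

if __name__ == "__main__":
    print(np.round(gaussian_mask(3, sigma=1.0), 4))
```

With `sigma=1.0` and a 3 $\times$ 3 kernel, the center weight is kept at 1, the edge-adjacent weights are scaled by $e^{-0.5}$, and the corner weights are scaled by $e^{-1}$. This down-weights the periphery of the RF. As `sigma` grows, the mask approaches all ones, and the operation reduces to a standard convolution. This is consistent with GMConv serving as a drop-in replacement.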