In convolutional neural networks, convolutions are conventionally performed using a square kernel with a fixed N $\times$ N receptive field (RF). However, what matters most to the network is the effective receptive field (ERF), which indicates the extent to which input pixels contribute to an output pixel. Inspired by the property that ERFs typically exhibit a Gaussian distribution, we propose a Gaussian Mask convolutional kernel (GMConv) in this work. Specifically, GMConv uses the Gaussian function to generate a concentrically symmetric mask that is placed over the kernel to refine the RF. GMConv can directly replace the standard convolutions in existing CNNs and can be trained end-to-end by standard back-propagation. We evaluate our approach through extensive experiments on image classification and object detection tasks. Across several tasks and standard base models, our approach compares favorably against the standard convolution. For instance, with GMConv, the top-1 accuracy of AlexNet and ResNet-50 on ImageNet classification improves by 0.98% and 0.85%, respectively.
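To illustrate the core idea, the following is a minimal NumPy sketch of a Gaussian mask modulating a convolution kernel. The function names (`gaussian_mask`, `gmconv_kernel`) and the fixed `sigma` parameter are illustrative assumptions; in the actual GMConv, the mask parameters would be learned jointly with the kernel weights via back-propagation.

```python
import numpy as np

def gaussian_mask(k, sigma):
    # Concentrically symmetric Gaussian mask for a k x k kernel,
    # peaking at the kernel center and decaying toward the corners.
    ax = np.arange(k) - (k - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))

def gmconv_kernel(weight, sigma):
    # Elementwise modulation of a standard kernel by the Gaussian mask,
    # effectively refining the receptive field of the convolution.
    return weight * gaussian_mask(weight.shape[-1], sigma)

# Example: a small sigma concentrates the effective receptive field
# of a 5x5 kernel toward its center.
w = np.ones((5, 5))
masked = gmconv_kernel(w, sigma=0.5)
```

A small `sigma` suppresses the kernel's outer ring, shrinking the effective RF toward the center; a large `sigma` leaves the kernel nearly unchanged, recovering the standard convolution.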