In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well known in the neuroscience community that the receptive field size of visual cortical neurons is modulated by the stimulus, a property that has rarely been considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called the Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attention weights on these branches yield different effective receptive field sizes for neurons in the fusion layer. Multiple SK units are stacked into a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects at different scales, which verifies the capability of neurons to adaptively adjust their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.
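
The fusion step described above can be illustrated with a minimal NumPy sketch. It is not the released implementation. It assumes the branch outputs (for example, from a 3x3 and a 5x5 convolution) are already computed, and that the attention is derived by summing the branches, global average pooling, and two small linear maps. The pooling step, the linear maps, and the names `sk_fuse`, `w_reduce`, and `w_select` are assumptions made for illustration and are not stated in the abstract.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def sk_fuse(branches, w_reduce, w_select):
    """Fuse M branch outputs with channel-wise softmax attention.

    branches: (M, C, H, W) feature maps, one per kernel size.
    w_reduce: (d, C) hypothetical weights compressing the channel descriptor.
    w_select: (M, C, d) hypothetical weights producing per-branch logits.
    Returns the fused map (C, H, W) and the attention weights (M, C).
    """
    summed = branches.sum(axis=0)                 # combine information from all branches
    s = summed.mean(axis=(1, 2))                  # global average pool -> (C,)
    z = np.maximum(w_reduce @ s, 0.0)             # compact descriptor with ReLU -> (d,)
    logits = w_select @ z                         # per-branch, per-channel logits -> (M, C)
    attn = softmax(logits, axis=0)                # softmax across branches
    fused = (attn[:, :, None, None] * branches).sum(axis=0)
    return fused, attn


# Toy usage with random tensors standing in for the 3x3 and 5x5 branch outputs.
rng = np.random.default_rng(0)
M, C, H, W, d = 2, 4, 8, 8, 2
branches = rng.standard_normal((M, C, H, W))
w_reduce = rng.standard_normal((d, C))
w_select = rng.standard_normal((M, C, d))
fused, attn = sk_fuse(branches, w_reduce, w_select)
```

Because the softmax runs across branches, the weights for each channel sum to one. A channel that puts most of its weight on the larger-kernel branch behaves as if it had a larger effective receptive field, which is the selection effect the abstract describes.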