We propose contextual convolution (CoConv) for visual recognition. CoConv is a direct replacement for the standard convolution, the core component of convolutional neural networks. CoConv is implicitly equipped with the capability of incorporating contextual information, while maintaining a similar number of parameters and a similar computational cost to the standard convolution. CoConv is inspired by neuroscience studies indicating that (i) neurons, even in the primary visual cortex (V1 area), are involved in the detection of contextual cues, and that (ii) the activity of a visual neuron can be influenced by stimuli placed entirely outside its theoretical receptive field. On the one hand, we integrate CoConv into the widely used residual networks and show improved recognition performance over the baselines on the core tasks and benchmarks for visual recognition, namely image classification on the ImageNet data set and object detection on the MS COCO data set. On the other hand, we introduce CoConv into the generator of a state-of-the-art Generative Adversarial Network, showing improved generative results on CIFAR-10 and CelebA. Our code is available at https://github.com/iduta/coconv.
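The abstract does not spell out how CoConv gathers context while keeping the parameter budget of a standard convolution. Below is a minimal PyTorch sketch of one plausible realization: the output channels of a 3x3 convolution are split across parallel branches with increasing dilation rates, so the total number of kernels (and hence parameters and FLOPs) stays close to a single standard 3x3 convolution while the effective receptive field grows. The class name `CoConvSketch`, the `levels` argument, and the channel-split strategy are illustrative assumptions, not the authors' exact design; see the linked repository for the actual implementation.

```python
# Hypothetical sketch of a contextual convolution layer.
# Assumption (not stated in the abstract): context is captured by splitting
# the output channels across parallel 3x3 convolutions with increasing
# dilation rates, keeping the kernel count of a standard 3x3 convolution.
import torch
import torch.nn as nn


class CoConvSketch(nn.Module):
    def __init__(self, in_channels, out_channels, levels=(1, 2, 3)):
        super().__init__()
        # Split the output channels across the dilation levels; the last
        # branch absorbs any remainder so the channel counts always add up.
        split = out_channels // len(levels)
        sizes = [split] * (len(levels) - 1)
        sizes.append(out_channels - sum(sizes))
        self.branches = nn.ModuleList(
            nn.Conv2d(in_channels, c, kernel_size=3,
                      padding=d, dilation=d, bias=False)
            for c, d in zip(sizes, levels)
        )

    def forward(self, x):
        # Each branch sees the same input at a different receptive-field
        # scale; concatenation preserves the output shape of a standard
        # convolution, making the layer a drop-in replacement.
        return torch.cat([branch(x) for branch in self.branches], dim=1)


if __name__ == "__main__":
    layer = CoConvSketch(64, 64)
    out = layer(torch.randn(1, 64, 56, 56))
    print(out.shape)  # torch.Size([1, 64, 56, 56])
```

Because `padding` equals `dilation` for a 3x3 kernel, every branch preserves the spatial resolution, so the sketch can replace a standard convolution inside a residual block without touching the surrounding architecture.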