Deep neural networks set the state-of-the-art across many tasks in computer vision, but their ability to generalize to image distortions is surprisingly fragile. In contrast, the mammalian visual system is robust to a wide range of perturbations. Recent work suggests that this generalization ability can be explained by useful inductive biases encoded in the representations of visual stimuli throughout the visual cortex. Here, we successfully leveraged these inductive biases with a multi-task learning approach: we jointly trained a deep network to perform image classification and to predict neural activity in macaque primary visual cortex (V1). We measured the out-of-distribution generalization ability of our network by testing its robustness to image distortions. We found that co-training on monkey V1 data leads to increased robustness despite the absence of those distortions during training. Additionally, we showed that our network's robustness is very close to that of an Oracle network in which parts of the architecture are trained directly on noisy images. Our results also demonstrated that the network's representations become more brain-like as their robustness improves. Using a novel constrained reconstruction analysis, we investigated what makes our brain-regularized network more robust. We found that our co-trained network is more sensitive to content than to noise when compared to a Baseline network that we trained for image classification alone. Using DeepGaze-predicted saliency maps for ImageNet images, we found that our monkey co-trained network tends to be more sensitive to salient regions in a scene, reminiscent of existing theories on the role of V1 in the detection of object borders and bottom-up saliency. Overall, our work expands the promising research avenue of transferring inductive biases from the brain, and provides a novel analysis of the effects of our transfer.
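The multi-task objective described above can be sketched as a weighted sum of a classification loss and a neural-prediction loss computed from a shared feature representation. This is a minimal illustrative sketch, not the paper's actual implementation: the architecture, loss weighting, and names such as `lambda_neural` are assumptions introduced here for exposition.

```python
# Minimal sketch of co-training on image classification and V1 response
# prediction. Assumptions: linear heads on a shared feature vector, MSE for
# the neural loss, and a hypothetical weighting `lambda_neural`; the paper's
# actual network and objective differ.
import numpy as np

rng = np.random.default_rng(0)

def softmax_cross_entropy(logits, label):
    # Numerically stable cross-entropy for a single example.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def multi_task_loss(features, W_cls, W_v1, label, v1_target, lambda_neural=1.0):
    """Joint objective: classification loss plus a weighted mean-squared
    error against recorded V1 responses, both read out from shared features."""
    logits = features @ W_cls          # classification head
    v1_pred = features @ W_v1          # neural-response head
    loss_cls = softmax_cross_entropy(logits, label)
    loss_neural = np.mean((v1_pred - v1_target) ** 2)
    return loss_cls + lambda_neural * loss_neural

# Toy shapes: 64-d shared features, 10 classes, 20 recorded V1 neurons.
features = rng.normal(size=64)
W_cls = rng.normal(size=(64, 10))
W_v1 = rng.normal(size=(64, 20))
v1_target = rng.normal(size=20)
loss = multi_task_loss(features, W_cls, W_v1, label=3, v1_target=v1_target)
```

With `lambda_neural=0` the objective reduces to plain classification (the Baseline network); increasing it trades classification fit for agreement with the recorded neural activity, which is the regularization effect the abstract describes.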