Humans are able to robustly categorize images and can, for instance, detect the presence of an animal in a briefly flashed image in as little as 120 ms. Initially inspired by neuroscience, deep-learning algorithms have flourished over the last decade, to the point that machines now surpass humans in accuracy on visual recognition tasks. However, these artificial networks are usually trained and evaluated on very specific tasks, for instance on the 1000 separate categories of ImageNet. In that regard, biological visual systems are more flexible and efficient than artificial systems on generic ecological tasks. To deepen this comparison, we re-trained the standard VGG Convolutional Neural Network (CNN) on two independent tasks that are ecologically relevant for humans: detecting the presence of an animal and detecting the presence of an artifact. We show that the re-trained networks reach the human-like performance levels reported in psychophysical tasks. We also compare detection accuracy on an image-by-image basis. In particular, this showed that the two models perform better when their outputs are combined. Indeed, animals (e.g. lions) tend to be less present in photographs that contain artifacts (e.g. buildings). These re-trained models could also reproduce some unexpected behavioral observations from human psychophysics, such as robustness to rotations (e.g. an upside-down or slanted image) or to a grayscale transformation.