In laboratory object recognition tasks based on undistorted photographs, both adult humans and Deep Neural Networks (DNNs) perform close to ceiling. Unlike adults, whose object recognition performance is robust against a wide range of image distortions, DNNs trained on standard ImageNet (1.3M images) perform poorly on distorted images. However, the last two years have seen impressive gains in DNN distortion robustness, predominantly achieved through ever-larger training datasets, orders of magnitude larger than ImageNet. While this simple brute-force approach is very effective in achieving human-level robustness in DNNs, it raises the question of whether human robustness, too, is simply due to extensive experience with (distorted) visual input during childhood and beyond. Here we investigate this question by comparing the core object recognition performance of 146 children (aged 4–15) against adults and against DNNs. We find, first, that even 4–6-year-olds show remarkable robustness to image distortions and outperform DNNs trained on ImageNet. Second, we estimate the number of "images" children have been exposed to during their lifetime. Compared to various DNNs, children's high robustness requires relatively little data. Third, when recognizing objects, children (like adults but unlike DNNs) rely heavily on shape rather than texture cues. Together, our results suggest that the remarkable robustness to distortions emerges early in the developmental trajectory of human object recognition and is unlikely to be the result of a mere accumulation of experience with distorted visual input. Even though current DNNs match human performance regarding robustness, they seem to rely on different and more data-hungry strategies to do so.