A novel feature-scrambling approach reveals the capacity of convolutional neural networks to learn spatial relations

Convolutional neural networks (CNNs) are among the most successful computer vision systems for object recognition. CNNs also have major applications in understanding the nature of visual representations in the human brain. Yet it remains poorly understood how CNNs actually make their decisions, what the nature of their internal representations is, and how their recognition strategies differ from those of humans. Specifically, there is a major debate about whether CNNs primarily rely on surface regularities of objects, or whether they are capable of exploiting the spatial arrangement of features, as humans do. Here, we develop a novel feature-scrambling approach to explicitly test whether CNNs use the spatial arrangement of features (i.e., object parts) to classify objects. We combine this approach with a systematic manipulation of the effective receptive field sizes of CNNs as well as a minimal recognizable configurations (MIRCs) analysis. In contrast to much previous literature, we provide evidence that CNNs are in fact capable of using relatively long-range spatial relationships for object classification. Moreover, the extent to which CNNs use spatial relationships depends heavily on the dataset, e.g., texture vs. sketch. In fact, CNNs even use different strategies for different classes within heterogeneous datasets (ImageNet), suggesting that CNNs have a continuous spectrum of classification strategies. Finally, we show that CNNs learn the spatial arrangement of features only up to an intermediate level of granularity, which suggests that intermediate rather than global shape features provide the optimal trade-off between sensitivity and specificity in object classification. These results provide novel insights into the nature of CNN representations and the extent to which they rely on the spatial arrangement of features for object classification.
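The abstract does not spell out the scrambling procedure, so the following is only an illustrative sketch of one way such a test could be set up, not the authors' implementation. It assumes a feature map stored as a NumPy array of shape `(C, H, W)` and shuffles non-overlapping square tiles, so that the content of each tile is preserved while the spatial arrangement between tiles is destroyed. The function name `scramble_blocks` and the `block` parameter are hypothetical.

```python
import numpy as np


def scramble_blocks(fmap, block, rng):
    """Shuffle non-overlapping block x block spatial tiles of a (C, H, W) array.

    Illustrative assumption: scrambling is a random permutation of tile positions.
    """
    c, h, w = fmap.shape
    if h % block or w % block:
        raise ValueError("H and W must be divisible by block")
    gh, gw = h // block, w // block
    # Split into tiles: (C, gh, block, gw, block) -> (gh * gw, C, block, block).
    tiles = fmap.reshape(c, gh, block, gw, block).transpose(1, 3, 0, 2, 4)
    tiles = tiles.reshape(gh * gw, c, block, block)
    # Permute tile positions; the contents of each tile stay intact.
    tiles = tiles[rng.permutation(gh * gw)]
    # Reassemble the permuted tiles into a (C, H, W) array.
    tiles = tiles.reshape(gh, gw, c, block, block).transpose(2, 0, 3, 1, 4)
    return tiles.reshape(c, h, w)


# Toy usage: a 2-channel 4x4 map scrambled at a tile size of 2.
rng = np.random.default_rng(0)
fmap = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
scrambled = scramble_blocks(fmap, block=2, rng=rng)
```

Under this assumption, the tile size sets the granularity of the test: a small `block` removes spatial relations even between nearby features, while a large `block` keeps local arrangements and disrupts only long-range ones. Comparing classification accuracy across tile sizes would then indicate the spatial scale at which a network depends on feature arrangement.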
