Linear classifier probes are frequently used to better understand how neural networks function. Researchers have approached the problem of determining unit importance in neural networks by probing their learned, internal representations. Linear classifier probes identify highly selective units as the most important for network function. Whether a network actually relies on highly selective units can be tested by removing them from the network via ablation. Surprisingly, ablating highly selective units produces only small performance deficits, and even then only in some cases. Despite the absence of ablation effects for selective neurons, linear decoding methods remain effective for interpreting network function, leaving their effectiveness a mystery. To test whether selectivity alone accounts for network function and to resolve this contradiction, we systematically ablate groups of units in subregions of activation space. We find a weak relationship between the neurons identified by probes and those identified by ablation. More specifically, we find that an interaction between a unit's selectivity and its average activity better predicts ablation-induced performance deficits for groups of units in AlexNet, VGG16, MobileNetV2, and ResNet101. Linear decoders are likely somewhat effective because they overlap with the units that are causally important for network function. Interpretability methods could therefore be improved by focusing on causally important units.
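For concreteness, the sketch below illustrates the two measurements the abstract refers to: scoring units by a class selectivity index and ablating a chosen group of units to observe the resulting accuracy deficit. This is a minimal PyTorch sketch, not the paper's implementation; the selectivity index shown is one common definition (mean activation on a unit's most-activating class versus its mean over the remaining classes), and `loader`, `acts`, `labels`, and the choice of layer are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code) of unit selectivity scoring and
# group ablation. Assumes torchvision >= 0.13 for the weights API.
import torch
import torchvision.models as models

model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()

def class_selectivity(acts, labels, num_classes):
    """Per-unit selectivity: (mu_max - mu_rest) / (mu_max + mu_rest), where
    mu_max is a unit's mean activation on its most-activating class and
    mu_rest is its mean activation over the other classes. `acts` is
    (N, units); conv-layer activations are typically spatially averaged
    first to get one value per channel."""
    class_means = torch.stack([acts[labels == c].mean(dim=0)
                               for c in range(num_classes)])   # (C, units)
    mu_max, _ = class_means.max(dim=0)
    mu_rest = (class_means.sum(dim=0) - mu_max) / (num_classes - 1)
    return (mu_max - mu_rest) / (mu_max + mu_rest + 1e-8)

def ablate_units(layer, unit_idx):
    """Zero the given output channels of `layer` via a forward hook,
    simulating ablation of that group of units."""
    def hook(module, inputs, output):
        output[:, unit_idx] = 0.0
        return output
    return layer.register_forward_hook(hook)

@torch.no_grad()
def accuracy(model, loader):
    correct = total = 0
    for x, y in loader:
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total

# Usage sketch: ablate the 10 most selective units in one conv layer and
# compare against the intact network (acts/labels/loader are placeholders).
# baseline = accuracy(model, loader)
# top_idx = class_selectivity(acts, labels, 1000).topk(10).indices
# handle = ablate_units(model.features[10], top_idx)
# deficit = baseline - accuracy(model, loader)
# handle.remove()  # restore the network
```

Removing the hook restores the intact network, so the same model instance can be reused to ablate many different groups of units, which is what a systematic sweep over subregions of activation space requires.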