While the relative trade-offs between sparse and distributed representations in deep neural networks (DNNs) are well-studied, less is known about how these trade-offs apply to representations of semantically-meaningful information. Class selectivity, the variability of a unit's responses across data classes or dimensions, is one way of quantifying the sparsity of semantic representations. Given recent evidence showing that class selectivity can impair generalization, we sought to investigate whether it also confers robustness (or vulnerability) to perturbations of input data. We found that mean class selectivity predicts vulnerability to naturalistic corruptions; networks regularized to have lower levels of class selectivity are more robust to corruption, while networks with higher class selectivity are more vulnerable to corruption, as measured using Tiny ImageNetC and CIFAR10C. In contrast, we found that class selectivity increases robustness to multiple types of gradient-based adversarial attacks. To examine this difference, we studied the dimensionality of the change in the representation due to perturbation, finding that decreasing class selectivity increases the dimensionality of this change for both corruption types, but with a notably larger increase for adversarial attacks. These results demonstrate the causal relationship between selectivity and robustness and provide new insights into the mechanisms of this relationship.
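For concreteness, the two quantities at the center of the abstract can be sketched in code. The following is a minimal numpy illustration, not the paper's implementation: it assumes the class selectivity index in the style of Morcos et al. (most-active class mean versus the mean over remaining classes, for non-negative post-ReLU activations) and, as one plausible choice of dimensionality measure, the participation ratio of the perturbation-induced change in representations. The function names and the `1e-12` numerical stabilizers are illustrative additions.

```python
import numpy as np

def class_selectivity_index(activations, labels):
    """Per-unit class selectivity index.

    activations: (n_samples, n_units) array of non-negative (e.g.
        post-ReLU) unit activations.
    labels: (n_samples,) integer class labels; assumes >= 2 classes.

    Returns a (n_units,) array: 0 means identical mean activation
    across classes, 1 means the unit responds to a single class only.
    """
    classes = np.unique(labels)
    # Class-conditional mean activation per unit: (n_classes, n_units).
    class_means = np.stack([activations[labels == c].mean(axis=0)
                            for c in classes])
    mu_max = class_means.max(axis=0)
    # Mean activation over all classes except the most-active one.
    mu_not_max = (class_means.sum(axis=0) - mu_max) / (len(classes) - 1)
    return (mu_max - mu_not_max) / (mu_max + mu_not_max + 1e-12)

def participation_ratio(deltas):
    """Effective dimensionality of a representational change.

    deltas: (n_samples, n_units) array of (perturbed - clean)
        activations for a given layer.

    Returns PR = (sum lambda_i)^2 / (sum lambda_i^2) over the
    eigenvalues of the covariance of deltas; ranges from ~1 (one
    dominant direction of change) up to n_units (isotropic change).
    """
    centered = deltas - deltas.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(deltas) - 1)
    eigvals = np.linalg.eigvalsh(cov)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against numerical noise
    return eigvals.sum() ** 2 / (np.square(eigvals).sum() + 1e-12)
```

Under this reading, the abstract's dimensionality finding would correspond to networks with lower `class_selectivity_index` values exhibiting a larger `participation_ratio` for the perturbation-induced change, with the increase being more pronounced for adversarial perturbations than for naturalistic corruptions.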