Neural networks are known to exploit spurious artifacts (or shortcuts) that co-occur with a target label, exhibiting heuristic memorization. Networks have also been shown to memorize individual training examples, resulting in example-level memorization. Both kinds of memorization impede generalization of networks beyond their training distributions. Detecting such memorization can be challenging, often requiring researchers to curate tailored test sets. In this work, we hypothesize -- and subsequently show -- that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization. We quantify the diversity in the neural activations through information-theoretic measures and find support for our hypothesis in experiments spanning several natural language and vision tasks. Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabeled in-distribution examples. Lastly, we demonstrate the utility of our findings for the problem of model selection. The associated code and other resources for this work are available at https://linktr.ee/InformationMeasures.
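To make the idea of quantifying activation diversity concrete, the following is a minimal sketch rather than the authors' implementation. It assumes activations are collected as a matrix of shape (examples, neurons), discretizes each neuron into equal-width bins, and reports the mean per-neuron entropy and the mean pairwise mutual information. The function names, the binning scheme, and the choice of `n_bins` are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def discretize(acts, n_bins=10):
    """Map each neuron's activations to integer bin ids (equal-width bins per neuron).

    Assumption: equal-width binning; the source does not specify a scheme.
    """
    acts = np.asarray(acts, dtype=float)
    lo = acts.min(axis=0, keepdims=True)
    hi = acts.max(axis=0, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant neurons
    ids = np.floor((acts - lo) / span * n_bins).astype(int)
    return np.clip(ids, 0, n_bins - 1)


def entropy(ids, n_bins):
    """Shannon entropy (in bits) of one neuron's discretized activations."""
    counts = np.bincount(ids, minlength=n_bins).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def mutual_information(a, b, n_bins):
    """Mutual information (in bits) between two neurons' discretized activations."""
    joint = np.bincount(a * n_bins + b, minlength=n_bins * n_bins).astype(float)
    joint = joint.reshape(n_bins, n_bins)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # marginal of the first neuron
    pb = joint.sum(axis=0, keepdims=True)  # marginal of the second neuron
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])).sum())


def activation_diversity(acts, n_bins=10):
    """Return (mean per-neuron entropy, mean pairwise mutual information)."""
    ids = discretize(acts, n_bins)
    n_neurons = ids.shape[1]
    ent = [entropy(ids[:, j], n_bins) for j in range(n_neurons)]
    mi = [
        mutual_information(ids[:, i], ids[:, j], n_bins)
        for i in range(n_neurons)
        for j in range(i + 1, n_neurons)
    ]
    return float(np.mean(ent)), float(np.mean(mi)) if mi else 0.0
```

Because only activations are needed, a sketch like this can be run on unlabeled in-distribution inputs, for example by calling `activation_diversity(acts)` on the hidden states of one layer for each candidate model and comparing the resulting values.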