Some neurons in deep networks specialize in recognizing highly specific perceptual, structural, or semantic features of inputs. In computer vision, techniques exist for identifying neurons that respond to individual concept categories like colors, textures, and object classes. But these techniques are limited in scope, labeling only a small subset of neurons and behaviors in any network. Is a richer characterization of neuron-level computation possible? We introduce a procedure (called MILAN, for mutual-information-guided linguistic annotation of neurons) that automatically labels neurons with open-ended, compositional, natural language descriptions. Given a neuron, MILAN generates a description by searching for a natural language string that maximizes pointwise mutual information with the image regions in which the neuron is active. MILAN produces fine-grained descriptions that capture categorical, relational, and logical structure in learned features. These descriptions obtain high agreement with human-generated feature descriptions across a diverse set of model architectures and tasks, and can aid in understanding and controlling learned models. We highlight three applications of natural language neuron descriptions. First, we use MILAN for analysis, characterizing the distribution and importance of neurons selective for attribute, category, and relational information in vision models. Second, we use MILAN for auditing, surfacing neurons sensitive to human faces in datasets designed to obscure them. Finally, we use MILAN for editing, improving robustness in an image classifier by deleting neurons sensitive to text features spuriously correlated with class labels.
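The selection criterion described above can be illustrated with a minimal sketch. It assumes pointwise mutual information takes its standard form, log p(d | E) - log p(d), where d is a candidate description and E is the set of image regions in which the neuron is active. The function names `pmi` and `best_description` and the probability values are hypothetical and chosen only for illustration. In MILAN itself these quantities would come from learned models and the search would run over open-ended strings, not a fixed candidate list.

```python
import math


def pmi(p_desc_given_regions: float, p_desc: float) -> float:
    """Pointwise mutual information: log p(d | E) - log p(d)."""
    return math.log(p_desc_given_regions) - math.log(p_desc)


def best_description(candidates: dict) -> str:
    """Return the candidate description with the highest PMI.

    `candidates` maps each description d to a pair (p(d | E), p(d)).
    """
    return max(candidates, key=lambda d: pmi(*candidates[d]))


# Made-up probabilities, for illustration only (not taken from the paper).
candidates = {
    "an object": (0.30, 0.25),    # likely given the regions, but generic
    "dog faces": (0.10, 0.01),    # less likely overall, far more specific
    "green grass": (0.02, 0.02),  # independent of the regions, so PMI is 0
}

print(best_description(candidates))
```

The generic description has the highest conditional probability yet loses to the specific one, because dividing by the prior p(d) penalizes strings that would be plausible for any neuron.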