Recent experiments reveal that task-relevant variables are often encoded in approximately orthogonal subspaces of the neural activity space. These disentangled low-dimensional representations are observed in multiple brain areas and across different species, and are typically the result of a process of abstraction that supports simple forms of out-of-distribution generalization. The mechanisms by which such geometries emerge remain poorly understood, and those that have been investigated are typically unsupervised (e.g., based on variational autoencoders). Here, we show mathematically that abstract representations of latent variables are guaranteed to appear in the last hidden layer of feedforward nonlinear networks when they are trained on tasks that depend directly on these latent variables. These abstract representations reflect the structure of the desired outputs or the semantics of the input stimuli. To investigate the neural representations that emerge in these networks, we develop an analytical framework that maps the optimization over the network weights onto a mean-field problem over the distribution of neural preactivations. Applying this framework to a finite-width ReLU network, we find that its hidden layer exhibits an abstract representation at all global minima of the task objective. We further extend these analyses to two broad families of activation functions and to deep feedforward architectures, demonstrating that abstract representations naturally arise in all these scenarios. Together, these results provide an explanation for the widely observed abstract representations in both the brain and artificial neural networks, as well as a mathematically tractable toolkit for understanding the emergence of different kinds of representations in task-optimized, feature-learning network models.
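A minimal illustrative sketch of the setting described above (not the paper's code): a two-layer ReLU network is trained on a task whose targets depend directly on two binary latent variables, and the hidden layer is then probed for an abstract representation via cross-condition generalization of a linear decoder. All dimensions, the learning rate, the input-embedding construction, and the decoder-based probe are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Task: two binary latent variables define four conditions ------------------
latents = np.array([[a, b] for a in (0, 1) for b in (0, 1)], dtype=float)  # (4, 2)
D_in, N_hidden, n_per_cond = 20, 200, 50

# Each condition gets a random input embedding; samples are noisy copies of it.
cond_embed = rng.normal(size=(4, D_in))
X = np.repeat(cond_embed, n_per_cond, axis=0) + 0.1 * rng.normal(size=(4 * n_per_cond, D_in))
Y = np.repeat(latents, n_per_cond, axis=0)  # task: report both latent variables

# --- Two-layer ReLU network trained with full-batch gradient descent on MSE ----
W1 = rng.normal(size=(D_in, N_hidden)) / np.sqrt(D_in)
b1 = np.zeros(N_hidden)
W2 = rng.normal(size=(N_hidden, 2)) / np.sqrt(N_hidden)
b2 = np.zeros(2)
lr = 0.05

for step in range(5000):
    pre = X @ W1 + b1                  # hidden-layer preactivations
    H = np.maximum(pre, 0.0)           # ReLU activations
    Y_hat = H @ W2 + b2
    err = Y_hat - Y                    # gradient of squared error
    gW2 = H.T @ err / len(X)
    gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (pre > 0)      # backpropagate through the ReLU
    gW1 = X.T @ dH / len(X)
    gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# --- Probe: cross-condition generalization of a linear decoder -----------------
H = np.maximum(X @ W1 + b1, 0.0)

def ccgp(H, labels, split):
    """Train a least-squares linear decoder on conditions where `split` takes one
    value and test it on the held-out conditions (averaged over both directions).
    High held-out accuracy indicates an abstract (disentangled) representation."""
    accs = []
    for train_val in (0, 1):
        tr = split == train_val
        te = ~tr
        w, *_ = np.linalg.lstsq(H[tr], 2 * labels[tr] - 1, rcond=None)
        accs.append(np.mean((H[te] @ w > 0) == labels[te].astype(bool)))
    return float(np.mean(accs))

a_label, b_label = Y[:, 0], Y[:, 1]
print("CCGP for variable a (split by b):", ccgp(H, a_label, b_label.astype(int)))
print("CCGP for variable b (split by a):", ccgp(H, b_label, a_label.astype(int)))
```

In this toy setup, near-ceiling cross-condition generalization for both variables would correspond to the kind of abstract, approximately orthogonal hidden-layer geometry the abstract refers to; the probe itself is one common operationalization of abstraction, not the paper's analytical framework.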