Flat Channels to Infinity in Neural Loss Landscapes

The loss landscapes of neural networks contain minima and saddle points that may be connected in flat regions or appear in isolation. We identify and characterize a special structure in the loss landscape: channels along which the loss decreases extremely slowly, while the output weights of at least two neurons, $a_i$ and $a_j$, diverge to $\pm\infty$, and their input weight vectors, $\mathbf{w}_i$ and $\mathbf{w}_j$, become equal to each other. At convergence, the two neurons implement a gated linear unit: $a_i \sigma(\mathbf{w}_i \cdot \mathbf{x}) + a_j \sigma(\mathbf{w}_j \cdot \mathbf{x}) \rightarrow \sigma(\mathbf{w} \cdot \mathbf{x}) + (\mathbf{v} \cdot \mathbf{x}) \sigma'(\mathbf{w} \cdot \mathbf{x})$. Geometrically, these channels to infinity are asymptotically parallel to symmetry-induced lines of critical points. Gradient flow solvers, and related optimization methods such as SGD or Adam, reach the channels with high probability in diverse regression settings, but without careful inspection the channels look like flat local minima with finite parameter values. Our characterization provides a comprehensive picture of these quasi-flat regions in terms of gradient dynamics, geometry, and functional interpretation. The emergence of gated linear units at the end of the channels highlights a surprising aspect of the computational capabilities of fully connected layers.
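The gated-linear-unit limit can be checked numerically with a minimal sketch. The path below is an assumption chosen for illustration, not necessarily the trajectory that gradient flow follows: with a small parameter `eps`, set $\mathbf{w}_i = \mathbf{w} + \epsilon \mathbf{v}$, $\mathbf{w}_j = \mathbf{w} - \epsilon \mathbf{v}$, $a_i = \tfrac{1}{2} + \tfrac{1}{2\epsilon}$ and $a_j = \tfrac{1}{2} - \tfrac{1}{2\epsilon}$, so that the output weights diverge with opposite signs while the input weights merge. A Taylor expansion around $\mathbf{w} \cdot \mathbf{x}$ then suggests that the two-neuron sum approaches $\sigma(\mathbf{w} \cdot \mathbf{x}) + (\mathbf{v} \cdot \mathbf{x}) \sigma'(\mathbf{w} \cdot \mathbf{x})$ as $\epsilon \to 0$. The activation (`tanh`), the function names, and the random test data are likewise hypothetical.

```python
import numpy as np


def sigma(z):
    # Assumed smooth activation; the abstract does not fix a particular choice.
    return np.tanh(z)


def sigma_prime(z):
    # Derivative of tanh.
    return 1.0 - np.tanh(z) ** 2


def two_neuron_sum(x, w, v, eps):
    """a_i * sigma(w_i . x) + a_j * sigma(w_j . x) at one point of the assumed path."""
    a_i = 0.5 + 0.5 / eps   # diverges to +infinity as eps -> 0
    a_j = 0.5 - 0.5 / eps   # diverges to -infinity as eps -> 0
    w_i = w + eps * v       # both input weight vectors approach w
    w_j = w - eps * v
    return a_i * sigma(x @ w_i) + a_j * sigma(x @ w_j)


def gated_linear_unit(x, w, v):
    """Limiting function: sigma(w . x) + (v . x) * sigma'(w . x)."""
    return sigma(x @ w) + (x @ v) * sigma_prime(x @ w)


def max_gap(eps, seed=0, n=256, d=3):
    """Largest absolute difference between the two functions on random inputs."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, d))
    w = rng.normal(size=d)
    v = rng.normal(size=d)
    diff = two_neuron_sum(x, w, v, eps) - gated_linear_unit(x, w, v)
    return float(np.max(np.abs(diff)))


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        a_i = 0.5 + 0.5 / eps
        print(f"eps={eps:g}  a_i={a_i:g}  max gap={max_gap(eps):.3e}")
```

Under these assumptions the gap should shrink as `eps` decreases even though $|a_i|$ and $|a_j|$ grow without bound, which is consistent with a function that barely changes along the channel while the parameters run off to infinity.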
