We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias), and study the effect of symmetries on the learned parameters and predictors. We first describe a general class of symmetries which, when satisfied by the target function $f^*$ and the input distribution, are preserved by the dynamics. We then study more specific cases. When $f^*$ is odd, we show that the dynamics of the predictor reduces to that of a (non-linearly parameterized) linear predictor, and its exponential convergence can be guaranteed. When $f^*$ has a low-dimensional structure, we prove that the gradient flow PDE reduces to a lower-dimensional PDE. Furthermore, we present informal and numerical arguments that suggest that the input neurons align with the lower-dimensional structure of the problem.
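As a rough illustration of the symmetry-preservation claim, the following is a minimal sketch that discretizes the gradient flow by plain gradient descent on an empirical proxy of the population risk, for a finite-width mean-field two-layer ReLU network without bias. The odd target $f^*(x) = x_1$, the Gaussian input sample (symmetrized by hand), and all hyperparameters (`d`, `m`, `n`, `lr`, `steps`) are illustrative choices, not values from the text; with the paired initialization below, the trained predictor should remain numerically odd, consistent with the reduction described above.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n, lr, steps = 5, 512, 2048, 0.1, 500

# Symmetrized initialization: pair every neuron (a_j, w_j) with (-a_j, -w_j),
# which makes the predictor exactly odd at initialization.
W_half = rng.normal(size=(m // 2, d))
a_half = rng.normal(size=m // 2)
W = np.concatenate([W_half, -W_half])      # hidden-layer weights, (m, d)
a = np.concatenate([a_half, -a_half])      # output weights, (m,)

# Symmetric empirical input sample (each point paired with its negation)
# and an odd target f*(x) = x_1, so the odd symmetry is exact for this sample.
X_half = rng.normal(size=(n // 2, d))
X = np.concatenate([X_half, -X_half])
y = X[:, 0]

def predict(X, W, a):
    """Mean-field two-layer ReLU net without bias: (1/m) sum_j a_j relu(w_j . x)."""
    return np.maximum(X @ W.T, 0.0) @ a / m

for _ in range(steps):
    pre = X @ W.T                          # pre-activations, (n, m)
    act = np.maximum(pre, 0.0)
    r = act @ a / m - y                    # residuals of the square loss
    # Gradients of the empirical risk 0.5 * mean(r^2) w.r.t. a and W.
    grad_a = act.T @ r / (n * m)
    grad_W = ((r[:, None] * (pre > 0.0)) * a).T @ X / (n * m)
    a -= lr * m * grad_a                   # factor m: mean-field time rescaling
    W -= lr * m * grad_W

risk = 0.5 * np.mean((predict(X, W, a) - y) ** 2)
odd_gap = np.max(np.abs(predict(X, W, a) + predict(-X, W, a)))
print(f"final risk = {risk:.4e}, max |h(x) + h(-x)| = {odd_gap:.2e}")
```

Because both the neuron pairing and the sample are exactly symmetric under $x \mapsto -x$, gradient descent preserves the pairing up to floating-point error; this is the finite-width, discrete-time analogue of the symmetry preservation stated above for the gradient flow.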