We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ hidden neurons, where labels are assumed to be generated by a (teacher) neural network. We leverage the rich symmetry exhibited by such models to identify various families of critical points and express them as power series in $k^{-\frac{1}{2}}$. These expressions are then used to derive estimates for several related quantities, which imply that not all spurious minima are alike. In particular, we show that while the loss function at certain types of spurious minima decays to zero like $k^{-1}$, in other cases the loss converges to a strictly positive constant. The methods used depend on symmetry, the geometry of group actions, bifurcation, and Artin's implicit function theorem.
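For concreteness, the following display sketches one standard form of the objective referred to above; the Gaussian input distribution, the squared loss, the input dimension $d$, and the teacher weights $v_1,\dots,v_k$ are assumptions made here purely for illustration and are not specified in the abstract itself.
\[
\mathcal{L}(w_1,\dots,w_k) \;=\; \frac{1}{2}\,\mathbb{E}_{x\sim\mathcal{N}(0,I_d)}\Bigl[\Bigl(\sum_{i=1}^{k}\max(w_i^{\top}x,0)\;-\;\sum_{j=1}^{k}\max(v_j^{\top}x,0)\Bigr)^{2}\Bigr].
\]
In this notation, the claim above is that $\mathcal{L}$ evaluated at certain families of spurious minima is of order $k^{-1}$, whereas at other families it converges to a strictly positive constant as $k\to\infty$.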