The loss surfaces of deep neural networks have been the subject of several studies, theoretical and experimental, over the last few years. One strand of work considers the complexity, in the sense of local optima, of high-dimensional random functions with the aim of informing how local optimisation methods may perform in such complicated settings. Prior work of Choromanska et al. (2015) established a direct link between the training loss surfaces of deep multi-layer perceptron networks and spherical multi-spin glass models under some very strong assumptions on the network and its data. In this work, we test the validity of this approach by removing the undesirable restriction to ReLU activation functions. In doing so, we chart a new path through the spin glass complexity calculations using supersymmetric methods in Random Matrix Theory, which may prove useful in other contexts. Our results shed new light on both the strengths and the weaknesses of spin glass models in this context.