Understanding the structure of the loss landscape of deep neural networks (DNNs) is fundamentally important. In this work, we prove an embedding principle: the loss landscape of a DNN "contains" all the critical points of all narrower DNNs. More precisely, we propose a critical embedding such that any critical point, e.g., a local or global minimum, of a narrower DNN can be embedded into a critical point/hyperplane of the target DNN with higher degeneracy while preserving the DNN output function. This embedding structure of critical points is independent of the loss function and the training data, in stark contrast to other nonconvex problems such as protein folding. Empirically, we find that a wide DNN is often attracted to highly degenerate critical points embedded from narrow DNNs. The embedding principle offers an explanation for why wide DNNs are generally easy to optimize and reveals a potential implicit low-complexity regularization during training. Overall, our work provides a skeleton for the study of the loss landscape of DNNs and its implications, from which a more exact and comprehensive understanding can be anticipated in the near future.
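To make the notion of an output-preserving embedding concrete, the following minimal NumPy sketch (our own illustration, not the paper's formal construction) splits one hidden neuron of a one-hidden-layer network into two neurons whose output weights sum to the original weight; the wider network then realizes exactly the same output function, which is the output-preserving property the critical embedding requires. The function names (`narrow_net`, `split_embedding`) and the split ratio `beta` are illustrative assumptions; the full critical embedding in the paper additionally guarantees that critical points map to critical points.

```python
import numpy as np

# Sketch of a neuron-splitting embedding for f(x) = a^T tanh(W x):
# duplicate the input weights of neuron j and split its output weight a[j]
# into beta*a[j] and (1-beta)*a[j]. The output function is unchanged.

def narrow_net(x, W, a):
    return a @ np.tanh(W @ x)

def split_embedding(W, a, j, beta=0.3):
    W_wide = np.vstack([W, W[j:j+1]])         # duplicate input weights of neuron j
    a_wide = np.append(a, (1 - beta) * a[j])  # new neuron receives (1-beta)*a[j]
    a_wide[j] = beta * a[j]                   # original neuron keeps beta*a[j]
    return W_wide, a_wide

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # 4 hidden neurons, 3 inputs
a = rng.standard_normal(4)
x = rng.standard_normal(3)

W_wide, a_wide = split_embedding(W, a, j=1)
print(np.allclose(narrow_net(x, W, a), narrow_net(x, W_wide, a_wide)))  # True
```

Because the two split neurons share identical input weights, the wider parameter point lies on a hyperplane of parameters (all valid splits of `a[j]`) realizing the same function, which illustrates the higher degeneracy mentioned above.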