Deep neural networks (DNNs) have shown great success in many machine learning tasks. Their training is challenging since the loss surface of the network architecture is generally non-convex, or even non-smooth. How and under what assumptions can convergence to a \textit{global} minimum be guaranteed? We propose a reformulation of the minimization problem that allows for a new recursive algorithmic framework. Under bounded-style assumptions, we prove convergence to an $\varepsilon$-(global) minimum using $\mathcal{\tilde{O}}(1/\varepsilon^3)$ gradient computations. Our theoretical foundation motivates further study, implementation, and optimization of the new algorithmic framework, as well as further investigation of its non-standard bounded-style assumptions. This new direction broadens our understanding of why and under what circumstances training of a DNN converges to a global minimum.