We describe the convex semi-infinite dual of the two-layer vector-output ReLU neural network training problem. This semi-infinite dual admits a finite-dimensional representation, but its support is over a convex set that is difficult to characterize. In particular, we demonstrate that the non-convex neural network training problem is equivalent to a finite-dimensional convex copositive program. Our work is the first to identify this strong connection between the global optima of neural networks and those of copositive programs. We thus demonstrate how neural networks implicitly attempt to solve copositive programs via semi-nonnegative matrix factorization, and draw key insights from this formulation. We describe the first algorithms for provably finding the global minimum of the vector-output neural network training problem, which are polynomial in the number of samples for a fixed data rank, yet exponential in the dimension. However, in the case of convolutional architectures, the computational complexity is exponential only in the filter size and polynomial in all other parameters. We describe the circumstances in which the global optimum of this neural network training problem can be found exactly with soft-thresholded SVD, and we provide a copositive relaxation which is guaranteed to be exact for certain classes of problems, and which corresponds to the solution found by Stochastic Gradient Descent in practice.
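For reference, the soft-thresholded SVD mentioned above can be read as the standard singular-value soft-thresholding operator; the notation here is illustrative and not fixed by this abstract. For a matrix $A$ with singular value decomposition $A = U \Sigma V^\top$, $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r)$, and a threshold $\lambda \ge 0$, the operator is
\[
\mathcal{S}_\lambda(A) \;=\; U \,\mathrm{diag}\big( (\sigma_1 - \lambda)_+, \, \dots, \, (\sigma_r - \lambda)_+ \big)\, V^\top,
\]
which shrinks each singular value toward zero by $\lambda$ and discards those that fall below the threshold.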