We develop a convex analytic approach to analyze finite-width two-layer ReLU networks. We first prove that optimal solutions of the regularized training problem can be characterized as extreme points of a convex set, whose convex geometric properties encourage simple solutions. We then leverage this characterization to show that an optimal set of parameters yields linear spline interpolation for regression problems involving one-dimensional or rank-one data. We also characterize the classification decision regions in terms of a kernel matrix and minimum $\ell_1$-norm solutions. This is in contrast to the Neural Tangent Kernel, which is unable to explain the predictions of finite-width networks. Our convex geometric characterization also provides intuitive explanations of hidden neurons as auto-encoders. In higher dimensions, we show that the training problem can be cast as a finite-dimensional convex problem with infinitely many constraints. We then apply certain convex relaxations and introduce a cutting-plane algorithm to globally optimize the network. We further analyze the exactness of the relaxations to provide conditions for convergence to a global optimum. Our analysis also shows that optimal network parameters can be characterized by interpretable closed-form formulas in some practically relevant special cases.
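For concreteness, the following display is a minimal sketch of the regularized training problem referred to above, assuming squared loss and weight-decay regularization for a two-layer ReLU network with $m$ hidden neurons; the symbols $\{w_j, v_j\}$, $\beta$, and the data $\{(x_i, y_i)\}_{i=1}^n$ are illustrative notation rather than taken from the text:
\[
\min_{\{w_j, v_j\}_{j=1}^m} \; \frac{1}{2}\sum_{i=1}^{n}\Big(\sum_{j=1}^{m} v_j\,(w_j^\top x_i)_+ \,-\, y_i\Big)^2 \;+\; \frac{\beta}{2}\sum_{j=1}^{m}\big(\|w_j\|_2^2 + v_j^2\big),
\]
where $(u)_+ = \max(u,0)$ denotes the ReLU activation and $\beta > 0$ is the regularization parameter. The claims above concern the extreme points of a convex set that characterize optimal solutions of this non-convex problem.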