Can a neural network trained to minimize cross-entropy learn linearly separable data? Despite progress in the theory of deep learning, this question remains unsolved. Here we prove that SGD globally optimizes this learning problem for a two-layer network with Leaky ReLU activations. The learned network can in principle be very complex. However, empirical evidence suggests that it often turns out to be approximately linear. We provide theoretical support for this phenomenon by proving that if the network weights converge to two weight clusters, the resulting decision boundary is approximately linear. Finally, we show a condition on the optimization that leads to weight clustering. We provide empirical results that validate our theoretical analysis.
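The following is a minimal sketch, not the paper's code, of the setting the abstract describes: a two-layer Leaky ReLU network trained with SGD on the logistic (cross-entropy) loss over linearly separable data, followed by a check of whether the first-layer weight directions cluster into two groups. The fixed ±1 second layer, 2-D data, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Linearly separable 2-D data: the label is the sign of the first coordinate.
n, d, width = 200, 2, 50
X = torch.randn(n, d)
y = torch.sign(X[:, 0])                      # labels in {-1, +1}

# Two-layer Leaky ReLU network; the second layer is fixed to +/-1 here only
# to keep the sketch short (an assumed simplification, not the paper's setup).
W = 0.1 * torch.randn(width, d)
W.requires_grad_(True)
v = torch.cat([torch.ones(width // 2), -torch.ones(width // 2)])

def forward(X):
    return F.leaky_relu(X @ W.t(), negative_slope=0.1) @ v

opt = torch.optim.SGD([W], lr=0.1)
for step in range(2000):
    # Logistic loss log(1 + exp(-y f(x))), i.e. binary cross-entropy with +/-1 labels.
    loss = F.softplus(-y * forward(X)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inspect clustering of first-layer weight directions: if the weights converge
# to two clusters (one per second-layer sign), the decision boundary the
# network implements is approximately linear.
dirs = F.normalize(W.detach(), dim=1)
for sign in (+1.0, -1.0):
    group = dirs[v == sign]
    mean_dir = F.normalize(group.mean(dim=0), dim=0)
    spread = (group @ mean_dir).min().item()   # worst cosine similarity to the cluster mean
    print(f"second-layer sign {sign:+.0f}: min cosine to cluster mean = {spread:.3f}")
```

Cosine similarities near 1 within each group would indicate the two-cluster structure that, by the result stated above, yields an approximately linear decision boundary.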