We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization. We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution for a broad class of distributions that includes log-concave isotropic and hard margin distributions. Equivalently, such networks can generalize when the data distribution is linearly separable but corrupted with adversarial label noise, despite the capacity to overfit. To the best of our knowledge, this is the first work to show that overparameterized neural networks trained by SGD can generalize when the data is corrupted with adversarial label noise.
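As a concrete illustration of the setting (not taken from the paper), the sketch below trains a one-hidden-layer leaky ReLU network with online SGD on linearly separable data whose labels are partially flipped. All hyperparameters, the Gaussian data model, the random label flips standing in for adversarial noise, and the choice to train only the first layer while fixing the second at initialization are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: data separable by a unit-norm halfspace v_star,
# then a fraction of labels is flipped (illustrative stand-in for adversarial noise).
d, m, n, noise_rate, lr = 20, 512, 5000, 0.1, 0.01
v_star = rng.normal(size=d); v_star /= np.linalg.norm(v_star)
X = rng.normal(size=(n, d))                       # isotropic Gaussian (log-concave isotropic)
y = np.sign(X @ v_star)
flip = rng.random(n) < noise_rate
y_train = np.where(flip, -y, y)

# One-hidden-layer leaky ReLU network f(x) = a^T leaky_relu(W x);
# second layer fixed at initialization (a common simplification, assumed here).
alpha = 0.1                                       # leaky ReLU slope
W = rng.normal(size=(m, d)) / np.sqrt(d)          # arbitrary (here Gaussian) initialization
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def forward(x):
    z = W @ x
    h = np.where(z > 0, z, alpha * z)
    return a @ h, z

# Online SGD on the logistic loss, one fresh (possibly mislabeled) sample per step.
for t in range(n):
    x, yt = X[t], y_train[t]
    out, z = forward(x)
    g_out = -yt / (1.0 + np.exp(yt * out))        # d/d(out) of log(1 + exp(-y*out))
    dz = (g_out * a) * np.where(z > 0, 1.0, alpha)
    W -= lr * np.outer(dz, x)                     # SGD step on first-layer weights

# Clean (noise-free) accuracy of the trained network vs. the best halfspace.
preds = np.sign([forward(x)[0] for x in X])
print("clean accuracy of network :", np.mean(preds == y))
print("accuracy of best halfspace:", np.mean(np.sign(X @ v_star) == y))
```

Under this sketch's assumptions, the quantity of interest is the network's accuracy on the clean labels, which the result says should be competitive with that of the best halfspace despite training on the corrupted labels.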