We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization. We prove that SGD produces neural networks whose classification accuracy is competitive with that of the best halfspace over the distribution, for a broad class of distributions that includes log-concave isotropic and hard-margin distributions. Equivalently, such networks can generalize when the data distribution is linearly separable but corrupted with adversarial label noise, despite their capacity to overfit. We conduct experiments which suggest that for some distributions our generalization bounds are nearly tight. This is the first result showing that overparameterized neural networks trained by SGD can generalize when the data is corrupted with adversarial label noise.
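Below is a minimal NumPy sketch of the setting the abstract describes, not the authors' code: a one-hidden-layer leaky ReLU network of a chosen width is trained by one pass of online SGD on linearly separable Gaussian (log-concave isotropic) data whose labels have been corrupted with adversarial label noise, and its test error is compared to that of the best halfspace. The width, learning rate, noise level, and the particular adversarial strategy (flipping the labels of points nearest the true boundary) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, n_train, n_test = 20, 1000, 5000, 20000
opt = 0.05      # fraction of labels the adversary may flip (OPT); assumed value
alpha = 0.1     # leaky ReLU slope; assumed value
lr = 0.01       # SGD step size; assumed value

# Linearly separable data: standard Gaussian inputs, labels from a fixed halfspace.
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

def sample(n):
    X = rng.normal(size=(n, d))
    return X, np.sign(X @ w_star)

X_train, y_train = sample(n_train)
X_test, y_test = sample(n_test)

# Adversarial label noise: flip an OPT fraction of training labels.
# Here the "adversary" flips the points closest to the true decision boundary.
k = int(opt * n_train)
flip = np.argsort(np.abs(X_train @ w_star))[:k]
y_train[flip] *= -1

# One-hidden-layer leaky ReLU network: f(x) = a^T leaky_relu(W x),
# with the second layer fixed and only the first layer trained.
W = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)

def leaky(z):
    return np.where(z > 0, z, alpha * z)

def dleaky(z):
    return np.where(z > 0, 1.0, alpha)

# One pass of online SGD on the logistic loss.
for i in rng.permutation(n_train):
    x, y = X_train[i], y_train[i]
    z = W @ x
    f = a @ leaky(z)
    g = -y / (1.0 + np.exp(y * f))            # d loss / d f for logistic loss
    W -= lr * g * (a * dleaky(z))[:, None] * x[None, :]

net_err = np.mean(np.sign(a @ leaky(W @ X_test.T)) != y_test)
halfspace_err = np.mean(np.sign(X_test @ w_star) != y_test)  # best halfspace on clean test data
print(f"network test error: {net_err:.3f}, "
      f"best-halfspace test error: {halfspace_err:.3f}, OPT: {opt}")
```

In this sketch the best halfspace still classifies the clean test distribution perfectly, so a network whose test error stays close to OPT is "competitive with the best halfspace" in the sense of the abstract, even though its width would allow it to fit the flipped training labels.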