A Study of Neural Training with Iterative Non-Gradient Methods

From arXiv, 34 pages. In version 4, we have given experimental demonstrations of the two main neural training algorithms presented in this paper in Sections 3 and 4.

In this work, we demonstrate provable guarantees on the training of depth-$2$ neural networks in regimes not previously explored. (1) First, we give a simple stochastic algorithm that can train a $\rm ReLU$ gate in the realizable setting in linear time while using significantly milder conditions on the data distribution than previous results. Leveraging some additional distributional assumptions, we also show approximate recovery of the true label-generating parameters when training a $\rm ReLU$ gate while a probabilistic adversary is allowed to corrupt the true labels of the training data. Our guarantee on recovering the true weight degrades gracefully with increasing probability of attack and is nearly optimal in the worst case. Additionally, our analysis allows for mini-batching and computes how the convergence time scales with the mini-batch size. (2) Second, we focus on the question of provable interpolation of arbitrary data by finitely large neural nets. We exhibit a non-gradient iterative algorithm, "${\rm Neuro{-}Tron}$", which gives a first-of-its-kind poly-time approximate solution of a neural regression problem (here in the $\ell_\infty$-norm) at finite net widths and for non-realizable data.
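
To make the flavour of point (1) concrete, here is a minimal sketch of a stochastic, non-gradient update for a single $\rm ReLU$ gate on realizable data. The abstract does not state the update rule, so the label-gated step used here (update only when the observed label is positive), the Gaussian inputs, the step size `eta`, and the function names are all illustrative assumptions and should not be read as the paper's exact algorithm.

```python
import random


def relu(z):
    """ReLU activation for a scalar."""
    return z if z > 0.0 else 0.0


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def l2_distance(u, v):
    """Euclidean distance between two equal-length lists."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def train_relu_gate(w_star, steps=3000, eta=0.05, seed=0):
    """Stream samples (x, y) with y = ReLU(<w_star, x>) and run a
    label-gated stochastic update (an assumed, illustrative rule).

    The step is not the gradient of the squared loss: the gate
    depends on the observed label y, not on the current iterate w.
    """
    rng = random.Random(seed)
    d = len(w_star)
    w = [0.0] * d  # start from the origin
    for _ in range(steps):
        # Assumption: standard Gaussian inputs, one fresh sample per step.
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = relu(dot(w_star, x))  # realizable label
        if y > 0.0:
            # When y > 0 we have y = <w_star, x>, so this is a
            # linear-regression-style correction on the active samples.
            residual = y - dot(w, x)
            w = [wi + eta * residual * xi for wi, xi in zip(w, x)]
    return w


if __name__ == "__main__":
    w_true = [1.0, -2.0, 0.5, 0.0, 3.0]  # hypothetical ground-truth weights
    w_hat = train_relu_gate(w_true)
    print("parameter error:", l2_distance(w_hat, w_true))
```

Each step touches one sample and costs time linear in the dimension, which is the sense in which a single pass of this kind is cheap. Under this assumed rule, the error $w - w_*$ shrinks on every active sample whenever `eta` times the squared norm of $x$ is below $2$, which is why a small constant step size suffices in this toy setting.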
