Despite the vast empirical success of neural networks, theoretical understanding of their training procedures remains limited, especially in providing guarantees on test performance, due to the non-convex nature of the underlying optimization problem. Inspired by a recent work of (Juditsky & Nemirovsky, 2019), instead of the traditional loss-minimization approach, we reduce the training of the network parameters to another problem with convex structure: solving a monotone variational inequality (MVI). The solution to the MVI can be found by computationally efficient procedures, and, importantly, this leads to performance guarantees in the form of $\ell_2$ and $\ell_{\infty}$ bounds on model-recovery accuracy and prediction accuracy in the theoretical setting of training a one-layer linear neural network. In addition, we study the use of MVI for training multi-layer neural networks and propose a practical algorithm called \textit{stochastic variational inequality} (SVI), demonstrating its applicability to training fully-connected neural networks and graph neural networks (GNNs); SVI is completely general and can be used to train other types of neural networks. We demonstrate the competitive or better performance of SVI compared with stochastic gradient descent (SGD) on both synthetic and real network-data prediction tasks, across various performance metrics.
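To make the MVI reduction concrete, the following is a minimal sketch (not the paper's algorithm) of recovering a one-layer linear model by solving the variational inequality associated with the monotone operator $F(w) = \frac{1}{n} X^\top (Xw - y)$ via the classical extragradient method; the data sizes, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: one-layer linear model y = <w_true, x> + noise.
rng = np.random.default_rng(0)
n, d = 500, 10
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.01 * rng.normal(size=n)

def F(w):
    # Monotone operator whose root is the least-squares solution;
    # solving VI(F, R^d) recovers the model parameters.
    return X.T @ (X @ w - y) / n

# Extragradient iteration: extrapolate along F, then update.
w = np.zeros(d)
eta = 0.5  # assumed step size, small enough for this covariance
for _ in range(2000):
    w_half = w - eta * F(w)   # extrapolation step
    w = w - eta * F(w_half)   # update step

recovery_error = np.linalg.norm(w - w_true)
print(recovery_error)  # small, limited only by the observation noise
```

Because $F$ here is the gradient of a convex quadratic, the MVI solution coincides with the least-squares minimizer; the point of the reduction in the paper is that the same convex-structured machinery extends to settings where direct loss minimization is non-convex.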