Artificial neural networks are functions depending on a finite number of parameters, typically encoded as weights and biases. The identification of the parameters of the network from finite samples of input-output pairs is often referred to as the \emph{teacher-student model}, and this model has become a popular framework for understanding training and generalization. Even though the problem is NP-complete in the worst case, a rapidly growing literature has established, under suitable distributional assumptions, finite sample identification of two-layer networks with a number of neurons $m=\mathcal O(D)$, $D$ being the input dimension. For the range $D<m<D^2$, the problem becomes harder, and very little is known for networks that are also parametrized by biases. This paper fills the gap by providing constructive methods and theoretical guarantees of finite sample identification for such wider shallow networks with biases. Our approach is based on a two-step pipeline: first, we recover the directions of the weights by exploiting second order information; next, we identify the signs by suitable algebraic evaluations, and we recover the biases by empirical risk minimization via gradient descent. Numerical results demonstrate the effectiveness of our approach.
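To make the pipeline concrete, below is a minimal NumPy sketch of the two steps under simplifying assumptions that depart from the paper: the teacher's Hessians are computed analytically rather than estimated from finite samples, the directions are extracted from the Hessian span by a power-type iteration (in the spirit of the subspace power method, not necessarily the paper's exact procedure), and the sign identification is folded into the final empirical risk minimization instead of being done by separate algebraic evaluations. All names (`f_teacher`, `extract_direction`, the hyperparameters) are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, m = 8, 16                                    # input dimension, neurons (D < m < D^2)
W = rng.standard_normal((m, D))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit-norm teacher weights w_i
b = 0.3 * rng.standard_normal(m)                # teacher biases b_i

def d2tanh(t):                                  # second derivative of tanh
    return -2.0 * np.tanh(t) * (1.0 - np.tanh(t) ** 2)

def f_teacher(X):                               # f(x) = sum_i tanh(w_i . x + b_i)
    return np.tanh(X @ W.T + b).sum(axis=1)

def hessian(x):                                 # exact Hessian: sum_i tanh''(t_i) w_i w_i^T
    t = W @ x + b
    return (W.T * d2tanh(t)) @ W

# Step 1: recover weight directions from second order information.
# Hessians at generic points lie in span{w_i w_i^T}; extract near-rank-one
# elements of that span by a projected power-type iteration.
H = np.stack([hessian(rng.standard_normal(D)).ravel() for _ in range(3 * m)])
basis = np.linalg.svd(H, full_matrices=False)[2][:m]    # orthonormal basis of the span

def extract_direction(iters=200):
    v = rng.standard_normal(D)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        M = (basis.T @ (basis @ np.outer(v, v).ravel())).reshape(D, D)
        M = 0.5 * (M + M.T)                     # project vv^T onto the span, symmetrize
        vals, vecs = np.linalg.eigh(M)
        v = vecs[:, np.argmax(np.abs(vals))]    # top eigenvector of the projection
    return v

dirs = []                                       # random restarts, dedupe up to sign;
for _ in range(20 * m):                         # restarts are not guaranteed to find all w_i
    v = extract_direction()
    if all(abs(v @ u) < 0.99 for u in dirs):
        dirs.append(v)
dirs = np.stack(dirs[:m])                       # estimates of +/- w_i

# Step 2: with directions frozen, fit a signed scale a_i and bias c_i per
# neuron by gradient descent on the empirical squared loss; the sign of a_i
# resolves the +/- ambiguity of each direction.
X = rng.standard_normal((2000, D))
y = f_teacher(X)
a = np.ones(len(dirs))
c = np.zeros(len(dirs))
lr = 0.05                                       # hyperparameters untuned
for _ in range(5000):
    P = np.tanh(X @ dirs.T + c)
    r = P @ a - y                               # residuals of the student
    a -= lr * (P.T @ r) / len(y)
    c -= lr * a * ((1.0 - P ** 2).T @ r) / len(y)

match = np.max(np.abs(dirs @ W.T), axis=1)      # |cosine| to the best-matching teacher weight
print("worst direction alignment:", match.min())
print("final training MSE:", np.mean(r ** 2))
```

The split mirrors the abstract's pipeline: the nonconvex part of the problem is confined to Step 1's rank-one extraction, so the remaining fit in Step 2 only runs over $2m$ scalar parameters and is comparatively benign, though this sketch carries no guarantee of recovering every neuron.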