We address the structure identification and the uniform approximation of sums of ridge functions $f(x)=\sum_{i=1}^m g_i(a_i\cdot x)$ on ${\mathbb R}^d$, which represent a general form of a shallow feed-forward neural network, from a small number of query samples. Higher-order differentiation of sums of ridge functions, or of their compositions as in deeper neural networks, is used in our constructive approximations and yields a natural connection between neural network weight identification and tensor product decomposition identification. In the case of the shallowest feed-forward neural network, second-order differentiation and tensors of order two (i.e., matrices) suffice, as we prove in this paper. We use two sampling schemes to perform approximate differentiation: active sampling, where the sampling points are universal and are actively and randomly designed, and passive sampling, where the sampling points are preselected at random from a distribution with known density. Based on multiple approximated first- and second-order differentials, our general approximation strategy is developed as a sequence of algorithms, each performing an individual sub-task. We first perform an active subspace search by approximating the span of the weight vectors $a_1,\dots,a_m$. Then we use a straightforward substitution, which reduces the dimensionality of the problem from $d$ to $m$. The core of the construction is then the stable and efficient approximation of the weights, expressed in terms of rank-$1$ matrices $a_i \otimes a_i$, realized by formulating their individual identification as a suitable nonlinear program. We prove that this program successfully identifies weight vectors that are close to orthonormal, and we show how to reduce constructively to this case by a whitening procedure, without loss of generality.
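The active subspace search mentioned above can be illustrated with a minimal sketch. It rests on the fact that $\nabla f(x)=\sum_{i=1}^m g_i'(a_i\cdot x)\,a_i$ always lies in the span of the weight vectors, so the leading left singular vectors of a matrix of approximate gradients estimate that span. This is not the paper's algorithm: the use of first-order central differences only, the step size `eps`, the number of query points, and the helper names `ridge_sum`, `fd_gradient` and `estimate_span` are all assumptions made for illustration, as are the test profiles $g_i$.

```python
import numpy as np

def ridge_sum(x, A, gs):
    """f(x) = sum_i g_i(a_i . x); the rows of A are the weight vectors a_i."""
    return sum(g(a @ x) for a, g in zip(A, gs))

def fd_gradient(f, x, eps=1e-5):
    """Central finite-difference approximation of the gradient of f at x."""
    d = x.size
    grad = np.empty(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        grad[j] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

def estimate_span(f, d, m, n_points=50, seed=0):
    """Estimate span{a_1, ..., a_m} from approximate gradients at random points."""
    rng = np.random.default_rng(seed)
    # Query points drawn uniformly from the unit sphere (an assumed design).
    X = rng.standard_normal((n_points, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    # Each column is an approximate gradient, i.e. a combination of the a_i.
    G = np.stack([fd_gradient(f, x) for x in X], axis=1)  # shape (d, n_points)
    U, s, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :m], s

# Hypothetical test problem: m orthonormal weights in R^d, arbitrary profiles g_i.
d, m = 10, 3
rng = np.random.default_rng(1)
A = np.linalg.qr(rng.standard_normal((d, m)))[0].T  # shape (m, d), orthonormal rows
gs = [np.tanh, np.sin, lambda t: t**2]
f = lambda x: ridge_sum(x, A, gs)

U, s = estimate_span(f, d, m)
# Distance between the true weights and their projection onto the estimated span.
residual = np.linalg.norm(A.T - U @ (U.T @ A.T))
print("singular values:", s)
print("projection residual:", residual)
```

Once an orthonormal basis `U` of the estimated span is available, the substitution $x = Uy$ with $y\in{\mathbb R}^m$ gives a function of only $m$ variables, which is the dimensionality reduction from $d$ to $m$ described in the abstract.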