We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behavior of RFRR with nearly orthogonal deterministic unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. Our analysis establishes high-probability non-asymptotic concentration results showing that the training errors, cross-validation errors, and generalization errors of RFRR concentrate around their respective values for a kernel ridge regression (KRR). This KRR is defined by the expected kernel generated by the nonlinear random feature map. We then approximate the performance of this KRR by a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree depends only on the orthogonality among the data points. This polynomial kernel determines the asymptotic behavior of both RFRR and KRR. Our results hold for a wide variety of activation functions and input datasets that exhibit nearly orthogonal properties. Based on these approximations, we obtain a lower bound for the generalization error of RFRR under a nonlinear student-teacher model.
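For concreteness, a standard formulation of the objects described above reads as follows. This is a sketch in generic notation, not the paper's own statement: the width normalization $1/\sqrt{N}$, the standard Gaussian weight distribution, and the normalized Hermite basis are our assumptions, and the paper's exact conventions may differ.

% RFRR predictor: \sigma is the activation, W \in \mathbb{R}^{d \times N} has
% i.i.d. standard Gaussian entries, N is the first-layer width (N \gg n in the
% overparameterized regime), and \lambda > 0 is the ridge parameter.
\[
\hat f_{\mathrm{RF}}(x) \;=\; \frac{1}{\sqrt{N}}\,\sigma\!\left(W^\top x\right)^{\!\top} \hat a,
\qquad
\hat a \;=\; \operatorname*{arg\,min}_{a \in \mathbb{R}^{N}}\;
\frac{1}{n}\sum_{i=1}^{n}\Bigl(y_i - \tfrac{1}{\sqrt{N}}\,\sigma\!\left(W^\top x_i\right)^{\!\top} a\Bigr)^{2}
+ \lambda\,\lVert a \rVert_2^2 .
\]
% Expected kernel of the random feature map, which defines the comparison KRR:
\[
K(x, x') \;=\; \mathbb{E}_{w \sim \mathcal{N}(0, I_d)}\bigl[\sigma(w^\top x)\,\sigma(w^\top x')\bigr].
\]
% Hermite expansion: writing \sigma(t) = \sum_{k \ge 0} \zeta_k h_k(t) in the
% normalized Hermite basis, unit-length inputs \|x\| = \|x'\| = 1 give
\[
K(x, x') \;=\; \sum_{k \ge 0} \zeta_k^2\,(x^\top x')^{k}.
\]

For nearly orthogonal data the inner products $x_i^\top x_j$ ($i \neq j$) are small, so the series is dominated by its low-degree terms; truncating it at a finite degree yields the polynomial kernel referred to in the abstract, and the truncation degree depends only on how small those pairwise inner products are.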