Empirical observation of high-dimensional phenomena, such as the double descent behaviour, has attracted a lot of interest in understanding classical techniques such as kernel methods, and their implications for explaining the generalization properties of neural networks. Many recent works analyze such models in a certain high-dimensional regime where the covariates are independent and the number of samples and the number of covariates grow at a fixed ratio (i.e., proportional asymptotics). In this work, we show that for a large class of kernels, including the neural tangent kernel of fully connected networks, kernel methods can only perform as well as linear models in this regime. More surprisingly, when the data is generated by a kernel model, where the relationship between the input and the response could be very nonlinear, we show that linear models are in fact optimal, i.e., linear models achieve the minimum risk among all models, linear or nonlinear. These results suggest that data models more complex than independent features are needed for high-dimensional analysis.
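As a rough illustration (not part of the paper), the following minimal simulation sketches the regime the abstract describes: independent Gaussian covariates with the ratio n/d held fixed, a nonlinear response, and a comparison of kernel ridge regression (here with a Gaussian/RBF kernel, one representative of the kernel classes studied in this line of work) against plain linear ridge regression. All sizes, the kernel bandwidth, and the regularization strength are illustrative assumptions.

```python
import numpy as np
from numpy.linalg import solve

rng = np.random.default_rng(0)

# Proportional asymptotics: samples n and covariates d grow at a fixed ratio.
n, d = 800, 400              # n/d = 2 held fixed (illustrative sizes)
lam = 1e-1                   # ridge regularization strength (assumed)

# Independent covariates and a nonlinear "teacher" relating input to response.
X = rng.standard_normal((n, d)) / np.sqrt(d)
Xte = rng.standard_normal((n, d)) / np.sqrt(d)
beta = rng.standard_normal(d)
noise = 0.1
y = np.tanh(X @ beta) + noise * rng.standard_normal(n)
yte = np.tanh(Xte @ beta) + noise * rng.standard_normal(n)

def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

# Kernel ridge regression: f(x) = k(x, X) (K + lam I)^{-1} y.
K = rbf(X, X)
alpha = solve(K + lam * np.eye(n), y)
krr_pred = rbf(Xte, X) @ alpha

# Linear ridge regression baseline: w = (X^T X + lam I)^{-1} X^T y.
w = solve(X.T @ X + lam * np.eye(d), X.T @ y)
lin_pred = Xte @ w

mse = lambda p: float(np.mean((p - yte) ** 2))
print(f"kernel ridge test MSE: {mse(krr_pred):.4f}")
print(f"linear ridge test MSE: {mse(lin_pred):.4f}")
```

Under the abstract's claim, the two test errors should remain close as n and d are scaled up together at the same ratio; at fixed finite sizes and without tuning the regularization, the match is only approximate.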