Despite their many appealing properties, kernel methods are heavily affected by the curse of dimensionality. For instance, in the case of inner product kernels in $\mathbb{R}^d$, the Reproducing Kernel Hilbert Space (RKHS) norm is often very large for functions that depend strongly on a small subset of directions (ridge functions). Correspondingly, such functions are difficult to learn using kernel methods. This observation has motivated the study of generalizations of kernel methods, whereby the RKHS norm -- which is equivalent to a weighted $\ell_2$ norm -- is replaced by a weighted functional $\ell_p$ norm, which we refer to as the $\mathcal{F}_p$ norm. Unfortunately, the tractability of these approaches is unclear: the kernel trick is not available, and minimizing these norms requires solving an infinite-dimensional convex problem. We study random feature approximations to these norms and show that, for $p>1$, the number of random features required to approximate the original learning problem is upper bounded by a polynomial in the sample size. Hence, learning with $\mathcal{F}_p$ norms is tractable in these cases. We introduce a proof technique based on uniform concentration in the dual, which may be of broader interest in the study of overparametrized models.
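For concreteness, a common way such norms are formalized in the random features literature is the following sketch; the notation here (feature map $\sigma(w;x)$, base measure $\tau$) is illustrative and may differ from the exact setup used in the paper. Given a feature map $\sigma(w;\cdot)$ and a probability measure $\tau$ on weights, one may set
$$ \|f\|_{\mathcal{F}_p} \;=\; \inf\Big\{ \|a\|_{L^p(\tau)} \;:\; f(x) = \int a(w)\,\sigma(w;x)\,\mathrm{d}\tau(w) \Big\}, \qquad p \ge 1, $$
so that $p=2$ recovers the RKHS norm of the kernel $K(x,x') = \int \sigma(w;x)\,\sigma(w;x')\,\mathrm{d}\tau(w)$. A random feature approximation then draws $w_1,\dots,w_N \sim \tau$ i.i.d. and fits
$$ \hat f(x) \;=\; \frac{1}{N}\sum_{i=1}^{N} a_i\,\sigma(w_i;x), \qquad \text{penalizing} \quad \Big(\frac{1}{N}\sum_{i=1}^{N} |a_i|^p\Big)^{1/p}, $$
which reduces the infinite-dimensional convex problem to a finite-dimensional one in the coefficients $(a_1,\dots,a_N)$.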