Despite their many appealing properties, kernel methods are heavily affected by the curse of dimensionality. For instance, in the case of inner product kernels in $\mathbb{R}^d$, the Reproducing Kernel Hilbert Space (RKHS) norm is often very large for functions that depend strongly on a small subset of directions (ridge functions). Correspondingly, such functions are difficult to learn using kernel methods. This observation has motivated the study of generalizations of kernel methods, whereby the RKHS norm -- which is equivalent to a weighted $\ell_2$ norm -- is replaced by a weighted functional $\ell_p$ norm, which we refer to as the $\mathcal{F}_p$ norm. Unfortunately, the tractability of these approaches is unclear. The kernel trick is not available, and minimizing these norms requires solving an infinite-dimensional convex problem. We study random features approximations to these norms and show that, for $p>1$, the number of random features required to approximate the original learning problem is upper bounded by a polynomial in the sample size. Hence, learning with $\mathcal{F}_p$ norms is tractable in these cases. We introduce a proof technique based on uniform concentration in the dual, which can be of broader interest in the study of overparametrized models. For $p=1$, our guarantees for the random features approximation break down. We prove instead that learning with the $\mathcal{F}_1$ norm is $\mathsf{NP}$-hard under a randomized reduction based on the problem of learning halfspaces with noise.
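For concreteness, the display below sketches one common way such norms and their random features approximations are set up; the particular featurization $\sigma(\langle w, x\rangle)$, base measure $\tau$, and normalization are illustrative assumptions rather than the precise conventions adopted in the body of the paper.
\[
f(x) = \int a(w)\,\sigma(\langle w, x\rangle)\,\tau(\mathrm{d}w),
\qquad
\|f\|_{\mathcal{F}_p} = \inf\Big\{\, \|a\|_{L^p(\tau)} \,:\, f(\,\cdot\,) = \textstyle\int a(w)\,\sigma(\langle w, \cdot\,\rangle)\,\tau(\mathrm{d}w) \Big\},
\]
so that $p=2$ recovers the RKHS norm associated with the kernel $K(x,x') = \int \sigma(\langle w, x\rangle)\,\sigma(\langle w, x'\rangle)\,\tau(\mathrm{d}w)$. A random features approximation draws $w_1,\dots,w_N \sim_{\mathrm{iid}} \tau$ and replaces the integral representation by a finite sum,
\[
\hat f(x) = \frac{1}{N}\sum_{i=1}^{N} a_i\,\sigma(\langle w_i, x\rangle),
\qquad
\text{with penalty}\quad \Big(\frac{1}{N}\sum_{i=1}^{N} |a_i|^p\Big)^{1/p},
\]
so that learning reduces to a finite-dimensional convex problem over the coefficients $(a_1,\dots,a_N)$.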