Low-rank approximation of kernels is a fundamental mathematical problem with widespread algorithmic applications. Often the kernel is restricted to an algebraic variety, e.g., in problems involving sparse or low-rank data. We show that significantly better approximations are obtainable in this setting: the rank required to achieve a given error depends on the variety's dimension rather than the ambient dimension, which is typically much larger. This is true in both high-precision and high-dimensional regimes. Our results are presented for smooth isotropic kernels, the predominant class of kernels used in applications. Our main technical insight is to approximate smooth kernels by polynomial kernels, and leverage two key properties of polynomial kernels that hold when they are restricted to a variety. First, their ranks decrease exponentially in the variety's co-dimension. Second, their maximum values are governed by their values over a small set of points. Together, our results provide a general approach for exploiting (approximate) "algebraic structure" in datasets in order to efficiently solve large-scale data science problems.
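The central claim above can be illustrated numerically. The sketch below (not from the paper; the kernel choice, bandwidth, and moment-curve variety are our own assumptions for illustration) compares the numerical rank of a Gaussian kernel matrix on points confined to a one-dimensional curve in R^10 against points filling the ambient space; the variety-restricted data yields a markedly lower rank.

```python
import numpy as np

def numerical_rank(K, tol=1e-6):
    """Count singular values above tol times the largest one."""
    s = np.linalg.svd(K, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def gaussian_kernel(X):
    """Smooth isotropic kernel exp(-||x - y||^2) on the rows of X."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq)

rng = np.random.default_rng(0)
n, d = 200, 10

# Points on a 1-dimensional variety: the moment curve t -> (t, t^2, ..., t^d).
t = rng.uniform(-1, 1, n)
X_curve = np.stack([t ** k for k in range(1, d + 1)], axis=1)

# Points spread over the full ambient space R^d.
X_ambient = rng.uniform(-1, 1, (n, d))

r_curve = numerical_rank(gaussian_kernel(X_curve))
r_ambient = numerical_rank(gaussian_kernel(X_ambient))
print(f"rank on curve: {r_curve}, rank in ambient space: {r_ambient}")
```

On such inputs the curve-restricted kernel matrix is compressible to far fewer terms than the ambient one, which is the qualitative behavior the abstract asserts: the achievable rank tracks the variety's dimension (here 1) rather than the ambient dimension (here 10).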