Building a sketch of an n-by-n empirical kernel matrix is a common approach to accelerate the computation of many kernel methods. In this paper, we propose a unified framework of constructing sketching methods in kernel ridge regression (KRR), which views the sketching matrix S as an accumulation of m rescaled sub-sampling matrices with independent columns. Our framework incorporates two commonly used sketching methods, sub-sampling sketches (known as the Nystr\"om method) and sub-Gaussian sketches, as special cases with m=1 and m=infinity respectively. Under the new framework, we provide a unified error analysis of sketching approximation and show that our accumulation scheme improves the low accuracy of sub-sampling sketches when certain incoherence characteristic is high, and accelerates the more accurate but computationally heavier sub-Gaussian sketches. By optimally choosing the number m of accumulations, we show that a best trade-off between computational efficiency and statistical accuracy can be achieved. In practice, the sketching method can be as efficiently implemented as the sub-sampling sketches, as only minor extra matrix additions are needed. Our empirical evaluations also demonstrate that the proposed method may attain the accuracy close to sub-Gaussian sketches, while is as efficient as sub-sampling-based sketches.
翻译:在本文中,我们提出了一个在内核脊回归(KRR)中构建草图方法的统一框架,将草图矩阵S视为以独立柱子进行重新缩放的子抽样矩阵的累积。我们的框架包含两种常用的草图方法,即副抽样草图(称为Nystr\'om法)和亚加西草图,分别作为 m=1和 m=infinity的特例。在新框架下,我们提供了对草图近似的统一错误分析,并表明我们的累积计划在某些相近特征高时,可提高次抽样草图的低准确性,并加速了更准确但计算更重的亚加萨尼草图。通过最佳选择累积量 m,我们表明在计算效率和统计准确性之间可以实现最佳的交换。在新框架下,草图方法可以高效地实施,作为子草图的一部分,我们提出的细图也只能作为子模型的精确性评估。