In supervised learning with kernel methods, one often encounters large-scale finite-sum minimization over a reproducing kernel Hilbert space (RKHS). Large-scale finite-sum problems can be solved using efficient variants of Newton's method, where the Hessian is approximated via sub-samples of data. In an RKHS, however, the dependence of the penalty function on the kernel makes standard sub-sampling approaches inapplicable, since the Gram matrix is not readily available in a low-rank form. In this paper, we observe that for this class of problems one can naturally use kernel approximation to speed up Newton's method. Focusing on randomized features for kernel approximation, we provide a novel second-order algorithm that enjoys local superlinear convergence and global linear convergence (with high probability). We derive a theoretical lower bound on the number of random features required for the approximated Hessian to be close to the true Hessian in norm. Our numerical experiments on real-world data verify the efficiency of our method compared to several benchmarks.
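For intuition only, the following is a minimal sketch of the general idea, not the paper's algorithm or its guarantees: random Fourier features for a Gaussian kernel stand in for the full Gram matrix, so a Newton step for a regularized logistic loss only needs a small feature-space Hessian. The helper names `rff_features` and `newton_step_logistic`, the Gaussian kernel, and the logistic loss are illustrative assumptions.

```python
import numpy as np

def rff_features(X, num_features, bandwidth, rng):
    """Random Fourier features z(x) whose inner products approximate a Gaussian kernel."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / bandwidth, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def newton_step_logistic(Z, y, w, lam):
    """One Newton step for L2-regularized logistic loss in the random-feature space.

    The Hessian Z^T D Z + lam * I is num_features x num_features, so it stays small
    whenever the number of random features is much less than the number of samples.
    """
    p = 1.0 / (1.0 + np.exp(-Z @ w))                 # predicted probabilities
    grad = Z.T @ (p - y) + lam * w                   # gradient of the penalized loss
    D = p * (1.0 - p)                                # logistic Hessian weights
    hess = (Z * D[:, None]).T @ Z + lam * np.eye(Z.shape[1])
    return w - np.linalg.solve(hess, grad)

# Toy usage on synthetic binary labels (illustrative data, not the paper's experiments).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=1000) > 0).astype(float)
Z = rff_features(X, num_features=200, bandwidth=1.0, rng=rng)
w = np.zeros(Z.shape[1])
for _ in range(10):
    w = newton_step_logistic(Z, y, w, lam=1e-2)
```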