Kernel methods are used frequently in various applications of machine learning. For large-scale high-dimensional applications, the success of kernel methods hinges on the ability to operate with a certain large dense kernel matrix $K$. An enormous amount of literature has been devoted to the study of symmetric positive semi-definite (SPSD) kernels, where Nystrom methods compute a low-rank approximation to the kernel matrix by choosing landmark points. In this paper, we study the Nystrom method for approximating both symmetric indefinite kernel matrices as well as SPSD ones. We first develop a theoretical framework for general symmetric kernel matrices, which provides theoretical guidance for the selection of landmark points. We then leverage discrepancy theory to propose the anchor net method for computing accurate Nystrom approximations with optimal complexity. The anchor net method operates entirely on the dataset without requiring access to $K$ or its matrix-vector products. Results on various types of kernels (both indefinite and SPSD) and machine learning datasets demonstrate that the new method achieves better accuracy and stability at lower computational cost than state-of-the-art Nystrom methods.
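For concreteness, the Nystrom construction referenced above can be sketched in a few lines of NumPy. The sketch below is illustrative only: the landmark indices are drawn uniformly at random as a stand-in for the anchor net selection (which is not reproduced here), `gaussian_kernel` is an assumed example kernel, and a pseudoinverse of the landmark block is used so the same code applies to indefinite kernels.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Example SPSD kernel; any symmetric kernel function works here."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_factors(X, kernel, landmark_idx):
    """Return (C, W_pinv) such that K ~= C @ W_pinv @ C.T,
    where C = K[:, S] and W = K[S, S] for the landmark set S.
    Only m = len(S) columns of K are ever evaluated, so the
    full n x n kernel matrix is never formed."""
    L = X[landmark_idx]
    C = kernel(X, L)                  # n x m cross block of K
    W = kernel(L, L)                  # m x m landmark block
    return C, np.linalg.pinv(W)       # pinv also handles indefinite W

# Usage: random landmarks as a placeholder for anchor net selection.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
idx = rng.choice(len(X), size=20, replace=False)
C, W_pinv = nystrom_factors(X, gaussian_kernel, idx)
K_approx = C @ W_pinv @ C.T           # rank <= 20 approximation of K
```

Note that in practice one keeps the factors `C` and `W_pinv` rather than forming `K_approx` explicitly, since applying the approximation to a vector costs only $O(nm)$.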