Selecting diverse and important items, called landmarks, from a large set is a problem of interest in machine learning. For example, to handle large training sets, kernel methods often rely on low-rank Nyström matrix approximations built from selected or sampled landmarks. In this context, we propose a deterministic and a randomized adaptive algorithm for selecting landmark points within a training data set. These landmarks are related to the minima of a sequence of kernelized Christoffel functions. Beyond the known connection between Christoffel functions and leverage scores, we also explain a connection between our method and finite determinantal point processes (DPPs): our construction promotes diversity among important landmark points in a way similar to DPPs. Finally, we explain how our randomized adaptive algorithm can influence the accuracy of kernel ridge regression.
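To make the role of landmarks in a Nyström approximation concrete, the following is a minimal sketch and not the algorithm proposed in this work. It swaps in a standard greedy pivoted-Cholesky-style rule, which at each step picks the point whose kernel value is least well explained by the landmarks chosen so far, and which therefore tends to favor diverse points. The function names (`rbf_kernel`, `greedy_landmarks`, `nystrom`), the Gaussian kernel, the bandwidth, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def greedy_landmarks(K, k):
    """Pick k landmark indices greedily (pivoted-Cholesky-style rule).

    Illustrative stand-in only: each step selects the index with the largest
    residual diagonal entry, i.e. the point least well captured by the
    landmarks selected so far.
    """
    n = K.shape[0]
    residual = np.diag(K).astype(float).copy()
    L = np.zeros((n, k))  # partial Cholesky factor
    chosen = []
    for j in range(k):
        i = int(np.argmax(residual))
        chosen.append(i)
        # Column i of the current residual matrix K - L L^T.
        col = K[:, i] - L[:, :j] @ L[i, :j]
        L[:, j] = col / np.sqrt(residual[i])
        # Update the residual diagonal; clip tiny negative round-off.
        residual = np.maximum(residual - L[:, j] ** 2, 0.0)
    return chosen


def nystrom(K, landmarks):
    """Low-rank Nystrom approximation K[:, S] K[S, S]^+ K[S, :]."""
    C = K[:, landmarks]
    W = K[np.ix_(landmarks, landmarks)]
    return C @ np.linalg.pinv(W) @ C.T


# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
K = rbf_kernel(X, X, bandwidth=0.5)

landmarks_5 = greedy_landmarks(K, 5)
landmarks_15 = greedy_landmarks(K, 15)

# Frobenius-norm approximation error for each landmark budget.
err_5 = np.linalg.norm(K - nystrom(K, landmarks_5), "fro")
err_15 = np.linalg.norm(K - nystrom(K, landmarks_15), "fro")
print("error with 5 landmarks:", err_5)
print("error with 15 landmarks:", err_15)
```

Because the greedy rule is deterministic and adaptive, the first five landmarks of the larger selection coincide with the smaller selection, and adding landmarks can only reduce the approximation error in this setting. A randomized variant would sample the next landmark instead of taking the maximizer.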