In machine learning and statistics, it is often desirable to reduce the dimensionality of a sample of data points in a high-dimensional space $\mathbb{R}^d$. This paper introduces a dimensionality reduction method in which the embedding coordinates are the eigenvectors of a positive semi-definite kernel obtained as the solution of an infinite-dimensional analogue of a semi-definite program. The resulting embedding is adaptive and non-linear. We discuss this problem under both weak and strong smoothness assumptions on the learned kernel. A main feature of our approach is the existence of an out-of-sample extension formula for the embedding coordinates in both cases. This extrapolation formula yields an extension of the kernel matrix to a data-dependent Mercer kernel function. Our empirical results indicate that this embedding method is more robust to the influence of outliers than a spectral embedding method.
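For orientation, a generic spectral embedding of the kind alluded to above can be sketched as follows; this is a minimal illustration and not the paper's exact construction. Assume a learned positive semi-definite kernel matrix $K \in \mathbb{R}^{n \times n}$ on samples $x_1, \dots, x_n$ with eigenpairs $(\lambda_i, v_i)$, and assume an extended kernel function $k(\cdot,\cdot)$ agreeing with $K$ on the sample. The embedding and a Nyström-style out-of-sample extension then read
\[
  K = \sum_{i=1}^{n} \lambda_i \, v_i v_i^{\top},
  \qquad
  \Phi(x_j) = \bigl(\sqrt{\lambda_1}\, v_1(j), \dots, \sqrt{\lambda_m}\, v_m(j)\bigr),
\]
\[
  \Phi_i(x) = \frac{1}{\sqrt{\lambda_i}} \sum_{j=1}^{n} v_i(j)\, k(x, x_j),
  \qquad i = 1, \dots, m,
\]
where $m \le n$ is the target dimension. By the eigenvalue equation $\sum_j k(x_l, x_j) v_i(j) = \lambda_i v_i(l)$, the extension $\Phi_i$ reproduces the sample coordinates, $\Phi_i(x_l) = \sqrt{\lambda_i}\, v_i(l)$; the paper's out-of-sample formula plays this role for its learned, data-dependent Mercer kernel.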