Dimensionality reduction techniques aim to represent high-dimensional data in low-dimensional spaces, either to extract hidden and useful information or to facilitate visual understanding and interpretation of the data. However, few of them take into account the cluster information implicitly contained in the high-dimensional data. In this paper, we propose LaptSNE, a new graph-layout nonlinear dimensionality reduction method based on t-SNE, one of the best techniques for visualizing high-dimensional data as 2D scatter plots. Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding while learning to preserve the local and global structure of the high-dimensional space in the low-dimensional space. Solving the proposed model is nontrivial because the eigenvalues of the normalized symmetric Laplacian are functions of the decision variable. We provide a majorization-minimization algorithm with a convergence guarantee to solve the LaptSNE optimization problem and show how to calculate the gradient analytically, which may be of broad interest for optimization with Laplacian-composited objectives. We evaluate our method through a formal comparison with state-of-the-art methods on seven benchmark datasets, both visually and via established quantitative measurements. The results demonstrate the superiority of our method over baselines such as t-SNE and UMAP. We also provide out-of-sample, large-scale, and mini-batch extensions of LaptSNE to facilitate dimensionality reduction in various scenarios.
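To make the role of the Laplacian eigenvalues more concrete, the following is a minimal sketch and not the authors' implementation. It relies on a standard fact: the number of zero eigenvalues of a graph Laplacian equals the number of connected components of the graph. The k smallest eigenvalues of the normalized symmetric Laplacian are therefore close to zero when an embedding forms k well-separated clusters. The sketch assumes that the graph is built from unnormalized Student-t affinities of the kind t-SNE uses in the low-dimensional space. The function names `student_t_affinity` and `smallest_laplacian_eigenvalues` and the synthetic data are hypothetical and chosen only for illustration. The exact LaptSNE objective and its majorization-minimization solver are not reproduced here.

```python
import numpy as np


def student_t_affinity(Y):
    """Unnormalized Student-t affinities w_ij = 1 / (1 + ||y_i - y_j||^2).

    Assumption: this mirrors the low-dimensional kernel of t-SNE; the
    paper's exact graph construction may differ.
    """
    sq_norms = np.sum(Y ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative round-off
    W = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def smallest_laplacian_eigenvalues(W, k):
    """Return the k smallest eigenvalues of L_sym = I - D^{-1/2} W D^{-1/2}."""
    degrees = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    L_sym = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigenvalues = np.linalg.eigvalsh(L_sym)  # sorted in ascending order
    return eigenvalues[:k]


# Two synthetic 2D embeddings with the same three cluster centers:
# one with tight clusters, one with diffuse clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
tight = np.concatenate([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])
loose = np.concatenate([c + 3.0 * rng.standard_normal((30, 2)) for c in centers])

k = 3  # assumed number of clusters
eigs_tight = smallest_laplacian_eigenvalues(student_t_affinity(tight), k)
eigs_loose = smallest_laplacian_eigenvalues(student_t_affinity(loose), k)

print("sum of k smallest eigenvalues (tight clusters):", eigs_tight.sum())
print("sum of k smallest eigenvalues (loose clusters):", eigs_loose.sum())
```

In this toy setting the tighter embedding gives the smaller eigenvalue sum, so penalizing such a quantity during optimization would push an embedding toward more compact clusters. Because the affinities depend on the embedding coordinates, the eigenvalues are themselves functions of the decision variable, which is the difficulty the abstract points out.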