When dimensionality reduction (DR) is applied to large, high-dimensional data sets, it can be difficult to distinguish the underlying high-dimensional data clusters in the 2D projection used for exploratory analysis. We address this problem by first sharpening the clusters in the original high-dimensional data, prior to the DR step, using Local Gradient Clustering (LGC). We then project the sharpened data from the high-dimensional space to 2D with a user-selected DR method. The sharpening step helps this method preserve cluster separation in the resulting 2D projection. With our method, end-users can label each distinct cluster to further analyze an otherwise unlabeled data set. Our 'High-Dimensional Sharpened DR' (HD-SDR) method, tested on both synthetic and real-world data sets, particularly benefits DR methods that separate clusters poorly, and it yields better visual cluster separation than the same DR methods without sharpening. Our method achieves good quality (as measured by quality metrics) and scales well computationally to large high-dimensional data. To illustrate its concrete applications, we further apply HD-SDR to a recent astronomical catalog.
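The two-stage pipeline described in the abstract (sharpen in the high-dimensional space, then project to 2D) can be illustrated with a minimal sketch. This is not the authors' LGC implementation: the `sharpen` function below is a hypothetical, mean-shift-like stand-in that moves each point part of the way toward the centroid of its k nearest neighbors, and PCA (computed via SVD) stands in for the user-selected DR method. The parameter names `k`, `step`, and `iterations`, their values, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row (brute force, small data only)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argpartition(d2, k, axis=1)[:, :k]


def sharpen(X, k=10, step=0.5, iterations=5):
    """Hypothetical stand-in for the sharpening step (NOT the paper's LGC):
    shift every point toward the centroid of its k nearest neighbors."""
    Y = np.asarray(X, dtype=float).copy()
    for _ in range(iterations):
        nbrs = knn_indices(Y, k)
        centroids = Y[nbrs].mean(axis=1)  # local centroid per point
        Y += step * (centroids - Y)       # move part of the way toward it
    return Y


def pca_2d(X):
    """Project onto the first two principal components (stand-in DR method)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


# Synthetic data (an assumption): three Gaussian clusters in 20 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=4.0, size=(3, 20))
labels = np.repeat(np.arange(3), 100)
X = centers[labels] + rng.normal(size=(300, 20))

X_sharp = sharpen(X)        # step 1: sharpen in the original space
P_plain = pca_2d(X)         # baseline: DR without sharpening
P_sharp = pca_2d(X_sharp)   # step 2: DR on the sharpened data
```

Plotting `P_plain` and `P_sharp` side by side, colored by `labels`, gives a rough visual comparison of projections with and without sharpening. Any other DR method could be substituted for `pca_2d`, since the sharpening step only changes the input to the projection.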