Sharpened dimensionality reduction (SDR), which belongs to the class of multidimensional projection techniques, has recently been introduced to tackle the challenges in the exploratory and visual analysis of high-dimensional data. SDR has been applied to various real-world datasets, such as human activity sensory data and astronomical datasets. However, manually labeling the samples from the generated projection is expensive. To address this problem, we propose using clustering methods such as k-means, Hierarchical Clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Spectral Clustering to easily label the 2D projections of high-dimensional data. We test our pipeline of SDR and the clustering methods on a range of synthetic and real-world datasets, including two different public human activity datasets extracted from smartphone accelerometer or gyroscope recordings of various movements. We apply clustering to assess the visual cluster separation of SDR, both qualitatively and quantitatively. We conclude that clustering SDR results yields better labeling results than clustering plain DR, and that k-means is the recommended clustering method for SDR in terms of clustering accuracy, ease-of-use, and computational scalability.
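The labeling step described above can be illustrated with a minimal sketch, using only the Python standard library. It makes several assumptions that are not taken from the paper: a toy set of well-separated 2D points stands in for an SDR projection (SDR itself is not implemented here), the helper names `toy_projection`, `kmeans_2d`, and `purity` are hypothetical, k-means uses a simple farthest-point initialization rather than any particular library's default, and cluster purity is used as one possible stand-in for "clustering accuracy".

```python
import random
from collections import Counter


def toy_projection(n_per_class=100, spread=0.4, seed=1):
    """Hypothetical stand-in for a 2D projection: three Gaussian blobs."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (6.0, 0.0), (3.0, 6.0)]
    points, truth = [], []
    for cls, (cx, cy) in enumerate(blob_centers):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            truth.append(cls)
    return points, truth


def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_2d(points, k, n_iter=100):
    """Lloyd's algorithm on 2D points; returns one cluster index per point."""
    # Farthest-point initialization (an assumption made for determinism):
    # start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


def purity(labels, truth):
    """Fraction of points that belong to the majority true class of their cluster."""
    matched = 0
    for c in set(labels):
        classes = [t for lab, t in zip(labels, truth) if lab == c]
        matched += Counter(classes).most_common(1)[0][1]
    return matched / len(truth)


points, truth = toy_projection()
labels = kmeans_2d(points, k=3)   # cluster indices serve as automatic labels
score = purity(labels, truth)     # quantitative check against known classes
print(f"purity of k-means labels: {score:.3f}")
```

In this sketch, the better separated the clusters are in the projection, the closer the purity gets to 1, which is the intuition behind using clustering to quantify visual cluster separation. Hierarchical clustering, DBSCAN, or spectral clustering could be substituted for `kmeans_2d` under the same evaluation.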