Dimensionality reduction (DR) techniques generate 2D projections that enable visual exploration of cluster structures in high-dimensional datasets. However, different DR techniques yield different patterns, which significantly affects performance on visual cluster analysis tasks. We present the results of a user study that investigates the influence of different DR techniques on visual cluster analysis. Our study focuses on the property types of greatest concern, namely linearity and locality, and evaluates twelve representative DR techniques that cover these properties. Four controlled experiments were conducted to evaluate how well the DR techniques support the tasks of 1) cluster identification, 2) membership identification, 3) distance comparison, and 4) density comparison. We also evaluated users' subjective preferences among the DR techniques regarding the quality of the projected clusters. The results show that: 1) non-linear and local techniques are preferred for cluster identification and membership identification; 2) linear techniques perform better than non-linear techniques in density comparison; 3) UMAP (Uniform Manifold Approximation and Projection) and t-SNE (t-Distributed Stochastic Neighbor Embedding) perform best in cluster identification and membership identification; 4) NMF (Nonnegative Matrix Factorization) is competitive in distance comparison; and 5) t-SNLE (t-Distributed Stochastic Neighbor Linear Embedding) is competitive in density comparison.