Dimensionality reduction (DR) techniques consistently support high-dimensional data analysis across many applications. Beyond the patterns these techniques uncover, interpreting DR results in terms of each feature's contribution to the low-dimensional representation supports new findings through exploratory analysis. Existing approaches for interpreting DR techniques do not explain feature contributions well, because they either focus only on the low-dimensional representation or do not consider the relationships among features. This paper presents ClusterShapley to address these problems. It uses Shapley values to generate explanations of dimensionality reduction techniques and interprets these algorithms through a cluster-oriented analysis. ClusterShapley explains how clusters form and what their relationships mean, which is useful for exploratory data analysis in various domains. We propose novel visualization techniques to guide the interpretation of features' contributions to cluster formation, and we validate our methodology through case studies on publicly available datasets. The results demonstrate the interpretability and analytical power of our approach for generating insights about pathologies and patients in different conditions from DR results.
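To make the role of Shapley values concrete, the following is a minimal sketch of an exact Shapley computation for a toy cluster score. It is not the ClusterShapley implementation, whose details are not given in the abstract. The function names `shapley_values` and `cluster_score`, the centroid, the data point, and the all-zero baseline are all assumptions chosen for illustration. Features treated as "absent" are replaced by their baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take their value from `baseline`
    (an illustrative convention, assumed here).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


def cluster_score(point, centroid=(1.0, 0.0, 2.0)):
    """Hypothetical score: negative squared distance to a cluster centroid."""
    return -sum((p - c) ** 2 for p, c in zip(point, centroid))


x = [1.0, 3.0, 2.5]          # toy data point (assumed)
baseline = [0.0, 0.0, 0.0]   # toy reference point (assumed)
phi = shapley_values(cluster_score, x, baseline)
print(phi)
```

Each entry of `phi` is one feature's share of the difference between the score at `x` and the score at the baseline, and the entries sum to that difference. Exact enumeration costs time exponential in the number of features, so practical tools rely on sampling or model-specific approximations.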