Interpreting clusters after dimensionality reduction (DR) is a ubiquitous part of exploring multidimensional datasets. DR results are frequently represented as scatterplots, in which spatial proximity encodes similarity among data samples. Existing techniques in the literature help users understand how a scatterplot is organized by using layout enrichment strategies to visualize how important each feature is for defining clusters. However, current approaches usually focus on global information, which hampers the analysis whenever the goal is to understand the differences among clusters. This paper therefore introduces a methodology, based on contrastive analysis, for visually exploring DR results and interpreting how clusters are formed. We also introduce a bipartite graph for visually interpreting and exploring the relationships among the statistical variables used to understand how the data features influence cluster formation. We demonstrate our approach through case studies exploring two document collections: news articles and tweets about COVID-19 symptoms. Finally, we report quantitative results that demonstrate the approach's robustness in supporting multidimensional analysis.