Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous visualization tool for analyzing multidimensional datasets. Despite their popularity, such scatterplots suffer from occlusion, especially when informative glyphs are used to represent data instances, potentially obscuring information critical to the analysis at hand. Different strategies have been devised to address this issue. Some produce overlap-free layouts but lack the powerful capabilities of contemporary DR techniques in uncovering interesting data patterns, while others eliminate overlaps as a post-processing step. Despite the good results of post-processing techniques, most of the best methods expand or distort the scatterplot area, which reduces glyph sizes, sometimes to unreadable dimensions, and defeats the purpose of removing overlaps. This paper presents Distance Grid (DGrid), a novel post-processing strategy to remove overlaps from DR layouts that faithfully preserves the original layout's characteristics and bounds the minimum glyph size. Through an extensive comparative evaluation considering multiple metrics, we show that DGrid surpasses the state of the art in overlap removal while also being two to three orders of magnitude faster for large datasets.