Scatterplots are a common tool for exploring multidimensional datasets, especially in the form of scatterplot matrices (SPLOMs). However, scatterplots suffer from overplotting when categorical variables are mapped to one or both axes, or when the same continuous variable is used for both axes. Previous methods such as histograms or violin plots rely on aggregation, which makes brushing and linking difficult. To address this, we propose gatherplots, an extension of scatterplots that manages the overplotting problem. Gatherplots are a form of unit visualization, which avoids aggregation and maintains the identity of individual objects to ease visual perception. In gatherplots, all visual marks that map to the same position coalesce to form a packed entity, making it easier to see an overview of data groupings. The size and aspect ratio of marks can also be changed dynamically to make it easier to compare the composition of different groups. For the case of a categorical variable vs. a categorical variable, we propose a heuristic to decide bin sizes for optimal space usage. To validate our work, we conducted a crowdsourced user study, which shows that gatherplots enable people to assess data distribution more quickly and more accurately than jittered scatterplots.
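To make the gathering idea concrete, the following is a minimal Python sketch of how marks that share a categorical position could be packed into a grid inside their cell instead of being drawn on top of each other. It is an illustration under stated assumptions, not the layout algorithm or bin-size heuristic of the paper: the function name `gather_positions`, the unit cell size, the row-major packing order, and the rule of sizing all marks by the fullest cell are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def gather_positions(records, x_key, y_key, cell_w=1.0, cell_h=1.0):
    """Assign each record a non-overlapping slot inside its (x, y) category cell.

    Hypothetical helper for illustration only; not the authors' implementation.
    Returns (positions, mark_size), where positions maps a record index to the
    center of its mark.
    """
    if not records:
        return {}, (0.0, 0.0)

    # Group record indices by the pair of categorical values they map to.
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[(rec[x_key], rec[y_key])].append(idx)

    x_cats = sorted({r[x_key] for r in records})
    y_cats = sorted({r[y_key] for r in records})

    # Assumption: one mark size for every cell, set by the fullest cell,
    # so that the packed area of a group is proportional to its count.
    largest = max(len(members) for members in groups.values())
    cols = math.ceil(math.sqrt(largest * cell_w / cell_h))
    rows = math.ceil(largest / cols)
    mark_w, mark_h = cell_w / cols, cell_h / rows

    positions = {}
    for (xc, yc), members in groups.items():
        # Lower-left corner of the cell for this category pair.
        x0 = x_cats.index(xc) * cell_w
        y0 = y_cats.index(yc) * cell_h
        for slot, idx in enumerate(members):
            # Fill the cell row by row, left to right.
            row, col = divmod(slot, cols)
            positions[idx] = (x0 + (col + 0.5) * mark_w,
                              y0 + (row + 0.5) * mark_h)
    return positions, (mark_w, mark_h)


records = [
    {"sex": "F", "class": "1st"},
    {"sex": "F", "class": "1st"},
    {"sex": "F", "class": "1st"},
    {"sex": "M", "class": "1st"},
]
positions, mark_size = gather_positions(records, "sex", "class")
```

Because every record keeps its own mark and index, a selection made on these positions can still be traced back to individual records, which is the property that aggregated views such as histograms lose.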