Understanding and comparing distributions of data (e.g., with respect to their modes, shapes, or outliers) is a common challenge in many scientific disciplines. Typically, this challenge is addressed with side-by-side comparisons of histograms or density plots. However, comparing multiple density plots is mentally demanding, and uniform histograms often represent distributions imprecisely because missing values, outliers, or modes are hidden by equal-sized bins. In this paper, we present a novel type of overview visualization for the comparison of univariate data distributions: AccuStripes (i.e., accumulated stripes) is a new visual metaphor that encodes accumulations of data distributions, computed by adaptive binning, as color-coded stripes of irregular width. We provide detailed insights into the challenges of binning. Specifically, we explore different adaptive binning concepts, such as Bayesian Blocks binning and Jenks Natural Breaks binning, for the computation of bin boundaries in terms of their capability to represent the datasets as accurately as possible. In addition, we discuss issues arising in the design of comparative visualizations of distributions: to allow for a comparison of many distributions, their accumulated representations are plotted below each other in a stacked mode. Based on our findings, we propose three different layouts for the comparative visualization of multiple distributions. The usefulness of AccuStripes is investigated through a statistical evaluation of the binning methods: using a similarity metric from cluster analysis, we show which binning method statistically yields the best grouping results. Through a user study, we evaluate which binning strategy visually represents the distribution in the most intuitive form and investigate which layout lets the user compare many distributions with the least effort.
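To make the contrast between uniform and adaptive binning concrete, the following minimal sketch compares equal-width bin edges with edges obtained from a Fisher-Jenks style dynamic program, one common formulation of Jenks Natural Breaks that minimizes the within-bin sum of squared deviations. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names `uniform_edges` and `jenks_breaks` and the toy data are hypothetical, and the convention of reporting the smallest value of each bin as its lower edge is a choice made here.

```python
from typing import List, Sequence


def uniform_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Equal-width bin edges between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + k * width for k in range(n_bins + 1)]


def jenks_breaks(values: Sequence[float], n_bins: int) -> List[float]:
    """Return n_bins + 1 edges minimizing the within-bin squared deviation.

    Hypothetical helper: an O(n_bins * n^2) dynamic program over sorted data.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= n_bins <= n:
        raise ValueError("n_bins must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice xs[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i: int, j: int) -> float:
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting xs[:j] into k bins.
    best = [[inf] * (n + 1) for _ in range(n_bins + 1)]
    split = [[0] * (n + 1) for _ in range(n_bins + 1)]
    best[0][0] = 0.0
    for k in range(1, n_bins + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    split[k][j] = i  # index where the last bin starts

    # Walk back through the stored split points to recover the edges.
    edges = [xs[-1]]
    j = n
    for k in range(n_bins, 0, -1):
        j = split[k][j]
        edges.append(xs[j])
    return edges[::-1]


# Toy data with three well-separated groups (made up for illustration).
data = [1, 2, 3, 10, 11, 12, 50, 51, 52]
print(uniform_edges(data, 3))
print(jenks_breaks(data, 3))
```

On this toy data the equal-width edges fall at 1, 18, 35, and 52, so the first two groups share one bin and the middle bin stays empty, whereas the adaptive edges start a new bin at each group. An irregular-width stripe per bin, colored by the bin's count, would then correspond to the kind of encoding the abstract describes.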