Scatterplots commonly use color to encode categorical data. However, as datasets grow in size and complexity, the efficacy of color encodings may vary. Designers lack insight into how robust different design choices are to changes in the number of categories. This paper presents a crowdsourced experiment measuring how the number of categories and the choice of color encoding in multiclass scatterplots influence viewers' ability to analyze data across classes. Participants estimated relative means in a series of scatterplots with 2 to 10 categories encoded using ten color palettes drawn from popular design tools. Our results show that the number of categories and the color discriminability within a palette notably affect people's perception of categorical data in scatterplots, and that judgments become harder as the number of categories grows. We examine existing palette design heuristics in light of our results to help designers make robust color choices informed by the parameters of their data.