A good data visualization is not only a distortion-free graphical representation of data but also a means of revealing the data's underlying statistical properties. Although visualization is used at many stages of data analysis, selecting a good one is often a manual process involving many iterations. Recent work has sought to reduce this effort by developing models that recommend visualizations, but these models are of limited use: they require large training sets of data-visualization pairs, and they focus primarily on design aspects rather than on assessing the effectiveness of the selected visualization. In this paper, we present VizAI, a generative-discriminative framework that first generates various statistical properties of the data from a number of alternative visualizations of the data. This generative stage is linked to a discriminative model that selects the visualization that best matches the true statistics of the data being visualized. VizAI can be trained with minimal supervision and adapts readily to settings with varying degrees of supervision. Using crowd-sourced judgements and a large repository of publicly available visualizations, we demonstrate that VizAI outperforms state-of-the-art methods that learn to recommend visualizations.
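The selection principle in the abstract can be illustrated with a toy sketch. This is not VizAI's generative or discriminative model, which the abstract does not specify. It assumes the candidate visualizations are histograms with different bin counts, that the "statistics read from the visualization" are a mean and standard deviation estimated from the drawn bars alone, and that the match to the true statistics is scored by a simple sum of absolute differences. All function names here are hypothetical.

```python
import statistics


def histogram(data, bins):
    """Bin data into equal-width bins and return (bin_centers, counts)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for x in data:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, counts


def stats_from_histogram(centers, counts):
    """Estimate mean and std using only what the histogram shows."""
    n = sum(counts)
    mean = sum(c * k for c, k in zip(centers, counts)) / n
    var = sum(k * (c - mean) ** 2 for c, k in zip(centers, counts)) / n
    return mean, var ** 0.5


def score_candidates(data, candidates):
    """Map each candidate bin count to its discrepancy from the true statistics.

    Assumption: discrepancy is |mean error| + |std error|; the paper's
    discriminative model is learned and is not reproduced here.
    """
    true_mean = statistics.fmean(data)
    true_std = statistics.pstdev(data)
    scores = {}
    for bins in candidates:
        est_mean, est_std = stats_from_histogram(*histogram(data, bins))
        scores[bins] = abs(est_mean - true_mean) + abs(est_std - true_std)
    return scores


def select_bins(data, candidates):
    """Pick the candidate whose recovered statistics best match the data."""
    scores = score_candidates(data, candidates)
    return min(scores, key=scores.get)
```

For example, `select_bins([0.0, 1.0, 2.0, 3.0], [1, 4])` prefers four bins, because a single bar hides all of the spread in the data while four bars preserve most of it.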