Images in visualization publications contain rich information, e.g., novel visualization designs and common combinations of visualizations. A systematic collection of these images can benefit the community in many ways, such as supporting literature analysis and automated visualization tasks. In this paper, we build and publicly release a dataset, VisImages, which collects 12,267 captioned images from 1,397 papers in IEEE InfoVis and VAST. Based on a refined taxonomy for visualizations in publications, the dataset includes 35,096 annotated visualizations together with their positions in the images. We demonstrate the usefulness of VisImages through three use cases: 1) exploring and analyzing the evolution of visualizations with VisImages Explorer, 2) training and benchmarking models for visualization classification, and 3) automatically localizing and recognizing visualizations in images.