We present the VIS30K dataset, a collection of 29,689 images that represents 30 years of figures and tables from each track of the IEEE Visualization conference series (Vis, SciVis, InfoVis, VAST). VIS30K's comprehensive coverage of the scientific literature in visualization not only reflects the progress of the field but also enables researchers to study the evolution of the state-of-the-art and to find relevant work based on graphical content. We describe the dataset and our semi-automatic collection process, which couples convolutional neural networks (CNN) with curation. Extracting figures and tables semi-automatically allows us to verify that no images are overlooked or extracted erroneously. To improve quality further, we engaged in a peer-search process for high-quality figures from early IEEE Visualization papers. With the resulting data, we also contribute VISImageNavigator (VIN, visimagenavigator.github.io), a web-based tool that facilitates searching and exploring VIS30K by author names, paper keywords, title and abstract, and years.
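The faceted search that VIN offers (author names, paper keywords, title and abstract, and years) can be illustrated with a minimal sketch. The record schema (`FigureRecord` and its fields) and the `search` function below are hypothetical. They are not VIN's actual data model or implementation. The sketch assumes each image carries the metadata of the paper it came from, and it combines the given facets with AND semantics.

```python
from dataclasses import dataclass, field


@dataclass
class FigureRecord:
    """Hypothetical metadata for one VIS30K image; field names are illustrative only."""
    image_id: str
    year: int
    track: str  # one of "Vis", "SciVis", "InfoVis", "VAST"
    title: str
    abstract: str
    authors: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


def search(records, author=None, keyword=None, text=None, years=None):
    """Return records that match every facet supplied (None disables a facet).

    author  -- case-insensitive substring match against any author name
    keyword -- case-insensitive exact match against any paper keyword
    text    -- case-insensitive substring match against title + abstract
    years   -- inclusive (first_year, last_year) range
    """
    results = []
    for r in records:
        if author and not any(author.lower() in a.lower() for a in r.authors):
            continue
        if keyword and keyword.lower() not in (k.lower() for k in r.keywords):
            continue
        if text and text.lower() not in (r.title + " " + r.abstract).lower():
            continue
        if years and not (years[0] <= r.year <= years[1]):
            continue
        results.append(r)
    return results


if __name__ == "__main__":
    # Placeholder records for illustration only; not real VIS30K entries.
    demo = [
        FigureRecord("img-a", 1995, "Vis", "Volume rendering study",
                     "We render volumes.", ["Ann Lee"], ["volume rendering"]),
        FigureRecord("img-b", 2010, "InfoVis", "Graph layouts",
                     "Node-link diagrams.", ["Bo Chen"], ["graphs"]),
    ]
    print([r.image_id for r in search(demo, text="node-link", years=(2000, 2020))])
```

Narrowing the results with each added facet mirrors how a user refines a query. A production tool would more likely precompute an index than scan every record, but a linear scan keeps the matching rules easy to read.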