Researchers use figures to communicate rich, complex information in scientific papers. The captions of these figures are critical to conveying effective messages. However, low-quality figure captions commonly occur in scientific articles and may hinder understanding. In this paper, we propose an end-to-end neural framework that automatically generates informative, high-quality captions for scientific figures. To this end, we introduce SCICAP, a large-scale figure-caption dataset based on computer science arXiv papers published between 2010 and 2020. After pre-processing, which included figure-type classification, sub-figure identification, text normalization, and caption text selection, SCICAP contained more than two million figures extracted from over 290,000 papers. We then established baseline models that caption graph plots, the dominant (19.2%) figure type. The experimental results showed both the opportunities and the steep challenges of generating captions for scientific figures.
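The caption-side pre-processing steps mentioned above (text normalization and caption text selection) can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual pipeline: the specific normalization rules (here, lowercasing, replacing numbers with a placeholder token, and collapsing whitespace) and the token-length filter are assumptions for illustration.

```python
import re

def normalize_caption(text: str) -> str:
    """Illustrative caption text normalization: lowercase, replace
    numbers with a placeholder token, and collapse whitespace.
    (The actual SCICAP normalization rules may differ.)"""
    text = text.lower()
    text = re.sub(r"\d+(\.\d+)?", "<num>", text)  # mask numeric values
    text = re.sub(r"\s+", " ", text).strip()      # collapse whitespace
    return text

def select_caption(text: str, max_tokens: int = 100) -> bool:
    """Hypothetical caption selection filter: keep non-empty captions
    within a token-length budget."""
    n_tokens = len(text.split())
    return 0 < n_tokens <= max_tokens

caption = "Figure 3: Accuracy vs.  epochs  (best: 92.5)."
norm = normalize_caption(caption)
print(norm)                  # → figure <num>: accuracy vs. epochs (best: <num>).
print(select_caption(norm))  # → True
```

In a full pipeline, such filters would run after figure-type classification and sub-figure identification, so that only captions of the targeted figure type reach the captioning model.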