Providing visual summaries of scientific publications can improve information access for readers and thereby help them cope with the exponential growth in the number of scientific publications. Nonetheless, efforts to provide visual publication summaries have been few and far between, and have focused primarily on the biomedical domain. This is mainly due to the limited availability of annotated gold standards, which hampers the application of robust, high-performing supervised learning techniques. To address these problems, we create a new benchmark dataset for selecting figures to serve as visual summaries of publications based on their abstracts, covering several domains in computer science. Moreover, we develop a self-supervised learning approach based on heuristic matching of inline references to figures with figure captions. Experiments in both the biomedical and computer science domains show that our model outperforms the state of the art despite being self-supervised and therefore not relying on any annotated training data.
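The abstract does not spell out the matching heuristic, so the following is only a minimal sketch of how inline figure references might be paired with captions to obtain training examples without manual annotation. The regular expression `FIG_REF`, the function `build_pairs`, and the input layout (a list of body sentences plus a dictionary from figure number to caption) are all illustrative assumptions, not the authors' implementation.

```python
import re

# Assumed pattern for inline references such as "Figure 3", "Fig. 3" or "fig 3".
# Hypothetical: the paper's actual heuristic is not given in the abstract.
FIG_REF = re.compile(r"\bfig(?:ure|\.)?\s*(\d+)", re.IGNORECASE)


def build_pairs(sentences, captions):
    """Pair each sentence that mentions a figure with that figure's caption.

    sentences: list of body-text sentences (strings).
    captions:  dict mapping figure number (int) to caption text (str).
    Returns a list of (sentence, caption) tuples usable as positive examples.
    """
    pairs = []
    for sentence in sentences:
        seen = set()  # avoid duplicate pairs when a figure is cited twice
        for match in FIG_REF.finditer(sentence):
            number = int(match.group(1))
            if number in captions and number not in seen:
                seen.add(number)
                pairs.append((sentence, captions[number]))
    return pairs


sentences = [
    "Figure 1 shows the overall architecture.",
    "We report accuracy in Fig. 2 and discuss it below.",
    "No figure is mentioned here.",
    "See Figure 9 for details.",  # no caption available, so it is skipped
]
captions = {
    1: "Figure 1: Model architecture.",
    2: "Figure 2: Accuracy per domain.",
}
pairs = build_pairs(sentences, captions)
```

Under these assumptions, the resulting pairs could serve as positive examples for a text-matching model, which at inference time would score each caption against the abstract instead of against a referencing sentence.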