Between 2021 and 2025, the SciCap project grew from a small seed-funded idea at The Pennsylvania State University (Penn State) into one of the central efforts shaping the scientific figure-captioning landscape. It began as our attempt to test whether domain-specific training, which had proven successful for text models such as SciBERT, could also work for figure captions. Supported by a Penn State seed grant, Adobe, and the Alfred P. Sloan Foundation, that attempt expanded into a multi-institution collaboration. Over these five years, we curated, released, and continually updated a large collection of figure-caption pairs from arXiv papers; conducted extensive automatic and human evaluations of both generated and author-written captions; navigated the rapid rise of large language models (LLMs); launched annual challenges; and built interactive systems that help scientists write better captions. In this piece, we look back at the first five years of SciCap and summarize the key technical and methodological lessons we learned. We then outline five major unsolved challenges and propose directions for the next phase of research in scientific figure captioning.