SuMe: A Dataset Towards Summarizing Biomedical Mechanisms

Can language models read biomedical texts and explain the biomedical mechanisms discussed? In this work, we introduce a biomedical mechanism summarization task. Biomedical studies often investigate the mechanisms by which one entity (e.g., a protein or a chemical) affects another in a biological context. The abstracts of these publications often include a focused set of sentences that present supporting statements about such relationships and the associated experimental evidence, followed by a concluding sentence that summarizes the mechanism underlying the relationship. We leverage this structure to create a summarization task, where the input is a collection of sentences and the main entities in an abstract, and the output includes the relationship and a sentence that summarizes the mechanism. Using a small number of manually labeled mechanism sentences, we train a mechanism sentence classifier to filter a large collection of biomedical abstracts and create a summarization dataset with 22k instances. We also introduce conclusion sentence generation as a pretraining task with 611k instances. We benchmark the performance of large bio-domain language models. We find that while the pretraining task helps improve performance, the best model produces acceptable mechanism outputs in only 32% of the instances, which shows that the task presents significant challenges in biomedical language understanding and summarization.
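To make the input and output of the task concrete, the following is a minimal sketch of how one instance might be represented and flattened into a source/target text pair for a sequence-to-sequence model. The abstract does not specify a data format, so the class name `MechanismInstance`, its field names, the `entity1`/`entity2` prefixes, the `|` separator, and the relation label `"positive"` are all assumptions made for illustration. The example sentences are invented placeholders, not dataset content.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MechanismInstance:
    """One summarization instance (hypothetical layout, not the dataset's schema)."""
    supporting_sentences: List[str]  # input: sentences taken from the abstract
    regulator: str                   # input: first main entity
    regulated: str                   # input: second main entity
    relation: str                    # output: relationship label (assumed free-form string)
    mechanism_sentence: str          # output: sentence summarizing the mechanism


def to_source(inst: MechanismInstance) -> str:
    """Flatten the entities and supporting sentences into one input string."""
    context = " ".join(inst.supporting_sentences)
    return f"entity1: {inst.regulator} | entity2: {inst.regulated} | text: {context}"


def to_target(inst: MechanismInstance) -> str:
    """Flatten the relationship and the mechanism sentence into one output string."""
    return f"{inst.relation} | {inst.mechanism_sentence}"


# Placeholder instance, invented purely to show the shape of the data.
example = MechanismInstance(
    supporting_sentences=[
        "Protein A levels rose after treatment.",
        "Protein B activity increased in treated cells.",
    ],
    regulator="protein A",
    regulated="protein B",
    relation="positive",
    mechanism_sentence="Protein A activates protein B by a placeholder mechanism.",
)

source_text = to_source(example)
target_text = to_target(example)
```

Under these assumptions, a model would read `source_text` and be trained to generate `target_text`, so that the relationship and the mechanism sentence are produced together as a single output.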
