Many research articles now open with research highlights that summarize the paper's main findings. Highlights help researchers identify a paper's contributions quickly and precisely, and they also make the article easier to discover through search engines. We aim to generate research highlights automatically from selected segments of a research paper. We use a pointer-generator network with a coverage mechanism, preceded by a contextual embedding layer that encodes the input tokens as SciBERT embeddings. We evaluate our model on a benchmark dataset, CSPubSum, and also present MixSub, a new multi-disciplinary corpus of papers for automatic research highlight generation. On both CSPubSum and MixSub, the proposed model outperforms related variants and other models proposed in the literature. On CSPubSum, our model performs best when the input is only the abstract of a paper, as opposed to other segments of the paper. In that setting it achieves ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 38.26, 14.26 and 35.51, respectively, a METEOR F1-score of 32.62, and a BERTScore F1 of 86.65, outperforming all baselines. On the new MixSub dataset, where only the abstract is the input, our proposed model (trained on the whole training corpus without distinguishing between subject categories) achieves ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 31.78, 9.76 and 29.3, respectively, a METEOR F1-score of 24.00, and a BERTScore F1 of 85.25, outperforming other models.
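For readers unfamiliar with the architecture named above, the following is a minimal sketch of a single decoding step of a pointer-generator with coverage. The abstract gives no equations, so the sketch assumes the standard formulation of See et al. (2017): the output distribution mixes a vocabulary distribution with the attention weights over source tokens, and the coverage loss penalizes attending again to positions that already received attention. The function name, argument names, and toy numbers are illustrative assumptions, and the SciBERT embedding layer and the encoder/decoder networks that would produce these inputs are omitted.

```python
from typing import List, Tuple


def pointer_generator_step(
    p_vocab: List[float],      # decoder's distribution over the fixed vocabulary
    attention: List[float],    # attention weights over source positions (sums to 1)
    source_ids: List[int],     # extended-vocabulary id of each source token
    p_gen: float,              # probability of generating rather than copying
    coverage: List[float],     # sum of attention from all previous decoding steps
    extended_vocab_size: int,  # fixed vocabulary plus in-document OOV tokens
) -> Tuple[List[float], float, List[float]]:
    """One decoding step (hypothetical helper, assumed standard formulation)."""
    # Generation part: scale the vocabulary distribution by p_gen.
    # Slots for out-of-vocabulary source tokens start at zero.
    n_oov = extended_vocab_size - len(p_vocab)
    final = [p_gen * p for p in p_vocab] + [0.0] * n_oov

    # Copy part: route (1 - p_gen) * attention mass to each source token's id.
    # Repeated source tokens accumulate their attention.
    for a, tok in zip(attention, source_ids):
        final[tok] += (1.0 - p_gen) * a

    # Coverage loss: overlap between current attention and past attention.
    cov_loss = sum(min(a, c) for a, c in zip(attention, coverage))

    # Updated coverage vector for the next decoding step.
    new_coverage = [c + a for a, c in zip(attention, coverage)]
    return final, cov_loss, new_coverage


# Toy example: 4-word vocabulary, 3 source tokens; id 4 is an OOV source token.
final, cov_loss, new_coverage = pointer_generator_step(
    p_vocab=[0.1, 0.2, 0.3, 0.4],
    attention=[0.5, 0.3, 0.2],
    source_ids=[2, 4, 2],
    p_gen=0.6,
    coverage=[0.4, 0.0, 0.1],
    extended_vocab_size=5,
)
print(final, cov_loss, new_coverage)
```

In this toy step the out-of-vocabulary token (id 4) receives non-zero probability purely through copying, which is what lets such a model reproduce technical terms from the input that are missing from its vocabulary.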