In this paper, we study the importance of context in predicting the citation worthiness of sentences in scholarly articles. We formulate this problem as a sequence labeling task and solve it with a hierarchical BiLSTM model. We contribute a new benchmark dataset containing over two million sentences and their corresponding labels. We preserve the sentence order in this dataset and perform document-level train/test splits, which makes it possible to incorporate contextual information in the modeling process. We evaluate the proposed approach on three benchmark datasets. Our results quantify the benefits of using context and contextual embeddings for citation worthiness. Finally, through error analysis, we provide insights into cases where context plays an essential role in predicting citation worthiness.
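The document-level split described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `document_level_split`, the `(text, label)` record layout, the `test_fraction` and `seed` parameters, and the toy data are all assumptions made for illustration. The point it shows is that whole documents, not individual sentences, are assigned to a partition, so each sentence keeps its neighbors and its original position.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical record layout: (sentence text, label), where label 1 means the
# sentence needs a citation and 0 means it does not.
Sentence = Tuple[str, int]


def document_level_split(
    docs: Dict[str, List[Sentence]],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[Dict[str, List[Sentence]], Dict[str, List[Sentence]]]:
    """Assign whole documents to train or test, leaving sentence order untouched."""
    doc_ids = sorted(docs)  # fixed starting order so the shuffle is reproducible
    random.Random(seed).shuffle(doc_ids)
    n_test = max(1, round(len(doc_ids) * test_fraction))
    test_ids = set(doc_ids[:n_test])
    train = {d: docs[d] for d in doc_ids if d not in test_ids}
    test = {d: docs[d] for d in doc_ids if d in test_ids}
    return train, test


# Toy data (invented): five tiny "documents" of three ordered sentences each.
toy_docs: Dict[str, List[Sentence]] = {
    f"doc{i}": [(f"doc{i} sentence {j}", j % 2) for j in range(3)]
    for i in range(5)
}

train_docs, test_docs = document_level_split(toy_docs, test_fraction=0.2, seed=0)
```

Because no document is divided between partitions, a sequence model can read the sentences surrounding a target sentence at training time without seeing any test-set text, which a sentence-level random split would not guarantee.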