Accurately linking news articles to scientific research works is a critical component of a number of applications, such as measuring the social impact of a research work and detecting inaccuracies or distortions in science news. Although the lack of links between news and literature has been a challenge in these applications, the problem remains relatively unexplored. In this paper, we designed and evaluated a new approach that consists of (1) augmenting the latest named-entity recognition techniques to extract various metadata, and (2) designing a new elastic search engine that facilitates the use of enriched metadata queries. To evaluate our approach, we constructed two datasets of paired news articles and research papers: one for training the metadata-extraction models and the other for evaluation. Our experiments showed that the new approach performed significantly better than a baseline approach used by altmetric.com (top-1 accuracy of 0.89 vs. 0.32). To further demonstrate the effectiveness of the approach, we also conducted a study on 37,600 health-related press releases published on EurekAlert!, which showed that our approach identified the corresponding research papers with a top-1 accuracy of at least 0.97.
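As a rough illustration of how metadata-based linking and the top-1 accuracy metric fit together, the following is a minimal sketch and not the system evaluated in this paper. It assumes that metadata (author surnames, journal name) has already been extracted from each news item, and it replaces the search engine with a toy weighted-overlap score. The names `Paper`, `score`, `link_top1`, and `top1_accuracy`, the field weights, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """A candidate research paper (hypothetical, simplified record)."""
    paper_id: str
    authors: frozenset  # lower-cased author surnames
    journal: str


def score(mentions: dict, paper: Paper) -> float:
    """Toy relevance score: weighted count of matching metadata fields.

    The weights (1.0 per author, 2.0 for the journal) are arbitrary
    assumptions chosen only for illustration.
    """
    total = 0.0
    # Author surnames mentioned in the news item that appear on the paper.
    total += 1.0 * len(set(mentions.get("authors", ())) & paper.authors)
    # Journal name mentioned in the news item (case-insensitive match).
    journal = mentions.get("journal", "")
    if journal and journal.lower() == paper.journal.lower():
        total += 2.0
    return total


def link_top1(mentions: dict, papers: list) -> str:
    """Return the id of the highest-scoring candidate paper."""
    return max(papers, key=lambda p: score(mentions, p)).paper_id


def top1_accuracy(pairs: list, papers: list) -> float:
    """Fraction of news items whose top-ranked paper is the gold paper."""
    hits = sum(1 for mentions, gold in pairs if link_top1(mentions, papers) == gold)
    return hits / len(pairs)


# Made-up sample data: two candidate papers and three news items,
# each news item paired with the id of its gold paper.
PAPERS = [
    Paper("p1", frozenset({"smith", "lee"}), "Nature"),
    Paper("p2", frozenset({"garcia"}), "The Lancet"),
]
PAIRS = [
    ({"authors": ["garcia"], "journal": "the lancet"}, "p2"),
    ({"authors": ["lee"]}, "p1"),
    ({"journal": "Nature"}, "p2"),  # misleading metadata, linked wrongly
]

if __name__ == "__main__":
    print(top1_accuracy(PAIRS, PAPERS))
```

On this sample data, two of the three news items are linked to their gold paper, so the sketch reports a top-1 accuracy of 2/3. The figures reported above (0.89, 0.32, 0.97) come from the paper's own datasets and cannot be reproduced with this toy example.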